using p2K Chrome Extension, much of wired.com article is dropped

VicM · March 11, 2025, 10:52pm

With this page, only the last “chapter” (seems to be related to elements with class=“chapter”) is included:

HolgerAusB · March 12, 2025, 1:44pm

@VicM, I fixed that problem. Try again. But I found another problem.

@fivefilters: Images are not shown. They are referenced relative to the article path, e.g.

<img src="inline1.jpg">

But the image is not found at

https://www.wired.com/2014/09/coupland-bell-labs/inline1.jpg

but at

https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/inline1.jpg

I think this is because of this html line:

<base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">
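To illustrate the difference, here is a minimal Python sketch using the standard library's `urljoin`. It is not the Full-Text RSS code, only a demonstration of how the same relative `src` resolves against the article URL versus the `<base>` href quoted above (the variable names are my own):

```python
from urllib.parse import urljoin

# URL of the article as requested, and the href from the <base> element.
page_url = "https://www.wired.com/2014/09/coupland-bell-labs/"
base_href = "https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/"

# Resolving against the article URL gives the location where the image is NOT found.
without_base = urljoin(page_url, "inline1.jpg")

# Resolving against the <base> href gives the location where the image actually lives.
with_base = urljoin(base_href, "inline1.jpg")

print(without_base)
print(with_base)
```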

Adding index.html to this base URL also loads the article, but all special characters are garbled.

When setting

single_page_link: concat(//base/@href, 'index.html')

to wired.com.txt, I get a failure from FTR 3.9.13, while this works for wallabag (also with wrong special characters)

XML Parsing Error: junk after document element
Location: https://xxxxxx.example.com/makefulltextfeed.php?url=https%3A%2F%2Fwww.wired.com%2F2014%2F09%2Fcoupland-bell-labs%2F&max=3&links=preserve&exc=
Line Number 2, Column 1:

<b>Deprecated</b>:  mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead in <b>/path-to-ftr/libraries/readability/Readability.php</b> on line <b>145</b><br />
^

Of course, I couldn’t try this with P2K.

Do you see any way to activate images here?

fivefilters · March 14, 2025, 10:26pm

Thanks @HolgerAusB. Full-Text RSS already looks for a <base> element to use when rewriting relative URLs. The problem is that the base element here doesn’t appear inside a <head> element. That’s where the base element is usually found and it’s where Full-Text RSS looks for the base element.

I think we need to update the code to look for the base element outside of the head element too.
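As a rough sketch of the idea (in Python with only the standard library, not the actual PHP code in Full-Text RSS), the lookup could accept the first `<base href>` found anywhere in the markup instead of only inside `<head>`. The names `BaseFinder` and `resolve` are hypothetical and exist only for this illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Records the href of the first <base> element, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # No check for an enclosing <head>: a <base> in the body counts too.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def resolve(html, page_url, relative):
    """Resolve a relative URL, preferring a <base href> over the page URL."""
    finder = BaseFinder()
    finder.feed(html)
    # A base href may itself be relative, so resolve it against the page URL first.
    base = urljoin(page_url, finder.base_href) if finder.base_href else page_url
    return urljoin(base, relative)


# Simplified stand-in for the wired.com markup: the <base> sits outside <head>.
sample = (
    "<html><head><title>t</title></head><body>"
    '<base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">'
    '<img src="inline1.jpg"></body></html>'
)
page = "https://www.wired.com/2014/09/coupland-bell-labs/"
print(resolve(sample, page, "inline1.jpg"))
```

When no `<base>` is present at all, the sketch falls back to the page URL, which matches the usual behaviour for relative links.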

fivefilters · April 6, 2025, 7:23pm

Quick update to say we’ve updated the code to look for the base element wherever it may appear, not just the head. So the image URLs should now get resolved properly. @HolgerAusB, @VicM

VicM · April 8, 2025, 3:51am

I’m still not seeing the images, FWIW.

fivefilters · April 16, 2025, 6:31pm

Hi @VicM, you will have to re-send the articles with Push to Kindle to see the images. If you’re opening the article you sent previously, it will not be updated.

Hopefully you see the images in the preview of Push to Kindle…


VicM · April 16, 2025, 11:13pm

I didn’t send again, because I didn’t see the images in the preview.
I was using Brave, but I just tried it on Chrome, and the extension does retrieve the images there.