With this page, only the last “chapter” (seems to be related to
@VicM, I fixed that problem. Try again. But I found another problem.
@fivefilters: Images are not shown. They are referenced relative to the article path, e.g.
<img src="inline1.jpg">
But the image is not located at
https://www.wired.com/2014/09/coupland-bell-labs/inline1.jpg
but at
https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/inline1.jpg
I think this is because of this HTML line:
<base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">
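To make the difference concrete, here is a small Python sketch (standard library only) that resolves the relative `src` once against the article URL and once against the `<base>` href quoted above. It only illustrates standard relative-URL resolution; it is not Full-Text RSS code.

```python
from urllib.parse import urljoin

# URL the article was requested from (from this thread)
article_url = "https://www.wired.com/2014/09/coupland-bell-labs/"
# href of the <base> element in the page's HTML (from this thread)
base_href = "https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/"

src = "inline1.jpg"

# Resolving against the article URL points to a non-existent image
print(urljoin(article_url, src))
# Resolving against the <base> href points to the real image location
print(urljoin(base_href, src))
# Because base_href ends in "/", appending index.html gives the same result
# as the XPath concat(//base/@href, 'index.html')
print(urljoin(base_href, "index.html"))
```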
Adding index.html to this base URL also brings up the article, but all special characters are garbled.
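One common cause of this kind of garbling (not confirmed for this page) is UTF-8 text being decoded as Windows-1252. The Python sketch below reproduces the effect on a made-up sample string; the sample text is hypothetical and not taken from the article.

```python
# Hypothetical sample text containing a typographic apostrophe (U+2019)
original = "Bell Labs’ history"

# Encode as UTF-8, then wrongly decode the bytes as Windows-1252
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the two steps recovers the original text
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If the garbled output on the page shows sequences like `â€™` in place of apostrophes, that points to this kind of charset mismatch.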
When I add
single_page_link: concat(//base/@href, 'index.html')
to wired.com.txt, FTR 3.9.13 fails with the error below, while the same rule works in wallabag (though again with garbled special characters):
XML Parsing Error: junk after document element
Location: https://xxxxxx.example.com/makefulltextfeed.php?url=https%3A%2F%2Fwww.wired.com%2F2014%2F09%2Fcoupland-bell-labs%2F&max=3&links=preserve&exc=
Line Number 2, Column 1:
<b>Deprecated</b>: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead in <b>/path-to-ftr/libraries/readability/Readability.php</b> on line <b>145</b><br />
^
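The parse error comes from the PHP deprecation notice being printed into the response body ahead of the rest of the feed, so the output is no longer well-formed XML. The notice itself names mb_encode_numericentity as a replacement. Assuming the call on line 145 converts text to HTML entities, the idea is to turn every non-ASCII character into a numeric reference (in PHP, typically with a convert map starting at 0x80). The Python sketch below only illustrates that transformation; the function name is hypothetical and this is not the Readability.php code.

```python
def encode_numeric_entities(text: str) -> str:
    # Replace every non-ASCII character with a decimal &#NNNN; reference,
    # leaving ASCII characters (including & and <) untouched
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(encode_numeric_entities("Bell Labs’ “chapter”"))
```

Separately, disabling display_errors on a production server keeps notices like this one out of the feed output.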
Of course, I couldn’t try this with P2K.
Do you see any way to activate images here?
Thanks @HolgerAusB. Full-Text RSS already looks for a <base> element to use when rewriting relative URLs. The problem is that the base element here doesn’t appear inside a <head> element. That’s where the base element is usually found, and it’s where Full-Text RSS looks for it.
I think we need to update the code to look for the base element outside of the head element too.
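A rough sketch of that change, written in Python rather than the PHP that Full-Text RSS uses: scan the whole document for the first `<base>` element with an `href`, wherever it appears, and fall back to the page URL if there is none. The class and function names are hypothetical, and the sample HTML is a trimmed-down stand-in for the Wired page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Record the href of the first <base> element, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # No check for an enclosing <head>: a <base> inside <body> counts too
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def effective_base(html: str, page_url: str) -> str:
    finder = BaseFinder()
    finder.feed(html)
    finder.close()
    if finder.base_href is None:
        return page_url
    # A relative base href is itself resolved against the page URL
    return urljoin(page_url, finder.base_href)


page_url = "https://www.wired.com/2014/09/coupland-bell-labs/"
html = """<html><head><title>t</title></head><body>
<base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">
<img src="inline1.jpg">
</body></html>"""

base = effective_base(html, page_url)
print(urljoin(base, "inline1.jpg"))
```

Taking the first `<base>` with an `href` matches how browsers pick the document base URL, so a page with several base elements would resolve the same way here.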