Using P2K Chrome extension, much of wired.com article is dropped

With this page, only the last “chapter” is included (this seems to be related to the elements with class=“chapter”):

@VicM, I fixed that problem. Try again. But I found another problem.

@fivefilters: Images are not shown. They are referenced relative to the article path, e.g.

<img src="inline1.jpg">

But the image is not found at

https://www.wired.com/2014/09/coupland-bell-labs/inline1.jpg

but at

https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/inline1.jpg

I think this is because of this html line:

<base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">
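To illustrate the effect, here is a minimal Python sketch of how a base href changes where a relative image path points. It only uses the two URLs quoted above and the standard library's `urllib.parse.urljoin`; it is not taken from Full-Text RSS (which is PHP), it just applies the same URL-resolution rule.

```python
from urllib.parse import urljoin

# URLs as quoted in this post.
page_url = "https://www.wired.com/2014/09/coupland-bell-labs/"
base_href = "https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/"
src = "inline1.jpg"

# Resolving against the article URL gives the location where the image is NOT found.
without_base = urljoin(page_url, src)

# Resolving against the <base href> gives the location where the image actually lives.
with_base = urljoin(base_href, src)

print(without_base)
print(with_base)
```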

Appending index.html to this base URL also loads the article, but all special characters are garbled.

When I add

single_page_link: concat(//base/@href, 'index.html')

to wired.com.txt, I get a failure from FTR 3.9.13, while this works in wallabag (also with wrong special characters):

XML Parsing Error: junk after document element
Location: https://xxxxxx.example.com/makefulltextfeed.php?url=https%3A%2F%2Fwww.wired.com%2F2014%2F09%2Fcoupland-bell-labs%2F&max=3&links=preserve&exc=
Line Number 2, Column 1:

<b>Deprecated</b>:  mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead in <b>/path-to-ftr/libraries/readability/Readability.php</b> on line <b>145</b><br />
^

Of course, I couldn’t try this with P2K.

Do you see any way to activate images here?

Thanks @HolgerAusB. Full-Text RSS already looks for a <base> element to use when rewriting relative URLs. The problem is that the base element here doesn’t appear inside a <head> element. That’s where the base element is usually found and it’s where Full-Text RSS looks for the base element.

I think we need to update the code to look for the base element outside of the head element too.

Quick update to say we’ve updated the code to look for the base element wherever it may appear, not just the head. So the image URLs should now get resolved properly. @HolgerAusB, @VicM
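For anyone curious what "wherever it may appear" means in practice, here is a rough Python sketch of the idea. It is an illustration only, not the actual Full-Text RSS code: the class and function names are made up, and the sample markup is a simplified stand-in for the Wired page with the base element inside the body.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Record the href of the first <base> element, in <head> or anywhere else."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # No check for an enclosing <head>: any <base> in the document counts.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def resolve_src(html, page_url, src):
    """Resolve src against the document's <base href>, falling back to page_url."""
    finder = BaseHrefFinder()
    finder.feed(html)
    base = urljoin(page_url, finder.base_href) if finder.base_href else page_url
    return urljoin(base, src)


# Simplified stand-in markup: the <base> element sits in <body>, not <head>.
html = """<html><head><title>t</title></head>
<body><base href="https://interactive.wired.com/www-wired-com__2014__09__coupland-bell-labs/">
<img src="inline1.jpg"></body></html>"""

resolved = resolve_src(html, "https://www.wired.com/2014/09/coupland-bell-labs/", "inline1.jpg")
print(resolved)
```

If no base element is present at all, the sketch simply falls back to the article URL.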


I’m still not seeing the images, FWIW.

Hi @VicM, you will have to re-send the articles with Push to Kindle to see the images. If you’re opening the article you sent previously, it will not be updated.

Hopefully you see the images in the preview of Push to Kindle…


I didn’t send again, because I didn’t see the images in the preview.
I was using Brave, but I just tried it on Chrome, and the extension does retrieve the images there.

Hello,

It seems that Wired.com articles are truncated. I am a Wired subscriber. The article is shown in full in my browser, but content retrieved by either the P2K Chrome extension or the bookmarklet is truncated.

As far as I can tell, it only retrieves the first div.GridItem-beYvyV. But the article is made up of several of those, nested under a div.ArticlePageChunks-fwcPjP.
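In case it helps with debugging, here is a small Python sketch for checking how many of those chunks a saved copy of a page contains, to compare against what the extension returns. The sample markup is hypothetical and heavily simplified, the helper names are made up, and matching on the class prefix is my assumption, since the suffixes look auto-generated and may change.

```python
from html.parser import HTMLParser


class ClassPrefixCounter(HTMLParser):
    """Count <div> elements that have a class token starting with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(token.startswith(self.prefix) for token in classes):
            self.count += 1


def count_divs(html, prefix):
    counter = ClassPrefixCounter(prefix)
    counter.feed(html)
    return counter.count


# Hypothetical, simplified markup in the shape described above.
sample = """
<div class="ArticlePageChunks-fwcPjP">
  <div class="GridItem-beYvyV"><p>First part</p></div>
  <div class="GridItem-beYvyV"><p>Second part</p></div>
  <div class="GridItem-beYvyV"><p>Third part</p></div>
</div>
"""

chunks = count_divs(sample, "GridItem-")
print(chunks)
```

A complete extraction would need the content of every such chunk, not only the first one.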

These two articles I tried today showed the same symptoms:

Thanks a lot for your work on this service!

PS: I think that the test article from the current wired.com config has the same problem.

Welcome @FRHdBdk,

So far I have only had time for a quick look at this. It seems that Wired uses several different layouts for its articles, and our config does not cover this one yet.

I managed to get the full article with article.main-content, but I still need to strip some surrounding meta clutter. That will have to wait until maybe next week. Sorry.

Meanwhile, could you please look for one or two articles in that layout that contain images within the article text? I don’t mean the lead image before the first sentence, which exists in most or all articles.

I am not an employee of FiveFilters/P2K, but I am happy to help if I can. However, I don’t have much time this weekend.

Thanks a lot for looking into this @HolgerAusB!

Here are a few Wired articles that I found that use different layouts:

@FRHdBdk: The text should now be complete. If you have problems, try to DISABLE browser retrieved content.

But there are no images and I don’t know why.
@fivefilters: When I inspect the preview in P2K, there are image nodes with correct src URLs, but P2K only shows the filenames of the images.

e.g. Where Are All the AI Drugs? | WIRED

<figure>
  <p>
    <img src="https://media.wired.com/photos/687835c1fbd61d57ce2036ae/master/w_1600%2Cc_limit/CellPaint-Coculture1LungandMacrophage.jpg" alt="CellPaint-Coculture1LungandMacrophage.jpg">
  </p>
  <p>Macrophage and lung fibroblast cellsCourtesy of Recursion</p>
</figure>

No problems with FTR and wallabag here. All text and images are in.

Thanks so much for your help @HolgerAusB, I can confirm the full text is retrieved!

For this article, I saw several images being correctly included in the document my Kindle received: “Lisa Su Runs AMD—and Is Out for Nvidia’s Blood | WIRED”

PS: I think that my posts above were auto-flagged, maybe due to posting multiple links to the same domain (Wired)? I received an automated DM saying “Your post was flagged as spam: the community feels it is an advertisement, something that is overly promotional in nature instead of being useful or relevant to the topic as expected.”, but of course that is wrong.

I removed the automatically set spam flag.

Did you see the images when using the plugin or the bookmarklet? As I have no subscription, I can’t test this option. When I just send the URL to pushtokindle, only the filenames of the images appear.

So if you can confirm that you receive the images in all your “wired” articles, we are fine with that.