Only content from Twitter blockquote is extracted

This URL contains an embedded tweet (blockquote/preview) on the page. When I try to scrape the content using the Push to Kindle app or web app, it grabs only the content of the tweet, not the rest of the email: Defense in Depth is actually a good thing • Buttondown

Thanks for letting us know. Push to Kindle actually operates in two modes:

  1. If you use our mobile apps or old browser extensions, you’ll be relying on our server retrieving the content and processing it.
  2. If you use our newer browser extensions, you’ll have the content sent over after your browser has processed it. That tends to produce better results for many sites (particularly the more complex ones, or ones with markup errors like this one).
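
To make the difference between the two modes concrete, here is a minimal sketch of how a service could choose its HTML source. It is only an illustration, not Push to Kindle's actual code: `Submission`, `fetch_from_server` and `resolve_html` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
from urllib.request import urlopen


@dataclass
class Submission:
    """A hypothetical request: the URL, plus HTML if the browser sent it."""
    url: str
    browser_html: Optional[str] = None  # set only in mode 2


def fetch_from_server(url: str) -> str:
    """Mode 1: the server downloads the raw page itself."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def resolve_html(
    submission: Submission,
    fetch: Callable[[str], str] = fetch_from_server,
) -> Tuple[str, str]:
    """Return (mode, html), preferring browser-processed content when present."""
    if submission.browser_html:
        return "browser", submission.browser_html
    return "server", fetch(submission.url)
```

In this sketch the browser-supplied HTML has already been through the browser's own error recovery, which is why mode 2 tends to cope better with markup errors.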

This website uses invalid HTML tags to surround the tweet, which trips up our parser in the first mode. If you view the original article source in Firefox, you’ll see something like this:

Firefox has marked the invalid tags in red, and that’s why our parser gets confused and only extracts that element.
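
If you want to check a page for this kind of problem yourself, a rough sketch along these lines can list element names that are not standard HTML. The thread does not say which tags are invalid on this page, so the `<tweet>` wrapper in the sample is a made-up stand-in, and `KNOWN_TAGS` is a deliberately partial allow-list for illustration.

```python
from html.parser import HTMLParser

# Partial allow-list, for illustration only. A real check would use the
# full list of elements from the HTML standard.
KNOWN_TAGS = {
    "html", "head", "body", "title", "meta", "link", "div", "p", "a",
    "span", "blockquote", "img", "ul", "ol", "li", "h1", "h2", "h3",
    "em", "strong", "br", "script",
}


class UnknownTagFinder(HTMLParser):
    """Collects start-tag names that are not in KNOWN_TAGS."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag not in KNOWN_TAGS and tag not in self.unknown:
            self.unknown.append(tag)


def find_unknown_tags(html):
    finder = UnknownTagFinder()
    finder.feed(html)
    finder.close()
    return finder.unknown


# Hypothetical markup: <tweet> is an assumed, non-standard wrapper.
sample = (
    "<div><p>Intro</p>"
    "<tweet><blockquote><p>Tweet text</p></blockquote></tweet>"
    "<p>Rest of the email</p></div>"
)
print(find_unknown_tags(sample))  # ['tweet']
```

This only flags unrecognised element names. It does not detect other markup errors, such as stray or unclosed tags.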

If you use the newer extension instead, the bulk of the body gets extracted, though not the tweet itself (we’ll try to fix that later):

Thanks for the reply. Are there any plans to support alternatives to 1), like 2), for the mobile app? Maybe if the article could be loaded into an “in-app browser” (I think some password managers do something like that), then something like 2) could be possible, but I have no idea. I just run into this a fair amount on smaller sites; here’s another: Willingness to look stupid

How do you enable the “Use browser-retrieved content” option? It’s not selectable for me in Firefox and Chrome.

Hi there,

Are there any plans to support alternatives to 1), like 2), for the mobile app?

We’re looking into it, but we’re unsure how much support there is for this on mobile platforms. It does look like an app on iOS can receive article content as seen by Safari, but that would require the app to be called after the article has loaded inside Safari, so sharing from other apps would still require server-side fetching of content.

At the moment, the best option for mobile is to use our bookmarklet. It requires a little more setup, but afterwards the results should look very similar to those of our regular browser extension.

How do you enable the “Use browser-retrieved content” option? It’s not selectable for me in Firefox and Chrome.

If you install our Chrome or Firefox browser extension (or the bookmarklet for mobile Chrome, Firefox or Safari), then when you load articles via the extension (or bookmarklet) you’ll see that option enabled by default.