Dear @HolgerAusB ,
I am subscribed to a newsletter and when I want to send the full-length version of it to Push2Kindle, it cuts most of it. Do you see a solution?
Example URL:
https://mailchi.mp/7de9e228c7bb/a-vademecum-hrlevl-819-szma-6738469?e=753fa56314
Thank you!
@barnabarna:
Mailchimp is a huge service and I don’t know if every newsletter has the same HTML format.
Could you send me more links to other issues? Please send them via PM, because the links are personalized for an individual subscriber.
Are you subscribed to other, different newsletters hosted by Mailchimp? I need links to those, too.
For future issues with different domains, please don’t add them to an existing discussion; open a new topic instead. This keeps the threads clearer and the discussion in line with the title.
@fivefilters:
you wrote a config for mailchimp two years ago, which does not work with this link. Could you PM me an example link of your newsletter (if this is still working with the config)?
With FullText-RSS the following config works here, but I could not test with wallabag or P2K before the weekend.
body: //table[@role='presentation']
prune: no
or, if replacing the table nodes, as you had done:
# Mailchimp uses tables a lot in its email templates
replace_string(<table): <div
replace_string(</table): </div
replace_string(<tbody): <div
replace_string(</tbody): </div
replace_string(<tr): <div
replace_string(</tr): </div
replace_string(<td): <div
replace_string(</td): </div
#body: //div[@role='presentation']
body: //div
strip_id_or_class: awesomewrap
prune: no
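To illustrate what the replace_string lines in the second variant do, here is a minimal Python sketch. It assumes that replace_string is a plain literal substitution applied to the raw HTML before parsing; the function name apply_replacements and the sample markup are made up for illustration and are not part of Full-Text RSS.

```python
# Pairs copied from the replace_string(...) lines in the config above.
REPLACEMENTS = [
    ("<table", "<div"),
    ("</table", "</div"),
    ("<tbody", "<div"),
    ("</tbody", "</div"),
    ("<tr", "<div"),
    ("</tr", "</div"),
    ("<td", "<div"),
    ("</td", "</div"),
]


def apply_replacements(html: str) -> str:
    """Apply each literal substitution in order (assumed behaviour)."""
    for needle, replacement in REPLACEMENTS:
        html = html.replace(needle, replacement)
    return html


# Hypothetical, heavily simplified Mailchimp-style markup.
sample = '<table role="presentation"><tbody><tr><td>Hello</td></tr></tbody></table>'
result = apply_replacements(sample)
print(result)
# <div role="presentation"><div><div><div>Hello</div></div></div></div>
```

Because the attributes survive the substitution, the former table can afterwards be matched as a div, which is why the commented-out selector targets //div[@role='presentation'].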
Sorry for the slow reply, @HolgerAusB. I wish I remembered what I tested that site config with. If I find it, I’ll PM it to you.
For this URL that @barnabarna has posted, I can see that there is no <div id="bodyTable"> element. But there is <body id="archivebody">. I don’t know if these are attribute values that Mailchimp includes in all its email templates, or if they’re specific to some templates. Might be good to add another line with a body selector below the bodyTable one:
body: //body[@id="archivebody"]
so if the bodyTable element isn’t found, it will try to find the archivebody element.
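The fallback idea can be sketched in Python as follows. This is only an illustration under assumptions: it uses the standard library's ElementTree (which supports a limited XPath subset and needs a leading "."), the helper name extract_body and the sample page are hypothetical, and the real engine may evaluate selectors differently.

```python
import xml.etree.ElementTree as ET

# Selectors in the order they would appear in the site config.
BODY_SELECTORS = [
    ".//div[@id='bodyTable']",
    ".//body[@id='archivebody']",
]


def extract_body(root):
    """Return (selector, element) for the first selector that matches, else None."""
    for selector in BODY_SELECTORS:
        match = root.find(selector)
        if match is not None:
            return selector, match
    return None


# Hypothetical page without a bodyTable div, similar to the posted URL.
page = ET.fromstring(
    '<html><body id="archivebody"><table role="presentation">'
    "<tr><td>Newsletter text</td></tr></table></body></html>"
)
selector, element = extract_body(page)
print(selector)  # .//body[@id='archivebody']
```

The first selector finds nothing here, so the second one is used and the whole body element becomes the extracted content.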
The heavy use of tables in Mailchimp templates means that the quality of the extraction probably isn’t going to be that great, but it would be nice if we could come up with a good-enough site config.
I don’t have time to test this much right now, @HolgerAusB, but feel free to try something with the site config file. And if I find some more examples, I’ll send them your way.
@fivefilters: mhhhh, as <body> should always be the top node of HTML content, it does not make much sense to me to restrict the selector to a specific @id.
Just to be on the safe side, I’ll restrict the selector to the first body element found:
body: //body[1]
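A side note on the [1] predicate, as a small sketch: in XPath, //body[1] matches every body element that is the first body child of its own parent, which is not necessarily the same as the first body in the whole document. For a normal page with a single body the two are identical. The sample markup below is deliberately invalid HTML and purely hypothetical; ElementTree from the Python standard library is used only because it follows the same rule for this predicate.

```python
import xml.etree.ElementTree as ET

# Contrived document with a nested second <body> (not valid HTML).
doc = ET.fromstring(
    '<html><body id="outer"><div><body id="inner"/></div></body></html>'
)

# Position predicate: first <body> child of each parent.
per_parent = [el.get("id") for el in doc.findall(".//body[1]")]

# First <body> in document order.
first_in_document = doc.find(".//body").get("id")

print(per_parent)         # ['outer', 'inner']
print(first_in_document)  # outer
```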
@barnabarna: Please wait one hour and check again. Push2Kindle will then have pulled my new config.
@Everyone:
If you are a subscriber to any Mailchimp newsletter, please check whether you get proper content on ftr.fivefilters.net or pushtokindle.com, which both share the same engine and config. Please report whether the fetch is good or not, including the name of the newsletter, not the link, because the URL is personalized!
If the excerpt is erroneous, I would be happy if you could send me a direct link to the tested article as a direct message. And don’t forget to tell us what you think is wrong with the result.