Hi,
Article: Diese Fehler in der Altersvorsorge sollten Sie vermeiden (“You should avoid these mistakes in retirement planning”)
next_page_link: //li[@class='next']/a
body: //p[@class='teaser'] | //div[@class='artical-content-area']
We get the error ‘This article appears to continue on subsequent pages which we could not extract’. Any ideas what’s happening? It’s the same for every multipage article on that site.
Thanks!
I don’t know why this happens; I am just a user. But the link to the next page appears a second time in the HTML source. I didn’t need a body statement: FTR found everything necessary without any config, except for the next-page link. Other articles on this site may still need the body identifier.
Additionally, I stripped the “mehr-zum-thema” box.
Complete working config is:
next_page_link: //link[@rel='next']
strip_id_or_class: mehr-zum-thema-outer
test_url: https://www.pfefferminzia.de/strategie-diese-fehler-in-der-altersvorsorge-sollten-sie-vermeiden/
Please let me know if this works and if you have any additional tweaks.
Will you publish your config on GitHub so others can benefit from it? If not, I can do that for you.
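For anyone wondering why //link[@rel='next'] is enough: here is a minimal sketch of what that selector picks up when the next-page URL appears both in the page head and in the pager. The markup below is a made-up, simplified stand-in (not copied from the site), the helper name find_next_page is hypothetical, and Python's ElementTree only stands in for whatever XPath engine FTR uses internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the next-page URL appears twice,
# once as <link rel="next"> in the head and once in the pager list.
SAMPLE_PAGE = """
<html>
  <head>
    <link rel="canonical" href="https://example.com/article/"/>
    <link rel="next" href="https://example.com/article/2/"/>
  </head>
  <body>
    <ul>
      <li class="next"><a href="https://example.com/article/2/">Next</a></li>
    </ul>
  </body>
</html>
"""


def find_next_page(markup):
    """Return the href of the first <link rel="next">, or None if absent."""
    root = ET.fromstring(markup)
    # ElementTree needs the leading "." but otherwise this is the same
    # expression as in the config: //link[@rel='next']
    link = root.find(".//link[@rel='next']")
    return link.get("href") if link is not None else None


next_url = find_next_page(SAMPLE_PAGE)
print(next_url)
```

The head element does not depend on how the pager is styled, which is probably why it is the more robust thing to match.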
OK, I just saw that the article teaser is missing, so we do need a body identifier after all. Yours did not include the teaser, though, so here is my suggestion:
body: //div[@class='artical-outer']
#body: //p[@class='teaser'] | //div[@class='artical-content-area']
author: //div[@class='artical-author-info']/p/strong
next_page_link: //link[@rel='next']
strip_id_or_class: mehr-zum-thema-outer
strip_id_or_class: artical-download
strip_id_or_class: reading-time-ad-word
test_url: https://www.pfefferminzia.de/strategie-diese-fehler-in-der-altersvorsorge-sollten-sie-vermeiden/
I fixed another small problem and have now uploaded my config to GitHub: PR1068