Article: Diese Fehler in der Altersvorsorge sollten Sie vermeiden ("You should avoid these mistakes in retirement planning")
body: //p[@class='teaser'] | //div[@class='artical-content-area']
We get the error ‘This article appears to continue on subsequent pages which we could not extract’. Any ideas what’s happening? It’s the same for every multipage article on that site.
I don’t know why this happens, I am just a user. But the link to the next page appears a second time in the HTML source. I didn’t need a body statement: FTR found everything necessary without any config, except for the next-page link. Other articles on this site may still need the body identifier.
Additionally, I stripped the “mehr-zum-thema” box.
Complete working config is:
Please let me know if this works and if you have additional tweaks.
Will you publish your config on GitHub so others can benefit from it? If not, I can do that for you.
OK, I just saw that the article teaser is missing, so we do need a body identifier after all. But yours did not include the teaser, so here is my suggestion:
#body: //p[@class='teaser'] | //div[@class='artical-content-area']
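For readers new to the format, here is a rough sketch of how the pieces discussed in this thread might sit together in an FTR site config. It is not the config from the PR: the body line is the suggestion above, strip_id_or_class is an assumption about how the "mehr-zum-thema" box could be removed, and the next_page_link XPath is a hypothetical placeholder for a next-page link in the page head.

```
# Sketch only, not the submitted config.
# Teaser paragraph plus the main article container
body: //p[@class='teaser'] | //div[@class='artical-content-area']

# Assumption: the "mehr-zum-thema" box can be matched by its id or class name
strip_id_or_class: mehr-zum-thema

# Hypothetical XPath: assumes the next-page URL sits in a <link rel="next"> in the head
next_page_link: //link[@rel='next']/@href
```

The real selector for the next-page link has to be checked against the page source of a multipage article.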
I fixed another small problem and have now uploaded my config to GitHub: PR1068
Thanks, that worked. I hadn’t spotted the next-page link in the header.