More and more often, only part of the content is extracted. In this case the headlines are missing, and so far I haven’t found a way to include them via the site config. Is this a bug?
Config: body: //div[@id='content-left'], alone or combined with //div[@class='body-text__paragraph-header'], with prune: no
You say you’re already using prune: no, so I’m not sure why it isn’t working for you. Perhaps the site config file is not named correctly. Can you enable debugging to check whether it’s being loaded?
I haven’t looked at the second one yet, but I suspect this is a pruning issue too. If you add prune: no to the site config file (I don’t see it in the config you posted), those elements should be kept.
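For reference, a minimal site config sketch along those lines might look like this. It assumes the content-left XPath from the question is correct for the target page and that the headlines sit inside that container; the file name in the first comment is a hypothetical example, not taken from this thread.

```text
# Hypothetical file name: the site's domain, e.g. example.com.txt
# Extract the main content container
body: //div[@id='content-left']
# Keep headers and other elements that pruning would otherwise strip
prune: no
```

If the headlines turn out to live outside that container, an XPath union such as //div[@id='content-left'] | //div[@class='body-text__paragraph-header'] may be worth trying instead, though it could duplicate headers that are already inside content-left.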
Using tidy isn’t really necessary in recent versions of Full-Text RSS, because we now default to the bundled HTML5 parser, which handles difficult markup better than PHP’s libxml. We used tidy before because it often helped clean up HTML that the old parser couldn’t handle on its own.
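If a site config still runs tidy, it can be switched off explicitly. A minimal sketch, assuming the rest of the site's config stays as it is:

```text
# Skip tidy pre-processing and rely on the default HTML5 parser
tidy: no
```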