I have a div class that is often, but not always, the target I use for pulling content from an article. I’ve been adding custom config files for these, like so (works fine):
body: //div[contains(@class, 'lxb_af-post_content')]
Is it possible to add this to my custom/global.txt file so that this div class name is used for content IF IT IS PRESENT, without requiring that the content always be in this div? The vast majority of content we monitor does not use this class, but enough does that it would be nice not to need custom configs for all of these.
Hi there, yes, you should be able to do this.
The body XPath doesn’t have to match. If it doesn’t, the other body XPaths that follow it are checked in order until one matches. If none matches, automatic detection occurs.
This is useful if, for example, a site has articles with different templates under the same domain, e.g.
You can create a site config file called example.org.txt with two body directives:
# template 1
body: //div[contains(@class, 'main-content')]
# template 2
If article-b.html is being processed, the first body XPath won’t match, so the second one gets used. If article-a.html is processed, the first one matches and any body XPaths that follow are ignored.
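The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the "first matching body XPath wins" idea, not Full-Text RSS's actual code. The function name `extract_body`, the sample HTML strings, and the second XPath (`post`) are all made up for the example. Python's built-in ElementTree only supports exact attribute matches, so the XPaths here use `@class='...'` rather than `contains()`.

```python
import xml.etree.ElementTree as ET


def extract_body(root, body_xpaths):
    """Try each body XPath in order; return (xpath, element) for the first match.

    Returns (None, None) if nothing matches, which is the point where
    automatic detection would take over.
    """
    for xpath in body_xpaths:
        match = root.find(xpath)
        if match is not None:
            return xpath, match  # later XPaths are ignored
    return None, None


# Hypothetical pages standing in for two templates on the same domain.
ARTICLE_A = "<html><body><div class='main-content'>A</div></body></html>"
ARTICLE_B = "<html><body><div class='post'>B</div></body></html>"

# Hypothetical rule list: template 1 first, then template 2.
BODY_XPATHS = [".//div[@class='main-content']", ".//div[@class='post']"]

xpath_a, body_a = extract_body(ET.fromstring(ARTICLE_A), BODY_XPATHS)
xpath_b, body_b = extract_body(ET.fromstring(ARTICLE_B), BODY_XPATHS)
```

For the first page the first rule matches and the second is never tried; for the second page the first rule fails and the second one is used.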
If global.txt is used, the behaviour should be the same (and if a site has a site config file written for it, the global rules get merged with it, but won’t take precedence).
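To picture the merge, here is a rough Python sketch, assuming the merge simply appends the global body rules after the site-specific ones, so the site rules get tried first. The function `merge_body_rules` is hypothetical and is not part of Full-Text RSS.

```python
def merge_body_rules(site_rules, global_rules):
    """Site-specific rules keep priority; global rules are appended after them."""
    return list(site_rules) + list(global_rules)


site_rules = ["//div[contains(@class, 'main-content')]"]
global_rules = ["//div[contains(@class, 'lxb_af-post_content')]"]

merged = merge_body_rules(site_rules, global_rules)
# With no site config at all, only the global rules apply.
global_only = merge_body_rules([], global_rules)
```

Because the rules are tried in order, a global rule is only reached when none of the site's own body XPaths match.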
This is exactly what I was hoping.
Thanks for the great explanation!
I’m not getting this to work. I’ve added the following to my /site_config/custom/global.txt file, but cannot target that div:
body: //div[contains(@class, 'lxb_af-post_content')]
It only returns content if I make a domain-based config file and declare it there.
Can’t reproduce this in Full-Text RSS 3.9.5. Are you using an older version perhaps?
If you’re using the latest version, can you post the debug output?
You should make sure you see something like this in the output:
- . looking for site config for global in custom folder
- … found site config (global.txt)
- Cached site config with key global
- . looking for site config for global in standard folder
- … site config for global already loaded in this request
- . merging config files
- Cached site config with key global.merged.ex
- Appending site config settings from global.txt