Overriding global strip


#1

I’m wondering how to exempt a div from being stripped for one site when it is stripped globally.

I have the following in my global config, as I need to strip this title div from nearly ALL feeds I am extracting:

strip_id_or_class: post-title

But I have one feed where I don’t want to strip post-title. I can’t seem to figure out how to override the global file. As I understand it, the site-specific rules should take precedence. I tried adding this to the site-specific file, with no luck:

title: //h1[@class="post-title"]

#2

Hi Marc,

You can’t specify that something not be stripped when a rule says that it should be. Having said that, there are a number of workarounds that come to mind for your situation:

  1. If you’re trying to prevent stripping so that you can use the element text as the title (as in your example), then you shouldn’t have to worry about the strip rules, as we process the title directive before we strip elements. So your example should work. If it doesn’t, my suggestion would be to check whether the class attribute contains only ‘post-title’ or other values too. If it contains other values, an exact match on @class will fail, so your XPath expression should be amended to match on part of the attribute value. It should look something like this:

    title: //h1[contains(@class, 'post-title')]
    
  2. You can also tell Full-Text RSS not to merge a particular site’s rules with the global site config file. To do that you can add:

    autodetect_on_failure: no
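
As a rough sketch of the two workarounds above, a site-specific file might contain something like the following. The stricter class test in the first rule is a standard XPath 1.0 idiom rather than anything specific to Full-Text RSS; unlike a plain contains(), it won’t match a class such as ‘post-title-wrap’ that merely includes the substring. I’m also assuming that once merging with the global file is turned off, any other global rules you still want would have to be repeated in the site file.

    # Workaround 1: match 'post-title' as a whole class token
    title: //h1[contains(concat(' ', normalize-space(@class), ' '), ' post-title ')]

    # Workaround 2: don't merge this site's rules with the global config
    autodetect_on_failure: no

You’d normally only need one of the two: the first if the title is all you’re after, the second if you want the post-title element left in the extracted content.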
    

You’ll find more about how site config files are merged here: http://help.fivefilters.org/customer/portal/articles/223153-site-patterns

Hope that’s some help.