Full-Text RSS 3.4 Not extracting

desk-user · March 27, 2015, 2:35am

G'day,

I am trying to extract the articles on this page: http://www.nbnnews.com.au/index.php/category/nthn-rivers-sport/feed/

Only some of the descriptions are returned. The same appears to be true on the fivefilters.org site too: right now article 3’s description is missing, and the results appear to contain info that is not in the original article…

Scratching my head.

It is more noticeable when you look at 10 items per page…

Any clues would be much appreciated.

Rick

fivefilters · March 27, 2015, 2:57pm

Hi Rick,

If there are no extraction rules for a website in Full-Text RSS, it tries to guess where the content is based on a number of heuristics. Such guesses are more likely to be correct if the article contains more than a few short sentences for Full-Text RSS to work with. In this case, the articles are very short, and so the guessing often results in other pieces of text being selected incorrectly.
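To illustrate why short articles are hard to guess, here is a toy sketch in Python. It is not the actual Full-Text RSS algorithm; the scoring formula, the function names `score_block` and `guess_content`, and the sample text are all made up for illustration. It only shows how a scorer that rewards sentence-like text can pick a sidebar over a one-sentence article.

```python
import re


def score_block(text: str) -> int:
    """Crude content score (hypothetical): reward sentence endings, commas and length."""
    sentences = len(re.findall(r"[.!?](?:\s|$)", text))
    commas = text.count(",")
    return sentences * 3 + commas + len(text) // 100


def guess_content(blocks: dict[str, str]) -> str:
    """Return the name of the block with the highest score."""
    return max(blocks, key=lambda name: score_block(blocks[name]))


# Made-up page: a one-sentence article next to a wordier sidebar.
page = {
    "article": "Short match report.",
    "sidebar": "Latest news, weather, sport, and more. Watch live. Contact us. Follow us online.",
}

# With so little article text, the sidebar outscores the real content.
print(guess_content(page))
```

With a longer article body the same scorer would pick the article, which is why the guessing tends to work better on longer pieces.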

If you’re using our self-hosted version, the solution is to write custom extraction rules. We have a simple tool to help with this available at http://siteconfig.fivefilters.org

I’ve gone ahead and created one for this site, so if you update your site configuration files (which hold the custom extraction rules), you should find that it now works as expected.

desk-user · March 29, 2015, 1:24pm

Hi Keyvan,

Sorry for my late reply, and thank you kindly.

Can you please let me know where I can download the config file, or perhaps paste the contents here?

I much appreciate your time.

Cheers

rick byronit

fivefilters · March 29, 2015, 1:30pm

Hi Rick,

You can update your site config files from the admin pages of Full-Text RSS. If you access the admin/ folder of your Full-Text RSS installation, you can see how to set that up.

Another way to do it, if you don’t want to enable admin access, is to download the updated set of site config files from our GitHub repository: https://github.com/fivefilters/ftr-site-config (you can use the ‘Download ZIP’ link on the right). Once you have that you can unzip its content into the site_config/standard/ folder of Full-Text RSS.
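If you would rather script the manual route, something along these lines might work. It is only a sketch under a few assumptions: that the ZIP GitHub serves wraps everything in a single top-level folder, that the site config files are the .txt files directly inside that folder, and that the archive URL below (including the `master` branch name) is correct. The function names are made up for this example.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath
from urllib.request import urlopen

# Assumed archive URL; check the 'Download ZIP' link on the repository page.
ARCHIVE_URL = "https://github.com/fivefilters/ftr-site-config/archive/master.zip"


def extract_site_configs(zip_bytes: bytes, dest_dir: str) -> list[str]:
    """Copy the *.txt files found directly under the archive's top folder into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Skip directories, nested files, and anything that is not a .txt file.
            if info.is_dir() or len(parts) != 2 or not parts[1].endswith(".txt"):
                continue
            (dest / parts[1]).write_bytes(archive.read(info))
            written.append(parts[1])
    return sorted(written)


def update_from_github(dest_dir: str, url: str = ARCHIVE_URL) -> list[str]:
    """Download the archive and extract its site config files into dest_dir."""
    with urlopen(url) as response:
        return extract_site_configs(response.read(), dest_dir)
```

Calling `update_from_github("site_config/standard")` from the root of your Full-Text RSS installation would overwrite any existing files with the same names in that folder.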

Hope that’s some help.

desk-user · March 30, 2015, 6:02pm

Thank you Keyvan, very helpful…

rick byronit