Looking at the standard site_config folder, I saw that one of my favourite feeds (http://www.anandtech.com/rss/) already has a profile. However, the profile comes from Instapaper and contains a directive that FTR doesn’t support (next_page_link), so the filtered feed only shows the first page of multi-page articles.
So I did a little work and developed the site_config below using single_page_link, but there is a frustrating problem with the site’s “PRINT THIS ARTICLE” page: it displays the feed description as well as the article. Because that description is repeated in the article, the result is a duplicated section at the beginning of the filtered feed.
I’ve tried to work around it with "body: substring-after(//div, 'Page 1')", but that doesn’t appear to be evaluated. In NPP with XPatherizerNPP, the same expression evaluates to a string beginning at the first page of the article (eliminating the description from the feed).
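To make clear what I expect that expression to return, here is a rough Python sketch of the same logic. It assumes XPath 1.0 semantics, where passing //div to a string function uses the text of the first matching div only. The markup is a made-up stand-in for the print page, and the function name is just for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the print page: description first, then the article.
SAMPLE = (
    "<html><body><div>"
    "<p>Repeated description.</p>"
    "<h2>Page 1</h2>"
    "<p>Article text starts here.</p>"
    "</div></body></html>"
)

def substring_after_first_div(markup, marker):
    """Mimic substring-after(//div, marker) under XPath 1.0 rules."""
    root = ET.fromstring(markup)
    # A node-set passed to a string function is reduced to its first node.
    first_div = next(root.iter("div"), None)
    if first_div is None:
        return ""
    # The string-value of an element is all of its descendant text joined.
    text = "".join(first_div.itertext())
    # Like substring-after, this yields "" when the marker is absent.
    return text.partition(marker)[2]

print(substring_after_first_div(SAMPLE, "Page 1"))
```

Note that the result is a plain string with no markup left in it, not an element, which may be part of why it behaves differently as a body value.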
Any suggestions for alternative ways to accomplish the goal of eliminating the chunk of repeated text (in the test URL, that’s everything before “It Was a Monster Mash…”)?
full-text-rss/site_config/custom/anandtech.com.txt
single_page_link: concat('http://www.anandtech.com/print/', substring-after(//meta[@property='og:url']/@content, '/show/'))
author: //a[@class='b'][1]
date: substring-after(substring-before(//div, 'Posted in'), ' on ')
strip: //h2
strip_image_src: /content/images/globals/
test_url: http://www.anandtech.com/show/5812/eurocom-monster-10-clevos-little-monster/
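
For anyone checking the single_page_link expression, this is a minimal Python sketch of the string manipulation it is meant to perform. The function name print_url is only for illustration, and I’m assuming the og:url meta tag always contains '/show/'.

```python
PRINT_PREFIX = "http://www.anandtech.com/print/"

def print_url(og_url):
    """Mimic concat(PRINT_PREFIX, substring-after(og_url, '/show/'))."""
    # partition() returns "" for the tail when '/show/' is absent,
    # which matches what XPath substring-after does.
    tail = og_url.partition("/show/")[2]
    return PRINT_PREFIX + tail

test_url = "http://www.anandtech.com/show/5812/eurocom-monster-10-clevos-little-monster/"
print(print_url(test_url))
# http://www.anandtech.com/print/5812/eurocom-monster-10-clevos-little-monster/
```

If the meta tag is missing or has a different shape, the result is just the bare print prefix, so that case is worth checking against any article that doesn’t follow the /show/ pattern.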