About Site Patterns


I just downloaded Full-Text RSS 2.8 from code.fivefilters.org and tried to add my own site pattern, but it doesn’t seem to be working. Here’s the feed URL: http://udn.com/udnrss/financesfocus.xml. In case the feed has been updated since, I’ll post what I have here. There’s a tag in the description tag of a feed item like this:

<![CDATA[

The weird thing is that it gets removed by Full-Text RSS. Then I came across http://help.fivefilters.org/customer/portal/questions/321606-full-text-rss-always-skipping-images and tried to add a custom site pattern but can't get it to work. The file path: site_config/custom/.udn.com.txt. Here's the content of the file:

title: //div[@id='story_title']
body: //div[@id='body'] | //div[@id='media_file']
author: //div[@id='story_author']
prune: no

Here's the URL of that news: http://udn.com/NEWS/FINANCE/FIN2/7459773.shtml. The image part goes like this:
Even with the site pattern I added the image is still missing. Any idea?

I changed .udn.com.txt to udn.com and now the images are coming out. The problem is that I’m now left with a few empty feed items, a few full-text items without images, and a few items with only images and no text. Is there something wrong with my site pattern?


Hi Jeffrey, I would suggest checking each page individually to see whether the elements you are trying to match exist; the site’s structure might differ from page to page. It might also be a good idea to try setting tidy: no, to see if that makes a difference. Sometimes Tidy produces HTML that yields a different DOM from the one you see in your browser.
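
As a rough way to do the per-page check, here is a minimal Python sketch using only the standard library. It is not part of Full-Text RSS and does not evaluate XPath; it simply reports which of the div ids named in the site pattern above are absent from a page's raw HTML. The names DivIdCollector and missing_ids, and the sample HTML, are made up for illustration. Note that it looks at the original source, not at what Tidy produces.

```python
from html.parser import HTMLParser

# Div ids taken from the XPath expressions in the site pattern in this thread.
PATTERN_IDS = ["story_title", "body", "media_file", "story_author"]


class DivIdCollector(HTMLParser):
    """Collects the id attribute of every <div> start tag in a page."""

    def __init__(self):
        super().__init__()
        self.div_ids = set()

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            if div_id:
                self.div_ids.add(div_id)


def missing_ids(html, wanted=PATTERN_IDS):
    """Return the wanted div ids that do not appear in the given HTML."""
    collector = DivIdCollector()
    collector.feed(html)
    return [i for i in wanted if i not in collector.div_ids]


if __name__ == "__main__":
    # Hypothetical page source; in practice, use the saved HTML of each article.
    sample = """
    <html><body>
      <div id="story_title">Headline</div>
      <div id="body"><p>Article text</p></div>
    </body></html>
    """
    print(missing_ids(sample))  # ['media_file', 'story_author']
```

Running this against the saved source of each article in the feed should show which pages lack one of the expected elements, which would be consistent with items that come out empty or without images.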

Hope that’s some help.