Help creating site pattern, need to add image.

I am attempting to add photos to feeds from hosted2.ap.org.

I am trying the following in the hosted2.ap.org.txt file:

body: //div[@class='ap_photo_wrapper']//img[@class='ap_photo']
body: //div[@class='ap_photo_wrapper']//div[@class='ap_pht_cap']

test_url: http://hosted2.ap.org/KSSUC/*/Article_2015-09-24-FBN-Skyboard-Craze/id-589cf87959414856b5cd5e1ad869c68e

However, it doesn’t appear to be picking up any of that. I would like to insert the photo from the page above the story, with its caption below it.

Hi George,

Thanks for posting on the forum.

Here are a few issues I could spot:

  1. Using more than one body rule is okay, but Full-Text RSS will try them in sequence and stop when it finds the first one that matches. So if your intention is for Full-Text RSS to extract and combine all the elements in the body rules, you need to declare them using a single body rule. For example:

body: //div[@id='element1'] | //div[@id='element2']

This will include both div elements (assuming they both exist).

If you write it as:

body: //div[@id='element1']
body: //div[@id='element2']

Full-Text RSS will first check for an element with id 'element1', and if it's there, it'll include it and not look for 'element2'. Only if 'element1' is missing will it try to find 'element2'. This can, however, be useful if a domain presents its content differently depending on which category you're viewing. In such cases, you can use multiple body rules, knowing that Full-Text RSS will stop as soon as one matches.
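The difference between the two forms can be sketched in Python. This is only an emulation of the behaviour described above, not Full-Text RSS's actual code: the page markup and the helper names first_matching_rule and combined_rule are made up for illustration, and because the standard library's ElementTree does not support the XPath union operator, the combined rule is emulated by merging the matches of both expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical page in which both target elements are present.
PAGE = ET.fromstring(
    "<html><body>"
    "<div id='element1'>photo</div>"
    "<div id='element2'>caption</div>"
    "</body></html>"
)

def first_matching_rule(root, rules):
    """Separate body rules: try each in order, stop at the first that matches."""
    for rule in rules:
        found = root.findall(rule)
        if found:
            return found
    return []

def combined_rule(root, rules):
    """Single 'a | b' rule (emulated): keep the matches of every expression."""
    matches = [el for rule in rules for el in root.findall(rule)]
    # Walk the tree so the result comes back in document order.
    return [el for el in root.iter() if any(el is m for m in matches)]

RULES = [".//div[@id='element1']", ".//div[@id='element2']"]

separate = [el.text for el in first_matching_rule(PAGE, RULES)]
combined = [el.text for el in combined_rule(PAGE, RULES)]

print(separate)  # only the first rule's match
print(combined)  # matches from both expressions
```

With separate rules only 'element1' is returned, while the combined form returns both elements.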

  2. You have to be careful when you use @class='value' as you have done here. It will only match if the class attribute of an element contains exactly what you've specified. In HTML, a class attribute can hold multiple class names, separated by spaces. I checked the URL you provided and noticed that that's the case here too, so that's probably why your XPath expression is not matching.

To match such an element, //div[@class='ap_photo'] will not work. You have to use something like //div[contains(@class, 'ap_photo')].
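Here is a small Python sketch of why the exact comparison fails. The markup is hypothetical (the extra class names 'large' and 'clearfix' are invented, not taken from the AP page), and since the standard library's ElementTree has no contains() function, the substring test is emulated with the helper has_substring; has_token is a stricter variant added for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the extra class names are made up for illustration.
PAGE = ET.fromstring(
    "<body>"
    "<div class='ap_photo_wrapper clearfix'>"
    "<img class='ap_photo large' src='photo.jpg' />"
    "</div>"
    "</body>"
)

# Exact comparison, like @class='ap_photo' in a site config rule.
exact = PAGE.findall(".//img[@class='ap_photo']")

def has_substring(el, text):
    """Same test as contains(@class, text): a plain substring match."""
    return text in el.get("class", "")

def has_token(el, name):
    """Stricter test: name must be one whole space-separated class name."""
    return name in el.get("class", "").split()

substring_hits = [el.tag for el in PAGE.iter() if has_substring(el, "ap_photo")]
token_hits = [el.tag for el in PAGE.iter() if has_token(el, "ap_photo")]

print(len(exact))      # the exact comparison finds nothing
print(substring_hits)  # substring test also catches the wrapper div
print(token_hits)      # whole-token test catches only the img
```

Note that a substring test on 'ap_photo' also matches 'ap_photo_wrapper'. If that turns out to be a problem, the usual XPath 1.0 idiom for matching a whole class name is contains(concat(' ', normalize-space(@class), ' '), ' ap_photo ').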

I’ve had a go at creating a site config file for this site which you can find here: https://github.com/fivefilters/ftr-site-config/blob/master/hosted2.ap.org.txt