multi extraction


I can already customize any feed for my purposes, but I have a problem.
When I strip some id or class within a post, it leaves white space in the autoblogged posts.

My question is: how can I extract a page in multiple steps, instead of extracting the whole id or class and then stripping the unwanted ids and classes, which leaves a lot of white space?

For example, extract the post image and the post content separately and join them together at the end.

body1: //div[@class='post-image']
body2: //div[@class='post-content']

and result = body1 + body2

Thank you.

Hi Mohammad, you should be able to do either of the following:

body: //div[@class='post-image'] | //div[@class='post-content']

or

body: //div[@class='post-image' or @class='post-content']
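As a rough sketch, this is how the first form might sit in a site config file. The two selectors are the ones from the question. The commented-out line is a hypothetical variant, not something from this thread: it uses the common XPath 1.0 idiom for elements that carry several classes, since @class='post-image' only matches when the attribute is exactly that string. Matched nodes are typically returned in document order, so the image would come first only if it precedes the content in the page.

```
# Union of two node sets: both blocks are extracted, nothing else needs stripping
body: //div[@class='post-image'] | //div[@class='post-content']

# Hypothetical variant for markup such as class="post-image featured"
# body: //div[contains(concat(' ', normalize-space(@class), ' '), ' post-image ')] | //div[contains(concat(' ', normalize-space(@class), ' '), ' post-content ')]
```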

It works like a charm.
Thank you very much, Keyvan jan.

Mohammad Jamallou

No problem. Glad it worked.

Hi Keyvan,
Is it possible to do the same for titles?
This method:
title: //div[@class='post-title1'] | //div[@class='post-title2']
doesn't work for titles; it extracts just the first one.
Can I mix literal text or HTML code with the normal title, in either title or body?
title: (some text or html code) | //div[@class='post-title']

Mohammad Jamallou

Hi Mohammad, this is not possible with titles yet. Our code will simply pick the first element matching your XPath expression in the document. One way you can try to add your own text to the title is to use find_string and replace_string. E.g.


My text -
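A minimal sketch of what such a pair might look like in a site config file. Only the "My text - " prefix comes from the reply above; the choice of the page's <title> tag as the string to find, and the title: //title line, are assumptions for illustration. The idea is that the replacement is applied to the raw HTML before the XPath is evaluated, so the first matching title element already contains the added text.

```
# Assumption: the title is read from the page's <title> element
title: //title

# Plain string replacement on the raw HTML: prefix the title with custom text
find_string: <title>
replace_string: <title>My text - 
```

If the title is taken from another element, the find_string would need to match the markup that opens that element instead.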

Thanks again, Keyvan.
It helped.

Mohammad Jamallou