multi extraction

desk-user · April 13, 2013, 12:32pm

Hi,

I can already customize any feed for my purposes, but I have a problem.
When I strip an id or class from within a post, it leaves white space in the autoblogged posts.

My question is: how can I extract a page in multiple steps, instead of extracting one whole id or class and then stripping the unwanted ids and classes, which leaves a lot of white space?

For example, extract the post image and the post content separately and join them together at the end.

body1: //div[@class='post-image']
body2: //div[@class='post-content']

and result = body1 + body2

Thank you

fivefilters · April 13, 2013, 12:38pm

Hi Mohammad, you should be able to do the following:

body: //div[@class='post-image'] | //div[@class='post-content']

or

body: //div[@class='post-image' or @class='post-content']
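
A note on the two forms above, offered as a hedged sketch rather than confirmed behaviour of the tool: in XPath 1.0 a union (`|`) yields the matched nodes in document order, so the image block comes first only if it precedes the content block in the page. Also, `@class='post-image'` is an exact comparison and will miss an element declared as, say, `class="post-image wide"`. Assuming the site uses such multi-valued class attributes (an assumption; the thread does not say), a more tolerant variant could look like this:

```
# Hypothetical site config sketch; the class names are the ones used in this thread.
# The concat/normalize-space idiom matches a class token even when other classes are present.
body: //div[contains(concat(' ', normalize-space(@class), ' '), ' post-image ')] | //div[contains(concat(' ', normalize-space(@class), ' '), ' post-content ')]
```

If the class attributes on the page contain exactly one value each, the shorter expressions above are equivalent and easier to read.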

desk-user · April 13, 2013, 1:25pm

Hello,
It works like a charm.
Thank you very much, Keyvan jan.

Mohammad Jamallou

fivefilters · April 14, 2013, 9:19pm

No problem. Glad it worked.

desk-user · May 5, 2013, 8:41am

Hi Keyvan,
Is it possible to do the same for titles?
This method:
title: //div[@class='post-title1'] | //div[@class='post-title2']
doesn't work for titles; it extracts only the first one.
Or
may I mix my own text or HTML with the normal title, in either the title or the body?
E.g.:
title: (some text or html code) | //div[@class='post-title']

Mohammad Jamallou

fivefilters · May 6, 2013, 11:14am

Hi Mohammad, this is not possible with titles yet. Our code will simply pick the first element matching your XPath expression in the document. One way you can try to add your own text to the title is to use find_string and replace_string. E.g.

find_string:

replace_string:

My text -
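
For illustration, assuming the page's title is taken from a standard `<title>` element (an assumption; the thread does not show the values used), a pair that prepends text might look like the sketch below. As far as I understand, `find_string` and `replace_string` act as a plain string replacement on the page HTML before extraction, so the search string has to match the source exactly.

```
# Hypothetical values: assumes the source HTML contains a literal <title> tag.
find_string: <title>
replace_string: <title>My text -
```

If the site builds its title from a different element, the same idea applies with that element's opening tag in place of `<title>`.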

desk-user · May 7, 2013, 7:52am

Thanks again, Keyvan.
It helped.

Mohammad Jamallou