next_page_links image

I was trying to extract a multi-page article using next_page_link.

When I try using

next_page_link: //div[@class='imgContainer']/div[@class='imgC']/a

it produces this error:
Catchable fatal error: Argument 1 passed to DOMDocument::importNode() must be an instance of DOMNode, string given in makefulltextfeed.php on line 668

The same happens when using this code:
next_page_link: //a[@class='fr' or @id='nextImgLink']

This is the link to the site:


But the code
next_page_link: //a[@class='fr' or @id='nextImgLink']

works perfectly at this address

Any help would be greatly appreciated.
Thanks in advance.

Hi Martin, can you tell me which version of Full-Text RSS you’re using?

Thanks for the reply, Keyvan.

I use version 3.1.

Martin

I found my own answer: the error occurs because the pages loop back to the main page (I mean, they loop without stopping).

How can I follow 'next_page_link' and stop if the title changes?

Martin
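
For anyone hitting the same loop, here is a minimal Python sketch of a generic guard against it: remember every URL already followed and stop on the first repeat. This is not how Full-Text RSS is known to behave internally; `follow_next_pages`, `get_next_url`, `max_pages` and the `/gallery/...` URLs are all hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional


def follow_next_pages(
    start_url: str,
    get_next_url: Callable[[str], Optional[str]],
    max_pages: int = 10,
) -> List[str]:
    """Collect page URLs by following 'next' links, stopping on a repeat."""
    visited: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    # Stop when there is no next link, when a URL repeats, or at the page cap.
    while url is not None and url not in seen and len(visited) < max_pages:
        seen.add(url)
        visited.append(url)
        url = get_next_url(url)
    return visited


# Hypothetical gallery whose last image links back to the first page.
links = {
    "/gallery/1": "/gallery/2",
    "/gallery/2": "/gallery/3",
    "/gallery/3": "/gallery/1",
}
pages = follow_next_pages("/gallery/1", links.get)
print(pages)  # ['/gallery/1', '/gallery/2', '/gallery/3']
```

The `max_pages` cap is a second safety net for sites where every page has a new URL but the chain never ends.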

Hi Martin,

There’s no way to stop when the title changes, but you could try to be more explicit as to which links should be treated as ‘next page’ links. For example, in addition to @class/@id, why not see if the URL structure changes in some way? If it does, see which URL segments are unique to the next page URLs you want and which ones aren’t. Then you can do something like:

Follow if /real-next-page-link/ is part of the next page URL:

next_page_link: //a[@class='xxx' and contains(@href, '/real-next-page-link/')]

or

Don’t follow if the next page URL contains /not-real-next-page/:

next_page_link: //a[@class='xxx' and not(contains(@href, '/not-real-next-page/'))]

Hope that helps.
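
To check which links such a filter would keep before putting it in a site config, the same two conditions can be sketched in plain Python with the standard library. This is only an approximation of the XPath logic, not Full-Text RSS code; the class `NextLinkFinder`, its parameters and the sample HTML are assumptions made up for the example.

```python
from html.parser import HTMLParser
from typing import List


class NextLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags with a given class, filtered by href content."""

    def __init__(self, css_class: str, must_contain: str = "", must_not_contain: str = ""):
        super().__init__()
        self.css_class = css_class
        self.must_contain = must_contain
        self.must_not_contain = must_not_contain
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # Like @class='xxx' in XPath: exact match on the whole attribute value.
        if attr.get("class") != self.css_class:
            return
        # Like contains(@href, '...').
        if self.must_contain and self.must_contain not in href:
            return
        # Like not(contains(@href, '...')).
        if self.must_not_contain and self.must_not_contain in href:
            return
        self.matches.append(href)


# Hypothetical page with one real next link and two links to ignore.
html = """
<a class="xxx" href="/real-next-page-link/2">Next</a>
<a class="xxx" href="/not-real-next-page/home">Next</a>
<a class="other" href="/real-next-page-link/3">Next</a>
"""

follow = NextLinkFinder("xxx", must_contain="/real-next-page-link/")
follow.feed(html)
skip = NextLinkFinder("xxx", must_not_contain="/not-real-next-page/")
skip.feed(html)
print(follow.matches)  # ['/real-next-page-link/2']
print(skip.matches)    # ['/real-next-page-link/2']
```

Both filters pick the same link here; which one is safer depends on whether the wanted or the unwanted URLs share the more stable segment.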

Thanks for the reply, Keyvan.

Your explanation was very helpful.
I found an XPath combination that works:

next_page_link:
//div[@class='xxx'][(translate((substring-before(//span[contains(text(), 'from')], 'from')), translate(., '0123456789', ''), '')) != (translate((substring-after(//span[contains(text(), 'from')], 'from')), translate(., '0123456789', ''), ''))]//a[(contains(@id, 'xxx')) or (contains(@class, 'xxx')) or (contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'next'))]

Long code, but very powerful :smiley:
It takes the parent of the span that holds the page numbering and compares the digits before and after 'from'. When they do not match, it selects the //a link; when they match, it stops. The extra translate calls handle upper/lowercase and strip everything except digits (the double translate). It is effective for other domains too.

Martin
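
The stop condition in Martin's expression can be easier to follow when written out step by step. Below is a minimal Python sketch of the same idea, assuming the page counter text looks like "3 from 10"; the function names `digits_only`, `should_follow` and `looks_like_next` are hypothetical, and this mirrors the XPath logic only roughly.

```python
def digits_only(text: str) -> str:
    """Keep only ASCII digits, like the double-translate idiom in XPath."""
    return "".join(ch for ch in text if ch in "0123456789")


def should_follow(counter: str, separator: str = "from") -> bool:
    """Return True while the current page number differs from the total."""
    before, found, after = counter.partition(separator)
    if not found:
        # In XPath, substring-before/after both give '' when the separator
        # is missing, so the two sides compare equal and nothing is followed.
        return False
    return digits_only(before) != digits_only(after)


def looks_like_next(link_text: str) -> bool:
    """Case-insensitive check for 'next' in the link text."""
    return "next" in link_text.lower()


print(should_follow("3 from 10"))    # True: keep following
print(should_follow("10 from 10"))   # False: last page, stop
print(looks_like_next("NEXT Image")) # True
```

The comparison is on digit strings, not numbers, so a counter such as "03 from 3" would still count as different.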