next_page_links image

desk-user · November 30, 2013, 11:40am

I was trying to extract a multi-page article using next_page_link.

When I try using

next_page_link: //div[@class='imgContainer']/div[@class='imgC']/a

it produces this error:
Catchable fatal error: Argument 1 passed to DOMDocument::importNode() must be an instance of DOMNode, string given in makefulltextfeed.php on line 668

The same thing happens with this:
next_page_link: //a[@class='fr' or @id='nextImgLink']

on this page of the site,

but the same code
next_page_link: //a[@class='fr' or @id='nextImgLink']

works perfectly at this address.

Any help would be greatly appreciated.
Thanks in advance.

fivefilters · November 30, 2013, 11:46am

Hi Martin, can you tell me which version of Full-Text RSS you’re using?

desk-user · November 30, 2013, 1:36pm

Thanks for the reply, Keyvan.

I'm using version 3.1.

Martin

desk-user · November 30, 2013, 5:14pm

I found the answer myself: the error occurs because the next-page links loop back to the main page, so they are followed without ever stopping.

How can I follow next_page_link and stop when the title changes?

Martin

fivefilters · December 3, 2013, 1:56pm

Hi Martin,

There’s no way to stop when the title changes, but you could try being more explicit about which links should be treated as ‘next page’ links. For example, in addition to @class/@id, check whether the URL structure changes in some way. If it does, see which URL segments are unique to the next page URLs you want and which ones aren’t. Then you can do something like:

Follow if /real-next-page-link/ is part of the next page URL:

next_page_link: //a[@class='xxx' and contains(@href, '/real-next-page-link/')]

or

Don’t follow if the next page URL contains /not-real-next-page/:

next_page_link: //a[@class='xxx' and not(contains(@href, '/not-real-next-page/'))]
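To see what these two predicates select, here is a minimal Python sketch that applies the same filters outside Full-Text RSS. It assumes hypothetical markup and reuses the placeholder class `xxx` and the URL segments from the examples; the helper names are made up for illustration. Python's built-in ElementTree only supports a small XPath subset (it has no `contains()`), so the class match is done with ElementTree and the `contains()` / `not(contains())` tests in plain Python. This illustrates the logic only; it is not how Full-Text RSS evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup; 'xxx' and the URL segments are the placeholders
# used in the next_page_link examples, not real site values.
PAGE = """
<html><body>
  <a class="xxx" href="/article/real-next-page-link/2">Next</a>
  <a class="xxx" href="/gallery/not-real-next-page/1">Next gallery</a>
  <a class="other" href="/article/real-next-page-link/3">Unrelated</a>
</body></html>
"""

def candidate_links(root, css_class):
    """Mirror //a[@class='xxx']: every <a> whose class attribute equals css_class."""
    return root.findall(f".//a[@class='{css_class}']")

def follow_if_contains(root, css_class, segment):
    """Mirror //a[@class='xxx' and contains(@href, segment)]."""
    return [a.get("href", "") for a in candidate_links(root, css_class)
            if segment in a.get("href", "")]

def follow_unless_contains(root, css_class, segment):
    """Mirror //a[@class='xxx' and not(contains(@href, segment))]."""
    return [a.get("href", "") for a in candidate_links(root, css_class)
            if segment not in a.get("href", "")]

root = ET.fromstring(PAGE.strip())
print(follow_if_contains(root, "xxx", "/real-next-page-link/"))
# prints ['/article/real-next-page-link/2']
print(follow_unless_contains(root, "xxx", "/not-real-next-page/"))
# prints ['/article/real-next-page-link/2']
```

Both filters drop the gallery link that shares the `xxx` class, which is exactly the kind of link that can send next_page_link back to an unrelated page.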

Hope that helps.

desk-user · December 5, 2013, 6:48pm

Thanks for the reply, Keyvan.

Your explanation was very helpful.
Here is the combined XPath I came up with:

next_page_link: //div[@class='xxx'][(translate((substring-before(//span[contains(text(), 'from')], 'from')), translate(.,'0123456789',''), ''))!=(translate((substring-after(//span[contains(text(),'from')],'from')), translate(.,'0123456789',''), ''))]//a[(contains(@id,'xxx'))or(contains(@class,'xxx'))or(contains(translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'), 'next'))]

It's long, but very powerful.
It reads the span that holds the page numbering and compares the number before "from" with the number after it. While they differ, it selects the //a link inside the div; once they match (the last page), it stops. The extra translate() calls lowercase the link text, so "Next" and "NEXT" also match, and reduce both page numbers to digits only (the double translate :) ). It works for other domains too.
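The stopping rule in that XPath can be sketched in Python to check it against sample counter text. This is a minimal sketch under assumptions: the function names, the marker `xxx` and the sample strings such as "Page 3 from 10" are hypothetical, and the sketch simply keeps the digits, which is what the double translate() is meant to achieve (the original passes the div's own text `.` to the inner translate()). It mirrors the logic of the expression; it is not how Full-Text RSS runs it.

```python
import re

def digits_only(text):
    """Keep only the digits, the goal of the double translate() in the XPath."""
    return re.sub(r"[^0-9]", "", text)

def has_next_page(counter_text, word="from"):
    """True while the number before `word` differs from the number after it.

    substring-before/substring-after return "" when `word` is missing,
    so both sides are equal and the XPath selects nothing; mirror that.
    """
    if word not in counter_text:
        return False
    before, after = counter_text.split(word, 1)  # split at the first occurrence
    return digits_only(before) != digits_only(after)

def is_next_link(link_id, link_class, link_text, marker="xxx"):
    """Mirror the //a predicate: id or class contains the marker,
    or the link text contains 'next' in any letter case."""
    return marker in link_id or marker in link_class or "next" in link_text.lower()

print(has_next_page("Page 3 from 10"))   # prints True
print(has_next_page("Page 10 from 10"))  # prints False
```

Comparing the two numbers as digit strings is enough here, because the XPath only tests them for inequality and never needs their numeric order.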

Martin