Trying to make a site config to pull the images in the article from the following website and can’t seem to do it:
I can get the header image with //img[contains(concat(' ',normalize-space(@class),' '),' wp-post-image ')]
The body images appear to be div.wp-block-image but when I try that it doesn’t load.
Thanks for any guidance!
Hi @SixthStreet,
If you are using Firefox, right-click somewhere on the original webpage, choose Inspect and hit F1. Scroll down and tick "Disable JavaScript". Then you will see the problem: the images are injected by JavaScript, which FTR cannot capture. There is also a noscript version of each image, but FTR removes that node. Other browsers have similar ways of deactivating JavaScript.
You can prevent this behavior of FTR by string-replacing the noscript tags, so the node survives as a div. Additionally, you have to strip the original image, because it leaves an empty placeholder in the excerpt.
find_string: noscript>
replace_string: div>
strip: //img[contains(@src, 'data:image/gif')]
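To see why these three lines work together, here is a rough Python sketch of their effect on a single image block. The sample markup and the apply_rules helper are made up for illustration and are not taken from the site or from FTR itself; the sketch also assumes find_string/replace_string behave as a plain substring replacement on the raw HTML, and it approximates the strip XPath with a regex.

```python
import re

# Hypothetical markup, modelled on a lazy-loaded WordPress image block:
# a GIF placeholder that JavaScript would swap out, plus a noscript fallback.
SAMPLE_HTML = (
    '<div class="wp-block-image">'
    '<img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=" '
    'data-src="https://example.com/photo.jpg">'
    '<noscript><img src="https://example.com/photo.jpg"></noscript>'
    '</div>'
)


def apply_rules(html: str) -> str:
    """Illustrative stand-in for the three config lines (not FTR's own code)."""
    # find_string / replace_string: "noscript>" occurs in both "<noscript>"
    # and "</noscript>", so one rule rewrites the opening and closing tags.
    html = html.replace("noscript>", "div>")
    # strip: regex approximation of //img[contains(@src, 'data:image/gif')].
    html = re.sub(r'<img\b[^>]*\bsrc="[^"]*data:image/gif[^"]*"[^>]*>', "", html)
    return html


result = apply_rules(SAMPLE_HTML)
print(result)
```

Because the search string has no leading bracket, a single rule covers both tags, and the real image ends up inside an ordinary div that survives extraction.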
As there are a few more things you might want to strip, it would be great if you could open a pull request with your final config on the GitHub repo. Alternatively, post it here and I will do it for you.