Page doesn't render

Hi,
I’m currently testing various site configurations for my new project, and I’ve run into an issue: some sites don’t render.
I get the message “[unable to retrieve full-text content]” together with JSON output whose fields are all empty.
Moreover, when I try to select a body section with http://siteconfig.fivefilters.org/, the site shows the message “You look like a bot, go away.”
What’s going on?
Sites and pages I tested:
https://www.eurointegration.com.ua/news/2019/05/23/7096521/ (and any other page from the site)
https://vybory.pravda.com.ua/news/2019/05/23/7149993/ (and any other page from the site)
Custom XPath configurations don’t work either :frowning: (an example of the kind of config I tried is shown below)
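For illustration, a custom config along these lines is what I mean (the XPath expressions here are placeholders, not the site’s real markup):

eurointegration.com.ua.txt:

title: //h1
body: //div[@class='post']
strip: //div[@class='share']
test_url: https://www.eurointegration.com.ua/news/2019/05/23/7096521/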

Please help me find a solution for pages like these.
Thanks in advance for a quick reply.

Also, I see errors in the logs for what may be ALL non-Latin pages:
2019/05/25 13:15:15 [error] 1058#1058: 2074604 FastCGI sent in stderr: "PHP message: PHP Notice: https://***** is invalid XML, likely due to invalid characters. XML error: ******* (a variety of errors from SimplePie) in /var/www//ftr/libraries/simplepie/library/SimplePie.php on line 1496" while reading response header from upstream
(most Latin-alphabet sites work fine)

I installed the newest SimplePie library (v1.5.2), with the same result :frowning:

Added:
But all the XML errors are gone after installing SimplePie library v1.3.2 :open_mouth:

Added again:
SimplePie’s parse() function uses HTML entities, which breaks the parsing in my tests.
I replaced it with the one from v1.3.2 and all the errors are gone.

Added:
The SimplePie 1.4 and 1.5.2 demos (without FTR) work like a charm and extract the feeds without any errors.

So it’s not a problem with the above-mentioned feeds.
IMHO, it’s a problem in FTR.
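For reference, the standalone check I mean (the “demo without ftr”) was a minimal sketch along these lines; the feed URL is a placeholder for the site’s real RSS URL:

<?php
// Minimal standalone SimplePie test, bypassing Full-Text RSS entirely.
require_once 'simplepie/autoloader.php';

$feed = new SimplePie();
$feed->set_feed_url('https://www.eurointegration.com.ua/rss/'); // placeholder feed URL
$feed->enable_cache(false); // no caching while testing
$feed->init();

if ($feed->error()) {
    echo 'SimplePie error: ' . $feed->error() . "\n";
} else {
    echo 'Parsed OK: ' . $feed->get_title() . ' (' . $feed->get_item_quantity() . " items)\n";
}

Run this way against the non-Latin feeds, SimplePie parses them without any invalid-XML notices.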

Thanks for those problem URLs. Some sites implement systems that try to detect automated access, either by examining IP addresses or by looking for clues in the request headers. You can sometimes get past these measures by running Full-Text RSS from a server whose IP address isn’t blocked (if IP blocking is the cause), or by changing the request headers Full-Text RSS sends. In the site config files we allow you to override the User-Agent string with something like:

http_header(user-agent): Mozilla.... 

But this will require some trial and error, and if the sites in question update their blocking rules, you might find you’re faced with similar messages in the future.
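For example, a custom site config for one of your URLs might start something like this (the header values are just a sketch to experiment with, not values we know will work for these particular sites):

eurointegration.com.ua.txt:

http_header(user-agent): Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
http_header(referer): https://www.google.com/
body: //div[@class='post']

Custom site config files go in the site_config/custom/ folder and are named after the site’s hostname.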

Oh, and thanks for letting us know about the error log messages you’re seeing. We’ll take a look to see if we can reproduce this.

Hello,
I tried your solution and put in real headers from browsers (Safari, Chrome, Mozilla).
It’s not working, alas; a sketch of how I tested is below.
Maybe another way exists?
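Here is roughly how I tested from the server itself (a sketch; the header values are copied from my browser, and the URL is one of the pages above):

<?php
// Probe run from the same server as Full-Text RSS, to see what the site
// returns when the request carries browser-like headers.
$url = 'https://www.eurointegration.com.ua/news/2019/05/23/7096521/';
$context = stream_context_create(['http' => [
    'method' => 'GET',
    'header' => implode("\r\n", [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-US,en;q=0.9',
    ]),
]]);
$html = @file_get_contents($url, false, $context);
if ($html === false) {
    echo "Request failed or was blocked\n";
} else {
    echo $http_response_header[0] . "\n"; // HTTP status line
    echo substr($html, 0, 300) . "\n";    // first bytes of the response body
}

Even with these headers I get the same result, so I suspect the block may not be header-based.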

Hello,
I’d be very glad if you could suggest working headers for the sites listed above.
All my attempts have ended without success :frowning:
Thank you in advance!