Reuters articles not loading

Hi @fivefilters, following up on this GitHub issue:

After I updated reuters.com.txt in the repo, I still can’t get full content via ftr.fivefilters.net, although it works on my self-hosted FTR. On P2K, I also get the error message when I just send the URL. Using the plugin with browser-retrieved content works fine, of course.

I suspect that Reuters is blocking FiveFilters’ servers, or that the domain is on your block list at their request. Can you confirm this?


I found out that running
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:143.0) Gecko/20100101 Firefox/143.0"
on the console only returns a hint to enable JS and cookies.

When I add the cookie I saw in the browser to the command
-b 'reuters-geo={"country":"-", "region":"-"}'
I receive the full content.
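For anyone who wants to reproduce this from a script rather than from curl, here is a minimal sketch that builds the same request with Python's standard library. The function name `build_request` is made up for illustration, and the article URL is an assumption (it is the one from the logs in this thread, not necessarily the one used with curl). Whether the server accepts the request may also depend on things this sketch does not control, such as the source IP.

```python
import urllib.request

# Assumption: the article URL is taken from the logs in this thread.
URL = (
    "https://www.reuters.com/world/us/"
    "us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/"
)
# User-Agent and cookie exactly as passed to curl via -A and -b.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:143.0) "
    "Gecko/20100101 Firefox/143.0"
)
GEO_COOKIE = 'reuters-geo={"country":"-", "region":"-"}'


def build_request(url: str = URL) -> urllib.request.Request:
    """Build (but do not send) a GET request with the UA and cookie set."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", USER_AGENT)
    req.add_header("Cookie", GEO_COOKIE)
    return req
```

Passing the result to `urllib.request.urlopen(build_request(), timeout=30)` sends it; a 401 response surfaces as `urllib.error.HTTPError`.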

But when I try to inject this cookie into the config file, I also get the error on my local FTR:
http_header(cookie): reuters-geo={"country":"-", "region":"-"}
I tried escaping the quotation marks, the space, and the { with \ or %xx. No change.

Even a fake cookie, bla=foo, results in an error with FTR, while the article is still fetchable in wallabag with both cookies. So I had to remove this line.


@HolgerAusB Interesting! Thanks for this. Will try to take a look this weekend and see if I can reproduce.

@HolgerAusB Thanks for looking into this. I tested a bit more and it seems certain combinations of headers and source IP will not allow the request through.

We made these changes to the site config:

This now works for Full-Text RSS for us, and I suspect it’ll work for you too. But let me know if it doesn’t.

@fivefilters: Thank you for looking into this, but unfortunately this does NOT work with my 3.9.13 instance. The Macintosh user-agent doesn’t work here. The content is only fetched with the “Windows NT” one I had set before. The referer does not change anything here.

The log output doesn’t seem to help here either.

* APC is disabled or not available on server
* Supplied URL: https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/
* ** Loading class HumbleHttpAgent (humble-http-agent/HumbleHttpAgent.php)
* ** Loading class ContentExtractor (content-extractor/ContentExtractor.php)
* ** Loading class SiteConfig (content-extractor/SiteConfig.php)
* --------
* Attempting to process URL as feed
* ** Loading class SimplePie_HumbleHttpAgent (humble-http-agent/SimplePie_HumbleHttpAgent.php)
* ** Loading class DisableSimplePieSanitize (DisableSimplePieSanitize.php)
* Fetching URL (https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/)
* Starting parallel fetch (curl_multi_*)
* Processing set of 1
* ...https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/
* ......adding to pool
* . looking for site config for reuters.com in custom folder
* ... found site config (reuters.com.txt)
* Cached site config with key reuters.com
* . looking for site config for reuters.com in standard folder
* ... site config for reuters.com already loaded in this request
* . merging config files
* Cached site config with key reuters.com.merged
* Checking fingerprints...
* No fingerprint matches
* . looking for site config for global in custom folder
* . looking for site config for global in standard folder
* ... found site config in standard folder (global.txt)
* Cached site config with key global
* Cached site config with key global.merged.ex
* Appending site config settings from global.txt
* ......user-agent set to: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36
* ......referer set to: https://www.reuters.com/
* Sending request...
* Received responses
* ... site config for reuters.com.merged already loaded in this request
* Checking fingerprints...
* No fingerprint matches
* ... site config for global.merged.ex already loaded in this request
* Appending site config settings from global.txt
* --------
* Constructing a single-item feed from URL
* ** Loading class FeedWriter (feedwriter/FeedWriter.php)
* --------
* Fetching feed items
* Starting parallel fetch (curl_multi_*)
* Processing set of 1
* ...https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/
* ......in memory
* --------
* Processing feed item 1
* Item URL: https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/
* ** Loading class FeedItem (feedwriter/FeedItem.php)
* URL already fetched - in memory (https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/, effective: https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/)
* Done!

Same for wallabag: it fetches articles with the Windows UA but not with the Macintosh UA.
But the log gives a bit more information here:
401 / Unauthorized

[2025-10-03T08:17:28.370542+00:00] graby.INFO: Fetching url: https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/ {"url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/"} []
[2025-10-03T08:17:28.370607+00:00] graby.INFO: Trying using method "get" on url "https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/" {"method":"get","url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/"} []
[2025-10-03T08:17:28.370663+00:00] graby.INFO: Found user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36" for url "https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/" from site config {"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36","url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/"} []
[2025-10-03T08:17:28.370741+00:00] graby.INFO: Found referer "https://www.reuters.com/" for url "https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/" from site config {"referer":"https://www.reuters.com/","url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/"} []
[2025-10-03T08:17:28.456565+00:00] graby.WARNING: Request throw exception (with a response): Unauthorized {"error_message":"Unauthorized"} []
[2025-10-03T08:17:28.456704+00:00] graby.INFO: Data fetched: array{"effective_url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/","body":"<html lang=\"en\"><head><title>reuters.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style=\"margin:0\"><p id=\"cmsg\">Please enable JS and disable any ad blocker</p><script data-cfasync=\"false\">var dd={'rt':'c','cid':'AHrlqAAAAAMAtvIDR_H5j-kA9H-gLw==','hsh':'2013457ADA70C67D6A4123E0A76873','t':'fe','qp':'','s':46743,'e':'639a5779f9bd89165f7bb612ff43fe392ace0d4d750e839de76237ca8d747e03ec0023853355f98f50facfad48c29100','host':'geo.captcha-delivery.com','cookie':'tXQkwJp~JZ_mWSjVu2cRD7T_y3Syc3t0ahG4cNnRFfBNlsA3y1RFdXM2r9eFv2jsSZfatLposh8aF8fT1riB7o_omy0VVVVNRLGYDyUghzA0uJabEURN4VeYoo1INEr4'}</script><script data-cfasync=\"false\" src=\"https://ct.captcha-delivery.com/c.js\"></script></body></html>","headers":{"content-type":"text/html;charset=utf-8","content-length":"771","connection":"keep-alive","server":"CloudFront","date":"Fri, 03 Oct 2025 08:17:28 GMT","x-datadome":"protected","accept-ch":"Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory","charset":"utf-8","cache-control":"max-age=0, private, no-cache, no-store, must-revalidate","pragma":"no-cache","access-control-allow-credentials":"true","access-control-expose-headers":"x-dd-b, x-set-cookie","access-control-allow-origin":"https://www.reuters.com","x-datadome-cid":"AHrlqAAAAAMAtvIDR_H5j-kA9H-gLw==","x-dd-b":"1","set-cookie":"datadome=tXQkwJp~JZ_mWSjVu2cRD7T_y3Syc3t0ahG4cNnRFfBNlsA3y1RFdXM2r9eFv2jsSZfatLposh8aF8fT1riB7o_omy0VVVVNRLGYDyUghzA0uJabEURN4VeYoo1INEr4; Max-Age=31536000; Domain=.reuters.com; Path=/; Secure; SameSite=Lax","x-cache":"LambdaGeneratedResponse from cloudfront","via":"1.1 ad82d8a80f2c6497aad660c7722475c0.cloudfront.net 
(CloudFront)","x-amz-cf-pop":"FRA60-P9","x-amz-cf-id":"9lu5cFWBjU1kMRO0uRpW2TmcjIqj294fuepadNLnLdLPFlwQaL5ncg==","content-security-policy":"frame-ancestors 'self'; report-uri https://reuters.report-uri.com/r/t/csp/enforce; report-to report-uri","report-to":"{\"endpoints\":[{\"url\":\"https://reuters.report-uri.com/a/t/g\"}],\"group\":\"report-uri\",\"include_subdomains\":true,\"max_age\":31536000}"},"status":401} {"data":{"effective_url":"https://www.reuters.com/world/us/us-attorney-general-garland-confirms-fbi-investigating-trump-2022-08-11/","body":"<html lang=\"en\"><head><title>reuters.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style=\"margin:0\"><p id=\"cmsg\">Please enable JS and disable any ad blocker</p><script data-cfasync=\"false\">var dd={'rt':'c','cid':'AHrlqAAAAAMAtvIDR_H5j-kA9H-gLw==','hsh':'2013457ADA70C67D6A4123E0A76873','t':'fe','qp':'','s':46743,'e':'639a5779f9bd89165f7bb612ff43fe392ace0d4d750e839de76237ca8d747e03ec0023853355f98f50facfad48c29100','host':'geo.captcha-delivery.com','cookie':'tXQkwJp~JZ_mWSjVu2cRD7T_y3Syc3t0ahG4cNnRFfBNlsA3y1RFdXM2r9eFv2jsSZfatLposh8aF8fT1riB7o_omy0VVVVNRLGYDyUghzA0uJabEURN4VeYoo1INEr4'}</script><script data-cfasync=\"false\" src=\"https://ct.captcha-delivery.com/c.js\"></script></body></html>","headers":{"content-type":"text/html;charset=utf-8","content-length":"771","connection":"keep-alive","server":"CloudFront","date":"Fri, 03 Oct 2025 08:17:28 GMT","x-datadome":"protected","accept-ch":"Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory","charset":"utf-8","cache-control":"max-age=0, private, no-cache, no-store, must-revalidate","pragma":"no-cache","access-control-allow-credentials":"true","access-control-expose-headers":"x-dd-b, 
x-set-cookie","access-control-allow-origin":"https://www.reuters.com","x-datadome-cid":"AHrlqAAAAAMAtvIDR_H5j-kA9H-gLw==","x-dd-b":"1","set-cookie":"datadome=tXQkwJp~JZ_mWSjVu2cRD7T_y3Syc3t0ahG4cNnRFfBNlsA3y1RFdXM2r9eFv2jsSZfatLposh8aF8fT1riB7o_omy0VVVVNRLGYDyUghzA0uJabEURN4VeYoo1INEr4; Max-Age=31536000; Domain=.reuters.com; Path=/; Secure; SameSite=Lax","x-cache":"LambdaGeneratedResponse from cloudfront","via":"1.1 ad82d8a80f2c6497aad660c7722475c0.cloudfront.net (CloudFront)","x-amz-cf-pop":"FRA60-P9","x-amz-cf-id":"9lu5cFWBjU1kMRO0uRpW2TmcjIqj294fuepadNLnLdLPFlwQaL5ncg==","content-security-policy":"frame-ancestors 'self'; report-uri https://reuters.report-uri.com/r/t/csp/enforce; report-to report-uri","report-to":"{\"endpoints\":[{\"url\":\"https://reuters.report-uri.com/a/t/g\"}],\"group\":\"report-uri\",\"include_subdomains\":true,\"max_age\":31536000}"},"status":401}} []
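The response in this log carries a few recognisable markers: status 401, an `x-datadome: protected` header, and a script loaded from `captcha-delivery.com`. As a rough sketch, a check like the following could flag such responses in your own tooling. The function name is hypothetical, and the rule itself is only a heuristic based on this single log entry, not a documented contract.

```python
def looks_like_datadome_challenge(status: int, headers: dict, body: str) -> bool:
    """Heuristic: does this response resemble the challenge page in the log?

    Assumption: the markers are taken from one observed response
    (status 401, x-datadome header, captcha-delivery.com script).
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}
    has_header = lowered.get("x-datadome", "").lower() == "protected"
    has_script = "captcha-delivery.com" in body
    return status == 401 and (has_header or has_script)
```

For the response logged above, `looks_like_datadome_challenge(401, headers, body)` would return `True`, since both the header and the script are present.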

And Jérémy’s/J0k3r’s f43.me also works only with the Windows UA.

Thanks Holger, I know you saw my reply in the GitHub thread, but I’ll copy it here too for the forum readers:

I think the issue here is not the Mac/Windows part of the UA, but the Firefox/Chrome part.

Services like Cloudflare and CloudFront (which Reuters use) can analyze the TLS fingerprint of the client during the TLS handshake - before any HTTP headers are sent. This fingerprint is often used to determine whether the connection is coming from Chrome, Firefox, cURL, or another client.

They then compare this TLS-identified client with the browser claimed in the HTTP User-Agent header. If these don’t match (e.g., the TLS handshake looks like Chrome but the User-Agent claims Firefox), the request receives a penalty score. If the overall score is too low, the request gets rejected.
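To make that comparison concrete, here is a toy sketch. It assumes the client family on the TLS side is already known (real services derive it from the handshake, which is not shown here), and the penalty value of 50 is invented purely for illustration. The function names are hypothetical; this is not how any particular provider implements its scoring.

```python
# The two User-Agent strings discussed in this thread.
FIREFOX_WINDOWS_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:143.0) "
    "Gecko/20100101 Firefox/143.0"
)
CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36"
)


def ua_family(user_agent: str) -> str:
    """Very rough guess at the browser family claimed by a User-Agent."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    if ua.startswith("curl/"):
        return "curl"
    return "unknown"


def mismatch_penalty(tls_family: str, user_agent: str, penalty: int = 50) -> int:
    """Return a penalty when the TLS-side family and the claimed UA disagree.

    Assumption: tls_family is supplied by the caller, and the penalty
    value is arbitrary.
    """
    return 0 if ua_family(user_agent) == tls_family else penalty
```

With a Chrome-like handshake, `mismatch_penalty("chrome", CHROME_MAC_UA)` gives no penalty, while `mismatch_penalty("chrome", FIREFOX_WINDOWS_UA)` does. The operating system part of the string plays no role in this toy check.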

This is unfortunately becoming more common, especially on larger sites. But it’s not that difficult to control your TLS fingerprint, in much the same way as you can control the User-Agent string. The link above shows some solutions. I think the issue we’re experiencing here is that our servers are using a Chrome-like TLS handshake for this particular site, and so when we present a Firefox User-Agent header, our request gets blocked (and possibly the source IP is contributing to the negative score in a way that running the service locally on a residential IP doesn’t).
