I did some debugging and it appears that my hosted FTR isn’t fetching the content from the WSJ URLs…
On the left is my self-hosted instance; on the right is the one hosted by FiveFilters. You can clearly see that the FiveFilters instance fetches the content, whereas mine skips that step entirely for some reason.
My self-hosted FTRSS delivers large articles. Are you sure that you have no extra config files wsj.com.txt or (feeds.a.)dj.com.txt in your site_config/standard or /custom folder?
Sorry for the confusion. I read your question on my smartphone and answered from there. Your own log also shows a changed user agent and referrer. So no solution from my side, sorry.
Just saw: your version check is from fivefilters.org, not your own server.
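A quick way to check for those files is a small script. This is only a sketch: the file and folder names come from the question above, while the install root you pass in (and the helper name `find_site_configs`) are assumptions you would adapt to your own server.

```python
from pathlib import Path

# File names taken from the question above.
CANDIDATES = ("wsj.com.txt", "dj.com.txt", "feeds.a.dj.com.txt")


def find_site_configs(ftr_root, names=CANDIDATES):
    """Return matching config files under site_config/standard and site_config/custom."""
    root = Path(ftr_root)
    found = []
    for folder in ("standard", "custom"):
        for name in names:
            path = root / "site_config" / folder / name
            if path.is_file():
                found.append(path)
    return found


if __name__ == "__main__":
    # "/var/www/full-text-rss" is a hypothetical install path.
    for path in find_site_configs("/var/www/full-text-rss"):
        print(path)
```

If a file turns up in both folders, it is worth comparing the two copies before assuming which one is in effect.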
That is actually from my server… when you go to the self-hosted FTR webpage and click “check for updates”, it takes you to fivefilters.org, passing your version and site config date in the GET request.
Are you able to get full articles from WSJ on your self-hosted instance?
Hm, I am at home now and checked again. I am just a user, so I am only trying to figure this out.
Your referrer for the item link is set to Google, while FiveFilters’ and mine are set to the item link itself:
* ......user-agent set to: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0
* ......referer set to: https://www.wsj.com/amp/articles/russia-launches-new-drone-attacks-as-partnership-with-iran-deepens-11670666867
I don’t know where this is set, because in wsj.com.txt the user_agent and referrer entries have different values and are commented out.
Are you sure that you made no changes to the original wsj.com.txt file? Maybe download it from GitHub again and replace your version.
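To compare two debug logs without eyeballing screenshots, something like the following could help. It is a rough sketch that assumes both logs use the same "... set to:" line format as the two lines quoted above; the function names are made up for illustration.

```python
import re

# Matches lines such as "* ......user-agent set to: Mozilla/5.0 ..."
HEADER_LINE = re.compile(r"(user-agent|referer) set to:\s*(.+?)\s*$")


def extract_headers(log_text):
    """Return (header, value) pairs in the order they appear in a debug log."""
    pairs = []
    for line in log_text.splitlines():
        match = HEADER_LINE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def diff_headers(log_a, log_b):
    """Return the differing pairs plus how many pairs each log contains."""
    a, b = extract_headers(log_a), extract_headers(log_b)
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs, len(a), len(b)
```

If the two counts differ, one log is missing a whole request (for example, the article fetch), which is a different problem from a header simply having another value.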
Hi, I’m not sure what you mean… In the stage where the list of feeds is being retrieved, both my log and FiveFilters’ log use the same user agent and referrer (www.google.co.uk) (see green rectangle in screenshot).
The issue seems to be that, in my hosted FTR, the code never performs the steps shown in the red rectangle: it stops at the “* ** Loading class Readability (readability/Readability.php)” line and then skips to “* Attempting to extract content”.
I’m sure I’m using the original wsj.com.txt file… I didn’t even copy the file manually to the web server, I just used the “check for updates” link in the Admin UI and FTR itself downloaded all the files from GitHub to the “/site_config/standard” folder.
Yes, but in the red rectangle there is a second user agent/referrer pair, and this seems to come from one of the three classes loaded at the beginning. The green part is reading the feed; the red part reads the article from the source.
Hi Fabio, can I ask where your server is located? And one thing you can try is to pass &debug=rawhtml in the querystring to see the HTTP response, including content, sent back from the server.
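For anyone who wants to script that check, here is a minimal sketch that builds such a URL. The `debug=rawhtml` parameter is the one mentioned above; the base URL, the `url` parameter name, and the helper name are assumptions, so adjust them to match your own install.

```python
from urllib.parse import urlencode


def build_debug_url(base, article_url, debug="rawhtml"):
    """Append the article URL and the debug flag to an FTR endpoint."""
    # "url" as the parameter name is an assumption about the endpoint.
    query = urlencode({"url": article_url, "debug": debug})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Hypothetical self-hosted endpoint; replace with your own.
    print(build_debug_url(
        "http://localhost/full-text-rss/extract.php",
        "https://www.wsj.com/amp/articles/russia-launches-new-drone-attacks-as-partnership-with-iran-deepens-11670666867",
    ))
```

Open the printed URL in a browser, or fetch it from the server itself, to see the response FTR actually received.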
I wonder if you’re getting a different response from their server. That can sometimes explain the difference users experience. Simply uploading Full-Text RSS to a different server (e.g. Hetzner Cloud, Digital Ocean) or a different server location (US or Europe) can affect how the target site responds.
It might not be either of those of course, but if you have an easy way to try running on a different server, I’d suggest trying that before debugging too much.
Hi, thanks for the response. I’m hosting FTR in my home lab, in Texas, USA. I tried the debug=rawhtml yesterday when I was troubleshooting it and FTR just isn’t retrieving the content from the URL it finds in the feed.
I obviously tried WSJ from my desktop web browser and the page loads successfully, all of it (after the paywall removers “cleanse” the pages)
@fivefilters I’m having the same problem as @fabio. I also have a self-hosted setup and am located in the US. WSJ used to work great until a few months ago. There has been no change on my end; it just stopped pulling full articles one day. I have tried a copy of my setup on a different server that uses Starlink instead of Comcast and have had the same issue. Using debug=rawhtml on extract.php appears to show that WSJ is responding with only the stub article (what a non-subscriber would receive). However, from a browser on the same server, I can log in to WSJ and view articles just fine. My guess is that credentials need to be passed; is there a way to do this?
We troubleshot the problem, and the issue is related to FTR running anywhere in the US: for some reason it gets a response from the WSJ servers that is different from the one received by an FTR instance anywhere in Europe.
I created a VM on a VPS in Germany and installed FTR on it, and it works perfectly. I’m using that temporarily while we see whether they can change the WSJ site config file to handle the different scenarios.
Try DigitalOcean; they give you a $200 credit valid for 2 months.