spiegel.de - unable to retrieve full-text content

Hi,

I’m unable to get the full-text content of spiegel.de. Example feed: DER SPIEGEL - Erneuerbare Energien

Debug says:

* APC is disabled or not available on server
* Supplied URL: https://www.spiegel.de/thema/erneuerbare_energien/index.rss
* ** Loading class HumbleHttpAgent (humble-http-agent/HumbleHttpAgent.php)
* ** Loading class ContentExtractor (content-extractor/ContentExtractor.php)
* ** Loading class SiteConfig (content-extractor/SiteConfig.php)
* --------
* Attempting to process URL as feed
* ** Loading class SimplePie_HumbleHttpAgent (humble-http-agent/SimplePie_HumbleHttpAgent.php)
* ** Loading class DisableSimplePieSanitize (DisableSimplePieSanitize.php)
* Fetching URL (https://www.spiegel.de/thema/erneuerbare_energien/index.rss)
* Starting parallel fetch (curl_multi_*)
* Processing set of 1
* ...https://www.spiegel.de/thema/erneuerbare_energien/index.rss
* ......adding to pool
* . looking for site config for spiegel.de in custom folder
* . looking for site config for spiegel.de in standard folder
* ... found site config in standard folder (spiegel.de.txt)
* Cached site config with key spiegel.de
* Cached site config with key spiegel.de.merged
* Checking fingerprints...
* No fingerprint matches
* . looking for site config for global in custom folder
* . looking for site config for global in standard folder
* ... found site config in standard folder (global.txt)
* Cached site config with key global
* Cached site config with key global.merged.ex
* Appending site config settings from global.txt
* ......user-agent set to: PHP/7.4
* ......referer set to: http://www.google.de/url?sa=t&source=web&cd=1
* Sending request...
* Received responses
* ... site config for spiegel.de.merged already loaded in this request
* Checking fingerprints...
* No fingerprint matches
* ... site config for global.merged.ex already loaded in this request
* Appending site config settings from global.txt
* --------
* Constructing a single-item feed from URL
* ** Loading class FeedWriter (feedwriter/FeedWriter.php)
* --------
* Fetching feed items
* Starting parallel fetch (curl_multi_*)
* Processing set of 1
* ...https://www.spiegel.de/thema/erneuerbare_energien/index.rss
* ......in memory
* --------
* Processing feed item 1
* Item URL: https://www.spiegel.de/thema/erneuerbare_energien/index.rss
* ** Loading class FeedItem (feedwriter/FeedItem.php)
* URL already fetched - in memory (https://www.spiegel.de/thema/erneuerbare_energien/index.rss, effective: https://www.spiegel.de/thema/erneuerbare_energien/index.rss)
* Done!

Any tips to get this working?

regards,
Friedhelm

@macfly: Thank you for reporting this! spiegel.de no longer accepts the user agent ‘PHP’. I’ve fixed that problem. Please update your site configs in the admin area.

I also removed subscription advertisements from teaser articles.
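If you want to check whether a site is rejecting a particular user agent, such as the `PHP/7.4` string the debug log shows being sent, one way is to request the same feed with two different User-Agent headers and compare the HTTP status codes. Below is a minimal sketch using only Python’s standard library. It is an illustration under assumptions, not how Full-Text RSS fetches pages (the log shows it uses curl via HumbleHttpAgent). The browser-style UA string is an arbitrary choice, and the exact response spiegel.de sends to a `PHP` user agent isn’t shown in this thread.

```python
import urllib.error
import urllib.request

FEED_URL = "https://www.spiegel.de/thema/erneuerbare_energien/index.rss"

# Assumption: an arbitrary browser-style UA; any string that isn't "PHP/..." may do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server answers with for this User-Agent."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is still useful.
        return err.code


if __name__ == "__main__":
    # Compare the UA from the debug log with a browser-style UA.
    for ua in ("PHP/7.4", BROWSER_UA):
        print(f"{ua!r}: HTTP {fetch_status(FEED_URL, ua)}")
```

A differing status (or a blocked/empty response) for the `PHP/7.4` request would point to user-agent filtering, which is what the updated site config works around.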

Works like a charm.

Thank you!

Thank you for the report @macfly and the speedy fix @HolgerAusB! :pray: