vienna.at is sending scrambled garbage

Hello @fivefilters,

For the past few weeks, the vienna.at feed has returned nothing but unreadable gibberish from my self-hosted FiveFilters Full-Text RSS (FTR) instance, while wallabag handles it fine. Your hosted ftr.fivefilters.net also worked at first, but after I updated the config (the site’s HTML changed, so I had to set a new body selector), it shows the same problem. Here is an excerpt of the output:

[i·SdàªöCF"¬j?윜œ´öPµHȼ`õǯ?ÿü÷Ÿƒqa�„Él±Úì§ËíñúüþSÓ·~›
aÆ‘˜@�xI`ä×áh4$("	ÌAK2—e±õÝ*ËÕüÍ,ÊL}°¿á\œõVvLP]PÅ5O¨ë8ë®oퟩúÿ«-.^i °´ûìÙrQ;º–k$IX$À� êh³}ÛöûbsWY}]©ZåŸD}8£Û*MsÚNÏ®ÈwNzçU*„ž�R™·AþAê“ô/ÊþúOg}ýªžÃîÝ5Ð!ÎE켇(©¨A@k„táXüŸïkª’TËO¿
!WNqK.æÞ™ñ¾÷ƒüx
�Ac
Ç\…™;wîûÿ}üOàbIebT±+B¨rUÍ{Xû€dC­S,eW²«Ußù¸èüÿû3ÿmÍãÏé ë$
...

When I pass a single article URL, my own FTR instance also returns garbage or an error message in the preview, at least at first. Sometimes I get a correct result after reloading the page several times, but that doesn’t help my feed aggregator.

I looked into this further with cURL and found that requesting a single article URL directly also returns garbage: not HTML, just a jumble of bytes.
However, if I append a trailing slash to the URL, FTR displays the article correctly every time, right away.
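To compare the two responses more closely, a small PHP CLI sketch like the one below could fetch both variants and print the status, the `Content-Type` and `Content-Encoding` headers, and whether the body starts with the gzip magic bytes. That the "jumble of characters" is a compressed body nobody decodes is only an assumption, not something the tests above confirm. The script name `check_slash.php` and the helpers `fetchRaw()` and `looksLikeGzip()` are hypothetical and not part of FTR.

```php
<?php
// Hypothetical diagnostic helper (not part of FTR): compare how a server answers
// a URL with and without a trailing slash.

// Return true if $body starts with the gzip magic bytes 0x1f 0x8b.
function looksLikeGzip(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Fetch $url without CURLOPT_ENCODING, so curl sends no Accept-Encoding header and
// does not decompress anything: the raw bytes the server sent are returned.
function fetchRaw(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Collect response headers (lower-cased names); later responses overwrite earlier ones
        CURLOPT_HEADERFUNCTION => function ($handle, string $line) use (&$headers): int {
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
            return strlen($line);
        },
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    return ['status' => $status, 'headers' => $headers, 'body' => $body === false ? '' : $body];
}

// Only runs from the command line with a URL argument, e.g.
//   php check_slash.php https://www.vienna.at/.../10165364
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $base = rtrim($argv[1], '/');
    foreach ([$base, $base . '/'] as $url) {
        $r = fetchRaw($url);
        printf(
            "%s\n  status=%d content-type=%s content-encoding=%s gzip-magic=%s\n",
            $url,
            $r['status'],
            $r['headers']['content-type'] ?? '-',
            $r['headers']['content-encoding'] ?? '-',
            looksLikeGzip($r['body']) ? 'yes' : 'no'
        );
    }
}
```

If the URL without the slash reports `gzip-magic=yes` even though no `Accept-Encoding` was sent, that would suggest the server compresses that response unconditionally; if both variants look like plain HTML here, the problem is more likely elsewhere.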

bad:
https://www.vienna.at/zweiter-vorfall-binnen-zwei-tagen-us-maschinen-dringen-in-oesterreichischen-luftraum-ein/10165364

good:
https://www.vienna.at/zweiter-vorfall-binnen-zwei-tagen-us-maschinen-dringen-in-oesterreichischen-luftraum-ein/10165364/

I also tried different user agents and the prune/tidy options, with no luck.

FTR 3.9.13
PHP 8.1.34
The compatibility test shows that everything is enabled except APC/APCu, which is intentionally turned off.

Best regards,
Holger

ARRRG, even the feed URL itself needs that additional slash to work.

I’d prefer not to need a proxy just to add that slash, but here is one, generated by AI (lumo.proton.me):

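<?php
// Simple PHP proxy that appends a trailing slash to the <link> of every <item> in an RSS feed.
// Usage: add_slash.php?url=<percent-encoded feed URL>

// Send an error as plain text (so the user-supplied URL is never rendered as HTML) and stop.
function fail(int $status, string $message): never
{
    http_response_code($status);
    header('Content-Type: text/plain; charset=utf-8');
    exit($message);
}

// Append "/" to the path of a URL, keeping any "?query" or "#fragment" after it.
function addTrailingSlash(string $url): string
{
    $cut = strcspn($url, '?#'); // length of the part before "?" or "#"
    $path = substr($url, 0, $cut);
    $rest = substr($url, $cut);
    if (!str_ends_with($path, '/')) {
        $path .= '/';
    }
    return $path . $rest;
}

// Get the RSS feed URL from the query parameter
$rssUrl = $_GET['url'] ?? null;

if (!is_string($rssUrl) || $rssUrl === '') {
    fail(400, "Error: Please provide a URL via the 'url' parameter. Example: add_slash.php?url=https://example.com/feed.xml");
}

// Validate the URL and allow only http(s), so the script cannot be used to read local files
$scheme = strtolower((string) parse_url($rssUrl, PHP_URL_SCHEME));
if (!filter_var($rssUrl, FILTER_VALIDATE_URL) || !in_array($scheme, ['http', 'https'], true)) {
    fail(400, 'Error: The provided URL is invalid.');
}

// Download the RSS feed
$feedContent = @file_get_contents($rssUrl);

if ($feedContent === false) {
    fail(502, 'Error: Could not load RSS feed from ' . $rssUrl);
}

// Parse the XML. libxml_use_internal_errors(true) collects parse errors instead of printing warnings.
// On PHP 8 (libxml >= 2.9) external entities are not loaded by default, so the deprecated
// libxml_disable_entity_loader() is not needed; LIBXML_NONET also forbids network access while parsing.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
if (!$dom->loadXML($feedContent, LIBXML_NONET)) {
    fail(502, 'Error: Invalid XML in the RSS feed.');
}
libxml_clear_errors();

// Rewrite the <link> of every <item>
foreach ($dom->getElementsByTagName('item') as $item) {
    foreach ($item->getElementsByTagName('link') as $link) {
        // Assigning textContent stores "&" literally; on PHP 8.1, assigning nodeValue on an
        // element can treat "&" (e.g. in a query string) as the start of an entity reference
        $link->textContent = addTrailingSlash(trim($link->textContent));
    }
}

// Output as XML
header('Content-Type: application/xml; charset=utf-8');
echo $dom->saveXML();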

Then use this URL as the feed URL in FTR:

http(s)://example.com/add_slash.php?url=https%3A%2F%2Fwww.vienna.at%2Fnews%2Fwien%2Ffeed%2F
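The feed URL has to be percent-encoded when it is passed as the `url` parameter; otherwise characters such as `&` or `?` inside it would be read as part of the proxy’s own query string. A minimal sketch of building that URL, where `proxyUrl()` is a hypothetical helper and `https://example.com/add_slash.php` stands in for wherever the script is hosted:

```php
<?php
// Hypothetical helper: build the proxy URL to paste into FTR.
// rawurlencode() escapes ":", "/", "?", "&", "=" etc., so the whole feed URL
// arrives in $_GET['url'] as a single, automatically decoded value.
function proxyUrl(string $proxyBase, string $feedUrl): string
{
    return $proxyBase . '?url=' . rawurlencode($feedUrl);
}

// prints https://example.com/add_slash.php?url=https%3A%2F%2Fwww.vienna.at%2Fnews%2Fwien%2Ffeed%2F
echo proxyUrl('https://example.com/add_slash.php', 'https://www.vienna.at/news/wien/feed/'), "\n";
```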