Feed is no longer created with v2.3

Since version 2.3 of Feed Creator (both self-hosted and on fivefilters.org), the following request no longer works. It worked before v2.3:

index.php?url=https%3A%2F%2Fwww.nordmainische-s-bahn.de%2Fhome.html&item=div.pgSlide&item_title=h3&item_date=time&feed_title=Nordmainische+S-Bahn&max=5&order=document&guid=0

I don’t know why. Alternative item selectors do not work either:
.layout_latest
.pgSlide

The same problem occurs with:
index.php?url=https%3A%2F%2Fwww.knoten-stadion.de%2Fbaustellen-journal.html&item=.layout_latest&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Knoten+Stadion+Journal&max=3&order=document&guid=0

Other German railway construction sites with nearly the same HTML are working fine:

index.php?url=https%3A%2F%2Fwww.fernbahntunnel-frankfurt.de%2Fmeldungen.html&item=.layout_latest&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Fernbahntunnel+Frankfurt&max=3&order=document&guid=0

index.php?url=https%3A%2F%2Fwww.frmplus.de%2Faktuelles.html&item=.layout_latest&item_title=h2&item_desc=.teaser&item_date=time&feed_title=FRM+Plus&max=3&order=document&guid=0

index.php?url=https%3A%2F%2Fwww.mein-hbf-ffm.de%2Fhome.html&item=.pgSlide&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Masterplan+Ffm+Hbf&max=3&order=document&guid=0

index.php?url=https%3A%2F%2Fwww.riedbahn.de%2Fhome.html&item=.layout_latest&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Riedbahn&max=3

index.php?url=https%3A%2F%2Fhanau-wuerzburg-fulda.de%2Fhome.html&item=.pgSlide&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Hanau+-+W%C3%BCrzburg+-+Fulda&max=3&order=document&guid=0

index.php?url=https%3A%2F%2Fwww.s6-frankfurt-friedberg.de%2Fhome.html&item=.pgSlide&item_title=h3&item_desc=.teaser&item_date=time&feed_title=S6+Frankfurt+-+Friedberg&max=3&order=document&guid=0

The last one below is the main news page of knoten-stadion, which works. The faulty subpage baustellen-journal.html is listed further up.

index.php?url=https%3A%2F%2Fwww.knoten-stadion.de%2Fmeldungen.html&item=.layout_latest&item_title=h3&item_desc=.teaser&item_date=time&feed_title=Knoten+Stadion&max=3&order=document&guid=0&strip_if_url%5B%5D=baustellen-journal.html
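To compare a failing request with a working one, it can help to decode the query strings and look only at the parameters that differ. The following is a minimal sketch in Python using two of the requests from this post; the helper names `decode_request` and `diff_requests` are made up for illustration and are not part of Feed Creator.

```python
from urllib.parse import urlsplit, parse_qsl

# Two requests copied from this post: one that fails and one that works.
FAILING = (
    "index.php?url=https%3A%2F%2Fwww.nordmainische-s-bahn.de%2Fhome.html"
    "&item=div.pgSlide&item_title=h3&item_date=time"
    "&feed_title=Nordmainische+S-Bahn&max=5&order=document&guid=0"
)
WORKING = (
    "index.php?url=https%3A%2F%2Fwww.mein-hbf-ffm.de%2Fhome.html"
    "&item=.pgSlide&item_title=h3&item_desc=.teaser&item_date=time"
    "&feed_title=Masterplan+Ffm+Hbf&max=3&order=document&guid=0"
)


def decode_request(request):
    """Return the URL-decoded query parameters of a request as a dict."""
    return dict(parse_qsl(urlsplit(request).query))


def diff_requests(a, b):
    """Return {parameter: (value_in_a, value_in_b)} for parameters that differ."""
    pa, pb = decode_request(a), decode_request(b)
    return {
        key: (pa.get(key), pb.get(key))
        for key in sorted(set(pa) | set(pb))
        if pa.get(key) != pb.get(key)
    }


for key, (failing, working) in diff_requests(FAILING, WORKING).items():
    print(f"{key}: failing={failing!r} working={working!r}")
```

Apart from the target URL and feed title, the two requests differ only in small details (`div.pgSlide` vs. `.pgSlide`, the missing `item_desc`, and `max`), which suggests the selectors themselves are probably not the cause.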

Thanks for the report, Holger. Looking into it…

Maybe I have found a hint. The source page seems to contain a malformed image element inside the div.layout_latest selector. Maybe this is the reason:

<div
  class="backpicDiv"
  style="background: url(files/page/02_Aktuelles/01_AI_Meldung/2024/20240611_Infomobil_Maintal/20240611_NMS_Infomobil_Maintal_1.jpg) 50% 50% no-repeat; background-size: cover; height: 250px")">
&nbsp;
</div>

The style attribute ends with an IMHO illegal ")"> instead of just ">
So the parser can’t detect the end of the div’s opening tag.

I sent a bug report to the source site; they reacted quickly and fixed it. But FC is still not creating the feed, so that wasn’t the problem.

Now I have found out that curl (Debian Trixie) throws an error and the HTML is incomplete for both failing pages:

curl -o test.txt https://www.nordmainische-s-bahn.de/home.html -v

Result:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Host www.nordmainische-s-bahn.de:443 was resolved.
* IPv6: (none)
* IPv4: 84.38.79.51
*   Trying 84.38.79.51:443...
* Connected to www.nordmainische-s-bahn.de (84.38.79.51) port 443
* ALPN: curl offers h2,http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2867 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [520 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / RSASSA-PSS
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=www.nordmainische-s-bahn.de
*  start date: May 16 00:12:25 2024 GMT
*  expire date: Aug 14 00:12:24 2024 GMT
*  subjectAltName: host "www.nordmainische-s-bahn.de" matched cert's "www.nordmainische-s-bahn.de"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
*   Certificate level 0: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 1: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
} [5 bytes data]
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.nordmainische-s-bahn.de/home.html
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.nordmainische-s-bahn.de]
* [HTTP/2] [1] [:path: /home.html]
* [HTTP/2] [1] [user-agent: curl/8.8.0]
* [HTTP/2] [1] [accept: */*]
} [5 bytes data]
> GET /home.html HTTP/2
> Host: www.nordmainische-s-bahn.de
> User-Agent: curl/8.8.0
> Accept: */*
>
* Request completely sent off
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [297 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [297 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0< HTTP/2 200
< cache-control: must-revalidate, no-cache, no-store, private
< date: Thu, 27 Jun 2024 16:48:44 GMT
< x-content-type-options: nosniff
< referrer-policy: no-referrer-when-downgrade, strict-origin-when-cross-origin
< permissions-policy: interest-cohort=()
< x-frame-options: SAMEORIGIN
< x-xss-protection: 1; mode=block
< contao-cache: miss
< age: 0
< set-cookie: csrf_https-contao_csrf_token=6C8ZBOrpuJ1pA7s9e4p90X-rDflWp8Ckae_0tGRhV7M; path=/; secure; httponly; samesite=lax
< content-length: 34660
< vary: Accept-Encoding
< content-type: text/html; charset=UTF-8
< server: Apache
<
{ [15997 bytes data]
* HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
 94 34660   94 32750    0     0  24835      0  0:00:01  0:00:01 --:--:-- 24848
* Connection #0 to host www.nordmainische-s-bahn.de left intact
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

The HTML ends abruptly in the middle of an unclosed <element>, at the same point in the code on every try, and I can’t find a reason in the HTML itself. I even tried the same curl command on my paid vserver (Ubuntu noble), and the code stops at the very same point. So it is not a timing problem.

Given this, the problem might be the missing </body> and </html> tags, right?
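The numbers in the curl output above point the same way: the server declared `content-length: 34660`, but the progress line shows only 32750 bytes received before the stream was reset. The missing closing tags therefore look like a symptom of the aborted transfer rather than a defect in the page itself. A minimal Python sketch of that check follows; `truncation_report` is a hypothetical helper, and the body used here is a stand-in of the right size, not the real page.

```python
def truncation_report(body: bytes, declared_length: int) -> dict:
    """Compare the bytes actually received with the declared Content-Length."""
    tail = body.rstrip().lower()
    return {
        "received": len(body),
        "declared": declared_length,
        "missing": declared_length - len(body),
        "ends_with_html_close": tail.endswith(b"</html>"),
    }


# Figures taken from the curl log: 32750 of 34660 bytes arrived.
# The placeholder body only reproduces the size, not the content.
report = truncation_report(b"x" * 32750, 34660)
print(report)
```

To run it against the real download, read `test.txt` in binary mode and pass its contents together with the `content-length` value from the response headers.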

curl (8.7.1) on the Windows 11 (23H2) console delivers a complete HTML result with no errors. I’m going crazy!

OK, error 92 is a protocol error, and I could work around it by adding --http1.1 to the curl command.

That does not help me with FC yet, because I don’t know whether and how to fix that in the PHP code.

Is SimplePie able to make HTTP/1.1 requests? If yes, could we get an option for this, please?

EDIT:
…or some kind of automatic fallback: if there is a protocol error with HTTP/2 or autodetect, try HTTP/1.1; if this fails too, try HTTP/1
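The fallback idea can be sketched outside of Feed Creator as follows. This is only an illustration in Python that shells out to the curl binary, assuming curl is on the PATH; it is not how Feed Creator or SimplePie fetch pages. The names `run_curl` and `fetch_with_fallback` are invented for this sketch, reading "http/1" as HTTP/1.0 is an assumption, and the sketch falls back on any non-zero curl exit code rather than only on protocol errors. In PHP's curl extension the corresponding setting is `CURLOPT_HTTP_VERSION` with `CURL_HTTP_VERSION_1_1`; whether and where Feed Creator exposes it is not something this thread confirms.

```python
import subprocess

# curl's own flags for forcing a protocol version, tried in this order.
HTTP_VERSION_FLAGS = ["--http2", "--http1.1", "--http1.0"]


def run_curl(args):
    """Run curl with the given arguments and return (exit_code, stdout_bytes)."""
    proc = subprocess.run(
        ["curl", "--silent", "--show-error", *args],
        capture_output=True,
    )
    return proc.returncode, proc.stdout


def fetch_with_fallback(url, runner=run_curl):
    """Try each HTTP version in turn; return (flag_used, body) on first success.

    Any non-zero exit code (for example 92, the HTTP/2 stream error seen
    above) triggers the next attempt. `runner` is injectable for testing.
    """
    last_code = None
    for flag in HTTP_VERSION_FLAGS:
        code, body = runner([flag, url])
        if code == 0:
            return flag, body
        last_code = code
    raise RuntimeError(f"all HTTP versions failed, last curl exit code {last_code}")


# Example call (performs a real network request, so it is left commented out):
# flag, html = fetch_with_fallback("https://www.nordmainische-s-bahn.de/home.html")
```

A stricter variant could retry only on the protocol-related exit codes and pass other failures (DNS, TLS, timeouts) straight through, since switching the HTTP version would not help with those.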

I still haven’t had a chance to investigate, but I will, hopefully tomorrow.
