How to get content for a single article from forums.lotro.com?

I want to write a site config for forums.lotro.com, but I can’t find the right XPath for the body, title and author. It seems to me, though, that the real problem lies elsewhere. The feed itself is already full-text, but I need the config for wallabag, which reports ‘can’t retrieve contents for this article’ when I send it a link to an article.

I tried to build the config for FTR, but none of the following expressions match, although videlibri finds matches with the same XPath. Here are several examples (I used only one title or body per test, of course):

http_header(user-agent): Mozilla/5.0 (Windows NT 10.0; rv:103.0) Gecko/20100101 Firefox/103.0
http_header(referer): https://forums.lotro.com/

prune: no
tidy: no

author: //a[contains(@class,'username')]

title: //title
title: /html/head/title

body: //blockquote
body: //blockquote[normalize-space(@class) = 'postcontent restore']
body: //blockquote[contains(concat(' ',normalize-space(@class),' '),' postcontent restore ')]
body: //div[@class='content']/div[1]/blockquote

test_url: https://forums.lotro.com/forums/showthread.php?695978-Bullroarer-Update-33-2-Beta-2-OPEN

When I tried to fetch the source with curl I got no data, even with -o filename and --insecure:

user@server:~$ curl -vvv -A 'Mozilla/5.0 (Windows NT 10.0; rv:103.0) Gecko/20100101 Firefox/103.0' 'https://forums.lotro.com/forums/showthread.php?695978-Bullroarer-Update-33-2-Beta-2-OPEN'
*   Trying 198.252.160.58:443...
* Connected to forums.lotro.com (198.252.160.58) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (OUT), TLS alert, handshake failure (552):
* error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small
* Closing connection 0
curl: (35) error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small

So I think this is NOT a problem of FTR/Wallabag/vBulletin. When I add --cipher 'DEFAULT:!DH' to the curl command, the page loads.

So after changing CipherString from DEFAULT@SECLEVEL=2 to DEFAULT@SECLEVEL=1 at the end of /etc/ssl/openssl.cnf,
I got the full text in FTR and wallabag with just two lines in the site config:

author: (//a[contains(@class,'username')])[1]
body: (//blockquote[contains(concat(' ',normalize-space(@class),' '),' postcontent restore ')])[1]

test_url: https://forums.lotro.com/forums/showthread.php?695978-Bullroarer-Update-33-2-Beta-3-OPEN
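The class test in the body expression can be hard to read, so here is a small Python sketch of what it checks. This is an illustration of the XPath idiom, not code from FTR or wallabag, and the function name xpath_class_test is made up. Note that the expression looks for the two tokens "postcontent restore" next to each other and in that order, not for each class independently.

```python
def xpath_class_test(class_attr: str, needle: str = "postcontent restore") -> bool:
    """Mimic contains(concat(' ', normalize-space(@class), ' '), ' needle ')."""
    # normalize-space(): trim the ends and collapse whitespace runs to one space.
    # (str.split() treats a few more characters as whitespace than XPath does.)
    normalized = " ".join(class_attr.split())
    # Padding both sides with a space makes the match respect token boundaries.
    return f" {needle} " in f" {normalized} "


print(xpath_class_test("postcontent restore"))        # True
print(xpath_class_test("  postcontent   restore "))   # True
print(xpath_class_test("postcontent restore extra"))  # True
print(xpath_class_test("restore postcontent"))        # False
print(xpath_class_test("postcontent restored"))       # False
```

The trailing [1] in the config then keeps only the first matching element in document order, which on a thread page would be the first post.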

In Full-Text RSS we do lower the seclevel as you’ve done here, but only when OpenSSL 1.1.1f is detected. That version caused issues with some sites, so we make adjustments if that version is detected. The issue we encountered was similar to the one reported here: “cURL error 35: error:1414D172:SSL routines:tls12_check_peer_sigalg:wrong signature type” (Issue #3029, FreshRSS/FreshRSS on GitHub).

If you open HumbleHttpAgent.php and search for ‘seclevel’ you should see the part of the code where we do this. The change was introduced in Full-Text RSS 3.9.8.
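As a rough illustration of that idea (this is not the PHP code from HumbleHttpAgent.php, just a Python sketch under the assumption that the check is a plain version comparison), lowering the security level only for one specific OpenSSL version could look like this. The function names are hypothetical.

```python
import ssl


def needs_lower_seclevel(version_string: str) -> bool:
    """Return True only for the OpenSSL 1.1.1f release mentioned above."""
    # ssl.OPENSSL_VERSION looks like "OpenSSL 1.1.1f  31 Mar 2020".
    parts = version_string.split()
    return len(parts) >= 2 and parts[0] == "OpenSSL" and parts[1] == "1.1.1f"


def build_context(version_string: str = ssl.OPENSSL_VERSION) -> ssl.SSLContext:
    """Create a verifying TLS context, relaxed only on the affected version."""
    ctx = ssl.create_default_context()
    if needs_lower_seclevel(version_string):
        # Per-context equivalent of CipherString = DEFAULT@SECLEVEL=1
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
    return ctx
```

On any other OpenSSL version the context keeps the library defaults, so the relaxation does not spread to systems that do not need it.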

But I’m not sure if this is the same issue you’re encountering here. It could be that the site you’re trying to process has a misconfigured server that requires the lowering of the seclevel, or perhaps something entirely different.

So maybe we could have an ignore-TLS-error option in the site config?
#993

Not keen to add things like this at the site config level. In our experience these are not very common server issues. When they do occur, they’re usually either related to the remote server or the server making the request. If it’s the latter, there’s probably something that needs to be fixed or updated. If it’s the remote server, probably best to try to contact them to fix the issue.

The reason we implemented the fix I mentioned was because it was a known OpenSSL issue at the time affecting lots of users.

But I’m not sure what you’ve reported here is the same. If I try using Full-Text RSS on our servers, the content can be retrieved without any cipher changes.