I wrote this on GitHub, but I think the right place is here, isn’t it?
I host RSS-Bridge and Full-Text RSS (FTR) locally. I use FeedMergeBridge.php to merge 5 sub-feeds of the same site into a single one, so FreshRSS can sort out duplicates. Between them sits FTR, which fetches the full articles. This works great!
But FTR sets the author field dc:creator to the literal ‘RSS-Bridge’ and I can’t find out how to get rid of that. I tried to find out how to do that on the RSS-Bridge side, but that is too heavy for me. RSS-Bridge sets this as the feed author, and FTR uses it as the item author instead of the one matched from the article.
So I made configs in FTR both for the IP address of my RSS-Bridge and for the original site, where the full-text content comes from.
While the debug mode shows that FTR found a correct match for the author, this is not passed to the outgoing feed. That match comes from the original site config. I added to the existing diepresse.com.txt:
author: //span[@class='author__name']
author: '-'
Some articles have no author, so the second line would catch that. I will change this to an empty string once I get around the problem. I uploaded both the feed from the bridge and the feed from FTR to the GitHub issue.
So how can I override this feed-author field from RSS-Bridge?
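For anyone who wants to check such an author rule outside of FTR, here is a minimal Python sketch. It is not how FTR evaluates site configs; it only mirrors the intent of the two config lines above (first try the span, otherwise fall back to '-'). The sample markup and the function name `extract_author` are made up for illustration, and the real diepresse.com pages are certainly more complex.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article markup (must be well-formed XML
# for ElementTree; real pages usually need an HTML parser instead).
SAMPLE_WITH_AUTHOR = """<html><body>
<article><span class="author__name">Jane Doe</span><p>Text</p></article>
</body></html>"""

SAMPLE_WITHOUT_AUTHOR = """<html><body>
<article><p>Text</p></article>
</body></html>"""


def extract_author(markup, fallback="-"):
    """Return the first non-empty author span, or the fallback string."""
    root = ET.fromstring(markup)
    # ElementTree only accepts relative paths, so '//span' becomes './/span'.
    node = root.find(".//span[@class='author__name']")
    if node is not None and node.text and node.text.strip():
        return node.text.strip()
    return fallback


if __name__ == "__main__":
    print(extract_author(SAMPLE_WITH_AUTHOR))
    print(extract_author(SAMPLE_WITHOUT_AUTHOR))
```

ElementTree supports only a small XPath subset and compares the class attribute as an exact string, so a span carrying additional classes would not match in this sketch.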
We should really document this aspect of Full-Text RSS. Currently the author name extracted from the article is only used when the input feed doesn’t contain author information. If the input feed has author info, it’s always prioritised. If you have control over the input feed and can remove the author info from it, then the extracted author should be used in the feed Full-Text RSS produces.
We’ll introduce a request parameter in a future version to override this behaviour, as we have with ‘use_extracted_title’, which you can use to tell Full-Text RSS that the extracted article title should replace any title in the source feed.
The next problem with my combination was timing. Fetching 5 feeds, filtering duplicates, joining them into one feed and getting the full text caused FreshRSS to report that the feed is not available. But it was just a timeout. So I wrote a little script which is run by a cron job every 20 min as user www-data. This gave me the opportunity to fix the author problem in the script: I just renamed the “author” field to “origin”. The path in the first row is accessible by browsers and FreshRSS in the local net via http://xxx.example.com/feeds.
@fivefilters I found two other feeds (from the same company), where FTR is not setting the author field properly.
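The script itself is not reproduced in this thread, so the following is only a rough Python sketch of the idea described above, not the original. It assumes the rename is applied to the feed text as plain string replacement, and the names `rename_author_tags` and `refresh`, the 300 second timeout, and any URL or output path passed in are hypothetical.

```python
import os
import re
import urllib.request

# Matches the opening or closing <author> tag (with or without attributes),
# but not longer names such as <authors> or <authority>.
AUTHOR_TAG = re.compile(r"<(/?)author(?=[\s>/])")


def rename_author_tags(feed_xml, new_name="origin"):
    """Rename every <author> element so downstream tools ignore it."""
    return AUTHOR_TAG.sub(lambda m: "<" + m.group(1) + new_name, feed_xml)


def refresh(source_url, out_path, timeout=300):
    """Fetch a feed, rename its author tags, and publish it atomically."""
    with urllib.request.urlopen(source_url, timeout=timeout) as resp:
        feed_xml = resp.read().decode("utf-8")
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write(rename_author_tags(feed_xml))
    # os.replace swaps the file in one step, so a reader such as FreshRSS
    # never sees a half-written feed.
    os.replace(tmp_path, out_path)
```

Run from cron, `refresh` would be called with the slow feed URL and a file inside the web-served directory, so the feed reader only ever fetches a static file and no longer hits the timeout.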
https://www.telepolis.de/news-atom.xml
https://www.heise.de/rss/heise-atom.xml
Even when setting author: //meta[@name='author']/@content, the dc_creator field is set to “heise online”, which is defined in <feed><author> of the feed’s XML.
When using a single article link, the author name finds its way to dc_creator.
Using the RDF feed instead only works for https://www.heise.de/rss/heise.rdf but not for https://www.telepolis.de/news.rdf
Interesting, thanks @HolgerAusB! Looked into it and it seems SimplePie, if it can’t find an author associated with an item, will use the author associated with the feed. In such a situation we probably want to prioritise the extracted author.
I also found myself in a similar case to @HolgerAusB where I’d like the extracted author to replace the original feed author (which is wrong). Did you have a chance to include a new parameter in the config file just like for titles? Couldn’t find one…
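To make the difference concrete, here is a small Python model of the behaviour described in this thread. It is not code from Full-Text RSS or SimplePie (both are PHP); the function names are invented, and the "proposed" ordering is only one reading of "prioritise the extracted author".

```python
def feed_side_author(item_author, feed_author):
    """Item author as reported by the feed parser: falls back to the feed-level author."""
    return item_author or feed_author


def output_author_current(item_author, feed_author, extracted_author):
    """Behaviour as described: any author from the input feed wins."""
    return feed_side_author(item_author, feed_author) or extracted_author


def output_author_proposed(item_author, feed_author, extracted_author):
    """Possible fix: use the feed-level author only as a last resort."""
    return item_author or extracted_author or feed_author


if __name__ == "__main__":
    # The heise case: no per-item author, a feed-level author, and a good extraction.
    print(output_author_current(None, "heise online", "Jane Doe"))
    print(output_author_proposed(None, "heise online", "Jane Doe"))
```

With the heise feed, the item has no author of its own, so the feed-level “heise online” fills the gap before the extracted name is ever considered. In the proposed ordering the extracted name would be checked first.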