Full-Text RSS prioritises the title in the feed (if the input is a feed) or the <title> element of the page (if the input is the URL of an HTML page).
In most cases the title element will contain the main title. It’s usually in the site’s interest to put the main title here as that’s what Google will list in its search results too. In this case, you’ll see Google treats the article’s title the same way Full-Text RSS does:
But because you’re giving Full-Text RSS a feed which you’ve generated using our Feed Creator tool, you’ve actually got the real title in the Feed Creator output:
But by using the use_extracted_title=1 parameter when passing this to Full-Text RSS, you’re telling Full-Text RSS to ignore the article’s title in the feed and try to extract it from the HTML. If you remove this parameter, you should get the result you want:
So the simplest solution here seems to be the above. Just let Full-Text RSS use the title you’ve already got in the feed.
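If you build these request URLs in a script, here is a rough Python sketch of dropping the parameter. The host example.org, the makefulltextfeed.php path and the feed URL are placeholders I've assumed for illustration, not your real values, and drop_query_param is just a helper name I made up:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url: str, name: str) -> str:
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    # Decode the query string into pairs and keep everything except `name`.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # Re-encode the remaining pairs and rebuild the URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Placeholder request URL (assumed endpoint and feed address).
request_url = (
    "https://example.org/makefulltextfeed.php"
    "?url=https%3A%2F%2Fexample.com%2Ffeed.xml&use_extracted_title=1"
)
cleaned_url = drop_query_param(request_url, "use_extracted_title")
print(cleaned_url)
```

With the parameter gone, Full-Text RSS should fall back to the title already present in the feed, as described above.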
Having said that, to answer your question:
Is there a way that a user can manage their own extraction rules.
You can override default extraction rules using the siteconfig parameter. This allows you to use the directives we list here. So let’s say you’re not using an input feed and want to extract the correct press release title from the HTML. The source HTML looks like this:
<strong>Bank of the West Named One of the Best
Places to Work for LGBTQ Equality and Diversity</strong>
<i>Sustainable Finance Leader Received a Perfect
Score of 100% on the Human Rights Campaign
Foundation’s Corporate Equality Index for Second
Year in a Row<br><br>
Named to Forbes’ 2020 Best Employers for Diversity List</i><br><br>
San Francisco, CA | Jan 15, 2020
What you’re after is the
<strong> element inside the
With XPath this would be:
To tell Full-Text RSS you want this as the title, you’d use:
Because we’re going to pass this to Full-Text RSS in the query string of the URL, in the siteconfig parameter, we need to make sure it’s URL-encoded:
If we add this to your original URL, you’ll see its effect:
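If you'd rather not encode this by hand, a minimal Python sketch of the encoding step could look like the following. The directive value uses a placeholder XPath (//strong), not necessarily the exact expression you'd want for this page, and the host example.org, the makefulltextfeed.php path and the article URL are likewise assumptions for illustration:

```python
from urllib.parse import quote, urlencode

# Placeholder title directive; swap in the XPath that matches your page.
directive = "title: //strong"

# Percent-encode everything, including ":" and "/", so the directive
# survives being passed as a query-string value.
encoded_directive = quote(directive, safe="")
print(encoded_directive)

# Build the full request URL. urlencode() applies the same encoding to each
# value, so the raw (unencoded) directive is passed in here.
base = "https://example.org/makefulltextfeed.php"  # assumed endpoint
query = urlencode(
    {"url": "https://example.com/press-release", "siteconfig": directive},
    quote_via=quote,  # encode spaces as %20 rather than "+"
)
request_url = f"{base}?{query}"
print(request_url)
```

Passing quote_via=quote keeps the space in the directive as %20; either form is valid in a query string, so this is only a matter of matching the hand-encoded style.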
I’ll add that if you find you need to adjust extraction rules a lot like this, you will have more flexibility running our self-hosted copy of Full-Text RSS. You then have access to a site_config/custom/ folder where you can create extraction rules to override the default ones. And you won’t have to worry about URL encoding things to pass in the query string. For example, you could have a file site_config/custom/bankofthewest.com.txt with the contents:
Hope that’s some help.