Is it possible to get text from meta tags?

extractor_fan · October 1, 2019, 10:25am

Hello,

I’m trying to extract from here: https://www.wfdd.org/story/week-brexit

It’s an audio article and there is no visible text, but there is a description meta tag.

<meta property="og:description" content="It was a disastrous week for new British Prime Minister Boris Johnson. NPR's Noel King talks to Brexit researcher Georgina Wright of the Institute for Government in London." />

I tried this:
//meta[@property='og:description']/@content

but the result is empty.

Also, I need to set prune: no or I get this error:

Fatal error : Call to undefined method DOMAttr::getElementsByTagName() in /app/fivefilters/full-text-rss-3.9.1/libraries/readability/Readability.php on line 905

fivefilters · October 2, 2019, 1:36pm

At the moment it’s not possible to do that, but if you enable JSON output (using the checkbox or passing &format=json to makefulltextfeed.php), or use the extract.php endpoint with this URL, you’ll get the og:description text returned in the output automatically (no need for site config changes).
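For anyone who wants to script this, here is a minimal Python sketch of the two requests described above. Several things in it are assumptions rather than facts from this thread: the install location (`https://example.org/full-text-rss/`) is a placeholder, the `url` query parameter is assumed to be how the article address is passed, and the exact name of the field carrying the og:description text is not stated here, so the helper simply searches the parsed JSON for any key containing "description".

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical install location; replace with your own Full-Text RSS URL.
BASE = "https://example.org/full-text-rss/"
ARTICLE = "https://www.wfdd.org/story/week-brexit"


def feed_json_url(base, article):
    """URL for makefulltextfeed.php with JSON output enabled (&format=json)."""
    query = urlencode({"url": article, "format": "json"})  # 'url' is an assumed parameter name
    return urljoin(base, "makefulltextfeed.php") + "?" + query


def extract_url(base, article):
    """URL for the extract.php endpoint for a single article."""
    return urljoin(base, "extract.php") + "?" + urlencode({"url": article})


def find_fields(data, needle):
    """Yield (key, value) pairs whose key contains `needle`, at any nesting depth."""
    if isinstance(data, dict):
        for key, value in data.items():
            if needle in key.lower() and isinstance(value, str):
                yield key, value
            yield from find_fields(value, needle)
    elif isinstance(data, list):
        for item in data:
            yield from find_fields(item, needle)


# Illustrative payload only; the real response layout may differ.
sample = json.loads('{"title": "Week in Brexit", "og_description": "It was a disastrous week."}')
matches = list(find_fields(sample, "description"))
```

To use it against a real install, fetch `extract_url(BASE, ARTICLE)` with `urllib.request.urlopen`, pass the body through `json.loads`, and run `find_fields` on the result to see which field holds the description.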

As for the error, it’s caused by the expression targeting a DOM attribute (DOMAttr), @content, rather than a DOM element (DOMElement), which is what Full-Text RSS currently expects when processing the body directive in a site config file. An attribute node has no getElementsByTagName() method, hence the fatal error when pruning runs. We should really make this clear in the documentation, as it’s currently not explicit.
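The element-versus-attribute distinction can be seen outside PHP too. The sketch below is an analogy only, using Python's standard `xml.dom.minidom` rather than anything from Full-Text RSS: an element node offers `getElementsByTagName()`, while an attribute node does not, which mirrors the "Call to undefined method DOMAttr::getElementsByTagName()" error. The markup is a shortened, hand-written stand-in for the page's head.

```python
from xml.dom import minidom

# Shortened stand-in for the page's <head>; not fetched from the real site.
markup = (
    '<head><meta property="og:description" '
    'content="It was a disastrous week for new British Prime Minister Boris Johnson." /></head>'
)

doc = minidom.parseString(markup)

# Selecting the <meta> element gives an Element node.
meta = doc.getElementsByTagName("meta")[0]

# Selecting its content attribute gives an Attr node (the analogue of PHP's DOMAttr).
attr = meta.getAttributeNode("content")

# Only the element supports child lookups; the attribute node does not.
element_has_lookup = hasattr(meta, "getElementsByTagName")
attr_has_lookup = hasattr(attr, "getElementsByTagName")

# Reading the text means starting from the element and asking for its attribute value.
description = meta.getAttribute("content")
```

Here `element_has_lookup` is true and `attr_has_lookup` is false, so any code that assumes it was handed an element will fail as soon as it is given the attribute node instead.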

I’ve made a note to see if we can handle attributes in a future version when they’re the target of the body expression in a site config file.

Thanks for reporting this.