Both Gizmodo and Lifehacker extractions are failing

As you know, the two are sister sites and use identical HTML markup. This is the site pattern FTR defines for Gizmodo:

title: //head/title
author: //meta[@name="author"]/@content
body: //div[contains(@class, 'post-content')]
strip: //div[contains(@class, 'content-summary')]

strip: //aside

test_url: https://gizmodo.com/the-new-bad-tick-is-going-to-take-over-half-the-united-1831079855

But it fails to extract the full content from the page, including on the test URL above. Could you please update the site pattern for both Gizmodo and Lifehacker?
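For anyone who wants to check a pattern like this locally, here is a rough sketch of what the title, author, body and strip rules do. It is not how FTR itself evaluates site configs. The sample page is made up (the real Gizmodo markup is not shown in this thread), and because the standard library's ElementTree has no XPath contains(), the class checks are emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; not real Gizmodo markup.
SAMPLE = """<html>
  <head><title>Sample post</title><meta name="author" content="Jane Doe"/></head>
  <body>
    <div class="post-content entry">
      <div class="content-summary">Summary to strip</div>
      <p>First paragraph.</p>
      <aside>Related links</aside>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>"""


def class_contains(el, needle):
    # Emulates XPath contains(@class, 'needle') as a plain substring test.
    return needle in el.get("class", "")


def extract(html):
    root = ET.fromstring(html)

    # title: //head/title
    title = root.findtext(".//head/title")

    # author: //meta[@name="author"]/@content
    meta = root.find(".//meta[@name='author']")
    author = meta.get("content") if meta is not None else None

    # body: //div[contains(@class, 'post-content')]
    body = next(
        (d for d in root.iter("div") if class_contains(d, "post-content")), None
    )
    if body is None:
        return title, author, None

    # strip: //div[contains(@class, 'content-summary')] and strip: //aside
    for parent in list(body.iter()):
        for child in list(parent):
            is_summary = child.tag == "div" and class_contains(child, "content-summary")
            if is_summary or child.tag == "aside":
                parent.remove(child)

    # Collapse whitespace in the remaining text.
    text = " ".join("".join(body.itertext()).split())
    return title, author, text


print(extract(SAMPLE))
```

If the body comes back as None on a real page, the post-content class is probably no longer present in the HTML that the server returns.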

Thanks for letting us know. We’ll take a look. It might be a server-retrieval issue. We’ll update here when we know more.

Can you please try updating your site config files and see if you have any luck? We’ve updated both gizmodo.com.txt and lifehacker.com.txt.

Thank you. I deleted the custom patterns I had created and pulled the updated standard patterns from the server. Both are working fine now.

BTW, aren’t standard patterns included for .gizmodo.com.txt and .lifehacker.com.txt (starting with a dot, for subdomains)?

Good to hear!

We used to have .gizmodo.com.txt and .lifehacker.com.txt to match subdomains like io9.gizmodo.com. These now all appear to redirect to the main gizmodo.com / lifehacker.com sites, so we removed those site config files as they don’t appear to be needed any more.
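To illustrate the leading-dot convention, here is a small sketch of how a hostname could map to candidate config file names. The function name, the lookup order, and the stripping of "www." are assumptions made for this example, not the actual FTR lookup code.

```python
def candidate_config_files(host):
    """Return config file names to try for a host, most specific first.

    Hypothetical helper: the ordering and the "www." handling are assumptions.
    """
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]

    # Exact match first, e.g. gizmodo.com.txt
    candidates = [host + ".txt"]

    # Then wildcard files: a leading dot covers subdomains of that suffix,
    # e.g. .gizmodo.com.txt for io9.gizmodo.com
    labels = host.split(".")
    for i in range(1, len(labels) - 1):
        candidates.append("." + ".".join(labels[i:]) + ".txt")
    return candidates


print(candidate_config_files("io9.gizmodo.com"))
print(candidate_config_files("gizmodo.com"))
```

Under these assumptions, a request that gets redirected to gizmodo.com only ever needs gizmodo.com.txt, which fits with the dotted files no longer being needed.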
