Get author from BusinessWire

In the future we’ll have an option to allow users to have extraction occur on pages after JS has been executed, but no estimate as to when that will be ready.

Biggest wish by far! I am no using ProxyCrawl to get data from. As a suggestion: I like that they have two parameters: the ajax_wait (let the page render) and the page_wait (which will just wait x seconds after the page is loaded). I find I need both approaches to tackle Angular and Raven websites.

Additional tip for the OP: sometimes it helps if you set http_header(user-agent): Googlebot-News Some site the serve a more crawler friendly version of the page.

1 Like