When I extract an article (not a feed) that has this raw HTML markup:
<div class="foo">I'm a div</div>
It is converted to this parsed HTML:
<p>I'm a div</p>
Is that something that can be turned off?
I have tried both makefulltextfeed.php and extract.php, with these parameters:
$params = array(
    'format'  => 'json',
    'max'     => 1,
    'accept'  => 'html',
    'summary' => 1,
    'url'     => $article_url,
);
http://ftr.fivefilters.org/makefulltextfeed.php?max=1&url=http://www.cloozle.com/testing/ckeditor-tester.html
UPDATE: I see that Readability does this:
"Turn all divs that don't have children block level elements into p's"
That explains it. I can disable it, but I don't see any pref for this setting.
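For anyone else hitting this, here is a rough sketch of what that rule amounts to. It is not Readability's actual code, just an illustration using PHP's DOM extension. The function names (hasBlockChild, divsToParagraphs) and the list of block-level tags are my own assumptions, and this version only looks at direct children, which may differ from what Readability really checks.

```php
<?php
// Illustrative sketch only, NOT Readability's real implementation.
// Assumed list of block-level tags; the real list may differ.
const BLOCK_TAGS = ['blockquote', 'div', 'dl', 'ol', 'p', 'pre', 'table', 'ul'];

// Returns true if the element has at least one direct block-level child.
function hasBlockChild(DOMElement $el): bool
{
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement
            && in_array(strtolower($child->tagName), BLOCK_TAGS, true)) {
            return true;
        }
    }
    return false;
}

// Replaces every <div> without block-level children by a <p>.
// Attributes on the div (such as class="foo") are dropped.
function divsToParagraphs(string $html): string
{
    $doc = new DOMDocument();
    // Wrap the fragment so it can be read back from <body> afterwards.
    @$doc->loadHTML('<html><body>' . $html . '</body></html>');

    // Snapshot the node list first, because the tree is modified in the loop.
    $divs = iterator_to_array($doc->getElementsByTagName('div'));
    foreach ($divs as $div) {
        if (hasBlockChild($div)) {
            continue; // keep divs that wrap other block elements
        }
        $p = $doc->createElement('p');
        while ($div->firstChild) {
            $p->appendChild($div->firstChild); // move the children into the <p>
        }
        $div->parentNode->replaceChild($p, $div);
    }

    // Serialize only the contents of <body>.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo divsToParagraphs('<div class="foo">I\'m a div</div>');
```

Run against the markup from my example, the div comes back as a paragraph and the class attribute is lost, which matches what I am seeing in the extracted output.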
Hi there,
Glad you found what was causing it. I haven’t tested this, but I think if you create a site config file for the site you’re interested in and use something like the following, Readability should not be invoked.
body: //div[@class="entry-content"]
Perhaps we should have a setting to override this Readability behaviour.
Yep. You’re right. If I explicitly set the body in a config, it does not invoke Readability (I tested). But I generally only create a config for special cases where the content is not correctly targeted, so a setting would be nice if one wants to override it all the time, as I do.
Thanks for responding!
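In case it helps someone later, this is roughly what such a site config could look like. Treat it as a sketch: the file name and location (site_config/custom/cloozle.com.txt) are my assumption of where a custom config for this host would go, and the XPath is just the one suggested above, so it has to be adjusted to the element that actually wraps the article on the target page.

```text
# Hypothetical file: site_config/custom/cloozle.com.txt
# Setting body explicitly means the content block is not picked by auto-detection.
body: //div[@class="entry-content"]

# Page used to check that the config still matches.
test_url: http://www.cloozle.com/testing/ckeditor-tester.html
```

This only covers one site at a time, so it is a workaround rather than the global setting asked for above.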