As far as I can see, readability.php doesn’t fully support the now-common HTML5 tags. Please update it so the parser works better with the modern web.
Readability.php doesn’t examine HTML elements from a fixed list, so HTML5 elements are considered when evaluating what to extract. It currently adjusts the scores of certain elements slightly higher/lower based on a list of tag names. But that’s a minor adjustment, so the absence of HTML5 elements in that list is unlikely to disadvantage those elements very much. Nonetheless, it’s something on our todo list.
This adjustment should be considered, though, since the article tag is the most likely to contain content and nav/header/footer are the least likely.
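To make the suggestion concrete, here is a rough sketch of a tag-name score adjustment extended with HTML5 elements. It is written in Python only for illustration and is not readability.php's actual code; the function name `initial_score`, both tables, and every weight value are assumptions made up for this example.

```python
# Illustrative sketch only: names and weights are hypothetical,
# not taken from readability.php.

# Assumed pre-HTML5 style adjustments keyed by lowercase tag name.
BASE_TAG_WEIGHTS = {
    "div": 5,
    "pre": 3,
    "blockquote": 3,
    "ul": -3,
    "ol": -3,
    "form": -3,
}

# Assumed HTML5 additions: article is favoured as likely content,
# nav/header/footer are penalised as likely page chrome.
HTML5_TAG_WEIGHTS = {
    "article": 10,
    "nav": -10,
    "header": -10,
    "footer": -10,
}

# Combined table used for lookups.
TAG_WEIGHTS = {**BASE_TAG_WEIGHTS, **HTML5_TAG_WEIGHTS}


def initial_score(tag_name: str) -> int:
    """Return the starting score adjustment for an element.

    Tags not in the table get 0, so they are still evaluated
    on their content and are not excluded from extraction.
    """
    return TAG_WEIGHTS.get(tag_name.lower(), 0)


if __name__ == "__main__":
    for tag in ("article", "div", "span", "nav"):
        print(tag, initial_score(tag))
```

Because unknown tags default to 0, elements missing from the list are still scored on their content, which fits the point above that the tag-based adjustment is only a small nudge.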
Also, htmLawed.php has been updated with HTML5 support: http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/beta