Only half of the article was successfully extracted

Hi Team,

Is there a word or character limit for Full Text Extraction?
I tested this link, and only half of the article was extracted.

Thank you @harboot for reporting this. It seems they (Trendmicro) changed their layout and are no longer using the subdomain.

I fixed that with a new config.
You may need to wait an hour until ftr.fivefilters gets the update.

Hi… I need help again, same issue.

I am still trying to understand the script. I don't know how to remove certain parts. For example, on the last post I want to remove “meet the authors”, but it's not working:

body: //div[contains(concat(' ',normalize-space(@class),' '),' full ') and (contains(concat(' ',normalize-space(@class),' '),' blog '))]
strip: //div[contains(concat(' ',normalize-space(@class),' '),' author-block ')]

test_url: On-Device Fraud on the rise: exposing a recent Copybara fraud campaign | Cleafy Labs

@harboot: you may want to keep the config simple at first. If that doesn't work, you can try to get more specific.

I just uploaded a new config:

body: //div[contains(@class, 'full blog')]

strip_id_or_class: author-block


Very often, `prune: no` does most of the trick.
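For reference, here is a rough sketch of how the pieces mentioned above could sit together in one site config file. It only combines the directives already posted in this thread; the comments are my own reading of what each line does, and the `test_url` line is left as a comment because the actual article URL is not reproduced here.

```
# Sketch only: combines the directives discussed in this thread.

# Article container. contains() is a plain substring match on the class
# attribute, so this matches only when "full blog" appears in that exact
# order. If the site ever reorders the classes, the longer
# concat/normalize-space form from the earlier post is the safer choice.
body: //div[contains(@class, 'full blog')]

# Remove the "meet the authors" block by its id/class name.
strip_id_or_class: author-block

# Turn off automatic pruning of the matched body.
prune: no

# test_url: (add the URL of the article you are testing against)
```

If the simple `body` expression stops matching after a layout change, switching back to the more specific XPath is usually the first thing to try.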


But your example for strip is working too.

If you post code or config examples here in the forum, you should use the ‘preformatted text’ button from the toolbar: </>
