Hi Team,
Is there any word/character limit for Full Text Extraction?
I tested this link, and only half of the article is extracted.
Thank you @harboot for reporting this. It seems they (Trend Micro) changed their layout and are no longer using the subdomain blog.trendmicro.com.
I fixed that with a new config.
You may need to wait about an hour until ftr.fivefilters gets the update.
Hi, I need help again. Same issue.
I'm still trying to understand the script. I don't know how to remove certain parts. For example, in the last post I want to remove "meet the authors", but it's not working.
body: //div[contains(concat(' ',normalize-space(@class),' '),' full ') and (contains(concat(' ',normalize-space(@class),' '),' blog '))]
strip: //div[contains(concat(' ',normalize-space(@class),' '),' author-block ')]
test_url: On-Device Fraud on the rise: exposing a recent Copybara fraud campaign | Cleafy Labs
@harboot: you may want to keep the config simple at first. If that doesn't work, you can try to get more specific.
I just uploaded a new config:
body: //div[contains(@class, 'full blog')]
strip_id_or_class: author-block
test_url: https://www.cleafy.com/cleafy-labs/on-device-fraud-on-the-rise-exposing-a-recent-copybara-fraud-campaign
Very often a
prune: no
does most of the trick.
But your example for strip works too.
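For anyone comparing the two body expressions in this thread: the long form with concat and normalize-space matches whole class names in any order, while contains(@class, 'full blog') is a plain substring test on the attribute. Here is a rough Python sketch of that difference. It only imitates the string logic of the two XPath expressions, it is not how Full-Text RSS evaluates them, and the sample class values are made up for illustration rather than taken from the Cleafy page.

```python
def xpath_normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(value.split())


def has_class_token(class_attr: str, token: str) -> bool:
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    padded = " " + xpath_normalize_space(class_attr) + " "
    return f" {token} " in padded


def contains_substring(class_attr: str, needle: str) -> bool:
    """Mirror contains(@class, 'needle'): a plain substring test."""
    return needle in class_attr


# Hypothetical class attribute values (assumptions, not from the real page).
samples = [
    "full blog",
    "blog full",
    "full  blog wide",
    "fullwidth blog",
    "full blogroll",
]

for cls in samples:
    # Long form: both 'full' and 'blog' must appear as whole class names.
    token_match = has_class_token(cls, "full") and has_class_token(cls, "blog")
    # Short form: the literal text 'full blog' must appear in the attribute.
    substring_match = contains_substring(cls, "full blog")
    print(f"{cls!r}: token match={token_match}, substring match={substring_match}")
```

So the simple form is fine as long as the site writes the classes in exactly that order with a single space, and the longer form is the one to fall back on if the class order changes or a similarly named class gets matched by accident.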
If you post code or config examples here in the forum, you should use the "preformatted text" button from the toolbar: </>