Get full-text RSS from inwestomat.eu

kp90 · February 21, 2023, 12:37pm

Hi,

I have a problem with inwestomat.eu.
RSS: https://inwestomat.eu/feed/

I don’t get the full article. Could anyone help?

HolgerAusB · February 21, 2023, 8:10pm

I am not a team member, just a user, but I will try to help. That is a Polish site, isn’t it? I don’t speak Polish.
Please send two URLs of longer articles and post the last sentence of each, so I can see where the article ends and the unneeded clutter starts.

Most or all articles seem to contain a podcast box with a YouTube video (audio only). How important is that to the article? I haven’t tested whether I can preserve that box. If it is just a voice reading of the whole article text, I would strip it.

Maybe I can write a site_config tomorrow.

kp90 · February 21, 2023, 8:46pm

I have probably managed it myself. If you have an idea how to improve it, do not hesitate to fix it. The podcast and YouTube parts could probably be handled in a better way.

body: //div[@id='primary']

# Remove podcast and YT
strip: //section[@class='elementor-section elementor-inner-section elementor-element elementor-element-416ba637 elementor-section-boxed elementor-section-height-default elementor-section-height-default']

# Remove banners and ads 
strip_id_or_class: elementor-shortcode

# Remove comments section
strip_id_or_class: comments

# Remove related posts
strip_id_or_class: related-posts

# Remove previous / next post
strip: //nav[@aria-label='artykuły']


prune: no

test_url: https://inwestomat.eu/ile-naprawde-kosztuje-przeplacanie-prowizji-maklerskiej/

HolgerAusB · February 22, 2023, 6:44am

Good job so far, @kp90. Unfortunately, the id 416ba637 used for stripping the podcast/YouTube box is unique and only works for that single article. The software inwestomat uses assigns a unique id on every page, so there is no chance of identifying the right section that way. But since the embedded media works in the full-text version, we can simply keep it in.

I did some additional cleanup in the header (stripping the date, tags and the title from the text).

I also used //article for the body definition, which is more common.
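
For reference, a rough sketch of what the revised config could look like with the changes described in this post. It is reconstructed from this thread only and is not a copy of the file in the PR. It assumes the remaining rules from the earlier post still apply unchanged. The header cleanup rules (date, tags, title) are left out because their selectors are not quoted here, and some of the strip rules may turn out to be unnecessary once the body is narrowed to //article.

```
# Sketch only: reconstructed from the discussion, not taken from the PR
body: //article

# The podcast/YouTube strip rule is dropped: its class list contained a
# per-page id (416ba637), so it matched only one article.
# The embedded media is kept instead.

# Remove banners and ads
strip_id_or_class: elementor-shortcode

# Remove comments section
strip_id_or_class: comments

# Remove related posts
strip_id_or_class: related-posts

# Remove previous / next post
strip: //nav[@aria-label='artykuły']

prune: no

test_url: https://inwestomat.eu/ile-naprawde-kosztuje-przeplacanie-prowizji-maklerskiej/
```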

I contributed this to the repo (PR1057). ~~But we need to wait until a dev has approved this.~~ It has been confirmed and is live now.