No data is retrieved from the RSS feed

KalimeroMK · October 9, 2020, 10:51pm

The feed is https://pokajanie.mk/category/besedi/feed/ and it is full of data, but when I try to retrieve it, it only shows:

обновување на лозинка (password reset)

враќање на лозинка (password recovery)

вашата е-мејл (your email)

А лозинка ќе ви биде испратено до вас. (A password will be sent to you.)

(English translations in parentheses.)

fivefilters · October 11, 2020, 11:15am

Hi there, the automatic extraction we rely on looks for certain HTML elements to determine which HTML block is most likely to contain the article content. One of the clues is the number of paragraph (<p>) elements.

This site, very unusually, appears to be using heading (<h3>) elements instead of paragraph elements for its body text. My guess is that this is what's causing the extraction problems.

Here’s a sample:

<h3>Self-justification is a disease that is more widespread, less noticeable and graver than judging and condemning.</h3>
<h3>Self-justification and condemnation are two sides of one sinful state – pride.</h3>
<h3>More widespread – because, most often, even those who take care not to condemn or judge others do not notice that they justify themselves.</h3>
<h3>Less noticeable – because, most often, the one who justifies himself does not catch the moment of the sin of self-justification.</h3>

These should all be marked up as regular paragraph (<p>) elements.
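To see why this matters, here is a minimal Python sketch that counts <p> and <h3> start tags in a fragment. It is not Full-Text RSS's actual scoring code. It only illustrates that a body made entirely of <h3> elements gives a paragraph-counting heuristic nothing to work with. The helper names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags in an HTML fragment (a toy stand-in for one of the
    signals a content extractor might look at)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase
        self.counts[tag] += 1


def paragraph_signal(html_fragment):
    """Return (number of <p> tags, number of <h3> tags) in the fragment."""
    counter = TagCounter()
    counter.feed(html_fragment)
    counter.close()
    return counter.counts["p"], counter.counts["h3"]


# Shortened version of the article markup: body text wrapped in <h3>.
sample = (
    "<div class='td-post-content'>"
    "<h3>Self-justification is a disease...</h3>"
    "<h3>Self-justification and condemnation are two sides...</h3>"
    "</div>"
)
print(paragraph_signal(sample))  # → (0, 2)
```

A block with many <p> elements would score (n, 0) instead. That is the shape a paragraph-based heuristic expects to find.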

If you’re using a self-hosted version of Full-Text RSS, you can create a custom site config file to tell Full-Text RSS which block to extract from this site. That should give you more consistent results.

KalimeroMK · October 12, 2020, 5:15pm

Can you please point me to documentation or example code showing how to create a custom site config file that tells Full-Text RSS which block to extract from this site?

fivefilters · October 13, 2020, 8:38am

The usual way to do this is to load an article from the site and examine its HTML. We have instructions on how to write a site config file for the site based on its HTML here: https://help.fivefilters.org/full-text-rss/site-patterns.html
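To find an article to examine, you can take one of the item links from the feed itself. The following is a minimal standard-library Python sketch. It assumes the feed is plain RSS 2.0 with a <link> inside each <item>. The `fetch` and `item_links` names are hypothetical helpers, not part of Full-Text RSS.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=30):
    """Download a URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def item_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document.

    Accepts either bytes (as returned by fetch) or a str.
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links
```

Usage, which needs network access: `item_links(fetch("https://pokajanie.mk/category/besedi/feed/"))`. Open one of the returned URLs in a browser and inspect its HTML.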

We also have a tool at https://siteconfig.fivefilters.org for a simpler, visual way to do this. You select the content block and then click Download Full-Text RSS site config.

See here for what it produces for an article on this site: http://siteconfig.fivefilters.org/grab.php?url=https%3A%2F%2Fpokajanie.mk%2F2020%2F10%2F11%2F29597%2F

If I click into the main body text (so that the blue outline covers just the content block), it produces the following Full-Text RSS site config file:

pokajanie.mk.txt

# Generated by FiveFilters.org's web-based selection tool
# Place this file inside your site_config/custom/ folder
# Source: http://siteconfig.fivefilters.org/grab.php?url=https%3A%2F%2Fpokajanie.mk%2F2020%2F10%2F11%2F29597%2F

body: //div[contains(concat(' ',normalize-space(@class),' '),' td-post-content ')]
test_url: https://pokajanie.mk/2020/10/11/29597/
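The body: line is an XPath expression. Wrapping the class attribute as concat(' ', normalize-space(@class), ' ') and searching for ' td-post-content ' is the usual XPath idiom for matching a whole class token. It matches a div whose class list includes td-post-content alongside other classes, but not a class that merely starts with that text. The following is a rough standard-library Python sketch of what the rule selects: the text of the first matching <div>. It is only an illustration, not how Full-Text RSS evaluates XPath. The helper names and the test markup are made up.

```python
from html.parser import HTMLParser


def has_class(class_attr, name):
    """Roughly the Python equivalent of the XPath idiom
    contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    return name in (class_attr or "").split()


class BlockExtractor(HTMLParser):
    """Collects the text of the first <div> whose class list contains
    target_class, including text inside nested elements."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the target block
        self.done = False   # True once the target block has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1  # a nested div inside the target block
        elif has_class(dict(attrs).get("class"), self.target_class):
            self.depth = 1   # entering the target block

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def extract_block(html, target_class="td-post-content"):
    parser = BlockExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Because the match is on the container div rather than on <p> elements, the <h3>-based body text is picked up as well.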

You then place this file inside your full-text-rss/site_config/custom/ folder.
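That step can also be scripted. Here is a minimal Python sketch under a few assumptions. `ftr_root` is taken to be the directory of your Full-Text RSS installation (the full-text-rss folder), and the file is named after the hostname, as with pokajanie.mk.txt. The check for a body: line is a sanity check of our own, not a Full-Text RSS requirement. The function names are hypothetical.

```python
from pathlib import Path

# The config generated by the selection tool for pokajanie.mk.
SITE_CONFIG = """\
body: //div[contains(concat(' ',normalize-space(@class),' '),' td-post-content ')]
test_url: https://pokajanie.mk/2020/10/11/29597/
"""


def directives(config_text):
    """Return the directive names used in a site config, skipping blank
    lines and # comments."""
    names = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        names.append(line.split(":", 1)[0].strip())
    return names


def install_site_config(ftr_root, hostname, config_text):
    """Write config_text to <ftr_root>/site_config/custom/<hostname>.txt."""
    if "body" not in directives(config_text):
        # Our own sanity check: this config is meant to set the content block.
        raise ValueError("site config has no body: directive")
    custom_dir = Path(ftr_root) / "site_config" / "custom"
    custom_dir.mkdir(parents=True, exist_ok=True)
    target = custom_dir / f"{hostname}.txt"
    target.write_text(config_text, encoding="utf-8")
    return target
```

For example, `install_site_config("/path/to/full-text-rss", "pokajanie.mk", SITE_CONFIG)`, where the path is a placeholder for your own install location.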

Hope that’s some help.