The title of the RSS item, once passed through FiveFilters, becomes “Με χαράκωσαν οι χρυσαυγίτες”.
This happens in cases where the title contains quotation marks (" ") in Greek. What is the fix for that? Thanks.
Ioannis, do you have a URL I can try out to see if I can reproduce this problem?
I can’t give you the RSS feed because it changes constantly, and not all titles have quotes.
If you take the title and pass it through the script, you might be able to reproduce it. Thanks!
Ioannis
Hi Ioannis, I couldn’t reproduce this. Do you have the same trouble when you use that URL in our hosted version? http://fivefilters.org/content-only/
I did notice that double quotes in the article’s title do not appear in the Full-Text RSS result, but that’s because they do not appear in the title element in the HTML header (which is where Full-Text RSS was grabbing it from). We’ve added a site config file for this site to improve extraction and to use the article title which appears in the HTML body: https://github.com/fivefilters/ftr-site-config/blob/master/news247.gr.txt
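To illustrate the difference Keyvan describes between the HTML header title and the title in the body, here is a minimal Python sketch that reads both the `<title>` element and the first `<h1>` heading of a page. The HTML is a hypothetical stand-in, not the real news247.gr markup, and `TitleCollector` is an illustrative name. Full-Text RSS itself is written in PHP and picks the body title through the site config file, which this sketch does not reproduce.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Records the text of the first <title> and the first <h1> in a page."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &ldquo; arrive decoded
        super().__init__()
        self.title = None
        self.h1 = None
        self._capturing = None  # "title" or "h1" while inside that element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only the first occurrence of each element
        if self._capturing is None and tag in ("title", "h1") and getattr(self, tag) is None:
            self._capturing = tag
            self._parts = []

    def handle_data(self, data):
        if self._capturing is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._capturing:
            setattr(self, tag, "".join(self._parts).strip())
            self._capturing = None


# Hypothetical page: the <title> drops the quotation marks, the <h1> keeps them.
PAGE = """<html><head><title>Με χαράκωσαν οι χρυσαυγίτες</title></head>
<body><h1>&ldquo;Με χαράκωσαν οι χρυσαυγίτες&rdquo;</h1><p>...</p></body></html>"""

collector = TitleCollector()
collector.feed(PAGE)
collector.close()
print(collector.title)  # Με χαράκωσαν οι χρυσαυγίτες
print(collector.h1)     # “Με χαράκωσαν οι χρυσαυγίτες”
```

Only the body heading carries the quotation marks here, which is why pointing the extractor at the in-body title, as the site config does, keeps them.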
Thanks Keyvan,
It seems it’s not the double quotes, because I tested again on my version and quotes work… I could not test on your version; it only showed me the first 3 items in the RSS no matter how I changed the option.
Anyway, I will keep you updated when I run into this in a specific feed. Maybe it’s some kind of encoding problem, either in the script itself or in my server’s PHP.
Thanks a lot for looking into this.
Ioannis
Ok, no problem. Let me know if you come across the problem again.
Our free, hosted service only processes up to 3 items per feed. But if it’s an issue with a certain article, you can paste the article URL in rather than the feed, and we’ll create a 1-item feed for that article.
Hi, I caught it again on a random feed.
Its title gives me 7 άλλ’ αντ’ άλλα του ΣΔΟΕ για τις 5 λίστες επωνύμων
The thing is, through your site it works fine… so what could it be? Any suggestions? Some kind of encoding problem, but where?
Ioannis
Actually, when I wrote “its title gives me”, I was seeing something like this: 0;κωσαν οι χρυσαυ&ga, and then when I posted, it was normal again!!
LOL, what is going on…
Ioannis
I think it is the feed reader or the browser. I use Firefox.
Ioannis
Hi Ioannis, this is strange. I can’t reproduce the problem locally or on fivefilters.org. I tried in Chrome and Firefox. If it happens again, could you enable debug mode (check the debug checkbox) and send us the output? Thanks.
Hi Keyvan, it happened again; it’s the third item in the feed. Thanks for taking a look.
APC is disabled or not available on server
Supplied URL: http://www.iefimerida.gr/πολιτισμός/feed
Caching is enabled…
** Loading class Zend_Cache (Zend/Cache.php)
** Loading class HumbleHttpAgent (humble-http-agent/HumbleHttpAgent.php)
** Loading class CookieJar (humble-http-agent/CookieJar.php)
** Loading class ContentExtractor (content-extractor/ContentExtractor.php)
** Loading class SiteConfig (content-extractor/SiteConfig.php)
Attempting to process URL as feed
** Loading class SimplePie_HumbleHttpAgent (humble-http-agent/SimplePie_HumbleHttpAgent.php)
Fetching URL (http://www.iefimerida.gr/πολιτισμός/feed)
Starting parallel fetch (curl_multi_*)
Processing set of 1
…adding to pool
Sending request…
Received responses
** Loading class FeedWriter (feedwriter/FeedWriter.php)
Fetching feed items
Starting parallel fetch (curl_multi_*)
Processing set of 4
…adding to pool
Sending request…
Received responses
Processing feed item 3
URL already fetched - in memory (http://www.iefimerida.gr/news/126719/υποψήφια-για-grammy-η-αλέξια-η-λατρεμένη-τραγουδίστρια-των-80s-επανεμφανίζεται-δυναμικά, effective: http://www.iefimerida.gr/news/126719/υποψήφια-για-grammy-η-αλέξια-η-λατρεμένη-τραγουδίστρια-των-80s-επανεμφανίζεται-δυναμικά)
Character encoding: utf-8
Looking for site config files to see if single page link exists
Returning cached and merged site config for iefimerida.gr
. looking for site config for iefimerida.gr.merged in primary folder
… site config for iefimerida.gr.merged already loaded in this request
Attempting to extract content
Returning cached and merged site config for iefimerida.gr
. looking for site config for iefimerida.gr.merged in primary folder
… site config for iefimerida.gr.merged already loaded in this request
Attempting to parse HTML with libxml
Using Readability
Detecting title
Detecting body
Pruning content
Ioannis
Hi Ioannis, I’m afraid I can’t reproduce this. I tried checking results in Chrome and Firefox, and both look fine. If you think it might be a server issue, you could try hosting somewhere else and see if you experience the same issue. Not sure what else to suggest.
Hi Keyvan, first of all, thanks a lot for looking into it.
I changed servers: same thing. I’m sure it is some kind of UTF-8 encoding error. Is all the weird stuff like &epsilon &alpha maybe how utf8_general_ci represents the data? I actually tried reading the erroneous feed with another feed reader and the problem was gone!
It is really weird. I have utf8_unicode_ci everywhere on my server, so there should not be any problems, and if it were a script issue you would be able to reproduce it…
Anyway, thanks. It’s not a big deal, but it happens from time to time.
Take care!
Ioannis
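The “weird stuff like &epsilon &alpha” looks like HTML character references for Greek letters being displayed without being decoded; the earlier “0;κωσαν” fragment is also consistent with the tail of `&#940;`, the numeric reference for ά in χαράκωσαν. As a hedged illustration only, the Python sketch below uses the standard `html.unescape` to show what such references decode to, and how a title escaped twice (the hypothetical string `&amp;epsilon;&amp;alpha;`) still shows a raw reference after a single decoding pass. The sample strings are invented for the demonstration and are not taken from the actual feed.

```python
import html

# Named references for Greek letters, like the ones visible in the broken titles.
named = html.unescape("&Mu;&epsilon; &chi;&alpha;&rho;")
print(named)  # Με χαρ

# Numeric reference: &#940; is U+03AC, GREEK SMALL LETTER ALPHA WITH TONOS (ά).
numeric = html.unescape("&#940;")
print(numeric)  # ά

# Hypothetical double-escaped title: one decoding pass leaves a visible reference.
double_escaped = "&amp;epsilon;&amp;alpha;"
once = html.unescape(double_escaped)
twice = html.unescape(once)
print(once)   # &epsilon;&alpha;
print(twice)  # εα
```

If a reader shows text like `once`, the bytes are fine and the problem is an extra or missing decoding step somewhere between the feed and the display, not the database collation.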
Hi again, could you see one last time whether you can replicate the problem with this feed:
http://koykoycook.gr/?feed=rss2
Thanks!
Ioannis
Okay Keyvan, MYSTERY solved! I tinkered with php.ini to make sure its charset was UTF-8, my nginx web server also uses UTF-8, and of course so does MySQL… nothing worked…
Then I tried the Chrome web browser… ta-da, problem gone. It was damn Firefox: although I had selected UTF-8, it wasn’t working correctly!
Ioannis
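For contrast, a genuine charset mismatch in the browser usually looks different from leftover entities: UTF-8 bytes decoded with a single-byte charset turn each Greek letter into two Latin-looking characters, typically starting with Î or Ï. The Python sketch below shows this mechanism with ISO-8859-1 (Latin-1) standing in for whatever charset a reader might wrongly apply; it illustrates the general effect and is not a diagnosis of what Firefox did in this case.

```python
title = "Με χαράκωσαν οι χρυσαυγίτες"

# The bytes a UTF-8 feed actually carries.
raw = title.encode("utf-8")

# Right: the reader decodes the bytes as UTF-8 and gets the original text back.
right = raw.decode("utf-8")

# Wrong: the same bytes read as Latin-1. Every byte maps to some character,
# so this never raises, but each Greek letter becomes two characters.
wrong = raw.decode("latin-1")
print(repr(wrong[:4]))  # 'Î\x9cÎµ'

# Nothing was lost in transit: undoing the wrong decode recovers the title.
recovered = wrong.encode("latin-1").decode("utf-8")
print(recovered == title)  # True
```

Because `recovered` matches the original, this kind of garbling is purely a display-side choice of charset, which fits the outcome where switching browsers made the problem disappear.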
Hi Ioannis, that’s very strange. I’m glad you managed to get it working in the end. Thanks for the update.