Extracting dates in an unusual format

<p class="spot_post_date">2021.06.17. <span class="lt_bar"><span>55,060</span>read</span></p>

I want to extract the date from the HTML above.
Setting the Item date selector to p.spot_post_date doesn’t extract the date.
How can I remove the <span> tag?

When you select the date with p.spot_post_date you’ll get the following string:

2021.06.17. 55,060read

You can delete span elements using Feed Creator’s cleanup by entering p span in the “Source HTML: Remove elements (CSS)” field.

However, doing so will leave you with the following string:

2021.06.17. 

PHP’s strtotime doesn’t recognise the Y.M.D format; it recognises Y-M-D (with dashes).

So you’ll have to use Feed Creator’s “Item date format” field to guide it. You can enter something like this:

Y.m.d+

The plus (+) sign at the end tells it to ignore the rest of the string, so you can use it even without removing the span elements.

So you enter two pieces of information for the date:

  1. Item date selector (CSS): p.spot_post_date
  2. Item date format: Y.m.d+
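
To illustrate what the format achieves, here is a rough Python analogue. This is not Feed Creator's code (Feed Creator relies on PHP's date format handling), and Python's strptime has no equivalent of the + specifier, so this sketch simply keeps the first ten characters before parsing. The helper name parse_leading_date is hypothetical.

```python
from datetime import datetime


def parse_leading_date(text: str) -> datetime:
    """Parse a leading YYYY.MM.DD date and ignore whatever follows it.

    Assumption: the date is always the first ten characters of the
    selected text, as in "2021.06.17. 55,060read".
    """
    leading = text.strip()[:10]  # "2021.06.17"
    return datetime.strptime(leading, "%Y.%m.%d")


if __name__ == "__main__":
    # Works with or without the span text still attached.
    print(parse_leading_date("2021.06.17. 55,060read").date())
    print(parse_leading_date("2021.06.17. ").date())
```

Slicing stands in for the + specifier here: the trailing text is dropped instead of being tolerated by the parser.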

Thank you so much! Please solve this one too!
https://forum.fivefilters.org/t/failed-to-fetch-url-why/1226/3

@fivefilters

I’m having a similar issue extracting the date from the following HTML:

<div class="author-date">By <span class="author">John Doe</span> on 05/26/2023</div>

I’m using the following options in Feed Creator:

  1. Item date selector (CSS): div.author-date
  2. Item date format: +m/d/Y

I’m adding the + sign to ignore anything coming before the date, but it’s not working.

Any other options to extract the date?

Thanks.

It doesn’t look like you can isolate the date using CSS selectors for the HTML you’ve provided. So I don’t think you can get the date to be parsed by Feed Creator.

As for using the + sign, that’s part of PHP’s date format handling. It can be used for trailing data only, not leading data. From the docs:

+ If this format specifier is present, trailing data in the string will not cause an error, but a warning instead

You could try something like the following format, but I’m not sure it will match all entries if the author names have more than 2 or 3 parts:

***** m/d/Y

You might get more consistent results if you use Feed Creator’s cleanup to remove span.author and then use a format like:

**** m/d/Y
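
As an illustration of why removing span.author helps, here is a small Python sketch of the same idea outside Feed Creator. It is an analogue only: the helper name extract_trailing_date and the regex-based tag stripping are assumptions made for this example, whereas Feed Creator does the equivalent through its cleanup field and PHP's date format.

```python
import re
from datetime import datetime


def extract_trailing_date(html: str) -> datetime:
    """Remove the author span, then parse the trailing m/d/Y date."""
    # Drop the author span, mirroring the "remove span.author" cleanup.
    without_author = re.sub(r'<span class="author">.*?</span>', "", html)
    # Strip the remaining tags to get plain text: "By  on 05/26/2023".
    text = re.sub(r"<[^>]+>", "", without_author)
    # Assumption: the date is the last whitespace-separated token.
    return datetime.strptime(text.split()[-1], "%m/%d/%Y")


if __name__ == "__main__":
    snippet = (
        '<div class="author-date">By <span class="author">John Doe</span>'
        " on 05/26/2023</div>"
    )
    print(extract_trailing_date(snippet).date())
```

Once the author name is gone, the text before the date has a fixed shape regardless of how many parts the name has, which is why the result is more consistent.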

Both of your suggestions work perfectly. Thanks so much @fivefilters!