How does Google recognize the publish date of a post?

I very much doubt that the published date of a post or article is based on the <lastmod> entry in an XML sitemap (as others have suggested), or on the Last-Modified HTTP header for that matter. An XML sitemap is only advisory, not authoritative. The last modified date of a document is probably not the same as the (original) publish date of an article. And, as I mentioned in my comment at the top of the page, the last modified date of a document is probably more important for caching and perhaps for determining crawl rates. The Last-Modified HTTP header of a dynamically generated page is often very close to the date/time of the request (as it is for WordPress blogs).

An RSS/Atom feed, on the other hand, does contain this specific nugget of information. And indeed, on WordPress sites that do not include the publish date in the content, the publish date still appears in Google's search results. As far as I can tell, this matches the date in the RSS feed.
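To illustrate what "this specific nugget of information" looks like in a feed, here is a minimal Python sketch that reads the <pubDate> of each item in an RSS 2.0 feed. The feed fragment, the example.com URL and the function name are made up for illustration; this only shows that the date is trivially machine-readable, not how Google actually processes feeds.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed fragment; the URL and the date are illustrative only.
RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Example post</title>
      <link>https://example.com/example-post/</link>
      <pubDate>Fri, 27 Aug 2010 15:45:00 -0700</pubDate>
    </item>
  </channel>
</rss>"""


def publish_dates(rss_text):
    """Map each item's <link> to its <pubDate>, parsed as an aware datetime."""
    root = ET.fromstring(rss_text)
    dates = {}
    for item in root.iter("item"):
        link = item.findtext("link")
        pub = item.findtext("pubDate")
        if link and pub:
            # RSS 2.0 dates use the RFC 822 style that email.utils understands.
            dates[link] = parsedate_to_datetime(pub)
    return dates


dates = publish_dates(RSS)
for link, published in dates.items():
    print(link, published.isoformat())
```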

EDIT#1: However, an RSS feed does not necessarily contain all the pages. In most cases it should only contain the latest or most recently updated pages. But there is no reason that Google should forget what it has already read, and provided the content of that page has not changed, neither should the last modified date.

If there is no RSS feed, I think Google is clever enough to analyse the page content, particularly if dates are marked up 'semantically' with the help of microformats. It's perfectly feasible that Google will see the following as the authoritative published date for the article that contains it:

<abbr class="published" title="2010-08-27T15:45:00-0700">
Friday, August 27th, 2010
</abbr>

Google certainly does read microformats - hCard, hReview, etc.
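As a rough idea of how little work it takes to pick that date out of a page, here is a Python sketch using only the standard library. It assumes the date sits in the title attribute of an element whose class includes "published", as in the snippet above; the class and function names are my own, and this is a guess at the general technique, not a description of Google's parser.

```python
from datetime import datetime
from html.parser import HTMLParser


class PublishedDateParser(HTMLParser):
    """Collect the title attribute of elements whose class includes 'published'."""

    def __init__(self):
        super().__init__()
        self.published = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "published" in classes and attrs.get("title"):
            self.published.append(attrs["title"])


def find_published(html):
    """Return every marked-up publish date in the page as an aware datetime."""
    parser = PublishedDateParser()
    parser.feed(html)
    # %z accepts UTC offsets written as -0700.
    return [datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
            for value in parser.published]


HTML = """<abbr class="published" title="2010-08-27T15:45:00-0700">
Friday, August 27th, 2010
</abbr>"""

found = find_published(HTML)
print([d.isoformat() for d in found])
```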

Just to add, I don't think Google would state a publish date unless it was able to find something authoritative that would suggest this. It's not going to deduce a 'publish date' on speculative data, since an incorrect 'publish date' is no use to anybody and Google would get a lot of stick for it!

And just for the record (if @Tom is suggesting otherwise :) I think posts/articles should have the publish date visibly displayed. Many don't, and this can be frustrating for the reader, particularly when researching technology issues and you find, having read halfway through the article, that it's out of date!

EDIT#2: I have since experienced a similar annoyance to the one @mmdanziger details in his answer. On one of my old sites I have text of the form "Site Last Updated Sun 17th Jun 2012" (not marked up in any special way) at the top of every page (written to the page with JavaScript!!). This same date has been picked up by Google and now appears alongside several (but not all) pages that appear in the SERPs. This certainly is not the publish date of the page. It would seem that Google is simply scraping the page for a string of the form "last updated (datestring)" (having processed the JavaScript!!). This particular site does not have an RSS feed. The site does have a Sitemap.xml file but the dates are different.

I have noticed similar behaviour on other sites also.
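For what it's worth, scraping a "last updated (datestring)" string like mine is not hard. The Python sketch below is purely my guess at the kind of pattern matching involved (the regex and function name are hypothetical), and it only handles the day-month-year form from my own site, not something like "October 30 2007".

```python
import re
from datetime import date

# Month abbreviations mapped to month numbers, to avoid locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

LAST_UPDATED = re.compile(
    r"last\s+updated:?\s+"
    r"(?:[a-z]{3,9}\s+)?"             # optional weekday, e.g. "Sun"
    r"(\d{1,2})(?:st|nd|rd|th)?\s+"   # day, with optional ordinal suffix
    r"([a-z]{3})[a-z]*\s+"            # month name, abbreviated or full
    r"(\d{4})",                       # four-digit year
    re.IGNORECASE,
)


def guess_last_updated(text):
    """Return the first 'last updated' date found in text, or None."""
    match = LAST_UPDATED.search(text)
    if not match:
        return None
    day, month, year = match.groups()
    month_number = MONTHS.get(month.lower())
    if month_number is None:
        return None
    try:
        return date(int(year), month_number, int(day))
    except ValueError:  # impossible dates such as the 31st of June
        return None


print(guess_last_updated("Site Last Updated Sun 17th Jun 2012"))
```

Note that nothing here checks whether the string is visible, hidden, or written by JavaScript, which is exactly why a stray date can end up being shown against the wrong pages.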


I just had a problem where all of my main pages were shown as being updated over 4 years ago, even though Google knows that's not true, because the pages have been indexed for that long and change substantially from month to month. After being really puzzled, then really annoyed, then puzzled again, I finally found the problem. Our legal terms were being served in a hidden div with a "Last updated: October 30 2007", and the div was being loaded on almost all our pages (because it pops up on registration). I've removed it and now I assume the date will either disappear or be corrected to something more reasonable.

A cautionary tale and one more piece of evidence that they check the semantics of the site more than the technical details or their own indexing history.


I think Google uses the sitemap and RSS feed to recognize the published date. You can implement this feature in your CMS by creating an XML sitemap according to the standard.

<lastmod>2011-08-18</lastmod>
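If your CMS does not generate one for you, a sitemap with <lastmod> entries can be produced with a few lines of Python. This is just a sketch: the build_sitemap function and the example.com URL are made up, and whether Google actually takes the displayed date from <lastmod> is disputed elsewhere on this page.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # The protocol accepts W3C Datetime; a plain YYYY-MM-DD date is valid.
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page list; in a CMS this would come from the database.
xml_text = build_sitemap([
    ("https://example.com/example-post/", date(2011, 8, 18)),
])
print(xml_text)
```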