Hi - the timestamp is just the time at which a page was indexed. It's not very useful except for deduplication. If you want to index a publishing date, you must first identify the source of that date and extract it from the webpages. It's possible to use og:date or other meta tags, or perhaps other sources, but to do so you generally need a custom parse filter.
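
To make the extraction step concrete, here is a minimal sketch in plain Java. It deliberately does not use the Nutch plugin API; it only illustrates the logic a parse filter would need: look up a candidate meta tag, then normalize its value to a date. The class name `PublishDateSketch`, the list of meta keys, and the list of date formats are all assumptions for illustration, and the regex lookup is a simplification (a real filter would work on the parsed DOM).

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PublishDateSketch {

    // Hypothetical meta keys to try, in order of preference (assumption).
    private static final List<String> CANDIDATE_KEYS =
        List.of("article:published_time", "og:date", "date");

    // Assumed date layouts; extend for the sites you actually crawl.
    private static final List<DateTimeFormatter> FORMATS = List.of(
        DateTimeFormatter.ISO_LOCAL_DATE,                             // 2012-11-04
        DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH),   // 04-Nov-2012
        DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH)); // November 4, 2012

    // Simplified lookup: only matches <meta name|property="key" content="...">
    // with the attributes in that order.
    public static Optional<String> metaContent(String html, String key) {
        Pattern p = Pattern.compile(
            "<meta\\s+(?:name|property)=[\"']" + Pattern.quote(key)
                + "[\"']\\s+content=[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // Tries a full ISO-8601 timestamp with offset first, then each assumed layout.
    public static Optional<LocalDate> parseDate(String raw) {
        try {
            return Optional.of(OffsetDateTime.parse(raw).toLocalDate());
        } catch (DateTimeParseException ignored) {
            // fall through to the date-only formats
        }
        for (DateTimeFormatter f : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(raw, f));
            } catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        return Optional.empty();
    }

    // Returns the first candidate meta tag whose value parses as a date.
    public static Optional<LocalDate> publishDate(String html) {
        for (String key : CANDIDATE_KEYS) {
            Optional<LocalDate> d = metaContent(html, key).flatMap(PublishDateSketch::parseDate);
            if (d.isPresent()) {
                return d;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><meta property=\"article:published_time\" "
            + "content=\"2012-11-04T05:44:00+01:00\"></head></html>";
        System.out.println(publishDate(html)); // prints Optional[2012-11-04]
    }
}
```

Returning an empty `Optional` when nothing parses keeps the "don't trust websites" case explicit: the caller decides whether to skip the field or fall back to another source.
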
Meta tags can be indexed without creating a custom parse filter. If you don't trust the websites, or need special (re)formatting or checking logic, you need to write a parse filter for it. I've also built a date parsing filter that retrieves dates in various formats from free text; check Jira for a patch for the dateparsefilter. It's an older version but still works well.

-----Original message-----
> From: Joe Zhang <smartag...@gmail.com>
> Sent: Sun 04-Nov-2012 05:44
> To: user <user@nutch.apache.org>
> Subject: timestamp in nutch schema
>
> My understanding is that the timestamp stores crawling time. Is there any
> way to get nutch to parse out the publishing time of webpages and store
> such info in timestamp or some other field?