It looks like there used to be a plugin that did exactly what I need, but it is not compatible with Nutch 2: http://tinyurl.com/3dzrzv4

I made a quick attempt at refactoring it, but that is too complex without a good understanding of the Nutch architecture.
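
For reference, the conversion itself is small. Here is a minimal sketch of turning the epoch-millisecond value that gets posted for "changed" into the ISO 8601 UTC form that Solr date fields accept. It assumes Java 8 or later for java.time. The class and method names (SolrDateSketch, toSolrDate) are hypothetical and not part of Nutch or Solr, and where such a call would be hooked into the indexing plugin is not shown.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of Nutch or Solr.
final class SolrDateSketch {

    // Formats epoch milliseconds as an ISO 8601 instant in UTC,
    // e.g. 2004-03-15T05:00:00Z, the shape Solr expects for date fields.
    static String toSolrDate(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The lastModified value from the rejected document quoted below.
        System.out.println(toSolrDate(1079326800000L)); // prints 2004-03-15T05:00:00Z
    }
}
```

The result agrees with the "date" value of 20040315 in the same document.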
On Thu, Apr 14, 2011 at 1:03 PM, Dietrich <[email protected]> wrote:
> Is there a possible workaround you would suggest, maybe using a
> different/custom plugin that parses the lastModified date and creates
> another field in the proper format?
>
> On Thu, Apr 14, 2011 at 12:44 PM, Markus Jelsma
> <[email protected]> wrote:
>> Hi,
>>
>> This is tricky. Although Solr is currently the only supported indexer and
>> wants dates in one single format, we cannot easily change this behaviour
>> because it will break existing setups that rely on this format.
>>
>> Cheers,
>>
>>> I am using the index-more plugin to parse the lastModified data in web
>>> pages in order to store it in a Solr data field.
>>>
>>> In solrindex-mapping.xml I am mapping lastModified to a field "changed" in
>>> Solr: <field dest="changed" source="lastModified"/>
>>>
>>> However, when posting data to Solr the SolrIndexer posts it as a long,
>>> not as a date:
>>> <add><doc boost="1.0"><field
>>> name="changed">1079326800000</field><field
>>> name="tstamp">20110414144140188</field><field
>>> name="date">20040315</field>
>>>
>>> Solr rejects the data because of the improper data type.
>>> Strangely, the tstamp field, which is identical in the Nutch
>>> schema, is in a proper date format, and there is also a field "date"
>>> of unknown origin (it is not in the Nutch schema).
>>>
>>> Any suggestions would be most appreciated.
>>
>

