It looks like there used to be a plugin that did exactly what I need,
but it is not compatible with Nutch 2:
http://tinyurl.com/3dzrzv4
I made a quick attempt at refactoring it, but that is too complex without
a good understanding of the Nutch architecture.
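
The date conversion itself looks like the easy part. Here is a minimal
sketch in plain Java, assuming the lastModified value is epoch
milliseconds (as in the 1079326800000 example quoted below) and that the
Solr date field wants ISO 8601 in UTC. The class and method names are
made up for illustration, and nothing here is wired into the Nutch
plugin API:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch or Solr.
public class SolrDateSketch {

    // Assumption: the target Solr field accepts yyyy-MM-dd'T'HH:mm:ss'Z' in UTC.
    private static final DateTimeFormatter SOLR_DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                         .withZone(ZoneOffset.UTC);

    // Convert epoch milliseconds to the UTC date string.
    static String toSolrDate(long epochMillis) {
        return SOLR_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The "changed" value from the rejected document.
        System.out.println(toSolrDate(1079326800000L)); // prints 2004-03-15T05:00:00Z
    }
}
```

The missing piece is where to call something like this, so that the
converted value ends up in its own field before the document is posted
to Solr.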



On Thu, Apr 14, 2011 at 1:03 PM, Dietrich <[email protected]> wrote:
> Is there a possible workaround you would suggest, maybe using a
> different/custom plugin that parses the lastModified date and creates
> another field in the proper format?
>
>
>
> On Thu, Apr 14, 2011 at 12:44 PM, Markus Jelsma
> <[email protected]> wrote:
>> Hi,
>>
>> This is tricky. Although Solr is currently the only supported indexer and
>> wants dates in one single format, we cannot easily change this behaviour
>> because it will break existing setups that rely on this format.
>>
>> Cheers,
>>
>>> I am using the index-more plugin to parse the lastModified date in web
>>> pages in order to store it in a Solr date field.
>>>
>>> In solrindex-mapping.xml I am mapping lastModified to a field "changed" in
>>> Solr: <field dest="changed" source="lastModified"/>
>>>
>>> However, when posting data to Solr the SolrIndexer posts it as a long,
>>> not as a date:
>>> <add><doc boost="1.0"><field
>>> name="changed">1079326800000</field><field
>>> name="tstamp">20110414144140188</field><field
>>> name="date">20040315</field>
>>>
>>> Solr rejects the data because of the improper data type.
>>> Strangely, tstamp, which is defined identically in the Nutch schema,
>>> is in a proper date format, and there is also a field "date"
>>> of unknown origin (it is not in the Nutch schema).
>>>
>>> Any suggestions would be most appreciated.
>>
>
