Hello all,
I am using Solr 1.4.1 with the DataImportHandler, and I am trying to follow
the advice from
http://www.mail-archive.com/[email protected]/msg11887.html about
converting date fields to SortableLong fields for better memory efficiency.
However, whenever I try to do this using the DateFormatTransformer, I get an
exception during indexing for every row that tries to create one of my
sortable fields.
In my schema.xml, I have the following definitions for the fieldType and
dynamicField:
<fieldType name="sdate" class="solr.SortableLongField" indexed="true" stored="false" sortMissingLast="true" omitNorms="true" />
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true" />
In my dih.xml, I have the following definitions:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity
        name="xml_stories"
        rootEntity="false"
        dataSource="null"
        processor="FileListEntityProcessor"
        fileName="legacy_stories.*\.xml$"
        recursive="false"
        baseDir="/usr/local/extracts"
        newerThan="${dataimporter.xml_stories.last_index_time}"
        >
      <entity
          name="stories"
          pk="id"
          dataSource="xml_stories"
          processor="XPathEntityProcessor"
          url="${xml_stories.fileAbsolutePath}"
          forEach="/RECORDS/RECORD"
          stream="true"
          transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
          onError="continue"
          >
        <field column="_modified_date"
               xpath="/RECORDS/RECORD/pr...@name='R_ModifiedTime']/PVAL" />
        <field column="modified_date" sourceColName="_modified_date"
               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
        <field column="_df_date_published"
               xpath="/RECORDS/RECORD/pr...@name='R_StoryDate']/PVAL" />
        <field column="df_date_published" sourceColName="_df_date_published"
               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
        <field column="sort_date_modified" sourceColName="modified_date"
               dateTimeFormat="yyyyMMddhhmmss" />
        <field column="sort_date_published" sourceColName="df_date_published"
               dateTimeFormat="yyyyMMddhhmmss" />
      </entity>
    </entity>
  </document>
</dataConfig>
The source fields in the XML files look like this:
<RECORDS>
  <RECORD>
    <PROP NAME="R_StoryDate">
      <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
      <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
  </RECORD>
</RECORDS>
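As a standalone illustration (plain java.text.SimpleDateFormat, not DIH itself; the class and method names are made up for this sketch), this is the kind of long value the sort_date_* fields are presumably meant to hold for the R_ModifiedTime value above, assuming the trailing Z means UTC. It also shows one thing worth checking in the patterns: lower-case hh is the 12-hour clock field, so an evening time such as 19:38:01 formats as 07, while HH keeps it as 19.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SortableDateSketch {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Parses a PVAL value such as "2001-12-04T19:38:01Z".
    // 'Z' is quoted, so it is matched as a literal character; the
    // formatter's time zone is set to UTC so the value is read as UTC.
    static Date parseIso(String iso) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        in.setTimeZone(UTC);
        return in.parse(iso);
    }

    // Formats the date with an all-numeric pattern and reads the digits back as a long.
    static long toSortableLong(Date date, String pattern) {
        SimpleDateFormat out = new SimpleDateFormat(pattern, Locale.ROOT);
        out.setTimeZone(UTC);
        return Long.parseLong(out.format(date));
    }

    public static void main(String[] args) throws ParseException {
        Date modified = parseIso("2001-12-04T19:38:01Z");
        // HH is the 0-23 hour field, so the evening hour is kept.
        System.out.println(toSortableLong(modified, "yyyyMMddHHmmss")); // 20011204193801
        // hh is the 1-12 hour field, so 7 PM and 7 AM both become "07".
        System.out.println(toSortableLong(modified, "yyyyMMddhhmmss")); // 20011204073801
    }
}
```

Because the digits run from year down to second, numeric order of these longs matches chronological order only when the hour field is HH.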
The exception that I am receiving is:
Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
    at java.text.DateFormat.parse(DateFormat.java:337)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
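The date in that message, "Wed Nov 28 21:39:05 EST 2007", has the layout of java.util.Date.toString() ("EEE MMM dd HH:mm:ss zzz yyyy") rather than the layout of the source XML. One plausible reading, which is an assumption and not confirmed from the DIH source, is that sort_date_modified reads modified_date after it has already been converted to a Date, so the second pattern is applied to that Date's string form. This standalone sketch, with hypothetical class and method names, reproduces the same kind of failure using only java.text.SimpleDateFormat:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ChainedDateParseSketch {
    // Parses text with the given pattern; prints the error and returns null on failure.
    static Date tryParse(String text, String pattern) {
        try {
            return new SimpleDateFormat(pattern, Locale.ROOT).parse(text);
        } catch (ParseException e) {
            System.out.println("Could not parse: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // First pass (like modified_date): the raw XML string parses fine.
        Date first = tryParse("2007-11-28T21:39:05Z", "yyyy-MM-dd'T'HH:mm:ss'Z'");
        // Assumption: the second pass sees first.toString(), e.g.
        // "Wed Nov 28 21:39:05 EST 2007", not the original XML string.
        String seenBySecondPass = String.valueOf(first);
        // Second pass (like sort_date_modified): the numeric pattern
        // cannot match the leading day name, so parsing fails.
        Date second = tryParse(seenBySecondPass, "yyyyMMddhhmmss");
        System.out.println(second == null ? "second pass failed" : "second pass parsed");
    }
}
```

If this reading is right, the failure comes from chaining one DateFormatTransformer field off another, not from SortableLongField itself.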
I know the problem has to be the SortableLong fields, because if I remove just
those two field lines from my dih.xml, everything imports as expected. Am I
doing something wrong? Am I misusing the SortableLong field and/or the
DateFormatTransformer? Is this not supported in my version of Solr? I'm not
very experienced with Java, so digging into the code would be a lost cause for
me right now. I was hoping somebody here could point me in the right
direction.
It should be noted that the modified_date and df_date_published fields index
just fine (so long as I do it as I've defined above).
Thank you,
- Ken