Hello all,

I am using Solr 1.4.1 with the DataImportHandler, and I am trying to follow
the advice from
http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about
converting date fields to SortableLong fields for better memory efficiency.
However, whenever I try to do this using the DateFormatTransformer, I get an
exception during indexing for every row that tries to create my sortable fields.

In my schema.xml, I have the following definitions for the fieldType and
dynamicField:

<fieldType name="sdate" class="solr.SortableLongField" indexed="true"
           stored="false" sortMissingLast="true" omitNorms="true" />
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true" />

In my dih.xml, I have the following definitions:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
        <entity
            name="xml_stories"
            rootEntity="false"
            dataSource="null"
            processor="FileListEntityProcessor"
            fileName="legacy_stories.*\.xml$"
            recursive="false"
            baseDir="/usr/local/extracts"
            newerThan="${dataimporter.xml_stories.last_index_time}"
        >
            <entity
                name="stories"
                pk="id"
                dataSource="xml_stories"
                processor="XPathEntityProcessor"
                url="${xml_stories.fileAbsolutePath}"
                forEach="/RECORDS/RECORD"
                stream="true"
                transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
                onError="continue"
            >
                <field column="_modified_date"
                       xpath="/RECORDS/RECORD/pr...@name='R_ModifiedTime']/PVAL" />
                <field column="modified_date" sourceColName="_modified_date"
                       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="_df_date_published"
                       xpath="/RECORDS/RECORD/pr...@name='R_StoryDate']/PVAL" />
                <field column="df_date_published" sourceColName="_df_date_published"
                       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="sort_date_modified" sourceColName="modified_date"
                       dateTimeFormat="yyyyMMddhhmmss" />
                <field column="sort_date_published" sourceColName="df_date_published"
                       dateTimeFormat="yyyyMMddhhmmss" />
            </entity>
        </entity>
    </document>
</dataConfig>

The source fields in question have the following format:

<RECORDS>
<RECORD>
    <PROP NAME="R_StoryDate">
        <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
        <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
</RECORD>
</RECORDS>

The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
    at java.text.DateFormat.parse(DateFormat.java:337)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
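
One possible reading of this trace, offered as an assumption rather than a confirmed diagnosis: the unparseable value looks like the toString() form of a java.util.Date, which would be the case if modified_date had already been parsed into a Date before the sort_date_* fields try to parse it again with the pattern yyyyMMddhhmmss. The standalone sketch below uses only java.text.SimpleDateFormat to reproduce that failure and to show one way of formatting a parsed date into a sortable long. The class name SortableDateSketch and its methods are hypothetical and are not part of Solr or the DataImportHandler. It also uses HH (24-hour clock) and UTC, which differs from the hh in the config above and is an assumption based on sample values such as 19:38:01.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Solr or DIH.
class SortableDateSketch {

    // Parse a source value such as 2001-12-04T19:38:01Z, treating it as UTC (assumption).
    public static Date parseSource(String value) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return in.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected date format: " + value, e);
        }
    }

    // Report whether the given text can be parsed with the given pattern.
    public static boolean canParse(String pattern, String text) {
        try {
            new SimpleDateFormat(pattern, Locale.ROOT).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Format an already-parsed Date as a number of the form yyyyMMddHHmmss.
    public static long toSortableLong(Date date) {
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return Long.parseLong(out.format(date));
    }

    public static void main(String[] args) {
        // The string from the trace is not in yyyyMMddhhmmss form, so parsing fails.
        System.out.println(canParse("yyyyMMddhhmmss", "Wed Nov 28 21:39:05 EST 2007")); // false

        // Formatting (rather than re-parsing) the Date yields a sortable number.
        System.out.println(toSortableLong(parseSource("2001-12-04T19:38:01Z"))); // 20011204193801
    }
}
```

If this reading is right, the sortable value would have to be produced by formatting the date rather than by a second parse; how best to do that inside the DataImportHandler is not something this sketch settles.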

I am fairly sure the SortableLong fields are the cause, because if I remove just
those two lines from my dih.xml, everything imports as I expect. Am I
doing something wrong, such as misusing SortableLongField and/or the
DateFormatTransformer? Is this not supported in my version of Solr? I'm not
very experienced with Java, so digging into the code would be a lost cause
for me right now. I was hoping that somebody here might be able to point me
in the right direction.

It should be noted that the modified_date and df_date_published fields index
just fine (so long as I do it as I've defined above).

Thank you,

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
