Just following up to see whether anybody has some words of wisdom on this
issue.

Thank you,

Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Fri, Oct 15, 2010 at 6:42 PM, Ken Stanley <doh...@gmail.com> wrote:

> Hello all,
>
> I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow
> the advice from
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about
> converting date fields to SortableLong fields for better memory
> efficiency. However, whenever I try to do this using the
> DateFormatTransformer, I get
> exceptions when indexing for every row that tries to create my sortable
> fields.
>
> In my schema.xml, I have the following definitions for the fieldType and
> dynamicField:
>
> <fieldType name="sdate" class="solr.SortableLongField" indexed="true"
> stored="false" sortMissingLast="true" omitNorms="true" />
> <dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true"
> />
>
> In my dih.xml, I have the following definitions:
>
> <dataConfig>
>     <dataSource type="FileDataSource" encoding="UTF-8" />
>     <document>
>         <entity
>             name="xml_stories"
>             rootEntity="false"
>             dataSource="null"
>             processor="FileListEntityProcessor"
>             fileName="legacy_stories.*\.xml$"
>             recursive="false"
>             baseDir="/usr/local/extracts"
>             newerThan="${dataimporter.xml_stories.last_index_time}"
>         >
>             <entity
>                 name="stories"
>                 pk="id"
>                 dataSource="xml_stories"
>                 processor="XPathEntityProcessor"
>                 url="${xml_stories.fileAbsolutePath}"
>                 forEach="/RECORDS/RECORD"
>                 stream="true"
>                 transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
>                 onError="continue"
>             >
>                 <field column="_modified_date"
>                        xpath="/RECORDS/RECORD/pr...@name='R_ModifiedTime']/PVAL" />
>                 <field column="modified_date"
>                        sourceColName="_modified_date"
>                        dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>
>                 <field column="_df_date_published"
>                        xpath="/RECORDS/RECORD/pr...@name='R_StoryDate']/PVAL" />
>                 <field column="df_date_published"
>                        sourceColName="_df_date_published"
>                        dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>
>                 <field column="sort_date_modified"
>                        sourceColName="modified_date"
>                        dateTimeFormat="yyyyMMddhhmmss" />
>                 <field column="sort_date_published"
>                        sourceColName="df_date_published"
>                        dateTimeFormat="yyyyMMddhhmmss" />
>             </entity>
>         </entity>
>     </document>
> </dataConfig>
>
> The fields in question have the following format:
>
> <RECORDS>
> <RECORD>
>     <PROP NAME="R_StoryDate">
>         <PVAL>2001-12-04T00:00:00Z</PVAL>
>     </PROP>
>     <PROP NAME="R_ModifiedTime">
>         <PVAL>2001-12-04T19:38:01Z</PVAL>
>     </PROP>
> </RECORD>
> </RECORDS>
>
> The exception that I am receiving is:
>
> Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
> WARNING: Could not parse a Date field
> java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
>     at java.text.DateFormat.parse(DateFormat.java:337)
>     at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
>     at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>
> I know that it has to be the SortableLong fields, because if I remove just
> those two lines from my dih.xml, everything imports as I expect it to. Am I
> doing something wrong? Am I misusing SortableLongField and/or
> DateFormatTransformer? Is this not supported in my version of SOLR? I'm not
> very experienced with Java, so digging into the code would be a lost cause
> for me right now. I was hoping that somebody here might be able to point me
> in the right direction.
>
> It should be noted that the modified_date and df_date_published fields
> index just fine (so long as I do it as I've defined above).
>
> Thank you,
>
> - Ken
>
> It looked like something resembling white marble, which was
> probably what it was: something resembling white marble.
>                 -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
>
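
One observation that may help anyone picking this up: the string in the
ParseException, "Wed Nov 28 21:39:05 EST 2007", has the shape that
java.util.Date.toString() produces. That suggests the sort_date_* fields are
being handed the already-parsed Date from modified_date / df_date_published
rather than the original text. The sketch below is only a standalone
reproduction under that assumption, not Solr code; the class SortDateSketch
and its methods parses and toSortable are hypothetical names. It shows that
such a string cannot be parsed with "yyyyMMddhhmmss", and that formatting
(rather than re-parsing) a Date yields a sortable numeric string.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration only; not part of Solr or DIH.
public class SortDateSketch {
    // Pattern used for the sort_date_* fields in the quoted dih.xml.
    static final String SORT_FORMAT = "yyyyMMddhhmmss";

    /** Returns true if text can be parsed with the given SimpleDateFormat pattern. */
    static boolean parses(String text, String pattern) {
        try {
            new SimpleDateFormat(pattern, Locale.US).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    /** Formats a Date as a sortable numeric string in UTC. */
    static String toSortable(Date date) {
        // HH is the 24-hour pattern letter; hh is the 12-hour one.
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) throws ParseException {
        // Parse a value shaped like the <PVAL> contents in the sample record.
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date modified = in.parse("2001-12-04T19:38:01Z");

        // Date.toString() output depends on the JVM's default time zone.
        System.out.println(modified);

        // Re-parsing that toString() text with the sort pattern fails.
        System.out.println(parses(modified.toString(), SORT_FORMAT)); // prints "false"

        // Formatting the Date instead gives a sortable value.
        System.out.println(toSortable(modified)); // prints "20011204193801"
    }
}
```

If this is what is happening, the sort fields would need a step that formats
the Date instead of parsing it a second time; how best to do that with the
DataImportHandler in 1.4.1 is still the open question.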
