Hi all,

I'm encountering a problem when I try to add records with a date field to the index.

The records I'm adding have very little date precision: usually YYYYMMDD, but some have only a year and month, and others only a year. I'm trying to get around this by using PatternReplaceFilterFactory filters to rewrite the field before indexing. This works fine if the field class is solr.TextField: a date such as 1953 is converted to 1953-01-01T00:00:00.000Z and then inserted into the index.
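The conversion described above (pad a missing month and day with 01, then append a midnight UTC time) can be sketched in Java as follows. This is an illustrative sketch, not part of Solr: the class and method names (PartialDate, toSolrDate) are hypothetical, it assumes undelimited input (YYYY, YYYYMM or YYYYMMDD), and it assumes Java 8 or later for java.time. Something like this could run on the client before the XML is posted.

```java
import java.time.LocalDate;

/**
 * Hypothetical helper: pads a partial date (YYYY, YYYYMM or YYYYMMDD) out to
 * the full form used in the post, e.g. "1953" -> "1953-01-01T00:00:00.000Z".
 * A missing month or day defaults to 01.
 */
class PartialDate {

    static String toSolrDate(String raw) {
        String s = raw.trim();
        // Accept exactly 4, 6 or 8 digits; anything else is rejected.
        if (!s.matches("\\d{4}|\\d{6}|\\d{8}")) {
            throw new IllegalArgumentException("Unsupported date: '" + raw + "'");
        }
        int year = Integer.parseInt(s.substring(0, 4));
        int month = s.length() >= 6 ? Integer.parseInt(s.substring(4, 6)) : 1;
        int day = s.length() == 8 ? Integer.parseInt(s.substring(6, 8)) : 1;
        // LocalDate.of throws DateTimeException for impossible dates such as 19530231.
        LocalDate date = LocalDate.of(year, month, day);
        // LocalDate.toString() yields yyyy-MM-dd for four-digit years.
        return date + "T00:00:00.000Z";
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1953", "195304", "19530415"}) {
            System.out.println(raw + " -> " + toSolrDate(raw));
        }
    }
}
```

Using LocalDate rather than plain string padding means an impossible date fails loudly on the client instead of being sent to Solr.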

However, if I make the field an actual date field (so that I can do range searches etc.), I get the following error when I post the XML file:

        SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953

The corresponding stack trace from the solr server is:

Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
        at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
        at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
        at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
        at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

My schema.xml file looks something like this:

...
<fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
                <analyzer>
                        <filter class="solr.TrimFilterFactory" />
                        <tokenizer class="solr.KeywordTokenizerFactory"/>
                        <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all" />
                        <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all" />
                        <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
                </analyzer>
   </fieldType>
...
<field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>
...
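To see what the three PatternReplaceFilterFactory rules in this schema do to a raw value, they can be replayed in order with java.util.regex, which is the regex engine whose `$1`-style replacement syntax the schema uses. This is a sketch under stated assumptions: the class name FilterChain is hypothetical, the patterns are copied from the schema with the line-wrap spaces removed, and `trim()` stands in for TrimFilterFactory. It only reproduces the text transformation; it says nothing about whether a solr.DateField field runs the analyzer at all, which is the open question here.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical replay of the schema's filter chain: each rule is applied
 * in order to the output of the previous one.
 */
class FilterChain {

    // {pattern, replacement} pairs, in the same order as the schema's filters.
    private static final String[][] RULES = {
        {"^(\\d{4})$", "$1.01.01"},
        {"^(\\d{4})\\.(\\d{2})$", "$1.$2.01"},
        {"^(\\d{4})\\.(\\d{2})\\.(\\d{2})$", "$1-$2-$3T00:00:00.000Z"},
    };

    static String apply(String raw) {
        String value = raw.trim(); // stands in for TrimFilterFactory
        for (String[] rule : RULES) {
            // replace="all" corresponds to Matcher.replaceAll
            value = Pattern.compile(rule[0]).matcher(value).replaceAll(rule[1]);
        }
        return value;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1953", "1953.04", "1953.04.15", "19530415"}) {
            System.out.println(raw + " -> " + apply(raw));
        }
    }
}
```

As written, the rules only rewrite dot-separated values: "1953" becomes "1953.01.01" after the first rule and is then finished by the third, while an undelimited value such as 19530415 passes through all three unchanged.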


My guess is that Solr is trying to add the raw value '1953' to the date field before the pattern filters run, so it isn't yet in the right format for indexing. Does that sound like a reasonable assumption? Am I missing something that is causing this to go wrong? Can anyone help, please?

I was originally storing the date in YYMMDD format as a text field and searching with wildcards, but that strikes me as somewhat inefficient. I could go back to doing that if necessary, but I'd rather do it the right way if I can.

Many thanks for your help.

Mark
P.S. Apologies if this message comes through twice; I sent it yesterday afternoon, but it hasn't turned up on the mailing list yet, so I'm trying again.

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
