Hi all,
I'm encountering a problem when I try to add records with a date field
to the index.
The records I'm adding have varying date precision: usually
YYYYMMDD, but some have only a year and month, and others only a year.
I'm trying to get around this by using pattern-replace filters
(solr.PatternReplaceFilterFactory) to modify the field before indexing.
This seems to work fine if the field type's class is solr.TextField: a
date such as 1953 is converted to 1953-01-01T00:00:00.000Z and then
inserted into the index.
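To make the intended conversion concrete, here is a minimal Python sketch of the same padding done outside Solr. It is illustrative only and not part of my setup: the name `pad_partial_date` is hypothetical, and it assumes the input is YYYY, YYYYMM or YYYYMMDD, with dots between the parts being optional.

```python
import re
from datetime import date

# Hypothetical helper, not a Solr API: accepts YYYY, YYYYMM or YYYYMMDD,
# optionally with dots between the parts (e.g. 1953.06.02).
_PARTIAL = re.compile(r"^(\d{4})(?:\.?(\d{2}))?(?:\.?(\d{2}))?$")


def pad_partial_date(raw):
    """Pad a partial date to a full UTC timestamp string."""
    match = _PARTIAL.match(raw.strip())
    if match is None:
        raise ValueError("unrecognised date: %r" % raw)
    # Missing month or day defaults to 1.
    year, month, day = (int(g) if g else 1 for g in match.groups())
    # date() raises ValueError for impossible values such as month 13.
    checked = date(year, month, day)
    return checked.isoformat() + "T00:00:00.000Z"


print(pad_partial_date("1953"))
print(pad_partial_date("195306"))
print(pad_partial_date("19530602"))
```

Going through `date` rather than plain string concatenation means an impossible value such as 195313 is rejected up front instead of reaching the index.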
However, if I want the field to be an actual date field (for range
searches etc.), I get the following error when I post the XML file:
SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953
The corresponding stack trace from the solr server is:
Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
    at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
    at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
My schema.xml file looks something like this:
...
<fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <filter class="solr.TrimFilterFactory" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
  </analyzer>
</fieldType>
...
<field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>
...
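My thinking is that Solr is trying to add the value directly as '1953'
before running the pattern-replace filters, so it is not in the right
format when the date field parses it. The stack trace seems consistent
with this: the exception is thrown from DateField.toInternal while the
document is still being built. Does that sound like a reasonable
assumption, or am I missing something that is causing it to go wrong?
Can anyone help, please?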
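If that assumption holds, one workaround would be to normalize the values in the XML file before posting it, so the date field only ever sees a full timestamp. Below is a rough Python sketch of that idea. It assumes the usual `<add><doc><field name="...">` update format; `rewrite_field` and the `id` field are hypothetical, and the lambda is a stand-in that handles year-only values and nothing else.

```python
import xml.etree.ElementTree as ET


def rewrite_field(add_xml, field_name, normalize):
    """Apply normalize() to the text of every <field name=field_name>."""
    root = ET.fromstring(add_xml)
    for field in root.iter("field"):
        # Only touch the named field, and skip empty elements.
        if field.get("name") == field_name and field.text:
            field.text = normalize(field.text)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<add><doc>'
    '<field name="id">42</field>'
    '<field name="DateRecorded">1953</field>'
    '</doc></add>'
)

# Stand-in normalizer (an assumption): pads a bare year only.
print(rewrite_field(sample, "DateRecorded",
                    lambda v: v.strip() + "-01-01T00:00:00.000Z"))
```

Passing the normalizer in as a function keeps the XML handling separate from the date rules, so the year-month and full-date cases can be added without touching the rewriting code.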
I was originally storing the date in YYYYMMDD format as a text field and
searching with wildcards, but that strikes me as somewhat
inefficient. I could go back to doing that if necessary, but I'd
rather do it the right way if I can.
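For comparison, this is the kind of range search I'd like to run once the field is a real date field, in place of a wildcard such as 1953* on the text field. It is a minimal Python sketch that only builds the query string; `year_range_query` is a hypothetical helper, and it assumes every stored value is a full UTC timestamp.

```python
def year_range_query(field, year):
    """Build an inclusive range query covering one calendar year."""
    start = "%04d-01-01T00:00:00Z" % year
    end = "%04d-12-31T23:59:59Z" % year
    return "%s:[%s TO %s]" % (field, start, end)


print(year_range_query("DateRecorded", 1953))
```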
Many thanks for your help.
Mark
PS. Apologies if this message comes through twice - I sent it
yesterday afternoon but it hasn't turned up on the mailing list yet,
so I'm trying again.