Hi all, I'm trying to index optional float values from an XML input file, and I can't get rid of a NumberFormatException (NFE). I'm using Solr 3.6.
Index input: XML, imported with the DataImportHandler (DIH) and the XPathEntityProcessor.

Data: optional float values as character data, e.g. <estimated_hours>2.0</estimated_hours> or <estimated_hours/>

Original problem: empty values cause a NumberFormatException when they are loaded directly into a field of type "tfloat".

Processing chain (to avoid the NFE): load the value via XPath into a text field with a trim filter and a length filter, then copy it into the tfloat field with a copyField directive.

data-config.xml:

    <field column="s_estimated_hours" xpath="/issues/issue/estimated_hours" />

schema.xml:

    <types>
      ...
      <fieldtype name="text_not_empty" class="solr.TextField">
        <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.LengthFilterFactory" min="1" max="20" />
        </analyzer>
      </fieldtype>
    </types>
    <fields>
      ...
      <field name="estimated_hours" type="tfloat" indexed="true" stored="true" required="false" />
      <field name="s_estimated_hours" type="text_not_empty" indexed="false" stored="false" />
    </fields>
    <copyField source="s_estimated_hours" dest="estimated_hours" />

Problem: yet another NFE.
But this time it is reported on the text field "s_estimated_hours":

    WARNUNG: Error creating document : SolrInputDocument[{id=id(1.0)={2930}, s_estimated_hours=s_estimated_hours(1.0)={}}]
    org.apache.solr.common.SolrException: ERROR: [doc=2930] Error adding field 's_estimated_hours'=''
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
    Caused by: java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:992)
        at java.lang.Float.parseFloat(Float.java:422)
        at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
        at org.apache.solr.schema.FieldType.createFields(FieldType.java:289)
        at org.apache.solr.schema.SchemaField.createFields(SchemaField.java:107)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:312)
        ... 11 more

It looks as if the empty value, which should never get through the LengthFilter of "s_estimated_hours", is copied to the tfloat field "estimated_hours" anyway. How can I avoid this?
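The trace is consistent with how copyField works: Solr copies the original input value of the source field to the destination before any analyzer runs, so the Trim/Length filters on "s_estimated_hours" never get a chance to remove the empty string, and the tfloat field then calls Float.parseFloat on it (visible in the "Caused by" part). The plain-Java sketch below only illustrates that ordering; it does not use Solr classes. The names analyzeTextNotEmpty, copyToTfloat and dropBlankValues are hypothetical stand-ins, and dropBlankValues assumes the blank value can be removed from the row before the document is built (for example in a custom DIH transformer, not shown here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class EmptyFloatSketch {
    // Hypothetical stand-in for the "text_not_empty" analyzer:
    // KeywordTokenizer (whole value = one token) -> TrimFilter -> LengthFilter(min=1, max=20).
    static List<String> analyzeTextNotEmpty(String raw) {
        List<String> tokens = new ArrayList<>();
        String token = raw.trim();
        if (token.length() >= 1 && token.length() <= 20) {
            tokens.add(token);
        }
        return tokens;
    }

    // copyField hands the RAW source value to the destination, so the float
    // field parses it directly; the source analyzer above is never involved.
    static float copyToTfloat(String raw) {
        return Float.parseFloat(raw);
    }

    // Hypothetical row cleanup applied BEFORE the document is built:
    // drop columns whose value is null or blank, trim the remaining strings.
    static Map<String, Object> dropBlankValues(Map<String, Object> row) {
        Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            Object value = entry.getValue();
            if (value == null || (value instanceof String && ((String) value).trim().isEmpty())) {
                it.remove(); // no column -> no field -> nothing to copy or parse
            } else if (value instanceof String) {
                entry.setValue(((String) value).trim());
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String raw = ""; // the value produced by <estimated_hours/>
        System.out.println("tokens kept by s_estimated_hours: " + analyzeTextNotEmpty(raw));
        try {
            copyToTfloat(raw);
        } catch (NumberFormatException e) {
            System.out.println("copy into estimated_hours failed: " + e.getMessage());
        }

        Map<String, Object> row = new HashMap<>();
        row.put("id", "2930");
        row.put("s_estimated_hours", raw);
        System.out.println("row after cleanup: " + dropBlankValues(row));
    }
}
```

The point of the sketch is the order of operations: filtering tokens in the source field cannot help the destination, because the destination receives the untouched input. Removing the blank value before the field is added at all avoids both the copy and the parse.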
Or is there another way to make the indexer skip empty values when creating the tfloat fields? It would already help if it at least created the document with the other values. (onError="continue" doesn't help, because this is only reported as a warning; I've tried.)

BTW: I also tried an XPath that should select only the nodes that contain text:

    /issues/issue/estimated_hours[text()]

With that, no values at all made it into the tfloat fields, while all documents were indexed without warnings or errors. (I discarded this option, assuming the XPath was not evaluated correctly.)

Thank you for any suggestions!
Chantal
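For comparison, the same expression can be checked against a full XPath engine. The sketch below evaluates it with the JDK's standard javax.xml.xpath API on a made-up two-issue sample (one element with a value, one empty); the class name, the sample document and the count helper are hypothetical. In standard XPath, the [text()] predicate keeps only elements that have a text child, so the expression itself looks valid. DIH's XPathEntityProcessor implements only a subset of XPath, which may be why it behaved differently there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextPredicateCheck {
    // Hypothetical sample: one issue with a value, one with an empty element.
    static final String XML =
          "<issues>"
        + "<issue><estimated_hours>2.0</estimated_hours></issue>"
        + "<issue><estimated_hours/></issue>"
        + "</issues>";

    // Counts the nodes selected by a standard XPath expression on the sample.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("all elements:        " + count("/issues/issue/estimated_hours"));
        System.out.println("elements with text:  " + count("/issues/issue/estimated_hours[text()]"));
    }
}
```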