Hi all,

I'm trying to index optional float values from an XML file, and I can't get
rid of a NumberFormatException (NFE).
I'm using Solr 3.6.



Index input:
- XML using DataImportHandler with XPathProcessor

Data:
Optional float in the element text, like: <estimated_hours>2.0</estimated_hours>
or <estimated_hours/>

Original Problem:
Empty values would cause a NumberFormatException when being loaded directly 
into a "tfloat" type field.
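For reference, the root cause can be reproduced outside Solr. According to the
stack trace in this post, TrieField.createField hands the raw string straight to
java.lang.Float.parseFloat, which rejects an empty string. A minimal standalone
sketch (the class name EmptyFloatDemo is made up for illustration):

```java
public class EmptyFloatDemo {
    // Returns the NumberFormatException message for the given input,
    // or null if the input parses as a float.
    static String parseError(String raw) {
        try {
            Float.parseFloat(raw);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError("2.0")); // null
        System.out.println(parseError(""));    // empty String
    }
}
```

The message "empty String" matches the "Caused by" line of the trace.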

Processing chain (to avoid NFE):
Load the value via XPath into a text-type field with a trim filter and a length
filter, then copy it into the tfloat field with a copyField directive.

data-config.xml:
<field column="s_estimated_hours" xpath="/issues/issue/estimated_hours" />

schema.xml:
<types>...
                <fieldtype name="text_not_empty" class="solr.TextField">
                        <analyzer>
                                <tokenizer class="solr.KeywordTokenizerFactory" />
                                <filter class="solr.TrimFilterFactory" />
                                <filter class="solr.LengthFilterFactory" min="1" max="20" />
                        </analyzer>
                </fieldtype>
</types>

<fields>...
                <field name="estimated_hours" type="tfloat" indexed="true" stored="true" required="false" />
                <field name="s_estimated_hours" type="text_not_empty" indexed="false" stored="false" />
</fields>

        <copyField source="s_estimated_hours" dest="estimated_hours" />

Problem:
Well, yet another NFE, but this time it is reported on the text field
"s_estimated_hours":

WARNUNG: Error creating document : SolrInputDocument[{id=id(1.0)={2930}, s_estimated_hours=s_estimated_hours(1.0)={}}]
org.apache.solr.common.SolrException: ERROR: [doc=2930] Error adding field 's_estimated_hours'=''
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:992)
        at java.lang.Float.parseFloat(Float.java:422)
        at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
        at org.apache.solr.schema.FieldType.createFields(FieldType.java:289)
        at org.apache.solr.schema.SchemaField.createFields(SchemaField.java:107)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:312)
        ... 11 more


It looks as if the empty value, which should never get past the LengthFilter of
"s_estimated_hours", is copied to the tfloat field "estimated_hours" anyway.
How can I avoid this? Or is there another way to make the indexer ignore empty
values when creating the tfloat fields? If it could at least create the document
and index the other values... (onError="continue" doesn't help, because this is
only logged as a warning; I've tried.)
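A likely explanation: copyField copies the raw input value of the source field
before any analysis runs, so the TrimFilter and LengthFilter on
"s_estimated_hours" only affect what would be indexed for that field, not what
reaches "estimated_hours". One way around it, sketched here as an untested
assumption rather than a verified fix, is to drop blank values inside
DataImportHandler before the document is built. As far as I know, DIH accepts as
a transformer any class with a public Object transformRow(Map<String, Object> row)
method, referenced by its fully qualified name in the entity's transformer
attribute. The class name DropEmptyValuesTransformer is hypothetical.

```java
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical DIH transformer (name and packaging are assumptions).
 * Removes every column whose value is null or blank, so an empty
 * <estimated_hours/> is never sent to Solr as "".
 * Only single-valued (non-collection) column values are checked.
 */
public class DropEmptyValuesTransformer {

    // Called once per row; the returned map is what DIH turns into a document.
    public Object transformRow(Map<String, Object> row) {
        Iterator<Object> values = row.values().iterator();
        while (values.hasNext()) {
            Object value = values.next();
            if (value == null || value.toString().trim().length() == 0) {
                values.remove(); // removes the whole column from the row
            }
        }
        return row;
    }
}
```

If this works as intended, the XPath could then map directly to "estimated_hours"
and the text_not_empty/copyField detour would no longer be needed.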


BTW: I also tried an XPath that should select only nodes with text content:
/issues/issue/estimated_hours[text()]
With it, all documents were indexed without warnings or errors, but no values
made it into the tfloat fields. (I discarded this option, assuming the XPath
was not evaluated correctly.)
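In standard XPath 1.0 the expression does what was intended, which can be checked
with the JDK's built-in XPath engine. As far as I know, DIH's XPathEntityProcessor
implements only a streaming subset of XPath (plain paths and attribute predicates
such as [@attr='value']), which would explain the silently empty result. The
sample XML and class name below are made up to mirror the structure in the post.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextPredicateDemo {
    // Hypothetical sample: one issue with a value, one with an empty element.
    static final String XML =
        "<issues>"
        + "<issue><estimated_hours>2.0</estimated_hours></issue>"
        + "<issue><estimated_hours/></issue>"
        + "</issues>";

    // Counts the nodes matched by expr using the JDK's full XPath 1.0 engine.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/issues/issue/estimated_hours"));         // 2
        System.out.println(count("/issues/issue/estimated_hours[text()]")); // 1
    }
}
```

Note that [text()] would still match an element containing only whitespace.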


Thank you for any suggestions!
Chantal
