Judging from your stack trace, your XML document has a value for "ChemicalNameOfSubstance", but no field of that name is defined in schema.xml. Is this your problem?
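One way to check this quickly is to compare the column names in data-config.xml against the fields declared in schema.xml. The following is only a minimal sketch, assuming both files use the usual `<field>` and `<dynamicField>` elements; the function name `missing_fields` is made up for illustration and is not part of Solr. It runs outside Solr and does not reproduce how the DataImportHandler itself resolves fields.

```python
import fnmatch
import xml.etree.ElementTree as ET


def missing_fields(data_config_xml, schema_xml):
    """Return data-config column names not covered by any schema field.

    Both arguments are the XML documents as str or bytes.
    """
    config = ET.fromstring(data_config_xml)
    schema = ET.fromstring(schema_xml)

    # Column names declared on <field column="..."> in data-config.xml.
    columns = [f.get("column") for f in config.iter("field") if f.get("column")]

    # Explicit field names and dynamicField glob patterns from schema.xml.
    explicit = {f.get("name") for f in schema.iter("field")}
    patterns = [d.get("name") for d in schema.iter("dynamicField") if d.get("name")]

    missing = []
    for column in columns:
        if column in explicit:
            continue
        # Treat dynamicField names such as "*_s" or "*" as glob patterns.
        if any(fnmatch.fnmatchcase(column, p) for p in patterns):
            continue
        missing.append(column)
    return missing
```

Reading both files as bytes (for example `open(path, "rb").read()`) and passing them to `missing_fields` returns the columns that neither an explicit field nor a dynamic field pattern would accept. With a catch-all `*` dynamic field in the schema, the list should come back empty.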
The easiest way to get Solr to ignore extra fields that you do not wish to index or store is to add a "catch-all" dynamic field to your schema.xml:

<fields>
  ...all your fields go here...
  <!-- last line at the end of "fields" -->
  <dynamicField name="*" type="string" indexed="false" stored="false" multiValued="true" />
</fields>

This tells Solr to accept any field name that isn't explicitly defined, but to simply ignore it. It overrides Solr's default behavior of throwing an exception in such cases.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Konrad Lötzsch [mailto:konrad.loetz...@antibodies-online.com]
Sent: Friday, December 21, 2012 6:25 AM
To: solr-user@lucene.apache.org
Subject: Can DataImportHandler ignore Missing Tags in XML?

Hi,

we are trying to import Medline into a Solr core. Everything works fine, except that in the XML files from Medline certain tags are sometimes missing. If we define them in the data-config.xml file for our core, the DataImportHandler throws an exception for every tag that is missing:

SCHWERWIEGEND: Exception while solr commit.
java.lang.IllegalArgumentException: no such field ChemicalNameOfSubstance
    at org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
    at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)

Can we tell the DataImportHandler that it should write a default value if the tag is missing?

Here is our data-config.xml (for simplicity, most of the lines that work are omitted):

<dataConfig>
  <dataSource name="medline" type="FileDataSource" encoding="UTF-8" />
  <document name="MedlineCitations">
    <entity name="file"
            processor="FileListEntityProcessor"
            baseDir="/home/"
            fileName=".*xml"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <entity name="MedlineCitation"
              processor="XPathEntityProcessor"
              stream="true"
              forEach="/MedlineCitationSet/MedlineCitation"
              url="${file.fileAbsolutePath}">
        <field column="PMID" xpath="/MedlineCitationSet/MedlineCitation/PMID" />
        <field column="CreationYear" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Year" />
        <field column="CreationMonth" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Month" />
        <field column="CreationDay" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Day" />
        <!-- These cause DataImportHandler exceptions!
        <field column="RevisionYear" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Year" />
        <field column="RevisionMonth" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Month" />
        <field column="RevisionDay" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Day" />
        -->
      </entity>
    </entity>
  </document>
</dataConfig>

With kind regards,
Konrad Lötzsch.

--
Konrad Loetzsch
Dipl. Math
antibodies-online GmbH
Schloß-Rahe-Str. 15
DE-52072 Aachen
Tel.: +49(0)241 9367-2544
konrad.loetz...@antibodies-online.com
www.antikoerper-online.de | www.antibodies-online.com
Registered at Amtsgericht Aachen under HRB 13919
Managing directors: Dr. Tim Hiddemann, Dr. Andreas Kessell