I modified TikaEntityProcessor to ignore these exceptions.
If TikaEntityProcessor encounters an exception it will stop indexing, so I
had to make two fixes to it to work around this problem.
From the Solr SVN trunk, edit the file:
~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java
First, if a file is not found on disk we want to continue indexing. At the
top of nextRow() add:
File f = new File(context.getResolvedEntityAttribute(URL));
if (! f.exists()) {
return null;
}
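The idea of this guard can be sketched in isolation. The following is a minimal, self-contained illustration of the skip-missing-file pattern (the class, method names, and file path here are hypothetical, not part of the Solr code):

```java
import java.io.File;

public class SkipMissingFiles {
    // Mirrors the guard added to nextRow(): when the resolved file path
    // does not exist on disk, return null to skip the row instead of
    // letting a downstream FileNotFoundException abort the whole import.
    static String indexFile(String path) {
        File f = new File(path);
        if (!f.exists()) {
            return null; // skip this row, keep indexing the rest
        }
        return "indexed:" + path;
    }

    public static void main(String[] args) {
        // A path that does not exist is skipped rather than throwing.
        System.out.println(indexFile("/tmp/definitely-missing-file-xyz.pdf"));
    }
}
```

Returning null from nextRow() tells the DataImportHandler that this entity has no more rows, so the import moves on rather than failing.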
Second, if the document parser throws an error (for example, certain PDF
revisions can cause the PDFBox parser to fail), we trap the exception and
continue:
try {
tikaParser.parse(is, contentHandler, metadata, new ParseContext());
} catch (Exception e) {
return null;
} finally {
IOUtils.closeQuietly(is);
}
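The trap-and-continue shape above can also be shown standalone. This is a sketch under stated assumptions: the parse() method and class names are hypothetical stand-ins, and closeQuietly() is a simplified stand-in for Commons IO's IOUtils.closeQuietly():

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class TrapAndContinue {
    // Stand-in for IOUtils.closeQuietly: close, swallowing any IOException.
    static void closeQuietly(Closeable c) {
        if (c == null) return;
        try { c.close(); } catch (IOException ignored) { }
    }

    // Stand-in for the Tika parse call: throws on "bad" input.
    static String parse(InputStream is) throws Exception {
        if (is.read() == 'X') throw new Exception("parser failed");
        return "parsed";
    }

    // Mirrors the patched code: trap the parser exception and return
    // null (skip the document), closing the stream in either case.
    static String parseRow(InputStream is) {
        try {
            return parse(is);
        } catch (Exception e) {
            return null; // skip this document, keep indexing
        } finally {
            closeQuietly(is);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRow(new ByteArrayInputStream("ok".getBytes())));
        System.out.println(parseRow(new ByteArrayInputStream("Xbad".getBytes())));
    }
}
```

A document that parses returns a row as before; one that throws is silently dropped, and the stream is closed on both paths.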
We also close the input stream with IOUtils.closeQuietly() in the finally
block, which the original code does not do. Build and deploy the extras.jar
in the solr-instance/lib directory.
see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration
--
View this message in context:
http://lucene.472066.n3.nabble.com/Resume-Solr-indexing-CSV-after-exception-tp878801p888143.html
Sent from the Solr - User mailing list archive at Nabble.com.