Hi, I'm getting the following error when trying to index PDF documents using
the MapReduceIndexerTool in Cloudera:

<http://lucene.472066.n3.nabble.com/file/n4332881/Screenshot_from_2017-04-28_21-39-13.png>
 

The cause of the error is:
org.apache.lucene.index.IndexFormatTooNewException: Format version is not
supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
be between 0 and 3).

From what I've read, this exception is thrown when Lucene encounters an
index written in a format newer than the running Lucene version supports.
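
As I understand it, Lucene writes a format version into each file header and,
on read, rejects any version outside the range the running code knows about
(this is my reading of Lucene's header check, not something I have verified in
the 4.10.3 source). The sketch below is my own simplified model of that check in
shell, not Lucene code; the function name `check_format_version` is made up. It
uses the numbers from the error above: `segments_1` reports version 4, while the
reader only accepts 0 to 3.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of a header version check (not Lucene code).
# A reader accepts only versions in [min_version, max_version]; anything
# below the range is "too old", anything above it is "too new".
check_format_version() {
  local actual="$1" min_version="$2" max_version="$3"
  if [ "$actual" -lt "$min_version" ]; then
    echo "IndexFormatTooOldException: $actual (needs to be between $min_version and $max_version)"
    return 1
  fi
  if [ "$actual" -gt "$max_version" ]; then
    echo "IndexFormatTooNewException: $actual (needs to be between $min_version and $max_version)"
    return 1
  fi
  echo "ok"
}

# Values taken from the error message: the file says 4, the reader allows 0..3.
check_format_version 4 0 3
# A version inside the range is accepted.
check_format_version 3 0 3
```

If this model is right, the `segments_1` file was written by a newer Lucene than
the one doing the reading, which suggests the job may be mixing Lucene versions
somewhere.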

My configuration is:
Solr: 4.10.3
Cloudera: 5.8.0
Hadoop: 2.6.0
To build the index, I'm following this tutorial:
<https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_batch_index_use_mapreduce.html>
 

I'm running the following hadoop command:
hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-D dfs.replication=1 \
--morphline-file /root/$COLLECTION/conf/pdf_morphlines.conf \
--output-dir hdfs://localhost:8020/user/$USER/outdir --verbose \
--solr-home-dir $HOME/$COLLECTION --shards 1 \
hdfs://localhost:8020/user/$USER/indir

The morphlines file:
pdf_morphlines.conf
<http://lucene.472066.n3.nabble.com/file/n4332881/pdf_morphlines.conf>  

And the schema file:
schema.xml <http://lucene.472066.n3.nabble.com/file/n4332881/schema.xml>  

Thank you.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/IndexFormatTooNewException-MapReduceIndexerTool-for-PDF-files-tp4332881.html
Sent from the Solr - User mailing list archive at Nabble.com.