Hi, I'm getting the following error when trying to index PDF documents with the MapReduceIndexerTool in Cloudera:
<http://lucene.472066.n3.nabble.com/file/n4332881/Screenshot_from_2017-04-28_21-39-13.png>

The cause of the error is:

    org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to be between 0 and 3)

From what I've read, this exception is thrown when Lucene finds an index that was written in a newer format than the running Lucene version can read.

My configuration is:

    Solr: 4.10.3
    Cloudera: 5.8.0
    Hadoop: 2.6.0

To index, I'm following this tutorial:
<https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_batch_index_use_mapreduce.html>

using the following hadoop command:

    hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar \
      org.apache.solr.hadoop.MapReduceIndexerTool \
      -D mapreduce.job.maps=1 \
      -D mapreduce.job.reduces=1 \
      -D dfs.replication=1 \
      --morphline-file /root/$COLLECTION/conf/pdf_morphlines.conf \
      --output-dir hdfs://localhost:8020/user/$USER/outdir --verbose \
      --solr-home-dir $HOME/$COLLECTION --shards 1 \
      hdfs://localhost:8020/user/$USER/indir

The morphlines file: pdf_morphlines.conf <http://lucene.472066.n3.nabble.com/file/n4332881/pdf_morphlines.conf>
And the schema file: schema.xml <http://lucene.472066.n3.nabble.com/file/n4332881/schema.xml>

Thank you.

-- 
View this message in context: http://lucene.472066.n3.nabble.com/IndexFormatTooNewException-MapReduceIndexerTool-for-PDF-files-tp4332881.html
Sent from the Solr - User mailing list archive at Nabble.com.