Unexplainable indexing i/o errors

simon Mon, 27 Mar 2017 12:49:31 -0700

I'm seeing an odd error during indexing for which I can't find any reason.

The relevant Solr log entry:


2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
...
    Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
        at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)

This was followed within a few seconds by:

2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
...
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)

This error was repeated a few times as indexing continued and further
autocommits were triggered.
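
To check whether every repeat names the same file, the paths in the "read past EOF" messages can be tallied from the log. The following is a minimal sketch and not part of the indexing setup described here; the function name count_eof_paths and the sample text are hypothetical, and the pattern assumes the message format shown in the excerpts above.

```python
import re
from collections import Counter

# Matches 'read past EOF:' followed by MMapIndexInput(path="...") and
# captures the path, as the message appears in the log excerpts.
EOF_RE = re.compile(r'read past EOF:\s*MMapIndexInput\(path="([^"]+)"\)')

def count_eof_paths(log_text):
    """Count how often each index file is named in a 'read past EOF' message."""
    # Join hard-wrapped lines so a message split across lines still matches.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    return Counter(m.group(1) for m in EOF_RE.finditer(flat))

# Hypothetical sample built from the two excerpts above.
sample = '''error...:java.io.EOFException: read past EOF:
 MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
Caused by: java.io.EOFException: read past EOF:
MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")'''

for path, n in count_eof_paths(sample).items():
    print(n, path)
```

If a single segment file accounts for every hit, that points at one bad or half-written file rather than a general I/O problem.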

I stopped the indexing process, made a backup snapshot of the index,
restarted indexing at a checkpoint, and everything then completed without
further incident.

I ran checkIndex on the saved snapshot and it reported no errors
whatsoever. Operations on the complete index (including an optimize and
several query scripts) have all been error-free.
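
For reference, CheckIndex is run through the org.apache.lucene.index.CheckIndex main class in the lucene-core jar. The following is a minimal sketch of building that command line; the jar location is a placeholder assumption (it depends on the install), and the helper name build_checkindex_cmd is hypothetical.

```python
def build_checkindex_cmd(lucene_core_jar, index_dir):
    """Build the java command line that runs Lucene's CheckIndex on index_dir."""
    return [
        "java",
        "-cp", lucene_core_jar,                  # assumption: path to lucene-core-6.3.0.jar
        "org.apache.lucene.index.CheckIndex",    # CheckIndex main class
        index_dir,                               # directory holding the segments_N file
    ]

cmd = build_checkindex_cmd(
    "/path/to/lucene-core-6.3.0.jar",            # placeholder, not a real location
    "/indexes/solrindexes/build0324.bad/index",  # snapshot path from the output below
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd)
```

Run without extra options, CheckIndex only reads the index and reports, so it is safe to point at a snapshot.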

Some background:
 Solr information from the beginning of the checkindex output:
 -------
 Opening index @ /indexes/solrindexes/build0324.bad/index

Segments file=segments_9s numSegments=105 version=6.3.0 id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
  1 of 105: name=_be maxDoc=1227144
    version=6.3.0
    id=7m1ldieoje0m6sljp7xocburb
    codec=Lucene62
    compound=false
    numFiles=14
    size (MB)=4,926.186
    diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0, mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13, source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64, timestamp=1490380905920}
    no deletions
    test: open reader.........OK [took 0.176 sec]
    test: check integrity.....OK [took 37.399 sec]
    test: check live docs.....OK [took 0.000 sec]
    test: field infos.........OK [49 fields] [took 0.000 sec]
    test: field norms.........OK [17 fields] [took 0.030 sec]
    test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs pairs; 801208966 tokens] [took 30.005 sec]
    test: stored fields.......OK [150164874 total field count; avg 122.4 fields per doc] [took 35.321 sec]
    test: term vectors........OK [4804967 total term vector count; avg 3.9 term/freq vector fields per doc] [took 55.857 sec]
    test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1 NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
    test: points..............OK [0 fields, 0 points] [took 0.000 sec]
  -----

The indexing process is a Python script (using the scorched Python
client) which spawns multiple instances of itself, in this case 6, so there
are definitely concurrent calls (to /update/json).

Solrconfig and the schema have not been changed for several months, during
which time many ingests have been done, and the documents which were being
indexed at the time of the error have been indexed before without problems,
so I don't think it's a data issue.

I saw the same error occur earlier in the day, and decided at that time to
delete the core and restart the Solr instance.

The server is an Amazon instance running CentOS 7. I checked the system
logs and didn't see any evidence of hardware errors.

I'm puzzled as to why this would start happening out of the blue, and I
can't find any particularly relevant posts on this forum or Stack Exchange.
Does anyone have an idea what's going on?

-Simon
