Curious... is it always the same "docFreq=1 != num docs seen 0 + num docs
deleted 0" failure?
It looks like new deletions were flushed against the segment (the del file
changed from _ncc_22s.del to _ncc_24f.del).  Are you hitting any exceptions
during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com> wrote:
> I got another corruption.
>
> It sure looks like it's the same type of error (on a different field).
>
> It's also not linked to a merge, since the segment size did not change.
>
>
> *** good segment:
>
>   1 of 9: name=_ncc docCount=1841685
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=6,683.447
>     diagnostics = {optimize=false, mergeFactor=10,
>       os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
>       lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
>       os.arch=amd64, java.version=1.6.0_20,
>       java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_ncc_22s.del]
>     test: open reader.........OK [275881 deleted docs]
>     test: fields..............OK [51 fields]
>     test: field norms.........OK [51 fields]
>     test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
>       pairs; 204561440 tokens]
>     test: stored fields.......OK [45511958 total field count; avg 29.066
>       fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
>       vector fields per doc]
>
>
> A few hours later:
>
> *** broken segment:
>
>   1 of 17: name=_ncc docCount=1841685
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=6,683.447
>     diagnostics = {optimize=false, mergeFactor=10,
>       os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
>       lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
>       os.arch=amd64, java.version=1.6.0_20,
>       java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_ncc_24f.del]
>     test: open reader.........OK [278167 deleted docs]
>     test: fields..............OK [51 fields]
>     test: field norms.........OK [51 fields]
>     test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num
>       docs seen 0 + num docs deleted 0]
> java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs
>   seen 0 + num docs deleted 0
>         at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>     test: stored fields.......OK [45429565 total field count; avg 29.056
>       fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
>       vector fields per doc]
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
>       exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
>
> I'll activate infoStream for next time.
>
>
> Thanks,
>
>
> On 12/01/2011 00:49, Michael McCandless wrote:
>>
>> When you hit corruption is it always this same problem?:
>>
>>    java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
>>      num docs seen 0 + num docs deleted 0
>>
>> Can you run with Lucene's IndexWriter infoStream turned on, and catch
>> the output leading to the corruption?  If something is somehow messing
>> up the bits in the deletes file that could cause this.
>>
>> Mike
>>
>> On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
>> <stephane.delp...@blogspirit.com> wrote:
>>>
>>> Hi,
>>>
>>> We are using:
>>> Solr Specification Version: 1.4.1
>>> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
>>> Lucene Specification Version: 2.9.3
>>> Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
>>>
>>> # java -version
>>> java version "1.6.0_20"
>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>
>>> We want to index 4M docs in one core (and once that works fine we will
>>> add other cores with 2M docs on the same server). (1 doc ~= 1 kB)
>>>
>>> We use Solr replication every 5 minutes to update the slave server
>>> (queries are executed on the slave only).
>>>
>>> Documents change very quickly; during a normal day we will have approx:
>>> * 200,000 updated docs
>>> * 1,000 new docs
>>> * 200 deleted docs
>>>
>>> I attached the last good checkIndex output: solr20110107.txt
>>> And the corrupted one: solr20110110.txt
>>>
>>> This is not the first time a segment has gotten corrupted on this
>>> server; that's why I run checkIndex frequently. (But as you can see the
>>> first segment holds 1,800,000 docs and it checks out fine!)
>>>
>>> I can't find any "SEVERE", "FATAL" or "exception" entries in the Solr
>>> logs.
>>>
>>> I also attached my schema.xml and solrconfig.xml.
>>>
>>> Is there something wrong with what we are doing? Do you need other
>>> info?
>>>
>>> Thanks,
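[Editor's note] For readers following this thread: the infoStream Mike asks
for can be switched on from solrconfig.xml in Solr 1.4. This is a sketch
based on the stock 1.4 example config (the element ships commented out in
the indexDefaults section; the file name is whatever you choose):

```xml
<!-- in solrconfig.xml, inside the <indexDefaults> section -->
<!-- routes IndexWriter's low-level debug output to the named file;
     expect it to grow quickly on a busy index -->
<infoStream file="INFOSTREAM.txt">true</infoStream>
```

After restarting Solr, the file will record flushes, merges, and deletion
commits, which is the output Mike wants to see leading up to a corruption.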