Curious... is it always the same "docFreq=1 != num docs seen 0 + num docs
deleted 0" failure?
It looks like new deletions were flushed against the segment (the del file
changed from _ncc_22s.del to _ncc_24f.del).  Are you hitting any exceptions
during indexing?

Mike

On Wed, Jan 12, 2011 at 10:33 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com> wrote:
> I got another corruption.
>
> It sure looks like it's the same type of error (on a different field).
>
> It's also not linked to a merge, since the segment size did not change.
>
>
> *** good segment:
>
>   1 of 9: name=_ncc docCount=1841685
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=6,683.447
>     diagnostics = {optimize=false, mergeFactor=10,
>       os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
>       lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
>       os.arch=amd64, java.version=1.6.0_20,
>       java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_ncc_22s.del]
>     test: open reader.........OK [275881 deleted docs]
>     test: fields..............OK [51 fields]
>     test: field norms.........OK [51 fields]
>     test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
>       pairs; 204561440 tokens]
>     test: stored fields.......OK [45511958 total field count; avg 29.066
>       fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
>       vector fields per doc]
>
>
> A few hours later:
>
> *** broken segment:
>
>   1 of 17: name=_ncc docCount=1841685
>     compound=false
>     hasProx=true
>     numFiles=9
>     size (MB)=6,683.447
>     diagnostics = {optimize=false, mergeFactor=10,
>       os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
>       lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
>       os.arch=amd64, java.version=1.6.0_20,
>       java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_ncc_24f.del]
>     test: open reader.........OK [278167 deleted docs]
>     test: fields..............OK [51 fields]
>     test: field norms.........OK [51 fields]
>     test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 != num
>       docs seen 0 + num docs deleted 0]
> java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs
>   seen 0 + num docs deleted 0
>         at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>     test: stored fields.......OK [45429565 total field count; avg 29.056
>       fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq
>       vector fields per doc]
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full
>       exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
>
> I'll activate infoStream for next time.
>
>
> Thanks,
>
>
> On 12/01/2011 00:49, Michael McCandless wrote:
>>
>> When you hit corruption is it always this same problem?:
>>
>>    java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
>>      num docs seen 0 + num docs deleted 0
>>
>> Can you run with Lucene's IndexWriter infoStream turned on, and catch
>> the output leading to the corruption?  If something is somehow messing
>> up the bits in the deletes file that could cause this.
>>
>> Mike
>>
>> On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
>> <stephane.delp...@blogspirit.com> wrote:
>>>
>>> Hi,
>>>
>>> We are using:
>>> Solr Specification Version: 1.4.1
>>> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
>>> Lucene Specification Version: 2.9.3
>>> Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
>>>
>>> # java -version
>>> java version "1.6.0_20"
>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>
>>> We want to index 4M docs in one core (and once that works fine we will
>>> add other cores with 2M docs on the same server). (1 doc ~= 1 kB)
>>>
>>> We use Solr replication every 5 minutes to update the slave server
>>> (queries are executed on the slave only).
>>>
>>> Documents change very quickly; during a normal day we will have approx:
>>> * 200,000 updated docs
>>> * 1,000 new docs
>>> * 200 deleted docs
>>>
>>> I attached the last good checkIndex output: solr20110107.txt
>>> And the corrupted one: solr20110110.txt
>>>
>>> This is not the first time a segment has gotten corrupted on this
>>> server; that's why I run checkIndex frequently. (But as you can see the
>>> first segment holds 1,800,000 docs and it checks out fine!)
>>>
>>> I can't find any "SEVERE", "FATAL" or "exception" entries in the Solr
>>> logs.
>>>
>>> I also attached my schema.xml and solrconfig.xml.
>>>
>>> Is there something wrong with what we are doing? Do you need other
>>> info?
>>>
>>> Thanks,
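[Editor's note] For readers following this thread: the infoStream Mike asks
for can be switched on from solrconfig.xml in Solr 1.4. This is a sketch
based on the stock 1.4 example config (the element ships commented out in
the indexDefaults section; the file name is whatever you choose):

```xml
<!-- in solrconfig.xml, inside the <indexDefaults> section -->
<!-- routes IndexWriter's low-level debug output to the named file;
     expect it to grow quickly on a busy index -->
<infoStream file="INFOSTREAM.txt">true</infoStream>
```

After restarting Solr, the file will record flushes, merges, and deletion
commits, which is the output Mike wants to see leading up to a corruption.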