I got another corruption.
It looks like the same type of error (on a different field).
It's also not linked to a merge, since the segment size did not change.
*** good segment:
1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10,
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_22s.del]
test: open reader.........OK [275881 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...OK [17952652 terms; 174113812 terms/docs
pairs; 204561440 tokens]
test: stored fields.......OK [45511958 total field count; avg
29.066 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
A few hours later:
*** broken segment:
1 of 17: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size (MB)=6,683.447
diagnostics = {optimize=false, mergeFactor=10,
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true,
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge,
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_ncc_24f.del]
test: open reader.........OK [278167 deleted docs]
test: fields..............OK [51 fields]
test: field norms.........OK [51 fields]
test: terms, freq, prox...ERROR [term post_id:1599104 docFreq=1 !=
num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term post_id:1599104 docFreq=1 != num docs
seen 0 + num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields.......OK [45429565 total field count; avg
29.056 fields per doc]
test: term vectors........OK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
at
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
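For context, the failing check asserts that each term's stored docFreq equals the number of postings CheckIndex actually visits, counting live and deleted docs separately. A toy sketch of that invariant in Python (function and variable names are mine, not Lucene's):

```python
def check_term(doc_freq, postings, deleted):
    """Replay the per-term consistency check: the stored docFreq must
    equal the number of postings actually visited, live or deleted."""
    seen = sum(1 for d in postings if d not in deleted)
    seen_del = sum(1 for d in postings if d in deleted)
    if doc_freq != seen + seen_del:
        raise RuntimeError(
            "term docFreq=%d != num docs seen %d + num docs deleted %d"
            % (doc_freq, seen, seen_del))

# Healthy case: docFreq matches the postings we can enumerate.
check_term(doc_freq=1, postings=[1599104], deleted=set())

# Corrupt case: docFreq claims 1 doc but no posting is reachable --
# the same shape as the failure above.
try:
    check_term(doc_freq=1, postings=[], deleted=set())
except RuntimeError as e:
    print(e)  # term docFreq=1 != num docs seen 0 + num docs deleted 0
```

This is why flipped bits in the .del file are a plausible culprit: a corrupted deletes bitmap changes which postings are counted where, without any segment data having been rewritten.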
I'll activate infoStream for next time.
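For reference, in Solr 1.4 infoStream can be enabled from solrconfig.xml rather than code; if I recall the stock 1.4 example config correctly, the setting lives in the indexDefaults section, shipped disabled (the file name below is just the example default):

```xml
<indexDefaults>
  <!-- ... existing merge/buffer settings ... -->
  <!-- Route IndexWriter's infoStream output to a file for debugging -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>
```

The output is verbose (every flush, merge and delete is logged), so plan for disk space if the corruption takes days to reappear.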
Thanks,
On 12/01/2011 00:49, Michael McCandless wrote:
When you hit corruption, is it always this same problem:
java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0
Can you run with Lucene's IndexWriter infoStream turned on, and catch
the output leading to the corruption? If something is somehow messing
up the bits in the deletes file, that could cause this.
Mike
On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
<stephane.delp...@blogspirit.com> wrote:
Hi,
We are using:
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
# java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
We want to index 4M docs in one core (and once that works well, we will add
other cores with 2M docs each on the same server); 1 doc is roughly 1 kB.
We use Solr replication every 5 minutes to update the slave server (queries
are executed on the slave only).
Documents change very quickly; during a normal day we have approximately:
* 200 000 updated docs
* 1000 new docs
* 200 deleted docs
I attached the last good checkIndex output: solr20110107.txt
And the corrupted one: solr20110110.txt
This is not the first time a segment has been corrupted on this server;
that's why I run checkIndex frequently. (But as you can see, the first
segment holds 1,800,000 docs and it checks out fine!)
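For anyone wanting to reproduce these reports: the output above comes from Lucene's CheckIndex command-line tool, and a typical invocation looks roughly like this (jar and index paths are examples, not our actual layout):

```shell
# Inspect the index read-only; CheckIndex prints per-segment diagnostics.
java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex \
    /path/to/solr/data/index

# -fix drops any broken segment entirely (its documents are LOST),
# so only run it against a backup or after taking one:
# java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex \
#     /path/to/solr/data/index -fix
```

Note CheckIndex should not run while an IndexWriter has the index open for writing.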
I can't find any "SEVERE", "FATAL" or "exception" entries in the Solr logs.
I also attached my schema.xml and solrconfig.xml
Is there something wrong with what we are doing? Do you need any other info?
Thanks,