CorruptIndexException with some versions of java

2008-03-18 Thread Ian Lea
Hi, when bulk loading into a new index I'm seeing this exception: Exception in thread "Thread-1" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _4l: fieldsReader shows 67861 but segmentInfo shows 67862 at or

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
Can you call IndexWriter.setInfoStream(...), get the error to happen, and post back the resulting output? Also, turn on assertions (java -ea), since that may catch the issue sooner. Can you describe how you are setting up IndexWriter (autoCommit, compound, etc.), and what your documents are
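Since the extra checks only fire when the JVM is started with -ea, it can help to confirm from inside the program that assertions really are on before starting a long bulk load. The following is a minimal, Lucene-free sketch; the class name AssertionCheck is made up for illustration and is not part of Lucene. The infoStream half of the request is not shown because it needs the Lucene jar on the classpath.

```java
public class AssertionCheck {

    /**
     * Returns true only when assertions are enabled for this class.
     * The assignment inside the assert statement is evaluated only
     * when the JVM was started with -ea (or -ea:<this class>).
     */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect, skipped without -ea
        return enabled;
    }

    public static void main(String[] args) {
        // Run as "java -ea AssertionCheck" and again without -ea to compare.
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Calling a check like this at the top of the indexing job makes it obvious in the log whether a run that "passed" actually exercised the asserts.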

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
One question: do you know whether 67,861 docs "feels like" a newly flushed segment or the result of a merge? I.e., roughly how many docs are you buffering in IndexWriter before it flushes? Are they very small documents and your RAM buffer is large? Mike Ian Lea wrote: Hi When bulk l

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Ian Lea
The data is loaded in chunks of up to 100K docs in separate runs of the program if that helps answer the first question. All buffers have default values, docs are small but not tiny, JVM is running with default settings. Answers to previous questions, and infostream, will follow once the job has

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Ian Lea
Documents are biblio records. All have title, author etc. stored, some have a few extra fields as well. Typically around 25 fields per doc. The index is created with compound format, everything else as default. I've rerun the job until failure. Different numbers this time, but basically the sa

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
I don't see an attachment here -- maybe the mailing list software stripped it off. If so can you send directly to me? Thanks. Mike Ian Lea wrote: Documents are biblio records. All have title, author etc. stored, some have a few extra fields as well. Typically around 25 fields per doc.

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Yonik Seeley
On Tue, Mar 18, 2008 at 7:38 AM, Ian Lea wrote: > Hi > > When bulk loading into a new index I'm seeing this exception > > Exception in thread "Thread-1" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: doc counts differ

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Ian Lea
It's failed on servers running SuSE 10.0 and 8.2 (ancient!) $ uname -a shows Linux phoebe 2.6.13-15-smp #1 SMP Tue Sep 13 14:56:15 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux and Linux phobos 2.4.20-64GB-SMP #1 SMP Mon Mar 17 17:56:03 UTC 2003 i686 unknown unknown GNU/Linux The first one has a 2.8G

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
Ian, could you apply the attached patch to the head of the 2.3 branch? It only adds more asserts, to try to pinpoint where exactly this corruption starts. Then re-run the test with asserts enabled and infoStream turned on, and post back. Thanks. Mike Ian Lea wrote: It'

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
Hi Ian, Sheesh that's odd. The SegmentMerger produced an .fdx file that is one document too short. Can you run with this patch now, again applied to head of 2.3 branch? I just added another assert inside the loop that does the field merging. I will scrutinize this code... Mike I
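To make "one document too short" concrete, here is a rough sketch of the arithmetic. It ASSUMES the stored-fields index (.fdx) of this Lucene version holds exactly one 8-byte pointer per document with no header, and that the segment has its own .fdx rather than a shared doc store; neither assumption is stated in the thread. The class FdxDocCount and its method are hypothetical, not Lucene API. With the compound format the .fdx sits inside the .cfs file, so its length cannot be read straight off the filesystem.

```java
public class FdxDocCount {

    // ASSUMPTION: one 8-byte file pointer per document, no header bytes.
    static final int BYTES_PER_DOC = 8;

    /** Derives the document count implied by an .fdx length in bytes. */
    static long docCountFromFdxLength(long fdxLengthBytes) {
        if (fdxLengthBytes < 0 || fdxLengthBytes % BYTES_PER_DOC != 0) {
            throw new IllegalArgumentException(
                "length is not a whole number of entries: " + fdxLengthBytes);
        }
        return fdxLengthBytes / BYTES_PER_DOC;
    }

    public static void main(String[] args) {
        long segmentInfoDocs = 67862L;                // count from the reported exception
        long fdxLength = 67861L * BYTES_PER_DOC;      // a file that is one entry short
        long fieldsReaderDocs = docCountFromFdxLength(fdxLength);
        if (fieldsReaderDocs != segmentInfoDocs) {
            System.out.println("doc counts differ: fieldsReader shows "
                + fieldsReaderDocs + " but segmentInfo shows " + segmentInfoDocs);
        }
    }
}
```

Under those assumptions, a mismatch of one document corresponds to a single missing 8-byte entry, which is why the check can only be made after the merge has written the file.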

Re: CorruptIndexException with some versions of java

2008-03-18 Thread Michael McCandless
Ian, can you attach your version of SegmentMerger.java? Somehow my lines are off from yours. Mike Ian Lea wrote: Mike Latest patch produces similar exception: Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError: after

Re: CorruptIndexException with some versions of java

2008-03-24 Thread Michael McCandless
Just to bring closure here: this in fact looks like some sort of JVM HotSpot compiler issue, as best we can tell. Running java with -Xbatch (forces up front compilation) works around the issue. I've committed some additional assertions to the particular Lucene code (merging o
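Because the failure depends on the JVM and its flags, it may be worth logging both at startup so that a failing run can be matched to the exact runtime. The sketch below uses only standard JDK calls (RuntimeMXBean.getInputArguments and system properties); the class name JvmFlagCheck and the helper hasFlag are illustrative, not from the thread. For reference, the HotSpot documentation describes -Xbatch as disabling background compilation, so methods are compiled in the foreground.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {

    /** Returns true when the given flag appears verbatim among the JVM arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -ea or -Xbatch.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("-Xbatch set:  " + hasFlag(jvmArgs, "-Xbatch"));
    }
}
```

Run it the same way as the indexing job, for example "java -Xbatch JvmFlagCheck", to confirm the workaround flag actually reaches the JVM.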