Just to bring closure here: as best we can tell, this in fact looks
like some sort of JVM HotSpot compiler issue.
Running java with -Xbatch (which forces up-front compilation) works
around the issue.
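For anyone else who hits this, the flag just goes on the java command
line, e.g. (the class name here is only a placeholder for your own
indexing app):

    java -Xbatch -cp lucene-core.jar:. YourBulkIndexer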
I've committed some additional assertions to the particular Lucene
code (merging of stored fields).
Ian, can you attach your version of SegmentMerger.java? Somehow my
line numbers are off from yours.
Mike
Ian Lea wrote:
Mike
The latest patch produces a similar exception:
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.AssertionError: after
Hi Ian,
Sheesh, that's odd. The SegmentMerger produced an .fdx file that is
one document too short.
Can you run with this patch now, again applied to the head of the 2.3
branch? I just added another assert inside the loop that does the
field merging.
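For context, if I have the format right, the .fdx file holds one
8-byte pointer per document into the .fdt file, so its length must
track the merged doc count exactly. The new assert checks roughly
this -- variable names below are illustrative, not the actual patch:

    // after copying one document's stored fields during the merge:
    docCount++;
    // the fdx must have grown by exactly one 8-byte fdt pointer
    assert fdxOut.getFilePointer() == 8L * docCount :
        "fdx size mismatch: " + fdxOut.getFilePointer() +
        " bytes after " + docCount + " docs";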
I will scrutinize this code...
Mike
Ian,
Could you apply the attached patch to the head of the 2.3 branch?
It only adds more asserts, to try to pinpoint exactly where this
corruption starts.
Then, re-run the test with asserts enabled and infoStream turned on
and post back. Thanks.
Mike
Ian Lea wrote:
It's failed on servers running SuSE 10.0 and 8.2 (ancient!)
$ uname -a shows
Linux phoebe 2.6.13-15-smp #1 SMP Tue Sep 13 14:56:15 UTC 2005 x86_64
x86_64 x86_64 GNU/Linux
and
Linux phobos 2.4.20-64GB-SMP #1 SMP Mon Mar 17 17:56:03 UTC 2003 i686
unknown unknown GNU/Linux
The first one has a 2.8G
On Tue, Mar 18, 2008 at 7:38 AM, Ian Lea <[EMAIL PROTECTED]> wrote:
> Hi
>
>
> When bulk loading into a new index I'm seeing this exception
>
> Exception in thread "Thread-1"
> org.apache.lucene.index.MergePolicy$MergeException:
> org.apache.lucene.index.CorruptIndexException: doc counts differ
I don't see an attachment here -- maybe the mailing list software
stripped it off. If so, can you send it directly to me? Thanks.
Mike
Ian Lea wrote:
Documents are biblio records. All have title, author etc. stored,
some have a few extra fields as well. Typically around 25 fields per
doc. The index is created with compound format, everything else as
default.
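For what it's worth, each doc is built roughly like this -- the field
names are just the obvious ones from the description above, and the
exact Field flags are a guess:

    Document doc = new Document();
    doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("author", author, Field.Store.YES, Field.Index.TOKENIZED));
    // ... around 25 such fields per doc; some docs carry a few extras
    writer.addDocument(doc);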
I've rerun the job until failure. Different numbers this time, but
basically the same.
The data is loaded in chunks of up to 100K docs in separate runs of
the program if that helps answer the first question. All buffers have
default values, docs are small but not tiny, JVM is running with
default settings.
Answers to the previous questions, and the infoStream output, will
follow once the job has finished.
One question: do you know whether 67,861 docs "feels like" a newly
flushed segment, or the result of a merge?
I.e., roughly how many docs are you buffering in IndexWriter before it
flushes? Are they very small documents with a large RAM buffer?
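For reference, the flush trigger is controlled by these two knobs --
the values shown are, if I remember right, the 2.3 defaults:

    writer.setRAMBufferSizeMB(16.0);  // flush once buffered docs use ~16 MB
    writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);  // doc-count trigger off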
Mike
Ian Lea wrote:
Hi
When bulk loading into a new index I'm seeing this exception ...
Can you call IndexWriter.setInfoStream(...), get the error to happen,
and post back the resulting output? And turn on assertions
(java -ea), since that may catch the issue sooner.
Can you describe how you are setting up IndexWriter (autoCommit,
compound, etc.), and what your documents are like?
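Something along these lines -- dir, analyzer and autoCommit here
stand in for whatever you're actually using:

    IndexWriter writer = new IndexWriter(dir, autoCommit, analyzer, true);
    writer.setUseCompoundFile(true);   // compound vs. multi-file format
    writer.setInfoStream(System.out);  // verbose flush/merge logging
    // then run the whole thing with assertions enabled:  java -ea ...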
Hi
When bulk loading into a new index I'm seeing this exception
Exception in thread "Thread-1"
org.apache.lucene.index.MergePolicy$MergeException:
org.apache.lucene.index.CorruptIndexException: doc counts differ for
segment _4l: fieldsReader shows 67861 but segmentInfo shows 67862
at
or