Hey Bill,
Any status on this?
On Dec 2, 2007, at 10:37 PM, Bill Janssen wrote:
> > Hmmm, it still sounds like you are hitting a threading issue that is
> > probably exacerbated by the multicore platform of the newer machine.
>
> Exactly what I was thinking.
> What are the details of the CPUs of these two systems?
Ah, good point. The bad machine is a dual-processor 1GHz G4
On Dec 2, 2007 9:28 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Hmmm, it still sounds like you are hitting a threading issue that is
> probably exacerbated by the multicore platform of the newer machine.
Exactly what I was thinking.
What are the details of the CPUs of these two systems?
-Yon
Hmmm, it still sounds like you are hitting a threading issue that is
probably exacerbated by the multicore platform of the newer machine.
Is there any way to put together a unit test that we can try?
Thanks,
Grant
On Dec 2, 2007, at 9:10 PM, Bill Janssen wrote:
> I'll see if I can get back to this over the weekend.
I got a chance to copy my corpus to another G4 and try indexing with
Lucene 2.2. This one seems OK! Same texts. So now I'm inclined to
believe that it *is* the machine, rather than the code. Whew! Though
that doesn't explain why 2.0 works
> Your errors seem to happen around the same area (~20K docs). If you
> skip the first say ~18K docs does the error still happen? We need to
> somehow narrow this down.
I'm trying to boil down the documents to a set which I can deploy on a
DVD-ROM, so I can move the same set around from machine to machine.
I have PPC and Intel access if that helps. Just need a test case.
On Nov 29, 2007, at 5:37 PM, Michael McCandless wrote:
"Bill Janssen" <[EMAIL PROTECTED]> wrote:
No. It's in another location, but perhaps I can get it tomorrow.
On the other hand, the success when using 2.0 makes it likely to me
that the machine isn't the problem.
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
> Just a theory (make that a guess), Mike, but is it possible that the
> one merge scheduler is hitting a synchronization issue with the
> deletedDocuments bit vector? That is one thread is cleaning it up and
> the other is accessing and they aren't synchronizing their access?
"Bill Janssen" <[EMAIL PROTECTED]> wrote:
> No. It's in another location, but perhaps I can get it tomorrow.
> On the other hand, the success when using 2.0 makes it likely to me
> that the machine isn't the problem.
Yeah good point. Seems like a long shot (wishful thinking on my
part!).
Just a theory (make that a guess), Mike, but is it possible that the
one merge scheduler is hitting a synchronization issue with the
deletedDocuments bit vector? That is one thread is cleaning it up and
the other is accessing and they aren't synchronizing their access?
This doesn't explain
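[Editor's note: Grant's race theory above can be illustrated with a self-contained sketch. This is plain JDK code, not Lucene's actual internals; the class and method names are invented. The point is that a shared deleted-docs bit vector only stays consistent if the deleting thread and the reading thread synchronize on the same monitor.]

```java
import java.util.BitSet;

// Hypothetical sketch of the theory above -- NOT Lucene's actual code.
// A shared "deleted docs" bit vector touched by two threads: one applying
// deletes, one reading during a merge. Both sides lock the same monitor;
// remove the synchronized keywords and the reader may observe torn state,
// since java.util.BitSet is not thread-safe.
class DeletedDocsSketch {
    private final BitSet deletedDocs = new BitSet();

    // Called by the thread applying deletes.
    synchronized void delete(int docId) {
        deletedDocs.set(docId);
    }

    // Called by the merge/reader thread.
    synchronized boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    public static void main(String[] args) throws InterruptedException {
        DeletedDocsSketch s = new DeletedDocsSketch();
        Thread deleter = new Thread(() -> {
            for (int i = 0; i < 100000; i += 2) s.delete(i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100000; i++) s.isDeleted(i);
        });
        deleter.start();
        reader.start();
        deleter.join();
        reader.join();
        if (!s.isDeleted(0) || s.isDeleted(1))
            throw new AssertionError("unexpected deleted-docs state");
        System.out.println("ok");
    }
}
```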
This is in the nightly JAR. It's o.a.l.index.CheckIndex (it defines
a static main).
Mike
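[Editor's note: since CheckIndex defines a static main, it can be invoked directly from the command line. The jar filename below is an assumption; substitute whatever nightly build you downloaded.]

```shell
# Run the index checker against an index directory.
# Jar name is illustrative -- use your actual nightly jar.
java -cp lucene-core-2.3-dev.jar org.apache.lucene.index.CheckIndex /path/to/index
```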
"Bill Janssen" <[EMAIL PROTECTED]> wrote:
> > Also, could you try out the CheckIndex tool in 2.3-dev before and
> > after the deletes?
>
> Great idea! I don't suppose there's a jar file of it?
>
> Bill
So, it's a little clearer. I get the Array-out-of-bounds exception if
I'm re-indexing some already indexed documents -- if there are
deletions involved. I get the CorruptIndexException if I'm indexing
freshly -- no deletions. Here's an example of that (with the latest
nightly). I removed the ex
> Also, could you try out the CheckIndex tool in 2.3-dev before and
> after the deletes?
Great idea! I don't suppose there's a jar file of it?
Bill
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
> Have you tried another PPC machine?
No. It's in another location, but perhaps I can get it tomorrow. On
the other hand, the success when using 2.0 makes it likely to me that
the machine isn't the problem.
OK, I've reverted to my original codebase (where I first create a
reader and do the deletes).
> Could you post this part of the code (deleting) too?
Here it is:
private static void remove(File index_file, String[] doc_ids, int start) {
    String number;
    String list;
    Term term;
    TermDocs matches;
    if (debug_mode)
        System.err.println("in
On Nov 29, 2007, at 2:26 PM, Bill Janssen wrote:
Are you still getting the original exception too or just the Array
out of bounds one now? Also, are you doing anything else to the index
while this is happening? The code at the point in the exception
below is trying to properly handle deleted documents.
"Bill Janssen" <[EMAIL PROTECTED]> wrote:
> Here's the dump with last night's build:
Those logs look healthy up until the exception.
One odd thing is when you instantiate your writer, your index has 2
segments in it. I expected only 1 since each time you visit your
index you leave it optimized.
> Are you still getting the original exception too or just the Array out
> of bounds one now? Also, are you doing anything else to the index
> while this is happening? The code at the point in the exception below
> is trying to properly handle deleted documents.
Just the array out of bounds one now.
Are you still getting the original exception too or just the Array out
of bounds one now? Also, are you doing anything else to the index
while this is happening? The code at the point in the exception below
is trying to properly handle deleted documents.
-Grant
On Nov 29, 2007, at 1:34 PM,
> Can you try running with the trunk version of Lucene (2.3-dev) and see
> if the error still occurs? EG you can download this AM's build here:
>
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts
Still there. Here's the dump with last night's build:
> > Another thing to try is turning on the infoStream
> > (IndexWriter.setInfoStream(...)) and capture & post the resulting log.
> > It will be very large since it takes quite a while for the error to
> > occur...
>
> I can do that.
Here's a more complete dump. I've modified the code so that I n
> > Another thing to try is turning on the infoStream
> > (IndexWriter.setInfoStream(...)) and capture & post the resulting log.
> > It will be very large since it takes quite a while for the error to
> > occur...
>
> I can do that.
Here's what I see:
Optimizing...
merging segments _ram_a (1 doc
> Do you have another PPC machine to reproduce this on? (To rule out
> bad RAM/hard-drive on the first one).
I'll dig up an old laptop and try it there.
> Another thing to try is turning on the infoStream
> (IndexWriter.setInfoStream(...)) and capture & post the resulting log.
> It will be very large since it takes quite a while for the error to
> occur...
"Bill Janssen" <[EMAIL PROTECTED]> wrote:
> > Hmmm ... how many chunks of "about 50 pages" do you do before
> > hitting this? Roughly how many docs are in the index when it
> > happens?
>
> Oh, gosh, not sure. I'm guessing it's about half done.
Ugh, OK. If we could boil this down to a smaller
> Hmmm ... how many chunks of "about 50 pages" do you do before hitting this?
> Roughly how many docs are in the index when it happens?
Oh, gosh, not sure. I'm guessing it's about half done.
> Can you describe the docs/fields you're adding?
I've got 1735 documents, 18969 pages -- average page s
> I'm going to run the same software on an
> Intel machine and see what happens.
So, I ran the same codebase with lucene-core-2.2.0.jar on an Intel Mac
Pro, OS X 10.5.0, Java 1.5, and no exception is raised. Different
corpus, about 5 pages instead of 2. This is reinforcing my
thinking th
> You are not hitting any other exception before this one right?
>
> Can you change your test case so that the "catch" clause is run
> before the "finally" clause? I wonder if you are hitting some
> interesting exception and then trying to optimize, which then
> masks the original exception.
Yes
Hmmm ... how many chunks of "about 50 pages" do you do before hitting this?
Roughly how many docs are in the index when it happens?
Can you describe the docs/fields you're adding?
You are not hitting any other exception before this one right?
Can you change your test case so that the "catch" clause is run
before the "finally" clause?
> Are you really sure in your 2.2 test you are starting with no prior
> index?
I'd ask that too, but yes, I'm really really sure. Building a
completely new index each time.
Works with 2.0.0. Fails with 2.2.0. Works with 2.2.0 *if* I remove
the optimization step.
Bill
---
Are you really sure in your 2.2 test you are starting with no prior
index?
2.2 should in fact work fine with a 2.0 index but it's possible there
was some latent corruption in the 2.0 index if you are accidentally
using it. That exception looks a lot like this dreaded bug:
https://issues.apache.
I just tried re-indexing with lucene-core-2.0.0.jar and the same
indexing code; works great. So what am I doing wrong with 2.2?
Bill
Here's the code I'm using:
try {
    // Now add the documents to the index
    IndexWriter writer = new IndexWriter(index_loc,
        new StandardAnalyzer(), !index_loc.exists());
    writer.setMaxFieldLength(Integer.MAX_VALUE);
    try {
        for (in
I've got a DB of about 2 pages which I thought I'd update to
Lucene 2.2. I removed the old index (2.0 based) completely, and
started re-indexing all the documents. I do this in stages, of about
50 pages at a time, serially, starting a new JVM each time, and reading
in the existing index, then