Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-18 Thread Grant Ingersoll
Hey Bill, Any status on this? On Dec 2, 2007, at 10:37 PM, Bill Janssen wrote: Hmmm, it still sounds like you are hitting a threading issue that is probably exacerbated by the multicore platform of the newer machine. Exactly what I was thinking. What are the details of the CPUs of these two

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-02 Thread Bill Janssen
> > Hmmm, it still sounds like you are hitting a threading issue that is > > probably exacerbated by the multicore platform of the newer machine. > > Exactly what I was thinking. > What are the details of the CPUs of these two systems? Ah, good point. The bad machine is a dual-processor 1GHz G4

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-02 Thread Yonik Seeley
On Dec 2, 2007 9:28 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Hmmm, it still sounds like you are hitting a threading issue that is > probably exacerbated by the multicore platform of the newer machine. Exactly what I was thinking. What are the details of the CPUs of these two systems? -Yon

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-02 Thread Grant Ingersoll
Hmmm, it still sounds like you are hitting a threading issue that is probably exacerbated by the multicore platform of the newer machine. Is there any way to put together a unit test that we can try? Thanks, Grant On Dec 2, 2007, at 9:10 PM, Bill Janssen wrote: I'll see if I can get back t

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-02 Thread Bill Janssen
> I'll see if I can get back to this over the weekend. I got a chance to copy my corpus to another G4 and try indexing with Lucene 2.2. This one seems OK! Same texts. So now I'm inclined to believe that it *is* the machine, rather than the code. Whew! Though that doesn't explain why 2.0 works

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-30 Thread Bill Janssen
> Your errors seem to happen around the same area (~20K docs). If you > skip the first say ~18K docs does the error still happen? We need to > somehow narrow this down. I'm trying to boil down the documents to a set which I can deploy on a DVD-ROM, so I can move the same set around from machine

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Grant Ingersoll
I have PPC and Intel access if that helps. Just need a test case. On Nov 29, 2007, at 5:37 PM, Michael McCandless wrote: "Bill Janssen" <[EMAIL PROTECTED]> wrote: No. It's in another location, but perhaps I can get it tomorrow. On the other hand, the success when using 2.0 makes it likely t

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Michael McCandless
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote: > Just a theory (make that a guess), Mike, but is it possible that the > one merge scheduler is hitting a synchronization issue with the > deletedDocuments bit vector? That is one thread is cleaning it up and > the other is accessing and they are

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Michael McCandless
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > No. It's in another location, but perhaps I can get it tomorrow. > On the other hand, the success when using 2.0 makes it likely to me > that the machine isn't the problem. Yeah good point. Seems like a long shot (wishful thinking on my part!). Your

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Grant Ingersoll
Just a theory (make that a guess), Mike, but is it possible that the one merge scheduler is hitting a synchronization issue with the deletedDocuments bit vector? That is one thread is cleaning it up and the other is accessing and they aren't synchronizing their access? This doesn't explain
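
The kind of race Grant is guessing at can be illustrated with a small, self-contained sketch. This is not Lucene's code: `GuardedBits` and `markConcurrently` are hypothetical names, and `java.util.BitSet` merely stands in for a deleted-documents bit vector. `BitSet` is documented as unsafe for multithreaded use without external synchronization, so the sketch guards every access with the same lock.

```java
import java.util.BitSet;

// Illustrative only: GuardedBits is a hypothetical class, not Lucene's
// deleted-documents implementation.
class GuardedBits {
    private final BitSet bits = new BitSet();

    // Every access goes through the same monitor, so a thread that is
    // marking bits and a thread that is reading them cannot interleave
    // inside the BitSet's internal word array.
    synchronized void set(int i) { bits.set(i); }
    synchronized boolean get(int i) { return bits.get(i); }
    synchronized int count() { return bits.cardinality(); }

    // Two threads mark the even and odd positions of [0, n) concurrently;
    // returns the number of bits set once both have finished.
    static int markConcurrently(final int n) {
        final GuardedBits deleted = new GuardedBits();
        Thread evens = new Thread(() -> {
            for (int i = 0; i < n; i += 2) deleted.set(i);
        });
        Thread odds = new Thread(() -> {
            for (int i = 1; i < n; i += 2) deleted.set(i);
        });
        evens.start();
        odds.start();
        try {
            evens.join();
            odds.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return deleted.count();
    }

    public static void main(String[] args) {
        System.out.println(markConcurrently(100000));
    }
}
```

With the `synchronized` keywords removed, neighbouring bits share a word inside the `BitSet`, so updates from the two threads can be lost. That failure is timing-dependent and tends to show up more readily on multiprocessor hardware.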

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Michael McCandless
This is in the nightly JAR. It's o.a.l.index.CheckIndex (it defines a static main). Mike "Bill Janssen" <[EMAIL PROTECTED]> wrote: > > Also, could you try out the CheckIndex tool in 2.3-dev before and > > after the deletes? > > Great idea! I don't suppose there's a jar file of it? > > Bill

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
So, it's a little clearer. I get the Array-out-of-bounds exception if I'm re-indexing some already indexed documents -- if there are deletions involved. I get the CorruptIndexException if I'm indexing freshly -- no deletions. Here's an example of that (with the latest nightly). I removed the ex

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Also, could you try out the CheckIndex tool in 2.3-dev before and > after the deletes? Great idea! I don't suppose there's a jar file of it? Bill - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail:

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Have you tried another PPC machine? No. It's in another location, but perhaps I can get it tomorrow. On the other hand, the success when using 2.0 makes it likely to me that the machine isn't the problem. OK, I've reverted to my original codebase (where I first create a reader and do the dele

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Could you post this part of the code (deleting) too? Here it is: private static void remove (File index_file, String[] doc_ids, int start) { String number; String list; Term term; TermDocs matches; if (debug_mode) System.err.println("in

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Grant Ingersoll
On Nov 29, 2007, at 2:26 PM, Bill Janssen wrote: Are you still getting the original exception too or just the Array out of bounds one now? Also, are you doing anything else to the index while this is happening? The code at the point in the exception below is trying to p

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Michael McCandless
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > Here's the dump with last night's build: Those logs look healthy up until the exception. One odd thing is when you instantiate your writer, your index has 2 segments in it. I expected only 1 since each time you visit your index you leave it optimized.

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Are you still getting the original exception too or just the Array out > of bounds one now? Also, are you doing anything else to the index > while this is happening? The code at the point in the exception below > is trying to properly handle deleted documents. Just the arra

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Grant Ingersoll
Are you still getting the original exception too or just the Array out of bounds one now? Also, are you doing anything else to the index while this is happening? The code at the point in the exception below is trying to properly handle deleted documents. -Grant On Nov 29, 2007, at 1:34 P

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Can you try running with the trunk version of Lucene (2.3-dev) and see > if the error still occurs? EG you can download this AM's build here: > > > http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts Still there. Here's the dump with last night's build: /L

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> > Another thing to try is turning on the infoStream > > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > > It will be very large since it takes quite a while for the error to > > occur... > > I can do that. Here's a more complete dump. I've modified the code so that I n

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> > Another thing to try is turning on the infoStream > > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > > It will be very large since it takes quite a while for the error to > > occur... > > I can do that. Here's what I see: Optimizing... merging segments _ram_a (1 doc

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Bill Janssen
> Do you have another PPC machine to reproduce this on? (To rule out > bad RAM/hard-drive on the first one). I'll dig up an old laptop and try it there. > Another thing to try is turning on the infoStream > (IndexWriter.setInfoStream(...)) and capture & post the resulting log. > It will be very

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-29 Thread Michael McCandless
"Bill Janssen" <[EMAIL PROTECTED]> wrote: > > Hmmm ... how many chunks of "about 50 pages" do you do before > > hitting this? Roughly how many docs are in the index when it > > happens? > > Oh, gosh, not sure. I'm guessing it's about half done. Ugh, OK. If we could boil this down to a smaller

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> Hmmm ... how many chunks of "about 50 pages" do you do before hitting this? > Roughly how many docs are in the index when it happens? Oh, gosh, not sure. I'm guessing it's about half done. > Can you describe the docs/fields you're adding? I've got 1735 documents, 18969 pages -- average page s

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> I'm going to run the same software on an > Intel machine and see what happens. So, I ran the same codebase with lucene-core-2.2.0.jar on an Intel Mac Pro, OS X 10.5.0, Java 1.5, and no exception is raised. Different corpus, about 5 pages instead of 2. This is reinforcing my thinking th

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> You are not hitting any other exception before this one right? > > Can you change your test case so that the "catch" clause is run > before the "finally" clause? I wonder if you are hitting some > interesting exception and then trying to optimize, which then > masks the original exception. Yes
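
The masking Mike describes in the quoted text is ordinary Java behaviour: if a `finally` block throws, the exception that was already propagating out of the `try` body is discarded. A minimal sketch, assuming hypothetical `addDocuments()` and `optimize()` stand-ins (these are not the poster's code and not Lucene API calls):

```java
import java.io.IOException;

// Hypothetical stand-ins: addDocuments() and optimize() are illustrative
// names, not the poster's code and not Lucene API calls.
class MaskedExceptionDemo {

    static void addDocuments() throws IOException {
        throw new IOException("original failure while adding documents");
    }

    static void optimize() {
        throw new IllegalStateException("secondary failure during optimize");
    }

    // The cleanup step runs in finally and throws, so the IOException from
    // addDocuments() is replaced before any caller can see it.
    static String masked() {
        try {
            try {
                addDocuments();
            } finally {
                optimize();
            }
        } catch (Exception e) {
            return e.getMessage();
        }
        return "no exception";
    }

    // The follow-up step only runs when indexing succeeded, so the first
    // failure is the one that gets reported.
    static String reported() {
        try {
            addDocuments();
            optimize();
        } catch (Exception e) {
            return e.getMessage();
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println("finally-based cleanup reports: " + masked());
        System.out.println("catch-first ordering reports:  " + reported());
    }
}
```

In the first method the caller only ever sees the failure from the cleanup step. That is why moving the reporting ahead of the cleanup is a useful diagnostic step here.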

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Michael McCandless
Hmmm ... how many chunks of "about 50 pages" do you do before hitting this? Roughly how many docs are in the index when it happens? Can you describe the docs/fields you're adding? You are not hitting any other exception before this one right? Can you change your test case so that the "catch" cl

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
> Are you really sure in your 2.2 test you are starting with no prior > index? I'd ask that too, but yes, I'm really really sure. Building a completely new index each time. Works with 2.0.0. Fails with 2.2.0. Works with 2.2.0 *if* I remove the optimization step. Bill

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Michael McCandless
Are you really sure in your 2.2 test you are starting with no prior index? 2.2 should in fact work fine with a 2.0 index but it's possible there was some latent corruption in the 2.0 index if you are accidentally using it. That exception looks a lot like this dreaded bug: https://issues.apache.

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
I just tried re-indexing with lucene-core-2.0.0.jar and the same indexing code; works great. So what am I doing wrong with 2.2? Bill

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
Here's the code I'm using: try { // Now add the documents to the index IndexWriter writer = new IndexWriter(index_loc, new StandardAnalyzer(), !index_loc.exists()); writer.setMaxFieldLength(Integer.MAX_VALUE); try { for (in

lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-11-28 Thread Bill Janssen
I've got a DB of about 2 pages which I thought I'd update to Lucene 2.2. I removed the old index (2.0 based) completely, and started re-indexing all the documents. I do this in stages, of about 50 pages at a time, serially, starting a new JVM each time, and reading in the existing index, then