Re: Corrupt index

2012-06-15 Thread Michael McCandless
I think the 0-segment segments_1 file is expected in Lucene.Net since we changed that later, in 3.1 in Lucene (LUCENE-2386)? Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 8:40 PM, Itamar Syn-Hershko wrote: > I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 al

Re: Corrupt index

2012-06-15 Thread Michael McCandless
Right: Lucene never autocommits anymore ... If you create a new index, add a bunch of docs, and things crash before you have a chance to commit, then there is no index (not even a 0 doc one) in that directory. Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 1:41 PM, Itama

Re: Corrupt index

2012-06-15 Thread Michael McCandless
On Wed, Jun 13, 2012 at 8:45 PM, Itamar Syn-Hershko wrote: > Mike, > > On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless > wrote: >> >> Hi Itamar, >> >> One quick question: does Lucene.Net include the fixes done for >> LUCENE-1044 (to fsync files on commit)?  Those are very important for >> an

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so Lucene.Net doesn't have autoCommit. So I don't have autoCommit set to true, but I can clearly see a segments_1 file there along with the other files. If that helps, it always keeps the name segments_1 with 32 bytes, neve

Re: Corrupt index

2012-06-14 Thread Troy Howard
> If this is the case, 2328 probably made its way to Lucene.Net since we are > using the released sources for porting, and we now need to apply 3418 in > the current version. Itamar: I confirmed that 2328 is in the latest code. Thanks, Troy On Wed, Jun 13, 2012 at 5:45 PM, Itamar Syn-Hershko

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I'm quite certain this shouldn't happen even when Commit wasn't called. Mike, can you comment on that? On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens < currens.ch...@gmail.com> wrote: > Well, the only thing I see is that there is no place where writer.Commit() > is called in the delegate a

Re: Corrupt index

2012-06-14 Thread Christopher Currens
Well, the only thing I see is that there is no place where writer.Commit() is called in the delegate assigned to corpusReader.OnDocument. I know that Lucene is very transactional, and at least in 3.x, the writer will never auto commit to the index. You can write millions of documents, but if comm
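To make the point about missing commits concrete, here is a toy model of the "nothing is visible until commit" behavior described in this message. It is only a sketch: `ToyTransactionalWriter` and all of its methods are hypothetical names invented for illustration, not the Lucene or Lucene.Net `IndexWriter` API, and it keeps everything in memory rather than on disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of transactional indexing; NOT the Lucene/Lucene.Net API. */
class ToyTransactionalWriter {
    // Documents a newly opened reader would see (the last commit point).
    private final List<String> committed = new ArrayList<>();
    // Documents added since the last commit; invisible to readers.
    private final List<String> pending = new ArrayList<>();

    /** Buffers a document; it does not become visible yet. */
    void addDocument(String doc) {
        pending.add(doc);
    }

    /** Publishes everything buffered so far as a new commit point. */
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Simulates the process being killed: uncommitted work is lost. */
    void crash() {
        pending.clear();
    }

    /** Number of documents visible after the last successful commit. */
    int visibleDocCount() {
        return committed.size();
    }

    public static void main(String[] args) {
        ToyTransactionalWriter writer = new ToyTransactionalWriter();
        writer.addDocument("doc-1");
        writer.addDocument("doc-2");
        writer.commit();           // doc-1 and doc-2 survive
        writer.addDocument("doc-3");
        writer.crash();            // doc-3 was never committed
        System.out.println(writer.visibleDocCount());
    }
}
```

In this model, a run that adds documents for days and is then killed without a single `commit()` leaves zero visible documents, which is the situation the message is asking about.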

Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an 8.5GB Wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I h

Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Mike, On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Hi Itamar, > > One quick question: does Lucene.Net include the fixes done for > LUCENE-1044 (to fsync files on commit)? Those are very important for > an index to be intact after OS/JVM crash or power
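For context on why fsync-on-commit matters, the general pattern can be sketched as follows: flush each data file to stable storage first, and only then publish the file that marks the commit. This is a minimal sketch of that general technique using `java.nio`, not the actual LUCENE-1044 patch or the FSDirectory code. The class name, the `_1.dat` data file, and the `.tmp` suffix are assumptions made up for illustration; only the `segments_1` name is taken from this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Sketch of "fsync, then publish the commit point"; not Lucene's code. */
class DurableCommitSketch {

    /** Writes data to file and forces it to stable storage before returning. */
    static void writeAndSync(Path file, byte[] data) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file content and metadata to the device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Syncs the data file first, then atomically publishes the commit file. */
    static Path commit(Path dir, byte[] segmentData, long generation) {
        // Hypothetical data file name, chosen for this sketch only.
        Path dataFile = dir.resolve("_" + generation + ".dat");
        writeAndSync(dataFile, segmentData);

        // The commit file is written under a temporary name and synced ...
        Path tmp = dir.resolve("segments_" + generation + ".tmp");
        byte[] manifest = dataFile.getFileName().toString()
                .getBytes(StandardCharsets.UTF_8);
        writeAndSync(tmp, manifest);

        // ... and only then renamed, so a crash never exposes a half-written one.
        Path commitPoint = dir.resolve("segments_" + generation);
        try {
            return Files.move(tmp, commitPoint, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one commit in a fresh temp directory; returns the commit file name. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("durable-commit");
            byte[] data = "doc data".getBytes(StandardCharsets.UTF_8);
            return commit(dir, data, 1).getFileName().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "segments_1"
    }
}
```

The sketch deliberately skips syncing the directory entry itself and assumes the file system supports atomic renames; a real implementation would have to address both.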

Re: Corrupt index

2012-06-13 Thread Christopher Currens
Mike, The codebase for Lucene.Net should be almost identical to Java's 3.0.3 release, and LUCENE-1044 is included in that. Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both 3.0.3

Corrupt index

2012-06-12 Thread Itamar Syn-Hershko
Hi Java devs, I'm a Lucene.Net committer, and there is a chance we have a bug in our FSDirectory implementation that causes indexes to get corrupted when indexing is interrupted while the IW is still open. As it stems from some retroactive fixes you made, I'd appreciate your feedback. Correct me if I'm w