Re: Corrupt index

2012-06-15 Thread Michael McCandless
On Wed, Jun 13, 2012 at 8:45 PM, Itamar Syn-Hershko ita...@code972.com wrote: Mike, On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Itamar, One quick question: does Lucene.Net include the fixes done for LUCENE-1044 (to fsync files on commit)?  Those
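
For readers unfamiliar with the "fsync files on commit" idea behind LUCENE-1044, here is a minimal sketch in plain Java of forcing a file's bytes to stable storage before treating a write as committed. It is not Lucene's or Lucene.Net's actual code; the class name `FsyncSketch` and the helper `writeDurably` are hypothetical and exist only for illustration. The only real API assumed is `java.nio.channels.FileChannel`, whose `force(true)` asks the OS to flush both content and metadata to the device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from Lucene or Lucene.Net.
class FsyncSketch {

    // Writes data to file and does not return until the OS has been asked
    // to flush it to the storage device (the "fsync" step).
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // A single write() call may be partial, so loop until drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) flushes file content and metadata; without it the
            // bytes may still sit in the OS cache when the process dies.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        writeDurably(tmp, new byte[] {1, 2, 3});
        System.out.println("wrote " + Files.size(tmp) + " bytes");
        Files.delete(tmp);
    }
}
```

The point of the sketch is the ordering: data files are flushed first, and only then would a commit marker (such as a new segments file) be written and flushed, so that a crash between the two steps leaves the previous commit intact.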

Re: Corrupt index

2012-06-15 Thread Michael McCandless
I think the 0-segment segments_1 file is expected in Lucene.Net since we changed that later, in 3.1 in Lucene (LUCENE-2386)? Mike McCandless http://blog.mikemccandless.com On Thu, Jun 14, 2012 at 8:40 PM, Itamar Syn-Hershko ita...@code972.com wrote: I can confirm 2.9.4 had autoCommit, but it

Re: Corrupt index

2012-06-14 Thread Christopher Currens
Well, the only thing I see is that there is no place where writer.Commit() is called in the delegate assigned to corpusReader.OnDocument. I know that Lucene is very transactional, and at least in 3.x, the writer will never auto-commit to the index. You can write millions of documents, but if

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I'm quite certain this shouldn't happen even when Commit wasn't called. Mike, can you comment on that? On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens currens.ch...@gmail.com wrote: Well, the only thing I see is that there is no place where writer.Commit() is called in the delegate

Re: Corrupt index

2012-06-14 Thread Troy Howard
If this is the case, 2328 probably made its way to Lucene.Net since we are using the released sources for porting, and we now need to apply 3418 in the current version. Itamar: I confirmed that 2328 is in the latest code. Thanks, Troy On Wed, Jun 13, 2012 at 5:45 PM, Itamar Syn-Hershko

Re: Corrupt index

2012-06-14 Thread Itamar Syn-Hershko
I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so Lucene.Net doesn't have autoCommit. So I don't have autoCommit set to true, but I can clearly see a segments_1 file there along with the other files. If that helps, it always keeps the name segments_1, at 32 bytes,

Re: Corrupt index

2012-06-13 Thread Christopher Currens
Mike, The codebase for Lucene.Net should be almost identical to Java's 3.0.3 release, and LUCENE-1044 is included in that. Itamar, are you committing the index regularly? I only ask because I can't reproduce it myself by forcibly terminating the process while it's indexing. I've tried both

Re: Corrupt index

2012-06-13 Thread Itamar Syn-Hershko
Christopher, I used the IndexBuilder app from here https://github.com/synhershko/Talks/tree/master/LuceneNeatThings with an 8.5GB Wikipedia dump. After running for 2.5 days I had to forcefully close it (infinite loop in the wiki-markdown parser at 92%, go figure), and the 40-something GB index I