Re: Lock-less commits

2006-08-18 Thread Michael McCandless
You also have to make sure you test this on non-Windows systems. A delete on Windows is prevented while the file is open, but non-Windows systems do not have this limitation, so there is a far greater chance you will end up with an inconsistent index. Excellent point, will do. I'm now testing a

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
I am betting that if your remote locking has issues, you will have similar problems (since your new code requires accurate reading of the directory to determine the "latest" files). I also believe that directory reads like this are VERY inefficient in most cases. OK, I will test the cost

Re: Lock-less commits

2006-08-18 Thread robert engels
You also have to make sure you test this on non-Windows systems. A delete on Windows is prevented while the file is open, but non-Windows systems do not have this limitation, so there is a far greater chance you will have an inconsistent index. On Aug 18, 2006, at 5:00 PM, Michael McCan

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
Also, the commit lock is there to allow the merge process to remove unused segments. Without it, a reader might get halfway through reading the segments, only to find some missing, and then have to start reading again. In a highly interactive environment this would be too inefficient. OK

[jira] Commented: (LUCENE-635) [PATCH] Decouple locking implementation from Directory implementation

2006-08-18 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-635?page=comments#action_12429135 ] Michael McCandless commented on LUCENE-635: --- OK, does anyone have a strong opinion one way or another on these small changes? I would lean towards keepi

Re: Lock-less commits

2006-08-18 Thread robert engels
Also, the commit lock is there to allow the merge process to remove unused segments. Without it, a reader might get halfway through reading the segments, only to find some missing, and then have to start reading again. In a highly interactive environment this would be too inefficient.

Re: Lock-less commits

2006-08-18 Thread robert engels
I am betting that if your remote locking has issues, you will have similar problems (since your new code requires accurate reading of the directory to determine the "latest" files). I also believe that directory reads like this are VERY inefficient in most cases. I think these proposed

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
I don't think these changes are going to work. With multiple writers and/or readers doing deletes, without serializing the writes you will have inconsistencies - and the del files will need to be unioned. That is: station A opens the index, station B opens the index, station A deletes some do

Re: Custom sorting - memory leaks

2006-08-18 Thread Chris Hostetter
: You can reproduce OutOfMemory easily. I've attached test files - this is : an altered DistanceSortingTest example from the LIA book. Also you can : profile it and see caching of distances arrays. An OutOfMemory error is different from a memory leak. Sorting with a custom Comparator does in fact use a l
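
For readers without the thread's attachments at hand, here is a minimal sketch of the kind of custom comparator being discussed (the class name and the omitted distance computation are hypothetical stand-ins for the DistanceSortingTest example cited above, not its actual code): newComparator() allocates a float[] sized to reader.maxDoc() for each IndexReader it sorts against, so memory use grows with index size even though nothing is leaking in the strict sense.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.ScoreDocComparator;
    import org.apache.lucene.search.SortComparatorSource;
    import org.apache.lucene.search.SortField;

    // Hypothetical comparator source in the spirit of the distance-sorting
    // example above (the real one also takes an origin point to measure from).
    // The key line: newComparator() builds a float[] sized to reader.maxDoc(),
    // so memory cost is proportional to the index, once per reader it sorts on.
    public class DistanceComparatorSource implements SortComparatorSource {

      public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
          throws IOException {
        final float[] distances = new float[reader.maxDoc()]; // one slot per doc
        // ... fill distances[] from the indexed location field (omitted) ...
        return new ScoreDocComparator() {
          public int compare(ScoreDoc i, ScoreDoc j) {
            return Float.compare(distances[i.doc], distances[j.doc]);
          }
          public Comparable sortValue(ScoreDoc i) {
            return new Float(distances[i.doc]);
          }
          public int sortType() {
            return SortField.CUSTOM;
          }
        };
      }
    }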

Re: Lock-less commits

2006-08-18 Thread robert engels
I don't think these changes are going to work. With multiple writers and/or readers doing deletes, without serializing the writes you will have inconsistencies - and the del files will need to be unioned. That is: station A opens the index, station B opens the index, station A deletes some do
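
To spell out the failure mode with a toy example (java.util.BitSet stands in for the on-disk .del bit vector; the document numbers are arbitrary): if both stations start from the same committed deletions and the last one to commit simply overwrites the del file, the other station's deletes vanish unless the two sets are unioned first.

    import java.util.BitSet;

    // Toy illustration of why concurrent deleters would need their del files
    // unioned. BitSet is only a stand-in for Lucene's deleted-docs bit vector.
    public class DelFileUnionDemo {
      public static void main(String[] args) {
        BitSet committed = new BitSet();        // deletions both stations start from

        BitSet stationA = (BitSet) committed.clone();
        BitSet stationB = (BitSet) committed.clone();

        stationA.set(3);                        // station A deletes doc 3
        stationB.set(7);                        // station B deletes doc 7

        // If station B commits last and simply overwrites the del file,
        // station A's delete of doc 3 is silently lost:
        System.out.println("last writer wins: " + stationB);   // {7}

        // Keeping both requires a union of the two delete sets:
        BitSet merged = (BitSet) stationA.clone();
        merged.or(stationB);
        System.out.println("unioned deletes:  " + merged);     // {3, 7}
      }
    }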

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
It could in theory lead to starvation, but this should be rare in practice unless you have an IndexWriter that's constantly committing. An index with a small mergeFactor (say 2) and a small maxBufferedDocs (default 10) would have segments deleted every mergeFactor*maxBufferedDocs when rapidly
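
To put numbers on that window (using the values quoted above): with mergeFactor=2 and maxBufferedDocs=10, segment files can be replaced roughly every 2 * 10 = 20 added documents, so a reader that needs longer than that to list the directory and open the current segment files could, in principle, keep losing the race and have to retry.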

Re: TermScorer explain

2006-08-18 Thread Chris Hostetter
: soon too. Just came across this while writing up documentation on : scoring and thought it sounded like a reasonable and easy fix. I : know Hoss has done a lot with Explanations, so he may know best if : there are issues with skipTo and explain. All tests still pass. I can't think of any reas

[jira] Updated: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-18 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=all ] Doron Cohen updated LUCENE-388: --- Attachment: doron_2b_IndexWriter.patch Right... actually it should be like this: int minSegment = segmentInfos.size() - singleDocSegmentsCount - 1; But sinc
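
For readers following the patch, a tiny self-contained illustration of what the counter buys (the int[] of per-segment doc counts and the class name are made up for the demo; only the minSegment arithmetic is taken from the comment above): instead of rescanning the tail of the segment list for buffered single-document segments on every maybeMergeSegments() call, a maintained count reduces the scan to one subtraction.

    // Stand-alone illustration of the LUCENE-388 idea. The int[] of doc counts
    // is a stand-in for IndexWriter's segmentInfos, which this demo never touches.
    public class MinSegmentDemo {
      public static void main(String[] args) {
        // doc counts per segment: two merged segments followed by four buffered
        // single-document segments
        int[] segmentDocCounts = {50, 20, 1, 1, 1, 1};

        // Old approach: walk backwards over the tail on every call (O(n) rescan).
        int scanned = segmentDocCounts.length - 1;
        while (scanned >= 0 && segmentDocCounts[scanned] == 1) {
          scanned--;
        }

        // Patched approach: maintain singleDocSegmentsCount incrementally as
        // documents are added, adjust it when those segments are merged, and
        // compute the same starting point with one subtraction.
        int singleDocSegmentsCount = 4;
        int minSegment = segmentDocCounts.length - singleDocSegmentsCount - 1;

        System.out.println(scanned + " == " + minSegment); // both are 1
      }
    }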

Re: Lock-less commits

2006-08-18 Thread Yonik Seeley
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote: It could in theory lead to starvation, but this should be rare in practice unless you have an IndexWriter that's constantly committing. An index with a small mergeFactor (say 2) and a small maxBufferedDocs (default 10) would have segment

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
The basic idea is to change all commits (from SegmentReader or IndexWriter) so that we never write to an existing file that a reader could be reading from. Instead, always write to a new file name using sequentially numbered files. For example, for "segments", on every commit, write to the s
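
As a rough sketch of the numbering scheme being described (the helper names, dummy file contents, and plain-decimal generation parsing are illustrative only, not the actual patch): every commit writes a brand-new segments_N for the next N, and a reader simply picks the highest N it finds instead of re-reading a file that might be mid-rewrite.

    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IndexOutput;
    import org.apache.lucene.store.RAMDirectory;

    // Illustrative only: never overwrite "segments", always create the next
    // "segments_N". The bytes written here are dummies; only the naming and
    // selection logic is the point.
    public class GenerationDemo {

      static long latestGeneration(Directory dir) throws IOException {
        long max = -1;
        String[] files = dir.list();
        for (int i = 0; i < files.length; i++) {
          if (files[i].startsWith("segments_")) {
            long gen = Long.parseLong(files[i].substring("segments_".length()));
            if (gen > max) max = gen;
          }
        }
        return max;
      }

      static void commit(Directory dir) throws IOException {
        long next = latestGeneration(dir) + 1;   // never reuse an existing name
        IndexOutput out = dir.createOutput("segments_" + next);
        out.writeLong(next);                     // stand-in for real segment data
        out.close();
      }

      public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();
        commit(dir);                             // writes segments_0
        commit(dir);                             // writes segments_1
        System.out.println("latest commit: segments_" + latestGeneration(dir));
      }
    }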

TermScorer explain

2006-08-18 Thread Grant Ingersoll
Anyone see any reason why I shouldn't make the following commit to TermScorer explain per Otis' TODO comment on the method: * @todo Modify to make use of [EMAIL PROTECTED] TermDocs#skipTo(int)}. public Explanation explain(int doc) throws IOException { TermQuery query = (TermQuery)we
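
For context, a rough sketch of what a skipTo-based term-frequency explanation could look like (the class and method names are made up, the surrounding TermScorer/Weight plumbing and the full score formula are omitted, and this is not the committed change): skipTo(doc) positions on the first document >= doc, so the caller still has to confirm it landed on the requested one before reading freq().

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.Similarity;

    // Sketch of the skipTo idea: jump straight to the target document instead
    // of stepping through the postings with next(). Only the tf part of the
    // explanation is shown here.
    public class SkipToExplainSketch {

      static Explanation explainTf(IndexReader reader, Term term,
                                   Similarity similarity, int doc) throws IOException {
        TermDocs termDocs = reader.termDocs(term);
        try {
          int freq = 0;
          // skipTo(doc) lands on the first document >= doc (or returns false),
          // so check we actually reached the requested document.
          if (termDocs.skipTo(doc) && termDocs.doc() == doc) {
            freq = termDocs.freq();
          }
          Explanation tfExplanation = new Explanation();
          tfExplanation.setValue(similarity.tf(freq));
          tfExplanation.setDescription("tf(termFreq(" + term + ")=" + freq + ")");
          return tfExplanation;
        } finally {
          termDocs.close();
        }
      }
    }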

[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-18 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12429027 ] Yonik Seeley commented on LUCENE-388: - We could also make the following change to flushRamSegments, right? private final void flushRamSegments() throws IOExc

Re: Lock-less commits

2006-08-18 Thread Yonik Seeley
The basic idea is to change all commits (from SegmentReader or IndexWriter) so that we never write to an existing file that a reader could be reading from. Instead, always write to a new file name using sequentially numbered files. For example, for "segments", on every commit, write to the seq

[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-18 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12429012 ] Yonik Seeley commented on LUCENE-388: - Thanks Doron, I caught that too and I was just going to set the count to 0 in mergeSegments (mergeSegments is always cal

Lock-less commits

2006-08-18 Thread Michael McCandless
I think it's possible to modify Lucene's commit process so that it does not require any commit locking at all. This would be a big win because it would prevent all the various messy errors (FileNotFound exceptions on instantiating an IndexReader, Access Denied errors on renaming X.new -> X, Lock

[jira] Updated: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-18 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=all ] Doron Cohen updated LUCENE-388: --- Attachment: doron_2_IndexWriter.patch The attached doron_2_IndexWriter.patch fixes the updating of singleDocSegmentsCount so that it takes place in mergeSegments(minS

[jira] Commented: (LUCENE-650) NPE doing local sensitive sorting when sort field is missing

2006-08-18 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-650?page=comments#action_12428955 ] Doron Cohen commented on LUCENE-650: I reviewed this patch and think that it is valid. This seems like a real bug: - In FieldSortedHitQueue, when no locale is

[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-18 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12428953 ] Doron Cohen commented on LUCENE-388: Well, there is a problem in the current patch after all... the counter is not decremented when a merge is triggered b

Custom sorting - memory leaks

2006-08-18 Thread Aleksey Serba
Hi! Could you please read the following discussion on the java-user mailing list - http://www.gossamer-threads.com/lists/lucene/java-user/35352 You can reproduce OutOfMemory easily. I've attached test files - this is an altered DistanceSortingTest example from the LIA book. Also you can profile it and see cachi