Re: lock-less commits data no longer in FileFormats document

2006-11-29 Thread Grant Ingersoll
On Nov 29, 2006, at 8:47 AM, Grant Ingersoll wrote: I will take care of the file formats issue. I thought I updated before committing, but I obviously missed it. Done.

Re: lock-less commits data no longer in FileFormats document

2006-11-29 Thread Grant Ingersoll
I will take care of the file formats issue. I thought I updated before committing, but I obviously missed it. I would like to move the rest of this discussion to http://issues.apache.org/jira/browse/LUCENE-708 so it is captured on the issue, so look for a message from that issue shortly.

Re: lock-less commits data no longer in FileFormats document

2006-11-28 Thread Doron Cohen
> More than likely the specific changes you asked about were overlooked when > porting from the old style xdocs directory to the new forrest directory > structure ... Should we create an issue for restoring this data in the trunk version of the docs? > but even if they weren't how should these ch

Re: lock-less commits data no longer in FileFormats document

2006-11-28 Thread Chris Hostetter
: But I first searched the FileFormats document - as far as I can tell the : lock-less commit data is no longer in this document - not in the trunk, nor : in the updated Web site. I understand that Web site currently reflects the : last release 2.0, but I thought the trunk should have it. this is

lock-less commits data no longer in FileFormats document

2006-11-27 Thread Doron Cohen
I was trying to figure out something about the way deleted docs are handled. Eventually found the answer in the code. But I first searched the FileFormats document - as far as I can tell the lock-less commit data is no longer in this document - not in the trunk, nor in the updated Web site. I und

[jira] Resolved: (LUCENE-701) Lock-less commits

2006-11-17 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ] Michael McCandless resolved LUCENE-701. --- Fix Version/s: 2.1 Resolution: Fixed > Lock-less commits > - > > Key: LUCENE-701 >

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-10 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-701?page=comments#action_12448902 ] Michael McCandless commented on LUCENE-701: --- Oooh -- I would love to! > Lock-less commits > - > > Key

[jira] Updated: (LUCENE-701) Lock-less commits

2006-11-10 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ] Otis Gospodnetic updated LUCENE-701: Lucene Fields: [Patch Available] (was: [New]) > Lock-less commits > - > > Key: LUCENE-701 >

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-10 Thread Otis Gospodnetic (JIRA)
ently substantial change that will make us want to make a 2.1 release. Maybe Michael should commit this next week. > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 >

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-06 Thread Steven Parkes (JIRA)
changes the on-disk format, which affects more than Lucene Java and, should anyone that's using Lucene out there care (via scripts, etc.), the naming of files on disk. I'm just wondering if there's any interest/reason for doing a 2.1 before something with those side

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-06 Thread Yonik Seeley (JIRA)
? > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions

[jira] Updated: (LUCENE-701) Lock-less commits

2006-11-06 Thread Michael McCandless (JIRA)
tch. This addresses all feedback/TODOs that I knew about. All unit tests pass. > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 > Project: Lucene - Java >

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Michael McCandless (JIRA)
his is part of a wider context. Maybe it's the creation of the compound file you're thinking of? That writes 0's into the header, adds the files, then rewinds and puts the actual offsets into it. Then let's open a separate issue to track this -- I'll
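
The compound-file write pattern described in this message (zeros in the header, then the file data, then a rewind to fill in the real offsets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the byte layout, `write_compound`, and `read_compound` are hypothetical and are not Lucene's actual compound file format.

```python
import io
import struct


def write_compound(out, files):
    """Write (name, bytes) pairs to a seekable binary stream.

    Hypothetical layout: int32 count, then per file an int64 offset,
    a uint16 name length and the name, followed by the raw file data.
    """
    out.write(struct.pack(">i", len(files)))
    slots = []
    for name, _ in files:
        slots.append(out.tell())
        out.write(struct.pack(">q", 0))  # placeholder offset, patched later
        encoded = name.encode("utf-8")
        out.write(struct.pack(">H", len(encoded)))
        out.write(encoded)
    offsets = []
    for _, data in files:
        offsets.append(out.tell())
        out.write(data)
    end = out.tell()
    # Rewind and overwrite each placeholder with the real data offset.
    for slot, offset in zip(slots, offsets):
        out.seek(slot)
        out.write(struct.pack(">q", offset))
    out.seek(end)
    return offsets


def read_compound(src):
    """Read the hypothetical layout back into a {name: bytes} dict."""
    src.seek(0)
    (count,) = struct.unpack(">i", src.read(4))
    entries = []
    for _ in range(count):
        (offset,) = struct.unpack(">q", src.read(8))
        (name_len,) = struct.unpack(">H", src.read(2))
        entries.append((src.read(name_len).decode("utf-8"), offset))
    end = src.seek(0, io.SEEK_END)
    result = {}
    for i, (name, offset) in enumerate(entries):
        # Each entry runs up to the next entry's offset, or to end of stream.
        stop = entries[i + 1][1] if i + 1 < len(entries) else end
        src.seek(offset)
        result[name] = src.read(stop - offset)
    return result


if __name__ == "__main__":
    buf = io.BytesIO()
    write_compound(buf, [("a.fnm", b"abc"), ("a.frq", b"12345")])
    print(read_compound(buf))
```

The rewind step is what makes this pattern "write twice to the same file", which is presumably why it comes up in a discussion about never rewriting existing bytes.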

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Michael McCandless (JIRA)
o so). Still, I think the point at which starvation starts to happen is far beyond a normal usage of Lucene (ie, committing > ten times / sec). > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/L

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Yonik Seeley (JIRA)
(segment file lengths at the beginning IIRC). When this option is added, perhaps the configuration name should be generic and not tied to the implementation specifics that could change more frequently? Something like WRITE_ONCE or setWriteOnce()? > Lock-less commits > -

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Ning Li (JIRA)
ic is applied to the loading of all the files of an index. > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 > Project: Lucene - Java > Issue Type: Improv

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Yonik Seeley (JIRA)
vent a reader from opening at all? I don't think so (and it would be a mis-configured writer IMO), but maybe Michael could speak to that. > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Ning Li (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-701?page=comments#action_12446638 ] Ning Li commented on LUCENE-701: Can the following scenario happen with lock-less commits? 1 A reader reads segments.1, which says the index contains seg_1. 2 A

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-02 Thread Michael McCandless (JIRA)
; everything (like Solr) will see. Ahh got it, OK. That's fair. > I'm not sure I understand the "segments.gen" logic of writing two > longs that are identical. Looking at the code, it doesn't seem like > you are implementing this: > http://www.nabble.com

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-01 Thread Yonik Seeley (JIRA)
rstand the "segments.gen" logic of writing two longs that are identical. Looking at the code, it doesn't seem like you are implementing this: http://www.nabble.com/Re%3A-Lock-less-commits-p5978090.html Are there two longs instead of one in order to leave "space" for that implementat
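
One plausible reading of "writing two longs that are identical" is that a reader can treat a mismatch as a sign of a partially written or stale file and fall back to another way of finding the current generation. The truncated thread does not confirm this, so the sketch below is an assumption; `write_gen`, `read_gen`, and the 16-byte layout are hypothetical.

```python
import os
import struct
import tempfile


def write_gen(path, generation):
    """Store the generation twice as big-endian signed 64-bit values."""
    with open(path, "wb") as f:
        f.write(struct.pack(">qq", generation, generation))


def read_gen(path):
    """Return the generation, or None if the file cannot be trusted.

    The file is rejected when it is missing, shorter than 16 bytes,
    or when the two stored values differ.
    """
    try:
        with open(path, "rb") as f:
            data = f.read(16)
    except FileNotFoundError:
        return None
    if len(data) != 16:
        return None
    first, second = struct.unpack(">qq", data)
    return first if first == second else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        gen_path = os.path.join(index_dir, "segments.gen")
        write_gen(gen_path, 7)
        print(read_gen(gen_path))
```

A caller that gets `None` back would need some other source of truth, for example a directory listing.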

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-01 Thread Michael McCandless (JIRA)
check whether the searcher is current and if not, reopen it, and then run the query, vs having separate background thread do this, which is certainly feasible just more complicated. > Lock-less commits > - > > Key: LUCENE-701 > URL: ht

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-01 Thread Michael McCandless (JIRA)
ays correct. But I think it's entirely likely that filesystems do this kind of time-based (only) cache validation. I figured better safe than sorry here, and Lucene should tolerate stale caching around either file contents or directory listing. > Lock-less commits > ---

[jira] Commented: (LUCENE-701) Lock-less commits

2006-11-01 Thread Michael McCandless (JIRA)
keeps track of old/new (and tries to hide these implementation details under its methods) with delGen, normGen, isCompoundFile and preLockless (which is derived from isCompoundFile). Once an optimize is done, or, all old segments have been merged away, then all segments are now the lockles

[jira] Commented: (LUCENE-701) Lock-less commits

2006-10-31 Thread Yonik Seeley (JIRA)
rectory listing can be incorrect > (stale) by up to 1.0 seconds. That sucks... (but great job on the very thorough testing). Can it happen with the latest version of OS X? If not, couldn't we just require an upgrade, or do you think that other platforms suffer from this. >

[jira] Updated: (LUCENE-701) Lock-less commits

2006-10-27 Thread Michael McCandless (JIRA)
). > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions

[jira] Updated: (LUCENE-701) Lock-less commits

2006-10-27 Thread Michael McCandless (JIRA)
). > Lock-less commits > - > > Key: LUCENE-701 > URL: http://issues.apache.org/jira/browse/LUCENE-701 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions

[jira] Updated: (LUCENE-701) Lock-less commits

2006-10-27 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ] Michael McCandless updated LUCENE-701: -- Attachment: lockless-commits-patch.txt > Lock-less commits > - > > Key: LUCENE-701 >

[jira] Created: (LUCENE-701) Lock-less commits

2006-10-27 Thread Michael McCandless (JIRA)
Lock-less commits - Key: LUCENE-701 URL: http://issues.apache.org/jira/browse/LUCENE-701 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.1 Reporter: Michael

Re: Lock-less commits

2006-08-26 Thread Yonik Seeley
On 8/25/06, Doron Cohen <[EMAIL PROTECTED]> wrote: In this highly interactive search scenario is it true that every opened searcher needs a directory listing? - If so is this a possible performance hit for the searchers, similar to discussion in this thread for writers. But we should worry more f

Re: Lock-less commits

2006-08-26 Thread Michael McCandless
Doron Cohen wrote: In my local changes (using numbered files) for lock-less commits, I've implemented Yonik's suggestion of opening segments in reverse order, and this has definitely reduced the number of "retries" that the searchers hit on opening the index. Even i

Re: Lock-less commits

2006-08-25 Thread Doron Cohen
> I think that because of NFS caching thats only true when a1 and > a2 deal with the same file -- i could be wrong however. If i'm right then > your proposal suffers the same problem (another client might see all of > the changes you've made to the version file and think the index is in a > consis

Re: Lock-less commits

2006-08-25 Thread Chris Hostetter
: The RFC for NFS version 2 (http://tools.ietf.org/html/rfc1094) says: "All : of the procedures in the NFS protocol are assumed to be synchronous. When : a procedure returns to the client, the client can assume that the operation : has completed and any data associated with the request is now on

Re: Lock-less commits

2006-08-25 Thread Doron Cohen
as to wait, while in this suggestion readers may > > need to wait for a writer that commits just now. > > Yes ideally a reader should never have to wait. > > In my local changes (using numbered files) for lock-less commits, I've > implemented Yonik's suggestion of op

Re: Lock-less commits

2006-08-25 Thread Michael McCandless
a half second that's okay. Right... this is an important point that I missed - in the numbered-files approach a reader never has to wait, while in this suggestion readers may need to wait for a writer that commits just now. Yes ideally a reader should never have to wait. In my local chan

Re: Lock-less commits

2006-08-25 Thread Doron Cohen
and they completed, it seems that a reader "seeing" the result of action a2 must also "feel" the result of action a1. (This would prevent errors with the proposed version number.) But I am no expert in NFS and may be wrong here. > > > : Date: Thu, 24 Aug 2006 23:22:56

Re: Lock-less commits

2006-08-24 Thread Chris Hostetter
ed. Your version file might suffer the same fate (with reader clients seeing V1==V2 because the whole file is a second stale) : Date: Thu, 24 Aug 2006 23:22:56 -0700 : From: Doron Cohen <[EMAIL PROTECTED]> : Reply-To: java-dev@lucene.apache.org : To: java-dev@lucene.apache.org : Subject: Re

Re: Lock-less commits

2006-08-24 Thread Doron Cohen
I would like to discuss an additional approach, that requires small changes to current Lucene implementation. Here, the index version (currently in segments file) is maintained in a separate file, and is used to synchronize between readers and writers, without requiring readers to create/obtain any

Re: Lock-less commits

2006-08-21 Thread robert engels
You are correct - I wasn't thinking of it that way. On Aug 21, 2006, at 10:44 AM, Yonik Seeley wrote: On 8/21/06, robert engels <[EMAIL PROTECTED]> wrote: Then keeping the segments in memory is not helpful, as every open of the writer needs to traverse the directory (since another writer still

Re: Lock-less commits

2006-08-21 Thread Yonik Seeley
On 8/21/06, robert engels <[EMAIL PROTECTED]> wrote: Then keeping the segments in memory is not helpful, as every open of the writer needs to traverse the directory (since another writer still could have created segments). For example, Computer A opens writer, modifies index, closes writer. Com

Re: Lock-less commits

2006-08-21 Thread robert engels
Then keeping the segments in memory is not helpful, as every open of the writer needs to traverse the directory (since another writer still could have created segments). For example, Computer A opens writer, modifies index, closes writer. Computer B opens writer (this must read the directory

Re: Lock-less commits

2006-08-20 Thread Michael McCandless
robert engels wrote: I don't think you can do this. If two different writers are opened for the same index, you always need to read the directory since the other may have created new segments. This case should be OK. You have to close one IndexWriter before opening the other (only 1 writer

Re: Lock-less commits

2006-08-20 Thread robert engels
I don't think you can do this. If two different writers are opened for the same index, you always need to read the directory since the other may have created new segments. On Aug 20, 2006, at 1:35 PM, Michael McCandless wrote: Yonik Seeley wrote: On 8/20/06, Michael McCandless <[EMAIL PR

Re: Lock-less commits

2006-08-20 Thread Michael McCandless
Yonik Seeley wrote: On 8/20/06, Michael McCandless <[EMAIL PROTECTED]> wrote: On deletable: yes, I'm currently GC'ing unused segments by doing a full directory listing. Actually, you could get a full directory listing once per IndexWriter and keep the results up-to-date in memory (including de

Re: Lock-less commits

2006-08-20 Thread Yonik Seeley
On 8/20/06, Michael McCandless <[EMAIL PROTECTED]> wrote: On deletable: yes, I'm currently GC'ing unused segments by doing a full directory listing. Actually, you could get a full directory listing once per IndexWriter and keep the results up-to-date in memory (including deletes that fail). No
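
The suggestion here (list the directory once per writer, then keep the result current in memory, including deletes that fail) might look something like the sketch below. `FileTracker` and its methods are hypothetical names for illustration, and the injectable `remove` callable exists only so a failed delete can be simulated; none of this is taken from the Lucene source.

```python
import os
import tempfile


class FileTracker:
    """Tracks index files in memory after a single directory listing."""

    def __init__(self, index_dir, remove=os.remove):
        self.index_dir = index_dir
        self._remove = remove
        self.files = set(os.listdir(index_dir))  # the only listing we do
        self.pending_deletes = set()

    def added(self, name):
        """Record a file this writer has just created."""
        self.files.add(name)

    def delete(self, name):
        """Try to delete a file; remember it if the delete fails."""
        try:
            self._remove(os.path.join(self.index_dir, name))
        except OSError:
            # For example, the file is still open on Windows.
            self.pending_deletes.add(name)
            return False
        self.files.discard(name)
        self.pending_deletes.discard(name)
        return True

    def retry_pending(self):
        """Retry every delete that failed earlier."""
        for name in sorted(self.pending_deletes):
            self.delete(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        open(os.path.join(index_dir, "_1.cfs"), "w").close()
        tracker = FileTracker(index_dir)
        tracker.delete("_1.cfs")
        print(tracker.files, tracker.pending_deletes)
```

This only stays accurate while a single writer owns the directory, which is the point debated in the replies that follow.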

Re: Lock-less commits

2006-08-20 Thread Michael McCandless
Yonik Seeley wrote: On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote: > One would also have to worry about partially deleted segments on > Windows... while removing a segment, some of the files might fail to > delete (due to still being open) and some might succeed. Yes, I think this ca

Re: Lock-less commits

2006-08-20 Thread Yonik Seeley
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote: > One would also have to worry about partially deleted segments on > Windows... while removing a segment, some of the files might fail to > delete (due to still being open) and some might succeed. Yes, I think this case is handled correct

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
You also have to make sure you test this on non-Windows systems. A delete in Windows is prevented when the file is open, but non-Windows systems do not have this limitation, so there is a far greater chance you will have an inconsistent index. Excellent point, will do. I'm now testing a

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
rmed there (and not in Lucene). Yes I agree, and this is in process: http://issues.apache.org/jira/browse/LUCENE-635 I think even if we can do lock-less commits, we would still want to use native locks for the write locks. I'm also working on an OS level locking implementation

Re: Lock-less commits

2006-08-18 Thread robert engels
You also have to make sure you test this on non-Windows systems. A delete in Windows is prevented when the file is open, but non-Windows systems do not have this limitation, so there is a far greater chance you will have an inconsistent index. On Aug 18, 2006, at 5:00 PM, Michael McCan

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
Also, the commit lock is there to allow the merge process to remove unused segments. Without it, a reader might get half way through reading the segments, only to find some missing, and then have to restart reading again. In a highly interactive environment this would be too inefficient. OK

Re: Lock-less commits

2006-08-18 Thread robert engels
Also, the commit lock is there to allow the merge process to remove unused segments. Without it, a reader might get half way through reading the segments, only to find some missing, and then have to restart reading again. In a highly interactive environment this would be too inefficient.

Re: Lock-less commits

2006-08-18 Thread robert engels
I am betting that if your remote locking has issues, you will have the similar problems (since your new code requires accurate reading of the directory to determine the "latest" files). I also believe that directory reads like this are VERY inefficient in most cases. I think these proposed

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
I don't think these changes are going to work. With multiple writers and/or readers doing deletes, without serializing the writes you will have inconsistencies - and the del files will need to be unioned. That is: station A opens the index station B opens the index station A deletes some do

Re: Lock-less commits

2006-08-18 Thread robert engels
I don't think these changes are going to work. With multiple writers and/or readers doing deletes, without serializing the writes you will have inconsistencies - and the del files will need to be unioned. That is: station A opens the index station B opens the index station A deletes some do

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
It could in theory lead to starvation but this should be rare in practice unless you have an IndexWriter that's constantly committing. An index with a small mergeFactor (say 2) and a small maxBufferedDocs (default 10), would have segments deleted every mergeFactor*maxBufferedDocs when rapidly

Re: Lock-less commits

2006-08-18 Thread Yonik Seeley
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote: It could in theory lead to starvation but this should be rare in practice unless you have an IndexWriter that's constantly committing. An index with a small mergeFactor (say 2) and a small maxBufferedDocs (default 10), would have segment

Re: Lock-less commits

2006-08-18 Thread Michael McCandless
The basic idea is to change all commits (from SegmentReader or IndexWriter) so that we never write to an existing file that a reader could be reading from. Instead, always write to a new file name using sequentially numbered files. For example, for "segments", on every commit, write to the s
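
The basic idea can be sketched as below: every commit goes to a new, higher-numbered file, and a reader picks the highest number it finds in a directory listing. The `segments.N` naming, the helper names, and the newline-separated file contents are assumptions made for illustration; the snippet is cut off before it gives the exact scheme.

```python
import os
import re
import tempfile

# Hypothetical naming: "segments.N", where N is the commit generation.
_SEGMENTS_RE = re.compile(r"^segments\.(\d+)$")


def latest_generation(index_dir):
    """Return the highest generation in a directory listing, or 0."""
    gens = []
    for name in os.listdir(index_dir):
        match = _SEGMENTS_RE.match(name)
        if match:
            gens.append(int(match.group(1)))
    return max(gens, default=0)


def commit(index_dir, segment_names):
    """Write a new commit file and return its generation."""
    gen = latest_generation(index_dir) + 1
    path = os.path.join(index_dir, "segments.%d" % gen)
    # Mode "x" fails if the file already exists, so a commit never
    # rewrites a file that a reader may currently have open.
    with open(path, "x") as f:
        f.write("\n".join(segment_names))
    return gen


def read_latest(index_dir):
    """Return (generation, segment names) for the newest commit."""
    gen = latest_generation(index_dir)
    with open(os.path.join(index_dir, "segments.%d" % gen)) as f:
        return gen, f.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        commit(index_dir, ["seg_1"])
        commit(index_dir, ["seg_1", "seg_2"])
        print(read_latest(index_dir))
```

The sketch assumes a single writer at a time and leaves out the reader retry logic and the cleanup of old commit files that other messages in the thread discuss.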

Re: Lock-less commits

2006-08-18 Thread Yonik Seeley
The basic idea is to change all commits (from SegmentReader or IndexWriter) so that we never write to an existing file that a reader could be reading from. Instead, always write to a new file name using sequentially numbered files. For example, for "segments", on every commit, write to the seq

Lock-less commits

2006-08-18 Thread Michael McCandless
I think it's possible to modify Lucene's commit process so that it does not require any commit locking at all. This would be a big win because it would prevent all the various messy errors (FileNotFound exceptions on instantiating an IndexReader, Access Denied errors on renaming X.new -> X, Lock