On Nov 29, 2006, at 8:47 AM, Grant Ingersoll wrote:
I will take care of the file formats issue. I thought I updated
before committing, but I obviously missed it.
Done.
I will take care of the file formats issue. I thought I updated
before committing, but I obviously missed it.
I would like to move the rest of this discussion to
http://issues.apache.org/jira/browse/LUCENE-708 so it is captured on the
issue, so look for a message from that issue shortly.
> More than likely the specific changes you asked about were overlooked when
> porting from the old style xdocs directory to the new Forrest directory
> structure ...
Should we create an issue for restoring this data in the trunk version of
the docs?
> but even if they weren't how should these ch
: But I first searched the FileFormats document - as far as I can tell the
: lock-less commit data is no longer in this document - not in the trunk, nor
: in the updated Web site. I understand that Web site currently reflects the
: last release 2.0, but I thought the trunk should have it.
this is
I was trying to figure out something about the way deleted docs are
handled.
Eventually found the answer in the code.
But I first searched the FileFormats document - as far as I can tell the
lock-less commit data is no longer in this document - not in the trunk, nor
in the updated Web site. I und
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ]
Michael McCandless resolved LUCENE-701.
---
Fix Version/s: 2.1
Resolution: Fixed
> Lock-less commits
> -
>
> Key: LUCENE-701
>
[ http://issues.apache.org/jira/browse/LUCENE-701?page=comments#action_12448902 ]
Michael McCandless commented on LUCENE-701:
---
Oooh -- I would love to!
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ]
Otis Gospodnetic updated LUCENE-701:
Lucene Fields: [Patch Available] (was: [New])
ently substantial change that will make us want to
make a 2.1 release.
Maybe Michael should commit this next week.
changes the on-disk format, which affects more than
Lucene Java and, should anyone that's using Lucene out there care (via scripts,
etc.), the naming of files on disk.
I'm just wondering if there's any interest/reason for doing a 2.1 before
something with those side
?
tch.
This addresses all feedback/TODOs that I knew about.
All unit tests pass.
his is part of a wider context. Maybe it's the creation
of the compound file you're thinking of? That writes 0's into the
header, adds the files, then rewinds and puts the actual offsets into
it. Then let's open a separate issue to track this -- I'll
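
The pattern described in this snippet (write zeros into the header, add the files, then rewind and fill in the real offsets) can be sketched in Java roughly as follows. The class name, the method names, and the header layout (an int count followed by one long offset per part) are assumptions made for illustration; this is not Lucene's compound file writer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical illustration of a "placeholder header" file layout:
// [int count][long offset_0 ... offset_{count-1}][part_0 bytes][part_1 bytes]...
final class PlaceholderHeader {

    // Writes the parts, first reserving header space with zeros, then
    // seeking back to overwrite the zeros with the real offsets.
    static void write(File out, byte[][] parts) {
        try (RandomAccessFile raf = new RandomAccessFile(out, "rw")) {
            raf.setLength(0);                      // start from an empty file
            raf.writeInt(parts.length);
            long headerStart = raf.getFilePointer();
            for (int i = 0; i < parts.length; i++) {
                raf.writeLong(0L);                 // placeholder offset
            }
            long[] offsets = new long[parts.length];
            for (int i = 0; i < parts.length; i++) {
                offsets[i] = raf.getFilePointer(); // where this part begins
                raf.write(parts[i]);
            }
            raf.seek(headerStart);                 // rewind to the header
            for (long offset : offsets) {
                raf.writeLong(offset);             // overwrite the placeholders
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back only the offset table from the header.
    static long[] readOffsets(File in) {
        try (RandomAccessFile raf = new RandomAccessFile(in, "r")) {
            int count = raf.readInt();
            long[] offsets = new long[count];
            for (int i = 0; i < count; i++) {
                offsets[i] = raf.readLong();
            }
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The second pass over the header is the part that is not write-once: bytes already in the file are overwritten after the payloads have been appended.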
o so).
Still, I think the point at which starvation starts to happen is far
beyond normal usage of Lucene (i.e., committing more than ten times per second).
(segment
file lengths at the beginning IIRC).
When this option is added, perhaps the configuration name should be generic and
not tied to the implementation specifics that could change more frequently?
Something like WRITE_ONCE or setWriteOnce()?
ic is applied to the
loading of all the files of an index.
vent a reader from opening at all? I don't think so (and it would be a
mis-configured writer IMO), but maybe Michael could speak to that.
[ http://issues.apache.org/jira/browse/LUCENE-701?page=comments#action_12446638 ]
Ning Li commented on LUCENE-701:
Can the following scenario happen with lock-less commits?
1 A reader reads segments.1, which says the index contains seg_1.
2 A
; everything (like Solr) will see.
Ahh got it, OK. That's fair.
> I'm not sure I understand the "segments.gen" logic of writing two
> longs that are identical. Looking at the code, it doesn't seem like
> you are implementing this:
> http://www.nabble.com
rstand the "segments.gen" logic of writing two longs that
are identical.
Looking at the code, it doesn't seem like you are implementing this:
http://www.nabble.com/Re%3A-Lock-less-commits-p5978090.html
Are there two longs instead of one in order to leave "space" for that
implementat
check whether the searcher is current and if not, reopen
it, and then run the query, vs having separate background thread do
this, which is certainly feasible just more complicated.
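
The simpler of the two options mentioned here (check whether the searcher is current, reopen it if not, then run the query) might look like the Java sketch below. It is deliberately generic: `ReopenOnDemand`, the version supplier, and the opener function are hypothetical names introduced for illustration and do not correspond to Lucene classes or methods.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

// Hypothetical holder that reopens a searcher lazily, on the query path,
// whenever the index version has moved on since the searcher was opened.
final class ReopenOnDemand<S> {
    private final LongSupplier currentVersion; // assumed: reads the latest committed index version
    private final LongFunction<S> opener;      // assumed: opens a searcher for that version
    private long openedVersion = -1;
    private S searcher;

    ReopenOnDemand(LongSupplier currentVersion, LongFunction<S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns a searcher that was current at the time of the check.
    synchronized S acquire() {
        long latest = currentVersion.getAsLong();
        if (searcher == null || latest != openedVersion) {
            searcher = opener.apply(latest);   // reopen only when stale
            openedVersion = latest;
        }
        return searcher;
    }
}
```

This sketch does not close the old searcher or track in-flight queries; a real implementation would need both, which is part of why the background-thread variant is more complicated.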
ays
correct. But I think it's entirely likely that filesystems do this
kind of time-based (only) cache validation. I figured better safe than
sorry here, and Lucene should tolerate stale caching around either
file contents or directory listing.
keeps track of old/new (and tries to hide these implementation
details under its methods) with delGen, normGen, isCompoundFile and
preLockless (which is derived from isCompoundFile).
Once an optimize is done, or, all old segments have been merged away,
then all segments are now the lockles
rectory listing can be incorrect
> (stale) by up to 1.0 seconds.
That sucks... (but great job on the very thorough testing).
Can it happen with the latest version of OS X? If not, couldn't we just
require an upgrade, or do you think that other platforms suffer from this?
>
[ http://issues.apache.org/jira/browse/LUCENE-701?page=all ]
Michael McCandless updated LUCENE-701:
--
Attachment: lockless-commits-patch.txt
Lock-less commits
-
Key: LUCENE-701
URL: http://issues.apache.org/jira/browse/LUCENE-701
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.1
Reporter: Michael
On 8/25/06, Doron Cohen <[EMAIL PROTECTED]> wrote:
In this highly interactive search scenario is it true that every opened
searcher needs a directory listing? - If so is this a possible performance
hit for the searchers, similar to discussion in this thread for writers.
But we should worry more f
Doron Cohen wrote:
In my local changes (using numbered files) for lock-less commits, I've
implemented Yonik's suggestion of opening segments in reverse order,
and this has definitely reduced the number of "retries" that the
searchers hit on opening the index. Even i
> I think that because of NFS caching that's only true when a1 and
> a2 deal with the same file -- I could be wrong however. If I'm right then
> your proposal suffers the same problem (another client might see all of
> the changes you've made to the version file and think the index is in a
> consis
: The RFC for NFS version 2 (http://tools.ietf.org/html/rfc1094) says: "All
: of the procedures in the NFS protocol are assumed to be synchronous. When
: a procedure returns to the client, the client can assume that the operation
: has completed and any data associated with the request is now on
as to wait, while in this suggestion readers may
> > need to wait for a writer that commits just now.
>
> Yes ideally a reader should never have to wait.
>
> In my local changes (using numbered files) for lock-less commits, I've
> implemented Yonik's suggestion of op
a half second that's okay.
Right... this is an important point that I missed - in the numbered-files
approach a reader never has to wait, while in this suggestion readers may
need to wait for a writer that commits just now.
Yes ideally a reader should never have to wait.
In my local chan
and they completed, it
seems that a reader "seeing" the result of action a2 must also "feel" the
result of action a1. (This would prevent errors with the proposed version
number.) But I am no expert in NFS and may be wrong here.
>
>
> : Date: Thu, 24 Aug 2006 23:22:56
ed. Your version file might suffer the same fate
(with reader clients seeing V1==V2 because the whole file is a second
stale)
: Date: Thu, 24 Aug 2006 23:22:56 -0700
: From: Doron Cohen <[EMAIL PROTECTED]>
: Reply-To: java-dev@lucene.apache.org
: To: java-dev@lucene.apache.org
: Subject: Re
I would like to discuss an additional approach, that requires small changes
to current Lucene implementation. Here, the index version (currently in
segments file) is maintained in a separate file, and is used to synchronize
between readers and writers, without requiring readers to create/obtain any
You are correct - I wasn't thinking of it that way.
On Aug 21, 2006, at 10:44 AM, Yonik Seeley wrote:
On 8/21/06, robert engels <[EMAIL PROTECTED]> wrote:
Then keeping the segments in memory is not helpful, as every open of
the writer needs to traverse the directory (since another writer
still
On 8/21/06, robert engels <[EMAIL PROTECTED]> wrote:
Then keeping the segments in memory is not helpful, as every open of
the writer needs to traverse the directory (since another writer
still could have created segments).
For example,
Computer A opens writer, modifies index, closes writer.
Com
Then keeping the segments in memory is not helpful, as every open of
the writer needs to traverse the directory (since another writer
still could have created segments).
For example,
Computer A opens writer, modifies index, closes writer.
Computer B opens writer (this must read the directory
robert engels wrote:
I don't think you can do this. If two different writers are opened for
the same index, you always need to read the directory since the other
may have created new segments.
This case should be OK. You have to close one IndexWriter before
opening the other (only 1 writer
I don't think you can do this. If two different writers are opened
for the same index, you always need to read the directory since the
other may have created new segments.
On Aug 20, 2006, at 1:35 PM, Michael McCandless wrote:
Yonik Seeley wrote:
On 8/20/06, Michael McCandless <[EMAIL PR
Yonik Seeley wrote:
On 8/20/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
On deletable: yes, I'm currently GC'ing unused segments by doing a
full directory listing.
Actually, you could get a full directory listing once per IndexWriter
and keep the results up-to-date in memory (including de
On 8/20/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
On deletable: yes, I'm currently GC'ing unused segments by doing a
full directory listing.
Actually, you could get a full directory listing once per IndexWriter
and keep the results up-to-date in memory (including deletes that
fail). No
Yonik Seeley wrote:
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
> One would also have to worry about partially deleted segments on
> Windows... while removing a segment, some of the files might fail to
> delete (due to still being open) and some might succeed.
Yes, I think this ca
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
> One would also have to worry about partially deleted segments on
> Windows... while removing a segment, some of the files might fail to
> delete (due to still being open) and some might succeed.
Yes, I think this case is handled correct
You also have to make sure you test this on non-Windows systems. A
delete on Windows is prevented while the file is open, but non-Windows
systems do not have this limitation, so there is a far greater chance you
will have an inconsistent index.
Excellent point, will do.
I'm now testing a
rmed there (and
not in Lucene).
Yes I agree, and this is in process:
http://issues.apache.org/jira/browse/LUCENE-635
I think even if we can do lock-less commits, we would still want to use
native locks for the write locks.
I'm also working on an OS level locking implementation
You also have to make sure you test this on non-Windows systems.
A delete on Windows is prevented while the file is open, but
non-Windows systems do not have this limitation, so there is a far greater
chance you will have an inconsistent index.
On Aug 18, 2006, at 5:00 PM, Michael McCan
Also, the commit lock is there to allow the merge process to remove
unused segments. Without it, a reader might get half way through reading
the segments, only to find some missing, and then have to restart
reading again. In a highly interactive environment this would be too
inefficient.
OK
Also, the commit lock is there to allow the merge process to remove
unused segments. Without it, a reader might get half way through
reading the segments, only to find some missing, and then have to
restart reading again. In a highly interactive environment this would
be too inefficient.
I am betting that if your remote locking has issues, you will have
the similar problems (since your new code requires accurate reading
of the directory to determine the "latest" files). I also believe
that directory reads like this are VERY inefficient in most cases.
I think these proposed
I don't think these changes are going to work. With multiple writers and/or
readers doing deletes, without serializing the writes you will have
inconsistencies - and the del files will need to be unioned.
That is:
station A opens the index
station B opens the index
station A deletes some do
I don't think these changes are going to work. With multiple writers
and/or readers doing deletes, without serializing the writes you
will have inconsistencies - and the del files will need to be unioned.
That is:
station A opens the index
station B opens the index
station A deletes some do
It could in theory lead to starvation but this should be rare in
practice unless you have an IndexWriter that's constantly committing.
An index with a small mergeFactor (say 2) and a small maxBufferedDocs
(default 10), would have segments deleted every
mergeFactor*maxBufferedDocs when rapidly
On 8/18/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
It could in theory lead to starvation but this should be rare in
practice unless you have an IndexWriter that's constantly committing.
An index with a small mergeFactor (say 2) and a small maxBufferedDocs
(default 10), would have segment
The basic idea is to change all commits (from SegmentReader or
IndexWriter) so that we never write to an existing file that a reader
could be reading from. Instead, always write to a new file name using
sequentially numbered files. For example, for "segments", on every
commit, write to the s
The basic idea is to change all commits (from SegmentReader or
IndexWriter) so that we never write to an existing file that a reader
could be reading from. Instead, always write to a new file name using
sequentially numbered files. For example, for "segments", on every
commit, write to the seq
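
To make the numbered-files idea concrete, here is a minimal Java sketch of how a reader or writer might pick the latest commit file from a directory listing and how a writer might choose the next name. The `segments.N` naming follows the "segments.1" example used elsewhere in this thread; the class and method names are hypothetical, and the naming scheme in the actual patch may differ.

```java
import java.util.List;

// Hypothetical helper for commit files named "segments.N", where N is a
// non-negative generation number that increases with every commit.
final class NumberedCommits {
    static final String PREFIX = "segments.";

    // Returns the highest generation found in the listing, or -1 if none.
    static long latestGeneration(List<String> fileNames) {
        long latest = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a commit file
            }
            try {
                long gen = Long.parseLong(name.substring(PREFIX.length()));
                if (gen > latest) {
                    latest = gen;
                }
            } catch (NumberFormatException e) {
                // Suffix is not a number (for example "segments.gen"); skip it.
            }
        }
        return latest;
    }

    // A writer never overwrites an existing commit file; it writes the next one.
    static String nextCommitFileName(List<String> fileNames) {
        return PREFIX + (latestGeneration(fileNames) + 1);
    }
}
```

Because the result depends entirely on the directory listing, a stale listing yields a stale "latest" generation, which is the concern raised elsewhere in this thread about cached listings.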
I think it's possible to modify Lucene's commit process so that it
does not require any commit locking at all.
This would be a big win because it would prevent all the various messy
errors (FileNotFound exceptions on instantiating an IndexReader,
Access Denied errors on renaming X.new -> X, Lock