Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-09 Thread Nigel
Got it -- thanks, Mark! (Recently I read elsewhere in the archives of this list about the value or lack thereof of segments.gen, so skipping that file was in the back of my mind as well.) Chris On Thu, Oct 8, 2009 at 3:04 PM, Mark Miller wrote: > Nigel wrote: > > Thanks, Mark. That makes sens

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-08 Thread Mark Miller
Nigel wrote: > Thanks, Mark. That makes sense. I guess if you do it in the right order, > you're guaranteed to have the files in a consistent state, since the only > thing that's actually overwritten is the segments.gen file at the end. > The main thing to do is to copy the segments_N files la

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-08 Thread Nigel
Thanks, Mark. That makes sense. I guess if you do it in the right order, you're guaranteed to have the files in a consistent state, since the only thing that's actually overwritten is the segments.gen file at the end. What about the technique of creating a copy of the directory with hard links a
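
[Sketch] The copy order discussed above could look roughly like the following. This is only a minimal illustration using plain java.nio, not Solr's or Lucene's actual replication code; the class name OrderedIndexCopy and the methods rank and copyIndex are hypothetical. It assumes a flat index directory, copies the write-once data files first, then the segments_N files, and overwrites segments.gen last. It also assumes that a target file with the same name and size is already complete, which a real tool such as rsync would verify more carefully.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: copies a flat index directory in a "safe" order.
final class OrderedIndexCopy {

    // 0 = write-once data files, 1 = segments_N commit files, 2 = segments.gen.
    static int rank(String name) {
        if (name.equals("segments.gen")) return 2;
        if (name.startsWith("segments_")) return 1;
        return 0;
    }

    // Copies files from source to target and returns the names actually
    // copied, in the order they were copied.
    static List<String> copyIndex(Path source, Path target) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> listing = Files.list(source)) {
            listing.filter(Files::isRegularFile).forEach(files::add);
        }
        // Data files first, then segments_N, then segments.gen; ties by name.
        files.sort(Comparator
                .comparingInt((Path p) -> rank(p.getFileName().toString()))
                .thenComparing(p -> p.getFileName().toString()));

        Files.createDirectories(target);
        List<String> copied = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            Path dest = target.resolve(name);
            boolean writeOnce = rank(name) < 2;
            // Assumption: a write-once file of identical size is already complete.
            if (writeOnce && Files.exists(dest)
                    && Files.size(dest) == Files.size(file)) {
                continue;
            }
            // segments.gen is the only file expected to be overwritten.
            Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
            copied.add(name);
        }
        return copied;
    }
}
```

Calling OrderedIndexCopy.copyIndex(masterDir, replicaDir) a second time on an unchanged index should touch only segments.gen, which is the property that makes copying into the live directory plausible.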

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-07 Thread Mark Miller
Solr just copies them into the same directory - Lucene files are write once, so it's not much different than what happens locally. Nigel wrote: > Right now we logically re-open an index by making an updated copy of the > index in a new directory (using rsync etc.), opening the new copy, and > closi

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-07 Thread Nigel
Right now we logically re-open an index by making an updated copy of the index in a new directory (using rsync etc.), opening the new copy, and closing the old one. We don't use IndexReader.reopen() because the updated index is in a different directory (as opposed to being updated in-place). (Rea

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Mark Miller
I keep considering a full response to this, but I just can't get over the hump and spend the time writing something up. Figured someone else would get to it - perhaps they still will. I will make a comment here though: >Before Lucene 2.9, I don't think this made any difference, as (I think) the

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Michael Busch
On 10/5/09 5:30 PM, Nigel wrote: Before Lucene 2.9, I don't think this made any difference, as (I think) the only advantage to calling reopen vs. just creating another IndexReader was having reopen figure out whether the index had actually changed. (And we have a different way to figure that out

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Jason Rutherglen
I'm not sure I understand the question. You're trying to reopen the segments that you've replicated and you're wondering what's changed in Lucene? On Mon, Oct 5, 2009 at 5:30 PM, Nigel wrote: > Anyone have any ideas here? I imagine a lot of other people will have a > similar question when trying

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Nigel
Anyone have any ideas here? I imagine a lot of other people will have a similar question when trying to take advantage of the reopen improvements in 2.9. Thanks, Chris On Thu, Oct 1, 2009 at 5:15 PM, Nigel wrote: > I have a question about the reopen functionality in Lucene 2.9. As I > underst

Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-01 Thread Nigel
I have a question about the reopen functionality in Lucene 2.9. As I understand it, since FieldCaches are now per-segment, it can avoid reloading everything when the index is reopened, and instead just load the new segments. For background, like many people we have a distributed architecture wher

Re: Distributed Indexes

2008-02-11 Thread Ruslan Sivak
Basically, the index is big because there is a large number of documents, but each individual document is very small. There is also a lot of redundancy, which, I believe, is also why the index size is fairly small. Basically I am using the index to store the user's profile information, and

Re: Distributed Indexes

2008-02-11 Thread Ruslan Sivak
Cedric Ho wrote: On Feb 9, 2008 12:07 AM, Ruslan Sivak <[EMAIL PROTECTED]> wrote: The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances running on two servers for load balancing. Each app does the searches, and the search tim

Re: Distributed Indexes

2008-02-11 Thread Grant Ingersoll
Solr has a strategy using rsync that makes it relatively easy to copy an index around to other servers. It uses rsync to just copy the diffs, so you could easily mirror this in your application. There is no SQL backend for Lucene, but at 4 MB you could certainly serialize it as a blob to a

Re: Distributed Indexes

2008-02-10 Thread Cedric Ho
On Feb 9, 2008 12:07 AM, Ruslan Sivak <[EMAIL PROTECTED]> wrote: > The app does other things than search the index. I'm basically using > ColdFusion for the website and have four instances running on two > servers for load balancing. Each app does the searches, and the search > times are small, t

Re: Distributed Indexes

2008-02-10 Thread Ruslan Sivak
So nobody's run into anything like this before? The need to share the index between many copies of the app possibly running on multiple servers? Russ Ruslan Sivak wrote: The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances ru

Re: Distributed Indexes

2008-02-08 Thread Ruslan Sivak
The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances running on two servers for load balancing. Each app does the searches, and the search times are small, the index is small, but it takes a long time to fully create the index

Re: Distributed Indexes

2008-02-07 Thread Erick Erickson
With an index that small, I wonder why you bother with so many copies? What kind of load are you hitting it with and how complex are the queries? Because unless you have a *very* high query rate, I'd look at why my queries were taking so long before complexifying things this way. Best Erick On Feb

Re: Distributed Indexes

2008-02-07 Thread Ruslan Sivak
My index is only 4 MB. Is there a SQL backend for Lucene? Russ Michael McCandless wrote: If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for this. It basically protects a consistent snapshot of your index, ensuring those fil

Re: Distributed Indexes

2008-02-07 Thread Ruslan Sivak
No, FRS copies the whole directory. It's fairly fast, but if there is a modification on both servers at the same time, there will be issues. Russ Michael McCandless wrote: If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for

Re: Distributed Indexes

2008-02-07 Thread Michael McCandless
If you're able to tell Windows FRS which specific files to copy, then SnapshotDeletionPolicy (in 2.3) should work for this. It basically protects a consistent snapshot of your index, ensuring those files will not be deleted, while not blocking further updates to the index. Mike Ruslan

Distributed Indexes

2008-02-07 Thread Ruslan Sivak
I'm wondering if this is a problem that Lucene users have already tackled. I have four copies of the application using a Lucene index. They are located on two physical servers with two copies on each server accessing two copies of the Lucene index. I use Windows FRS (File Replication Service

Re: Distributed Indexes.

2007-10-29 Thread Otis Gospodnetic
er 29, 2007 4:05:26 PM Subject: Distributed Indexes. Hi Folks, We are planning on distributing our index databases. Has anyone got any recommendations or code samples for how this can be achieved? Also, please describe any performance hits or bottlenecks. Th

Distributed Indexes.

2007-10-29 Thread Durga . Tirunagari
Hi Folks, We are planning on distributing our index databases. Has anyone got any recommendations or code samples for how this can be achieved? Also, please describe any performance hits or bottlenecks. Thanks Much _Durga -