Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Yonik Seeley
This strategy looks very promising. One drawback is that documents must be added directly to the main index for this to be efficient. This is a bit of a problem if there is a document uniqueness requirement (a unique id field). If one takes the approach of adding docs to a separate lucene index

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Doug Cutting
Yonik Seeley wrote: This strategy looks very promising. One drawback is that documents must be added directly to the main index for this to be efficient. This is a bit of a problem if there is a document uniqueness requirement (a unique id field). This is easy to do with a single index. Here's th

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Yonik Seeley
I'm trying to support an interface where documents can be added one at a time at a high rate (via HTTP POST). You don't know all of the documents ahead of time, so you can't delete them all ahead of time. Given this constraint, it seems like you can do one of two things: 1) collect all the docume

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Doug Cutting
Yonik Seeley wrote: I'm trying to support an interface where documents can be added one at a time at a high rate (via HTTP POST). You don't know all of the documents ahead of time, so you can't delete them all ahead of time. A simple solution is to queue documents as they're posted. When either
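
The queue-then-batch idea above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not code from the thread: the excerpt is cut off before it names the flush condition, so the size and age thresholds (`max_docs`, `max_age_seconds`) are assumptions, as are the class name `BatchingIndexer` and the plain Python list standing in for the Lucene index. Each flush does a delete pass for every queued id and then an add pass, which is one way to keep the unique-id requirement while adding directly to the main index.

```python
import time


class BatchingIndexer:
    """Queues posted documents and applies them to the index in batches.

    Hypothetical sketch: the list in self.index stands in for a real index.
    """

    def __init__(self, max_docs=1000, max_age_seconds=5.0, clock=time.monotonic):
        self.max_docs = max_docs                # assumed size threshold
        self.max_age_seconds = max_age_seconds  # assumed age threshold
        self.clock = clock
        self.pending = []   # queued (doc_id, fields) pairs, in arrival order
        self.oldest = None  # arrival time of the oldest queued document
        self.index = []     # stand-in for the main index: (doc_id, fields) pairs

    def post(self, doc_id, fields):
        """Queue one document; flush if a threshold has been reached."""
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append((doc_id, fields))
        if self.should_flush():
            self.flush()

    def should_flush(self):
        if not self.pending:
            return False
        too_many = len(self.pending) >= self.max_docs
        too_old = self.clock() - self.oldest >= self.max_age_seconds
        return too_many or too_old

    def flush(self):
        """Delete every queued id from the index, then add the queued docs."""
        latest = {}
        for doc_id, fields in self.pending:
            latest[doc_id] = fields  # a later post for the same id wins
        # Delete pass: drop any indexed document whose id is in this batch.
        self.index = [(i, f) for (i, f) in self.index if i not in latest]
        # Add pass: append one document per unique id.
        self.index.extend(latest.items())
        self.pending = []
        self.oldest = None
```

Note that this sketch only checks the age threshold when a new document arrives; a real service would presumably also need a timer so that a quiet period does not leave documents sitting in the queue.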

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-05-13 Thread Luke Francl
On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: I don't really consider reading/writing to an NFS mounted FSDirectory to be viable for the very reasons you listed; but I haven't really found any evidence of problems if you take the approach that a single "writer" node indexes to local

RE: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Peter Gelderbloem
Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices for Distributing Lucene Indexing and Searching On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
---Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices for Distributing Lucene Indexing and Searching On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: I don't really consider reading/writing

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Otis Gospodnetic
ecture? Any advice would be greatly appreciated. Peter Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
w/proj/seda/ I am just reading up on it now. Does anyone have experience building a lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECT

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Erik Hatcher
edu/~mdw/proj/seda/ I am just reading up on it now. Does anyone have experience building a lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
essage- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices for Distributing Lucene Indexing and Searching On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: I don't really consider reading/writing to an NF

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Erik Hatcher
On Jul 14, 2005, at 9:45 PM, Paul Smith wrote: Cool, I should go have a look at that.. That begs another question though, where does Nutch stand in terms of the ASF? Did I read (or dream) that Nutch may be coming in under ASF? I guess I should get myself subscribed to the Nutch mailing

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Otis Gospodnetic
an insignificant time. You also have to use bookkeeping to work out if a 'job' has not been completed in time (maybe failure by the worker) and decide whether the job should be resubmi
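
The bookkeeping mentioned in the quoted text, working out whether a job has not been completed in time and deciding whether to resubmit it, might look something like the sketch below. All names here (`JobTracker`, `timeout_seconds`, `max_attempts`) are hypothetical, and the retry limit is an assumption; the thread excerpt does not say how the resubmit decision was actually made.

```python
import time


class JobTracker:
    """Tracks in-flight jobs and reports which ones have timed out.

    Hypothetical sketch: deadlines and the attempt limit are assumptions.
    """

    def __init__(self, timeout_seconds, max_attempts=3, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.max_attempts = max_attempts
        self.clock = clock
        self.in_flight = {}  # job_id -> (deadline, attempts so far)

    def submit(self, job_id):
        """Record that a job was handed to a worker (or handed out again)."""
        attempts = self.in_flight.get(job_id, (None, 0))[1] + 1
        self.in_flight[job_id] = (self.clock() + self.timeout_seconds, attempts)

    def complete(self, job_id):
        """Record that a worker finished the job."""
        self.in_flight.pop(job_id, None)

    def check(self):
        """Return (jobs to resubmit, jobs to abandon) among overdue jobs."""
        now = self.clock()
        resubmit, abandon = [], []
        for job_id, (deadline, attempts) in list(self.in_flight.items()):
            if now < deadline:
                continue  # still within its time budget
            if attempts < self.max_attempts:
                resubmit.append(job_id)
            else:
                abandon.append(job_id)
                del self.in_flight[job_id]
        return resubmit, abandon
```

The caller is expected to call `submit` again for each job returned in the first list, which resets its deadline and counts the new attempt.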

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
On 15/07/2005, at 3:57 PM, Otis Gospodnetic wrote: The problem that I saw (from your email only) with the "ship the full little index to the Queen" approach is that, from what I understand, you eventually do addIndexes(Directory[]) in there, and as this optimizes things in the end, this means y

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-15 Thread Andrzej Bialecki
Paul Smith wrote: I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce code is in Nutch, I haven't been paying close enough attention. Me too.. :) I didn't even know Nutch was now fully in the ASF, and I'm a Member... :-$ Let me pipe in on behalf of the Nutch project... T

RE: Best Practices for Distributing Lucene Indexing and Searching

2005-07-18 Thread Peter Gelderbloem
I am thinking of having a cluster of one indexer and a few searchers (1 to n). The indexer will consist of a number of stages as defined in SEDA. I must still do this decomposition. The resulting index will be published via message queue to the searchers that will stop doing searches long enough to upda

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-18 Thread Chris Lamprecht
See the paper at: http://labs.google.com/papers/mapreduce.html "MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a re
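
As a rough illustration of the programming model quoted above, here is a toy, single-process sketch: a map function turns each input key/value pair into intermediate key/value pairs, and a second user-supplied function combines all values that share an intermediate key. The helper names (`map_reduce`, `map_terms`, `reduce_postings`) and the inverted-index example are assumptions chosen for this thread's subject; nothing here is taken from Nutch or from Google's implementation, and a real system would distribute both phases across machines.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map_fn over (key, value) inputs, group by key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {k: reduce_fn(k, values) for k, values in intermediate.items()}


def map_terms(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in the document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_postings(term, doc_ids):
    """Combine all doc ids for a term into a sorted postings list."""
    return sorted(doc_ids)


docs = [(1, "Lucene index"), (2, "distributed index")]
postings = map_reduce(docs, map_terms, reduce_postings)
```

Here `postings` maps each term to the ids of the documents containing it, which is the shape of a very small inverted index.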