On 10-May-07, at 3:02 PM, Daniel Creão wrote:
So, I tried Solr and read about FederatedSearch and
CollectionDistribution.
An 'all-machines-have-complete-index' strategy (using rsync) can
improve system throughput and concurrency, since each machine processes
different queries, but each query will take the same amount of time as
on a single-node system (which is bad).
A single-node system _with 1/N the traffic_, sure.
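The trade-off Mike points at can be put in a toy model (the numbers below are illustrative assumptions, not Solr measurements): with N full replicas, per-query latency is unchanged, but the cluster absorbs N times the traffic.

```python
# Toy model of an 'all-machines-have-complete-index' (replicated) cluster.
# Each replica holds the full index, so per-query latency is unchanged,
# but incoming queries can be spread across replicas, multiplying
# sustainable throughput. Hypothetical numbers, not Solr benchmarks.

SINGLE_NODE_LATENCY_S = 0.5            # assumed time to search the full index
SINGLE_NODE_QPS = 1 / SINGLE_NODE_LATENCY_S

def replicated_cluster(n_replicas):
    latency = SINGLE_NODE_LATENCY_S            # same index, same work per query
    throughput = n_replicas * SINGLE_NODE_QPS  # queries fan out across replicas
    return latency, throughput

latency, qps = replicated_cluster(4)
print(latency, qps)  # latency unchanged (0.5), throughput x4 (8.0)
```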
When each machine of an N-machine cluster indexes 1/N of the text
collection, each machine spends less time processing queries, but all
machines must process the same query at the same time (a 'goodbye,
concurrency', IMO), then merge the results.
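The sharded scheme described above is the classic scatter-gather pattern. A minimal sketch, assuming made-up shard contents and a hypothetical search_shard() helper (not Solr's actual API):

```python
import heapq

# Scatter-gather over N shards: each machine indexes 1/N of the
# collection, every query is sent to all shards, and the per-shard
# partial results are merged by score. Shard data is a stand-in.

shards = [
    [("doc1", 0.9), ("doc4", 0.4)],   # shard 0: (doc id, score), sorted
    [("doc2", 0.8), ("doc5", 0.3)],   # shard 1
    [("doc3", 0.7), ("doc6", 0.2)],   # shard 2
]

def search_shard(shard, top_k):
    # Each shard returns its own top-k hits, already sorted by score.
    return shard[:top_k]

def federated_search(shards, top_k=3):
    # Scatter: query every shard. Gather: merge partials by descending score.
    partials = [search_shard(s, top_k) for s in shards]
    merged = heapq.merge(*partials, key=lambda hit: -hit[1])
    return list(merged)[:top_k]

print(federated_search(shards))  # [('doc1', 0.9), ('doc2', 0.8), ('doc3', 0.7)]
```

In a real cluster the scatter step runs in parallel, so per-query latency shrinks toward the time to search one shard plus the merge cost.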
I don't really understand this.
For huge corpora, you must distribute different parts of the index
over multiple servers. For high throughput, you must distribute the
same part of the index over multiple servers. These are not
competing strategies, and to solve both problems, both solutions must
be employed.
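Combining both solutions gives a grid: N shards for corpus size, M replicas per shard for throughput. A query must visit every shard (for coverage) but only one replica of each. A sketch under those assumptions; the names are hypothetical, not Solr's topology code:

```python
import random

# Hypothetical N x M topology: the index is split into N shards and each
# shard is replicated M times. Routing a query picks exactly one replica
# per shard, so every document is covered while load spreads over replicas.

N_SHARDS, N_REPLICAS = 3, 2

# cluster[shard][replica] -> server name
cluster = [[f"shard{s}-replica{r}" for r in range(N_REPLICAS)]
           for s in range(N_SHARDS)]

def route_query(cluster, rng=random):
    # One replica per shard: full index coverage, 1/M of the replica load.
    return [rng.choice(replicas) for replicas in cluster]

print(route_query(cluster))  # one server per shard
```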
Did I get anything wrong (about Hadoop and Solr)?
Is Multiple Masters/FederatedSearch under development? What is its
status? Or should I develop it myself?
Implementation of this in Solr is still in the highly theoretical
stage, so it is unlikely to happen any time soon.
You might try Nutch, which is basically an implementation of this
strategy using Lucene.
-Mike