Hi,
        What you describe is what Hadoop already does; it handles hardware
failure, data replication, and so on.
        If we wanted to build such a system ourselves on Solr, without HDFS,
I think it would be very complex work. :)
        I just want to know whether there is an existing component that can
do distributed search based on Solr.

Thanks 
                Jarvis.

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 20, 2007 10:06 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Thu, 20 Sep 2007 09:37:51 +0800
"Jarvis" <[EMAIL PROTECTED]> wrote:

> If we use the RPC call in Nutch.
Hi,
I wasn't suggesting using Nutch in Solr... I'm only a young grasshopper in
this league to be suggesting architecture stuff :) but I imagine there's
nothing wrong with using what they've built if it addresses Solr's needs.

> Manually separating the index is required.

Hmm, I imagine this really depends on the application. In my case, the
separation of which docs go where happens at a completely different layer.
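
For what it's worth, here is a minimal sketch of what that layer could look
like, assuming a fixed list of shard URLs and hashing each document's unique
key to pick one. The names are invented for illustration, not Solr APIs:

import java.util.List;

public class ShardRouter {
    private final List<String> shardUrls; // one URL per independent index

    public ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    // The same key always lands on the same shard, so later updates and
    // deletes can find the document again. The bitmask keeps the hash
    // non-negative (Math.abs would fail on Integer.MIN_VALUE).
    public String shardFor(String docId) {
        int bucket = (docId.hashCode() & 0x7fffffff) % shardUrls.size();
        return shardUrls.get(bucket);
    }
}

With routing like this there are no duplicate documents across shards in the
first place, which sidesteps the dedup problem you mention below.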

> We will receive duplicate results if there are duplicate index documents
> on different servers.

Maybe I got this wrong... but isn't this what MapReduce is meant to deal
with? E.g. (sketched in code below):

1) get the job (a query)
2) map it to workers (servers that provide search results from their own
   indexes)
3) wait for the results from all workers that reply within an acceptable
   timeframe
4) comb through the full set of results from all workers and reduce them
   according to your own business rules (e.g. remove dupes, sort them by
   quality / priority... here possibly relying on the original parameters
   of the query in 1)
5) return the reduced results to the frontend
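
To make those steps concrete, here's a rough scatter-gather sketch in Java.
SearchWorker and Result are invented stand-ins, not Solr or Nutch classes;
real code would call the actual search servers:

import java.util.*;
import java.util.concurrent.*;

public class ScatterGather {

    interface SearchWorker {
        List<Result> search(String query) throws Exception;
    }

    static class Result {
        final String id;
        final double score;
        Result(String id, double score) { this.id = id; this.score = score; }
    }

    // Steps 1)-5): fan the query out, wait up to timeoutMs, merge whatever
    // came back in time.
    public static List<Result> search(List<SearchWorker> workers, String query,
                                      long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        try {
            List<Callable<List<Result>>> tasks = new ArrayList<>();
            for (SearchWorker w : workers) {
                tasks.add(() -> w.search(query)); // 2) map the job to each worker
            }
            // 3) invokeAll cancels any task still running after the timeout
            List<Future<List<Result>>> futures =
                    pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);

            // 4) reduce: keep one result per document id (the best-scoring copy)
            Map<String, Result> byId = new HashMap<>();
            for (Future<List<Result>> f : futures) {
                if (f.isCancelled()) continue; // worker missed the deadline
                try {
                    for (Result r : f.get()) {
                        Result seen = byId.get(r.id);
                        if (seen == null || r.score > seen.score) byId.put(r.id, r);
                    }
                } catch (ExecutionException e) {
                    // a failed worker simply contributes nothing
                }
            }
            List<Result> merged = new ArrayList<>(byId.values());
            merged.sort((a, b) -> Double.compare(b.score, a.score));
            return merged; // 5) hand back to the frontend
        } finally {
            pool.shutdownNow();
        }
    }
}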

> And also data updates and a single server's failure are hard to deal
> with.

This really depends on your infrastructure + design.

Having the indexing, searching, and serving of results in different layers
should make for some interesting design options...

If each searcher (or wherever the index resides) is really a small cluster
of servers, the issue of data safety / server error is addressed at that
point. You can also have repeated data across indexes (again, independent
indexes), and that's a more... randomised :) way of keeping the docs safe.
For example, IIRC, googleFS keeps copies of each file on three or more
servers...
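
As a toy illustration of the first option (again with invented names, nothing
Solr-specific), each logical searcher could simply try its replicas in order:

import java.util.List;

public class ReplicatedSearcher {
    private final List<SearchNode> replicas; // independent copies of one index

    public ReplicatedSearcher(List<SearchNode> replicas) {
        this.replicas = replicas;
    }

    // Query replicas in order until one answers; a dead or slow node just
    // pushes the query on to the next copy.
    public String search(String query) {
        for (SearchNode node : replicas) {
            try {
                return node.search(query);
            } catch (Exception e) {
                // node is down; fall through to the next replica
            }
        }
        throw new IllegalStateException("all replicas failed for: " + query);
    }

    interface SearchNode {
        String search(String query) throws Exception;
    }
}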

cheers,
B
_________________________
{Beto|Norberto|Numard} Meijome

"He uses statistics as a drunken man uses lamp-posts ... for support rather
than illumination." Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
