Dear all,

I am very interested in Solr and would like to deploy it for distributed
indexing and searching. I hope the Solr experts on this list can help me
out.
However, I have concerns about the scalability and management overhead of
Solr, and I would appreciate any guidance on these points.

Basically, I have the following questions.
For indexing
1.  How does Solr handle distributed indexing? It seems Solr generates the
index on a single box. What if the index is huge and cannot fit on one box?
2.  Is it possible for Solr to generate the index in HDFS?

For searching
3.  Solr provides a master/slave framework. How does Solr distribute the
search? Does Solr know which index/shard to deliver the query to, or does it
have to send a multicast query to all the nodes?

For fault tolerance
4. Does Solr handle the management overhead automatically? Suppose the master
goes down. How does Solr recover the master in order to get the latest index
updates?
    Do we have to write code ourselves to handle this?
5. Suppose the master goes down immediately after an index update, before the
update has been replicated to the slaves. It seems data loss would occur.
Does Solr have any mechanism to deal with that?

Performance of real-time index updating
6. How well does real-time index updating perform? Suppose we are frequently
updating a million records in a huge index with billions of records. Can
Solr provide reasonable performance and low latency in that case? (This
probably depends on the Lucene library.)

I would greatly appreciate any guidance you can give.

Best,
edward
