Re: SolrCloud vs Distributed Solr

Erick Erickson Mon, 08 Jul 2013 05:00:49 -0700

Flavio:

I think you're missing a critical bit about SolrCloud,
namely ZooKeeper (ZK). See the ZooKeeper section of
the SolrCloud wiki page for a start:
http://wiki.apache.org/solr/SolrCloud#ZooKeeper


You'll notice that each Solr node, when it is started,
requires the address of your ZK ensemble, NOT of a
Solr node. That's how ZK "knows" where all the
nodes in your cluster are.

So each node "just knows" where all the other shards
are, since that info is kept in ZK, and any request to
any node in the cluster "does the right thing", whether
it's an update or a query. Updates are forwarded to the
correct shard leader(s), queries are sent to one replica
of each shard, etc., all automatically.
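To make the routing idea concrete, here's a toy Java model
(not Solr's actual code) of what any node can do once it has
the cluster state from ZK. The shard names, node names and
hash ranges are made up, and String.hashCode() is only a
stand-in: SolrCloud does give each shard a hash range, but
it uses a different hash function and more bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterStateRouting {

    // One shard: the slice of the hash space it owns and its replicas.
    // By convention in this sketch, the first replica listed is the leader.
    record Shard(String name, int rangeStart, int rangeEnd, List<String> replicas) {
        String leader() { return replicas.get(0); }
        boolean owns(int hash) { return hash >= rangeStart && hash <= rangeEnd; }
    }

    // Stand-in for the cluster state that every node reads from ZooKeeper.
    private final List<Shard> shards;

    ClusterStateRouting(List<Shard> shards) {
        this.shards = List.copyOf(shards);
    }

    // Update path: hash the document id, find the owning shard, return its leader.
    String leaderFor(String docId) {
        int hash = docId.hashCode(); // simplification: NOT the hash Solr really uses
        for (Shard s : shards) {
            if (s.owns(hash)) {
                return s.leader();
            }
        }
        throw new IllegalStateException("no shard owns hash " + hash);
    }

    // Query path: one replica from every shard, rotated by a request counter.
    // (Merging the per-shard results is left out.)
    List<String> queryTargets(int requestCounter) {
        List<String> targets = new ArrayList<>();
        for (Shard s : shards) {
            List<String> r = s.replicas();
            targets.add(r.get(Math.floorMod(requestCounter, r.size())));
        }
        return targets;
    }

    public static void main(String[] args) {
        ClusterStateRouting state = new ClusterStateRouting(List.of(
            new Shard("shard1", Integer.MIN_VALUE, -1, List.of("nodeA", "nodeC")),
            new Shard("shard2", 0, Integer.MAX_VALUE, List.of("nodeB", "nodeD"))));
        System.out.println("update doc-42 -> leader " + state.leaderFor("doc-42"));
        System.out.println("query 0 -> " + state.queryTargets(0));
        System.out.println("query 1 -> " + state.queryTargets(1));
    }
}
```

The point is that every node holds the same map, so it
doesn't matter which node receives the request.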

Now take a look at CloudSolrServer (assuming that
you're using SolrJ from your client). Its constructor
takes the ZK address too. With that, the client code
has access to the state of the entire cluster, so you
don't have to do anything: the client will just "know"
how to connect to Solr.
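In SolrJ itself that's just new CloudSolrServer(zkHost).
The sketch below is a hypothetical, much-simplified model
(not CloudSolrServer's real logic) of why handing the client
the ZK address beats hard-coding one Solr URL: the client
keeps a snapshot of the live nodes, refreshes it when ZK
reports a change, and never picks a node that's down. Node
names are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ClusterAwareClient {

    // Snapshot of the nodes ZooKeeper currently reports as live (hypothetical names).
    private volatile List<String> liveNodes;
    private final AtomicInteger counter = new AtomicInteger();

    ClusterAwareClient(List<String> initialLiveNodes) {
        this.liveNodes = List.copyOf(initialLiveNodes);
    }

    // What a ZooKeeper watch callback would do: swap in the new membership.
    void onLiveNodesChanged(List<String> nodes) {
        this.liveNodes = List.copyOf(nodes);
    }

    // Round-robin over whatever is live right now. Any live node will do,
    // because that node routes the request using the same cluster state.
    String pickNode() {
        List<String> nodes = liveNodes; // read the volatile snapshot once
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no live Solr nodes");
        }
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        ClusterAwareClient client = new ClusterAwareClient(List.of("nodeA", "nodeB", "nodeC"));
        System.out.println(client.pickNode() + " " + client.pickNode() + " " + client.pickNode());
        client.onLiveNodesChanged(List.of("nodeA", "nodeC")); // nodeB went down
        System.out.println(client.pickNode() + " " + client.pickNode());
    }
}
```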

So for 1, 2 and 3 above, don't do anything <G>. Just
start up all the Solr nodes with the proper
zkHost (or zkRun) parameter and send requests
to any node. You do NOT have to configure shards
in solrconfig.xml or anywhere else.


For <4>, I'm going to pass on the shard-splitting details
since I haven't had time to dive into them yet. But increasing
capacity comes in two "flavors". If you simply need more
query throughput, just add more nodes. Solr will assign
each one to "the right" shard (although you can control
this), copy that shard's index down to it and automatically
start routing new requests to that node too.
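Here's a hypothetical Java sketch of the first flavor. It
assumes a simple "join the shard with the fewest replicas"
policy purely for illustration (as noted above, you can
control placement yourself), and the shard and node names
are made up. Each added node becomes one more full copy of
a shard, so there are more places to send queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplicaAssignment {

    // shard name -> nodes holding a replica of it (insertion order = shard order)
    private final Map<String, List<String>> replicas = new LinkedHashMap<>();

    ReplicaAssignment(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        for (String s : shardNames) {
            replicas.put(s, new ArrayList<>());
        }
    }

    // Hypothetical placement policy: the new node joins the shard with the
    // fewest replicas (ties go to the earlier shard), then copies its index.
    String addNode(String node) {
        String target = null;
        for (Map.Entry<String, List<String>> e : replicas.entrySet()) {
            if (target == null || e.getValue().size() < replicas.get(target).size()) {
                target = e.getKey();
            }
        }
        replicas.get(target).add(node);
        return target;
    }

    int replicaCount(String shard) {
        return replicas.get(shard).size();
    }

    public static void main(String[] args) {
        ReplicaAssignment cluster = new ReplicaAssignment(List.of("shard1", "shard2"));
        for (String node : List.of("nodeA", "nodeB", "nodeC", "nodeD", "nodeE")) {
            System.out.println(node + " -> " + cluster.addNode(node));
        }
        System.out.println(cluster.replicas);
    }
}
```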

The second "flavor" is when your index is too big to fit
on your physical hardware and you need more shards (as
opposed to more replicas). Then you need to do the
shard-splitting thing, which I'm going to skip rather
than mislead you.
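The arithmetic behind "more shards, not more replicas" fits
in a few lines. This sketch assumes one replica per node and
documents spread evenly across shards, and the sizes are
hypothetical. Notice that the replication factor never
appears: every replica is a full copy of its shard, so only
the shard count shrinks what each node has to hold.

```java
public class ShardSizing {

    // Index size each node must hold, assuming one replica per node.
    // Replication factor does not appear: every replica is a full copy of its shard.
    static double indexGbPerNode(double totalIndexGb, int numShards) {
        return totalIndexGb / numShards;
    }

    // Fewest shards such that one shard's slice of the index fits on a node.
    static int minShards(double totalIndexGb, double nodeCapacityGb) {
        return (int) Math.ceil(totalIndexGb / nodeCapacityGb);
    }

    public static void main(String[] args) {
        double totalGb = 300, nodeGb = 100; // hypothetical sizes
        System.out.println("2 shards, any number of replicas: "
            + indexGbPerNode(totalGb, 2) + " GB per node");
        System.out.println("shards needed: " + minShards(totalGb, nodeGb));
    }
}
```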

Final note: the other thing that I think is confusing you
is the distinction between SolrCloud and Solr Master/Slave.
SolrCloud is the new way of doing things. Master/Slave
is a setup where all the automatic stuff SolrCloud does
for you must be done manually: things like assigning
documents to particular shards, configuring solrconfig.xml
with the addresses of all the other shards, all that stuff.
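For contrast, here's a rough Java sketch of the "manual"
side. The host names and the hash-mod-N placement scheme
are hypothetical; what's real is that an old-style
distributed query has to list every shard in the shards
request parameter (per request, or as a request handler
default in solrconfig.xml), whereas with SolrCloud you
just send the query to any node.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ManualSharding {

    // Hand-maintained shard list (hypothetical hosts). Every client, or the
    // request handler defaults in solrconfig.xml, must be kept in sync with it.
    static final List<String> SHARDS = List.of(
        "host1:8983/solr/collection1",
        "host2:8983/solr/collection1");

    // Without SolrCloud, deciding which shard indexes a document is your job.
    // Hash-mod-N is one hypothetical scheme; changing N remaps most documents.
    static String shardForDoc(String docId) {
        return SHARDS.get(Math.floorMod(docId.hashCode(), SHARDS.size()));
    }

    // A distributed query has to name every shard in the "shards" parameter.
    static String distributedQueryUrl(String queryShard, String q) {
        String encodedQ = URLEncoder.encode(q, StandardCharsets.UTF_8);
        // ':' '/' and ',' are legal in a URL query string, so the list is left as-is.
        return "http://" + queryShard + "/select?q=" + encodedQ
            + "&shards=" + String.join(",", SHARDS);
    }

    public static void main(String[] args) {
        System.out.println("doc-42 is indexed on " + shardForDoc("doc-42"));
        System.out.println(distributedQueryUrl(SHARDS.get(0), "*:*"));
    }
}
```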

Best
Erick
