Hi,

I'm currently looking at SolrCloud. I've managed to set up a scalable
cluster with ZooKeeper
(see the examples at http://wiki.apache.org/solr/SolrCloud for a quick
overview).
This way, the layout of all shards and replicas is stored in one centralised
configuration.

Moreover, the ZooKeeper setup gives you load balancing out of the box.
So, let's say you have 2 different shards and each is replicated 2 times.
Your ZooKeeper config will look like this:

\config
 ...
     /live_nodes (v=6 children=4)
          lP_Port:7500_solr (ephemeral v=0)
          lP_Port:7574_solr (ephemeral v=0)
          lP_Port:8900_solr (ephemeral v=0)
          lP_Port:8983_solr (ephemeral v=0)
     /collections (v=20 children=1)
          collection1 (v=0 children=1) "configName=myconf"
               shards (v=0 children=2)
                    shard1 (v=0 children=3)
                         lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
                         lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
                         lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
                    shard2 (v=0 children=2)
                         lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
                         lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"

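To make the load-balancing idea a bit more concrete, here is a rough Python
sketch of what "pick a live replica for every shard" could look like on the
client side. This is only my illustration, not how Solr implements it: the
CLUSTER and LIVE_NODES structures are a hand-copied, hypothetical mirror of
the tree above, and node_name / pick_replicas are made-up helper names.

```python
import random

# Hypothetical in-memory copy of the /collections subtree from the post;
# the hosts are the post's own "lP_Port" placeholders, not real addresses.
CLUSTER = {
    "shard1": [
        "http://lP_Port:8983/solr/",
        "http://lP_Port:7574/solr/",
        "http://lP_Port:8900/solr/",
    ],
    "shard2": [
        "http://lP_Port:7500/solr/",
        "http://lP_Port:7574/solr/",
    ],
}

# Hypothetical copy of the ephemeral entries under /live_nodes.
LIVE_NODES = {
    "lP_Port:7500_solr",
    "lP_Port:7574_solr",
    "lP_Port:8900_solr",
    "lP_Port:8983_solr",
}


def node_name(url):
    """Turn 'http://host:port/solr/' into the 'host:port_solr' form used under /live_nodes."""
    host_port, context = url[len("http://"):].rstrip("/").split("/", 1)
    return f"{host_port}_{context}"


def pick_replicas(cluster, live_nodes, rng=random):
    """Choose one replica URL per shard, at random, among those listed as live."""
    chosen = {}
    for shard, urls in cluster.items():
        alive = [u for u in urls if node_name(u) in live_nodes]
        if not alive:
            raise RuntimeError(f"no live replica for {shard}")
        chosen[shard] = rng.choice(alive)
    return chosen
```

If a node's ephemeral entry disappears from /live_nodes, passing the smaller
set to pick_replicas simply skips that replica, which is the behaviour the
centralised layout is meant to enable.
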
--> This setup can be realised with a single ZooKeeper instance; the other
Solr machines only need to know the IP_Port where ZooKeeper is running, and
that's it.
--> So no further configuration or installation is needed to quickly set up
a scalable, load-balanced cluster.
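
As a small illustration of "only need to know where ZooKeeper is", the
sketch below builds the start command for one Solr node in the style of the
wiki examples, where the ZooKeeper address is passed as the zkHost system
property. The helper name solr_start_command and the address localhost:9983
are assumptions for the example; substitute your own IP_Port.

```python
def solr_start_command(zk_host, jetty_port):
    """Build the argv for starting one Solr node that registers with ZooKeeper.

    zk_host is the host:port of the ZooKeeper instance; jetty_port is the
    port this Solr node should listen on.
    """
    return [
        "java",
        f"-Djetty.port={jetty_port}",
        f"-DzkHost={zk_host}",
        "-jar",
        "start.jar",
    ]


# Example: a second node joining the cluster (address is an assumed placeholder).
print(" ".join(solr_start_command("localhost:9983", 7574)))
```

Every extra node gets the same zkHost value and only differs in its own
port, which is why adding machines needs no per-node shard configuration.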

Disclaimer:
The ZooKeeper integration is a relatively new feature, and I'm not sure
whether it will hold up in a real production environment with a tight SLA.
But definitely keep your eyes on this stuff; it will mature quickly!

Stijn Vanhoorelbeke
