Hi, I'm currently looking at SolrCloud. I've managed to set up a scalable cluster with ZooKeeper (see the examples at http://wiki.apache.org/solr/SolrCloud for a quick overview). This way, the layout of all shards and replicas is stored in one centralised configuration.
Moreover, SolrCloud with ZooKeeper gives you load balancing out of the box. So, let's say you have 2 different shards, each with 2 or 3 replicas. Your ZooKeeper config will look like this:

\config ...
/live_nodes (v=6 children=4)
    IP_Port:7500_solr (ephemeral v=0)
    IP_Port:7574_solr (ephemeral v=0)
    IP_Port:8900_solr (ephemeral v=0)
    IP_Port:8983_solr (ephemeral v=0)
/collections (v=20 children=1)
    collection1 (v=0 children=1) "configName=myconf"
        shards (v=0 children=2)
            shard1 (v=0 children=3)
                IP_Port:8983_solr_ (v=4) "node_name=IP_Port:8983_solr url=http://IP_Port:8983/solr/"
                IP_Port:7574_solr_ (v=1) "node_name=IP_Port:7574_solr url=http://IP_Port:7574/solr/"
                IP_Port:8900_solr_ (v=1) "node_name=IP_Port:8900_solr url=http://IP_Port:8900/solr/"
            shard2 (v=0 children=2)
                IP_Port:7500_solr_ (v=0) "node_name=IP_Port:7500_solr url=http://IP_Port:7500/solr/"
                IP_Port:7574_solr_ (v=1) "node_name=IP_Port:7574_solr url=http://IP_Port:7574/solr/"

--> This setup needs only 1 ZooKeeper instance. The other Solr machines just need to know the IP_Port where ZooKeeper is running, and that's it.
--> So no extra configuration or installation is needed to quickly set up a scalable, load-balanced cluster.

Disclaimer: the ZooKeeper integration is a relatively new feature. I'm not sure whether it will hold up in a real production environment with a tight SLA. But definitely keep your eyes on this stuff, it will mature quickly!

Stijn Vanhoorelbeke
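
P.S. A rough Python sketch of the load-balancing idea, for illustration only. It is not Solr's actual implementation: the round-robin strategy and the helper names (node_name, make_pickers, pick_replicas) are assumptions made up for this example, and the host part of each URL is just the placeholder from the dump. It only shows that, once the shard layout and the live nodes are known from ZooKeeper, a request needs one live replica per shard.

```python
from itertools import cycle

# Shard -> replica URLs, as listed under /collections in the dump.
# The host part is a placeholder, not a real address.
SHARDS = {
    "shard1": [
        "http://IP_Port:8983/solr/",
        "http://IP_Port:7574/solr/",
        "http://IP_Port:8900/solr/",
    ],
    "shard2": [
        "http://IP_Port:7500/solr/",
        "http://IP_Port:7574/solr/",
    ],
}

# Node names as listed under /live_nodes.
LIVE_NODES = {
    "IP_Port:7500_solr",
    "IP_Port:7574_solr",
    "IP_Port:8900_solr",
    "IP_Port:8983_solr",
}


def node_name(url):
    """Map a replica URL to its /live_nodes name (hypothetical helper)."""
    # http://IP_Port:8983/solr/ -> IP_Port:8983_solr
    rest = url.split("://", 1)[1].rstrip("/")
    host_port, _, context = rest.partition("/")
    return f"{host_port}_{context}"


def make_pickers(shards, live_nodes):
    """Build one round-robin iterator per shard over its live replicas."""
    pickers = {}
    for shard, urls in shards.items():
        live = [u for u in urls if node_name(u) in live_nodes]
        if not live:
            raise RuntimeError(f"no live replica for {shard}")
        pickers[shard] = cycle(live)
    return pickers


def pick_replicas(pickers):
    """Choose one replica per shard for a single distributed request."""
    return {shard: next(it) for shard, it in pickers.items()}


if __name__ == "__main__":
    pickers = make_pickers(SHARDS, LIVE_NODES)
    for _ in range(3):
        print(pick_replicas(pickers))
```

If a node drops out of /live_nodes, rebuilding the pickers with the new set simply skips its replicas, which is the behaviour you want from the ephemeral entries.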