So does that mean that during a document add, the cluster state is fetched from ZooKeeper and the target shard is then decided from the hash of the doc id?
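To make the question concrete, here is a rough sketch of the routing I have in mind. This is only my assumption of how it might work, not confirmed Solr behaviour: the names `build_ranges` and `target_shard` are hypothetical, the even split of a 32-bit hash space is assumed, and `zlib.crc32` is just a stand-in for whatever hash Solr really uses.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space


def build_ranges(shard_names):
    """Split the hash space evenly; returns {shard: (lo, hi)}, bounds inclusive."""
    n = len(shard_names)
    step = HASH_SPACE // n
    ranges = {}
    for i, name in enumerate(shard_names):
        lo = i * step
        # The last shard absorbs the remainder so the whole space is covered.
        hi = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        ranges[name] = (lo, hi)
    return ranges


def target_shard(doc_id, ranges):
    """Pick the shard whose range contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # stand-in hash
    for name, (lo, hi) in ranges.items():
        if lo <= h <= hi:
            return name
    raise ValueError("no shard covers hash %d" % h)


# Hypothetical cached cluster view with three shards.
ranges = build_ranges(["shard1", "shard2", "shard3"])
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "->", target_shard(doc_id, ranges))
```

If routing works like this, the target shard depends only on the doc id and the cached ranges, which is why I am asking what happens to documents whose hash falls in the range of a shard that is down.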
Assume we have 3 shards (with no replicas) and 1 of them goes down while indexing. Will all the documents be routed to the remaining 2 shards, or will only 2/3 of the documents be indexed? If the remaining 2 shards get all the documents, will SolrCloud rebalance when the 3rd shard comes back online later?

Does ZooKeeper store the range of doc ids held by each shard, or any other information about the actual docs?

We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly the same data, and we update the index only once a day. Both dc1 and dc2 have exactly the same SolrCloud config and machines. Can we populate dc2 by just copying all the index binaries from solr-cores/core0/data on dc1 to the machines in dc2 (to avoid indexing the same documents again on dc2)? I guess the Solr replication API doesn't work in SolrCloud, hence I am looking for a workaround.

Thanks
Varun

On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:

> ZooKeeper
> /
> /clusterstate.json - info about the layout and state of the cluster -
>     collections, shards, urls, etc
> /collections - config to use for the collection, shard leader voting zk
>     nodes
> /configs - sets of config files
> /live_nodes - ephemeral nodes, one per Solr node
> /overseer - work queue for update clusterstate.json, creating new
>     collections, etc
> /overseer_elect - overseer voting zk nodes
>
> - Mark
>
> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com>
> wrote:
>
> > Hi Mark,
> > One more question
> >
> > While doing solr doc update/add what information is required from
> > zookeeper ? Can you tell what all information is stored in zookeeper
> > other than the startup configs.
> >
> > Thanks
> > Varun
> >
> > On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com>
> > wrote:
> >
> >>
> >> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com>
> >> wrote:
> >>
> >>> Hi All,
> >>> I have some questions regarding role of zookeeper in solrcloud runtime,
> >>> while processing the queries .
> >>>
> >>> 1) Is zookeeper cluster referred by solr shards for processing every
> >>> request, or its only used to copy config on startup time ?
> >>
> >> No, it's not used per request. Solr talks to ZooKeeper on SolrCore
> >> startup - to get configs and set itself up. Then it only talks to
> >> ZooKeeper when a cluster state change happens - in that case, ZooKeeper
> >> pings Solr and Solr will get an update view of the cluster. That view is
> >> cached and used for requests. In a stable state, Solr is not talking to
> >> ZooKeeper other than the heartbeat they keep to know a node is up.
> >>
> >>> 2) How loadbalancing is done between replicas ? Is traffic stat shared
> >>> through zookeeper ?
> >>
> >> Basic round robin. Traffic stats are not currently in Zk.
> >>
> >>> 3) If for any reason zookeeper cluster goes offline for sometime, does
> >>> solr cloud will not be able to server any traffic ?
> >>
> >> It will stop allowing updates, but continue serving searches.
> >>
> >> - Mark
> >>
> >>>
> >>> Thanks
> >>> Varun