So does this mean that during a "document add" the cluster state is fetched
from ZooKeeper, and the target shard is then decided from a hash of the
doc id?
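
To make the question concrete, here is a rough Python sketch of the two
routing behaviours I can imagine. Everything in it is my own assumption for
illustration (the function names, the md5-modulo hash, the shard names); it
is not Solr's actual code.

```python
import hashlib


def shard_index(doc_id: str, num_shards: int) -> int:
    """Map a doc id to a slot using a stable hash (md5 is an assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def route_fixed(doc_id, all_shards, live):
    """Option A: the slot is fixed over all shards; None if that shard is down."""
    target = all_shards[shard_index(doc_id, len(all_shards))]
    return target if target in live else None


def route_live_only(doc_id, all_shards, live):
    """Option B: hash only over the shards that are currently live."""
    candidates = [s for s in all_shards if s in live]
    return candidates[shard_index(doc_id, len(candidates))]


# Hypothetical cluster: 3 shards, no replicas, shard3 is down.
all_shards = ["shard1", "shard2", "shard3"]
live = {"shard1", "shard2"}
doc_ids = [f"doc-{i}" for i in range(9)]

fixed = {d: route_fixed(d, all_shards, live) for d in doc_ids}
rerouted = {d: route_live_only(d, all_shards, live) for d in doc_ids}
```

With option A, documents that hash to the dead shard are simply not indexed;
with option B, every document lands somewhere, but the placement changes once
shard3 returns, which is why I am asking about rebalancing.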

Assume we have 3 shards (with no replicas) and 1 of them goes down while
indexing. Will all the documents be routed to the remaining 2 shards, or
will only 2/3 of the documents be indexed? If the remaining 2 shards get
all the documents, will SolrCloud rebalance when the 3rd shard later comes
back online?

Do we store the range of doc ids held by each shard anywhere in ZooKeeper,
or any other information about the actual docs?

We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
the same data, and we update the index only once a day. Both dc1 and dc2
have exactly the same SolrCloud config and machines.

Can we populate dc2 by just copying all the index binaries from
solr-cores/core0/data on dc1 to the machines in dc2 (to avoid indexing the
same documents again on dc2)? I guess the Solr replication API doesn't work
in SolrCloud, hence I am looking for a workaround.

Thanks
Varun

On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:

> ZooKeeper
> /
>  /clusterstate.json - info about the layout and state of the cluster -
> collections, shards, urls, etc
>  /collections - config to use for the collection, shard leader voting zk
> nodes
>  /configs - sets of config files
>  /live_nodes - ephemeral nodes, one per Solr node
>  /overseer - work queue for updating clusterstate.json, creating new
> collections, etc
>  /overseer_elect - overseer voting zk nodes
>
> - Mark
>
> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com>
> wrote:
>
> > Hi Mark,
> > One more question
> >
> > While doing a Solr doc update/add, what information is required from
> > ZooKeeper? Can you tell me what information is stored in ZooKeeper other
> > than the startup configs?
> >
> > Thanks
> > Varun
> >
> > On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com>
> wrote:
> >
> >>
> >> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com>
> >> wrote:
> >>
> >>> Hi All,
> >>> I have some questions regarding the role of ZooKeeper in SolrCloud at
> >>> runtime, while processing queries.
> >>>
> >>> 1) Is the ZooKeeper cluster consulted by Solr shards for processing every
> >>> request, or is it only used to copy config at startup time?
> >>
> >> No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup
> >> - to get configs and set itself up. Then it only talks to ZooKeeper when a
> >> cluster state change happens - in that case, ZooKeeper pings Solr and Solr
> >> will get an updated view of the cluster. That view is cached and used for
> >> requests. In a stable state, Solr is not talking to ZooKeeper other than
> >> the heartbeat they keep to know a node is up.
> >>
> >>> 2) How is load balancing done between replicas? Are traffic stats shared
> >>> through ZooKeeper?
> >>
> >> Basic round robin. Traffic stats are not currently in Zk.
> >>
> >>> 3) If for any reason the ZooKeeper cluster goes offline for some time,
> >>> will SolrCloud be unable to serve any traffic?
> >>
> >> It will stop allowing updates, but continue serving searches.
> >>
> >> - Mark
> >>
> >>>
> >>>
> >>> Thanks
> >>> Varun
> >>
> >>
>
>
