Re: Role of zookeeper at runtime

Erick Erickson Thu, 28 Feb 2013 05:29:17 -0800

To update, at least one node must be up for each shard;
otherwise updates fail.
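
A rough sketch of why that is, under stated assumptions rather than
Solr's actual code: each shard owns a fixed slice of the hash space, so
a doc whose id hashes into a shard with no live node has nowhere else
to go. The hash function (zlib.crc32), the shard names, and the
shard_for()/route() helpers are stand-ins made up for illustration.

```python
import zlib

# Hypothetical three-shard layout: each shard owns an equal slice of a
# 32-bit hash space. Real Solr uses its own hash function and ranges.
SHARDS = ["shard1", "shard2", "shard3"]
HASH_SPACE = 2 ** 32


def shard_for(doc_id):
    """Map a document id to the shard owning its hash slice."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    return SHARDS[h * len(SHARDS) // HASH_SPACE]


def route(doc_id, live_shards):
    """Return the target shard, or raise if that shard has no live node."""
    target = shard_for(doc_id)
    if target not in live_shards:
        # No re-routing to the surviving shards: the update simply fails.
        raise RuntimeError(
            "no live node for %s; update of %r fails" % (target, doc_id)
        )
    return target
```

With all three shards live, route() returns the owning shard; take that
shard out of live_shards and the same call raises instead of picking
one of the other two.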


Solr replication works fine in 4.x; in fact, it's used to synchronize
when bulk updates happen (say you bring up a new node).
The transaction logs are currently only used to store at least 100
documents for synchronizing.
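
To make the split between the two mechanisms concrete, here is a
minimal sketch. It assumes a node that missed only a few updates can
catch up from the transaction log, while one that missed more falls
back to replication. The choose_sync() helper, its return labels, and
the hard cutoff of 100 are illustrative assumptions; the real decision
in Solr is more involved.

```python
# Assumed size of the recent-update window kept in the transaction log.
TLOG_WINDOW = 100


def choose_sync(missed_updates, tlog_window=TLOG_WINDOW):
    """Pick how a returning node catches up (illustrative only)."""
    if missed_updates <= 0:
        return "in-sync"
    if missed_updates <= tlog_window:
        # Few enough updates were missed to replay them from the log.
        return "replay-from-tlog"
    # Too far behind: copy the index over via replication instead.
    return "full-replication"
```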

I haven't personally tried it, but I'd guess it's possible to set up dc2 NOT
as part of a cluster (i.e. not ZK aware) and just have it use old-style
replication. But why do this? "Avoiding indexing in both DCs" strikes
me as a false savings. My first approach would be to just set up two
independent Solr clusters, one in each DC, and send the docs to each DC.
Only go to more complex solutions if you can demonstrate that this
doesn't work.
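
A minimal sketch of the "send the docs to each DC" approach, assuming
each cluster accepts a JSON list of docs on its update handler. The
host names, the collection name, the URL path and the
index_everywhere() helper are all hypothetical; check the endpoint and
payload shape against your Solr version.

```python
import json
import urllib.request

# Hypothetical update endpoints, one per data center.
DC_UPDATE_URLS = {
    "dc1": "http://solr-dc1.example.com:8983/solr/collection1/update?commit=true",
    "dc2": "http://solr-dc2.example.com:8983/solr/collection1/update?commit=true",
}


def http_post_json(url, docs):
    """POST a JSON list of docs to an update handler; return HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_everywhere(docs, urls=DC_UPDATE_URLS, post=http_post_json):
    """Send the same batch to every DC and report per-DC success."""
    results = {}
    for dc, url in urls.items():
        try:
            results[dc] = post(url, docs) == 200
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            results[dc] = False
    return results
```

The per-DC result map makes it easy to retry only the DC that failed,
which keeps the two clusters independent of each other.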

Best
Erick


On Tue, Feb 26, 2013 at 6:49 PM, varun srivastava <varunmail...@gmail.com> wrote:

> So does it mean that while doing "document add" the state of the cluster is
> fetched from zookeeper, and then the target shard is decided depending on
> the hash of the docid?
>
> Assume we have 3 shards (with no replicas), of which 1 went down while
> indexing. Will all the documents be routed to the remaining 2 shards,
> or will only 2/3 of the documents be indexed? If the answer is that the
> remaining 2 shards get all the documents, will solr cloud rebalance
> when the 3rd shard later comes back online?
>
> Do we store anywhere in zookeeper the range of docids held in each shard,
> or any other information about the actual docs? We have 2 datacentres (dc1
> and dc2) which need to be indexed with exactly the same data, and we update
> the index only once a day. Both dc1 and dc2 have the exact same solrcloud
> config and machines.
>
> Can we populate dc2 by just copying all the index binaries from
> solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing
> the same documents on dc2)? I guess the solr replication API doesn't work
> in solrcloud, hence looking for a workaround.
>
> Thanks
> Varun
>
> On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com>
> wrote:
>
> > ZooKeeper
> > /
> >  /clusterstate.json - info about the layout and state of the cluster -
> > collections, shards, urls, etc
> >  /collections - config to use for the collection, shard leader voting zk
> > nodes
> >  /configs - sets of config files
> >  /live_nodes - ephemeral nodes, one per Solr node
> >  /overseer - work queue for updating clusterstate.json, creating new
> > collections, etc
> >  /overseer_elect - overseer voting zk nodes
> >
> > - Mark
> >
> > On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com>
> > wrote:
> >
> > > Hi Mark,
> > > One more question
> > >
> > > While doing solr doc update/add what information is required from
> > > zookeeper? Can you tell what all information is stored in zookeeper
> > > other than the startup configs.
> > >
> > > Thanks
> > > Varun
> > >
> > > On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com>
> > > wrote:
> > >
> > >>
> > >> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi All,
> > >>> I have some questions regarding the role of zookeeper in solrcloud
> > >>> at runtime, while processing the queries.
> > >>>
> > >>> 1) Is the zookeeper cluster consulted by solr shards for processing
> > >>> every request, or is it only used to copy config at startup time?
> > >>
> > >> No, it's not used per request. Solr talks to ZooKeeper on SolrCore
> > >> startup - to get configs and set itself up. Then it only talks to
> > >> ZooKeeper when a cluster state change happens - in that case,
> > >> ZooKeeper pings Solr and Solr will get an updated view of the
> > >> cluster. That view is cached and used for requests. In a stable
> > >> state, Solr is not talking to ZooKeeper other than the heartbeat
> > >> they keep to know a node is up.
> > >>
> > >>> 2) How is load balancing done between replicas? Are traffic stats
> > >>> shared through zookeeper?
> > >>
> > >> Basic round robin. Traffic stats are not currently in Zk.
> > >>
> > >>> 3) If for any reason the zookeeper cluster goes offline for some
> > >>> time, will solr cloud be unable to serve any traffic?
> > >>
> > >> It will stop allowing updates, but continue serving searches.
> > >>
> > >> - Mark
> > >>
> > >>>
> > >>>
> > >>> Thanks
> > >>> Varun
> > >>
> > >>
> >
> >
>
