Re: Role of zookeeper at runtime

varun srivastava Thu, 28 Feb 2013 11:34:02 -0800

How can I set up SolrCloud in master-slave mode? Can you point me to a sample
config or tutorial that describes the steps to get SolrCloud running in a
master-slave setup?


As you know from my previous mails, I don't need active Solr replicas.
I just need a mechanism to copy a given SolrCloud index to a new instance
of SolrCloud (classic master-slave setup).

Eric/ Mark,
  We have 10 virtual data centres. It is set up like this because we do
rolling updates: while the 1st DC is being indexed, the other 9 serve traffic.
Indexing one DC takes 2 hours. With a single shard, we used to index one DC
and then quickly replicate the index to the other DCs using a master-slave
setup. With SolrCloud we obviously can't index each DC sequentially, as that
would take 2*10 hours. So we need a way of indexing 1 DC and then quickly
propagating the index binaries to the others. What would you recommend for
SolrCloud?

Thanks
Varun

On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller <markrmil...@gmail.com> wrote:

>
> On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com>
> wrote:
>
> > So does it mean that while doing a "document add" the state of the
> > cluster is fetched from zookeeper and then, depending upon the hash of
> > the docid, the target shard is decided?
>
> We keep the zookeeper info cached locally. We only update it when
> ZooKeeper tells us it has changed.
>
> >
> > Assume we have 3 shards (with no replicas), of which 1 went down while
> > indexing. Will all the documents be routed to the remaining 2 shards,
> > or will only 2/3 of the documents be indexed? If the answer is that the
> > remaining 2 shards will get all the documents, then when the 3rd shard
> > later comes back online, will SolrCloud do rebalancing?
>
> All of the updates that hash to the third shard will fail. That is why we
> have replicas - if you have a replica, it will take over as the leader.
>
> >
> > Is the range of docids stored in each shard, or any other information
> > about the actual docs, stored anywhere in zookeeper?
>
> The range of hashes is stored for each shard in zk.
>
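
The routing behaviour described above (a hash of the document id selects a
shard by hash range, and updates for a shard with no live leader fail rather
than being rerouted) can be sketched roughly as follows. This is only an
illustration under stated assumptions: zlib.crc32 is a stand-in hash, not
necessarily the one Solr uses, the hash space is assumed to be split into
equal contiguous ranges, and build_ranges and route are hypothetical names
rather than Solr APIs.

```python
import zlib


def build_ranges(num_shards):
    """Split the 32-bit hash space into contiguous, roughly equal ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        hi = 2**32 - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((lo, hi))
    return ranges


def route(doc_id, ranges, live):
    """Return the index of the shard whose hash range contains doc_id.

    Raises RuntimeError if that shard is not in `live`; the update is not
    redirected to another shard.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash (assumption)
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            if shard not in live:
                raise RuntimeError("shard %d has no live leader" % shard)
            return shard
    raise AssertionError("hash ranges do not cover the hash space")
```

With three shards and shard 2 down, route("some-id", build_ranges(3), {0, 1})
either returns 0 or 1 or raises, depending on where the id hashes; documents
never move to a different shard because one is offline.
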
> > We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
> > the same data, and we update the index only once a day. Both dc1 and dc2
> > have the exact same solrcloud config and machines.
> >
> > Can we populate dc2 by just copying all the index binaries from
> > solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing the
> > same documents on dc2)? I guess the solr replication API doesn't work in
> > solrcloud, hence I am looking for a workaround.
> >
> > Thanks
> > Varun
> >
> > On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com>
> > wrote:
> >
> >> ZooKeeper
> >> /
> >> /clusterstate.json - info about the layout and state of the cluster -
> >> collections, shards, urls, etc
> >> /collections - config to use for the collection, shard leader voting zk
> >> nodes
> >> /configs - sets of config files
> >> /live_nodes - ephemeral nodes, one per Solr node
> >> /overseer - work queue for updating clusterstate.json, creating new
> >> collections, etc
> >> /overseer_elect - overseer voting zk nodes
> >>
> >> - Mark
> >>
> >> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com>
> >> wrote:
> >>
> >>> Hi Mark,
> >>> One more question
> >>>
> >>> While doing a solr doc update/add, what information is required from
> >>> zookeeper? Can you tell me what information is stored in zookeeper
> >>> other than the startup configs?
> >>>
> >>> Thanks
> >>> Varun
> >>>
> >>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com>
> >>> wrote:
> >>>
> >>>>
> >>>> On Feb 26, 2013, at 5:25 PM, varun srivastava
> >>>> <varunmail...@gmail.com> wrote:
> >>>>
> >>>>> Hi All,
> >>>>> I have some questions regarding the role of zookeeper in solrcloud at
> >>>>> runtime, while processing queries.
> >>>>>
> >>>>> 1) Is the zookeeper cluster consulted by solr shards for processing
> >>>>> every request, or is it only used to copy config at startup time?
> >>>>
> >>>> No, it's not used per request. Solr talks to ZooKeeper on SolrCore
> >>>> startup - to get configs and set itself up. Then it only talks to
> >>>> ZooKeeper when a cluster state change happens - in that case, ZooKeeper
> >>>> pings Solr and Solr will get an updated view of the cluster. That view
> >>>> is cached and used for requests. In a stable state, Solr is not talking
> >>>> to ZooKeeper other than the heartbeat they keep to know a node is up.
> >>>>
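
The "cache the cluster view and refresh it only on notification" pattern
described above can be illustrated with a toy model. FakeCoordinator and Node
are hypothetical classes written for this sketch; they are not the ZooKeeper
client API or Solr code, and real ZooKeeper watches behave differently in
detail (for example, a watch has to be re-registered after it fires).

```python
class FakeCoordinator:
    """Toy stand-in for ZooKeeper: holds one state dict, notifies watchers."""

    def __init__(self, state):
        self._state = dict(state)
        self._watchers = []
        self.reads = 0  # how many times any node fetched the state

    def read(self, watcher):
        """Return a copy of the state and register `watcher` for changes."""
        self.reads += 1
        if watcher not in self._watchers:
            self._watchers.append(watcher)
        return dict(self._state)

    def write(self, state):
        """Replace the state and notify every registered watcher."""
        self._state = dict(state)
        for watcher in list(self._watchers):
            watcher()


class Node:
    """Toy Solr node: reads the state once, then serves from its cache."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._cached = coordinator.read(self._on_change)

    def _on_change(self):
        # Called by the coordinator only when the state has changed.
        self._cached = self._coordinator.read(self._on_change)

    def handle_request(self):
        # Requests use the cached view; no coordinator round trip.
        return self._cached
```

In this model, handling any number of requests leaves the read counter at 1;
it only increases when the coordinator's state is written.
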
> >>>>> 2) How is load balancing done between replicas? Are traffic stats
> >>>>> shared through zookeeper?
> >>>>
> >>>> Basic round robin. Traffic stats are not currently in Zk.
> >>>>
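
For reference, basic round-robin selection over a shard's replicas amounts to
something like the sketch below. RoundRobin and the replica names are
hypothetical and only illustrate the idea; this is not Solr's load-balancing
code.

```python
import itertools


class RoundRobin:
    """Hand out replicas in a fixed rotating order, ignoring load."""

    def __init__(self, replicas):
        # Copy the input so later changes to the caller's list have no effect.
        self._cycle = itertools.cycle(list(replicas))

    def next_replica(self):
        return next(self._cycle)


balancer = RoundRobin(["replica-a", "replica-b", "replica-c"])
picks = [balancer.next_replica() for _ in range(6)]
```

Each replica is picked in turn regardless of how busy it is, which is why no
traffic statistics need to be shared.
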
> >>>>> 3) If for any reason the zookeeper cluster goes offline for some time,
> >>>>> will solr cloud be unable to serve any traffic?
> >>>>
> >>>> It will stop allowing updates, but continue serving searches.
> >>>>
> >>>> - Mark
> >>>>
> >>>>>
> >>>>>
> >>>>> Thanks
> >>>>> Varun
> >>>>
> >>>>
> >>
> >>
>
>
