Re: Role of zookeeper at runtime

varun srivastava Thu, 28 Feb 2013 15:20:34 -0800

Any thoughts on this?

We have 10 virtual data centres. It's set up like this because we do
rolling updates: while the 1st DC is being indexed, the other 9 serve traffic.
Indexing one DC takes 2 hours. With a single shard we used to index one DC
and then quickly replicate the index to the other DCs using a master-slave
setup. With SolrCloud we obviously can't index each DC sequentially,
as that would take 2*10 hours. So we need a way to index 1 DC
and then quickly propagate the index binaries to the others. What would
you recommend for SolrCloud?


Thanks
Varun

On Thu, Feb 28, 2013 at 11:33 AM, varun srivastava
<varunmail...@gmail.com> wrote:

> How can I set up SolrCloud in a master-slave configuration? Can you point
> me to any sample config or tutorial that describes the steps to get
> SolrCloud into a master-slave setup?
>
> As you know from my previous mails, I don't need active Solr replicas;
> I just need a mechanism to copy a given SolrCloud index to a new instance
> of SolrCloud (a classic master-slave setup).
>
> Eric/ Mark,
>   We have 10 virtual data centres. It's set up like this because we do
> rolling updates: while the 1st DC is being indexed, the other 9 serve traffic.
> Indexing one DC takes 2 hours. With a single shard we used to index one DC
> and then quickly replicate the index to the other DCs using a master-slave
> setup. With SolrCloud we obviously can't index each DC sequentially,
> as that would take 2*10 hours. So we need a way to index 1 DC
> and then quickly propagate the index binaries to the others. What would
> you recommend for SolrCloud?
>
> Thanks
> Varun
>
>
> On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
>>
>> On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com>
>> wrote:
>>
>> > So does that mean that, on a "document add", the cluster state is
>> > fetched from ZooKeeper and then the target shard is decided by the
>> > hash of the doc id?
>>
>> We keep the ZooKeeper info cached locally. We only update it when
>> ZooKeeper tells us it has changed.
>>
>> >
>> > Assume we have 3 shards (with no replicas) and 1 goes down while
>> > indexing. Will all the documents be routed to the remaining 2 shards,
>> > or will only 2/3 of the documents be indexed? If the remaining 2
>> > shards get all the documents, will SolrCloud rebalance when the 3rd
>> > shard later comes back online?
>>
>> All of the updates that hash to the third shard will fail. That is why we
>> have replicas - if you have a replica, it will take over as the leader.
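A minimal sketch of the failure mode just described, assuming a simplified model in which the first live replica of a shard acts as its leader. The `Shard` class, `route_update` function and node names are hypothetical; real SolrCloud leader election runs through ZooKeeper and is more involved. With no replica, an update for that shard has nowhere to go and fails; with a replica, the surviving copy takes over.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


class UpdateFailed(Exception):
    """Raised when no live replica can act as leader for the target shard."""


@dataclass
class Shard:
    name: str
    replicas: List[str]                       # hypothetical node names hosting a copy
    live: Set[str] = field(default_factory=set)

    def leader(self) -> Optional[str]:
        # Simplification: the first live replica in list order is the leader.
        for node in self.replicas:
            if node in self.live:
                return node
        return None


def route_update(shard: Shard, doc_id: str) -> str:
    """Return the node that should receive the update, or fail if none is live."""
    leader = shard.leader()
    if leader is None:
        raise UpdateFailed(f"{doc_id}: no live replica for {shard.name}")
    return leader
```

For example, `route_update(Shard("shard3", ["nodeA", "nodeB"], live={"nodeB"}), "doc1")` returns `"nodeB"`, while the same call on a single-replica shard whose only node is down raises `UpdateFailed`. Note that nothing here reroutes the document to another shard, matching the answer above: updates that hash to the dead shard simply fail.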
>>
>> >
>> > Do we store anywhere in ZooKeeper the range of doc ids held by each
>> > shard, or any other information about the actual docs?
>>
>> The hash range for each shard is stored in ZooKeeper.
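To illustrate why only hash ranges (not documents) need to live in ZooKeeper, here is a minimal sketch of range-based routing. Solr itself hashes ids with a MurmurHash3 variant over signed 32-bit ranges; this sketch swaps in `zlib.crc32` and unsigned ranges purely for illustration, and the function names are hypothetical. Once a node has the per-shard ranges cached, it can compute the target shard for any id locally.

```python
import zlib
from typing import List, Tuple

HASH_SPACE = 2 ** 32  # zlib.crc32 returns unsigned 32-bit values in [0, 2**32)


def partition_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the hash space into num_shards contiguous, inclusive ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(doc_id: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    for idx, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return idx
    raise ValueError("hash not covered by any range")
```

With three shards, `partition_ranges(3)` yields the hex ranges `00000000-55555554`, `55555555-aaaaaaa9` and `aaaaaaaa-ffffffff`. The same id always maps to the same shard, which is also why a down shard cannot simply be skipped: its slice of the hash space has no other owner.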
>>
>> > We have 2 data centres (dc1 and
>> > dc2) which need to be indexed with exactly the same data, and we update
>> > the index only once a day. Both dc1 and dc2 have exactly the same
>> > SolrCloud config and machines.
>> >
>> > Can we populate dc2 by just copying all the index binaries from
>> > solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing
>> > the same documents on dc2)? I guess the Solr replication API doesn't
>> > work in SolrCloud, hence I'm looking for a workaround.
>> >
>> > Thanks
>> > Varun
>> >
>> > On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >
>> >> ZooKeeper
>> >> /
>> >> /clusterstate.json - info about the layout and state of the cluster:
>> >>     collections, shards, URLs, etc.
>> >> /collections - the config to use for each collection, plus shard
>> >>     leader election znodes
>> >> /configs - sets of config files
>> >> /live_nodes - ephemeral nodes, one per Solr node
>> >> /overseer - work queue for updating clusterstate.json, creating new
>> >>     collections, etc.
>> >> /overseer_elect - overseer election znodes
>> >>
>> >> - Mark
>> >>
>> >> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com>
>> >> wrote:
>> >>
>> >>> Hi Mark,
>> >>> One more question:
>> >>>
>> >>> While doing a Solr doc update/add, what information is required from
>> >>> ZooKeeper? Can you tell me what information is stored in ZooKeeper
>> >>> other than the startup configs?
>> >>>
>> >>> Thanks
>> >>> Varun
>> >>>
>> >>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>>
>> >>>>
>> >>>> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi All,
>> >>>>> I have some questions about the role of ZooKeeper in the SolrCloud
>> >>>>> runtime, while processing queries.
>> >>>>>
>> >>>>> 1) Is the ZooKeeper cluster consulted by Solr shards for every
>> >>>>> request, or is it only used to copy configs at startup?
>> >>>>
>> >>>> No, it's not used per request. Solr talks to ZooKeeper on SolrCore
>> >>>> startup to get configs and set itself up. After that it only talks
>> >>>> to ZooKeeper when a cluster state change happens; in that case
>> >>>> ZooKeeper notifies Solr and Solr fetches an updated view of the
>> >>>> cluster. That view is cached and used for requests. In a stable
>> >>>> state, Solr does not talk to ZooKeeper other than the heartbeat
>> >>>> they keep to know a node is up.
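The caching behaviour described above can be sketched with an in-memory stand-in for ZooKeeper. Everything here (`FakeZooKeeper`, `CachedClusterView`, the node names) is hypothetical and only models the pattern: read the cluster state once, serve every request from the local copy, and re-read only when a change notification arrives. One simplification to be aware of: real ZooKeeper watches are one-shot and must be re-registered after they fire, whereas this fake keeps callbacks registered permanently.

```python
from typing import Callable, Dict, List


class FakeZooKeeper:
    """In-memory stand-in: holds one cluster-state map and notifies watchers on change."""

    def __init__(self, state: Dict[str, str]):
        self._state = dict(state)
        self._watchers: List[Callable[[], None]] = []
        self.reads = 0  # how many times clients fetched the state

    def get_state(self) -> Dict[str, str]:
        self.reads += 1
        return dict(self._state)

    def watch(self, callback: Callable[[], None]) -> None:
        self._watchers.append(callback)

    def set_state(self, state: Dict[str, str]) -> None:
        self._state = dict(state)
        for cb in list(self._watchers):
            cb()  # push a change notification to every registered client


class CachedClusterView:
    """Node-side cache: fetch once at startup, refresh only when notified."""

    def __init__(self, zk: FakeZooKeeper):
        self._zk = zk
        self._view = zk.get_state()
        zk.watch(self._refresh)

    def _refresh(self) -> None:
        self._view = self._zk.get_state()

    def node_for(self, shard: str) -> str:
        # Answered from the local cache; no ZooKeeper round trip per request.
        return self._view[shard]
```

In use, a hundred `node_for("shard1")` lookups leave `zk.reads` at 1; after `zk.set_state({"shard1": "nodeB"})` the view refreshes once and subsequent lookups return `"nodeB"`.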
>> >>>>
>> >>>>> 2) How is load balancing done between replicas? Are traffic stats
>> >>>>> shared through ZooKeeper?
>> >>>>
>> >>>> Basic round robin. Traffic stats are not currently in ZooKeeper.
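A minimal sketch of plain round-robin selection, assuming nothing beyond a fixed list of replica URLs (the class name and URLs are hypothetical). It illustrates the point above: the rotation ignores load entirely, so no traffic statistics need to be shared through ZooKeeper.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out replica URLs in a fixed rotation; no traffic stats are consulted."""

    def __init__(self, replica_urls: List[str]):
        if not replica_urls:
            raise ValueError("need at least one replica")
        self._cycle: Iterator[str] = itertools.cycle(replica_urls)

    def next_replica(self) -> str:
        return next(self._cycle)
```

With two replicas, four successive calls return host1, host2, host1, host2. The design trades load awareness for simplicity: each node can balance independently without coordinating state.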
>> >>>>
>> >>>>> 3) If for any reason the ZooKeeper cluster goes offline for some
>> >>>>> time, will SolrCloud be unable to serve any traffic?
>> >>>>
>> >>>> It will stop allowing updates, but continue serving searches.
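A minimal sketch of the behaviour described in that answer, using a hypothetical `SolrNodeSketch` class with a toy in-memory "index": while ZooKeeper is unreachable, adds are rejected, but searches keep being answered from the data the node already holds. This models only the observable behaviour, not how Solr implements it.

```python
from typing import Dict, List


class ZooKeeperUnavailable(Exception):
    """Raised for updates attempted while ZooKeeper is unreachable."""


class SolrNodeSketch:
    """Toy node: searches use the local index; updates require a ZooKeeper connection."""

    def __init__(self) -> None:
        self.zk_connected = True
        self._docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        if not self.zk_connected:
            # Mirror the described behaviour: refuse writes without ZooKeeper.
            raise ZooKeeperUnavailable("updates are rejected while ZooKeeper is unreachable")
        self._docs[doc_id] = text

    def search(self, term: str) -> List[str]:
        # Reads are served from local data regardless of ZooKeeper state.
        return sorted(d for d, t in self._docs.items() if term in t.split())
```

After indexing one document and then setting `zk_connected = False`, `search` still returns that document, while a further `add` raises `ZooKeeperUnavailable`.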
>> >>>>
>> >>>> - Mark
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> Thanks
>> >>>>> Varun
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>
