Any thoughts on this? We have 10 virtual data centres (DCs). It is set up like this because we do rolling updates: while the first DC is being indexed, the other 9 serve traffic. Indexing one DC takes 2 hours. With a single shard we used to index one DC and then quickly replicate the index to the other DCs using a master-slave setup. With SolrCloud we obviously can't index each DC sequentially, as that would take 2*10 = 20 hours. So we need a way of indexing one DC and then somehow quickly propagating the index binaries to the others. What would you recommend for SolrCloud?
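To make the question concrete, here is a rough sketch of the "index one DC, then pull the index everywhere else" idea. It assumes the classic ReplicationHandler `fetchindex` command (with its `masterUrl` parameter) could be pointed at each shard's core. Whether that is supported or safe against SolrCloud cores is exactly what is being asked and is not established here. The host naming scheme, port and DC names are hypothetical; `core0` is the core name used elsewhere in this thread. The code only builds the request URLs and sends nothing.

```python
from urllib.parse import urlencode


def core_url(dc, shard, core):
    # Hypothetical naming scheme: one Solr host per (DC, shard) pair.
    return f"http://{dc}-solr-{shard}:8983/solr/{core}"


def fetchindex_url(target_core_url, source_core_url):
    # Ask the target core's replication handler to pull the index
    # from the source core's replication handler.
    query = urlencode({
        "command": "fetchindex",
        "masterUrl": source_core_url + "/replication",
    })
    return f"{target_core_url}/replication?{query}"


def propagation_plan(source_dc, target_dcs, shards, core="core0"):
    # One request per (target DC, shard): each shard in a target DC pulls
    # from the matching shard in the freshly indexed source DC.
    plan = []
    for dc in target_dcs:
        for shard in shards:
            source = core_url(source_dc, shard, core)
            target = core_url(dc, shard, core)
            plan.append(fetchindex_url(target, source))
    return plan


if __name__ == "__main__":
    targets = [f"dc{i}" for i in range(2, 11)]  # the 9 other DCs
    for url in propagation_plan("dc1", targets, shards=[1, 2, 3]):
        print(url)
```

With 3 shards and 9 target DCs this yields 27 requests, which in a rolling update would be issued one DC at a time while the remaining DCs keep serving traffic.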
Thanks,
Varun

On Thu, Feb 28, 2013 at 11:33 AM, varun srivastava <varunmail...@gmail.com> wrote:

> How can I set up cloud master-slave? Can you point me to any sample config
> or tutorial which describes the steps to get Solr cloud into a master-slave
> setup?
>
> As you know from my previous mails, I don't need active Solr replicas;
> I just need a mechanism to copy a given Solr cloud index to a new instance
> of Solr cloud (classic master-slave setup).
>
> Eric/Mark,
> We have 10 virtual data centres. It is set up like this because we do
> rolling updates: while the first DC is being indexed, the other 9 serve traffic.
> Indexing one DC takes 2 hours. With a single shard we used to index one DC
> and then quickly replicate the index to the other DCs using a master-slave
> setup. With Solr cloud we obviously can't index each DC
> sequentially, as it would take 2*10 hours. So we need a way of indexing one DC
> and then somehow quickly propagating the index binaries to the others. What would
> you recommend for Solr cloud?
>
> Thanks
> Varun
>
> On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>
>>> So does it mean that while doing a "document add" the state of the cluster is
>>> fetched from zookeeper and then, depending upon the hash of the docid, the
>>> target shard is decided?
>>
>> We keep the zookeeper info cached locally. We only update it when
>> ZooKeeper tells us it has changed.
>>
>>> Assume we have 3 shards (with no replicas), of which 1 went down while
>>> indexing. Will all the documents be routed to the remaining 2 shards,
>>> or will only 2/3 of the documents be indexed? If the answer is that the
>>> remaining 2 shards will get all the documents, then if the 3rd shard later
>>> comes back online, will Solr cloud do rebalancing?
>>
>> All of the updates that hash to the third shard will fail. That is why we
>> have replicas - if you have a replica, it will take over as the leader.
>>
>>> Is there anywhere in zookeeper where we store the range of docids stored in
>>> each shard, or any other information about the actual docs?
>>
>> The range of hashes is stored for each shard in zk.
>>
>>> We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
>>> the same data, and we update the index only once a day. Both dc1 and dc2
>>> have the exact same solrcloud config and machines.
>>>
>>> Can we populate dc2 by just copying all the index binaries from
>>> solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing
>>> the same documents on dc2)? I guess the solr replication API doesn't work in
>>> solrcloud, hence looking for a workaround.
>>>
>>> Thanks
>>> Varun
>>>
>>> On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>>> ZooKeeper
>>>> /
>>>> /clusterstate.json - info about the layout and state of the cluster -
>>>>     collections, shards, urls, etc
>>>> /collections - config to use for the collection, shard leader voting zk nodes
>>>> /configs - sets of config files
>>>> /live_nodes - ephemeral nodes, one per Solr node
>>>> /overseer - work queue for updating clusterstate.json, creating new
>>>>     collections, etc
>>>> /overseer_elect - overseer voting zk nodes
>>>>
>>>> - Mark
>>>>
>>>> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>>>
>>>>> Hi Mark,
>>>>> One more question.
>>>>>
>>>>> While doing a solr doc update/add, what information is required from
>>>>> zookeeper? Can you tell what information is stored in zookeeper other
>>>>> than the startup configs?
>>>>>
>>>>> Thanks
>>>>> Varun
>>>>>
>>>>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>
>>>>>> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>> I have some questions regarding the role of zookeeper in the solrcloud
>>>>>>> runtime, while processing queries.
>>>>>>>
>>>>>>> 1) Is the zookeeper cluster referred to by solr shards for processing
>>>>>>> every request, or is it only used to copy config at startup time?
>>>>>>
>>>>>> No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup
>>>>>> - to get configs and set itself up. Then it only talks to ZooKeeper when a
>>>>>> cluster state change happens - in that case, ZooKeeper pings Solr and Solr
>>>>>> will get an updated view of the cluster. That view is cached and used for
>>>>>> requests. In a stable state, Solr is not talking to ZooKeeper other than
>>>>>> the heartbeat they keep to know a node is up.
>>>>>>
>>>>>>> 2) How is load balancing done between replicas? Are traffic stats shared
>>>>>>> through zookeeper?
>>>>>>
>>>>>> Basic round robin. Traffic stats are not currently in Zk.
>>>>>>
>>>>>>> 3) If for any reason the zookeeper cluster goes offline for some time,
>>>>>>> will solr cloud not be able to serve any traffic?
>>>>>>
>>>>>> It will stop allowing updates, but continue serving searches.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>>> Thanks
>>>>>>> Varun
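The routing behaviour Mark describes above (hash ranges per shard kept in ZooKeeper, and updates that hash to a down shard failing rather than being rerouted) can be illustrated with a minimal sketch. This is not Solr's implementation: the CRC32 hash is a stand-in for whatever hash Solr applies to the document id, and the even split of a 32-bit hash space, the `Shard` type and the function names are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass, replace

HASH_SPACE = 2 ** 32  # illustrative 32-bit hash space (assumption)


@dataclass(frozen=True)
class Shard:
    name: str
    lo: int            # inclusive lower bound of the hash range
    hi: int            # inclusive upper bound of the hash range
    live: bool = True  # False when the shard has no live leader


def make_shards(n):
    # Split the hash space into n contiguous ranges, one per shard.
    step = HASH_SPACE // n
    shards = []
    for i in range(n):
        lo = i * step
        hi = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        shards.append(Shard(f"shard{i + 1}", lo, hi))
    return shards


def doc_hash(doc_id):
    # Stand-in hash (CRC32); NOT the hash function Solr uses.
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def route(doc_id, shards):
    # Pick the shard whose range contains the document's hash.
    h = doc_hash(doc_id)
    for shard in shards:
        if shard.lo <= h <= shard.hi:
            return shard
    raise ValueError("shard ranges do not cover the hash space")


def index_all(doc_ids, shards):
    # Updates that hash to a down shard fail; they are not rerouted.
    indexed, failed = [], []
    for doc_id in doc_ids:
        target = route(doc_id, shards)
        (indexed if target.live else failed).append(doc_id)
    return indexed, failed


if __name__ == "__main__":
    shards = make_shards(3)
    shards[2] = replace(shards[2], live=False)  # third shard is down, no replica
    indexed, failed = index_all([f"doc-{i}" for i in range(3000)], shards)
    print(len(indexed), "indexed;", len(failed), "failed (all hashed to shard3)")
```

In this model the documents that fail are exactly those whose hash falls in the down shard's range, which is why a replica that can take over as leader matters.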