How can I set up SolrCloud in a master-slave configuration? Can you point me to a sample config or tutorial that describes the steps to get SolrCloud into a master-slave setup?
As you know from my previous mails, I don't need active Solr replicas. I just need a mechanism to copy a given SolrCloud index to a new instance of SolrCloud (a classic master-slave setup).

Eric/Mark,

We have 10 virtual data centres. It is set up like this because we do rolling updates: while the 1st DC is being indexed, the other 9 serve traffic. Indexing one DC takes 2 hours. With a single shard we used to index one DC and then quickly replicate the index to the other DCs through a master-slave setup. With SolrCloud we obviously can't index each DC sequentially, as that would take 2*10 hours. So we need a way of indexing 1 DC and then quickly propagating the index binaries to the others. What would you recommend for SolrCloud?

Thanks
Varun

On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller <markrmil...@gmail.com> wrote:

>
> On Feb 26, 2013, at 6:49 PM, varun srivastava <varunmail...@gmail.com> wrote:
>
> > So does it mean that while doing a "document add" the state of the cluster
> > is fetched from zookeeper, and then the target shard is decided depending
> > upon the hash of the docid?
>
> We keep the zookeeper info cached locally. We only update it when
> ZooKeeper tells us it has changed.
>
> >
> > Assume we have 3 shards (with no replicas), of which 1 went down while
> > indexing. Will all the documents be routed to the remaining 2 shards,
> > or will only 2/3 of the documents be indexed? If the answer is that the
> > remaining 2 shards get all the documents, then when the 3rd shard comes
> > back online later, will SolrCloud do rebalancing?
>
> All of the updates that hash to the third shard will fail. That is why we
> have replicas - if you have a replica, it will take over as the leader.
>
> >
> > Is there anywhere in zookeeper where we store the range of docids held in
> > each shard, or any other information about the actual docs?
>
> The range of hashes is stored for each shard in zk.
>
> >
> > We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly
> > the same data, and we update the index only once a day. Both dc1 and dc2
> > have exactly the same solrcloud config and machines.
> >
> > Can we populate dc2 by just copying all the index binaries from
> > solr-cores/core0/data of dc1 to the machines in dc2 (to avoid indexing the
> > same documents on dc2)? I guess the solr replication API doesn't work in
> > solrcloud, hence I am looking for a workaround.
> >
> > Thanks
> > Varun
> >
> > On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >
> >> ZooKeeper
> >> /
> >> /clusterstate.json - info about the layout and state of the cluster - collections, shards, urls, etc
> >> /collections - config to use for the collection, shard leader voting zk nodes
> >> /configs - sets of config files
> >> /live_nodes - ephemeral nodes, one per Solr node
> >> /overseer - work queue for updating clusterstate.json, creating new collections, etc
> >> /overseer_elect - overseer voting zk nodes
> >>
> >> - Mark
> >>
> >> On Feb 26, 2013, at 6:18 PM, varun srivastava <varunmail...@gmail.com> wrote:
> >>
> >>> Hi Mark,
> >>> One more question.
> >>>
> >>> While doing a solr doc update/add, what information is required from
> >>> zookeeper? Can you tell me what information is stored in zookeeper other
> >>> than the startup configs?
> >>>
> >>> Thanks
> >>> Varun
> >>>
> >>> On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>
> >>>>
> >>>> On Feb 26, 2013, at 5:25 PM, varun srivastava <varunmail...@gmail.com> wrote:
> >>>>
> >>>>> Hi All,
> >>>>> I have some questions regarding the role of zookeeper in the solrcloud
> >>>>> runtime, while processing queries.
> >>>>>
> >>>>> 1) Is the zookeeper cluster consulted by solr shards for processing every
> >>>>> request, or is it only used to copy config at startup time?
> >>>>
> >>>> No, it's not used per request.
> >>>> Solr talks to ZooKeeper on SolrCore startup
> >>>> - to get configs and set itself up. Then it only talks to ZooKeeper when a
> >>>> cluster state change happens - in that case, ZooKeeper pings Solr and Solr
> >>>> will get an updated view of the cluster. That view is cached and used for
> >>>> requests. In a stable state, Solr is not talking to ZooKeeper other than
> >>>> the heartbeat they keep to know a node is up.
> >>>>
> >>>>> 2) How is load balancing done between replicas? Are traffic stats shared
> >>>>> through zookeeper?
> >>>>
> >>>> Basic round robin. Traffic stats are not currently in Zk.
> >>>>
> >>>>> 3) If for any reason the zookeeper cluster goes offline for some time,
> >>>>> will solr cloud not be able to serve any traffic?
> >>>>
> >>>> It will stop allowing updates, but continue serving searches.
> >>>>
> >>>> - Mark
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>> Varun
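
For readers following the thread, the behaviour Mark describes (a locally cached cluster view, per-shard hash ranges, updates failing for a shard with no live node, and basic round robin for queries) can be sketched roughly as below. This is only an illustrative toy model, not Solr's code: the class `CachedClusterState`, its methods, the shard and node names, and the use of `zlib.crc32` as the hash are all assumptions made for the example, and Solr's real hash function and leader election are different.

```python
import itertools
import zlib

HASH_SPACE = 2 ** 32  # size of the stand-in 32-bit hash space


def build_ranges(shard_names):
    """Split the hash space into one contiguous, inclusive range per shard."""
    step = HASH_SPACE // len(shard_names)
    ranges = {}
    for i, name in enumerate(shard_names):
        lo = i * step
        # The last shard absorbs the remainder so the whole space is covered.
        hi = HASH_SPACE - 1 if i == len(shard_names) - 1 else (i + 1) * step - 1
        ranges[name] = (lo, hi)
    return ranges


class CachedClusterState:
    """Hypothetical local copy of the cluster layout, changed only on notification."""

    def __init__(self, nodes_by_shard):
        self.ranges = build_ranges(sorted(nodes_by_shard))
        self.live = {shard: list(nodes) for shard, nodes in nodes_by_shard.items()}
        self._cursor = {shard: itertools.count() for shard in nodes_by_shard}

    def node_down(self, shard, node):
        """Simulate a cluster-state change notification removing a node."""
        self.live[shard].remove(node)

    def shard_for(self, doc_id):
        """Map a document id to the shard whose hash range contains its hash."""
        h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, not Solr's
        for shard, (lo, hi) in self.ranges.items():
            if lo <= h <= hi:
                return shard
        raise AssertionError("ranges must cover the whole hash space")

    def route_update(self, doc_id):
        """Return (shard, node) for an update, or fail if the shard has no live node."""
        shard = self.shard_for(doc_id)
        if not self.live[shard]:
            raise RuntimeError(f"no live node for {shard}; update for {doc_id!r} fails")
        # Toy rule: the first live node in the list acts as the leader.
        return shard, self.live[shard][0]

    def pick_for_query(self, shard):
        """Basic round robin over the live nodes of one shard."""
        nodes = self.live[shard]
        if not nodes:
            raise RuntimeError(f"no live node for {shard}")
        return nodes[next(self._cursor[shard]) % len(nodes)]


if __name__ == "__main__":
    state = CachedClusterState(
        {"shard1": ["a", "b"], "shard2": ["c"], "shard3": ["d"]}
    )
    print(state.route_update("doc-42"))
    state.node_down("shard3", "d")  # shard3 has no replica, so its updates now fail
```

In this model a document is never re-routed to another shard when its own shard is down; the update simply fails, which matches the answer given above for the 3-shard, no-replica case.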