> Are distributed commits also done in parallel across shards?

I meant 'sequentially' across shards.

On Wed, Apr 16, 2014 at 9:08 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Are distributed commits also done in parallel across shards?
>
> Peter
>
>
> On Tue, Apr 15, 2014 at 3:50 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Inline responses below.
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com)
>> wrote:
>>
>> I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
>> ZKs. The Solr indexes are behind a load balancer. There is one
>> CloudSolrServer client updating the indexes. The index schema includes 3
>> ExternalFileFields. When the CloudSolrServer client issues a hard commit, I
>> observe that the commits occur sequentially, not in parallel, on the leader
>> and replica. The duration of each commit is about a minute. Most of this
>> time is spent reloading the 3 ExternalFileField files. Because of the
>> sequential commits, there is a period of time (1 minute+) when the index
>> searchers will return different results, which can cause a bad user
>> experience. This will get worse as replicas are added to handle
>> auto-scaling. The goal is to keep all replicas in sync w.r.t. the user
>> queries.
>>
>> My questions:
>>
>> 1. Is there a reason that the distributed commits are done in sequence, not
>> in parallel? Is there a way to change this behavior?
>>
>>
>> The reason is that updates are currently done this way - it’s the only
>> safe way to do it without solving some more problems. I don’t think you can
>> easily change this. I think we should probably file a JIRA issue to track a
>> better solution for commit handling. I think there are some complications
>> because of how commits can be added on update requests, but its something
>> we probably want to try and solve before tackling *all* updates to replicas
>> in parallel with the leader.
>>
>>
>> 2. If instead, the commits were done in parallel by a separate client via a
>> GET to each Solr instance, how would this client get the host/port values
>> for each Solr instance from zookeeper? Are there any downsides to doing
>> commits this way?
>>
>> Not really, other than the extra management.
>>
>>
>> Thanks,
>> Peter
>>
>
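
For question 2 in the quoted thread, here is a rough sketch of what such a separate client might look like. It rests on assumptions the thread does not confirm: that the cluster state is a JSON document like the clusterstate.json that Solr 4.x keeps in ZooKeeper, that each replica entry carries "base_url", "core" and "state" keys, and that a GET to <base_url>/<core>/update?commit=true triggers a commit on that core. The sample document, host names and function names are made up for illustration. Fetching the JSON from ZooKeeper is left out (any ZooKeeper client could read the node), and whether Solr forwards a commit sent this way on to the other replicas is something to verify first.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical sample, shaped like a Solr 4.x clusterstate.json (assumed layout).
SAMPLE_CLUSTER_STATE = """
{"collection1": {"shards": {"shard1": {"state": "active", "replicas": {
  "core_node1": {"state": "active", "base_url": "http://host1:8983/solr",
                 "core": "collection1", "leader": "true"},
  "core_node2": {"state": "active", "base_url": "http://host2:8983/solr",
                 "core": "collection1"}
}}}}}
"""


def commit_urls(cluster_state_json, collection):
    """Build one commit URL per active replica of the given collection."""
    state = json.loads(cluster_state_json)
    urls = []
    for shard in state[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") == "active":
                urls.append("%s/%s/update?commit=true"
                            % (replica["base_url"], replica["core"]))
    return sorted(urls)


def http_get(url):
    """Issue the GET and return the HTTP status code."""
    with urlopen(url, timeout=120) as resp:
        return resp.status


def commit_all(urls, send=http_get):
    """Send every commit request concurrently; map each URL to its result."""
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return dict(zip(urls, pool.map(send, urls)))
```

Calling commit_all(commit_urls(state_json, "collection1")) would start all commits at roughly the same time, so the window in which the searchers disagree shrinks to the difference in commit duration between replicas rather than the sum of the durations. The "extra management" Mark mentions would include re-reading the cluster state before each commit so that newly added replicas are picked up.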