Re: Distributed commits in CloudSolrServer

Peter Keegan Wed, 16 Apr 2014 06:10:53 -0700

>Are distributed commits also done in parallel across shards?
I meant 'sequentially' across shards.



On Wed, Apr 16, 2014 at 9:08 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Are distributed commits also done in parallel across shards?
>
> Peter
>
>
> On Tue, Apr 15, 2014 at 3:50 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Inline responses below.
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com)
>> wrote:
>>
>> I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
>> ZKs. The Solr indexes are behind a load balancer. There is one
>> CloudSolrServer client updating the indexes. The index schema includes 3
>> ExternalFileFields. When the CloudSolrServer client issues a hard commit,
>> I observe that the commits occur sequentially, not in parallel, on the
>> leader and replica. The duration of each commit is about a minute. Most
>> of this time is spent reloading the 3 ExternalFileField files. Because of
>> the sequential commits, there is a period of time (1 minute+) when the
>> index searchers will return different results, which can cause a bad user
>> experience. This will get worse as replicas are added to handle
>> auto-scaling. The goal is to keep all replicas in sync w.r.t. the user
>> queries.
>>
>> My questions:
>>
>> 1. Is there a reason that the distributed commits are done in sequence,
>> not in parallel? Is there a way to change this behavior?
>>
>>
>> The reason is that updates are currently done this way - it’s the only
>> safe way to do it without solving some more problems. I don’t think you can
>> easily change this. I think we should probably file a JIRA issue to track a
>> better solution for commit handling. I think there are some complications
>> because of how commits can be added on update requests, but it’s something
>> we probably want to try and solve before tackling *all* updates to replicas
>> in parallel with the leader.
>>
>>
>>
>> 2. If instead, the commits were done in parallel by a separate client via
>> a GET to each Solr instance, how would this client get the host/port
>> values for each Solr instance from zookeeper? Are there any downsides to
>> doing commits this way?
>>
>> Not really, other than the extra management.
>>
>>
>>
>>
>>
>> Thanks,
>> Peter
>>
>
>
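
Regarding question 2 in the quoted thread, here is a minimal Java sketch of
the separate-client approach: send a commit request to every Solr core at the
same time instead of one after another. It is a sketch under stated
assumptions, not a tested solution. The names ParallelCommit, Sender,
commitUrl and commitAll are hypothetical, and the core URLs are taken as
input; they would have to come from the cluster state stored in ZooKeeper
(for example, the state a CloudSolrServer client already reads). Whether a
commit sent directly to one core stays local or is forwarded to the other
replicas may depend on the Solr version and update chain, so that should be
verified before relying on this.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: issues a hard commit to several Solr cores in parallel.
class ParallelCommit {

    // Hypothetical abstraction over "send one request, return the HTTP status".
    interface Sender {
        int send(String url) throws Exception;
    }

    // Builds the commit URL for one core URL such as http://host:8983/solr/collection1
    static String commitUrl(String coreUrl) {
        String base = coreUrl.endsWith("/")
                ? coreUrl.substring(0, coreUrl.length() - 1)
                : coreUrl;
        return base + "/update?commit=true";
    }

    // Plain HTTP GET; the read timeout is generous because a commit may take minutes.
    static int httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(300000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Sends one commit per core concurrently; statuses are returned in input order.
    static List<Integer> commitAll(List<String> coreUrls, final Sender sender)
            throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, coreUrls.size()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (final String coreUrl : coreUrls) {
                tasks.add(() -> sender.send(commitUrl(coreUrl)));
            }
            List<Integer> statuses = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                statuses.add(f.get());
            }
            return statuses;
        } finally {
            pool.shutdown();
        }
    }

    // Convenience overload that uses a real HTTP GET.
    static List<Integer> commitAll(List<String> coreUrls) throws Exception {
        return commitAll(coreUrls, ParallelCommit::httpGet);
    }
}
```

A caller would pass the list of core URLs, e.g.
ParallelCommit.commitAll(urls), and check that every returned status is 200.
The commits start together, but each core still reloads its
ExternalFileField files at its own pace, so the replicas are only
approximately in sync.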
