Are distributed commits also done in parallel across shards?

Peter
On Tue, Apr 15, 2014 at 3:50 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Inline responses below.
> --
> Mark Miller
> about.me/markrmiller
>
> On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com)
> wrote:
>
> I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
> ZKs. The Solr indexes are behind a load balancer. There is one
> CloudSolrServer client updating the indexes. The index schema includes 3
> ExternalFileFields. When the CloudSolrServer client issues a hard commit,
> I observe that the commits occur sequentially, not in parallel, on the
> leader and replica. The duration of each commit is about a minute. Most of
> this time is spent reloading the 3 ExternalFileField files. Because of the
> sequential commits, there is a period of time (1 minute+) when the index
> searchers will return different results, which can cause a bad user
> experience. This will get worse as replicas are added to handle
> auto-scaling. The goal is to keep all replicas in sync w.r.t. the user
> queries.
>
> My questions:
>
> 1. Is there a reason that the distributed commits are done in sequence,
> not in parallel? Is there a way to change this behavior?
>
> The reason is that updates are currently done this way - it’s the only
> safe way to do it without solving some more problems. I don’t think you
> can easily change this. I think we should probably file a JIRA issue to
> track a better solution for commit handling. I think there are some
> complications because of how commits can be added on update requests, but
> it’s something we probably want to try and solve before tackling *all*
> updates to replicas in parallel with the leader.
>
> 2. If instead, the commits were done in parallel by a separate client via
> a GET to each Solr instance, how would this client get the host/port
> values for each Solr instance from zookeeper? Are there any downsides to
> doing commits this way?
>
> Not really, other than the extra management.
>
> Thanks,
> Peter
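For reference, a rough sketch of what that separate commit client could do: read the replica addresses out of the collection's clusterstate (as published in ZooKeeper), then issue a commit with distrib=false to each replica in parallel so no node forwards the commit on. The field names (base_url, core, state) follow the Solr 4.x clusterstate.json format, but the sample cluster, host names, and helper functions below are made up for illustration:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clusterstate.json snippet, as a client might fetch it from
# ZooKeeper (sample hosts/cores are made up; field names follow Solr 4.x).
CLUSTERSTATE = json.loads("""
{
  "collection1": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"base_url": "http://host1:8983/solr",
                         "core": "collection1", "state": "active",
                         "leader": "true"},
          "core_node2": {"base_url": "http://host2:8983/solr",
                         "core": "collection1", "state": "active"}
        }
      }
    }
  }
}
""")

def commit_urls(clusterstate, collection):
    """Build one commit URL per active replica. distrib=false makes each
    node commit locally instead of forwarding the request to its shard."""
    urls = []
    for shard in clusterstate[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") == "active":
                urls.append("%s/%s/update?commit=true&distrib=false"
                            % (replica["base_url"], replica["core"]))
    return urls

def parallel_commit(urls, send):
    # 'send' would be an HTTP GET in practice (e.g. urllib2.urlopen);
    # it is injected here so the sketch runs without a live cluster.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(send, urls))
```

The downside Mark mentions (extra management) shows up here: the client has to watch the clusterstate for replicas coming and going, and decide what to do when a commit fails on one replica but succeeds on the others.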