RE: Load balancing with solr cloud

Garth Grimm Fri, 21 Oct 2016 05:40:20 -0700

I just realized that I made an assumption about your initial question that may 
not be true.

Everything I've said has been based on handling requests to add/update 
documents during the indexing process.  That process involves the "leader 
first" concept I've been mentioning.

So, to answer your original question on the query side:

> Actually, zookeeper really won't participate in the query process at all.  
> And the leader role for a core in a shard has no bearing whatsoever.
>
> ;-) Read ymonad's answer. ;-)  The CloudSolrServer class has been renamed to 
> CloudSolrClient (or something similar) recently, but otherwise, I think his 
> answer is still basically correct.

It's worth noting that even if the node that receives the request has a core 
that could participate in generating results, it might ask some other core of 
that same shard to return the results for that shard.  The preferLocalShards 
parameter can be used to avoid that (near the bottom of 
https://cwiki.apache.org/confluence/display/solr/Distributed+Requests).  
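For a client that just talks plain HTTP, the parameter simply goes on the query string. Here is a minimal Python sketch. The host solr1.example.com and the collection name mycollection are hypothetical placeholders; only preferLocalShards itself comes from the page above.

```python
from urllib.parse import urlencode

def build_select_url(base_url, collection, query, prefer_local_shards=True):
    """Build a /select URL. base_url and collection are placeholders you supply."""
    params = {"q": query, "wt": "json"}
    if prefer_local_shards:
        # Ask the node that receives the request to answer from its own cores
        # for any shard it hosts, instead of asking another replica of that shard.
        params["preferLocalShards"] = "true"
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"

print(build_select_url("http://solr1.example.com:8983/solr", "mycollection", "title:solr"))
```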

In any case, if you have many shards, load balancing on the query side is 
definitely more important than on the indexing side.  The query controller (the 
node that received the request) has to merge the result sets (one from each 
shard), initiate a second pass of requests to get stored fields, and then 
marshal all that data back through the HTTP response.  That's more work than 
the controller has to do for an update request, where it basically just passes 
along whatever information the shard leader responded with.
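To make that extra work concrete, here is a toy Python model of the two steps described above. First it merges the per-shard top hits into one ranked list. Then it groups the winners by shard for the stored-field pass. It only illustrates the idea and is not Solr's actual code. The shard names, ids and scores are made up.

```python
import heapq
from collections import defaultdict
from itertools import islice

def merge_top_hits(shard_hits, rows):
    """shard_hits maps shard name -> [(score, doc_id), ...], best score first.

    Returns the global top `rows` hits as (score, doc_id, shard) tuples.
    """
    tagged = [
        [(score, doc_id, shard) for score, doc_id in hits]
        for shard, hits in shard_hits.items()
    ]
    # Each shard's list is already sorted, so a k-way merge is enough.
    merged = heapq.merge(*tagged, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))

def plan_second_pass(top_hits):
    """Group the winning doc ids by shard: one stored-field request per shard."""
    plan = defaultdict(list)
    for _score, doc_id, shard in top_hits:
        plan[shard].append(doc_id)
    return dict(plan)

# Made-up per-shard results for a query with rows=3.
shard_hits = {
    "shard1": [(9.1, "a"), (4.0, "b")],
    "shard2": [(7.5, "c"), (6.2, "d")],
    "shard3": [(8.0, "e")],
}
top = merge_top_hits(shard_hits, rows=3)
print(top)
print(plan_second_pass(top))
```

Only the shards that contributed winners get a second request, so the fewer rows you ask for, the less work this pass does.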

And load balancing for reliability purposes is always a good thing.

>>> Also, for indexing, I think it's possible to control how many replicas need 
>>> to confirm to the leader before the response is supplied to the client, as 
>>> you can with say MongoDB replicas.

Yes, that's possible.  It's what I was thinking about when I mentioned 
"...general case flow".  That capability is relatively new, and not the 
default, which is why I didn't mention it.

-----Original Message-----
From: hairymccla...@yahoo.com.INVALID [mailto:hairymccla...@yahoo.com.INVALID] 
Sent: Friday, October 21, 2016 4:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

As I understand it, for non-SolrCloud-aware clients you have to load balance 
your searches manually; see ymonad's answer here:
http://stackoverflow.com/questions/22523588/loadbalancer-and-solrcloud

This is from 2014, so maybe this has changed; I would be interested to know 
as well.
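For the non-SolrCloud-aware case, manual load balancing can be as simple as rotating through the nodes and failing over when one is down. Here is a minimal Python sketch. The node URLs are hypothetical, and `send` is any function you supply that performs the actual HTTP request against a given base URL.

```python
import itertools
import threading

class RoundRobinNodes:
    """Rotate requests across Solr nodes; the URLs passed in are placeholders."""

    def __init__(self, base_urls):
        urls = list(base_urls)
        if not urls:
            raise ValueError("need at least one node")
        self._size = len(urls)
        self._cycle = itertools.cycle(urls)
        self._lock = threading.Lock()  # lets several threads share one instance

    def next_node(self):
        with self._lock:
            return next(self._cycle)

    def call(self, send):
        """Call send(base_url) on the next node, failing over to the others on error."""
        last_error = None
        for _ in range(self._size):
            base_url = self.next_node()
            try:
                return send(base_url)
            except OSError as exc:  # urllib's URLError and socket timeouts are OSErrors
                last_error = exc
        raise last_error

nodes = RoundRobinNodes([
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
    "http://solr3.example.com:8983/solr",
])
```

Because every node in the cloud can take any query and fan it out itself, the rotation does not need to know anything about shards or leaders.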
Also, for indexing, I think it's possible to control how many replicas need to 
confirm to the leader before the response is supplied to the client, as you can 
with say MongoDB replicas.

    On Friday, October 21, 2016 1:18 AM, Garth Grimm 
<garthgr...@averyranchconsulting.com> wrote:

No matter where you send the update initially, it will get sent to the 
leader of the shard first.  The leader parses it to ensure it can be 
indexed, then sends it to all the replicas in parallel.  The replicas 
do their own parsing and report back that they have persisted the data to 
their tlogs.  Once the leader hears back from all the replicas, it replies 
that the update is complete, and your client receives its HTTP 
response for the transaction.
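Here is a toy Python model of that leader-first flow. It makes two simplifying assumptions: in-memory lists stand in for each core's tlog, and a missing "id" stands in for "can't be indexed". Neither is how Solr actually validates or persists documents.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_update(doc, leader_tlog, replica_tlogs):
    """Toy leader-first update: validate, write locally, fan out, wait for all acks."""
    # 1. The leader checks the document can be indexed before doing anything else.
    #    (A missing "id" is our stand-in for "can't be indexed".)
    if "id" not in doc:
        raise ValueError("rejected by leader: document has no id")
    leader_tlog.append(doc)

    # 2. Forward to every replica in parallel; each "persists" to its own tlog.
    with ThreadPoolExecutor(max_workers=max(1, len(replica_tlogs))) as pool:
        acks = [pool.submit(tlog.append, doc) for tlog in replica_tlogs]
        # 3. Only answer the client after every replica has acknowledged;
        #    result() re-raises if a replica's write failed.
        for ack in acks:
            ack.result()
    return len(acks)  # number of replicas that confirmed

leader, replicas = [], [[], []]
print(handle_update({"id": "doc-1"}, leader, replicas))
```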

At least that's the general case flow.

So it really won't matter how your load balancing is handled above the cloud.  
All the work is done the same way, with the leader having to do slightly more 
work than the replicas.

If you can manage to initially send all the updates to the correct leader, you 
can skip one hop before the work starts, which may buy you a small performance 
boost compared to randomly picking a node to send the request to.  But you'll 
need to be taxing the cloud pretty heavily before that difference becomes too 
noticeable.
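To send each update straight to the right leader, the client has to know which shard owns the document and which core currently leads that shard. Here is a Python sketch of that lookup. It assumes, as I understand the default compositeId router, that the id is hashed into a 32-bit value and each shard owns a hash range. It also uses a hand-written dict shaped like the collection's state.json in zookeeper. Caveat: Solr hashes ids with MurmurHash3. The zlib.crc32 below is only a stand-in, so it will NOT pick the same shard Solr does. This is the kind of bookkeeping a zookeeper-aware client like SolrJ does for you.

```python
import zlib

# Hypothetical cluster state for a 2-shard collection, shaped like the
# per-collection state.json kept in zookeeper (hosts and names made up).
EXAMPLE_STATE = {
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "replicas": {
                "core_node1": {"base_url": "http://solr1:8983/solr", "leader": "true"},
                "core_node2": {"base_url": "http://solr2:8983/solr"},
            },
        },
        "shard2": {
            "range": "0-7fffffff",
            "replicas": {
                "core_node3": {"base_url": "http://solr3:8983/solr", "leader": "true"},
                "core_node4": {"base_url": "http://solr2:8983/solr"},
            },
        },
    }
}

def to_signed32(value):
    """Ranges are written as unsigned hex but compared as signed 32-bit ints."""
    return value - (1 << 32) if value >= (1 << 31) else value

def parse_range(text):
    low, high = text.split("-")
    return to_signed32(int(low, 16)), to_signed32(int(high, 16))

def stand_in_hash(doc_id):
    # NOT Solr's hash: compositeId uses MurmurHash3. crc32 is only a placeholder.
    return to_signed32(zlib.crc32(doc_id.encode("utf-8")))

def leader_url_for(doc_id, collection_state):
    """Return (shard name, leader base_url) for the shard whose range covers the id."""
    h = stand_in_hash(doc_id)
    for shard_name, shard in collection_state["shards"].items():
        low, high = parse_range(shard["range"])
        if low <= h <= high:
            for replica in shard["replicas"].values():
                if replica.get("leader") == "true":
                    return shard_name, replica["base_url"]
            raise LookupError(f"{shard_name} has no leader at the moment")
    raise LookupError("no shard range covers this hash")

print(leader_url_for("doc-1", EXAMPLE_STATE))
```

Leadership can move when nodes restart, so a real client has to refresh this view from zookeeper rather than cache it forever.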

-----Original Message-----
From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
Sent: Thursday, October 20, 2016 5:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

Thank you very much, John and Garth,

I've tested it out and it works fine; I can send the updates to any of the solr 
nodes.

If I am not using a zookeeper-aware client and I always direct all my queries 
(read queries) to the leader of the solr instances, does it automatically load 
balance between the replicas?

Or do I have to hit each instance in a round-robin way and balance the load 
in my own code?

Please advise on the best way to do so.

Thank you very much again.

On Fri, Oct 21, 2016 at 9:18 AM, Garth Grimm < 
garthgr...@averyranchconsulting.com> wrote:

> Actually, zookeeper really won't participate in the update process at all.
>
> If you're using a "zookeeper aware" client like SolrJ, the SolrJ 
> library will read the cloud configuration from zookeeper, but will 
> send all the updates to the leader of the shard that the document is meant to 
> go to.
>
> If you're not using a "zookeeper aware" client, you can send the 
> update to any of the solr nodes, and they will evaluate the cloud 
> configuration information they've already received from zookeeper, and 
> then forward the document to the leader of the shard that will handle the 
> document update.
>
> In general, zookeeper only provides the cloud configuration 
> information once (at most) during all the updates; the actual document 
> updates only get sent to solr nodes.  There's definitely no need to 
> distribute load between zookeepers for this situation.
>
> Regards,
> Garth Grimm
>
> -----Original Message-----
> From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
> Sent: Thursday, October 20, 2016 5:11 PM
> To: solr-user@lucene.apache.org
> Subject: Load balancing with solr cloud
>
> Hi again Experts,
>
> I have a question related to load balancing in solr cloud.
>
> If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 
> secondary replicas and 1 shard), when the traffic comes in the primary 
> zookeeper server will be hammered, correct?
>
> I understand (or am I wrong?) that zookeeper will load balance between 
> solr nodes, but if we want to distribute the load between zookeeper 
> nodes as well, what is the best approach?
>
> Cost is a concern for us too.
>
> Thank you very much, in advance.
>
> --
> Regards
>
> Sadheera Vithanage
>

--
Regards

Sadheera Vithanage
