Thanks so much for the very quick and detailed explanation, Erick!

  

According to the following page, it seems numRecordsToKeep cannot be set too high,
because the records to replay must fit in a single POST.
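
If I understand your 2> correctly, the knob would be the updateLog section in
solrconfig.xml, something like the sketch below (the values are only illustrative,
not a recommendation):

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <!-- keep more records in the transaction log so a restarted replica can
           peer-sync from the leader instead of pulling the full index -->
      <int name="numRecordsToKeep">10000</int>
      <int name="maxNumLogsToKeep">100</int>
    </updateLog>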

It seems your 1> or 3> approaches would be the most practical when the
number of updated documents is high.

  

https://support.lucidworks.com/hc/en-us/articles/203842143-Recovery-times-while-restarting-a-SolrCloud-node

  

Thanks again, and happy Thanksgiving!

  
Sent from [Nylas N1](https://nylas.com/n1), the extensible, open source mail client.


  
On Nov 25 2016, at 2:33 pm, Erick Erickson <erickerick...@gmail.com> wrote:  

> First, get out of thinking about the replication API, things like  
DISABLEPOLL and the like when in SolrCloud mode. The  
"old style" replication is used under the control of the synching  
strategy. Unless you've configured master/slave sections of  
your solrconfig.xml files and somehow dealt with the leader  
changing (who should be polled?), I'm pretty sure this is a total red herring.

>

> As for the rest, that's just the way it works. In SolrCloud, the  
raw documents are forwarded from the leader to the followers.  
Outside of a node going into recovery, replication isn't used  
at all.

>

> However, when a node goes into recovery (which by definition it will  
when the core is reloaded or the Solr instance is restarted) then  
the replica checks with the leader to see if it's "too far" out of date. The  
default "too far" is 100 docs, although this can be changed by setting  
the updateLog numRecordsToKeep to a higher number in solrconfig.xml.  
If the replica is too far out of date, a full index replication is done which  
is what you're observing.

>

> If the number of updates the leader has received is < 100  
(or numRecordsToKeep) the leader sends the raw documents to the  
follower from its update log and there is no "old style" replication there  
at all.

>

> So, the net-net here is that your choices are limited:

>

> 1> stop indexing while doing the restart.

>

> 2> bump numRecordsToKeep to some larger number that  
     you expect not to be exceeded for the time it takes to  
     restart each node.

>

> 3> live with the full index replication in this situation.

>

> I'll add parenthetically that having to redeploy plugins and the like  
_should_ be a relatively rare operation, and it seems (at least from  
the outside) to be a perfectly reasonable thing to do in a maintenance  
window when index updates are disabled.

>

> You can also consider using collection aliasing to switch back and  
forth between two collections so you can manipulate the current  
cold one and, when you're satisfied, switch the alias.
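
For example, assuming placeholder names (an alias called "products" pointing at
collections collection_a and collection_b, Solr on localhost:8983), switching the
alias would be a single Collections API call along these lines:

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=collection_b

CREATEALIAS also re-points the alias if it already exists, so clients keep
querying "products" while the underlying collection changes.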

>

> Best,  
Erick

>

> On Fri, Nov 25, 2016 at 1:40 PM, Jichi Guo <jichi...@gmail.com> wrote:  
> Hi,  
>  
>  
>  
> I am seeking the best practice for restarting a sharded SolrCloud that is  
> taking search traffic as well as realtime updates, without downtime.  
>  
> When I deploy new customized Solr plugins, for example, it will require  
> restarting the whole SolrCloud cluster.  
>  
> I am testing Solr 6.2.1 with 4 shards.  
>  
> And I find that when SolrCloud is taking updates and I restart any Solr node  
> (no matter whether it is a leader node, the overseer, or a normal replica),  
> the restarted node will reindex its whole data from its leader, i.e., it  
> will redownload the whole index data and then drop its old data.  
>  
> The only way I have found to avoid such reindexing is to temporarily disable  
> updates, for example by invoking disableReplication on the leader node before  
> restarting.  
>  
>  
>  
> Additionally, I didn't find a way to temporarily pause Solr replication to a  
> single replica. Before sharding, we could use disablePoll to disable replication  
> in a slave. But after sharding, disabling replication from the leader node is  
> the only way I have found, which pauses not only the replication to the one  
> node being restarted, but also replication to all other nodes in the same shard.  
>  
>  
>  
> The procedure becomes more complex if I want to restart a leader node: I need  
> to first manually trigger a leader failover through rebalancing, then  
> disable replication on the new leader node, then restart the old leader node,  
> and finally re-enable replication on the new leader node.  
>  
>  
>  
> As you can see, it seems to take many steps to restart SolrCloud node by  
> node this way.  
>  
> I am not sure whether this is the best procedure to restart the whole SolrCloud  
> while it is taking realtime updates.  
>  
>  
>  
> Thanks!  
>  
>  
> Sent from [Nylas N1](https://nylas.com/n1), the extensible, open source mail client.  
>
