Hi Pushkar Raste,

  Thanks for your hints.
  We will try the third suggestion and keep you posted.

Gérald Reinhart

On 10/07/2016 02:23 AM, Pushkar Raste wrote:
A couple of questions/suggestions:
- This normally happens after a leader election: when a new leader gets 
elected, it will force all the nodes to sync with it.
Check the logs to see when this happens and whether the leader changed. If 
that is true, then you will have to investigate why the leader change takes 
place. I suspect the leader goes into a GC pause long enough that ZooKeeper 
considers it no longer available and initiates a leader election (see the 
config sketch just below).
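
For reference, a minimal sketch of raising the ZooKeeper session timeout in 
solr.xml, so that a GC pause is less likely to expire the leader's session. 
The 30000 ms value is only an example, not a recommendation, and it does not 
fix the pauses themselves:

  <!-- solr.xml: give nodes more headroom before ZooKeeper declares
       their session expired during a long GC pause (example value) -->
  <solrcloud>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>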

- What version of Solr are you using?
SOLR-8586<https://issues.apache.org/jira/browse/SOLR-8586> introduced an 
IndexFingerprint check; unfortunately it was broken, and hence a replica would 
always do a full index replication. The issue is now fixed in 
SOLR-9310<https://issues.apache.org/jira/browse/SOLR-9310>, which should help 
replicas recover faster (see the check just below).
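
To check which version a node is running, you can query the system info 
endpoint (host and port below are placeholders for your own):

  curl "http://localhost:8983/solr/admin/info/system?wt=json"
  # "solr-spec-version" in the "lucene" section tells you whether you
  # are on a release that already includes the SOLR-9310 fix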

- You should also increase the update log (ulog) size: the default threshold 
is 100 docs or 10 tlogs, whichever is hit first. This will again help replicas 
recover faster from their tlogs (of course, there is a threshold after which 
recovering from the tlog would in fact take longer than copying over all the 
index files from the leader); a config sketch follows below.
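
As an illustration, those thresholds live in the <updateLog> section of 
solrconfig.xml; the values below are examples only, to be tuned against your 
indexing rate:

  <!-- solrconfig.xml: keep more records/tlogs so a briefly disconnected
       replica can catch up from the tlog (PeerSync) instead of doing a
       full index replication; example values only -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numRecordsToKeep">500</int>  <!-- default: 100 -->
    <int name="maxNumLogsToKeep">20</int>   <!-- default: 10 -->
  </updateLog>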


On Thu, Oct 6, 2016 at 5:23 AM, Gerald Reinhart 
<gerald.reinh...@kelkoo.com> wrote:

Hello everyone,

   Our Solr Cloud has been working very well for several months without any 
significant changes: the traffic to serve is stable, no major releases 
deployed...

   But randomly, the Solr Cloud leader puts all the replicas in recovery at the 
same time for no obvious reason.

   Hence, we cannot serve queries any more, and the leader is overloaded 
replicating all the indexes to the replicas at the same time, which 
eventually implies a downtime of approximately 30 minutes.

   Is there a way to prevent this? Ideally, a configuration setting limiting 
the percentage of replicas that can be put in recovery at the same time?

Thanks,

Gérald, Elodie and Ludovic


--

Gérald Reinhart, Software Engineer

E gerald.reinh...@kelkoo.com    Y!Messenger gerald.reinhart
T +33 (0)4 56 09 07 41
A Parc Sud Galaxie - Le Calypso, 6 rue des Méridiens, 38130 Echirolles





________________________________
Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: €4,168,964.30
Registered office: 158 Ter Rue du Temple, 75003 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for 
their addressees. If you are not the intended recipient of this message, 
please delete it and notify the sender.


