Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

Gili Nachum Thu, 14 Jan 2016 14:47:52 -0800

Oops, that got omitted.
v4.7.2, and it kept reproducing after upgrading to v4.9 (I was trying to see
if it had been fixed later on).



On Thu, Jan 14, 2016 at 5:26 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Which version of Solr is this on?
>
> On Thu, Jan 14, 2016 at 4:10 PM, Gili Nachum <gilinac...@gmail.com> wrote:
> > Clarification: if we restart the nodes after reloading the collection and
> > before pausing, then recovery works fine.
> >
> > On Thu, Jan 14, 2016 at 12:08 PM, Gili Nachum <gilinac...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Our Solr cluster is running on VMs that could freeze for longer than the
> >> ZK tick time (it's a non-critical CI/CD pipeline running on an overloaded
> >> ESX). When this happens, the node's shards are registered as down. Then,
> >> when the node is back, recovery takes place and all shard replicas end up
> >> in the active state. Everyone is happy.
> >>
> >> However, we noticed that recovery doesn't take place if the collection was
> >> reloaded and the server hasn't been restarted since. Shards end up in the
> >> down state. Before providing log messages, I wonder if this is a known
> >> issue?
> >>
> >> Reproducing recipe (assume two nodes):
> >> 1. Before starting: restart both solr1 and solr2: all shards are active.
> >> 2. Reload the collection
> >> 3. Cause a disconnect by freezing the Java process:
> >> On solr2: kill -SIGSTOP <solr server pid>, and then 2 minutes later
> >> kill -SIGCONT <solr server pid>
> >> 4. solr2 shard replicas are *Down* forever. No recovery.
> >>
> >> If we omit step #2, the cluster recovers as expected.
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
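
For anyone who wants to try this, steps 2 and 3 of the recipe quoted above could be scripted roughly as follows. This is only a sketch, not the exact commands we ran: the function names are made up for illustration, and the Solr base URL and collection name are placeholders you would have to supply. The reload call assumes the standard Collections API RELOAD action.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 of the recipe. Function names, the base URL and
# the collection name are illustrative assumptions, not from the report.

# Step 2: reload a collection through the Collections API.
#   $1 = Solr base URL (placeholder, e.g. http://solr1:8983/solr)
#   $2 = collection name (placeholder)
reload_collection() {
    curl -sf "$1/admin/collections?action=RELOAD&name=$2"
}

# Step 3: freeze a process for a while, then let it continue.
#   $1 = pid of the Solr server JVM
#   $2 = number of seconds to keep it frozen
freeze_process() {
    # SIGSTOP suspends the process; bail out if the pid cannot be signalled.
    kill -s STOP "$1" || return 1
    sleep "$2"
    # SIGCONT resumes it, which is when recovery should kick in.
    kill -s CONT "$1"
}
```

With these defined, step 3 on solr2 would be `freeze_process <solr server pid> 120`, matching the 2 minute pause in the recipe.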
