On Wed, Mar 25, 2015 at 9:24 PM, Shai Erera <ser...@gmail.com> wrote:

> >
> > There's even a param onlyIfDown=true which will remove a
> > replica only if it's already 'down'.
> >
>
> That will only work if the replica is in DOWN state, correct? That is, if
> the Solr JVM was killed, and the replica stays in ACTIVE, but its node is
> not under /live_nodes, it won't get deleted? What I chose to do is to
> delete the replica if its node is not under /live_nodes, and I'm sure it
> will never return.
>

Probably not, and we should fix it. It should be possible to delete replicas
which are not live, I guess. But there are more behaviors that need to be
defined, e.g. what happens if a node was down, you deleted the replica
which was supposed to be on it, and then the node came back up? Should we
re-create the replica automatically, or ask the node to delete the local
core and have something new assigned to it? Some of these behaviors are
what we informally call "ZK as Truth" features, where we want to move to a
world where ZK is the source of truth and nodes modify their state and
cores depending on what's inside ZK.
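
For illustration, the cleanup Shai describes could be sketched roughly as
below. This is only a sketch, not what Solr does internally. It assumes the
cluster state has already been read into a dict shaped like clusterstate.json
(collection -> shards -> replicas, each replica carrying a node_name) and that
the children of /live_nodes are available as a list. The function names and
the base URL are made up for the example; DELETEREPLICA is the Collections
API action discussed in this thread. onlyIfDown is deliberately left out,
since these replicas may still be recorded as 'active'.

```python
from urllib.parse import urlencode


def find_orphaned_replicas(cluster_state, live_nodes):
    """Return (collection, shard, replica) for replicas whose node is not live.

    cluster_state: dict assumed to mirror clusterstate.json.
    live_nodes: iterable of node names, assumed to mirror /live_nodes.
    """
    live = set(live_nodes)
    orphans = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                # The recorded state is ignored on purpose: after a JVM crash
                # the replica can still be listed as 'active'.
                if replica.get("node_name") not in live:
                    orphans.append((coll_name, shard_name, replica_name))
    return orphans


def delete_replica_url(solr_base_url, collection, shard, replica):
    """Build a Collections API DELETEREPLICA URL (hypothetical helper)."""
    params = urlencode({
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
    })
    return f"{solr_base_url}/admin/collections?{params}"
```

Each tuple returned by find_orphaned_replicas can be passed to
delete_replica_url, and the resulting URL requested from any live node.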


> >
> > No, there is no penalty because we always check for the state=active and
> > the live-ness before routing any requests to a replica.
> >
>
> Well, that's also a penalty :), though I agree it's a minor one. There is
> also a penalty ZK-wise -- clusterstate.json still records these orphaned
> replicas, so I'll make sure I do this cleanup from time to time.
>
>
Yeah, but just to avoid any misunderstanding -- the live nodes are watched
via ZK, so checking live-ness is a hash set lookup. That is a cost, but a
small one. But yeah, you do need to clean up from time to time.
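
A rough sketch of that check, assuming the live nodes are kept in an
in-memory set that mirrors /live_nodes and each replica is a dict with
state and node_name fields. The name is_routable is made up for the example
and this is not Solr's actual code:

```python
def is_routable(replica, live_nodes):
    """True only if the replica is 'active' AND its node is currently live.

    live_nodes is assumed to be a set, so the membership test is a
    constant-time hash lookup.
    """
    return (
        replica.get("state") == "active"
        and replica.get("node_name") in live_nodes
    )
```

With this check, a replica left in 'active' after a JVM crash is skipped as
soon as its node name disappears from the set.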


> Thanks for the responses and clarifications!
>
> Shai
>
> On Wed, Mar 25, 2015 at 11:39 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > On Wed, Mar 25, 2015 at 12:51 PM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > Thanks.
> > >
> > > Does Solr ever clean up those states? I.e. does it ever remove "down"
> > > replicas, or replicas belonging to non-live_nodes after some time? Or
> > > will these remain in the cluster state forever (assuming they never
> > > come back up)?
> > >
> >
> > No, they remain there forever. You can still call the deletereplica API
> > to clean them up. There's even a param onlyIfDown=true which will remove
> > a replica only if it's already 'down'.
> >
> >
> > >
> > > If they remain there, is there any penalty? E.g. does Solr try to send
> > > them updates, or route search requests to them? I'm talking about
> > > replicas that stay in ACTIVE state, but their nodes aren't under
> > > /live_nodes.
> > >
> >
> > No, there is no penalty because we always check for the state=active and
> > the live-ness before routing any requests to a replica.
> >
> >
> > >
> > > Shai
> > >
> > > On Wed, Mar 25, 2015 at 8:05 PM, Shalin Shekhar Mangar <
> > > shalinman...@gmail.com> wrote:
> > >
> > > > Comments inline:
> > > >
> > > > On Wed, Mar 25, 2015 at 8:30 AM, Shai Erera <ser...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Is it possible for a replica to be DOWN, while the node it resides
> > > > > on is under /live_nodes? If so, what can lead to it, aside from
> > > > > someone unloading a core?
> > > > >
> > > >
> > > > Yes, aside from someone unloading the index, this can happen in two
> > > > ways: 1) during startup each core publishes its state as 'down' before
> > > > it enters recovery, and 2) the leader force-publishes a replica as
> > > > 'down' if it is not able to forward updates to that replica (this
> > > > mechanism is called Leader-Initiated-Recovery, or LIR for short).
> > > >
> > > > The #2 above can happen when the replica is partitioned from the
> > > > leader but both are able to talk to ZooKeeper.
> > > >
> > > >
> > > > >
> > > > > I don't know if each SolrCore reports status to ZK independently,
> > > > > or it's done by the Solr process as a whole.
> > > > >
> > > > >
> > > > It is done on a per-core basis for now. But the 'live' node is
> > > > maintained one per Solr instance (JVM).
> > > >
> > > >
> > > > > Also, is it possible for a replica to report ACTIVE, while the node
> > > > > it lives on is no longer under /live_nodes? Are there any ZK timings
> > > > > that can cause that?
> > > > >
> > > >
> > > > Yes, this can happen if the JVM crashed. A replica publishes itself
> > > > as 'down' on shutdown, so if the graceful shutdown step is skipped
> > > > then the replica will continue to be 'active' in the cluster state.
> > > > Even LIR doesn't apply here because there's no point in the leader
> > > > marking a node as 'down' if it is not 'live' already.
> > > >
> > > >
> > > > >
> > > > > Shai
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Shalin Shekhar Mangar.
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.
