Thanks Houston, you're right: the main motivation for the SolrCloud-per-shard setup is auto-scaling.
Here is the issue I created: https://github.com/apache/solr-operator/issues/348

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 14, 2021 at 2:09 PM Houston Putman <[email protected]> wrote:

> Ok, I found why this is happening:
>
> https://github.com/apache/solr-operator/blob/v0.4.0/controllers/util/solr_update_util.go#L185
>
> Basically we make the assumption that the number of nodes in the
> statefulset is the same as the number of nodes in the cluster state.
> We should remove this check and just make sure that all of the nodes we
> care about are in the cluster state's live nodes.
> That would solve this.
>
> Do you mind creating a GitHub issue? This should be an easy fix to make
> this paradigm "supported" in v0.5.0.
>
> It would also be great to allow a SolrCloud to be split into multiple
> StatefulSets in v0.6.0 (or sometime in the future), so that you don't
> have to manage multiple SolrCloud resources independently.
>
> - Houston
>
> On Thu, Oct 14, 2021 at 2:05 PM Houston Putman <[email protected]> wrote:
>
> > So this is interesting.
> >
> > I'm assuming that you are running a SolrCloud resource per shard, so
> > that you can set system properties separately for autoscaling purposes.
> > The Solr Operator assumes that each cloud it is managing is independent.
> > However, the rolling restart process really just kills as many pods as
> > possible until the cluster state is too unhealthy to kill more
> > (configurable).
> >
> > In theory it should be fine to do a rolling restart on each SolrCloud
> > resource at the same time.
> > This is especially true because no two SolrCloud resources share a
> > shard, so their restarts should not affect each other.
> > (Actually, you have devised the only truly safe way of upgrading
> > multiple SolrCloud resources at the same time that are actually one
> > large cloud.)
> >
> > The only overlap in logic between the SolrCloud resources is the
> > overseer.
> > The logic in the Solr Operator is to restart the overseer last, and
> > wait for all nodes to be live and the cluster state to be healthy
> > before killing it.
> >
> > Are you seeing that all other node upgrades have succeeded and the
> > cluster is healthy, but the overseer is still not upgraded?
> >
> > On Thu, Oct 14, 2021 at 1:50 PM Joel Bernstein <[email protected]> wrote:
> >
> >> This is a follow-up to my last question with my findings thus far. In
> >> a scenario where there is one SolrCloud resource per shard, I'm seeing
> >> the overseer node get skipped entirely during rolling restarts. So, it
> >> appears the Solr Operator can only manage rolling restarts when there
> >> is one SolrCloud object in the cluster.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Tue, Oct 12, 2021 at 6:44 PM Joel Bernstein <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I saw that the Solr Operator takes into account collection topology
> >> > when performing rolling restarts. In a situation where there is one
> >> > SolrCloud object per shard, I'm wondering how this will behave. In
> >> > this case the Solr Operator would receive a different CR for each
> >> > shard, which would kick off the rolling restarts in parallel. Would
> >> > the operator be able to understand that it was operating on a single
> >> > shard in each CR and not get tangled up in the larger cluster state?
> >> >
> >> > Thanks,
> >> > Joel
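For readers following along: the fix Houston describes (checking that the pods this SolrCloud resource manages are among the live nodes, rather than comparing the StatefulSet's node count to the total cluster-state node count) can be sketched roughly as below. This is a hedged illustration, not the actual solr-operator code; the function and variable names here are hypothetical, and the real logic lives in controllers/util/solr_update_util.go.

```go
package main

import "fmt"

// allManagedPodsLive reports whether every pod managed by this
// SolrCloud resource appears in the cluster state's live nodes.
// Unlike a count comparison, extra live nodes belonging to sibling
// SolrCloud resources (which share the same ZooKeeper cluster state)
// do not cause a mismatch.
// Hypothetical sketch; not the actual operator implementation.
func allManagedPodsLive(managedPods []string, liveNodes map[string]bool) bool {
	for _, pod := range managedPods {
		if !liveNodes[pod] {
			return false
		}
	}
	return true
}

func main() {
	// Live nodes as seen in the shared cluster state: this resource's
	// pod plus a pod from a different SolrCloud resource.
	live := map[string]bool{
		"shard1-solrcloud-0:8983_solr": true,
		"shard2-solrcloud-0:8983_solr": true,
	}
	// Only the pods of this SolrCloud resource are checked, so the
	// sibling cloud's node does not fail the check.
	fmt.Println(allManagedPodsLive([]string{"shard1-solrcloud-0:8983_solr"}, live))
}
```

With a count-based check, the sibling cloud's node would make the cluster state appear larger than the StatefulSet and the comparison would fail; the membership check above sidesteps that.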
