Thanks Houston, you're right: the main motivation for the SolrCloud-per-shard setup is auto-scaling.
Here is the issue I created: https://github.com/apache/solr-operator/issues/348

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 14, 2021 at 2:09 PM Houston Putman <[email protected]> wrote:

> Ok, I found why this is happening:
>
> https://github.com/apache/solr-operator/blob/v0.4.0/controllers/util/solr_update_util.go#L185
>
> Basically we make the assumption that the number of nodes in the
> statefulset is the same as the number of nodes in the cluster state.
> We should remove this check and just make sure that all of the nodes we
> care about are in the cluster state's live nodes.
> That would solve this.
>
> Do you mind creating a GitHub issue? This should be an easy fix to make
> this paradigm "supported" in v0.5.0.
>
> It would also be great to allow a SolrCloud to be split into multiple
> StatefulSets in v0.6.0 (or sometime in the future), so that you don't
> have to manage multiple SolrCloud resources independently.
>
> - Houston
>
> On Thu, Oct 14, 2021 at 2:05 PM Houston Putman <[email protected]> wrote:
>
> > So this is interesting.
> >
> > I'm assuming that you are running a SolrCloud resource per shard, so
> > that you can set system properties separately for autoscaling purposes.
> > The Solr Operator assumes that each cloud it is managing is independent.
> > However, the rolling restart process really just kills as many pods as
> > possible until the cluster state is too unhealthy to kill more
> > (configurable).
> >
> > In theory it should be fine to do a rolling restart on each SolrCloud
> > resource at the same time.
> > This is especially true because no two SolrCloud resources share a
> > shard, so their restarts should not affect each other.
> > (Actually, you have devised the only truly safe way of upgrading
> > multiple SolrCloud resources at the same time that are actually one
> > large cloud.)
> >
> > The only overlap in logic between the SolrCloud resources is the
> > overseer.
> > The logic in the Solr Operator is to restart the overseer last, and
> > wait for all nodes to be live and the cluster state to be healthy
> > before killing it.
> >
> > Are you seeing that all other node upgrades have succeeded and the
> > cluster is healthy, but the overseer is still not upgraded?
> >
> > On Thu, Oct 14, 2021 at 1:50 PM Joel Bernstein <[email protected]> wrote:
> >
> >> This is a follow-up to my last question with my findings thus far. In
> >> a scenario where there is one SolrCloud resource per shard, I'm seeing
> >> the overseer node get skipped entirely during rolling restarts. So, it
> >> appears the Solr Operator can only manage rolling restarts when there
> >> is one SolrCloud object in the cluster.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Tue, Oct 12, 2021 at 6:44 PM Joel Bernstein <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I saw that the Solr Operator takes into account collection topology
> >> > when performing rolling restarts. In a situation where there is one
> >> > SolrCloud object per shard, I'm wondering how this will behave. In
> >> > this case the Solr Operator would receive a different CR for each
> >> > shard, which would kick off the rolling restarts in parallel. Would
> >> > the operator be able to understand that it was operating on a single
> >> > shard in each CR and not get tangled up in the larger cluster state?
> >> >
> >> > Thanks,
> >> > Joel
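For readers following along: the fix Houston describes (checking that the pods this SolrCloud resource manages are among the live nodes, rather than comparing the StatefulSet's node count to the total cluster-state node count) can be sketched roughly as below. This is a hedged illustration, not the actual solr-operator code; the function and variable names here are hypothetical, and the real logic lives in controllers/util/solr_update_util.go.

```go
package main

import "fmt"

// allManagedPodsLive reports whether every pod managed by this
// SolrCloud resource appears in the cluster state's live nodes.
// Unlike a count comparison, extra live nodes belonging to sibling
// SolrCloud resources (which share the same ZooKeeper cluster state)
// do not cause a mismatch.
// Hypothetical sketch; not the actual operator implementation.
func allManagedPodsLive(managedPods []string, liveNodes map[string]bool) bool {
	for _, pod := range managedPods {
		if !liveNodes[pod] {
			return false
		}
	}
	return true
}

func main() {
	// Live nodes as seen in the shared cluster state: this resource's
	// pod plus a pod from a different SolrCloud resource.
	live := map[string]bool{
		"shard1-solrcloud-0:8983_solr": true,
		"shard2-solrcloud-0:8983_solr": true,
	}
	// Only the pods of this SolrCloud resource are checked, so the
	// sibling cloud's node does not fail the check.
	fmt.Println(allManagedPodsLive([]string{"shard1-solrcloud-0:8983_solr"}, live))
}
```

With a count-based check, the sibling cloud's node would make the cluster state appear larger than the StatefulSet and the comparison would fail; the membership check above sidesteps that.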
