The baseline topology is still A Thing with memory-only clusters. The difference is that auto-adjust is enabled by default. But yes, in short, you don't need to worry about the baseline if you don't use native persistence.
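For example, on an in-memory cluster you can verify or enable auto-adjust from code, roughly like this (a sketch only, assuming Ignite 2.8+ and an embedded server node):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCluster;

    public class BaselineAutoAdjustCheck {
        /** Sketch: make sure baseline auto-adjust is on, so an in-memory-only
         *  cluster adjusts its baseline automatically as pods come and go. */
        public static void ensureAutoAdjust(Ignite ignite) {
            IgniteCluster cluster = ignite.cluster();

            if (!cluster.isBaselineAutoAdjustEnabled()) {
                cluster.baselineAutoAdjustEnabled(true);
                cluster.baselineAutoAdjustTimeout(0L); // adjust immediately; tune to taste
            }

            System.out.println("Baseline auto-adjust enabled: " + cluster.isBaselineAutoAdjustEnabled()
                + ", timeout ms: " + cluster.baselineAutoAdjustTimeout());
        }
    }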
On Thu, 12 Sept 2024 at 21:21, Humphrey <[email protected]> wrote:

> And about baseline topology, that is only for when you're using storage, right? When using only in-memory you don't have a baseline topology but just a cluster with pods.
>
> I'll incorporate the check that the node has joined the cluster.
>
> On 10 Sep 2024, at 23:13, Jeremy McMillan <[email protected]> wrote:
>
> Your pod flag should check baseline topology to see if it has fully joined the cluster AND that rebalancing has BOTH started and finished.
>
> There is a race condition if the pod does not immediately join the cluster, but checks to see if the cluster is balanced and THEN joins the cluster, triggering another rebalance after it has already reported that it is ready.
>
> Try to control for that.
>
> On Tue, Sep 10, 2024 at 3:01 AM Humphrey Lopez <[email protected]> wrote:
>
>> Thanks, seems a bit complicated. When I have more time I'll try that approach.
>> For now we're still going to (mis)use the readiness probe to wait for the rebalancing in a smart way. When the pod starts we have a flag that is set to False, and the pod won't become ready until the cluster is rebalanced. When the status of the cluster is rebalanced, the pod moves to the ready state and the flag is set to True. The next rebalancing triggered by another pod will not affect the already running pod because the flag will be True.
>>
>> Let's see if this will wait long enough for the cluster to be in a stable phase.
>>
>> Humphrey
>>
>> On Mon, 9 Sep 2024 at 17:34, Jeremy McMillan <[email protected]> wrote:
>>
>>> An operator, as I understand it, is just a pod that interacts with your application and the Kubernetes API server as necessary to do what you might otherwise be doing manually.
>>>
>>> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
>>> https://kubernetes.io/docs/reference/using-api/client-libraries/
>>>
>>> You might start by creating an admin-pod with the Ignite control.sh, sqlline.sh, thin client, etc. tools PLUS kubectl or some other Kubernetes API client, which you can exec into and manually perform all of the rolling update steps. Once you know you have all the tools and the steps are complete, you can try adding scripts to the pod to automate sequences of steps. Then, once the scripts are fairly robust and complete, you can use the admin-pod as a basis for Kubernetes Job definitions. It's up to you whether you'd like to continue integrating with Kubernetes further. Next steps would be to create a CustomResourceDefinition instead of using a Kubernetes Job, or writing/adding a Kubernetes-compatible API that does what your Job command line startup does, but with more control over parameters.
>>>
>>> Please share your results once you've got things working. Best of luck!
>>>
>>> On Fri, Sep 6, 2024 at 10:15 AM Humphrey <[email protected]> wrote:
>>>
>>>> Thanks for the explanation, is there any operator ready for use? Is it hard to create your own Operator if one doesn't exist yet?
>>>>
>>>> Thanks
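For reference, a minimal sketch of the latching flag approach described above, assuming Ignite embedded in a Spring Boot application with Actuator. RebalanceReadinessIndicator is a hypothetical name, and the clusterRebalanced supplier is a placeholder you would wire to the cluster.Rebalanced metric (via JMX or the system views):

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.function.BooleanSupplier;

    import org.apache.ignite.Ignite;
    import org.springframework.boot.actuate.health.Health;
    import org.springframework.boot.actuate.health.HealthIndicator;

    public class RebalanceReadinessIndicator implements HealthIndicator {

        private final Ignite ignite;

        /** Placeholder: wire this to the cluster.Rebalanced metric (JMX or system views). */
        private final BooleanSupplier clusterRebalanced;

        /** Once this pod has reported ready, later rebalances must not flip it back. */
        private final AtomicBoolean latched = new AtomicBoolean(false);

        public RebalanceReadinessIndicator(Ignite ignite, BooleanSupplier clusterRebalanced) {
            this.ignite = ignite;
            this.clusterRebalanced = clusterRebalanced;
        }

        @Override
        public Health health() {
            if (latched.get())
                return Health.up().build();

            // Simplified "joined" check; per the advice above you may also want to
            // confirm that rebalancing has both started and finished.
            boolean joined = ignite.cluster().nodes().contains(ignite.cluster().localNode());

            if (joined && clusterRebalanced.getAsBoolean()) {
                latched.set(true);
                return Health.up().build();
            }

            return Health.down().withDetail("joinedCluster", joined).build();
        }
    }

Register it as a bean and include it in the readiness health group, so the probe reports DOWN only until the first time the node sees the cluster fully rebalanced.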
>>>> On 5 Sep 2024, at 19:39, Jeremy McMillan <[email protected]> wrote:
>>>>
>>>> It is correct for an operator, but not correct for a readiness probe. It's not your understanding of Ignite metrics; it is your understanding of Kubernetes.
>>>> Kubernetes rolling update logic assumes all of your service backend nodes are completely independent, but you have chosen a readiness probe which reflects how the nodes are interacting and interdependent.
>>>>
>>>> Hypothetically:
>>>> We have bounced one node, and it has rejoined the cluster, and is rebalancing.
>>>> If Kubernetes probes this node for readiness, we fail because we are rebalancing. The scheduler will block progress of the rolling update.
>>>> If Kubernetes probes any other node for readiness, it will fail because we are rebalancing. The scheduler will remove this node from any services.
>>>> All the nodes will reflect the state of the cluster: rebalancing.
>>>> No nodes will remain in the service backend. If you are using the Kubernetes discovery SPI, the restarted node will find itself unable to discover any peers.
>>>>
>>>> The problem is that Kubernetes interprets the readiness probe as a NODE STATE. The cluster.rebalanced metric is a CLUSTER STATE.
>>>>
>>>> If you had a Kubernetes Job that executes kubectl commands from within the cluster, looping over the pods in a StatefulSet and restarting them, it would make perfect sense to check cluster.rebalanced and block until rebalancing finishes, but Kubernetes does something different with readiness probes, based on assumptions about clustering which do not apply to Ignite.
>>>>
>>>> On Thu, Sep 5, 2024 at 11:29 AM Humphrey Lopez <[email protected]> wrote:
>>>>
>>>>> Yes, I'm trying to read the cluster.rebalanced metric from the JMX MBean, is that the correct one? I've built that into the readiness endpoint from Actuator and let Kubernetes wait for the cluster to be ready before moving to the next pod.
>>>>>
>>>>> Humphrey
>>>>>
>>>>> On 5 Sep 2024, at 17:34, Jeremy McMillan <[email protected]> wrote:
>>>>>
>>>>> I assume you have created your caches/tables with backups >= 1.
>>>>>
>>>>> You should restart one node at a time, and wait until the restarted node has rejoined the cluster, then wait for rebalancing to begin, then wait for rebalancing to finish before restarting the next node. Kubernetes readiness probes aren't sophisticated enough. "Node ready" state isn't the same thing as "Cluster ready" state, but the Kubernetes scheduler can't distinguish them. This should be handled by an operator, either a human or an automated Kubernetes one.
>>>>>
>>>>> On Tue, Sep 3, 2024 at 1:13 PM Humphrey <[email protected]> wrote:
>>>>>
>>>>>> Thanks, I meant a rolling update of the same version of Ignite (2.16), not an upgrade to a new version. We have our Ignite embedded in a Spring Boot application, and when changing code we need to deploy a new version of the jar.
>>>>>>
>>>>>> Humphrey
>>>>>>
>>>>>> On 3 Sep 2024, at 19:24, Gianluca Bonetti <[email protected]> wrote:
>>>>>>
>>>>>> Hello
>>>>>>
>>>>>> If you want to upgrade the Apache Ignite version, this is not supported by Apache Ignite:
>>>>>>
>>>>>> "Ignite cluster cannot have nodes that run on different Ignite versions. You need to stop the cluster and start it again on the new Ignite version."
>>>>>> https://ignite.apache.org/docs/latest/installation/upgrades
>>>>>>
>>>>>> If you need rolling upgrades you can upgrade to GridGain, which brings rolling upgrades together with many other interesting features:
>>>>>> "Rolling Upgrades is a feature of GridGain Enterprise and Ultimate Edition that allows nodes with different GridGain versions to coexist in a cluster while you roll out a new version. This prevents downtime when performing software upgrades."
>>>>>> https://www.gridgain.com/docs/latest/installation-guide/rolling-upgrades
>>>>>>
>>>>>> Cheers
>>>>>> Gianluca Bonetti
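A minimal sketch of the "block until rebalanced" check such a Job or admin-pod could run between pod restarts, using the Java thin client. The SYS.METRICS view, the cluster.Rebalanced metric name, and the ignite-service address are assumptions to verify against your Ignite version:

    import java.util.List;

    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.client.IgniteClient;
    import org.apache.ignite.configuration.ClientConfiguration;

    public class WaitForRebalance {
        /** Sketch: poll the cluster-wide rebalanced flag through the SQL system
         *  views and return once it is true, so the next pod can be restarted. */
        public static void main(String[] args) throws Exception {
            ClientConfiguration cfg = new ClientConfiguration()
                .setAddresses("ignite-service:10800"); // hypothetical Kubernetes service name

            try (IgniteClient client = Ignition.startClient(cfg)) {
                while (true) {
                    List<List<?>> rows = client.query(
                            new SqlFieldsQuery("SELECT VALUE FROM SYS.METRICS WHERE NAME = ?")
                                .setArgs("cluster.Rebalanced"))
                        .getAll();

                    if (!rows.isEmpty() && Boolean.parseBoolean(String.valueOf(rows.get(0).get(0)))) {
                        System.out.println("Cluster reports rebalanced; safe to restart the next pod.");
                        return;
                    }

                    Thread.sleep(5_000);
                }
            }
        }
    }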
>>>>>> On Tue, 3 Sept 2024 at 18:15, Humphrey Lopez <[email protected]> wrote:
>>>>>>
>>>>>>> Hello, we have several pods with Ignite caches running in Kubernetes. We only use memory mode (not persistence) and want to perform a rolling update without losing data. What metric should we monitor to know when it's safe to replace the next pod?
>>>>>>>
>>>>>>> We have tried the Cluster.Rebalanced (1) metric from JMX in a readiness probe, but we still end up losing data from the caches.
>>>>>>>
>>>>>>> 1) https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#cluster
>>>>>>>
>>>>>>> Should we use another mechanism or metric for determining the readiness of the newly started pod?
>>>>>>>
>>>>>>> Humphrey
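For completeness, the backups >= 1 assumption mentioned earlier in the thread is what allows a pure in-memory cluster to survive losing one pod at a time while it rebalances. A minimal sketch of such a cache configuration (cache name and key/value types are illustrative):

    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.CacheRebalanceMode;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class CacheConfigs {
        /** Sketch: a partitioned in-memory cache that keeps one backup copy of every
         *  partition, so data survives the loss of a single pod while the cluster
         *  rebalances onto the remaining nodes. */
        public static CacheConfiguration<Long, String> ordersCache() {
            return new CacheConfiguration<Long, String>("orders")
                .setCacheMode(CacheMode.PARTITIONED)
                .setBackups(1)                              // at least one backup per partition
                .setRebalanceMode(CacheRebalanceMode.SYNC); // cache calls wait for initial rebalancing
        }
    }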
