The baseline topology is still A Thing with memory-only clusters. The difference is that auto-adjust is enabled by default. But yes, in short, you don't need to worry about the baseline if you don't use native persistence.
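For example, on an in-memory cluster you can verify or enable auto-adjust from code, roughly like this (a sketch only, assuming Ignite 2.8+ and an embedded server node):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCluster;

    public class BaselineAutoAdjustCheck {
        /** Sketch: make sure baseline auto-adjust is on, so an in-memory-only
         *  cluster adjusts its baseline automatically as pods come and go. */
        public static void ensureAutoAdjust(Ignite ignite) {
            IgniteCluster cluster = ignite.cluster();

            if (!cluster.isBaselineAutoAdjustEnabled()) {
                cluster.baselineAutoAdjustEnabled(true);
                cluster.baselineAutoAdjustTimeout(0L); // adjust immediately; tune to taste
            }

            System.out.println("Baseline auto-adjust enabled: " + cluster.isBaselineAutoAdjustEnabled()
                + ", timeout ms: " + cluster.baselineAutoAdjustTimeout());
        }
    }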
On Thu, 12 Sept 2024 at 21:21, Humphrey <[email protected]> wrote:

> And about baseline topology, that is only for when you're using storage, right? When using only in-memory you don't have a baseline topology but just a cluster with pods.
>
> I'll incorporate the check that the node has joined the cluster.
>
> On 10 Sep 2024, at 23:13, Jeremy McMillan <[email protected]> wrote:
>
> Your pod flag should check baseline topology to see if it has fully joined the cluster AND that rebalancing has BOTH started and finished.
>
> There is a race condition if the pod does not immediately join the cluster, but checks to see if the cluster is balanced and THEN joins the cluster, triggering another rebalance after it has already reported that it is ready.
>
> Try to control for that.
>
> On Tue, Sep 10, 2024 at 3:01 AM Humphrey Lopez <[email protected]> wrote:
>
>> Thanks, seems a bit complicated. When I have more time I'll try that approach.
>> For now we're still going to (mis)use the readiness probe to wait for the rebalancing in a smart way. When the pod starts we have a flag that is set to False, and the pod won't become ready until the cluster is rebalanced. When the status of the cluster is rebalanced, the pod moves to the ready state and the flag is set to True. The next rebalancing triggered by another pod will not affect the already running pod because the flag will be True.
>>
>> Let's see if this will wait long enough for the cluster to be in a stable phase.
>>
>> Humphrey
>>
>> On Mon, 9 Sep 2024 at 17:34, Jeremy McMillan <[email protected]> wrote:
>>
>>> An operator, as I understand it, is just a pod that interacts with your application and the Kubernetes API server as necessary to do what you might otherwise be doing manually.
>>>
>>> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
>>> https://kubernetes.io/docs/reference/using-api/client-libraries/
>>>
>>> You might start by creating an admin-pod with the Ignite control.sh, sqlline.sh, thin client, etc. tools PLUS kubectl or some other Kubernetes API client, which you can exec into and manually perform all of the rolling update steps. Once you know you have all the tools and the steps are complete, you can try adding scripts to the pod to automate sequences of steps. Then, once the scripts are fairly robust and complete, you can use the admin-pod as a basis for Kubernetes Job definitions. It's up to you whether you'd like to continue integrating with Kubernetes further. Next steps would be to create a CustomResourceDefinition instead of using a Kubernetes Job, or writing/adding a Kubernetes-compatible API that does what your Job command line startup does, but with more control over parameters.
>>>
>>> Please share your results once you've got things working. Best of luck!
>>>
>>> On Fri, Sep 6, 2024 at 10:15 AM Humphrey <[email protected]> wrote:
>>>
>>>> Thanks for the explanation, is there any operator ready for use? Is it hard to create your own Operator if one doesn't exist yet?
>>>>
>>>> Thanks
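For reference, a minimal sketch of the latching flag approach described above, assuming Ignite embedded in a Spring Boot application with Actuator. RebalanceReadinessIndicator is a hypothetical name, and the clusterRebalanced supplier is a placeholder you would wire to the cluster.Rebalanced metric (via JMX or the system views):

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.function.BooleanSupplier;

    import org.apache.ignite.Ignite;
    import org.springframework.boot.actuate.health.Health;
    import org.springframework.boot.actuate.health.HealthIndicator;

    public class RebalanceReadinessIndicator implements HealthIndicator {

        private final Ignite ignite;

        /** Placeholder: wire this to the cluster.Rebalanced metric (JMX or system views). */
        private final BooleanSupplier clusterRebalanced;

        /** Once this pod has reported ready, later rebalances must not flip it back. */
        private final AtomicBoolean latched = new AtomicBoolean(false);

        public RebalanceReadinessIndicator(Ignite ignite, BooleanSupplier clusterRebalanced) {
            this.ignite = ignite;
            this.clusterRebalanced = clusterRebalanced;
        }

        @Override
        public Health health() {
            if (latched.get())
                return Health.up().build();

            // Simplified "joined" check; per the advice above you may also want to
            // confirm that rebalancing has both started and finished.
            boolean joined = ignite.cluster().nodes().contains(ignite.cluster().localNode());

            if (joined && clusterRebalanced.getAsBoolean()) {
                latched.set(true);
                return Health.up().build();
            }

            return Health.down().withDetail("joinedCluster", joined).build();
        }
    }

Register it as a bean and include it in the readiness health group, so the probe reports DOWN only until the first time the node sees the cluster fully rebalanced.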
>>>> On 5 Sep 2024, at 19:39, Jeremy McMillan <[email protected]> wrote:
>>>>
>>>> It is correct for an operator, but not correct for a readiness probe. It's not your understanding of Ignite metrics; it is your understanding of Kubernetes.
>>>> Kubernetes rolling update logic assumes all of your service backend nodes are completely independent, but you have chosen a readiness probe which reflects how the nodes are interacting and interdependent.
>>>>
>>>> Hypothetically:
>>>> We have bounced one node, and it has rejoined the cluster, and is rebalancing.
>>>> If Kubernetes probes this node for readiness, we fail because we are rebalancing. The scheduler will block progress of the rolling update.
>>>> If Kubernetes probes any other node for readiness, it will fail because we are rebalancing. The scheduler will remove this node from any services.
>>>> All the nodes will reflect the state of the cluster: rebalancing.
>>>> No nodes will remain in the service backend. If you are using the Kubernetes discovery SPI, the restarted node will find itself unable to discover any peers.
>>>>
>>>> The problem is that Kubernetes interprets the readiness probe as a NODE STATE. The cluster.rebalanced metric is a CLUSTER STATE.
>>>>
>>>> If you had a Kubernetes Job that executes kubectl commands from within the cluster, looping over the pods in a StatefulSet and restarting them, it would make perfect sense to check cluster.rebalanced and block until rebalancing finishes, but Kubernetes does something different with readiness probes, based on assumptions about clustering which do not apply to Ignite.
>>>>
>>>> On Thu, Sep 5, 2024 at 11:29 AM Humphrey Lopez <[email protected]> wrote:
>>>>
>>>>> Yes, I'm trying to read the cluster.rebalanced metric from the JMX MBean, is that the correct one? I've built that into the readiness endpoint from Actuator and let Kubernetes wait for the cluster to be ready before moving to the next pod.
>>>>>
>>>>> Humphrey
>>>>>
>>>>> On 5 Sep 2024, at 17:34, Jeremy McMillan <[email protected]> wrote:
>>>>>
>>>>> I assume you have created your caches/tables with backups >= 1.
>>>>>
>>>>> You should restart one node at a time, and wait until the restarted node has rejoined the cluster, then wait for rebalancing to begin, then wait for rebalancing to finish before restarting the next node. Kubernetes readiness probes aren't sophisticated enough. "Node ready" state isn't the same thing as "Cluster ready" state, but the Kubernetes scheduler can't distinguish them. This should be handled by an operator, either a human or an automated Kubernetes one.
>>>>>
>>>>> On Tue, Sep 3, 2024 at 1:13 PM Humphrey <[email protected]> wrote:
>>>>>
>>>>>> Thanks, I meant a rolling update of the same version of Ignite (2.16), not an upgrade to a new version. We have our Ignite embedded in a Spring Boot application, and when changing code we need to deploy a new version of the jar.
>>>>>>
>>>>>> Humphrey
>>>>>>
>>>>>> On 3 Sep 2024, at 19:24, Gianluca Bonetti <[email protected]> wrote:
>>>>>>
>>>>>> Hello
>>>>>>
>>>>>> If you want to upgrade the Apache Ignite version, this is not supported by Apache Ignite:
>>>>>>
>>>>>> "Ignite cluster cannot have nodes that run on different Ignite versions. You need to stop the cluster and start it again on the new Ignite version."
>>>>>> https://ignite.apache.org/docs/latest/installation/upgrades
>>>>>>
>>>>>> If you need rolling upgrades you can upgrade to GridGain, which brings rolling upgrades together with many other interesting features:
>>>>>> "Rolling Upgrades is a feature of GridGain Enterprise and Ultimate Edition that allows nodes with different GridGain versions to coexist in a cluster while you roll out a new version. This prevents downtime when performing software upgrades."
>>>>>> https://www.gridgain.com/docs/latest/installation-guide/rolling-upgrades
>>>>>>
>>>>>> Cheers
>>>>>> Gianluca Bonetti
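A minimal sketch of the "block until rebalanced" check such a Job or admin-pod could run between pod restarts, using the Java thin client. The SYS.METRICS view, the cluster.Rebalanced metric name, and the ignite-service address are assumptions to verify against your Ignite version:

    import java.util.List;

    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.client.IgniteClient;
    import org.apache.ignite.configuration.ClientConfiguration;

    public class WaitForRebalance {
        /** Sketch: poll the cluster-wide rebalanced flag through the SQL system
         *  views and return once it is true, so the next pod can be restarted. */
        public static void main(String[] args) throws Exception {
            ClientConfiguration cfg = new ClientConfiguration()
                .setAddresses("ignite-service:10800"); // hypothetical Kubernetes service name

            try (IgniteClient client = Ignition.startClient(cfg)) {
                while (true) {
                    List<List<?>> rows = client.query(
                            new SqlFieldsQuery("SELECT VALUE FROM SYS.METRICS WHERE NAME = ?")
                                .setArgs("cluster.Rebalanced"))
                        .getAll();

                    if (!rows.isEmpty() && Boolean.parseBoolean(String.valueOf(rows.get(0).get(0)))) {
                        System.out.println("Cluster reports rebalanced; safe to restart the next pod.");
                        return;
                    }

                    Thread.sleep(5_000);
                }
            }
        }
    }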
>>>>>> On Tue, 3 Sept 2024 at 18:15, Humphrey Lopez <[email protected]> wrote:
>>>>>>
>>>>>>> Hello, we have several pods with Ignite caches running in Kubernetes. We only use memory mode (not persistence) and want to perform a rolling update without losing data. What metric should we monitor to know when it's safe to replace the next pod?
>>>>>>>
>>>>>>> We have tried the Cluster.Rebalanced (1) metric from JMX in a readiness probe, but we still end up losing data from the caches.
>>>>>>>
>>>>>>> 1) https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#cluster
>>>>>>>
>>>>>>> Should we use another mechanism or metric for determining the readiness of the newly started pod?
>>>>>>>
>>>>>>> Humphrey
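For completeness, the backups >= 1 assumption mentioned earlier in the thread is what allows a pure in-memory cluster to survive losing one pod at a time while it rebalances. A minimal sketch of such a cache configuration (cache name and key/value types are illustrative):

    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.CacheRebalanceMode;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class CacheConfigs {
        /** Sketch: a partitioned in-memory cache that keeps one backup copy of every
         *  partition, so data survives the loss of a single pod while the cluster
         *  rebalances onto the remaining nodes. */
        public static CacheConfiguration<Long, String> ordersCache() {
            return new CacheConfiguration<Long, String>("orders")
                .setCacheMode(CacheMode.PARTITIONED)
                .setBackups(1)                              // at least one backup per partition
                .setRebalanceMode(CacheRebalanceMode.SYNC); // cache calls wait for initial rebalancing
        }
    }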
