[ovirt-users] Re: Does cluster upgrade wait for heal before proceeding to next host?

Kaustav Majumder Mon, 16 Sep 2019 01:05:52 -0700

Hi Jayme,
It would be great if you could file a bug for this.

On Wed, Sep 11, 2019 at 5:05 PM Jayme <jay...@gmail.com> wrote:


> This sounds similar to the issue I hit with the cluster upgrade process in
> my environment. I have large 2 TB SSDs and most of my VMs are several
> hundred GB in size. The heal process after a host reboot can take 5-10
> minutes to complete. I may be able to address this with better Gluster
> tuning.
>
> Either way the upgrade process should be aware of the heal status and wait
> for it to complete before attempting to move on to the next host.
>
>
> On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sab...@redhat.com> wrote:
>
>>
>>
>> On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mper...@redhat.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbona...@redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 6, 2019 at 11:17 PM Jayme <jay...@gmail.com> wrote:
>>>>
>>>>> I’m aware of the heal process, but it’s unclear to me whether the update
>>>>> keeps running while the volumes are healing and resumes when they are
>>>>> done. There doesn’t seem to be any indication in the UI (unless I’m
>>>>> mistaken).
>>>>>
>>>>
>>>> Adding @Martin Perina <mper...@redhat.com>, @Sahina Bose
>>>> <sab...@redhat.com> and @Laura Wright <lwri...@redhat.com> on this.
>>>> Hyperconverged deployments using the cluster upgrade command would
>>>> probably need some improvement.
>>>>
>>>
>>> The cluster upgrade process continues to the 2nd host after the 1st host
>>> becomes Up. If the 2nd host then fails to switch to maintenance, we stop
>>> the upgrade process to prevent breakage.
>>> Sahina, is the gluster healing process status exposed in the REST API? If
>>> so, does it make sense to wait for healing to finish before trying to
>>> move the next host to maintenance? Or are there any other ideas on how to
>>> improve this?
>>>
>>
>> I need to cross-check whether we expose the heal count for the gluster
>> bricks. Moving a host to maintenance does check whether there are pending
>> heal entries or a possibility of quorum loss, and this would prevent
>> additional hosts from being upgraded.
>> +Gobinda Das <go...@redhat.com> +Sachidananda URS <s...@redhat.com>
>>
>>
>>>>
>>>>
>>>>>
>>>>> On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <ok...@khm.de> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Often(?), updates to a hypervisor that also provides a Gluster brick
>>>>>> take the hypervisor offline (updates often require a reboot).
>>>>>>
>>>>>> This reboot then makes the brick "out of sync" and it has to be
>>>>>> resync'd.
>>>>>>
>>>>>> I find it a "feature" that another host that is also part of a gluster
>>>>>> domain cannot be updated (rebooted) before all the bricks are updated,
>>>>>> in order to guarantee there is no data loss. It is called quorum, right?
>>>>>>
>>>>>> Always let the heal process end. Then the next update can start.
>>>>>> For me there is ALWAYS a healing time before Gluster is happy again.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Robert O'Kane
>>>>>>
>>>>>>
>>>>>> On 06.08.2019 at 16:38, Shani Leviim wrote:
>>>>>> > Hi Jayme,
>>>>>> > I can't recall such a healing time.
>>>>>> > Can you please retry and attach the engine & vdsm logs so we'll be
>>>>>> > smarter?
>>>>>> >
>>>>>> > Regards,
>>>>>> > Shani Leviim
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Aug 6, 2019 at 5:24 PM Jayme <jay...@gmail.com> wrote:
>>>>>> >
>>>>>> >     I've yet to have cluster upgrade finish updating my three-host HCI
>>>>>> >     cluster.  The most recent try was today, moving from oVirt 4.3.3 to
>>>>>> >     4.3.5.5.  The first host updates normally, but when it moves on to
>>>>>> >     the second host it fails to put it in maintenance and the cluster
>>>>>> >     upgrade stops.
>>>>>> >
>>>>>> >     I suspect this is due to the fact that after my hosts are updated
>>>>>> >     it takes 10 minutes or more for all volumes to sync/heal.  I have
>>>>>> >     2 TB SSDs.
>>>>>> >
>>>>>> >     Does the cluster upgrade process take heal time into account before
>>>>>> >     attempting to place the next host in maintenance to upgrade it?  Or
>>>>>> >     is there something else that may be at fault here, or perhaps a
>>>>>> >     reason why the heal process takes 10 minutes after reboot to
>>>>>> >     complete?
>>>>>> >     _______________________________________________
>>>>>> >     Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
>>>>>> >     To unsubscribe send an email to users-le...@ovirt.org
>>>>>> >     <mailto:users-le...@ovirt.org>
>>>>>> >     Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> >     oVirt Code of Conduct:
>>>>>> >     https://www.ovirt.org/community/about/community-guidelines/
>>>>>> >     List Archives:
>>>>>> >
>>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5XM3QB3364ZYIPAKY4KTTOSJZMCWHUPD/
>>>>>> >
>>>>>>
>>>>>> --
>>>>>> Systems Administrator
>>>>>> Kunsthochschule für Medien Köln
>>>>>> Peter-Welter-Platz 2
>>>>>> 50676 Köln
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Sandro Bonazzola
>>>>
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>>
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>>
>>>> sbona...@redhat.com
>>>> Red Hat respects your work life balance. Therefore there is no need to
>>>> answer this email out of your office hours.
>>>> <https://mojo.redhat.com/docs/DOC-1199578>
>>>>
>>>
>>>
>>> --
>>> Martin Perina
>>> Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>>>
>


-- 

Thanks,

Kaustav Majumder

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ESENYDYMWG7YXQJWKQPXWBGBTDKW647H/
