Hi Jayme,
It would be great if you could file a bug report for this issue.

On Wed, Sep 11, 2019 at 5:05 PM Jayme <jay...@gmail.com> wrote:
> This sounds similar to the issue I hit with the cluster upgrade process in
> my environment. I have large 2 TB SSDs and most of my VMs are several
> hundred GB in size. The heal process after a host reboot can take 5-10
> minutes to complete. I may be able to address this with better Gluster
> tuning.
>
> Either way, the upgrade process should be aware of the heal status and wait
> for it to complete before attempting to move on to the next host.
>
> On Wed, Sep 11, 2019 at 3:53 AM Sahina Bose <sab...@redhat.com> wrote:
>
>> On Fri, Aug 9, 2019 at 3:41 PM Martin Perina <mper...@redhat.com> wrote:
>>
>>> On Thu, Aug 8, 2019 at 10:25 AM Sandro Bonazzola <sbona...@redhat.com>
>>> wrote:
>>>
>>>> On Tue, Aug 6, 2019 at 23:17, Jayme <jay...@gmail.com> wrote:
>>>>
>>>>> I’m aware of the heal process, but it’s unclear to me whether the update
>>>>> continues to run while the volumes are healing and resumes when they are
>>>>> done. There doesn’t seem to be any indication in the UI (unless I’m
>>>>> mistaken).
>>>>>
>>>>
>>>> Adding @Martin Perina <mper...@redhat.com>, @Sahina Bose
>>>> <sab...@redhat.com> and @Laura Wright <lwri...@redhat.com> on this;
>>>> hyperconverged deployments using the cluster upgrade command would
>>>> probably need some improvement.
>>>>
>>>
>>> The cluster upgrade process continues to the 2nd host after the 1st host
>>> becomes Up. If the 2nd host then fails to switch to maintenance, we stop
>>> the upgrade process to prevent breakage.
>>> Sahina, is the Gluster healing process status exposed in the REST API? If
>>> so, does it make sense to wait for healing to finish before trying to
>>> move the next host to maintenance? Or are there any other ideas on how to
>>> improve this?
>>>
>>
>> I need to cross-check whether we expose the heal count in the gluster
>> bricks. Moving a host to maintenance does check if there are pending heal
>> entries or a possibility of quorum loss, and this would prevent
>> additional hosts from being upgraded.
>> +Gobinda Das <go...@redhat.com> +Sachidananda URS <s...@redhat.com>
>>
>>>>> On Tue, Aug 6, 2019 at 6:06 PM Robert O'Kane <ok...@khm.de> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Often(?), updates to a hypervisor that also provides a Gluster
>>>>>> brick take the hypervisor offline (updates often require a reboot).
>>>>>>
>>>>>> This reboot then makes the brick "out of sync" and it has to be
>>>>>> resynced.
>>>>>>
>>>>>> I find it a "feature" that another host that is also part of a
>>>>>> Gluster domain cannot be updated (rebooted) before all the bricks
>>>>>> are updated, in order to guarantee there is no data loss. It is
>>>>>> called quorum, right?
>>>>>>
>>>>>> Always let the heal process end. Then the next update can start.
>>>>>> For me there is ALWAYS a healing time before Gluster is happy again.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Robert O'Kane
>>>>>>
>>>>>> On 06.08.2019 at 16:38, Shani Leviim wrote:
>>>>>> > Hi Jayme,
>>>>>> > I can't recall such a healing time.
>>>>>> > Can you please retry and attach the engine & vdsm logs so we'll be
>>>>>> > smarter?
>>>>>> >
>>>>>> > Regards,
>>>>>> > Shani Leviim
>>>>>> >
>>>>>> > On Tue, Aug 6, 2019 at 5:24 PM Jayme <jay...@gmail.com> wrote:
>>>>>> >
>>>>>> >     I've yet to have cluster upgrade finish updating my three-host
>>>>>> >     HCI cluster. The most recent try was today, moving from oVirt
>>>>>> >     4.3.3 to 4.3.5.5. The first host updates normally, but when it
>>>>>> >     moves on to the second host it fails to put it in maintenance
>>>>>> >     and the cluster upgrade stops.
>>>>>> >
>>>>>> >     I suspect this is due to the fact that after my hosts are
>>>>>> >     updated it takes 10 minutes or more for all volumes to
>>>>>> >     sync/heal. I have 2 TB SSDs.
>>>>>> >
>>>>>> >     Does the cluster upgrade process take heal time into account
>>>>>> >     before attempting to place the next host in maintenance to
>>>>>> >     upgrade it? Or is there something else that may be at fault
>>>>>> >     here, or perhaps a reason why the heal process takes 10 minutes
>>>>>> >     after reboot to complete?
>>>>>> >     _______________________________________________
>>>>>> >     Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
>>>>>> >     To unsubscribe send an email to users-le...@ovirt.org
>>>>>> >     <mailto:users-le...@ovirt.org>
>>>>>> >     Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> >     oVirt Code of Conduct:
>>>>>> >     https://www.ovirt.org/community/about/community-guidelines/
>>>>>> >     List Archives:
>>>>>> >     https://lists.ovirt.org/archives/list/users@ovirt.org/message/5XM3QB3364ZYIPAKY4KTTOSJZMCWHUPD/
>>>>>> >
>>>>>>
>>>>>> --
>>>>>> Systems Administrator
>>>>>> Kunsthochschule für Medien Köln
>>>>>> Peter-Welter-Platz 2
>>>>>> 50676 Köln
>>>>
>>>> --
>>>> Sandro Bonazzola
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>> sbona...@redhat.com
>>>> Red Hat respects your work life balance. Therefore there is no need to
>>>> answer this email out of your office hours.
>>>> <https://mojo.redhat.com/docs/DOC-1199578>
>>>
>>> --
>>> Martin Perina
>>> Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>

--
Thanks,
Kaustav Majumder
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ESENYDYMWG7YXQJWKQPXWBGBTDKW647H/
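
For anyone who wants to script the "always let the heal process end" advice from this thread while upgrading hosts by hand, here is a rough sketch of a wait loop. It is not what the oVirt cluster upgrade does internally. It assumes the `gluster volume heal <volume> info` command is available on the host and that its output contains one "Number of entries: N" line per brick; please check that against the output of your Gluster version. The function names, timeout and poll interval are made up for illustration.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Assumed output format: one "Number of entries: <value>" line per brick.
ENTRY_RE = re.compile(r"^Number of entries:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def pending_heal_entries(heal_info_output: str) -> Optional[int]:
    """Sum the per-brick entry counts found in heal info output.

    Returns None when no count is found or when any count is not a plain
    number (for example when a brick is unreachable), so that callers do
    not mistake an unknown state for a healed volume.
    """
    values = ENTRY_RE.findall(heal_info_output)
    if not values or not all(v.isdigit() for v in values):
        return None
    return sum(int(v) for v in values)


def read_heal_info(volume: str) -> str:
    """Run 'gluster volume heal <volume> info' and return its stdout."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def wait_for_heal(
    volume: str,
    fetch: Callable[[str], str] = read_heal_info,
    timeout_s: float = 1800.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the volume reports zero pending heal entries.

    Returns True once the count is zero, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if pending_heal_entries(fetch(volume)) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

Calling wait_for_heal("data") for each volume before putting the next host into maintenance would cover the manual case (the volume name "data" is only an example). The fetch and sleep parameters exist so the loop can be tried out without a running Gluster cluster.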