> On 16 Sep 2016, at 16:34, aleksey.maksi...@it-kb.ru wrote:
>
> Tested.
>
> If I run 'shutdown -h now' on a host with a running HA VM (not the HostedEngine VM)...
>
> this event appears in the oVirt web console:
>
> Sep 16, 2016 5:13:18 PM VM KOM-AD01-PBX02 is down. Exit message: User shut
> down from within the guest

that would be another bug. It should be recognized properly as a “kill”. Can you please share host logs from this attempt as well?

>
> The HA VM is turned off and will not start on another host.
>
> This is the journald log from the HA VM guest OS:
>
> ...
> Sep 16 17:06:48 KOM-AD01-PBX02 python[2637]: [100B blob data]
> Sep 16 17:06:53 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for
> reply from 91.189.91.157:123 (ntp.ubuntu.com).
> Sep 16 17:07:03 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for
> reply from 91.189.89.199:123 (ntp.ubuntu.com).
> Sep 16 17:07:13 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for
> reply from 91.189.89.198:123 (ntp.ubuntu.com).
> Sep 16 17:07:23 KOM-AD01-PBX02 systemd-timesyncd[1739]: Timed out waiting for
> reply from 91.189.94.4:123 (ntp.ubuntu.com).
> Sep 16 17:08:48 KOM-AD01-PBX02 python[2637]: [90B blob data]
> Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [155B blob data]
> Sep 16 17:08:49 KOM-AD01-PBX02 python[2637]: [100B blob data]
> Sep 16 17:10:49 KOM-AD01-PBX02 python[2637]: [90B blob data]
> Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [155B blob data]
> Sep 16 17:10:50 KOM-AD01-PBX02 python[2637]: [100B blob data]
> -- Reboot --
> ...
>
> There are no termination procedures in the log before the shutdown.
> It looks like a rough poweroff of the VM.

yep, that is expected. But it should be properly detected as such and the HA VM should restart. Somehow vdsm misidentifies the reason for the shutdown.

>
> 16.09.2016, 17:08, "Simone Tiraboschi" <stira...@redhat.com>:
>> On Fri, Sep 16, 2016 at 4:02 PM, <aleksey.maksi...@it-kb.ru> wrote:
>>> So, colleagues.
>>> I tested fencing again, and now I think that my host server's power button
>>> (physical or through iLO) sends a KILL command to the host OS (and, as a
>>> result, to the VM).
>>> This is the journald log in my guest OS when I press the power button on the host:
>>>
>>> ...
>>> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping ACPI event daemon...
>>> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Stopping User Manager for UID
>>> 1000...
>>> Sep 16 16:19:27 KOM-AD01-PBX02 systemd[1]: Starting Unattended Upgrades
>>> Shutdown...
>>> Sep 16 16:19:27 KOM-AD01-PBX02 snapd[2583]: 2016/09/16 16:19:27.289063
>>> main.go:67: Exiting on terminated signal.
>>> Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2940]: pam_unix(sshd:session): session
>>> closed for user user
>>> Sep 16 16:19:27 KOM-AD01-PBX02 su[3015]: pam_unix(su:session): session
>>> closed for user root
>>> Sep 16 16:19:27 KOM-AD01-PBX02 spice-vdagentd[2638]: vdagentd quiting,
>>> returning status 0
>>> Sep 16 16:19:27 KOM-AD01-PBX02 sudo[3014]: pam_unix(sudo:session): session
>>> closed for user root
>>> Sep 16 16:19:27 KOM-AD01-PBX02 /usr/lib/snapd/snapd[2583]: main.go:67:
>>> Exiting on terminated signal.
>>> Sep 16 16:19:27 KOM-AD01-PBX02 sshd[2812]: Received signal 15; terminating.
>>> ...
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Unmount All
>>> Filesystems.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped target Local File
>>> Systems (Pre).
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopping Monitoring of LVM2
>>> mirrors, snapshots etc. using dmeventd or progress polling...
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Remount Root and Kernel
>>> File Systems.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Create Static Device
>>> Nodes in /dev.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Shutdown.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Reached target Final Step.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Starting Reboot...
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Stopped Monitoring of LVM2
>>> mirrors, snapshots etc. using dmeventd or progress polling.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd[1]: Shutting down.
>>> Sep 16 16:19:28 KOM-AD01-PBX02 kernel: [drm:qxl_enc_commit [qxl]] *ERROR*
>>> head number too large or missing monitors config: ffffc9000084a000,
>>> 0systemd-shutdown[1]: Sending SIGTERM to remaining processes...
>>> Sep 16 16:19:28 KOM-AD01-PBX02 systemd-journald[3342]: Journal stopped
>>> -- Reboot --
>>>
>>> Perhaps this is a feature of the HP ProLiant DL 360 G5. I don't know.
>>>
>>> If I test host unavailability in other ways, everything goes well.
>>>
>>> I described my experience testing fencing, with practical examples, on my
>>> blog (in Russian):
>>> https://blog.it-kb.ru/2016/09/16/install-ovirt-4-0-part-4-about-ssh-soft-fencing-and-hard-fencing-over-hp-proliant-ilo2-power-managment-agent-and-test-of-high-availability/
>>>
>>> Thank you all very much for your participation and support.
>>>
>>> Michal, what kind of scenario are you talking about?
>>
>> Basically what you just did,
>> the question is what happens when you run 'shutdown -h now' (or press the
>> physical button if configured to trigger a soft shutdown); is it going to
>> somehow propagate the shutdown action to the VMs or brutally kill them?
>>
>> In the first case the VMs will not restart regardless of their HA flags.
>>
>>> PS: Excuse me for my bad English :)
>>>
>>> 16.09.2016, 16:37, "Simone Tiraboschi" <stira...@redhat.com>:
>>>> On Fri, Sep 16, 2016 at 3:34 PM, Michal Skrivanek
>>>> <michal.skriva...@redhat.com> wrote:
>>>>>> On 16 Sep 2016, at 15:31, aleksey.maksi...@it-kb.ru wrote:
>>>>>>
>>>>>> Hi Simone.
>>>>>> Exactly.
>>>>>> Now I'll set up journald on the guest and try to understand how the
>>>>>> guest is turned off.
>>>>>
>>>>> great.
>>>>> thanks
>>>>>
>>>>>> 16.09.2016, 16:25, "Simone Tiraboschi" <stira...@redhat.com>:
>>>>>>> On Fri, Sep 16, 2016 at 3:13 PM, Michal Skrivanek
>>>>>>> <michal.skriva...@redhat.com> wrote:
>>>>>>>>> On 16 Sep 2016, at 15:05, Gianluca Cecchi <gianluca.cec...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Sep 16, 2016 at 2:50 PM, Michal Skrivanek
>>>>>>>>> <michal.skriva...@redhat.com> wrote:
>>>>>>>>>> no, that’s not how HA works today. When you log into a guest and
>>>>>>>>>> issue “shutdown” we do not restart the VM under your hands. We can
>>>>>>>>>> argue how it should or may work, but this is the defined behavior
>>>>>>>>>> since the dawn of oVirt.
>>>>>>>>>>
>>>>>>>>>>> AFAIK that's correct, we need to be able to shut down an HA VM
>>>>>>>>>>> without it being immediately restarted on a different host. We want
>>>>>>>>>>> to restart an HA VM only if the host where the HA VM is running is
>>>>>>>>>>> non-responsive.
>>>>>>>>>>
>>>>>>>>>> we try to restart it in all cases other than user-initiated
>>>>>>>>>> shutdown, e.g. a QEMU process crash on an otherwise-healthy host
>>>>>>>>> Hi, just another question in case HA is not configured at all.
>>>>>>>>
>>>>>>>> by “HA configured” I expect you’re referring to the “Highly Available”
>>>>>>>> checkbox in Edit VM dialog.
>>>>>>>>
>>>>>>>>> If I run the "shutdown -h now" command on a host where some VMs are
>>>>>>>>> running, what is the expected behavior?
>>>>>>>>> Clean VM shutdown (with or without timeout in case it doesn't
>>>>>>>>> complete?) or crash of their related QEMU processes?
>>>>>>>>
>>>>>>>> expectation is that you won’t do that. That’s why there is the
>>>>>>>> Maintenance host state.
>>>>>>>> But if you do that regardless, with VMs running, all the processes
>>>>>>>> will be terminated in a regular system way, i.e. all QEMU processes
>>>>>>>> get SIGTERM.
>>>>>>>> From the perspective of each guest this is not a clean
>>>>>>>> shutdown and it would just get killed
>>>>>>>
>>>>>>> Aleksey is reporting that he started a shutdown on his host by power
>>>>>>> management and the VM processes weren't roughly killed but were shut
>>>>>>> down smoothly, so they didn't restart regardless of their HA flag;
>>>>>>> hence this thread.
>>>>>
>>>>> Gianluca talks about “shutdown -h now”, you talk about power management
>>>>> action, those are two different things. The current idea is that systemd
>>>>> or some other component just propagates the action to the guest, and if
>>>>> that guest is configured to handle it as a shutdown it starts it itself
>>>>> as well, so it looks like a user-initiated one. Even though this mostly
>>>>> makes sense, it is not ok for current HA logic
>>>>
>>>> Aleksey, can you please also test this scenario?
>>>>>>>> Thanks,
>>>>>>>> michal
>>>>>>>>> Thanks,
>>>>>>>>> Gianluca
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users@ovirt.org
>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
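
For readers following the thread, the HA rule described above (restart a Highly Available VM in every case except a user-initiated shutdown) can be sketched roughly as below. This is only an illustration of the behavior as stated in the discussion, not the oVirt engine's or vdsm's actual code; `ExitReason` and `should_restart` are hypothetical names invented for this sketch.

```python
from enum import Enum, auto


class ExitReason(Enum):
    """Hypothetical reasons a VM went down, as distinguished in the thread."""
    USER_SHUTDOWN = auto()        # shutdown issued from within the guest
    QEMU_CRASH = auto()           # QEMU process crashed on a healthy host
    KILLED = auto()               # QEMU process terminated, e.g. SIGTERM on host shutdown
    HOST_NON_RESPONSIVE = auto()  # host fenced or unreachable


def should_restart(highly_available: bool, reason: ExitReason) -> bool:
    """Return True if an HA restart is expected under the rule from the thread."""
    if not highly_available:
        # Without the "Highly Available" checkbox there is no automatic restart.
        return False
    # HA VMs are restarted in all cases other than a user-initiated shutdown.
    return reason is not ExitReason.USER_SHUTDOWN


# The problem reported in the thread: a host-level 'shutdown -h now' kills the
# VM, but the exit is reported as a guest-initiated shutdown, so no restart.
print(should_restart(True, ExitReason.KILLED))         # expected classification
print(should_restart(True, ExitReason.USER_SHUTDOWN))  # what was actually reported
```

Under this rule, the outcome depends entirely on how the exit reason is classified, which is why a kill misreported as "User shut down from within the guest" leaves the HA VM down.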