I've repeated the experiment without any shared storage, so that
eliminates GlusterFS as a suspect.

server-a# virsh migrate --live --persistent --undefinesource --copy-storage-inc guest qemu+tls://server-b/system

Result: After about a week of uptime, the guest froze solid for 27
seconds following the migration. The freeze happens after the migration
has completed: at that point the guest is running on the destination
server, using up a full core, and is no longer present on the
originating server. CPU usage goes back to normal once the guest
becomes responsive again.

Just before the migration, NTP was locked to well within 100us. Right
after the machine became responsive again, the NTP status shows that
the machine simply lost more than 27 seconds:

root@guest:~# ntpq -p 
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*cl0     xx.xx.xx.xx       3 u   15   16  377    0.457  27388.3   0.100
 cl1     xx.xx.xx.xx       3 u   13   16  377    0.429  27388.4   0.178

root@guest:~# uptime
 16:03:30 up 8 days, 23:45,  1 user,  load average: 0.02, 0.02, 0.05
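
For anyone reading the ntpq output above: the offset column is reported in
milliseconds, so 27388.3 corresponds to roughly 27.4 seconds. The following
is only a minimal sketch of how that column could be pulled out and
converted; the function name parse_ntpq_offsets and the SAMPLE text are
hypothetical and just mirror the output pasted above. It assumes the
standard ten-column peers layout, with the tally code in the first
character of each line.

```python
# Hypothetical helper, not part of ntpq: converts the "offset" column
# (milliseconds) of `ntpq -p` output into seconds per peer.

SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*cl0     xx.xx.xx.xx       3 u   15   16  377    0.457  27388.3   0.100
 cl1     xx.xx.xx.xx       3 u   13   16  377    0.429  27388.4   0.178
"""


def parse_ntpq_offsets(text):
    """Return {peer_name: offset_in_seconds} for each peer line."""
    offsets = {}
    for line in text.splitlines():
        # Column 0 holds the tally code ('*', '+', ' ', ...), so drop it
        # before splitting the remaining columns on whitespace.
        fields = line[1:].split()
        # Peer lines have ten columns; skip the header and the '=' rule.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue
        offsets[fields[0]] = offset_ms / 1000.0
    return offsets


if __name__ == "__main__":
    for peer, seconds in parse_ntpq_offsets(SAMPLE).items():
        print(f"{peer}: {seconds:.1f} s")
```

Both peers agree to within 0.1 ms on the offset, which suggests the guest
clock itself fell behind rather than one time source misbehaving.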

During these 27 seconds, the guest did not respond to any network
activity or on the (virtual) console. There is no mention of clock
jumps or anything else in dmesg this time.

Note that I have now reproduced this on two different pairs of machines:
our original KVM cluster, and two compute nodes (different hardware) to
test this with a supported Ubuntu release.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1297218

Title:
  guest hangs after live migration due to tsc jump

