If I've got comment 27 right, the issue has also been fixed upstream, so
I'm setting the status now to "Fix released". If there's still something
left to do here, feel free to change it again.
** Changed in: qemu
Status: New => Fix Released
--
You received this bug notification because
This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.25
---
qemu (2.0.0+dfsg-2ubuntu1.25) trusty; urgency=medium
[Kai Storbeck]
* backport patch to fix guest hangs after live migration (LP: #1297218)
-- Serge Hallyn Fri, 01 Jul 2016 14:25:20
Awesome, thanks for verifying.
** Tags removed: verification-needed
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1297218
Title:
guest hangs after live migration due to tsc jump
To manage
Hello Chris,
Steps taken to test the proposed package:
1) enabled trusty-proposed
2) installed qemu-system-arm qemu-system-common qemu-system-misc
qemu-system-x86 qemu-user version 2.0.0+dfsg-2ubuntu1.25
3) repeated the above on a second trusty 14.04 server
4) migrated 41 running VMs (uptimes vary between 1
Hello Paul, or anyone else affected,
Accepted qemu into trusty-proposed. The package will build now and be
available at https://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.25
in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See
No, I'm afraid not. But if you can test when this package is accepted
into trusty-proposed that'll be great.
** Description changed:
+ SRU Justification:
+ 1. Impact: guests hang after live migration with 100% cpu
+ 2. Upstream fix: a set of four patches
We have four identical Ubuntu servers running libvirt/kvm/qemu, with
access to CEPH rbd filesystems. Guests can be live migrated between
them. However, in ~30% of cases live migration leaves the guest stuck at
100% CPU. Only a few times did we have the patience to wait, and upon
logging in our
Conflicting experimental packages in that ppa, trying ubuntu-virt/ppa
instead.
Thank you. I'm doing a test build in ppa:serge-hallyn/virt, and will
run a full regression test from there. I'll push for SRU if that
passes.
Would you mind putting in the bug Description (at top) a concise summary
of the test case, for the SRU process?
I can reasonably assume that this solved my problem. I've live migrated
41 VMs 5 times between 2 hypervisors without the 100% CPU problem
appearing.
My production servers run 2.0.0+dfsg-2ubuntu1.22, and still observe the
same problem.
Attached is the patch that I created with quilt in
That patch does not apply cleanly on qemu (2.0.0+dfsg, from trusty).
It changes "kvmclock_pre_save" and "kvmclock_post_save", but 2.0.0 has
only "kvmclock_vm_state_change".
Peeking at the 4 referenced patches on
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789
the code
See
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1297218/+attachment/4301780/+files/backport.patch
referenced in comment #29
@serge,
I'd be happy to test each of the patches, but considering the length of
this page I'd like an exact link to a patch and/or patches that need to
be tested, and against which version (trusty-security, I suppose?).
@Steve,
it seems to me those are the same as the 'backport.patch' from an
earlier comment?
In the absence of any progress on this, and suddenly remembering that
I'm occasionally affected by this, I did some digging and found this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789
Looks like those four patches might be the solution?
** Bug watch added: Debian Bug tracker
But unfortunately we do not know which patch fixed it, making an SRU
much more problematic. Someone who is able to reproduce the bug would
need to try to either bisect, or make educated guesses and test patch
cherrypicks.
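With four candidate patches, the bisection suggested here can be sketched as a binary search over how many patches of the series are applied. This is a minimal sketch under assumptions not stated in the thread: the patch names are placeholders, `is_fixed(k)` is a hypothetical stand-in for "build qemu with the first k patches applied and repeat the migration test", and the fix is assumed to be monotone (once it is present, later patches do not undo it).

```python
from typing import Callable, Optional, Sequence


def first_fixing_prefix(
    patches: Sequence[str], is_fixed: Callable[[int], bool]
) -> Optional[str]:
    """Return the patch whose addition first makes the series fix the bug.

    is_fixed(k) is a hypothetical test hook: build with patches[:k] applied,
    run the migration test, and report whether the guest survived.
    Assumes the unpatched build (k == 0) is broken and that the result is
    monotone in k.
    """
    if not patches or not is_fixed(len(patches)):
        return None  # the full series does not fix it, so there is nothing to bisect
    lo, hi = 0, len(patches)  # invariant: prefix lo is broken, prefix hi is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(mid):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]


if __name__ == "__main__":
    series = ["patch-1", "patch-2", "patch-3", "patch-4"]  # placeholder names
    # Simulated test: pretend the hang goes away once patch-3 is applied.
    print(first_fixing_prefix(series, lambda k: k >= 3))
```

For four patches this needs at most three test builds (the full series plus two probes). A result such as `patch-3` only shows that patches 1 to 3 together are sufficient; if the fix depends on several patches, the earlier ones may still be required.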
Thanks - marked fixed released for development release. We can SRU this
into trusty if we know exactly which patch actually fixed it.
** Changed in: qemu (Ubuntu)
Status: Confirmed => Fix Released
Thanks for looking into this. We started experimenting with live
migration on 14.04 and stumbled over this bug. As a workaround we've
installed qemu from the Ubuntu Cloud archive
(https://wiki.ubuntu.com/ServerTeam/CloudArchive). I can confirm this
bug is fixed in
Could someone confirm whether this is fixed in 15.04 and/or 15.10?
** Changed in: qemu (Ubuntu)
Status: Triaged => Incomplete
I'm sorry, but I'm not clear at this point on the status of this bug. I
never received an answer to comments #32 and #35, and don't know
what, if anything, to apply in an SRU.
I believe this should be fixed in 15.04, as the cited patches are
present. Could someone confirm?
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: glusterfs (Ubuntu Trusty)
Status: New => Confirmed
Ping.
Hi Serge,
Yes, that's the case. Let me also make it clear that this is a backport on top
of qemu 1.2 stable.
Hi,
just to be clear, the backport.patch which you uploaded actually
increased jitter for you, making the situation worse, right?
Hi,
I've seen some strange time behavior in some of our VMs, usually
triggered by live migration. In some VMs we have seen significant
time drift after a live migration, which NTP was not able to correct.
So far I have not been able to reproduce the same case; however, I did
notice that
The attachment backport.patch seems to be a patch. If it isn't,
please remove the patch flag from the attachment, remove the patch
tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the
team.
[This is an automated message performed by a Launchpad user owned by
~brian-murray, for
It looks as though the relevant commits were re-committed to upstream
git HEAD (9a48bcd1b82494671c09b0eefdb882581499 and
317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c). So this may be fixed in
vivid, and we might be able to cherrypick the final patches to trusty.
** Package changed: libvirt
@Paul,
could you confirm whether qemu 1:2.2+dfsg-3exp~ubuntu1 from
https://launchpad.net/~ubuntu-virt/+archive/ubuntu/virt-daily-upstream
fixes this issue? If it does then I'll go ahead and backport the patch.
Andrey: the bug also occurs when not using '--copy-storage-inc'. I
originally encountered the bug on a pair of servers that share a
glusterfs filesystem. As part of the debugging effort, I took glusterfs
out of the equation to show that it is not the cause of the issue. My
test environment is
Another test with NTP disabled on the servers (but enabled on the guest):
Still running qemu-git-2.1.0-rc2-git-20140721
server-a:~$ ntpdate -q cl0
stratum 3, offset -29.405612, delay 0.02597
server-b$ ntpdate -q cl0
stratum 3, offset -32.990292, delay 0.02597
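Both hosts were queried against the same reference (cl0), so subtracting the two offsets estimates how far the hosts' own clocks disagreed at that moment. A small sketch, assuming the `offset` field always appears in the `ntpdate -q` result line as shown above; `parse_offset` is a hypothetical helper name.

```python
import re

# Matches the "offset <seconds>" field of an `ntpdate -q` result line.
OFFSET_RE = re.compile(r"\boffset\s+(-?\d+(?:\.\d+)?)")


def parse_offset(line: str) -> float:
    """Extract the offset (in seconds) from one `ntpdate -q` output line."""
    match = OFFSET_RE.search(line)
    if match is None:
        raise ValueError(f"no offset field in: {line!r}")
    return float(match.group(1))


server_a = parse_offset("stratum 3, offset -29.405612, delay 0.02597")
server_b = parse_offset("stratum 3, offset -32.990292, delay 0.02597")
# Same reference server for both, so the difference approximates the host-to-host skew.
print(f"server-a vs server-b: {server_a - server_b:.6f} s")
```

With the values above this comes to about 3.58 s; the reported delay (0.026 s) is small by comparison.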
The guest is running NTP, hosted on
Paul,
do I understand right that:
1. disabling ksm on the hosts always fixes the pause on migration
2. disabling ksm on the host is not needed with the patchset by
Alexander Graf?
Andrey,
I don't quite understand (I suspect Paul didn't quite understand) what
you wanted tested. Could you please rephrase, as specifically as
possible? Do you want Paul to verify that the packages with the
patchset still work with --copy-storage-inc enabled?
As another test (still running qemu-git-2.1.0-rc2-git-20140721), I
disabled NTP on the two servers (and rebooted them), but left it running
on the guest.
When doing the migration, server a (where the guest was running) had an
NTP offset of -3.037619 s, and server b was at -3.337718 s. The guest
Serge, Paul
As far as I can see, all the cases in this ticket involve storage blkcopy with
--copy-storage-inc. My initial reason for rolling this patchset back was
a freeze on p2p migration without this flag being set. Although I am
hitting both of the problems, and cumulative after-migration delay
** Also affects: qemu
Importance: Undecided
Status: New
I've installed the latest virt-daily-upstream yesterday, and tried a
migration today. Result: guest froze for 3.6 seconds after only 1 day of
uptime, exactly the behaviour as seen with the stock Ubuntu-14.04
packages.
So with qemu-git-2.1.0-rc2-git-20140721, the migration problem is back.
This
I've pushed those 3 patches on top of the otherwise identical package in
the same ubuntu-virt/virt-daily-upstream ppa. If you can confirm that they
again fix it, then we can cherrypick those and inform upstream.
** Changed in: libvirt (Ubuntu)
Importance: Low => High
** Changed in: libvirt (Ubuntu)
Status: New => Triaged
** Changed in: glusterfs (Ubuntu)
Status: New => Invalid
Using the packages from the virt-daily-upstream PPA, as suggested in
#16, seems to resolve the issue: there are no detectable hangs after a
live migration, and the clock offset afterwards is only 0.13s, where it
used to be 2.3s for the same amount of uptime.
$ dpkg -l \*qemu\* |awk '/^ii/{print
In the production setup, we have two KVM servers. According to NTP, their clock
corrections are:
server-a: -147.2 ppm
server-b: -142.1 ppm
NTP is running on the guest as well, and its drift rate matches
whichever server the guest is running on, after NTP has had time to
adjust.
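The two correction figures can be turned into a rough daily divergence: a frequency error of f ppm accumulates f * 1e-6 * 86400 seconds per day. A back-of-the-envelope sketch, assuming the ppm values are NTP frequency corrections that stay constant over time; it only estimates how far the two hosts' clocks would drift apart uncorrected and says nothing by itself about the cause of the freeze. `drift_seconds` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(ppm: float, days: float = 1.0) -> float:
    """Seconds gained or lost by a clock running `ppm` parts per million off, over `days` days."""
    return ppm * 1e-6 * SECONDS_PER_DAY * days


server_a_ppm = -147.2
server_b_ppm = -142.1
delta_ppm = abs(server_a_ppm - server_b_ppm)  # frequency difference between the hosts

print(f"difference: {delta_ppm:.1f} ppm")
print(f"divergence: {drift_seconds(delta_ppm):.2f} s/day")
```

A 5.1 ppm difference works out to roughly 0.44 s per day.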
The length of
Could you please try to reproduce this using the qemu version at
https://launchpad.net/~ubuntu-virt/+archive/virt-daily-upstream ?
I will work on merging the newest Debian qemu today, but the
virt-daily-upstream PPA has the upstream git HEAD. It would be good to know
whether that fixes this.
I've repeated the experiment without any shared storage, so that
eliminates GlusterFS as a suspect.
server-a# virsh migrate --live --persistent --undefinesource --copy-storage-inc guest qemu+tls://server-b/system
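For repeated runs, the migration command above can be wrapped so the same flags are used every time and `--copy-storage-inc` can be switched off (the bug was also reported without it). A minimal sketch: `migrate_argv`, `migrate`, and the `dry_run` switch are hypothetical helpers, not part of libvirt; only the `virsh migrate` flags, the domain name `guest`, and the destination URI come from the thread.

```python
import shlex
import subprocess
from typing import List


def migrate_argv(domain: str, dest_host: str, copy_storage: bool = True) -> List[str]:
    """Build the `virsh migrate` command line used in the reproduction."""
    argv = ["virsh", "migrate", "--live", "--persistent", "--undefinesource"]
    if copy_storage:
        argv.append("--copy-storage-inc")  # incremental copy of non-shared storage
    argv += [domain, f"qemu+tls://{dest_host}/system"]
    return argv


def migrate(domain: str, dest_host: str, copy_storage: bool = True, dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return virsh's exit status."""
    argv = migrate_argv(domain, dest_host, copy_storage)
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    migrate("guest", "server-b")  # dry run: only prints the command
```

Defaulting to a dry run keeps the sketch harmless on a machine without libvirt; pass `dry_run=False` on the source host to actually migrate.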
Result: After about a week of uptime, the guest froze solid for 27
seconds after
After some deliberation on what shared storage to use, I decided to take
that factor out of the equation altogether:
server-a# virsh migrate --live --persistent --undefinesource --copy-storage-inc guest qemu+tls://server-b/system
So the storage is now on a non-shared directory and copied
I've just done a migration on my test setup, on a guest having 21 days of
uptime.
Result: The guest froze for 53 seconds, then went happily on its way again.
The 'TSC unstable' message did not show up, but ntp shows that the machine is
now 53 seconds behind. The physical hosts have no NTP timing
So it is behind for exactly as long as it was frozen.
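That the guest ends up behind by exactly the freeze length suggests its clock did not advance during the freeze, so the stall would not appear as a gap in the guest's own timestamps; it appears as a step in the guest's offset from a reference clock. A sketch of detecting such steps, assuming paired (reference_time, guest_time) samples are collected somehow (for example from periodic NTP queries); the sample source, `find_offset_steps`, and the 1-second threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class OffsetStep:
    index: int     # position of the sample where the step was first seen
    step_s: float  # change in (reference - guest) offset, in seconds


def find_offset_steps(
    samples: Sequence[Tuple[float, float]], threshold_s: float = 1.0
) -> List[OffsetStep]:
    """Flag jumps in the guest's offset from a reference clock.

    samples: (reference_time, guest_time) pairs in collection order.
    A positive step means the guest fell behind the reference.
    """
    steps: List[OffsetStep] = []
    previous: Optional[float] = None
    for i, (reference, guest) in enumerate(samples):
        offset = reference - guest
        if previous is not None and abs(offset - previous) >= threshold_s:
            steps.append(OffsetStep(index=i, step_s=offset - previous))
        previous = offset
    return steps


if __name__ == "__main__":
    # Synthetic samples: the guest clock stalls for 53 s between samples 2 and 3.
    demo = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (83.0, 30.0), (93.0, 40.0)]
    print(find_offset_steps(demo))
```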
I wonder if you could reproduce this with upstream qemu git HEAD.
But non-gluster reproduction will also be interesting.
A few notes after a few weeks of playing with this.
I've not reproduced the failure with migration per se.
However, when I did not add 'aio=native' to the backing store flags
(i.e. file=/mnt/disk.img,if=virtio,cache=none,aio=native), after around
2 days qemu would exit saying
(qemu) qemu:
I've set up a test environment using Ubuntu 14.04 LTS on two servers.
These are running glusterfs as a shared filesystem (3.5.0 from semiosis' PPA),
connected through QDR infiniband as a backend.
Live migrating a guest (also running 14.04 LTS) after a few days of uptime
still leads to a clock
I've tried running apport, but it says 'no packages found matching
libvirt'
Could you try using libvirt-bin as the package name? (I thought it
had to be the other way around, but it's worth a try)
apport information
** Tags added: apparmor apport-collected trusty
** Description changed:
We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a
Gluster filesystem. Guests can be live migrated between them. However,
live migration often leads to the guest being stuck at
I would be curious to hear whether this can also be reproduced with a
ceph or an nfs backing store (though I'm not asking you to set that up
if no one speaks up to say yes; I'll just have to test it myself)
Has this ever reproduced immediately, or have you just never tried
migrating without a few
Marking high priority (though if it turns out to be only related to a
gluster backend, so that using ceph is a workaround, then we should
lower it to medium per guidelines)
** Changed in: libvirt (Ubuntu)
Importance: Undecided => High
** Changed in: libvirt (Ubuntu)
Status: Incomplete =>
These servers will be upgraded to 14.04 LTS once that's released; I'll
update the ticket accordingly once we've tested this.
Thanks for reporting this bug. Unfortunately 13.04 (on your servers) is
no longer supported. Could you test to see if you can reproduce this on
13.10 or 14.04? If you can, then I'll mark this as also affecting linux
(the kernel) and hopefully we can get 'apport-collect 1297218' output
from both