Public bug reported:

In the thread at
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/127042/focus=129294,
three commits were identified that fix live migration for qemu 2.0 (at
least), which I am using on Trusty. I would like to get these pulled in
by the package maintainer.

I have cherry-picked those three commits (with considerable fix-up
for the first, which may or may not be correct; the others apply
cleanly) and built packages locally.  Installing them on the migration
receiver seems to fix my guest lockups after live migration.  I can
attach the patches I'm using if someone is able to review my fix-ups to
the first one.

My original problem description was:
Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows
it down a /whole lot/ ...), live migration started killing my Ubuntu precise
(kernel 3.2.x) guests, causing all of their vcpus to go into a busy loop.  Once
(and only once), I observed the guest eventually becoming responsive again,
with a clock nearly 600 years in the future and a negative uptime.

I haven't been able to dig up any previous threads about this problem, so my
gut instinct is that I've configured something wonky.  Any pointers toward
/what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU features.
My longest-running VMs, from before I started passing through the CPU
capabilities into the guest, seem to migrate without issue.

It also seems to happen reliably when the guest has been running for a while;
it's easily reproducible with guests that have been up ~1 day, and I've
reproduced it in VMs with an uptime of ~20 hours.  I haven't yet figured out a
lower bound, which makes the testing cycle a little longer for me.

The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
the current 3.2 kernel that Canonical distributes.  Recent Fedora kernels
(3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
case exhaustively, and I didn't keep very good notes on the tests I
have done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04
and the associated 3.13 kernel.  I had previously reproduced this with 12.04
running a raring-backport 3.11 kernel as well, but I (seemingly erroneously)
assumed it was caused by a qemu userspace discrepancy.

** Affects: qemu (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to qemu in Ubuntu.
https://bugs.launchpad.net/bugs/1398718

Title:
  Live migration locks up Linux 3.2-based guests

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1398718/+subscriptions
