https://bugs.launchpad.net/qemu/+bug/1831225
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/177
Title:
guest migration 100% cpu freeze bug
Status in QEMU:
Fix Released
Bug description:
As per the last two comments, this fix is already in 2.12.
** Changed in: qemu
Status: New => Fix Released
Hi David,
I can confirm that the specific patch solves our migration freezes. We
have not seen any freezes after applying this patch to 2.11.2.
We can close this issue as 'fix released'.
Frank: If you're happy that
http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00820.html
fixes it, then we should close this as 'fix released', since this was commit
c2b01cfec1f, which went into v2.12.0-rc2; although we could then ask for it to
go into qemu-stable.
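For anyone checking which release carries a given commit, `git tag --contains` answers it directly. The sketch below demonstrates the mechanism on a throwaway repository; in a real QEMU checkout you would point it at c2b01cfec1f instead (the tag names here just mirror the ones mentioned above):

```shell
#!/bin/sh
# Demonstrate `git tag --contains` on a scratch repository. In a real QEMU
# checkout the equivalent check is: git tag --contains c2b01cfec1f
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=a@example.com -c user.name=tester \
    commit -q --allow-empty -m "the fix"
git tag v2.12.0-rc2
git -c user.email=a@example.com -c user.name=tester \
    commit -q --allow-empty -m "later work"
git tag v2.12.0
# Every tag whose history contains the first commit is listed, i.e. both:
git tag --contains "$(git rev-list --max-parents=0 HEAD)"
```

If the command prints nothing, the commit is in no tagged release yet and a stable backport request is the way to go.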
I may have found the issue and if it is the case it should be fixed after
applying: http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00820.html
Is there a reason why this patch is not backported to 2.11.2?
The theory is that the VM is not actually "frozen", but catching up in time, as
We finally managed to reproduce this issue in our test environment. 2
out of 3 VMs froze within 12 hours of constant migrations.
All migrations took place between Skylake Gold => non-Skylake Gold and
non-Skylake Gold => Skylake Gold. Test environment hypervisors are
running Debian 9, Qemu 2.11 and
As I only see the freezes when Skylake Gold is involved, I started to
look at more differences between Skylake and non-Skylake servers. I'll
paste the information I found so far below:
Intel(R) Xeon(R) Gold 6126 CPU
/sys/module/kvm_intel/parameters/enable_shadow_vmcs: Y
/sys/module/kvm_intel/param
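To compare the rest of the kvm_intel parameters between the Skylake and non-Skylake hosts, something like the sketch below dumps them all so the two machines can be diffed; the helper name and the directory override are just for illustration:

```shell
#!/bin/sh
# Dump every kvm_intel module parameter (e.g. enable_shadow_vmcs) so the
# output from two hypervisors can be compared with a plain diff.
dump_kvm_params() {
    dir=${1:-/sys/module/kvm_intel/parameters}
    if [ -d "$dir" ]; then
        for f in "$dir"/*; do
            # Print "name: value" for each parameter file.
            printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
        done
    else
        echo "kvm_intel not loaded"
    fi
}
dump_kvm_params
```

Running it on both hosts and diffing the output would surface any parameter (shadow VMCS, nested, PLE settings, ...) that differs between the CPU generations.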
I upgraded libvirt to version 4.4.0 and it looks like this behavior is
still there. It still changes the cpu options after migration. What I do
notice is that a "virsh dumpxml" shows me the following after starting a
new VM:
Westmere
I get this "virsh dumpx
Note that with the -cpu there are two issues:
a) Odd combinations of flags
b) The fact the -cpu options are different between source and dest
(b) is what's more likely to be a problem - the source/destination flags really
should be the same.
If they're different between source/dest on 2.11<=>2
It seems like all failing VMs are older, running kernel 3.*. I have not
seen this with more modern guests, but maybe that's just luck.
A customer with a freezing VM told me he's running tomcat/java
applications. I'm going to try and reproduce it with older guest
operating systems in combination wi
Right; even if you're getting the problem across multiple OS versions,
just pick one that fails; the most convenient/newest one to debug. Hmm
are all the ones that are failing old - that's an old Debian, an old
Ubuntu and an old CentOS? Do any more modern guests fail?
If the -cpu is different on
It's not a specific guest OS version that suffers from this bug, we
already had multiple incidents on Debian 7, Ubuntu 14.04 and CentOS 6. I
will try to reproduce it again.
The command-line output from the destination host does look different;
could it be possible that this is caused by a libvi
This looks suspicious:
$GUEST.log source: -cpu Westmere
$GUEST.log destination: -cpu
Westmere,vme=on,pclmuldq=on,x2apic=on,hypervisor=on,arat=on
libvirt is supposed to use the same device configuration on the source
and destination. In the best case, the additional options are useless
becaus
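The drift is easy to spot mechanically. A small sketch using the two -cpu strings quoted above (flag names taken verbatim from the $GUEST.log excerpts) lists what the destination added on top of the source model:

```shell
#!/bin/sh
# Compare the source and destination -cpu strings from the thread and list
# every flag that appears only on the destination side.
src='Westmere'
dst='Westmere,vme=on,pclmuldq=on,x2apic=on,hypervisor=on,arat=on'
drift=$(echo "$dst" | tr ',' '\n' | while read -r flag; do
    case ",$src," in
        *",$flag,"*) ;;                  # present on both sides, fine
        *) echo "dest-only: $flag" ;;    # drift libvirt should not introduce
    esac
done)
echo "$drift"
```

On these inputs it reports vme, pclmuldq, x2apic, hypervisor and arat as destination-only, exactly the mismatch being discussed.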
ok, it's always a pain when you can't reproduce it on demand, but it does
happen.
I suggest you start by figuring out which guest OS versions are hitting it; and
then just create a couple of VMs with identical configs and set them constantly
bouncing between a pair of boxes - a bit of scriptin
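A bounce loop along those lines might look like the sketch below; the VM name, host names, round count and 30-second settle are all placeholders, and the DRY_RUN switch only prints what would run:

```shell
#!/bin/sh
# Sketch of a "bounce a VM between two boxes" loop. The VM name, hosts and
# qemu+ssh URIs are placeholders, not taken from the thread.
bounce() {
    vm=$1 a=$2 b=$3 rounds=$4
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Alternate direction every round.
        if [ $((i % 2)) -eq 0 ]; then src=$a; dst=$b; else src=$b; dst=$a; fi
        cmd="virsh migrate --live --persistent $vm qemu+ssh://$dst/system"
        if [ -n "$DRY_RUN" ]; then
            echo "[$i] on $src: $cmd"
        else
            ssh "$src" "$cmd" || { echo "migration $i failed" >&2; return 1; }
            sleep 30    # let the guest settle before bouncing it back
        fi
        i=$((i + 1))
    done
}

# Dry run so the sketch is safe to execute as-is:
DRY_RUN=1
bounce vps25 hostA hostB 4
```

Left running overnight (without DRY_RUN) against a pair of Skylake/non-Skylake hosts, this is the kind of harness that reproduced 2 out of 3 freezes within 12 hours.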
We have done over 30.000 migrations (of different VMs) between non-
Skylake CPUs from qemu 2.6 to 2.11 and have never encountered the "100%
freeze bug". That's why I'm sure it's related to Skylake (Intel(R)
Xeon(R) Gold 6126) CPU.
I'm testing migrations with my own VM between Skylake and non-Skyla
But you've not seen it in any migration between non-skylake CPUs?
Now that you've figured out it's skylake related, can you more easily repeat it?
e.g. what happens if you set your own VM up and make it migrate back and forth
between two
hosts constantly; can you get that to fail?
I have seen it with migrations between Skylake servers, but also in both
directions between Skylake and non-Skylake servers.
That 'PAUSE' thing is just a performance issue (in weird cases); so
whatever the problem is it's unlikely related to that - there are loads
of other changes in Skylake.
So, lets narrow it down - have you seen it in migrations *between* a
pair of skylakes, or just when it's migrating to/from somet
After doing another 10.000 migrations on pre-Skylake hardware it looks like the
"100% freeze bug" is not related to qemu 2.6 => 2.11 migrations, but only to
Skylake hardware.
We only see this bug with migrations to/from the new "Intel(R) Xeon(R) Gold
6126" CPUs.
I stumbled across an article abo
I'd like to add that I only saw this when the source we migrate from is
running on a relatively new CPU: Intel(R) Xeon(R) Gold 6126 CPU @
2.60GHz.
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
stepping : 4
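Those /proc/cpuinfo fields are enough to tell the two host classes apart in a script. A sketch (the helper name is made up; family 6 / model 85 is the Skylake-SP identification of the Xeon Gold 6126 above):

```shell
#!/bin/sh
# Classify a host from /proc/cpuinfo-style fields. Family 6, model 85 is
# the Skylake-SP line (Xeon Gold 6126) this thread singles out.
cpu_class() {
    file=${1:-/proc/cpuinfo}
    family=$(awk -F': *' '/^cpu family/ {print $2; exit}' "$file")
    # Match the bare "model" line, not "model name".
    model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' "$file")
    if [ "$family" = 6 ] && [ "$model" = 85 ]; then
        echo "Skylake-SP class host"
    else
        echo "non-Skylake host (family $family, model $model)"
    fi
}
cpu_class
```

Run on each hypervisor, this sorts the fleet into the "involves Skylake Gold" and "does not" buckets the migration tests distinguish between.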
guest xml definition (values as flattened by the archive; field labels follow
the standard libvirt domain-XML order):
  name: vps25
  uuid: 0cf4666d-6855-b3a8-12da-2967563f
  memory: 8388608
  currentMemory: 8388608
  vcpu: 4
  resource partition: /machine
  os type: hvm
  cpu model: Westmere
  on_poweroff: destroy
  on_reboot: restart
  on_crash: restart
  emulator: /usr/bin/kvm
  734003200
  57
hangs like this after migration are a pain to debug; especially with
that really rare recurrence rate.
The fact the RIP is changing and is moving in and out of the kernel suggests
something is happening; so it might be that we've corrupted some memory, or got
a device in a mess where the device
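One cheap way to confirm the RIP really is moving is to sample it through the monitor. A sketch, assuming a libvirt-managed guest: the VM name is a placeholder, and `info registers` is the standard HMP command reached via `virsh qemu-monitor-command --hmp`:

```shell
#!/bin/sh
# Sample the guest's instruction pointer a few times; if RIP keeps changing,
# the "frozen" guest is still executing something rather than truly halted.
sample_rip() {
    vm=$1
    n=${2:-5}
    i=0
    while [ "$i" -lt "$n" ]; do
        virsh qemu-monitor-command "$vm" --hmp 'info registers' |
            grep -o 'RIP=[0-9a-f]*'
        sleep 1
        i=$((i + 1))
    done
}
# Usage on a real host: sample_rip vps25 10
```

Identical RIP values across samples would point at a genuine halt; values wandering in and out of kernel addresses match the behaviour described above.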
I don't have any suggestions wrt the actual bug cause, but just want to
suggest adding the XML config and corresponding CLI args used on both
the source and dest hosts (see /var/log/libvirt/qemu/$GUEST.log) to this
bug, for one of the VMs that sees the 100% CPU hang.