[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2019-05-31 Thread Dion Bosschieter
https://bugs.launchpad.net/qemu/+bug/1831225 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/177 Title: guest migration 100% cpu freeze bug Status in QEMU: Fix Released Bug description: # I

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-08-06 Thread Dr. David Alan Gilbert
as per the last two comments, this fix is already in 2.12 ** Changed in: qemu Status: New => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/177 Title: guest migration 100% c

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-08-06 Thread Frank Schreuder
Hi David, I can confirm that the specific patch solves our migration freezes. We have not seen any freezes after applying this patch to 2.11.2. We can close this issue as 'fix released'. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEM

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-08-06 Thread Dr. David Alan Gilbert
Frank: If you're happy that http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00820.html then we should close this as 'fix released' since this was commit c2b01cfec1f which went in v2.12.0-rc2; although we could then ask for qemu-stable. -- You received this bug notification because yo

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-07-18 Thread Frank Schreuder
I may have found the issue and if it is the case it should be fixed after applying: http://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00820.html Is there a reason why this patch is not backported to 2.11.2? The theory is that the VM is not actually "frozen", but catching up in time, as

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-22 Thread Frank Schreuder
We finally managed to reproduce this issue in our test environment. 2 out of 3 VMs froze within 12 hours of constant migrations. All migrations took place between Skylake Gold => non-Skylake Gold and non-Skylake Gold => Skylake Gold. Test environment hypervisors are running Debian 9, Qemu 2.11 and

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Frank Schreuder
As I only see the freezes when Skylake Gold is involved, I started to look at more differences between Skylake and non-Skylake servers. I'll paste the information I found so far below: Intel(R) Xeon(R) Gold 6126 CPU /sys/module/kvm_intel/parameters/enable_shadow_vmcs: Y /sys/module/kvm_intel/param

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Frank Schreuder
I upgraded libvirt to version 4.4.0 and it looks like this behavior is still there. It still changes the cpu options after migration. What I do notice is that a "virsh dumpxml" shows me the following after starting a new VM: Westmere I get this "virsh dumpx

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Dr. David Alan Gilbert
Note that with the -cpu there are two issues: a) Odd combinations of flags b) The fact the -cpu options are different between source and dest (b) is what's more likely to be a problem - the source/destination flags really should be the same. If they're differnt between source/dest on 2.11<=>2

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Frank Schreuder
It seems like all failing VMs are older, running kernel 3.*. I have not seen this with more modern guests, but maybe that's just luck. A customer with a freezing VM told me he's running tomcat/java applications. I'm going to try and reproduce it with older guest operating systems in combination wi

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Dr. David Alan Gilbert
Right; even if you're getting the problem across multiple OS versions, just pick one that fails; the most convenient one to debug/newest. Hmm are all the ones that are failing old - that's an old Debian, an old Ubuntu and an old CentOS? Do any more modern guests fail? If the -cpu is different on

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-20 Thread Frank Schreuder
It's not a specific guest OS version that suffers from this bug, we already had multiple incidents on Debian 7, Ubuntu 14.04 and CentOS 6. I will try to reproduce it again. The command line output from the destination host looks different indeed, could it be possible that this is caused by a libvi

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-19 Thread Eduardo Habkost
This looks suspicious: $GUEST.log source: -cpu Westmere $GUEST.log destination: -cpu Westmere,vme=on,pclmuldq=on,x2apic=on,hypervisor=on,arat=on libvirt is supposed to use the same device configuration on the source and destination. In the best case, the additional options are useless becaus

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-19 Thread Dr. David Alan Gilbert
ok, it's always a pain when you can't reproduce it on demand, but it does happen. I suggest you start by figuring out which guest OS versions are hitting it; and then just create a couple of VMs with identical configs and set them constantly bouncing between a pair of boxes - a bit of scriptin

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-19 Thread Frank Schreuder
We have done over 30.000 migrations (of different VMs) between non- Skylake CPUs from qemu 2.6 to 2.11 and have never encountered the "100% freeze bug". That's why I'm sure it's related to Skylake (Intel(R) Xeon(R) Gold 6126) CPU. I'm testing migrations with my own VM between Skylake and non-Skyla

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-19 Thread Dr. David Alan Gilbert
But you've not seen it in any migration between non-skylake CPUs? Now that you've figured out it's skylake related, can you more easily repeat it? e.g. what happens if you set your own VM up and make it migrate back and forth between two hosts constantly; can you get that to fail? -- You receiv

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-18 Thread Frank Schreuder
I have seen it with migrations between Skylake servers, but also to and from Skylake to non-Skylake servers. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/177 Title: guest migration 100% cpu fr

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-18 Thread Dr. David Alan Gilbert
That 'PAUSE' thing is just a performance issue (in weird cases); so whatever the problem is it's unlikely related to that - there are loads of other changes in Skylake. So, lets narrow it down - have you seen it in migrations *between* a pair of skylakes, or just when it's migrating too/from somet

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-18 Thread Frank Schreuder
After doing another 10.000 migrations on pre-Skylake hardware it looks like the "100% freeze bug" is not related to qemu 2.6 => 2.11 migrations, but only to Skylake hardware. We only see this bug with migrations to/from the new "Intel(R) Xeon(R) Gold 6126" CPUs. I stumbled across an article abo

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-11 Thread Dion Bosschieter
I do like to add that I only saw this when the source we migrate from is running on a relatively new CPU: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz. vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz stepping: 4

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-07 Thread Dion Bosschieter
guest xml definition: vps25 0cf4666d-6855-b3a8-12da-2967563f 8388608 8388608 4 /machine hvm Westmere destroy restart restart /usr/bin/kvm 734003200 57

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-07 Thread Dr. David Alan Gilbert
hangs like this after migration are a pain to debug; especially with that really rare recurrence rate. The fact the RIP is changing and is moving in and out of the kernel suggests something is happening; so it might be that we've corrupted some memory, or got a device in a mess where the device

[Qemu-devel] [Bug 1775555] Re: guest migration 100% cpu freeze bug

2018-06-07 Thread Daniel Berrange
I don't have any suggestions wrt the actual bug cause, but just want to suggest adding the XML config and corresponding CLI args used on both the source and dest hosts (see /var/log/libvirt/qemu/$GUEST.log) to this bug, for one of the VMs that sees the 100% CPU hang. -- You received this bug noti