Thanks for the data so far, the guest does not look very "special" other than the ceph storage and that looks fine at a first glance.
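Just for reference, in case you want to re-capture the per-thread view below narrowed down to only the busy threads - the "CPU X/KVM" thread naming is an assumption about your QEMU/libvirt version, so adjust as needed:

  # list the threads of the QEMU process with their CPU usage;
  # the vCPU threads usually show up with names like "CPU 0/KVM"
  $ ps -L -o tid,comm,pcpu -p <qemu-pid>
  $ top -H -p <qemu-pid>        # alternative: live per-thread view

  # attach to one of the busy vCPU threads and time its syscalls
  $ sudo strace -tt -T -p <tid-of-one-vcpu-thread> -e trace=ioctl,futex,read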
It seems 4 of the 6 guest vCPU threads are what is in this 100% hog; none of the other helper threads seems busy. Of these vCPU threads we see that they are about 50% in the host kernel and 50% in the guest and not much more. I wonder what they are doing ...

We can see in the strace that the vCPUs never leave to userspace (which they'd do for heavy exits). Instead they seem to really just spin between kernel and guest, as seen in the CPU utilization:
  ioctl(34, KVM_RUN, 0 <unfinished ...>

Every now and then we can see most of the other threads show up on futex locks. Sometimes a few ceph/rbd related messages also show up, like:
  read(25, 0x7fe9f8006ec0, 4096) = -1 EAGAIN (Resource temporarily unavailable) <0.00001

Both could be a red herring or not - no message is clear enough to point that out yet.

So for the next step: the guest still seems to do something (even if it might spin in a bad loop) and regularly exits to the host kernel. Let's try to find where that is. Once you have a guest in that situation you might:
1. check which kind of host exits we see
2. check where the guest is at the moment

If the affected guest is the only one on that host, then for #1 you can e.g. run perf kvm stat like:
  $ sudo perf kvm stat --live

But since you know the PID of one of the vCPU threads that we are interested in, this would be better (also add -d 30 for some more reliability in the numbers):
  $ sudo perf kvm stat --live -d 30 --vcpu 0 --pid=<pid-of-one-vcpu-thread>

Let it run for a while and then report what exits you are seeing in your case. An example of an idle guest is below.

Maybe also worth a look are the KVM tracepoints; you can check them (globally) with:
  $ sudo perf stat -e 'kvm:*' sleep 30s

[2] has more general info on perf counters with KVM, e.g. how to get the kallsyms and modules files. For some of these actions (to get more details) you might want to get a dbgsym of the guest kernel; [1] has more about that.

For your #2 you could maybe run the following (a rough sketch for grabbing the guest kallsyms/modules files is at the end of this comment):
  # Record data to file (on Host)
  $ sudo perf kvm --host --guest --guestkallsyms=kallsyms --guestvmlinux=debug-kernel/usr/lib/debug/boot/vmlinux-5.0.0-13-generic --guestmodules=modules record

  # Host info
  $ sudo perf kvm --host report -i perf.data.kvm
  # Guest info
  $ sudo perf kvm --guest report -i perf.data.kvm

In general it would help to get cleaner results by isolating the host that the affected guest runs on, so that it runs only this guest and nothing else - not sure how doable that is in your case though :-/

Let's see what we get from here ...

[1]: https://wiki.ubuntu.com/Kernel/CrashdumpRecipe#Inspecting_the_crash_dump_using_crash
[2]: https://www.linux-kvm.org/page/Perf_events#Recording_events_for_a_guest

Example idle guest exits:

Analyze events for pid(s) 14470, VCPU 0:

             VM-EXIT    Samples  Samples%     Time%    Min Time      Max Time         Avg time

           MSR_WRITE       1585    75.55%     0.00%      0.00us        7.83us      0.85us ( +-  2.09% )
                 HLT        473    22.55%   100.00%      0.00us   100138.85us  58658.26us ( +-  2.88% )
            MSR_READ         30     1.43%     0.00%      0.00us        2.82us      0.78us ( +-  9.24% )
  EXTERNAL_INTERRUPT          7     0.33%     0.00%      0.00us        6.68us      1.61us ( +- 52.60% )
    PREEMPTION_TIMER          2     0.10%     0.00%      0.00us        0.95us      0.92us ( +-  3.65% )
   PENDING_INTERRUPT          1     0.05%     0.00%      0.00us        0.62us      0.62us ( +-  0.00% )

Total Samples:2098, Total events handled time:27746743.01us.
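For the guestkallsyms/guestmodules files that the perf kvm record line above expects, grabbing them from inside the affected guest should roughly look like this (see [2] for the general idea; how you copy the files out of the guest - ssh, scp, shared dir - is up to you):

  # inside the guest, as root (otherwise kptr_restrict may zero the addresses)
  $ cat /proc/kallsyms > kallsyms
  $ cat /proc/modules > modules
  # then copy both files into the directory on the host you run perf kvm from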
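PS: should perf kvm stat --live not work with the perf version on your hosts, a rough fallback using the raw KVM tracepoints would be something like the below - the exact field layout of the kvm:kvm_exit event can differ between kernels, so treat the grep as a sketch:

  # record ~10s worth of exits of one busy vCPU thread
  $ sudo perf record -e kvm:kvm_exit --pid=<pid-of-one-vcpu-thread> -- sleep 10
  # quick histogram of which exit reasons were hit
  $ sudo perf script | grep -o 'reason [A-Z_0-9]*' | sort | uniq -c | sort -rn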