I am wondering if there are any other big OpenNebula clouds out there
using RHEL 6.3 or 6.4, CentOS 6.3 or 6.4, or Scientific Linux 6.3 or 6.4?

We are seeing a fairly nasty performance problem, but only on Intel
"Sandy Bridge" or "Ivy Bridge" based hardware.  If you have N KVM-based
virtual machines running (N >= 4 as far as I can tell) and then generate
a lot of disk and I/O activity on the hypervisor, for example by
migrating several more virtual machines to or from the bare metal, and
at least one of the running virtual machines is doing some I/O too,
there is a failure mode in which sshd processes (from oneadmin
monitoring or otherwise) start hanging and taking 100% of CPU.  Ping
times to the virtual machines vary widely.  In extreme cases a VM can
even go off the network entirely, in such a fashion that ifdown/ifup
doesn't bring it back, and sometimes you can't even kill it with
virsh destroy.  A couple of times we have even managed to crash the
hypervisor irreversibly, so that it had to be power cycled.
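
For anyone who wants to check a hypervisor for the hung-sshd symptom
described above, here is a minimal sketch in Python.  It is only an
illustration, not something we run in production: the function names
and the 90% threshold are made up for this example, and it assumes a
procps-style "ps" that accepts "-eo pid,pcpu,comm".  Note that the pcpu
column from ps is averaged over the lifetime of the process, so a
long-lived sshd that has only just started spinning may not show up yet.

```python
import subprocess


def find_spinning_sshd(ps_output, cpu_threshold=90.0):
    """Return (pid, cpu) pairs for sshd processes at or above cpu_threshold.

    ps_output is the text printed by `ps -eo pid,pcpu,comm`.
    The threshold is an arbitrary value chosen for this sketch.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or truncated line
        pid, cpu, comm = fields
        try:
            cpu_value = float(cpu)
        except ValueError:
            continue  # header line ("PID %CPU COMMAND") or malformed row
        if comm.strip() == "sshd" and cpu_value >= cpu_threshold:
            hits.append((int(pid), cpu_value))
    return hits


def sample_ps():
    """Run ps once and return its output (assumes procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling find_spinning_sshd(sample_ps()) on the hypervisor while a
migration is in progress would list any sshd processes that look stuck.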

If all the surviving virtual machines are shut down, the system then returns to 
normal and all the hung processes exit.

Has anyone else seen problems like this?  If so, please let me know.
There seems to be little if anything out there about this bug, which is
strange since it has been around for a while.

Steve Timm


_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org