Hi Guys

I have a new KVM server running software RAID (mdadm). The VM disks are held on a RAID 5 array of 5 disks (the system itself is on SSDs in a mirror).

So far I have about 10 VMs set up, but none of them can function. Once a few are up and we start to deploy/resubmit the VMs which have never booted properly, disk I/O stops, the scp process hangs, and everything comes to a halt. The following error then shows up in dmesg:

[ 1201.890311] INFO: task kworker/1:1:6185 blocked for more than 120 seconds.
[ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1201.890569] kworker/1:1 D ffff88203fc13740 0 6185 2 0x00000000
[ 1201.890573] ffff881ffe510140 0000000000000046 0000000000000000 ffff881039023590
[ 1201.890580] 0000000000013740 ffff8820393fffd8 ffff8820393fffd8 ffff881ffe510140
[ 1201.890586] 0000000000000000 0000000100000000 0000000000000001 7fffffffffffffff
[ 1201.890593] Call Trace:
[ 1201.890597]  [<ffffffff81349d2e>] ? schedule_timeout+0x2c/0xdb
[ 1201.890605]  [<ffffffff810ebdbf>] ? kmem_cache_alloc+0x86/0xea
[ 1201.890610]  [<ffffffff8134a58a>] ? __down_common+0x9b/0xee
[ 1201.890631]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
[ 1201.890635]  [<ffffffff81063111>] ? down+0x25/0x34
[ 1201.890648]  [<ffffffffa041566f>] ? xfs_buf_lock+0x65/0x9d [xfs]
[ 1201.890665]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
[ 1201.890685]  [<ffffffffa045b957>] ? xfs_trans_getsb+0x64/0xb4 [xfs]
[ 1201.890704]  [<ffffffffa0452a40>] ? xfs_mod_sb+0x21/0x77 [xfs]
[ 1201.890720]  [<ffffffffa0422736>] ? xfs_reclaim_inode+0x22d/0x22d [xfs]
[ 1201.890734]  [<ffffffffa041a43e>] ? xfs_fs_log_dummy+0x61/0x75 [xfs]
[ 1201.890754]  [<ffffffffa04573a7>] ? xfs_log_need_covered+0x4d/0x8d [xfs]
[ 1201.890769]  [<ffffffffa0422770>] ? xfs_sync_worker+0x3a/0x6a [xfs]
[ 1201.890773]  [<ffffffff8105aeaa>] ? process_one_work+0x163/0x284
[ 1201.890778]  [<ffffffff8105be72>] ? worker_thread+0xc2/0x145
[ 1201.890782]  [<ffffffff8105bdb0>] ? manage_workers.isra.23+0x15b/0x15b
[ 1201.890787]  [<ffffffff8105efad>] ? kthread+0x76/0x7e
[ 1201.890794]  [<ffffffff81351cf4>] ? kernel_thread_helper+0x4/0x10
[ 1201.890799]  [<ffffffff8105ef37>] ? kthread_worker_fn+0x139/0x139
[ 1201.890804]  [<ffffffff81351cf0>] ? gs_change+0x13/0x13

and lots of them. Along with these stack traces the CPU load just keeps increasing, and I have to power cycle the system to get it back. I have added the following sysctls:

fs.file-max = 262144
kernel.pid_max = 262144
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
net.core.rmem_max = 25165824
net.core.rmem_default = 25165824
net.core.wmem_max = 25165824
net.core.wmem_default = 131072
net.core.netdev_max_backlog = 8192
net.ipv4.tcp_window_scaling = 1
net.core.optmem_max = 25165824
net.core.somaxconn = 65536
net.ipv4.ip_local_port_range = 1024 65535
kernel.shmmax = 4294967296
vm.max_map_count = 262144

but the important part I found was:
#http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
vm.dirty_ratio=10

which does not seem to help, though.

Now some info on the disk:
#mount
/dev/md2 on /data type xfs (rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)
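
As a sanity check on the stripe geometry in that mount line, here is a rough sketch of the arithmetic. It assumes the sunit/swidth values reported by mount are in 512-byte sectors (which is how XFS documents these mount options); the function name stripe_geometry is just made up for illustration.

```shell
# Convert XFS sunit/swidth mount values (assumed to be 512-byte sectors)
# into KiB and derive how many data disks one full stripe spans.
stripe_geometry() {
    sunit_sectors=$1
    swidth_sectors=$2
    echo "chunk_kib=$(( sunit_sectors * 512 / 1024 ))" \
         "stripe_kib=$(( swidth_sectors * 512 / 1024 ))" \
         "data_disks=$(( swidth_sectors / sunit_sectors ))"
}

# Values taken from the mount output above.
stripe_geometry 1024 4096
```

If that assumption holds, this works out to a 512 KiB chunk and a 2048 KiB full stripe across 4 data disks, which is what I would expect for a 5-disk RAID 5 (4 data + 1 parity), so the filesystem alignment at least looks consistent with the array.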

cat /proc/meminfo
MemTotal:       132259720 kB
MemFree:        122111692 kB
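
To put vm.dirty_ratio=10 in perspective on a box with this much memory, here is a back-of-the-envelope sketch. It is only an estimate: as far as I understand, the kernel applies the percentage to the memory it considers dirtyable (roughly free plus reclaimable), not to MemTotal, so the real limit will be somewhat lower. The helper name dirty_limit_kb is made up for illustration.

```shell
# Rough upper bound on dirty page cache allowed before writers are throttled:
# a percentage of total memory (the kernel really uses "dirtyable" memory,
# so treat this as an approximation only).
dirty_limit_kb() {
    mem_kb=$1
    ratio_percent=$2
    echo $(( mem_kb * ratio_percent / 100 ))
}

# MemTotal from /proc/meminfo above, with vm.dirty_ratio=10.
dirty_limit_kb 132259720 10
```

That is 13225972 kB, so on the order of 12 to 13 GB of dirty data could still pile up before the limit kicks in, even at 10%. The live values can be checked with "sysctl vm.dirty_ratio vm.dirty_background_ratio".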

cat /proc/cpuinfo (32 v cores)
processor    : 31
vendor_id    : GenuineIntel
cpu family    : 6
model        : 45
model name    : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

Some info on the host:
# cat /etc/debian_version
wheezy/sid
# uname -a
Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux

ii kvm 1:1.1.0+dfsg-3 dummy transitional package from kvm to qemu-kvm
ii qemu-kvm 1.1.0+dfsg-3 Full virtualization on x86 hardware
ii libvirt-bin 0.9.12-4 programs for the libvirt library
ii libvirt0 0.9.12-4 library for interfacing with different virtualization systems
ii python-libvirt 0.9.12-4 libvirt Python bindings
ii opennebula 3.4.1-3+b1 controller which executes the OpenNebula cluster services
ii opennebula-common 3.4.1-3 empty package to create OpenNebula users and directories
ii opennebula-sunstone 3.4.1-3 web interface to which executes the OpenNebula cluster services
ii opennebula-tools 3.4.1-3 Command-line tools for OpenNebula Cloud
ii ruby-opennebula 3.4.1-3 Ruby bindings for OpenNebula Cloud API (OCA)

Any ideas on how to get this working? Right now the server is a lemon! :0

--
Jurgen Weber

Systems Engineer
IT Infrastructure Team Leader

THE ICONIC | E jurgen.we...@theiconic.com.au | www.theiconic.com.au

_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
