Sounds like a kernel bug.

From: Indra Pramana <in...@sg.or.id>
Reply-To: "users@cloudstack.apache.org" <users@cloudstack.apache.org>
Date: Thursday, May 22, 2014 at 2:59 AM
To: "users@cloudstack.apache.org" <users@cloudstack.apache.org>
Cc: "ehle...@gmail.com" <ehle...@gmail.com>
Subject: Re: Major stability problems lately

Hi Timothy and all,

Apologies for replying to an old thread. I noticed that nobody replied to you on this thread. May I know whether you managed to find the root cause of the problem, and the solution?

I seem to have a similar issue with one of our hypervisors. We are using CloudStack 4.2.0 and KVM. The error is similar and starts with a "BUG: soft lockup - CPU stuck" message. Nothing can be found in the cloudstack-agent.log file.

http://pastebin.com/4GW9yPsm

Looking forward to your reply, thank you.

Cheers.

On Thu, Nov 14, 2013 at 10:34 AM, Timothy Ehlers <ehle...@gmail.com> wrote:

We are experiencing massive instability and cannot determine what's causing it. Every so often jsvc triggers the following in our system logs:

Nov 13 18:59:31 cpegh0009 kernel: [15188599.258955] BUG: soft lockup - CPU#24 stuck for 22s! [jsvc:60385]
Nov 13 18:59:31 cpegh0009 kernel: [15188599.266229] Modules linked in: mptctl mptbase vhost_net macvtap macvlan 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables nfsd kvm_amd kvm ghash_clmulni_intel aesni_intel cryptd aes_x86_64 nfs microcode psmouse radeon serio_raw ttm drm_kms_helper amd64_edac_mod joydev drm edac_core fam15h_power k10temp edac_mce_amd i2c_algo_bit sp5100_tco i2c_piix4 hpilo hpwdt lockd bridge stp mac_hid llc fscache auth_rpcgss acpi_power_meter nfs_acl bonding sunrpc lp parport hid_generic usbhid hid pata_atiixp ixgbe dca hpsa mdio
Nov 13 18:59:31 cpegh0009 kernel: [15188599.266322] CPU 24
Nov 13 18:59:31 cpegh0009 kernel: [15188599.266323] Modules linked in: mptctl mptbase vhost_net macvtap macvlan 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables nfsd kvm_amd kvm ghash_clmulni_intel aesni_intel cryptd aes_x86_64 nfs microcode psmouse radeon serio_raw ttm drm_kms_helper amd64_edac_mod joydev drm edac_core fam15h_power k10temp edac_mce_amd i2c_algo_bit sp5100_tco i2c_piix4 hpilo hpwdt lockd bridge stp mac_hid llc fscache auth_rpcgss acpi_power_meter nfs_acl bonding sunrpc lp parport hid_generic usbhid hid pata_atiixp ixgbe dca hpsa mdio
Nov 13 18:59:31 cpegh0009 kernel: [15188599.266378]
Nov 13 18:59:31 cpegh0009 kernel: [15188599.266382] Pid: 60385, comm: jsvc Not tainted 3.5.0-23-generic #35~precise1-Ubuntu HP ProLiant DL585

I am not sure whether this is the cause of the high load or an after-effect.

03:25:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
06:45:01 PM        31       982     36.95     39.33     41.50         0
06:55:01 PM        17      1000     28.53     37.28     40.06         0
07:05:01 PM        60       954    114.52     91.36     63.66         0
07:15:01 PM        48       961     29.55     53.94     60.76         0
07:25:01 PM        12       895     13.23     24.64     42.47         0
07:35:01 PM         5       772      8.02     13.32     28.31         0

We run Ubuntu 12.04.3 LTS on HP DL585s with 64 AMD cores and 0.5 TB of RAM. Each host will run approximately 40~50 VMs (CentOS 5 guests).

Agent version is: Version: 1:4.0.2

Any ideas? Perhaps gathering CPU usage data on the jsvc PID?

--
Tim Ehlers
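
For the idea of gathering CPU usage data on the jsvc PID, a minimal Python sketch follows. It assumes a Linux host with /proc mounted and reads the utime and stime counters (fields 14 and 15 of /proc/<pid>/stat) twice, one interval apart. The function names and the 5-second interval are illustrative choices, not anything taken from CloudStack or its agent.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]), int(rest[12])


def cpu_percent(before, after, interval_s, ticks_per_s):
    """CPU use between two samples, as a percentage of one core."""
    used_ticks = (after[0] + after[1]) - (before[0] + before[1])
    return 100.0 * used_ticks / (ticks_per_s * interval_s)


def sample_pid(pid, interval_s=5.0):
    """Sample a live process twice and return its CPU percentage."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = "/proc/{}/stat".format(pid)
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling sample_pid(60385) in a loop, with a timestamp logged next to each value, would give a series to line up against the soft lockup messages and the load averages. A multi-threaded process such as jsvc can report more than 100, since the figure is relative to a single core.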