** Description changed:

+ ========================
+ Impact: libvirt hangs
+ Fix: mutex libvirt's access to cgmanager
+ Test case: see script in #14
+ Regression potential: if this is done wrongly, it could cause a deadlock. No non-cgmanager codepaths are affected.
+ ========================
+ 
  reference bug 1367702. As per request, opening new ticket w/ instructions to reproduce.

  This is on 14.10 server, libvirt-bin 1.2.8-0ubuntu11.1

  As per 1367702, this is not using LXC (which you used in your attempt). This is running bare-metal, no container, no hypervisor.
  Each VM below is started from OpenStack nova-compute (this node is a compute-only node).

  don@nubo-5:~$ sudo service cgmanager restart
  cgmanager stop/waiting
  cgmanager start/running, process 22588
  don@nubo-5:~$ virsh list
   Id    Name                           State
  ----------------------------------------------------
   2     instance-000015de              running
   3     instance-000015df              running
   4     instance-000015e0              running
   5     instance-000015e1              running
   6     instance-000015e2              running
   7     instance-000015e3              running
   8     instance-000015e4              running
   9     instance-000015e5              running
   10    instance-000015e6              running
   11    instance-000015e7              running
   12    instance-000015e8              running
   13    instance-000015e9              running
   14    instance-000015ea              running
   15    instance-000015eb              running
   16    instance-000015ec              running
   17    instance-000015ed              running
   18    instance-000015ee              running
   19    instance-000015ef              running
   20    instance-000015f0              running
   21    instance-000015f1              running
   22    instance-000015f2              running
   23    instance-000015f3              running
   24    instance-000015f4              running
   25    instance-000015f5              running
   26    instance-000015f6              running
   27    instance-000015f7              running
   28    instance-000015f8              running
   29    instance-000015f9              running
   30    instance-000015fa              running
   31    instance-000015fb              running
   32    instance-000015fc              running
   33    instance-000015fd              running
   34    instance-000015fe              running
   35    instance-000015ff              running
   36    instance-00001600              running

  don@nubo-5:~$ sudo service libvirt-bin restart
  libvirt-bin stop/waiting
  libvirt-bin start/running, process 22751
  don@nubo-5:~$ virsh list
  error: failed to connect to the hypervisor
  error: no valid connection
  error: Cannot recv data: Connection reset by peer

  If I then run libvirtd manually:

  root@nubo-5:~# libvirtd -v
  2014-11-27 22:38:18.066+0000: 26422: info : libvirt version: 1.2.8, package: 1.2.8-0ubuntu11.1
  2014-11-27 22:38:18.066+0000: 26422: info : virNetlinkEventServiceStart:521 : starting netlink event service with protocol 0
  2014-11-27 22:38:18.066+0000: 26422: info : virNetlinkEventServiceStart:521 : starting netlink event service with protocol 15
  2014-11-27 22:38:18.073+0000: 26433: info : dnsmasqCapsSetFromBuffer:685 : dnsmasq version is 2.71, --bind-dynamic is present, SO_BINDTODEVICE is in use
  2014-11-27 22:38:18.074+0000: 26433: info : networkReloadFirewallRules:1778 : Reloading iptables rules
  2014-11-27 22:38:18.074+0000: 26433: info : networkRefreshDaemons:1750 : Refreshing network daemons
  2014-11-27 22:38:18.198+0000: 26433: info : virFirewallApplyGroup:844 : Starting transaction for 0x7f15e40e7110 flags=0
  2014-11-27 22:38:18.198+0000: 26433: info : virFirewallApplyRule:785 : Applying rule '/sbin/iptables --version'
  2014-11-27 22:38:18.207+0000: 26433: info : libxlDriverShouldLoad:241 : Disabling driver as /proc/xen/capabilities does not exist
  2014-11-27 22:38:18.250+0000: 26433: info : virDomainObjListLoadAllConfigs:18944 : Scanning for configs in /var/run/libvirt/qemu
  2014-11-27 22:38:18.256+0000: 26433: info : virDomainObjListLoadAllConfigs:18968 : Loading config file 'instance-000015fd.xml'
  ...
  2014-11-27 22:38:18.385+0000: 26441: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
  process 26422: The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.
  Most likely, the application was supposed to call dbus_connection_close(), since this is a private connection.
  2014-11-27 22:38:18.387+0000: 26439: warning : cg_detect_placement:561 : Failed to get cgroup path for cpu
  2014-11-27 22:38:18.392+0000: 26445: error : cgm_dbus_connect:76 : cgmanager: Error pinging manager: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

  cgm ping returns true, so cgmanager is presumably ok.

  sometimes when doing the libvirtd -v manually it does a segfault instead of an assert.
  sometimes the assertion is different:

  (null):cgmanager-client.c:1015: Assertion failed in cgmanager_get_pid_cgroup_sync: proxy != NULL
  Segmentation fault (core dumped)

  or

  (null):alloc.c:315: Assertion failed in nih_free: ptr != NULL
  (null):alloc.c:315: Assertion failed in nih_free: ptr != NULL
  Segmentation fault (core dumped)

  ---
  ApportVersion: 2.14.7-0ubuntu8
  Architecture: amd64
  DistroRelease: Ubuntu 14.10
  Package: libvirt (not installed)
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.16.0-25-generic root=UUID=a58668fa-f6db-4941-84eb-c89e102971e1 ro splash quiet vt.handoff=7
  ProcEnviron:
   LANGUAGE=en_CA:en
   TERM=screen
   PATH=(custom, no user)
   LANG=en_CA.UTF-8
   SHELL=/bin/bash
  ProcVersionSignature: Ubuntu 3.16.0-25.33-generic 3.16.7
  Tags: utopic utopic
  Uname: Linux 3.16.0-25-generic x86_64
  UnreportableReason: The report belongs to a package that is not installed.
  UpgradeStatus: Upgraded to utopic on 2014-10-19 (39 days ago)
  UserGroups:

  _MarkForUpload: True
  modified.conffile..etc.apparmor.d.abstractions.libvirt.qemu: [modified]
  modified.conffile..etc.apparmor.d.usr.sbin.libvirtd: [modified]
  mtime.conffile..etc.apparmor.d.abstractions.libvirt.qemu: 2014-10-23T03:29:38.231519
  mtime.conffile..etc.apparmor.d.usr.sbin.libvirtd: 2014-10-23T03:18:18.057906
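The fix named in the summary ("mutex libvirt's access to cgmanager") can be illustrated with a minimal Python sketch. This is not libvirt's actual patch, which would be C. `FakeCgmanagerClient`, `SerializedClient`, and `run_workers` are hypothetical names invented for illustration, and the sketch assumes only that the underlying client connection is not safe to use from several threads at once, which the cgm_dbus_connect errors from different thread IDs in the log may suggest.

```python
import threading
import time


class FakeCgmanagerClient:
    """Hypothetical stand-in for a connection that is NOT thread-safe."""

    def __init__(self):
        self.in_use = False   # True while a call is in flight
        self.overlaps = 0     # calls that started while another was in flight
        self.calls = 0        # total completed calls

    def get_pid_cgroup(self, controller, pid):
        if self.in_use:
            self.overlaps += 1
        self.in_use = True
        time.sleep(0)         # yield, so unguarded callers would interleave
        self.calls += 1
        self.in_use = False
        return f"/{controller}/{pid}"


class SerializedClient:
    """Wraps a client so only one thread talks to it at a time."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def get_pid_cgroup(self, controller, pid):
        # 'with' releases the lock even if the call raises, so an error
        # path cannot leave the mutex held.
        with self._lock:
            return self._client.get_pid_cgroup(controller, pid)


def run_workers(client, n_threads=8, n_calls=200):
    """Hammer the client from several threads, as a daemon at startup might."""
    def worker(tid):
        for i in range(n_calls):
            client.get_pid_cgroup("cpu", tid * n_calls + i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


raw = FakeCgmanagerClient()
run_workers(SerializedClient(raw))
print(raw.calls, raw.overlaps)  # prints: 1600 0
```

Note that `threading.Lock` is not reentrant: a wrapper method that called another wrapper method while holding the lock would block forever. That is one way the deadlock mentioned under "Regression potential" could arise if the locking were done wrongly.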
https://bugs.launchpad.net/bugs/1397130

Title:
  libvirt-bin crashes / refuses to restart if cgmanager is restarted