** Changed in: cloud-archive/icehouse
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1460164

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive icehouse series:
  Fix Released
Status in Ubuntu Cloud Archive juno series:
  New
Status in Ubuntu Cloud Archive kilo series:
  In Progress
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Trusty:
  Fix Released
Status in neutron source package in Wily:
  Fix Released
Status in neutron source package in Xenial:
  Fix Released

Bug description:
  [Impact]
  Restarts of openvswitch (typically on upgrade) result in loss of tunnel
  connectivity when the l2population driver is in use. This results in
  loss of access to all instances on the affected compute hosts.

  [Test Case]
  Deploy cloud with ml2/ovs/l2population enabled
  boot instances
  restart ovs; instance connectivity will be lost until the
  neutron-openvswitch-agent is restarted on the compute hosts.

  [Regression Potential]
  Minimal - in multiple stable branches upstream.

  [Original Bug Report]
  On 2015-05-28, our Landscape auto-upgraded packages on two of our
  OpenStack clouds. On both clouds, but only on some compute nodes, the
  upgrade of openvswitch-switch and the corresponding downtime of
  ovs-vswitchd appear to have triggered some sort of race condition
  within neutron-plugin-openvswitch-agent, leaving it in a broken state;
  any new instances come up with non-functional network but pre-existing
  instances appear unaffected. Restarting n-p-ovs-agent on the affected
  compute nodes is sufficient to work around the problem.

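  As a rough illustration of how an operator might spot compute nodes that
  need the workaround, the Python sketch below scans agent log lines for
  the ovsdb monitor error quoted in this report and checks whether it was
  logged after the agent last started. The function names and the idea of
  comparing against the agent's start time are assumptions made for
  illustration; they are not part of neutron, and the heuristic is not a
  confirmed diagnostic for this bug.

```python
import re
from datetime import datetime

# Pattern modelled on the single error line quoted in this report from
# /var/log/neutron/openvswitch-agent.log; other log formats are not handled.
OVSDB_ERROR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \d+ ERROR "
    r"neutron\.agent\.linux\.ovsdb_monitor .*Error received from ovsdb monitor"
)


def last_ovsdb_error(log_lines):
    """Return the timestamp of the most recent ovsdb monitor error, or None."""
    latest = None
    for line in log_lines:
        match = OVSDB_ERROR.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest


def agent_may_be_stale(log_lines, agent_started_at):
    """Hypothetical heuristic: True if an ovsdb monitor error was logged
    after the agent's last start, i.e. the agent has not been restarted
    since Open vSwitch went away underneath it."""
    error_time = last_ovsdb_error(log_lines)
    return error_time is not None and error_time > agent_started_at
```

  How the agent's start time is obtained (process start time, service
  manager, or the first line of the log) is left open here; a node flagged
  by this check would be a candidate for restarting the agent as described
  above.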
  The packages Landscape upgraded (from /var/log/apt/history.log):

  Start-Date: 2015-05-28 14:23:07
  Upgrade:
    nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2),
    libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12),
    isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2),
    python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1),
    python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2),
    grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2),
    nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1),
    openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
  End-Date: 2015-05-28 14:24:47

  From /var/log/neutron/openvswitch-agent.log:

  2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)

  Looking at a stuck instance, all the right tunnels and bridges and
  whatnot appear to be there:

  root@vector:~# ip l l | grep c-3b
  460002: qbr7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
  460003: qvo7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
  460004: qvb7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
  460005: tap7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
  root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
  qvo7ed8b59c-3b
  root@vector:~#

  But I can't ping the unit from within the qrouter-${id} namespace on
  the neutron gateway. If I tcpdump the {q,t}*c-3b interfaces, I don't
  see any traffic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1460164/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp