** Description changed:

+ [Impact]
+ Restarts of openvswitch (typically on upgrade) result in loss of tunnel 
connectivity when the l2population driver is in use.  This results in loss of 
access to all instances on the affected compute hosts.
+ 
+ [Test Case]
+ Deploy a cloud with ml2/ovs/l2population enabled
+ Boot instances
+ Restart ovs; instance connectivity will be lost until the 
neutron-openvswitch-agent is restarted on the compute hosts.
+ 
+ [Regression Potential]
+ Minimal - in multiple stable branches upstream.
+ 
+ [Original Bug Report]
  On 2015-05-28, our Landscape auto-upgraded packages on two of our
  OpenStack clouds.  On both clouds, but only on some compute nodes, the
  upgrade of openvswitch-switch and corresponding downtime of
  ovs-vswitchd appear to have triggered some sort of race condition
  within neutron-plugin-openvswitch-agent leaving it in a broken state;
  any new instances come up with non-functional network but pre-existing
  instances appear unaffected.  Restarting n-p-ovs-agent on the affected
  compute nodes is sufficient to work around the problem.
  
  The packages Landscape upgraded (from /var/log/apt/history.log):
  
  Start-Date: 2015-05-28  14:23:07
  Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), 
libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), 
nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), 
systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), 
isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), nova-common:amd64 
(2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), python-nova:amd64 (2014.1.4-0ubuntu2, 
2014.1.4-0ubuntu2.1), libsystemd-daemon0:amd64 (204-5ubuntu20.11, 
204-5ubuntu20.12), grub-common:amd64 (2.02~beta2-9ubuntu1.1, 
2.02~beta2-9ubuntu1.2), libpam-systemd:amd64 (204-5ubuntu20.11, 
204-5ubuntu20.12), udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), 
grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), 
openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2), 
libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), isc-dhcp-client:amd64 
(4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), python-eventlet:amd64 (0.13.0-1ubuntu2, 
0.13.0-1ubuntu2.1), python-novaclient:amd64 (2.17.0-0ubuntu1.1, 
2.17.0-0ubuntu1.2), 
grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), grub-pc:amd64 
(2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), nova-compute:amd64 
(2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), openvswitch-common:amd64 
(2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
  End-Date: 2015-05-28  14:24:47
  
  From /var/log/neutron/openvswitch-agent.log:
  
  2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor
  [-] Error received from ovsdb monitor: ovsdb-client:
  unix:/var/run/openvswitch/db.sock: receive failed (End of file)
  
  Looking at a stuck instance, all the right tunnels and bridges and
  what not appear to be there:
  
  root@vector:~# ip l l | grep c-3b
- 460002: qbr7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP mode DEFAULT group default 
+ 460002: qbr7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue state UP mode DEFAULT group default
  460003: qvo7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 
qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
  460004: qvb7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 
qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 
1000
  460005: tap7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 
500
  root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
  qvo7ed8b59c-3b
- root@vector:~# 
+ root@vector:~#
  
  But I can't ping the unit from within the qrouter-${id} namespace on
  the neutron gateway.  If I tcpdump the {q,t}*c-3b interfaces, I don't
  see any traffic.
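
A minimal sketch of a check that could be run on a compute host after
restarting ovs, as in the test case above. It is not taken from the fix
itself. It assumes the tunnel bridge is named br-tun (the usual neutron
default) and that the agent names its tunnel ports gre-* or vxlan-*; the
helper name has_tunnel_ports is hypothetical.

```shell
# Succeeds if stdin (one port name per line) contains at least one
# tunnel port. Assumption: tunnel ports are named gre-* or vxlan-*.
has_tunnel_ports() {
    grep -Eq '^(gre|vxlan)-'
}

# Possible usage on a compute host (br-tun is an assumed bridge name):
#   ovs-vsctl list-ports br-tun | has_tunnel_ports \
#       || echo "no tunnel ports on br-tun; agent restart may be needed"
```

If the check fails while instances are running on the host, restarting
the agent (for example, service neutron-plugin-openvswitch-agent restart
on Trusty) is the workaround described in the report.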

** Changed in: neutron (Ubuntu Trusty)
       Status: New => In Progress

** Changed in: neutron (Ubuntu Trusty)
     Assignee: (unassigned) => James Page (james-page)

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/juno
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/kilo
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/kilo
   Importance: Undecided => Medium

** Changed in: cloud-archive/juno
   Importance: Undecided => Medium

** Changed in: neutron (Ubuntu Trusty)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1460164

Title:
  restart of openvswitch-switch causes instance network down when
  l2population enabled

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1460164/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
