Public bug reported: A bug was introduced in https://review.openstack.org/#/c/227589/ . The idea of that patch was to improve performance by not shelling out to "ip netns list" to get a list of network namespaces and doing it in python instead (os.listdir).
The iproute2 C code which implements "ip netns list" will first check if the "/var/run/netns" directory exists, before trying to enumerate the contents. The Python code tries to enumerate the directory contents (os.listdir), but doesn't check for the directory's existence. ip netns add would be able to create the directory. However, since an exception is thrown, that code path is no longer reached. The result is that the agents are non-functional when the directory is not present, and are unable to recover on their own. When use_helper_for_ns_read is True (the default value), then the existence of the directory is a non-issue, as "ip netns list" is run instead and sidesteps the broken behavior. Steps to reproduce: 1- Start with a machine with no /var/run/netns directory (such as a newly provisioned VM) 2- Disable use_helper_for_ns_read. In devstack: [[post-config|$NEUTRON_CONF]] [agent] use_helper_for_ns_read=False 3- Run stack.sh 4- At this point, q-l3 errors should already start appearing in the logs 5- Create a new network and subnetwork 6- There will be stacktraces in the dhcp agent logs. 7- Observe that no router or dhcp namespaces were created Expected behavior: - No errors in the logs - /var/run/netns directory and mountpoint created (if not yet present) - network namespaces are created One possible fix would be to merge (https://review.openstack.org/#/c/237653/) and restore the old behavior. q-dhcp.log: Unable to plug DHCP port for network 37b27bd3-5072-4c3a-b26d-b7b67c2bc788. Releasing port. 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp Traceback (most recent call last): 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp mtu=network.get('mtu')) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 250, in plug 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp bridge, namespace, prefix, mtu) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 354, in plug_new 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp namespace_obj = ip.ensure_namespace(namespace) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 195, in ensure_namespace 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp if not self.netns.exists(name): 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 883, in exists 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp return name in os.listdir(IP_NETNS_PATH) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp OSError: [Errno 2] No such file or directory: '/var/run/netns' Unable to enable dhcp for 37b27bd3-5072-4c3a-b26d-b7b67c2bc788. 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent Traceback (most recent call last): 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/dhcp/agent.py", line 112, in call_driver 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 209, in enable 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1240, in setup 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent self.plugin.release_dhcp_port(network.id, port.device_id) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent self.force_reraise() 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent six.reraise(self.type_, self.value, self.tb) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent mtu=network.get('mtu')) q-l3.log(repeatedly): 2016-04-22 23:54:14.659 ESC[01;31mERROR oslo_service.periodic_task req-43829b71-bb53-48d1-826b-7421a9a3612e None None Error during L3NATAgentWithStateReport.periodic_sync_routers_task 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task Traceback (most recent call last): 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task task(self, context) 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/agent.py", line 540, in periodic_sync_routers_task 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task with self.namespaces_manager as ns_manager: 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/namespace_manager.py", line 72, in __enter__ 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task self._all_namespaces = self.list_all() 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/namespace_manager.py", line 119, in list_all 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task namespaces = root_ip.get_namespaces() 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 250, in get_namespaces 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task return os.listdir(IP_NETNS_PATH) 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task OSError: [Errno 2] No such file or directory: '/var/run/netns' ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1574881 Title: Optimization for use_helper_for_ns_read crashes dhcp agent and l3 agent Status in neutron: New Bug description: A bug was introduced in https://review.openstack.org/#/c/227589/ . The idea of that patch was to improve performance by not shelling out to "ip netns list" to get a list of network namespaces and doing it in python instead (os.listdir). The iproute2 C code which implements "ip netns list" will first check if the "/var/run/netns" directory exists, before trying to enumerate the contents. The Python code tries to enumerate the directory contents (os.listdir), but doesn't check for the directory's existence. ip netns add would be able to create the directory. However, since an exception is thrown, that code path is no longer reached. The result is that the agents are non-functional when the directory is not present, and are unable to recover on their own. When use_helper_for_ns_read is True (the default value), then the existence of the directory is a non-issue, as "ip netns list" is run instead and sidesteps the broken behavior. Steps to reproduce: 1- Start with a machine with no /var/run/netns directory (such as a newly provisioned VM) 2- Disable use_helper_for_ns_read. In devstack: [[post-config|$NEUTRON_CONF]] [agent] use_helper_for_ns_read=False 3- Run stack.sh 4- At this point, q-l3 errors should already start appearing in the logs 5- Create a new network and subnetwork 6- There will be stacktraces in the dhcp agent logs. 7- Observe that no router or dhcp namespaces were created Expected behavior: - No errors in the logs - /var/run/netns directory and mountpoint created (if not yet present) - network namespaces are created One possible fix would be to merge (https://review.openstack.org/#/c/237653/) and restore the old behavior. q-dhcp.log: Unable to plug DHCP port for network 37b27bd3-5072-4c3a-b26d-b7b67c2bc788. Releasing port. 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp Traceback (most recent call last): 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp mtu=network.get('mtu')) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 250, in plug 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp bridge, namespace, prefix, mtu) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 354, in plug_new 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp namespace_obj = ip.ensure_namespace(namespace) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 195, in ensure_namespace 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp if not self.netns.exists(name): 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 883, in exists 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp return name in os.listdir(IP_NETNS_PATH) 2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp OSError: [Errno 2] No such file or directory: '/var/run/netns' Unable to enable dhcp for 37b27bd3-5072-4c3a-b26d-b7b67c2bc788. 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent Traceback (most recent call last): 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/dhcp/agent.py", line 112, in call_driver 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 209, in enable 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1240, in setup 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent self.plugin.release_dhcp_port(network.id, port.device_id) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent self.force_reraise() 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent six.reraise(self.type_, self.value, self.tb) 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent File "/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup 2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent mtu=network.get('mtu')) q-l3.log(repeatedly): 2016-04-22 23:54:14.659 ESC[01;31mERROR oslo_service.periodic_task req-43829b71-bb53-48d1-826b-7421a9a3612e None None Error during L3NATAgentWithStateReport.periodic_sync_routers_task 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task Traceback (most recent call last): 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task task(self, context) 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/agent.py", line 540, in periodic_sync_routers_task 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task with self.namespaces_manager as ns_manager: 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/namespace_manager.py", line 72, in __enter__ 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task self._all_namespaces = self.list_all() 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/l3/namespace_manager.py", line 119, in list_all 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task namespaces = root_ip.get_namespaces() 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 250, in get_namespaces 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task return os.listdir(IP_NETNS_PATH) 2016-04-22 23:54:14.659 TRACE oslo_service.periodic_task OSError: [Errno 2] No such file or directory: '/var/run/netns' To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1574881/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp