Reviewed: https://review.openstack.org/631275 Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=284afcf24f2dfd3af20a803ada79c69205249f25 Submitter: Zuul Branch: master
commit 284afcf24f2dfd3af20a803ada79c69205249f25 Author: Slawek Kaplonski <skapl...@redhat.com> Date: Wed Jan 16 17:13:08 2019 +0100 Lock privileged.agent.linux.ip_lib functions It is workaround of bug in pyroute2 library which, when running in multithread environment, sometimes have issues with NetNS class. When NetNS.__init__() is called, it uses os.pipe() function to create 2 file descriptors which are used to communicated between 2 processes. In some cases when multiple threads are running it might happen that in two instances of NetNS() class there will be same file descriptors used and that leads to problems when one thread closes file descriptor and second still wants to use it. With this patch functions which uses instance of pyroute2.NetNS class are locked thus there shouldn't be risk of using same file descriptors in 2 separate threads. Co-Authored-By: Rodolfo Alonso Hernandez <ralon...@redhat.com> Change-Id: Id5e6f2f8e9c31a7138da9cd6792e9d75845b81c5 Closes-Bug: #1811515 ** Changed in: neutron Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1811515 Title: pyroute2.NetNS don't work properly with concurrency in oslo.privsep Status in neutron: Fix Released Bug description: It is quite common case that tempest/scenario test jobs are failing because SSH to VM through floating IP fails. It looks that in many (I'm not sure if all) case there are some errors from pyroute2 library in L3 agent logs and that looks like reason of many of such SSH failures. Example of such fail in logs is e.g. in http://logs.openstack.org/43/628943/2/check/tempest- full/b955419/controller/logs/screen-q-l3.txt.gz#_Jan_11_23_41_31_955036 Stacktrace looks like below: Jan 11 23:41:31.954562 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR pyroute2.netns.nslink [-] forced shutdown procedure, clean up netns manually Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: DEBUG oslo.privsep.daemon [-] privsep: Exception during request[139718939505360]: [Errno 9] Bad file descriptor {{(pid=6501) _process_cmd /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:452}} Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: Traceback (most recent call last): Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 447, in _process_cmd Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ret = func(*f_args, **f_kwargs) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 216, in _wrap Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: return func(*args, **kwargs) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 244, in add_ip_address Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: scope=_get_scope_name(scope)) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 220, in _run_iproute_addr Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: with _get_iproute(namespace) as ip: Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/opt/stack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 169, in _get_iproute Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: return pyroute2.NetNS(namespace, flags=0) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/pyroute2/netns/nslink.py", line 170, in __init__ Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: super(NetNS, self).__init__(trnsp_in, trnsp_out) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/pyroute2/iproute/linux.py", line 127, in __init__ Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: super(RTNL_API, self).__init__(*argv, **kwarg) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/pyroute2/remote/__init__.py", line 196, in __init__ Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: init = self.trnsp_in.recv_cmd() Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/pyroute2/remote/__init__.py", line 77, in recv_cmd Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ret = self.__recv() Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: File "/usr/local/lib/python2.7/dist-packages/pyroute2/remote/__init__.py", line 51, in __recv Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: dump.write(self.file_obj.read(length - 8)) Jan 11 23:41:31.955036 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: OSError: [Errno 9] Bad file descriptor Jan 11 23:41:31.957344 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: DEBUG oslo.privsep.daemon [-] privsep: reply[139718939505360]: (5, 'exceptions.OSError', (9, 'Bad file descriptor')) {{(pid=6501) _call_back /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:473}} Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info [-] [Errno 88] Socket operation on non-socket: error: [Errno 88] Socket operation on non-socket Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info Traceback (most recent call last): Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/common/utils.py", line 158, in call Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info return func(*args, **kwargs) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 1175, in process Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info self._process_internal_ports() Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 560, in _process_internal_ports Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info self.internal_network_added(p) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 477, in internal_network_added Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info mtu=port.get('mtu')) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 452, in _internal_network_added Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info prefix=prefix, mtu=mtu) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 276, in plug Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info bridge, namespace, prefix, mtu) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 359, in plug_new Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info self.check_bridge_exists(bridge) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 233, in check_bridge_exists Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info if not ip_lib.device_exists(bridge): Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 799, in device_exists Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info return IPDevice(device_name, namespace=namespace).exists() Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 305, in exists Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info return privileged.interface_exists(self.name, self.namespace) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 214, in _wrap Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info return self.channel.remote_call(name, args, kwargs) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info raise exc_type(*result[2]) Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info error: [Errno 88] Socket operation on non-socket Jan 11 23:41:31.966453 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.router_info Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a4f60f08-8daa-4361-8b85-1a9b770ee03c: error: [Errno 88] Socket operation on non-socket Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent Traceback (most recent call last): Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/agent.py", line 623, in _process_routers_if_compatible Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/agent.py", line 498, in _process_router_if_compatible Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self._process_added_router(router) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/agent.py", line 506, in _process_added_router Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent ri.process() Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/common/utils.py", line 161, in call Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self.logger(e) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self.force_reraise() Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/common/utils.py", line 158, in call Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent return func(*args, **kwargs) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 1175, in process Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self._process_internal_ports() Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 560, in _process_internal_ports Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self.internal_network_added(p) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 477, in internal_network_added Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent mtu=port.get('mtu')) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/l3/router_info.py", line 452, in _internal_network_added Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent prefix=prefix, mtu=mtu) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 276, in plug Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent bridge, namespace, prefix, mtu) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 359, in plug_new Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent self.check_bridge_exists(bridge) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/linux/interface.py", line 233, in check_bridge_exists Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent if not ip_lib.device_exists(bridge): Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 799, in device_exists Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent return IPDevice(device_name, namespace=namespace).exists() Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 305, in exists Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent return privileged.interface_exists(self.name, self.namespace) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 214, in _wrap Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent return self.channel.remote_call(name, args, kwargs) Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call Jan 11 23:41:31.973050 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent raise exc_type(*result[2]) Jan 11 23:41:31.977173 ubuntu-bionic-rax-iad-0001661199 neutron-l3-agent[5600]: ERROR neutron.agent.l3.agent error: [Errno 88] Socket operation on non-socket It don't look like related to recent Oslo.privsep update as according to logstash it happened also before oslo.privsep 1.31.0. I don't know since when it is like that but looks that recently it started to happens more often than in the past. It looks for me like some race between threads/workers that one tries to read (or write sometimes) something from opened file descriptor but something else already closed it. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1811515/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp