[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
       Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1223369

Title:
  Metadata ns proxy didn't start - pid already exist. Daemon already running?

Status in neutron:
  Expired

Bug description:
  This failure happened just once. The software levels are Ubuntu Raring 13.04 with Grizzly Quantum packages at 1:2013.1.2-0ubuntu1.

  I noticed that the metadata namespace proxy had not started after the network node was booted. The l3-agent.log (which was only at INFO level) has:

  2013-09-04 15:53:16 INFO [quantum.openstack.common.rpc.common] Connected to AMQP server on 10.0.10.10:5672
  2013-09-04 15:53:16 INFO [quantum.agent.l3_agent] L3 agent started
  2013-09-04 15:53:28 ERROR [quantum.agent.l3_agent] Failed synchronizing routers
  Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 638, in _sync_routers_task
      self._process_routers(routers, all_routers=True)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 618, in _process_routers
      self._router_added(r['id'], r)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 236, in _router_added
      self._spawn_metadata_proxy(ri)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 270, in _spawn_metadata_proxy
      pm.enable(callback)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py", line 55, in enable
      ip_wrapper.netns.execute(cmd)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line 414, in execute
      check_exit_code=check_exit_code)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line 61, in execute
      raise RuntimeError(m)
  RuntimeError:
  Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7', 'quantum-ns-metadata-proxy', '--pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid',
  '--router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7', '--state_path=/var/lib/quantum', '--metadata_port=9697', '--verbose', '--log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log', '--log-dir=/var/log/quantum']
  Exit code: 1
  Stdout: ''
  Stderr: '2013-09-04 15:53:28 INFO [quantum.common.config] Logging enabled!\n2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?\n'

  And quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log has:

  2013-08-29 19:04:04 INFO [quantum.common.config] Logging enabled!
  2013-09-04 15:53:28 INFO [quantum.common.config] Logging enabled!
  2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?

  It is the same error message as in https://bugs.launchpad.net/neutron/+bug/1177416, but the patch from that bug was applied.

  The file /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid contained 2045, but no process with pid 2045 was running when I checked: /proc/2045/ did not exist. The pid file was stale; its date was that of the previous launch.

  The process call chain, in shorthand, is:

  l3-agent
    --> sudo rootwrap...
    --> python rootwrap ip netns exec qrouter-uuid quantum-ns-metadata-proxy router_id=uuid...
    --> python quantum-ns-metadata-proxy router_id=uuid...

  The code in external_process.py either did not find a /proc/2045/cmdline or, if it did, that file did not contain the strings 'python' and 'fa2ec96d-d1f9-4af2-a022-cac171646aa7'. But the code in daemon.py must have found a /proc/2045/cmdline, and it must have contained those strings.

  The only explanation I can give for this is that the python rootwrap process started by sudo just happened to get pid 2045 that time, and this is what is_running() in daemon.py found.
  Its full command line would have looked like:

  /usr/bin/python /usr/bin/quantum-rootwrap /etc/quantum/rootwrap.conf ip netns exec qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7 quantum-ns-metadata-proxy --pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid --router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7 --state_path=/var/lib/quantum --metadata_port=9697 --verbose --log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log --log-dir=/var/log/quantum

  It contains the string 'python' and the router's uuid, so it would have matched.

  If my theory is right, a possible fix would be to change the checks so that a cmdline containing 'ip\x00netns\x00exec' is not reported as a running daemon.
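
The proposed check can be sketched roughly as follows. This is only an illustration of the idea, not the actual quantum daemon.py or external_process.py code: the names cmdline_matches_daemon, NETNS_WRAPPER and the proc_root parameter are hypothetical, and the signature of is_running here is an assumption made for the example.

```python
# Hypothetical sketch of the proposed fix; not the real quantum code.

# Arguments in /proc/<pid>/cmdline are separated by NUL bytes, so the
# "ip netns exec" wrapper shows up as this exact substring.
NETNS_WRAPPER = 'ip\x00netns\x00exec'


def cmdline_matches_daemon(cmdline, uuid, procname='python'):
    """Return True if a raw /proc/<pid>/cmdline string looks like the daemon.

    A cmdline that contains the "ip netns exec" wrapper is rejected even if
    it also contains procname and uuid, because it belongs to the rootwrap
    launcher rather than to the daemon itself.
    """
    if not cmdline:
        return False
    if NETNS_WRAPPER in cmdline:
        return False
    return procname in cmdline and uuid in cmdline


def is_running(pid, uuid, proc_root='/proc'):
    """Check the process recorded in a pid file; a missing entry means stale."""
    path = '%s/%s/cmdline' % (proc_root, pid)
    try:
        with open(path, 'rb') as f:
            cmdline = f.read().decode('utf-8', 'replace')
    except IOError:
        # No such process (or it exited between checks).
        return False
    return cmdline_matches_daemon(cmdline, uuid)
```

With this check, a rootwrap process that happens to reuse the pid stored in a stale pid file would no longer be mistaken for the metadata proxy, while a cmdline such as python quantum-ns-metadata-proxy --router_id=uuid... would still match.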