Public bug reported: dhcp-agent on a neutron host started dying with
--- 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent Traceback (most recent call last): 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent **args) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 45, in dispatch 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent neutron_ctxt, version, method, namespace, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent result = getattr(proxyobj, method)(ctxt, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 92, in get_active_networks_info 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent networks = self._get_active_networks(context, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 42, in _get_active_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent plugin.auto_schedule_networks(context, host) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 222, in auto_schedule_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent self.network_scheduler.auto_schedule_networks(self, context, host) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/scheduler/dhcp_agent_scheduler.py", line 122, in auto_schedule_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent context, [net_id], active=True) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 126, in get_dhcp_agents_hosting_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent binding.dhcp_agent)] 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 83, in is_eligible_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent agent['heartbeat_timestamp']) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent TypeError: 'NoneType' object has no attribute '__getitem__' 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent --- further investigation on the neutron server found that networkdhcpbindings seems to have got out of sync we can see just one dhcp-agent --- MariaDB [neutron]> select id from agents where agent_type="DHCP agent"; +--------------------------------------+ | id | +--------------------------------------+ | 6923675d-5616-4ffe-b2c4-4d130f67973f | +--------------------------------------+ 1 row in set (0.00 sec) --- but in the network bindings, at least 3 are listed ---- MariaDB [neutron]> select DISTINCT(dhcp_agent_id) from networkdhcpagentbindings; +--------------------------------------+ | dhcp_agent_id | +--------------------------------------+ | 6923675d-5616-4ffe-b2c4-4d130f67973f | | b23f9f97-da04-4f61-bcfb-f8514e43cefd | | d3e3ac5b-9962-428a-a9f8-6b2a1aba48d8 | +--------------------------------------+ 3 rows in set (0.00 sec) --- neutron/db/agentschedulers_db.py:get_dhcp_agents_hosting_networks() [1] doesn't expect this case of a binding with no associated agent, so it goes off an tries to find the latest heartbeat for the agent, which leads to the original traceback. I unfortunately have no idea how this got out of sync; the host has been upgraded from juno->kilo and neutron and dhcp-agents have been restarted many, many times for various reasons. I found two other references [2],[3] [1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/agentschedulers_db.py#n107 [2] http://www.gossamer-threads.com/lists/openstack/dev/42060 [3] http://lists.openstack.org/pipermail/openstack/2013-October/002195.html ** Affects: neutron Importance: Undecided Assignee: Ian Wienand (iwienand) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1398988 Title: dhcp-agent / network bindings out of sync, stopping dhcp-agent Status in OpenStack Neutron (virtual network service): In Progress Bug description: dhcp-agent on a neutron host started dying with --- 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent Traceback (most recent call last): 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent **args) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 45, in dispatch 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent neutron_ctxt, version, method, namespace, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent result = getattr(proxyobj, method)(ctxt, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 92, in get_active_networks_info 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent networks = self._get_active_networks(context, **kwargs) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/dhcp_rpc_base.py", line 42, in _get_active_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent plugin.auto_schedule_networks(context, host) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 222, in auto_schedule_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent self.network_scheduler.auto_schedule_networks(self, context, host) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/scheduler/dhcp_agent_scheduler.py", line 122, in auto_schedule_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent context, [net_id], active=True) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 126, in get_dhcp_agents_hosting_networks 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent binding.dhcp_agent)] 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 83, in is_eligible_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent agent['heartbeat_timestamp']) 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent TypeError: 'NoneType' object has no attribute '__getitem__' 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent 2014-12-02 20:52:20.739 17097 TRACE neutron.agent.dhcp_agent --- further investigation on the neutron server found that networkdhcpbindings seems to have got out of sync we can see just one dhcp-agent --- MariaDB [neutron]> select id from agents where agent_type="DHCP agent"; +--------------------------------------+ | id | +--------------------------------------+ | 6923675d-5616-4ffe-b2c4-4d130f67973f | +--------------------------------------+ 1 row in set (0.00 sec) --- but in the network bindings, at least 3 are listed ---- MariaDB [neutron]> select DISTINCT(dhcp_agent_id) from networkdhcpagentbindings; +--------------------------------------+ | dhcp_agent_id | +--------------------------------------+ | 6923675d-5616-4ffe-b2c4-4d130f67973f | | b23f9f97-da04-4f61-bcfb-f8514e43cefd | | d3e3ac5b-9962-428a-a9f8-6b2a1aba48d8 | +--------------------------------------+ 3 rows in set (0.00 sec) --- neutron/db/agentschedulers_db.py:get_dhcp_agents_hosting_networks() [1] doesn't expect this case of a binding with no associated agent, so it goes off an tries to find the latest heartbeat for the agent, which leads to the original traceback. I unfortunately have no idea how this got out of sync; the host has been upgraded from juno->kilo and neutron and dhcp-agents have been restarted many, many times for various reasons. I found two other references [2],[3] [1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/agentschedulers_db.py#n107 [2] http://www.gossamer-threads.com/lists/openstack/dev/42060 [3] http://lists.openstack.org/pipermail/openstack/2013-October/002195.html To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1398988/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp