[Yahoo-eng-team] [Bug 1574881] [NEW] Optimization for use_helper_for_ns_read crashes dhcp agent and l3 agent

2016-04-25 Thread Stephen Eilert
Public bug reported:

A bug was introduced in https://review.openstack.org/#/c/227589/ .  The
idea of that patch was to improve performance by not shelling out to "ip
netns list" to get a list of network namespaces and doing it in python
instead (os.listdir).

The iproute2 C code which implements "ip netns list" will first check if
the "/var/run/netns" directory exists, before trying to enumerate the
contents. The Python code tries to enumerate the directory contents
(os.listdir), but doesn't check for the directory's existence. ip netns
add would be able to create the directory. However, since an exception
is thrown, that code path is no longer reached.

The result is that the agents are non-functional when the directory is
not present, and are unable to recover on their own.


When use_helper_for_ns_read is True (the default value), then the existence of 
the directory is a non-issue, as "ip netns list" is run instead and sidesteps 
the broken behavior.

Steps to reproduce:

1- Start with a machine with no /var/run/netns directory (such as a newly 
provisioned VM)
2- Disable use_helper_for_ns_read.
In devstack:
[[post-config|$NEUTRON_CONF]]
[agent]
use_helper_for_ns_read=False

3- Run stack.sh
4- At this point, q-l3 errors should already start appearing in the logs
5- Create a new network and subnetwork
6- There will be stacktraces in the dhcp agent logs.
7- Observe that no router or dhcp namespaces were created

Expected behavior:

- No errors in the logs
- /var/run/netns directory and mountpoint created (if not yet present)
- network namespaces are created

One possible fix would be to merge
(https://review.openstack.org/#/c/237653/) and restore the old behavior.


q-dhcp.log:

Unable to plug DHCP port for network 37b27bd3-5072-4c3a-b26d-b7b67c2bc788. 
Releasing port.
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp Traceback (most recent 
call last):
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp   File 
"/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp 
mtu=network.get('mtu'))
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp   File 
"/opt/stack/neutron/neutron/agent/linux/interface.py", line 250, in plug
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp bridge, namespace, 
prefix, mtu)
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp   File 
"/opt/stack/neutron/neutron/agent/linux/interface.py", line 354, in plug_new
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp namespace_obj = 
ip.ensure_namespace(namespace)
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp   File 
"/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 195, in 
ensure_namespace
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp if not 
self.netns.exists(name):
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp   File 
"/opt/stack/neutron/neutron/agent/linux/ip_lib.py", line 883, in exists
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp return name in 
os.listdir(IP_NETNS_PATH)
2016-04-22 23:53:21.492 TRACE neutron.agent.linux.dhcp OSError: [Errno 2] No 
such file or directory: '/var/run/netns'


Unable to enable dhcp for 37b27bd3-5072-4c3a-b26d-b7b67c2bc788.
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent Traceback (most recent 
call last):
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/opt/stack/neutron/neutron/agent/dhcp/agent.py", line 112, in call_driver
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent getattr(driver, 
action)(**action_kwargs)
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 209, in enable
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent interface_name = 
self.device_manager.setup(self.network)
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1240, in setup
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent 
self.plugin.release_dhcp_port(network.id, port.device_id)
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent self.force_reraise()
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent 
six.reraise(self.type_, self.value, self.tb)
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent   File 
"/opt/stack/neutron/neutron/agent/linux/dhcp.py", line 1234, in setup
2016-04-22 23:53:21.614 TRACE neutron.agent.dhcp.agent 
mtu=network.get('mtu'))

q-l3.log(repeatedly):

2016-04-22 23:54:14.659 ESC[01;31mERROR oslo_service.periodic_task
req-43829b71-bb53-48d1-826b-7421a9a3612e None None Error during 
L3NATAgentWithStateReport.periodic_sync_routers_task

[Yahoo-eng-team] [Bug 1557640] [NEW] DHCP agent logging is too verbose

2016-03-15 Thread Stephen Eilert
Public bug reported:

When a new port gets added or removed, the DHCP agent is currently
dumping the entirety of the hosts file in the logs. This is a problem in
some deployments we have encountered, some of which with upwards of
1 ports.

Change-Id: I3ad7864eeb2f959549ed356a1e34fa18804395cc removed logging
calls from inside the loop. It also added two new log calls: one at the
beginning of the loop that builds the hosts file in memory and one at
the end, that also includes the full content. Those could be useful
pieces of debugging information. However, adding the host file contents
can get expensive when there are large numbers of ports.

Will propose a patch to keep the log message, but remove the file
contents.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1557640

Title:
  DHCP agent logging is too verbose

Status in neutron:
  New

Bug description:
  When a new port gets added or removed, the DHCP agent is currently
  dumping the entirety of the hosts file in the logs. This is a problem
  in some deployments we have encountered, some of which with upwards of
  1 ports.

  Change-Id: I3ad7864eeb2f959549ed356a1e34fa18804395cc removed logging
  calls from inside the loop. It also added two new log calls: one at
  the beginning of the loop that builds the hosts file in memory and one
  at the end, that also includes the full content. Those could be useful
  pieces of debugging information. However, adding the host file
  contents can get expensive when there are large numbers of ports.

  Will propose a patch to keep the log message, but remove the file
  contents.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1557640/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp