We noticed an issue in one of our larger clouds (~700 hypervisors and OVS 
agents) where neutron-server (Liberty) CPU and RAM usage would spike sharply 
whenever a DHCP agent port was updated.  The load was high enough that 
processes were getting OOM-killed on our API servers, and so many queries were 
hitting the database that it degraded the performance of other APIs sharing 
the same database cluster.

Kris Lindgren determined what was happening: any time a DHCP port is changed, 
Neutron forces a complete refresh of all security group filter rules for all 
ports on the same network as the DHCP port.  We run only provider networks 
which VMs plug directly into, and our largest network has several thousand 
ports.  This was generating an avalanche of RPCs from the OVS agents, thus 
loading up neutron-server with a lot of work.
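
To make the fan-out concrete, here is a toy Python model of the behavior 
described above.  It is not Neutron code: the class, the method names, the 
skip_dhcp_refresh flag, and the figure of 3000 ports are all made up for 
illustration.  It only shows why one DHCP port change on a large network turns 
into thousands of per-port refreshes, and what skipping the blanket refresh 
changes.

```python
# Toy model only: this is NOT Neutron code. All names are hypothetical.
from collections import defaultdict


class ToyNetworkIndex:
    """Tracks which ports live on which network."""

    def __init__(self):
        self._ports_by_network = defaultdict(set)

    def add_port(self, network_id, port_id):
        self._ports_by_network[network_id].add(port_id)

    def ports_to_refresh(self, network_id, changed_port_id,
                         skip_dhcp_refresh=False):
        """Ports whose filters get refreshed when a DHCP port changes.

        With the blanket refresh, that is every other port on the network.
        With skip_dhcp_refresh=True (the assumed effect of the patch),
        no port is refreshed.
        """
        if skip_dhcp_refresh:
            return set()
        return self._ports_by_network[network_id] - {changed_port_id}


index = ToyNetworkIndex()
index.add_port("provider-net", "dhcp-port")
for i in range(3000):  # assumed size, standing in for "several thousand"
    index.add_port("provider-net", f"vm-port-{i}")

blanket = index.ports_to_refresh("provider-net", "dhcp-port")
patched = index.ports_to_refresh("provider-net", "dhcp-port",
                                 skip_dhcp_refresh=True)
print(len(blanket), len(patched))
```

In the real system each refreshed port costs RPC and database work on 
neutron-server, so the cost grows with the number of ports on the network.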

We only use DHCP as a backup network configuration mechanism in case something 
goes wrong with config-drive, so we are not huge users of it.  But DHCP agents 
are being scheduled and removed often enough, and our networks contain a large 
enough number of ports, that this has begun affecting us quite a bit.

Kevin Benton suggested a minor patch [1] to disable the blanket refresh of 
security group filters on DHCP port changes.  I tested it out in our staging 
environment and can confirm that:

- Security group filters are indeed not refreshed on DHCP port changes.

- iptables rules generated for regular VM ports still include the generic 
  rules to allow DHCP ports 67 and 68, whether or not a DHCP agent is present 
  on the network.  (This covers the scenario where VM ports are created while 
  there are no DHCP agents, and a DHCP agent is added later.)
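
For anyone who wants to repeat the second check, here is a rough Python 
sketch of a helper that scans iptables rule lines for UDP rules covering both 
DHCP directions (68 to 67 and 67 to 68).  The function name, the regular 
expression, and the sample lines are my own assumptions: the samples were 
written by hand for this example, are not captured output, and use placeholder 
chain names.  The exact rule text on a real hypervisor may differ.

```python
import re


def allows_dhcp(rule_lines):
    """True if the rules include UDP 68->67 and UDP 67->68 matches."""
    def has(sport, dport):
        pattern = re.compile(
            rf"-p udp\b.*--sport {sport}\b.*--dport {dport}\b")
        return any(pattern.search(line) for line in rule_lines)

    return has(68, 67) and has(67, 68)


# Hand-written illustrative lines; chain names are placeholders.
sample = [
    "-A example-o-chain -p udp -m udp --sport 68 --dport 67 -j RETURN",
    "-A example-i-chain -p udp -m udp --sport 67 --dport 68 -j RETURN",
]
print(allows_dhcp(sample))
```

On a real host the input would come from something like the saved rule dump 
for the VM's port, filtered down to that port's chains.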

I think there are some plans to deprecate this behavior.  As far as I know, it 
still exists in master neutron.  I’m happy to put the trivial patch up for 
review if people think this is a useful change to Neutron.

We are a bit of an edge case with such a large number of ports per network.  
We are also considering disabling DHCP altogether since it is really not used.  
But I wanted to share the experience in case others are running into the same 
issue.

Thanks,
Mike

[1] https://gist.github.com/misterdorm/37a8997aed43081bac8d12c7f101853b
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
