** Also affects: nova/icehouse
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1255594

Title:
  neutron glue code creates tokens excessively, still

Status in OpenStack Compute (Nova):
  Fix Released
Status in OpenStack Compute (nova) icehouse series:
  New

Bug description:
  Reusing keystone tokens improves OpenStack efficiency and performance.
  For operations that require a token, reusing a token avoids the
  overhead of a request to keystone. For operations that validate
  tokens, reused tokens improve the hit rate of authentication caches
  (e.g., in keystoneclient.middleware). In both cases, the load on the
  keystone server is reduced, which improves the response time for
  requests that do require new tokens or token validation. Finally,
  since token validation is so CPU intensive, an improved auth cache hit
  rate can significantly reduce CPU utilization by keystone.

  In spite of the progress made by
  http://github.com/openstack/nova/commit/85332012dede96fa6729026c2a90594ea0502ac5,
  which was committed to address bug #1250580, the neutronv2 network API
  code in nova-compute still creates more tokens than necessary, to the
  point where performance degradation is measurable when creating a
  large number of instances.

  Prior to that change, nova-compute created a new admin token for
  accessing neutron virtually every time a call was made into
  nova.network.neutronv2. With that change, a token is created once per
  "thread" (i.e., green thread), so multiple calls into neutronv2 can
  share a token. For example, during instance creation, a single token
  is created and then reused 6 times; prior to the patch, 7 tokens would
  have been created by nova.network.neutronv2 per "nova boot".

  However, this scheme is far from optimal. Given that tokens, by
  default, have a lifetime of 24 hours, a single token could be shared
  by _all_ nova.network.neutronv2 calls in a 24-hour period.

  The performance impact of sharing a single neutronv2 admin token is
  easy to observe when creating a large number of instances in parallel.
  In this example, I boot 40 instances in parallel, ping them, then
  delete them. I'm using a 24-core machine with enough RAM and disk
  throughput that neither becomes a bottleneck. Note that I'm running
  with multiple keystone-all worker processes
  (https://review.openstack.org/#/c/42967/).

  Using the per-thread tokens, the last instance becomes active after
  40s and the last instance is deleted after 65s. Using a single shared
  token, the last instance becomes active after 32s and the last
  instance is deleted after 60s.

  During the token-per-thread run, the keystone-all processes had 900%
  CPU utilization (i.e., 9 x 100% of a single core) for the first ~10s,
  then stayed in the 50-100% range for the rest of the run. In the
  single-token run, the keystone-all processes never exceeded 150% CPU
  utilization.

  I focused on nova.network.neutronv2 because it created the most tokens
  during my parallel boot experiment. However, there are other excessive
  token offenders. After fixing nova.network.neutronv2, the leading auth
  requestors are glance-index and glance-registry, due to a high auth
  cache miss rate. I'm not sure which component is creating those new
  tokens, however.
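
The three token strategies compared in this report (one token per
neutronv2 call, one per green thread, one shared by every call) can be
illustrated with a minimal sketch. This is not nova's code: the names
SharedTokenCache, fetch_token and count_token_requests are hypothetical,
fetch_token merely stands in for a token request to keystone, and the
60-second refresh margin is an assumption. The 24-hour lifetime and the
figure of 7 neutronv2 calls per "nova boot" come from the description
above; the sketch also assumes each boot runs in a single green thread
and counts boot-time calls only.

```python
import threading
import time


class SharedTokenCache:
    """Process-wide cache that hands out one token until it nears expiry."""

    def __init__(self, fetch_token, lifetime_s=24 * 60 * 60, margin_s=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: requests a token from keystone
        self._lifetime_s = lifetime_s    # default token lifetime cited in the report
        self._margin_s = margin_s        # assumed safety margin before expiry
        self._clock = clock
        self._lock = threading.Lock()    # one fetch at a time across threads
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._expires_at - self._margin_s:
                self._token = self._fetch_token()
                self._expires_at = now + self._lifetime_s
            return self._token


def count_token_requests(strategy, boots=40, calls_per_boot=7):
    """Count simulated keystone token requests for a batch of boots."""
    requests = 0

    def fetch_token():
        nonlocal requests
        requests += 1
        return "token-%d" % requests

    shared = SharedTokenCache(fetch_token)
    for _ in range(boots):
        per_thread_token = None  # stands in for the per-green-thread token
        for _ in range(calls_per_boot):
            if strategy == "per_call":        # behaviour before the patch
                fetch_token()
            elif strategy == "per_thread":    # behaviour after the patch
                if per_thread_token is None:
                    per_thread_token = fetch_token()
            elif strategy == "shared":        # scheme proposed in this report
                shared.get()
            else:
                raise ValueError("unknown strategy: %r" % strategy)
    return requests


if __name__ == "__main__":
    for name in ("per_call", "per_thread", "shared"):
        print(name, count_token_requests(name))
```

Under these assumptions, 40 boots issue 280 token requests with one
token per call, 40 with one token per green thread, and 1 with a shared
cache. A real implementation would likely take the expiry time from
keystone's token response instead of assuming a fixed lifetime.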