Public bug reported:

Various things come up in IRC every once in a while about configuration options that need to be tweaked at large scale (Blizzard, CERN, etc.). Once you hit hundreds or thousands of compute nodes, these options need to be changed to avoid killing the control plane.

One such option is this:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval

From a Blizzard operator:

(3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high because it was killing our control plane
(3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes
(3:06:26 PM) eandersson: We also ended up adding a variance

Similarly, CERN had to totally disable this one:

https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh

and rely on SIGHUP / restart of the service if they needed to refresh that cache.

We should put these things in the admin docs as we come across them so we don't forget about this stuff when new operators/users come along and hit scaling issues.

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: docs performance

https://bugs.launchpad.net/bugs/1838819

Title:
  Docs needed for tunables at large scale

Status in OpenStack Compute (nova):
  New
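
For the admin docs, the two tunables discussed above could be illustrated with a nova.conf fragment along these lines. This is only a sketch: the value 600 is a hypothetical "high" interval (the operator did not say what they actually used), and 0 for resource_provider_association_refresh assumes a release in which that setting disables the periodic refresh, leaving SIGHUP or a service restart as the way to refresh the cache. The "variance" mentioned in the IRC log is not shown, since the report does not describe how it was added.

```ini
[DEFAULT]
# Illustrative value only; the upstream default is much lower.
# A larger interval means each compute node asks Neutron for
# instance network info less often, easing control plane load.
heal_instance_info_cache_interval = 600

[compute]
# Assumption: 0 turns off the periodic refresh of the compute-side
# resource provider cache, so it is refreshed only on SIGHUP or restart.
resource_provider_association_refresh = 0
```

Check both options against the configuration reference for the release you run before applying them, since the accepted values and defaults may differ.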