Public bug reported:

The HostState object used by the scheduler derives its own values from the 'stats' property of the compute node, e.g.:

    self.stats = compute.stats or {}
    self.num_instances = int(self.stats.get('num_instances', 0))
    self.num_io_ops = int(self.stats.get('io_workload', 0))
    self.failed_builds = int(self.stats.get('failed_builds', 0))

These values are used for both filtering and weighing compute hosts. However, the 'stats' property of the compute node is cleared and then repopulated during the periodic update_available_resources(). The clearing occurs in RT._copy_resources(), which preserves only the old value of 'failed_builds'. This creates a race condition between the RT and the scheduler: the HostState object may be populated with wrong values for 'num_io_ops' and 'num_instances', leading to incorrect scheduling decisions.

** Affects: nova
     Importance: High
     Assignee: Radoslav Gerganov (rgerganov)
         Status: In Progress

** Tags: scheduler

https://bugs.launchpad.net/bugs/1798806

Title:
  Race condition between RT and scheduler

Status in OpenStack Compute (nova):
  In Progress
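
The race described in this report can be illustrated with a small, deterministic sketch. This is a simplified in-memory model, not Nova's actual code: ComputeNode, periodic_update() and read_host_state() are hypothetical names introduced only for illustration, and the two-step update is an assumption based on the description above (clear everything except 'failed_builds', then populate again). A generator stands in for the periodic task so that a read can be placed exactly inside the window.

```python
# Simplified in-memory model of the race; NOT Nova's actual code.
# ComputeNode, periodic_update and read_host_state are hypothetical names.

class ComputeNode:
    def __init__(self, stats):
        self.stats = dict(stats)


def periodic_update(node, fresh_stats):
    """Model a clear-then-repopulate update as two separate steps.

    Yields once between the steps so the caller can read the stats
    while only 'failed_builds' is present.
    """
    preserved = {'failed_builds': node.stats.get('failed_builds', 0)}
    node.stats = preserved          # step 1: all other keys are gone
    yield
    node.stats.update(fresh_stats)  # step 2: values are populated again


def read_host_state(node):
    """Derive scheduler-side values the way the snippet in the report does."""
    stats = node.stats or {}
    return {
        'num_instances': int(stats.get('num_instances', 0)),
        'num_io_ops': int(stats.get('io_workload', 0)),
        'failed_builds': int(stats.get('failed_builds', 0)),
    }


node = ComputeNode({'num_instances': 5, 'io_workload': 2, 'failed_builds': 1})
before = read_host_state(node)

update = periodic_update(node, {'num_instances': 5, 'io_workload': 2})
next(update)                    # run step 1 only
during = read_host_state(node)  # the scheduler reads inside the window

next(update, None)              # run step 2
after = read_host_state(node)

print(before)
print(during)
print(after)
```

In this model the read taken inside the window reports zero instances and zero I/O operations for a host that actually has five instances, while 'failed_builds' stays correct because it is the only preserved key. A host observed at that moment would look idle to any filter or weigher that relies on those two values.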