[Yahoo-eng-team] [Bug 1434335] [NEW] Change v2 API server group name validation to use the same validation that the other APIs use
Public bug reported:

The ServerGroup create v2 API does the following check on the requested name for the group:

    if not common.VALID_NAME_REGEX.search(value):
        msg = _("Invalid format for name: '%s'") % value
        raise nova.exception.InvalidInput(reason=msg)

where

    VALID_NAME_REGEX = re.compile("^(?! )[\w. _-]+(?<! )$", re.UNICODE)

This is more restrictive than the validation other APIs now use after https://review.openstack.org/#/c/119741/. The purpose of this commit was to make the flavor API and others less restrictive in the characters that are accepted for a name.

** Affects: nova
     Importance: Undecided
     Assignee: Jennifer Mulsow (jmulsow)
         Status: In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1434335

Title:
  Change v2 API server group name validation to use the same validation
  that the other APIs use

Status in OpenStack Compute (Nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1434335/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
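For illustration, the restrictive v2 check can be contrasted with a relaxed, printable-name rule of the kind the cited commit moved other APIs toward. This is a sketch: the regex tail ("(?<! )$", re.UNICODE) is reconstructed from the truncated report, and relaxed_valid() is an assumption, not nova's exact code.

```python
import re

# The v2 server group check as reconstructed from the bug report:
# no leading or trailing space, and only word characters, '.', ' ',
# '_' and '-' are accepted.
VALID_NAME_REGEX = re.compile(r"^(?! )[\w. _-]+(?<! )$", re.UNICODE)

def v2_valid(name):
    """Mirror the v2 API's regex-based name check."""
    return bool(VALID_NAME_REGEX.search(name))

def relaxed_valid(name):
    """Sketch of the less restrictive rule: any non-empty name
    that is not all whitespace."""
    return len(name) > 0 and len(name.strip()) > 0
```

Under the v2 regex a name like "web tier!" is rejected outright, while the relaxed rule accepts it; that difference is what the bug asks to eliminate.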
[Yahoo-eng-team] [Bug 1431932] [NEW] Make the server group invalid format message more verbose
Public bug reported:

The ServerGroup create API does the following check on the requested name for the group:

    if not common.VALID_NAME_REGEX.search(value):
        msg = _("Invalid format for name: '%s'") % value
        raise nova.exception.InvalidInput(reason=msg)

where

    VALID_NAME_REGEX = re.compile("^(?! )[\w. _-]+(?<! )$", re.UNICODE)

The "Invalid format for name" message does not tell the user which characters are accepted for a name, so it should be made more verbose.

** Changed in: nova
     Assignee: (unassigned) => Jennifer Mulsow (jmulsow)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1431932

Title:
  Make the server group invalid format message more verbose

Status in OpenStack Compute (Nova):
  New
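A minimal sketch of what a more verbose rejection could look like. The validate_name() helper and its wording are hypothetical, and the regex tail is reconstructed from the truncated report:

```python
import re

# Regex as reconstructed from the bug report (tail is an assumption).
VALID_NAME_REGEX = re.compile(r"^(?! )[\w. _-]+(?<! )$", re.UNICODE)

def validate_name(value):
    """Hypothetical helper: reject invalid names with a message that
    spells out the accepted characters instead of only saying
    'Invalid format for name'."""
    if not VALID_NAME_REGEX.search(value):
        raise ValueError(
            "Invalid format for name: '%s'. Names may contain only "
            "letters, digits, spaces and the characters '.', '_', "
            "'-', and may not begin or end with a space." % value)
```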
[Yahoo-eng-team] [Bug 1400860] [NEW] server group policy not honored for targeted evacuations
Public bug reported:

This was observed in the Juno release.

Because targeted evacuations do not go through the scheduler for policy-based decision making, a VM could be evacuated to a host that would violate the policy of the server group it belongs to.

If a VM belongs to a server group, the group policy will need to be checked in the compute manager at the time of evacuation to ensure that:

1. VMs in a server group with an affinity rule can't be evacuated.
2. VMs in a server group with an anti-affinity rule don't move to a host that would violate the rule.

This is related to Bug #1399815, where the same issue is seen with migration.

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1400860

Title:
  server group policy not honored for targeted evacuations

Status in OpenStack Compute (Nova):
  New
[Yahoo-eng-team] [Bug 1399815] [NEW] server group policy not honored for targeted migrations
Public bug reported:

This was observed in the Juno release.

Because targeted live and cold migrations do not go through the scheduler for policy-based decision making, a VM could be migrated to a host that would violate the policy of the server group it belongs to.

If a VM belongs to a server group, the group policy will need to be checked in the compute manager at the time of migration to ensure that:

1. VMs in a server group with an affinity rule can't be migrated.
2. VMs in a server group with an anti-affinity rule don't move to a host that would violate the rule.

** Affects: nova
     Importance: Undecided
     Assignee: Jennifer Mulsow (jmulsow)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => Jennifer Mulsow (jmulsow)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1399815

Title:
  server group policy not honored for targeted migrations

Status in OpenStack Compute (Nova):
  New
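The check the reporter describes for targeted migrations (and for evacuations in the related Bug #1400860) can be sketched as follows. The names Group, PolicyViolation and check_migration_target are illustrative stand-ins, not nova's actual API:

```python
from collections import namedtuple

# Illustrative stand-in for a nova server group (an assumption).
Group = namedtuple("Group", ["policy", "hosts"])

class PolicyViolation(Exception):
    """Raised when a targeted move would break the group policy."""

def check_migration_target(group, instance_host, dest_host):
    """Reject a targeted migration/evacuation that violates policy."""
    if group is None:
        return  # instance is not in a server group
    if group.policy == "affinity":
        # Rule 1 from the report: members of an affinity group
        # cannot move off the host the group is pinned to.
        if dest_host != instance_host:
            raise PolicyViolation("affinity group members cannot move")
    elif group.policy == "anti-affinity":
        # Rule 2: the target host must not already hold a member.
        if dest_host in group.hosts:
            raise PolicyViolation("host already has a group member")
```

The compute manager would run this at migration/evacuation time, since the scheduler is bypassed for targeted operations.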
[Yahoo-eng-team] [Bug 1376933] [NEW] _poll_unconfirmed_resize timing window causes instance to stay in verify_resize state forever
Public bug reported:

The _poll_unconfirmed_resizes periodic task can run while nova/compute/manager.py:ComputeManager._finish_resize() is between updating the migration record in the database and updating the instance. When that happens:

2014-09-30 16:15:00.897 112868 INFO nova.compute.manager [-] Automatically confirming migration 207 for instance 799f9246-bc05-4ae8-8737-4f358240f586
2014-09-30 16:15:01.109 112868 WARNING nova.compute.manager [-] [instance: 799f9246-bc05-4ae8-8737-4f358240f586] Setting migration 207 to error: In states stopped/resize_finish, not RESIZED/None

_poll_unconfirmed_resizes sees that the VM task_state is still 'resize_finish' instead of None and sets the migration record to error state, which in turn causes the VM to be stuck in resizing forever.

Two fixes have been proposed for this issue so far but were reverted because they caused other race conditions. See the following two bugs for more details:

https://bugs.launchpad.net/nova/+bug/1321298
https://bugs.launchpad.net/nova/+bug/1326778

This timing issue still exists in Juno today in an environment with periodic tasks set to run once every 60 seconds and with a resize_confirm_window of 1 second.

Would a possible solution for this be to change the code in _poll_unconfirmed_resizes() to ignore any VMs with a task_state of 'resize_finish' instead of setting the corresponding migration record to error? This is the task_state the instance should have right before it is changed to None in finish_resize(). Then the next time _poll_unconfirmed_resizes() is called, the migration record will still be fetched and the VM will be checked again with the updated vm_state/task_state.

That is, add the following in _poll_unconfirmed_resizes():

    # This removes a race condition
    if task_state == 'resize_finish':
        continue

prior to:

    elif vm_state != vm_states.RESIZED or task_state is not None:
        reason = (_("In states %(vm_state)s/%(task_state)s, not "
                    "RESIZED/None") % {'vm_state': vm_state,
                                       'task_state': task_state})
        _set_migration_to_error(migration, reason, instance=instance)
        continue

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1376933

Title:
  _poll_unconfirmed_resize timing window causes instance to stay in
  verify_resize state forever

Status in OpenStack Compute (Nova):
  New
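The decision the proposed change makes for each unconfirmed migration can be sketched in isolation. classify() and the plain state strings are simplifications of the real loop, kept only to show the ordering of the checks:

```python
RESIZED = "resized"  # stand-in for nova.compute.vm_states.RESIZED

def classify(vm_state, task_state):
    """Return the action _poll_unconfirmed_resizes would take for
    one migration record under the proposed change."""
    if task_state == "resize_finish":
        # Proposed fix: _finish_resize() is still running, so skip
        # this record; the next periodic run re-checks it with the
        # updated vm_state/task_state.
        return "skip"
    if vm_state != RESIZED or task_state is not None:
        # Genuinely inconsistent states: error out as before.
        return "error"
    return "confirm"  # auto-confirm the resize
```

The key point is that the 'resize_finish' test runs before the error branch, so the racing window no longer flags the migration as broken.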
[Yahoo-eng-team] [Bug 1374158] [NEW] Typo in call to LibvirtConfigObject's parse_dom() method
Public bug reported:

In Juno in nova/virt/libvirt/config.py, LibvirtConfigGuestCPUNUMA.parse_dom() calls super with a capital 'D' in parse_dom():

    super(LibvirtConfigGuestCPUNUMA, self).parse_Dom(xmldoc)

LibvirtConfigObject does not have a 'parse_Dom()' method. It has a 'parse_dom()' method. This causes the following exception to be raised:

...
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line 1733, in parse_dom
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack     obj.parse_dom(c)
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line 542, in parse_dom
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack     numa.parse_dom(child)
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/config.py", line 509, in parse_dom
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack     super(LibvirtConfigGuestCPUNUMA, self).parse_Dom(xmldoc)
2014-09-25 15:35:21.546 14344 TRACE nova.api.openstack AttributeError: 'super' object has no attribute 'parse_Dom'

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374158

Title:
  Typo in call to LibvirtConfigObject's parse_dom() method

Status in OpenStack Compute (Nova):
  New
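The failure mode is easy to reproduce outside nova: Python resolves the attribute on super() only when the line executes, so the misspelled name is not caught until NUMA parsing actually runs. A minimal reproduction, with Base and Child as stand-ins for LibvirtConfigObject and LibvirtConfigGuestCPUNUMA:

```python
class Base:
    """Stand-in for LibvirtConfigObject."""
    def parse_dom(self, xmldoc):
        return "parsed"

class Child(Base):
    """Stand-in for LibvirtConfigGuestCPUNUMA."""
    def parse_buggy(self, xmldoc):
        # Typo with a capital 'D': fails only when this line runs.
        return super().parse_Dom(xmldoc)

    def parse_fixed(self, xmldoc):
        # Corrected call, matching the method Base actually defines.
        return super().parse_dom(xmldoc)
```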
[Yahoo-eng-team] [Bug 1370229] [NEW] Total VCPUs could change on PowerKVM host, but change not reflected in host stats
Public bug reported:

PowerKVM hosts support the feature of split cores. If a user enables 4 subcores per core on a system with 16 CPUs, then the total VCPUs reported by virsh and libvirt's getInfo() API changes from 16 to 64. However, the hypervisor details API still shows 16 for VCPUs.

This is because the total VCPUs of a host are only collected once: the first time the libvirt driver obtains all the host stats, the value is saved off and used from that point on every time the host stats are calculated. In nova.virt.libvirt.driver.LibvirtDriver:

    def _get_vcpu_total(self):
        """Get available vcpu number of physical computer.

        :returns: the number of cpu core instances can be used.
        """
        if self._vcpu_total != 0:
            return self._vcpu_total
        try:
            total_pcpus = self._conn.getInfo()[2]

This should be changed to always fetch the total VCPUs, instead of relying on it never changing.

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1370229

Title:
  Total VCPUs could change on PowerKVM host, but change not reflected in
  host stats

Status in OpenStack Compute (Nova):
  New
[Yahoo-eng-team] [Bug 1258275] [NEW] Migration record for resize not cleared if exception is thrown during the resize
Public bug reported:

Testing on havana.

prep_resize() calls the resource tracker's resize_claim(), which creates a migration record. This record is cleared during rt.drop_resize_claim() from confirm_resize() or revert_resize(). However, if an exception is thrown before one of these is called, or after but before they clean up the migration record, then the migration record will hang around in the database indefinitely.

This results in a WARNING being logged every 60 seconds, as part of the update_available_resource periodic task, for every resize operation that ended with the instance in ERROR state, like the following:

2013-12-04 17:49:15.247 25592 WARNING nova.compute.resource_tracker [req-75e94365-1cca-4bca-92a7-19b2c62b9551 e4857f249aec4160bfa19c12eb805a96 a42cfb9766bf41869efab25703f5ce7b] [instance: 12d2551a-6403-4100-ba57-0995594c9c93] Instance not resizing, skipping migration.

This message is logged because the resource tracker's _update_usage_from_migrations() warns if a migration record for an instance is found but the instance's current state is not a resize state. These messages will remain in the logs even after the instance's state is reset, and even after a successful resize has occurred on that instance. There is no way to clean up the old migration record at this point.

It seems like there should be some handling when an exception occurs during resize, finish_resize, confirm_resize, revert_resize, etc. that will drop the resize claim, so the claim and migration record do not persist indefinitely.

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1258275

Title:
  Migration record for resize not cleared if exception is thrown during
  the resize

Status in OpenStack Compute (Nova):
  New
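The handling the reporter asks for can be sketched as cleanup-on-failure around the resize path. ResourceTracker and prep_resize here are simplified stand-ins that mirror the resize_claim/drop_resize_claim calls named in the bug, not nova's real implementations:

```python
class ResourceTracker:
    """Simplified stand-in for nova's resource tracker."""
    def __init__(self):
        self.migrations = {}  # instance -> migration record

    def resize_claim(self, instance):
        self.migrations[instance] = {"status": "pre-migrating"}
        return self.migrations[instance]

    def drop_resize_claim(self, instance):
        self.migrations.pop(instance, None)

def prep_resize(rt, instance, do_resize):
    """Claim resources, then drop the claim if the resize fails so
    the migration record cannot linger in the database."""
    rt.resize_claim(instance)
    try:
        do_resize(instance)
    except Exception:
        # On any failure, clean up so _update_usage_from_migrations
        # does not warn about a stale record every 60 seconds.
        rt.drop_resize_claim(instance)
        raise
```

The same try/except-and-drop pattern would apply in finish_resize, confirm_resize and revert_resize, each releasing the claim on its own failure.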