[Yahoo-eng-team] [Bug 1567549] [NEW] SR-IOV VF passthrough does not properly update status of parent PF upon freeing VF
Public bug reported:

Assigning an SR-IOV VF device to an instance when PFs are whitelisted too correctly marks the PF as unavailable when one of its VFs is assigned. However, when we delete the instance, the PF is not marked as available again.

Steps to reproduce:

1) Whitelist PFs and VFs in nova.conf (as explained in the docs), for example:

   pci_passthrough_whitelist = [{"product_id":"1520", "vendor_id":"8086", "physical_network":"phynet"}, {"product_id":"1521", "vendor_id":"8086", "physical_network":"phynet"}]  # Both PFs and VFs are whitelisted

2) Add an alias to assign a VF:

   pci_alias = {"name": "vf", "device_type": "type-VF"}

3) Set up a flavor with an alias extra_spec:

   $ nova flavor-key 2 set "pci_passthrough:alias"="vf:1"

4) Boot an instance with that flavor and observe a VF being set to 'allocated' and a PF being set to 'unavailable':

   select * from pci_devices where deleted=0;

5) Delete the instance from step 4 and observe that the VF has been made 'available' again but the PF is still 'unavailable'. Both should be back to 'available' if this was the only VF in use.

** Affects: nova
   Importance: High
   Status: New

** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Medium => High

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1567549
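The expected freeing behaviour can be sketched in a few lines of Python. This is a minimal illustration only; the class and field names are assumptions, not nova's actual code:

```python
# Minimal sketch, with illustrative names: freeing a VF should also
# re-evaluate its parent PF once no sibling VF is still claimed.

class PciDevice:
    def __init__(self, address, dev_type, parent_addr=None):
        self.address = address
        self.dev_type = dev_type        # 'type-PF' or 'type-VF'
        self.parent_addr = parent_addr  # the PF's address, for VFs
        self.status = 'available'

def free_vf(vf, all_devices):
    """Free a VF and, if no sibling VF remains claimed, free the parent PF.

    The reported bug is that nova performs the first step but skips the
    parent-PF re-evaluation, leaving the PF stuck in 'unavailable'.
    """
    vf.status = 'available'
    siblings = [d for d in all_devices
                if d.dev_type == 'type-VF' and d.parent_addr == vf.parent_addr]
    if all(s.status == 'available' for s in siblings):
        for d in all_devices:
            if d.dev_type == 'type-PF' and d.address == vf.parent_addr:
                d.status = 'available'
```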
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1565785] [NEW] SR-IOV PF passthrough device claiming/allocation does not work for physical function devices
Public bug reported:

Enable PCI passthrough on a compute host (whitelisting devices is explained in more detail in the docs), and create a network, a subnet and a port that represents an SR-IOV physical function passthrough:

$ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
$ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subnet
$ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf

After that, try to boot an instance using the created port (provided the pci_passthrough_whitelist was set up correctly); this should work:

$ nova boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm

My test env has 2 PFs with 7 VFs each. After spawning an instance, the PF gets marked as allocated, but none of the VFs do, even though they are removed from the host (note that device_pools are correctly updated). So after the instance was successfully booted we get:

MariaDB [nova]> select count(*) from pci_devices where status="available" and deleted=0;
+----------+
| count(*) |
+----------+
|       15 |
+----------+

This should be 8 - we are leaking 7 VFs belonging to the attached PF that never get updated.
MariaDB [nova]> select pci_stats from compute_nodes;

| pci_stats |
| {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": [{"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 1, "numa_node": 0, "vendor_id": "8086", "product_id": "1521", "tags": {"dev_type": "type-PF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 7, "numa_node": 0, "vendor_id": "8086", "product_id": "1520", "tags": {"dev_type": "type-VF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"} |

This is correct - it shows 8 available devices.

Once a new resource_tracker run happens we hit https://bugs.launchpad.net/nova/+bug/1565721, so we stop updating based on what is found on the host.

The root cause of this is (I believe) that we update PCI objects in the local scope, but never call save() on those particular instances. So we grab and update the status here:

https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/objects/pci_device.py#L339-L349

but never call save() inside that method. save() is eventually called here, referencing completely different instances that never see the update:

https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/compute/resource_tracker.py#L646

** Affects: nova
   Importance: High
   Status: New

** Tags: pci

** Changed in: nova
   Importance: Undecided => High

** Tags added: pci
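The suspected aliasing problem can be shown with a toy example. The names here are illustrative, not nova's object code:

```python
# Two copies of the "same" device record exist: the claim path mutates
# one, while save() is later invoked on the other, so the change is lost.

class Device:
    def __init__(self, address, status):
        self.address = address
        self.status = status

    def save(self, db):
        db[self.address] = self.status

db = {'0000:81:00.2': 'available'}

claimed_copy = Device('0000:81:00.2', 'available')  # held by the claim code
tracker_copy = Device('0000:81:00.2', 'available')  # held by the resource tracker

claimed_copy.status = 'allocated'  # the status update happens here...
tracker_copy.save(db)              # ...but save() runs on the other copy

# db['0000:81:00.2'] is still 'available' - the allocation never hit the DB.
```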
https://bugs.launchpad.net/bugs/1565785
[Yahoo-eng-team] [Bug 1565721] [NEW] SR-IOV PF passthrough breaks resource tracking
Public bug reported:

Enable PCI passthrough on a compute host (whitelisting devices is explained in more detail in the docs), and create a network, a subnet and a port that represents an SR-IOV physical function passthrough:

$ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
$ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subnet
$ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf

After that, try to boot an instance using the created port (provided the pci_passthrough_whitelist was set up correctly); this should work:

$ nova boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm

However, the next resource tracker run fails with:

2016-04-04 11:25:34.663 ERROR nova.compute.manager [req-d8095318-9710-48a8-a054-4581641c3bf3 None None] Error updating resources for node kilmainham-ghost.
2016-04-04 11:25:34.663 TRACE nova.compute.manager Traceback (most recent call last):
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 6442, in update_available_resource_for_node
2016-04-04 11:25:34.663 TRACE nova.compute.manager     rt.update_available_resource(context)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 458, in update_available_resource
2016-04-04 11:25:34.663 TRACE nova.compute.manager     self._update_available_resource(context, resources)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-04-04 11:25:34.663 TRACE nova.compute.manager     return f(*args, **kwargs)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 493, in _update_available_resource
2016-04-04 11:25:34.663 TRACE nova.compute.manager     self.pci_tracker.update_devices_from_hypervisor_resources(dev_json)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/pci/manager.py", line 118, in update_devices_from_hypervisor_resources
2016-04-04 11:25:34.663 TRACE nova.compute.manager     self._set_hvdevs(devices)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/pci/manager.py", line 141, in _set_hvdevs
2016-04-04 11:25:34.663 TRACE nova.compute.manager     self.stats.remove_device(existed)
2016-04-04 11:25:34.663 TRACE nova.compute.manager   File "/opt/stack/nova/nova/pci/stats.py", line 138, in remove_device
2016-04-04 11:25:34.663 TRACE nova.compute.manager     pool['devices'].remove(dev)
2016-04-04 11:25:34.663 TRACE nova.compute.manager ValueError: list.remove(x): x not in list

This basically kills the RT periodic run, meaning no further resources get updated by the periodic task.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1565721
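The crash itself boils down to Python's list.remove() raising when the element is absent. A sketch (not nova's code) of the failure and a defensive guard:

```python
# A device dict that is no longer (or never was) in the pool's list:
pool = {'devices': [{'address': '0000:81:00.1'}]}
stale_dev = {'address': '0000:81:00.2'}

# Unguarded removal is what aborts the whole periodic task; catching the
# ValueError keeps one inconsistent device from killing the entire RT run.
try:
    pool['devices'].remove(stale_dev)
    guarded = False
except ValueError:
    # Log and continue here so resource updates keep flowing.
    guarded = True
```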
[Yahoo-eng-team] [Bug 1563874] [NEW] libvirt: Snapshot and resume won't work for instances with some SR-IOV ports
Public bug reported:

The libvirt driver methods that are used to determine whether a port is an SR-IOV port do not check properly for all possible SR-IOV port types:

https://github.com/openstack/nova/blob/f15d9a9693b19393fcde84cf4bc6f044d39ffdca/nova/virt/libvirt/driver.py#L3378

should be checking for VNIC_TYPES_SRIOV instead. This affects the snapshot and suspend/resume functionality provided by the libvirt driver for instances using non-direct flavors of SR-IOV.

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: libvirt pci

** Tags added: libvirt pci

https://bugs.launchpad.net/bugs/1563874
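The fix described above amounts to a set-membership check rather than a single-value comparison. A sketch; only the constant name VNIC_TYPES_SRIOV comes from the report, and its contents here are an assumption for illustration:

```python
VNIC_TYPE_DIRECT = 'direct'
# Assumed contents for illustration; the report only names the constant.
VNIC_TYPES_SRIOV = ('direct', 'macvtap', 'direct-physical')

def is_sriov_port_buggy(vif):
    # Checking a single type misses the other SR-IOV flavors.
    return vif['vnic_type'] == VNIC_TYPE_DIRECT

def is_sriov_port_fixed(vif):
    # Membership in the full set catches them all.
    return vif['vnic_type'] in VNIC_TYPES_SRIOV
```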
[Yahoo-eng-team] [Bug 1543149] Re: Reserve host pages on compute nodes
** Changed in: nova
   Status: Fix Released => Confirmed

https://bugs.launchpad.net/bugs/1543149

Title: Reserve host pages on compute nodes
Status in OpenStack Compute (nova): Confirmed

Bug description:
In some use cases we may want to prevent Nova from using some amount of hugepages on compute nodes (for example when using ovs-dpdk). We should provide an option 'reserved_memory_pages' which provides a way to specify the number of pages we want to reserve for third-party components.
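The requested accounting is simple to sketch. The option name comes from the report; the function itself is illustrative, not nova's code:

```python
def available_hugepages(total_pages, used_pages, reserved_memory_pages):
    """Pages nova may still hand out after holding some back for
    third-party components such as ovs-dpdk."""
    return max(total_pages - used_pages - reserved_memory_pages, 0)
```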
[Yahoo-eng-team] [Bug 1543562] [NEW] mitaka pci_request object needs a migration script for an online data migration
Public bug reported:

The following change adds an online data migration to the PciDevice object: https://review.openstack.org/#/c/249015/ (50355c45)

When we do that, we normally want to couple it with a script that allows operators to run the migration code even for rows that do not get accessed and saved during normal operation, as we normally drop any compatibility code in the release following the change. This is normally done using a nova-manage script, an example of which can be seen in the following commit: https://review.openstack.org/#/c/135067/

The above patch did not add such a script, and so does not provide admins with any tools to make sure their data is updated for the N release, where we expect the data to have been migrated as per our current upgrade policy (http://docs.openstack.org/developer/nova/upgrade.html#migration-policy).

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1543562
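Such a helper typically follows a found/done batch pattern. A hedged sketch under assumed names ('parent_addr' is illustrative, not the actual migration):

```python
def online_migrate(rows, max_count):
    """Migrate up to max_count rows that still need the new field.

    Returns (found, done), mirroring the convention where operators rerun
    the command until both counters reach zero.
    """
    found = done = 0
    for row in rows:
        if 'parent_addr' not in row:       # row not yet migrated
            found += 1
            if done < max_count:
                row['parent_addr'] = None  # apply the object-layer default
                done += 1
    return found, done
```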
[Yahoo-eng-team] [Bug 1370207] Re: race condition between nova scheduler and nova compute
This seems to be by design, i.e. the scheduler can get out of sync, and we have the claim-and-retry mechanism in place, so the request for vm3 would fail and trigger a reschedule.

** Changed in: nova
   Status: Confirmed => Invalid

https://bugs.launchpad.net/bugs/1370207

Title: race condition between nova scheduler and nova compute
Status in OpenStack Compute (nova): Invalid

Bug description:
This is for nova 2014.1.2. Here, the nova DB is the shared resource between nova-scheduler and nova-compute. Nova-scheduler checks the DB to see if an hv node can meet the provision requirement; nova-compute is the actual process that modifies the DB to reduce free_ram_mb. For example, the current available RAM on the hv is 56G, with ram_allocation_ratio=1.0. Within a minute, 3 vm provision requests come to the scheduler, each asking for 24G RAM.

t1: scheduler gets a request for vm1, assigns vm1 to hv
t2: scheduler gets a request for vm2, assigns vm2 to hv
t3: vm1 is created, nova-compute updates nova DB with RAM=32G
t4: scheduler gets a request for vm3, assigns vm3 to hv
t5: vm2 is created, nova-compute updates nova DB with RAM=8G
t6: vm3 is created, nova-compute updates nova DB with RAM=-16G

In the end, we have negative RAM with ram_allocation_ratio=1.0.
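The claim-and-retry idea the triager refers to can be sketched as follows (illustrative, not nova's code): the scheduler's view may be stale, so the compute host re-checks ("claims") resources under its own lock, and a failed claim triggers a reschedule.

```python
class ComputeResourcesUnavailable(Exception):
    """Raised when a claim fails; the request is then rescheduled."""

class Host:
    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb):
        # Re-check against the host's own, up-to-date view of resources.
        if ram_mb > self.free_ram_mb:
            raise ComputeResourcesUnavailable()
        self.free_ram_mb -= ram_mb
```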
[Yahoo-eng-team] [Bug 1519878] Re: numatopology filter incorrectly returns no resources
Yes, as discussed - that is to be expected. Closing the bug for now. Feel free to reopen if you feel it needs more looking into.

** Changed in: nova
   Status: Incomplete => Invalid

https://bugs.launchpad.net/bugs/1519878

Title: numatopology filter incorrectly returns no resources
Status in OpenStack Compute (nova): Invalid

Bug description:
When launching a new instance, in some cases the NUMATopology filter does not return available compute nodes, even though according to the content of numa_topology in the compute_nodes table there are sufficient resources to satisfy the requirements. I started three instances; the attached log shows the changes in numa_topology. When I try to start a 4th instance requesting 4 vCPUs, and according to numa_topology I have 8 vCPUs left, the NUMATopology filter incorrectly returns 0 hosts. If I delete the existing instances, I can launch the failed one without any modification.

rpm -qa | grep nova
openstack-nova-conductor-12.0.0-1.el7.noarch
python-novaclient-2.30.1-1.el7.noarch
openstack-nova-console-12.0.0-1.el7.noarch
openstack-nova-common-12.0.0-1.el7.noarch
openstack-nova-scheduler-12.0.0-1.el7.noarch
openstack-nova-compute-12.0.0-1.el7.noarch
python-nova-12.0.0-1.el7.noarch
openstack-nova-novncproxy-12.0.0-1.el7.noarch
openstack-nova-api-12.0.0-1.el7.noarch
openstack-nova-cert-12.0.0-1.el7.noarch
[Yahoo-eng-team] [Bug 1517442] [NEW] libvirt/xenapi: disk_available_least reported by the driver does not take into account instances being migrated to/from the host
Public bug reported:

Looking briefly at the code of other drivers that try to report this (xenapi and ironic), it is likely broken for at least xenapi as well.

The crux of the issue is that the resource tracker works by looking at the instances Nova knows about, as well as ongoing migrations, so anything reported by any of the virt drivers as part of the dictionary returned from get_available_resource should be based only on the available resources and should never try to factor in any resource usage. Only the resource tracker holding the global resource lock (COMPUTE_RESOURCE_SEMAPHORE) knows the current usage of resources, since it can take into account migrations that are in flight, etc.

Unfortunately, both libvirt and xenapi (I think) try to look at the instances currently known by the hypervisor - which is not the full set of instances we should be taking into account - to deduce the final disk_available_least number.

To fix this we would have to rework how disk_available_least is calculated: we'd have to make sure the drivers only report the total available space, and then make sure we update the usage _for each instance and migration_ to come up with the final number.

** Affects: nova
   Importance: High
   Status: New

** Tags: libvirt resource-tracker xen

https://bugs.launchpad.net/bugs/1517442
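The proposed split of responsibilities can be sketched as follows (illustrative functions, not the actual driver/tracker interface):

```python
def driver_get_available_resource(total_disk_gb):
    # The driver reports only totals - never usage.
    return {'local_gb': total_disk_gb}

def tracker_disk_available_least(resources, instance_disks_gb, migration_disks_gb):
    # Only the tracker (under the global lock) subtracts per-instance and
    # per-migration usage, including migrations still in flight.
    used = sum(instance_disks_gb) + sum(migration_disks_gb)
    return resources['local_gb'] - used
```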
[Yahoo-eng-team] [Bug 1501358] [NEW] cpu pinning on the host that has siblings does not work properly in some cases (instance with odd CPUs)
Public bug reported:

Calculating CPU pinning for an instance on a host with hyperthreading fails in certain cases, most notably when the instance has an odd number of CPUs. Due to a bug in the logic, we might either fail to pin entirely or end up avoiding siblings by accident, even though the default policy should be to prefer them (which is what happens when we have an even number of CPUs).

Consider a host with CPUs [(0, 3), (1, 4), (2, 5)] (brackets denote thread siblings). An instance with 5 CPUs would fail to get fitted onto this host, even though it is clearly possible to fit that instance there.

Another unexpected result happens when we have a host [[0, 8], [1, 9], [2, 10], [3, 11], [4, 12], [5, 13], [6, 14], [7, 15]] and a 5 CPU instance. In this case the instance would get pinned as follows (instance cpu -> host cpu): [(0 -> 0), (1 -> 1), (2 -> 2), (3 -> 3), (4 -> 4)], which is wrong since the default policy is to prefer sibling CPUs (as it does for instances with an even number of CPUs).

After inspecting the fitting logic code:

https://github.com/openstack/nova/blob/b0013d93ffeaed53bc28d9558def26bdb7041ed7/nova/virt/hardware.py#L653

I also noticed that we consult the existing topology of the instance NUMA cell when deciding on the proper way to fit instance CPUs onto the host. This is actually wrong after https://review.openstack.org/#/c/198312/ - we no longer need to consider the requested topology in the CPU fitting, as the code that decides on the final CPU topology takes all of this into account.

** Affects: nova
   Importance: Medium
   Status: New

** Tags: numa
https://bugs.launchpad.net/bugs/1501358
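A sibling-first placement that also handles odd CPU counts can be sketched as follows (an illustration of the desired behaviour, not nova's algorithm):

```python
def pin_cpus(n_cpus, sibling_pairs):
    """Pick host CPUs for n_cpus instance CPUs, consuming whole sibling
    pairs first so threads of a core are preferred together; an odd
    count simply takes one leftover thread from the next pair."""
    picked = []
    for pair in sibling_pairs:
        for cpu in pair:
            if len(picked) == n_cpus:
                return picked
            picked.append(cpu)
    return picked if len(picked) == n_cpus else None  # None: host can't fit it
```

With the first host from the report, pin_cpus(5, [(0, 3), (1, 4), (2, 5)]) finds a placement where the buggy logic failed, and on the 8-core host the 5-CPU instance lands on sibling pairs ([0, 8, 1, 9, 2]) rather than five distinct cores.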
[Yahoo-eng-team] [Bug 1499449] [NEW] libvirt live-migration: Monitoring task does not track progress watermark correctly
Public bug reported:

It is possible for libvirt to report libvirt.VIR_DOMAIN_JOB_UNBOUNDED in _live_migration_monitor (https://github.com/openstack/nova/blob/ccea5d6b0ace535b375d3e63bd572885cb5dbc91/nova/virt/libvirt/driver.py#L5823) but return 0 for data_remaining, which in turn makes our progress watermark 0 - lower than any value it is likely to reach during the migration, and therefore useless to report. We should not zero out the progress_watermark variable in that method.

** Affects: nova
   Importance: Low
   Status: New

** Tags: libvirt live-migration low-hanging-fruit

** Changed in: nova
   Importance: Undecided => Low

** Tags added: libvirt live-migration low-hanging-fruit

https://bugs.launchpad.net/bugs/1499449
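The fix amounts to treating a zero data_remaining as a non-reading. A sketch (illustrative, not the driver code):

```python
def update_watermark(progress_watermark, data_remaining):
    """Track the lowest amount of remaining data seen so far, ignoring
    the spurious 0 libvirt can report alongside VIR_DOMAIN_JOB_UNBOUNDED."""
    if data_remaining == 0:
        return progress_watermark  # bogus sample - keep the previous mark
    if progress_watermark is None:
        return data_remaining
    return min(progress_watermark, data_remaining)
```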
[Yahoo-eng-team] [Bug 1499028] [NEW] Rebuild would not apply the migration context before calling the driver
Public bug reported:

The patch https://review.openstack.org/#/c/200485/ makes rebuild use the migration context added earlier in Liberty for proper resource tracking when doing a rebuild/evacuate. Sadly, the above patch missed that we need to set the proper data from the context when calling the driver methods, so that the stashed migration context data is applied when rebuilding.

HEAD at 568be05

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: liberty-rc-potential

** Tags added: liberty-rc-potential

https://bugs.launchpad.net/bugs/1499028
[Yahoo-eng-team] [Bug 1496135] Re: libvirt live-migration will not honor destination vcpu_pin_set config
** Changed in: nova Status: New => Invalid

https://bugs.launchpad.net/bugs/1496135
Title: libvirt live-migration will not honor destination vcpu_pin_set config
Status in OpenStack Compute (nova): Invalid

Bug description: Reporting this based on code inspection of the current master (commit: 9f61d1eb642785734f19b5b23365f80f033c3d9a) When we attempt to live-migrate an instance onto a host that has a different vcpu_pin_set than the one on the source host, we may either break the policy set by the destination host or fail (as we will not recalculate the vcpu cpuset attribute to match that of the destination host, so we may end up with an invalid range). The first solution that jumps out is to make sure the XML is updated in https://github.com/openstack/nova/blob/6d68462c4f20a0b93a04828cb829e86b7680d8a4/nova/virt/libvirt/driver.py#L5422 However that would mean passing over the requested info from the destination host.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1496135/+subscriptions
[Yahoo-eng-team] [Bug 1498126] [NEW] Inconsistencies with resource tracking in the case of resize operation.
Public bug reported: All of these are being reported upon code inspection - I have yet to confirm them, as they are edge cases and subtle race conditions:

* We update the instance.host field to the value of the destination_node in resize_migration, which runs on the source host (https://github.com/openstack/nova/blob/1df8248b6ad7982174c417abf80070107eac8909/nova/compute/manager.py#L3750). This means that between that DB write and the point where we change the flavor and apply the migration context (which happens in finish_resize, run on the destination host), all resource tracking runs on the destination host will be wrong (they will use the instance record and thus the wrong flavor).

* There is very similar raciness in the revert_resize path, as described in the following comment (https://github.com/openstack/nova/blob/1df8248b6ad7982174c417abf80070107eac8909/nova/compute/manager.py#L3448) - we should fix that too.

* The drop_move_claim method makes sense only when called on the source node, so its name should reflect that. It is really an optimization where we free the resources sooner than the next RT pass, which will no longer see the migration as in progress. This should be documented better.

* drop_move_claim looks up the new_flavor to compare it with the flavor that was used to track the migration, but on the source node that is certain to be the old_flavor. Thus, as it stands now, drop_move_claim (only run on source nodes) doesn't do anything. Not a big deal, but we should probably fix it.

** Affects: nova Importance: Undecided Status: New

https://bugs.launchpad.net/bugs/1498126
Title: Inconsistencies with resource tracking in the case of resize operation
Status in OpenStack Compute (nova): New

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1498126/+subscriptions
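The last bullet can be demonstrated with a toy model (names here are illustrative stand-ins, not nova's actual resource tracker): because the claim on the source node was made with old_flavor, comparing against new_flavor never matches and nothing is freed.

```python
class Migration:
    """Minimal stand-in for a tracked migration claim."""
    def __init__(self, claimed_flavor_id):
        self.claimed_flavor_id = claimed_flavor_id

def drop_move_claim(tracked_migrations, instance_flavor_id):
    """Free claims whose flavor matches; return how many were dropped."""
    before = len(tracked_migrations)
    tracked_migrations[:] = [m for m in tracked_migrations
                             if m.claimed_flavor_id != instance_flavor_id]
    return before - len(tracked_migrations)

OLD_FLAVOR, NEW_FLAVOR = 1, 2
tracked = [Migration(claimed_flavor_id=OLD_FLAVOR)]

# Buggy behaviour: comparing against new_flavor frees nothing...
assert drop_move_claim(tracked, NEW_FLAVOR) == 0
# ...while comparing against old_flavor (what was actually claimed
# on the source node) frees the claim.
assert drop_move_claim(tracked, OLD_FLAVOR) == 1
```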
[Yahoo-eng-team] [Bug 1489442] Re: Invalid order of volumes with adding a volume in boot operation
Moving this to Invalid - but please feel free to move back if you disagree.

** Changed in: nova Status: In Progress => Invalid

https://bugs.launchpad.net/bugs/1489442
Title: Invalid order of volumes with adding a volume in boot operation
Status in OpenStack Compute (nova): Invalid

Bug description: If an image has several volumes in its bdm and a user adds one more volume for the boot operation, the new volume is not simply appended to the volume list but becomes the second device. This can lead to problems if the image root device contains software whose settings point to the other volumes. For example:
1. The image is a snapshot of a volume-backed instance which had vda and vdb volumes.
2. The instance had an SQL server which used both vda and vdb for its database.
3. If a user runs a new instance from the image, either the device names are restored (with xen), or they are reassigned (libvirt) to the same names, because the order of devices passed to libvirt is the same as it was for the original instance.
4. If a user runs a new instance adding a new volume, the volume list becomes vda, new, vdb.
5. In this case libvirt reassigns device names to vda=vda, new=vdb, vdb=vdc.
6. As a result, the SQL server will not find its data on vdb.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1489442/+subscriptions
[Yahoo-eng-team] [Bug 1496135] [NEW] libvirt live-migration will not honor destination vcpu_pin_set config
Public bug reported: Reporting this based on code inspection of the current master (commit: 9f61d1eb642785734f19b5b23365f80f033c3d9a) When we attempt to live-migrate an instance onto a host that has a different vcpu_pin_set than the one on the source host, we may either break the policy set by the destination host or fail (as we will not recalculate the vcpu cpuset attribute to match that of the destination host, so we may end up with an invalid range). The first solution that jumps out is to make sure the XML is updated in https://github.com/openstack/nova/blob/6d68462c4f20a0b93a04828cb829e86b7680d8a4/nova/virt/libvirt/driver.py#L5422 However that would mean passing over the requested info from the destination host.

** Affects: nova Importance: Medium Status: New
** Tags: libvirt

https://bugs.launchpad.net/bugs/1496135
Title: libvirt live-migration will not honor destination vcpu_pin_set config
Status in OpenStack Compute (nova): New
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1496135/+subscriptions
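The fix direction mentioned in the description - updating the guest XML so its cpuset matches the destination's vcpu_pin_set - could look roughly like this (apply_dest_cpuset is a hypothetical helper for illustration, not nova's actual API):

```python
import xml.etree.ElementTree as ET

def apply_dest_cpuset(domain_xml, dest_cpuset):
    """Return domain XML with the <vcpu> cpuset attribute rewritten to
    the destination host's allowed range (a sketch of the idea only)."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find('vcpu')
    if vcpu is not None and vcpu.get('cpuset') is not None:
        vcpu.set('cpuset', dest_cpuset)
    return ET.tostring(root, encoding='unicode')

# Source host pinned the guest to pCPUs 0-3; destination allows 4-7.
src_xml = '<domain type="kvm"><vcpu cpuset="0-3">2</vcpu></domain>'
print(apply_dest_cpuset(src_xml, '4-7'))
```

As the report notes, the hard part is not the XML edit itself but getting the destination host's vcpu_pin_set over to where the migration XML is built.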
[Yahoo-eng-team] [Bug 1370250] Re: Can not set volume attributes at instance launch by EC2 API
Not sure why this was moved to "Won't Fix" - the fix is up and has a +2. Moving back.

** Changed in: nova Status: Won't Fix => In Progress

https://bugs.launchpad.net/bugs/1370250
Title: Can not set volume attributes at instance launch by EC2 API
Status in ec2-api: Confirmed
Status in OpenStack Compute (nova): In Progress

Bug description: AWS allows changing block device attributes (such as volume size, delete-on-termination behavior, existence) at instance launch. For example, image xxx has devices:
vda, size 10, delete on termination
vdb, size 100, delete on termination
vdc, size 100, delete on termination
We can run an instance by
euca-run-instances ... xxx -b /dev/vda=:20 -b /dev/vdb=::false -b /dev/vdc=none
to get the instance with devices:
vda, size 20, delete on termination
vdb, size 100, not delete on termination
For Nova we currently get:
$ euca-run-instances --instance-type m1.nano -b /dev/vda=::true ami-000a
euca-run-instances: error (InvalidBDMFormat): Block Device Mapping is Invalid: Unrecognized legacy format.

To manage notifications about this bug go to: https://bugs.launchpad.net/ec2-api/+bug/1370250/+subscriptions
[Yahoo-eng-team] [Bug 1475831] Re: injected_file_content_bytes should be changed to injected-file-size
I think Alex was saying that this needs to be fixed in the openstack-client, not Nova client. Nova client does the right thing for what the server expects; it's the unified client that gets it wrong.

** Also affects: python-openstackclient Importance: Undecided Status: New
** Changed in: nova Status: In Progress => Invalid

https://bugs.launchpad.net/bugs/1475831
Title: injected_file_content_bytes should be changed to injected-file-size
Status in OpenStack Compute (nova): Invalid
Status in python-openstackclient: New

Bug description: In nova and novaclient, injected_file_content_bytes should be changed to injected_file_size. Because:
(1) nova/quota.py and nova/compute/api.py - use 'grep -r injected_file_content_bytes' to see the usages.
(2) novaclient/v2/shell.py:
_quota_resources = ['instances', 'cores', 'ram',
                    'floating_ips', 'fixed_ips', 'metadata_items',
                    'injected_files', 'injected_file_content_bytes',
                    'injected_file_path_bytes', 'key_pairs',
                    'security_groups', 'security_group_rules',
                    'server_groups', 'server_group_members']
(3) python-openstackclient/openstackclient/common/quota.py:
COMPUTE_QUOTAS = {
    'cores': 'cores',
    'fixed_ips': 'fixed-ips',
    'floating_ips': 'floating-ips',
    'injected_file_content_bytes': 'injected-file-size',
    'injected_file_path_bytes': 'injected-path-size',
    'injected_files': 'injected-files',
    'instances': 'instances',
    'key_pairs': 'key-pairs',
    'metadata_items': 'properties',
    'ram': 'ram',
    'security_group_rules': 'secgroup-rules',
    'security_groups': 'secgroups',
}
(4) http://docs.openstack.org/developer/python-openstackclient/command-objects/quota.html
os quota set # Compute settings [--cores ] [--fixed-ips ] [--floating-ips ] [--injected-file-size ] [--injected-files ] [--instances ] [--key-pairs ] [--properties ] [--ram ] # Volume settings [--gigabytes ] [--snapshots ] [--volumes ] [--volume-type ]
So when you use:
$ openstack quota set --injected-file-size 11 testproject_dx
No quotas updated
If this bug is solved, plus the fix to https://bugs.launchpad.net/keystone/+bug/1420104, both can be solved. So the bug is related to nova and novaclient.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1475831/+subscriptions
[Yahoo-eng-team] [Bug 1275675] Re: Version change in ObjectField does not work with back-levelling
** Also affects: oslo.versionedobjects Importance: Undecided Status: New

https://bugs.launchpad.net/bugs/1275675
Title: Version change in ObjectField does not work with back-levelling
Status in OpenStack Compute (nova): In Progress
Status in oslo.versionedobjects: New

Bug description: When a NovaObject primitive is deserialized, the object version is checked and an IncompatibleObjectVersion exception is raised if the serialized primitive is labelled with a version that is not known locally. The exception indicates what version is known locally, and the deserialization attempts to backport the primitive to the local version. If a NovaObject A has an ObjectField b containing NovaObject B, and it is B that has the incompatible version, the version number in the exception will be the locally supported version for B. The deserialization will then attempt to backport the primitive of object A to the locally supported version number for object B.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1275675/+subscriptions
[Yahoo-eng-team] [Bug 1475254] [NEW] NovaObjectSerializer cannot handle backporting a nested object
Public bug reported: NovaObjectSerializer will call obj_from_primitive and tries to guard against IncompatibleObjectVersion, in which case it will ask the conductor to backport the object to the highest version it knows about. See: https://github.com/openstack/nova/blob/35375133398d862a61334783c1e7a90b95f34cdb/nova/objects/base.py#L634 The problem is that if the top-level object can be deserialized but one of the nested objects throws an IncompatibleObjectVersion, then - due to the way we handle all exceptions from the recursion at the top level - the conductor gets asked to backport the top-level object to the nested object's latest known version, which is completely wrong! https://github.com/openstack/nova/blob/35375133398d862a61334783c1e7a90b95f34cdb/nova/objects/base.py#L643 This happened in our case when trying to fix https://bugs.launchpad.net/nova/+bug/1474074 and running upgrade tests with unpatched Kilo code - we bumped the PciDeviceList version on master and need to do it on Kilo too, but the stable/kilo patch cannot land first, so the highest PciDeviceList version the Kilo node knows about is 1.1. However we end up asking the conductor to backport the Instance to 1.1, which drops a whole bunch of things we need and then causes a lazy-loading exception (copied from the gate logs of https://review.openstack.org/#/c/201280/ PS 6): 2015-07-15 16:55:15.377 ERROR nova.compute.manager [req-fb91e079-1eef-4768-b315-9233c6b9946d tempest-ServerAddressesTestJSON-1642250859 tempest-ServerAddressesTestJSON-713705678] [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] Instance failed to spawn 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] Traceback (most recent call last): 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/compute/manager.py", line 2461, in _build_resources 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 
25387a96-e47f-47f1-8e3c-3716072c9c23] yield resources 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/compute/manager.py", line 2333, in _build_and_run_instance 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] block_device_info=block_device_info) 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 2378, in spawn 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] write_to_disk=True) 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 4179, in _get_guest_xml 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] context) 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 3989, in _get_guest_config 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] pci_devs = pci_manager.get_instance_pci_devs(instance, 'all') 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/pci/manager.py", line 279, in get_instance_pci_devs 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] pci_devices = inst.pci_devices 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/objects/base.py", line 72, in getter 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] self.obj_load_attr(name) 
2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/objects/instance.py", line 1018, in obj_load_attr 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] self._load_generic(attrname) 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] File "/opt/stack/old/nova/nova/objects/instance.py", line 908, in _load_generic 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] reason='loading %s requires recursion' % attrname) 2015-07-15 16:55:15.377 21515 TRACE nova.compute.manager [instance: 25387a96-e47f-47f1-8e3c-3716072c9c23] ObjectActionError: Object action obj_load_attr failed because: loading pci_devices requires recursion 2015-07-15 16:55:15.377 21515
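A stripped-down model of the failure mode (class names here are simplified stand-ins for the NovaObject machinery, not the real classes): the exception raised while deserializing a nested object carries the nested type's supported version, but the naive top-level handler would request a backport of the top-level object to that version.

```python
class IncompatibleObjectVersion(Exception):
    """Carries which object type was incompatible and the version the
    local node supports for that type."""
    def __init__(self, objname, supported):
        super().__init__(objname, supported)
        self.objname = objname
        self.supported = supported

def deserialize(primitive, local_versions):
    """Recursively check versions, mimicking obj_from_primitive."""
    name, version = primitive['name'], primitive['version']
    if version not in local_versions[name]:
        raise IncompatibleObjectVersion(name, max(local_versions[name]))
    for child in primitive.get('children', []):
        deserialize(child, local_versions)
    return primitive

# Local node knows Instance 2.0 but only PciDeviceList 1.1;
# the wire primitive nests a PciDeviceList 1.2.
local = {'Instance': {'2.0'}, 'PciDeviceList': {'1.1'}}
wire = {'name': 'Instance', 'version': '2.0',
        'children': [{'name': 'PciDeviceList', 'version': '1.2'}]}

try:
    deserialize(wire, local)
except IncompatibleObjectVersion as e:
    # The naive top-level handler would now ask conductor to backport
    # the TOP-LEVEL object to e.supported -- i.e. Instance to 1.1 --
    # even though 1.1 is PciDeviceList's version, not Instance's.
    print('would backport %s to %s' % (wire['name'], e.supported))
```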
[Yahoo-eng-team] [Bug 1474074] [NEW] PciDeviceList is not versioned properly in liberty and kilo
Public bug reported: The following commit: https://review.openstack.org/#/c/140289/4/nova/objects/pci_device.py failed to bump the PciDeviceList version. We should do it now (master @ 4bfb094) and backport this to stable Kilo as well.

** Affects: nova Importance: High Status: Confirmed
** Changed in: nova Status: New => Confirmed
** Changed in: nova Importance: Undecided => High

https://bugs.launchpad.net/bugs/1474074
Title: PciDeviceList is not versioned properly in liberty and kilo
Status in OpenStack Compute (nova): Confirmed

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1474074/+subscriptions
[Yahoo-eng-team] [Bug 1461638] [NEW] when booting with a blank volume without supplied size - it will just get ignored
Public bug reported:
$ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 --block-device source=blank,dest=volume testvm-blank
The above line would be accepted as a valid boot request, but no blank volume would be created. The reason is that https://github.com/openstack/nova/blob/46bba88413c99ddbb8080f68c1a32a64ef908150/nova/compute/api.py#L1210 does not check whether a size was provided (like it does when a source=image volume is requested), and the device then gets completely disregarded here: https://github.com/openstack/nova/blob/46bba88413c99ddbb8080f68c1a32a64ef908150/nova/compute/api.py#L1204

** Affects: nova Importance: Undecided Status: New
** Tags: volumes

** Description changed:
 $ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 --block-device source=blank,dest=volume testvm-blank
- The above line would succseed but no volume would be created. The reason
- is that:
+ The above line would be accepted as a valid boot request, but no blank
+ volume would be created. The reason is that:
 https://github.com/openstack/nova/blob/46bba88413c99ddbb8080f68c1a32a64ef908150/nova/compute/api.py#L1210 will not check if the size was provided (like it checks when source=image volume is requested), and then it will just get completely disregarded here: https://github.com/openstack/nova/blob/46bba88413c99ddbb8080f68c1a32a64ef908150/nova/compute/api.py#L1204

https://bugs.launchpad.net/bugs/1461638
Title: when booting with a blank volume without supplied size - it will just get ignored
Status in OpenStack Compute (Nova): New

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1461638/+subscriptions
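The missing check could be sketched like this (InvalidBDM and validate_bdm are illustrative names, not nova's actual API; the real validation lives in nova/compute/api.py): a blank-to-volume mapping without a size should be rejected up front instead of being silently dropped.

```python
class InvalidBDM(Exception):
    """Raised when a block device mapping fails validation."""

def validate_bdm(bdm):
    """Reject a blank->volume mapping that has no explicit size.

    Mirrors the check nova already performs for source_type 'image'
    (sketch only; field names follow the v2 block-device-mapping)."""
    if (bdm.get('source_type') == 'blank'
            and bdm.get('destination_type') == 'volume'
            and not bdm.get('volume_size')):
        raise InvalidBDM('a blank volume requires an explicit size')
    return bdm

# The request from the bug report: blank volume, no size -> rejected.
try:
    validate_bdm({'source_type': 'blank', 'destination_type': 'volume'})
except InvalidBDM as exc:
    print('rejected: %s' % exc)

# With a size, the mapping passes through unchanged.
validate_bdm({'source_type': 'blank', 'destination_type': 'volume',
              'volume_size': 1})
```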
[Yahoo-eng-team] [Bug 1377161] Re: If volume-attach API is failed, Block Device Mapping record will remain
So, as commented on the patch - I really think that we need to make sure that whatever gets created also gets cleaned up on errors; the patch https://review.openstack.org/166695 has some good ideas. What I also noticed (when I was testing this some time ago) is that the rpc client does not actually time out - the failure you see is likely because the Nova API times out the request. This might be an issue with the oslo.messaging rabbitmq driver (which would never time out a request), or with the fact that we assume it would.

** Also affects: oslo.messaging Importance: Undecided Status: New

https://bugs.launchpad.net/bugs/1377161
Title: If volume-attach API is failed, Block Device Mapping record will remain
Status in Cinder: Invalid
Status in OpenStack Compute (Nova): In Progress
Status in Messaging API for OpenStack: New
Status in Python client library for Cinder: Invalid

Bug description: I executed the volume-attach API (nova v2 API) while RabbitMQ was down. The API call failed and the volume's status remained "available", but a block device mapping record remains in the nova DB. This is inconsistent, and the leftover block device mapping record may cause further problems (I am researching this now). I used openstack juno-3.
* Before executing volume-attach API:

$ nova list
| 0b529526-4c8d-4650-8295-b7155a977ba7 | testVM | ACTIVE | - | Running | private=10.0.0.104 |

$ cinder list
| e93478bf-ee37-430f-93df-b3cf26540212 | available | None | 1 | None | false | |

mysql> select * from block_device_mapping where instance_uuid = '0b529526-4c8d-4650-8295-b7155a977ba7';
(one row: id 145, created_at 2014-10-02 18:36:08, device_name /dev/vda, delete_on_termination 1, source_type image, destination_type local, device_type disk, boot_index 0, image_id c1d264fd-c559-446e-9b94-934ba8249ae1, deleted 0)

* After executing volume-attach API:

$ nova list --all-t
(output truncated in the original report)
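The cleanup discipline suggested in the comment above can be sketched as follows (create_bdm_record, destroy_bdm_record, and the rpc callable are hypothetical stand-ins for nova internals): any record created for the attach is rolled back when the call to the compute host fails.

```python
records = []  # stand-in for the block_device_mapping table

def create_bdm_record(instance_uuid, volume_id):
    rec = {'instance_uuid': instance_uuid, 'volume_id': volume_id}
    records.append(rec)
    return rec

def destroy_bdm_record(rec):
    records.remove(rec)

def attach_volume(instance_uuid, volume_id, rpc_attach_volume):
    """Create the BDM record, then ask the compute host to attach;
    roll back the record if the RPC fails or times out."""
    rec = create_bdm_record(instance_uuid, volume_id)
    try:
        rpc_attach_volume(rec)
    except Exception:
        # A dead message bus must not leave an orphaned BDM row behind.
        destroy_bdm_record(rec)
        raise

def broken_rpc(rec):
    raise TimeoutError('message bus is down')

try:
    attach_volume('0b529526', 'e93478bf', broken_rpc)
except TimeoutError:
    pass
assert records == []  # no orphaned record left behind
```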
[Yahoo-eng-team] [Bug 1435748] Re: save method is getting called two times in 'attach' api
** Changed in: nova Status: In Progress => Invalid

https://bugs.launchpad.net/bugs/1435748
Title: save method is getting called two times in 'attach' api
Status in OpenStack Compute (Nova): Invalid

Bug description: The 'save' method is called twice in the 'attach' method of class 'DriverVolumeBlockDevice' (https://github.com/openstack/nova/blob/master/nova/virt/block_device.py#L224). It is called from the 'update_db' decorator and from the attach method itself. There is no need for the 'update_db' decorator on the attach method, as 'save' is already called inside it. Note: the save method will not update the DB if there is no change in the bdm object.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1435748/+subscriptions
[Yahoo-eng-team] [Bug 1452224] [NEW] libvirt: attaching volume device name should be decided using the same logic as when booting
Public bug reported: The libvirt driver needs to use its own logic for determining the device name that will be persisted in Nova, instead of the generic methods in nova.compute.utils, since libvirt cannot really assign the device name to a block device of an instance (it is treated as an ordering hint only), and we need to make sure that the information in the Nova DB matches what will actually be assigned. We already have this logic in nova.virt.libvirt.blockinfo, and it is called when booting instances. However, when attaching volumes to an already running instance, we rely on nova.compute.utils.get_device_name_for_instance(), which will do the wrong thing in a number of cases (for example, volumes using a different bus (see bug #1379212), instances with an ephemeral disk, etc.). Current master is: 0b23bce359c8c92715695cac7a6eff7c473ad8c2

** Affects: nova Importance: Undecided Status: New

https://bugs.launchpad.net/bugs/1452224
Title: libvirt: attaching volume device name should be decided using the same logic as when booting
Status in OpenStack Compute (Nova): New

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1452224/+subscriptions
[Yahoo-eng-team] [Bug 1451950] [NEW] virt.block_device.convert_all_volumes would miss blank volumes
Public bug reported: The following patch, which introduced the method, for some reason completely missed the Blank volume type: https://review.openstack.org/#/c/150090/

** Affects: nova Importance: Undecided Status: New

https://bugs.launchpad.net/bugs/1451950
Title: virt.block_device.convert_all_volumes would miss blank volumes
Status in OpenStack Compute (Nova): New

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1451950/+subscriptions
[Yahoo-eng-team] [Bug 1172808] Re: Nova fails on Quantum port quota too late
A patch that does a partial revert of https://review.openstack.org/49455 (from comment #16) is under discussion at the time of writing, so I am linking it here: https://review.openstack.org/#/c/175742/ Basically, just checking quotas without reserving them is a bit of a fool's errand. We should either have a reserve-rollback API in Neutron, or - as has been suggested above - create the port quickly and then update it with additional information once we have it (when the request reaches the compute host).

** Changed in: nova Status: Fix Released => Confirmed
** Changed in: nova Milestone: 2014.2 => None

https://bugs.launchpad.net/bugs/1172808
Title: Nova fails on Quantum port quota too late
Status in OpenStack Compute (Nova): Confirmed

Bug description: Currently Nova will only hit a port quota limit in Quantum in the compute manager - as that is where the code to create ports exists - resulting in the instance going to an error state (after it has bounced through three hosts). It seems to me that for Quantum the ports should be created in the API call (so that the error can be sent back to the user), and the port then passed down to the compute manager. (Since a user can pass a port into the server create call, I am assuming this would be OK.)

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1172808/+subscriptions
[Yahoo-eng-team] [Bug 1450438] [NEW] loopingcall: if a time drift to the future occurs, all timers will be blocked
Public bug reported: Because loopingcall.py uses time.time - which is not guaranteed to be monotonic - for recording wall-clock time, if a time drift to the future occurs and then gets corrected, all the timers will be blocked until the actual time reaches the moment of the original drift. This can be pretty bad if the interval is not insignificant - in Nova's case, all services use FixedIntervalLoopingCall for their heartbeat periodic tasks - if the drift is on the order of magnitude of several hours, no heartbeats will happen. DynamicLoopingCall is affected as well, because it relies on eventlet, which also uses the non-monotonic time.time function for its internal timers. Solving this will require looping calls to start using a monotonic timer (for Python 2.7 there is the monotonic package). Also, all services that want to use timers and avoid this issue should do something like

import monotonic
hub = eventlet.get_hub()
hub.clock = monotonic.monotonic

immediately after calling eventlet.monkey_patch() ** Affects: nova Importance: Undecided Status: New ** Affects: oslo-incubator Importance: Undecided Status: New ** Also affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1450438 Title: loopingcall: if a time drift to the future occurs, all timers will be blocked Status in OpenStack Compute (Nova): New Status in The Oslo library incubator: New Bug description: Because loopingcall.py uses time.time - which is not guaranteed to be monotonic - for recording wall-clock time, if a time drift to the future occurs and then gets corrected, all the timers will be blocked until the actual time reaches the moment of the original drift. This can be pretty bad if the interval is not insignificant - in Nova's case, all services use FixedIntervalLoopingCall for their heartbeat periodic tasks - if the drift is on the order of magnitude of several hours, no heartbeats will happen. DynamicLoopingCall is affected as well, because it relies on eventlet, which also uses the non-monotonic time.time function for its internal timers. Solving this will require looping calls to start using a monotonic timer (for Python 2.7 there is the monotonic package). Also, all services that want to use timers and avoid this issue should do something like

import monotonic
hub = eventlet.get_hub()
hub.clock = monotonic.monotonic

immediately after calling eventlet.monkey_patch() To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1450438/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
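A minimal sketch of the failure mode, using Python 3's built-in time.monotonic in place of the 2.7 monotonic package; the deadline arithmetic is a simplification of a fixed-interval loop, not the actual loopingcall.py code:

```python
import time

# A looping call schedules its next run relative to some clock.  If
# that clock is the wall clock and it jumps forward by an hour before
# being corrected, the recorded deadline sits an hour in the future
# and the timer stalls until real time catches up.
def next_deadline(clock, interval):
    return clock() + interval

# Simulate a wall clock that read 3600s ahead when the deadline was
# computed, then got corrected back:
fake_wall = iter([1000.0 + 3600, 1000.0]).__next__
deadline = next_deadline(fake_wall, 1.0)
now = fake_wall()
assert deadline - now > 3600   # the loop is now blocked for over an hour

# A monotonic clock can never regress, so a corrected drift cannot
# poison the deadline this way:
assert time.monotonic() <= time.monotonic()
```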
[Yahoo-eng-team] [Bug 1383465] Re: [pci-passthrough] nova-compute fails to start
*** This bug is a duplicate of bug 1415768 *** https://bugs.launchpad.net/bugs/1415768 ** This bug has been marked a duplicate of bug 1415768 the pci deivce assigned to instance is inconsistent with DB record when restarting nova-compute -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1383465 Title: [pci-passthrough] nova-compute fails to start Status in OpenStack Compute (Nova): Fix Released Bug description: Created a guest using nova with a passthrough device, shutdown that guest, and disabled nova-compute (openstack-service stop). Went to turn things back on, and nova-compute fails to start. The trace: 2014-10-20 16:06:45.734 48553 ERROR nova.openstack.common.threadgroup [-] PCI device request ({'requests': [InstancePCIRequest(alias_name='rook',count=2,is_new=False,request_id=None,spec=[{product_id='10fb',vendor_id='8086'}])], 'code': 500}equests)s failed 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup Traceback (most recent call last): 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 125, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup x.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 47, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self.thread.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 173, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self._exit_event.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, 
in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 293, in switch 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self.greenlet.switch() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 212, in main 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/service.py", line 492, in run_service 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup service.start() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/service.py", line 181, in start 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self.manager.pre_start_hook() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1152, in pre_start_hook 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self.update_available_resource(nova.context.get_admin_context()) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5949, in update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup rt.update_available_resource(context) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 332, in update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return 
self._update_available_resource(context, resources) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 272, in inner 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return f(*args, **kwargs) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 349, in _update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self._update_usage_from_instances(context, resources, instances) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 708, in _update_usage_from_instances 2014-10-20 16
[Yahoo-eng-team] [Bug 1383465] Re: [pci-passthrough] nova-compute fails to start
** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1383465 Title: [pci-passthrough] nova-compute fails to start Status in OpenStack Compute (Nova): Fix Released Bug description: Created a guest using nova with a passthrough device, shutdown that guest, and disabled nova-compute (openstack-service stop). Went to turn things back on, and nova-compute fails to start. The trace: 2014-10-20 16:06:45.734 48553 ERROR nova.openstack.common.threadgroup [-] PCI device request ({'requests': [InstancePCIRequest(alias_name='rook',count=2,is_new=False,request_id=None,spec=[{product_id='10fb',vendor_id='8086'}])], 'code': 500}equests)s failed 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup Traceback (most recent call last): 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 125, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup x.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 47, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self.thread.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 173, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self._exit_event.wait() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup 
File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 293, in switch 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self.greenlet.switch() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 212, in main 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/service.py", line 492, in run_service 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup service.start() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/service.py", line 181, in start 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self.manager.pre_start_hook() 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1152, in pre_start_hook 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self.update_available_resource(nova.context.get_admin_context()) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5949, in update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup rt.update_available_resource(context) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 332, in update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return self._update_available_resource(context, resources) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 272, in inner 
2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup return f(*args, **kwargs) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 349, in _update_available_resource 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self._update_usage_from_instances(context, resources, instances) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 708, in _update_usage_from_instances 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common.threadgroup self._update_usage_from_instance(context, resources, instance) 2014-10-20 16:06:45.734 48553 TRACE nova.openstack.common
[Yahoo-eng-team] [Bug 1442048] [NEW] Avoid websocket proxies needing to have '*_baseurl' configs matching compute nodes
Public bug reported: As part of the fix for the related bug, we added protocol checking to mitigate MITM attacks; however, we base the check on a config option that is normally only intended for compute hosts. This is quite user hostile, as it is now important that all nodes running compute and proxy services keep this option in sync. We can do better than that - we can persist the URL the client is expected to use, and once we get it back on token validation, we can make sure that the request is using the intended protocol, mitigating the MITM injected-script attacks. ** Affects: nova Importance: High Assignee: Nikola Đipanov (ndipanov) Status: Confirmed ** Tags: kilo-rc-potential ** Tags added: kilo-rc-potential ** Changed in: nova Status: New => Confirmed ** Changed in: nova Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1442048 Title: Avoid websocket proxies needing to have '*_baseurl' configs matching compute nodes Status in OpenStack Compute (Nova): Confirmed Bug description: As part of the fix for the related bug, we added protocol checking to mitigate MITM attacks; however, we base the check on a config option that is normally only intended for compute hosts. This is quite user hostile, as it is now important that all nodes running compute and proxy services keep this option in sync. We can do better than that - we can persist the URL the client is expected to use, and once we get it back on token validation, we can make sure that the request is using the intended protocol, mitigating the MITM injected-script attacks. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1442048/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
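A sketch of the proposed approach (the field and function names are hypothetical, not Nova's consoleauth API): store the URL handed to the client alongside the console token, then compare schemes when the token comes back, instead of trusting a proxy-side config option.

```python
from urllib.parse import urlparse

# Hypothetical token record: 'access_url' is persisted at token
# creation time, when the compute side knows what URL it gave out.
def scheme_matches(token, request_url):
    """Validate that the connecting request uses the scheme the
    client was originally told to use."""
    expected = urlparse(token['access_url']).scheme
    return urlparse(request_url).scheme == expected

token = {'token': 'abc',
         'access_url': 'wss://proxy.example:6080/?token=abc'}

assert scheme_matches(token, 'wss://proxy.example:6080/?token=abc')
# A downgraded (unencrypted) connection no longer validates:
assert not scheme_matches(token, 'ws://proxy.example:6080/?token=abc')
```

This removes the need for proxy nodes to carry a matching '*_baseurl' setting at all.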
[Yahoo-eng-team] [Bug 1436314] Re: Option to boot VM only from volume is not available
It is enough to specify the --boot-volume option (see https://wiki.openstack.org/wiki/BlockDeviceConfig for more details about the block device mapping syntax). Setting max_local_block_devices to 0 means that any request that attempts to create a local disk will fail. This option is meant to limit the number of local disks (the root local disk that is the result of --image being used, and any other ephemeral and swap disks). AFAIK Tempest by its very nature will test both booting instances from volumes and from images downloaded to hypervisor local storage, so it makes very little sense to me to attempt to limit the environment Tempest runs against to allow only boot from volume and then expect to be able to run tests that spawn instances from images. max_local_block_devices set to 0 does not mean that nova will automatically convert --image boots to volume-backed boots - it just means that any request that attempts to create a local disk will fail. ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1436314 Title: Option to boot VM only from volume is not available Status in OpenStack Compute (Nova): Invalid Status in Tempest: New Bug description: Issue: When a service provider wants to use only the boot-from-volume option for booting a server, the integration tests fail. There is no option in Tempest to use only boot from volume for booting the server. Expected: a parameter in tempest.conf for a boot_from_volume_only option for all the tests except for image tests. 
$ nova boot --flavor FLAVOR_ID [--image IMAGE_ID] / [ --boot-volume BOOTABLE_VOLUME] To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1436314/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
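A toy version of the max_local_block_devices behaviour described above (illustrative names, not Nova's actual validation code): with the option at 0, any request carrying a local disk fails; nothing is converted to a volume on the user's behalf.

```python
# Illustrative check: count the local-destination block devices in a
# request and reject the request if they exceed the configured limit.
# A negative limit means unlimited, matching the option's semantics.
def validate_bdms(bdms, max_local):
    local = [b for b in bdms if b['destination_type'] == 'local']
    if max_local >= 0 and len(local) > max_local:
        raise ValueError('Too many local disks requested')
    return True

# A pure boot-from-volume request passes with the limit at 0:
assert validate_bdms([{'destination_type': 'volume'}], 0)

# An --image boot (which implies a local root disk) is rejected, not
# silently converted:
try:
    validate_bdms([{'destination_type': 'local'}], 0)
    rejected = False
except ValueError:
    rejected = True
assert rejected
```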
[Yahoo-eng-team] [Bug 1439282] [NEW] Cinder API errors and timeouts can cause Nova to not save data about volumes
Public bug reported: A user reports: Nova's Block device mappings can become invalid/inconsistent if errors are encountered while calling for Cinder to attach a volume. 2014-12-18 11:14:41.594 19473 ERROR nova.compute.manager [req-6f65b7d5-0930-4adf-9b5f-dd20eb1a707e 96612f5455c44e95960e733c48eaccc9 1076a7e653b3465295131c495e7d4ae4] [instance: 463dbedc-00f4-4c66-a00-139a4d79a46e] Instance failed block device setup 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] Traceback (most recent call last): 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1706, in _prep_block _device 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] self.driver, self._await_block_device_map_created)) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 367, in attach_blo ck_devices 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] map(_log_and_attach, block_device_mapping) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 365, in _log_and_a ttach 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] bdm.attach(*attach_args, **attach_kwargs) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 322, in attach 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] volume_api, virt_driver) 2014-12-18 11:14:41.594 19473 TRACE 
nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 44, in wrapped 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] ret_val = method(obj, context, *args, **kwargs) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 255, in attach 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] self['mount_device'], mode=mode) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 173, in wrapper 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] res = method(self, ctx, volume_id, *args, **kwargs) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 262, in attach 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] mountpoint, mode=mode) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/cinderclient/v1/volumes.py", line 266, in attach 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] 'mode': mode}) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/cinderclient/v1/volumes.py", line 250, in _action 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] return self.api.client.post(url, body=body) 2014-12-18 11:14:41.594 19473 
TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 223, in post 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] return self._cs_request(url, 'POST', **kwargs) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 212, in _cs_request 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] raise exceptions.ConnectionError(msg) 2014-12-18 11:14:41.594 19473 TRACE nova.compute.manager [instance: 463dbedc-00f4-4c66-a020-139a4d79a46e] ConnectionError: Unable to establish connection: HTTPConnectionPool(): Max retries exceeded with url: /v1/1076a7e653b346
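A sketch of the kind of hardening the report calls for (the names are hypothetical, not Nova's block_device code): persist the mapping's state before and after the Cinder call, so a connection failure mid-attach leaves a reconcilable record rather than a mapping that never reflects the attempt.

```python
# Illustrative attach flow: 'save' persists the BDM; recording intent
# before calling out means a ConnectionError from cinderclient cannot
# leave the database with no trace of the attempted attach.
def attach(bdm, cinder_attach, save):
    bdm['attach_status'] = 'attaching'
    save(bdm)                      # record intent before the remote call
    try:
        cinder_attach(bdm)
    except ConnectionError:
        bdm['attach_status'] = 'error'
        save(bdm)                  # record the failure too
        raise
    bdm['attach_status'] = 'attached'
    save(bdm)

saved = []

def flaky_attach(bdm):
    raise ConnectionError('Max retries exceeded')

bdm = {'volume_id': 'vol-1'}
try:
    attach(bdm, flaky_attach, lambda b: saved.append(dict(b)))
except ConnectionError:
    pass
assert [s['attach_status'] for s in saved] == ['attaching', 'error']
```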
[Yahoo-eng-team] [Bug 1438238] [NEW] Several concurrent scheduling requests for CPU pinning may fail due to racy host_state handling
Public bug reported: The issue happens when multiple scheduling attempts that request CPU pinning are done in parallel. 015-03-25T14:18:00.222 controller-0 nova-scheduler err Exception during message handling: Cannot pin/unpin cpus [4] from the following pinned set [3, 4, 5, 6, 7, 8, 9] 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last): 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher incoming.message)) 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args) 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args) 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/server.py", line 139, in inner 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher return func(*args, **kwargs) 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site-packages/nova/scheduler/manager.py", line 86, in select_destinations 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site- packages/nova/scheduler/filter_scheduler.py", line 80, in select_destinations 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site- packages/nova/scheduler/filter_scheduler.py", line 241, in _schedule 2015-03-25 
14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site-packages/nova/scheduler/host_manager.py", line 266, in consume_from_instance 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 1472, in get_host_numa_usage_from_instance 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site-packages/nova/virt/hardware.py", line 1344, in numa_usage_from_instances 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "./usr/lib64/python2.7/site-packages/nova/objects/numa.py", line 91, in pin_cpus 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher CPUPinningInvalid: Cannot pin/unpin cpus [4] from the following pinned set [3, 4, 5, 6, 7, 8, 9] 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher What is likely happening is: * nova-scheduler is handling several RPC calls to select_destinations at the same time, in multiple greenthreads * greenthread 1 runs the NUMATopologyFilter and selects a cpu on a particular compute node, updating host_state.instance_numa_topology * greenthread 1 then blocks for some reason * greenthread 2 runs the NUMATopologyFilter and selects the same cpu on the same compute node, updating host_state.instance_numa_topology. This also seems like an issue if a different cpu was selected, as it would be overwriting the instance_numa_topology selected by greenthread 1. 
* greenthread 2 then blocks for some reason * greenthread 1 gets scheduled and calls consume_from_instance, which consumes the numa resources based on what is in host_state.instance_numa_topology * greenthread 1 completes the scheduling operation * greenthread 2 gets scheduled and calls consume_from_instance, which consumes the numa resources based on what is in host_state.instance_numa_topology - since the resources were already consumed by greenthread 1, we get the exception above ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1438238 Title: Several concurrent scheduling requests for CPU pinning may fail due to racy host_state handling Status in OpenStack Compute (Nova): New Bug description: The issue happens when multiple scheduling attempts that request CPU pinning are done in parallel. 2015-03-25T14:18:00.222 controller-0 nova-scheduler err Exception during message handling: Cannot pin/unpin cpus [4] from the following pinned set [3, 4, 5, 6, 7, 8, 9] 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last): 2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", l
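The interleaving above reduces to a deterministic toy model (illustrative code, not the scheduler's actual classes): the filter step reads shared state without reserving anything, so two passes pick the same cpu and the second consume raises, mirroring CPUPinningInvalid.

```python
# Shared host state: the set of cpus already pinned on the host.
pinned = set()

def pick_cpu():
    # filter step: read-only, does NOT reserve the cpu it picks
    for cpu in (0, 1, 2, 3):
        if cpu not in pinned:
            return cpu

def consume(cpu):
    # consume_from_instance analogue: fails if the cpu is taken
    if cpu in pinned:
        raise ValueError('Cannot pin/unpin cpus [%d]' % cpu)
    pinned.add(cpu)

a = pick_cpu()   # greenthread 1 picks cpu 0 ...
b = pick_cpu()   # ... then yields; greenthread 2 also picks cpu 0
consume(a)       # greenthread 1 consumes successfully
try:
    consume(b)   # greenthread 2 hits the already-pinned cpu
    raced = False
except ValueError:
    raced = True
assert raced
```

Making pick-and-consume atomic per host (or re-picking under a lock at consume time) removes the window.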
[Yahoo-eng-team] [Bug 1433609] [NEW] Not adding an image block device mapping causes some valid boot requests to fail
Public bug reported: The following commit removed the code in the python nova client that would add an image block device mapping entry (source_type: image, destination_type: local) in preparation for fixing https://bugs.launchpad.net/nova/+bug/1377958. However, this makes some valid instance boot requests fail, as they will no longer pass block device mapping validation. An example would be: nova boot test-vm --flavor m1.medium --image centos-vm-32 --nic net-id=c3f40e33-d535-4217-916b-1450b8cd3987 --block-device id=26b7b917-2794-452a-95e5-2efb2ca6e32d,bus=sata,source=volume,bootindex=1 This was previously a valid boot request, since the client would add a block device with boot_index=0, so validation would not fail. ** Affects: nova Importance: High Status: New ** Changed in: nova Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1433609 Title: Not adding an image block device mapping causes some valid boot requests to fail Status in OpenStack Compute (Nova): New Bug description: The following commit removed the code in the python nova client that would add an image block device mapping entry (source_type: image, destination_type: local) in preparation for fixing https://bugs.launchpad.net/nova/+bug/1377958. However, this makes some valid instance boot requests fail, as they will no longer pass block device mapping validation. An example would be: nova boot test-vm --flavor m1.medium --image centos-vm-32 --nic net-id=c3f40e33-d535-4217-916b-1450b8cd3987 --block-device id=26b7b917-2794-452a-95e5-2efb2ca6e32d,bus=sata,source=volume,bootindex=1 This was previously a valid boot request, since the client would add a block device with boot_index=0, so validation would not fail. 
To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1433609/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
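A toy version of the validation at issue (illustrative, not Nova's actual check): without the client-synthesized image mapping at boot_index=0, the request has no root device and is rejected even though it names --image.

```python
# Illustrative root-device check: a boot request must contain some
# mapping with boot_index 0.  The nova client used to synthesize one
# from --image (source_type: image, destination_type: local).
def has_root_device(bdms):
    return any(b.get('boot_index') == 0 for b in bdms)

# What the example command sends after the client change: only the
# explicit --block-device entry at boot_index 1.
volume_only = [{'source_type': 'volume', 'boot_index': 1}]

# What the client used to send: the same, plus the image mapping.
with_image = volume_only + [{'source_type': 'image',
                             'destination_type': 'local',
                             'boot_index': 0}]

assert not has_root_device(volume_only)  # now fails validation
assert has_root_device(with_image)       # previously passed
```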
[Yahoo-eng-team] [Bug 1257142] Re: booting multiple instances in one API call doesn't work with booting from volume
IMHO this is the intended behaviour. Booting more than one instance with a snapshot would actually work fine. If you choose a specific volume from cinder, then of course it cannot be attached to more than a single instance, the error you are seeing is a guard specifically against that case (though the error message could use a bit of re-wording). Closing as invalid. ** Changed in: nova Status: Triaged => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1257142 Title: booting multiple instances in one API call doesn't work with booting from volume Status in OpenStack Compute (Nova): Invalid Bug description: Some users of Cinder tried to launch multiple instances using 'nova boot' with '--num-instances' flag together with a bunch of volumes (boot-from-volume). But that doesn't work. For example: nova boot --flavor 28 --image b40a54f2-5691-497c-8c54-2110d1bc203a --num-instances 3 --block-device-mapping vda=71ba253a-0011-4e08-860a-6c908fefb06c,vda=db326b69-c106-4c9b-bacc-cb3806793023,vda=6b7682fe-00cf-43b2-a1b5-056c4ec16aae test-servers ERROR: Cannot attach one or more volumes to multiple instances (HTTP 400) (Request-ID: req-dd71fd30-11e4-4374-bf63-ab0dedce53e0) It'd be nice if we can figure a way to support booting multiple instances from volume using single API call instead of iterating several times creating one instance at a time. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1257142/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1427772] [NEW] Instance that uses force-host still needs to run some filters
Public bug reported: More in-depth discussion can be found here: http://lists.openstack.org/pipermail/openstack-dev/2015-February/056695.html Basically - there are a number of filters that need to be re-run even if we force a host. The reasons are two-fold. First, placing some instances on some hosts is an obvious mistake and should be disallowed (instances with specific CPU pinning are an example), even though such a request would eventually be rejected by the host. Second, the claims logic on compute hosts depends on limits being set by the filters, and if they are not, some of the oversubscription as well as the more complex placement logic will not work for the instance (see the following bug report as to how it impacts the NUMA placement logic: https://bugzilla.redhat.com/show_bug.cgi?id=1189906). Overall, completely bypassing the filters is not ideal. ** Affects: nova Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: Confirmed ** Tags: scheduler -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1427772 Title: Instance that uses force-host still needs to run some filters Status in OpenStack Compute (Nova): Confirmed Bug description: More in-depth discussion can be found here: http://lists.openstack.org/pipermail/openstack-dev/2015-February/056695.html Basically - there are a number of filters that need to be re-run even if we force a host. The reasons are two-fold. First, placing some instances on some hosts is an obvious mistake and should be disallowed (instances with specific CPU pinning are an example), even though such a request would eventually be rejected by the host. 
Second reason is that claims logic on compute hosts depends on limits being set by the filters, and if they are not some of the oversubscription as well as more complex placement logic will not work for the instance (see the following bug report as to how it impacts NUMA placement logic https://bugzilla.redhat.com/show_bug.cgi?id=1189906) Overall completely bypassing the filters is not ideal. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1427772/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
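The idea can be sketched as follows; the whitelist and the filter names below are hypothetical stand-ins, not an actual Nova list: even when a host is forced, keep running the filters whose side effects (the limits used by the claims logic) the compute node depends on, and skip only the purely elimination-oriented ones.

```python
# Hypothetical whitelist: filters that must still run on a forced host
# because they populate the limits used by the claims logic.
RUN_ON_FORCED_HOST = {"NUMATopologyFilter", "PciPassthroughFilter"}

def filters_to_run(all_filters, force_hosts):
    """Return the filters that must still run for this request."""
    if not force_hosts:
        return list(all_filters)  # normal scheduling: run everything
    return [f for f in all_filters
            if type(f).__name__ in RUN_ON_FORCED_HOST]

# Stand-in filter classes for the demonstration.
class NUMATopologyFilter(object): pass
class RamFilter(object): pass

kept = filters_to_run([NUMATopologyFilter(), RamFilter()],
                      force_hosts=["node1"])
```

With a forced host only the whitelisted filter survives; without one, both run.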
[Yahoo-eng-team] [Bug 1403547] [NEW] flavor extra_specs are not passed to any of the filters
Public bug reported: https://review.openstack.org/#/c/122557/7/nova/scheduler/utils.py broke this by removing the two lines that made sure extra_specs were dug up from the DB before adding the instance_type to the request_spec, which eventually gets passed as part of the filter_properties (wrongfully, but that's a different bug) to all filters. The fix is to either put those lines back, or alternatively remove instance_type from the update call here: https://github.com/openstack/nova/blob/fec5ff129465ab35ca8cc37fa8dafd368233b7b6/nova/scheduler/filter_scheduler.py#L119 The consequence is that AggregateInstanceExtraSpecsFilter, ComputeCapabilitiesFilter and TrustedFilter are broken in master since cb338cb7692e12cc94515f1f09008d0e328c1505 ** Affects: nova Importance: Critical Status: Confirmed ** Tags: scheduler -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1403547 Title: flavor extra_specs are not passed to any of the filters Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1403547/+subscriptions
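The first suggested fix can be sketched as follows; the function and key names are illustrative, not Nova's actual signatures. The point is simply that the flavor must carry its extra_specs before it is folded into the request_spec:

```python
def build_request_spec(flavor, load_extra_specs):
    """Attach extra_specs to the flavor before it enters the request_spec.

    load_extra_specs stands in for the DB lookup the removed lines did.
    """
    if "extra_specs" not in flavor:
        flavor = dict(flavor, extra_specs=load_extra_specs(flavor["flavorid"]))
    return {"instance_type": flavor}

# Fake DB keyed by flavorid, mirroring the sort of extra_specs the
# extra-specs-based filters would look for.
fake_db = {"2": {"hw:numa_nodes": "1"}}
spec = build_request_spec({"flavorid": "2"}, fake_db.get)
```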
[Yahoo-eng-team] [Bug 1375868] [NEW] libvirt: race between hot unplug and XMLDesc in _get_instance_disk_info
Public bug reported: This came up when analyzing https://bugs.launchpad.net/nova/+bug/1371677 and there is a lot of information on there. The bug, in short, is that _get_instance_disk_info relies on DB information to filter the volumes out of the list of disks it gets from the libvirt XML, but due to the async nature of unplug, the XML can still contain a volume that no longer exists in the DB and will therefore not be filtered out, so the code will assume it is an LVM image and run blockdev on it, which can block for a very long time. The solution is to NOT use the libvirt XML in this particular case (or anywhere in Nova, really) to find out information about running instances. ** Affects: nova Importance: High Status: New ** Tags: libvirt -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1375868 Title: libvirt: race between hot unplug and XMLDesc in _get_instance_disk_info Status in OpenStack Compute (Nova): New To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1375868/+subscriptions
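One way the race could be sidestepped, sketched below with hypothetical names (this is not Nova's actual code): rather than subtracting DB-known volumes from the XML disk list, trust only the disks that live under the instance's own directory, so a volume that is mid-detach can never be mistaken for a local image.

```python
def instance_local_disks(xml_disk_paths, instances_path, instance_uuid):
    """Keep only disks under the instance's own directory; anything else
    (e.g. an iSCSI volume that has not finished detaching) is ignored."""
    prefix = "%s/%s/" % (instances_path, instance_uuid)
    return [p for p in xml_disk_paths if p.startswith(prefix)]

xml_paths = [
    "/var/lib/nova/instances/uuid-1/disk",         # local image: keep
    "/dev/disk/by-path/ip-10.0.0.5-iscsi-lun-1",   # stale volume: drop
]
local = instance_local_disks(xml_paths, "/var/lib/nova/instances", "uuid-1")
```

This makes the filter independent of DB timing, at the cost of assuming local images always live under the instance directory.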
[Yahoo-eng-team] [Bug 1373950] [NEW] Serial proxy service and API broken by design
Public bug reported: As part of the blueprint https://blueprints.launchpad.net/nova/+spec/serial-ports we introduced an API extension and a websocket proxy binary. The problem with these two is that a lot of the code was copied verbatim from the novnc-proxy API and service, which rely heavily on the internal implementation details of the NoVNC and python-websockify libraries. We should not ship a service that proxies websocket traffic if we do not actually serve a web-based client for it (in the NoVNC case, it has its own HTML5 VNC implementation that works over ws://). No such client was part of the proposed (and accepted) implementation. The websocket proxy based on websockify that we currently have actually assumes it will serve static content (which we don't do for the serial console case) which, when executed in the browser, initiates a websocket connection that sends the security token in the cookie: field of the request. All of this is specific to the NoVNC implementation (see: https://github.com/kanaka/noVNC/blob/e4e9a9b97fec107b25573b29d2e72a6abf8f0a46/vnc_auto.html#L18) and does not make any sense for the serial console functionality. The proxy service was introduced in https://review.openstack.org/#/c/113963/ In a similar manner, the API that was proposed and implemented (in https://review.openstack.org/#/c/113966/) that gives us back the URL with the security token makes no sense for the same reasons outlined above. We should revert at least these two patches before the final Juno release, as we do not want to ship a useless service and commit to a useless API method. We could then look into providing similar functionality through something like https://github.com/chjj/term.js, which will require us to write a different proxy service. ** Affects: nova Importance: Critical Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1373950 Title: Serial proxy service and API broken by design Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1373950/+subscriptions
[Yahoo-eng-team] [Bug 1372845] [NEW] libvirt: Instance NUMA fitting code fails to account for vcpu_pin_set config option properly
Public bug reported: Looking at this branch of the NUMA fitting code https://github.com/openstack/nova/blob/51de439a4d1fe5e17d59d3aac3fd2c49556e641b/nova/virt/libvirt/driver.py#L3738 we do not account for allowed CPUs when choosing viable cells for the given instance, meaning we could choose a NUMA cell that has no viable CPUs to pin to. We need to consider allowed_cpus when calculating viable NUMA cells for the instance. ** Affects: nova Importance: High Status: Confirmed ** Tags: libvirt -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1372845 Title: libvirt: Instance NUMA fitting code fails to account for vcpu_pin_set config option properly Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1372845/+subscriptions
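The missing check can be sketched like this (the data shapes are assumptions, not Nova's actual objects): a host cell is only viable for pinning if at least one of its CPUs is inside vcpu_pin_set.

```python
def viable_cells(host_cells, allowed_cpus):
    """host_cells maps cell id -> CPU ids; return ids usable for pinning."""
    allowed = set(allowed_cpus)
    return [cid for cid, cpus in sorted(host_cells.items())
            if allowed & set(cpus)]

# Two-cell host; a vcpu_pin_set covering only CPUs 0-1 leaves cell 1
# with nothing to pin to, so it must not be offered to the instance.
cells = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
usable = viable_cells(cells, allowed_cpus=[0, 1])
```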
[Yahoo-eng-team] [Bug 1369945] Re: libvirt: libvirt reports even single cell NUMA topologies
Now that https://bugs.launchpad.net/nova/+bug/1369984 is fixed, we can mark this as invalid. ** Changed in: nova Status: Confirmed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1369945 Title: libvirt: libvirt reports even single cell NUMA topologies Status in OpenStack Compute (Nova): Invalid Bug description: Libvirt reports even single NUMA nodes in its hypervisor capabilities (which we use to figure out whether a compute host is a NUMA host). This is technically correct, but in Nova we assume that to mean no NUMA capabilities when scheduling instances. Right now we just pass what we get from libvirt as-is to the resource tracker, but we need to make sure that "single NUMA node" hypervisors are reported back to the resource tracker as non-NUMA. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1369945/+subscriptions
[Yahoo-eng-team] [Bug 1370390] [NEW] Resize instance will not change the NUMA topology of a running instance to the one from the new flavor
Public bug reported: When we resize (change the flavor of) an instance that has a NUMA topology defined, the NUMA info from the new flavor will not be considered during scheduling. The instance will get re-scheduled based on the old NUMA information, but the claiming on the host will use the new flavor data. Once the instance successfully lands on a host, we will still use the old data when provisioning it on the new host. We should be considering only the new flavor information in resizes. ** Affects: nova Importance: High Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1370390 Title: Resize instance will not change the NUMA topology of a running instance to the one from the new flavor Status in OpenStack Compute (Nova): New To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1370390/+subscriptions
[Yahoo-eng-team] [Bug 1369984] [NEW] NUMA topology checking will not check if instance can fit properly.
Public bug reported: When testing whether the instance can fit into the host topology, we currently do not take into account the number of cells the instance has, and will only claim matching cells and pass an instance if the matching cells fit. So, for example, a 4 NUMA cell instance would pass the claims test on a 2 NUMA cell host, as long as the first 2 cells fit, without considering that the whole instance will not actually fit. ** Affects: nova Importance: High Status: Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1369984 Title: NUMA topology checking will not check if instance can fit properly. Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1369984/+subscriptions
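The missing guard amounts to comparing cell counts before matching individual cells. A toy sketch (the tuple shapes are assumptions, not Nova's objects, and the one-to-one matching is deliberately naive):

```python
def topology_fits(instance_cells, host_cells):
    """Cells are (cpus, memory_mb) tuples; naive one-to-one matching."""
    if len(instance_cells) > len(host_cells):
        return False  # the check the report says is missing
    return all(icpu <= hcpu and imem <= hmem
               for (icpu, imem), (hcpu, hmem)
               in zip(instance_cells, host_cells))
```

Without the length check, zip() would silently truncate to the first two cells and report the 4-cell instance as fitting on a 2-cell host.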
[Yahoo-eng-team] [Bug 1369945] [NEW] libvirt: libvirt reports even single cell NUMA topologies
Public bug reported: Libvirt reports even single NUMA nodes in its hypervisor capabilities (which we use to figure out whether a compute host is a NUMA host). This is technically correct, but in Nova we assume that to mean no NUMA capabilities when scheduling instances. Right now we just pass what we get from libvirt as-is to the resource tracker, but we need to make sure that "single NUMA node" hypervisors are reported back to the resource tracker as non-NUMA. ** Affects: nova Importance: High Status: Confirmed ** Tags: libvirt -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1369945 Title: libvirt: libvirt reports even single cell NUMA topologies Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1369945/+subscriptions
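The proposed normalization can be sketched as follows (the cell data shape is an assumption): a single-cell report from libvirt is translated to "no NUMA" before it reaches the resource tracker.

```python
def host_numa_topology(libvirt_cells):
    """Translate libvirt's capability cells into what the resource
    tracker should see: a single-cell host is effectively non-NUMA."""
    if len(libvirt_cells) < 2:
        return None
    return libvirt_cells

flat_host = host_numa_topology([{"id": 0, "cpus": [0, 1, 2, 3]}])
numa_host = host_numa_topology([{"id": 0}, {"id": 1}])
```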
[Yahoo-eng-team] [Bug 1369508] [NEW] Instance with NUMA topology causes exception in the scheduler
Public bug reported: This was reported by Michael Turek as he was testing this while the patches were still in flight. See: https://review.openstack.org/#/c/114938/26/nova/virt/hardware.py As described there, the code makes a bad assumption about the format in which it will get the data in the scheduler, which results in:

2014-09-15 10:45:44.906 ERROR oslo.messaging.rpc.dispatcher [req-f29a469e-268d-49bf-abfa-0ccb228d768c admin admin] Exception during message handling: An object of type InstanceNUMACell is required here
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/server.py", line 139, in inner
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     return func(*args, **kwargs)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/manager.py", line 175, in select_destinations
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     filter_properties)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 147, in select_destinations
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     filter_properties)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 300, in _schedule
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     chosen_host.obj.consume_from_instance(context, instance_properties)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/host_manager.py", line 252, in consume_from_instance
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     self, instance)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/virt/hardware.py", line 978, in get_host_numa_usage_from_instance
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     instance_numa_topology = instance_topology_from_instance(instance)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/virt/hardware.py", line 949, in instance_topology_from_instance
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     cells=cells)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/base.py", line 242, in __init__
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     self[key] = kwargs[key]
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/base.py", line 474, in __setitem__
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     setattr(self, name, value)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/base.py", line 75, in setter
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     field_value = field.coerce(self, name, value)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/fields.py", line 189, in coerce
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     return self._type.coerce(obj, attr, value)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/fields.py", line 388, in coerce
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     obj, '%s[%i]' % (attr, index), element)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/fields.py", line 189, in coerce
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     return self._type.coerce(obj, attr, value)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/objects/fields.py", line 474, in coerce
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher     self._obj_name)
2014-09-15 10:45:44.906 TRACE oslo.messaging.rpc.dispatcher ValueError: An object of type InstanceNUMACell is required here

** Affects: nova Importance: High Status: Confirmed
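The traceback implies that the scheduler builds the topology from dict primitives where cell objects are expected. A toy sketch of the coercion that would avoid the ValueError (the class below is a stand-in, not Nova's real object):

```python
class InstanceNUMACell(object):
    """Toy stand-in for the nova.objects cell class."""
    def __init__(self, id, cpuset, memory):
        self.id, self.cpuset, self.memory = id, cpuset, memory

def cells_from_primitives(cells):
    """Coerce dict primitives (as the scheduler receives them over RPC)
    into cell objects before constructing the topology object."""
    return [c if isinstance(c, InstanceNUMACell) else InstanceNUMACell(**c)
            for c in cells]

mixed = [InstanceNUMACell(0, [0, 1], 512),
         {"id": 1, "cpuset": [2, 3], "memory": 512}]
coerced = cells_from_primitives(mixed)
```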
[Yahoo-eng-team] [Bug 1369502] [NEW] NUMA topology _get_constraints_auto assumes flavor object
Public bug reported: This results in AttributeError: 'dict' object has no attribute 'vcpus' if we try to start with a flavor that makes Nova decide on an automatic topology (for example, providing only the number of nodes with the hw:numa_nodes extra_spec). ** Affects: nova Importance: High Assignee: Nikola Đipanov (ndipanov) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1369502 Title: NUMA topology _get_constraints_auto assumes flavor object Status in OpenStack Compute (Nova): In Progress To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1369502/+subscriptions
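A minimal sketch of one way to tolerate both forms (names hypothetical): let the constraints code accept either a Flavor object or its dict primitive instead of assuming attribute access.

```python
def flavor_vcpus(flavor):
    """Read vcpus from either a Flavor-like object or a plain dict."""
    try:
        return flavor.vcpus           # object form
    except AttributeError:
        return flavor["vcpus"]        # dict primitive form

class FakeFlavor(object):
    """Stand-in for the real Flavor object."""
    vcpus = 4
```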
[Yahoo-eng-team] [Bug 1360656] [NEW] Objects remotable decorator fails to properly handle ListOfObjects field if it is in the updates dict
Public bug reported: Since the change https://review.openstack.org/#/c/98607/, if the conductor sends back a field of type ListOfObjects in the updates dictionary after a remotable decorator has called the object_action RPC method, restoring the values into objects will fail, since they will already be 'hydrated' but the field's from_primitive logic won't know how to deal with that. ** Affects: nova Importance: Critical Status: Confirmed ** Tags: unified-objects -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1360656 Title: Objects remotable decorator fails to properly handle ListOfObjects field if it is in the updates dict Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1360656/+subscriptions
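A sketch of the guard the report suggests is missing (toy classes, not the real NovaObject machinery): from_primitive should pass through values that arrive already hydrated rather than re-parsing them.

```python
class FakeNovaObject(object):
    """Toy object; real code would be a NovaObject subclass."""
    def __init__(self, data=None):
        self.data = data

def from_primitive(value):
    """Guard against double hydration: values that arrive already as
    objects are returned untouched instead of being re-parsed."""
    if isinstance(value, FakeNovaObject):
        return value
    return FakeNovaObject(value)

hydrated = FakeNovaObject({"x": 1})
```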
[Yahoo-eng-team] [Bug 1359617] [NEW] libvirt: driver calls volume connect twice for every volume on boot
Public bug reported: The libvirt driver will attempt to connect every volume provided to the instance on the hypervisor twice when booting. If you examine the libvirt driver's spawn() method, both _get_guest_xml (by means of get_guest_storage_config) and _create_domain_and_network call the _connect_volume method, which works out the volume driver and then dispatches the connect logic. This is especially bad in the iSCSI volume driver case, where we do 2 rootwrapped calls in the best case, one of which is the target rescan, which can in theory add and remove devices in the kernel. I suspect that fixing this will make a number of races that have to do with the volume not being present on the hypervisor when expected at least less likely to happen, in addition to making the boot process with volumes more performant. An example of a race condition that may be caused or made worse by this is: https://bugs.launchpad.net/cinder/+bug/1357677 ** Affects: nova Importance: High Status: Confirmed ** Tags: libvirt volumes -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1359617 Title: libvirt: driver calls volume connect twice for every volume on boot Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1359617/+subscriptions
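One possible shape of a fix, sketched with illustrative names (not Nova's actual code): make the connect step idempotent per boot, so whichever of the two code paths runs first does the real work and the second becomes a no-op.

```python
def spawn_connect_volumes(volume_ids, connect):
    """Call connect() at most once per volume across both spawn passes."""
    connected = set()

    def connect_once(vol):
        if vol not in connected:
            connect(vol)          # e.g. iSCSI login + target rescan
            connected.add(vol)

    for vol in volume_ids:        # pass 1: building the guest XML
        connect_once(vol)
    for vol in volume_ids:        # pass 2: creating the domain
        connect_once(vol)
    return connected

calls = []
spawn_connect_volumes(["vol-a", "vol-b"], calls.append)
```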
[Yahoo-eng-team] [Bug 1359596] [NEW] Objects should be able to backport related objects automatically
Public bug reported: The following change https://review.openstack.org/#/c/114594 adds checking for related versions of objects. This is, imho, wrong because it will require unnecessary versioning code to be written by developers. A better way to do this would be to declare the version on the ObjectField and then do all the necessary backports automatically, as the code is always:

    primitive['field_name'] = (
        objects.RelatedObject().obj_make_compatible(
            primitive, field_version))

and thus can be done in the superclass in a generic way with a little bit of tweaking of the ObjectField to know its expected version, stopping the proliferation of boilerplate that can be an easy source of bugs. Furthermore, it will stop the unnecessary proliferation of versions of all related objects. We would need to bump the version of the object that owns another object only when we require new functionality from the owned object. ** Affects: nova Importance: High Status: Confirmed ** Tags: unified-objects -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1359596 Title: Objects should be able to backport related objects automatically Status in OpenStack Compute (Nova): Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1359596/+subscriptions
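The proposal above can be sketched generically like this (the class and function names are illustrative stand-ins, not oslo's or Nova's actual API): the superclass walks the object-typed fields and backports each one to the version its field declares, with no per-object boilerplate.

```python
class ObjectField(object):
    """A field that records the version of the related object it holds."""
    def __init__(self, name, version):
        self.name, self.version = name, version

def make_compatible(primitive, object_fields, backport):
    """Generic superclass logic: backport every owned object to the
    version its field declares."""
    for field in object_fields:
        if field.name in primitive:
            primitive[field.name] = backport(primitive[field.name],
                                             field.version)
    return primitive

fields = [ObjectField("numa_topology", "1.0")]
seen_versions = []

def fake_backport(value, version):
    # Stand-in for the per-object obj_make_compatible call.
    seen_versions.append(version)
    return value

result = make_compatible({"numa_topology": {"cells": []}}, fields,
                         fake_backport)
```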
[Yahoo-eng-team] [Bug 1347499] [NEW] block-device source=blank, dest=volume is allowed as a combination, but won't work
Public bug reported: This is a spin-off of https://bugs.launchpad.net/nova/+bug/1347028 As per the example given there, currently source=blank, destination=volume will not work. We should either make it create an empty volume and attach it, or disallow the combination in the API. ** Affects: nova Importance: Low Status: New ** Tags: volumes -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1347499 Title: block-device source=blank, dest=volume is allowed as a combination, but won't work Status in OpenStack Compute (Nova): New To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1347499/+subscriptions
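If the "disallow it in the API" option is taken, the validation could be sketched like this; the whitelist below is illustrative only, not Nova's actual set of valid mappings.

```python
# Illustrative whitelist of (source_type, destination_type) pairs; the
# real set lives in Nova's block-device validation code.
VALID_COMBINATIONS = {
    ("blank", "local"), ("image", "local"), ("image", "volume"),
    ("snapshot", "volume"), ("volume", "volume"),
}

def validate_bdm(source_type, destination_type):
    """Reject unsupported mappings up front instead of failing later."""
    if (source_type, destination_type) not in VALID_COMBINATIONS:
        raise ValueError("unsupported block device mapping: source=%s "
                         "destination=%s" % (source_type, destination_type))
```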
[Yahoo-eng-team] [Bug 1317880] Re: Boot from image (creates a new volume) starts an instance with no image
All of this is by design - the image field on the instance means that the instance was started with that particular image. If the volume was created from an image at any point, and an instance was booted from that volume at a later stage - it may or may not have anything to do with the image, so setting it is wrong and probably breaks a bunch of assumptions Nova code makes about the empty image field for instances booted from volume. Luckily there is a revert here for this commit, which was merged by mistake: https://review.openstack.org/#/c/107875/ ** Changed in: nova Status: Fix Committed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1317880 Title: Boot from image (creates a new volume) starts an instance with no image Status in OpenStack Compute (Nova): Invalid Bug description: 1. Fire up a DevStack instance from the stable/havana, stable/icehouse, or master branches. 2. Go into Horizon 3. Launch an instance 3.1 Instance Boot Source: Boot from image (creates a new volume) 3.2 Image Name: cirros 3.3 Device size (GB): 1 When the instance finishes booting you’ll see that the instance only has a ‘-‘ in the Image Name column. If you click on the instance you’ll see in the Overview Meta section “Image Name (not found)”. My understanding of Boot from image (creates a new volume) is that it simply creates an instance and attaches a volume automatically. It’s basically a convenience for the user. Is that right? It seems the bug is in Nova, as the instance was created with the cirros image and Nova isn’t reporting that fact back. The different responses from various clients:
API (curl .../v2/tenant_id/servers/server_id):   "image": “”
python-novaclient (nova show server_id):         "Attempt to boot from volume - no image supplied”
Horizon:                                         "Image Name (not found)"

I suspect Horizon is making some bad calls but Nova shouldn’t be allowing an instance to get into this state. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1317880/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1337821] Re: VMDK Volume attach fails while attaching to an instance that is booted from VMDK volume
** No longer affects: horizon -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1337821 Title: VMDK Volume attach fails while attaching to an instance that is booted from VMDK volume Status in OpenStack Compute (Nova): In Progress Bug description: I have booted an instance from a volume, successfully booted, now another volume, i try to attach to same instance, it is failing. see the stack trace.. 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher raise exception.InvalidDevicePath(path=root_device_name) 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher InvalidDevicePath: The supplied device path (vda) is invalid. 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher 2014-07-04 08:56:11.396 ERROR oslo.messaging._drivers.common [req-648122d5-fd39-495b-a3a7-a96bd32091d6 admin admin] Returning exception The supplied device path (vda) is invalid. to caller 2014-07-04 08:56:11.396 ERROR oslo.messaging._drivers.common [req-648122d5-fd39-495b-a3a7-a96bd32091d6 admin admin] ['Traceback (most recent call last):\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\nincoming.message))\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\nreturn self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\nresult = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 401, in decorated_function\nreturn function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/exception.py", line 88, in wrapped\npayload)\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\nsix.reraise(self.type_, self.value, self.tb) \n', ' File 
"/opt/stack/nova/nova/exception.py", line 71, in wrapped\n return f(self, context, *args, **kw)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 286, in decorated_function\n pass\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 272, in decorated_function\n return function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function\n kwargs[\'instance\'], e, sys.exc_info())\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 302, in decorated_function\n return function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 4201, in reserve_block_device_name\nret urn do_reserve()\n', ' File "/opt/stack/nova/nova/openstack/common/lockutils.py", line 249, in inner\n return f(*args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 4188, in do_reserve\n context, instance, bdms, device)\n', ' File "/opt/stack/nova/nova/compute/utils.py", line 106, in get_device_name_for_instance\nmappings[\'root\'], device)\n', ' File "/opt/stack/nova/nova/compute/utils.py", line 155, in get_next_device_name\n raise exception.InvalidDevicePath(path=root_device_name)\n', 'InvalidDevicePath: The supplied device path (vda) is invalid.\n'] The reason behind this issue is: because of the root device_name being set 'vda' in the case of boot from volume, The future volume attaches to the VM fail saying "The supplied device path (vda) is invalid" To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1337821/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team 
More help : https://help.launchpad.net/ListHelp
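The trace above ends with get_next_device_name rejecting the stored root device name 'vda'. The kind of mismatch can be illustrated with a toy matcher (an assumed simplification, not nova's actual helpers): a pattern that insists on a '/dev/' prefix rejects the bare 'vda' stored for boot-from-volume roots, while a tolerant pattern accepts both forms.

```python
import re

# Toy illustration (not the real nova helpers): a matcher that insists
# on a '/dev/' prefix rejects the bare 'vda' stored as the root device
# name for boot-from-volume instances.
STRICT = re.compile(r'^/dev/(x?[vsh]d)([a-z]+)$')
# A tolerant matcher accepts both 'vda' and '/dev/vda'.
TOLERANT = re.compile(r'^(?:/dev/)?(x?[vsh]d)([a-z]+)$')


def match_device(name, pattern):
    m = pattern.match(name)
    if not m:
        raise ValueError('The supplied device path (%s) is invalid.' % name)
    return m.groups()
```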
[Yahoo-eng-team] [Bug 1337821] Re: VMDK Volume attach fails while attaching to an instance that is booted from VMDK volume
Removing the VMWare tag as looking at the code it seems to affect all drivers. Also marking as invalid for Nova - Horizon should not be making assumptions about the device name for attach. ** Also affects: horizon Importance: Undecided Status: New ** Changed in: nova Status: New => Invalid ** Tags removed: vmware -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1337821 Title: VMDK Volume attach fails while attaching to an instance that is booted from VMDK volume Status in OpenStack Dashboard (Horizon): New Status in OpenStack Compute (Nova): Invalid Bug description: I have booted an instance from a volume, successfully booted, now another volume, i try to attach to same instance, it is failing. see the stack trace.. 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher raise exception.InvalidDevicePath(path=root_device_name) 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher InvalidDevicePath: The supplied device path (vda) is invalid. 2014-07-04 08:56:11.391 TRACE oslo.messaging.rpc.dispatcher 2014-07-04 08:56:11.396 ERROR oslo.messaging._drivers.common [req-648122d5-fd39-495b-a3a7-a96bd32091d6 admin admin] Returning exception The supplied device path (vda) is invalid. 
to caller 2014-07-04 08:56:11.396 ERROR oslo.messaging._drivers.common [req-648122d5-fd39-495b-a3a7-a96bd32091d6 admin admin] ['Traceback (most recent call last):\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\nincoming.message))\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\nreturn self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\nresult = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 401, in decorated_function\nreturn function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/exception.py", line 88, in wrapped\npayload)\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\nsix.reraise(self.type_, self.value, self.tb) \n', ' File "/opt/stack/nova/nova/exception.py", line 71, in wrapped\n return f(self, context, *args, **kw)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 286, in decorated_function\n pass\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\nsix.reraise(self.type_, self.value, self.tb)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 272, in decorated_function\n return function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function\n kwargs[\'instance\'], e, sys.exc_info())\n', ' File "/opt/stack/nova/nova/openstack/common/excutils.py", line 82, in __exit__\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 302, in decorated_function\n return function(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 4201, in reserve_block_device_name\nret urn do_reserve()\n', ' File 
"/opt/stack/nova/nova/openstack/common/lockutils.py", line 249, in inner\n return f(*args, **kwargs)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 4188, in do_reserve\n context, instance, bdms, device)\n', ' File "/opt/stack/nova/nova/compute/utils.py", line 106, in get_device_name_for_instance\nmappings[\'root\'], device)\n', ' File "/opt/stack/nova/nova/compute/utils.py", line 155, in get_next_device_name\n raise exception.InvalidDevicePath(path=root_device_name)\n', 'InvalidDevicePath: The supplied device path (vda) is invalid.\n'] To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1337821/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1321370] Re: nova overwrites hw_disk_bus image property with incorrect value
*** This bug is a duplicate of bug 1255449 *** https://bugs.launchpad.net/bugs/1255449 Ah, so it looks like this is actually fixed in Icehouse - we just need to backport it to Havana. See https://bugs.launchpad.net/nova/+bug/1255449 and the related fix. Let me close this as a duplicate of that - and I will propose a backport for stable/havana there. ** This bug has been marked a duplicate of bug 1255449 Libvirt Driver - Custom disk_bus setting is being lost on instance power on -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1321370 Title: nova overwrites hw_disk_bus image property with incorrect value Status in OpenStack Compute (Nova): Triaged Bug description: Currently using Havana. Booting from a snapshot with the image property 'hw_disk_bus' = ide boots fine initially. Shutting down/restarting the instance via the dashboard overwrites this value with 'virtio' in the libvirt.xml definition. The value in glance and nova image is correct.
glance image-show b2e157f7-d244-4f61-afdf-d39af63f67c6

  Property 'base_image_ref'             | e8ce2f05-f399-4e8f-aa98-c38a8b9d9fbb
  Property 'hw_disk_bus'                | ide
  Property 'image_location'             | snapshot
  Property 'image_state'                | available
  Property 'image_type'                 | snapshot
  Property 'instance_type_ephemeral_gb' | 0
  Property 'instance_type_flavorid'     | 550ac351-fa21-4315-8309-bec97f00536b
  Property 'instance_type_id'           | 24
  Property 'instance_type_memory_mb'    | 4096
  Property 'instance_type_name'         | windows7
  Property 'instance_type_root_gb'      | 35
  Property 'instance_type_rxtx_factor'  | 1
  Property 'instance_type_swap'         | 2000
  Property 'instance_type_vcpus'        | 2
  Property 'instance_uuid'              | b34995bc-50f6-4a9f-bc54-f8b62f0b69eb
  Property 'os_type'                    | None
  Property 'owner_id'                   | 473a5f18d57a4746abfb3d6ed33cea45
  Property 'user_id'                    | 40caf1d1cb994fbfb8c905e68d07b283
  checksum                              | fdad2f12773319dfa8a71dac3cdd4e5a
  container_format                      | bare
  created_at                            | 2014-04-29T17:29:57
  deleted                               | False
  disk_format                           | qcow2
  id                                    | b2e157f7-d244-4f61-afdf-d39af63f67c6
  is_public                             | True
  min_disk                              | 35
  min_ram                               | 2048
  name                                  | pre-migration
  protected                             | False
  size                                  | 12756713472
  status                                | active
  updated_at                            | 2014-05-12T14:27:25

nova image-show b2e157f7-d244-4f61-afdf-d39af63f67c6

  metadata owner_id                     | 473a5f18d57a4746abfb3d6ed33cea45
  minDisk                               | 35
  metadata instance_type_name           | windows7
  metadata instance_type_swap           | 2000
  metadata instance_type_memory_mb      | 4096
  id                                    | b2e157f7-d244-4f61-afdf-d39af63f67c6
  metadata instance_type_
[Yahoo-eng-team] [Bug 1304695] Re: glusterfs: Instance is not using the correct volume snapshot file after reboot
** Also affects: nova Importance: Undecided Status: New ** Changed in: nova Importance: Undecided => Medium -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1304695 Title: glusterfs: Instance is not using the correct volume snapshot file after reboot Status in Cinder: New Status in OpenStack Compute (Nova): New Bug description: Instance is not using the correct volume snapshot file after reboot. Steps to recreate bug: 1. Create a volume 2. Attach volume to a running instance. 3. Take an online snapshot of the volume. Note that the active volume used by the instance is now switched to volume-.. 4. Shutdown the instance. 5. Start the instance. If you invoke virsh dumpxml , you will see that it is re-attaching the base volume ( volume-) to the instance and not the snapshot volume (volume-.). The expected behavior is to have the snapshot volume re-attach to the instance. This bug will cause data corruption in the snapshot and volume. It looks like the nova volume manager is using a stale copy of the block_device_mapping. The block_device_mapping needs to be refreshed in order for the updated volume snapshot to be used. On power on, the nova manager (nova/compute/manager.py ) does: 1. start_instance 2. _power_on 3. _get_instance_volume_block_device_info The structure for this method is: def _get_instance_volume_block_device_info(self, context, instance, refresh_conn_info=False, bdms=None): if not bdms: bdms = (block_device_obj.BlockDeviceMappingList. get_by_instance_uuid(context, instance['uuid'])) block_device_mapping = ( driver_block_device.convert_volumes(bdms) + driver_block_device.convert_snapshots(bdms) + driver_block_device.convert_images(bdms)) block_device_obj.BlockDeviceMappingList.get_by_instance_uuid() goes and queries the database to construct the bdms, which will contain stale data. 
To manage notifications about this bug go to: https://bugs.launchpad.net/cinder/+bug/1304695/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
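The staleness described above can be modeled with a toy sketch (hypothetical classes, not nova's BDM objects): the connection info cached at attach time keeps naming the base file, so power-on must refresh it after an online snapshot switches the active file.

```python
# Toy model of the stale connection info (hypothetical classes): the
# BDM caches what the volume driver said at attach time; an online
# snapshot changes the driver's current answer, so power-on must
# refresh instead of trusting the cached copy.

class FakeVolumeDriver:
    def __init__(self):
        self.active_file = 'volume-1'

    def initialize_connection(self):
        return {'data': {'name': self.active_file}}


class FakeBDM:
    def __init__(self, driver):
        self.driver = driver
        # Cached at attach time.
        self.connection_info = driver.initialize_connection()

    def refresh_connection_info(self):
        self.connection_info = self.driver.initialize_connection()


driver = FakeVolumeDriver()
bdm = FakeBDM(driver)
driver.active_file = 'volume-1.snap-1'   # online snapshot switched the file

stale = bdm.connection_info['data']['name']
bdm.refresh_connection_info()
fresh = bdm.connection_info['data']['name']
```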
[Yahoo-eng-team] [Bug 1302545] [NEW] Boot volumes API race
Public bug reported: When there is a race for a volume between 2 or more instances, it is possible for more than one to pass the API check. All of them will get scheduled as a result, and only one will actually successfully attach the volume, while others will go to ERROR. This is not ideal since we can reserve the volume in the API, thus making it a bit more user friendly when there is a race (the user will be informed immediately instead of seeing an errored instance). ** Affects: nova Importance: Low Status: Triaged ** Tags: volumes -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1302545 Title: Boot volumes API race Status in OpenStack Compute (Nova): Triaged Bug description: When there is a race for a volume between 2 or more instances, it is possible for more than one to pass the API check. All of them will get scheduled as a result, and only one will actually successfully attach the volume, while others will go to ERROR. This is not ideal since we can reserve the volume in the API, thus making it a bit more user friendly when there is a race (the user will be informed immediately instead of seeing an errored instance). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1302545/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
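The API-side reservation proposed above can be sketched with an in-memory stand-in for cinder (all names hypothetical): the first request flips the volume to 'attaching', so the racing request fails fast in the API instead of erroring after scheduling.

```python
# Sketch (hypothetical in-memory "cinder") of reserving a volume at
# the API layer so a racing boot request is rejected immediately.

class FakeCinder:
    def __init__(self):
        self.status = {'vol-1': 'available'}

    def reserve_volume(self, vol_id):
        if self.status[vol_id] != 'available':
            raise ValueError('volume %s is not available' % vol_id)
        self.status[vol_id] = 'attaching'


def boot_with_volume(cinder, vol_id):
    cinder.reserve_volume(vol_id)   # fail here, in the API, on a race
    return 'scheduled'


cinder = FakeCinder()
first = boot_with_volume(cinder, 'vol-1')
try:
    boot_with_volume(cinder, 'vol-1')   # the racing request
    second = 'scheduled'
except ValueError:
    second = 'rejected'
```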
[Yahoo-eng-team] [Bug 1297127] Re: nova can't detach volume after force detach in cinder
I'd say this is a reasonable thing to propose, although since forcing in cinder is an admin-only command - I am thinking this should be as well. Also I fear there could be edge cases where we really should not allow even the force detach (see https://bugs.launchpad.net/nova/+bug/1240922 where we might want to disable attach for suspended instances). Having all this in mind makes me think this needs to be a BP rather than a bug - so I will move this to Won't Fix, and the reporter might propose this as a Blueprint for Juno. ** Changed in: nova Status: New => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1297127 Title: nova can't detach volume after force detach in cinder Status in OpenStack Compute (Nova): Won't Fix Bug description: There is a use case: we have two nova components (call them nova A and nova B) and one cinder component. A volume is attached to an instance in nova A, and then the services of nova A become abnormal. Because the volume is also wanted in nova B, the cinder API "force detach volume" is used to free this volume. But when nova A is back to normal, nova can't detach this volume from the instance using the nova API "detach volume", as nova checks that the volume state must be "attached". I think we should add a "force detach" function to nova just like "attach" and "detach", because after a force detach in cinder there is still some attach information left in nova which can't be cleaned up using the nova API "detach". To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1297127/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
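The proposed nova-side force detach can be pictured as a toy state machine (hypothetical names, not a real implementation): the force flag skips the 'attached' state check that blocks a normal detach after cinder has already freed the volume, so the leftover attach information can be cleaned up.

```python
# Toy sketch of the proposed force detach (hypothetical names): skip
# the state check so stale attach information can still be cleaned up.

class FakeAttachment:
    def __init__(self):
        self.state = 'detached'                  # cinder force-detached it
        self.local_info = {'vol-1': '/dev/vdb'}  # but nova still has this


def detach(att, force=False):
    if not force and att.state != 'attached':
        raise ValueError('volume is not attached')
    att.local_info.clear()


att = FakeAttachment()
try:
    detach(att)                   # normal detach refuses
except ValueError:
    pass
had_stale_info = bool(att.local_info)   # stale info survived normal detach
detach(att, force=True)                 # force path cleans it up
```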
[Yahoo-eng-team] [Bug 1253612] Re: Launch Instance Boot from image - creates a new volume fails
Similar as the bug https://bugs.launchpad.net/nova/+bug/1280357, I think marking this one as a won't fix and getting the cinder interactions with events done early in juno makes the most sense to me here. ** Changed in: nova Status: Confirmed => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1253612 Title: Launch Instance Boot from image - creates a new volume fails Status in OpenStack Compute (Nova): Won't Fix Bug description: steps to reproduce: 1. Launch a new instance with a Boot Source from image (creates a new volume). Nova-Compute side fails with the below logs: 2013-11-21 11:31:30.708 19098 ERROR nova.compute.manager [req-8b32d1cd-42be-4daa-a3a3-2a1429d199c3 b94edf2504c84223b58e254314528902 679545ff6c1e4401adcafa0857aefe2e] [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] Instance failed block device setup 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] Traceback (most recent call last): 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1376, in _prep_block_device 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] self._await_block_device_map_created)) 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 283, in attach_block_devices 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] block_device_mapping) 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 238, in attach 2013-11-21 
11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] wait_func(context, vol['id']) 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 901, in _await_block_device_map_created 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] attempts=attempts) 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] VolumeNotCreated: Volume cff27c84-5f73-40e4-8356-72bd7b3e0b4f did not finish being created even after we waited 71 seconds or 60 attempts. 2013-11-21 11:31:30.708 19098 TRACE nova.compute.manager [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] 2013-11-21 11:31:30.806 19098 AUDIT nova.compute.manager [req-8b32d1cd-42be-4daa-a3a3-2a1429d199c3 b94edf2504c84223b58e254314528902 679545ff6c1e4401adcafa0857aefe2e] [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] Terminating instance 2013-11-21 11:31:31.571 19098 ERROR nova.virt.libvirt.driver [-] [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] During wait destroy, instance disappeared. 
2013-11-21 11:31:31.845 19098 ERROR nova.virt.libvirt.vif [req-8b32d1cd-42be-4daa-a3a3-2a1429d199c3 b94edf2504c84223b58e254314528902 679545ff6c1e4401adcafa0857aefe2e] [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] Failed while unplugging vif 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] Traceback (most recent call last): 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py", line 666, in unplug_mlnx_direct 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] vnic_mac, run_as_root=True) 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/utils.py", line 177, in execute 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] return processutils.execute(*cmd, **kwargs) 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] File "/usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py", line 178, in execute 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] cmd=' '.join(cmd)) 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c675b-1bd6-402e-8e73-7d99a4222cc8] ProcessExecutionError: Unexpected error while running command. 2013-11-21 11:31:31.845 19098 TRACE nova.virt.libvirt.vif [instance: 013c67
[Yahoo-eng-team] [Bug 1280357] Re: parameters max_tries and wait_between of method ComputeManager._await_block_device_map_created should be configurable
As discussed on several proposed patches around this (see https://review.openstack.org/#/c/80619/, which actually rejects this solution), I will move this bug to Won't Fix, and will raise a BP targeted for Juno to use some of the code added in https://blueprints.launchpad.net/nova/+spec/admin-event-callback-api to make interactions between nova and cinder better and avoid the need for a configurable timeout. ** Changed in: nova Status: Confirmed => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1280357 Title: parameters max_tries and wait_between of method ComputeManager._await_block_device_map_created should be configurable Status in OpenStack Compute (Nova): Won't Fix Bug description: When using a weak storage backend and initiating the creation of a lot of new instances using volumes as the backend (directly created from an image), I got a lot of "InvalidBDM: Block Device Mapping is Invalid" errors. After I had a look at the method _await_block_device_map_created (in ComputeManager), the solution was pretty easy: increasing the max_tries and/or wait_between parameters solved the issue. The storage backend could simply not provide this mass of volumes in a very short time (100 seconds on my testing system). To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1280357/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
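The polling loop in question looks roughly like this (a simplified sketch with hypothetical names, not nova's exact code). Making max_tries/wait_between configurable only scales the loop, which is why an event callback from cinder was preferred over tuning the timeout:

```python
import time

# Simplified sketch of the volume-creation polling loop (hypothetical
# names): a configurable timeout just scales this loop; an event
# callback from cinder would remove the need for polling entirely.

def await_volume_created(get_status, max_tries=60, wait_between=1.0,
                         sleep=time.sleep):
    for attempt in range(1, max_tries + 1):
        if get_status() == 'available':
            return attempt
        sleep(wait_between)
    raise RuntimeError('volume did not finish being created '
                       'after %d attempts' % max_tries)


# A backend that needs three polls before the volume is ready.
statuses = iter(['creating', 'creating', 'available'])
tries = await_volume_created(lambda: next(statuses), sleep=lambda s: None)
```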
[Yahoo-eng-team] [Bug 1075971] Re: Attach volume with libvirt disregards target device but still reserves it
Since the https://blueprints.launchpad.net/nova/+spec/improve-block-device-handling BP has been implemented, it is now possible to both boot instances and attach volumes without specifying device names, in which case the device names will be handled properly by Nova. It is still possible to supply device names (for backwards compatibility's sake), which causes the same behavior as described above. This is really an issue due to the fact that there is no way to make sure libvirt uses the device name supplied to it, since libvirt only takes it as an ordering hint. The best solution really _is_ to rely on Nova to actually choose the device name, as per the implemented BP. ** Changed in: nova Status: Confirmed => Won't Fix -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1075971 Title: Attach volume with libvirt disregards target device but still reserves it Status in OpenStack Compute (Nova): Won't Fix Bug description: Running devstack with libvirt/qemu - the problem is that attaching a volume (either by passing it with --block_device_mapping to boot or by using nova volume-attach) completely disregards the device name passed, as can be seen from the following shell session. However, the device remains reserved, so subsequent attach attempts will fail on the specified device and succeed with some other one given (which will not be honored again).
The following session is how to reproduce it:

[ndipanov@devstack devstack]$ cinder list
  ID                                   | Status    | Display Name | Size | Volume Type | Attached to
  5792f1ed-c5f7-40c6-913f-43aa66c717c7 | available | bootable     | 3    | None        |
  abc77933-119b-4105-b085-092c93be36f5 | available | blank_2      | 1    | None        |
  b4de941a-627c-447a-9226-456159d95173 | available | blank        | 1    | None        |

[ndipanov@devstack devstack]$ nova list

[ndipanov@devstack devstack]$ nova boot --image c346fdd1-d438-472b-98f5-b4c5f2b716f8 --flavor 1 --block_device_mapping vdr=b4de941a-627c-447a-9226-456159d95173:::0 --key_name nova_key w_vol
  OS-DCF:diskConfig      | MANUAL
  OS-EXT-STS:power_state | 0
  OS-EXT-STS:task_state  | scheduling
  OS-EXT-STS:vm_state    | building
  accessIPv4             |
  accessIPv6             |
  adminPass              | CqgT4dXkq64t
  config_drive           |
  created                | 2012-11-07T14:02:00Z
  flavor                 | m1.tiny
  hostId                 |
  id                     | caa459d5-27ae-4c5b-b190-fd740054a2ec
  image                  | cirros-0.3.0-x86_64-uec
  key_name               | nova_key
  metadata               | {}
  name                   | w_vol
  progress               | 0
  security_groups        | [{u'name': u'default'}]
  status                 | BUILD
  tenant_id              | 5f68e605463940dda20e876604385c43
  updated                | 2012-11-07T14:02:01Z
  user_id                | 104895e85fe54ae5a2cc5c5a650f50b0

[ndipanov@devstack devstack]$ nova list
  caa459d5-27ae-4c5b-b190-fd740054a2ec | w_vol | ACTIVE | private=10.0.0.2

[ndipanov@devstack devstack]$ ssh -o StrictHostKeyChecking=no -i nova_key.priv cirros@10.0.0.2 @@@ @WARNING: REMOTE HO
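The mismatch in the session above can be illustrated with a toy sketch (not libvirt's real naming logic): the requested name gets recorded as reserved, while the hypervisor hands out names in its own order, so the two diverge and the requested name is wasted.

```python
# Toy illustration (not libvirt's actual behavior in detail): nova
# reserves the requested device name, but the hypervisor treats it
# only as an ordering hint and assigns the next name after the root
# disk (vda) in its own sequence.

def attach(reserved, requested):
    reserved.add(requested)        # the requested name is now "taken"
    # The hypervisor's own pick: next letter after the root disk, one
    # step per attached disk, ignoring the requested name entirely.
    actual = '/dev/vd' + chr(ord('a') + len(reserved))
    return actual


reserved = set()
actual = attach(reserved, '/dev/vdr')
```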
[Yahoo-eng-team] [Bug 1180040] Re: Race condition in attaching/detaching volumes when compute manager is unreachable
** Changed in: nova Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1180040 Title: Race condition in attaching/detaching volumes when compute manager is unreachable Status in OpenStack Compute (Nova): Invalid Bug description: When a compute manager is offline, or if it cannot pick up messages for some reason, a race condition exists in attaching/detaching volumes. Try attach and detach a volume and then bring the compute manager online. Then the reserve_block_device_name message gets delivered and a block_device_mapping is created for this instance/volume regardless of the state of the volume. This will result in the following issues. 1. The mountpoint is no longer be usable. 2. os-volume_attachments API will list the volume as attached to the instance. Steps to reproduce (This was recreated in Devstack with nova trunk 75af47a.) 1. Spawn an instance (Mine is a multinode devstack setup, so I spawn it to a different machine than the api, but the race condition should be reproducible in a single-node setup too) 2. Create a volume 3. Stop the compute manager (n-cpu) 4. Try to attach the volume to the instance, it should fail after a while 5. Try to detach the volume 6. List the volumes. The volume should be in 'available' state. Optionally you can delete it at this point 7. Check db for block_device_mapping. It shouldn't have any reference to this volume 8. Start compute manager on the node that the instance is running 9. 
Check db for block_device_mapping and it should now have a new entry associating this volume and instance regardless of the state of the volume To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1180040/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
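One possible guard for the delayed message, as a toy sketch (hypothetical names, not nova's real signature): re-check the volume's state when reserve_block_device_name finally arrives, before writing a block_device_mapping row.

```python
# Toy sketch of guarding a delayed reserve_block_device_name message
# (hypothetical names): the message may arrive long after the user
# detached or deleted the volume, so re-check state instead of
# trusting the stale request.

def reserve_block_device_name(volume, bdm_table, instance_id, device):
    if volume['status'] != 'attaching':
        raise ValueError('volume %s is no longer being attached'
                         % volume['id'])
    bdm_table.append({'instance': instance_id, 'volume': volume['id'],
                      'device': device})
    return device


bdms = []
# The user already detached the volume while the manager was offline.
stale_volume = {'id': 'vol-1', 'status': 'available'}
try:
    reserve_block_device_name(stale_volume, bdms, 'inst-1', '/dev/vdb')
except ValueError:
    pass
```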
[Yahoo-eng-team] [Bug 1296593] [NEW] Compute manager _poll_live_migration 'instance_ref' argument should be renamed to 'instance'
Public bug reported: The reason is two-fold: * the wrap_instance_fault decorator expects the argument to be 'instance' * We are using new-world objects in live migration, and instance_ref used to imply a dict. ** Affects: nova Importance: Medium Assignee: Nikola Đipanov (ndipanov) Status: In Progress ** Changed in: nova Milestone: None => icehouse-rc1 ** Changed in: nova Importance: Undecided => Medium ** Changed in: nova Status: New => Confirmed ** Changed in: nova Assignee: (unassigned) => Nikola Đipanov (ndipanov) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1296593 Title: Compute manager _poll_live_migration 'instance_ref' argument should be renamed to 'instance' Status in OpenStack Compute (Nova): In Progress Bug description: The reason is two-fold: * the wrap_instance_fault decorator expects the argument to be 'instance' * We are using new-world objects in live migration, and instance_ref used to imply a dict. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1296593/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
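The first bullet is the interesting one: a decorator that records faults against an instance can only do so if it can find the instance under the keyword name it expects. The sketch below is illustrative (not nova's actual wrap_instance_fault implementation); it shows why a method whose parameter is named 'instance_ref' would silently miss the fault handling.

```python
# Illustrative sketch: a wrap_instance_fault-style decorator looks up the
# instance by the keyword name 'instance', so the decorated method's
# parameter must use that exact name.
import functools

def wrap_instance_fault(func):
    @functools.wraps(func)
    def wrapper(self, context, *args, **kwargs):
        try:
            return func(self, context, *args, **kwargs)
        except Exception as exc:
            # The fault can only be recorded if the instance is found
            # under the expected keyword name.
            instance = kwargs.get("instance")
            if instance is not None:
                self.faults.append((instance["uuid"], str(exc)))
            raise
    return wrapper

class Manager:
    def __init__(self):
        self.faults = []

    @wrap_instance_fault
    def _poll_live_migration(self, context, instance=None):
        raise RuntimeError("migration failed")

m = Manager()
try:
    m._poll_live_migration("ctxt", instance={"uuid": "abc"})
except RuntimeError:
    pass
print(m.faults)  # fault recorded because the kwarg is named 'instance'
```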
[Yahoo-eng-team] [Bug 1295625] [NEW] Oslo messaging port broke graceful shutdown of services in Nova
Public bug reported: After the port of Nova to oslo.messaging (https://review.openstack.org/#/c/39929) graceful shutdown of services, introduced by https://blueprints.launchpad.net/nova/+spec/graceful-shutdown in I-1, got broken. In order to make this work again we need to make sure that Nova services call the oslo.messaging MessageHandlingServer wait() method so that it gives the running greenthreads a chance to finish. ** Affects: nova Importance: Undecided Status: Confirmed ** Changed in: nova Status: New => Confirmed ** Changed in: nova Milestone: None => icehouse-rc1 -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1295625 Title: Oslo messaging port broke graceful shutdown of services in Nova Status in OpenStack Compute (Nova): Confirmed Bug description: After the port of Nova to oslo.messaging (https://review.openstack.org/#/c/39929) graceful shutdown of services, introduced by https://blueprints.launchpad.net/nova/+spec/graceful-shutdown in I-1, got broken. In order to make this work again we need to make sure that Nova services call the oslo.messaging MessageHandlingServer wait() method so that it gives the running greenthreads a chance to finish. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1295625/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
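The shutdown contract described above can be sketched with a plain thread standing in for oslo.messaging's greenthreads: stop() tells the server to take no more work, and wait() blocks until in-flight handlers finish. This is an illustrative model of the pattern, not the oslo.messaging API itself.

```python
# Minimal sketch of the stop()/wait() graceful-shutdown pattern, using a
# plain thread as a stand-in for a greenthread handling a message.
import threading
import time

class ToyMessageHandlingServer:
    def __init__(self):
        self._threads = []

    def submit(self, fn):
        t = threading.Thread(target=fn)
        t.start()
        self._threads.append(t)

    def stop(self):
        pass  # a real server would stop dispatching new messages here

    def wait(self):
        # Give the running handlers a chance to finish before shutdown;
        # skipping this call is what broke graceful shutdown.
        for t in self._threads:
            t.join()

results = []
server = ToyMessageHandlingServer()
server.submit(lambda: (time.sleep(0.1), results.append("done")))
server.stop()
server.wait()  # without this, the process could exit mid-handler
print(results)
```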
[Yahoo-eng-team] [Bug 944383] Re: There is no way to recover/cleanup a volume in an "attaching" state
Ok, so I've looked at this and it seems to work as expected now:

$ for i in {1..5}; do cinder create --display-name volume_$i 1; done
$ cinder list
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| ID                                   | Status    | Name     | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| 0afc6137-bb95-433b-bf47-31edb4f22109 | available | volume_5 | 1    | None        | false    |             |
| 4368ddd6-6d1c-436f-abf2-328de4af4c14 | available | volume_2 | 1    | None        | false    |             |
| 5899a09f-a052-4328-80a1-dccefde7ffbb | available | volume_4 | 1    | None        | false    |             |
| 65bb1a41-39c9-47bf-b48e-3f873ece7cc8 | available | volume_3 | 1    | None        | false    |             |
| a163bc28-7980-4c50-8ae3-cde63037096f | available | volume_1 | 1    | None        | false    |             |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
$ cinder list | grep "^| \w" | awk '{ print $2 }' | xargs -P5 -I {} nova volume-attach d6544df8-7e3a-4f45-ad60-deff250e07c3 {}
$ cinder list
+--------------------------------------+--------+----------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name     | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+----------+------+-------------+----------+--------------------------------------+
| 0afc6137-bb95-433b-bf47-31edb4f22109 | in-use | volume_5 | 1    | None        | false    | d6544df8-7e3a-4f45-ad60-deff250e07c3 |
| 4368ddd6-6d1c-436f-abf2-328de4af4c14 | in-use | volume_2 | 1    | None        | false    | d6544df8-7e3a-4f45-ad60-deff250e07c3 |
| 5899a09f-a052-4328-80a1-dccefde7ffbb | in-use | volume_4 | 1    | None        | false    | d6544df8-7e3a-4f45-ad60-deff250e07c3 |
| 65bb1a41-39c9-47bf-b48e-3f873ece7cc8 | in-use | volume_3 | 1    | None        | false    | d6544df8-7e3a-4f45-ad60-deff250e07c3 |
| a163bc28-7980-4c50-8ae3-cde63037096f | in-use | volume_1 | 1    | None        | false    | d6544df8-7e3a-4f45-ad60-deff250e07c3 |
+--------------------------------------+--------+----------+------+-------------+----------+--------------------------------------+

So based on the above I will mark this as invalid for Nova. ** Changed in: nova Status: Confirmed => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/944383 Title: There is no way to recover/cleanup a volume in an "attaching" state Status in Cinder: Triaged Status in OpenStack Compute (Nova): Invalid Bug description: While trying to attach more than one volume to an instance two volumes hung in an "attaching" state. A volume-detach on that volume returns a 404 and a volume-delete returns a 500. It seems that a volume-force-detach is needed to clean up volumes in a hung state. To manage notifications about this bug go to: https://bugs.launchpad.net/cinder/+bug/944383/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
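The shell one-liner in the session above stresses the attach path by running five `nova volume-attach` calls in parallel (`xargs -P5`). A rough Python equivalent of that stress pattern is sketched below; `attach_volume` is a hypothetical stand-in for the client call, not a real novaclient API.

```python
# Sketch of the parallel-attach stress test done above with xargs -P5:
# issue one attach per volume concurrently against the same server.
from concurrent.futures import ThreadPoolExecutor

def attach_volume(server_id, volume_id):
    # Placeholder for: nova volume-attach <server_id> <volume_id>
    return (volume_id, "in-use")

server = "d6544df8-7e3a-4f45-ad60-deff250e07c3"
volumes = ["vol-%d" % i for i in range(1, 6)]

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(lambda v: attach_volume(server, v), volumes))

print(results)
```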
[Yahoo-eng-team] [Bug 884984] Re: Cannot boot from volume with 2 devices
** Changed in: nova Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/884984 Title: Cannot boot from volume with 2 devices Status in OpenStack Compute (Nova): Invalid Bug description: More details on: https://answers.launchpad.net/nova/+question/176938 Summary: - Say I had 2 disks, disk1 and disk2 (represented by 2 volumes). disk1 has the root filesystem and disk2 has some data. I boot an instance using the boot-from-volumes extension and specify the 2 disks such that disk1 should be attached to /dev/vda and disk2 to /dev/vdb. When the instance is launched it fails to boot, because it tries to find the root filesystem on disk2 instead. The underlying problem is with virsh/libvirt. Boot fails because in the libvirt.xml file created by OpenStack, disk2 (/dev/vdb) is listed before disk1 (/dev/vda). So, what happens is that the hypervisor attaches disk2 first (since it's listed first in the XML). Therefore when these disks are attached on the guest, disk2 appears as /dev/vda and disk1 as /dev/vdb. Later the kernel tries to find the root filesystem on '/dev/vda' (because that's what is selected as the root) and it fails for obvious reasons. I think it's a virsh bug. It should be smart about it and attach the devices in the right order. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/884984/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
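The ordering problem described above has a simple shape: devices emitted into the domain XML in arbitrary order get attached in that order, so the guest's /dev/vdX names drift from the requested ones. Sorting the block devices by their requested device name before generating the XML keeps them aligned. The sketch below is illustrative, not nova's or libvirt's actual code.

```python
# Sketch of the disk-ordering fix: sort block devices by the requested
# device name before emitting them into the domain XML.
disks = [
    {"volume": "disk2", "device": "/dev/vdb"},  # data disk, listed first
    {"volume": "disk1", "device": "/dev/vda"},  # root filesystem
]

# Emitted as listed, disk2 would be attached first and show up as
# /dev/vda in the guest, so the kernel's root= no longer matches.
ordered = sorted(disks, key=lambda d: d["device"])

xml_targets = ["<target dev='%s'/>" % d["device"].split("/")[-1]
               for d in ordered]
print(xml_targets)
```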
[Yahoo-eng-team] [Bug 1268665] [NEW] metadata service should be setting the objects indirection_api
Public bug reported: Ia12f48227eb2380f5da93313cd4045577d8857c9 introduces objects in the metadata service, and metadata is supposed to be using the conductor, so we need to make sure we set the nova.objects.base.NovaObject.indirection_api to conductor in the metadata service. ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1268665 Title: metadata service should be setting the objects indirection_api Status in OpenStack Compute (Nova): New Bug description: Ia12f48227eb2380f5da93313cd4045577d8857c9 introduces objects in the metadata service, and metadata is supposed to be using the conductor, so we need to make sure we set the nova.objects.base.NovaObject.indirection_api to conductor in the metadata service. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1268665/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
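The indirection pattern the bug asks for can be sketched briefly: when a class-level indirection_api is set, object methods are routed through it (the conductor) instead of touching the database directly; the metadata service must set it at startup. The classes below are an illustrative toy model, not nova's real NovaObject or conductor API.

```python
# Toy model of the indirection_api pattern (illustrative, not nova's code).
class ConductorAPI:
    def object_action(self, obj, method):
        # A real conductor would forward this over RPC.
        return "conductor:%s.%s" % (type(obj).__name__, method)

class NovaObject:
    indirection_api = None  # class-level, shared by all object types

    def save(self):
        if self.indirection_api is not None:
            return self.indirection_api.object_action(self, "save")
        return "direct-db:save"

class Instance(NovaObject):
    pass

# Without the metadata service setting this, objects hit the DB directly:
print(Instance().save())
# With it set at service startup (what the bug asks for), calls go
# through the conductor:
NovaObject.indirection_api = ConductorAPI()
print(Instance().save())
```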
[Yahoo-eng-team] [Bug 1260806] [NEW] Defaulting device names fails to update the database
Public bug reported: The _default_block_device_names method of the compute manager would call the conductor block_device_mapping_update method with the wrong arguments, causing a TypeError and ultimately causing the instance to fail. This bug happens only when using a driver that does not provide its own implementation of default_device_names_for_instance (currently only the libvirt driver does this). Also affects havana since https://review.openstack.org/#/c/40229/ ** Affects: nova Importance: Undecided Status: New ** Tags: havana-backport-potential -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1260806 Title: Defaulting device names fails to update the database Status in OpenStack Compute (Nova): New Bug description: The _default_block_device_names method of the compute manager would call the conductor block_device_mapping_update method with the wrong arguments, causing a TypeError and ultimately causing the instance to fail. This bug happens only when using a driver that does not provide its own implementation of default_device_names_for_instance (currently only the libvirt driver does this). Also affects havana since https://review.openstack.org/#/c/40229/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1260806/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
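The failure mode is a plain argument-shape mismatch: the caller passes arguments that do not match the update method's signature, so Python raises a TypeError at the call site. The signature below is a simplified hypothetical stand-in for the conductor's block_device_mapping_update, not its real interface.

```python
# Sketch of the wrong-arguments TypeError described above, with a
# simplified stand-in for the conductor update method.
def block_device_mapping_update(context, bdm_id, values):
    return {"id": bdm_id, **values}

context = object()
bdm = {"id": 42, "device_name": None}

# Buggy call: passing the whole bdm dict where an id plus a values dict
# are expected raises TypeError.
try:
    block_device_mapping_update(context, bdm)
except TypeError as exc:
    error = str(exc)

# Fixed call: pass the id and the changed fields separately.
updated = block_device_mapping_update(context, bdm["id"],
                                      {"device_name": "/dev/vda"})
print(updated)
```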