[Yahoo-eng-team] [Bug 1825386] [NEW] nova is looking for OVMF file no longer provided by latest CentOS

2019-04-18 Thread Chris Friesen
Public bug reported: In nova/virt/libvirt/driver.py the code looks for a hardcoded path "/usr/share/OVMF/OVMF_CODE.fd". It appears that CentOS 7.6 has modified the OVMF-20180508-3 rpm to no longer contain this file. Instead it now seems to be named /usr/share/OVMF/OVMF_CODE.secboot.fd This
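One way to avoid depending on a single hardcoded firmware path is to probe a list of candidate locations and use the first one that exists. The sketch below assumes the two paths named above are the only candidates; `find_ovmf_code` is a hypothetical helper, not nova's actual code.

```python
import os
from typing import Callable, Iterable, Optional

# Candidate OVMF firmware images (assumed list): the first is the path nova
# hardcodes, the second is the name reportedly shipped in CentOS 7.6.
DEFAULT_OVMF_CANDIDATES = (
    "/usr/share/OVMF/OVMF_CODE.fd",
    "/usr/share/OVMF/OVMF_CODE.secboot.fd",
)


def find_ovmf_code(candidates: Iterable[str] = DEFAULT_OVMF_CANDIDATES,
                   exists: Callable[[str], bool] = os.path.exists
                   ) -> Optional[str]:
    """Return the first candidate path that exists, or None (hypothetical)."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

The `exists` parameter is injectable only so the lookup can be exercised without touching the real filesystem.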

[Yahoo-eng-team] [Bug 1819216] [NEW] in devstack, "nova migrate " will try to migrate to the same host (and then fail)

2019-03-08 Thread Chris Friesen
Public bug reported: In multinode devstack I had an instance running on one node and tried running "nova migrate ". The operation started, but then the instance went into an error state with the following fault: {"message": "Unable to migrate instance (2bbdab8e-3a83-43a4-8c47-ce57b653e43e) to

[Yahoo-eng-team] [Bug 1818701] [NEW] invalid PCI alias in flavor results in HTTP 500 on instance create

2019-03-05 Thread Chris Friesen
Public bug reported: If an invalid PCI alias is specified in the flavor extra-specs and we try to create an instance with that flavor, it will result in a PciInvalidAlias exception being raised. In ServersController.create() PciInvalidAlias is missing from the list of exceptions that get

[Yahoo-eng-team] [Bug 1818092] [NEW] hypervisor check in _check_instance_has_no_numa() is broken

2019-02-28 Thread Chris Friesen
eport the hypervisor type as "QEMU". So we need to fix up the hypervisor type check otherwise we'll always fail the check. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status: In Progress ** Tags: compute -- You received this bug notific

[Yahoo-eng-team] [Bug 1815949] [NEW] missing special-case libvirt exception during device detach

2019-02-14 Thread Chris Friesen
Public bug reported: In Pike a customer has run into the following issue: 2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall._func' failed: libvirtError: internal error: unable to execute QEMU command 'device_del': Device

[Yahoo-eng-team] [Bug 1792985] [NEW] strict NUMA memory allocation for 4K pages leads to OOM-killer

2018-09-17 Thread Chris Friesen
Public bug reported: We've seen a case on a resource-constrained compute node where booting multiple instances passed, but led to the following error messages from the host kernel: [ 731.911731] Out of memory: Kill process 133047 (nova-api) score 4 or sacrifice child [ 731.920377] Killed

[Yahoo-eng-team] [Bug 1792077] [NEW] problem specifying multiple "bus=scsi" block devices on nova boot

2018-09-11 Thread Chris Friesen
Public bug reported: I'm using devstack stable/rocky on Ubuntu 16.04. When running this command nova boot --flavor m1.small --nic net-name=public --block-device source=image,id=24e8e922-2687-48b5-a895-3134a650e00f,dest=volume,size=2,bootindex=0,shutdown=remove,bus=scsi --block-device

[Yahoo-eng-team] [Bug 1790195] [NEW] performance problems starting up nova process due to regex code

2018-08-31 Thread Chris Friesen
Public bug reported: We noticed that nova process startup seems to take a long time. It looks like one major culprit is the regex code at https://github.com/openstack/nova/blob/master/nova/api/validation/parameter_types.py Sean K Mooney highlighted one possible culprit: i dont really like

[Yahoo-eng-team] [Bug 1785270] [NEW] allow confirmation of resize/migration for migrations in "confirming" status

2018-08-03 Thread Chris Friesen
Public bug reported: Confirmation of a resize is an RPC operation. If a compute node fails after a migration has been put into the "confirming" status there is no way to confirm it again, causing the state of the instance to get "stuck". In the case of confirm_resize(), I don't see any problem

[Yahoo-eng-team] [Bug 1785123] [NEW] UEFI NVRAM lost on cold migration or resize

2018-08-02 Thread Chris Friesen
Public bug reported: If you boot a virtual instance with UEFI, the UEFI NVRAM is lost on a cold migration. The default storage for the virtual UEFI NVRAM is in /var/lib/libvirt/qemu/nvram/, and the file is not being copied over on cold migration. ** Affects: nova Importance: Undecided

[Yahoo-eng-team] [Bug 1785086] [NEW] docs for RPC is out of date

2018-08-02 Thread Chris Friesen
Public bug reported: The information in doc/source/reference/rpc.rst is stale and should probably be updated or removed so that it doesn't confuse people. ** Affects: nova Importance: Undecided Status: New ** Tags: docs

[Yahoo-eng-team] [Bug 1781643] [NEW] With remote storage, swap disk size changed after resize-revert

2018-07-13 Thread Chris Friesen
Public bug reported: There seems to be an issue (discovered in Pike) where ceph-backed swap does not return to the original size if a resize operation is reverted. Steps to reproduce: 1) Configure compute nodes to use remote ceph-backed storage for instances. 2) Launch a vm with ephemeral

[Yahoo-eng-team] [Bug 1538565] Re: Guest CPU does not support 1Gb hugepages with explicit models

2018-05-28 Thread Chris Friesen
The code at https://review.openstack.org/#/c/534384/ has been merged, and should allow the operator to explicitly add the pdpe1gb flag. Marking as fixed. ** Changed in: nova Status: Confirmed => Fix Released

[Yahoo-eng-team] [Bug 1764556] Re: "nova list" fails with exception.ServiceNotFound if service is deleted and has no UUID

2018-04-18 Thread Chris Friesen
I think we could get into the bad state described in the bug if we do a slightly different series of actions: 1) boot instance on Ocata 2) migrate instance 3) delete compute node (thus deleting the service record) 4) create compute node with same name 5) migrate instance to newly-created

[Yahoo-eng-team] [Bug 1764556] [NEW] "nova list" fails with exception.ServiceNotFound if service is deleted and has no UUID

2018-04-16 Thread Chris Friesen
Public bug reported: We had a testcase where we booted an instance on Newton, migrated it off the compute node, deleted the compute node (and service), upgraded to Pike, created a new compute node with the same name, and migrated the instance back to the compute node. At this point the "nova

[Yahoo-eng-team] [Bug 1763766] [NEW] nova needs to disallow topology changes on image rebuild

2018-04-13 Thread Chris Friesen
Public bug reported: When doing a rebuild the assumption throughout the code is that we are not changing the resources consumed by the guest (that is what a resize is for). The complication here is that there are a number of image properties which might affect the instance resource consumption

[Yahoo-eng-team] [Bug 1552777] Re: resizing from flavor with swap to one without swap puts instance into Error status

2018-03-21 Thread Chris Friesen
** Changed in: nova Status: In Progress => Fix Released

[Yahoo-eng-team] [Bug 1756179] [NEW] deleting a nova-compute service leaves orphaned records in placement and host mapping

2018-03-15 Thread Chris Friesen
Public bug reported: Currently when deleting a nova-compute service via the API, we will delete the service and compute_node records in the DB, but the placement resource provider and host mapping records will be orphaned. The orphaned resource provider records have been found to cause scheduler

[Yahoo-eng-team] [Bug 1755981] [NEW] powering off and on an instance can result in instance boot failure due to serial port handling race

2018-03-14 Thread Chris Friesen
Public bug reported: The following is specific to the libvirt driver. When we call power_off() it calls _destroy(), which in turn calls self._get_serial_ports_from_guest() and loops over all the serial ports calling serial_console.release_port() on each. This removes the host TCP port from

[Yahoo-eng-team] [Bug 1754782] [NEW] we skip critical scheduler filters when forcing the host on instance boot

2018-03-09 Thread Chris Friesen
Public bug reported: When booting an instance it's possible to force it to be placed on a specific host using the "--availability-zone nova:host" syntax. If you do this, the code at https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L581 will return early rather than

[Yahoo-eng-team] [Bug 1538565] Re: Guest CPU does not support 1Gb hugepages with explicit models

2018-03-01 Thread Chris Friesen
In recent versions of qemu the "Skylake-Server" cpu model has the flag, but any earlier Intel processor models do not. ** Changed in: nova Status: Expired => Confirmed

[Yahoo-eng-team] [Bug 1750623] [NEW] rebuild to same host with different image shouldn't check with placement

2018-02-20 Thread Chris Friesen
Public bug reported: When doing a rebuild-to-same-host but with a different image, all we really want to do is ensure that the image properties for the new image are still valid for the current host. Accordingly we need to go through the scheduler (to run the image-related filters) but we don't

[Yahoo-eng-team] [Bug 1750618] [NEW] rebuild to same host with a different image results in erroneously doing a Claim

2018-02-20 Thread Chris Friesen
Public bug reported: As of stable/pike if we do a rebuild-to-same-node with a new image, it results in ComputeManager.rebuild_instance() being called with "scheduled_node=" and "recreate=False". This results in a new Claim, which seems wrong since we're not changing the flavor and that claim

[Yahoo-eng-team] [Bug 1605098] Re: Nova usage not showing server real uptime

2018-01-10 Thread Chris Friesen
Nova reserves resources for the instance even if it's not running, so the reported uptime probably shouldn't be used for billing. Also, the uptime gets reset on a resize/revert-resize/rescue, further making it tricky to use for billing. ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1734394] [NEW] nova microversion 2.36 accidentally removed support for "force" when setting quotas

2017-11-24 Thread Chris Friesen
Public bug reported: It is supposed to be possible to specify the "force" option when updating a quota-set. Up to microversion 2.35 this works as expected. However, in 2.36 it no longer works, and nova-api sends back: RESP BODY: {"badRequest": {"message": "Invalid input for field/attribute

[Yahoo-eng-team] [Bug 1724686] [NEW] authentication code hangs when there are three or more admin keystone endpoints

2017-10-18 Thread Chris Friesen
Public bug reported: I'm running stable/pike devstack, and I was playing around with what happens when there are many endpoints in multiple regions, and I stumbled over a scenario where the keystone authentication code hangs. My original endpoint list looked like this:

[Yahoo-eng-team] [Bug 1284719] Re: buggy live migration rollback when using shared storage

2017-08-28 Thread Chris Friesen
** Changed in: nova Status: Expired => Incomplete

[Yahoo-eng-team] [Bug 1712684] [NEW] allocations not immediately removed when instance is deleted

2017-08-23 Thread Chris Friesen
Public bug reported: Based on code inspection and a discussion with mriedem on IRC, it appears that when deleting an instance in a pure-Pike cloud the allocations are not removed until the update_available_resource() periodic task calls ResourceTracker._update_usage_from_instances(), which calls

[Yahoo-eng-team] [Bug 1695991] [NEW] "nova-manage db online_data_migrations" doesn't report matched/migrated properly

2017-06-05 Thread Chris Friesen
Public bug reported: When running "nova-manage db online_data_migrations", it will report how many items matched the query and how many of the matching items were migrated. However, most of the migration routines are not properly reporting the "total matched" count when "max_count" is specified.
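A sketch of the reporting the text expects, assuming each migration routine returns a (matched, migrated) pair: the matched count should cover every record that still needs migrating, not just the `max_count` records processed in this batch. All names here are hypothetical.

```python
from typing import Callable, List, Tuple


def migrate_in_batches(needs_migration: List[dict], max_count: int,
                       migrate_one: Callable[[dict], bool]) -> Tuple[int, int]:
    """Return (total_matched, migrated) for one online-migration pass."""
    # Count everything that matched the query, independent of max_count.
    total_matched = len(needs_migration)
    migrated = 0
    # Only process up to max_count records in this pass.
    for record in needs_migration[:max_count]:
        if migrate_one(record):
            migrated += 1
    return total_matched, migrated
```

With ten pending records and `max_count=3`, this reports (10, 3) rather than (3, 3), so the operator can see that work remains.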

[Yahoo-eng-team] [Bug 1695965] [NEW] "nova-manage db online_data_migrations" exit code is strange

2017-06-05 Thread Chris Friesen
Public bug reported: If I'm reading the code right, the exit value for "nova-manage db online_data_migrations" will be 1 if we actually performed some migrations and 0 if we performed no migrations, either because there were no remaining migrations or because the migration code raised an

[Yahoo-eng-team] [Bug 1691780] [NEW] port id is incorrectly logged in _update_port_binding_for_instance

2017-05-18 Thread Chris Friesen
': {}, 'binding:host_id': 'compute-6'} ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status: In Progress ** Tags: neutron

[Yahoo-eng-team] [Bug 1690890] [NEW] error message not clear for shared live migration with block storage

2017-05-15 Thread Chris Friesen
"Shared storage migration requires either shared storage or boot-from-volume with no local disks." ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status: In Progress ** Tags: compute

[Yahoo-eng-team] [Bug 1688673] [NEW] cpu_realtime_mask handling is not intuitive

2017-05-05 Thread Chris Friesen
Public bug reported: The nova code implicitly assumes that all vCPUs are realtime in nova.virt.hardware.vcpus_realtime_topology(), and then it appends the user-specified mask. This only makes sense if the user-specified cpu_realtime_mask is an exclusion mask, but this isn't documented anywhere.
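The behavior described above can be illustrated with a simplified, hypothetical re-implementation (not nova's `vcpus_realtime_topology()` itself), assuming the mask uses a `^N` exclusion syntax. Prepending the full vCPU range means an exclusion mask such as "^0" works as expected, while an inclusion mask such as "1" silently changes nothing.

```python
from typing import Set


def parse_cpu_spec(spec: str) -> Set[int]:
    """Parse a comma-separated CPU spec such as "0-3,^1" (simplified sketch).

    Plain entries ("2", "4-6") add CPUs; entries prefixed with "^" remove them.
    """
    include: Set[int] = set()
    exclude: Set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        target = include
        if item.startswith("^"):
            target = exclude
            item = item[1:]
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(item))
    return include - exclude


def realtime_vcpus(num_vcpus: int, realtime_mask: str) -> Set[int]:
    """Assume every vCPU is realtime, then append the user-specified mask."""
    all_vcpus = "0-%d" % (num_vcpus - 1)
    return parse_cpu_spec(all_vcpus + "," + realtime_mask)
```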

[Yahoo-eng-team] [Bug 1688599] [NEW] resource audit races against evacuating instance

2017-05-05 Thread Chris Friesen
Public bug reported: We recently hit an issue where an evacuating instance with dedicated cpu_policy was pinned to the same host CPUs as other instances with dedicated cpu_policy. During subsequent resource audits we would see cpu pinning errors. The root cause appears to be the fact that the

[Yahoo-eng-team] [Bug 1687067] [NEW] problems with cpu and cpu-thread policy where flavor/image specify different settings

2017-04-28 Thread Chris Friesen
Public bug reported: There are a number of issues related to CPU policy and CPU thread policy where the flavor extra-spec and image properties do not match up. The docs at https://docs.openstack.org/admin-guide/compute-cpu-topologies.html say the following: "Image metadata takes precedence

[Yahoo-eng-team] [Bug 1669054] [NEW] RequestSpec.ignore_hosts from resize is reused in subsequent evacuate

2017-03-01 Thread Chris Friesen
Public bug reported: When doing a resize, if CONF.allow_resize_to_same_host is False, then we set RequestSpec.ignore_hosts and then save the RequestSpec. When we go to use the same RequestSpec on a subsequent rebuild/evacuate, ignore_hosts is still set from the previous resize. In
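A minimal sketch of the fix the report implies: build the evacuate request from the saved spec but discard any `ignore_hosts` left over from an earlier resize. `RequestSpecSketch` and `spec_for_evacuate` are hypothetical stand-ins, not nova's `RequestSpec`, and the assumption that an evacuate should exclude only the failed source host is ours.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class RequestSpecSketch:
    """Hypothetical stand-in for a saved request spec (only relevant fields)."""
    instance_uuid: str
    ignore_hosts: Optional[List[str]] = None


def spec_for_evacuate(saved: RequestSpecSketch,
                      failed_host: str) -> RequestSpecSketch:
    # Return a copy so the saved spec is untouched, and replace (rather than
    # extend) ignore_hosts so stale resize state does not leak into scheduling.
    return replace(saved, ignore_hosts=[failed_host])
```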

[Yahoo-eng-team] [Bug 1573288] Re: over time, horizon's admin -> overview page becomes very slow ....

2017-01-25 Thread Chris Friesen
*** This bug is a duplicate of bug 1508571 *** https://bugs.launchpad.net/bugs/1508571 ** This bug has been marked a duplicate of bug 1508571 Overview panels use too wide date range as default

[Yahoo-eng-team] [Bug 1654345] Re: realtime emulatorpin should use pcpus, not vcpus

2017-01-05 Thread Chris Friesen
Looks like this has already been dealt with on Master via bug 1614054, commit 6683bf9. ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1654345] [NEW] realtime emulatorpin should use pcpus, not vcpus

2017-01-05 Thread Chris Friesen
the "emulatorpin" cpuset. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status: New ** Tags: compute libvirt newton-backport-potential ** Description changed: When specifying "hw:cpu_realtime_mask" in the flavor, Libv

[Yahoo-eng-team] [Bug 1638961] [NEW] evacuating an instance loses files specified via "--file" on the cli

2016-11-03 Thread Chris Friesen
Public bug reported: I booted up an instance as follows in my stable/mitaka devstack environment: $ echo "this is a test" > /tmp/my_user_data.txt $ echo "blah1" > /tmp/file1 $ echo "blah2" > /tmp/file2 $ nova boot --flavor m1.tiny --image cirros-0.3.4-x86_64-uec --config-drive true --user-data

[Yahoo-eng-team] [Bug 1613488] Re: changed fields of versionedobjects not tracked properly when down-versioning object

2016-08-29 Thread Chris Friesen
The review for the oslo.versionedobjects change is here: https://review.openstack.org/#/c/355981/ ** Changed in: nova Status: New => Fix Released ** Project changed: nova => oslo.versionedobjects

[Yahoo-eng-team] [Bug 1613488] [NEW] changed fields of versionedobjects not tracked properly when down-versioning object

2016-08-15 Thread Chris Friesen
Public bug reported: Sorry for the complicated write-up below, but the issue is complicated. I'm running into a problem between Mitaka and Kilo, but I *think* it'll also hit Mitaka/Liberty. The problem scenario is when we have older and newer services talking to each other. The problem

[Yahoo-eng-team] [Bug 1605720] [NEW] backing store missing for ephemeral disk on migration with boot-from-vol

2016-07-22 Thread Chris Friesen
Public bug reported: I'm on stable/mitaka, but the master code looks similar. I have compute nodes configured to use qcow2 and libvirt. The flavor has an ephemeral disk and a swap disk. I boot an instance with this flavor, and the instance is boot-from-volume. When I try to cold-migrate the

[Yahoo-eng-team] [Bug 1602814] [NEW] hyperthreading bug in NUMATopologyFilter

2016-07-13 Thread Chris Friesen
Public bug reported: I recently ran into an issue where I was trying to boot an instance with 8 vCPUs, with hw:cpu_policy=dedicated. The host had 8 pCPUs available, but they were a mix of siblings and non-siblings. In virt.hardware._pack_instance_onto_cores(), the _get_pinning() function seems

[Yahoo-eng-team] [Bug 1600304] [NEW] _update_usage_from_migrations() can end up processing stale migrations

2016-07-08 Thread Chris Friesen
in the case of the stale migration we will have hit the error case in _pair_instances_to_migrations(), and so the instance will be lazy-loaded from the DB, ensuring that its migration ID is up-to-date. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status

[Yahoo-eng-team] [Bug 1213224] Re: nova allows multiple aggregates with same zone name

2016-06-30 Thread Chris Friesen
Just to clarify something, availability zones don't "have" host aggregates. Rather, some host aggregates *are also* availability zones, but a given host can only be in one availability zone. I went and looked at the code, and the way it is currently written I think it is actually okay to have

[Yahoo-eng-team] [Bug 1590607] [NEW] incorrect handling of host numa cell usage with instances having no numa topology

2016-06-08 Thread Chris Friesen
y. This writes the original host cell usage information back to it. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123) Status: New ** Tags: compute scheduler ** Changed in: nova Assignee: (unassigned) => Chris Friesen (cbf123)

[Yahoo-eng-team] [Bug 1590133] [NEW] help text for cpu_allocation_ratio is wrong

2016-06-07 Thread Chris Friesen
Public bug reported: In stable/mitaka in resource_tracker.py the help text for the cpu_allocation_ratio config option reads in part: 'NOTE: This can be set per-compute, or if set to 0.0, the value ' 'set on the scheduler node(s) will be used ' 'and

[Yahoo-eng-team] [Bug 1590091] [NEW] bug in handling of ISOLATE thread policy

2016-06-07 Thread Chris Friesen
Public bug reported: I'm running stable/mitaka in devstack. I've got a small system with 2 pCPUs, both marked as available for pinning. They're two cores of a single processor, no threads. "virsh capabilities" shows: It is my understanding that

[Yahoo-eng-team] [Bug 1577642] [NEW] race between disk_available_least and instance operations

2016-05-02 Thread Chris Friesen
Public bug reported: The calculation for LibvirtDriver._get_disk_over_committed_size_total() loops over all the instances on the hypervisor to try to figure out the total overcommitted size for all instances. However, at the time that routine is called from

[Yahoo-eng-team] [Bug 1552777] [NEW] resizing from flavor with swap to one without swap puts instance into Error status

2016-03-03 Thread Chris Friesen
Public bug reported: In a single-node devstack (current trunk, nova commit 6e1051b7), if you boot an instance with a flavor that has nonzero swap and then resize to a flavor with zero swap it causes an exception. It seems that we somehow neglect to remove the swap file from the instance.

[Yahoo-eng-team] [Bug 1549032] [NEW] max_net_count doesn't interact properly with min_count when booting multiple instances

2016-02-23 Thread Chris Friesen
: Chris Friesen (cbf123) Status: In Progress

[Yahoo-eng-team] [Bug 1542039] [NEW] nova should not reschedule an instance that has already been deleted

2016-02-04 Thread Chris Friesen
Public bug reported: I'm investigating an issue where an instance with a large disk and an attached cinder volume was booted in a stable/kilo OpenStack setup with the DiskFilter disabled. The timeline looks like this: scheduler picks initial compute node nova attempts to boot it up on one

[Yahoo-eng-team] [Bug 1538619] [NEW] Fix up argument order in remove_volume_connection()

2016-01-27 Thread Chris Friesen
Public bug reported: The RPC API function for remove_volume_connection() uses a different argument order than the ComputeManager function of the same name. The normal RPC code uses named arguments, but the _ComputeV4Proxy version doesn't, and it has the order wrong. This causes problems

[Yahoo-eng-team] [Bug 1536703] [NEW] unable to re-issue confirm/revert of resize

2016-01-21 Thread Chris Friesen
Public bug reported: Calling confirm_resize() sets migration.status to 'confirming' and sends an RPC cast to the compute node. If there's a glitch and that cast is received but never processed, there's no way to confirm the resize since it only looks for migrations with a status of

[Yahoo-eng-team] [Bug 1528325] [NEW] instance with explicit "small" pages treated different from implicit

2015-12-21 Thread Chris Friesen
Public bug reported: In numa_get_constraints() we call pagesize = _numa_get_pagesize_constraints(flavor, image_meta) then later we have if nodes or pagesize: [setattr(c, 'pagesize', pagesize) for c in numa_topology.cells] This ends up treating an instance which doesn't specify

[Yahoo-eng-team] [Bug 1512907] [NEW] leak of vswitch port if delete an instance while resizing

2015-11-03 Thread Chris Friesen
Public bug reported: I've been testing with a modified version of stable/kilo, but I believe the bug is present in upstream stable/kilo. When using nova with neutron, if I boot an instance, then trigger a resize, and then delete the instance at just the right point during the resize it ends up

[Yahoo-eng-team] [Bug 1471997] Re: nova MAX_FUNC value in nova/pci/devspec.py is too low

2015-08-20 Thread Chris Friesen
Jay Pipes helpfully pointed out that the MAX_FUNC value was defined by the PCI spec, and didn't refer to the SRIOV VF value, but rather the PCI device function. The original issue turned out to be a local problem generating the PCI whitelist. ** Changed in: nova Status: In Progress =
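To illustrate why 0x7 is the ceiling: in a conventional `domain:bus:slot.function` PCI address the function is a 3-bit field and the slot (device) a 5-bit field. Below is a sketch of validating such an address; `parse_pci_address` and `PciAddress` are hypothetical names, not the code in nova/pci/devspec.py.

```python
import re
from typing import NamedTuple

MAX_SLOT = 0x1f  # slot/device number: 5-bit field
MAX_FUNC = 0x7   # function number: 3-bit field

_PCI_ADDR_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-9a-fA-F])$")


class PciAddress(NamedTuple):
    domain: int
    bus: int
    slot: int
    function: int


def parse_pci_address(addr: str) -> PciAddress:
    """Parse "dddd:bb:ss.f" and range-check the slot and function fields."""
    m = _PCI_ADDR_RE.match(addr)
    if m is None:
        raise ValueError("malformed PCI address: %r" % addr)
    domain, bus, slot, func = (int(g, 16) for g in m.groups())
    if slot > MAX_SLOT:
        raise ValueError("slot %#x exceeds %#x" % (slot, MAX_SLOT))
    if func > MAX_FUNC:
        raise ValueError("function %#x exceeds %#x" % (func, MAX_FUNC))
    return PciAddress(domain, bus, slot, func)
```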

[Yahoo-eng-team] [Bug 1484742] Re: NUMATopologyFilter doesn't account for CPU/RAM overcommit

2015-08-17 Thread Chris Friesen
** Changed in: nova Status: In Progress => Invalid

[Yahoo-eng-team] [Bug 1485631] [NEW] CPU/RAM overcommit treated differently by normal and NUMA topology case

2015-08-17 Thread Chris Friesen
Public bug reported: Currently in the NUMA topology case (so multi-node guest, dedicated CPUs, hugepages in the guest, etc.) a single guest is not allowed to consume more CPU/RAM than the host actually has in total regardless of the specified overcommit ratio. In other words, the overcommit
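The inconsistency described above can be sketched as two admission checks. This is a simplified, hypothetical model, not nova's filter code, and it assumes aggregate usage is still checked against the overcommitted limit in the NUMA case: only the per-guest cap differs.

```python
def fits_without_numa(requested: int, host_used: int, host_total: int,
                      allocation_ratio: float) -> bool:
    """Non-NUMA case: capacity is the physical total times the overcommit ratio."""
    return host_used + requested <= host_total * allocation_ratio


def fits_with_numa(requested: int, host_used: int, host_total: int,
                   allocation_ratio: float) -> bool:
    """NUMA case as described: a single guest may never exceed the physical total."""
    if requested > host_total:
        return False
    return host_used + requested <= host_total * allocation_ratio
```

On a host with 16 pCPUs and a ratio of 16.0, a 20-vCPU guest passes the first check but fails the second.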

[Yahoo-eng-team] [Bug 1484742] [NEW] NUMATopologyFilter doesn't account for cpu_allocation_ratio

2015-08-13 Thread Chris Friesen
Public bug reported: There seems to be a bug in the NUMATopologyFilter where it doesn't properly account for cpu_allocation_ratio. (Detected on stable/kilo, not sure if it applies to current master.) To reproduce: 1) Create a flavor with a moderate number of CPUs (5, for example) and enable

[Yahoo-eng-team] [Bug 1482416] [NEW] bug blocks DB migration that changes column type

2015-08-06 Thread Chris Friesen
Public bug reported: I'm trying to make the following change as a DB migration:

    +# Table instances, modify field 'vcpus_used' to Float
    +compute_nodes = Table('compute_nodes', meta, autoload=True)
    +vcpus_used = getattr(compute_nodes.c, 'vcpus_used')
    +vcpus_used.alter(type=Float)

[Yahoo-eng-team] [Bug 1471997] [NEW] nova MAX_FUNC value in nova/pci/devspec.py is too low

2015-07-06 Thread Chris Friesen
Public bug reported: The MAX_FUNC value in nova/pci/devspec.py is set to 0x7. This limits us to a relatively small number of VFs per PF, which is annoying when trying to use SRIOV in any sort of serious way. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123

[Yahoo-eng-team] [Bug 1461678] [NEW] nova error handling causes glance to keep unlinked files open, wasting space

2015-06-03 Thread Chris Friesen
is an iterator. If we take an exception (like we can't write the file because the filesystem is full) then we will stop iterating over the chunks. If we don't iterate over all the chunks then glance will keep the file open. ** Affects: nova Importance: Undecided Assignee: Chris Friesen (cbf123
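A sketch of one way to make sure the source is released when writing fails partway: explicitly close the chunk iterator. This assumes the iterator is a generator or otherwise exposes `close()`; whether closing the client-side iterator actually makes glance release its file is an assumption, not something the report confirms. `write_image_chunks` is a hypothetical name.

```python
from typing import IO, Iterable


def write_image_chunks(chunks: Iterable[bytes], dest: IO[bytes]) -> None:
    """Copy chunks to dest, always closing the source iterator afterwards.

    Closing a suspended generator raises GeneratorExit inside it, which runs
    its finally/with blocks and so releases whatever it holds open.
    """
    try:
        for chunk in chunks:
            dest.write(chunk)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
```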

[Yahoo-eng-team] [Bug 1459782] [NEW] _is_storage_shared_with() in libvirt/driver.py gives possibly false results if ssh keys not configured

2015-05-28 Thread Chris Friesen
Public bug reported: In virt.libvirt.driver.LibvirtDriver._is_storage_shared_with() we first check IP addresses and if they don't match then we'll try to use ssh to check whether the storage is actually shared or not. If ssh keys are not set up between the compute nodes for the user running

[Yahoo-eng-team] [Bug 1458122] [NEW] nova shouldn't error if we can't schedule all of max_count instances at boot time

2015-05-22 Thread Chris Friesen
Public bug reported: When booting up instances, nova allows the user to specify a min count and a max count. Currently, if the user has quota space for max count instances, then nova will try to create them all. If any of them can't be scheduled, then the creation of all of them will be aborted
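The behavior being requested is that scheduling only fails when fewer than `min_count` instances can be placed. This is a hypothetical sketch that places one instance per candidate host purely for illustration; `NoValidHost` here is a local stand-in, not nova's exception class.

```python
from typing import List, Sequence


class NoValidHost(Exception):
    """Local stand-in: raised when fewer than min_count instances fit."""


def select_hosts(candidate_hosts: Sequence[str], min_count: int,
                 max_count: int) -> List[str]:
    """Place up to max_count instances, failing only below min_count."""
    if not 1 <= min_count <= max_count:
        raise ValueError("require 1 <= min_count <= max_count")
    chosen = list(candidate_hosts[:max_count])
    if len(chosen) < min_count:
        raise NoValidHost("only %d of %d required instances could be placed"
                          % (len(chosen), min_count))
    return chosen
```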

[Yahoo-eng-team] [Bug 1454451] [NEW] simultaneous boot of multiple instances leads to cpu pinning overlap

2015-05-12 Thread Chris Friesen
Public bug reported: I'm running into an issue with kilo-3 that I think is present in current trunk. I think there is a race between the claimed CPUs of an instance being persisted to the DB, and the resource audit scanning the DB for instances and subtracting pinned CPUs from the list of

[Yahoo-eng-team] [Bug 1444171] [NEW] evacuate code path is not updating binding:host_id in neutron

2015-04-14 Thread Chris Friesen
Public bug reported: Currently when using neutron we don't update the binding:host_id during the evacuate code path. This can cause the evacuation to fail if we go to sleep waiting for events in virt.libvirt.driver.LibvirtDriver._create_domain_and_network(). Since the binding:host_id in neutron

[Yahoo-eng-team] [Bug 1298513] Re: nova server group policy should be applied when resizing/migrating server

2015-03-23 Thread Chris Friesen
*** This bug is a duplicate of bug 1379451 *** https://bugs.launchpad.net/bugs/1379451 ** This bug has been marked a duplicate of bug 1379451 anti-affinity policy only honored on boot

[Yahoo-eng-team] [Bug 1423648] [NEW] race conditions with server group scheduler policies

2015-02-19 Thread Chris Friesen
Public bug reported: In git commit a79ecbe Russell Bryant submitted a partial fix for a race condition when booting an instance as part of a server group with an anti-affinity scheduler policy. That fix only solves part of the problem, however. There are a number of issues remaining: 1) It's

[Yahoo-eng-team] [Bug 1420848] [NEW] nova-compute service spuriously marked as up when disabled

2015-02-11 Thread Chris Friesen
Public bug reported: I think our usage of the updated_at field to determine whether a service is up or not is buggy. Consider this scenario: 1) nova-compute is happily running and is up/enabled on compute-0 2) something causes nova-compute to stop (process crash, hardware fault, power failure,
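A heartbeat check of this kind treats a service as up whenever its `updated_at` timestamp is recent enough. The simplified sketch below uses hypothetical names and assumes `service_down_time` is the threshold in seconds. If, as the title suggests, disabling the service also refreshes `updated_at`, then a crashed nova-compute will look alive for up to `service_down_time` seconds after the disable.

```python
import datetime
from typing import Optional


def service_is_up(updated_at: Optional[datetime.datetime],
                  now: datetime.datetime,
                  service_down_time: int) -> bool:
    """Consider the service up if its row was touched recently enough."""
    if updated_at is None:
        return False
    elapsed = (now - updated_at).total_seconds()
    return abs(elapsed) <= service_down_time
```

The check cannot tell a heartbeat from any other write to the row, which is the weakness the report describes.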

[Yahoo-eng-team] [Bug 1419115] [NEW] IndexError adding host to availability zone

2015-02-06 Thread Chris Friesen
Public bug reported: There appears to be a bug in the code dealing with adding a disabled host to an aggregate that is exported as an availability zone. I disabled the nova-compute service on a host and then tried to add it to an aggregate that is exported as an availability zone. This resulted in

[Yahoo-eng-team] [Bug 1417667] [NEW] migration/evacuation/rebuild/resize of instance with dedicated cpus needs to recalculate cpus on destination

2015-02-03 Thread Chris Friesen
Public bug reported: I'm running nova trunk, commit 752954a. I configured a flavor with two vcpus and extra specs hw:cpu_policy=dedicated in order to enable vcpu pinning. I booted up a number of instances such that there was one instance affined to host cpus 12 and 13 on compute-0, and another

[Yahoo-eng-team] [Bug 1417671] [NEW] when using dedicated cpus, the emulator thread should be affined as well

2015-02-03 Thread Chris Friesen
Public bug reported: I'm running nova trunk, commit 752954a. I configured a flavor with two vcpus and extra specs hw:cpu_policy=dedicated in order to enable vcpu pinning. I booted up an instance with this flavor, and virsh dumpxml shows that the two vcpus were affined suitably to host cpus, but

[Yahoo-eng-team] [Bug 1417723] [NEW] when using dedicated cpus, the guest topology doesn't match the host

2015-02-03 Thread Chris Friesen
Public bug reported: According to http://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-cpu-pinning.html, the topology of the guest is set up as follows: In the absence of an explicit vCPU topology request, the virt drivers typically expose all vCPUs as sockets with 1

[Yahoo-eng-team] [Bug 1417201] [NEW] nova-scheduler exception when trying to use hugepages

2015-02-02 Thread Chris Friesen
Public bug reported: I'm trying to make use of huge pages as described in http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-large-pages.html. I'm running nova kilo as of Jan 27th. The other openstack services are juno. Libvirt is 1.2.8. I've allocated 1

[Yahoo-eng-team] [Bug 1410924] [NEW] instructions for rebuilding API samples are wrong

2015-01-14 Thread Chris Friesen
Public bug reported: The instructions in nova/tests/functional/api_samples/README.rst say to run GENERATE_SAMPLES=True tox -epy27 nova.tests.unit.integrated, but that path doesn't exist anymore. Running GENERATE_SAMPLES=True tox -e functional seems to work, but someone who knows more than me

[Yahoo-eng-team] [Bug 1368917] [NEW] rpc core should abort a call() early if the connection is terminated before the timeout period expires

2014-09-12 Thread Chris Friesen
Public bug reported: As it stands, if a client issuing an RPC call() sends a message to the rabbitmq server and then the rabbitmq server does a failover, the client will wait for the full RPC timeout period (60 seconds) even though the new rabbitmq server has come up long before then and some connections
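The behavior being asked for is that call() should notice the dropped connection and fail fast instead of sitting out the timeout. A minimal Python sketch of that waiting logic, assuming replies arrive on a `queue.Queue` and that some connection monitor sets a `threading.Event` on disconnect (`wait_for_reply` and `ConnectionLost` are hypothetical names, not the nova or oslo.messaging RPC API):

```python
import queue
import threading
import time


class ConnectionLost(Exception):
    """The broker connection dropped before a reply arrived."""


def wait_for_reply(replies, conn_lost, timeout, poll=0.05):
    """Wait up to `timeout` seconds for an item on `replies`, but abort
    as soon as `conn_lost` is set instead of sitting out the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if conn_lost.is_set():
            raise ConnectionLost("connection dropped; failing the call early")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no reply within %.1f s" % timeout)
        try:
            # Short waits so the connection flag is re-checked regularly.
            return replies.get(timeout=min(poll, remaining))
        except queue.Empty:
            pass


# Simulate a failover 0.2 s into a call that has a 60 s timeout.
replies = queue.Queue()
conn_lost = threading.Event()
threading.Timer(0.2, conn_lost.set).start()
start = time.monotonic()
try:
    wait_for_reply(replies, conn_lost, timeout=60)
except ConnectionLost:
    print("aborted after %.1f s" % (time.monotonic() - start))
```

Polling in short slices is just the simplest way to watch two conditions with the standard library; a real driver would more likely be woken directly by the connection's error handling.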

[Yahoo-eng-team] [Bug 1368989] [NEW] service_update() should not set an RPC timeout longer than service.report_interval

2014-09-12 Thread Chris Friesen
Public bug reported: nova.servicegroup.drivers.db.DbDriver._report_state() is called every service.report_interval seconds from a timer in order to periodically report the service state. It calls self.conductor_api.service_update(). If this ends up calling
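The arithmetic behind the title: `_report_state()` fires every `report_interval` seconds, but a single `service_update()` call can block for the whole RPC timeout (60 seconds is the figure quoted in bug 1368917 above), so one hung call can swallow several reporting periods. A minimal sketch of the proposed cap, using a hypothetical helper name that is not nova's code:

```python
def service_update_timeout(rpc_timeout, report_interval):
    """Hypothetical helper: choose the timeout for the periodic
    service_update() RPC so that a stuck call is abandoned before the
    next report is due, instead of blocking for the full RPC timeout."""
    if report_interval <= 0:
        raise ValueError("report_interval must be positive")
    return min(rpc_timeout, report_interval)


# With a 60 s RPC timeout and a 10 s report interval, an uncapped hung
# call could delay the next state report by up to 60 s.
print(service_update_timeout(60, 10))   # 10
print(service_update_timeout(60, 120))  # 60
```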

[Yahoo-eng-team] [Bug 1330744] [NEW] live migration is incorrectly comparing host cpu features

2014-06-16 Thread Chris Friesen
Public bug reported: Running Havana, we're seeing live migration fail when attempting to migrate from an Ivy Bridge host to a Sandy Bridge host. However, we're using the default kvm guest config, which has a safe default virtual cpu with a subset of cpu features. /proc/cpuinfo from within the
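The report's point is that what must be present on the destination is the feature set the guest actually sees, not the source host's full set. A minimal sketch contrasting the two comparisons, with made-up feature sets (the names are illustrative, not real /proc/cpuinfo output, and `features_compatible` is a hypothetical helper, not the libvirt driver's code):

```python
def features_compatible(required, dest_host):
    """Return True if every CPU feature in `required` exists on the destination."""
    return set(required) <= set(dest_host)


# Illustrative feature names only.
source_host = {"sse2", "sse4_2", "avx", "rdrand"}
dest_host = {"sse2", "sse4_2", "avx"}
guest_cpu = {"sse2", "sse4_2"}  # what a conservative default virtual CPU exposes

print(features_compatible(source_host, dest_host))  # False: host-vs-host check rejects
print(features_compatible(guest_cpu, dest_host))    # True: the guest's features all exist
```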

[Yahoo-eng-team] [Bug 1313967] [NEW] build_and_run_instance() appears to be dead code

2014-04-28 Thread Chris Friesen
Importance: Undecided Assignee: Chris Friesen (cbf123) Status: New ** Changed in: nova Assignee: (unassigned) => Chris Friesen (cbf123) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova

[Yahoo-eng-team] [Bug 1313967] Re: build_and_run_instance() appears to be dead code

2014-04-28 Thread Chris Friesen
Sorry for the noise; I started reading the code and realized that it was just taking a long time to transition over to the new function. ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1311793] [NEW] wrap_instance_event() swallows return codes

2014-04-23 Thread Chris Friesen
Friesen (cbf123) Status: New ** Changed in: nova Assignee: (unassigned) => Chris Friesen (cbf123) -- https://bugs.launchpad.net/bugs/1311793 Title

[Yahoo-eng-team] [Bug 1298494] [NEW] nova server-group-list doesn't show members of the group

2014-03-27 Thread Chris Friesen
Public bug reported: With current devstack I ensured I had GroupAntiAffinityFilter in scheduler_default_filters in /etc/nova/nova.conf, restarted nova-scheduler, then ran:

    nova server-group-create --policy anti-affinity antiaffinitygroup
    nova server-group-list

[Yahoo-eng-team] [Bug 1298509] [NEW] nova server-group-delete allows deleting server group with members

2014-03-27 Thread Chris Friesen
Public bug reported: Currently nova will let you do this:

    nova server-group-create --policy anti-affinity antiaffinitygroup
    nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid cirros0
    nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid cirros1

[Yahoo-eng-team] [Bug 1298513] [NEW] nova server group policy should be applied when resizing/migrating server

2014-03-27 Thread Chris Friesen
Public bug reported: If I do the following:

    nova server-group-create --policy affinity affinitygroup
    nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid cirros0
    nova resize cirros0 2

The cirros0 server will be resized, but when the scheduler runs it doesn't take into

[Yahoo-eng-team] [Bug 1298690] [NEW] sqlite regexp() function doesn't behave like mysql

2014-03-27 Thread Chris Friesen
Public bug reported: In bug 1298494 I recently saw a case where the unit tests (using sqlite) behaved differently than devstack with mysql. The issue seems to arise when we do:

    filters = {'uuid': group.members, 'deleted_at': None}
    instances = instance_obj.InstanceList.get_by_filters(
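Some background on why the two databases can diverge here: the core SQLite library parses the REGEXP operator but defines no regexp() function by default, so the application has to register one, and REGEXP then behaves however that function behaves, which need not match MySQL. A minimal sketch of such a registration with Python's sqlite3 module (the function, table and data are illustrative; this is not the function nova's test setup actually registers, and the excerpt does not say which difference caused this bug):

```python
import re
import sqlite3


def regexp(pattern, value):
    """User function backing SQLite's REGEXP operator.

    SQLite rewrites "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    """
    if value is None:
        return None  # NULL in, NULL out, as MySQL does for NULL operands
    return re.search(pattern, str(value)) is not None  # unanchored match


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE instances (uuid TEXT)")
conn.executemany("INSERT INTO instances VALUES (?)",
                 [("abc-123",), ("def-456",), (None,)])
rows = conn.execute(
    "SELECT uuid FROM instances WHERE uuid REGEXP ?", ("^abc",)).fetchall()
print(rows)  # [('abc-123',)]
```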

[Yahoo-eng-team] [Bug 1296967] [NEW] instances stuck with task_state of REBOOTING after controller switchover

2014-03-24 Thread Chris Friesen
Public bug reported: We were doing some testing of Havana and have run into a scenario that ended up with two instances stuck with a task_state of REBOOTING following a reboot of the controller:

1) We reboot the controller.
2) Right after it comes back up something calls

[Yahoo-eng-team] [Bug 1296972] Re: RPC code in Havana doesn't handle connection errors

2014-03-24 Thread Chris Friesen
It looks like I misread the patch below; it's adding back the channel error check, not the connection error check. This may be due to a bad patch on our end. Sorry for the noise. ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1294756] [NEW] missing test for None in sqlalchemy query filter

2014-03-19 Thread Chris Friesen
Public bug reported: In db.sqlalchemy.api.instance_get_all_by_filters() there is code that looks like this:

    if not filters.pop('soft_deleted', False):
        query_prefix = query_prefix.\
            filter(models.Instance.vm_state != vm_states.SOFT_DELETED)

In sqlalchemy a comparison against a
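The excerpt stops mid-sentence, but the bug title ("missing test for None") points at SQL's three-valued logic: when `vm_state` is NULL, `vm_state != X` evaluates to NULL rather than true, so the row is silently filtered out. A minimal sqlite3 sketch with a made-up table, using `'soft-delete'` as a stand-in for `vm_states.SOFT_DELETED`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER, vm_state TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?)",
                 [(1, "active"), (2, "soft-delete"), (3, None)])

# A plain inequality drops the NULL row: NULL != 'soft-delete' is NULL, not true.
naive = conn.execute(
    "SELECT id FROM instances WHERE vm_state != ? ORDER BY id",
    ("soft-delete",)).fetchall()

# An explicit IS NULL test keeps it.
with_null = conn.execute(
    "SELECT id FROM instances WHERE vm_state != ? OR vm_state IS NULL ORDER BY id",
    ("soft-delete",)).fetchall()

print(naive)      # [(1,)]
print(with_null)  # [(1,), (3,)]
```

In SQLAlchemy the same idea is usually written as `or_(models.Instance.vm_state != X, models.Instance.vm_state == None)`, since SQLAlchemy renders `== None` as `IS NULL`.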

[Yahoo-eng-team] [Bug 1292963] [NEW] postgres incompatibility in InstanceGroup.get_hosts()

2014-03-15 Thread Chris Friesen
in get_hosts():

    filters = {'uuid': filter_uuids, 'deleted_at': None}

It seems that current postgres doesn't allow implicit casts. If I change the line to:

    filters = {'uuid': filter_uuids, 'deleted': 0}

then it seems to work. ** Affects: nova Importance: Undecided Assignee: Chris Friesen

[Yahoo-eng-team] [Bug 1289064] [NEW] live migration of instance should claim resources on target compute node

2014-03-06 Thread Chris Friesen
Public bug reported: I'm looking at the current Icehouse code, but this applies to previous versions as well. When we create a new instance via _build_instance() or _build_and_run_instance(), in both cases we call instance_claim() to test for resources and reserve them. During a cold migration