Public bug reported:
In nova/virt/libvirt/driver.py the code looks for a hardcoded path
"/usr/share/OVMF/OVMF_CODE.fd".
It appears that centos 7.6 has modified the OVMF-20180508-3 rpm to no
longer contain this file. Instead it now seems to be named
/usr/share/OVMF/OVMF_CODE.secboot.fd
This
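As an illustration only (not nova's actual fix), a minimal sketch of probing a
short list of candidate firmware paths instead of a single hardcoded one; the
candidate list is an assumption based on the two paths mentioned above:

import os

# Candidate OVMF images; the secboot variant is what the CentOS 7.6 package
# ships according to this report. Any further entries would be assumptions.
OVMF_CANDIDATES = [
    "/usr/share/OVMF/OVMF_CODE.fd",
    "/usr/share/OVMF/OVMF_CODE.secboot.fd",
]


def find_ovmf_firmware():
    """Return the first OVMF firmware image present on this host, or None."""
    for path in OVMF_CANDIDATES:
        if os.path.exists(path):
            return path
    return None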
Public bug reported:
In multinode devstack I had an instance running on one node and tried
running "nova migrate ". The operation started, but then the
instance went into an error state with the following fault:
{"message": "Unable to migrate instance (2bbdab8e-
3a83-43a4-8c47-ce57b653e43e) to
Public bug reported:
If an invalid PCI alias is specified in the flavor extra-specs and we
try to create an instance with that flavor, it will result in a
PciInvalidAlias exception being raised.
In ServersController.create() PciInvalidAlias is missing from the list
of exceptions that get
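A self-contained sketch of the pattern at issue, with simplified stand-ins for
the nova classes (not nova's actual code): the validation-style exception has
to be caught in the create handler and turned into a client error, otherwise
it surfaces as an unhandled 500:

class PciInvalidAlias(Exception):
    """Stand-in for nova.exception.PciInvalidAlias."""


class HTTPBadRequest(Exception):
    """Stand-in for webob.exc.HTTPBadRequest."""


def compute_api_create(extra_specs):
    # Stand-in for the compute API call that validates the flavor's PCI alias.
    if extra_specs.get("pci_passthrough:alias", "").startswith("bogus"):
        raise PciInvalidAlias("PCI alias bogus is not defined")
    return {"status": "BUILD"}


def servers_create(extra_specs):
    # The point of the bug: PciInvalidAlias must be among the expected
    # exceptions so the user gets a 400 instead of a 500.
    try:
        return compute_api_create(extra_specs)
    except PciInvalidAlias as err:
        raise HTTPBadRequest(str(err))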
report the
hypervisor type as "QEMU". So we need to fix up the hypervisor type
check; otherwise we'll always fail the check.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: In Progress
** Tags: compute
Public bug reported:
In Pike a customer has run into the following issue:
2019-02-12 07:34:43.728 23425 ERROR oslo.service.loopingcall [-] Dynamic
interval looping call 'oslo_service.loopingcall._func' failed: libvirtError:
internal error: unable to execute QEMU command 'device_del': Device
Public bug reported:
We've seen a case on a resource-constrained compute node where booting
multiple instances passed, but led to the following error messages from
the host kernel:
[ 731.911731] Out of memory: Kill process 133047 (nova-api) score 4 or
sacrifice child
[ 731.920377] Killed
Public bug reported:
I'm using devstack stable/rocky on ubuntu 16.04.
When running this command
nova boot --flavor m1.small --nic net-name=public --block-device
source=image,id=24e8e922-2687-48b5-a895-3134a650e00f,dest=volume,size=2,bootindex=0,shutdown=remove,bus=scsi
--block-device
Public bug reported:
We noticed that nova process startup seems to take a long time. It
looks like one major culprit is the regex code at
https://github.com/openstack/nova/blob/master/nova/api/validation/parameter_types.py
Sean K Mooney highlighted one possible culprit:
i dont really like
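For illustration, one common way to shave import-time cost is to defer regex
compilation until first use; whether that is appropriate for
parameter_types.py is exactly what the discussion above is about, and the
pattern below is a placeholder, not the real one:

import re

_NAME_PATTERN = None  # compiled lazily instead of at module import time


def _get_name_pattern():
    global _NAME_PATTERN
    if _NAME_PATTERN is None:
        # Placeholder expression; the real parameter_types.py patterns are
        # much larger, which is why import-time compilation is noticeable.
        _NAME_PATTERN = re.compile(r"[\w .:-]+")
    return _NAME_PATTERN


def is_valid_name(value):
    return bool(_get_name_pattern().fullmatch(value))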
Public bug reported:
Confirmation of a resize is an RPC operation. If a compute node fails
after a migration has been put into the "confirming" status there is no
way to confirm it again, causing the state of the instance to get
"stuck".
In the case of confirm_resize(), I don't see any problem
Public bug reported:
If you boot a virtual instance with UEFI, the UEFI NVRAM is lost on a
cold migration.
The default storage for the virtual UEFI NVRAM is in
/var/lib/libvirt/qemu/nvram/, and the file is not being copied over on
cold migration.
** Affects: nova
Importance: Undecided
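A hedged sketch of the missing step, not the eventual nova fix: when
cold-migrating a UEFI guest, the per-instance NVRAM file under
/var/lib/libvirt/qemu/nvram/ has to travel with the instance. The file-name
convention used below is an assumption:

import os
import shutil

NVRAM_DIR = "/var/lib/libvirt/qemu/nvram"


def nvram_path(instance_name):
    # Assumed naming convention for the per-instance UEFI variable store.
    return os.path.join(NVRAM_DIR, "%s_VARS.fd" % instance_name)


def copy_nvram_for_migration(instance_name, dest_dir):
    """Copy the guest's UEFI NVRAM file into dest_dir if it exists."""
    src = nvram_path(instance_name)
    if os.path.exists(src):
        shutil.copy2(src, dest_dir)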
Public bug reported:
The information in doc/source/reference/rpc.rst is stale and should
probably be updated or removed so that it doesn't confuse people.
** Affects: nova
Importance: Undecided
Status: New
** Tags: docs
Public bug reported:
There seems to be an issue (discovered in Pike) where ceph-backed swap does not
return to the original size if a resize operation is reverted.
Steps to reproduce:
1) Configure compute nodes to use remote ceph-backed storage for instances.
2) Launch a VM with ephemeral
The code at https://review.openstack.org/#/c/534384/ has been merged,
and should allow the operator to explicitly add the pdpe1gb flag.
Marking as fixed.
** Changed in: nova
Status: Confirmed => Fix Released
I think we could get into the bad state described in the bug if we do a
slightly different series of actions:
1) boot instance on Ocata
2) migrate instance
3) delete compute node (thus deleting the service record)
4) create compute node with same name
5) migrate instance to newly-created
Public bug reported:
We had a testcase where we booted an instance on Newton, migrated it off
the compute node, deleted the compute node (and service), upgraded to
Pike, created a new compute node with the same name, and migrated the
instance back to the compute node.
At this point the "nova
Public bug reported:
When doing a rebuild the assumption throughout the code is that we are
not changing the resources consumed by the guest (that is what a resize
is for). The complication here is that there are a number of image
properties which might affect the instance resource consumption
** Changed in: nova
Status: In Progress => Fix Released
https://bugs.launchpad.net/bugs/1552777
Title:
resizing from flavor with swap to one
Public bug reported:
Currently when deleting a nova-compute service via the API, we will
delete the service and compute_node records in the DB, but the placement
resource provider and host mapping records will be orphaned.
The orphaned resource provider records have been found to cause
scheduler
Public bug reported:
The following is specific to the libvirt driver.
When we call power_off() it calls _destroy(), which in turn calls
self._get_serial_ports_from_guest() and loops over all the serial ports
calling serial_console.release_port() on each. This removes the host
TCP port from
Public bug reported:
When booting an instance it's possible to force it to be placed on a
specific host using the "--availability-zone nova:host" syntax.
If you do this, the code at
https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L581
will return early rather than
In recent versions of qemu the "Skylake-Server" cpu model has the flag,
but any earlier Intel processor models do not.
** Changed in: nova
Status: Expired => Confirmed
Public bug reported:
When doing a rebuild-to-same-host but with a different image, all we
really want to do is ensure that the image properties for the new image
are still valid for the current host. Accordingly we need to go through
the scheduler (to run the image-related filters) but we don't
Public bug reported:
As of stable/pike if we do a rebuild-to-same-node with a new image, it
results in ComputeManager.rebuild_instance() being called with
"scheduled_node=" and "recreate=False". This results in a new
Claim, which seems wrong since we're not changing the flavor and that
claim
Nova reserves resources for the instance even if it's not running, so
the reported uptime probably shouldn't be used for billing.
Also, the uptime gets reset on a resize/revert-resize/rescue, further
making it tricky to use for billing.
** Changed in: nova
Status: New => Invalid
Public bug reported:
It is supposed to be possible to specify the "force" option when
updating a quota-set. Up to microversion 2.35 this works as expected.
However, in 2.36 it no longer works, and nova-api sends back:
RESP BODY: {"badRequest": {"message": "Invalid input for field/attribute
Public bug reported:
I'm running stable/pike devstack, and I was playing around with what
happens when there are many endpoints in multiple regions, and I
stumbled over a scenario where the keystone authentication code hangs.
My original endpoint list looked like this:
** Changed in: nova
Status: Expired => Incomplete
https://bugs.launchpad.net/bugs/1284719
Title:
buggy live migration rollback when using shared
Public bug reported:
Based on code inspection and a discussion with mriedem on IRC, it
appears that when deleting an instance in a pure-Pike cloud the
allocations are not removed until the update_available_resource()
periodic task calls ResourceTracker._update_usage_from_instances(),
which calls
Public bug reported:
When running "nova-manage db online_data_migrations", it will report how
many items matched the query and how many of the matching items were
migrated.
However, most of the migration routines are not properly reporting the
"total matched" count when "max_count" is specified.
Public bug reported:
If I'm reading the code right, the exit value for "nova-manage db
online_data_migrations" will be 1 if we actually performed some
migrations and 0 if we performed no migrations, either because there
were no remaining migrations or because the migration code raised an
': {}, 'binding:host_id': 'compute-6'}
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: In Progress
** Tags: neutron
"Shared storage migration requires either shared storage or boot-from-volume
with no local disks."
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: In Progress
** Tags: compute
Public bug reported:
The nova code implicitly assumes that all vCPUs are realtime in
nova.virt.hardware.vcpus_realtime_topology(), and then it appends the
user-specified mask.
This only makes sense if the user-specified cpu_realtime_mask is an
exclusion mask, but this isn't documented anywhere.
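A small sketch of the exclusion-mask semantics described above: every vCPU is
treated as realtime except the ones named by hw:cpu_realtime_mask (e.g.
"^0-1"). The parsing below is deliberately simplified compared to nova's real
mask parser:

def realtime_vcpus(total_vcpus, exclusion_mask):
    """Return the realtime vCPU ids given a mask like '^0-1'."""
    excluded = set()
    for part in exclusion_mask.lstrip("^").split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            excluded.update(range(int(lo), int(hi) + 1))
        else:
            excluded.add(int(part))
    return sorted(set(range(total_vcpus)) - excluded)


# For a 4-vCPU guest with hw:cpu_realtime_mask="^0-1":
# realtime_vcpus(4, "^0-1") -> [2, 3]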
Public bug reported:
We recently hit an issue where an evacuating instance with dedicated
cpu_policy was pinned to the same host CPUs as other instances with
dedicated cpu_policy. During subsequent resource audits we would see cpu
pinning errors.
The root cause appears to be the fact that the
Public bug reported:
There are a number of issues related to CPU policy and CPU thread policy
where the flavor extra-spec and image properties do not match up.
The docs at https://docs.openstack.org/admin-guide/compute-cpu-topologies.html
say the following:
"Image metadata takes precedence
Public bug reported:
When doing a resize, if CONF.allow_resize_to_same_host is False, then we
set RequestSpec.ignore_hosts and then save the RequestSpec.
When we go to use the same RequestSpec on a subsequent rebuild/evacuate,
ignore_hosts is still set from the previous resize.
In
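One possible shape of a fix, offered only as a hedged sketch (the call site
and object behaviour here are assumptions): clear the leftover ignore_hosts
before reusing the RequestSpec for a rebuild or evacuate:

def reset_stale_ignore_hosts(request_spec):
    """Drop ignore_hosts carried over from an earlier resize (sketch only)."""
    if getattr(request_spec, "ignore_hosts", None):
        request_spec.ignore_hosts = None
        request_spec.save()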
*** This bug is a duplicate of bug 1508571 ***
https://bugs.launchpad.net/bugs/1508571
** This bug has been marked a duplicate of bug 1508571
Overview panels use too wide date range as default
Looks like this has already been dealt with on Master via bug 1614054,
commit 6683bf9.
** Changed in: nova
Status: New => Invalid
the "emulatorpin" cpuset.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: New
** Tags: compute libvirt newton-backport-potential
** Description changed:
When specifying "hw:cpu_realtime_mask" in the flavor,
Libv
Public bug reported:
I booted up an instance as follows in my stable/mitaka devstack
environment:
$ echo "this is a test" > /tmp/my_user_data.txt
$ echo "blah1" > /tmp/file1
$ echo "blah2" > /tmp/file2
$ nova boot --flavor m1.tiny --image cirros-0.3.4-x86_64-uec --config-drive
true --user-data
The review for the oslo.versionedobjects change is here:
https://review.openstack.org/#/c/355981/
** Changed in: nova
Status: New => Fix Released
** Project changed: nova => oslo.versionedobjects
Public bug reported:
Sorry for the complicated write-up below, but the issue is complicated.
I'm running into a problem between Mitaka and Kilo, but I *think* it'll also
hit Mitaka/Liberty. The problem scenario is when we have older and newer
services talking to each other. The problem
Public bug reported:
I'm on stable/mitaka, but the master code looks similar.
I have compute nodes configured to use qcow2 and libvirt. The flavor
has an ephemeral disk and a swap disk. I boot an instance with this
flavor, and the instance is boot-from-volume.
When I try to cold-migrate the
Public bug reported:
I recently ran into an issue where I was trying to boot an instance with
8 vCPUs, with hw:cpu_policy=dedicated. The host had 8 pCPUs available,
but they were a mix of siblings and non-siblings.
In virt.hardware._pack_instance_onto_cores(), the _get_pinning()
function seems
in the case of the stale
migration we will have hit the error case in _pair_instances_to_migrations(),
and so the instance will be lazy-loaded from the DB, ensuring that its
migration ID is up-to-date.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status
Just to clarify something, availability zones don't "have" host
aggregates. Rather, some host aggregates *are also* availability zones,
but a given host can only be in one availability zone.
I went and looked at the code, and the way it is currently written I
think it is actually okay to have
y. This
writes the original host cell usage information back to it.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: New
** Tags: compute scheduler
** Changed in: nova
Assignee: (unassigned) => Chris Friesen (cbf123)
Public bug reported:
In stable/mitaka in resource_tracker.py the help text for the
cpu_allocation_ratio config option reads in part:
'NOTE: This can be set per-compute, or if set to 0.0, the value '
'set on the scheduler node(s) will be used '
'and
Public bug reported:
I'm running stable/mitaka in devstack. I've got a small system with 2
pCPUs, both marked as available for pinning. They're two cores of a
single processor, no threads. "virsh capabilities" shows:
It is my understanding that
Public bug reported:
The calculation for LibvirtDriver._get_disk_over_committed_size_total()
loops over all the instances on the hypervisor to try to figure out the
total overcommitted size for all instances.
However, at the time that routine is called from
Public bug reported:
In a single-node devstack (current trunk, nova commit 6e1051b7), if you
boot an instance with a flavor that has nonzero swap and then resize to
a flavor with zero swap it causes an exception. It seems that we
somehow neglect to remove the swap file from the instance.
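A hedged sketch of the cleanup that appears to be missing, assuming the
libvirt image backend's usual "disk.swap" file name; the instance-directory
handling is simplified:

import os


def remove_stale_swap_disk(instance_dir, new_flavor_swap_mb):
    """Delete the old swap disk when resizing to a flavor with swap=0."""
    swap_path = os.path.join(instance_dir, "disk.swap")
    if new_flavor_swap_mb == 0 and os.path.exists(swap_path):
        os.remove(swap_path)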
: Chris Friesen (cbf123)
Status: In Progress
https://bugs.launchpad.net/bugs/1549032
Title:
max_net_count doesn't interact properly with min_count
Public bug reported:
I'm investigating an issue where an instance with a large disk and an
attached cinder volume was booted in a stable/kilo OpenStack setup with
the diskFilter disabled.
The timeline looks like this:
scheduler picks initial compute node
nova attempts to boot it up on one
Public bug reported:
The RPC API function for remove_volume_connection() uses a different argument
order than the ComputeManager function of the same name.
The normal RPC code uses named arguments, but the _ComputeV4Proxy version
doesn't, and it has the order wrong. This causes problems
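To illustrate why the mismatch matters, with a simplified stand-in signature
(not nova's actual one): a positional call can silently swap arguments, while
keyword arguments keep the RPC proxy and the manager in agreement:

def remove_volume_connection(instance, volume_id):
    # Simplified stand-in for the ComputeManager method.
    return {"instance": instance, "volume_id": volume_id}


# Positional call with the arguments swapped: nothing fails, the values just
# land in the wrong parameters, which is the class of bug described above.
broken = remove_volume_connection("vol-42", "instance-uuid")

# Keyword arguments keep caller and callee in agreement regardless of order.
correct = remove_volume_connection(volume_id="vol-42", instance="instance-uuid")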
Public bug reported:
Calling confirm_resize() sets migration.status to 'confirming' and sends an
RPC cast to the compute node.
If there's a glitch and that cast is received but never processed,
there's no way to confirm the resize since it only looks for migrations
with a status of
Public bug reported:
In numa_get_constraints() we call
pagesize = _numa_get_pagesize_constraints(flavor, image_meta)
then later we have
    if nodes or pagesize:
        [setattr(c, 'pagesize', pagesize) for c in numa_topology.cells]
This ends up treating an instance which doesn't specify
Public bug reported:
I've been testing with a modified version of stable/kilo, but I believe
the bug is present in upstream stable/kilo.
When using nova with neutron, if I boot an instance, then trigger a
resize, and then delete the instance at just the right point during the
resize it ends up
Jay Pipes helpfully pointed out that the MAX_FUNC value was defined by
the PCI spec, and didn't refer to the SRIOV VF value, but rather the PCI
device function.
The original issue turned out to be a local problem generating the PCI
whitelist.
** Changed in: nova
Status: In Progress =
** Changed in: nova
Status: In Progress => Invalid
https://bugs.launchpad.net/bugs/1484742
Title:
NUMATopologyFilter doesn't account for CPU/RAM
Public bug reported:
Currently in the NUMA topology case (so multi-node guest, dedicated
CPUs, hugepages in the guest, etc.) a single guest is not allowed to
consume more CPU/RAM than the host actually has in total regardless of
the specified overcommit ratio. In other words, the overcommit
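A toy illustration of the distinction being drawn (not the NUMATopologyFilter
code): with an overcommit ratio the per-host ceiling should be host capacity
times the ratio, not the raw host capacity:

def fits_with_overcommit(requested_vcpus, host_pcpus, cpu_allocation_ratio):
    """True if the request fits under host_pcpus * ratio."""
    return requested_vcpus <= host_pcpus * cpu_allocation_ratio


# 24 vCPUs on a 16-pCPU host with a 2.0 ratio should be allowed:
assert fits_with_overcommit(24, 16, 2.0)
# ...but is rejected if the ratio is ignored:
assert not fits_with_overcommit(24, 16, 1.0)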
Public bug reported:
There seems to be a bug in the NUMATopologyFilter where it doesn't
properly account for cpu_allocation_ratio. (Detected on stable/kilo,
not sure if it applies to current master.)
To reproduce:
1) Create a flavor with a moderate number of CPUs (5, for example) and
enable
Public bug reported:
I'm trying to make the following change as a DB migration
+# Table instances, modify field 'vcpus_used' to Float
+compute_nodes = Table('compute_nodes', meta, autoload=True)
+vcpus_used = getattr(compute_nodes.c, 'vcpus_used')
+vcpus_used.alter(type=Float)
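For completeness, a self-contained version of the same change in the
sqlalchemy-migrate style nova's migrations used at the time; the module it
would live in is not specified in the report, and Column.alter() comes from
the sqlalchemy-migrate changeset extension rather than plain SQLAlchemy:

from sqlalchemy import Float, MetaData, Table

import migrate.changeset  # noqa: adds .alter() and friends to SQLAlchemy


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    compute_nodes = Table('compute_nodes', meta, autoload=True)
    # Change vcpus_used from an integer column to a float column.
    compute_nodes.c.vcpus_used.alter(type=Float)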
Public bug reported:
The MAX_FUNC value in nova/pci/devspec.py is set to 0x7. This limits us
to a relatively small number of VFs per PF, which is annoying when
trying to use SRIOV in any sort of serious way.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123
is an iterator. If we take an exception (like we can't
write the file because the filesystem is full) then we will stop
iterating over the chunks. If we don't iterate over all the chunks then
glance will keep the file open.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen (cbf123
Public bug reported:
In virt.libvirt.driver.LibvirtDriver._is_storage_shared_with() we first
check IP addresses and if they don't match then we'll try to use ssh to
check whether the storage is actually shared or not.
If ssh keys are not set up between the compute nodes for the user
running
Public bug reported:
When booting up instances, nova allows the user to specify a min count
and a max count.
Currently, if the user has quota space for max count instances, then
nova will try to create them all. If any of them can't be scheduled,
then the creation of all of them will be aborted
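To make the semantics concrete, a toy decision function (one possible reading
of the behaviour being argued for, not nova's code): a scheduling shortfall
above min_count should reduce the count rather than abort the whole request:

def decide_instance_count(min_count, max_count, schedulable):
    """Pick how many instances to build given available capacity."""
    if schedulable < min_count:
        raise RuntimeError("not enough capacity for the minimum requested")
    return min(schedulable, max_count)


# e.g. min=2, max=5, only 3 schedulable -> build 3 instead of aborting all 5.
assert decide_instance_count(2, 5, 3) == 3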
Public bug reported:
I'm running into an issue with kilo-3 that I think is present in current
trunk.
I think there is a race between the claimed CPUs of an instance being
persisted to the DB, and the resource audit scanning the DB for
instances and subtracting pinned CPUs from the list of
Public bug reported:
Currently when using neutron we don't update the binding:host_id during
the evacuate code path.
This can cause the evacuation to fail if we go to sleep waiting for
events in
virt.libvirt.driver.LibvirtDriver._create_domain_and_network(). Since
the binding:host_id in neutron
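A hedged sketch of the update the report says is skipped, using
python-neutronclient's update_port call; how nova wires this into the
evacuate path is not shown here:

def rebind_ports_to_host(neutron, port_ids, dest_host):
    """Point each port's binding:host_id at the destination host.

    'neutron' is an authenticated neutronclient.v2_0.client.Client.
    """
    for port_id in port_ids:
        neutron.update_port(port_id, {"port": {"binding:host_id": dest_host}})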
*** This bug is a duplicate of bug 1379451 ***
https://bugs.launchpad.net/bugs/1379451
** This bug has been marked a duplicate of bug 1379451
anti-affinity policy only honored on boot
Public bug reported:
In git commit a79ecbe Russell Bryant submitted a partial fix for a race
condition when booting an instance as part of a server group with an
anti-affinity scheduler policy.
That fix only solves part of the problem, however. There are a number
of issues remaining:
1) It's
Public bug reported:
I think our usage of the updated_at field to determine whether a
service is up or not is buggy. Consider this scenario:
1) nova-compute is happily running and is up/enabled on compute-0
2) something causes nova-compute to stop (process crash, hardware fault, power
failure,
Public bug reported:
There appears to be a bug in the code dealing with adding a disabled
host to an aggregate that is exported as an availability zone.
I disabled the nova-compute service on a host and then tried to add it to
an aggregate that is exported as an availability zone. This resulted in
Public bug reported:
I'm running nova trunk, commit 752954a.
I configured a flavor with two vcpus and extra specs
hw:cpu_policy=dedicated in order to enable vcpu pinning.
I booted up a number of instances such that there was one instance
affined to host cpus 12 and 13 on compute-0, and another
Public bug reported:
I'm running nova trunk, commit 752954a.
I configured a flavor with two vcpus and extra specs
hw:cpu_policy=dedicated in order to enable vcpu pinning.
I booted up an instance with this flavor, and virsh dumpxml shows that
the two vcpus were affined suitably to host cpus, but
Public bug reported:
According to
http://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-cpu-pinning.html,
the topology of
the guest is set up as follows:
In the absence of an explicit vCPU topology request, the virt drivers
typically expose all vCPUs as sockets with 1
Public bug reported:
I'm trying to make use of huge pages as described in
http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-large-pages.html.
I'm running nova kilo as of Jan 27th.
The other openstack services are juno. Libvirt is 1.2.8.
I've allocated 1
Public bug reported:
The instructions in nova/tests/functional/api_samples/README.rst say to
run GENERATE_SAMPLES=True tox -epy27 nova.tests.unit.integrated, but
that path doesn't exist anymore.
Running GENERATE_SAMPLES=True tox -e functional seems to work, but
someone who knows more than me
Public bug reported:
As it stands, if a client issuing an RPC call() sends a message to the
rabbitmq server and the rabbitmq server then does a failover, the client will
wait for the full RPC timeout period (60 seconds) even though the new
rabbitmq server has come up long before then and some connections
Public bug reported:
nova.servicegroup.drivers.db.DbDriver._report_state() is called every
service.report_interval seconds from a timer in order to periodically
report the service state. It calls self.conductor_api.service_update().
If this ends up calling
Public bug reported:
Running Havana, we're seeing live migration fail when attempting to
migrate from an Ivy-Bridge host to a Sandy-Bridge host.
However, we're using the default kvm guest config which has a safe
default virtual cpu with a subset of cpu features. /proc/cpuinfo from
within the
Importance: Undecided
Assignee: Chris Friesen (cbf123)
Status: New
** Changed in: nova
Assignee: (unassigned) => Chris Friesen (cbf123)
Sorry for the noise, I started reading the code and realized that it was
just taking a long time to transition over to the new function.
** Changed in: nova
Status: New => Invalid
Friesen (cbf123)
Status: New
** Changed in: nova
Assignee: (unassigned) => Chris Friesen (cbf123)
https://bugs.launchpad.net/bugs/1311793
Title
Public bug reported:
With current devstack I ensured I had GroupAntiAffinityFilter in
scheduler_default_filters in /etc/nova/nova.conf, restarted nova-scheduler,
then ran:
nova server-group-create --policy anti-affinity antiaffinitygroup
nova server-group-list
Public bug reported:
Currently nova will let you do this:
nova server-group-create --policy anti-affinity antiaffinitygroup
nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid
cirros0
nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid
cirros1
Public bug reported:
If I do the following:
nova server-group-create --policy affinity affinitygroup
nova boot --flavor=1 --image=cirros-0.3.1-x86_64-uec --hint group=group_uuid
cirros0
nova resize cirros0 2
The cirros0 server will be resized but when the scheduler runs it
doesn't take into
Public bug reported:
In bug 1298494 I recently saw a case where the unit tests (using sqlite)
behaved differently than devstack with mysql.
The issue seems to be when we do
filters = {'uuid': group.members, 'deleted_at': None}
instances = instance_obj.InstanceList.get_by_filters(
Public bug reported:
We were doing some testing of Havana and have run into a scenario that ended up
with two instances stuck with a task_state of REBOOTING following a reboot of
the controller:
1) We reboot the controller.
2) Right after it comes back up something calls
Looks like I misread that patch below, it's adding back the channel
error check, not the connection error check.
This may be due to a bad patch on our end, sorry for the noise.
** Changed in: nova
Status: New => Invalid
Public bug reported:
In db.sqlalchemy.api.instance_get_all_by_filters() there is code that
looks like this:
    if not filters.pop('soft_deleted', False):
        query_prefix = query_prefix.\
            filter(models.Instance.vm_state != vm_states.SOFT_DELETED)
In sqlalchemy a comparison against a
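For background, a self-contained illustration of the SQL NULL-comparison
behaviour that filters like the one above commonly trip over: a plain !=
comparison is not true for rows whose column is NULL, so an explicit IS NULL
branch is needed to keep them. The table and constants below are stand-ins,
not nova's models:

from sqlalchemy import Column, Integer, MetaData, String, Table, or_

metadata = MetaData()
instances = Table(
    "instances", metadata,
    Column("id", Integer, primary_key=True),
    Column("vm_state", String(255), nullable=True),
)

SOFT_DELETED = "soft-deleted"  # placeholder for vm_states.SOFT_DELETED

# Renders as "vm_state != 'soft-deleted'", which is not true for NULL rows,
# so instances whose vm_state is NULL vanish from the result set.
naive = instances.select().where(instances.c.vm_state != SOFT_DELETED)

# Keeps the NULL rows by spelling the intent out explicitly.
safe = instances.select().where(
    or_(instances.c.vm_state != SOFT_DELETED, instances.c.vm_state.is_(None))
)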
in get_hosts():
filters = {'uuid': filter_uuids, 'deleted_at': None}
It seems that current postgres doesn't allow implicit casts. If I
change the line to:
filters = {'uuid': filter_uuids, 'deleted': 0}
Then it seems to work.
** Affects: nova
Importance: Undecided
Assignee: Chris Friesen
Public bug reported:
I'm looking at the current Icehouse code, but this applies to previous
versions as well.
When we create a new instance via _build_instance() or
_build_and_run_instance(), in both cases we call instance_claim() to
test for resources and reserve them.
During a cold migration