[Yahoo-eng-team] [Bug 1887380] [NEW] Attaching virtual GPU devices to guests in nova

ryan Mon, 13 Jul 2020 06:11:17 -0700

Public bug reported:


This bug tracker is for errors with the documentation, use the following
as a template and remove or add fields as you see fit. Convert [ ] into
[x] to check boxes:

- [X] This is a doc addition request.

Hi, we hit a problem while using nova (Queens) configured with the vGPU
feature to create several instances. It seems that multiple instances
compete for the same vGPU resource: in our case, a vGPU that an existing
instance has already acquired. Here is the error reported in the log:

"libvirt.libvirtError: Requested operation is not valid: mediated device
/sys/bus/mdev/devices/xxx is in use by driver QEMU, domain xxx"

Apparently, nova is trying to allocate a vGPU resource that is already
in use by another instance. We also ruled out a shortage of vGPU
resources on the host: about 25% of the instances ended up in an error
state during creation, even though the instances we were creating need
only 50% of all vGPU resources. From our perspective, the problem is in
nova-scheduler. Any idea how to resolve this?

Thanks

Ruien Zhang
zhangru...@bytedance.com

-----------------------------------
Release: 21.1.0.dev214 on 2020-04-28 20:09:00
SHA: d19f1ac47b0a5fe1dd80b7187087e5810501f16c
Source: https://opendev.org/openstack/nova/src/doc/source/admin/virtual-gpu.rst
URL: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1887380

Title:
  Attaching virtual GPU devices to guests in nova

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1887380/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
