Re: [PATCH V1 05/26] vfio/container: preserve descriptors

2025-02-03 Thread Steven Sistare

On 2/3/2025 12:48 PM, Cédric Le Goater wrote:

On 1/29/25 15:43, Steve Sistare wrote:

At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state.  On qemu restart, vfio_realize() finds and uses
the saved descriptors, and remembers the reused status for subsequent
patches.  The reused status is cleared when vmstate load finishes.

During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped.  The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.

Signed-off-by: Steve Sistare 
---
  hw/vfio/container.c   | 105 ++
  hw/vfio/cpr-legacy.c  |  17 +++
  include/hw/vfio/vfio-common.h |   2 +
  3 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a90ce6c..81d0ccc 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -31,6 +31,7 @@
  #include "system/reset.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "migration/cpr.h"
  #include "pci.h"
  VFIOGroupList vfio_group_list =
@@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
  }
  static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,
-    Error **errp)
+    bool reused, Error **errp)


Please rename 'reused' to 'cpr_reused'. We should know what this parameter
is for and I don't see any other use than CPR.


Hi Cedric, glad to virtually meet you, and thanks for reviewing this.

There is no other notion of "reused" in qemu -- CPR is the first to introduce
it.  Thus "reused" is unambiguous, it always refers to CPR.  IMO shorter names
without underscores make the code more readable, as long as they are 
unambiguous.

Also, the "reused" identifier already appears in the initial series for
cpr-transfer, and to switch now to a different identifier leaves us with two
names for the same functionality.  Right now I can cscope "reused" and find
everything.

For those reasons, I prefer reused, but if you feel strongly, I will rename it.


  {
  int iommu_type;
  const char *vioc_name;
  VFIOContainer *container;
+    /*
+ * If container is reused, just set its type and skip the ioctls, as the
+ * container and group are already configured in the kernel.
+ * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr.
+ */
+    if (reused) {
+    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
+    iommu_type = VFIO_TYPE1v2_IOMMU;
+    goto skip_iommu;
+    } else {
+    error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU "
+ "is not supported");
+    return NULL;
+    }
+    }
+


Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION
ioctl ? and then set the iommu unless CPR reused is set.


Sure, I'll mke that change.


  iommu_type = vfio_get_iommu_type(fd, errp);
  if (iommu_type < 0) {
  return NULL;
@@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, 
VFIOGroup *group,
  return NULL;
  }
+skip_iommu:


I think we can avoid this 'skip_iommu' label with some minor refactoring.


  vioc_name = vfio_get_iommu_class_name(iommu_type);
  container = VFIO_IOMMU_LEGACY(object_new(vioc_name));
  container->fd = fd;
+    container->reused = reused;
  container->iommu_type = iommu_type;
  return container;
  }
@@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  VFIOContainer *container;
  VFIOContainerBase *bcontainer;
  int ret, fd;
+    bool reused;


cpr_reused.


  VFIOAddressSpace *space;
  VFIOIOMMUClass *vioc;
  space = vfio_get_address_space(as);
+    fd = cpr_find_fd("vfio_container_for_group", group->groupid);
+    reused = (fd > 0);



hmm, so we are deducing from the existence of a CprFd state element
that we are doing a live update of the VM.  This seems to me to be a
somewhat quick heuristic.

Isn't there a global helper ? Isn't the VM aware that it's being
restarted after a live update ? I am not familiar with the CPR
sequence.


There is a global mode that can be checked, but we would still need to
fetch the fd.  Checking the fd alone yields tighter code.  It also seems
perfectly logical to me when reading the code.  Can't find the cpr fd?
Then we are not doing cpr.  BTW, it is not heuristic.  The cpr fd exists
at creation time iff we are doing cpr.


  /*
   * VFIO is currently incompatible with discarding of RAM insofar as the
@@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
   * details once we know which type of IOMMU we are using.
   */
+    /*
+ * If the container is reused, 

Re: [PATCH V1 05/26] vfio/container: preserve descriptors

2025-02-03 Thread Cédric Le Goater

On 1/29/25 15:43, Steve Sistare wrote:

At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state.  On qemu restart, vfio_realize() finds and uses
the saved descriptors, and remembers the reused status for subsequent
patches.  The reused status is cleared when vmstate load finishes.

During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped.  The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.

Signed-off-by: Steve Sistare 
---
  hw/vfio/container.c   | 105 ++
  hw/vfio/cpr-legacy.c  |  17 +++
  include/hw/vfio/vfio-common.h |   2 +
  3 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a90ce6c..81d0ccc 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -31,6 +31,7 @@
  #include "system/reset.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "migration/cpr.h"
  #include "pci.h"
  
  VFIOGroupList vfio_group_list =

@@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd,
  }
  
  static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group,

-Error **errp)
+bool reused, Error **errp)


Please rename 'reused' to 'cpr_reused'. We should know what this parameter
is for and I don't see any other use than CPR.


  {
  int iommu_type;
  const char *vioc_name;
  VFIOContainer *container;
  
+/*

+ * If container is reused, just set its type and skip the ioctls, as the
+ * container and group are already configured in the kernel.
+ * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr.
+ */
+if (reused) {
+if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
+iommu_type = VFIO_TYPE1v2_IOMMU;
+goto skip_iommu;
+} else {
+error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU "
+ "is not supported");
+return NULL;
+}
+}
+


Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION
ioctl ? and then set the iommu unless CPR reused is set.


  iommu_type = vfio_get_iommu_type(fd, errp);
  if (iommu_type < 0) {
  return NULL;
@@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, 
VFIOGroup *group,
  return NULL;
  }
  
+skip_iommu:


I think we can avoid this 'skip_iommu' label with some minor refactoring.


  vioc_name = vfio_get_iommu_class_name(iommu_type);
  
  container = VFIO_IOMMU_LEGACY(object_new(vioc_name));

  container->fd = fd;
+container->reused = reused;
  container->iommu_type = iommu_type;
  return container;
  }
@@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
  VFIOContainer *container;
  VFIOContainerBase *bcontainer;
  int ret, fd;
+bool reused;


cpr_reused.


  VFIOAddressSpace *space;
  VFIOIOMMUClass *vioc;
  
  space = vfio_get_address_space(as);

+fd = cpr_find_fd("vfio_container_for_group", group->groupid);
+reused = (fd > 0);



hmm, so we are deducing from the existence of a CprFd state element
that we are doing a live update of the VM.  This seems to me to be a
somewhat quick heuristic.

Isn't there a global helper ? Isn't the VM aware that it's being
restarted after a live update ? I am not familiar with the CPR
sequence.


  /*
   * VFIO is currently incompatible with discarding of RAM insofar as the
@@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
   * details once we know which type of IOMMU we are using.
   */
  
+/*

+ * If the container is reused, then the group is already attached in the
+ * kernel.  If a container with matching fd is found, then update the
+ * userland group list and return.  If not, then after the loop, create
+ * the container struct and group list.
+ */
+
  QLIST_FOREACH(bcontainer, &space->containers, next) {
  container = container_of(bcontainer, VFIOContainer, bcontainer);
-if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-ret = vfio_ram_block_discard_disable(container, true);
-if (ret) {
-error_setg_errno(errp, -ret,
- "Cannot set discarding of RAM broken");
-if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER,
-  &container->fd)) {
-error_report("vfio: error disconnecting group %d from"
- " container", group->groupid);
-}
-return false;
+
+if (reused) {
+