Re: [PATCH V1 05/26] vfio/container: preserve descriptors
On 2/3/2025 12:48 PM, Cédric Le Goater wrote: On 1/29/25 15:43, Steve Sistare wrote: At vfio creation time, save the value of vfio container, group, and device descriptors in CPR state. On qemu restart, vfio_realize() finds and uses the saved descriptors, and remembers the reused status for subsequent patches. The reused status is cleared when vmstate load finishes. During reuse, device and iommu state is already configured, so operations in vfio_realize that would modify the configuration, such as vfio ioctl's, are skipped. The result is that vfio_realize constructs qemu data structures that reflect the current state of the device. Signed-off-by: Steve Sistare --- hw/vfio/container.c | 105 ++ hw/vfio/cpr-legacy.c | 17 +++ include/hw/vfio/vfio-common.h | 2 + 3 files changed, 105 insertions(+), 19 deletions(-) diff --git a/hw/vfio/container.c b/hw/vfio/container.c index a90ce6c..81d0ccc 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -31,6 +31,7 @@ #include "system/reset.h" #include "trace.h" #include "qapi/error.h" +#include "migration/cpr.h" #include "pci.h" VFIOGroupList vfio_group_list = @@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd, } static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, - Error **errp) + bool reused, Error **errp) Please rename 'reused' to 'cpr_reused'. We should know what this parameter is for and I don't see any other use than CPR. Hi Cedric, glad to virtually meet you, and thanks for reviewing this. There is no other notion of "reused" in qemu -- CPR is the first to introduce it. Thus "reused" is unambiguous, it always refers to CPR. IMO shorter names without underscores make the code more readable, as long as they are unambiguous. Also, the "reused" identifier already appears in the initial series for cpr-transfer, and to switch now to a different identifier leaves us with two names for the same functionality. Right now I can cscope "reused" and find everything. For those reasons, I prefer reused, but if you feel strongly, I will rename it. { int iommu_type; const char *vioc_name; VFIOContainer *container; + /* + * If container is reused, just set its type and skip the ioctls, as the + * container and group are already configured in the kernel. + * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr. + */ + if (reused) { + if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) { + iommu_type = VFIO_TYPE1v2_IOMMU; + goto skip_iommu; + } else { + error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU " + "is not supported"); + return NULL; + } + } + Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION ioctl ? and then set the iommu unless CPR reused is set. Sure, I'll mke that change. iommu_type = vfio_get_iommu_type(fd, errp); if (iommu_type < 0) { return NULL; @@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, return NULL; } +skip_iommu: I think we can avoid this 'skip_iommu' label with some minor refactoring. vioc_name = vfio_get_iommu_class_name(iommu_type); container = VFIO_IOMMU_LEGACY(object_new(vioc_name)); container->fd = fd; + container->reused = reused; container->iommu_type = iommu_type; return container; } @@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, VFIOContainer *container; VFIOContainerBase *bcontainer; int ret, fd; + bool reused; cpr_reused. VFIOAddressSpace *space; VFIOIOMMUClass *vioc; space = vfio_get_address_space(as); + fd = cpr_find_fd("vfio_container_for_group", group->groupid); + reused = (fd > 0); hmm, so we are deducing from the existence of a CprFd state element that we are doing a live update of the VM. This seems to me to be a somewhat quick heuristic. Isn't there a global helper ? Isn't the VM aware that it's being restarted after a live update ? I am not familiar with the CPR sequence. There is a global mode that can be checked, but we would still need to fetch the fd. Checking the fd alone yields tighter code. It also seems perfectly logical to me when reading the code. Can't find the cpr fd? Then we are not doing cpr. BTW, it is not heuristic. The cpr fd exists at creation time iff we are doing cpr. /* * VFIO is currently incompatible with discarding of RAM insofar as the @@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, * details once we know which type of IOMMU we are using. */ + /* + * If the container is reused,
Re: [PATCH V1 05/26] vfio/container: preserve descriptors
On 1/29/25 15:43, Steve Sistare wrote: At vfio creation time, save the value of vfio container, group, and device descriptors in CPR state. On qemu restart, vfio_realize() finds and uses the saved descriptors, and remembers the reused status for subsequent patches. The reused status is cleared when vmstate load finishes. During reuse, device and iommu state is already configured, so operations in vfio_realize that would modify the configuration, such as vfio ioctl's, are skipped. The result is that vfio_realize constructs qemu data structures that reflect the current state of the device. Signed-off-by: Steve Sistare --- hw/vfio/container.c | 105 ++ hw/vfio/cpr-legacy.c | 17 +++ include/hw/vfio/vfio-common.h | 2 + 3 files changed, 105 insertions(+), 19 deletions(-) diff --git a/hw/vfio/container.c b/hw/vfio/container.c index a90ce6c..81d0ccc 100644 --- a/hw/vfio/container.c +++ b/hw/vfio/container.c @@ -31,6 +31,7 @@ #include "system/reset.h" #include "trace.h" #include "qapi/error.h" +#include "migration/cpr.h" #include "pci.h" VFIOGroupList vfio_group_list = @@ -415,12 +416,28 @@ static bool vfio_set_iommu(int container_fd, int group_fd, } static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, -Error **errp) +bool reused, Error **errp) Please rename 'reused' to 'cpr_reused'. We should know what this parameter is for and I don't see any other use than CPR. { int iommu_type; const char *vioc_name; VFIOContainer *container; +/* + * If container is reused, just set its type and skip the ioctls, as the + * container and group are already configured in the kernel. + * VFIO_TYPE1v2_IOMMU is the only type that supports reuse/cpr. + */ +if (reused) { +if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) { +iommu_type = VFIO_TYPE1v2_IOMMU; +goto skip_iommu; +} else { +error_setg(errp, "container was reused but VFIO_TYPE1v2_IOMMU " + "is not supported"); +return NULL; +} +} + Can we use 'iommu_type' below instead and avoid VFIO_CHECK_EXTENSION ioctl ? and then set the iommu unless CPR reused is set. iommu_type = vfio_get_iommu_type(fd, errp); if (iommu_type < 0) { return NULL; @@ -430,10 +447,12 @@ static VFIOContainer *vfio_create_container(int fd, VFIOGroup *group, return NULL; } +skip_iommu: I think we can avoid this 'skip_iommu' label with some minor refactoring. vioc_name = vfio_get_iommu_class_name(iommu_type); container = VFIO_IOMMU_LEGACY(object_new(vioc_name)); container->fd = fd; +container->reused = reused; container->iommu_type = iommu_type; return container; } @@ -543,10 +562,13 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, VFIOContainer *container; VFIOContainerBase *bcontainer; int ret, fd; +bool reused; cpr_reused. VFIOAddressSpace *space; VFIOIOMMUClass *vioc; space = vfio_get_address_space(as); +fd = cpr_find_fd("vfio_container_for_group", group->groupid); +reused = (fd > 0); hmm, so we are deducing from the existence of a CprFd state element that we are doing a live update of the VM. This seems to me to be a somewhat quick heuristic. Isn't there a global helper ? Isn't the VM aware that it's being restarted after a live update ? I am not familiar with the CPR sequence. /* * VFIO is currently incompatible with discarding of RAM insofar as the @@ -579,28 +601,52 @@ static bool vfio_connect_container(VFIOGroup *group, AddressSpace *as, * details once we know which type of IOMMU we are using. */ +/* + * If the container is reused, then the group is already attached in the + * kernel. If a container with matching fd is found, then update the + * userland group list and return. If not, then after the loop, create + * the container struct and group list. + */ + QLIST_FOREACH(bcontainer, &space->containers, next) { container = container_of(bcontainer, VFIOContainer, bcontainer); -if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) { -ret = vfio_ram_block_discard_disable(container, true); -if (ret) { -error_setg_errno(errp, -ret, - "Cannot set discarding of RAM broken"); -if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, - &container->fd)) { -error_report("vfio: error disconnecting group %d from" - " container", group->groupid); -} -return false; + +if (reused) { +