[RFC 18/18] vfio-pci: Assume device working after liveupdate

2024-02-05 Thread James Gowans
When re-creating a VFIO device after liveupdate, no destructive actions should be taken on it, to avoid interrupting any ongoing DMA. Specifically, bus mastering should not be cleared and the device should not be reset. Assume that reset works properly and skip over bus mastering reset. Ideally this

[RFC 15/18] pkernfs: register device memory for IOMMU domain pgtables

2024-02-05 Thread James Gowans
As with the root/context pgtables, the IOMMU driver also does phys_to_virt when walking the domain pgtables. For this to work, the physical memory needs to be mapped at the correct place in the direct map. Register a memory device to support this. The alternative would be to wrap

[RFC 17/18] pci: Don't clear bus master is persistence enabled

2024-02-05 Thread James Gowans
For persistent devices to continue to DMA during kexec, the bus mastering capability needs to remain on. Do not disable bus mastering if pkernfs is enabled, as that indicates that persistent devices are enabled. Only persistent devices should have bus mastering left on during kexec but this serve
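The decision above can be sketched as a pure function on the PCI Command register. Bus Master Enable is bit 2 (0x4, `PCI_COMMAND_MASTER` in the kernel's `pci_regs.h`). `shutdown_command()` is a hypothetical helper, not the patch's code, and it models the blanket behaviour the patch describes: every device keeps bus mastering while pkernfs is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable bit of the PCI Command register. */
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical model of the kexec shutdown decision: returns the Command
 * register value the shutdown path would write back. */
static uint16_t shutdown_command(uint16_t command, bool pkernfs_enabled)
{
	/* Persistent devices may still be doing DMA, so leave bus mastering
	 * on; as the patch notes, this applies to all devices for now. */
	if (pkernfs_enabled)
		return command;
	/* Normal kexec: stop the device from initiating DMA. */
	return command & (uint16_t)~PCI_COMMAND_MASTER;
}
```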

[RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update

2024-02-05 Thread James Gowans
When restoring VMs after a live-update kexec, the IOVAs for the guest VM are already present in the persisted page tables. Clobbering the existing pgtable entries is unnecessary, and it may introduce races if pgtable modifications happen concurrently with DMA. Provide a new VFIO MAP_DMA flag which
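A minimal sketch of how a map handler could honour such a flag, assuming a toy single-level IOMMU pgtable. The flag name `HYP_DMA_MAP_FLAG_LIVE_UPDATE` and all `toy_*` names are hypothetical; the real flag name is cut off in the listing.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_PTES 16

/* Hypothetical flag; not the name used by the patch. */
#define HYP_DMA_MAP_FLAG_LIVE_UPDATE (1u << 31)

struct toy_domain {
	uint64_t pte[NR_PTES];   /* toy single-level IOMMU pgtable */
	unsigned int pte_writes; /* counts pgtable modifications */
};

/* Map [iova, iova + size) to physical memory starting at paddr. With the
 * live-update flag set, the persisted entries are trusted and not touched. */
static int toy_map_dma(struct toy_domain *d, uint64_t iova, uint64_t paddr,
		       uint64_t size, uint32_t flags)
{
	uint64_t first = iova >> PAGE_SHIFT;
	uint64_t n = size >> PAGE_SHIFT;

	if (first + n > NR_PTES)
		return -1;
	if (flags & HYP_DMA_MAP_FLAG_LIVE_UPDATE)
		return 0; /* entries already present in the persisted pgtable */

	for (uint64_t i = 0; i < n; i++) {
		d->pte[first + i] = (paddr + (i << PAGE_SHIFT)) | 1; /* bit 0: present */
		d->pte_writes++;
	}
	return 0;
}
```

Skipping the writes entirely, rather than rewriting identical entries, is what avoids racing with in-flight DMA.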

[RFC 14/18] intel-iommu: Allocate domain pgtable pages from pkernfs

2024-02-05 Thread James Gowans
In the previous commit, VFIO was updated to be able to define persistent pgtables on a container. Now the IOMMU driver is updated to accept the file for persistent pgtables when the domain is allocated and to use that file as the source of pages for the pgtables. The iommu_ops.domain_alloc callback is

[RFC 13/18] vfio: add ioctl to define persistent pgtables on container

2024-02-05 Thread James Gowans
The previous commits added a file type in pkernfs for IOMMU persistent page tables. Now add support for actually setting persistent page tables on an IOMMU domain. This is done via a VFIO ioctl on a VFIO container. Userspace needs to create and open an IOMMU persistent page tables file and then supply that

[RFC 12/18] pkernfs: Add IOMMU domain pgtables file

2024-02-05 Thread James Gowans
Similar to the IOMMU root pgtables file added in a previous commit, now support a file type for IOMMU domain pgtables in the IOMMU directory. These domain pgtable files only need to be usable after the system has booted up, for example by QEMU creating one of these files and using it to

[RFC 10/18] iommu/intel: zap context table entries on kexec

2024-02-05 Thread James Gowans
In the next commit, the IOMMU shutdown function will be modified to not actually shut down the IOMMU when doing a kexec. To avoid leaving DMA mappings for non-persistent devices in place during kexec, add a function to the kexec flow which iterates through all IOMMU domains and zaps the context ent
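The zapping step can be sketched as a walk that clears every context entry not belonging to a persistent device. This is a simplified model: real VT-d context entries are 128-bit structures reached through the root table, the walk here is over a flat array rather than per domain, and the `persistent` marker and `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy context entry: only a present bit, a hypothetical persistence marker
 * and the domain pgtable address are modelled. */
struct toy_context_entry {
	bool present;
	bool persistent;   /* hypothetical: device's pgtables live in pkernfs */
	uint64_t pgd_phys; /* address of the domain's top-level pgtable */
};

/* Clear the context entries of non-persistent devices so that only
 * persistent devices keep DMA translations across kexec.
 * Returns the number of entries zapped. */
static size_t zap_non_persistent(struct toy_context_entry *ctx, size_t n)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (ctx[i].present && !ctx[i].persistent) {
			ctx[i] = (struct toy_context_entry){0};
			zapped++;
		}
	}
	return zapped;
}
```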

[RFC 11/18] dma-iommu: Always enable deferred attaches for liveupdate

2024-02-05 Thread James Gowans
Because translations are pre-enabled, all devices will be set for deferred attach. The deferred attach actually has to be performed when doing DMA mapping for devices to work. There may be a better way to do this, for example by consulting the context entry table and only deferring attach if there

[RFC 09/18] intel-iommu: Use pkernfs for root/context pgtable pages

2024-02-05 Thread James Gowans
The previous commits prepared for using pkernfs memory for IOMMU pgtables: a file in the filesystem is available, and so is an allocator for 4-KiB pages from that file. Now use these to back the root and context pgtable pages with pkernfs memory. If pkernfs is enabled then

[RFC 08/18] iommu: Add allocator for pgtables from persistent region

2024-02-05 Thread James Gowans
The specific IOMMU drivers will need the ability to allocate pages from a pkernfs IOMMU pgtable file for their pgtables. The IOMMU drivers will also need the ability to consistently get the same page for the root PGD; add a specific function to get this PGD "root" page. This is different to allo
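One way to provide both operations, sketched under the assumption that page 0 of the pgtable file is reserved for the root PGD: the root lookup is deterministic, so after kexec the driver finds the root pgtable where it left it, while ordinary allocations come from the remaining pages. The `toy_*` names are hypothetical, and unlike pkernfs this sketch does not persist its `used[]` state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define FILE_PAGES 8

/* Toy stand-in for a pkernfs IOMMU pgtable file: a fixed array of pages.
 * Assumption: page 0 is reserved for the root PGD. */
struct toy_pgtable_file {
	uint8_t pages[FILE_PAGES][PAGE_SIZE];
	uint8_t used[FILE_PAGES];
};

/* Always returns the same page, so the root is found again after kexec. */
static void *toy_get_root_pgd(struct toy_pgtable_file *f)
{
	f->used[0] = 1;
	return f->pages[0];
}

/* Hands out a zeroed page other than the root page, or NULL if full. */
static void *toy_alloc_pgtable_page(struct toy_pgtable_file *f)
{
	for (size_t i = 1; i < FILE_PAGES; i++) {
		if (!f->used[i]) {
			f->used[i] = 1;
			memset(f->pages[i], 0, PAGE_SIZE);
			return f->pages[i];
		}
	}
	return NULL;
}
```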

[RFC 07/18] pkernfs: Add file type for IOMMU root pgtables

2024-02-05 Thread James Gowans
So far pkernfs can hold regular files which userspace can mmap and in which it can store persisted data. Now begin the IOMMU integration for persistent IOMMU pgtables. A new type of inode is created for an IOMMU data directory. A new type of inode is also created for a file which holds the IOMMU root

[RFC 06/18] init: Add liveupdate cmdline param

2024-02-05 Thread James Gowans
This will allow other subsystems to know when we're doing a LU (live update) and hence when they should restore rather than reinitialise state.
---
 include/linux/init.h |  1 +
 init/main.c          | 10 ++
 2 files changed, 11 insertions(+)
diff --git a/include/linux/init.h b/include/linux/init.
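The parameter can be detected by scanning the command line for a bare `liveupdate` token. In the kernel this would more likely be a `__setup()` or `early_param()` handler setting a global flag that other subsystems check; the userspace sketch below only illustrates the matching and ignores quoting. `cmdline_has_liveupdate()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the space-separated command line contains the exact token
 * "liveupdate" (not merely a token that contains it). */
static bool cmdline_has_liveupdate(const char *cmdline)
{
	const char *key = "liveupdate";
	size_t len = strlen(key);
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');
		if (starts && ends)
			return true;
		p += len; /* keep searching after this partial match */
	}
	return false;
}
```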

[RFC 05/18] pkernfs: add file mmap callback

2024-02-05 Thread James Gowans
Make the file data usable from userspace by adding mmap. That's all that QEMU needs for guest RAM, so that's all we bother implementing for now. When mmapping the file, the VMA is marked as PFNMAP to indicate that there are no struct pages for the memory in this VMA. remap_pfn_range() is used to actu

[RFC 04/18] pkernfs: support file truncation

2024-02-05 Thread James Gowans
In the previous commit a block allocator was added. Now use that block allocator to allocate blocks for files when ftruncate is run on them. To do that, an inode_operations is added on the file inodes with a setattr callback handling the ATTR_SIZE attribute. When this is invoked, pages are allocated,
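`ftruncate()` reaches a filesystem through the inode's `setattr` callback with `ATTR_SIZE` set in `ia_valid`. A toy model of the allocation step, assuming 4-KiB blocks and a bump allocator in place of the pkernfs block allocator; all `toy_*` names and `TOY_ATTR_SIZE` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 16
#define TOY_ATTR_SIZE (1u << 3) /* stand-in for the kernel's ATTR_SIZE */

struct toy_iattr {
	unsigned int ia_valid;
	uint64_t ia_size;
};

struct toy_inode {
	uint64_t size;
	size_t nr_blocks;
	long blocks[MAX_BLOCKS]; /* block numbers handed out by the allocator */
};

static long next_free_block = 1; /* toy bump allocator; block 0 reserved */

static long toy_alloc_block(void)
{
	return next_free_block++;
}

/* setattr-style handler: on a size change, allocate enough blocks to cover
 * the new size. Shrinking does not free blocks in this sketch. */
static int toy_setattr(struct toy_inode *inode, const struct toy_iattr *attr)
{
	if (!(attr->ia_valid & TOY_ATTR_SIZE))
		return 0;

	size_t needed = (attr->ia_size + BLOCK_SIZE - 1) / BLOCK_SIZE;
	if (needed > MAX_BLOCKS)
		return -1;
	while (inode->nr_blocks < needed)
		inode->blocks[inode->nr_blocks++] = toy_alloc_block();
	inode->size = attr->ia_size;
	return 0;
}
```

Rounding up with `(size + BLOCK_SIZE - 1) / BLOCK_SIZE` means a 10000-byte file needs three blocks.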

[RFC 03/18] pkernfs: Define an allocator for persistent pages

2024-02-05 Thread James Gowans
This introduces a bitmap allocator for pages from the pkernfs filesystem. The allocation bitmap is stored in the second half of the first page. This imposes an artificial limit on the maximum size of the filesystem; this needs to be made extensible. The allocations can be zeroed, th
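A sketch of such a bitmap allocator over a small simulated region, assuming 4-KiB pages and one bit per page, with page 0 holding the metadata and therefore never handed out. Under those assumptions the 2048-byte bitmap tracks at most 2048 × 8 = 16384 pages, i.e. 64 MiB, which is the kind of artificial size limit mentioned above. `region` and `pkernfs_toy_alloc()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define BITMAP_OFFSET (PAGE_SIZE / 2)   /* bitmap lives in second half of page 0 */
#define MAX_PAGES ((PAGE_SIZE / 2) * 8) /* one bit per page: 16384 pages */

/* Toy region standing in for the physical range donated to pkernfs. */
#define REGION_PAGES 8
_Static_assert(REGION_PAGES <= MAX_PAGES, "region larger than bitmap can track");
static uint8_t region[REGION_PAGES][PAGE_SIZE];

static uint8_t *bitmap(void)
{
	return &region[0][BITMAP_OFFSET];
}

static bool test_bit_(size_t n) { return bitmap()[n / 8] & (1u << (n % 8)); }
static void set_bit_(size_t n) { bitmap()[n / 8] |= (uint8_t)(1u << (n % 8)); }

/* First-fit: mark the first free page allocated and optionally zero it.
 * Returns the page index, or -1 if the region is full. */
static long pkernfs_toy_alloc(bool zero)
{
	for (size_t n = 1; n < REGION_PAGES; n++) {
		if (!test_bit_(n)) {
			set_bit_(n);
			if (zero)
				memset(region[n], 0, PAGE_SIZE);
			return (long)n;
		}
	}
	return -1;
}
```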

[RFC 02/18] pkernfs: Add persistent inodes hooked into directies

2024-02-05 Thread James Gowans
Add the ability to create inodes for files and directories inside directories. Inodes are persistent in the in-memory filesystem; the second 2 MiB is used as an "inode store." The inode store is one big array of struct pkernfs_inodes, which use a linked list to point to the next sibling inode or
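A sketch of an inode store with sibling links, assuming a hypothetical `struct toy_pinode` layout (the real `struct pkernfs_inode` fields are not shown here). Links are array indices rather than pointers, which keeps the store meaningful if it is mapped at a different virtual address after kexec; index 0 means "none" and index 1 is the root directory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INODE_STORE_SIZE (2u << 20) /* the second 2 MiB of the region */
#define NAME_LEN 32

/* Hypothetical persistent inode layout. */
struct toy_pinode {
	uint32_t flags;       /* bit 0: in use, bit 1: directory */
	uint32_t sibling_ino; /* next inode in the same directory, 0 = none */
	uint32_t child_ino;   /* first entry if this is a directory, 0 = none */
	char name[NAME_LEN];
};

#define NR_INODES (INODE_STORE_SIZE / sizeof(struct toy_pinode))

static struct toy_pinode store[NR_INODES];

static void toy_store_init(void)
{
	memset(store, 0, sizeof store);
	store[1].flags = 3; /* root directory: in use + directory */
}

/* Allocate the first free inode slot; returns 0 if the store is full. */
static uint32_t toy_inode_alloc(const char *name, int is_dir)
{
	for (uint32_t i = 2; i < NR_INODES; i++) {
		if (!(store[i].flags & 1)) {
			memset(&store[i], 0, sizeof store[i]);
			store[i].flags = 1 | (is_dir ? 2 : 0);
			strncpy(store[i].name, name, NAME_LEN - 1);
			return i;
		}
	}
	return 0;
}

/* Link ino into dir by pushing it onto the head of dir's child list. */
static void toy_dir_add(uint32_t dir, uint32_t ino)
{
	store[ino].sibling_ino = store[dir].child_ino;
	store[dir].child_ino = ino;
}

/* Count a directory's entries by following the sibling links. */
static unsigned toy_dir_count(uint32_t dir)
{
	unsigned n = 0;

	for (uint32_t i = store[dir].child_ino; i != 0; i = store[i].sibling_ino)
		n++;
	return n;
}
```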

[RFC 01/18] pkernfs: Introduce filesystem skeleton

2024-02-05 Thread James Gowans
Add an in-memory filesystem: pkernfs. Memory is donated to pkernfs by carving it out of the normal System RAM range with the memmap= cmdline parameter and then giving that same physical range to pkernfs with the pkernfs= cmdline parameter. A new filesystem is added; so far it doesn't do much excep
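The `memmap=nn$ss` form marks `nn` bytes starting at `ss` as reserved, so the kernel does not use that range as System RAM. A sketch of parsing it, with a simplified `memparse()`-style suffix handler (K/M/G only, whereas the kernel's also accepts larger suffixes). The `pkernfs=` syntax is not shown in the listing, so it is not modelled; `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number (decimal, or hex with 0x) with an optional K/M/G suffix. */
static uint64_t toy_memparse(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

struct toy_range {
	uint64_t start, size;
};

/* Parse the value of "memmap=<size>$<start>". Returns 0 on success. */
static int toy_parse_memmap_reserve(const char *arg, struct toy_range *r)
{
	char *p;

	r->size = toy_memparse(arg, &p);
	if (*p != '$')
		return -1;
	r->start = toy_memparse(p + 1, &p);
	return *p == '\0' ? 0 : -1;
}
```

The same physical range would then be handed to pkernfs with its own `pkernfs=` parameter.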

[RFC 00/18] Pkernfs: Support persistence for live update

2024-02-05 Thread James Gowans
83eaee02b0
James Gowans (18):
  pkernfs: Introduce filesystem skeleton
  pkernfs: Add persistent inodes hooked into directies
  pkernfs: Define an allocator for persistent pages
  pkernfs: support file truncation
  pkernfs: add file mmap callback
  init: Add liveupdate cmdline param
  pkernfs: Add fil

[PATCH] kexec: do syscore_shutdown() in kernel_kexec

2023-12-12 Thread James Gowans
down,
  kernel/irq/generic-chip.c  .shutdown = irq_gc_shutdown,
  virt/kvm/kvm_main.c        .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")