When re-creating a VFIO device after liveupdate, no destructive actions
should be taken on it to avoid interrupting any ongoing DMA.
Specifically bus mastering should not be cleared and the device should
not be reset. Assume that reset works properly and skip over bus
mastering reset.
Ideally this
Similarly to the root/context pgtables, the IOMMU driver also does
phys_to_virt when walking the domain pgtables. To make this work
properly the physical memory needs to be mapped in at the correct place
in the direct map. Register a memory device to support this.
The alternative would be to wrap
In order for persistent devices to continue to DMA during kexec the bus
mastering capability needs to remain on. Do not disable bus mastering if
pkernfs is enabled, indicating that persistent devices are enabled.
Only persistent devices should have bus mastering left on during kexec
but this serve
When restoring VMs after live update kexec, the IOVAs for the guest VM
are already present in the persisted page tables. It is unnecessary to
clobber the existing pgtable entries and it may introduce races if
pgtable modifications happen concurrently with DMA.
Provide a new VFIO MAP_DMA flag which
In the previous commit VFIO was updated to be able to define persistent
pgtables on a container. Now the IOMMU driver is updated to accept the
file for persistent pgtables when the domain is allocated and use that
file as the source of pages for the pgtables.
The iommu_ops.domain_alloc callback is
The previous commits added a file type in pkernfs for IOMMU persistent
page tables. Now support actually setting persistent page tables on an
IOMMU domain. This is done via a VFIO ioctl on a VFIO container.
Userspace needs to create and open an IOMMU persistent page tables file
and then supply that
Similar to the IOMMU root pgtables file which was added in a previous
commit, now support a file type for IOMMU domain pgtables in the IOMMU
directory. These domain pgtable files only need to be useable after the
system has booted up, for example by QEMU creating one of these files
and using it to
In the next commit the IOMMU shutdown function will be modified to not
actually shut down the IOMMU when doing a kexec. To prevent leaving DMA
mappings for non-persistent devices around during kexec we add a
function to the kexec flow which iterates through all IOMMU domains and
zaps the context ent
Seeing as translations are pre-enabled, all devices will be set for
deferred attach. The deferred attach actually has to be done when
doing DMA mapping for devices to work.
There may be a better way to do this by, for example, consulting the
context entry table and only deferring attach if there
The previous commits were preparation for using pkernfs memory for IOMMU
pgtables: a file in the filesystem is available and an allocator to
allocate 4-KiB pages from that file is available.
Now use those to actually use pkernfs memory for root and context
pgtable pages. If pkernfs is enabled then
The specific IOMMU drivers will need the ability to allocate pages from a
pkernfs IOMMU pgtable file for their pgtables. Also, the IOMMU drivers
will need the ability to consistently get the same page for the root PGD
page - add a specific function to get this PGD "root" page. This is
different to allo
So far pkernfs is able to hold regular files for userspace to mmap and
in which to store persisted data. Now begin the IOMMU integration for
persistent IOMMU pgtables.
A new type of inode is created for an IOMMU data directory. A new type
of inode is also created for a file which holds the IOMMU root
This will allow other subsystems to know when we're doing a live update (LU) and hence
when they should be restoring rather than reinitialising state.
---
include/linux/init.h | 1 +
init/main.c | 10 ++
2 files changed, 11 insertions(+)
diff --git a/include/linux/init.h b/include/linux/init.
Make the file data useable to userspace by adding mmap. That's all that
QEMU needs for guest RAM, so that's all we bother implementing for now.
When mmaping the file the VMA is marked as PFNMAP to indicate that there
are no struct pages for the memory in this VMA. remap_pfn_range() is
used to actu
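
From the userspace side, the flow this enables can be sketched roughly as below: open a file, give it a size, and map it shared. This is only an illustration under stated assumptions, not code from the patch. The helper name `map_backing_file` is hypothetical, and the caller supplies the path because no mount point or file name is defined here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Size a file and map it shared. Returns the mapping, or NULL on failure.
 * Hypothetical helper: on a real system `path` would name a file on a
 * mounted pkernfs, which is an assumption of this sketch.
 */
void *map_backing_file(const char *path, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Give the file its size before mapping it. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : p;
}
```

The mapping is released with munmap() using the same length that was passed in.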
In the previous commit a block allocator was added. Now use that block
allocator to allocate blocks for files when ftruncate is run on them.
To do that an inode_operations is added on the file inodes with a getattr
callback handling the ATTR_SIZE attribute. When this is invoked pages
are allocated,
This introduces the concept of a bitmap allocator for pages from the
pkernfs filesystem. The allocation bitmap is stored in the second half
of the first page. This imposes an artificial limit on the maximum size
of the filesystem; this needs to be made extensible.
The allocations can be zeroed, th
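
As a rough userspace illustration of the bitmap idea described above (not the pkernfs code itself), the sketch below tracks pages with one bit each in a fixed-size bitmap. The names `page_bitmap`, `bitmap_init`, `bitmap_alloc_page` and `bitmap_free_page`, and the 2048-byte bitmap size, are assumptions made for the example. The fixed bitmap size is what caps how many pages can be tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen for the example only. */
#define BITMAP_BYTES 2048u                /* assumed size of the bitmap area */
#define MAX_PAGES    (BITMAP_BYTES * 8u)  /* one bit per page: the fixed limit */

struct page_bitmap {
    uint8_t bits[BITMAP_BYTES];
};

/* Mark every page as free. */
void bitmap_init(struct page_bitmap *bm)
{
    memset(bm->bits, 0, sizeof(bm->bits));
}

/* Find the first clear bit, set it, and return its index; -1 if full. */
long bitmap_alloc_page(struct page_bitmap *bm)
{
    for (size_t i = 0; i < MAX_PAGES; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));

        if (!(bm->bits[i / 8] & mask)) {
            bm->bits[i / 8] |= mask;
            return (long)i;
        }
    }
    return -1;
}

/* Clear the bit for a previously allocated page. */
void bitmap_free_page(struct page_bitmap *bm, size_t idx)
{
    assert(idx < MAX_PAGES);

    uint8_t mask = (uint8_t)(1u << (idx % 8));

    assert(bm->bits[idx / 8] & mask); /* catch double frees */
    bm->bits[idx / 8] &= (uint8_t)~mask;
}
```

A first-fit scan like this is linear in the number of pages, which is acceptable for a sketch.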
Add the ability to create inodes for files and directories inside
directories. Inodes are persistent in the in-memory filesystem; the
second 2 MiB is used as an "inode store." The inode store is one big array
of struct pkernfs_inodes and they use a linked list to point to the next
sibling inode or
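
To make the "one big array plus linked list" layout more concrete, here is a small userspace sketch. It is not the real struct pkernfs_inode: the `demo_` names, the field set, the 16-entry store, and the use of a `child_ino` link from a directory to its first entry are all assumptions for illustration. Links are array indices rather than pointers, so they would stay meaningful if the array were mapped at a different address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_INODES 16u
#define DEMO_NO_INODE   0u  /* index 0 is reserved to mean "none" */

/* Hypothetical layout; the real struct pkernfs_inode may differ. */
struct demo_inode {
    int in_use;
    char name[32];
    unsigned int sibling_ino; /* next entry in the same directory */
    unsigned int child_ino;   /* first entry of a directory (assumed field) */
};

/* Stands in for the persistent inode store: one flat array. */
struct demo_inode demo_store[DEMO_MAX_INODES];

/* Claim the first unused slot, skipping reserved slot 0; 0 if full. */
unsigned int demo_alloc_inode(const char *name)
{
    for (unsigned int i = 1; i < DEMO_MAX_INODES; i++) {
        if (!demo_store[i].in_use) {
            memset(&demo_store[i], 0, sizeof(demo_store[i]));
            demo_store[i].in_use = 1;
            /* The slot was zeroed, so the name stays NUL-terminated. */
            strncpy(demo_store[i].name, name, sizeof(demo_store[i].name) - 1);
            return i;
        }
    }
    return DEMO_NO_INODE;
}

/* Create an inode and push it onto the head of the directory's list. */
unsigned int demo_create_in_dir(unsigned int dir_ino, const char *name)
{
    assert(dir_ino != DEMO_NO_INODE && dir_ino < DEMO_MAX_INODES);

    unsigned int ino = demo_alloc_inode(name);

    if (ino == DEMO_NO_INODE)
        return DEMO_NO_INODE;
    demo_store[ino].sibling_ino = demo_store[dir_ino].child_ino;
    demo_store[dir_ino].child_ino = ino;
    return ino;
}

/* Walk a directory's sibling list looking for a name; 0 if not found. */
unsigned int demo_lookup(unsigned int dir_ino, const char *name)
{
    for (unsigned int i = demo_store[dir_ino].child_ino; i != DEMO_NO_INODE;
         i = demo_store[i].sibling_ino) {
        if (strcmp(demo_store[i].name, name) == 0)
            return i;
    }
    return DEMO_NO_INODE;
}
```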
Add an in-memory filesystem: pkernfs. Memory is donated to pkernfs by
carving it out of the normal System RAM range with the memmap= cmdline
parameter and then giving that same physical range to pkernfs with the
pkernfs= cmdline parameter.
A new filesystem is added; so far it doesn't do much excep
83eaee02b0
James Gowans (18):
pkernfs: Introduce filesystem skeleton
pkernfs: Add persistent inodes hooked into directories
pkernfs: Define an allocator for persistent pages
pkernfs: support file truncation
pkernfs: add file mmap callback
init: Add liveupdate cmdline param
pkernfs: Add fil
down,
kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown,
virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook
restart/shutdown"