Re: [RFC v1 0/4] Make KHO Stateless

2025-09-25 Thread Jason Gunthorpe
On Thu, Sep 25, 2025 at 02:27:06PM +0200, Pratyush Yadav wrote: > I think the tables should be treated as the final serialized data > structure, and should get all the same properties that other KHO > serialization formats have like stable binary format, versioning, etc. Right, that's how I see it

Re: [PATCH v5 3/4] kho: add support for preserving vmalloc allocations

2025-09-22 Thread Jason Gunthorpe
>first); > + > + while (chunk) { > + struct kho_vmalloc_chunk *tmp = chunk; > + > + kho_vmalloc_unpreserve_chunk(chunk); > + > + chunk = KHOSER_LOAD_PTR(chunk->hdr.next); > + kfree(tmp); Shouldn't this be free_page()? Otherwise looks OK Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v5 2/4] kho: replace kho_preserve_phys() with kho_preserve_pages()

2025-09-22 Thread Jason Gunthorpe
t for > vmalloc preservation. > > Signed-off-by: Mike Rapoport (Microsoft) > --- > include/linux/kexec_handover.h | 5 +++-- > kernel/kexec_handover.c| 25 +++-- > mm/memblock.c | 4 +++- > 3 files changed, 17 insertions(+), 17 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc

2025-09-20 Thread Jason Gunthorpe
On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote: > +static inline void mmap_action_remap(struct mmap_action *action, > + unsigned long addr, unsigned long pfn, unsigned long size, > + pgprot_t pgprot) > +{ > + action->type = MMAP_REMAP_PFN; > + > + ac

Re: [PATCH 08/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

2025-09-20 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 02:27:12PM +0100, Lorenzo Stoakes wrote: > It's not only remap that is a concern here, people do all kinds of weird > and wonderful things in .mmap(), sometimes in combination with remap. So it should really not be split this way, complete is a badly named prepopulate and i

Re: [PATCH v3 10/13] mm/hugetlbfs: update hugetlbfs to use mmap_prepare

2025-09-20 Thread Jason Gunthorpe
- > 4 files changed, 85 insertions(+), 52 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table

2025-09-20 Thread Jason Gunthorpe
On Tue, Sep 09, 2025 at 10:14:41PM +0200, Andrey Ryabinin wrote: > +static int kstate_preserve_phys(struct kstate_stream *stream, void *obj, > + const struct kstate_field *field) > +{ > + struct reserve_mem_table *map = obj; > + > + return kho_preserve_phys(map->

Re: [PATCH v3 11/13] mm: update mem char driver to use mmap_prepare

2025-09-20 Thread Jason Gunthorpe
desc->action_error_hook to filter the remap error to > -EAGAIN to keep behaviour consistent. Hurm, in practice this converts reserve_pfn_range()/etc conflicts from EINVAL into EAGAIN and converts all the unlikely OOM ENOMEM failures to EAGAIN. Seems wrong/unnecessary to me, I wouldn

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-20 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 03:09:43PM +0100, Lorenzo Stoakes wrote: > > Perhaps > > > > !vma_desc_cowable() > > > > Is what many drivers are really trying to assert. > > Well no, because: > > static inline bool is_cow_mapping(vm_flags_t flags) > { > return (flags & (VM_SHARED | VM_MAYWRITE)) =
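
For readers following the COW discussion, here is a minimal userspace sketch of the predicate being quoted. The comparison against VM_MAYWRITE follows the mainline helper as I understand it; the flag values and the vm_flags_t typedef below are stand-ins chosen for illustration, not taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in definitions for illustration only; the kernel has its own. */
typedef unsigned long vm_flags_t;
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/*
 * A mapping can take COW faults when it is private (VM_SHARED clear)
 * and may become writable (VM_MAYWRITE set).
 */
static inline bool is_cow_mapping(vm_flags_t flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

Under these assumptions a private mapping that can never be made writable is not a COW mapping, which is the gap between "not shared" and "cowable" being debated here.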

Re: [PATCH 10/16] mm/hugetlb: update hugetlbfs to use mmap_prepare, mmap_complete

2025-09-20 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 02:37:44PM +0100, Lorenzo Stoakes wrote: > On Mon, Sep 08, 2025 at 10:11:21AM -0300, Jason Gunthorpe wrote: > > On Mon, Sep 08, 2025 at 12:10:41PM +0100, Lorenzo Stoakes wrote: > > > @@ -151,20 +123,55 @@ static int hugetlbfs_file_mmap(struct file *f

Re: [RFC v1 1/4] kho: Introduce KHO page table data structures

2025-09-20 Thread Jason Gunthorpe
> 1. Find the `start_level` from the `target_order`. (for example, > target_order = 10, start_level = 4) > 2. The path from the root down to the level above `start_level` is > fixed (index 0 at each of these levels). > 3. At `start_level`, the index is also fixed, by (1 << (63 - > PAGE_SHIFT

Re: [PATCH 08/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

2025-09-19 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 03:18:46PM +0100, Lorenzo Stoakes wrote: > On Mon, Sep 08, 2025 at 10:35:38AM -0300, Jason Gunthorpe wrote: > > On Mon, Sep 08, 2025 at 02:27:12PM +0100, Lorenzo Stoakes wrote: > > > > > It's not only remap that is a concern here, people do

Re: [PATCH v4 12/14] mm: add shmem_zero_setup_desc()

2025-09-19 Thread Jason Gunthorpe
igned-off-by: Lorenzo Stoakes > --- > include/linux/shmem_fs.h | 3 ++- > mm/shmem.c | 41 > 2 files changed, 35 insertions(+), 9 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-19 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 05:50:18PM +0200, David Hildenbrand wrote: > So in practice there is indeed not a big difference between a private and > cow mapping. Right and most drivers just check SHARED. But if we are being documentative why they check shared is because the driver cannot tolerate CO

Re: [PATCH v4 06/14] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

2025-09-19 Thread Jason Gunthorpe
On Wed, Sep 17, 2025 at 08:11:08PM +0100, Lorenzo Stoakes wrote: > -int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr, > +static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long > addr, > unsigned long pfn, unsigned long size, pgprot_t pr

Re: [PATCH v4 07/14] mm: abstract io_remap_pfn_range() based on PFN

2025-09-19 Thread Jason Gunthorpe
please drop this. Soon future work will require something more complicated to compute if pgprot_decrypted() should be called so this unused stuff isn't going to hold up. Otherwise looks good to me Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v4 09/14] mm: add ability to take further action in vm_area_desc

2025-09-19 Thread Jason Gunthorpe
On Wed, Sep 17, 2025 at 08:11:11PM +0100, Lorenzo Stoakes wrote: > +static int mmap_action_finish(struct mmap_action *action, > + const struct vm_area_struct *vma, int err) > +{ > + /* > + * If an error occurs, unmap the VMA altogether and return an error. We > + * only cl

Re: [PATCH 08/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

2025-09-18 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:39PM +0100, Lorenzo Stoakes wrote: > remap_pfn_range_prepare() will set the cow vma->vm_pgoff if necessary, so > it must be supplied with a correct PFN to do so. If the caller must hold > locks to be able to do this, those locks should be held across the > operation, a

Re: [PATCH v3 6/7] mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table

2025-09-18 Thread Jason Gunthorpe
On Thu, Sep 18, 2025 at 09:00:31PM +0200, Andrey Ryabinin wrote: > By contrast, KSTATE centralizes this logic. It avoids duplicating code > and lets us express the preservation details declaratively instead > of re-implementing them per struct. I didn't really see it centralize much of anything,

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-18 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 05:24:23PM +0200, David Hildenbrand wrote: > > > > > I think we need to be cautious of scope here :) I don't want to > > > accidentally break things this way. > > > > IMHO it is worth doing when you get into more driver places it is far > > more obvious why the VM_SHARED i

Re: [PATCH 10/16] mm/hugetlb: update hugetlbfs to use mmap_prepare, mmap_complete

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:41PM +0100, Lorenzo Stoakes wrote: > @@ -151,20 +123,55 @@ static int hugetlbfs_file_mmap(struct file *file, > struct vm_area_struct *vma) > vm_flags |= VM_NORESERVE; > > if (hugetlb_reserve_pages(inode, > - vma->vm_pg

Re: [PATCH] kho: make sure folio being restored is actually from KHO

2025-09-17 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 03:20:51PM +0200, Pratyush Yadav wrote: > >> >> @@ -210,16 +226,16 @@ static void kho_restore_page(struct page *page, > >> >> unsigned int order) > >> >> struct folio *kho_restore_folio(phys_addr_t phys) > >> >> { > >> >> struct page *page = pfn_to_online_page(PHY

Re: [PATCH 16/16] kcov: update kcov to use mmap_prepare, mmap_complete

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:47PM +0100, Lorenzo Stoakes wrote: > Now we have the capacity to set up the VMA in f_op->mmap_prepare and then > later, once the VMA is established, insert a mixed mapping in > f_op->mmap_complete, do so for kcov. > > We utilise the context desc->mmap_context field to

Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc

2025-09-17 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 06:57:56PM +0100, Lorenzo Stoakes wrote: > > > + /* > > > + * If an error occurs, unmap the VMA altogether and return an error. We > > > + * only clear the newly allocated VMA, since this function is only > > > + * invoked if we do NOT merge, so we only clean up the VMA w

Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 01:54:05PM +0100, Lorenzo Stoakes wrote: > > Just mark the functions as manipulating the action using the 'action' > > in the function name. > > Because now sub-callers that partially map using one method and partially map > using another now need to have a desc too that the

Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 01:23:30PM +0100, Lorenzo Stoakes wrote: > On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote: > > On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote: > > > +static inline void mmap_action_remap(struct mmap_action *action, > &

Re: [PATCH v4 3/4] kho: add support for preserving vmalloc allocations

2025-09-17 Thread Jason Gunthorpe
On Wed, Sep 17, 2025 at 02:15:28PM -0700, Andrew Morton wrote: > On Wed, 17 Sep 2025 20:40:32 +0300 Mike Rapoport wrote: > > +struct kho_vmalloc_chunk; > > +struct kho_vmalloc { > > +DECLARE_KHOSER_PTR(first, struct kho_vmalloc_chunk *); > > offtopic nit: DECLARE_KHOSER_PTR() *defines* a

Re: [PATCH v3 05/13] mm/vma: rename __mmap_prepare() function to avoid confusion

2025-09-17 Thread Jason Gunthorpe
toakes > Reviewed-by: David Hildenbrand > --- > mm/vma.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH 12/16] mm: update resctl to use mmap_prepare, mmap_complete, mmap_abort

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:43PM +0100, Lorenzo Stoakes wrote: > resctl uses remap_pfn_range(), but holds a mutex over the > operation. Therefore, establish the mutex in mmap_prepare(), release it in > mmap_complete() and release it in mmap_abort() should the operation fail. The mutex can't do a

Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc

2025-09-17 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 02:51:52PM +0100, Lorenzo Stoakes wrote: > > vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap > > helpers just aren't good for what mixedmap needs.. Mixed map need a > > list of physical pfns with a bit indicating if they are "special" or > > not. If you

Re: [PATCH v3 12/13] mm: update resctl to use mmap_prepare

2025-09-17 Thread Jason Gunthorpe
y: Reinette Chatre > --- > fs/resctrl/pseudo_lock.c | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [RFC v1 1/4] kho: Introduce KHO page table data structures

2025-09-17 Thread Jason Gunthorpe
On Wed, Sep 17, 2025 at 12:18:39PM -0400, Pasha Tatashin wrote: > On Wed, Sep 17, 2025 at 8:22 AM Jason Gunthorpe wrote: > > > > On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote: > > > + * kho_order_table > > > + * +---+--

Re: [RFC v1 1/4] kho: Introduce KHO page table data structures

2025-09-17 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote: > + * kho_order_table > + * +---++ > + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER | > + * ++--++ > + * | > + * | > + * v > + * ++

Re: [RFC v1 0/4] Make KHO Stateless

2025-09-17 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 07:50:15PM -0700, Jason Miu wrote: > This series transitions KHO from an xarray-based metadata tracking > system with serialization to using page table like data structures > that can be passed directly to the next kernel. > > The key motivations for this change are to: > -

Re: [PATCH v3 04/13] relay: update relay to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
t; --- > kernel/relay.c | 33 + > 1 file changed, 17 insertions(+), 16 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v3 03/13] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-16 Thread Jason Gunthorpe
. > > Signed-off-by: Lorenzo Stoakes > Reviewed-by: Jan Kara > Acked-by: David Hildenbrand > --- > fs/ntfs3/file.c| 2 +- > include/linux/mm.h | 10 ++ > mm/secretmem.c | 2 +- > 3 files changed, 12 insertions(+), 2 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v3 06/13] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

2025-09-16 Thread Jason Gunthorpe
t *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot) { int err; err = remap_pfn_range_prepare_vma(vma, addr, pfn, size); if (err) return err; if (IS_ENABLED(__HAVE_PFNMAP_TRACKING)) return remap_pfn_range_track(vma, addr, pfn, size, prot); return remap_pfn_range_notrack(vma, addr, pfn, size, prot); } (fix pgtable_types.h to #define to 1 so IS_ENABLED works) But the logic here is all fine Reviewed-by: Jason Gunthorpe Jason
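
The remark about defining the macro to 1 can be illustrated with a userspace sketch of the preprocessor trick that IS_ENABLED-style checks rely on. This is modelled on the kernel's approach but is not the kernel's code: all macro names below (IS_ONE, HAVE_TRACKING, and so on) are hypothetical.

```c
#include <assert.h>

/* Only the token "1" maps to a placeholder that expands to "0,". */
#define ARG_PLACEHOLDER_1 0,
/* Pick the second argument; trailing arguments are ignored. */
#define TAKE_SECOND_ARG(ignored, val, ...) val

/* Evaluates to 1 if x is defined to 1, and to 0 otherwise. */
#define IS_ONE(x) IS_ONE_EXPAND(x)
#define IS_ONE_EXPAND(val) IS_ONE_PICK(ARG_PLACEHOLDER_##val)
/*
 * If the argument expanded to "0,", the literal 1 shifts into the
 * second slot. Otherwise "junk 1" is a single first argument and
 * the second slot holds 0.
 */
#define IS_ONE_PICK(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define HAVE_TRACKING 1	/* hypothetical feature, defined to 1 */
#define HAVE_EMPTY	/* hypothetical feature, defined but empty */
/* HAVE_MISSING is intentionally left undefined. */

static int tracking_enabled(void) { return IS_ONE(HAVE_TRACKING); }
static int empty_enabled(void)    { return IS_ONE(HAVE_EMPTY); }
static int missing_enabled(void)  { return IS_ONE(HAVE_MISSING); }
```

In this sketch a macro that is defined but empty reads as disabled, the same as an undefined one, which is why a bare #define would not satisfy such a check.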

Re: [PATCH v3 01/13] mm/shmem: update shmem to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
| 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v3 13/13] iommufd: update to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 03:11:59PM +0100, Lorenzo Stoakes wrote: > -static int iommufd_fops_mmap(struct file *filp, struct vm_area_struct *vma) > +static int iommufd_fops_mmap_prepare(struct vm_area_desc *desc) > { > + struct file *filp = desc->file; > struct iommufd_ctx *ictx = filp->p

Re: [PATCH v3 07/13] mm: introduce io_remap_pfn_range_[prepare, complete]()

2025-09-16 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 03:11:53PM +0100, Lorenzo Stoakes wrote: > > -int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr, > - unsigned long pfn, unsigned long size, pgprot_t prot) > +static unsigned long calc_pfn(unsigned long pfn, unsigned long size) > { >

Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc

2025-09-16 Thread Jason Gunthorpe
On Tue, Sep 16, 2025 at 03:11:54PM +0100, Lorenzo Stoakes wrote: > +/* What action should be taken after an .mmap_prepare call is complete? */ > +enum mmap_action_type { > + MMAP_NOTHING, /* Mapping is complete, no further action. */ > + MMAP_REMAP_PFN, /* Remap PFN range

Re: [PATCH v3 02/13] device/dax: update devdax to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
Reviewed-by: Jan Kara > --- > drivers/dax/device.c | 32 +--- > 1 file changed, 21 insertions(+), 11 deletions(-) Reviewed-by: Jason Gunthorpe Jason

Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
On Wed, Sep 10, 2025 at 09:22:11PM +0100, Lorenzo Stoakes wrote: > +static int kcov_mmap_prepare(struct vm_area_desc *desc) > { > int res = 0; > - struct kcov *kcov = vma->vm_file->private_data; > - unsigned long size, off; > - struct page *page; > + struct kcov *kcov = desc-

Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare

2025-09-16 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 01:43:50PM +0100, Lorenzo Stoakes wrote: > > > + if (kcov->area == NULL || desc->pgoff != 0 || > > > + vma_desc_size(desc) != size) { > > > > IMHO these range checks should be cleaned up into a helper: > > > > /* Returns true if the VMA falls within starting_pgoff to > >

Re: [PATCH v3 1/2] kho: add support for preserving vmalloc allocations

2025-09-16 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 07:36:25PM +0300, Mike Rapoport wrote: > > Under the covers it all uses the generic folio based code we already > > have, but we should have appropriate wrappers around that code that > > make clear these patterns. > > Right, but that does not mean that vmalloc preserve/res

Re: [PATCH v3 1/2] kho: add support for preserving vmalloc allocations

2025-09-15 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 05:01:01PM +0300, Mike Rapoport wrote: > > kzalloc() cannot be preserved, the only thing we support today is > > alloc_page(), so this code pattern shouldn't exist. > > kzalloc(PAGE_SIZE) can be preserved, it's page aligned and we don't have to > restore it into a slab cac

Re: [PATCH v3 1/2] kho: add support for preserving vmalloc allocations

2025-09-15 Thread Jason Gunthorpe
On Mon, Sep 15, 2025 at 05:12:27PM +0300, Mike Rapoport wrote: > > I don't suppose I'd insist on it, but something to consider since you > > are likely going to do another revision anyway. > > I think vmalloc is as basic as folio. vmalloc() ultimately calls vm_area_alloc_pages() -> alloc_pages_b

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-11 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:34PM +0100, Lorenzo Stoakes wrote: > static int secretmem_mmap_prepare(struct vm_area_desc *desc) > { > - const unsigned long len = desc->end - desc->start; > + const unsigned long len = vma_desc_size(desc); > > if ((desc->vm_flags & (VM_SHARED | VM_M

Re: [PATCH 00/16] expand mmap_prepare functionality, port more users

2025-09-11 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 03:48:36PM +0100, Lorenzo Stoakes wrote: > But sadly some _do need_ to do extra work afterwards, most notably, > prepopulation. I think Jan is suggesting something more like mmap_op() { struct vma_desc desc = {}; desc.[..] = x desc.[..] = y desc.[..] = z vm

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-11 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 03:47:34PM +0100, Lorenzo Stoakes wrote: > On Mon, Sep 08, 2025 at 11:20:11AM -0300, Jason Gunthorpe wrote: > > On Mon, Sep 08, 2025 at 03:09:43PM +0100, Lorenzo Stoakes wrote: > > > > Perhaps > > > > > > > > !vma_desc_cowab

Re: [PATCH] kho: make sure folio being restored is actually from KHO

2025-09-10 Thread Jason Gunthorpe
On Wed, Sep 10, 2025 at 05:52:04PM +0200, Pratyush Yadav wrote: > On Wed, Sep 10 2025, Matthew Wilcox wrote: > > > On Wed, Sep 10, 2025 at 05:34:40PM +0200, Pratyush Yadav wrote: > >> +#define KHO_PAGE_MAGIC 0x4b484f50U /* ASCII for 'KHOP' */ > >> + > >> +/* > >> + * KHO uses page->private, which
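
As a quick check of the comment on the magic value, the sketch below unpacks the constant byte by byte. Only KHO_PAGE_MAGIC and its value come from the quoted patch; the helper name magic_to_ascii is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define KHO_PAGE_MAGIC 0x4b484f50U /* value as quoted in the patch */

/*
 * Unpack a 32-bit magic into four characters, most significant byte
 * first, and NUL-terminate the result.
 */
static void magic_to_ascii(uint32_t magic, char out[5])
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
	out[4] = '\0';
}
```

Read most significant byte first, the bytes 0x4b, 0x48, 0x4f, 0x50 are the ASCII codes for K, H, O, P.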

Re: [PATCH 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers

2025-09-09 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 02:12:00PM +0100, Lorenzo Stoakes wrote: > On Mon, Sep 08, 2025 at 09:51:01AM -0300, Jason Gunthorpe wrote: > > On Mon, Sep 08, 2025 at 12:10:34PM +0100, Lorenzo Stoakes wrote: > > > static int secretmem_mmap_prepare(struct vm_area_desc *desc) >

Re: [RFC PATCH 1/4] kho: introduce the KHO array

2025-09-09 Thread Jason Gunthorpe
On Tue, Sep 09, 2025 at 05:40:21PM +0200, Pratyush Yadav wrote: > PS: do you know if bitfield layout is reliable for serialization? Can > different compiler versions move them around? I always thought they can. > If not, I can also use them in memfd code since they make the code > neater. It is sp

Re: [RFC PATCH 1/4] kho: introduce the KHO array

2025-09-09 Thread Jason Gunthorpe
On Tue, Sep 09, 2025 at 04:44:21PM +0200, Pratyush Yadav wrote: > The KHO Array is a data structure that behaves like a sparse array of > pointers. It is designed to be preserved and restored over Kexec > Handover (KHO), and targets only 64-bit platforms. It can store 8-byte > aligned pointers. It

Re: [PATCH 13/16] mm: update cramfs to use mmap_prepare, mmap_complete

2025-09-09 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:44PM +0100, Lorenzo Stoakes wrote: > We thread the state through the mmap_context, allowing for both PFN map and > mixed mapped pre-population. > > Signed-off-by: Lorenzo Stoakes > --- > fs/cramfs/inode.c | 134 +++--- > 1 fil

Re: [PATCH 06/16] mm: introduce the f_op->mmap_complete, mmap_abort hooks

2025-09-09 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 12:10:37PM +0100, Lorenzo Stoakes wrote: > We have introduced the f_op->mmap_prepare hook to allow for setting up a > VMA far earlier in the process of mapping memory, reducing problematic > error handling paths, but this does not provide what all > drivers/filesystems need.

Re: [PATCH v3 1/2] kho: add support for preserving vmalloc allocations

2025-09-09 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 08:12:59PM +0200, Pratyush Yadav wrote: > > +#define KHO_VMALLOC_FLAGS_MASK (VM_ALLOC | VM_ALLOW_HUGE_VMAP) > > I don't think it is a good idea to re-use VM flags. This can make adding > more flags later down the line ugly. I think it would be better to > define KHO_VMA

Re: [PATCH v3 1/2] kho: add support for preserving vmalloc allocations

2025-09-08 Thread Jason Gunthorpe
On Mon, Sep 08, 2025 at 01:35:27PM +0300, Mike Rapoport wrote: > +static struct kho_vmalloc_chunk *new_vmalloc_chunk(struct kho_vmalloc_chunk > *cur) > +{ > + struct kho_vmalloc_chunk *chunk; > + int err; > + > + chunk = kzalloc(PAGE_SIZE, GFP_KERNEL); > + if (!chunk) > +

Re: [PATCH 1/2] kho: add support for preserving vmalloc allocations

2025-09-04 Thread Jason Gunthorpe
On Wed, Sep 03, 2025 at 10:25:02PM +0300, Mike Rapoport wrote: > It seems that our major disagreement is about using 'folio' vs 'page' in > the naming. It is a folio because folio is the name for something that is a high order page and it signals that the pointer is the head page. Which is excatl

Re: [PATCH 1/2] kho: add support for preserving vmalloc allocations

2025-09-03 Thread Jason Gunthorpe
On Wed, Sep 03, 2025 at 06:38:00PM +0300, Mike Rapoport wrote: > > Don't call kho_preserve_phy if you already have a page! > > Ok, I'll add kho_preserve_page() ;-P. Cast it to a folio :P > Now seriously, by no means this is a folio, It really is. The entire bitmap thing is about preserving fo

Re: [PATCH 1/2] kho: add support for preserving vmalloc allocations

2025-09-03 Thread Jason Gunthorpe
On Wed, Sep 03, 2025 at 09:30:17AM +0300, Mike Rapoport wrote: > +int kho_preserve_vmalloc(void *ptr, phys_addr_t *preservation) > +{ > + struct kho_vmalloc_chunk *chunk, *first_chunk; > + struct vm_struct *vm = find_vm_area(ptr); > + int err; > + > + if (!vm) > + return

Re: [Hypervisor Live Update] Notes from May 19, 2025

2025-06-02 Thread Jason Gunthorpe
On Sat, May 31, 2025 at 08:16:14PM -0700, David Rientjes wrote: > Pratyush asked about the relationship between KHO and LUO. Pasha noted > that KHO provides a state machine and in RFC v2 of LUO, LUO can drive KHO > which makes the KHO debugfs interface optional. KHO activate will cause > LUO to s

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-15 Thread Jason Gunthorpe
On Fri, Apr 04, 2025 at 04:24:54PM +, Pratyush Yadav wrote: > Only if the objects in the slab cache are of a format that doesn't > change, and I am not sure if that is the case anywhere. Maybe a driver > written with KHO in mind would find it useful, but that's way down the > line. Things like

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-15 Thread Jason Gunthorpe
On Sun, Apr 06, 2025 at 07:11:14PM +0300, Mike Rapoport wrote: > > > > We know what the future use case is for the folio preservation, all > > > > the drivers and the iommu are going to rely on this. > > > > > > We don't know how much of the preservation will be based on folios. > > > > I think a

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Mon, Apr 07, 2025 at 07:31:21PM +0300, Mike Rapoport wrote: > > alloc_pages is a 0 order "folio". vmalloc is an array of 0 order > > folios (?) > > According to current Matthew's plan [1] vmalloc is misc memory :) Someday! :) > Ok, let's stick with memdesc then. Put aside the name it looks li

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Thu, Apr 10, 2025 at 05:51:51PM +0100, Matthew Wilcox wrote: > On Wed, Apr 09, 2025 at 01:28:37PM -0300, Jason Gunthorpe wrote: > > On Wed, Apr 09, 2025 at 07:19:30PM +0300, Mike Rapoport wrote: > > > But we have memdesc today, it's struct page. > > > > N

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Wed, Apr 09, 2025 at 07:19:30PM +0300, Mike Rapoport wrote: > On Wed, Apr 09, 2025 at 12:37:14PM -0300, Jason Gunthorpe wrote: > > On Wed, Apr 09, 2025 at 04:58:16PM +0300, Mike Rapoport wrote: > > > > > > > > I think we still don't really know what wil

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Wed, Apr 09, 2025 at 04:58:16PM +0300, Mike Rapoport wrote: > > I'm not sure that is consistent with what Matthew is trying to build, > > I think we are trying to remove 'struct page' usage, especially for > > compound pages. Right now, though it is confusing, folio is the right > > word to enco

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Wed, Apr 09, 2025 at 07:28:47PM +0300, Mike Rapoport wrote: > On Mon, Apr 07, 2025 at 11:16:26AM -0300, Jason Gunthorpe wrote: > > On Sun, Apr 06, 2025 at 07:11:14PM +0300, Mike Rapoport wrote: > > > > KHO needs to provide a way to give back an allocated struct page/folio

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Wed, Apr 09, 2025 at 12:06:27PM +0300, Mike Rapoport wrote: > Now we've settled with terminology, and given that currently memdesc == > struct page, I think we need kho_preserve_folio(struct *folio) for actual > struct folios and, apparently other high order allocations, and > kho_preserve_page

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Sun, Apr 06, 2025 at 07:34:30PM +0300, Mike Rapoport wrote: > It's more than 200 lines longer than maple tree if we count the lines. > My point is both table and xarrays are trying to optimize for an unknown > goal. Not unknown, the point of the bitmap scheme is to be memory deterministic. You

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-14 Thread Jason Gunthorpe
On Fri, Apr 04, 2025 at 04:53:13PM +0300, Mike Rapoport wrote: > > Maybe change the reserved regions code to put the region list in a > > folio and preserve the folio instead of using FDT as a "demo" for the > > functionality. > > Folios are not available when we restore reserved regions, this jus

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-04-05 Thread Jason Gunthorpe
On Wed, Mar 19, 2025 at 01:35:31PM +, Pratyush Yadav wrote: > On Tue, Mar 18 2025, Jason Gunthorpe wrote: > > > On Tue, Mar 18, 2025 at 11:02:31PM +, Pratyush Yadav wrote: > > > >> I suppose we can serialize all FDs when the box is sealed and get rid of >

Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers

2025-04-05 Thread Jason Gunthorpe
On Wed, Mar 19, 2025 at 06:55:42PM -0700, Changyuan Lyu wrote: > From: Alexander Graf > > Add the core infrastructure to generate Kexec HandOver metadata. Kexec > HandOver is a mechanism that allows Linux to preserve state - arbitrary > properties as well as memory locations - across kexec. > >

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Wed, Mar 19, 2025 at 06:55:44PM -0700, Changyuan Lyu wrote: > +/** > + * kho_preserve_folio - preserve a folio across KHO. > + * @folio: folio to preserve > + * > + * Records that the entire folio is preserved across KHO. The order > + * will be preserved as well. > + * > + * Return: 0 on succes

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Thu, Apr 03, 2025 at 03:50:04PM +, Pratyush Yadav wrote: > The patch currently has a limitation where it does not free any of the > empty tables after an unpreserve operation. But Changyuan's patch also > doesn't do it so at least it is not any worse off. Why do we even have unpreserve? Just

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Wed, Apr 02, 2025 at 07:16:27PM +, Pratyush Yadav wrote: > > +int kho_preserve_phys(phys_addr_t phys, size_t size) > > +{ > > + unsigned long pfn = PHYS_PFN(phys), end_pfn = PHYS_PFN(phys + size); > > + unsigned int order = ilog2(end_pfn - pfn); > > This caught my eye when playing aroun
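
To make the quoted expression concrete, here is a userspace sketch of what ilog2(end_pfn - pfn) evaluates to for a given range. It assumes 4 KiB pages and uses a hand-written floor-log2; range_order and ilog2_u64 are hypothetical names, and whether this rounding is the point raised in the truncated reply cannot be recovered from this snippet.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* floor(log2(v)); returns 0 for v == 0, so callers must pass v > 0. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Mirror of the quoted computation: order derived from the page count. */
static unsigned int range_order(uint64_t phys, uint64_t size)
{
	uint64_t pfn = phys >> PAGE_SHIFT;
	uint64_t end_pfn = (phys + size) >> PAGE_SHIFT;

	return ilog2_u64(end_pfn - pfn);
}
```

Because the logarithm rounds down, a three-page range yields order 1, which describes only two pages. The computation also ignores the alignment of the starting PFN.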

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Thu, Apr 03, 2025 at 05:37:06PM +, Pratyush Yadav wrote: > And I think this will help make the 2 seconds much smaller as well later > down the line since we can now find out if a given page is reserved in a > few operations, and do it in parallel. Yes, most certainly > > This should be m

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Thu, Mar 27, 2025 at 05:28:40PM +, Pratyush Yadav wrote: > > Otherwise we are going to be spending months just polishing this one > > patch without any actual data on where the performance issues and hot > > spots actually are. > > The memblock_reserve side we can optimize later, I agree. B

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Fri, Apr 04, 2025 at 12:54:25PM +0300, Mike Rapoport wrote: > > IMHO it should not call kho_preserve_phys() at all. > > Do you mean that for preserving large physical ranges we need something > entirely different? If they don't use the buddy allocator, then yes? > Then we don't need the bitma

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-04-04 Thread Jason Gunthorpe
On Thu, Apr 03, 2025 at 04:58:27PM +0300, Mike Rapoport wrote: > On Thu, Apr 03, 2025 at 08:42:09AM -0300, Jason Gunthorpe wrote: > > On Wed, Apr 02, 2025 at 07:16:27PM +, Pratyush Yadav wrote: > > > > +int kho_preserve_phys(phys_addr_t phys, size_t size) > > >

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-31 Thread Jason Gunthorpe
On Wed, Mar 26, 2025 at 10:40:29PM +, Pratyush Yadav wrote: > Ideally, kho_preserve_folio() should be similar to freeing the folio, > except that it doesn't go to buddy for re-allocation. In that case, > re-using those pages should not be a problem as long as the driver made > sure the page was

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-03-27 Thread Jason Gunthorpe
On Thu, Mar 27, 2025 at 10:03:17AM +, Pratyush Yadav wrote: > Of course, with the current linked list structure, this cannot work. But > I don't see why we need to have it. I think having a page-table like > structure would be better -- only instead of having PTEs at the lowest > levels, you h

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-03-27 Thread Jason Gunthorpe
On Mon, Mar 24, 2025 at 02:18:34PM -0400, Mike Rapoport wrote: > On Sun, Mar 23, 2025 at 03:55:52PM -0300, Jason Gunthorpe wrote: > > On Sat, Mar 22, 2025 at 03:12:26PM -0400, Mike Rapoport wrote: > > > > > > > + page->private = order; > > >

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-03-27 Thread Jason Gunthorpe
On Sat, Mar 22, 2025 at 03:12:26PM -0400, Mike Rapoport wrote: > This hunk actually came from me. I decided to keep it simple for now and > check what are the alternatives, like moving away from memblock_reserve(), > adding a maple_tree or even something else. Okay, makes sense to me > > > +

Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers

2025-03-24 Thread Jason Gunthorpe
On Mon, Mar 24, 2025 at 05:21:45PM -0700, Changyuan Lyu wrote: > Thanks for the suggestions! I am a little bit concerned about assuming > every FDT fragment is smaller than PAGE_SIZE. In case a child FDT is > larger than PAGE_SIZE, I would like to turn the single u64 in the parent > FDT into a u64

Re: [PATCH v5 09/16] kexec: enable KHO support for memory preservation

2025-03-24 Thread Jason Gunthorpe
On Sun, Mar 23, 2025 at 12:07:58PM -0700, Changyuan Lyu wrote: > > > + down_read(&kho_out.tree_lock); > > > + if (kho_out.fdt) { > > > > What is the lock and fdt test for? > > It is to avoid the competition between the following 2 operations, > - converting the hashtables and mem tracker to FDT, >

Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers

2025-03-24 Thread Jason Gunthorpe
On Sun, Mar 23, 2025 at 12:02:04PM -0700, Changyuan Lyu wrote: > > Why are we changing this? I much preferred the idea of having recursive > > FDTs than this notion of copying everything into tables then out into FDT? > > Now that we have the preserved pages mechanism there is a pretty > > direct path

Re: [PATCH v5 10/16] kexec: add KHO support to kexec file loads

2025-03-21 Thread Jason Gunthorpe
On Wed, Mar 19, 2025 at 06:55:45PM -0700, Changyuan Lyu wrote: > +int kho_copy_fdt(struct kimage *image) > +{ > + int err = 0; > + void *fdt; > + > + if (!kho_enable || !image->file_mode) > + return 0; > + > + if (!kho_out.fdt) { > + err = kho_finalize(); > +

Re: [Hypervisor Live Update] Notes from March 10, 2025

2025-03-20 Thread Jason Gunthorpe
> I didn't mean the exact flags value, but the ability to have > per-folio flags. The exact bits and their meaning would of course > need to be part of the ABI. Shmem uses the dirty and uptodate flags > to track some state on the folios, and the flags can affect its > behavior (lazily zeroing ou

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-18 Thread Jason Gunthorpe
On Tue, Mar 18, 2025 at 11:02:31PM +, Pratyush Yadav wrote: > I suppose we can serialize all FDs when the box is sealed and get rid of > the struct file. If kexec fails, userspace can unseal the box, and FDs > will be deserialized into a new struct file. This way, the behaviour > from userspac

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-18 Thread Jason Gunthorpe
On Tue, Mar 18, 2025 at 03:25:25PM +0100, Christian Brauner wrote: > > It is not really a stash, it is not keeping files, it is hardwired to > > Right now as written it is keeping references to files in these fdboxes > and thus functioning both as a crippled high-privileged fdstore and a > serial

Re: [Hypervisor Live Update] Notes from March 10, 2025

2025-03-17 Thread Jason Gunthorpe
On Sun, Mar 16, 2025 at 08:52:43PM -0700, David Rientjes wrote: > Pasha asked how drivers would know if reservations would be denied in the > first 4GB of memory. Mike said an error code would be returned. Pasha > was specific about devices that wanted to preserve the memory because > they knew

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-17 Thread Jason Gunthorpe
On Sun, Mar 09, 2025 at 01:03:31PM +0100, Christian Brauner wrote: > So either that work is done right from the start or that stashing files > goes out the window and instead that KHO part is implemented in a way > where during a KHO dump relevant userspace is notified that they must > now seriali

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-17 Thread Jason Gunthorpe
On Sat, Mar 08, 2025 at 12:09:53PM +0100, Christian Brauner wrote: > On Fri, Mar 07, 2025 at 11:14:17AM -0400, Jason Gunthorpe wrote: > > On Fri, Mar 07, 2025 at 10:31:39AM +0100, Christian Brauner wrote: > > > On Fri, Mar 07, 2025 at 12:57:35AM +, Pratyush Yadav wrote

Re: [RFC PATCH 1/5] misc: introduce FDBox

2025-03-07 Thread Jason Gunthorpe
On Fri, Mar 07, 2025 at 10:31:39AM +0100, Christian Brauner wrote: > On Fri, Mar 07, 2025 at 12:57:35AM +, Pratyush Yadav wrote: > > The File Descriptor Box (FDBox) is a mechanism for userspace to name > > file descriptors and give them over to the kernel to hold. They can > > later be retrieve

Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers

2025-02-24 Thread Jason Gunthorpe
On Sun, Feb 23, 2025 at 08:51:27PM +0200, Mike Rapoport wrote: > On Wed, Feb 12, 2025 at 01:43:03PM -0400, Jason Gunthorpe wrote: > > On Wed, Feb 12, 2025 at 06:39:06PM +0200, Mike Rapoport wrote: > > > > > As I've mentioned off-list earlier, KHO in its current form

Re: [Hypervisor Live Update] Notes from February 10, 2025

2025-02-19 Thread Jason Gunthorpe
On Tue, Feb 18, 2025 at 08:04:47PM -0800, David Rientjes wrote: > - the future of guestmemfs and what it becomes, including alignment so >prototyping can be done IMHO we need a generic FDBOX sort of filesystem and the ability to put guestmemfd, memfd and hugetlbfs (fd) into it. This would co

Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers

2025-02-12 Thread Jason Gunthorpe
On Wed, Feb 12, 2025 at 06:39:06PM +0200, Mike Rapoport wrote: > As I've mentioned off-list earlier, KHO in its current form is the lowest > level of abstraction for state preservation and it is by no means > intended to provide complex drivers with all the tools necessary. My point, is I thin

Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers

2025-02-12 Thread Jason Gunthorpe
On Tue, Feb 11, 2025 at 12:37:20PM -0400, Jason Gunthorpe wrote: > To do that you need to preserve folios as the basic primitive. I made a small sketch of what I suggest. I imagine the FDT schema for this would look something like this: /dts-v1/; / { compatible = "linux-kho,v1"
