On 22.01.2026 18:38, Roger Pau Monne wrote:
> Physmap population needs to use pages as big as possible to reduce p2m
> shattering.  However that triggers issues when big enough pages are not yet
> scrubbed, and so scrubbing must be done at allocation time.  In some
> scenarios with added contention the watchdog can trigger:
> 
> Watchdog timer detects that CPU55 is stuck!
> ----[ Xen-4.17.5-21  x86_64  debug=n  Not tainted ]----
> CPU:    55
> RIP:    e008:[<ffff82d040204c4a>] clear_page_sse2+0x1a/0x30
> RFLAGS: 0000000000000202   CONTEXT: hypervisor (d0v12)
> [...]
> Xen call trace:
>    [<ffff82d040204c4a>] R clear_page_sse2+0x1a/0x30
>    [<ffff82d04022a121>] S clear_domain_page+0x11/0x20
>    [<ffff82d04022c170>] S common/page_alloc.c#alloc_heap_pages+0x400/0x5a0
>    [<ffff82d04022d4a7>] S alloc_domheap_pages+0x67/0x180
>    [<ffff82d040226f9f>] S common/memory.c#populate_physmap+0x22f/0x3b0
>    [<ffff82d040228ec8>] S do_memory_op+0x728/0x1970
> 
> Introduce a mechanism to preempt page scrubbing in populate_physmap().  It
> relies on temporarily stashing the dirty page in the domain struct before
> preempting to guest context, so that scrubbing can resume when the domain
> re-enters the hypercall.  The added deferral mechanism will only be used for
> domain construction, and is designed to be used with a single-threaded
> domain builder.  If the toolstack makes concurrent calls to
> XENMEM_populate_physmap for the same target domain it will trash stashed
> pages, resulting in slow domain physmap population.
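
For reference, a rough sketch of the caller pattern described above.  The
populate_physmap() hunk itself isn't quoted below, so apart from the stash
helpers and the pending_scrub fields everything here (the helper name, the
MEMF_no_scrub use, and the preemption plumbing) is an assumption rather than
the patch's actual code:

static bool prepare_scrubbed_extent(struct domain *d, unsigned int order,
                                    nodeid_t node, unsigned int memflags,
                                    struct page_info **pg)
{
    unsigned int scrub_index = 0;
    struct page_info *page = get_stashed_allocation(d, order, node,
                                                    &scrub_index);

    if ( !page )
        /* MEMF_no_scrub so that clearing can be done (and preempted) here. */
        page = alloc_domheap_pages(d, order, memflags | MEMF_no_scrub);
    if ( !page )
        return false;

    for ( ; scrub_index < (1UL << order); scrub_index++ )
    {
        /* Dirty-page tracking omitted; a real caller would skip clean pages. */
        clear_domain_page(mfn_add(page_to_mfn(page), scrub_index));
        if ( hypercall_preempt_check() )
        {
            /* Save progress; the caller then arranges a continuation. */
            stash_allocation(d, page, order, scrub_index + 1);
            return false; /* telling apart alloc failure vs. preemption elided */
        }
    }

    *pg = page;
    return true;
}
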
> 
> Note a similar issue is present in increase reservation.  However, that
> hypercall is likely to only be used once the domain is already running, and
> the known implementations use 4K pages.  It will be dealt with in a separate
> patch using a different approach, which will also take care of the
> allocation in populate_physmap() once the domain is running.
> 
> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> Signed-off-by: Roger Pau Monné <[email protected]>
> ---
> Changes since v2:
>  - Introduce FREE_DOMHEAP_PAGE{,S}().
>  - Remove j local counter.
>  - Free page pending scrub in domain_kill() also.

Yet it's still not handled in domain_unpause_by_systemcontroller() either,
i.e. a toolstack action is still needed after the crash to make the memory
usable again. If you made ...

> @@ -1286,6 +1293,19 @@ int domain_kill(struct domain *d)
>          rspin_barrier(&d->domain_lock);
>          argo_destroy(d);
>          vnuma_destroy(d->vnuma);
> +        /*
> +         * Attempt to free any pages pending scrub early.  Toolstack can still
> +         * trigger populate_physmap() operations at this point, and hence a
> +         * final cleanup must be done in _domain_destroy().
> +         */
> +        rspin_lock(&d->page_alloc_lock);
> +        if ( d->pending_scrub )
> +        {
> +            FREE_DOMHEAP_PAGES(d->pending_scrub, d->pending_scrub_order);
> +            d->pending_scrub_order = 0;
> +            d->pending_scrub_index = 0;
> +        }
> +        rspin_unlock(&d->page_alloc_lock);

... this into a small helper function (usable even from _domain_destroy(),
as the locking it uses doesn't matter there), it would have negligible
footprint there.
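
Something along these lines perhaps (name and exact placement merely a
suggestion, with the body lifted from the hunk above):

static void free_pending_scrub(struct domain *d)
{
    rspin_lock(&d->page_alloc_lock);

    if ( d->pending_scrub )
    {
        FREE_DOMHEAP_PAGES(d->pending_scrub, d->pending_scrub_order);
        d->pending_scrub_order = 0;
        d->pending_scrub_index = 0;
    }

    rspin_unlock(&d->page_alloc_lock);
}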

As to the comment, I'm not a native speaker, but it still feels to me as if
moving "early" earlier (right after "free") might help parsing of the 1st
sentence.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -159,6 +159,66 @@ static void increase_reservation(struct memop_args *a)
>      a->nr_done = i;
>  }
>  
> +/*
> + * Temporary storage for a domain assigned page that's not been fully scrubbed.
> + * Stored pages must be domheap ones.
> + *
> + * The stashed page can be freed at any time by Xen; the caller must pass the
> + * order and NUMA node requirement to the fetch function to ensure the
> + * currently stashed page matches its requirements.
> + */
> +static void stash_allocation(struct domain *d, struct page_info *page,
> +                             unsigned int order, unsigned int scrub_index)
> +{
> +    rspin_lock(&d->page_alloc_lock);
> +
> +    /*
> +     * Drop any stashed allocation to accommodate the current one.  This
> +     * interface is designed to be used for single-threaded domain creation.
> +     */
> +    if ( d->pending_scrub )
> +        free_domheap_pages(d->pending_scrub, d->pending_scrub_order);

Didn't you indicate you'd move the freeing ...

> +    d->pending_scrub_index = scrub_index;
> +    d->pending_scrub_order = order;
> +    d->pending_scrub = page;
> +
> +    rspin_unlock(&d->page_alloc_lock);
> +}
> +
> +static struct page_info *get_stashed_allocation(struct domain *d,
> +                                                unsigned int order,
> +                                                nodeid_t node,
> +                                                unsigned int *scrub_index)
> +{

... into this function?
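
I.e. roughly like this (only a sketch; the NUMA check via phys_to_nid() and
the exact field resetting are my guesses, not taken from the patch):

static struct page_info *get_stashed_allocation(struct domain *d,
                                                unsigned int order,
                                                nodeid_t node,
                                                unsigned int *scrub_index)
{
    struct page_info *page;

    rspin_lock(&d->page_alloc_lock);

    page = d->pending_scrub;
    if ( page &&
         (d->pending_scrub_order != order ||
          (node != NUMA_NO_NODE &&
           node != phys_to_nid(page_to_maddr(page)))) )
    {
        /* The stash doesn't fit this request: free it here instead. */
        free_domheap_pages(page, d->pending_scrub_order);
        page = NULL;
    }

    if ( page )
        *scrub_index = d->pending_scrub_index;

    d->pending_scrub = NULL;
    d->pending_scrub_order = 0;
    d->pending_scrub_index = 0;

    rspin_unlock(&d->page_alloc_lock);

    return page;
}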

Jan
