On 22.01.2026 18:38, Roger Pau Monne wrote:
> Physmap population needs to use pages as big as possible to reduce p2m
> shattering. However that triggers issues when big enough pages are not
> yet scrubbed, and so scrubbing must be done at allocation time. In some
> scenarios with added contention the watchdog can trigger:
>
> Watchdog timer detects that CPU55 is stuck!
> ----[ Xen-4.17.5-21 x86_64 debug=n Not tainted ]----
> CPU: 55
> RIP: e008:[<ffff82d040204c4a>] clear_page_sse2+0x1a/0x30
> RFLAGS: 0000000000000202 CONTEXT: hypervisor (d0v12)
> [...]
> Xen call trace:
> [<ffff82d040204c4a>] R clear_page_sse2+0x1a/0x30
> [<ffff82d04022a121>] S clear_domain_page+0x11/0x20
> [<ffff82d04022c170>] S common/page_alloc.c#alloc_heap_pages+0x400/0x5a0
> [<ffff82d04022d4a7>] S alloc_domheap_pages+0x67/0x180
> [<ffff82d040226f9f>] S common/memory.c#populate_physmap+0x22f/0x3b0
> [<ffff82d040228ec8>] S do_memory_op+0x728/0x1970
>
> Introduce a mechanism to preempt page scrubbing in populate_physmap(). It
> relies on temporarily stashing the dirty page in the domain struct before
> preempting to guest context, so that scrubbing can resume when the domain
> re-enters the hypercall. The added deferral mechanism will only be used
> during domain construction, and is designed for a single-threaded domain
> builder. If the toolstack makes concurrent XENMEM_populate_physmap calls
> for the same target domain, the calls will trash each other's stashed
> pages, resulting in slow domain physmap population.
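Just to spell out my understanding of the intended flow (a sketch only,
reconstructed from the description above; the populate_physmap() hunk isn't
quoted in this reply, so the exact flags and control flow below are guesses):

    /* Resume from a previously stashed, partially scrubbed allocation. */
    page = get_stashed_allocation(d, a->extent_order, node, &scrub_index);
    if ( !page )
    {
        /* Fresh allocation; defer scrubbing so it can be preempted below. */
        page = alloc_domheap_pages(d, a->extent_order,
                                   a->memflags | MEMF_no_scrub);
        scrub_index = 0;
    }

    for ( ; page && scrub_index < (1u << a->extent_order); scrub_index++ )
    {
        scrub_one_page(&page[scrub_index]);

        if ( hypercall_preempt_check() )
        {
            /* Stash progress and arrange for a continuation. */
            stash_allocation(d, page, a->extent_order, scrub_index + 1);
            a->preempted = 1;
            goto out;
        }
    }
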
>
> Note a similar issue is present in increase reservation. However that
> hypercall is likely to only be used once the domain is already running,
> and the known implementations use 4K pages. It will be dealt with in a
> separate patch using a different approach, which will also take care of
> the allocation in populate_physmap() once the domain is running.
>
> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> Signed-off-by: Roger Pau Monné <[email protected]>
> ---
> Changes since v2:
> - Introduce FREE_DOMHEAP_PAGE{,S}().
> - Remove j local counter.
> - Free page pending scrub in domain_kill() also.
Yet it's still not taken care of in domain_unpause_by_systemcontroller() as
well, i.e. a toolstack action is still needed after the crash to make the
memory usable again. If you made ...
> @@ -1286,6 +1293,19 @@ int domain_kill(struct domain *d)
> rspin_barrier(&d->domain_lock);
> argo_destroy(d);
> vnuma_destroy(d->vnuma);
> + /*
> + * Attempt to free any pages pending scrub early. Toolstack can still
> + * trigger populate_physmap() operations at this point, and hence a
> + * final cleanup must be done in _domain_destroy().
> + */
> + rspin_lock(&d->page_alloc_lock);
> + if ( d->pending_scrub )
> + {
> + FREE_DOMHEAP_PAGES(d->pending_scrub, d->pending_scrub_order);
> + d->pending_scrub_order = 0;
> + d->pending_scrub_index = 0;
> + }
> + rspin_unlock(&d->page_alloc_lock);
... this into a small helper function (usable even from _domain_destroy(),
as the locking used doesn't matter there), it would have negligible
footprint there.
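For illustration, such a helper might look like this (the name is invented;
the body is simply the hunk above):

static void free_pending_scrub(struct domain *d)
{
    rspin_lock(&d->page_alloc_lock);

    if ( d->pending_scrub )
    {
        /*
         * FREE_DOMHEAP_PAGES() presumably also clears the pointer, by
         * analogy with FREE_XENHEAP_PAGES().
         */
        FREE_DOMHEAP_PAGES(d->pending_scrub, d->pending_scrub_order);
        d->pending_scrub_order = 0;
        d->pending_scrub_index = 0;
    }

    rspin_unlock(&d->page_alloc_lock);
}

domain_kill() and _domain_destroy() would then each be down to a single
call.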
As to the comment: not being a native speaker, it still feels to me as if
moving "early" earlier (right after "free") would help parsing of the first
sentence.
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -159,6 +159,66 @@ static void increase_reservation(struct memop_args *a)
> a->nr_done = i;
> }
>
> +/*
> + * Temporary storage for a domain-assigned page that's not been fully scrubbed.
> + * Stored pages must be domheap ones.
> + *
> + * The stashed page can be freed at any time by Xen, so the caller must pass
> + * the order and NUMA node requirement to the fetch function to ensure the
> + * currently stashed page matches its requirements.
> + */
> +static void stash_allocation(struct domain *d, struct page_info *page,
> + unsigned int order, unsigned int scrub_index)
> +{
> + rspin_lock(&d->page_alloc_lock);
> +
> + /*
> + * Drop any stashed allocation to accommodate the current one. This
> + * interface is designed to be used for single-threaded domain creation.
> + */
> + if ( d->pending_scrub )
> + free_domheap_pages(d->pending_scrub, d->pending_scrub_order);
Didn't you indicate you'd move the freeing ...
> + d->pending_scrub_index = scrub_index;
> + d->pending_scrub_order = order;
> + d->pending_scrub = page;
> +
> + rspin_unlock(&d->page_alloc_lock);
> +}
> +
> +static struct page_info *get_stashed_allocation(struct domain *d,
> + unsigned int order,
> + nodeid_t node,
> + unsigned int *scrub_index)
> +{
... into this function?
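I.e. something along these lines (a sketch only; the NUMA node check is
guessed from the comment further up, and the bookkeeping details are
invented):

static struct page_info *get_stashed_allocation(struct domain *d,
                                                unsigned int order,
                                                nodeid_t node,
                                                unsigned int *scrub_index)
{
    struct page_info *page;

    rspin_lock(&d->page_alloc_lock);

    page = d->pending_scrub;
    if ( page && (d->pending_scrub_order != order ||
                  (node != NUMA_NO_NODE && page_to_nid(page) != node)) )
    {
        /* Mismatched stash: drop it here rather than in stash_allocation(). */
        free_domheap_pages(page, d->pending_scrub_order);
        page = NULL;
    }
    else if ( page )
        *scrub_index = d->pending_scrub_index;

    d->pending_scrub = NULL;
    d->pending_scrub_order = 0;
    d->pending_scrub_index = 0;

    rspin_unlock(&d->page_alloc_lock);

    return page;
}

Whether stash_allocation() would then still want its own freeing as a
safety net for concurrent callers is a separate question.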
Jan