On 24/12/2025 7:40 pm, Roger Pau Monne wrote:
> The current logic splits the update of the amount of available memory in
> the system (total_avail_pages) and pending claims into two separately
> locked regions. This leads to a window between counters adjustments where
> the result of total_avail_pages - outstanding_claims doesn't reflect the
> real amount of free memory available, and can return a negative value due
> to total_avail_pages having been updated ahead of outstanding_claims.
>
> Fix by adjusting outstanding_claims and d->outstanding_pages in the same
> place where total_avail_pages is updated. Note that accesses to
> d->outstanding_pages is protected by the global heap_lock, just like
> total_avail_pages or outstanding_claims. Add a comment to the field
> declaration, and also adjust the comment at the top of
> domain_set_outstanding_pages() to be clearer in that regard.
>
> Finally, due to claims being adjusted ahead of pages having been assigned
> to the domain, add logic to re-gain the claim in case assign_page() fails.
> Otherwise the page is freed and the claimed amount would be lost.
>
> Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory
> ops)")
> Signed-off-by: Roger Pau Monné <[email protected]>
> ---
> Changes since v1:
> - Regain the claim if allocated page cannot be assigned to the domain.
> - Adjust comments regarding d->outstanding_pages being protected by the
> heap_lock (instead of the d->page_alloc_lock).
This is a complicated patch, owing to the churn from adding extra
parameters.
I've had a go splitting this patch in half. First to adjust the
parameters, and second the bugfix.
https://gitlab.com/xen-project/hardware/xen-staging/-/commits/andrew/roger-claims
I think the result is nicer to follow. Thoughts?
As to the logical part of the change, the extra parameters are ugly but
I can't see a better option.
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 1f67b88a8933..b73f6e264514 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1071,6 +1049,28 @@ static struct page_info *alloc_heap_pages(
> total_avail_pages -= request;
> ASSERT(total_avail_pages >= 0);
>
> + if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> + {
> + /*
> + * Adjust claims in the same locked region where total_avail_pages is
> + * adjusted, not doing so would lead to a window where the amount of
> + * free memory (avail - claimed) would be incorrect.
> + *
> + * Note that by adjusting the claimed amount here it's possible for
> + * pages to fail to be assigned to the claiming domain while already
> + * having been subtracted from d->outstanding_pages. Such claimed
> + * amount is then lost, as the pages that fail to be assigned to the
> + * domain are freed without replenishing the claim.
> + */
> + unsigned int outstanding = min(d->outstanding_pages, request);
> +
> + BUG_ON(outstanding > outstanding_claims);
> + outstanding_claims -= outstanding;
> + d->outstanding_pages -= outstanding;
> + ASSERT(claimed);
> + *claimed = outstanding;
Something that's not explicitly called out is that it is invalid to pass
claims=NULL for a non-NULL d.
There are only 3 callers, and two of them are updated in this way (the
3rd passing NULL for both), but it caught me a little off-guard.
In principle alloc_heap_pages() could return {pg, claimed} via a struct
which avoids this entanglement, but is unergonomic to express.
> @@ -1553,6 +1554,12 @@ static void free_heap_pages(
>
> avail[node][zone] += 1 << order;
> total_avail_pages += 1 << order;
> + if ( d && claim )
> + {
> + outstanding_claims += claim;
> + d->outstanding_pages += claim;
> + }
In the rework, I added a comment here:
+ if ( d && claim )
+ {
+ /*
+ * An allocation can consume part of a claim and subsequently
+ * fail. In such a case, the claim needs re-crediting.
+ */
+ outstanding_claims += claim;
+ d->outstanding_pages += claim;
+ }
because it's very non-intuitive that `claim` is only non-zero in the
case of a failed domheap allocation. Calling the parameter rebate (or
claim_rebate) would be clearer, but I'm not sure if it's a common enough
world to be terribly meaningful to non-native speakers.
It really is inconvenient that you can deplete the claim with part of a
single allocation. The intended use doesn't care for this corner case;
you're supposed to claim what you need and have it drop exactly to zero
as construction progresses (anything else is a toolstack error).
However, taking such a simplification to the claims behaviour doesn't
help here AFAICT; you still need something to distinguish the rebate
case from others, so you can't drop the parameter entirely.
~Andrew