On 24.12.2025 23:31, Andrew Cooper wrote:
> On 24/12/2025 7:40 pm, Roger Pau Monne wrote:
>> The current logic splits the update of the amount of available memory in
>> the system (total_avail_pages) and pending claims into two separately
>> locked regions. This leads to a window between counter adjustments where
>> the result of total_avail_pages - outstanding_claims doesn't reflect the
>> real amount of free memory available, and can return a negative value due
>> to total_avail_pages having been updated ahead of outstanding_claims.
>>
>> Fix by adjusting outstanding_claims and d->outstanding_pages in the same
>> place where total_avail_pages is updated. Note that accesses to
>> d->outstanding_pages are protected by the global heap_lock, just like
>> total_avail_pages or outstanding_claims. Add a comment to the field
>> declaration, and also adjust the comment at the top of
>> domain_set_outstanding_pages() to be clearer in that regard.
>>
>> Finally, due to claims being adjusted ahead of pages having been assigned
>> to the domain, add logic to re-gain the claim in case assign_page() fails.
>> Otherwise the page would be freed and the claimed amount lost.
>>
>> Fixes: 65c9792df600 ("mmu: Introduce XENMEM_claim_pages (subop of memory
>> ops)")
>> Signed-off-by: Roger Pau Monné <[email protected]>
>> ---
>> Changes since v1:
>> - Regain the claim if allocated page cannot be assigned to the domain.
>> - Adjust comments regarding d->outstanding_pages being protected by the
>> heap_lock (instead of the d->page_alloc_lock).
>
> This is a complicated patch, owing to the churn from adding extra
> parameters.
>
> I've had a go at splitting this patch in half: first adjusting the
> parameters, and second the bugfix.
> https://gitlab.com/xen-project/hardware/xen-staging/-/commits/andrew/roger-claims
>
> I think the result is nicer to follow. Thoughts?
The question (from the unfinished v1 thread) remains whether we actually need
the restoration, given Roger's analysis of the affected failure cases.
Jan