[Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-24 Thread Paul Durrant
Currently it is unsafe to assign a domheap page allocated with
MEMF_no_refcount to a domain because the domain't 'tot_pages' will not
be incremented, but will be decrement when the page is freed (since
free_domheap_pages() has no way of telling that the increment was skipped).

This patch allocates a new 'count_info' bit for a PGC_no_refcount flag
which is then used to mark domheap pages allocated with MEMF_no_refcount.
This then allows free_domheap_pages() to skip decrementing tot_pages when
appropriate and hence makes the pages safe to assign.

NOTE: The patch sets MEMF_no_refcount directly in alloc_domheap_pages()
  rather than in assign_pages() because the latter is called with
  MEMF_no_refcount by memory_exchange() as an optimization, to avoid
  too many calls to domain_adjust_tot_pages() (which acquires and
  releases the global 'heap_lock').

Signed-off-by: Paul Durrant 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: Volodymyr Babchuk 
Cc: "Roger Pau Monné" 

v4:
 - New in v4
---
 xen/common/page_alloc.c  | 39 ---
 xen/include/asm-arm/mm.h |  5 -
 xen/include/asm-x86/mm.h |  7 +--
 3 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 919a270587..6103f14a4e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -460,6 +460,9 @@ unsigned long domain_adjust_tot_pages(struct domain *d, 
long pages)
 {
 long dom_before, dom_after, dom_claimed, sys_before, sys_after;
 
+if ( !pages )
+goto out;
+
 ASSERT(spin_is_locked(&d->page_alloc_lock));
 d->tot_pages += pages;
 
@@ -1389,6 +1392,8 @@ static void free_heap_pages(
 
 for ( i = 0; i < (1 << order); i++ )
 {
+ASSERT(!(pg[i].count_info & PGC_no_refcount));
+
 /*
  * Cannot assume that count_info == 0, as there are some corner cases
  * where it isn't the case and yet it isn't a bug:
@@ -2314,11 +2319,6 @@ struct page_info *alloc_domheap_pages(
 
 if ( memflags & MEMF_no_owner )
 memflags |= MEMF_no_refcount;
-else if ( (memflags & MEMF_no_refcount) && d )
-{
-ASSERT(!(memflags & MEMF_no_refcount));
-return NULL;
-}
 
 if ( !dma_bitsize )
 memflags &= ~MEMF_no_dma;
@@ -2331,11 +2331,20 @@ struct page_info *alloc_domheap_pages(
   memflags, d)) == NULL)) )
  return NULL;
 
-if ( d && !(memflags & MEMF_no_owner) &&
- assign_pages(d, pg, order, memflags) )
+if ( d && !(memflags & MEMF_no_owner) )
 {
-free_heap_pages(pg, order, memflags & MEMF_no_scrub);
-return NULL;
+if ( assign_pages(d, pg, order, memflags) )
+{
+free_heap_pages(pg, order, memflags & MEMF_no_scrub);
+return NULL;
+}
+if ( memflags & MEMF_no_refcount )
+{
+unsigned long i;
+
+for ( i = 0; i < (1 << order); i++ )
+pg[i].count_info |= PGC_no_refcount;
+}
 }
 
 return pg;
@@ -2368,24 +2377,32 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 
 if ( likely(d) && likely(d != dom_cow) )
 {
+long pages = 0;
+
 /* NB. May recursively lock from relinquish_memory(). */
 spin_lock_recursive(&d->page_alloc_lock);
 
 for ( i = 0; i < (1 << order); i++ )
 {
+unsigned long count_info = pg[i].count_info;
+
 if ( pg[i].u.inuse.type_info & PGT_count_mask )
 {
 printk(XENLOG_ERR
"pg[%u] MFN %"PRI_mfn" c=%#lx o=%u v=%#lx t=%#x\n",
i, mfn_x(page_to_mfn(pg + i)),
-   pg[i].count_info, pg[i].v.free.order,
+   count_info, pg[i].v.free.order,
pg[i].u.free.val, pg[i].tlbflush_timestamp);
 BUG();
 }
 arch_free_heap_page(d, &pg[i]);
+if ( count_info & PGC_no_refcount )
+pg[i].count_info &= ~PGC_no_refcount;
+else
+pages--;
 }
 
-drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
+drop_dom_ref = !domain_adjust_tot_pages(d, pages);
 
 spin_unlock_recursive(&d->page_alloc_lock);
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 333efd3a60..1076cc9713 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -119,9 +119,12 @@ struct page_info
 #define PGC_state_offlined PG_mask(2, 9)
 #define PGC_state_freePG_mask(3, 9)
 #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+/* Page is not reference counted */
+#define _PGC_no_ref

Re: [Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-28 Thread Jan Beulich
On 24.01.2020 16:31, Paul Durrant wrote:
> Currently it is unsafe to assign a domheap page allocated with
> MEMF_no_refcount to a domain because the domain't 'tot_pages' will not
> be incremented, but will be decrement when the page is freed (since
> free_domheap_pages() has no way of telling that the increment was skipped).
> 
> This patch allocates a new 'count_info' bit for a PGC_no_refcount flag
> which is then used to mark domheap pages allocated with MEMF_no_refcount.
> This then allows free_domheap_pages() to skip decrementing tot_pages when
> appropriate and hence makes the pages safe to assign.
> 
> NOTE: The patch sets MEMF_no_refcount directly in alloc_domheap_pages()
>   rather than in assign_pages() because the latter is called with
>   MEMF_no_refcount by memory_exchange() as an optimization, to avoid
>   too many calls to domain_adjust_tot_pages() (which acquires and
>   releases the global 'heap_lock').

I don't think there were any optimization thoughts with this. The
MEMF_no_refcount use is because otherwise for a domain with
tot_pages == max_pages the assignment would fail.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -460,6 +460,9 @@ unsigned long domain_adjust_tot_pages(struct domain *d, 
> long pages)
>  {
>  long dom_before, dom_after, dom_claimed, sys_before, sys_after;
>  
> +if ( !pages )
> +goto out;

Unrelated change? Are there, in fact, any callers passing in 0?
Oh, further down you add one which may do so, but then perhaps
better to make the caller not call here (as is done e.g. in
memory_exchange())?

> @@ -2331,11 +2331,20 @@ struct page_info *alloc_domheap_pages(
>memflags, d)) == NULL)) )
>   return NULL;
>  
> -if ( d && !(memflags & MEMF_no_owner) &&
> - assign_pages(d, pg, order, memflags) )
> +if ( d && !(memflags & MEMF_no_owner) )
>  {
> -free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> -return NULL;
> +if ( assign_pages(d, pg, order, memflags) )
> +{
> +free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> +return NULL;
> +}
> +if ( memflags & MEMF_no_refcount )
> +{
> +unsigned long i;
> +
> +for ( i = 0; i < (1 << order); i++ )
> +pg[i].count_info |= PGC_no_refcount;
> +}

I would seem to me that this needs doing the other way around:
First set PGC_no_refcount, then assign_pages(). After all, the
moment assign_pages() drops its lock, the domain could also
decide to get rid of (some of) the pages again. For this (and
also to slightly simplify things in free_domheap_pages())
perhaps it would be better not to add that ASSERT() to
free_heap_pages(). The function shouldn't really be concerned
of any refcounting, and hence could as well be ignorant to
PGC_no_refcount being set on a page.

> @@ -2368,24 +2377,32 @@ void free_domheap_pages(struct page_info *pg, 
> unsigned int order)
>  
>  if ( likely(d) && likely(d != dom_cow) )
>  {
> +long pages = 0;
> +
>  /* NB. May recursively lock from relinquish_memory(). */
>  spin_lock_recursive(&d->page_alloc_lock);
>  
>  for ( i = 0; i < (1 << order); i++ )
>  {
> +unsigned long count_info = pg[i].count_info;
> +
>  if ( pg[i].u.inuse.type_info & PGT_count_mask )
>  {
>  printk(XENLOG_ERR
> "pg[%u] MFN %"PRI_mfn" c=%#lx o=%u v=%#lx 
> t=%#x\n",
> i, mfn_x(page_to_mfn(pg + i)),
> -   pg[i].count_info, pg[i].v.free.order,
> +   count_info, pg[i].v.free.order,
> pg[i].u.free.val, pg[i].tlbflush_timestamp);
>  BUG();
>  }
>  arch_free_heap_page(d, &pg[i]);
> +if ( count_info & PGC_no_refcount )
> +pg[i].count_info &= ~PGC_no_refcount;
> +else
> +pages--;

Not only to reduce code churn, may I recommend to avoid introducing
the local variable? There's no strict rule preventing
arch_free_heap_page() from possibly playing with the field you
latch up front.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 January 2020 15:23
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign
> 
> On 24.01.2020 16:31, Paul Durrant wrote:
> > Currently it is unsafe to assign a domheap page allocated with
> > MEMF_no_refcount to a domain because the domain't 'tot_pages' will not
> > be incremented, but will be decrement when the page is freed (since
> > free_domheap_pages() has no way of telling that the increment was
> skipped).
> >
> > This patch allocates a new 'count_info' bit for a PGC_no_refcount flag
> > which is then used to mark domheap pages allocated with
> MEMF_no_refcount.
> > This then allows free_domheap_pages() to skip decrementing tot_pages
> when
> > appropriate and hence makes the pages safe to assign.
> >
> > NOTE: The patch sets MEMF_no_refcount directly in alloc_domheap_pages()
> >   rather than in assign_pages() because the latter is called with
> >   MEMF_no_refcount by memory_exchange() as an optimization, to avoid
> >   too many calls to domain_adjust_tot_pages() (which acquires and
> >   releases the global 'heap_lock').
> 
> I don't think there were any optimization thoughts with this. The
> MEMF_no_refcount use is because otherwise for a domain with
> tot_pages == max_pages the assignment would fail.
> 

That would not be the case if the calls to steal_page() further up didn't pass 
MEMF_no_refcount (which would be the correct thing to do if not passing it to 
assign_pages(). I had originally considered doing that because I think it 
allows the somewhat complex error path after assign_pages() to be dropped. But 
avoiding thrashing the global lock seemed a good reason to leave 
memory_exchange() the way it is.

> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -460,6 +460,9 @@ unsigned long domain_adjust_tot_pages(struct domain
> *d, long pages)
> >  {
> >  long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> >
> > +if ( !pages )
> > +goto out;
> 
> Unrelated change? Are there, in fact, any callers passing in 0?
> Oh, further down you add one which may do so, but then perhaps
> better to make the caller not call here (as is done e.g. in
> memory_exchange())?

I think it's preferable for domain_adjust_tot_pages() to handle zero gracefully.

> 
> > @@ -2331,11 +2331,20 @@ struct page_info *alloc_domheap_pages(
> >memflags, d)) == NULL)) )
> >   return NULL;
> >
> > -if ( d && !(memflags & MEMF_no_owner) &&
> > - assign_pages(d, pg, order, memflags) )
> > +if ( d && !(memflags & MEMF_no_owner) )
> >  {
> > -free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > -return NULL;
> > +if ( assign_pages(d, pg, order, memflags) )
> > +{
> > +free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > +return NULL;
> > +}
> > +if ( memflags & MEMF_no_refcount )
> > +{
> > +unsigned long i;
> > +
> > +for ( i = 0; i < (1 << order); i++ )
> > +pg[i].count_info |= PGC_no_refcount;
> > +}
> 
> I would seem to me that this needs doing the other way around:
> First set PGC_no_refcount, then assign_pages(). After all, the
> moment assign_pages() drops its lock, the domain could also
> decide to get rid of (some of) the pages again.

True. Yes, this needs to be swapped.

> For this (and
> also to slightly simplify things in free_domheap_pages())
> perhaps it would be better not to add that ASSERT() to
> free_heap_pages(). The function shouldn't really be concerned
> of any refcounting, and hence could as well be ignorant to
> PGC_no_refcount being set on a page.
> 

Not sure I understand here. What would you like to see free_heap_pages() assert?

> > @@ -2368,24 +2377,32 @@ void free_domheap_pages(struct page_info *pg,
> unsigned int order)
> >
> >  if ( likely(d) && likely(d != dom_cow) )
> >  {
> > +long pages = 0;
> > +
> >  /* NB. May recursively lock from relinquish_memory(). */
> >  spin_lock_recursive(&d->page_alloc_lock);
> >
> >  for ( i = 0; i < (1 << order); i++ )
> >  {
> > +unsigned long count_info = pg[i].count_info;
> > +
> >  if ( pg[i].u.inuse.type_info & PGT_count_mask )
> >  {
> >  printk(XENLOG_ERR
> > "pg[%u] MFN %"PRI_mfn" c=%#lx o=%u v=%#lx
> t=%#x\n",
> > i, mfn_x(page_to_mfn(pg + i)),
> > -   pg[i].count_info, pg[i].v.free.order,
> > +   count_info, pg[i].v.free.order,
> >  

Re: [Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-28 Thread Durrant, Paul
> -Original Message-
[snip]
> > > @@ -2331,11 +2331,20 @@ struct page_info *alloc_domheap_pages(
> > >memflags, d)) == NULL)) )
> > >   return NULL;
> > >
> > > -if ( d && !(memflags & MEMF_no_owner) &&
> > > - assign_pages(d, pg, order, memflags) )
> > > +if ( d && !(memflags & MEMF_no_owner) )
> > >  {
> > > -free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > > -return NULL;
> > > +if ( assign_pages(d, pg, order, memflags) )
> > > +{
> > > +free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > > +return NULL;
> > > +}
> > > +if ( memflags & MEMF_no_refcount )
> > > +{
> > > +unsigned long i;
> > > +
> > > +for ( i = 0; i < (1 << order); i++ )
> > > +pg[i].count_info |= PGC_no_refcount;
> > > +}
> >
> > I would seem to me that this needs doing the other way around:
> > First set PGC_no_refcount, then assign_pages(). After all, the
> > moment assign_pages() drops its lock, the domain could also
> > decide to get rid of (some of) the pages again.
> 
> True. Yes, this needs to be swapped.
> 
> > For this (and
> > also to slightly simplify things in free_domheap_pages())
> > perhaps it would be better not to add that ASSERT() to
> > free_heap_pages(). The function shouldn't really be concerned
> > of any refcounting, and hence could as well be ignorant to
> > PGC_no_refcount being set on a page.
> >
> 
> Not sure I understand here. What would you like to see free_heap_pages()
> assert?
> 

Oh, I misread. You want me to remove the ASSERT that I added... that's fine.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-29 Thread Jan Beulich
On 28.01.2020 18:01, Durrant, Paul wrote:
>> From: Jan Beulich 
>> Sent: 28 January 2020 15:23
>>
>> On 24.01.2020 16:31, Paul Durrant wrote:
>>> Currently it is unsafe to assign a domheap page allocated with
>>> MEMF_no_refcount to a domain because the domain't 'tot_pages' will not
>>> be incremented, but will be decrement when the page is freed (since
>>> free_domheap_pages() has no way of telling that the increment was
>> skipped).
>>>
>>> This patch allocates a new 'count_info' bit for a PGC_no_refcount flag
>>> which is then used to mark domheap pages allocated with
>> MEMF_no_refcount.
>>> This then allows free_domheap_pages() to skip decrementing tot_pages
>> when
>>> appropriate and hence makes the pages safe to assign.
>>>
>>> NOTE: The patch sets MEMF_no_refcount directly in alloc_domheap_pages()
>>>   rather than in assign_pages() because the latter is called with
>>>   MEMF_no_refcount by memory_exchange() as an optimization, to avoid
>>>   too many calls to domain_adjust_tot_pages() (which acquires and
>>>   releases the global 'heap_lock').
>>
>> I don't think there were any optimization thoughts with this. The
>> MEMF_no_refcount use is because otherwise for a domain with
>> tot_pages == max_pages the assignment would fail.
>>
> 
> That would not be the case if the calls to steal_page() further up didn't
> pass MEMF_no_refcount (which would be the correct thing to do if not
> passing it to assign_pages().

No, that's not an option either: steal_page() would otherwise decrement
->tot_pages, allowing the domain to allocate new memory on another vCPU.
This would again result in the exchange failing for no reason. (And no,
I don't think a guest should be required to serialize e.g. ballooning
operations with exchanges.)

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -460,6 +460,9 @@ unsigned long domain_adjust_tot_pages(struct domain
>> *d, long pages)
>>>  {
>>>  long dom_before, dom_after, dom_claimed, sys_before, sys_after;
>>>
>>> +if ( !pages )
>>> +goto out;
>>
>> Unrelated change? Are there, in fact, any callers passing in 0?
>> Oh, further down you add one which may do so, but then perhaps
>> better to make the caller not call here (as is done e.g. in
>> memory_exchange())?
> 
> I think it's preferable for domain_adjust_tot_pages() to handle zero
> gracefully.

That's an option, but imo would then better be a separate change (to
also drop present guards of calls to the function).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 08:22
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH v4 5/7] mm: make MEMF_no_refcount pages safe to assign
> 
> On 28.01.2020 18:01, Durrant, Paul wrote:
> >> From: Jan Beulich 
> >> Sent: 28 January 2020 15:23
> >>
> >> On 24.01.2020 16:31, Paul Durrant wrote:
> >>> Currently it is unsafe to assign a domheap page allocated with
> >>> MEMF_no_refcount to a domain because the domain't 'tot_pages' will not
> >>> be incremented, but will be decrement when the page is freed (since
> >>> free_domheap_pages() has no way of telling that the increment was
> >> skipped).
> >>>
> >>> This patch allocates a new 'count_info' bit for a PGC_no_refcount flag
> >>> which is then used to mark domheap pages allocated with
> >> MEMF_no_refcount.
> >>> This then allows free_domheap_pages() to skip decrementing tot_pages
> >> when
> >>> appropriate and hence makes the pages safe to assign.
> >>>
> >>> NOTE: The patch sets MEMF_no_refcount directly in
> alloc_domheap_pages()
> >>>   rather than in assign_pages() because the latter is called with
> >>>   MEMF_no_refcount by memory_exchange() as an optimization, to
> avoid
> >>>   too many calls to domain_adjust_tot_pages() (which acquires and
> >>>   releases the global 'heap_lock').
> >>
> >> I don't think there were any optimization thoughts with this. The
> >> MEMF_no_refcount use is because otherwise for a domain with
> >> tot_pages == max_pages the assignment would fail.
> >>
> >
> > That would not be the case if the calls to steal_page() further up
> didn't
> > pass MEMF_no_refcount (which would be the correct thing to do if not
> > passing it to assign_pages().
> 
> No, that's not an option either: steal_page() would otherwise decrement
> ->tot_pages, allowing the domain to allocate new memory on another vCPU.
> This would again result in the exchange failing for no reason. (And no,
> I don't think a guest should be required to serialize e.g. ballooning
> operations with exchanges.)
> 

Ok, yes it does make it non-atomic but my view would be that the guest should 
not be simultaneously ballooning; however, we clearly differ there.

> >>> --- a/xen/common/page_alloc.c
> >>> +++ b/xen/common/page_alloc.c
> >>> @@ -460,6 +460,9 @@ unsigned long domain_adjust_tot_pages(struct
> domain
> >> *d, long pages)
> >>>  {
> >>>  long dom_before, dom_after, dom_claimed, sys_before, sys_after;
> >>>
> >>> +if ( !pages )
> >>> +goto out;
> >>
> >> Unrelated change? Are there, in fact, any callers passing in 0?
> >> Oh, further down you add one which may do so, but then perhaps
> >> better to make the caller not call here (as is done e.g. in
> >> memory_exchange())?
> >
> > I think it's preferable for domain_adjust_tot_pages() to handle zero
> > gracefully.
> 
> That's an option, but imo would then better be a separate change (to
> also drop present guards of calls to the function).

Ok, I'll split it out into a separate patch.

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel