[Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-02 Thread Yu Zhang
After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

This patch disallows mapping of an ioreq server while there are still
p2m_ioreq_server entries left in the p2m table, in case another mapping
occurs right after the current one is unmapped and releases its lock,
with the p2m table not yet synced.

This patch also disallows live migration while p2m_ioreq_server entries
remain in the p2m table. The core reason is that our current
implementation of p2m_change_entry_type_global() lacks the information
to resync p2m_ioreq_server entries correctly if global_logdirty is
on.

Signed-off-by: Yu Zhang 
Reviewed-by: Paul Durrant 
---
Cc: Paul Durrant 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jun Nakajima 
Cc: Kevin Tian 

changes in v5: 
  - According to comments from Jan: use unsigned long for entry_count;
  - According to comments from Jan: refuse the mapping request when
there are p2m_ioreq_server entries remaining in the p2m table.
  - Added "Reviewed-by: Paul Durrant "

changes in v4: 
  - According to comments from Jan: use ASSERT() instead of 'if'
condition in p2m_change_type_one().
  - According to comments from Jan: commit message changes to mention
that p2m_ioreq_server entries are all based on 4K-sized pages.

changes in v3: 
  - Move the synchronous resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
to clarify the p2m type change code.
  - According to comments from George: use locks in the same order
to avoid deadlock, call p2m_change_entry_type_global() after unmap
of the ioreq server is finished.

changes in v2: 
  - Move the calculation of the ioreq server page entry_count into 
p2m_change_type_one() so that we do not need a separate lock.
Note: entry_count is also calculated in resolve_misconfig()/
do_recalc(); fortunately, callers of both routines already hold
the p2m lock.
  - Simplify logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m 
table and do the p2m reset.
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  9 +++++++++
 xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
 xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
 xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  9 ++++++++-
 6 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..6ec950a 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there's outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
 /* turn on PG_log_dirty bit in paging mode */
 paging_lock(d);
 d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..62e9ddf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 e.ipat = ipat;
                 if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                 {
+                     if ( e.sa_p2mt == p2m_ioreq_server )
+                     {
+                         ASSERT(p2m->ioreq.entry_count > 0);
+                         p2m->ioreq.entry_count--;
+                     }
+
                      e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                  ? p2m_ram_logdirty : p2m_ram_rw;
                      ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_doma

Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-03 Thread Jan Beulich
>>> On 02.04.17 at 14:24,  wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>      if ( p2m->ioreq.server != NULL )
>          goto out;
>  
> +    /*
> +     * It is possible that an ioreq server has just been unmapped,
> +     * released the spin lock, with some p2m_ioreq_server entries
> +     * in p2m table remained. We shall refuse another ioreq server
> +     * mapping request in such case.
> +     */
> +    if ( read_atomic(&p2m->ioreq.entry_count) )
> +        goto out;

So this produces the same -EINVAL as the earlier check in context
above. I think it would be nice if neither did - -EINUSE for the first
(which we don't have, so -EOPNOTSUPP would seem the second
best option there) and -EBUSY for the second would seem more
appropriate. If you agree, respective adjustments could be done
while committing, if no other reason for a v11 arises.

Jan
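
(Editorial sketch of the error-code split Jan suggests above, for the
mapping path of p2m_set_ioreq_server(); hypothetical only - as the
follow-ups below show, Jan had misread which branch the new check sits
in and withdraws the suggestion:)

    rc = -EOPNOTSUPP;    /* Jan's proposal for the "server already set" check */
    if ( p2m->ioreq.server != NULL )
        goto out;

    rc = -EBUSY;         /* Jan's proposal for the outstanding-entries check */
    if ( read_atomic(&p2m->ioreq.entry_count) )
        goto out;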



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-03 Thread Jan Beulich
>>> On 03.04.17 at 16:36,  wrote:
 On 02.04.17 at 14:24,  wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>>      if ( p2m->ioreq.server != NULL )
>>          goto out;
>>  
>> +    /*
>> +     * It is possible that an ioreq server has just been unmapped,
>> +     * released the spin lock, with some p2m_ioreq_server entries
>> +     * in p2m table remained. We shall refuse another ioreq server
>> +     * mapping request in such case.
>> +     */
>> +    if ( read_atomic(&p2m->ioreq.entry_count) )
>> +        goto out;
> 
> So this produces the same -EINVAL as the earlier check in context
> above. I think it would be nice if neither did - -EINUSE for the first
> (which we don't have, so -EOPNOTSUPP would seem the second
> best option there) and -EBUSY for the second would seem more
> appropriate. If you agree, respective adjustments could be done
> while committing, if no other reason for a v11 arises.

Oh, and with that
Reviewed-by: Jan Beulich 

Jan



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/3/2017 10:36 PM, Jan Beulich wrote:

So this produces the same -EINVAL as the earlier check in context
above. I think it would be nice if neither did - -EINUSE for the first
(which we don't have, so -EOPNOTSUPP would seem the second
best option there) and -EBUSY for the second would seem more
appropriate. If you agree, respective adjustments could be done
while committing, if no other reason for a v11 arises.


Thanks Jan.
But my code shows both checks will return -EBUSY, not -EINVAL, for the
mapping request:

    /* Unmap ioreq server from p2m type by passing flags with 0. */
    if ( flags == 0 )
    {
        rc = -EINVAL;
        if ( p2m->ioreq.server != s )
            goto out;

        p2m->ioreq.server = NULL;
        p2m->ioreq.flags = 0;
    }
    else
    {
        rc = -EBUSY;
        if ( p2m->ioreq.server != NULL )
            goto out;

        /*
         * It is possible that an ioreq server has just been unmapped,
         * released the spin lock, with some p2m_ioreq_server entries
         * in p2m table remained. We shall refuse another ioreq server
         * mapping request in such case.
         */
        if ( read_atomic(&p2m->ioreq.entry_count) )
            goto out;

        p2m->ioreq.server = s;
        p2m->ioreq.flags = flags;
    }

Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not NULL
in the mapping code?


For the unmapping code, -EINVAL is used when the ioreq server to be 
unmapped is not the designated one.

But for this one, I am not sure which one is better: -EINVAL or -EBUSY?

Yu


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Jan Beulich
>>> On 05.04.17 at 09:18,  wrote:
> On 4/3/2017 10:36 PM, Jan Beulich wrote:
>> So this produces the same -EINVAL as the earlier check in context
>> above. I think it would be nice if neither did - -EINUSE for the first
>> (which we don't have, so -EOPNOTSUPP would seem the second
>> best option there) and -EBUSY for the second would seem more
>> appropriate. If you agree, respective adjustments could be done
>> while committing, if no other reason for a v11 arises.
> 
> Thanks Jan.

First of all - deleting irrelevant context from replies is always
appreciated, but I think you went too far here. It is no longer
possible to re-associate your comments with the change you've
made (not visible from the code fragment below).

> But my code shows both checks will return -EBUSY, not -EINVAL, for the
> mapping request:
>    /* Unmap ioreq server from p2m type by passing flags with 0. */
>    if ( flags == 0 )
>    {
>        rc = -EINVAL;
>        if ( p2m->ioreq.server != s )
>            goto out;
> 
>        p2m->ioreq.server = NULL;
>        p2m->ioreq.flags = 0;
>    }
>    else
>    {
>        rc = -EBUSY;
>        if ( p2m->ioreq.server != NULL )
>            goto out;
> 
>        /*
>         * It is possible that an ioreq server has just been unmapped,
>         * released the spin lock, with some p2m_ioreq_server entries
>         * in p2m table remained. We shall refuse another ioreq server
>         * mapping request in such case.
>         */
>        if ( read_atomic(&p2m->ioreq.entry_count) )
>            goto out;

Oh, I see: I'd mistakenly assumed the addition was to the if()
branch. I clearly didn't look closely enough, I'm sorry.

>        p2m->ioreq.server = s;
>        p2m->ioreq.flags = flags;
>    }
> 
> Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not NULL
> in the mapping code?

As per above, I really did refer to the wrong piece of code, so all
is fine the way the code is.

Jan



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread George Dunlap
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang  wrote:
> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously, using the current p2m_change_entry_type_global()
> interface.
>
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One property of
> these entries is that they only point to 4K-sized page frames, because
> all p2m_ioreq_server entries originate from p2m_ram_rw ones in
> p2m_change_type_one(). We therefore do not need to worry about counting
> 2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...

> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..6ec950a 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there's outstanding p2m_ioreq_server pages.

Grammar nit: if *there are* outstanding pages.

 -George


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/5/2017 10:41 PM, George Dunlap wrote:

On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang  wrote:

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...


Thanks for your thorough consideration, George. But I do not think we
need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because it is already allocated.

And even when emulation is finished and the balloon driver successfully
gets this page and triggers decrease_reservation, the purpose is to
remove the current mapping relation between the gfn and mfn in the p2m.
So IIUC, p2m_remove_page() will be triggered if everything goes fine,
and then p2m_set_entry(), which will eventually trigger the recalc
logic, either in ept_set_entry() or p2m_pt_set_entry(). Then the
entry_count will be updated in the recalc logic.
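
(Editorial note: the EPT side of this recalc accounting is visible in
the resolve_misconfig() hunk quoted in the patch above; the p2m-pt side
is not shown in this archive, but by analogy it plausibly amounts to the
sketch below, where p2mt_old - a hypothetical name - is the entry's type
before recalculation:)

    /* In the p2m-pt recalc path: drop a disappearing entry from the
     * count (sketch only, by analogy with resolve_misconfig()). */
    if ( p2mt_old == p2m_ioreq_server )
    {
        ASSERT(p2m->ioreq.entry_count > 0);
        p2m->ioreq.entry_count--;
    }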


diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..6ec950a 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
   */
  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
  {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there's outstanding p2m_ioreq_server pages.

Grammar nit: if *there are* outstanding pages.


Oh, right. Thanks

B.R.
Yu

  -George

Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread George Dunlap
On 05/04/17 17:22, Yu Zhang wrote:
> 
> 
> On 4/5/2017 10:41 PM, George Dunlap wrote:
>> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
>> wrote:
>>> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
>>> entries need to be reset back to p2m_ram_rw. This patch does this
>>> asynchronously, using the current p2m_change_entry_type_global()
>>> interface.
>>>
>>> New field entry_count is introduced in struct p2m_domain, to record
>>> the number of p2m_ioreq_server p2m page table entries. One property of
>>> these entries is that they only point to 4K-sized page frames, because
>>> all p2m_ioreq_server entries originate from p2m_ram_rw ones in
>>> p2m_change_type_one(). We therefore do not need to worry about counting
>>> 2M/1G-sized pages.
>> Assuming that all p2m_ioreq_server entries are *created* by
>> p2m_change_type_one() may be valid, but can you assume that they are only
>> ever *removed* by p2m_change_type_one() (or recalculation)?
>>
>> What happens, for instance, if a guest balloons out one of the ram
>> pages?  I don't immediately see anything preventing a p2m_ioreq_server
>> page from being ballooned out, nor anything on the
>> decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
>> I miss something?
>>
>> Other than that, only one minor comment...
> 
> Thanks for your thorough consideration, George. But I do not think we
> need to worry about this:
> 
> If the emulation is in process, the balloon driver cannot get a
> p2m_ioreq_server page - because
> it is already allocated.

In theory, yes, the guest *shouldn't* do this.  But what if the guest OS
makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device driver, or
forgets to change a page type back to ram when the device driver frees
it back to the guest kernel?

It's the hypervisor's job to do the right thing even if the guest and
the device model don't.

> And even when emulation is finished and the balloon driver successfully
> gets this page and triggers decrease_reservation, the purpose is to
> remove the current mapping relation between the gfn and mfn in the p2m.
> So IIUC, p2m_remove_page() will be triggered if everything goes fine,
> and then p2m_set_entry(), which will eventually trigger the recalc
> logic, either in ept_set_entry() or p2m_pt_set_entry(). Then the
> entry_count will be updated in the recalc logic.

Yes, once the lazy type change has been made, we can rely on the
recalculation logic to make sure that the types are changed appropriately.

 -George



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/6/2017 12:35 AM, George Dunlap wrote:

On 05/04/17 17:22, Yu Zhang wrote:


On 4/5/2017 10:41 PM, George Dunlap wrote:

On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
wrote:

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...

Thanks for your thorough consideration, George. But I do not think we
need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because
it is already allocated.

In theory, yes, the guest *shouldn't* do this.  But what if the guest OS
makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device driver, or
forgets to change a page type back to ram when the device driver frees
it back to the guest kernel?


Then the lazy p2m change code will be triggered, and this page is reset
to p2m_ram_rw before being set to p2m_invalid, just like the normal
path. Will this be a problem?



It's the hypervisor's job to do the right thing even if the guest and
the device model don't.


And even when emulation is finished and the balloon driver successfully
gets this page and triggers decrease_reservation, the purpose is to
remove the current mapping relation between the gfn and mfn in the p2m.
So IIUC, p2m_remove_page() will be triggered if everything goes fine,
and then p2m_set_entry(), which will eventually trigger the recalc
logic, either in ept_set_entry() or p2m_pt_set_entry(). Then the
entry_count will be updated in the recalc logic.

Yes, once the lazy type change has been made, we can rely on the
recalculation logic to make sure that the types are changed appropriately.


Yep. So my understanding is that as long as p2m_set_entry() is used to
change the p2m, we do not need to worry about this.

B.R.
Yu


  -George


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread George Dunlap
On 05/04/17 17:32, Yu Zhang wrote:
> 
> 
> On 4/6/2017 12:35 AM, George Dunlap wrote:
>> On 05/04/17 17:22, Yu Zhang wrote:
>>>
>>> On 4/5/2017 10:41 PM, George Dunlap wrote:
 On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
 wrote:
> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously, using the current p2m_change_entry_type_global()
> interface.
>
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One property of
> these entries is that they only point to 4K-sized page frames, because
> all p2m_ioreq_server entries originate from p2m_ram_rw ones in
> p2m_change_type_one(). We therefore do not need to worry about counting
> 2M/1G-sized pages.
 Assuming that all p2m_ioreq_server entries are *created* by
 p2m_change_type_one() may be valid, but can you assume that they are only
 ever *removed* by p2m_change_type_one() (or recalculation)?

 What happens, for instance, if a guest balloons out one of the ram
 pages?  I don't immediately see anything preventing a p2m_ioreq_server
 page from being ballooned out, nor anything on the
 decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
 I miss something?

 Other than that, only one minor comment...
>>> Thanks for your thorough consideration, George. But I do not think we
>>> need to worry about this:
>>>
>>> If the emulation is in process, the balloon driver cannot get a
>>> p2m_ioreq_server page - because
>>> it is already allocated.
>> In theory, yes, the guest *shouldn't* do this.  But what if the guest OS
>> makes a mistake?  Or, what if the ioreq server makes a mistake and
>> places a watch on a page that *isn't* allocated by the device driver, or
>> forgets to change a page type back to ram when the device driver frees
>> it back to the guest kernel?
> 
> Then the lazy p2m change code will be triggered, and this page is reset
> to p2m_ram_rw
> before being set to p2m_invalid, just like the normal path. Will this be
> a problem?

No, I'm talking about before the ioreq server detaches.

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver accidentally frees A to the kernel
4. guest kernel balloons out page A; ioreq.entry_count is wrong

Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one extra
page)
3. guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out; which under the current
code would make ioreq.entry_count wrong.

It's the hypervisor's job to do the right thing even when other
components have bugs in them.  This is why I initially suggested keeping
count in atomic_write_ept_entry() -- no matter how the entry is changed,
we always know exactly how many entries of type p2m_ioreq_server we have.
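
(Editorial sketch of the approach George describes: centralise the
accounting at the leaf entry write, so every path that changes an entry
is covered. The helper name is hypothetical, and - as Yu notes below -
atomic_write_ept_entry() does not currently have a p2m parameter to
make this possible:)

    /* Adjust the count whenever a leaf entry changes type, regardless
     * of which caller performs the write (sketch only). */
    static void ioreq_entry_count_adjust(struct p2m_domain *p2m,
                                         p2m_type_t ot, p2m_type_t nt)
    {
        if ( ot == p2m_ioreq_server )
            p2m->ioreq.entry_count--;
        if ( nt == p2m_ioreq_server )
            p2m->ioreq.entry_count++;
    }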

 -George



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/6/2017 1:01 AM, George Dunlap wrote:

On 05/04/17 17:32, Yu Zhang wrote:


On 4/6/2017 12:35 AM, George Dunlap wrote:

On 05/04/17 17:22, Yu Zhang wrote:

On 4/5/2017 10:41 PM, George Dunlap wrote:

On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
wrote:

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...

Thanks for your thorough consideration, George. But I do not think we
need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because
it is already allocated.

In theory, yes, the guest *shouldn't* do this.  But what if the guest OS
makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device driver, or
forgets to change a page type back to ram when the device driver frees
it back to the guest kernel?

Then the lazy p2m change code will be triggered, and this page is reset
to p2m_ram_rw
before being set to p2m_invalid, just like the normal path. Will this be
a problem?

No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server

Here in step 2. the ioreq.entry_count increases;

3. Guest driver accidentally frees A to the kernel
4. guest kernel balloons out page A; ioreq.entry_count is wrong


Here in step 4. the ioreq.entry_count decreases.
Isn't this what we are expecting?

Yu


Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one extra
page)
3. guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out; which under the current
code would make ioreq.entry_count wrong.

It's the hypervisor's job to do the right thing even when other
components have bugs in them.  This is why I initially suggested keeping
count in atomic_write_ept_entry() -- no matter how the entry is changed,
we always know exactly how many entries of type p2m_ioreq_server we have.

  -George


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread George Dunlap
On 05/04/17 18:18, Yu Zhang wrote:
> 
> 
> On 4/6/2017 1:01 AM, George Dunlap wrote:
>> On 05/04/17 17:32, Yu Zhang wrote:
>>>
>>> On 4/6/2017 12:35 AM, George Dunlap wrote:
 On 05/04/17 17:22, Yu Zhang wrote:
> On 4/5/2017 10:41 PM, George Dunlap wrote:
>> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
>> wrote:
>>> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
>>> entries need to be reset back to p2m_ram_rw. This patch does this
>>> asynchronously, using the current p2m_change_entry_type_global()
>>> interface.
>>>
>>> New field entry_count is introduced in struct p2m_domain, to record
>>> the number of p2m_ioreq_server p2m page table entries. One property of
>>> these entries is that they only point to 4K-sized page frames, because
>>> all p2m_ioreq_server entries originate from p2m_ram_rw ones in
>>> p2m_change_type_one(). We therefore do not need to worry about counting
>>> 2M/1G-sized pages.
>> Assuming that all p2m_ioreq_server entries are *created* by
>> p2m_change_type_one() may be valid, but can you assume that they are only
>> ever *removed* by p2m_change_type_one() (or recalculation)?
>>
>> What happens, for instance, if a guest balloons out one of the ram
>> pages?  I don't immediately see anything preventing a p2m_ioreq_server
>> page from being ballooned out, nor anything on the
>> decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
>> I miss something?
>>
>> Other than that, only one minor comment...
> Thanks for your thorough consideration, George. But I do not think we
> need to worry about this:
>
> If the emulation is in process, the balloon driver cannot get a
> p2m_ioreq_server page - because
> it is already allocated.
 In theory, yes, the guest *shouldn't* do this.  But what if the
 guest OS
 makes a mistake?  Or, what if the ioreq server makes a mistake and
 places a watch on a page that *isn't* allocated by the device
 driver, or
 forgets to change a page type back to ram when the device driver frees
 it back to the guest kernel?
>>> Then the lazy p2m change code will be triggered, and this page is reset
>>> to p2m_ram_rw
>>> before being set to p2m_invalid, just like the normal path. Will this be
>>> a problem?
>> No, I'm talking about before the ioreq server detaches.
> Sorry, I do not get it. Take scenario 1 for example:
>> Scenario 1: Bug in driver
>> 1. Guest driver allocates page A
>> 2. dm marks A as p2m_ioreq_server
> Here in step 2. the ioreq.entry_count increases;
>> 3. Guest driver accidentally frees A to the kernel
>> 4. guest kernel balloons out page A; ioreq.entry_count is wrong
> 
> Here in step 4. the ioreq.entry_count decreases.
> Isn't this what we are expecting?

So step 4 happens via xen/common/memory.c:decrease_reservation().  Where
along that path will ioreq.entry_count be decreased for the page freed?

Maybe it is, but I didn't see it when I looked.  As far as I can tell,
after step 4, ioreq.entry_count would still be 1, but the actual
number of p2m_ioreq_server entries would be 0.

 -George



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/6/2017 1:18 AM, Yu Zhang wrote:



On 4/6/2017 1:01 AM, George Dunlap wrote:

On 05/04/17 17:32, Yu Zhang wrote:


On 4/6/2017 12:35 AM, George Dunlap wrote:

On 05/04/17 17:22, Yu Zhang wrote:

On 4/5/2017 10:41 PM, George Dunlap wrote:
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 


wrote:

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...

Thanks for your thorough consideration, George. But I do not think we
need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because
it is already allocated.
In theory, yes, the guest *shouldn't* do this.  But what if the 
guest OS

makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device 
driver, or

forgets to change a page type back to ram when the device driver frees
it back to the guest kernel?

Then the lazy p2m change code will be triggered, and this page is reset
to p2m_ram_rw before being set to p2m_invalid, just like the normal
path. Will this be a problem?
a problem?

No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server

Here in step 2. the ioreq.entry_count increases;

3. Guest driver accidentally frees A to the kernel
4. guest kernel balloons out page A; ioreq.entry_count is wrong


Here in step 4. the ioreq.entry_count decreases.


Oh, I figured it out. This entry is not invalidated yet if the ioreq
server is not unmapped. Sorry.



Isn't this what we are expecting?

Yu


Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one extra
page)
3. guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out; which under the current
code would make ioreq.entry_count wrong.

It's the hypervisor's job to do the right thing even when other
components have bugs in them.  This is why I initially suggested keeping
count in atomic_write_ept_entry() -- no matter how the entry is changed,
we always know exactly how many entries of type p2m_ioreq_server we 
have.




Well, count in atomic_write_ept_entry() only works for ept. Besides, it
requires interface changes - need to pass the p2m.
Another thought is - now in XenGT, PoD is disabled to make sure gfn->mfn
does not change. So how about we disable ballooning if ioreq.entry_count
is not 0?


Yu

  -George


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/6/2017 1:28 AM, Yu Zhang wrote:



On 4/6/2017 1:18 AM, Yu Zhang wrote:



On 4/6/2017 1:01 AM, George Dunlap wrote:

On 05/04/17 17:32, Yu Zhang wrote:


On 4/6/2017 12:35 AM, George Dunlap wrote:

On 05/04/17 17:22, Yu Zhang wrote:

On 4/5/2017 10:41 PM, George Dunlap wrote:
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 


wrote:

After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...
Thanks for your thorough consideration, George. But I do not 
think we

need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because
it is already allocated.
In theory, yes, the guest *shouldn't* do this.  But what if the 
guest OS

makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device 
driver, or
forgets to change a page type back to ram when the device driver 
frees

it back to the guest kernel?
Then the lazy p2m change code will be triggered, and this page is 
reset

to p2m_ram_rw
before being set to p2m_invalid, just like the normal path. Will 
this be

a problem?

No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server

Here in step 2. the ioreq.entry_count increases;

3. Guest driver accidentally frees A to the kernel
4. guest kernel balloons out page A; ioreq.entry_count is wrong


Here in step 4. the ioreq.entry_count decreases.


Oh. I figured out. This entry is not invalidated yet if ioreq is not 
unmapped. Sorry.



Isn't this what we are expecting?

Yu


Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one 
extra

page)
3. guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out; which under the current
code would make ioreq.entry_count wrong.

It's the hypervisor's job to do the right thing even when other
components have bugs in them.  This is why I initially suggested 
keeping
count in atomic_write_ept_entry() -- no matter how the entry is 
changed,
we always know exactly how many entries of type p2m_ioreq_server we 
have.




Well, count in atomic_write_ept_entry() only works for ept. Besides, it
requires interface changes - need to pass the p2m.
Another thought is - now in XenGT, PoD is disabled to make sure gfn->mfn
does not change. So how about we disable ballooning if ioreq.entry_count
is not 0?


Or maybe just change p2m_ioreq_server to p2m_ram_rw before the entry is
set to p2m_invalid? Like the code below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
 
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif


Yu



Yu

  -George


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-05 Thread Yu Zhang



On 4/6/2017 2:02 AM, Yu Zhang wrote:



On 4/6/2017 1:28 AM, Yu Zhang wrote:



On 4/6/2017 1:18 AM, Yu Zhang wrote:



On 4/6/2017 1:01 AM, George Dunlap wrote:

On 05/04/17 17:32, Yu Zhang wrote:


On 4/6/2017 12:35 AM, George Dunlap wrote:

On 05/04/17 17:22, Yu Zhang wrote:

On 4/5/2017 10:41 PM, George Dunlap wrote:
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 


wrote:
After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously, using the current p2m_change_entry_type_global()
interface.

New field entry_count is introduced in struct p2m_domain, to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). We therefore do not need to worry about counting
2M/1G-sized pages.

Assuming that all p2m_ioreq_server entries are *created* by
p2m_change_type_one() may be valid, but can you assume that they are only
ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram
pages?  I don't immediately see anything preventing a p2m_ioreq_server
page from being ballooned out, nor anything on the
decrease_reservation() path decreasing p2m->ioreq.entry_count.  Or did
I miss something?

Other than that, only one minor comment...
Thanks for your thorough consideration, George. But I do not 
think we

need to worry about this:

If the emulation is in process, the balloon driver cannot get a
p2m_ioreq_server page - because
it is already allocated.
In theory, yes, the guest *shouldn't* do this.  But what if the 
guest OS

makes a mistake?  Or, what if the ioreq server makes a mistake and
places a watch on a page that *isn't* allocated by the device 
driver, or
forgets to change a page type back to ram when the device driver 
frees

it back to the guest kernel?
Then the lazy p2m change code will be triggered, and this page is 
reset

to p2m_ram_rw
before being set to p2m_invalid, just like the normal path. Will 
this be

a problem?

No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server

Here in step 2. the ioreq.entry_count increases;

3. Guest driver accidentally frees A to the kernel
4. guest kernel balloons out page A; ioreq.entry_count is wrong


Here in step 4. the ioreq.entry_count decreases.


Oh. I figured out. This entry is not invalidated yet if ioreq is not 
unmapped. Sorry.



Isn't this what we are expecting?

Yu


Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation
mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one 
extra

page)
3. guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to
p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going on; there are *lots* of bugs in the driver, the
kernel, or the devicemodel which could cause pages marked
p2m_ioreq_server to end up being ballooned out; which under the 
current

code would make ioreq.entry_count wrong.

It's the hypervisor's job to do the right thing even when other
components have bugs in them.  This is why I initially suggested 
keeping
count in atomic_write_ept_entry() -- no matter how the entry is 
changed,
we always know exactly how many entries of type p2m_ioreq_server we 
have.




Well, count in atomic_write_ept_entry() only works for ept. Besides, 
it requires

interface changes - need to pass the p2m.
Another thought is - now in XenGT, PoD is disabled to make sure 
gfn->mfn does
not change. So how about we disable ballooning if ioreq.entry_count 
is not 0?


Or maybe just change p2m_ioreq_server to p2m_ram_rw before the entry is
set to p2m_invalid? Like the code below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
 
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif



Sorry for the format. Please ignore the code above, and see the one below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a

Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-06 Thread Jan Beulich
>>> On 05.04.17 at 19:28,  wrote:
> Well, count in atomic_write_ept_entry() only works for ept. Besides, it
> requires interface changes - need to pass the p2m.
> Another thought is - now in XenGT, PoD is disabled to make sure gfn->mfn
> does not change. So how about we disable ballooning if ioreq.entry_count
> is not 0?

You can't disable ballooning - the guest is free to make decisions
regarding what memory to return to the hypervisor all by itself.
Tool stack directions (via xenstore) are merely a hint for well
behaved guests.

Jan



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-06 Thread Jan Beulich
>>> On 05.04.17 at 20:04,  wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>          put_gfn(d, gmfn);
>          return 1;
>      }
> +    if ( unlikely(p2mt == p2m_ioreq_server) )
> +        p2m_change_type_one(d, gmfn,
> +                            p2m_ioreq_server, p2m_ram_rw);
> +
>  #else
>      mfn = gfn_to_mfn(d, _gfn(gmfn));
>  #endif

To be honest, at first glance this looks more like a quick hack than a
proper solution. To me it would seem preferable if the count were
adjusted at the point the P2M entry is being replaced (i.e. down the
call stack from guest_physmap_remove_page()). The closer to the
actual changing of the P2M entry, the less likely you are to miss any
call paths (as George already explained while suggesting to put
the accounting into the leaf-most EPT/NPT functions).
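
(Editorial sketch of the direction Jan describes - do the accounting on
the entry-replacement path below guest_physmap_remove_page(), e.g. in
the loop of p2m_remove_page(); the exact placement and the surrounding
variables are hypothetical:)

    /* Before tearing down the mapping, account for the type that is
     * about to disappear (sketch only). */
    mfn = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL, NULL);
    if ( t == p2m_ioreq_server )
        p2m->ioreq.entry_count--;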

Jan



Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-06 Thread Yu Zhang



On 4/6/2017 3:48 PM, Jan Beulich wrote:

On 05.04.17 at 20:04,  wrote:

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

To be honest, at first glance this looks more like a quick hack than a
proper solution. To me it would seem preferable if the count were


Yeah, right. :)


adjusted at the point the P2M entry is being replaced (i.e. down the
call stack from guest_physmap_remove_page()). The closer to the
actual changing of the P2M entry, the less likely you are to miss any
call paths (as George already explained while suggesting to put
the accounting into the leaf-most EPT/NPT functions).


Well, I thought I had explained the reason why I have always been
hesitant to do the count in atomic_write_ept_entry(). But it seems I did
not make it clear:

1> atomic_write_ept_entry() is used each time a p2m entry is written,
but sweeping p2m_ioreq_server is only supposed to happen when an ioreq
server unmaps. Checking the p2m type here impacts performance, and I do
not think it is worth it - consider that there may be very limited usage
of p2m_ioreq_server;

2> atomic_write_ept_entry() does not have a p2m parameter, and some of
its callers do not have one either;

3> the lazy p2m type change is triggered at the p2m level, by
p2m_change_entry_type_global(), which is not used only for Intel EPT;
and supporting the count at the lowest level on non-EPT platforms is
more complex than the changes in atomic_write_ept_entry();

4> fortunately, we have both resolve_misconfig() and do_recalc(), so I
thought it would be enough to do the count in these two routines,
together with the count in p2m_change_type_one().

But I have to admit I did not think of the extreme scenarios raised by
George - I had always assumed a p2m_ioreq_server page would not be
allocated to the balloon driver while it is in use.

So here is another proposal - we shall not allow a p2m_ioreq_server page
to be ballooned out. I mean, if some bug in the kernel really allocates
a p2m_ioreq_server page to a balloon driver, or if the driver is a
malicious one which does not tell the device model that this gfn shall
no longer be emulated, the hypervisor shall let the ballooning fail for
this gfn. After all, if such a situation happens, the guest or the
device model already has bugs, and these last 2 patches are to make sure
that even if there is a bug in the guest/device model, Xen will help do
the cleanup, instead of tolerating guest bugs.


If you think this is reasonable, I have drafted a patch, like this:

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index d5fea72..ff726ad 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -606,7 +606,8 @@ p2m_pod_decrease_reservation(struct domain *d,
             BUG_ON(p2m->pod.entry_count < 0);
             pod -= n;
         }
-        else if ( steal_for_cache && p2m_is_ram(t) )
+        else if ( steal_for_cache && p2m_is_ram(t) &&
+                  (t != p2m_ioreq_server) )
         {
             /*
              * If we need less than 1 << cur_order, we may end up stealing
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40d5545 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,14 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
 
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+    {
+        put_gfn(d, gmfn);
+        gdprintk(XENLOG_INFO, "Domain %u page %lx cannot be removed.\n",
+                 d->domain_id, gmfn);
+        return 0;
+    }
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

The change in p2m_pod_decrease_reservation() is to not let the balloon
driver steal a p2m_ioreq_server page for the PoD cache; the change in
guest_remove_page() is to disallow removing a p2m_ioreq_server page.



Thanks
Yu


Jan


Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.

2017-04-06 Thread Jan Beulich
>>> On 06.04.17 at 10:27,  wrote:
> On 4/6/2017 3:48 PM, Jan Beulich wrote:
> On 05.04.17 at 20:04,  wrote:
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>>>          put_gfn(d, gmfn);
>>>          return 1;
>>>      }
>>> +    if ( unlikely(p2mt == p2m_ioreq_server) )
>>> +        p2m_change_type_one(d, gmfn,
>>> +                            p2m_ioreq_server, p2m_ram_rw);
>>> +
>>>  #else
>>>      mfn = gfn_to_mfn(d, _gfn(gmfn));
>>>  #endif
>> To be honest, at first glance this looks more like a quick hack than a
>> proper solution. To me it would seem preferable if the count were
> 
> Yeah, right. :)
> 
>> adjusted at the point the P2M entry is being replaced (i.e. down the
>> call stack from guest_physmap_remove_page()). The closer to the
>> actual changing of the P2M entry, the less likely you are to miss any
>> call paths (as George already explained while suggesting to put
>> the accounting into the leaf-most EPT/NPT functions).
> 
> Well, I thought I had explained the reason why I have always been
> hesitant to do the count in
> atomic_write_ept_entry(). But it seems I did not make it clear:

Well, there was no reason to re-explain. I understand your
reasoning, and I understand George's. Hence my request to move
it _closer_ to the leaf function, not specifically to move it _into_
that function.

> But I have to admit I did not think of the extreme scenarios raised by
> George - I had always assumed a p2m_ioreq_server page would not be
> allocated to the balloon driver while it is in use.
> 
> So here is another proposal - we shall not allow a p2m_ioreq_server page
> to be ballooned out. I mean, if some bug in the kernel really allocates
> a p2m_ioreq_server page to a balloon driver, or if the driver is a
> malicious one which does not tell the device model that this gfn shall
> no longer be emulated, the hypervisor shall let the ballooning fail for
> this gfn. After all, if such a situation happens, the guest or the
> device model already has bugs, and these last 2 patches are to make sure
> that even if there is a bug in the guest/device model, Xen will help do
> the cleanup, instead of tolerating guest bugs.
> 
> If you think this is reasonable, I have drafted a patch, like this:

Well, that's still the same hack as before (merely extended to
the PoD code), isn't it? I'd still prefer the accounting to be got right
instead of failing the page-removal attempt.

Jan

