Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-30 Thread Jan Beulich
>>> On 30.03.16 at 04:28,  wrote:
> On March 29, 2016 3:21pm,  wrote:
>> >>> On 28.03.16 at 05:33,  wrote:
>> > On March 18, 2016 1:15am,  wrote:
>> >> >>> On 17.03.16 at 07:54,  wrote:
>> >> > --- a/xen/common/grant_table.c
>> >> > +++ b/xen/common/grant_table.c
>> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>> >> >  {
>> >> >  nr_gets++;
>> >> >  (void)get_page(pg, rd);
>> >> > -if ( !(op->flags & GNTMAP_readonly) )
>> >> > -get_page_type(pg, PGT_writable_page);
>> >> > +if ( !(op->flags & GNTMAP_readonly) &&
>> >> > + !get_page_type(pg, PGT_writable_page) )
>> >> > +goto could_not_pin;
>> >>
>> >> This needs explanation, as it doesn't look related to what your
>> >> actual goal is: If an error was possible here, I think this would be
>> >> a security issue. However, as also kind of documented by the
>> >> explicitly ignored return value from get_page(), it is my understanding 
>> >> there
>> here we only obtain an _extra_ reference.
>> >>
>> >
>> > For this point, I inferred from:
>> > map_vcpu_info()
>> > {
>> > ...
>> > if ( !get_page_type(page, PGT_writable_page) )
>> > {
>> > put_page(page);
>> > return -EINVAL;
>> > }
>> > ...
>> > }
>> > , then for get_page_type(), I think the return value:
>> >  0 -- error,
>> >  1-- right.
>> >
>> > So if get_page_type() is failed, we should goto could_not_pin.
>> 
>> Did you read my reply at all? The explanation I'm expecting here is why 
> error
>> checking is all of the sudden needed _at all_.
>> 
> 
> Sorry for my stupid reply.
> As in this version, before the open discussion, I try to return the 
> iommu_{,un}map_page() error in this call tree:
>iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> then, in this point, I try to deal with this iommu_{,un}map_page() error.

I still don't get it: We're talking about a get_page_type() invocation
that previously was known to never fail (or at least so we hope,
based on the existing code). What I'm expecting as an explanation
is why this "cannot fail" state is not true any longer. And while
sorting this out, please pay particular attention to the limited set of
cases where __get_page_type() calls iommu_{,un}map_page() in
the first place.

>> > btw, there is another issue in the call path:
>> > iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
>> >
>> >
>> > I tried to return iommu_{,un}map_page() error code in
>> > __get_page_type(), is it right?
>> 
>> If the operation got fully rolled back - yes. Whether fully rolling back is 
>> feasible
>> there though is - see the respective discussion - an open question.
>> 
> 
> For the open question, does it refer to as below:

Partly.

> """
> As said, we first need
> to settle on an abstract model. Do we want IOMMU mapping
> failures to be fatal to the domain (perhaps with the exception
> of the hardware one)? I think we do, and for the hardware domain
> we'd do things on a best effort basis (always erring on the side
> of unmapping). Which would probably mean crashing the domain
> could be centralized in iommu_{,un}map_page(). How much roll
> back would then still be needed in callers of these functions
> for the hardware domain's sake would need to be seen.
> """
> 
> I hope it is yes.

It is not clear to me what part of the above this is meant to refer to.
Perhaps this is meant to answer the question in the 2nd sentence,
but I think this really ought to take a little more than "yes".

>> >> > --- a/xen/drivers/passthrough/x86/iommu.c
>> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
>> >> domain *d)
>> >> >  this_cpu(iommu_dont_flush_iotlb) = 0;
>> >> >
>> >> >  if ( !rc )
>> >> > -iommu_iotlb_flush_all(d);
>> >> > +{
>> >> > +rc = iommu_iotlb_flush_all(d);
>> >> > +if ( rc )
>> >> > +iommu_teardown(d);
>> >> > +}
>> >> >  else if ( rc != -ERESTART )
>> >> >  iommu_teardown(d);
>> >>
>> >> Why can't you just use the existing call to iommu_teardown(), by
>> >> simply
>> > deleting
>> >> the "else"?
>> >>
>> >
>> > Just check it, could I modify it as below:
>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain
>> > *d)
>> >
>> >  if ( !rc )
>> >  iommu_iotlb_flush_all(d);
>> > -else if ( rc != -ERESTART )
>> > +
>> > +if ( rc != -ERESTART )
>> >  iommu_teardown(d);
>> 
>> Clearly not - not only are you losing the return value of
>> iommu_iotlb_flush_all() now, you would then also call
>> iommu_teardown() in the "success" case. My comment was related to code
>> structure, yet you 

Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-29 Thread Xu, Quan
On March 30, 2016 10:28am, Xu, Quan  wrote:
> If this is still the correct one, could you help me send out the correct one?
> 
Sorry, a typo:
If this is still _not_ the correct one, could you help me send out the correct 
one?

Quan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-29 Thread Xu, Quan
On March 29, 2016 3:21pm,  wrote:
> >>> On 28.03.16 at 05:33,  wrote:
> > On March 18, 2016 1:15am,  wrote:
> >> >>> On 17.03.16 at 07:54,  wrote:
> >> > --- a/xen/common/grant_table.c
> >> > +++ b/xen/common/grant_table.c
> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
> >> >  {
> >> >  nr_gets++;
> >> >  (void)get_page(pg, rd);
> >> > -if ( !(op->flags & GNTMAP_readonly) )
> >> > -get_page_type(pg, PGT_writable_page);
> >> > +if ( !(op->flags & GNTMAP_readonly) &&
> >> > + !get_page_type(pg, PGT_writable_page) )
> >> > +goto could_not_pin;
> >>
> >> This needs explanation, as it doesn't look related to what your
> >> actual goal is: If an error was possible here, I think this would be
> >> a security issue. However, as also kind of documented by the
> >> explicitly ignored return value from get_page(), it is my understanding 
> >> there
> here we only obtain an _extra_ reference.
> >>
> >
> > For this point, I inferred from:
> > map_vcpu_info()
> > {
> > ...
> > if ( !get_page_type(page, PGT_writable_page) )
> > {
> > put_page(page);
> > return -EINVAL;
> > }
> > ...
> > }
> > , then for get_page_type(), I think the return value:
> >  0 -- error,
> >  1-- right.
> >
> > So if get_page_type() is failed, we should goto could_not_pin.
> 
> Did you read my reply at all? The explanation I'm expecting here is why error
> checking is all of the sudden needed _at all_.
> 

Sorry for my stupid reply.
As in this version, before the open discussion, I try to return the 
iommu_{,un}map_page() error in this call tree:
   iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
then, in this point, I try to deal with this iommu_{,un}map_page() error.

> > btw, there is another issue in the call path:
> > iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> >
> >
> > I tried to return iommu_{,un}map_page() error code in
> > __get_page_type(), is it right?
> 
> If the operation got fully rolled back - yes. Whether fully rolling back is 
> feasible
> there though is - see the respective discussion - an open question.
> 

For the open question, does it refer to as below:

"""
As said, we first need
to settle on an abstract model. Do we want IOMMU mapping
failures to be fatal to the domain (perhaps with the exception
of the hardware one)? I think we do, and for the hardware domain
we'd do things on a best effort basis (always erring on the side
of unmapping). Which would probably mean crashing the domain
could be centralized in iommu_{,un}map_page(). How much roll
back would then still be needed in callers of these functions
for the hardware domain's sake would need to be seen.
"""

I hope it is yes. I read all of your emails again and again, I found I did get 
the point until this Monday.
I am summarizing it and would send out in a new thread.


> >> > --- a/xen/drivers/passthrough/x86/iommu.c
> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
> >> domain *d)
> >> >  this_cpu(iommu_dont_flush_iotlb) = 0;
> >> >
> >> >  if ( !rc )
> >> > -iommu_iotlb_flush_all(d);
> >> > +{
> >> > +rc = iommu_iotlb_flush_all(d);
> >> > +if ( rc )
> >> > +iommu_teardown(d);
> >> > +}
> >> >  else if ( rc != -ERESTART )
> >> >  iommu_teardown(d);
> >>
> >> Why can't you just use the existing call to iommu_teardown(), by
> >> simply
> > deleting
> >> the "else"?
> >>
> >
> > Just check it, could I modify it as below:
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain
> > *d)
> >
> >  if ( !rc )
> >  iommu_iotlb_flush_all(d);
> > -else if ( rc != -ERESTART )
> > +
> > +if ( rc != -ERESTART )
> >  iommu_teardown(d);
> 
> Clearly not - not only are you losing the return value of
> iommu_iotlb_flush_all() now, you would then also call
> iommu_teardown() in the "success" case. My comment was related to code
> structure, yet you seem to have taken it literally.
> 

Then, what about this one:
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -104,8 +104,9 @@ int arch_iommu_populate_page_table(struct domain *d)
 this_cpu(iommu_dont_flush_iotlb) = 0;

 if ( !rc )
-iommu_iotlb_flush_all(d);
-else if ( rc != -ERESTART )
+rc = iommu_iotlb_flush_all(d);
+
+if ( !rc && rc != -ERESTART )
 iommu_teardown(d);


IMO, my original modification is correct and redundant with 2 
'iommu_teardown()'..
If this is still the correct one, could you help me send out the correct one?

Quan

___

Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-29 Thread Jan Beulich
>>> On 28.03.16 at 05:33,  wrote:
> On March 18, 2016 1:15am,  wrote:
>> >>> On 17.03.16 at 07:54,  wrote:
>> > --- a/xen/common/grant_table.c
>> > +++ b/xen/common/grant_table.c
>> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>> >  {
>> >  nr_gets++;
>> >  (void)get_page(pg, rd);
>> > -if ( !(op->flags & GNTMAP_readonly) )
>> > -get_page_type(pg, PGT_writable_page);
>> > +if ( !(op->flags & GNTMAP_readonly) &&
>> > + !get_page_type(pg, PGT_writable_page) )
>> > +goto could_not_pin;
>> 
>> This needs explanation, as it doesn't look related to what your actual goal 
>> is: If
>> an error was possible here, I think this would be a security issue. However, 
>> as
>> also kind of documented by the explicitly ignored return value from 
>> get_page(),
>> it is my understanding there here we only obtain an _extra_ reference.
>> 
> 
> For this point, I inferred from:
> map_vcpu_info()
> {
> ...
> if ( !get_page_type(page, PGT_writable_page) )
> {
> put_page(page);
> return -EINVAL;
> }
> ...
> }
> , then for get_page_type(), I think the return value:
>  0 -- error, 
>  1-- right.
> 
> So if get_page_type() is failed, we should goto could_not_pin.

Did you read my reply at all? The explanation I'm expecting here is
why error checking is all of the sudden needed _at all_.

> btw, there is another issue in the call path:
> iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> 
> 
> I tried to return iommu_{,un}map_page() error code in __get_page_type(), is 
> it right?

If the operation got fully rolled back - yes. Whether fully rolling back
is feasible there though is - see the respective discussion - an open
question.

>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
>> domain *d)
>> >  this_cpu(iommu_dont_flush_iotlb) = 0;
>> >
>> >  if ( !rc )
>> > -iommu_iotlb_flush_all(d);
>> > +{
>> > +rc = iommu_iotlb_flush_all(d);
>> > +if ( rc )
>> > +iommu_teardown(d);
>> > +}
>> >  else if ( rc != -ERESTART )
>> >  iommu_teardown(d);
>> 
>> Why can't you just use the existing call to iommu_teardown(), by simply 
> deleting
>> the "else"?
>> 
> 
> Just check it, could I modify it as below:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain *d)
> 
>  if ( !rc )
>  iommu_iotlb_flush_all(d);
> -else if ( rc != -ERESTART )
> +
> +if ( rc != -ERESTART )
>  iommu_teardown(d);

Clearly not - not only are you losing the return value of
iommu_iotlb_flush_all() now, you would then also call
iommu_teardown() in the "success" case. My comment was
related to code structure, yet you seem to have taken it
literally.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-27 Thread Xu, Quan
On March 18, 2016 1:15am,  wrote:
> >>> On 17.03.16 at 07:54,  wrote:
> > @@ -53,11 +55,21 @@ static int device_power_down(void)
> >
> >  ioapic_suspend();
> >
> > -iommu_suspend();
> > +err = iommu_suspend();
> > +if ( err )
> > +goto iommu_suspend_error;
> >
> >  lapic_suspend();
> >
> >  return 0;
> > +
> > +iommu_suspend_error:
> 
> Labels indented by at least one space please.
> 

Good, I wasn't aware of it.


> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -830,7 +830,15 @@ out:
> >  {
> >  if ( iommu_flags )
> >  for ( i = 0; i < (1 << order); i++ )
> > -iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> > +{
> > +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> > +if ( rc )
> > +{
> > +while ( i-- > 0 )
> > +iommu_unmap_page(d, gfn + i);
> > +break;
> > +}
> > +}
> >  else
> >  for ( i = 0; i < (1 << order); i++ )
> >  iommu_unmap_page(d, gfn + i);
> 
> Earlier on in the PV mm code you also checked iommu_unmap_page()'s return
> code - why not here (and also in p2m-pt.c)?
> 
> Also I'm quite unhappy about the inconsistent state you leave things
> in: You unmap from the IOMMU, return an error, but leave the EPT entry in
> place.
> 

As I mentioned for the abstract model,
 For iommu_{,un}map_page(), we'd better fix it as a normal error, as the 
error is not only from iommu flush, .e.g, '-ENOMEM'.
 So, we need to {,un}map from the IOMMU, return an error, and roll back the 
failed operation. 

For iommu_unmap_page, it is still under discussion in another thread. I think 
we could hold it on, waiting for another discussion.


> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
> >  {
> >  nr_gets++;
> >  (void)get_page(pg, rd);
> > -if ( !(op->flags & GNTMAP_readonly) )
> > -get_page_type(pg, PGT_writable_page);
> > +if ( !(op->flags & GNTMAP_readonly) &&
> > + !get_page_type(pg, PGT_writable_page) )
> > +goto could_not_pin;
> 
> This needs explanation, as it doesn't look related to what your actual goal 
> is: If
> an error was possible here, I think this would be a security issue. However, 
> as
> also kind of documented by the explicitly ignored return value from 
> get_page(),
> it is my understanding there here we only obtain an _extra_ reference.
> 

For this point, I inferred from:
map_vcpu_info()
{
...
if ( !get_page_type(page, PGT_writable_page) )
{
put_page(page);
return -EINVAL;
}
...
}
, then for get_page_type(), I think the return value:
 0 -- error, 
 1-- right.

So if get_page_type() is failed, we should goto could_not_pin.

btw, there is another issue in the call path:
iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---


I tried to return iommu_{,un}map_page() error code in __get_page_type(), is it 
right?

> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain
> *d,
> >  if ( need_iommu(d) )
> >  {
> >  this_cpu(iommu_dont_flush_iotlb) = 0;
> > -iommu_iotlb_flush(d, xatp->idx - done, done);
> > -iommu_iotlb_flush(d, xatp->gpfn - done, done);
> > +rc = iommu_iotlb_flush(d, xatp->idx - done, done);
> > +if ( !rc )
> > +rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
> >  }
> 
> And the pattern repeats - you now return an error, but you don't roll back the
> now failed operation. But wait - maybe that intended:
> Are you meaning to crash the guest in such cases (somewhere deep in the flush
> code)? If so, I think that's fine, but you absolutely would need to say so in 
> the
> commit message.
> 

Yes, I should enhance the commit message.

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
> domain *d)
> >  this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> >  if ( !rc )
> > -iommu_iotlb_flush_all(d);
> > +{
> > +rc = iommu_iotlb_flush_all(d);
> > +if ( rc )
> > +iommu_teardown(d);
> > +}
> >  else if ( rc != -ERESTART )
> >  iommu_teardown(d);
> 
> Why can't you just use the existing call to iommu_teardown(), by simply 
> deleting
> the "else"?
> 

Just check it, could I modify it as below:
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -105,7 +105,8 @@ int 

Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-24 Thread Jan Beulich
>>> On 24.03.16 at 15:12,  wrote:
> On March 24, 2016 5:59pm,  wrote:
>> >>> On 24.03.16 at 10:02,  wrote:
>> > On March 18, 2016 5:49pm,  wrote:
> 
> 
>> >3. For iommu_{,un}map_page(), we'd better fix it as a normal error,
>> > as the error is not only from iommu flush, .e.g, '-ENOMEM'.
>> >  So, we need to {,un}map from the IOMMU, return an error, and roll
>> > back the failed operation( .e.g, unmap EPT).
>> 
>> Well, if that possible in a provably correct way, then sure. But be clear - 
>> when
>> the failure occurs while unmapping, unmapping the EPT entry obviously can't 
>> be
>> the solution, 
> 
> I hope we discuss about the same point as bellow?:
>ept_set_entry()
>{
> 
>   if ( iommu_flags )
> for ( i = 0; i < (1 << order); i++ )
> iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> else
> for ( i = 0; i < (1 << order); i++ )
> iommu_unmap_page(d, gfn + i);
> 
>}
> 
> 
>> you'd need a true roll back.
> 
> Does it refer to as:
> If the old entry is present, we need to write back the old entry? 

Yes, but that's not as simple as it sounds: You also need to take
care of the splitting of super pages which may have occurred.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-24 Thread Xu, Quan
On March 24, 2016 5:59pm,  wrote:
> >>> On 24.03.16 at 10:02,  wrote:
> > On March 18, 2016 5:49pm,  wrote:


> >3. For iommu_{,un}map_page(), we'd better fix it as a normal error,
> > as the error is not only from iommu flush, .e.g, '-ENOMEM'.
> >  So, we need to {,un}map from the IOMMU, return an error, and roll
> > back the failed operation( .e.g, unmap EPT).
> 
> Well, if that possible in a provably correct way, then sure. But be clear - 
> when
> the failure occurs while unmapping, unmapping the EPT entry obviously can't be
> the solution, 

I hope we discuss about the same point as bellow?:
   ept_set_entry()
   {

  if ( iommu_flags )
for ( i = 0; i < (1 << order); i++ )
iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
else
for ( i = 0; i < (1 << order); i++ )
iommu_unmap_page(d, gfn + i);

   }


> you'd need a true roll back.

Does it refer to as:
If the old entry is present, we need to write back the old entry? 

> And of course you should keep in mind
> what happens to the guest if such an operation fails: If you can be certain 
> it'll
> crash because of this later on anyway, you're likely better off crashing it 
> right
> away (such that the reason for the crash is at least obvious).
> 

I think, for iommu_{,un}map_page(), it would be not aware that the domain is 
going to crash.
As mentioned,  the error is not only from iommu flush.
We need to fix it one by one. fortunately, it is limited.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-24 Thread Jan Beulich
>>> On 24.03.16 at 10:02,  wrote:
> On March 18, 2016 5:49pm,  wrote:
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't just 
>> extend
>> that intended model to also cover those other error reasons without at least
>> having given people a chance to object.
>> 
> 
> For this abstract model, 
> I assume we are on the same page for the precondition:
> If Device-TLB flush timed out, we would hide the target ATS device and crash 
> the domain owning this ATS device. 
> If impacted domain is hardware domain, just throw out a warning.
> 
> Then IMO,
>1. Try the best to return error code.
>2. Log error and don't return error value for hardware_domain init or 
> crashed system shutdown.
>3. For iommu_{,un}map_page(), we'd better fix it as a normal error, as 
> the error is not only from iommu flush, .e.g, '-ENOMEM'.
>  So, we need to {,un}map from the IOMMU, return an error, and roll back 
> the failed operation( .e.g, unmap EPT).

Well, if that possible in a provably correct way, then sure. But be
clear - when the failure occurs while unmapping, unmapping the
EPT entry obviously can't be the solution, you'd need a true
roll back. And of course you should keep in mind what happens to
the guest if such an operation fails: If you can be certain it'll crash
because of this later on anyway, you're likely better off crashing
it right away (such that the reason for the crash is at least obvious).

Jan

>4. for the rest, we may return an error, but don't roll back the failed 
> operation, and we need to analysis the different condition.
> 
> Quan




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-24 Thread Xu, Quan
On March 18, 2016 5:49pm,  wrote:
> >>> On 18.03.16 at 10:38,  wrote:
> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> >> >
> >> Not sure what exactly you're asking for: As said, we first need to
> >> settle on an abstract model. Do we want IOMMU mapping failures to be
> >> fatal to the domain (perhaps with the exception of the hardware one)?
> >> I think we do, and for the hardware domain we'd do things on a best
> >> effort basis (always erring on the side of unmapping). Which would
> >> probably mean crashing the domain could be centralized in
> >> iommu_{,un}map_page(). How much roll back would then still be needed
> >> in callers of these functions for the hardware domain's sake would
> >> need to be seen.
> >>
> >> So before you start coing, give others (namely but not limited to
> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance to voice
> >> differing opinions.
> >>
> > FWIW, the behavior Jan described
> > (crashing the domain for all domains but the hardware domain) was
> > indeed the intended plan for this series, as far as I understood from
> > talking to people and looking at previous email conversations and
> > submissions.
> 
> That was taking only the flush timeout as an error source into account.
> Now that we see that the lack of error handling pre-exists, we can't just 
> extend
> that intended model to also cover those other error reasons without at least
> having given people a chance to object.
> 

For this abstract model, 
I assume we are on the same page for the precondition:
If Device-TLB flush timed out, we would hide the target ATS device and crash 
the domain owning this ATS device. 
If impacted domain is hardware domain, just throw out a warning.

Then IMO,
   1. Try the best to return error code.
   2. Log error and don't return error value for hardware_domain init or 
crashed system shutdown.
   3. For iommu_{,un}map_page(), we'd better fix it as a normal error, as the 
error is not only from iommu flush, .e.g, '-ENOMEM'.
 So, we need to {,un}map from the IOMMU, return an error, and roll back the 
failed operation( .e.g, unmap EPT).
   4. for the rest, we may return an error, but don't roll back the failed 
operation, and we need to analysis the different condition.

Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-24 Thread Xu, Quan
On March 18, 2016 4:10pm,  wrote:
> >>> On 18.03.16 at 04:19,  wrote:
> > On March 17, 2016 8:34pm, George Dunlap 
> wrote:
> >> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
> >>  wrote:
> >> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
> >> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> >> @@ -830,7 +830,15 @@ out:
> >> >>  {
> >> >>  if ( iommu_flags )
> >> >>  for ( i = 0; i < (1 << order); i++ )
> >> >> -iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> >> iommu_flags);
> >> >> +{
> >> >> +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) +
> >> >> + i,
> >> iommu_flags);
> >> >> +if ( rc )
> >> >> +{
> >> >> +while ( i-- > 0 )
> >> >> +iommu_unmap_page(d, gfn + i);
> >> >
> >> > This won't unmap gfn+0 (since it will break out when i == 0 without
> >> > calling unmap).
> >>
> >> Oh, no it won't, because the decrement is postfix.
> >>
> >> For us mere mortals, I'd appreciate a comment here like this:
> >>
> >> /* Postfix operator means we will call unmap with i == 0 */
> >>
> > Agreed.
> > For these 2 points, to summarize:
> >- adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
> >- adding a comment:
> > /* Postfix operator means we will call unmap with i == 0 */
> 
> To be honest, I'm opposed to the addition of such comments.
> See also the parallel discussion rooted at
> http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html
> 

Reading the Follow-Ups email, it looks a pretty common cleanup pattern.
now I don't fully get this point, but I would try to follow this pattern.

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-21 Thread Jan Beulich
>>> On 21.03.16 at 07:18,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, March 18, 2016 5:49 PM
>> 
>> >>> On 18.03.16 at 10:38,  wrote:
>> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
>> >> >
>> >> Not sure what exactly you're asking for: As said, we first need to
>> >> settle on an abstract model. Do we want IOMMU mapping failures
>> >> to be fatal to the domain (perhaps with the exception of the
>> >> hardware one)? I think we do, and for the hardware domain we'd
>> >> do things on a best effort basis (always erring on the side of
>> >> unmapping). Which would probably mean crashing the domain
>> >> could be centralized in iommu_{,un}map_page(). How much roll
>> >> back would then still be needed in callers of these functions for
>> >> the hardware domain's sake would need to be seen.
>> >>
>> >> So before you start coing, give others (namely but not limited to
>> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
>> >> to voice differing opinions.
>> >>
>> > I'm nothing of the above but,
>> 
>> Don't you fall under "but not limited to"  ;-) ?
>> 
>> > FWIW, the behavior Jan described
>> > (crashing the domain for all domains but the hardware domain) was
>> > indeed the intended plan for this series, as far as I understood from
>> > talking to people and looking at previous email conversations and
>> > submissions.
>> 
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't
>> just extend that intended model to also cover those other error
>> reasons without at least having given people a chance to object.
>> 
> 
> It makes sense so I'm OK with this behavior change. 
> 
> Then regarding to Quan's next version (if nobody disagrees), is it better
> to first describe related callers and then reach agreement which caller
> still needs error handling for hardware domain, before Quan starts to
> change actual code (at least based on current discussion Quan didn't
> have thorough understanding of such necessity yet)?

Well, ideally we'd indeed reach agreement before starting to write
or change code. However, in a case like this where error handling
was just ignored in pre-existing code, I think the easiest way to get
a complete picture is to see what places / paths need changing. I.e.
here it may indeed be better to start from the patches. Whether that
means posting the patches in patch form, or - prior to posting them -
extracting what was learned into a textual description is a different
aspect (and I'd leave it to Quan to pick the route more suitable to him).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-21 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, March 18, 2016 5:49 PM
> 
> >>> On 18.03.16 at 10:38,  wrote:
> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> >> >
> >> Not sure what exactly you're asking for: As said, we first need to
> >> settle on an abstract model. Do we want IOMMU mapping failures
> >> to be fatal to the domain (perhaps with the exception of the
> >> hardware one)? I think we do, and for the hardware domain we'd
> >> do things on a best effort basis (always erring on the side of
> >> unmapping). Which would probably mean crashing the domain
> >> could be centralized in iommu_{,un}map_page(). How much roll
> >> back would then still be needed in callers of these functions for
> >> the hardware domain's sake would need to be seen.
> >>
> >> So before you start coing, give others (namely but not limited to
> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
> >> to voice differing opinions.
> >>
> > I'm nothing of the above but,
> 
> Don't you fall under "but not limited to"  ;-) ?
> 
> > FWIW, the behavior Jan described
> > (crashing the domain for all domains but the hardware domain) was
> > indeed the intended plan for this series, as far as I understood from
> > talking to people and looking at previous email conversations and
> > submissions.
> 
> That was taking only the flush timeout as an error source into account.
> Now that we see that the lack of error handling pre-exists, we can't
> just extend that intended model to also cover those other error
> reasons without at least having given people a chance to object.
> 

It makes sense so I'm OK with this behavior change. 

Then regarding to Quan's next version (if nobody disagrees), is it better
to first describe related callers and then reach agreement which caller
still needs error handling for hardware domain, before Quan starts to
change actual code (at least based on current discussion Quan didn't
have thorough understanding of such necessity yet)?

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-20 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, March 17, 2016 3:59 PM
> 
> >>> On 17.03.16 at 08:32,  wrote:
> >>  From: Xu, Quan
> >> Sent: Thursday, March 17, 2016 2:55 PM
> >> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain
> *d)
> >>
> >>  tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
> >>  for ( j = 0; j < tmp; j++ )
> >> -iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> >> -   IOMMUF_readable|IOMMUF_writable);
> >> +if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> >> +IOMMUF_readable|IOMMUF_writable) )
> >> +printk(XENLOG_G_ERR
> >> +   "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> >> +   pfn * tmp + j, pfn * tmp + j);
> >>
> >>  if (!(i & (0xf >> (PAGE_SHIFT - PAGE_SHIFT_4K
> >>  process_pending_softirqs();
> >
> > Hi, Quan, this patch looks good to me in general. Just double confirm one
> > thing. Above doesn't handle error from iommu_map_page, while only
> > throwing out warning. Not sure whether it has been discussed before
> > as the agreement or not.
> 
> For code paths involved in preparing the hardware domain only
> I had specifically asked to handle such in a best effort manner,
> instead of explicitly rendering a system unbootable.
> 

OK, good to know that.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-20 Thread Dario Faggioli
On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> > 
> Not sure what exactly you're asking for: As said, we first need to
> settle on an abstract model. Do we want IOMMU mapping failures
> to be fatal to the domain (perhaps with the exception of the
> hardware one)? I think we do, and for the hardware domain we'd
> do things on a best effort basis (always erring on the side of
> unmapping). Which would probably mean crashing the domain
> could be centralized in iommu_{,un}map_page(). How much roll
> back would then still be needed in callers of these functions for
> the hardware domain's sake would need to be seen.
> 
> So before you start coing, give others (namely but not limited to
> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
> to voice differing opinions.
>
I'm nothing of the above but, FWIW, the behavior Jan described
(crashing the domain for all domains but the hardware domain) was
indeed the intended plan for this series, as far as I understood from
talking to people and looking at previous email conversations and
submissions.

And it looks to me like it is a sane plan.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Xu, Quan
On March 17, 2016 8:30pm,  wrote:
> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> > c997b53..526548e 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page,
> unsigned long type,
> > int preemptible)  {
> >  unsigned long nx, x, y = page->u.inuse.type_info;
> > -int rc = 0;
> > +int rc = 0, ret = 0;
> >
> >  ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >
> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >  if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >  {
> >  if ( (x & PGT_type_mask) == PGT_writable_page )
> > -iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> > +ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >  else if ( type == PGT_writable_page )
> > -iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> > -   page_to_mfn(page),
> > -
> IOMMUF_readable|IOMMUF_writable);
> > +ret = iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> > + page_to_mfn(page),
> > +
> IOMMUF_readable|IOMMUF_writable);
> >  }
> >  }
> >
> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page,
> unsigned long type,
> >  if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >  put_page(page);
> >
> > +if ( !rc )
> > +rc = ret;
> > +
> 
> What's this about?  If the iommu_[un]map_page() operation times out,
> we still go through with calling alloc_page_type(); and if
> alloc_page_type() fails we return its failure value, but if it
> succeeds, we return the error from iommu_[un]map_page()?
> 
Yes.
To be honest, to me, it is tricky too. 
I found it is not right to return directly if the iommu_[un]map_page() 
operation times out.

 """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"

Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Jan Beulich
>>> On 18.03.16 at 04:19,  wrote:
> On March 17, 2016 8:34pm, George Dunlap  wrote:
>> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
>>  wrote:
>> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
>> >> --- a/xen/arch/x86/mm/p2m-ept.c
>> >> +++ b/xen/arch/x86/mm/p2m-ept.c
>> >> @@ -830,7 +830,15 @@ out:
>> >>  {
>> >>  if ( iommu_flags )
>> >>  for ( i = 0; i < (1 << order); i++ )
>> >> -iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
>> iommu_flags);
>> >> +{
>> >> +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
>> iommu_flags);
>> >> +if ( rc )
>> >> +{
>> >> +while ( i-- > 0 )
>> >> +iommu_unmap_page(d, gfn + i);
>> >
>> > This won't unmap gfn+0 (since it will break out when i == 0 without
>> > calling unmap).
>> 
>> Oh, no it won't, because the decrement is postfix.
>> 
>> For us mere mortals, I'd appreciate a comment here like this:
>> 
>> /* Postfix operator means we will call unmap with i == 0 */
>> 
> Agreed.
> For these 2 points, to summarize:
>- adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
>- adding a comment:
> /* Postfix operator means we will call unmap with i == 0 */

To be honest, I'm opposed to the addition of such comments.
See also the parallel discussion rooted at
http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Xu, Quan
On March 18, 2016 4:20pm,  wrote:
> >>> On 18.03.16 at 08:54,  wrote:
> > On March 17, 2016 8:30pm,  wrote:
> >> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
> >> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> >> > c997b53..526548e 100644
> >> > --- a/xen/arch/x86/mm.c
> >> > +++ b/xen/arch/x86/mm.c
> >> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
> >> > *page,
> >> unsigned long type,
> >> > int preemptible)  {
> >> >  unsigned long nx, x, y = page->u.inuse.type_info;
> >> > -int rc = 0;
> >> > +int rc = 0, ret = 0;
> >> >
> >> >  ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >> >
> >> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> >> *page, unsigned long type,
> >> >  if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >> >  {
> >> >  if ( (x & PGT_type_mask) == PGT_writable_page )
> >> > -iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> > +ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> >  else if ( type == PGT_writable_page )
> >> > -iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> > -   page_to_mfn(page),
> >> > -
> >> IOMMUF_readable|IOMMUF_writable);
> >> > +ret = iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> > + page_to_mfn(page),
> >> > +
> >> IOMMUF_readable|IOMMUF_writable);
> >> >  }
> >> >  }
> >> >
> >> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
> >> > *page,
> >> unsigned long type,
> >> >  if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >> >  put_page(page);
> >> >
> >> > +if ( !rc )
> >> > +rc = ret;
> >> > +
> >>
> >> What's this about?  If the iommu_[un]map_page() operation times out,
> >> we still go through with calling alloc_page_type(); and if
> >> alloc_page_type() fails we return its failure value, but if it
> >> succeeds, we return the error from iommu_[un]map_page()?
> >>
> > Yes.
> > To be honest, to me, it is tricky too.
> > I found it is not right to return directly if the iommu_[un]map_page()
> > operation times out.
> >
> >  """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
> > Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"
> 
> What strange a question: Of course it does.
> 

Jan, thanks. 
To be honest, this stupid question was always in my mind.

> As you can infer form the reply I sent yesterday, you first need to settle on 
> an
> abstract model: What do failures mean? If, in the flush case, a timeout is 
> going
> to kill the domain anyway, then error handling can be lighter weight than if 
> you
> mean to try to keep the domain running. Of course in this context you also
> should not forget that iommu_map_page() could already return errors prior to
> your changes (most notably -ENOMEM, but at least the AMD side also produces
> others, with -EFAULT generally being accompanied by domain_crash()). As
> mentioned elsewhere - it seems extremely bogus that these errors weren't
> check for from the beginning.
> 
Jan, I am not familiar/confident enough to this single point, could you tell me 
more how to fix it?
Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread George Dunlap
On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
 wrote:
> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index c997b53..526548e 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, 
>> unsigned long type,
>> int preemptible)
>>  {
>>  unsigned long nx, x, y = page->u.inuse.type_info;
>> -int rc = 0;
>> +int rc = 0, ret = 0;
>>
>>  ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>>
>> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, 
>> unsigned long type,
>>  if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>  {
>>  if ( (x & PGT_type_mask) == PGT_writable_page )
>> -iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>> +ret = iommu_unmap_page(d, mfn_to_gmfn(d, 
>> page_to_mfn(page)));
>>  else if ( type == PGT_writable_page )
>> -iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>> -   page_to_mfn(page),
>> -   IOMMUF_readable|IOMMUF_writable);
>> +ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>> + page_to_mfn(page),
>> + IOMMUF_readable|IOMMUF_writable);
>>  }
>>  }
>>
>> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, 
>> unsigned long type,
>>  if ( (x & PGT_partial) && !(nx & PGT_partial) )
>>  put_page(page);
>>
>> +if ( !rc )
>> +rc = ret;
>> +
>
> What's this about?  If the iommu_[un]map_page() operation times out,
> we still go through with calling alloc_page_type(); and if
> alloc_page_type() fails we return its failure value, but if it
> succeeds, we return the error from iommu_[un]map_page()?
>
>>  return rc;
>>  }
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index 3cb6868..f9bcce7 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -830,7 +830,15 @@ out:
>>  {
>>  if ( iommu_flags )
>>  for ( i = 0; i < (1 << order); i++ )
>> -iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>> +{
>> +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
>> iommu_flags);
>> +if ( rc )
>> +{
>> +while ( i-- > 0 )
>> +iommu_unmap_page(d, gfn + i);
>
> This won't unmap gfn+0 (since it will break out when i == 0 without
> calling unmap).

Oh, no it won't, because the decrement is postfix.

For us mere mortals, I'd appreciate a comment here like this:

/* Postfix operator means we will call unmap with i == 0 */

Thanks,
 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Xu, Quan
On March 17, 2016 8:34pm, George Dunlap  wrote:
> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
>  wrote:
> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
> >> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> >> c997b53..526548e 100644
> >> --- a/xen/arch/x86/mm.c
> >> +++ b/xen/arch/x86/mm.c
> >> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >> int preemptible)  {
> >>  unsigned long nx, x, y = page->u.inuse.type_info;
> >> -int rc = 0;
> >> +int rc = 0, ret = 0;
> >>
> >>  ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >>
> >> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >>  if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >>  {
> >>  if ( (x & PGT_type_mask) == PGT_writable_page )
> >> -iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >> +ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >>  else if ( type == PGT_writable_page )
> >> -iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >> -   page_to_mfn(page),
> >> -
> IOMMUF_readable|IOMMUF_writable);
> >> +ret = iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >> + page_to_mfn(page),
> >> +
> IOMMUF_readable|IOMMUF_writable);
> >>  }
> >>  }
> >>
> >> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >>  if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >>  put_page(page);
> >>
> >> +if ( !rc )
> >> +rc = ret;
> >> +
> >
> > What's this about?  If the iommu_[un]map_page() operation times out,
> > we still go through with calling alloc_page_type(); and if
> > alloc_page_type() fails we return its failure value, but if it
> > succeeds, we return the error from iommu_[un]map_page()?
> >
> >>  return rc;
> >>  }
> >>
> >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> >> index 3cb6868..f9bcce7 100644
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -830,7 +830,15 @@ out:
> >>  {
> >>  if ( iommu_flags )
> >>  for ( i = 0; i < (1 << order); i++ )
> >> -iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> >> +{
> >> +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> >> +if ( rc )
> >> +{
> >> +while ( i-- > 0 )
> >> +iommu_unmap_page(d, gfn + i);
> >
> > This won't unmap gfn+0 (since it will break out when i == 0 without
> > calling unmap).
> 
> Oh, no it won't, because the decrement is postfix.
> 
> For us mere mortals, I'd appreciate a comment here like this:
> 
> /* Postfix operator means we will call unmap with i == 0 */
> 
Agreed.
For these 2 points, to summarize:
   - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
   - adding a comment:
/* Postfix operator means we will call unmap with i == 0 */


Right?

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Jan Beulich
>>> On 17.03.16 at 08:32,  wrote:
>>  From: Xu, Quan
>> Sent: Thursday, March 17, 2016 2:55 PM
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain 
>> *d)
>> 
>>  tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>>  for ( j = 0; j < tmp; j++ )
>> -iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> -   IOMMUF_readable|IOMMUF_writable);
>> +if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> +IOMMUF_readable|IOMMUF_writable) )
>> +printk(XENLOG_G_ERR
>> +   "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
>> +   pfn * tmp + j, pfn * tmp + j);
>> 
>>  if (!(i & (0xf >> (PAGE_SHIFT - PAGE_SHIFT_4K
>>  process_pending_softirqs();
> 
> Hi, Quan, this patch looks good to me in general. Just double confirm one
> thing. Above doesn't handle error from iommu_map_page, while only
> throwing out warning. Not sure whether it has been discussed before
> as the agreement or not.

For code paths involved in preparing the hardware domain only
I had specifically asked to handle such in a best effort manner,
instead of explicitly rendering a system unbootable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Tian, Kevin
> From: Xu, Quan
> Sent: Thursday, March 17, 2016 2:55 PM
> 
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index c0d6aab..e5ab10a 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
> 
>  tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>  for ( j = 0; j < tmp; j++ )
> -iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> -   IOMMUF_readable|IOMMUF_writable);
> +if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> +IOMMUF_readable|IOMMUF_writable) )
> +printk(XENLOG_G_ERR
> +   "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> +   pfn * tmp + j, pfn * tmp + j);
> 
>  if (!(i & (0xf >> (PAGE_SHIFT - PAGE_SHIFT_4K
>  process_pending_softirqs();

Hi, Quan, this patch looks good to me in general. Just double confirm one
thing. Above doesn't handle error from iommu_map_page, while only
throwing out warning. Not sure whether it has been discussed before
as the agreement or not.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Jan Beulich
>>> On 18.03.16 at 10:09,  wrote:
> On March 18, 2016 4:20pm,  wrote:
>> As you can infer form the reply I sent yesterday, you first need to settle 
>> on an
>> abstract model: What do failures mean? If, in the flush case, a timeout is 
>> going
>> to kill the domain anyway, then error handling can be lighter weight than if 
>> you
>> mean to try to keep the domain running. Of course in this context you also
>> should not forget that iommu_map_page() could already return errors prior to
>> your changes (most notably -ENOMEM, but at least the AMD side also produces
>> others, with -EFAULT generally being accompanied by domain_crash()). As
>> mentioned elsewhere - it seems extremely bogus that these errors weren't
>> check for from the beginning.
>> 
> Jan, I am not familiar/confident enough to this single point, could you tell 
> me more how to fix it?

Not sure what exactly you're asking for: As said, we first need to
settle on an abstract model. Do we want IOMMU mapping failures
to be fatal to the domain (perhaps with the exception of the
hardware one)? I think we do, and for the hardware domain we'd
do things on a best effort basis (always erring on the side of
unmapping). Which would probably mean crashing the domain
could be centralized in iommu_{,un}map_page(). How much roll
back would then still be needed in callers of these functions for
the hardware domain's sake would need to be seen.

So before you start coing, give others (namely but not limited to
VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
to voice differing opinions.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Jan Beulich
>>> On 17.03.16 at 07:54,  wrote:
> @@ -53,11 +55,21 @@ static int device_power_down(void)
>  
>  ioapic_suspend();
>  
> -iommu_suspend();
> +err = iommu_suspend();
> +if ( err )
> +goto iommu_suspend_error;
>  
>  lapic_suspend();
>  
>  return 0;
> +
> +iommu_suspend_error:

Labels indented by at least one space please.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -830,7 +830,15 @@ out:
>  {
>  if ( iommu_flags )
>  for ( i = 0; i < (1 << order); i++ )
> -iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +{
> +rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
> iommu_flags);
> +if ( rc )
> +{
> +while ( i-- > 0 )
> +iommu_unmap_page(d, gfn + i);
> +break;
> +}
> +}
>  else
>  for ( i = 0; i < (1 << order); i++ )
>  iommu_unmap_page(d, gfn + i);

Earlier on in the PV mm code you also checked iommu_unmap_page()'s
return code - why not here (and also in p2m-pt.c)?

Also I'm quite unhappy about the inconsistent state you leave things
in: You unmap from the IOMMU, return an error, but leave the EPT
entry in place.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>  {
>  nr_gets++;
>  (void)get_page(pg, rd);
> -if ( !(op->flags & GNTMAP_readonly) )
> -get_page_type(pg, PGT_writable_page);
> +if ( !(op->flags & GNTMAP_readonly) &&
> + !get_page_type(pg, PGT_writable_page) )
> +goto could_not_pin;

This needs explanation, as it doesn't look related to what your actual
goal is: If an error was possible here, I think this would be a security
issue. However, as also kind of documented by the explicitly ignored
return value from get_page(), it is my understanding there here we
only obtain an _extra_ reference.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain *d,
>  if ( need_iommu(d) )
>  {
>  this_cpu(iommu_dont_flush_iotlb) = 0;
> -iommu_iotlb_flush(d, xatp->idx - done, done);
> -iommu_iotlb_flush(d, xatp->gpfn - done, done);
> +rc = iommu_iotlb_flush(d, xatp->idx - done, done);
> +if ( !rc )
> +rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
>  }

And the pattern repeats - you now return an error, but you don't
roll back the now failed operation. But wait - maybe that intended:
Are you meaning to crash the guest in such cases (somewhere
deep in the flush code)? If so, I think that's fine, but you
absolutely would need to say so in the commit message.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
>  this_cpu(iommu_dont_flush_iotlb) = 0;
>  
>  if ( !rc )
> -iommu_iotlb_flush_all(d);
> +{
> +rc = iommu_iotlb_flush_all(d);
> +if ( rc )
> +iommu_teardown(d);
> +}
>  else if ( rc != -ERESTART )
>  iommu_teardown(d);

Why can't you just use the existing call to iommu_teardown(), by
simply deleting the "else"?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-19 Thread Jan Beulich
>>> On 18.03.16 at 10:38,  wrote:
> On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
>> > 
>> Not sure what exactly you're asking for: As said, we first need to
>> settle on an abstract model. Do we want IOMMU mapping failures
>> to be fatal to the domain (perhaps with the exception of the
>> hardware one)? I think we do, and for the hardware domain we'd
>> do things on a best effort basis (always erring on the side of
>> unmapping). Which would probably mean crashing the domain
>> could be centralized in iommu_{,un}map_page(). How much roll
>> back would then still be needed in callers of these functions for
>> the hardware domain's sake would need to be seen.
>> 
>> So before you start coing, give others (namely but not limited to
>> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
>> to voice differing opinions.
>>
> I'm nothing of the above but,

Don't you fall under "but not limited to"  ;-) ?

> FWIW, the behavior Jan described
> (crashing the domain for all domains but the hardware domain) was
> indeed the intended plan for this series, as far as I understood from
> talking to people and looking at previous email conversations and
> submissions.

That was taking only the flush timeout as an error source into account.
Now that we see that the lack of error handling pre-exists, we can't
just extend that intended model to also cover those other error
reasons without at least having given people a chance to object.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.

2016-03-18 Thread Jan Beulich
>>> On 18.03.16 at 08:54,  wrote:
> On March 17, 2016 8:30pm,  wrote:
>> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu  wrote:
>> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
>> > c997b53..526548e 100644
>> > --- a/xen/arch/x86/mm.c
>> > +++ b/xen/arch/x86/mm.c
>> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page,
>> unsigned long type,
>> > int preemptible)  {
>> >  unsigned long nx, x, y = page->u.inuse.type_info;
>> > -int rc = 0;
>> > +int rc = 0, ret = 0;
>> >
>> >  ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>> >
>> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
>> *page, unsigned long type,
>> >  if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>> >  {
>> >  if ( (x & PGT_type_mask) == PGT_writable_page )
>> > -iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> > +ret = iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> >  else if ( type == PGT_writable_page )
>> > -iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> > -   page_to_mfn(page),
>> > -
>> IOMMUF_readable|IOMMUF_writable);
>> > +ret = iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> > + page_to_mfn(page),
>> > +
>> IOMMUF_readable|IOMMUF_writable);
>> >  }
>> >  }
>> >
>> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page,
>> unsigned long type,
>> >  if ( (x & PGT_partial) && !(nx & PGT_partial) )
>> >  put_page(page);
>> >
>> > +if ( !rc )
>> > +rc = ret;
>> > +
>> 
>> What's this about?  If the iommu_[un]map_page() operation times out,
>> we still go through with calling alloc_page_type(); and if
>> alloc_page_type() fails we return its failure value, but if it
>> succeeds, we return the error from iommu_[un]map_page()?
>> 
> Yes.
> To be honest, to me, it is tricky too. 
> I found it is not right to return directly if the iommu_[un]map_page() 
> operation times out.
> 
>  """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
> Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"

What strange a question: Of course it does.

As you can infer form the reply I sent yesterday, you first need to
settle on an abstract model: What do failures mean? If, in the flush
case, a timeout is going to kill the domain anyway, then error
handling can be lighter weight than if you mean to try to keep the
domain running. Of course in this context you also should not forget
that iommu_map_page() could already return errors prior to your
changes (most notably -ENOMEM, but at least the AMD side also
produces others, with -EFAULT generally being accompanied by
domain_crash()). As mentioned elsewhere - it seems extremely
bogus that these errors weren't check for from the beginning.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel