Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:50 PM, Tamas K Lengyel wrote:

On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
 wrote:


On 1/9/19 6:34 PM, Roger Pau Monné wrote:

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which, when used together with another guest on
the same host, could allow bypassing introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection. My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - in fact that is relied on as a feature when the
emulator in Xen is used to continue guest execution, because the
hypervisor ignores the EPT memory permissions that trip the guest for
introspection. So neither the hypervisor accessing memory nor a
grant-shared page being accessed in another domain is a problem for
introspection.


Can't two guests running on the same host then completely bypass
introspection? I guess you prevent this by limiting which guests
pages can be shared with?


Would these two guests be HVM guests? Introspection only works for HVM
guests. I'm not sure I follow your scenario though. How would these
guests collaborate to escape introspection via grants?


If there are two domains acting maliciously in concert to bypass
monitoring of memory writes they could achieve that with grants, yes.
Say a write-monitored page is grant-shared to another domain, which
then does the write on behalf of the first. I wouldn't say that's
"completely bypassing introspection" though, there are many types of
events that can be monitored, write-accesses are only one. I'm not
aware of any mechanism that can be used to limit which pages can be
shared but you can use XSM to restrict which domains can share pages
to begin with. That's normally enough.


Right, I agree. We're not currently dealing with that case and assume 
that XSM (or a similar mechanism) will be used in scenarios where this 
level of access is possible.



Thanks,
Razvan


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Roger Pau Monné
On Wed, Jan 9, 2019 at 5:51 PM Tamas K Lengyel  wrote:
>
> On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
>  wrote:
> >
> > On 1/9/19 6:34 PM, Roger Pau Monné wrote:
> > > Maybe this use-case is different, but how does introspection handle
> > > accesses to the shared info page or the runstate info for example?
> > >
> > > I would consider argo to be the same in this regard.
> > 
> >  Not exactly: The shared info page is special in any event. For
> >  runstate info (and alike - there's also struct vcpu_time_info)
> >  I'd question correctness of the current handling. If that's
> >  wrong already, I'd prefer if the issue wasn't spread.
> > >>>
> > >>> There are also grants, which, when used together with another guest on
> > >>> the same host, could allow bypassing introspection AFAICT? (unless
> > >>> there's some policy applied that limits grant sharing to trusted
> > >>> domains)
> > >>>
> > >>> TBH I'm not sure how to handle hypervisor accesses with
> > >>> introspection. My knowledge of introspection is fairly limited, but
> > >>> it pauses the guest and sends a notification to an in-guest agent. I'm
> > >>> not sure this is applicable to hypervisor writes, since it's not
> > >>> possible to pause hypervisor execution and wait for a response from a
> > >>> guest agent.
> > >>>
> > >>
> > >> Introspection applications only care about memory accesses performed
> > >> by the guest. Hypervisor accesses to monitored pages are not included
> > >> when monitoring - in fact that is relied on as a feature when the
> > >> emulator in Xen is used to continue guest execution, because the
> > >> hypervisor ignores the EPT memory permissions that trip the guest for
> > >> introspection. So neither the hypervisor accessing memory nor a
> > >> grant-shared page being accessed in another domain is a problem for
> > >> introspection.
> > >
> > > Can't two guests running on the same host then completely bypass
> > > introspection? I guess you prevent this by limiting which guests
> > > pages can be shared with?
> >
> > Would these two guests be HVM guests? Introspection only works for HVM
> > guests. I'm not sure I follow your scenario though. How would these
> > guests collaborate to escape introspection via grants?
>
> If there are two domains acting maliciously in concert to bypass
> monitoring of memory writes they could achieve that with grants, yes.
> Say a write-monitored page is grant-shared to another domain, which
> then does the write on behalf of the first. I wouldn't say that's
> "completely bypassing introspection" though, there are many types of
> events that can be monitored, write-accesses are only one. I'm not
> aware of any mechanism that can be used to limit which pages can be
> shared but you can use XSM to restrict which domains can share pages
> to begin with. That's normally enough.

Yes, I assumed that would be the way to protect against such attacks,
ie: limiting to which guests pages can be shared. I think just making
sure the right access checks are placed in XSM (just like they are for
grants) should be enough.
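
For illustration, restricting grant sharing with FLASK could look
something like the sketch below. The domain labels are made-up
examples, and the exact permission names and rule directions should be
checked against the access vectors and example policy shipped with
Xen, so treat this as an assumption rather than a ready-made policy:

    # introspected_t: a monitored guest; helper_t: the only domain it
    # may share grants with. No rule is written for other domain
    # types, so grant mapping between them is denied by default.
    allow helper_t introspected_t:grant { map_read map_rw unmap };
    allow introspected_t helper_t:grant { map_read map_rw unmap };

An analogous set of checks on the argo operations would then restrict
which domains may register rings for, or send to, a monitored guest.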

Thanks, Roger.


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Tamas K Lengyel
On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
 wrote:
>
> On 1/9/19 6:34 PM, Roger Pau Monné wrote:
> > Maybe this use-case is different, but how does introspection handle
> > accesses to the shared info page or the runstate info for example?
> >
> > I would consider argo to be the same in this regard.
> 
>  Not exactly: The shared info page is special in any event. For
>  runstate info (and alike - there's also struct vcpu_time_info)
>  I'd question correctness of the current handling. If that's
>  wrong already, I'd prefer if the issue wasn't spread.
> >>>
> >>> There are also grants, which, when used together with another guest on
> >>> the same host, could allow bypassing introspection AFAICT? (unless
> >>> there's some policy applied that limits grant sharing to trusted
> >>> domains)
> >>>
> >>> TBH I'm not sure how to handle hypervisor accesses with
> >>> introspection. My knowledge of introspection is fairly limited, but
> >>> it pauses the guest and sends a notification to an in-guest agent. I'm
> >>> not sure this is applicable to hypervisor writes, since it's not
> >>> possible to pause hypervisor execution and wait for a response from a
> >>> guest agent.
> >>>
> >>
> >> Introspection applications only care about memory accesses performed
> >> by the guest. Hypervisor accesses to monitored pages are not included
> >> when monitoring - in fact that is relied on as a feature when the
> >> emulator in Xen is used to continue guest execution, because the
> >> hypervisor ignores the EPT memory permissions that trip the guest for
> >> introspection. So neither the hypervisor accessing memory nor a
> >> grant-shared page being accessed in another domain is a problem for
> >> introspection.
> >
> > Can't two guests running on the same host then completely bypass
> > introspection? I guess you prevent this by limiting which guests
> > pages can be shared with?
>
> Would these two guests be HVM guests? Introspection only works for HVM
> guests. I'm not sure I follow your scenario though. How would these
> guests collaborate to escape introspection via grants?

If there are two domains acting maliciously in concert to bypass
monitoring of memory writes they could achieve that with grants, yes.
Say a write-monitored page is grant-shared to another domain, which
then does the write on behalf of the first. I wouldn't say that's
"completely bypassing introspection" though, there are many types of
events that can be monitored, write-accesses are only one. I'm not
aware of any mechanism that can be used to limit which pages can be
shared but you can use XSM to restrict which domains can share pages
to begin with. That's normally enough.

Tamas


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:34 PM, Roger Pau Monné wrote:

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which, when used together with another guest on
the same host, could allow bypassing introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection. My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - in fact that is relied on as a feature when the
emulator in Xen is used to continue guest execution, because the
hypervisor ignores the EPT memory permissions that trip the guest for
introspection. So neither the hypervisor accessing memory nor a
grant-shared page being accessed in another domain is a problem for
introspection.


Can't two guests running on the same host then completely bypass
introspection? I guess you prevent this by limiting which guests
pages can be shared with?


Would these two guests be HVM guests? Introspection only works for HVM 
guests. I'm not sure I follow your scenario though. How would these 
guests collaborate to escape introspection via grants?



If that's the case, and introspection doesn't care about hypervisor
accesses to guest pages, then just getting a reference to the
underlying page when the ring is set up should be enough. There's no
need to check the gfn -> mfn relation every time there's a hypervisor
access to the ring.


I think so, but I might be missing something.


Thanks,
Razvan


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Roger Pau Monné
On Wed, Jan 9, 2019 at 5:17 PM Tamas K Lengyel  wrote:
>
> On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné  wrote:
> >
> > Adding the introspection guys.
> >
> > On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> > > >>> On 04.01.19 at 16:35,  wrote:
> > > > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> > > >> >>> On 04.01.19 at 09:57,  wrote:
> > > >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> > > >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné 
> > > >> >>  wrote:
> > > >> >> >
> > > >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > > >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 
> > > >> >> > > 
> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark 
> > > >> >> > > > wrote:
> > > >> >> > Then I wonder why you need such a check in any case if the code can
> > > >> >> > handle such cases, more so given that the check itself is racy.
> > > >> >>
> > > >> >> OK, so at the root of the question here is: does it matter what the 
> > > >> >> p2m
> > > >> >> type of the memory is at these points:
> > > >> >>
> > > >> >> 1) when the gfn is translated to mfn, at the time of ring 
> > > >> >> registration
> > > >> >
> > > >> > This is the important check, because that's where you should take a
> > > >> > reference to the page. In this case you should check that the page is
> > > >> > of ram_rw type.
> > > >> >
> > > >> >> 2) when the hypervisor writes into guest memory:
> > > >> >> - where the tx_ptr index is initialized in the register op
> > > >> >> - where ringbuf data is written in sendv
> > > >> >> - where ring description data is written in notify
> > > >> >
> > > >> > As long as you keep a reference to the pages that are part of the 
> > > >> > ring
> > > >> > you don't need to do any checks when writing/reading from them. If 
> > > >> > the
> > > >> > guest messes up its p2m and does change the gfn -> mfn mappings for
> > > >> > pages that are part of the ring that's the guest problem, the
> > > >> > hypervisor still has a reference to those pages so they won't be
> > > >> > reused.
> > > >>
> > > >> For use cases like introspection this may not be fully correct,
> > > >> but it may also be that my understanding there isn't fully
> > > >> correct. If introspection agents care about _any_ writes to
> > > >> a page, hypervisor ones (which in most cases are merely
> > > >> writes on behalf of the guest) might matter as well. I think
> > > >> to decide whether page accesses need to be accompanied
> > > >> by any checks (and if so, which ones) one needs to
> > > >> - establish what p2m type transitions are possible for a
> > > >>   given page,
> > > >> - verify what restrictions may occur "behind the back" of
> > > >>   the entity wanting to do the accesses,
> > > >> - explore whether doing the extra checking at p2m type
> > > >>   change time wouldn't be better than at the time of access.
> > > >
> > > > Maybe this use-case is different, but how does introspection handle
> > > > accesses to the shared info page or the runstate info for example?
> > > >
> > > > I would consider argo to be the same in this regard.
> > >
> > > Not exactly: The shared info page is special in any event. For
> > > runstate info (and alike - there's also struct vcpu_time_info)
> > > I'd question correctness of the current handling. If that's
> > > wrong already, I'd prefer if the issue wasn't spread.
> >
> > There are also grants, which, when used together with another guest on
> > the same host, could allow bypassing introspection AFAICT? (unless
> > there's some policy applied that limits grant sharing to trusted
> > domains)
> >
> > TBH I'm not sure how to handle hypervisor accesses with
> > introspection. My knowledge of introspection is fairly limited, but
> > it pauses the guest and sends a notification to an in-guest agent. I'm
> > not sure this is applicable to hypervisor writes, since it's not
> > possible to pause hypervisor execution and wait for a response from a
> > guest agent.
> >
>
> Introspection applications only care about memory accesses performed
> by the guest. Hypervisor accesses to monitored pages are not included
> when monitoring - in fact that is relied on as a feature when the
> emulator in Xen is used to continue guest execution, because the
> hypervisor ignores the EPT memory permissions that trip the guest for
> introspection. So neither the hypervisor accessing memory nor a
> grant-shared page being accessed in another domain is a problem for
> introspection.

Can't two guests running on the same host then completely bypass
introspection? I guess you prevent this by limiting which guests
pages can be shared with?

If that's the case, and introspection doesn't care about hypervisor
accesses to guest pages, then just getting a reference to the
underlying page when the ring is set up should be enough. There's no
need to check the gfn -> mfn relation every time there's a hypervisor
access to the ring.

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:15 PM, Tamas K Lengyel wrote:

On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné  wrote:


Adding the introspection guys.

On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:

On 04.01.19 at 16:35,  wrote:

On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:

On 04.01.19 at 09:57,  wrote:

On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:

On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  wrote:


On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:

On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 

wrote:


On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:

Then I wonder why you need such a check in any case if the code can
handle such cases, more so given that the check itself is racy.


OK, so at the root of the question here is: does it matter what the p2m
type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration


This is the important check, because that's where you should take a
reference to the page. In this case you should check that the page is
of ram_rw type.


2) when the hypervisor writes into guest memory:
 - where the tx_ptr index is initialized in the register op
 - where ringbuf data is written in sendv
 - where ring description data is written in notify


As long as you keep a reference to the pages that are part of the ring
you don't need to do any checks when writing/reading from them. If the
guest messes up its p2m and does change the gfn -> mfn mappings for
pages that are part of the ring that's the guest problem, the
hypervisor still has a reference to those pages so they won't be
reused.


For use cases like introspection this may not be fully correct,
but it may also be that my understanding there isn't fully
correct. If introspection agents care about _any_ writes to
a page, hypervisor ones (which in most cases are merely
writes on behalf of the guest) might matter as well. I think
to decide whether page accesses need to be accompanied
by any checks (and if so, which ones) one needs to
- establish what p2m type transitions are possible for a
   given page,
- verify what restrictions may occur "behind the back" of
   the entity wanting to do the accesses,
- explore whether doing the extra checking at p2m type
   change time wouldn't be better than at the time of access.


Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which, when used together with another guest on
the same host, could allow bypassing introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection. My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - in fact that is relied on as a feature when the
emulator in Xen is used to continue guest execution, because the
hypervisor ignores the EPT memory permissions that trip the guest for
introspection. So neither the hypervisor accessing memory nor a
grant-shared page being accessed in another domain is a problem for
introspection.


Indeed, that's how it goes.


Thanks,
Razvan


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Tamas K Lengyel
On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné  wrote:
>
> Adding the introspection guys.
>
> On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> > >>> On 04.01.19 at 16:35,  wrote:
> > > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> > >> >>> On 04.01.19 at 09:57,  wrote:
> > >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> > >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné 
> > >> >>  wrote:
> > >> >> >
> > >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 
> > >> >> > > 
> > > wrote:
> > >> >> > > >
> > >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark 
> > >> >> > > > wrote:
> > >> >> > Then I wonder why you need such a check in any case if the code can
> > >> >> > handle such cases, more so given that the check itself is racy.
> > >> >>
> > >> >> OK, so at the root of the question here is: does it matter what the 
> > >> >> p2m
> > >> >> type of the memory is at these points:
> > >> >>
> > >> >> 1) when the gfn is translated to mfn, at the time of ring registration
> > >> >
> > >> > This is the important check, because that's where you should take a
> > >> > reference to the page. In this case you should check that the page is
> > >> > of ram_rw type.
> > >> >
> > >> >> 2) when the hypervisor writes into guest memory:
> > >> >> - where the tx_ptr index is initialized in the register op
> > >> >> - where ringbuf data is written in sendv
> > >> >> - where ring description data is written in notify
> > >> >
> > >> > As long as you keep a reference to the pages that are part of the ring
> > >> > you don't need to do any checks when writing/reading from them. If the
> > >> > guest messes up its p2m and does change the gfn -> mfn mappings for
> > >> > pages that are part of the ring that's the guest problem, the
> > >> > hypervisor still has a reference to those pages so they won't be
> > >> > reused.
> > >>
> > >> For use cases like introspection this may not be fully correct,
> > >> but it may also be that my understanding there isn't fully
> > >> correct. If introspection agents care about _any_ writes to
> > >> a page, hypervisor ones (which in most cases are merely
> > >> writes on behalf of the guest) might matter as well. I think
> > >> to decide whether page accesses need to be accompanied
> > >> by any checks (and if so, which ones) one needs to
> > >> - establish what p2m type transitions are possible for a
> > >>   given page,
> > >> - verify what restrictions may occur "behind the back" of
> > >>   the entity wanting to do the accesses,
> > >> - explore whether doing the extra checking at p2m type
> > >>   change time wouldn't be better than at the time of access.
> > >
> > > Maybe this use-case is different, but how does introspection handle
> > > accesses to the shared info page or the runstate info for example?
> > >
> > > I would consider argo to be the same in this regard.
> >
> > Not exactly: The shared info page is special in any event. For
> > runstate info (and alike - there's also struct vcpu_time_info)
> > I'd question correctness of the current handling. If that's
> > wrong already, I'd prefer if the issue wasn't spread.
>
> There are also grants, which, when used together with another guest on
> the same host, could allow bypassing introspection AFAICT? (unless
> there's some policy applied that limits grant sharing to trusted
> domains)
>
> TBH I'm not sure how to handle hypervisor accesses with
> introspection. My knowledge of introspection is fairly limited, but
> it pauses the guest and sends a notification to an in-guest agent. I'm
> not sure this is applicable to hypervisor writes, since it's not
> possible to pause hypervisor execution and wait for a response from a
> guest agent.
>

Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - in fact that is relied on as a feature when the
emulator in Xen is used to continue guest execution, because the
hypervisor ignores the EPT memory permissions that trip the guest for
introspection. So neither the hypervisor accessing memory nor a
grant-shared page being accessed in another domain is a problem for
introspection.
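
To make that distinction concrete, here is a small self-contained C
sketch of the behaviour described above. It is a toy model with
made-up names (monitored_page, guest_write, hypervisor_write), not
Xen code: a guest write to a write-monitored page is reported, while
a write performed by the hypervisor goes straight to the memory.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Toy model of a write-monitored guest page. */
    struct monitored_page {
        unsigned char data[16];
        bool write_monitored;   /* EPT write permission pulled for the guest */
    };

    /* Guest-initiated write: subject to the monitoring policy. */
    static void guest_write(struct monitored_page *pg, size_t off,
                            unsigned char v)
    {
        if ( pg->write_monitored )
            /* In real Xen this would be a vm_event sent to the monitor. */
            printf("monitor event: guest write at offset %zu\n", off);
        pg->data[off] = v;
    }

    /* Hypervisor-initiated write (e.g. an argo ring update): not monitored. */
    static void hypervisor_write(struct monitored_page *pg, size_t off,
                                 unsigned char v)
    {
        pg->data[off] = v;  /* no event; EPT permissions are not consulted */
    }

    int main(void)
    {
        struct monitored_page pg = { .write_monitored = true };

        guest_write(&pg, 0, 0xab);      /* reported to the introspection app */
        hypervisor_write(&pg, 1, 0xcd); /* silent, as described above */
        return 0;
    }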

Tamas


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-07 Thread Roger Pau Monné
Adding the introspection guys.

On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:
> >>> On 04.01.19 at 16:35,  wrote:
> > On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.19 at 09:57,  wrote:
> >> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> >> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  
> >> >> wrote:
> >> >> >
> >> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> >> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 
> >> >> > >  
> > wrote:
> >> >> > > >
> >> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> >> >> > Then I wonder why you need such a check in any case if the code can
> >> >> > handle such cases, more so given that the check itself is racy.
> >> >> 
> >> >> OK, so at the root of the question here is: does it matter what the p2m
> >> >> type of the memory is at these points:
> >> >> 
> >> >> 1) when the gfn is translated to mfn, at the time of ring registration
> >> > 
> >> > This is the important check, because that's where you should take a
> >> > reference to the page. In this case you should check that the page is
> >> > of ram_rw type.
> >> > 
> >> >> 2) when the hypervisor writes into guest memory:
> >> >> - where the tx_ptr index is initialized in the register op
> >> >> - where ringbuf data is written in sendv
> >> >> - where ring description data is written in notify
> >> > 
> >> > As long as you keep a reference to the pages that are part of the ring
> >> > you don't need to do any checks when writing/reading from them. If the
> >> > guest messes up its p2m and does change the gfn -> mfn mappings for
> >> > pages that are part of the ring that's the guest problem, the
> >> > hypervisor still has a reference to those pages so they won't be
> >> > reused.
> >> 
> >> For use cases like introspection this may not be fully correct,
> >> but it may also be that my understanding there isn't fully
> >> correct. If introspection agents care about _any_ writes to
> >> a page, hypervisor ones (which in most cases are merely
> >> writes on behalf of the guest) might matter as well. I think
> >> to decide whether page accesses need to be accompanied
> >> by any checks (and if so, which ones) one needs to
> >> - establish what p2m type transitions are possible for a
> >>   given page,
> >> - verify what restrictions may occur "behind the back" of
> >>   the entity wanting to do the accesses,
> >> - explore whether doing the extra checking at p2m type
> >>   change time wouldn't be better than at the time of access.
> > 
> > Maybe this use-case is different, but how does introspection handle
> > accesses to the shared info page or the runstate info for example?
> > 
> > I would consider argo to be the same in this regard.
> 
> Not exactly: The shared info page is special in any event. For
> runstate info (and alike - there's also struct vcpu_time_info)
> I'd question correctness of the current handling. If that's
> wrong already, I'd prefer if the issue wasn't spread.

There are also grants, which, when used together with another guest on
the same host, could allow bypassing introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection. My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.

Roger.


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-04 Thread Jan Beulich
>>> On 04.01.19 at 16:35,  wrote:
> On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
>> >>> On 04.01.19 at 09:57,  wrote:
>> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
>> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  
>> >> wrote:
>> >> >
>> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
>> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 
>> >> > >  
> wrote:
>> >> > > >
>> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> >> > Then I wonder why you need such a check in any case if the code can
>> >> > handle such cases, more so given that the check itself is racy.
>> >> 
>> >> OK, so at the root of the question here is: does it matter what the p2m
>> >> type of the memory is at these points:
>> >> 
>> >> 1) when the gfn is translated to mfn, at the time of ring registration
>> > 
>> > This is the important check, because that's where you should take a
>> > reference to the page. In this case you should check that the page is
>> > of ram_rw type.
>> > 
>> >> 2) when the hypervisor writes into guest memory:
>> >> - where the tx_ptr index is initialized in the register op
>> >> - where ringbuf data is written in sendv
>> >> - where ring description data is written in notify
>> > 
>> > As long as you keep a reference to the pages that are part of the ring
>> > you don't need to do any checks when writing/reading from them. If the
>> > guest messes up its p2m and does change the gfn -> mfn mappings for
>> > pages that are part of the ring that's the guest problem, the
>> > hypervisor still has a reference to those pages so they won't be
>> > reused.
>> 
>> For use cases like introspection this may not be fully correct,
>> but it may also be that my understanding there isn't fully
>> correct. If introspection agents care about _any_ writes to
>> a page, hypervisor ones (which in most cases are merely
>> writes on behalf of the guest) might matter as well. I think
>> to decide whether page accesses need to be accompanied
>> by any checks (and if so, which ones) one needs to
>> - establish what p2m type transitions are possible for a
>>   given page,
>> - verify what restrictions may occur "behind the back" of
>>   the entity wanting to do the accesses,
>> - explore whether doing the extra checking at p2m type
>>   change time wouldn't be better than at the time of access.
> 
> Maybe this use-case is different, but how does introspection handle
> accesses to the shared info page or the runstate info for example?
> 
> I would consider argo to be the same in this regard.

Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.

Jan



Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-04 Thread Roger Pau Monné
On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:
> >>> On 04.01.19 at 09:57,  wrote:
> > On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> >> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  
> >> wrote:
> >> >
> >> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> >> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  
> >> > > wrote:
> >> > > >
> >> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> >> > Then I wonder why you need such a check in any case if the code can
> >> > handle such cases, more so given that the check itself is racy.
> >> 
> >> OK, so at the root of the question here is: does it matter what the p2m
> >> type of the memory is at these points:
> >> 
> >> 1) when the gfn is translated to mfn, at the time of ring registration
> > 
> > This is the important check, because that's where you should take a
> > reference to the page. In this case you should check that the page is
> > of ram_rw type.
> > 
> >> 2) when the hypervisor writes into guest memory:
> >> - where the tx_ptr index is initialized in the register op
> >> - where ringbuf data is written in sendv
> >> - where ring description data is written in notify
> > 
> > As long as you keep a reference to the pages that are part of the ring
> > you don't need to do any checks when writing/reading from them. If the
> > guest messes up its p2m and does change the gfn -> mfn mappings for
> > pages that are part of the ring that's the guest problem, the
> > hypervisor still has a reference to those pages so they won't be
> > reused.
> 
> For use cases like introspection this may not be fully correct,
> but it may also be that my understanding there isn't fully
> correct. If introspection agents care about _any_ writes to
> a page, hypervisor ones (which in most cases are merely
> writes on behalf of the guest) might matter as well. I think
> to decide whether page accesses need to be accompanied
> by any checks (and if so, which ones) one needs to
> - establish what p2m type transitions are possible for a
>   given page,
> - verify what restrictions may occur "behind the back" of
>   the entity wanting to do the accesses,
> - explore whether doing the extra checking at p2m type
>   change time wouldn't be better than at the time of access.

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.

Roger.


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-04 Thread Jan Beulich
>>> On 04.01.19 at 09:57,  wrote:
> On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
>> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  wrote:
>> >
>> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
>> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  
>> > > wrote:
>> > > >
>> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> > Then I wonder why you need such a check in any case if the code can
>> > handle such cases, more so given that the check itself is racy.
>> 
>> OK, so at the root of the question here is: does it matter what the p2m
>> type of the memory is at these points:
>> 
>> 1) when the gfn is translated to mfn, at the time of ring registration
> 
> This is the important check, because that's where you should take a
> reference to the page. In this case you should check that the page is
> of ram_rw type.
> 
>> 2) when the hypervisor writes into guest memory:
>> - where the tx_ptr index is initialized in the register op
>> - where ringbuf data is written in sendv
>> - where ring description data is written in notify
> 
> As long as you keep a reference to the pages that are part of the ring
> you don't need to do any checks when writing/reading from them. If the
> guest messes up its p2m and does change the gfn -> mfn mappings for
> pages that are part of the ring that's the guest problem, the
> hypervisor still has a reference to those pages so they won't be
> reused.

For use cases like introspection this may not be fully correct,
but it may also be that my understanding there isn't fully
correct. If introspection agents care about _any_ writes to
a page, hypervisor ones (which in most cases are merely
writes on behalf of the guest) might matter as well. I think
to decide whether page accesses need to be accompanied
by any checks (and if so, which ones) one needs to
- establish what p2m type transitions are possible for a
  given page,
- verify what restrictions may occur "behind the back" of
  the entity wanting to do the accesses,
- explore whether doing the extra checking at p2m type
  change time wouldn't be better than at the time of access.

Jan



Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-04 Thread Roger Pau Monné
On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:
> On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  wrote:
> >
> > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  
> > > wrote:
> > > >
> > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > Then I wonder why you need such a check in any case if the code can
> > handle such cases, more so given that the check itself is racy.
> 
> OK, so at the root of the question here is: does it matter what the p2m
> type of the memory is at these points:
> 
> 1) when the gfn is translated to mfn, at the time of ring registration

This is the important check, because that's where you should take a
reference to the page. In this case you should check that the page is
of ram_rw type.

> 2) when the hypervisor writes into guest memory:
> - where the tx_ptr index is initialized in the register op
> - where ringbuf data is written in sendv
> - where ring description data is written in notify

As long as you keep a reference to the pages that are part of the ring
you don't need to do any checks when writing/reading from them. If the
guest messes up its p2m and does change the gfn -> mfn mappings for
pages that are part of the ring that's the guest problem, the
hypervisor still has a reference to those pages so they won't be
reused.

> or is having PGT_writable_page type and ownership by the domain
> sufficient?
> 
> For 1), I think there's some use in saying no to a guest that has
> supplied a region that appears misconfigured.

Sure, when you set up the ring in Xen you should execute all the checks
and make sure the ring pages are of type ram_rw and take a reference
to each page.

> For 2), input would be appreciated. It currently works under the
> assumption that a p2m type check is unnecessary, which is why the
> put_gfn is where it is.

Right, and this should go away since it's confusing and unnecessary
AFAICT.

> For further background context, here's my understanding of this section:
> 
> When the guest invokes the hypercall operation to register a ring, it
> identifies the memory that it owns and wants the hypervisor to use by
> supplying an array of gfns (or in v2, addresses which are shifted to
> extract their gfns).
> 
> The hypervisor translates from gfns to mfns, using the translation that
> exists at that time, and then refers to that memory internally by mfn
> from there on out. This find_ring_mfn function is where the gfn->mfn
> translation happens.  (The variable name does need renaming from pfn,
> as you noted - thanks.)
> 
> To do the translation from gfn to mfn, (on x86) it's using
> get_gfn_unshare. That's doing three things:
> * returns the mfn, if there is one.
> * returns the p2m type of that memory.
> * acquires a reference to that gfn, which needs to be dropped at some
>   point.
> 
> The p2m type check on the gfn in find_ring_mfn at that time is
> possibly conservative, rejecting more types than perhaps it needs to,
> but the type that it accepts (p2m_ram_rw) is sane. It is a validation of
> the p2m type at that instant, intended to detect if the guest has
> supplied memory to the ring register op that does not make sense for it
> to use as a ring, as indicated by the current p2m type, and if so, fail
> early, or indicate that a retry later is needed.
> 
> Then the get_page_and_type call is where the memory identified by the
> mfn that was just obtained, gets locked to PGT_writable_page type, and
> ownership fixed to its current owner domain, by adding to its reference
> count.

Here you should make sure the get_page is performed while the gfn is
still locked, and once you have a reference to the page you can unlock
the gfn. As said above, if the guest then wants to mess with the
gfn -> mfn mapping that's fine, the hypervisor already has a
reference to the underlying original memory page.
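
To make the suggested ordering concrete, the registration-time lookup
would then be shaped roughly like the sketch below. This is only an
illustration built from the helpers already used in the quoted patch
(x86 path only, simplified error handling), not the actual patch code:

    static int find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
    {
        p2m_type_t p2mt;
        int ret = 0;

        /* Look up the current translation; this locks the gfn on x86. */
        *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);

        if ( !mfn_valid(*mfn) )
            ret = -EINVAL;
        else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
            ret = -EAGAIN;
        else if ( (p2mt != p2m_ram_rw) ||
                  /* Take the page reference while the gfn is still locked. */
                  !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
            ret = -EINVAL;

        /*
         * Only now drop the gfn lock: the page reference taken above is
         * what keeps the underlying memory from being freed or reused, so
         * later writes don't need to re-check the gfn -> mfn relation.
         */
        put_gfn(d, gfn_x(gfn));

        return ret;
    }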

> Then the gfn reference count is dropped with the put_gfn call. This
> means that the guest can elect to change the p2m type afterwards, if it
> wants; (any change needs to be consistent with its domain ownership and
> PGT_writable_page type though -- not sure if that constrains possible types).

Hm, not really: get_gfn / put_gfn doesn't actually take references, but
rather picks up the p2m lock. What takes references is get_page.

> That memory can have guest-supplied data written into it, either by the
> domain owning the page itself, or in response to argo sendv operations
> by other domains that are authorized to transmit into the ring.
> 
> Your note that the "check itself is racy": ie. that a change of p2m type
> could occur immediately afterwards is true.
> 
> So: Do you think that a check on the current p2m type of the pages in
> the ring is needed at the points where the hypervisor issues writes into
> that ring memory?

No, as long as you have a reference to the page (note: not the gfn)
that you are writing to, it should be fine.
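
In other words, the hypervisor-side write path can then be as simple
as the sketch below (assuming the mfn was looked up and referenced at
registration time as discussed above; map_domain_page() and
unmap_domain_page() are the standard Xen helpers, and the caller is
assumed to have validated off/len against the ring geometry):

    /* Copy data into one page of a registered ring; no p2m check needed. */
    static void ring_page_write(mfn_t mfn, unsigned int off,
                                const void *src, unsigned int len)
    {
        uint8_t *dst = map_domain_page(mfn);    /* map by mfn, not by gfn */

        memcpy(dst + off, src, len);
        unmap_domain_page(dst);
    }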

Roger.

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-21 Thread Christopher Clark
On Fri, Dec 21, 2018 at 12:53 AM Jan Beulich  wrote:
>
> >>> On 21.12.18 at 09:16,  wrote:
> > On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich  wrote:
> >>
> >> >>> On 21.12.18 at 02:25,  wrote:
> >> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich  wrote:
> >> >>
> >> >> >>> On 20.12.18 at 06:29,  wrote:
> >> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
> >> >> >>
> >> >> >> > +static int
> >> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info 
> >> >> >> > *ring_info,
> >> >> >> > +uint32_t npage, 
> >> >> >> > XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> >> >> > +uint32_t len)
> >> >> >> > +{
> >> >> >> > +int i;
> >> >> >> > +int ret = 0;
> >> >> >> > +
> >> >> >> > +if ( (npage << PAGE_SHIFT) < len )
> >> >> >> > +return -EINVAL;
> >> >> >> > +
> >> >> >> > +if ( ring_info->mfns )
> >> >> >> > +{
> >> >> >> > +/*
> >> >> >> > + * Ring already existed. Check if it's the same ring,
> >> >> >> > + * i.e. same number of pages and all translated gpfns 
> >> >> >> > still
> >> >> >> > + * translating to the same mfns
> >> >> >> > + */
> >> >> >>
> >> >> >> This comment makes me wonder whether the translations are
> >> >> >> permitted to change at other times. If so I'm not sure what
> >> >> >> value verification here has. If not, this probably would want to
> >> >> >> be debugging-only code.
> >> >> >
> >> >> > My understanding is that the gfn->mfn translation is not necessarily 
> >> >> > stable
> >> >> > across entry and exit from host power state S4, suspend to disk.
>
> Note this ^^^ (and see below).
>
> >> Now I'm afraid there's some confusion here: Originally you've
> >> said "host".
> >>
> >> >> How would that be? It's not stable across guest migration (or
> >> >> its non-live save/restore equivalent),
> >> >
> >> > Right, that's clear.
> >> >
> >> >> but how would things change across S3?
> >> >
> >> > I don't think that they do change in that case.
> >> >
> >> > From studying the code involved above, a related item: the guest runs 
> >> > the same
> >> > suspend and resume kernel code before entering into/exiting from either 
> >> > guest
> >> > S3 or S4, so the guest kernel resume code needs to re-register the 
> >> > rings, to
> >> > cover the case where it is coming up in an environment where they were 
> >> > dropped
> >> > - so that's what it does.
> >> >
> >> > This relates to the code section above: if guest entry to S3 is aborted 
> >> > at the
> >> > final step (eg. error or platform refuses, eg. maybe a physical device
> >> > interaction with passthrough) then the hypervisor has not torn down the 
> >> > rings,
> >> > the guest remains running within the same domain, and the guest resume 
> >> > logic
> >> > runs, which runs through re-registration for all its rings. The check in 
> >> > the
> >> > logic above allows the existing ring mappings within the hypervisor to be
> >> > preserved.
> >>
> >> Yet now you suddenly talk about guest S3.
> >
> > Well, the context is that you did just ask about S3, without
> > specifying host or guest.
>
> I'm sorry to be picky, but no, I don't think I did. You did explicitly
> say "host", making me further think only about that case.

OK, apologies for the confusing direction of the reply. It was not intended
to be so.

> > That logic aims to make ring registration idempotent, to avoid the
> > teardown of established mappings of the ring pages in the case where
> > doing so isn't needed.
>
> You treat complexity in the kernel for complexity in the hypervisor.

(s/treat/trade/ ?) OK, that is a fair concern, yes.

> I'm not sure this is appropriate, as I can't judge how much more
> difficult it would be for the guest to look after itself. But let's look
> at both cases again:
> - For guest S3, afaik, the domain doesn't change, and hence
>   memory assignment remains the same. No re-registration
>   necessary then afaict.
> - For guest S4, aiui, the domain gets destroyed and a new one
>   built upon resume. Re-registration would be needed, but due
>   to the domain re-construction no leftovers ought to exist in
>   Xen.

I agree.

> Hence to me it would seem more natural to have the guest deal
> with the situation, and have no extra logic for this in Xen. You
> want the guest to re-register anyway, yet simply avoiding doing
> so in the S3 case ought to be a single (or very few)
> conditional(s), i.e. not a whole lot of complexity.

OK. That looks doable. thanks.

Christopher


Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-21 Thread Christopher Clark
On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  wrote:
>
> On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  
> > wrote:
> > >
> > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > > +static inline uint16_t
> > > > +argo_hash_fn(const struct argo_ring_id *id)
> > >
> > > No need for the argo_ prefix for static functions, this is already an
> > > argo specific file.
> >
> > Although the compiler could live without the prefix, I'm finding it helpful 
> > to
> > very easily determine that functions being used are not defined elsewhere
> > within Xen; so I've left the prefix as is for version two of this series.
>
> Why do you care whether they are defined elsewhere in Xen? The scope
> of static functions is limited to the translation unit anyway.

ok, I'll remove the prefixes - you're right that I shouldn't care
whether they are defined elsewhere in Xen, and Jan's points about the
string table expansion and serial line bandwidth are true - I had not
considered those. Would adding a note to describe this reasoning to
the CODING_STYLE document be welcome?

> > > > +#else
> > > > +*mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > > > +#endif
> > > > +
> > > > +if ( !mfn_valid(*mfn) )
> > > > +ret = -EINVAL;
> > > > +#ifdef CONFIG_X86
> > > > +else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > > > +ret = -EAGAIN;
> > > > +#endif
> > > > +else if ( (p2mt != p2m_ram_rw) ||
> > > > +  !get_page_and_type(mfn_to_page(*mfn), d, 
> > > > PGT_writable_page) )
> > > > +ret = -EINVAL;
> > > > +
> > > > +#ifdef CONFIG_X86
> > > > +put_gfn(d, pfn);
> > >
> > > If you do this put_gfn here, by the time you check that the gfn -> mfn
> > > matches your expectations the guest might have somehow changed the gfn
> > > -> mfn mapping already (for example by ballooning down memory?)
> >
> > If the guest does that, I think it only harms itself. If for some reason
> > a memory access is denied, then the op would just fail. I don't think
> > there's a more serious consequence to be worried about.
>
Then I wonder why you need such a check in any case if the code can
handle such cases, more so given that the check itself is racy.

OK, so at the root of the question here is: does it matter what the p2m
type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration

2) when the hypervisor writes into guest memory:
- where the tx_ptr index is initialized in the register op
- where ringbuf data is written in sendv
- where ring description data is written in notify

or is having PGT_writable_page type and ownership by the domain
sufficient?

For 1), I think there's some use in saying no to a guest that has
supplied a region that appears misconfigured.

For 2), input would be appreciated. It currently works under the
assumption that a p2m type check is unnecessary, which is why the
put_gfn is where it is.

For further background context, here's my understanding of this section:

When the guest invokes the hypercall operation to register a ring, it
identifies the memory that it owns and wants the hypervisor to use by
supplying an array of gfns (or in v2, addresses which are shifted to
extract their gfns).

The hypervisor translates from gfns to mfns, using the translation that
exists at that time, and then refers to that memory internally by mfn
from there on out. This find_ring_mfn function is where the gfn->mfn
translation happens.  (The variable name does need renaming from pfn,
as you noted - thanks.)

To do the translation from gfn to mfn, (on x86) it's using
get_gfn_unshare. That's doing three things:
* returns the mfn, if there is one.
* returns the p2m type of that memory.
* acquires a reference to that gfn, which needs to be dropped at some
  point.

The p2m type check on the gfn in find_ring_mfn at that time is
possibly conservative, rejecting more types than perhaps it needs to,
but the type that it accepts (p2m_ram_rw) is sane. It is a validation of
the p2m type at that instant, intended to detect if the guest has
supplied memory to the ring register op that does not make sense for it
to use as a ring, as indicated by the current p2m type, and if so, fail
early, or indicate that a retry later is needed.

Then the get_page_and_type call is where the memory identified by the
mfn that was just obtained, gets locked to PGT_writable_page type, and
ownership fixed to its current owner domain, by adding to its reference
count.

Then the gfn reference count is dropped with the put_gfn call. This
means that the guest can elect to change the p2m type afterwards, if it
wants; (any change needs to be consistent with its domain ownership and
PGT_writable_page type though -- not sure if that constrains possible types).

That memory can have guest-supplied data written into it, either by the
domain owning the page itself, or in response to argo sendv operations
by other domains that are authorized to transmit into the ring.

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-21 Thread Jan Beulich
>>> On 21.12.18 at 09:16,  wrote:
> On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich  wrote:
>>
>> >>> On 21.12.18 at 02:25,  wrote:
>> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich  wrote:
>> >>
>> >> >>> On 20.12.18 at 06:29,  wrote:
>> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
>> >> >>
>> >> >> > +static int
>> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info 
>> >> >> > *ring_info,
>> >> >> > +uint32_t npage, 
>> >> >> > XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> >> >> > +uint32_t len)
>> >> >> > +{
>> >> >> > +int i;
>> >> >> > +int ret = 0;
>> >> >> > +
>> >> >> > +if ( (npage << PAGE_SHIFT) < len )
>> >> >> > +return -EINVAL;
>> >> >> > +
>> >> >> > +if ( ring_info->mfns )
>> >> >> > +{
>> >> >> > +/*
>> >> >> > + * Ring already existed. Check if it's the same ring,
>> >> >> > + * i.e. same number of pages and all translated gpfns still
>> >> >> > + * translating to the same mfns
>> >> >> > + */
>> >> >>
>> >> >> This comment makes me wonder whether the translations are
>> >> >> permitted to change at other times. If so I'm not sure what
>> >> >> value verification here has. If not, this probably would want to
>> >> >> be debugging-only code.
>> >> >
>> >> > My understanding is that the gfn->mfn translation is not necessarily 
>> >> > stable
>> >> > across entry and exit from host power state S4, suspend to disk.

Note this ^^^ (and see below).

>> Now I'm afraid there's some confusion here: Originally you've
>> said "host".
>>
>> >> How would that be? It's not stable across guest migration (or
>> >> its non-live save/restore equivalent),
>> >
>> > Right, that's clear.
>> >
>> >> but how would things change across S3?
>> >
>> > I don't think that they do change in that case.
>> >
>> > From studying the code involved above, a related item: the guest runs the 
>> > same
>> > suspend and resume kernel code before entering into/exiting from either 
>> > guest
>> > S3 or S4, so the guest kernel resume code needs to re-register the rings, 
>> > to
>> > cover the case where it is coming up in an environment where they were 
>> > dropped
>> > - so that's what it does.
>> >
>> > This relates to the code section above: if guest entry to S3 is aborted at 
>> > the
>> > final step (eg. error or platform refuses, eg. maybe a physical device
>> > interaction with passthrough) then the hypervisor has not torn down the 
>> > rings,
>> > the guest remains running within the same domain, and the guest resume 
>> > logic
>> > runs, which runs through re-registration for all its rings. The check in 
>> > the
>> > logic above allows the existing ring mappings within the hypervisor to be
>> > preserved.
>>
>> Yet now you suddenly talk about guest S3.
> 
> Well, the context is that you did just ask about S3, without
> specifying host or guest.

I'm sorry to be picky, but no, I don't think I did. You did explicitly
say "host", making me further think only about that case.

> Host S3 doesn't involve much at all, so I
> went and studied the code in both the Linux driver and the hypervisor
> to determine what it does in the case of guest S3, and then replied
> with the above since it is relevant to the code in question. I hope I
> was clear about referring to guest S3 above in my last reply.
> 
> That logic aims to make ring registration idempotent, to avoid the
> teardown of established mappings of the ring pages in the case where
> doing so isn't needed.

You treat complexity in the kernel for complexity in the hypervisor.
I'm not sure this is appropriate, as I can't judge how much more
difficult it would be for the guest to look after itself. But let's look
at both cases again:
- For guest S3, afaik, the domain doesn't change, and hence
  memory assignment remains the same. No re-registration
  necessary then afaict.
- For guest S4, aiui, the domain gets destroyed and a new one
  built upon resume. Re-registration would be needed, but due
  to the domain re-construction no leftovers ought to exist in
  Xen.
Hence to me it would seem more natural to have the guest deal
with the situation, and have no extra logic for this in Xen. You
want the guest to re-register anyway, yet simply avoiding doing
so in the S3 case ought to be a single (or very few)
conditional(s), i.e. not a whole lot of complexity.
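
Something along those lines in the guest driver's resume path would
be enough; a minimal sketch, with made-up names standing in for
whatever the Linux driver actually uses (argo_reregister_rings() and
the from_hibernate flag are hypothetical):

    /*
     * Guest resume hook: only guest S4 comes back in a freshly built
     * domain without hypervisor-side ring state, so skip the register
     * hypercalls when resuming from guest S3.
     */
    static void argo_pm_resume(bool from_hibernate /* true for S4 */)
    {
        if ( !from_hibernate )
            return;                 /* guest S3: rings are still registered */

        argo_reregister_rings();    /* hypothetical: redo the register ops */
    }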

Jan



Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-21 Thread Christopher Clark
On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich  wrote:
>
> >>> On 21.12.18 at 02:25,  wrote:
> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich  wrote:
> >>
> >> >>> On 20.12.18 at 06:29,  wrote:
> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
> >> >>
> >> >> > +static int
> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> >> >> > +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> >> > +uint32_t len)
> >> >> > +{
> >> >> > +int i;
> >> >> > +int ret = 0;
> >> >> > +
> >> >> > +if ( (npage << PAGE_SHIFT) < len )
> >> >> > +return -EINVAL;
> >> >> > +
> >> >> > +if ( ring_info->mfns )
> >> >> > +{
> >> >> > +/*
> >> >> > + * Ring already existed. Check if it's the same ring,
> >> >> > + * i.e. same number of pages and all translated gpfns still
> >> >> > + * translating to the same mfns
> >> >> > + */
> >> >>
> >> >> This comment makes me wonder whether the translations are
> >> >> permitted to change at other times. If so I'm not sure what
> >> >> value verification here has. If not, this probably would want to
> >> >> be debugging-only code.
> >> >
> >> > My understanding is that the gfn->mfn translation is not necessarily stable
> >> > across entry and exit from host power state S4, suspend to disk.
>
> Now I'm afraid there's some confusion here: Originally you've
> said "host".
>
> >> How would that be? It's not stable across guest migration (or
> >> its non-live save/restore equivalent),
> >
> > Right, that's clear.
> >
> >> but how would things change across S3?
> >
> > I don't think that they do change in that case.
> >
> > From studying the code involved above, a related item: the guest runs the same
> > suspend and resume kernel code before entering into/exiting from either guest
> > S3 or S4, so the guest kernel resume code needs to re-register the rings, to
> > cover the case where it is coming up in an environment where they were dropped
> > - so that's what it does.
> >
> > This relates to the code section above: if guest entry to S3 is aborted at the
> > final step (eg. error or platform refuses, eg. maybe a physical device
> > interaction with passthrough) then the hypervisor has not torn down the rings,
> > the guest remains running within the same domain, and the guest resume logic
> > runs, which runs through re-registration for all its rings. The check in the
> > logic above allows the existing ring mappings within the hypervisor to be
> > preserved.
>
> Yet now you suddenly talk about guest S3.

Well, the context is that you did just ask about S3, without
specifying host or guest. Host S3 doesn't involve much at all, so I
went and studied the code in both the Linux driver and the hypervisor
to determine what it does in the case of guest S3, and then replied
with the above since it is relevant to the code in question. I hope I
was clear about referring to guest S3 above in my last reply.

That logic aims to make ring registration idempotent, to avoid the
teardown of established mappings of the ring pages in the case where
doing so isn't needed.

> >> And there's no support for S4 (and I can't see it appearing any time soon).
> >
> > OK. oh well.
>
> Considering the original "host" context, my response here was
> relating to host S4. Guest S4 ought to be functional (as being
> mostly a guest kernel function anyway).

ack.

Christopher

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-20 Thread Jan Beulich
>>> On 21.12.18 at 02:25,  wrote:
> On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich  wrote:
>>
>> >>> On 20.12.18 at 06:29,  wrote:
>> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
>> >>
>> >> > +static int
>> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
>> >> > +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> >> > +uint32_t len)
>> >> > +{
>> >> > +int i;
>> >> > +int ret = 0;
>> >> > +
>> >> > +if ( (npage << PAGE_SHIFT) < len )
>> >> > +return -EINVAL;
>> >> > +
>> >> > +if ( ring_info->mfns )
>> >> > +{
>> >> > +/*
>> >> > + * Ring already existed. Check if it's the same ring,
>> >> > + * i.e. same number of pages and all translated gpfns still
>> >> > + * translating to the same mfns
>> >> > + */
>> >>
>> >> This comment makes me wonder whether the translations are
>> >> permitted to change at other times. If so I'm not sure what
>> >> value verification here has. If not, this probably would want to
>> >> be debugging-only code.
>> >
>> > My understanding is that the gfn->mfn translation is not necessarily stable
>> > across entry and exit from host power state S4, suspend to disk.

Now I'm afraid there's some confusion here: Originally you've
said "host".

>> How would that be? It's not stable across guest migration (or
>> its non-live save/restore equivalent),
> 
> Right, that's clear.
> 
>> but how would things change across S3?
> 
> I don't think that they do change in that case.
> 
> From studying the code involved above, a related item: the guest runs the same
> suspend and resume kernel code before entering into/exiting from either guest
> S3 or S4, so the guest kernel resume code needs to re-register the rings, to
> cover the case where it is coming up in an environment where they were dropped
> - so that's what it does.
> 
> This relates to the code section above: if guest entry to S3 is aborted at the
> final step (eg. error or platform refuses, eg. maybe a physical device
> interaction with passthrough) then the hypervisor has not torn down the rings,
> the guest remains running within the same domain, and the guest resume logic
> runs, which runs through re-registration for all its rings. The check in the
> logic above allows the existing ring mappings within the hypervisor to be
> preserved.

Yet now you suddenly talk about guest S3.

>> And there's no support for S4 (and I can't see it appearing any time soon).
> 
> OK. oh well.

Considering the original "host" context, my response here was
relating to host S4. Guest S4 ought to be functional (as being
mostly a guest kernel function anyway).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-20 Thread Christopher Clark
On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich  wrote:
>
> >>> On 20.12.18 at 06:29,  wrote:
> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
> >>
> >> > +static int
> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> >> > +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> >> > +uint32_t len)
> >> > +{
> >> > +int i;
> >> > +int ret = 0;
> >> > +
> >> > +if ( (npage << PAGE_SHIFT) < len )
> >> > +return -EINVAL;
> >> > +
> >> > +if ( ring_info->mfns )
> >> > +{
> >> > +/*
> >> > + * Ring already existed. Check if it's the same ring,
> >> > + * i.e. same number of pages and all translated gpfns still
> >> > + * translating to the same mfns
> >> > + */
> >>
> >> This comment makes me wonder whether the translations are
> >> permitted to change at other times. If so I'm not sure what
> >> value verification here has. If not, this probably would want to
> >> be debugging-only code.
> >
> > My understanding is that the gfn->mfn translation is not necessarily stable
> > across entry and exit from host power state S4, suspend to disk.
>
> How would that be? It's not stable across guest migration (or
> its non-live save/restore equivalent),

Right, that's clear.

> but how would things change across S3?

I don't think that they do change in that case.

From studying the code involved above, a related item: the guest runs the same
suspend and resume kernel code before entering into/exiting from either guest
S3 or S4, so the guest kernel resume code needs to re-register the rings, to
cover the case where it is coming up in an environment where they were dropped
- so that's what it does.

This relates to the code section above: if guest entry to S3 is aborted at the
final step (eg. error or platform refuses, eg. maybe a physical device
interaction with passthrough) then the hypervisor has not torn down the rings,
the guest remains running within the same domain, and the guest resume logic
runs, which runs through re-registration for all its rings. The check in the
logic above allows the existing ring mappings within the hypervisor to be
preserved.

I'm not certain that is an enormous win though; it looks like it would be ok
to drop that logic and reestablish the mappings as the ring is used, as per
other cases.

> And there's no support for S4 (and I can't see it appearing any time soon).

OK. oh well.

>
> >> > +static struct argo_ring_info *
> >> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id 
> >> > *id)
> >> > +{
> >> > +uint16_t hash;
> >> > +struct hlist_node *node;
> >>
> >> const?
> >
> > I couldn't determine exactly what you were pointing towards with this one.
> > I've applied 'const' in a lot more places in the next version; please
> > let me know if I've missed where you intended.
>
> This is a pretty general rule: const should be applied to pointer
> target types whenever no modification is intended, to make
> this read-only aspect very obvious (and force people to think
> twice if they alter such a property).
>
> >> > +uint64_t dst_domain_cookie = 0;
> >> > +
> >> > +if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> >> > +return -EINVAL;
> >>
> >> Why? You don't store the handle for later use (and you shouldn't).
> >> If there really is a need for a full page's worth of memory, it
> >> would better be passed in as GFN.
> >
> > I've added this comment for this behaviour in v2:
> >
> > +/*
> > + * Verify the alignment of the ring data structure supplied with the
> > + * understanding that the ring handle supplied points to the same memory as
> > + * the first entry in the array of pages provided via pg_descr_hnd, where
> > + * the head of the ring will reside.
> > + * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
> > + * fixed offset from head of the first page in the mfn array.
> > + */
>
> Well, this then suggests that you don't want to verify alignment,
> but instead you want to verify addresses match.

ack. I'll take a look at doing that.

>
> >> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, 
> >> > XEN_GUEST_HANDLE_PARAM(void) arg1,
> >> >
> >> >  switch (cmd)
> >> >  {
> >> > +case ARGO_MESSAGE_OP_register_ring:
> >> > +{
> >> > +XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> >> > +guest_handle_cast(arg1, argo_ring_t);
> >> > +XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> >> > +guest_handle_cast(arg2, argo_pfn_t);
> >> > +uint32_t npage = arg3;
> >> > +bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> >> > +
> >> > +if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> >> > +break;
> >>
> >> I don't understand the need for this and ...
> >>
> >> > +if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-20 Thread Roger Pau Monné
On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:
> On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  wrote:
> >
> > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > > +static inline uint16_t
> > > +argo_hash_fn(const struct argo_ring_id *id)
> >
> > No need for the argo_ prefix for static functions, this is already an
> > argo specific file.
> 
> Although the compiler could live without the prefix, I'm finding it helpful to
> very easily determine that functions being used are not defined elsewhere
> within Xen; so I've left the prefix as is for version two of this series.

Why do you care whether they are defined elsewhere in Xen? The scope
of static functions is limited to the translation unit anyway.

> > > +*mfn = get_gfn_unshare(d, pfn, &p2mt);
> >
> > Is this supposed to work for PV guests?
> 
> Yes -- and they seem to work OK. Am I missing something?

No, my fault, this should indeed work for both paging and non paging
assisted guests, sorry for the noise.

> > > +#else
> > > +*mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > > +#endif
> > > +
> > > +if ( !mfn_valid(*mfn) )
> > > +ret = -EINVAL;
> > > +#ifdef CONFIG_X86
> > > +else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > > +ret = -EAGAIN;
> > > +#endif
> > > +else if ( (p2mt != p2m_ram_rw) ||
> > > +  !get_page_and_type(mfn_to_page(*mfn), d, 
> > > PGT_writable_page) )
> > > +ret = -EINVAL;
> > > +
> > > +#ifdef CONFIG_X86
> > > +put_gfn(d, pfn);
> >
> > If you do this put_gfn here, by the time you check that the gfn -> mfn
> > matches your expectations the guest might have somehow changed the gfn
> > -> mfn mapping already (for example by ballooning down memory?)
> 
> If the guest does that, I think it only harms itself. If for some reason
> a memory access is denied, then the op would just fail. I don't think
> there's a more serious consequence to be worried about.

Then I wonder why you need such a check at all if the code can
handle such cases, all the more so since the check itself is racy.

> Above, if we're going to use the mfn, then we've just done a successful:
> get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)
> 
> which should hold it in a state that we're ok with until we're done
> with it -- see put_page_and_type in argo_ring_remove_mfns.
> 
> > > +/* W(L2) protects all the elements of the domain's ring_info */
> > > +write_lock(&d->argo->lock);
> >
> > I don't understand this W(L2) nomenclature, is this explain somewhere?
> 
> Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
> comment near the top of the file. It's a read-write lock, so 'W' means:
> take the write lock on it.
> 
> > Also there's no such comment when you take the global argo_lock above.
> 
> L2 covers more interesting work than L1, which is why there are more
> comments pertaining to it than L1.

I would add such comments about which locks protect what items to the
declaration of the locks, rather than the usage place. I don't see a
lot of value in the comments there unless they maybe describe an
exception or a corner case, but that might just be my taste.

> > > +/*
> > > + * Messages on the ring are padded to 128 bits
> > > + * Len here refers to the exact length of the data not including the
> > > + * 128 bit header. The message uses
> > > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > > + * Using typeof(a) make clear that this does not truncate any high-order 
> > > bits.
> > > + */
> > > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
> >
> > Why not just use ROUNDUP?
> >
> > And in any case this shouldn't be on the public header IMO, since it's
> > not part of the interface AFAICT.
> 
> Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> because it does need to be in the public header because it's used within the
> Linux device driver, and items in that public Xen header need the 'xen' prefix
> (so they now do).  Within the Linux code, it's used to choose a sensible ring
> size, and also used when manipulating the rx_ptr on the guest side.

I'm quite sure Linux (or any other OS) will have a roundup helper, or
if there's indeed an OS without a roundup helper it should be added to
the generic OS code. There's nothing Xen or ARGO specific in this
roundup helper, hence I see no need to add it to the public header.

I think you should instead:

#define XEN_ARGO_MESSAGE_SIZE 0xf

Or some such and use that value with the OS roundup helper.
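
For illustration, a guest-side sketch along those lines: the Xen header exports
only the slot granularity (expressed here as the 0x10 slot size rather than the
0xf mask, so it fits a power-of-two roundup helper such as Linux's ALIGN() from
linux/kernel.h); the names below are assumptions:

    #define XEN_ARGO_MESSAGE_SLOT_SIZE 0x10   /* messages padded to 16 bytes */

    /* In a Linux guest driver: equivalent to ((len + 0xf) & ~0xf). */
    static inline unsigned int argo_pad_len(unsigned int len)
    {
        return ALIGN(len, XEN_ARGO_MESSAGE_SLOT_SIZE);
    }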

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-20 Thread Jan Beulich
>>> On 20.12.18 at 06:41,  wrote:
> On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  wrote:
>>
>> On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
>> > +static inline uint16_t
>> > +argo_hash_fn(const struct argo_ring_id *id)
>>
>> No need for the argo_ prefix for static functions, this is already an
>> argo specific file.
> 
> Although the compiler could live without the prefix, I'm finding it helpful to
> very easily determine that functions being used are not defined elsewhere
> within Xen; so I've left the prefix as is for version two of this series.

But you realize that this needlessly increases the string table
size as well as the volume of output going over the (normally
low bandwidth) serial line in case of an issue?

>> > +/*
>> > + * Messages on the ring are padded to 128 bits
>> > + * Len here refers to the exact length of the data not including the
>> > + * 128 bit header. The message uses
>> > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
>> > + * Using typeof(a) make clear that this does not truncate any high-order 
>> > bits.
>> > + */
>> > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>>
>> Why not just use ROUNDUP?
>>
>> And in any case this shouldn't be on the public header IMO, since it's
>> not part of the interface AFAICT.
> 
> Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
> because it does need to be in the public header because it's used within the
> Linux device driver, and items in that public Xen header need the 'xen' prefix
> (so they now do).  Within the Linux code, it's used to choose a sensible ring
> size, and also used when manipulating the rx_ptr on the guest side.

It's at least questionable to put constructs into the public header
that the header itself doesn't use, and the naming of which may
not be what consumers would prefer. I can see the "single
central place" point though.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-20 Thread Jan Beulich
>>> On 20.12.18 at 06:29,  wrote:
> On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
>>
>> > +static int
>> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
>> > +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
>> > +uint32_t len)
>> > +{
>> > +int i;
>> > +int ret = 0;
>> > +
>> > +if ( (npage << PAGE_SHIFT) < len )
>> > +return -EINVAL;
>> > +
>> > +if ( ring_info->mfns )
>> > +{
>> > +/*
>> > + * Ring already existed. Check if it's the same ring,
>> > + * i.e. same number of pages and all translated gpfns still
>> > + * translating to the same mfns
>> > + */
>>
>> This comment makes me wonder whether the translations are
>> permitted to change at other times. If so I'm not sure what
>> value verification here has. If not, this probably would want to
>> be debugging-only code.
> 
> My understanding is that the gfn->mfn translation is not necessarily stable
> across entry and exit from host power state S4, suspend to disk.

How would that be? It's not stable across guest migration (or
its non-live save/restore equivalent), but how would things
change across S3? And there's no support for S4 (and I can't
see it appearing any time soon).

>> > +static struct argo_ring_info *
>> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
>> > +{
>> > +uint16_t hash;
>> > +struct hlist_node *node;
>>
>> const?
> 
> I couldn't determine exactly what you were pointing towards with this one.
> I've applied 'const' in a lot more places in the next version; please
> let me know if I've missed where you intended.

This is a pretty general rule: const should be applied to pointer
target types whenever no modification is intended, to make
this read-only aspect very obvious (and force people to think
twice if they alter such a property).
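
A standalone illustration of that rule (not Argo code): the comparison below
only reads through its pointers, so both pointer target types are const and an
accidental write through them fails to compile.

    #include <stdbool.h>

    struct ring_id {
        unsigned int domain_id;
        unsigned int port;
    };

    /* Read-only lookup key comparison: pointed-to objects are never modified. */
    static bool ring_id_equal(const struct ring_id *a, const struct ring_id *b)
    {
        return a->domain_id == b->domain_id && a->port == b->port;
    }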

>> > +uint64_t dst_domain_cookie = 0;
>> > +
>> > +if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
>> > +return -EINVAL;
>>
>> Why? You don't store the handle for later use (and you shouldn't).
>> If there really is a need for a full page's worth of memory, it
>> would better be passed in as GFN.
> 
> I've added this comment for this behaviour in v2:
> 
> +/*
> + * Verify the alignment of the ring data structure supplied with the
> + * understanding that the ring handle supplied points to the same memory as
> + * the first entry in the array of pages provided via pg_descr_hnd, where
> + * the head of the ring will reside.
> + * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
> + * fixed offset from head of the first page in the mfn array.
> + */

Well, this then suggests that you don't want to verify alignment,
but instead you want to verify addresses match.

>> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, 
>> > XEN_GUEST_HANDLE_PARAM(void) arg1,
>> >
>> >  switch (cmd)
>> >  {
>> > +case ARGO_MESSAGE_OP_register_ring:
>> > +{
>> > +XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
>> > +guest_handle_cast(arg1, argo_ring_t);
>> > +XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
>> > +guest_handle_cast(arg2, argo_pfn_t);
>> > +uint32_t npage = arg3;
>> > +bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
>> > +
>> > +if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
>> > +break;
>>
>> I don't understand the need for this and ...
>>
>> > +if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
>> > +{
>> > +rc = -EINVAL;
>> > +break;
>> > +}
>> > +if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
>> > +break;
>>
>> ... perhaps also this, when you use copy_from_guest() upon access.
> 
> This is the one piece of feedback on version 1 of this series that I haven't
> taken the time to address yet. The code is evidently safe, with only a 
> possible
> performance decrease a concern, so I'd like to study it further before 
> removing
> any of the checks rather than delay posting version two of this series.

Hmm, re-posting without all comments addressed is not ideal.
It means extra work for the reviewers (unless you've clearly
marked respective code fragments with some sort of TBD
comment).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-19 Thread Christopher Clark
On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné  wrote:
>
> On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> > +static inline uint16_t
> > +argo_hash_fn(const struct argo_ring_id *id)
>
> No need for the argo_ prefix for static functions, this is already an
> argo specific file.

Although the compiler could live without the prefix, I'm finding it helpful to
very easily determine that functions being used are not defined elsewhere
within Xen; so I've left the prefix as is for version two of this series.

> > +{
> > +uint16_t ret;
> > +
> > +ret = (uint16_t)(id->addr.port >> 16);
> > +ret ^= (uint16_t)id->addr.port;
> > +ret ^= id->addr.domain_id;
> > +ret ^= id->partner;
> > +
> > +ret &= (ARGO_HTABLE_SIZE - 1);
>
> I'm having trouble figuring out what this is supposed to do, I think a
> comment and the expected hash formula will help make sure the code is
> correct.

Fair point. I've added a comment with explanation in the next version.

+ * This hash function is used to distribute rings within the per-domain
+ * hash table (d->argo->ring_hash). The hash table will provide a
+ * 'ring_info' struct if a match is found with a 'xen_argo_ring_id' key:
+ * ie. the key is a (domain id, port, partner domain id) tuple.
+ * There aren't many hash table buckets, and this doesn't need to be
+ * cryptographically robust. Since port number varies the most in
+ * expected use, and the Linux driver allocates at both the high and
+ * low ends, incorporate high and low bits to help with distribution.

> Also doesn't this need to be documented in the public header?

No, it's only for internal use within the hypervisor and designed to
meet the hypervisor's hashing requirement with a small table.

> > +ASSERT(ring_info->mfns);
> > +ASSERT(ring_info->mfn_mapping);
>
> We are trying to move away from such assertions, and instead use
> constructions that would prevent issues in non-debug builds. I would
> write the above asserts as:
>
> if ( !ring_info->mfns || !ring_info->mfn_mapping )
> {
> ASSERT_UNREACHABLE();
> return -E;
> }
>
> That way non-debug builds won't trigger page faults if there's indeed
> a way to get here with the wrong state, and debug builds will still
> hit an assert.

ack, will do so in v2.

> > +*mfn = get_gfn_unshare(d, pfn, &p2mt);
>
> Is this supposed to work for PV guests?

Yes -- and they seem to work OK. Am I missing something?

> > +#else
> > +*mfn = p2m_lookup(d, _gfn(pfn), &p2mt);
> > +#endif
> > +
> > +if ( !mfn_valid(*mfn) )
> > +ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > +else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > +ret = -EAGAIN;
> > +#endif
> > +else if ( (p2mt != p2m_ram_rw) ||
> > +  !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > +ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > +put_gfn(d, pfn);
>
> If you do this put_gfn here, by the time you check that the gfn -> mfn
> matches your expectations the guest might have somehow changed the gfn
> -> mfn mapping already (for example by ballooning down memory?)

If the guest does that, I think it only harms itself. If for some reason
a memory access is denied, then the op would just fail. I don't think
there's a more serious consequence to be worried about.

Above, if we're going to use the mfn, then we've just done a successful:
get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page)

which should hold it in a state that we're ok with until we're done
with it -- see put_page_and_type in argo_ring_remove_mfns.

> > +/* W(L2) protects all the elements of the domain's ring_info */
> > +write_lock(&d->argo->lock);
>
> I don't understand this W(L2) nomenclature, is this explain somewhere?

Yes, sort of. Lock "L2" is the per-domain argo lock, identified in a
comment near the top of the file. It's a read-write lock, so 'W' means:
take the write lock on it.

> Also there's no such comment when you take the global argo_lock above.

L2 covers more interesting work than L1, which is why there are more
comments pertaining to it than L1.

> > +/*
> > + * Messages on the ring are padded to 128 bits
> > + * Len here refers to the exact length of the data not including the
> > + * 128 bit header. The message uses
> > + * ((len + 0xf) & ~0xf) + sizeof(argo_ring_message_header) bytes.
> > + * Using typeof(a) make clear that this does not truncate any high-order 
> > bits.
> > + */
> > +#define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>
> Why not just use ROUNDUP?
>
> And in any case this shouldn't be on the public header IMO, since it's
> not part of the interface AFAICT.

Well, in version two it's now: XEN_ARGO_ROUNDUP :-)
because it does need to be in the public header because it's used within the
Linux device driver, and items in that public Xen header need the 'xen' prefix
(so they now do).  Within the Linux code, it's used to choose a sensible ring
size, and also used when manipulating the rx_ptr on the guest side.

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-19 Thread Christopher Clark
On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich  wrote:
>
> > +static int
> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> > +uint32_t len)
> > +{
> > +int i;
> > +int ret = 0;
> > +
> > +if ( (npage << PAGE_SHIFT) < len )
> > +return -EINVAL;
> > +
> > +if ( ring_info->mfns )
> > +{
> > +/*
> > + * Ring already existed. Check if it's the same ring,
> > + * i.e. same number of pages and all translated gpfns still
> > + * translating to the same mfns
> > + */
>
> This comment makes me wonder whether the translations are
> permitted to change at other times. If so I'm not sure what
> value verification here has. If not, this probably would want to
> be debugging-only code.

My understanding is that the gfn->mfn translation is not necessarily stable
across entry and exit from host power state S4, suspend to disk.

I've added extra explanation to the next version of the patch series in
the commit message for the suspend and resume functions which trigger
guest use of this logic.

> > +static struct argo_ring_info *
> > +argo_ring_find_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +uint16_t hash;
> > +struct hlist_node *node;
>
> const?

I couldn't determine exactly what you were pointing towards with this one.
I've applied 'const' in a lot more places in the next version; please
let me know if I've missed where you intended.

> > +uint64_t dst_domain_cookie = 0;
> > +
> > +if ( !(guest_handle_is_aligned(ring_hnd, ~PAGE_MASK)) )
> > +return -EINVAL;
>
> Why? You don't store the handle for later use (and you shouldn't).
> If there really is a need for a full page's worth of memory, it
> would better be passed in as GFN.

I've added this comment for this behaviour in v2:

+/*
+ * Verify the alignment of the ring data structure supplied with the
+ * understanding that the ring handle supplied points to the same memory as
+ * the first entry in the array of pages provided via pg_descr_hnd, where
+ * the head of the ring will reside.
+ * See argo_update_tx_ptr where the location of the tx_ptr is accessed at a
+ * fixed offset from head of the first page in the mfn array.
+ */

> > @@ -253,6 +723,34 @@ do_argo_message_op(int cmd, 
> > XEN_GUEST_HANDLE_PARAM(void) arg1,
> >
> >  switch (cmd)
> >  {
> > +case ARGO_MESSAGE_OP_register_ring:
> > +{
> > +XEN_GUEST_HANDLE_PARAM(argo_ring_t) ring_hnd =
> > +guest_handle_cast(arg1, argo_ring_t);
> > +XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd =
> > +guest_handle_cast(arg2, argo_pfn_t);
> > +uint32_t npage = arg3;
> > +bool fail_exist = arg4 & ARGO_REGISTER_FLAG_FAIL_EXIST;
> > +
> > +if ( unlikely(!guest_handle_okay(ring_hnd, 1)) )
> > +break;
>
> I don't understand the need for this and ...
>
> > +if ( unlikely(npage > (ARGO_MAX_RING_SIZE >> PAGE_SHIFT)) )
> > +{
> > +rc = -EINVAL;
> > +break;
> > +}
> > +if ( unlikely(!guest_handle_okay(pfn_hnd, npage)) )
> > +break;
>
> ... perhaps also this, when you use copy_from_guest() upon access.

This is the one piece of feedback on version 1 of this series that I haven't
taken the time to address yet. The code is evidently safe, with only a possible
performance decrease as a concern, so I'd like to study it further before removing
any of the checks rather than delay posting version two of this series.

All the other points: ack, and should be ok in v2.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-12 Thread Roger Pau Monné
On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:
> Used by a domain to register a region of memory for receiving messages from
> either a specified other domain, or, if specifying a wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits, the
> hypervisor will use this mapping to copy data from a sending domain into this
> registered ring, making it accessible to the domain that registered the ring 
> to
> receive data.
> 
> In this code, the p2m type of the memory supplied by the guest for the ring
> must be p2m_ram_rw, which is a conservative choice made to defer the need to
> reason about the other p2m types with this commit.
> 
> argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit on
> all architectures, to assist with avoiding the need to add a compat ABI.
> 
> Signed-off-by: Christopher Clark 
> ---
>  xen/common/argo.c  | 498 
> +
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h  |  64 +
>  4 files changed, 566 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 2a95e09..f4e82cf 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  
> +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
>  
> @@ -98,6 +99,25 @@ struct argo_domain
>  };
>  
>  /*
> + * Helper functions
> + */
> +
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)

No need for the argo_ prefix for static functions, this is already an
argo specific file.

> +{
> +uint16_t ret;
> +
> +ret = (uint16_t)(id->addr.port >> 16);
> +ret ^= (uint16_t)id->addr.port;
> +ret ^= id->addr.domain_id;
> +ret ^= id->partner;
> +
> +ret &= (ARGO_HTABLE_SIZE - 1);

I'm having trouble figuring out what this is supposed to do, I think a
comment and the expected hash formula will help make sure the code is
correct.

Also doesn't this need to be documented in the public header?

> +return ret;
> +}
> +
> +/*
>   * locks
>   */
>  
> @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
>  }
>  }
>  
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +   uint8_t **page)
> +{
> +if ( i >= ring_info->nmfns )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"

You likely want to use gprintk here and below, or XENLOG_G_ERR, so
that the guest cannot DoS the console.

> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;
> +}
> +ASSERT(ring_info->mfns);
> +ASSERT(ring_info->mfn_mapping);

We are trying to move away from such assertions, and instead use
constructions that would prevent issues in non-debug builds. I would
write the above asserts as:

if ( !ring_info->mfns || !ring_info->mfn_mapping )
{
ASSERT_UNREACHABLE();
return -E;
}

That way non-debug builds won't trigger page faults if there's indeed
a way to get here with the wrong state, and debug builds will still
hit an assert.

> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +/*
> + * TODO:
> + * The first page of the ring contains the ring indices, so both 
> read and
> + * write access to the page is required by the hypervisor, but 
> read-access
> + * is not needed for this mapping for the remainder of the ring.
> + * Since this mapping will remain resident in Xen's address space for
> + * the lifetime of the ring, and following the principle of least 
> privilege,
> + * it could be preferable to:
> + *  # add a XSM check to determine what policy is wanted here
> + *  # depending on the XSM query, optionally create this mapping as
> + *_write-only_ on platforms that can support it.
> + *(eg. Intel EPT/AMD NPT).
> + */
> +ring_info->mfn_mapping[i] = 
> map_domain_page_global(ring_info->mfns[i]);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;
> +}
> +argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +   mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +}
> +
> +if ( page )
> > +*page = ring_info->mfn_mapping[i];

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-12 Thread Jan Beulich
>>> On 01.12.18 at 02:32,  wrote:
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)

We generally prefer to avoid "inline" outside of header files. Also
is there any strict need for the function to return a fixed width
type? Plus what's the point of the _fn suffix?

> +{
> +uint16_t ret;
> +
> +ret = (uint16_t)(id->addr.port >> 16);
> +ret ^= (uint16_t)id->addr.port;

Pointless casts (with ret itself being uint16_t).

> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +   uint8_t **page)
> +{
> +if ( i >= ring_info->nmfns )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;
> +}
> +ASSERT(ring_info->mfns);
> +ASSERT(ring_info->mfn_mapping);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +/*
> + * TODO:
> + * The first page of the ring contains the ring indices, so both 
> read and
> + * write access to the page is required by the hypervisor, but 
> read-access
> + * is not needed for this mapping for the remainder of the ring.
> + * Since this mapping will remain resident in Xen's address space for
> + * the lifetime of the ring, and following the principle of least 
> privilege,
> + * it could be preferable to:
> + *  # add a XSM check to determine what policy is wanted here
> + *  # depending on the XSM query, optionally create this mapping as
> + *_write-only_ on platforms that can support it.
> + *(eg. Intel EPT/AMD NPT).
> + */
> +ring_info->mfn_mapping[i] = 
> map_domain_page_global(ring_info->mfns[i]);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map 
> page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;

Unsuitable error code.

> +}
> +argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +   mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +}
> +
> +if ( page )
> +*page = ring_info->mfn_mapping[i];

This suggests that the parameter is misnamed. "page" variables
should be of types other than struct page_info * only under
exceptional circumstances.

> +static int
> +argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +uint8_t *dst;

Why uint8_t when you don't mean to access bytes? For the
arithmetic below void * should do just fine.

> +uint32_t *p;
> +int ret;
> +
> +ret = argo_ring_map_page(ring_info, 0, &dst);
> +if ( ret )
> +return ret;
> +
> +p = (uint32_t *)(dst + offsetof(argo_ring_t, tx_ptr));

And then you also don't need any cast here.

> +write_atomic(p, tx_ptr);
> +mb();

While guests need to use non-SMP barriers, I don't see why an
SMP one wouldn't be sufficient here. I also don't see why this
isn't smp_wmb().

> @@ -231,6 +319,388 @@ argo_ring_remove_info(struct domain *d, struct 
> argo_ring_info *ring_info)
>  xfree(ring_info);
>  }
>  
> +/*
> + * ring
> + */

I can see the point of using multi-line comments in a few cases
where our style would not permit this, but a single word is imo
too little to justify a style violation.

> +static int
> +argo_find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> +uint32_t npage, XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd,
> +uint32_t len)
> +{
> +int i;
> +int ret = 0;
> +
> +if ( (npage << PAGE_SHIFT) < len )
> +return -EINVAL;
> +
> +if ( ring_info->mfns )
> +{
> +/*
> + * Ring already existed. Check if it's the same ring,
> + * i.e. same number of pages and all translated gpfns still
> + * translating to the same mfns
> + */

This comment makes me wonder whether the translations are
permitted to change at other times. If so I'm not sure what
value verification here has. If not, this probably would want to
be debugging-only code.

> +if ( ring_info->npage != npage )
> +i = ring_info->nmfns + 1; /* forces re-register below */
> +else
> +{
> +for ( i = 0; i < ring_info->nmfns; i++ )
> +{
> +argo_pfn_t pfn;
> +mfn_t mfn;
> +
> +ret = copy_from_guest_offset_errno(&pfn, pfn_hnd, i, 1);
> +if ( ret )
> +break;
> +
> +ret = argo_find_ring_mfn(d, pfn, &mfn);
> +if ( ret )
> +break;
>

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-11 Thread Julien Grall

Hi Christopher,

On 05/12/2018 22:35, Christopher Clark wrote:

On Wed, Dec 5, 2018 at 9:20 AM Julien Grall  wrote:

On 04/12/2018 09:08, Christopher Clark wrote:

On Sun, Dec 2, 2018 at 12:11 PM Julien Grall  wrote:

On 01/12/2018 01:32, Christopher Clark wrote:

diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
...
+/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
+typedef uint64_t argo_pfn_t;


As you always use 64-bit, can we just use an address? This would make
the ABI agnostic to the hypervisor page granularity.


By address I meant guest physical address (and not guest virtual address).

Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
software is allowed to use different granularities at different levels. This means
that the hypervisor could use 4KB pages while the guest kernel uses 64KB
pages (and vice versa). Some distros made the choice to only support one type of
page granularity (i.e. 64KB for RHEL, 4KB for Debian...).

At the moment the hypercall interface is based on the hypervisor page
granularity. Because Xen has always supported 4KB page-granularity, this
assumption was also hardcoded in the kernel.

What prevents us from getting 64KB page support in Xen (and therefore support for
52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
hypercall interface would de facto use 64KB frames. This would break any current
guest. It is also not possible to keep 4KB pages everywhere because you can only
map 64KB in Xen. So you may map a bit too much from another guest.

This makes me think that the frame is probably not the best in that situation.
Instead a pair of address/size would be more suitable.

The problem is much larger than this series. But I thought I would attempt to
convince the community using guest physical address over guest frame address
whenever it is possible.


Thanks, Julien -- that explanation is very helpful and your request makes sense.

So in concrete terms, with the change that you're advocating for to
this patch, the 64-bit value that is supplied by the guest in the
array passed as an argument to register_ring would encode the same
guest physical frame number as it currently does in the patch version
presented in this thread, but it would be bit-shifted to the position
used in a physical address.

In addition to that change, a page size indicator would be supplied
too -- for every page address supplied in the call.

Is there a method currently used within Xen (or relevant places
elsewhere) for encoding both the page address and size (ie. 4KB, 16KB
or 64KB) within the same 64-bits?
ie. Knowing that the smallest granularity of page is 4KB, and that all
pages are aligned to at least a 4KB boundary, there are low bits in
the address that are known to be zero, and those could be used to
indicate the page size when supplied to this call. It seems like such
an encoding would allow for avoiding doubling the size of the argument
array, but I'm not sure how inconvenient it would be to work with in
practice.

If so, such an interface change looks manageable and hopefully it
would be acceptable to only support 4KB pages in the current
implementation behind that new ABI for the time being. Let me know
what you think.


If you let the user choose the granularity, then, I believe, you will
prevent the hypervisor from doing some optimizations.


For instance, if the guest supplies only 4KB pages but the hypervisor uses 64KB
pages, there is no way to easily map them contiguously in the hypervisor (e.g. using vmap).


Is there a particular reason to allow the ring buffer to be non-contiguous in
guest physical address space?


Depending on the answer, there are different ways to handle that:
	1) Request the guest to allocate memory in 64KB (on Arm) chunks and pass the
	   base address for each chunk
	2) Request the guest to allocate the buffer contiguously and pass the base
	   address and size
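
For illustration, a sketch of what the argument for option 2 could look like;
the names and layout below are hypothetical, not a proposed ABI:

    #include <stdint.h>

    /* One physically contiguous ring buffer, described by guest physical
     * base address and length, independent of either side's page size. */
    typedef struct argo_ring_region {
        uint64_t base_gpa;   /* guest physical address of the buffer */
        uint64_t len;        /* length of the buffer in bytes */
    } argo_ring_region_t;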


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-05 Thread Christopher Clark
On Wed, Dec 5, 2018 at 9:20 AM Julien Grall  wrote:
> On 04/12/2018 09:08, Christopher Clark wrote:
> > On Sun, Dec 2, 2018 at 12:11 PM Julien Grall  wrote:
> >> On 01/12/2018 01:32, Christopher Clark wrote:
> >>> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> >>> ...
> >>> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> >>> +typedef uint64_t argo_pfn_t;
> >>
> >> As you always use 64-bit, can we just use an address? This would make
> >> the ABI agnostic to the hypervisor page granularity.
>
> By address I meant guest physical address (and not guest virtual address).
>
> Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
> software is allowed to use different granularities at different levels. This means
> that the hypervisor could use 4KB pages while the guest kernel uses 64KB
> pages (and vice versa). Some distros made the choice to only support one type of
> page granularity (i.e. 64KB for RHEL, 4KB for Debian...).
>
> At the moment the hypercall interface is based on the hypervisor page
> granularity. Because Xen has always supported 4KB page-granularity, this
> assumption was also hardcoded in the kernel.
>
> What prevents us from getting 64KB page support in Xen (and therefore support for
> 52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
> hypercall interface would de facto use 64KB frames. This would break any current
> guest. It is also not possible to keep 4KB pages everywhere because you can only
> map 64KB in Xen. So you may map a bit too much from another guest.
>
> This makes me think that the frame is probably not the best in that situation.
> Instead a pair of address/size would be more suitable.
>
> The problem is much larger than this series. But I thought I would attempt to
> convince the community using guest physical address over guest frame address
> whenever it is possible.

Thanks, Julien -- that explanation is very helpful and your request makes sense.

So in concrete terms, with the change that you're advocating for to
this patch, the 64-bit value that is supplied by the guest in the
array passed as an argument to register_ring would encode the same
guest physical frame number as it currently does in the patch version
presented in this thread, but it would be bit-shifted to the position
used in a physical address.

In addition to that change, a page size indicator would be supplied
too -- for every page address supplied in the call.

Is there a method currently used within Xen (or relevant places
elsewhere) for encoding both the page address and size (ie. 4KB, 16KB
or 64KB) within the same 64-bits?
ie. Knowing that the smallest granularity of page is 4KB, and that all
pages are aligned to at least a 4KB boundary, there are low bits in
the address that are known to be zero, and those could be used to
indicate the page size when supplied to this call. It seems like such
an encoding would allow for avoiding doubling the size of the argument
array, but I'm not sure how inconvenient it would be to work with in
practice.

If so, such an interface change looks manageable and hopefully it
would be acceptable to only support 4KB pages in the current
implementation behind that new ABI for the time being. Let me know
what you think.
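
For what it's worth, a standalone sketch of that kind of encoding, purely
illustrative; the tag values, macro names and the use of the low two bits are
assumptions, not a proposal:

    #include <stdint.h>

    #define ARGO_GPA_GRAN_MASK  0x3ULL   /* low two bits carry the size tag */
    #define ARGO_GPA_GRAN_4K    0x0ULL
    #define ARGO_GPA_GRAN_16K   0x1ULL
    #define ARGO_GPA_GRAN_64K   0x2ULL

    /* Pack a page-aligned guest physical address with a granularity tag. */
    static inline uint64_t argo_encode_page(uint64_t gpa, uint64_t gran)
    {
        return (gpa & ~ARGO_GPA_GRAN_MASK) | (gran & ARGO_GPA_GRAN_MASK);
    }

    static inline uint64_t argo_decode_gpa(uint64_t v)
    {
        return v & ~ARGO_GPA_GRAN_MASK;
    }

    static inline unsigned int argo_decode_page_order(uint64_t v)
    {
        switch ( v & ARGO_GPA_GRAN_MASK )
        {
        case ARGO_GPA_GRAN_16K: return 14;
        case ARGO_GPA_GRAN_64K: return 16;
        default:                return 12;   /* 4KB */
        }
    }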

thanks,

Christopher

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-05 Thread Julien Grall

Hi Christopher,

On 04/12/2018 09:08, Christopher Clark wrote:

On Sun, Dec 2, 2018 at 12:11 PM Julien Grall  wrote:




On 01/12/2018 01:32, Christopher Clark wrote:

diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
index 20dabc0..5ad8e2b 100644
--- a/xen/include/public/argo.h
+++ b/xen/include/public/argo.h
@@ -21,6 +21,20 @@

   #include "xen.h"

+#define ARGO_RING_MAGIC  0xbd67e163ef2fULL
+
+#define ARGO_DOMID_ANY   DOMID_INVALID
+
+/*
+ * The maximum size of an Argo ring is defined to be: 16MB
+ *  -- which is 0x1000000 or 16777216 bytes.
+ * A byte index into the ring is at most 24 bits.
+ */
+#define ARGO_MAX_RING_SIZE  (16777216ULL)
+
+/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
+typedef uint64_t argo_pfn_t;


As you always use 64-bit, can we just use an address? This would make
the ABI agnostic to the hypervisor page granularity.


Thanks for reviewing this series.

I'm not sure yet that switching to using addresses instead would be
for the best, so have been working through some reasoning about your
suggestion. This interface is for the guest to identify to the
hypervisor the list of frames of memory to use as the ring, and the
purpose of a frame number is to uniquely identify a frame. Frame
numbers, as opposed to addresses, are going to remain the same across
all processors, independent of the page tables that happen to
currently be in use.


Sorry I wasn't clear enough about the address. By address I meant guest physical 
address (and not guest virtual address).


A guest virtual address would indeed be a pretty bad idea as you can't promise the
address would stay mapped forever. As a matter of fact, we already see some
issues because of (K)PTI.




Where possible, translation should be performed by the guest rather
than the hypervisor, minimizing the hypervisor logic (good for several
reasons) - so it would be better to avoid adding the
address-to-page-number walk and granularity handling in the hypervisor
here. In this case, the guest has the incentive to do that work, given
that it wants to register the ring.

(Slightly out of scope, but hopefully not for long: We have a
near-term interest in using argo to communicate between VMs at
different levels of nesting in L0/L1 nested hypervisors, and I suspect
that frame number translation will end up being easier to handle
across L0/L1 than translation of guest addresses in a VM running at
the other level.)

Could you give a specific scenario you have in mind that is prompting a concern?


Arm processors may support multiple page granularities (4KB, 16KB, 64KB). The
software is allowed to use different granularities at different levels. This means
that the hypervisor could use 4KB pages while the guest kernel uses 64KB
pages (and vice versa). Some distros made the choice to only support one type of
page granularity (i.e. 64KB for RHEL, 4KB for Debian...).


At the moment the hypercall interface is based on the hypervisor page 
granularity. Because Xen has always supported 4KB page-granularity, this 
assumption was also hardcoded in the kernel.


What prevents us from getting 64KB page support in Xen (and therefore support for
52-bit addresses) is the hypercall ABI. If you upgrade Xen to 64KB then the
hypercall interface would de facto use 64KB frames. This would break any current
guest. It is also not possible to keep 4KB pages everywhere because you can only
map 64KB in Xen. So you may map a bit too much from another guest.


This makes me think that the frame is probably not the best in that situation. 
Instead a pair of address/size would be more suitable.


The problem is much larger than this series. But I thought I would attempt to 
convince the community using guest physical address over guest frame address 
whenever it is possible.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-04 Thread Paul Durrant
> -Original Message-
> From: Christopher Clark [mailto:christopher.w.cl...@gmail.com]
> Sent: 01 December 2018 01:33
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; George Dunlap
> ; Ian Jackson ; Jan
> Beulich ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Tim (Xen.org) ; Wei Liu
> ; Roger Pau Monne ; Paul
> Durrant ; Rich Persaud ; Ross
> Philipson ; Eric Chanudet
> ; James McKenzie ; Jason
> Andryuk ; Daniel Smith 
> Subject: [PATCH 13/25] argo: implement the register op
> 
> Used by a domain to register a region of memory for receiving messages
> from
> either a specified other domain, or, if specifying a wildcard, any domain.
> 
> This operation creates a mapping within Xen's private address space that
> will remain resident for the lifetime of the ring. In subsequent commits,
> the
> hypervisor will use this mapping to copy data from a sending domain into
> this
> registered ring, making it accessible to the domain that registered the
> ring to
> receive data.
> 
> In this code, the p2m type of the memory supplied by the guest for the
> ring
> must be p2m_ram_rw, which is a conservative choice made to defer the need
> to
> reason about the other p2m types with this commit.
> 
> argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit
> on
> all architectures, to assist with avoiding the need to add a compat ABI.
> 
> Signed-off-by: Christopher Clark 
> ---
>  xen/common/argo.c  | 498
> +
>  xen/include/asm-arm/guest_access.h |   2 +
>  xen/include/asm-x86/guest_access.h |   2 +
>  xen/include/public/argo.h  |  64 +
>  4 files changed, 566 insertions(+)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index 2a95e09..f4e82cf 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
> 
> +DEFINE_XEN_GUEST_HANDLE(argo_pfn_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_addr_t);
>  DEFINE_XEN_GUEST_HANDLE(argo_ring_t);
> 
> @@ -98,6 +99,25 @@ struct argo_domain
>  };
> 
>  /*
> + * Helper functions
> + */
> +
> +static inline uint16_t
> +argo_hash_fn(const struct argo_ring_id *id)
> +{
> +uint16_t ret;
> +
> +ret = (uint16_t)(id->addr.port >> 16);
> +ret ^= (uint16_t)id->addr.port;
> +ret ^= id->addr.domain_id;
> +ret ^= id->partner;
> +
> +ret &= (ARGO_HTABLE_SIZE - 1);
> +
> +return ret;
> +}
> +
> +/*
>   * locks
>   */
> 
> @@ -171,6 +191,74 @@ argo_ring_unmap(struct argo_ring_info *ring_info)
>  }
>  }
> 
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_ring_map_page(struct argo_ring_info *ring_info, uint32_t i,
> +   uint8_t **page)
> +{
> +if ( i >= ring_info->nmfns )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to map
> page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner, ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;

-ENOMEM? The seems to be the conventional errno to use when a global mapping 
fails.

> +}
> +ASSERT(ring_info->mfns);
> +ASSERT(ring_info->mfn_mapping);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +/*
> + * TODO:
> + * The first page of the ring contains the ring indices, so both
> read and
> + * write access to the page is required by the hypervisor, but
> read-access
> + * is not needed for this mapping for the remainder of the ring.
> + * Since this mapping will remain resident in Xen's address space
> for
> + * the lifetime of the ring, and following the principle of least
> privilege,
> + * it could be preferable to:
> + *  # add a XSM check to determine what policy is wanted here
> + *  # depending on the XSM query, optionally create this mapping
> as
> + *_write-only_ on platforms that can support it.
> + *(eg. Intel EPT/AMD NPT).
> + */
> +ring_info->mfn_mapping[i] = map_domain_page_global(ring_info-
> >mfns[i]);
> +
> +if ( !ring_info->mfn_mapping[i] )
> +{
> +printk(XENLOG_ERR "argo: ring (vm%u:%x vm%d) %p attempted to
> map page"
> +   " %u of %u\n", ring_info->id.addr.domain_id,
> +   ring_info->id.addr.port, ring_info->id.partner,
> ring_info,
> +   i, ring_info->nmfns);
> +return -EFAULT;

Same here.

> +}
> +argo_dprintk("mapping page %"PRI_mfn" to %p\n",
> +   mfn_x(ring_info->mfns[i]), ring_info->mfn_mapping[i]);
> +}
> +
> +if ( page )
> +*page = ring_info->mfn_mapping[i];

Blank line here.

> +return 0;
> +}
> +
> +/* caller must have L3 or W(L2) */
> +static int
> +argo_update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> +{
> +uint8_t *dst;
> +uint32_t *p;
> +int ret;
> +
> +ret = argo_ring_map_page(ring_info, 0, &dst);

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-04 Thread Christopher Clark
On Sun, Dec 2, 2018 at 12:11 PM Julien Grall  wrote:
>
>
>
> On 01/12/2018 01:32, Christopher Clark wrote:
> > diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> > index 20dabc0..5ad8e2b 100644
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -21,6 +21,20 @@
> >
> >   #include "xen.h"
> >
> > +#define ARGO_RING_MAGIC  0xbd67e163ef2fULL
> > +
> > +#define ARGO_DOMID_ANY   DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16MB
> > + *  -- which is 0x1000000 or 16777216 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> > +
> > +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> > +typedef uint64_t argo_pfn_t;
>
> As you always use 64-bit, can we just use an address? This would make
> the ABI agnostic to the hypervisor page granularity.

Thanks for reviewing this series.

I'm not sure yet that switching to using addresses instead would be
for the best, so have been working through some reasoning about your
suggestion. This interface is for the guest to identify to the
hypervisor the list of frames of memory to use as the ring, and the
purpose of a frame number is to uniquely identify a frame. Frame
numbers, as opposed to addresses, are going to remain the same across
all processors, independent of the page tables that happen to
currently be in use.

Where possible, translation should be performed by the guest rather
than the hypervisor, minimizing the hypervisor logic (good for several
reasons) - so it would be better to avoid adding the
address-to-page-number walk and granularity handling in the hypervisor
here. In this case, the guest has the incentive to do that work, given
that it wants to register the ring.

(Slightly out of scope, but hopefully not for long: We have a
near-term interest in using argo to communicate between VMs at
different levels of nesting in L0/L1 nested hypervisors, and I suspect
that frame number translation will end up being easier to handle
across L0/L1 than translation of guest addresses in a VM running at
the other level.)

Could you give a specific scenario you have in mind that is prompting a concern?

Christopher

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2018-12-02 Thread Julien Grall


On 01/12/2018 01:32, Christopher Clark wrote:
> diff --git a/xen/include/public/argo.h b/xen/include/public/argo.h
> index 20dabc0..5ad8e2b 100644
> --- a/xen/include/public/argo.h
> +++ b/xen/include/public/argo.h
> @@ -21,6 +21,20 @@
>   
>   #include "xen.h"
>   
> +#define ARGO_RING_MAGIC  0xbd67e163ef2fULL
> +
> +#define ARGO_DOMID_ANY   DOMID_INVALID
> +
> +/*
> + * The maximum size of an Argo ring is defined to be: 16MB
> + *  -- which is 0x1000000 or 16777216 bytes.
> + * A byte index into the ring is at most 24 bits.
> + */
> +#define ARGO_MAX_RING_SIZE  (16777216ULL)
> +
> +/* pfn type: 64-bit on all architectures to aid avoiding a compat ABI */
> +typedef uint64_t argo_pfn_t;

As you always use 64-bit, can we just use an address? This would make 
the ABI agnostic to the hypervisor page granularity.

Cheers,

-- 
Julien Grall
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel