Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-12 Thread David Vrabel
On 09/01/15 08:02, Tian, Kevin wrote:
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: Thursday, January 08, 2015 8:43 PM

 Hi,

 Not really.  The IOMMU tables are also 64-bit so there must be enough
 addresses to map all of RAM.  There shouldn't be any need for these
 mappings to be _contiguous_, btw.  You just need to have one free
 address for each mapping.  Again, following how grant maps work, I'd
 imagine that PVH guests will allocate an unused GFN for each mapping
 and do enough bookkeeping to make sure they don't clash with other GFN
 users (grant mapping, ballooning, &c).  PV guests will probably be
 given a BFN by the hypervisor at map time (which will be == MFN in
 practice) and just needs to pass the same BFN to the unmap call later
 (it can store it in the GTT meanwhile).

 if possible prefer to make both consistent, i.e. always finding unused GFN?

 I don't think it will be possible.  PV domains are already using BFNs
 supplied by Xen (in fact == MFN) for backend grant mappings, which
 would conflict with supplying their own for these mappings.  But
 again, I think the kernel maintainers for Xen may have a better idea
 of how these interfaces are used inside the kernel.  For example,
 it might be easy enough to wrap the two systems inside a common API
 inside linux.   Again, following how grant mapping works seems like
 the way forward.

 
 So Konrad, do you have any insight here? :-)

Malcolm took two pages of this notebook explaining to me how he thought
it should work (in combination with his PV IOMMU work), so I'll let him
explain.

David



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-09 Thread Tian, Kevin
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: Thursday, January 08, 2015 8:43 PM
 
 Hi,
 
   Not really.  The IOMMU tables are also 64-bit so there must be enough
   addresses to map all of RAM.  There shouldn't be any need for these
   mappings to be _contiguous_, btw.  You just need to have one free
   address for each mapping.  Again, following how grant maps work, I'd
   imagine that PVH guests will allocate an unused GFN for each mapping
   and do enough bookkeeping to make sure they don't clash with other GFN
   users (grant mapping, ballooning, &c).  PV guests will probably be
   given a BFN by the hypervisor at map time (which will be == MFN in
   practice) and just needs to pass the same BFN to the unmap call later
   (it can store it in the GTT meanwhile).
 
  if possible prefer to make both consistent, i.e. always finding unused GFN?
 
 I don't think it will be possible.  PV domains are already using BFNs
 supplied by Xen (in fact == MFN) for backend grant mappings, which
 would conflict with supplying their own for these mappings.  But
 again, I think the kernel maintainers for Xen may have a better idea
 of how these interfaces are used inside the kernel.  For example,
 it might be easy enough to wrap the two systems inside a common API
 inside linux.   Again, following how grant mapping works seems like
 the way forward.
 

So Konrad, do you have any insight here? :-)

Thanks
Kevin



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-09 Thread Konrad Rzeszutek Wilk
On Fri, Jan 09, 2015 at 08:02:48AM +, Tian, Kevin wrote:
  From: Tim Deegan [mailto:t...@xen.org]
  Sent: Thursday, January 08, 2015 8:43 PM
  
  Hi,
  
Not really.  The IOMMU tables are also 64-bit so there must be enough
addresses to map all of RAM.  There shouldn't be any need for these
mappings to be _contiguous_, btw.  You just need to have one free
address for each mapping.  Again, following how grant maps work, I'd
imagine that PVH guests will allocate an unused GFN for each mapping
and do enough bookkeeping to make sure they don't clash with other GFN
users (grant mapping, ballooning, &c).  PV guests will probably be
given a BFN by the hypervisor at map time (which will be == MFN in
practice) and just needs to pass the same BFN to the unmap call later
(it can store it in the GTT meanwhile).
  
   if possible prefer to make both consistent, i.e. always finding unused 
   GFN?
  
  I don't think it will be possible.  PV domains are already using BFNs
  supplied by Xen (in fact == MFN) for backend grant mappings, which
  would conflict with supplying their own for these mappings.  But
  again, I think the kernel maintainers for Xen may have a better idea
  of how these interfaces are used inside the kernel.  For example,
  it might be easy enough to wrap the two systems inside a common API
  inside linux.   Again, following how grant mapping works seems like
  the way forward.
  
 
 So Konrad, do you have any insight here? :-)

For grants we end up making the 'struct page' for said grant be visible
in our linear space. We stash the original BFN (MFN) in the 'struct page'
and replace the P2M entry in PV guests with the new BFN (MFN). David and
Jennifer are working on making this more lightweight.
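
A minimal sketch of that bookkeeping (illustrative only, not the actual
grant-map code; it just assumes the existing set_phys_to_machine() and
set_page_private() helpers in a PV guest):

    #include <linux/mm.h>
    #include <asm/xen/page.h>

    /* Remember which machine frame originally backed this struct page,
     * then point the PV P2M entry at the foreign BFN/MFN so the page is
     * reachable through the linear mapping. */
    static void stash_and_remap(struct page *page, unsigned long foreign_mfn)
    {
            unsigned long pfn = page_to_pfn(page);

            set_page_private(page, pfn_to_mfn(pfn));  /* stash original MFN */
            set_phys_to_machine(pfn, foreign_mfn);    /* P2M now -> foreign */
    }

    /* On unmap, restore the original P2M entry from the stash. */
    static void restore_p2m(struct page *page)
    {
            unsigned long pfn = page_to_pfn(page);

            set_phys_to_machine(pfn, page_private(page));
            set_page_private(page, 0);
    }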

How often do we do these updates? We could also do it a simpler way - which is
what backend drivers do - and get a swath of vmalloc memory and hook
the BFNs into it.  That mapping can stay in place for quite some time.

The neat thing about vmalloc is that it is a sliding-window
type mechanism to deal with memory that is not usually accessed via
the linear page tables.
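
For the vmalloc approach, a rough sketch for a PV backend (assuming
alloc_vm_area() and HYPERVISOR_update_va_mapping(); NR_GTT_FRAMES and the
source of foreign_mfn are made up for illustration):

    #include <linux/errno.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>
    #include <asm/xen/page.h>
    #include <asm/xen/hypercall.h>

    #define NR_GTT_FRAMES 256   /* window size; arbitrary for the sketch */

    static struct vm_struct *window;
    static pte_t *window_ptes[NR_GTT_FRAMES];

    /* Reserve a swath of kernel virtual address space once. */
    static int window_init(void)
    {
            window = alloc_vm_area(NR_GTT_FRAMES * PAGE_SIZE, window_ptes);
            return window ? 0 : -ENOMEM;
    }

    /* Point slot 'i' of the window at a foreign machine frame; in a PV
     * guest the PTE carries the MFN directly, hence mfn_pte(). */
    static int window_map_slot(unsigned int i, unsigned long foreign_mfn)
    {
            unsigned long va = (unsigned long)window->addr + i * PAGE_SIZE;

            return HYPERVISOR_update_va_mapping(va,
                            mfn_pte(foreign_mfn, PAGE_KERNEL), UVMF_INVLPG);
    }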

I suppose the complexity behind this is that this 'window' at the GPU
page tables needs to change. As in it moves around as there are different
guests doing things. So the mechanism of swapping this 'window' is going
to be expensive to map/unmap (as you have to flush the TLBs in the 
initial domain for the page-tables - unless you have multiple
'windows' and we flush the older ones lazily? But that sounds complex).

Who is doing the audit/modification? Is it some application in the
initial (backend) domain or some driver in the kernel?

 
 Thanks
 Kevin



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-08 Thread Tim Deegan
Hi,

At 08:56 + on 06 Jan (1420530995), Tian, Kevin wrote:
  From: Tim Deegan [mailto:t...@xen.org]
  At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote:
   but just to confirm one point. from my understanding whether it's a
   mapping operation doesn't really matter. We can invent an interface
   to get p2m mapping and then increase refcnt. the key is refcnt here.
   when XenGT constructs a shadow GPU page table, it creates a reference
   to guest memory page so the refcnt must be increased. :-)
  
  True. :)  But Xen does need to remember all the refcounts that were
  created (so it can tidy up if the domain crashes).  If Xen is already
  doing that it might as well do it in the IOMMU tables since that
  solves other problems.
 
 would a refcnt at the p2m layer be enough, so we don't need separate refcnts
 in both the EPT and IOMMU page tables?

Yes, that sounds right.  The p2m layer is actually the same as the EPT
table, so that is where the refcount should be attached (and it
shouldn't matter whether the IOMMU page tables are shared or not).

 yes, that's the hard part, requiring experiments to find a good balance
 between complexity and performance. The IOMMU page table is not designed
 for the same frequency of modification as CPU/GPU page tables, but following
 the above trend makes them connected. Another option might be to reserve a
 big enough range of BFNs to cover all available guest memory at boot time,
 so as to eliminate the run-time modification overhead.

Sure, or you can map them on demand but keep a cache of maps to avoid
unmapping between uses. 
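
Something like the following could work as a map cache (purely
illustrative; xengt_map_foreign_gfn() and xengt_unmap_bfn() are
placeholders for whatever map/unmap call gets defined, not existing
interfaces):

    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholder prototypes for the eventual map/unmap operation. */
    uint64_t xengt_map_foreign_gfn(uint16_t domid, uint64_t gfn);
    void xengt_unmap_bfn(uint64_t bfn);

    #define MAP_CACHE_SLOTS 64

    struct map_cache_entry {
        bool     valid;
        uint16_t domid;
        uint64_t gfn;
        uint64_t bfn;
    };

    static struct map_cache_entry cache[MAP_CACHE_SLOTS];

    /* Return a BFN for (domid, gfn), reusing a live mapping when the GTT
     * touches the same frame repeatedly; evict old entries lazily. */
    static uint64_t cached_map(uint16_t domid, uint64_t gfn)
    {
        struct map_cache_entry *e = &cache[gfn % MAP_CACHE_SLOTS];

        if (e->valid && e->domid == domid && e->gfn == gfn)
            return e->bfn;                   /* cache hit: mapping reused */

        if (e->valid)
            xengt_unmap_bfn(e->bfn);         /* lazily unmap the evictee */

        e->domid = domid;
        e->gfn   = gfn;
        e->bfn   = xengt_map_foreign_gfn(domid, gfn);
        e->valid = true;
        return e->bfn;
    }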

  Not really.  The IOMMU tables are also 64-bit so there must be enough
  addresses to map all of RAM.  There shouldn't be any need for these
  mappings to be _contiguous_, btw.  You just need to have one free
  address for each mapping.  Again, following how grant maps work, I'd
  imagine that PVH guests will allocate an unused GFN for each mapping
  and do enough bookkeeping to make sure they don't clash with other GFN
  users (grant mapping, ballooning, &c).  PV guests will probably be
  given a BFN by the hypervisor at map time (which will be == MFN in
  practice) and just needs to pass the same BFN to the unmap call later
  (it can store it in the GTT meanwhile).
 
 if possible prefer to make both consistent, i.e. always finding unused GFN?

I don't think it will be possible.  PV domains are already using BFNs
supplied by Xen (in fact == MFN) for backend grant mappings, which
would conflict with supplying their own for these mappings.  But
again, I think the kernel maintainers for Xen may have a better idea
of how these interfaces are used inside the kernel.  For example,
it might be easy enough to wrap the two systems inside a common API
inside linux.   Again, following how grant mapping works seems like
the way forward.

Cheers,

Tim.



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-06 Thread Ian Campbell
On Tue, 2015-01-06 at 08:42 +, Tian, Kevin wrote:
  From: George Dunlap
  Sent: Monday, January 05, 2015 11:50 PM
  
  On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote:
   We're not there in the current design, purely because XenGT has to be
   in dom0 (so it can trivially DoS Xen by rebooting the host).
  
   Can we really decouple dom0 from DoSing Xen? I know there's on-going effort
   like PVH Dom0; however, there is a lot of trickiness in Dom0 which can
   put the platform into a bad state. One example is ACPI. All the platform
   details are encapsulated in AML language, and only dom0 knows how to
   handle ACPI events. Unless Xen has another parser to guard all possible
   resources which might be touched thru ACPI, a tampered dom0 has many
   ways to break out. But that'd be very challenging and complex.
   
   If we can't containerize Dom0's behavior completely, I would think dom0
   and Xen are actually in the same trust zone, so putting XenGT in Dom0
   shouldn't make things worse.
  
  The question here is, "If a malicious guest can manage to break into
  XenGT, what can they do?"
  
  If XenGT is running in dom0, then the answer is, "At very least, they
  can DoS the host because dom0 is allowed to reboot; they can probably
  do lots of other nasty things as well."
  
  If XenGT is running in its own domain, and can only add IOMMU entries
  for MFNs belonging to XenGT-only VMs, then the answer is, "They can
  access other XenGT-enabled VMs, but they cannot shut down the host or
  access non-XenGT VMs."
  
  Slides 8-11 of a presentation I gave
  (http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-security-features)
  can give you a graphical idea of what we're talking about.
  
 
 I agree we need to make XenGT more isolated, following the on-going trend from
 the previous discussion. But regarding whether Dom0/Xen are in the same
 security domain, I don't see how my statement changes with the above attempts,
 which just try to move privileged Xen stuff away from dom0: all existing Linux
 vulnerabilities allow a tampered Dom0 to do many evil things with root
 permission, or even a tampered kernel to DoS Xen (e.g. via ACPI). PVH dom0 can
 help performance... but by itself it doesn't change the fact that Dom0/Xen are
 actually in the same security domain. :-)

Which is a good reason why one would want to remove as much potentially
vulnerable code from dom0 as possible, and then deny it the
corresponding permissions via XSM too.

I also find the argument "dom0 can do some bad things, so we should let
it be able to do all bad things" rather specious.

Ian.




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-06 Thread Tian, Kevin
 From: George Dunlap
 Sent: Monday, January 05, 2015 11:50 PM
 
 On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote:
  We're not there in the current design, purely because XenGT has to be
  in dom0 (so it can trivially DoS Xen by rebooting the host).
 
  Can we really decouple dom0 from DoSing Xen? I know there's on-going effort
  like PVH Dom0; however, there is a lot of trickiness in Dom0 which can
  put the platform into a bad state. One example is ACPI. All the platform
  details are encapsulated in AML language, and only dom0 knows how to
  handle ACPI events. Unless Xen has another parser to guard all possible
  resources which might be touched thru ACPI, a tampered dom0 has many
  ways to break out. But that'd be very challenging and complex.
  
  If we can't containerize Dom0's behavior completely, I would think dom0
  and Xen are actually in the same trust zone, so putting XenGT in Dom0
  shouldn't make things worse.
 
 The question here is, "If a malicious guest can manage to break into
 XenGT, what can they do?"
 
 If XenGT is running in dom0, then the answer is, "At very least, they
 can DoS the host because dom0 is allowed to reboot; they can probably
 do lots of other nasty things as well."
 
 If XenGT is running in its own domain, and can only add IOMMU entries
 for MFNs belonging to XenGT-only VMs, then the answer is, "They can
 access other XenGT-enabled VMs, but they cannot shut down the host or
 access non-XenGT VMs."
 
 Slides 8-11 of a presentation I gave
 (http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-security-features)
 can give you a graphical idea of what we're talking about.
 

I agree we need to make XenGT more isolated, following the on-going trend from
the previous discussion. But regarding whether Dom0/Xen are in the same security
domain, I don't see how my statement changes with the above attempts, which just
try to move privileged Xen stuff away from dom0: all existing Linux
vulnerabilities allow a tampered Dom0 to do many evil things with root
permission, or even a tampered kernel to DoS Xen (e.g. via ACPI). PVH dom0 can
help performance... but by itself it doesn't change the fact that Dom0/Xen are
actually in the same security domain. :-)

Thanks
Kevin


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-06 Thread Tian, Kevin
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: Thursday, December 18, 2014 11:47 PM
 
 Hi,
 
 At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote:
   I'm afraid not.  There's nothing worrying per se in a backend knowing
   the MFNs of the pages -- the worry is that the backend can pass the
   MFNs to hardware.  If the check happens only at lookup time, then XenGT
   can (either through a bug or a security breach) just pass _any_ MFN to
   the GPU for DMA.
  
   But even without considering the security aspects, this model has bugs
   that may be impossible for XenGT itself to even detect.  E.g.:
1. Guest asks its virtual GPU to DMA to a frame of memory;
2. XenGT looks up the GFN-MFN mapping;
3. Guest balloons out the page;
4. Xen allocates the page to a different guest;
5. XenGT passes the MFN to the GPU, which DMAs to it.
  
   Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
   underlying memory and make sure it doesn't get reallocated until XenGT
   is finished with it.
 
  yes, I see your point. Now we can't support ballooning in VM given above
  reason, and refcnt is required to close that gap.
 
  but just to confirm one point. from my understanding whether it's a
  mapping operation doesn't really matter. We can invent an interface
  to get p2m mapping and then increase refcnt. the key is refcnt here.
  when XenGT constructs a shadow GPU page table, it creates a reference
  to guest memory page so the refcnt must be increased. :-)
 
 True. :)  But Xen does need to remember all the refcounts that were
 created (so it can tidy up if the domain crashes).  If Xen is already
 doing that it might as well do it in the IOMMU tables since that
 solves other problems.

would a refcnt at the p2m layer be enough, so we don't need separate refcnts
in both the EPT and IOMMU page tables?

 
   [First some hopefully-helpful diagrams to explain my thinking.  I'll
borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the
addresses that devices issue their DMAs in:
 
  what's 'BFN' short for? Bus Frame Number?
 
 Yes, I think so.
 
   If we replace that lookup with a _map_ hypercall, either with Xen
   choosing the BFN (as happens in the PV grant map operation) or with
   the guest choosing an unused address (as happens in the HVM/PVH
   grant map operation), then:
- the only extra code in XenGT itself is that you need to unmap
  when you change the GTT;
- Xen can track and control exactly which MFNs XenGT/the GPU can
 access;
- running XenGT in a driver domain or PVH dom0 ought to work; and
- we fix the race condition I described above.
 
  ok, I see your point here. It does sound like a better design to meet
  Xen hypervisor's security requirement and can also work with PVH
  Dom0 or driver domain. Previously even when we said a MFN is
  required, it's actually a BFN due to IOMMU existence, and it works
  just because we have a 1:1 identity mapping in-place. And by finding
  a BFN
 
  some follow-up think here:
 
  - one extra unmap call will have some performance impact, especially
  for media processing workloads where GPU page table modifications
  are hot. but suppose this can be optimized with batch request
 
 Yep.  In general I'd hope that the extra overhead of unmap is small
 compared with the trap + emulate + ioreq + schedule that's just
 happened.  Though I know that IOTLB shootdowns are potentially rather
 expensive right now so it might want some measurement.

yes, that's the hard part, requiring experiments to find a good balance
between complexity and performance. The IOMMU page table is not designed
for the same frequency of modification as CPU/GPU page tables, but following
the above trend makes them connected. Another option might be to reserve a
big enough range of BFNs to cover all available guest memory at boot time,
so as to eliminate the run-time modification overhead.

 
  - is there existing _map_ call for this purpose per your knowledge, or
  a new one is required? If the latter, what's the additional logic to be
  implemented there?
 
 For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do
 what you need, I think.  For PV, I think we probably need a new map
 operation with sensible semantics.  My inclination would be to have it
 follow the grant-map semantics (i.e. caller supplies domid + gfn,
 hypervisor supplies BFN and success/failure code).

Setting up the mapping is not a big problem. It's more about finding available
BFNs in a way that doesn't conflict with other usages, e.g. memory hotplug or
ballooning (well, for this I'm not sure now whether it's only for existing
GFNs, per the other thread...).

 
 Malcolm might have opinions about this -- it starts looking like the
 sort of PV IOMMU interface he's suggested before.

we'd like to hear Malcolm's suggestion here.

 
  - when you say _map_, do you expect this mapped into dom0's virtual
  address space, or just guest physical space?
 
 For PVH, I mean into guest physical address space (and iommu tables,
 since those are the same).  For PV, I mean just the IOMMU tables --
 since the guest controls its own PFN space entirely there's nothing
 Xen can do to map things into it.

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-05 Thread George Dunlap
On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote:
 We're not there in the current design, purely because XenGT has to be
 in dom0 (so it can trivially DoS Xen by rebooting the host).

 Can we really decouple dom0 from DoSing Xen? I know there's on-going effort
 like PVH Dom0; however, there is a lot of trickiness in Dom0 which can
 put the platform into a bad state. One example is ACPI. All the platform
 details are encapsulated in AML language, and only dom0 knows how to
 handle ACPI events. Unless Xen has another parser to guard all possible
 resources which might be touched thru ACPI, a tampered dom0 has many
 ways to break out. But that'd be very challenging and complex.

 If we can't containerize Dom0's behavior completely, I would think dom0
 and Xen are actually in the same trust zone, so putting XenGT in Dom0
 shouldn't make things worse.

The question here is, "If a malicious guest can manage to break into
XenGT, what can they do?"

If XenGT is running in dom0, then the answer is, "At very least, they
can DoS the host because dom0 is allowed to reboot; they can probably
do lots of other nasty things as well."

If XenGT is running in its own domain, and can only add IOMMU entries
for MFNs belonging to XenGT-only VMs, then the answer is, "They can
access other XenGT-enabled VMs, but they cannot shut down the host or
access non-XenGT VMs."

Slides 8-11 of a presentation I gave
(http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-security-features)
can give you a graphical idea of what we're talking about.

 -George



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-18 Thread Tim Deegan
Hi, 

At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote:
  I'm afraid not.  There's nothing worrying per se in a backend knowing
  the MFNs of the pages -- the worry is that the backend can pass the
  MFNs to hardware.  If the check happens only at lookup time, then XenGT
  can (either through a bug or a security breach) just pass _any_ MFN to
  the GPU for DMA.
  
  But even without considering the security aspects, this model has bugs
  that may be impossible for XenGT itself to even detect.  E.g.:
   1. Guest asks its virtual GPU to DMA to a frame of memory;
   2. XenGT looks up the GFN-MFN mapping;
   3. Guest balloons out the page;
   4. Xen allocates the page to a different guest;
   5. XenGT passes the MFN to the GPU, which DMAs to it.
  
  Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
  underlying memory and make sure it doesn't get reallocated until XenGT
  is finished with it.
 
 yes, I see your point. Now we can't support ballooning in VM given above
 reason, and refcnt is required to close that gap.
 
 but just to confirm one point. from my understanding whether it's a 
 mapping operation doesn't really matter. We can invent an interface
 to get p2m mapping and then increase refcnt. the key is refcnt here.
 when XenGT constructs a shadow GPU page table, it creates a reference
 to guest memory page so the refcnt must be increased. :-)

True. :)  But Xen does need to remember all the refcounts that were
created (so it can tidy up if the domain crashes).  If Xen is already
doing that it might as well do it in the IOMMU tables since that
solves other problems.

  [First some hopefully-helpful diagrams to explain my thinking.  I'll
   borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the
   addresses that devices issue their DMAs in:
 
 what's 'BFN' short for? Bus Frame Number?

Yes, I think so.

  If we replace that lookup with a _map_ hypercall, either with Xen
  choosing the BFN (as happens in the PV grant map operation) or with
  the guest choosing an unused address (as happens in the HVM/PVH
  grant map operation), then:
   - the only extra code in XenGT itself is that you need to unmap
 when you change the GTT;
   - Xen can track and control exactly which MFNs XenGT/the GPU can access;
   - running XenGT in a driver domain or PVH dom0 ought to work; and
   - we fix the race condition I described above.
 
 ok, I see your point here. It does sound like a better design to meet
 Xen hypervisor's security requirement and can also work with PVH
 Dom0 or driver domain. Previously even when we said a MFN is
 required, it's actually a BFN due to IOMMU existence, and it works
 just because we have a 1:1 identity mapping in-place. And by finding
 a BFN
 
 some follow-up think here:
 
 - one extra unmap call will have some performance impact, especially
 for media processing workloads where GPU page table modifications
 are hot. but suppose this can be optimized with batch request

Yep.  In general I'd hope that the extra overhead of unmap is small
compared with the trap + emulate + ioreq + schedule that's just
happened.  Though I know that IOTLB shootdowns are potentially rather
expensive right now so it might want some measurement.

 - is there existing _map_ call for this purpose per your knowledge, or
 a new one is required? If the latter, what's the additional logic to be
 implemented there?

For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do
what you need, I think.  For PV, I think we probably need a new map
operation with sensible semantics.  My inclination would be to have it
follow the grant-map semantics (i.e. caller supplies domid + gfn,
hypervisor supplies BFN and success/failure code). 
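
To make that concrete, a purely hypothetical sketch of such an interface
(the op name, struct layout, hypercall and flag are all invented for
illustration; nothing like this exists in the public headers today):

    #include <stdint.h>

    /* Hypothetical PV-IOMMU map request, following grant-map semantics:
     * the caller names a foreign domain and GFN, Xen chooses the BFN the
     * device must use and returns it (or an error). */
    struct pv_iommu_map_op {
        /* IN */
        uint16_t foreign_domid;  /* domain owning the page */
        uint64_t gfn;            /* guest frame in that domain */
        uint32_t flags;          /* e.g. read-only vs read-write */
        /* OUT */
        uint64_t bfn;            /* bus frame to program into the GTT */
        int32_t  status;         /* 0 on success, -errno otherwise */
    };

    /* Sketch of backend usage (all names invented):
     *
     *   struct pv_iommu_map_op op = {
     *       .foreign_domid = guest_domid,
     *       .gfn           = guest_gfn,
     *       .flags         = PV_IOMMU_MAP_RW,
     *   };
     *   rc = HYPERVISOR_pv_iommu_op(PV_IOMMU_MAP, &op, 1);
     *   if (rc == 0 && op.status == 0)
     *       write_shadow_gtt_entry(gtt, index, op.bfn);
     */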

Malcolm might have opinions about this -- it starts looking like the
sort of PV IOMMU interface he's suggested before. 

 - when you say _map_, do you expect this mapped into dom0's virtual
 address space, or just guest physical space?

For PVH, I mean into guest physical address space (and iommu tables,
since those are the same).  For PV, I mean just the IOMMU tables --
since the guest controls its own PFN space entirely there's nothing
Xen can do to map things into it.

 - how is BFN or unused address (what do you mean by address here?)
 allocated? does it need to be present in guest physical memory at boot time,
 or just finding some holes?

That's really a question for the xen maintainers in the linux kernel.
I presume that whatever bookkeeping they currently do for grant-mapped
memory would suffice here just as well.

 - graphics memory size could be large. starting from BDW, there'll
 be 64bit page table format. Do you see any limitation here on finding
 BFN or address?

Not really.  The IOMMU tables are also 64-bit so there must be enough
addresses to map all of RAM.  There shouldn't be any need for these
mappings to be _contiguous_, btw.  You just need to have one free
address for each mapping.  Again, following how grant maps work, I'd
imagine that PVH guests will allocate an unused GFN for each mapping
and do enough bookkeeping to make sure they don't clash with other GFN
users (grant mapping, ballooning, &c).  PV guests will probably be
given a BFN by the hypervisor at map time (which will be == MFN in
practice) and just needs to pass the same BFN to the unmap call later
(it can store it in the GTT meanwhile).

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-18 Thread Andrew Cooper
On 18/12/14 16:08, Tim Deegan wrote:
 yep. Just curious, I thought stubdomain is not popularly used. typical
  case is to have qemu in dom0. is this still true? :-)
 Some do and some don't. :)  High-security distros like Qubes and
 XenClient do.  You can enable it in xl config files pretty easily.
 IIRC the xapi toolstack doesn't use it, but XenServer uses privilege
 separation to isolate the qemu processes in dom0.


We are looking into stubdomains as part of future architectural roadmap,
but as identified, there is a lot of toolstack plumbing required before
this be feasible to put into XenServer.

Our privilege separation in qemu is a stopgap measure which we would like
to replace in due course.

~Andrew




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-15 Thread Jan Beulich
 On 15.12.14 at 07:25, kevin.t...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  On 12.12.14 at 08:24, kevin.t...@intel.com wrote:
  - how is BFN or unused address (what do you mean by address here?)
  allocated? does it need to be present in guest physical memory at boot time,
  or just finding some holes?
 
 Fitting this into holes should be fine.
 
 this is an interesting open question to be discussed further. Here we need to
 consider the extreme case, i.e. a 64-bit GPU page table can legitimately use
 up all the system memory allocated to that VM, and considering dozens of VMs,
 it means we need to reserve a very large hole.

Oh, it's guest RAM you want mapped, not frame buffer space. But still
you're never going to have to map more than the total amount of host
RAM, and (with Linux) we already assume everything can be mapped
through the 1:1 mapping. I.e. the only collision would be with excessive
PFN reservations for ballooning purposes.

Jan




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-15 Thread Jan Beulich
 On 15.12.14 at 10:05, kevin.t...@intel.com wrote:
 yes, definitely host RAM is the upper limit, and what I'm concerning here
 is how to reserve (at boot time) or allocate (on-demand) such large PFN
 resource, w/o collision with other PFN reservation usage (ballooning
 should be fine since it's operating existing RAM ranges in dom0 e820
 table).

I don't think ballooning is restricted to the regions named RAM in
Dom0's E820 table (at least it shouldn't be, and wasn't in the
classic Xen kernels).

 Maybe we can reserve a big-enough reserved region in dom0's 
 e820 table at boot time, for all PFN reservation usages, and then allocate
 them on-demand for specific usages?

What would big enough here mean (i.e. how would one determine
the needed size up front)? Plus any form of allocation would need a
reasonable approach to avoid fragmentation. And anyway I'm not
getting what position you're on: Do you expect to be able to fit
everything that needs mapping into the available mapping space (as
your reply above seems to imply) or do you think there won't be
enough mapping space (as earlier replies of yours appeared to
indicate)?

Jan




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-15 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Monday, December 15, 2014 5:23 PM
 
  On 15.12.14 at 10:05, kevin.t...@intel.com wrote:
  yes, definitely host RAM is the upper limit, and what I'm concerning here
  is how to reserve (at boot time) or allocate (on-demand) such large PFN
  resource, w/o collision with other PFN reservation usage (ballooning
  should be fine since it's operating existing RAM ranges in dom0 e820
  table).
 
 I don't think ballooning is restricted to the regions named RAM in
 Dom0's E820 table (at least it shouldn't be, and wasn't in the
 classic Xen kernels).

well, nice to know that.

 
  Maybe we can reserve a big-enough reserved region in dom0's
  e820 table at boot time, for all PFN reservation usages, and then allocate
  them on-demand for specific usages?
 
 What would big enough here mean (i.e. how would one determine
 the needed size up front)? Plus any form of allocation would need a
 reasonable approach to avoid fragmentation. And anyway I'm not
 getting what position you're on: Do you expect to be able to fit
 everything that needs mapping into the available mapping space (as
 your reply above seems to imply) or do you think there won't be
 enough mapping space (as earlier replies of yours appeared to
 indicate)?
 

I expect to have everything mapped into the available mapping space,
and am asking for suggestions on the best way to find and reserve
available PFNs in a way that doesn't conflict with other usages (either
virtualization features like ballooning that you mentioned, or bare
metal features like PCI hotplug or memory hotplug).

Thanks
Kevin



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-15 Thread Jan Beulich
 On 15.12.14 at 16:22, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 15 Dec 2014, Jan Beulich wrote:
  On 15.12.14 at 10:05, kevin.t...@intel.com wrote:
  yes, definitely host RAM is the upper limit, and what I'm concerning here
  is how to reserve (at boot time) or allocate (on-demand) such large PFN
  resource, w/o collision with other PFN reservation usage (ballooning
  should be fine since it's operating existing RAM ranges in dom0 e820
  table).
 
 I don't think ballooning is restricted to the regions named RAM in
 Dom0's E820 table (at least it shouldn't be, and wasn't in the
 classic Xen kernels).
 
 Could you please elaborate more on this? It seems counter-intuitive at best.

I don't see what's counter-intuitive here. How can the hypervisor
(Dom0) or tool stack (DomU) know what ballooning intentions a
guest kernel may have? It's solely the guest kernel's responsibility
to make sure its ballooning activities don't collide with anything
else address-wise.

Jan




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-15 Thread Stefano Stabellini
On Mon, 15 Dec 2014, Jan Beulich wrote:
  On 15.12.14 at 16:22, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 15 Dec 2014, Jan Beulich wrote:
   On 15.12.14 at 10:05, kevin.t...@intel.com wrote:
   yes, definitely host RAM is the upper limit, and what I'm concerning here
   is how to reserve (at boot time) or allocate (on-demand) such large PFN
   resource, w/o collision with other PFN reservation usage (ballooning
   should be fine since it's operating existing RAM ranges in dom0 e820
   table).
  
  I don't think ballooning is restricted to the regions named RAM in
  Dom0's E820 table (at least it shouldn't be, and wasn't in the
  classic Xen kernels).
  
  Could you please elaborate more on this? It seems counter-intuitive at best.
 
 I don't see what's counter-intuitive here. How can the hypervisor
 (Dom0) or tool stack (DomU) know what ballooning intentions a
 guest kernel may have?

The hypervisor checks that the memory the guest is giving back is
actually ram, as a consequence the ballooning interface only supports
ram. Do you agree?

Ballooning is restricted to regions named RAM in the e820 table, because
Linux respects e820 in its pfn-mfn mappings. However it is true that
respecting the e820 in dom0 is not part of the interface.


 It's solely the guest kernel's responsibility
 to make sure its ballooning activities don't collide with anything
 else address-wise.

In the sense that it is in the guest kernel's responsibility to use the
interface properly.



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-14 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Friday, December 12, 2014 6:54 PM
 
  On 12.12.14 at 08:24, kevin.t...@intel.com wrote:
  - is there existing _map_ call for this purpose per your knowledge, or
  a new one is required? If the latter, what's the additional logic to be
  implemented there?
 
 I think the answer to this depends on whether you want to use
 grants. The goal of using the native driver in the guest (mentioned
 further down) speaks against this, in which case I don't think we
 have an existing interface.

yes, grants don't apply here. 

 
  - when you say _map_, do you expect this mapped into dom0's virtual
  address space, or just guest physical space?
 
 IIUC you don't care about the memory being visible to the CPU; all
 you need is for it to be translated by the IOMMU. In which case the
 input address space for the IOMMU (which is different between PV
 and PVH) is where this needs to be mapped into.

it should be at the p2m level, not just in the IOMMU. Otherwise I'm wondering
whether there'll be tricky issues ahead due to inconsistent mappings between
the EPT and the IOMMU page table (though specific attributes like r/w may
differ, per the previous split-table discussion).

Another reason here: if we just talk about the shadow GPU page table, yes,
it's used by the device only, so an IOMMU mapping is enough. However, we do
have several other places where we need to map and access guest memory,
e.g. scanning commands in a buffer mapped through the GPU page table
(currently through remap_domain_mfn_range_in_kernel).

 
  - how is BFN or unused address (what do you mean by address here?)
  allocated? does it need to be present in guest physical memory at boot time,
  or just finding some holes?
 
 Fitting this into holes should be fine.

this is an interesting open question to be discussed further. Here we need to
consider the extreme case, i.e. a 64-bit GPU page table can legitimately use
up all the system memory allocated to that VM, and considering dozens of VMs,
it means we need to reserve a very large hole.

I remember some similar cases requiring grabbing some unmapped PFNs (in the
grant table?). So I wonder whether there's already a clean interface for this
purpose, or whether we need to add a new one to allocate unmapped PFNs (one
that won't conflict with usages like memory hotplug)...

I'd appreciate any suggestion here.

 
  - graphics memory size could be large. starting from BDW, there'll
  be 64bit page table format. Do you see any limitation here on finding
  BFN or address?
 
 I don't think this concern differs much for the different models: As long
 as you don't want the same underlying memory to be accessible by
 more than one guest, the address space requirements ought to be the
 same.

See above.

Thanks
Kevin



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-11 Thread Tim Deegan
Hi, 

At 01:41 + on 11 Dec (1418258504), Tian, Kevin wrote:
  From: Tim Deegan [mailto:t...@xen.org]
  It is Xen's job to isolate VMs from each other.  As part of that, Xen
  uses the MMU, nested paging, and IOMMUs to control access to RAM.  Any
  software component that can pass a raw MFN to hardware breaks that
  isolation, because Xen has no way of controlling what that component
  can do (including taking over the hypervisor).  This is why I am
  afraid when developers ask for GFN-MFN translation functions.
 
 While I agree that this is absolutely Xen's job, isolation is also required at
 different layers, depending on who controls the resource and where the
 virtualization happens. For example, talking about I/O virtualization, Dom0 or
 a driver domain needs to isolate backend drivers from one another, to avoid
 one backend interfering with another. Xen doesn't know about such a violation,
 since it only knows that Dom0 wants to access a VM's page.

I'm going to write second reply to this mail in a bit, to talk about
this kind of system-level design.  In this email I'll just talk about
the practical aspects of interfaces and address spaces and IOMMUs.

 btw, curious how much worse exposing GFN-MFN translation is compared to
 allowing mapping of another VM's GFN? If exposing GFN-MFN is under the
 same permission control as mapping, would it avoid your worry here?

I'm afraid not.  There's nothing worrying per se in a backend knowing
the MFNs of the pages -- the worry is that the backend can pass the
MFNs to hardware.  If the check happens only at lookup time, then XenGT
can (either through a bug or a security breach) just pass _any_ MFN to
the GPU for DMA.

But even without considering the security aspects, this model has bugs
that may be impossible for XenGT itself to even detect.  E.g.:
 1. Guest asks its virtual GPU to DMA to a frame of memory;
 2. XenGT looks up the GFN-MFN mapping;
 3. Guest balloons out the page;
 4. Xen allocates the page to a different guest;
 5. XenGT passes the MFN to the GPU, which DMAs to it.

Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
underlying memory and make sure it doesn't get reallocated until XenGT
is finished with it.
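
A rough hypervisor-side sketch of such a mapping path (not actual Xen
code; it assumes get_page_from_gfn() and iommu_map_page() roughly as they
exist today, with most error handling elided):

    /* Take a reference on the target page before exposing it through the
     * backend's IOMMU context, so ballooning or reallocation cannot pull
     * the MFN out from under an in-flight DMA. */
    static int map_gfn_for_backend(struct domain *owner,
                                   struct domain *backend,
                                   unsigned long gfn, unsigned long bfn)
    {
        p2m_type_t t;
        struct page_info *page;
        int rc;

        /* Look up gfn in the owner's p2m and take a page reference. */
        page = get_page_from_gfn(owner, gfn, &t, P2M_ALLOC);
        if ( !page || !p2m_is_ram(t) )
        {
            if ( page )
                put_page(page);
            return -EINVAL;
        }

        /* Make the frame reachable at bfn in the backend's IOMMU tables. */
        rc = iommu_map_page(backend, bfn, page_to_mfn(page),
                            IOMMUF_readable | IOMMUF_writable);
        if ( rc )
            put_page(page);   /* drop the reference again on failure */

        /* On success the reference is held until the matching unmap. */
        return rc;
    }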

  When the backend component gets a GFN from the guest, it wants an
  address that it can give to the GPU for DMA that will map the right
  memory.  That address must be mapped in the IOMMU tables that the GPU
  will be using, which means the IOMMU tables of the backend domain,
  IIUC[1].  So the hypercall it needs is not "give me the MFN that matches
  this GFN" but "please map this GFN into my IOMMU tables".
 
 Here "please map this GFN into my IOMMU tables" actually breaks the
 IOMMU isolation. IOMMU is designed for serving DMA requests issued
 by an exclusive VM, so IOMMU page table can restrict that VM's attempts
 strictly.
 
 To map multiple VMs' GFNs into one IOMMU table, the first thing is to
 avoid GFN conflicts to make it functional. We thought about this approach
 previously, e.g. reserving the highest 3 bits of the GFN as a VMID, so one
 IOMMU page table can be used to combine multiple VMs' page tables. However,
 doing so has two limitations:
 
 a) it still requires write-protecting the guest GPU page table and maintaining
 a shadow GPU page table by translating from real GFN to pseudo GFN (plus
 VMID), which doesn't save any engineering effort in the device model part

Yes -- since there's only one IOMMU context for the whole GPU, the
XenGT backend still has to audit all GPU commands to maintain
isolation between clients.

 b) it breaks the designed isolation intrinsic of IOMMU. In such case, IOMMU
 can't isolate multiple VMs by itself, since a DMA request can target any 
 pseudo GFN if valid in the page table. We have to rely on the audit in the 
 backend component in Dom0 to ensure the isolation.

Yep.

 c) this introduces tricky logic in IOMMU driver to handle such non-standard
 multiplexed page table style. 
 
 w/o a SR-IOV implementation (so each VF has its own IOMMU page table),
 I don't see using IOMMU can help isolation here.

If I've understood your argument correctly, it basically comes down
to "It would be extra work for no benefit, because XenGT still has to
do all the work of isolating GPU clients from each other."  It's true
that XenGT still has to isolate its clients, but there are other
benefits.

The main one, from my point of view as a Xen maintainer, is that it
allows Xen to constrain XenGT itself, in the case where bugs or
security breaches mean that XenGT tries to access memory it shouldn't.
More about that in my other reply.  I'll talk about the rest below.

 yes, this is good feedback we didn't think about before. So far the reason
 XenGT can work is that we use the default IOMMU setting, which sets
 up a 1:1 r/w mapping for all possible RAM, so when the GPU hits an MFN through
 the shadow GPU page table, the IOMMU is essentially bypassed. However, like
 you said, if the IOMMU page table is restricted to dom0's memory, or is not a
 1:1 identity mapping, XenGT will be 

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-11 Thread Tim Deegan
Hi, again. :)

As promised, I'm going to talk about more abstract design
considerations.  This will be a lot less concrete than in the other
email, and about a larger range of things.  Some of them may not be
really desirable - or even possible.

[ TL;DR: read the other reply with the practical suggestions in it :) ]

I'm talking from the point of view of a hypervisor maintainer, looking
at introducing this new XenGT component and thinking about what
security properties we would like the _system_ to have once XenGT is
introduced.  I'm going to lay out a series of broadly increasing
levels of security goodness and talk about what we'd need to do to get
there.

For the purposes of this discussion, Xen does not _trust_ XenGT.  By
that I mean that Xen can't rely on the correctness/integrity of XenGT
itself to maintain system security.  Now, we can decide that for some
properties we _will_ choose to trust XenGT, but the default is to
assume that XenGT could be compromised or buggy.  (This is not
intended as a slur on XenGT, btw -- this is how we reason about device
driver domains, qemu-dm and other components.  There will be bugs in
any component, and we're designing the system to minimise the effect
of those bugs.)

OK.  Properties we would like to have:

LEVEL 0: Protect Xen itself from XenGT
--

Bugs in XenGT should not be able to crash the host, and a compromised
XenGT should not be able to take over the hypervisor.

We're not there in the current design, purely because XenGT has to be
in dom0 (so it can trivially DoS Xen by rebooting the host).

But it doesn't seem too hard: as soon as we can run XenGT in a driver
domain, and with IOMMU tables that restrict the GPU from writing to Xen's
datastructures, we'll have this property.

[BTW, this whole discussion assumes that the GPU has no 'back door'
 access to issue DMA that is not translated by the IOMMU.  I have heard
 rumours in the past that such things exist. :) If the GPU can issue
 untranslated DMA, then whatever controls it can take over the entire
 system, and so we can't make _any_ security guarantees about it.]


LEVEL 1: Isolate XenGT's clients from other VMs
---

In other words we partition the machine into VMs XenGT can touch
(i.e. its clients) and those it can't.  Then a malicious client that
compromises XenGT only gains access to other VMs that share a GPU with
it.  That means we can deploy XenGT for some VMs without increasing
the risk to other tenants.

Again we're not there yet, but I think the design I was talking about
in my other email would do it: if XenGT must map all the memory it
wants to let the GPU DMA to, and Xen's policy is to deny mappings for
non-client-vm memory, then VMs that aren't using XenGT are protected.


LEVEL 2: Isolate XenGT's clients from each other


This is trickier, as you pointed out.  We could:

a) Decide that we will trust XenGT to provide this property.  After
   all, that's its main purpose!  This is how we treat other shared
   backends: if a NIC device driver domain is compromised, the
   attacker controls the network traffic for all its frontends.
   OTOH, we don't trust qemu in that way -- instead we use stub domains 
   and IS_PRIV_FOR to enforce isolation.

b) Move all of XenGT into Xen.  This is just defining the problem away
   and would probably do more harm than good - after all, keeping it
   separate has other advantages.

c) Use privilege separation: break XenGT into parts, isolated from each
   other, with the principle of least privilege applied to them.  E.g.
   - GPU emulation could be in a per-client component that doesn't
 share state with the other clients' emulators;
   - Shadowing GTTs and auditing GPU commands could move into Xen,
 with a clean interface to the emulation parts.
   That way, even if a client VM can exploit a bug in the emulator,
   it can't affect other clients because it can't see their emulator
   state, and it can't bypass the safety rules because they're
   enforced by Xen.

   When I talked about privilege separation before I was suggesting
   something like this, but without moving anything into Xen -- e.g.
   the device-emulation code for each client could be in a per-client,
   non-root process.  The code that audits and issues commands to the
   GPU would be in a separate process, which is allowed to make
   hypercalls, and which does not trust the emulator processes.
   My apologies if you're already doing this -- I know XenGT has some
   components in a kernel driver and some elsewhere but I haven't
   looked at the details.


LEVEL 3: Isolate XenGT's clients from XenGT itself
--

XenGT should not be able to access parts of its client VMs that they
have not given it permission to.  E.g. XenGT should not be able to
read a client VM's crypto keys unless it displays them on the

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-11 Thread Tian, Kevin
 From: Tim Deegan
 Sent: Friday, December 12, 2014 12:47 AM
 
 Hi,
 
 At 01:41 + on 11 Dec (1418258504), Tian, Kevin wrote:
   From: Tim Deegan [mailto:t...@xen.org]
   It is Xen's job to isolate VMs from each other.  As part of that, Xen
   uses the MMU, nested paging, and IOMMUs to control access to RAM.
 Any
   software component that can pass a raw MFN to hardware breaks that
   isolation, because Xen has no way of controlling what that component
   can do (including taking over the hypervisor).  This is why I am
   afraid when developers ask for GFN-MFN translation functions.
 
  When I agree Xen's job absolutely, the isolation is also required in 
  different
  layers, regarding to who controls the resource and where the virtualization
  happens. For example talking about I/O virtualization, Dom0 or driver
 domain
  needs to isolate among backend drivers to avoid one backend interfering
  with another. Xen doesn't know such violation, since it only knows it's Dom0
  wants to access a VM's page.
 
 I'm going to write second reply to this mail in a bit, to talk about
 this kind of system-level design.  In this email I'll just talk about
 the practical aspects of interfaces and address spaces and IOMMUs.

Sure. I replied to the other design mail before seeing this one; my bad
Outlook rule didn't push this mail in front of my eyes, and fortunately I dug
it out when wondering about the "Hi, again" in your other mail. :-)


 
  btw curious of how worse exposing GFN-MFN translation compared to
  allowing mapping other VM's GFN? If exposing GFN-MFN is under the
  same permission control as mapping, would it avoid your worry here?
 
 I'm afraid not.  There's nothing worrying per se in a backend knowing
 the MFNs of the pages -- the worry is that the backend can pass the
 MFNs to hardware.  If the check happens only at lookup time, then XenGT
 can (either through a bug or a security breach) just pass _any_ MFN to
 the GPU for DMA.
 
 But even without considering the security aspects, this model has bugs
 that may be impossible for XenGT itself to even detect.  E.g.:
  1. Guest asks its virtual GPU to DMA to a frame of memory;
  2. XenGT looks up the GFN-MFN mapping;
  3. Guest balloons out the page;
  4. Xen allocates the page to a different guest;
  5. XenGT passes the MFN to the GPU, which DMAs to it.
 
 Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
 underlying memory and make sure it doesn't get reallocated until XenGT
 is finished with it.

yes, I see your point. Now we can't support ballooning in VM given above
reason, and refcnt is required to close that gap.

but just to confirm one point. from my understanding whether it's a 
mapping operation doesn't really matter. We can invent an interface
to get p2m mapping and then increase refcnt. the key is refcnt here.
when XenGT constructs a shadow GPU page table, it creates a reference
to guest memory page so the refcnt must be increased. :-)

 
   When the backend component gets a GFN from the guest, it wants an
   address that it can give to the GPU for DMA that will map the right
   memory.  That address must be mapped in the IOMMU tables that the
 GPU
   will be using, which means the IOMMU tables of the backend domain,
   IIUC[1].  So the hypercall it needs is not give me the MFN that matches
   this GFN but please map this GFN into my IOMMU tables.
 
  Here please map this GFN into my IOMMU tables actually breaks the
  IOMMU isolation. IOMMU is designed for serving DMA requests issued
  by an exclusive VM, so IOMMU page table can restrict that VM's attempts
  strictly.
 
  To map multiple VM's GFNs into one IOMMU table, the 1st thing is to
  avoid GFN conflictions to make it functional. We thought about this approach
  previously, e.g. by reserving highest 3 bits of GFN as VMID, so one IOMMU
  page table can be used to combine multi-VM's page table together. However
  doing so have two limitations:
 
  a) it still requires write-protect guest GPU page table, and maintain a
 shadow
  GPU page table by translate from real GFN to pseudo GFN (plus VMID),
 which
  doesn't save any engineering effort in the device model part
 
 Yes -- since there's only one IOMMU context for the whole GPU, the
 XenGT backend still has to audit all GPU commands to maintain
 isolation between clients.
 
  b) it breaks the designed isolation intrinsic of IOMMU. In such case, IOMMU
  can't isolate multiple VMs by itself, since a DMA request can target any
  pseudo GFN if valid in the page table. We have to rely on the audit in the
  backend component in Dom0 to ensure the isolation.
 
 Yep.
 
  c) this introduces tricky logic in IOMMU driver to handle such non-standard
  multiplexed page table style.
 
  w/o a SR-IOV implementation (so each VF has its own IOMMU page table),
  I don't see using IOMMU can help isolation here.
 
 If I've understood your argument correctly, it basically comes down
 to It would be extra work for no benefit, because XenGT still has to
 do all the 

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-11 Thread Tian, Kevin
 From: Tian, Kevin
 Sent: Friday, December 12, 2014 2:30 PM
 
  Conclusion
  --
 
  That's enough rambling from me -- time to come back down to earth.
  While I think it's useful to think about all these things, we don't
  want to get carried away. :)  And as I said, for some things we can
  decide to trust XenGT to provide them, as long as we're clear about
  what that means.
 
  I think that a reasonable minimum standard to expect is to enforce
  levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3.  And I
  think we can do that without needing any huge engineering effort;
  as I said, I think that's covered in my earlier reply.
 
 
 I agree the conclusion that minimum standard to expect is to enforce
 levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3, except the
 concern whether PVH Dom0 is a hard requirement or not. Having
 said that, I'm happy to discuss technical detail in another thread on
 how to support PVH Dom0.
 

So after going through another mail, now I agree both level 0/1 can't
be enforced. :-)

Thanks
Kevin



Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Jan Beulich
 On 10.12.14 at 02:07, kevin.t...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, December 09, 2014 6:50 PM
 
  On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote:
  On 12/9/2014 6:19 PM, Paul Durrant wrote:
  I think use of a raw mfn value currently works only because dom0 is using
 a
  1:1 IOMMU mapping scheme. Is my understanding correct, or do you really
 need
  raw mfn values?
  Thanks for your quick response, Paul.
  Well, not exactly for this case. :)
  In XenGT, our need to translate gfn to mfn is for GPU's page table,
  which contains the translation between graphic address and the memory
  address. This page table is maintained by GPU drivers, and our service
  domain need to have a method to translate the guest physical addresses
  written by the vGPU into host physical ones.
  We do not use IOMMU in XenGT and therefore this translation may not
  necessarily be a 1:1 mapping.
 
 Hmm, that suggests you indeed need raw MFNs, which in turn seems
 problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation
 layer). But while you don't use the IOMMU yourself, I suppose the GPU
 accesses still don't bypass the IOMMU? In which case all you'd need
 returned is a frame number that guarantees that after IOMMU
 translation it refers to the correct MFN, i.e. still allowing for your Dom0
 driver to simply set aside a part of its PFN space, asking Xen to
 (IOMMU-)map the necessary guest frames into there.
 
 
 No. What we require is the raw MFNs. One IOMMU device entry can't
 point to multiple VM's page tables, so that's why XenGT needs to use
 software shadow GPU page table to implement the sharing. Note it's
 not for dom0 to access the MFN. It's for dom0 to setup the correct
 shadow GPU page table, so a VM can access the graphics memory
 in a controlled way.

So what's the translation flow here: driver -> GPU -> IOMMU ->
hardware, or driver -> IOMMU -> GPU -> hardware? Or do things get
set up for the GPU to bypass the IOMMU altogether?

Jan




Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, December 10, 2014 4:39 PM
 
  On 10.12.14 at 02:07, kevin.t...@intel.com wrote:
   From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Tuesday, December 09, 2014 6:50 PM
 
   On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote:
   On 12/9/2014 6:19 PM, Paul Durrant wrote:
   I think use of a raw mfn value currently works only because dom0 is
 using
  a
   1:1 IOMMU mapping scheme. Is my understanding correct, or do you
 really
  need
   raw mfn values?
   Thanks for your quick response, Paul.
   Well, not exactly for this case. :)
   In XenGT, our need to translate gfn to mfn is for GPU's page table,
   which contains the translation between graphic address and the memory
   address. This page table is maintained by GPU drivers, and our service
   domain need to have a method to translate the guest physical addresses
   written by the vGPU into host physical ones.
   We do not use IOMMU in XenGT and therefore this translation may not
   necessarily be a 1:1 mapping.
 
  Hmm, that suggests you indeed need raw MFNs, which in turn seems
  problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation
  layer). But while you don't use the IOMMU yourself, I suppose the GPU
  accesses still don't bypass the IOMMU? In which case all you'd need
  returned is a frame number that guarantees that after IOMMU
  translation it refers to the correct MFN, i.e. still allowing for your Dom0
  driver to simply set aside a part of its PFN space, asking Xen to
  (IOMMU-)map the necessary guest frames into there.
 
 
  No. What we require is the raw MFNs. One IOMMU device entry can't
  point to multiple VM's page tables, so that's why XenGT needs to use
  software shadow GPU page table to implement the sharing. Note it's
  not for dom0 to access the MFN. It's for dom0 to setup the correct
  shadow GPU page table, so a VM can access the graphics memory
  in a controlled way.
 
 So what's the translation flow here: driver - GPU - IOMMU -
 hardware or driver - IOMMU - GPU - hardware? Or do things get
 set up for the GPU to bypass the IOMMU altogether?
 

There are two translation paths in the assigned case:

1. [Direct CPU access from the VM]: with the PCI aperture resource
partitioned, every VM can access a portion of the PCI aperture directly.

- CPU page table/EPT: CPU virtual address -> PCI aperture
- PCI aperture - BAR base = Graphics Memory Address (GMA)
- GPU page table: GMA -> GPA (as programmed by the guest)
- IOMMU: GPA -> MPA

2. [GPU access through GPU command operands]: with GPU scheduling,
every VM's command buffer is fetched by the GPU in a time-shared
manner.

- GPU page table: GMA -> GPA
- IOMMU: GPA -> MPA

In our case, the IOMMU is set up with a 1:1 identity table for dom0. Since
the GPU may access GPAs from different VMs, we can't count on the
IOMMU, which can only serve one mapping per device (unless
we have SR-IOV).

That's why we need a shadow GPU page table in dom0, and need a
p2m query call to translate GPA -> MPA:

- shadow GPU page table: GMA -> MPA
- IOMMU: MPA -> MPA (for dom0)
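
To make that concrete, here is a minimal C sketch of how the dom0 device
model might build one shadow GPU PTE from a guest GPU PTE, assuming a
hypothetical translate_gpa_to_mpa() wrapper around whatever p2m query call
ends up being exposed; the PTE layout and all names here are illustrative,
not an existing interface.

/* Illustrative sketch only: translate_gpa_to_mpa() stands in for the
 * p2m query interface under discussion; it is not an existing Xen call. */
#include <stdint.h>

#define GPU_PTE_PRESENT   0x1ULL
#define GPU_PTE_ADDR_MASK (~0xfffULL)

/* Hypothetical wrapper: ask Xen for the MPA backing a guest GPA. */
extern int translate_gpa_to_mpa(uint32_t domid, uint64_t gpa, uint64_t *mpa);

/*
 * Build one shadow GPU PTE (GMA -> MPA) from a guest GPU PTE (GMA -> GPA).
 * Returns 0 on success, non-zero on failure (entry left not-present).
 */
static int shadow_gpu_pte(uint32_t domid, uint64_t guest_pte, uint64_t *shadow_pte)
{
    uint64_t gpa, mpa;
    int rc;

    if (!(guest_pte & GPU_PTE_PRESENT)) {
        *shadow_pte = 0;            /* keep non-present entries non-present */
        return 0;
    }

    gpa = guest_pte & GPU_PTE_ADDR_MASK;
    rc = translate_gpa_to_mpa(domid, gpa, &mpa);   /* GPA -> MPA query */
    if (rc) {
        *shadow_pte = 0;
        return rc;
    }

    /* Keep the guest's attribute bits, swap in the machine address. */
    *shadow_pte = (guest_pte & ~GPU_PTE_ADDR_MASK) | (mpa & GPU_PTE_ADDR_MASK);
    return 0;
}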

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tian, Kevin
 From: Tian, Kevin
 Sent: Wednesday, December 10, 2014 4:48 PM
 
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, December 10, 2014 4:39 PM
 
   On 10.12.14 at 02:07, kevin.t...@intel.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: Tuesday, December 09, 2014 6:50 PM
  
On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote:
On 12/9/2014 6:19 PM, Paul Durrant wrote:
I think use of an raw mfn value currently works only because dom0 is
  using
   a
1:1 IOMMU mapping scheme. Is my understanding correct, or do you
  really
   need
raw mfn values?
Thanks for your quick response, Paul.
Well, not exactly for this case. :)
In XenGT, our need to translate gfn to mfn is for GPU's page table,
which contains the translation between graphic address and the
 memory
address. This page table is maintained by GPU drivers, and our service
domain need to have a method to translate the guest physical
 addresses
written by the vGPU into host physical ones.
We do not use IOMMU in XenGT and therefore this translation may not
necessarily be a 1:1 mapping.
  
   Hmm, that suggests you indeed need raw MFNs, which in turn seems
   problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation
   layer). But while you don't use the IOMMU yourself, I suppose the GPU
   accesses still don't bypass the IOMMU? In which case all you'd need
   returned is a frame number that guarantees that after IOMMU
   translation it refers to the correct MFN, i.e. still allowing for your 
   Dom0
   driver to simply set aside a part of its PFN space, asking Xen to
   (IOMMU-)map the necessary guest frames into there.
  
  
   No. What we require is the raw MFNs. One IOMMU device entry can't
   point to multiple VM's page tables, so that's why XenGT needs to use
   software shadow GPU page table to implement the sharing. Note it's
   not for dom0 to access the MFN. It's for dom0 to setup the correct
   shadow GPU page table, so a VM can access the graphics memory
   in a controlled way.
 
  So what's the translation flow here: driver - GPU - IOMMU -
  hardware or driver - IOMMU - GPU - hardware? Or do things get
  set up for the GPU to bypass the IOMMU altogether?
 
 
 two translation paths in assigned case:
 
 1. [direct CPU access from VM], with partitioned PCI aperture
 resource, every VM can access a portion of PCI aperture directly.

Sorry, the above description is for the XenGT shared case, and the
translation below is for the VT-d assigned case. I just put it there to
show that the same translation path is needed in XenGT.

 
 - CPU page table/EPT: CPU virtual address-PCI aperture
 - PCI aperture - bar base = Graphics Memory Address (GMA)
 - GPU page table: GMA - GPA (as programmed by guest)
 - IOMMU: GPA - MPA
 
 2. [GPU access through GPU command operands], with GPU scheduling,
 every VM's command buffer will be fetched by GPU in a time-shared
 manner.
 
 - GPU page table: GMA-GPA
 - IOMMU: GPA-MPA
 
 In our case, IOMMU is setup with 1:1 identity table for dom0. So
 when GPU may access GPAs from different VMs, we can't count on
 IOMMU which can only serve one mapping for one device (unless
 we have SR-IOV).
 
 That's why we need shadow GPU page table in dom0, and need a
 p2m query call to translate from GPA - MPA:
 
 - shadow GPU page table: GMA-MPA
 - IOMMU: MPA-MPA (for dom0)
 
 Thanks
 Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Jan Beulich
 On 10.12.14 at 09:47, kevin.t...@intel.com wrote:
 two translation paths in assigned case:
 
 1. [direct CPU access from VM], with partitioned PCI aperture
 resource, every VM can access a portion of PCI aperture directly.
 
 - CPU page table/EPT: CPU virtual address-PCI aperture
 - PCI aperture - bar base = Graphics Memory Address (GMA)
 - GPU page table: GMA - GPA (as programmed by guest)
 - IOMMU: GPA - MPA
 
 2. [GPU access through GPU command operands], with GPU scheduling,
 every VM's command buffer will be fetched by GPU in a time-shared
 manner.
 
 - GPU page table: GMA-GPA
 - IOMMU: GPA-MPA
 
 In our case, IOMMU is setup with 1:1 identity table for dom0. So 
 when GPU may access GPAs from different VMs, we can't count on
 IOMMU which can only serve one mapping for one device (unless 
 we have SR-IOV). 
 
 That's why we need shadow GPU page table in dom0, and need a
 p2m query call to translate from GPA - MPA:
 
 - shadow GPU page table: GMA-MPA
 - IOMMU: MPA-MPA (for dom0)

I still can't see why the Dom0 translation has to remain 1:1, i.e.
why Xen couldn't return some "arbitrary" GPA for the query in
question here, setting up a suitable GPA -> MPA translation. (I put
"arbitrary" in quotes because this of course must not conflict with
GPAs already or possibly in use by Dom0.) And I can only stress
again that you shouldn't leave out PVH (where the IOMMU already
isn't set up with all 1:1 mappings) from these considerations.
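
For comparison, a rough sketch of that alternative: dom0 reserves a window
in its own GPA space and asks Xen to map each guest frame into it, then
writes those dom0 GPAs (rather than raw MFNs) into the shadow GPU PTEs. The
map_foreign_gfn_to_gpa() call and the window base are hypothetical, purely
for illustration.

/* Sketch of this suggestion with a hypothetical hypercall: Xen installs a
 * GPA -> MPA (and IOMMU) mapping for the guest frame at a dom0-chosen GPA,
 * and dom0 then uses that GPA in the shadow GPU page table. */
#include <stdint.h>

extern int map_foreign_gfn_to_gpa(uint32_t guest_domid, uint64_t guest_gfn,
                                  uint64_t dom0_gpa);   /* hypothetical */

#define RESERVED_GPA_BASE  0x100000000ULL   /* example window above dom0 RAM */

static uint64_t next_free_slot;              /* trivial allocator for the window */

/* Map a guest frame; return the dom0 GPA to put in the shadow GPU PTE. */
static int map_guest_frame(uint32_t guest_domid, uint64_t guest_gfn, uint64_t *gpa)
{
    uint64_t slot = RESERVED_GPA_BASE + (next_free_slot++ << 12);
    int rc = map_foreign_gfn_to_gpa(guest_domid, guest_gfn, slot);
    if (rc)
        return rc;
    *gpa = slot;   /* EPT/IOMMU translate this back to the right MFN */
    return 0;
}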

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, December 10, 2014 5:17 PM
 
  On 10.12.14 at 09:47, kevin.t...@intel.com wrote:
  two translation paths in assigned case:
 
  1. [direct CPU access from VM], with partitioned PCI aperture
  resource, every VM can access a portion of PCI aperture directly.
 
  - CPU page table/EPT: CPU virtual address-PCI aperture
  - PCI aperture - bar base = Graphics Memory Address (GMA)
  - GPU page table: GMA - GPA (as programmed by guest)
  - IOMMU: GPA - MPA
 
  2. [GPU access through GPU command operands], with GPU scheduling,
  every VM's command buffer will be fetched by GPU in a time-shared
  manner.
 
  - GPU page table: GMA-GPA
  - IOMMU: GPA-MPA
 
  In our case, IOMMU is setup with 1:1 identity table for dom0. So
  when GPU may access GPAs from different VMs, we can't count on
  IOMMU which can only serve one mapping for one device (unless
  we have SR-IOV).
 
  That's why we need shadow GPU page table in dom0, and need a
  p2m query call to translate from GPA - MPA:
 
  - shadow GPU page table: GMA-MPA
  - IOMMU: MPA-MPA (for dom0)
 
 I still can't see why the Dom0 translation has to remain 1:1, i.e.
 why Xen couldn't return some arbitrary GPA for the query in
 question here, setting up a suitable GPA-MPA translation. (I put
 arbitrary in quotes because this of course must not conflict with
 GPAs already or possibly in use by Dom0.) And I can only stress
 again that you shouldn't leave out PVH (where the IOMMU already
 isn't set up with all 1:1 mappings) from these considerations.
 

It's interesting that you think the IOMMU can be used in such a situation.

What do you mean by "arbitrary" GPA here? And it's not just about
conflicting with Dom0's GPAs; it's about conflicts among all VMs' GPAs
when you host them through one IOMMU page table, and there's
no way to prevent that reliably, since GPAs are picked by the VMs
themselves.

I don't think we can support PVH here if the IOMMU is not a 1:1 mapping.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Jan Beulich
 On 10.12.14 at 02:14, kevin.t...@intel.com wrote:
  From: Tim Deegan [mailto:t...@xen.org]
 It's been suggested before that we should revive this hypercall, and I
 don't think it's a good idea.  Whenever a domain needs to know the
 actual MFN of another domain's memory it's usually because the
 security model is problematic.  In particular, finding the MFN is
 usually followed by a brute-force mapping from a dom0 process, or by
 passing the MFN to a device for unprotected DMA.
 
 In our case it's not because the security model is problematic. It's 
 because GPU virtualization is done in Dom0 while the memory virtualization
 is done in hypervisor.

Which by itself is a questionable design decision.

 We need a means to query GPFN-MFN so we can
 setup shadow GPU page table in Dom0 correctly, for a VM.
 
 
 These days DMA access should be protected by IOMMUs, or else
 the device drivers (and associated tools) are effectively inside the
 hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
 presumably present on anything new enough to run XenGT?).
 
 yes, IOMMU protect DMA accesses in a device-agnostic way. But in
 our case, IOMMU can't be used because it's only for exclusively
 assigned case, as I replied in another mail. And to reduce the hypervisor
 TCB, we put device model in Dom0 which is why a interface is required
 to connect p2m information.
 
 
 So I think the interface we need here is a please-map-this-gfn one,
 like the existing grant-table ops (which already do what you need by
 returning an address suitable for DMA).  If adding a grant entry for
 every frame of the framebuffer within the guest is too much, maybe we
 can make a new interface for the guest to grant access to larger areas.
 
 A please-map-this-gfn interface assumes the logic behind lies in Xen
 hypervisor, e.g. managing CPU page table or IOMMU entry. However
 here the management of GPU page table is in Dom0, and what we
 want is a please-tell-me-mfn-for-a-gpfn interface, so we can translate
 from gpfn in guest GPU PTE to a mfn in shadow GPU PTE. 

As said before, what needs to be put in the GPU PTE depends on
what the subsequent IOMMU translation would do to the address.
It's not a hard requirement for the IOMMU to pass through all
addresses for Dom0, so we have room to isolate things if possible.
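
In other words (a hedged illustration only, with a hypothetical
bfn_for_mfn() lookup), the value written into the shadow GPU PTE just has
to be whatever the GPU's IOMMU context will translate to the right machine
address:

/* Illustration of the point above: what goes into the shadow GPU PTE
 * depends on what the IOMMU in front of the GPU will do with it.
 * bfn_for_mfn() is a hypothetical lookup into whatever BFN -> MFN mapping
 * dom0 has asked Xen to install. */
#include <stdint.h>

extern uint64_t bfn_for_mfn(uint64_t mfn);     /* hypothetical */

static uint64_t shadow_pte_addr(uint64_t mfn, int dom0_iommu_is_identity)
{
    if (dom0_iommu_is_identity)
        return mfn << 12;                 /* IOMMU: MPA -> MPA, raw MFN works */
    return bfn_for_mfn(mfn) << 12;        /* IOMMU: BFN -> MPA, use the BFN  */
}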

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tim Deegan
At 01:14 + on 10 Dec (1418170461), Tian, Kevin wrote:
  From: Tim Deegan [mailto:t...@xen.org]
  Sent: Tuesday, December 09, 2014 6:47 PM
  
  At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
   Hi all,
  
  As you can see, we are pushing our XenGT patches to the upstream. One
   feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
   device model.
  
  Here we may have 2 similar solutions:
  1 Paul told me(and thank you, Paul :)) that there used to be a
   hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
   commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was
  no
   usage at that time.
  
  It's been suggested before that we should revive this hypercall, and I
  don't think it's a good idea.  Whenever a domain needs to know the
  actual MFN of another domain's memory it's usually because the
  security model is problematic.  In particular, finding the MFN is
  usually followed by a brute-force mapping from a dom0 process, or by
  passing the MFN to a device for unprotected DMA.
 
 In our case it's not because the security model is problematic. It's 
 because GPU virtualization is done in Dom0 while the memory virtualization
 is done in hypervisor. We need a means to query GPFN-MFN so we can
 setup shadow GPU page table in Dom0 correctly, for a VM.

I don't think we understand each other.  Let me try to explain what I
mean.  My apologies if this sounds patronising; I'm just trying to be
as clear as I can.

It is Xen's job to isolate VMs from each other.  As part of that, Xen
uses the MMU, nested paging, and IOMMUs to control access to RAM.  Any
software component that can pass a raw MFN to hardware breaks that
isolation, because Xen has no way of controlling what that component
can do (including taking over the hypervisor).  This is why I am
afraid when developers ask for GFN -> MFN translation functions.

So if the XenGT model allowed the backend component to (cause the GPU
to) perform arbitrary DMA without IOMMU checks, then that component
would have complete access to the system and (from a security pov)
might as well be running in the hypervisor.  That would be very
problematic, but AFAICT that's not what's going on.  From your reply
on the other thread it seems like the GPU is behind the IOMMU, so
that's OK. :)

When the backend component gets a GFN from the guest, it wants an
address that it can give to the GPU for DMA that will map the right
memory.  That address must be mapped in the IOMMU tables that the GPU
will be using, which means the IOMMU tables of the backend domain,
IIUC[1].  So the hypercall it needs is not "give me the MFN that matches
this GFN" but "please map this GFN into my IOMMU tables".

Asking for the MFN will only work if the backend domain's IOMMU
tables have an existing 1:1 r/w mapping of all guest RAM, which
happens to be the case if the backend component is in dom0 _and_ dom0
is PV _and_ we're not using strict IOMMU tables.  Restricting XenGT to
work in only those circumstances would be short-sighted, not only
because it would mean XenGT could never work as a driver domain, but
also because it seems like PVH dom0 is going to be the default at some
point.

If the existing hypercalls that make IOMMU mappings are not right for
XenGT then we can absolutely consider adding some more.  But we need
to talk about what policy Xen will enforce on the mapping requests.
If the shared backend is allowed to map any page of any VM, then it
can easily take control of any VM on the host (even though the IOMMU
will prevent it from taking over the hypervisor itself).  The
absolute minimum we should allow here is some toolstack-controlled
list of which VMs the XenGT backend is serving, so that it can refuse
to map other VMs' memory (like an extension of IS_PRIV_FOR, which does
this job for Qemu).
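
As a purely illustrative sketch of what such a please-map-this-GFN-into-my-
IOMMU-tables operation and its policy check could look like (none of these
structures or function names exist in Xen today):

/* Purely illustrative: a "map this GFN into my IOMMU tables" op with a
 * toolstack-controlled policy.  Nothing here exists in Xen. */
#include <stdint.h>
#include <stdbool.h>

struct iommu_map_foreign {
    uint16_t target_domid;   /* IN: guest whose frame is wanted          */
    uint64_t target_gfn;     /* IN: frame in that guest                  */
    uint64_t bfn;            /* OUT: address the backend gives the GPU   */
};

/* Toolstack-maintained allow-list: which guests this backend may serve
 * (conceptually an extension of IS_PRIV_FOR). */
extern bool backend_serves_domain(uint16_t backend, uint16_t target);

/* Hypothetical worker: look up the MFN, take a reference, pick a free BFN
 * in the backend's IOMMU address space and install BFN -> MFN there. */
extern int install_backend_iommu_mapping(uint16_t backend,
                                         struct iommu_map_foreign *op);

static int do_iommu_map_foreign(uint16_t backend_domid,
                                struct iommu_map_foreign *op)
{
    if (!backend_serves_domain(backend_domid, op->target_domid))
        return -1;    /* would be -EPERM: guest not on the allow-list */

    return install_backend_iommu_mapping(backend_domid, op);
}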

I would also strongly advise using privilege separation in the backend
between the GPUPT shadow code (which needs mapping rights and is
trusted to maintain isolation between the VMs that are sharing the
GPU) and the rest of the XenGT backend (which doesn't/isn't).  But
that's outside my remit as a hypervisor maintainer so it goes no
further than an "I told you so". :)

Cheers,

Tim.

[1] That is, AIUI this GPU doesn't context-switch which set of IOMMU
tables it's using for DMA, SR-IOV-style, and that's why you need a
software component in the first place.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Malcolm Crossley
On 10/12/14 09:51, Tian, Kevin wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, December 10, 2014 5:17 PM

 On 10.12.14 at 09:47, kevin.t...@intel.com wrote:
 two translation paths in assigned case:

 1. [direct CPU access from VM], with partitioned PCI aperture
 resource, every VM can access a portion of PCI aperture directly.

 - CPU page table/EPT: CPU virtual address-PCI aperture
 - PCI aperture - bar base = Graphics Memory Address (GMA)
 - GPU page table: GMA - GPA (as programmed by guest)
 - IOMMU: GPA - MPA

 2. [GPU access through GPU command operands], with GPU scheduling,
 every VM's command buffer will be fetched by GPU in a time-shared
 manner.

 - GPU page table: GMA-GPA
 - IOMMU: GPA-MPA

 In our case, IOMMU is setup with 1:1 identity table for dom0. So
 when GPU may access GPAs from different VMs, we can't count on
 IOMMU which can only serve one mapping for one device (unless
 we have SR-IOV).

 That's why we need shadow GPU page table in dom0, and need a
 p2m query call to translate from GPA - MPA:

 - shadow GPU page table: GMA-MPA
 - IOMMU: MPA-MPA (for dom0)

 I still can't see why the Dom0 translation has to remain 1:1, i.e.
 why Xen couldn't return some arbitrary GPA for the query in
 question here, setting up a suitable GPA-MPA translation. (I put
 arbitrary in quotes because this of course must not conflict with
 GPAs already or possibly in use by Dom0.) And I can only stress
 again that you shouldn't leave out PVH (where the IOMMU already
 isn't set up with all 1:1 mappings) from these considerations.

 
 It's interesting that you think IOMMU can be used in such situation.
 
 what do you mean by arbitrary GPA here? and It's not just about 
 conflicting with Dom0's GPA, it's about confliction in all VM's GPAs 
 when you hosting them through one IOMMU page table, and there's 
 no way to prevent this definitely since GPAs are picked by VMs 
 themselves.
 
 I don't think we can support PVH here if IOMMU is not 1:1 mapping.
 

I agree with Jan: there doesn't need to be a fixed 1:1 mapping between
IOMMU addresses and MFNs.

I think all that's required is that there is an IOMMU mapping for the
GPU device connected to dom0 (or the driver domain) which allows guest
memory to be accessed by the GPU. This IOMMU address is what is
programmed into the shadow GPU page table; I refer to this address as a
bus frame number (BFN) in the PV-IOMMU design document.

- shadow GPU page table: GMA -> BFN
- IOMMU: BFN -> MPA


IOMMUs can almost always address more than the host physical RAM, so we
can create IOMMU mappings above the top of host physical RAM in order to
have IOMMU mappings of guest RAM.

The PV-IOMMU design allows the guest to have control of the IOMMU
address space. In theory it could be extended to have permission checks
for mapping guest MFNs, and to have a mapping interface which takes a domid
and a GMFN. That way the driver domain does not need to know the actual
MFNs being used.
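
A minimal sketch of how that could look from the driver domain's side,
assuming a PV-IOMMU map operation extended as described to take a domid
and a GMFN; the op name and fields below are placeholders, not the actual
PV-IOMMU draft interface:

/* Sketch of the extended PV-IOMMU idea: the driver domain picks a BFN above
 * the top of host RAM and asks Xen to map (domid, gmfn) at that BFN, after
 * a permission check in Xen.  Names and fields are placeholders. */
#include <stdint.h>

struct pv_iommu_map {
    uint64_t bfn;        /* IN: bus frame chosen by the driver domain */
    uint16_t domid;      /* IN: guest owning the frame                */
    uint64_t gmfn;       /* IN: guest frame to map                    */
};

extern int pv_iommu_op_map(struct pv_iommu_map *op);   /* hypothetical */
extern uint64_t top_of_host_ram_bfn(void);             /* hypothetical */

static uint64_t next_bfn;

/* Map one guest frame; return the BFN for the shadow GPU PTE (GMA -> BFN). */
static int map_guest_frame_bfn(uint16_t domid, uint64_t gmfn, uint64_t *bfn_out)
{
    struct pv_iommu_map op = {
        .bfn   = top_of_host_ram_bfn() + (next_bfn++ << 12),
        .domid = domid,
        .gmfn  = gmfn,
    };
    int rc = pv_iommu_op_map(&op);
    if (!rc)
        *bfn_out = op.bfn;    /* IOMMU now has BFN -> MPA for this frame */
    return rc;
}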

The guest itself (CPU) accesses the GPU via outbound MMIO mappings so we
don't need to be concerned with address translation in that direction.

I think getting Xen to allocate IOMMU mappings for a driver domain will
be problematic for PV-based driver domains, because the M2P for PV
domains is not kept strictly up to date with what the guest is using for
its P2M, and so it will be difficult/impossible to determine which
addresses are not in use.

Similarly it may be difficult for HVM guests, because P2M mappings are
outbound (CPU to rest of host), and determining which addresses are
suitable for inbound access (rest of host to memory) may be difficult.
I.e. should the outbound MMIO address space be used for inbound IOMMU mappings?

I hope I've not caused more confusion.

Malcolm

 Thanks
 Kevin
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, December 10, 2014 6:36 PM
 
  On 10.12.14 at 02:14, kevin.t...@intel.com wrote:
   From: Tim Deegan [mailto:t...@xen.org]
  It's been suggested before that we should revive this hypercall, and I
  don't think it's a good idea.  Whenever a domain needs to know the
  actual MFN of another domain's memory it's usually because the
  security model is problematic.  In particular, finding the MFN is
  usually followed by a brute-force mapping from a dom0 process, or by
  passing the MFN to a device for unprotected DMA.
 
  In our case it's not because the security model is problematic. It's
  because GPU virtualization is done in Dom0 while the memory virtualization
  is done in hypervisor.
 
 Which by itself is a questionable design decision.
 

I don't think we want to put a ~20K LOC device model in the hypervisor.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-10 Thread Tian, Kevin
 From: Ian Campbell [mailto:ian.campb...@citrix.com]
 Sent: Wednesday, December 10, 2014 6:11 PM
 
 On Wed, 2014-12-10 at 01:48 +, Tian, Kevin wrote:
  I'm not familiar with Arm architecture, but based on a brief reading it's
  for the assigned case where the MMU is exclusive owned by a VM, so
  some type of MMU virtualization is required and it's straightforward.
 
  However XenGT is a shared GPU usage:
 
  - a global GPU page table is partitioned among VMs. a shared shadow
  global page table is maintained, containing translations for multiple
  VMs simultaneously based on partitioning information
  - multiple per-process GPU page tables are created by each VM, and
  multiple shadow per-process GPU page tables are created correspondingly.
  shadow page table is switched when doing GPU context switch, same as
  what we did for CPU shadow page table.
 
 None of that sounds to me to be impossible to do in the remoteproc
 model, perhaps it needs some extensions from its initial core feature
 set but I see no reason why it couldn't maintain multiple sets of page
 tables, each tagged with an owning domain (for validation purposes) and
 a mechanism to switch between them, or to be able to manage partitioning
 of the GPU address space.

Here we're talking about multiple GPU page tables on top of an
IOMMU page table, rather than the single MMU unit concerned in
remoteproc.

 
  So you can see above shared MMU virtualization usage is very GPU
  specific,
 
 AIUI remoteproc is specific to a particular h/w device too, i.e. there
 is a device specific stub in the hypervisor which essentially knows how
 to implement set_pte for that bit of h/w, with appropriate safety and
 validation, as well as a write_cr3 type operation.
 
   that's why we didn't put in Xen hypervisor, and thus additional
  interface is required to get p2m mapping to assist our shadow GPU
  page table usage.
 
 There is a great reluctance among several maintainers to expose real
 hardware MFNs to VMs (including dom0 and backend driver domains).
 
 I think you need to think very carefully about possible ways of avoiding
 the need for this. Yes, this might require some changes to your current
 mode/design.
 

We're open to changes if necessary.

Thanks,
Kevin 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Paul Durrant
 -Original Message-
 From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com]
 Sent: 09 December 2014 10:11
 To: Paul Durrant; Keir (Xen.org); Tim (Xen.org); jbeul...@suse.com; Kevin
 Tian; Xen-devel@lists.xen.org
 Subject: One question about the hypercall to translate gfn to mfn.
 
 Hi all,
 
As you can see, we are pushing our XenGT patches to the upstream. One
 feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
 device model.
 
Here we may have 2 similar solutions:
1 Paul told me(and thank you, Paul :)) that there used to be a
 hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
 commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was
 no
 usage at that time. So solution 1 is to revert this commit. However,
 since this hypercall was removed ages ago, the reverting met many
 conflicts, i.e. the gmfn_to_mfn is no longer used in x86, etc.
 
2 In our project, we defined a new hypercall
 XENMEM_get_mfn_from_pfn, which has a similar implementation like the
 previous XENMEM_translate_gpfn_list. One of the major differences is
 that this newly defined one is only for x86(called in arch_memory_op),
 so we do not have to worry about the arm side.
 
Does anyone has any suggestions about this?

IIUC what is needed is a means to IOMMU-map a gfn in the service domain (dom0 
for the moment) such that it can be accessed by the GPU. I think use of a raw 
mfn value currently works only because dom0 is using a 1:1 IOMMU mapping 
scheme. Is my understanding correct, or do you really need raw mfn values?

  Paul

Thanks in advance. :)
 
 B.R.
 Yu
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Jan Beulich
 On 09.12.14 at 11:10, yu.c.zh...@linux.intel.com wrote:
As you can see, we are pushing our XenGT patches to the upstream. One 
 feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 
 device model.
 
Here we may have 2 similar solutions:
1 Paul told me(and thank you, Paul :)) that there used to be a 
 hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in 
 commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no 
 usage at that time. So solution 1 is to revert this commit. However, 
 since this hypercall was removed ages ago, the reverting met many 
 conflicts, i.e. the gmfn_to_mfn is no longer used in x86, etc.
 
2 In our project, we defined a new hypercall 
 XENMEM_get_mfn_from_pfn, which has a similar implementation like the 
 previous XENMEM_translate_gpfn_list. One of the major differences is 
 that this newly defined one is only for x86(called in arch_memory_op), 
 so we do not have to worry about the arm side.
 
Does anyone has any suggestions about this?

Of the two, 1 seems preferable. But without more background (see also
Paul's reply) it's hard to tell whether that's what you want/need.
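
For reference, reviving option 1 would amount to something along these
lines (reconstructed from the description in this thread; the field names
are guesses rather than the exact layout of the removed hypercall, and the
snippet assumes the Xen public headers for the basic types):

/* Rough shape of a revived "translate a list of gpfns" memory op,
 * reconstructed from the discussion; not necessarily the exact struct
 * removed in 2d2f7977a052e655db6748be5dabf5a58f5c5e32. */
struct xen_translate_gpfn_list {
    domid_t domid;                          /* IN: guest to translate for */
    xen_ulong_t nr_gpfns;                   /* IN: number of entries      */
    XEN_GUEST_HANDLE(xen_pfn_t) gpfn_list;  /* IN: gpfns to translate     */
    XEN_GUEST_HANDLE(xen_pfn_t) mfn_list;   /* OUT: resulting mfns        */
};
/* e.g. invoked from dom0 roughly as:
 *   HYPERVISOR_memory_op(XENMEM_translate_gpfn_list, &xlat);
 */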

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Yu, Zhang



On 12/9/2014 6:19 PM, Paul Durrant wrote:

I think use of an raw mfn value currently works only because dom0 is using a 
1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need 
raw mfn values?

Thanks for your quick response, Paul.
Well, not exactly for this case. :)
In XenGT, our need to translate gfn to mfn is for the GPU's page table, 
which contains the translation between graphics addresses and memory 
addresses. This page table is maintained by the GPU drivers, and our service 
domain needs a method to translate the guest physical addresses 
written by the vGPU into host physical ones.
We do not use the IOMMU in XenGT, and therefore this translation may not 
necessarily be a 1:1 mapping.


B.R.
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Tim Deegan
At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
 Hi all,
 
As you can see, we are pushing our XenGT patches to the upstream. One 
 feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 
 device model.
 
Here we may have 2 similar solutions:
1 Paul told me(and thank you, Paul :)) that there used to be a 
 hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in 
 commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no 
 usage at that time.

It's been suggested before that we should revive this hypercall, and I
don't think it's a good idea.  Whenever a domain needs to know the
actual MFN of another domain's memory it's usually because the
security model is problematic.  In particular, finding the MFN is
usually followed by a brute-force mapping from a dom0 process, or by
passing the MFN to a device for unprotected DMA.

These days DMA access should be protected by IOMMUs, or else
the device drivers (and associated tools) are effectively inside the
hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
presumably present on anything new enough to run XenGT?).

So I think the interface we need here is a "please-map-this-gfn" one,
like the existing grant-table ops (which already do what you need by
returning an address suitable for DMA).  If adding a grant entry for
every frame of the framebuffer within the guest is too much, maybe we
can make a new interface for the guest to grant access to larger areas.
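
For context, the existing grant-map path already hands back a DMA-suitable
address when GNTMAP_device_map is requested. A simplified example of how a
Linux backend uses it (error handling trimmed; consult the public
grant_table.h for the authoritative definitions):

/* Simplified example of a backend mapping a granted frame and getting a
 * DMA-usable address back (dev_bus_addr). */
#include <xen/interface/grant_table.h>   /* struct gnttab_map_grant_ref, GNTMAP_* */
#include <asm/xen/hypercall.h>           /* HYPERVISOR_grant_table_op            */

static int map_granted_frame(domid_t guest, grant_ref_t ref,
                             unsigned long host_va, uint64_t *dma_addr)
{
    struct gnttab_map_grant_ref op = {
        .host_addr = host_va,                           /* where to map it      */
        .flags     = GNTMAP_host_map | GNTMAP_device_map,
        .ref       = ref,                               /* grant from the guest */
        .dom       = guest,
    };

    if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1) || op.status)
        return -1;

    *dma_addr = op.dev_bus_addr;   /* address suitable for device DMA */
    return 0;
}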

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Paul Durrant
 -Original Message-
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: 09 December 2014 10:47
 To: Yu, Zhang
 Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-
 de...@lists.xen.org
 Subject: Re: One question about the hypercall to translate gfn to mfn.
 
 At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
  Hi all,
 
 As you can see, we are pushing our XenGT patches to the upstream. One
  feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
  device model.
 
 Here we may have 2 similar solutions:
 1 Paul told me(and thank you, Paul :)) that there used to be a
  hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
  commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was
 no
  usage at that time.
 
 It's been suggested before that we should revive this hypercall, and I
 don't think it's a good idea.  Whenever a domain needs to know the
 actual MFN of another domain's memory it's usually because the
 security model is problematic.  In particular, finding the MFN is
 usually followed by a brute-force mapping from a dom0 process, or by
 passing the MFN to a device for unprotected DMA.
 
 These days DMA access should be protected by IOMMUs, or else
 the device drivers (and associated tools) are effectively inside the
 hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
 presumably present on anything new enough to run XenGT?).
 
 So I think the interface we need here is a please-map-this-gfn one,
 like the existing grant-table ops (which already do what you need by
 returning an address suitable for DMA).  If adding a grant entry for
 every frame of the framebuffer within the guest is too much, maybe we
 can make a new interface for the guest to grant access to larger areas.
 

IIUC the in-guest driver is Xen-unaware, so any grant entry would have to be put 
in the guest's table by the tools, which would entail some form of flexibly 
sized reserved range of grant entries; otherwise any PV drivers present in the 
guest would merrily clobber the new grant entries.
A domain can already priv-map a gfn into the MMU, so I think we just need an 
equivalent for the IOMMU.
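
The existing MMU-side facility is what libxc exposes as
xc_map_foreign_range(); an IOMMU equivalent would take the same
(domid, gfn) inputs but install an IOMMU mapping and return a bus address
instead of a CPU mapping. A rough illustration follows; the libxc call is
real, the xc_iommu_map_foreign_gfn() variant is hypothetical:

/* Existing path: dom0 privilege-maps a guest gfn into its own MMU via libxc. */
#include <xenctrl.h>
#include <sys/mman.h>

void *map_guest_page_cpu(xc_interface *xch, uint32_t domid, unsigned long gfn)
{
    /* Real libxc call: foreign-map one 4k page of the guest, read/write. */
    return xc_map_foreign_range(xch, domid, 4096, PROT_READ | PROT_WRITE, gfn);
}

/* Proposed analogue (hypothetical): same inputs, but the result is an
 * address the GPU can use for DMA rather than a CPU mapping. */
extern int xc_iommu_map_foreign_gfn(xc_interface *xch, uint32_t domid,
                                    unsigned long gfn, uint64_t *bus_addr);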

  Paul

 Cheers,
 
 Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Ian Campbell
On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote:
  -Original Message-
  From: Tim Deegan [mailto:t...@xen.org]
  Sent: 09 December 2014 10:47
  To: Yu, Zhang
  Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-
  de...@lists.xen.org
  Subject: Re: One question about the hypercall to translate gfn to mfn.
  
  At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
   Hi all,
  
  As you can see, we are pushing our XenGT patches to the upstream. One
   feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
   device model.
  
  Here we may have 2 similar solutions:
  1 Paul told me(and thank you, Paul :)) that there used to be a
   hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
   commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was
  no
   usage at that time.
  
  It's been suggested before that we should revive this hypercall, and I
  don't think it's a good idea.  Whenever a domain needs to know the
  actual MFN of another domain's memory it's usually because the
  security model is problematic.  In particular, finding the MFN is
  usually followed by a brute-force mapping from a dom0 process, or by
  passing the MFN to a device for unprotected DMA.
  
  These days DMA access should be protected by IOMMUs, or else
  the device drivers (and associated tools) are effectively inside the
  hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
  presumably present on anything new enough to run XenGT?).
  
  So I think the interface we need here is a please-map-this-gfn one,
  like the existing grant-table ops (which already do what you need by
  returning an address suitable for DMA).  If adding a grant entry for
  every frame of the framebuffer within the guest is too much, maybe we
  can make a new interface for the guest to grant access to larger areas.
  
 
 IIUC the in-guest driver is Xen-unaware so any grant entry would have
 to be put in the guests table by the tools, which would entail some
 form of flexibly sized reserved range of grant entries otherwise any
 PV driver that are present in the guest would merrily clobber the new
 grant entries.
 A domain can already priv map a gfn into the MMU, so I think we just
  need an equivalent for the IOMMU.

I'm not sure I'm fully understanding what's going on here, but is a
variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also
returns a DMA handle a plausible solution?
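
Presumably something of this shape (entirely hypothetical:
XENMEM_add_to_physmap and XENMAPSPACE_gmfn_foreign exist, but the extra
DMA-handle output below is invented for illustration and assumes the Xen
public headers):

/* Hypothetical variant of the foreign-mapping memory op that also returns
 * a DMA handle.  The existing op and map space are real; the bus_addr
 * output field is invented here. */
struct xen_add_to_physmap_foreign_dma {
    domid_t     domid;          /* IN: mapping domain (the backend)       */
    domid_t     foreign_domid;  /* IN: guest owning the frame             */
    unsigned int space;         /* IN: would be XENMAPSPACE_gmfn_foreign  */
    xen_ulong_t idx;            /* IN: gmfn in the foreign guest          */
    xen_pfn_t   gpfn;           /* IN: where to place it in our physmap   */
    uint64_t    bus_addr;       /* OUT: handle usable for device DMA (new)*/
};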

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Paul Durrant
 -Original Message-
 From: Ian Campbell
 Sent: 09 December 2014 11:11
 To: Paul Durrant
 Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com;
 Xen-devel@lists.xen.org
 Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to
 mfn.
 
 On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote:
   -Original Message-
   From: Tim Deegan [mailto:t...@xen.org]
   Sent: 09 December 2014 10:47
   To: Yu, Zhang
   Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-
   de...@lists.xen.org
   Subject: Re: One question about the hypercall to translate gfn to mfn.
  
   At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
Hi all,
   
   As you can see, we are pushing our XenGT patches to the upstream.
 One
feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
device model.
   
   Here we may have 2 similar solutions:
   1 Paul told me(and thank you, Paul :)) that there used to be a
hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there
 was
   no
usage at that time.
  
   It's been suggested before that we should revive this hypercall, and I
   don't think it's a good idea.  Whenever a domain needs to know the
   actual MFN of another domain's memory it's usually because the
   security model is problematic.  In particular, finding the MFN is
   usually followed by a brute-force mapping from a dom0 process, or by
   passing the MFN to a device for unprotected DMA.
  
   These days DMA access should be protected by IOMMUs, or else
   the device drivers (and associated tools) are effectively inside the
   hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
   presumably present on anything new enough to run XenGT?).
  
   So I think the interface we need here is a please-map-this-gfn one,
   like the existing grant-table ops (which already do what you need by
   returning an address suitable for DMA).  If adding a grant entry for
   every frame of the framebuffer within the guest is too much, maybe we
   can make a new interface for the guest to grant access to larger areas.
  
 
  IIUC the in-guest driver is Xen-unaware so any grant entry would have
  to be put in the guests table by the tools, which would entail some
  form of flexibly sized reserved range of grant entries otherwise any
  PV driver that are present in the guest would merrily clobber the new
  grant entries.
  A domain can already priv map a gfn into the MMU, so I think we just
   need an equivalent for the IOMMU.
 
 I'm not sure I'm fully understanding what's going on here, but is a
 variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which
 also
 returns a DMA handle a plausible solution?
 

I think we want to be able to avoid setting up a PTE in the MMU, since it's not 
needed in most (or perhaps all?) cases.

  Paul

 Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Jan Beulich
 On 09.12.14 at 12:17, paul.durr...@citrix.com wrote:
 I think we want be able to avoid setting up a PTE in the MMU since it's not 
 needed in most (or perhaps all?) cases.

With shared page tables, there's no way to do one without the other.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Malcolm Crossley
On 09/12/14 11:23, Jan Beulich wrote:
 On 09.12.14 at 12:17, paul.durr...@citrix.com wrote:
 I think we want be able to avoid setting up a PTE in the MMU since it's not 
 needed in most (or perhaps all?) cases.
 
 With shared page tables, there's no way to do one without the other.
 
Interestingly the IOMMU in front of the Intel GPU is only capable of
handling 4k pages, so we wouldn't end up with shared page tables being
used.

For other PCI devices, though, shared page tables will be a problem.

Malcolm

 Jan
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Ian Campbell
On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote:
  -Original Message-
  From: Ian Campbell
  Sent: 09 December 2014 11:11
  To: Paul Durrant
  Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com;
  Xen-devel@lists.xen.org
  Subject: Re: [Xen-devel] One question about the hypercall to translate gfn 
  to
  mfn.
  
  On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote:
-Original Message-
From: Tim Deegan [mailto:t...@xen.org]
Sent: 09 December 2014 10:47
To: Yu, Zhang
Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-
de...@lists.xen.org
Subject: Re: One question about the hypercall to translate gfn to mfn.
   
At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
 Hi all,

As you can see, we are pushing our XenGT patches to the upstream.
  One
 feature we need in xen is to translate guests' gfn to mfn in XenGT 
 dom0
 device model.

Here we may have 2 similar solutions:
1 Paul told me(and thank you, Paul :)) that there used to be a
 hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
 commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there
  was
no
 usage at that time.
   
It's been suggested before that we should revive this hypercall, and I
don't think it's a good idea.  Whenever a domain needs to know the
actual MFN of another domain's memory it's usually because the
security model is problematic.  In particular, finding the MFN is
usually followed by a brute-force mapping from a dom0 process, or by
passing the MFN to a device for unprotected DMA.
   
These days DMA access should be protected by IOMMUs, or else
the device drivers (and associated tools) are effectively inside the
hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
presumably present on anything new enough to run XenGT?).
   
So I think the interface we need here is a please-map-this-gfn one,
like the existing grant-table ops (which already do what you need by
returning an address suitable for DMA).  If adding a grant entry for
every frame of the framebuffer within the guest is too much, maybe we
can make a new interface for the guest to grant access to larger areas.
   
  
   IIUC the in-guest driver is Xen-unaware so any grant entry would have
   to be put in the guests table by the tools, which would entail some
   form of flexibly sized reserved range of grant entries otherwise any
   PV driver that are present in the guest would merrily clobber the new
   grant entries.
   A domain can already priv map a gfn into the MMU, so I think we just
need an equivalent for the IOMMU.
  
  I'm not sure I'm fully understanding what's going on here, but is a
  variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which
  also
  returns a DMA handle a plausible solution?
  
 
 I think we want be able to avoid setting up a PTE in the MMU since
 it's not needed in most (or perhaps all?) cases.

Another (wildly under-informed) thought then:

A while back GlobalLogic proposed (for ARM) an infrastructure for
allowing dom0 drivers to maintain a set of IOMMU-like pagetables under
hypervisor supervision (they called these "remoteprocessor iommu").

I didn't fully grok what it was at the time, let alone remember the
details properly now, but AIUI it was essentially a framework for
allowing a simple Xen-side driver to provide PV-MMU-like update
operations for a set of PTs which were not the main processor's PTs,
with validation etc.

See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945

The introductory email even mentions GPUs...
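
AIUI the idea would be a PV-MMU-style update interface for the GPU's page
tables, where Xen validates each write against the owning domain before
applying it. A very rough sketch, with every name invented for
illustration:

/* Very rough sketch of a remoteproc-iommu style interface for GPU PTs:
 * dom0 submits PTE updates, Xen validates the target frame against the
 * owning domain before writing.  Everything here is invented. */
#include <stdint.h>
#include <stdbool.h>

struct gpu_pt_update {
    uint64_t pt_mfn;      /* which GPU page-table page to modify    */
    uint32_t index;       /* PTE slot within that page              */
    uint64_t new_pte;     /* value dom0 wants written               */
    uint16_t target_dom;  /* guest the mapped frame must belong to  */
};

extern bool frame_belongs_to(uint64_t mfn, uint16_t domid);   /* validation */
extern void write_gpu_pte(uint64_t pt_mfn, uint32_t index, uint64_t pte);

static int gpu_pt_update_op(const struct gpu_pt_update *u)
{
    uint64_t mfn = u->new_pte >> 12;

    if (!frame_belongs_to(mfn, u->target_dom))
        return -1;    /* reject mappings of frames the guest doesn't own */

    write_gpu_pte(u->pt_mfn, u->index, u->new_pte);
    return 0;
}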

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Paul Durrant
 -Original Message-
 From: Ian Campbell
 Sent: 09 December 2014 11:29
 To: Paul Durrant
 Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com;
 Xen-devel@lists.xen.org
 Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to
 mfn.
 
 On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote:
   -Original Message-
   From: Ian Campbell
   Sent: 09 December 2014 11:11
   To: Paul Durrant
   Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org);
 jbeul...@suse.com;
   Xen-devel@lists.xen.org
   Subject: Re: [Xen-devel] One question about the hypercall to translate
 gfn to
   mfn.
  
   On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote:
 -Original Message-
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: 09 December 2014 10:47
 To: Yu, Zhang
 Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-
 de...@lists.xen.org
 Subject: Re: One question about the hypercall to translate gfn to mfn.

 At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
  Hi all,
 
 As you can see, we are pushing our XenGT patches to the
 upstream.
   One
  feature we need in xen is to translate guests' gfn to mfn in XenGT
 dom0
  device model.
 
 Here we may have 2 similar solutions:
 1 Paul told me(and thank you, Paul :)) that there used to be a
  hypercall, XENMEM_translate_gpfn_list, which was removed by
 Keir in
  commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because
 there
   was
 no
  usage at that time.

 It's been suggested before that we should revive this hypercall, and I
 don't think it's a good idea.  Whenever a domain needs to know the
 actual MFN of another domain's memory it's usually because the
 security model is problematic.  In particular, finding the MFN is
 usually followed by a brute-force mapping from a dom0 process, or by
 passing the MFN to a device for unprotected DMA.

 These days DMA access should be protected by IOMMUs, or else
 the device drivers (and associated tools) are effectively inside the
 hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
 presumably present on anything new enough to run XenGT?).

 So I think the interface we need here is a please-map-this-gfn one,
 like the existing grant-table ops (which already do what you need by
 returning an address suitable for DMA).  If adding a grant entry for
 every frame of the framebuffer within the guest is too much, maybe
 we
 can make a new interface for the guest to grant access to larger 
 areas.

   
IIUC the in-guest driver is Xen-unaware so any grant entry would have
to be put in the guests table by the tools, which would entail some
form of flexibly sized reserved range of grant entries otherwise any
PV driver that are present in the guest would merrily clobber the new
grant entries.
A domain can already priv map a gfn into the MMU, so I think we just
 need an equivalent for the IOMMU.
  
   I'm not sure I'm fully understanding what's going on here, but is a
   variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign
 which
   also
   returns a DMA handle a plausible solution?
  
 
  I think we want be able to avoid setting up a PTE in the MMU since
  it's not needed in most (or perhaps all?) cases.
 
 Another (wildly under-informed) thought then:
 
 A while back Global logic proposed (for ARM) an infrastructure for
 allowing dom0 drivers to maintain a set of iommu like pagetables under
 hypervisor supervision (they called these remoteprocessor iommu).
 
 I didn't fully grok what it was at the time, let alone remember the
 details properly now, but AIUI it was essentially a framework for
 allowing a simple Xen side driver to provide PV-MMU-like update
 operations for a set of PTs which were not the main-processor's PTs,
 with validation etc.
 
 See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945
 
 The introductory email even mentions GPUs...
 

That series does indeed seem to be very relevant.

  Paul

 Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Tian, Kevin
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Tuesday, December 09, 2014 6:50 PM
 
  On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote:
  On 12/9/2014 6:19 PM, Paul Durrant wrote:
  I think use of an raw mfn value currently works only because dom0 is using
 a
  1:1 IOMMU mapping scheme. Is my understanding correct, or do you really
 need
  raw mfn values?
  Thanks for your quick response, Paul.
  Well, not exactly for this case. :)
  In XenGT, our need to translate gfn to mfn is for GPU's page table,
  which contains the translation between graphic address and the memory
  address. This page table is maintained by GPU drivers, and our service
  domain need to have a method to translate the guest physical addresses
  written by the vGPU into host physical ones.
  We do not use IOMMU in XenGT and therefore this translation may not
  necessarily be a 1:1 mapping.
 
 Hmm, that suggests you indeed need raw MFNs, which in turn seems
 problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation
 layer). But while you don't use the IOMMU yourself, I suppose the GPU
 accesses still don't bypass the IOMMU? In which case all you'd need
 returned is a frame number that guarantees that after IOMMU
 translation it refers to the correct MFN, i.e. still allowing for your Dom0
 driver to simply set aside a part of its PFN space, asking Xen to
 (IOMMU-)map the necessary guest frames into there.
 

No. What we require is the raw MFNs. One IOMMU device entry can't
point to multiple VMs' page tables, which is why XenGT needs to use
a software shadow GPU page table to implement the sharing. Note it's
not for dom0 to access the MFN; it's for dom0 to set up the correct
shadow GPU page table, so a VM can access the graphics memory
in a controlled way.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Tian, Kevin
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: Tuesday, December 09, 2014 6:47 PM
 
 At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
  Hi all,
 
 As you can see, we are pushing our XenGT patches to the upstream. One
  feature we need in xen is to translate guests' gfn to mfn in XenGT dom0
  device model.
 
 Here we may have 2 similar solutions:
 1 Paul told me(and thank you, Paul :)) that there used to be a
  hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in
  commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was
 no
  usage at that time.
 
 It's been suggested before that we should revive this hypercall, and I
 don't think it's a good idea.  Whenever a domain needs to know the
 actual MFN of another domain's memory it's usually because the
 security model is problematic.  In particular, finding the MFN is
 usually followed by a brute-force mapping from a dom0 process, or by
 passing the MFN to a device for unprotected DMA.

In our case it's not because the security model is problematic. It's 
because GPU virtualization is done in Dom0 while memory virtualization
is done in the hypervisor. We need a means to query the GPFN -> MFN mapping
so we can set up the shadow GPU page table in Dom0 correctly for a VM.

 
 These days DMA access should be protected by IOMMUs, or else
 the device drivers (and associated tools) are effectively inside the
 hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
 presumably present on anything new enough to run XenGT?).

Yes, the IOMMU protects DMA accesses in a device-agnostic way. But in
our case the IOMMU can't be used for this, because it only covers the
exclusively assigned case, as I replied in another mail. And to reduce the
hypervisor TCB, we put the device model in Dom0, which is why an interface
is required to expose the p2m information.

 
 So I think the interface we need here is a please-map-this-gfn one,
 like the existing grant-table ops (which already do what you need by
 returning an address suitable for DMA).  If adding a grant entry for
 every frame of the framebuffer within the guest is too much, maybe we
 can make a new interface for the guest to grant access to larger areas.

A "please-map-this-gfn" interface assumes the logic behind it lies in the Xen
hypervisor, e.g. managing a CPU page table or IOMMU entry. However,
here the management of the GPU page table is in Dom0, and what we
want is a "please-tell-me-the-mfn-for-a-gpfn" interface, so we can translate
from a gpfn in a guest GPU PTE to an mfn in the shadow GPU PTE.

Hope this makes the requirement clearer.

 
 Cheers,
 
 Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Tian, Kevin
 From: Malcolm Crossley
 Sent: Tuesday, December 09, 2014 6:52 PM
 
 On 09/12/14 10:37, Yu, Zhang wrote:
 
 
  On 12/9/2014 6:19 PM, Paul Durrant wrote:
  I think use of an raw mfn value currently works only because dom0 is
  using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do
  you really need raw mfn values?
  Thanks for your quick response, Paul.
  Well, not exactly for this case. :)
  In XenGT, our need to translate gfn to mfn is for GPU's page table,
  which contains the translation between graphic address and the memory
  address. This page table is maintained by GPU drivers, and our service
  domain need to have a method to translate the guest physical addresses
  written by the vGPU into host physical ones.
  We do not use IOMMU in XenGT and therefore this translation may not
  necessarily be a 1:1 mapping.
 
 XenGT must use the IOMMU mappings that Xen has set up for the domain
 which owns the GPU. Currently Dom0 owns the GPU, and so its IOMMU
 mappings match the MFN addresses. I suspect XenGT will not work if Xen
 is booted with iommu=dom0-strict.
 

This is a good point. So yes, in this case the IOMMU is still active and
contains a 1:1 mapping table, but it's a separate thing from the interface
discussed here, which is about setting up a shadow GPU page table for other
VMs' graphics memory accesses.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2014-12-09 Thread Tian, Kevin
 From: Paul Durrant [mailto:paul.durr...@citrix.com]
 Sent: Tuesday, December 09, 2014 7:44 PM
 
  -Original Message-
  From: Ian Campbell
  Sent: 09 December 2014 11:29
  To: Paul Durrant
  Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com;
  Xen-devel@lists.xen.org
  Subject: Re: [Xen-devel] One question about the hypercall to translate gfn 
  to
  mfn.
 
  On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote:
-Original Message-
From: Ian Campbell
Sent: 09 December 2014 11:11
To: Paul Durrant
Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org);
  jbeul...@suse.com;
Xen-devel@lists.xen.org
Subject: Re: [Xen-devel] One question about the hypercall to translate
  gfn to
mfn.
   
On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote:
  -Original Message-
  From: Tim Deegan [mailto:t...@xen.org]
  Sent: 09 December 2014 10:47
  To: Yu, Zhang
  Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; 
  Xen-
  de...@lists.xen.org
  Subject: Re: One question about the hypercall to translate gfn to 
  mfn.
 
  At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote:
   Hi all,
  
  As you can see, we are pushing our XenGT patches to the
  upstream.
One
   feature we need in xen is to translate guests' gfn to mfn in XenGT
  dom0
   device model.
  
  Here we may have 2 similar solutions:
  1 Paul told me(and thank you, Paul :)) that there used to be a
   hypercall, XENMEM_translate_gpfn_list, which was removed by
  Keir in
   commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because
  there
was
  no
   usage at that time.
 
  It's been suggested before that we should revive this hypercall, 
  and I
  don't think it's a good idea.  Whenever a domain needs to know the
  actual MFN of another domain's memory it's usually because the
  security model is problematic.  In particular, finding the MFN is
  usually followed by a brute-force mapping from a dom0 process, or
 by
  passing the MFN to a device for unprotected DMA.
 
  These days DMA access should be protected by IOMMUs, or else
  the device drivers (and associated tools) are effectively inside the
  hypervisor's TCB.  Luckily on x86 IOMMUs are widely available (and
  presumably present on anything new enough to run XenGT?).
 
  So I think the interface we need here is a please-map-this-gfn one,
  like the existing grant-table ops (which already do what you need by
  returning an address suitable for DMA).  If adding a grant entry for
  every frame of the framebuffer within the guest is too much, maybe
  we
  can make a new interface for the guest to grant access to larger
 areas.
 

 IIUC the in-guest driver is Xen-unaware so any grant entry would have
 to be put in the guests table by the tools, which would entail some
 form of flexibly sized reserved range of grant entries otherwise any
 PV driver that are present in the guest would merrily clobber the new
 grant entries.
 A domain can already priv map a gfn into the MMU, so I think we just
  need an equivalent for the IOMMU.
   
I'm not sure I'm fully understanding what's going on here, but is a
variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign
  which
also
returns a DMA handle a plausible solution?
   
  
   I think we want be able to avoid setting up a PTE in the MMU since
   it's not needed in most (or perhaps all?) cases.
 
  Another (wildly under-informed) thought then:
 
  A while back Global logic proposed (for ARM) an infrastructure for
  allowing dom0 drivers to maintain a set of iommu like pagetables under
  hypervisor supervision (they called these remoteprocessor iommu).
 
  I didn't fully grok what it was at the time, let alone remember the
  details properly now, but AIUI it was essentially a framework for
  allowing a simple Xen side driver to provide PV-MMU-like update
  operations for a set of PTs which were not the main-processor's PTs,
  with validation etc.
 
  See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945
 
  The introductory email even mentions GPUs...
 
 
 That series does indeed seem to be very relevant.
 
   Paul

I'm not familiar with the ARM architecture, but based on a brief reading it's
for the assigned case where the MMU is exclusively owned by a VM, so
some type of MMU virtualization is required and it's straightforward.

However, XenGT is a shared GPU usage:

- a global GPU page table is partitioned among VMs; a shared shadow
global page table is maintained, containing translations for multiple
VMs simultaneously, based on the partitioning information
- multiple per-process GPU page tables are created by each VM, and
multiple shadow per-process GPU page tables are created correspondingly. The
shadow page table is switched when