Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 09/01/15 08:02, Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] Sent: Thursday, January 08, 2015 8:43 PM Hi, Not really. The IOMMU tables are also 64-bit so there must be enough addresses to map all of RAM. There shouldn't be any need for these mappings to be _contiguous_, btw. You just need to have one free address for each mapping. Again, following how grant maps work, I'd imagine that PVH guests will allocate an unused GFN for each mapping and do enough bookkeeping to make sure they don't clash with other GFN users (grant mapping, ballooning, c). PV guests will probably be given a BFN by the hypervisor at map time (which will be == MFN in practice) and just needs to pass the same BFN to the unmap call later (it can store it in the GTT meanwhile). if possible prefer to make both consistent, i.e. always finding unused GFN? I don't think it will be possible. PV domains are already using BFNs supplied by Xen (in fact == MFN) for backend grant mappings, which would conflict with supplying their own for these mappings. But again, I think the kernel maintainers for Xen may have a better idea of how these interfaces are used inside the kernel. For example, it might be easy enough to wrap the two systems inside a common API inside linux. Again, following how grant mapping works seems like the way forward. So Konrad, do you have any insight here? :-) Malcolm took two pages of this notebook explaining to me how he thought it should work (in combination with his PV IOMMU work), so I'll let him explain. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tim Deegan [mailto:t...@xen.org] Sent: Thursday, January 08, 2015 8:43 PM Hi, Not really. The IOMMU tables are also 64-bit so there must be enough addresses to map all of RAM. There shouldn't be any need for these mappings to be _contiguous_, btw. You just need to have one free address for each mapping. Again, following how grant maps work, I'd imagine that PVH guests will allocate an unused GFN for each mapping and do enough bookkeeping to make sure they don't clash with other GFN users (grant mapping, ballooning, c). PV guests will probably be given a BFN by the hypervisor at map time (which will be == MFN in practice) and just needs to pass the same BFN to the unmap call later (it can store it in the GTT meanwhile). if possible prefer to make both consistent, i.e. always finding unused GFN? I don't think it will be possible. PV domains are already using BFNs supplied by Xen (in fact == MFN) for backend grant mappings, which would conflict with supplying their own for these mappings. But again, I think the kernel maintainers for Xen may have a better idea of how these interfaces are used inside the kernel. For example, it might be easy enough to wrap the two systems inside a common API inside linux. Again, following how grant mapping works seems like the way forward. So Konrad, do you have any insight here? :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Fri, Jan 09, 2015 at 08:02:48AM +, Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] Sent: Thursday, January 08, 2015 8:43 PM Hi, Not really. The IOMMU tables are also 64-bit so there must be enough addresses to map all of RAM. There shouldn't be any need for these mappings to be _contiguous_, btw. You just need to have one free address for each mapping. Again, following how grant maps work, I'd imagine that PVH guests will allocate an unused GFN for each mapping and do enough bookkeeping to make sure they don't clash with other GFN users (grant mapping, ballooning, &c). PV guests will probably be given a BFN by the hypervisor at map time (which will be == MFN in practice) and just needs to pass the same BFN to the unmap call later (it can store it in the GTT meanwhile). if possible prefer to make both consistent, i.e. always finding unused GFN? I don't think it will be possible. PV domains are already using BFNs supplied by Xen (in fact == MFN) for backend grant mappings, which would conflict with supplying their own for these mappings. But again, I think the kernel maintainers for Xen may have a better idea of how these interfaces are used inside the kernel. For example, it might be easy enough to wrap the two systems inside a common API inside linux. Again, following how grant mapping works seems like the way forward. So Konrad, do you have any insight here? :-) For grants we end up making the 'struct page' for said grant be visible in our linear space. We stash the original BFNs(MFN) in the 'struct page' and replace the P2M in PV guests with the new BFN(MFN). David and Jennifer are working on making this more lightweight. How often do we do these updates? We could also do it a simpler way - which is what backend drivers do - and get a swath of vmalloc memory and hook the BFNs into it. That can stay for quite some time. The neat thing about vmalloc is that it is a sliding-window type mechanism to deal with memory that is not usually accessed via linear page tables. I suppose the complexity behind this is that this 'window' at the GPU page tables needs to change. As in, it moves around as there are different guests doing things. So the mechanism of swapping this 'window' is going to be expensive to map/unmap (as you have to flush the TLBs in the initial domain for the page tables - unless you have multiple 'windows' and we flush the older ones lazily? But that sounds complex). Who is doing the audit/modification? Is it some application in the initial (backend) domain or some driver in the kernel? Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
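To make the "multiple 'windows', flush the older ones lazily" idea above concrete, here is a minimal sketch of a small cache of persistent mappings, so hot GTT entries are not mapped/unmapped (and IOTLB-flushed) on every touch. All helper names (xengt_iommu_map_gfn, xengt_iommu_unmap_bfn) are hypothetical placeholders for whatever map/unmap interface is eventually agreed on, not existing Xen or Linux APIs.

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint16_t domid_t;                 /* matches Xen's public headers */

    /* Hypothetical map/unmap primitives -- placeholders only. */
    extern int  xengt_iommu_map_gfn(domid_t domid, uint64_t gfn, uint64_t *bfn);
    extern void xengt_iommu_unmap_bfn(uint64_t bfn);

    #define WIN_CACHE_SIZE 64

    struct map_window {
        domid_t  domid;
        uint64_t gfn;        /* guest frame that was mapped        */
        uint64_t bfn;        /* bus frame the GPU may DMA through  */
        uint64_t last_use;   /* for LRU-style lazy eviction        */
        bool     valid;
    };

    static struct map_window win_cache[WIN_CACHE_SIZE];
    static uint64_t use_clock;

    /* Return a BFN for (domid, gfn), reusing a cached mapping when possible. */
    static int window_lookup(domid_t domid, uint64_t gfn, uint64_t *bfn)
    {
        unsigned int i, victim = 0;
        uint64_t oldest = UINT64_MAX;

        for (i = 0; i < WIN_CACHE_SIZE; i++) {
            if (win_cache[i].valid && win_cache[i].domid == domid &&
                win_cache[i].gfn == gfn) {
                win_cache[i].last_use = ++use_clock;   /* cache hit */
                *bfn = win_cache[i].bfn;
                return 0;
            }
            if (!win_cache[i].valid) {
                victim = i;            /* prefer an empty slot */
                oldest = 0;
            } else if (win_cache[i].last_use < oldest) {
                victim = i;            /* otherwise remember the LRU slot */
                oldest = win_cache[i].last_use;
            }
        }

        /* Miss: lazily drop the victim's mapping and map the new frame. */
        if (win_cache[victim].valid)
            xengt_iommu_unmap_bfn(win_cache[victim].bfn);

        if (xengt_iommu_map_gfn(domid, gfn, bfn))
            return -1;

        win_cache[victim] = (struct map_window){
            .domid = domid, .gfn = gfn, .bfn = *bfn,
            .last_use = ++use_clock, .valid = true,
        };
        return 0;
    }

The cache size and eviction policy are the tunable part; how much this buys depends on exactly the IOTLB shootdown cost Tim asks to be measured.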
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
Hi, At 08:56 + on 06 Jan (1420530995), Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote: but just to confirm one point. from my understanding whether it's a mapping operation doesn't really matter. We can invent an interface to get p2m mapping and then increase refcnt. the key is refcnt here. when XenGT constructs a shadow GPU page table, it creates a reference to guest memory page so the refcnt must be increased. :-) True. :) But Xen does need to remember all the refcounts that were created (so it can tidy up if the domain crashes). If Xen is already doing that it might as well do it in the IOMMU tables since that solves other problems. would a refcnt in p2m layer enough so we don't need separate refcnt in both EPT and IOMMU page table? Yes, that sounds right. The p2m layer is actually the same as the EPT table, so that is where the refcount should be attached to (and it shouldn't matter whether the IOMMU page tables are shared or not). yes, that's the hard part requiring experiments to find a good balance between complexity and performance. IOMMU page table is not designed with same frequent modifications as CPU/GPU page tables, but following above trend make them connected. Another option might be reserve a big enough BFNs to cover all available guest memory at boot time, so to eliminate run-time modification overhead. Sure, or you can map them on demend but keep a cache of maps to avoid unmapping between uses. Not really. The IOMMU tables are also 64-bit so there must be enough addresses to map all of RAM. There shouldn't be any need for these mappings to be _contiguous_, btw. You just need to have one free address for each mapping. Again, following how grant maps work, I'd imagine that PVH guests will allocate an unused GFN for each mapping and do enough bookkeeping to make sure they don't clash with other GFN users (grant mapping, ballooning, c). PV guests will probably be given a BFN by the hypervisor at map time (which will be == MFN in practice) and just needs to pass the same BFN to the unmap call later (it can store it in the GTT meanwhile). if possible prefer to make both consistent, i.e. always finding unused GFN? I don't think it will be possible. PV domains are already using BFNs supplied by Xen (in fact == MFN) for backend grant mappings, which would conflict with supplying their own for these mappings. But again, I think the kernel maintainers for Xen may have a better idea of how these interfaces are used inside the kernel. For example, it might be easy enough to wrap the two systems inside a common API inside linux. Again, following how grant mapping works seems like the way forward. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
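To make Tim's "allocate an unused GFN for each mapping and do enough bookkeeping" suggestion concrete, a minimal sketch of that bookkeeping for a PVH backend is below: a bitmap allocator over a hole in the guest-physical address space. The hole base and size are placeholder assumptions; a real kernel would have to carve this range out in coordination with the balloon driver and grant-mapping code so the users cannot clash.

    #include <stdint.h>

    #define HOLE_BASE_GFN   0x100000UL   /* assumed-free hole starting at 4GiB    */
    #define HOLE_NR_GFNS    (1UL << 20)  /* room for 1M mappings (4GiB of pages)  */
    #define BITS_PER_WORD   (8 * sizeof(unsigned long))

    static unsigned long gfn_bitmap[HOLE_NR_GFNS / BITS_PER_WORD];

    /* Pick and reserve an unused GFN for a new mapping; 0 means exhausted. */
    static uint64_t alloc_unused_gfn(void)
    {
        unsigned long i, bit;

        for (i = 0; i < HOLE_NR_GFNS / BITS_PER_WORD; i++) {
            if (~gfn_bitmap[i] == 0)
                continue;                         /* this word is fully used */
            bit = __builtin_ctzl(~gfn_bitmap[i]); /* lowest clear bit (GCC builtin) */
            gfn_bitmap[i] |= 1UL << bit;
            return HOLE_BASE_GFN + i * BITS_PER_WORD + bit;
        }
        return 0;
    }

    /* Release a GFN once the corresponding mapping has been torn down. */
    static void free_unused_gfn(uint64_t gfn)
    {
        uint64_t idx = gfn - HOLE_BASE_GFN;

        gfn_bitmap[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
    }

The allocated GFN is then handed to whatever map call is chosen (grant-map style, or the XENMEM_add_to_physmap gmfn_foreign path mentioned elsewhere in this thread) and passed back to the unmap call later.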
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Tue, 2015-01-06 at 08:42 +, Tian, Kevin wrote: From: George Dunlap Sent: Monday, January 05, 2015 11:50 PM On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote: We're not there in the current design, purely because XenGT has to be in dom0 (so it can trivially DoS Xen by rebooting the host). Can we really decouple dom0 from DoS Xen? I know there's on-going effort like PVH Dom0, however there are lots of trickiness in Dom0 which can put the platform into a bad state. One example is ACPI. All the platform details are encapsulated in AML language, and only dom0 knows how to handle ACPI events. Unless Xen has another parser to guard all possible resources which might be touched thru ACPI, a tampered dom0 has many way to break out. But that'd be very challenging and complex. If we can't containerize Dom0's behavior completely, I would think dom0 and Xen actually in the same trust zone, so putting XenGT in Dom0 shouldn't make things worse. The question here is, If a malicious guest can manage to break into XenGT, what can they do? If XenGT is running in dom0, then the answer is, At very least, they can DoS the host because dom0 is allowed to reboot; they can probably do lots of other nasty things as well. If XenGT is running in its own domain, and can only add IOMMU entries for MFNs belonging to XenGT-only VMs, then the answer is, They can access other XenGT-enabled VMs, but they cannot shut down the host or access non-XenGT VMs. Slides 8-11 of a presentation I gave (http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-s ecurity-features) can give you a graphical idea of what we're' talking about. I agree we need to make XenGT more isolated following on-going trend from previous discussion, but regarding to whether Dom0/Xen are in the same security domain, I don't see my statement is changed w/ above attempts which just try to move privileged Xen stuff away from dom0, but all existing Linux vulnerabilities allow a tampered Dom0 do many evil things with root permission or even tampered kernel to DoS Xen (e.g. w/ ACPI). PVH dom0 can help performance... but itself alone doesn't change the fact that Dom0/Xen are actually in the same security domain. :-) Which is a good reason why one would want to remove as much potentially vulnerable code from dom0 as possible, and then deny it the corresponding permissions via XSM too. I also find the argument dom0 can do some bad things so we should let it be able to do all bad things rather specious. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: George Dunlap Sent: Monday, January 05, 2015 11:50 PM On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote: We're not there in the current design, purely because XenGT has to be in dom0 (so it can trivially DoS Xen by rebooting the host). Can we really decouple dom0 from DoS Xen? I know there's on-going effort like PVH Dom0, however there are lots of trickiness in Dom0 which can put the platform into a bad state. One example is ACPI. All the platform details are encapsulated in AML language, and only dom0 knows how to handle ACPI events. Unless Xen has another parser to guard all possible resources which might be touched thru ACPI, a tampered dom0 has many way to break out. But that'd be very challenging and complex. If we can't containerize Dom0's behavior completely, I would think dom0 and Xen actually in the same trust zone, so putting XenGT in Dom0 shouldn't make things worse. The question here is, If a malicious guest can manage to break into XenGT, what can they do? If XenGT is running in dom0, then the answer is, At very least, they can DoS the host because dom0 is allowed to reboot; they can probably do lots of other nasty things as well. If XenGT is running in its own domain, and can only add IOMMU entries for MFNs belonging to XenGT-only VMs, then the answer is, They can access other XenGT-enabled VMs, but they cannot shut down the host or access non-XenGT VMs. Slides 8-11 of a presentation I gave (http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-s ecurity-features) can give you a graphical idea of what we're' talking about. I agree we need to make XenGT more isolated following on-going trend from previous discussion, but regarding to whether Dom0/Xen are in the same security domain, I don't see my statement is changed w/ above attempts which just try to move privileged Xen stuff away from dom0, but all existing Linux vulnerabilities allow a tampered Dom0 do many evil things with root permission or even tampered kernel to DoS Xen (e.g. w/ ACPI). PVH dom0 can help performance... but itself alone doesn't change the fact that Dom0/Xen are actually in the same security domain. :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tim Deegan [mailto:t...@xen.org] Sent: Thursday, December 18, 2014 11:47 PM Hi, At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote: I'm afraid not. There's nothing worrying per se in a backend knowing the MFNs of the pages -- the worry is that the backend can pass the MFNs to hardware. If the check happens only at lookup time, then XenGT can (either through a bug or a security breach) just pass _any_ MFN to the GPU for DMA. But even without considering the security aspects, this model has bugs that may be impossible for XenGT itself to even detect. E.g.: 1. Guest asks its virtual GPU to DMA to a frame of memory; 2. XenGT looks up the GFN-MFN mapping; 3. Guest balloons out the page; 4. Xen allocates the page to a different guest; 5. XenGT passes the MFN to the GPU, which DMAs to it. Whereas if stage 2 is a _mapping_ operation, Xen can refcount the underlying memory and make sure it doesn't get reallocated until XenGT is finished with it. yes, I see your point. Now we can't support ballooning in VM given above reason, and refcnt is required to close that gap. but just to confirm one point. from my understanding whether it's a mapping operation doesn't really matter. We can invent an interface to get p2m mapping and then increase refcnt. the key is refcnt here. when XenGT constructs a shadow GPU page table, it creates a reference to guest memory page so the refcnt must be increased. :-) True. :) But Xen does need to remember all the refcounts that were created (so it can tidy up if the domain crashes). If Xen is already doing that it might as well do it in the IOMMU tables since that solves other problems. would a refcnt in p2m layer enough so we don't need separate refcnt in both EPT and IOMMU page table? [First some hopefully-helpful diagrams to explain my thinking. I'll borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the addresses that devices issue their DMAs in: what's 'BFN' short for? Bus Frame Number? Yes, I think so. If we replace that lookup with a _map_ hypercall, either with Xen choosing the BFN (as happens in the PV grant map operation) or with the guest choosing an unused address (as happens in the HVM/PVH grant map operation), then: - the only extra code in XenGT itself is that you need to unmap when you change the GTT; - Xen can track and control exactly which MFNs XenGT/the GPU can access; - running XenGT in a driver domain or PVH dom0 ought to work; and - we fix the race condition I described above. ok, I see your point here. It does sound like a better design to meet Xen hypervisor's security requirement and can also work with PVH Dom0 or driver domain. Previously even when we said a MFN is required, it's actually a BFN due to IOMMU existence, and it works just because we have a 1:1 identity mapping in-place. And by finding a BFN some follow-up think here: - one extra unmap call will have some performance impact, especially for media processing workloads where GPU page table modifications are hot. but suppose this can be optimized with batch request Yep. In general I'd hope that the extra overhead of unmap is small compared with the trap + emulate + ioreq + schedule that's just happened. Though I know that IOTLB shootdowns are potentially rather expensive right now so it might want some measurement. yes, that's the hard part requiring experiments to find a good balance between complexity and performance. IOMMU page table is not designed with same frequent modifications as CPU/GPU page tables, but following above trend make them connected. 
Another option might be reserve a big enough BFNs to cover all available guest memory at boot time, so to eliminate run-time modification overhead. - is there existing _map_ call for this purpose per your knowledge, or a new one is required? If the latter, what's the additional logic to be implemented there? For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do what you need, I think. For PV, I think we probably need a new map operation with sensible semantics. My inclination would be to have it follow the grant-map semantics (i.e. caller supplies domid + gfn, hypervisor supplies BFN and success/failure code). setup mapping is not a big problem. it's more about finding available BFNs in a way not conflicting with other usages e.g. memory hotplug, ballooning (well for this I'm not sure now whether it's only for existing gfns from other thread...) Malcolm might have opinions about this -- it starts looking like the sort of PV IOMMU interface he's suggested before. we'd like to hear Malcolm's suggestion here. - when you say _map_, do you expect this mapped into dom0's virtual address space, or just guest physical space? For PVH, I mean into guest physical address space (and iommu tables, since those
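Since the exchange above suggests batching to hide the per-entry unmap and IOTLB cost, here is a sketch of what a batched request might look like. The structure, field names, op numbers and fixed batch size are illustrative assumptions only; no such interface exists today.

    #include <stdint.h>

    typedef uint16_t domid_t;          /* as in Xen's public headers */

    /* Hypothetical batched (un)map request: one hypercall and one IOTLB
     * flush per burst of GTT updates, instead of one per entry. */
    struct gvt_iommu_op {
    #define GVT_IOMMU_OP_MAP    1
    #define GVT_IOMMU_OP_UNMAP  2
        uint16_t op;
        domid_t  domid;                /* client VM that owns the frame     */
        uint64_t gfn;                  /* in: guest frame to (un)map        */
        uint64_t bfn;                  /* out for map, in for unmap         */
        int32_t  status;               /* out: per-op success/failure       */
    };

    struct gvt_iommu_batch {
        uint32_t nr_ops;
        struct gvt_iommu_op ops[64];   /* flush the IOTLB once per batch    */
    };

Whether 64 entries per batch (or any other number) is the right trade-off against IOTLB shootdown latency is exactly the measurement asked for above.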
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Fri, Dec 12, 2014 at 6:29 AM, Tian, Kevin kevin.t...@intel.com wrote: We're not there in the current design, purely because XenGT has to be in dom0 (so it can trivially DoS Xen by rebooting the host). Can we really decouple dom0 from DoS Xen? I know there's on-going effort like PVH Dom0, however there are lots of trickiness in Dom0 which can put the platform into a bad state. One example is ACPI. All the platform details are encapsulated in AML language, and only dom0 knows how to handle ACPI events. Unless Xen has another parser to guard all possible resources which might be touched thru ACPI, a tampered dom0 has many way to break out. But that'd be very challenging and complex. If we can't containerize Dom0's behavior completely, I would think dom0 and Xen actually in the same trust zone, so putting XenGT in Dom0 shouldn't make things worse. The question here is, "If a malicious guest can manage to break into XenGT, what can they do?" If XenGT is running in dom0, then the answer is, "At very least, they can DoS the host because dom0 is allowed to reboot; they can probably do lots of other nasty things as well." If XenGT is running in its own domain, and can only add IOMMU entries for MFNs belonging to XenGT-only VMs, then the answer is, "They can access other XenGT-enabled VMs, but they cannot shut down the host or access non-XenGT VMs." Slides 8-11 of a presentation I gave (http://www.slideshare.net/xen_com_mgr/a-brief-tutorial-on-xens-advanced-security-features) can give you a graphical idea of what we're talking about. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
Hi, At 07:24 + on 12 Dec (1418365491), Tian, Kevin wrote: I'm afraid not. There's nothing worrying per se in a backend knowing the MFNs of the pages -- the worry is that the backend can pass the MFNs to hardware. If the check happens only at lookup time, then XenGT can (either through a bug or a security breach) just pass _any_ MFN to the GPU for DMA. But even without considering the security aspects, this model has bugs that may be impossible for XenGT itself to even detect. E.g.: 1. Guest asks its virtual GPU to DMA to a frame of memory; 2. XenGT looks up the GFN-MFN mapping; 3. Guest balloons out the page; 4. Xen allocates the page to a different guest; 5. XenGT passes the MFN to the GPU, which DMAs to it. Whereas if stage 2 is a _mapping_ operation, Xen can refcount the underlying memory and make sure it doesn't get reallocated until XenGT is finished with it. yes, I see your point. Now we can't support ballooning in VM given above reason, and refcnt is required to close that gap. but just to confirm one point. from my understanding whether it's a mapping operation doesn't really matter. We can invent an interface to get p2m mapping and then increase refcnt. the key is refcnt here. when XenGT constructs a shadow GPU page table, it creates a reference to guest memory page so the refcnt must be increased. :-) True. :) But Xen does need to remember all the refcounts that were created (so it can tidy up if the domain crashes). If Xen is already doing that it might as well do it in the IOMMU tables since that solves other problems. [First some hopefully-helpful diagrams to explain my thinking. I'll borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the addresses that devices issue their DMAs in: what's 'BFN' short for? Bus Frame Number? Yes, I think so. If we replace that lookup with a _map_ hypercall, either with Xen choosing the BFN (as happens in the PV grant map operation) or with the guest choosing an unused address (as happens in the HVM/PVH grant map operation), then: - the only extra code in XenGT itself is that you need to unmap when you change the GTT; - Xen can track and control exactly which MFNs XenGT/the GPU can access; - running XenGT in a driver domain or PVH dom0 ought to work; and - we fix the race condition I described above. ok, I see your point here. It does sound like a better design to meet Xen hypervisor's security requirement and can also work with PVH Dom0 or driver domain. Previously even when we said a MFN is required, it's actually a BFN due to IOMMU existence, and it works just because we have a 1:1 identity mapping in-place. And by finding a BFN some follow-up think here: - one extra unmap call will have some performance impact, especially for media processing workloads where GPU page table modifications are hot. but suppose this can be optimized with batch request Yep. In general I'd hope that the extra overhead of unmap is small compared with the trap + emulate + ioreq + schedule that's just happened. Though I know that IOTLB shootdowns are potentially rather expensive right now so it might want some measurement. - is there existing _map_ call for this purpose per your knowledge, or a new one is required? If the latter, what's the additional logic to be implemented there? For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do what you need, I think. For PV, I think we probably need a new map operation with sensible semantics. My inclination would be to have it follow the grant-map semantics (i.e. 
caller supplies domid + gfn, hypervisor supplies BFN and success/failure code). Malcolm might have opinions about this -- it starts looking like the sort of PV IOMMU interface he's suggested before. - when you say _map_, do you expect this mapped into dom0's virtual address space, or just guest physical space? For PVH, I mean into guest physical address space (and iommu tables, since those are the same). For PV, I mean just the IOMMU tables -- since the guest controls its own PFN space entirely there's nothing Xen can to map things into it. - how is BFN or unused address (what do you mean by address here?) allocated? does it need present in guest physical memory at boot time, or just finding some holes? That's really a question for the xen maintainers in the linux kernel. I presume that whatever bookkeeping they currently do for grant-mapped memory would suffice here just as well. - graphics memory size could be large. starting from BDW, there'll be 64bit page table format. Do you see any limitation here on finding BFN or address? Not really. The IOMMU tables are also 64-bit so there must be enough addresses to map all of RAM. There shouldn't be any need for these mappings to be _contiguous_, btw. You just need to have one free address for each mapping. Again, following how grant maps work, I'd imagine that
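For the PV side, the grant-map-like semantics described above (caller supplies domid + gfn, hypervisor supplies BFN and a success/failure code) could look roughly like the sketch below. Everything here, including the structure layout, names and hypercall wrapper, is hypothetical and only mirrors the semantics proposed in the text; for PVH the existing XENMEM_add_to_physmap path would be used instead.

    #include <stdint.h>

    typedef uint16_t domid_t;                 /* as in Xen's public headers */

    struct pv_iommu_map {                     /* hypothetical op, per the text */
        /* IN */
        domid_t  domid;                       /* client VM owning the frame   */
        uint64_t gfn;                         /* frame in the client's space  */
        /* OUT */
        uint64_t bfn;                         /* address the GPU should use   */
        int16_t  status;                      /* success/failure code         */
    };

    /* Hypothetical wrapper around the (not yet existing) map hypercall. */
    extern int pv_iommu_map_hypercall(struct pv_iommu_map *op);

    /* Backend side: shadowing one guest GTT entry (sketch). */
    static int shadow_one_entry(domid_t client, uint64_t guest_gfn,
                                uint64_t *shadow_entry)
    {
        struct pv_iommu_map op = { .domid = client, .gfn = guest_gfn };

        if (pv_iommu_map_hypercall(&op) || op.status)
            return -1;

        *shadow_entry = op.bfn;               /* the BFN goes into the shadow GTT;
                                               * it is also what must be passed to
                                               * the matching unmap call later */
        return 0;
    }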
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 18/12/14 16:08, Tim Deegan wrote: yep. Just curious, I thought stubdomain is not popularly used. typical case is to have qemu in dom0. is this still true? :-) Some do and some don't. :) High-security distros like Qubes and XenClient do. You can enable it in xl config files pretty easily. IIRC the xapi toolstack doesn't use it, but XenServer uses privilege separation to isolate the qemu processes in dom0. We are looking into stubdomains as part of our future architectural roadmap, but as identified, there is a lot of toolstack plumbing required before this is feasible to put into XenServer. Our privilege separation in qemu is a stopgap measure which we would like to replace in due course. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 15.12.14 at 07:25, kevin.t...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] On 12.12.14 at 08:24, kevin.t...@intel.com wrote: - how is BFN or unused address (what do you mean by address here?) allocated? does it need present in guest physical memory at boot time, or just finding some holes? Fitting this into holes should be fine. this is an interesting open to be further discussed. Here we need consider the extreme case, i.e. a 64bit GPU page table can legitimately use up all the system memory allocates to that VM, and considering dozens of VMs, it means we need reserve a very large hole. Oh, it's guest RAM you want mapped, not frame buffer space. But still you're never going to have to map more than the total amount of host RAM, and (with Linux) we already assume everything can be mapped through the 1:1 mapping. I.e. the only collision would be with excessive PFN reservations for ballooning purposes. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 15.12.14 at 10:05, kevin.t...@intel.com wrote: yes, definitely host RAM is the upper limit, and what I'm concerning here is how to reserve (at boot time) or allocate (on-demand) such large PFN resource, w/o collision with other PFN reservation usage (ballooning should be fine since it's operating existing RAM ranges in dom0 e820 table). I don't think ballooning is restricted to the regions named RAM in Dom0's E820 table (at least it shouldn't be, and wasn't in the classic Xen kernels). Maybe we can reserve a big-enough reserved region in dom0's e820 table at boot time, for all PFN reservation usages, and then allocate them on-demand for specific usages? What would big enough here mean (i.e. how would one determine the needed size up front)? Plus any form of allocation would need a reasonable approach to avoid fragmentation. And anyway I'm not getting what position you're on: Do you expect to be able to fit everything that needs mapping into the available mapping space (as your reply above seems to imply) or do you think there won't be enough mapping space (as earlier replies of yours appeared to indicate)? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Monday, December 15, 2014 5:23 PM On 15.12.14 at 10:05, kevin.t...@intel.com wrote: yes, definitely host RAM is the upper limit, and what I'm concerning here is how to reserve (at boot time) or allocate (on-demand) such large PFN resource, w/o collision with other PFN reservation usage (ballooning should be fine since it's operating existing RAM ranges in dom0 e820 table). I don't think ballooning is restricted to the regions named RAM in Dom0's E820 table (at least it shouldn't be, and wasn't in the classic Xen kernels). well, nice to know that. Maybe we can reserve a big-enough reserved region in dom0's e820 table at boot time, for all PFN reservation usages, and then allocate them on-demand for specific usages? What would big enough here mean (i.e. how would one determine the needed size up front)? Plus any form of allocation would need a reasonable approach to avoid fragmentation. And anyway I'm not getting what position you're on: Do you expect to be able to fit everything that needs mapping into the available mapping space (as your reply above seems to imply) or do you think there won't be enough mapping space (as earlier replies of yours appeared to indicate)? I expect to have everything mapped into the available mapping space, and am asking for suggestions on the best way to find and reserve available PFNs in a way that doesn't conflict with other usages (either virtualization features like ballooning that you mentioned, or bare-metal features like PCI hotplug or memory hotplug). Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 15.12.14 at 16:22, stefano.stabell...@eu.citrix.com wrote: On Mon, 15 Dec 2014, Jan Beulich wrote: On 15.12.14 at 10:05, kevin.t...@intel.com wrote: yes, definitely host RAM is the upper limit, and what I'm concerning here is how to reserve (at boot time) or allocate (on-demand) such large PFN resource, w/o collision with other PFN reservation usage (ballooning should be fine since it's operating existing RAM ranges in dom0 e820 table). I don't think ballooning is restricted to the regions named RAM in Dom0's E820 table (at least it shouldn't be, and wasn't in the classic Xen kernels). Could you please elaborate more on this? It seems counter-intuitive at best. I don't see what's counter-intuitive here. How can the hypervisor (Dom0) or tool stack (DomU) know what ballooning intentions a guest kernel may have? It's solely the guest kernel's responsibility to make sure its ballooning activities don't collide with anything else address-wise. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Mon, 15 Dec 2014, Jan Beulich wrote: On 15.12.14 at 16:22, stefano.stabell...@eu.citrix.com wrote: On Mon, 15 Dec 2014, Jan Beulich wrote: On 15.12.14 at 10:05, kevin.t...@intel.com wrote: yes, definitely host RAM is the upper limit, and what I'm concerning here is how to reserve (at boot time) or allocate (on-demand) such large PFN resource, w/o collision with other PFN reservation usage (ballooning should be fine since it's operating existing RAM ranges in dom0 e820 table). I don't think ballooning is restricted to the regions named RAM in Dom0's E820 table (at least it shouldn't be, and wasn't in the classic Xen kernels). Could you please elaborate more on this? It seems counter-intuitive at best. I don't see what's counter-intuitive here. How can the hypervisor (Dom0) or tool stack (DomU) know what ballooning intentions a guest kernel may have? The hypervisor checks that the memory the guest is giving back is actually ram, as a consequence the ballooning interface only supports ram. Do you agree? Ballooning is restricted to regions named RAM in the e820 table, because Linux respects e820 in its pfn-mfn mappings. However it is true that respecting the e820 in dom0 is not part of the interface. It's solely the guest kernel's responsibility to make sure its ballooning activities don't collide with anything else address-wise. In the sense that it is in the guest kernel's responsibility to use the interface properly. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, December 12, 2014 6:54 PM On 12.12.14 at 08:24, kevin.t...@intel.com wrote: - is there existing _map_ call for this purpose per your knowledge, or a new one is required? If the latter, what's the additional logic to be implemented there? I think the answer to this depends on whether you want to use grants. The goal of using the native driver in the guest (mentioned further down) speaks against this, in which case I don't think we have an existing interface. yes, grants don't apply here. - when you say _map_, do you expect this mapped into dom0's virtual address space, or just guest physical space? Iiuc you don't care about the memory to be visible to the CPU, all you need is it being translated by the IOMMU. In which case the input address space for the IOMMU (which is different between PV and PVH) is where this needs to be mapped into. it should be in p2m level, not just in IOMMU. otherwise I'm wondering there'll be tricky issues ahead due to inconsistent mapping between EPT and IOMMU page table (though a specific attributes like r/w may be different from previous split table discussion). another reason here. If we just talk about shadow GPU page table, yes it's used by device only so IOMMU mapping is enough. However we do have several other places where we need to map and access guest memory, e.g. scanning command in a buffer mapped through GPU page table ( currently through remap_domain_mfn_range_in_kernel). - how is BFN or unused address (what do you mean by address here?) allocated? does it need present in guest physical memory at boot time, or just finding some holes? Fitting this into holes should be fine. this is an interesting open to be further discussed. Here we need consider the extreme case, i.e. a 64bit GPU page table can legitimately use up all the system memory allocates to that VM, and considering dozens of VMs, it means we need reserve a very large hole. I once remember some similar cases requiring grabbing some unmapped pfns (in grant table?). So wonder whether there's already a clean interface for such purpose, or we need tweak a new one to allocate unmapped pfns (but won't conflict with usages like memory hotplug)... appreciate any suggestion here. - graphics memory size could be large. starting from BDW, there'll be 64bit page table format. Do you see any limitation here on finding BFN or address? I don't think this concern differs much for the different models: As long as you don't want the same underlying memory to be accessible by more than one guest, the address space requirements ought to be the same. See above. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
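Since the message above mentions scanning guest command buffers through CPU-visible mappings (today via remap_domain_mfn_range_in_kernel), a rough sketch of that audit loop is below. All helpers (map_client_page, guest_gtt_lookup, client_gfn_to_bfn, unmap_client_page) are hypothetical stand-ins for the CPU mapping call and the p2m/IOMMU translation being discussed, and the command layout is invented purely for illustration.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for the interfaces discussed in this thread: */
    extern void    *map_client_page(uint16_t domid, uint64_t gfn);
    extern void     unmap_client_page(void *va);
    extern uint64_t guest_gtt_lookup(uint16_t domid, uint64_t gma);  /* GMA -> GPA */
    extern int      client_gfn_to_bfn(uint16_t domid, uint64_t gfn, uint64_t *bfn);

    struct gpu_cmd {                 /* invented layout, for illustration only */
        uint32_t opcode;
        uint64_t target_gma;         /* graphics memory address operand        */
    };

    /* Walk a client's command buffer and only let through DMA targets that
     * translate to memory this client is allowed to use. */
    static int audit_cmd_buffer(uint16_t client, uint64_t buf_gfn, size_t n_cmds)
    {
        struct gpu_cmd *cmds = map_client_page(client, buf_gfn);
        size_t i;

        if (!cmds)
            return -1;

        for (i = 0; i < n_cmds; i++) {
            uint64_t gpa = guest_gtt_lookup(client, cmds[i].target_gma);
            uint64_t bfn;

            if (client_gfn_to_bfn(client, gpa >> 12, &bfn)) {
                unmap_client_page(cmds);     /* reject the whole buffer */
                return -1;
            }
            /* ... rewrite the operand in the shadow copy to use bfn ... */
        }

        unmap_client_page(cmds);
        return 0;
    }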
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
Hi, At 01:41 + on 11 Dec (1418258504), Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] It is Xen's job to isolate VMs from each other. As part of that, Xen uses the MMU, nested paging, and IOMMUs to control access to RAM. Any software component that can pass a raw MFN to hardware breaks that isolation, because Xen has no way of controlling what that component can do (including taking over the hypervisor). This is why I am afraid when developers ask for GFN-MFN translation functions. When I agree Xen's job absolutely, the isolation is also required in different layers, regarding to who controls the resource and where the virtualization happens. For example talking about I/O virtualization, Dom0 or driver domain needs to isolate among backend drivers to avoid one backend interfering with another. Xen doesn't know such violation, since it only knows it's Dom0 wants to access a VM's page. I'm going to write second reply to this mail in a bit, to talk about this kind of system-level design. In this email I'll just talk about the practical aspects of interfaces and address spaces and IOMMUs. btw curious of how worse exposing GFN-MFN translation compared to allowing mapping other VM's GFN? If exposing GFN-MFN is under the same permission control as mapping, would it avoid your worry here? I'm afraid not. There's nothing worrying per se in a backend knowing the MFNs of the pages -- the worry is that the backend can pass the MFNs to hardware. If the check happens only at lookup time, then XenGT can (either through a bug or a security breach) just pass _any_ MFN to the GPU for DMA. But even without considering the security aspects, this model has bugs that may be impossible for XenGT itself to even detect. E.g.: 1. Guest asks its virtual GPU to DMA to a frame of memory; 2. XenGT looks up the GFN-MFN mapping; 3. Guest balloons out the page; 4. Xen allocates the page to a different guest; 5. XenGT passes the MFN to the GPU, which DMAs to it. Whereas if stage 2 is a _mapping_ operation, Xen can refcount the underlying memory and make sure it doesn't get reallocated until XenGT is finished with it. When the backend component gets a GFN from the guest, it wants an address that it can give to the GPU for DMA that will map the right memory. That address must be mapped in the IOMMU tables that the GPU will be using, which means the IOMMU tables of the backend domain, IIUC[1]. So the hypercall it needs is not give me the MFN that matches this GFN but please map this GFN into my IOMMU tables. Here please map this GFN into my IOMMU tables actually breaks the IOMMU isolation. IOMMU is designed for serving DMA requests issued by an exclusive VM, so IOMMU page table can restrict that VM's attempts strictly. To map multiple VM's GFNs into one IOMMU table, the 1st thing is to avoid GFN conflictions to make it functional. We thought about this approach previously, e.g. by reserving highest 3 bits of GFN as VMID, so one IOMMU page table can be used to combine multi-VM's page table together. However doing so have two limitations: a) it still requires write-protect guest GPU page table, and maintain a shadow GPU page table by translate from real GFN to pseudo GFN (plus VMID), which doesn't save any engineering effort in the device model part Yes -- since there's only one IOMMU context for the whole GPU, the XenGT backend still has to audit all GPU commands to maintain isolation between clients. b) it breaks the designed isolation intrinsic of IOMMU. 
In such case, IOMMU can't isolate multiple VMs by itself, since a DMA request can target any pseudo GFN if valid in the page table. We have to rely on the audit in the backend component in Dom0 to ensure the isolation. Yep. c) this introduces tricky logic in IOMMU driver to handle such non-standard multiplexed page table style. w/o a SR-IOV implementation (so each VF has its own IOMMU page table), I don't see using IOMMU can help isolation here. If I've understood your argument correctly, it basically comes down to It would be extra work for no benefit, because XenGT still has to do all the work of isolating GPU clients from each other. It's true that XenGT still has to isolate its clients, but there are other benefits. The main one, from my point of view as a Xen maintainer, is that it allows Xen to constrain XenGT itself, in the case where bugs or security breaches mean that XenGT tries to access memory it shouldn't. More about that in my other reply. I'll talk about the rest below. yes, this is a good feedback we didn't think about before. So far the reason why XenGT can work is because we use default IOMMU setting which set up a 1:1 r/w mapping for all possible RAM, so when GPU hits a MFN thru shadow GPU page table, IOMMU is essentially bypassed. However like you said, if IOMMU page table is restricted to dom0's memory, or is not 1:1 identity mapping, XenGT will be
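A rough hypervisor-side sketch of why a _mapping_ operation closes the ballooning race in steps 1-5 above: the map takes a reference on the underlying page, so the frame cannot be freed and handed to another guest until the backend unmaps it. The helpers below are simplified stand-ins with made-up names, not real Xen signatures; in real Xen this corresponds to the get_page/put_page-style refcounting that grant mapping already relies on.

    struct page_info;     /* opaque here */
    struct domain;

    /* Simplified stand-ins for Xen internals (not the real signatures): */
    extern struct page_info *xengt_get_client_page(struct domain *client,
                                                   unsigned long gfn);
                            /* returns the page with a reference held, or NULL */
    extern void xengt_put_client_page(struct page_info *pg);
    extern int  backend_iommu_map(struct domain *backend, unsigned long bfn,
                                  struct page_info *pg);
    extern void backend_iommu_unmap(struct domain *backend, unsigned long bfn);

    static int backend_map(struct domain *client, unsigned long gfn,
                           struct domain *backend, unsigned long bfn)
    {
        struct page_info *pg = xengt_get_client_page(client, gfn);

        if (!pg)
            return -1;

        if (backend_iommu_map(backend, bfn, pg)) {
            xengt_put_client_page(pg);
            return -1;
        }
        /* Even if the guest balloons the gfn out now, the frame stays
         * allocated to it until backend_unmap() drops the reference. */
        return 0;
    }

    static void backend_unmap(struct domain *backend, unsigned long bfn,
                              struct page_info *pg)
    {
        backend_iommu_unmap(backend, bfn);
        xengt_put_client_page(pg);           /* drop the mapping reference */
    }

This is also what lets Xen tidy up cleanly if the backend domain crashes: the outstanding references tell it exactly which mappings to tear down.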
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
Hi, again. :) As promised, I'm going to talk about more abstract design considerations. Thi will be a lot less concrete than in the other email, and about a larger range of things. Some of of them may not be really desirable - or even possible. [ TL;DR: read the other reply with the practical suggestions in it :) ] I'm talking from the point of view of a hypervisor maintainer, looking at introducing this new XenGT component and thinking about what security properties we would like the _system_ to have once XenGT is introduced. I'm going to lay out a series of broadly increasing levels of security goodness and talk about what we'd need to do to get there. For the purposes of this discussion, Xen does not _trust_ XenGT. By that I mean that Xen can't rely on the correctness/integrity of XenGT itself to maintain system security. Now, we can decide that for some properties we _will_ choose to trust XenGT, but the default is to assume that XenGT could be compromised or buggy. (This is not intended as a slur on XenGT, btw -- this is how we reason about device driver domains, qemu-dm and other components. There will be bugs in any component, and we're designing the system to minimise the effect of those bugs.) OK. Properties we would like to have: LEVEL 0: Protect Xen itself from XenGT -- Bugs in XenGT should not be able to crash he host, and a compromised XenGT should not be able to take over the hypervisor We're not there in the current design, purely because XenGT has to be in dom0 (so it can trivially DoS Xen by rebooting the host). But it doesn't seem too hard: as soon as we can run XenGT in a driver domain, and with IOMMU tables that restrict the GPU from writing to Xen's datastructures, we'll have this property. [BTW, this whole discussion assumes that the GPU has no 'back door' access to issue DMA that is not translated by the IOMMU. I have heard rumours in the past that such things exist. :) If the GPU can issue untranslated DMA, then whetever controls it can take over the entire system, and so we can't make _any_ security guarantees about it.] LEVEL 1: Isolate XenGT's clients from other VMs --- In other words we partition the machine into VMs XenGT can touch (i.e. its clients) and those it can't. Then a malicious client that compromises XenGT only gains access to other VMs that share a GPU with it. That means we can deploy XenGT for some VMs without increasing the risk to other tenants. Again we're not there yet, but I think the design I was talking about in my other email would do it: if XenGT must map all the memory it wants to let the GPU DMA to, and Xen's policy is to deny mappings for non-client-vm memory, then VMs that aren't using XenGT are protected. LEVEL 2: Isolate XenGT's clients from each other This is trickier, as you pointed out. We could: a) Decide that we will trust XenGT to provide this property. After all, that's its main purpose! This is how we treat other shared backends: if a NIC device driver domain is compromised, the attacker controls the network traffic for all its frontends. OTOH, we don't trust qemu in that way -- instead we use stub domains and IS_PRIV_FOR to enforce isolation. b) Move all of XenGT into Xen. This is just defining the problem away and would probably do more harm than good - after all, keeping it separate has other advantages. c) Use privilege separation: break XenGT into parts, isolated from each other, with the principle of least privilege applied to them. E.g. 
- GPU emulation could be in a per-client component that doesn't share state with the other clients' emulators; - Shadowing GTTs and auditing GPU commands could move into Xen, with a clean interface to the emulation parts. That way, even if a client VM can exploit a bug in the emulator, it can't affect other clients because it can't see their emulator state, and it can't bypass the safety rules because they're enforced by Xen. When I talked about privilege separation before I was suggesting something like this, but without moving anything into Xen -- e.g. the device-emulation code for each client could be in a per-client, non-root process. The code that audits and issues commands to the GPU would be in a separate process, which is allowed to make hypercalls, and which does not trust the emulator processes. My apologies if you're already doing this -- I know XenGT has some components in a kernel driver and some elsewhere but I haven't looked at the details. LEVEL 3: Isolate XenGT's clients from XenGT itself -- XenGT should not be able to access parts of its client VMs that they have not given it permission to. E.g. XenGT should not be able to read a client VM's crypto keys unless it displays them on the
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tim Deegan Sent: Friday, December 12, 2014 12:47 AM Hi, At 01:41 + on 11 Dec (1418258504), Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] It is Xen's job to isolate VMs from each other. As part of that, Xen uses the MMU, nested paging, and IOMMUs to control access to RAM. Any software component that can pass a raw MFN to hardware breaks that isolation, because Xen has no way of controlling what that component can do (including taking over the hypervisor). This is why I am afraid when developers ask for GFN-MFN translation functions. When I agree Xen's job absolutely, the isolation is also required in different layers, regarding to who controls the resource and where the virtualization happens. For example talking about I/O virtualization, Dom0 or driver domain needs to isolate among backend drivers to avoid one backend interfering with another. Xen doesn't know such violation, since it only knows it's Dom0 wants to access a VM's page. I'm going to write second reply to this mail in a bit, to talk about this kind of system-level design. In this email I'll just talk about the practical aspects of interfaces and address spaces and IOMMUs. sure. I've replied to another design mail before seeing this. my bad outlook rule didn't push this mail to my eye, and fortunately I dig it out when wondering Hi, again in your another mail. :-) btw curious of how worse exposing GFN-MFN translation compared to allowing mapping other VM's GFN? If exposing GFN-MFN is under the same permission control as mapping, would it avoid your worry here? I'm afraid not. There's nothing worrying per se in a backend knowing the MFNs of the pages -- the worry is that the backend can pass the MFNs to hardware. If the check happens only at lookup time, then XenGT can (either through a bug or a security breach) just pass _any_ MFN to the GPU for DMA. But even without considering the security aspects, this model has bugs that may be impossible for XenGT itself to even detect. E.g.: 1. Guest asks its virtual GPU to DMA to a frame of memory; 2. XenGT looks up the GFN-MFN mapping; 3. Guest balloons out the page; 4. Xen allocates the page to a different guest; 5. XenGT passes the MFN to the GPU, which DMAs to it. Whereas if stage 2 is a _mapping_ operation, Xen can refcount the underlying memory and make sure it doesn't get reallocated until XenGT is finished with it. yes, I see your point. Now we can't support ballooning in VM given above reason, and refcnt is required to close that gap. but just to confirm one point. from my understanding whether it's a mapping operation doesn't really matter. We can invent an interface to get p2m mapping and then increase refcnt. the key is refcnt here. when XenGT constructs a shadow GPU page table, it creates a reference to guest memory page so the refcnt must be increased. :-) When the backend component gets a GFN from the guest, it wants an address that it can give to the GPU for DMA that will map the right memory. That address must be mapped in the IOMMU tables that the GPU will be using, which means the IOMMU tables of the backend domain, IIUC[1]. So the hypercall it needs is not give me the MFN that matches this GFN but please map this GFN into my IOMMU tables. Here please map this GFN into my IOMMU tables actually breaks the IOMMU isolation. IOMMU is designed for serving DMA requests issued by an exclusive VM, so IOMMU page table can restrict that VM's attempts strictly. 
To map multiple VM's GFNs into one IOMMU table, the 1st thing is to avoid GFN conflictions to make it functional. We thought about this approach previously, e.g. by reserving highest 3 bits of GFN as VMID, so one IOMMU page table can be used to combine multi-VM's page table together. However doing so have two limitations: a) it still requires write-protect guest GPU page table, and maintain a shadow GPU page table by translate from real GFN to pseudo GFN (plus VMID), which doesn't save any engineering effort in the device model part Yes -- since there's only one IOMMU context for the whole GPU, the XenGT backend still has to audit all GPU commands to maintain isolation between clients. b) it breaks the designed isolation intrinsic of IOMMU. In such case, IOMMU can't isolate multiple VMs by itself, since a DMA request can target any pseudo GFN if valid in the page table. We have to rely on the audit in the backend component in Dom0 to ensure the isolation. Yep. c) this introduces tricky logic in IOMMU driver to handle such non-standard multiplexed page table style. w/o a SR-IOV implementation (so each VF has its own IOMMU page table), I don't see using IOMMU can help isolation here. If I've understood your argument correctly, it basically comes down to It would be extra work for no benefit, because XenGT still has to do all the
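For reference, the "reserve the highest 3 bits of the GFN as a VMID" multiplexing described above is just the arithmetic below; the field widths are placeholder assumptions. As the message notes, this makes a single shared IOMMU table functional, but it is then the shadowing/audit code, not the IOMMU, that provides the isolation.

    #include <stdint.h>

    #define VMID_BITS  3
    #define GFN_BITS   36    /* placeholder: 36-bit GFNs cover a 48-bit GPA space */

    static inline uint64_t make_pseudo_gfn(uint64_t vmid, uint64_t gfn)
    {
        return (vmid << GFN_BITS) | (gfn & ((1ULL << GFN_BITS) - 1));
    }

    static inline uint64_t pseudo_gfn_vmid(uint64_t pgfn)
    {
        return pgfn >> GFN_BITS;
    }

    static inline uint64_t pseudo_gfn_gfn(uint64_t pgfn)
    {
        return pgfn & ((1ULL << GFN_BITS) - 1);
    }

    /* e.g. VM 2's GFN 0x1234 lands at pseudo-GFN (2ULL << 36) | 0x1234 in the
     * one shared IOMMU table, so up to 8 VMs can be multiplexed -- at the cost
     * of the IOMMU no longer isolating them from one another. */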
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tian, Kevin Sent: Friday, December 12, 2014 2:30 PM Conclusion -- That's enough rambling from me -- time to come back down to earth. While I think it's useful to think about all these things, we don't want to get carried away. :) And as I said, for some things we can decide to trust XenGT to provide them, as long as we're clear about what that means. I think that a reasonable minimum standard to expect is to enforce levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3. And I think we can do that without needing any huge engineering effort; as I said, I think that's covered in my earlier reply. I agree the conclusion that minimum standard to expect is to enforce levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3, except the concern whether PVH Dom0 is a hard requirement or not. Having said that, I'm happy to discuss technical detail in another thread on how to support PVH Dom0. So after going through another mail, now I agree both level 0/1 can't be enforced. :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 10.12.14 at 02:07, kevin.t...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, December 09, 2014 6:50 PM On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote: On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. Hmm, that suggests you indeed need raw MFNs, which in turn seems problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation layer). But while you don't use the IOMMU yourself, I suppose the GPU accesses still don't bypass the IOMMU? In which case all you'd need returned is a frame number that guarantees that after IOMMU translation it refers to the correct MFN, i.e. still allowing for your Dom0 driver to simply set aside a part of its PFN space, asking Xen to (IOMMU-)map the necessary guest frames into there. No. What we require is the raw MFNs. One IOMMU device entry can't point to multiple VM's page tables, so that's why XenGT needs to use software shadow GPU page table to implement the sharing. Note it's not for dom0 to access the MFN. It's for dom0 to setup the correct shadow GPU page table, so a VM can access the graphics memory in a controlled way. So what's the translation flow here: driver - GPU - IOMMU - hardware or driver - IOMMU - GPU - hardware? Or do things get set up for the GPU to bypass the IOMMU altogether? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, December 10, 2014 4:39 PM On 10.12.14 at 02:07, kevin.t...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, December 09, 2014 6:50 PM On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote: On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. Hmm, that suggests you indeed need raw MFNs, which in turn seems problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation layer). But while you don't use the IOMMU yourself, I suppose the GPU accesses still don't bypass the IOMMU? In which case all you'd need returned is a frame number that guarantees that after IOMMU translation it refers to the correct MFN, i.e. still allowing for your Dom0 driver to simply set aside a part of its PFN space, asking Xen to (IOMMU-)map the necessary guest frames into there. No. What we require is the raw MFNs. One IOMMU device entry can't point to multiple VM's page tables, so that's why XenGT needs to use software shadow GPU page table to implement the sharing. Note it's not for dom0 to access the MFN. It's for dom0 to setup the correct shadow GPU page table, so a VM can access the graphics memory in a controlled way. So what's the translation flow here: driver -> GPU -> IOMMU -> hardware or driver -> IOMMU -> GPU -> hardware? Or do things get set up for the GPU to bypass the IOMMU altogether? two translation paths in assigned case:
1. [direct CPU access from VM], with partitioned PCI aperture resource, every VM can access a portion of PCI aperture directly.
- CPU page table/EPT: CPU virtual address -> PCI aperture
- PCI aperture - bar base = Graphics Memory Address (GMA)
- GPU page table: GMA -> GPA (as programmed by guest)
- IOMMU: GPA -> MPA
2. [GPU access through GPU command operands], with GPU scheduling, every VM's command buffer will be fetched by GPU in a time-shared manner.
- GPU page table: GMA -> GPA
- IOMMU: GPA -> MPA
In our case, IOMMU is setup with 1:1 identity table for dom0. So when GPU may access GPAs from different VMs, we can't count on IOMMU which can only serve one mapping for one device (unless we have SR-IOV). That's why we need shadow GPU page table in dom0, and need a p2m query call to translate from GPA -> MPA:
- shadow GPU page table: GMA -> MPA
- IOMMU: MPA -> MPA (for dom0)
Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
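Putting that chain together, shadowing a single GPU page table entry looks roughly like the sketch below: read the guest's entry (GMA -> GPA), translate GPA -> MPA (today a p2m query, under the proposed design a map call returning a BFN), and write the result into the shadow table. The helper names and the entry layout are simplified placeholders, not the real GTT format.

    #include <stdint.h>

    #define PTE_ADDR_MASK  0x000ffffffffff000ULL   /* placeholder address field */

    /* Hypothetical helpers for the steps discussed in this thread: */
    extern uint64_t read_guest_gtt_entry(uint16_t domid, uint64_t gma);
    extern int      gpa_to_mpa(uint16_t domid, uint64_t gpa, uint64_t *mpa);
                    /* today: p2m query; proposed: a map call returning a BFN */

    static int make_shadow_entry(uint16_t domid, uint64_t gma, uint64_t *shadow_pte)
    {
        uint64_t guest_pte = read_guest_gtt_entry(domid, gma);  /* GMA -> GPA */
        uint64_t gpa = guest_pte & PTE_ADDR_MASK;
        uint64_t mpa;

        if (gpa_to_mpa(domid, gpa, &mpa))                       /* GPA -> MPA */
            return -1;

        /* Keep the guest's flag bits, substitute the host (or bus) address. */
        *shadow_pte = (guest_pte & ~PTE_ADDR_MASK) | (mpa & PTE_ADDR_MASK);
        return 0;
    }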
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tian, Kevin Sent: Wednesday, December 10, 2014 4:48 PM From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, December 10, 2014 4:39 PM On 10.12.14 at 02:07, kevin.t...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, December 09, 2014 6:50 PM On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote: On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. Hmm, that suggests you indeed need raw MFNs, which in turn seems problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation layer). But while you don't use the IOMMU yourself, I suppose the GPU accesses still don't bypass the IOMMU? In which case all you'd need returned is a frame number that guarantees that after IOMMU translation it refers to the correct MFN, i.e. still allowing for your Dom0 driver to simply set aside a part of its PFN space, asking Xen to (IOMMU-)map the necessary guest frames into there. No. What we require is the raw MFNs. One IOMMU device entry can't point to multiple VM's page tables, so that's why XenGT needs to use software shadow GPU page table to implement the sharing. Note it's not for dom0 to access the MFN. It's for dom0 to setup the correct shadow GPU page table, so a VM can access the graphics memory in a controlled way. So what's the translation flow here: driver - GPU - IOMMU - hardware or driver - IOMMU - GPU - hardware? Or do things get set up for the GPU to bypass the IOMMU altogether? two translation paths in assigned case: 1. [direct CPU access from VM], with partitioned PCI aperture resource, every VM can access a portion of PCI aperture directly. sorry the above description is for XenGT shared case, and the below translation is for VT-d assigned case. Just put there to indicate the necessity of same translation path in XenGT. - CPU page table/EPT: CPU virtual address-PCI aperture - PCI aperture - bar base = Graphics Memory Address (GMA) - GPU page table: GMA - GPA (as programmed by guest) - IOMMU: GPA - MPA 2. [GPU access through GPU command operands], with GPU scheduling, every VM's command buffer will be fetched by GPU in a time-shared manner. - GPU page table: GMA-GPA - IOMMU: GPA-MPA In our case, IOMMU is setup with 1:1 identity table for dom0. So when GPU may access GPAs from different VMs, we can't count on IOMMU which can only serve one mapping for one device (unless we have SR-IOV). That's why we need shadow GPU page table in dom0, and need a p2m query call to translate from GPA - MPA: - shadow GPU page table: GMA-MPA - IOMMU: MPA-MPA (for dom0) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 10.12.14 at 09:47, kevin.t...@intel.com wrote: two translation paths in assigned case: 1. [direct CPU access from VM], with partitioned PCI aperture resource, every VM can access a portion of PCI aperture directly. - CPU page table/EPT: CPU virtual address-PCI aperture - PCI aperture - bar base = Graphics Memory Address (GMA) - GPU page table: GMA - GPA (as programmed by guest) - IOMMU: GPA - MPA 2. [GPU access through GPU command operands], with GPU scheduling, every VM's command buffer will be fetched by GPU in a time-shared manner. - GPU page table: GMA-GPA - IOMMU: GPA-MPA In our case, IOMMU is setup with 1:1 identity table for dom0. So when GPU may access GPAs from different VMs, we can't count on IOMMU which can only serve one mapping for one device (unless we have SR-IOV). That's why we need shadow GPU page table in dom0, and need a p2m query call to translate from GPA - MPA: - shadow GPU page table: GMA-MPA - IOMMU: MPA-MPA (for dom0) I still can't see why the Dom0 translation has to remain 1:1, i.e. why Xen couldn't return some arbitrary GPA for the query in question here, setting up a suitable GPA-MPA translation. (I put arbitrary in quotes because this of course must not conflict with GPAs already or possibly in use by Dom0.) And I can only stress again that you shouldn't leave out PVH (where the IOMMU already isn't set up with all 1:1 mappings) from these considerations. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, December 10, 2014 5:17 PM On 10.12.14 at 09:47, kevin.t...@intel.com wrote: two translation paths in assigned case: 1. [direct CPU access from VM], with partitioned PCI aperture resource, every VM can access a portion of PCI aperture directly. - CPU page table/EPT: CPU virtual address-PCI aperture - PCI aperture - bar base = Graphics Memory Address (GMA) - GPU page table: GMA - GPA (as programmed by guest) - IOMMU: GPA - MPA 2. [GPU access through GPU command operands], with GPU scheduling, every VM's command buffer will be fetched by GPU in a time-shared manner. - GPU page table: GMA-GPA - IOMMU: GPA-MPA In our case, IOMMU is setup with 1:1 identity table for dom0. So when GPU may access GPAs from different VMs, we can't count on IOMMU which can only serve one mapping for one device (unless we have SR-IOV). That's why we need shadow GPU page table in dom0, and need a p2m query call to translate from GPA - MPA: - shadow GPU page table: GMA-MPA - IOMMU: MPA-MPA (for dom0) I still can't see why the Dom0 translation has to remain 1:1, i.e. why Xen couldn't return some arbitrary GPA for the query in question here, setting up a suitable GPA-MPA translation. (I put arbitrary in quotes because this of course must not conflict with GPAs already or possibly in use by Dom0.) And I can only stress again that you shouldn't leave out PVH (where the IOMMU already isn't set up with all 1:1 mappings) from these considerations. It's interesting that you think IOMMU can be used in such situation. what do you mean by arbitrary GPA here? and It's not just about conflicting with Dom0's GPA, it's about confliction in all VM's GPAs when you hosting them through one IOMMU page table, and there's no way to prevent this definitely since GPAs are picked by VMs themselves. I don't think we can support PVH here if IOMMU is not 1:1 mapping. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 10.12.14 at 02:14, kevin.t...@intel.com wrote: From: Tim Deegan [mailto:t...@xen.org] It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. In our case it's not because the security model is problematic. It's because GPU virtualization is done in Dom0 while the memory virtualization is done in hypervisor. Which by itself is a questionable design decision. We need a means to query GPFN-MFN so we can setup shadow GPU page table in Dom0 correctly, for a VM. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). yes, IOMMU protect DMA accesses in a device-agnostic way. But in our case, IOMMU can't be used because it's only for exclusively assigned case, as I replied in another mail. And to reduce the hypervisor TCB, we put device model in Dom0 which is why a interface is required to connect p2m information. So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. A please-map-this-gfn interface assumes the logic behind lies in Xen hypervisor, e.g. managing CPU page table or IOMMU entry. However here the management of GPU page table is in Dom0, and what we want is a please-tell-me-mfn-for-a-gpfn interface, so we can translate from gpfn in guest GPU PTE to a mfn in shadow GPU PTE. As said before, what needs to be put in the GPU PTE depends on what the subsequent IOMMU translation would do to the address. It's not a hard requirement for the IOMMU to pass through all addresses for Dom0, so we have room to isolate things if possible. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
At 01:14 + on 10 Dec (1418170461), Tian, Kevin wrote: From: Tim Deegan [mailto:t...@xen.org] Sent: Tuesday, December 09, 2014 6:47 PM At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. In our case it's not because the security model is problematic. It's because GPU virtualization is done in Dom0 while the memory virtualization is done in hypervisor. We need a means to query GPFN-MFN so we can setup shadow GPU page table in Dom0 correctly, for a VM. I don't think we understand each other. Let me try to explain what I mean. My apologies if this sounds patronising; I'm just trying to be as clear as I can. It is Xen's job to isolate VMs from each other. As part of that, Xen uses the MMU, nested paging, and IOMMUs to control access to RAM. Any software component that can pass a raw MFN to hardware breaks that isolation, because Xen has no way of controlling what that component can do (including taking over the hypervisor). This is why I am afraid when developers ask for GFN-MFN translation functions. So if the XenGT model allowed the backend component to (cause the GPU to) perform arbitrary DMA without IOMMU checks, then that component would have complete access to the system and (from a security pov) might as well be running in the hypervisor. That would be very problematic, but AFAICT that's not what's going on. From your reply on the other thread it seems like the GPU is behind the IOMMU, so that's OK. :) When the backend component gets a GFN from the guest, it wants an address that it can give to the GPU for DMA that will map the right memory. That address must be mapped in the IOMMU tables that the GPU will be using, which means the IOMMU tables of the backend domain, IIUC[1]. So the hypercall it needs is not give me the MFN that matches this GFN but please map this GFN into my IOMMU tables. Asking for the MFN will only work if the backend domain's IOMMU tables have an existing 1:1 r/w mapping of all guest RAM, which happens to be the case if the backend component is in dom0 _and_ dom0 is PV _and_ we're not using strict IOMMU tables. Restricting XenGT to work in only those circumstances would be short-sighted, not only because it would mean XenGT could never work as a driver domain, but also because it seems like PVH dom0 is going to be the default at some point. If the existing hypercalls that make IOMMU mappings are not right for XenGT then we can absolutely consider adding some more. But we need to talk about what policy Xen will enforce on the mapping requests. If the shared backend is allowed to map any page of any VM, then it can easily take control of any VM on the host (even though the IOMMU will prevent it from taking over the hypervisor itself). 
The absolute minimum we should allow here is some toolstack-controlled list of which VMs the XenGT backend is serving, so that it can refuse to map other VMs' memory (like an extension of IS_PRIV_FOR, which does this job for Qemu). I would also strongly advise using privilege separation in the backend between the GPUPT shadow code (which needs mapping rights and is trusted to maintain isolation between the VMs that are sharing the GPU) and the rest of the XenGT backend (which doesn't/isn't). But that's outside my remit as a hypervisor maintainer, so it goes no further than an "I told you so". :) Cheers, Tim. [1] That is, AIUI this GPU doesn't context-switch which set of IOMMU tables it's using for DMA, SR-IOV-style, and that's why you need a software component in the first place. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
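A rough C sketch of the minimum policy described above: a toolstack-populated list of target domains per XenGT backend, consulted before any mapping of another VM's memory is created. The names and structure are invented for illustration (no such interface exists in Xen); the real thing would live alongside the existing device-model privilege machinery.

#include <stdbool.h>
#include <stdint.h>

#define MAX_SERVED_DOMAINS 16   /* arbitrary bound for the sketch */

/* Hypothetical per-backend state, filled in by the toolstack: the set of
 * target domains this XenGT backend is allowed to serve. */
struct xengt_backend {
    uint16_t served[MAX_SERVED_DOMAINS];
    unsigned int nr_served;
};

/* Would be checked by the map/translate hypercall before creating any
 * mapping of 'target's memory on behalf of this backend -- analogous to
 * the IS_PRIV_FOR check that restricts a Qemu device model. */
bool xengt_backend_may_map(const struct xengt_backend *be, uint16_t target)
{
    unsigned int i;

    for (i = 0; i < be->nr_served; i++)
        if (be->served[i] == target)
            return true;
    return false;   /* refuse: not one of the VMs this backend serves */
}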
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 10/12/14 09:51, Tian, Kevin wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, December 10, 2014 5:17 PM On 10.12.14 at 09:47, kevin.t...@intel.com wrote: two translation paths in assigned case: 1. [direct CPU access from VM], with partitioned PCI aperture resource, every VM can access a portion of PCI aperture directly. - CPU page table/EPT: CPU virtual address-PCI aperture - PCI aperture - bar base = Graphics Memory Address (GMA) - GPU page table: GMA - GPA (as programmed by guest) - IOMMU: GPA - MPA 2. [GPU access through GPU command operands], with GPU scheduling, every VM's command buffer will be fetched by GPU in a time-shared manner. - GPU page table: GMA-GPA - IOMMU: GPA-MPA In our case, IOMMU is setup with 1:1 identity table for dom0. So when GPU may access GPAs from different VMs, we can't count on IOMMU which can only serve one mapping for one device (unless we have SR-IOV). That's why we need shadow GPU page table in dom0, and need a p2m query call to translate from GPA - MPA: - shadow GPU page table: GMA-MPA - IOMMU: MPA-MPA (for dom0) I still can't see why the Dom0 translation has to remain 1:1, i.e. why Xen couldn't return some arbitrary GPA for the query in question here, setting up a suitable GPA-MPA translation. (I put arbitrary in quotes because this of course must not conflict with GPAs already or possibly in use by Dom0.) And I can only stress again that you shouldn't leave out PVH (where the IOMMU already isn't set up with all 1:1 mappings) from these considerations. It's interesting that you think IOMMU can be used in such situation. what do you mean by arbitrary GPA here? and It's not just about conflicting with Dom0's GPA, it's about confliction in all VM's GPAs when you hosting them through one IOMMU page table, and there's no way to prevent this definitely since GPAs are picked by VMs themselves. I don't think we can support PVH here if IOMMU is not 1:1 mapping. I agree with Jan, there doesn't need to be a fixed 1:1 mapping between IOMMU and MFN's addresses. I think all that's required is that there is an IOMMU mapping for the GPU device connected to dom0 (or driver domain) which allows guest memory to be accessed by the GPU. This IOMMU address is what is programmed into shadow GPU page table, I refer to this address as Bus frame number(BFN) in the PV IOMMU design document. - shadow GPU page table: GMA-BFN - IOMMU: BFN-MPA IOMMU's can almost always address more than the host physical RAM so we can create IOMMU mappings above the top of host physical RAM in order to have IOMMU mappings of guest RAM. The PV-IOMMU design allows the guest to have control of the IOMMU address space. In theory it could be extended to have permission checks for mapping guest MFN's and have a mapping interface which takes a domid and a GMFN. That way the driver domain does not need to know the actual MFN's being used. The guest itself (CPU) accesses the GPU via outbound MMIO mappings so we don't need to be concerned with address translation in that direction. I think getting Xen to allocate IOMMU mappings for a driver domain will be problematic for PV based driver domains because the M2P for PV domains is not kept strictly upto date with what the guest is using for P2M and so it will be difficult/impossible to determine which addresses are not in use. Similarly it may be difficult to HVM guests because P2M mapping are outbound (CPU to rest of host) and determining what addresses are suitable for inbound access (rest of host to memory) may be difficult. 
I.e. should MMIO outbound address space be used for inbound IOMMU mappings? I hope I've not caused more confusion. Malcolm Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
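For illustration only, a hypothetical C rendering of the kind of interface Malcolm sketches: the driver domain asks for a (domid, gfn) pair to be mapped into its IOMMU address space and gets back a BFN, possibly above the top of host RAM, to program into the shadow GPU page table. The struct layout, sub-op number and names below are invented and are not taken from the PV-IOMMU design document.

#include <stdint.h>

/* Hypothetical PV-IOMMU-style map operation: the backend names a
 * (domid, gfn) pair and receives a bus frame number (BFN).  The IOMMU
 * then translates BFN -> MFN, so the backend never handles a raw MFN. */
struct pv_iommu_map_foreign {
    /* IN */
    uint16_t domid;      /* domain owning the frame */
    uint64_t gfn;        /* guest frame number in that domain */
    uint32_t flags;      /* e.g. read-only vs read/write DMA */
    /* OUT */
    uint64_t bfn;        /* bus frame; may lie above the top of host RAM */
};

#define PV_IOMMU_MAP_FOREIGN 1            /* invented sub-op */

int pv_iommu_op(unsigned int op, void *arg);  /* placeholder hypercall stub */

/* Map one guest frame for GPU access and return the BFN for the shadow PTE. */
uint64_t map_guest_frame_for_gpu(uint16_t domid, uint64_t gfn)
{
    struct pv_iommu_map_foreign m = { .domid = domid, .gfn = gfn, .flags = 0 };

    if (pv_iommu_op(PV_IOMMU_MAP_FOREIGN, &m) != 0)
        return 0;
    return m.bfn;
}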
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, December 10, 2014 6:36 PM On 10.12.14 at 02:14, kevin.t...@intel.com wrote: From: Tim Deegan [mailto:t...@xen.org] It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. In our case it's not because the security model is problematic. It's because GPU virtualization is done in Dom0 while the memory virtualization is done in hypervisor. Which by itself is a questionable design decision. I don't think we want to put a ~20K LOC device model in hypervisor. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Ian Campbell [mailto:ian.campb...@citrix.com] Sent: Wednesday, December 10, 2014 6:11 PM On Wed, 2014-12-10 at 01:48 +, Tian, Kevin wrote: I'm not familiar with Arm architecture, but based on a brief reading it's for the assigned case where the MMU is exclusive owned by a VM, so some type of MMU virtualization is required and it's straightforward. However XenGT is a shared GPU usage: - a global GPU page table is partitioned among VMs. a shared shadow global page table is maintained, containing translations for multiple VMs simultaneously based on partitioning information - multiple per-process GPU page tables are created by each VM, and multiple shadow per-process GPU page tables are created correspondingly. shadow page table is switched when doing GPU context switch, same as what we did for CPU shadow page table. None of that sounds to me to be impossible to do in the remoteproc model, perhaps it needs some extensions from its initial core feature set but I see no reason why it couldn't maintain multiple sets of page tables, each tagged with an owning domain (for validation purposes) and a mechanism to switch between them, or to be able to manage partitioning of the GPU address space. here we're talking about multiple GPU page tables on top of a IOMMU page table. Instead of one MMU unit concerned here in remoteproc. So you can see above shared MMU virtualization usage is very GPU specific, AIUI remoteproc is specific to a particular h/w device too, i.e. there is a device specific stub in the hypervisor which essentially knows how to implement set_pte for that bit of h/w, with appropriate safety and validation, as well as a write_cr3 type operation. that's why we didn't put in Xen hypervisor, and thus additional interface is required to get p2m mapping to assist our shadow GPU page table usage. There is a great reluctance among several maintainers to expose real hardware MFNs to VMs (including dom0 and backend driver domains). I think you need to think very carefully about possible ways of avoiding the need for this. Yes, this might require some changes to your current mode/design. We're open to changes if necessary. Thanks, Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
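As an illustration of the shared-GPU shadowing model described above (not XenGT code): each VM gets shadow per-process GPU page tables holding GMA -> MPA translations, and the mediator in Dom0 points the GPU at the right shadow root whenever it schedules that VM's command buffer, much like switching CR3 under CPU shadow paging. write_gpu_pt_root() stands in for the real MMIO programming of the GPU's page-table base register.

#include <stdint.h>

struct vgpu {
    uint16_t domid;
    uint64_t shadow_ppgtt_root;   /* machine address of the shadow PT root */
};

/* Placeholder for the real MMIO write of the GPU's page-table base. */
void write_gpu_pt_root(uint64_t root_mpa);

void gpu_context_switch(const struct vgpu *next)
{
    /* Hardware only ever sees shadow tables; the guest's own tables
     * (GMA -> GPA) are never handed to the GPU. */
    write_gpu_pt_root(next->shadow_ppgtt_root);
}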
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
-Original Message- From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com] Sent: 09 December 2014 10:11 To: Paul Durrant; Keir (Xen.org); Tim (Xen.org); jbeul...@suse.com; Kevin Tian; Xen-devel@lists.xen.org Subject: One question about the hypercall to translate gfn to mfn. Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. So solution 1 is to revert this commit. However, since this hypercall was removed ages ago, the reverting met many conflicts, i.e. the gmfn_to_mfn is no longer used in x86, etc. 2 In our project, we defined a new hypercall XENMEM_get_mfn_from_pfn, which has a similar implementation like the previous XENMEM_translate_gpfn_list. One of the major differences is that this newly defined one is only for x86(called in arch_memory_op), so we do not have to worry about the arm side. Does anyone has any suggestions about this? IIUC what is needed is a means to IOMMU map a gfn in the service domain (dom0 for the moment) such that it can be accessed by the GPU. I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Paul Thanks in advance. :) B.R. Yu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
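For reference, a guess at the shape of the argument structure for solution 2 (the x86-only XENMEM_get_mfn_from_pfn sub-op mentioned above). It mirrors the general style of other batched XENMEM_* operations rather than quoting the actual XenGT patch, so treat the field names and layout as illustrative only.

#include <stdint.h>

struct xen_get_mfn_from_pfn {
    /* IN */
    uint16_t domid;      /* domain whose p2m is queried */
    uint32_t count;      /* number of frames in the batch */
    /* IN/OUT (a guest handle in the real interface): on input the gpfns
     * to translate, on output the corresponding mfns, with an
     * invalid-mfn marker for unpopulated entries. */
    uint64_t *pfn_list;
};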
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 09.12.14 at 11:10, yu.c.zh...@linux.intel.com wrote: As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. So solution 1 is to revert this commit. However, since this hypercall was removed ages ago, the reverting met many conflicts, i.e. the gmfn_to_mfn is no longer used in x86, etc. 2 In our project, we defined a new hypercall XENMEM_get_mfn_from_pfn, which has a similar implementation like the previous XENMEM_translate_gpfn_list. One of the major differences is that this newly defined one is only for x86(called in arch_memory_op), so we do not have to worry about the arm side. Does anyone has any suggestions about this? Out of the two 1 seems preferable. But without background (see also Paul's reply) it's hard to tell whether that's what you want/need. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. B.R. Yu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
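For comparison, the existing grant-map path Tim points to already hands back a DMA-able address without exposing the MFN. Roughly, simplified from Xen's public grant_table interface (see xen/include/public/grant_table.h for the authoritative definitions; the flag values and the hypercall plumbing below are from memory and stubbed out):

#include <stdint.h>

typedef uint16_t domid_t;
typedef uint32_t grant_ref_t;
typedef uint32_t grant_handle_t;

/* Flag values are illustrative; see the public header for the real ones. */
#define GNTMAP_device_map  (1u << 0)   /* map for device (DMA) access */
#define GNTMAP_host_map    (1u << 1)   /* map into the caller's CPU space */

struct gnttab_map_grant_ref {
    /* IN */
    uint64_t host_addr;       /* where to map for CPU access, if requested */
    uint32_t flags;
    grant_ref_t ref;          /* grant entry created by the granting guest */
    domid_t dom;              /* granting domain */
    /* OUT */
    int16_t status;
    grant_handle_t handle;    /* needed later for the unmap call */
    uint64_t dev_bus_addr;    /* DMA-able address -- the analogue of the
                               * BFN XenGT would put in a shadow GPU PTE */
};

int issue_grant_map(struct gnttab_map_grant_ref *op);  /* placeholder for
    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, 1) */

/* Backend-side use: map grant 'ref' from domain 'dom' for device DMA only. */
uint64_t map_for_dma(domid_t dom, grant_ref_t ref)
{
    struct gnttab_map_grant_ref op = {
        .flags = GNTMAP_device_map,    /* no CPU mapping requested */
        .ref   = ref,
        .dom   = dom,
    };

    if (issue_grant_map(&op) != 0 || op.status != 0)
        return 0;
    return op.dev_bus_addr;
}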
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
-Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. Paul Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. I'm not sure I'm fully understanding what's going on here, but is a variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also returns a DMA handle a plausible solution? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
-Original Message- From: Ian Campbell Sent: 09 December 2014 11:11 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. I'm not sure I'm fully understanding what's going on here, but is a variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also returns a DMA handle a plausible solution? I think we want be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. Paul Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 09.12.14 at 12:17, paul.durr...@citrix.com wrote: I think we want to be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. With shared page tables, there's no way to do one without the other. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On 09/12/14 11:23, Jan Beulich wrote: On 09.12.14 at 12:17, paul.durr...@citrix.com wrote: I think we want to be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. With shared page tables, there's no way to do one without the other. Interestingly, the IOMMU in front of the Intel GPU is only capable of handling 4k pages, so we wouldn't end up with shared page tables being used. For other PCI devices, shared page tables will be a problem. Malcolm Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote: -Original Message- From: Ian Campbell Sent: 09 December 2014 11:11 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. I'm not sure I'm fully understanding what's going on here, but is a variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also returns a DMA handle a plausible solution? I think we want be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. Another (wildly under-informed) thought then: A while back Global logic proposed (for ARM) an infrastructure for allowing dom0 drivers to maintain a set of iommu like pagetables under hypervisor supervision (they called these remoteprocessor iommu). I didn't fully grok what it was at the time, let alone remember the details properly now, but AIUI it was essentially a framework for allowing a simple Xen side driver to provide PV-MMU-like update operations for a set of PTs which were not the main-processor's PTs, with validation etc. See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945 The introductory email even mentions GPUs... Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
-Original Message- From: Ian Campbell Sent: 09 December 2014 11:29 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote: -Original Message- From: Ian Campbell Sent: 09 December 2014 11:11 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. I'm not sure I'm fully understanding what's going on here, but is a variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also returns a DMA handle a plausible solution? I think we want be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. Another (wildly under-informed) thought then: A while back Global logic proposed (for ARM) an infrastructure for allowing dom0 drivers to maintain a set of iommu like pagetables under hypervisor supervision (they called these remoteprocessor iommu). 
I didn't fully grok what it was at the time, let alone remember the details properly now, but AIUI it was essentially a framework for allowing a simple Xen side driver to provide PV-MMU-like update operations for a set of PTs which were not the main-processor's PTs, with validation etc. See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945 The introductory email even mentions GPUs... That series does indeed seem to be very relevant. Paul Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, December 09, 2014 6:50 PM On 09.12.14 at 11:37, yu.c.zh...@linux.intel.com wrote: On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. Hmm, that suggests you indeed need raw MFNs, which in turn seems problematic wrt PVH Dom0 (or you'd need a GFN-GMFN translation layer). But while you don't use the IOMMU yourself, I suppose the GPU accesses still don't bypass the IOMMU? In which case all you'd need returned is a frame number that guarantees that after IOMMU translation it refers to the correct MFN, i.e. still allowing for your Dom0 driver to simply set aside a part of its PFN space, asking Xen to (IOMMU-)map the necessary guest frames into there. No. What we require is the raw MFNs. One IOMMU device entry can't point to multiple VM's page tables, so that's why XenGT needs to use software shadow GPU page table to implement the sharing. Note it's not for dom0 to access the MFN. It's for dom0 to setup the correct shadow GPU page table, so a VM can access the graphics memory in a controlled way. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Tim Deegan [mailto:t...@xen.org] Sent: Tuesday, December 09, 2014 6:47 PM At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. In our case it's not because the security model is problematic. It's because GPU virtualization is done in Dom0 while the memory virtualization is done in hypervisor. We need a means to query GPFN-MFN so we can setup shadow GPU page table in Dom0 correctly, for a VM. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). yes, IOMMU protect DMA accesses in a device-agnostic way. But in our case, IOMMU can't be used because it's only for exclusively assigned case, as I replied in another mail. And to reduce the hypervisor TCB, we put device model in Dom0 which is why a interface is required to connect p2m information. So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. A please-map-this-gfn interface assumes the logic behind lies in Xen hypervisor, e.g. managing CPU page table or IOMMU entry. However here the management of GPU page table is in Dom0, and what we want is a please-tell-me-mfn-for-a-gpfn interface, so we can translate from gpfn in guest GPU PTE to a mfn in shadow GPU PTE. Hope this makes the requirement clearer. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Malcolm Crossley Sent: Tuesday, December 09, 2014 6:52 PM On 09/12/14 10:37, Yu, Zhang wrote: On 12/9/2014 6:19 PM, Paul Durrant wrote: I think use of an raw mfn value currently works only because dom0 is using a 1:1 IOMMU mapping scheme. Is my understanding correct, or do you really need raw mfn values? Thanks for your quick response, Paul. Well, not exactly for this case. :) In XenGT, our need to translate gfn to mfn is for GPU's page table, which contains the translation between graphic address and the memory address. This page table is maintained by GPU drivers, and our service domain need to have a method to translate the guest physical addresses written by the vGPU into host physical ones. We do not use IOMMU in XenGT and therefore this translation may not necessarily be a 1:1 mapping. XenGT must use the IOMMU mappings that Xen has setup for the domain which owns the GPU. Currently Dom0 own's the GPU and so it's IOMMU mappings match the MFN's addresses. I suspect XenGT will not work if Xen is booted with iommu=dom0-strict. This is a good point. So yes in this case IOMMU is still active which contains a 1:1 IOMMU mapping table, but it's a separate thing from the interface discussed here, which is about setup a shadow GPU page table for other VM's graphics memory accesses. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.
From: Paul Durrant [mailto:paul.durr...@citrix.com] Sent: Tuesday, December 09, 2014 7:44 PM -Original Message- From: Ian Campbell Sent: 09 December 2014 11:29 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:17 +, Paul Durrant wrote: -Original Message- From: Ian Campbell Sent: 09 December 2014 11:11 To: Paul Durrant Cc: Tim (Xen.org); Yu, Zhang; Kevin Tian; Keir (Xen.org); jbeul...@suse.com; Xen-devel@lists.xen.org Subject: Re: [Xen-devel] One question about the hypercall to translate gfn to mfn. On Tue, 2014-12-09 at 11:05 +, Paul Durrant wrote: -Original Message- From: Tim Deegan [mailto:t...@xen.org] Sent: 09 December 2014 10:47 To: Yu, Zhang Cc: Paul Durrant; Keir (Xen.org); jbeul...@suse.com; Kevin Tian; Xen- de...@lists.xen.org Subject: Re: One question about the hypercall to translate gfn to mfn. At 18:10 +0800 on 09 Dec (1418145055), Yu, Zhang wrote: Hi all, As you can see, we are pushing our XenGT patches to the upstream. One feature we need in xen is to translate guests' gfn to mfn in XenGT dom0 device model. Here we may have 2 similar solutions: 1 Paul told me(and thank you, Paul :)) that there used to be a hypercall, XENMEM_translate_gpfn_list, which was removed by Keir in commit 2d2f7977a052e655db6748be5dabf5a58f5c5e32, because there was no usage at that time. It's been suggested before that we should revive this hypercall, and I don't think it's a good idea. Whenever a domain needs to know the actual MFN of another domain's memory it's usually because the security model is problematic. In particular, finding the MFN is usually followed by a brute-force mapping from a dom0 process, or by passing the MFN to a device for unprotected DMA. These days DMA access should be protected by IOMMUs, or else the device drivers (and associated tools) are effectively inside the hypervisor's TCB. Luckily on x86 IOMMUs are widely available (and presumably present on anything new enough to run XenGT?). So I think the interface we need here is a please-map-this-gfn one, like the existing grant-table ops (which already do what you need by returning an address suitable for DMA). If adding a grant entry for every frame of the framebuffer within the guest is too much, maybe we can make a new interface for the guest to grant access to larger areas. IIUC the in-guest driver is Xen-unaware so any grant entry would have to be put in the guests table by the tools, which would entail some form of flexibly sized reserved range of grant entries otherwise any PV driver that are present in the guest would merrily clobber the new grant entries. A domain can already priv map a gfn into the MMU, so I think we just need an equivalent for the IOMMU. I'm not sure I'm fully understanding what's going on here, but is a variant of XENMEM_add_to_physmap+XENMAPSPACE_gmfn_foreign which also returns a DMA handle a plausible solution? I think we want be able to avoid setting up a PTE in the MMU since it's not needed in most (or perhaps all?) cases. Another (wildly under-informed) thought then: A while back Global logic proposed (for ARM) an infrastructure for allowing dom0 drivers to maintain a set of iommu like pagetables under hypervisor supervision (they called these remoteprocessor iommu). 
I didn't fully grok what it was at the time, let alone remember the details properly now, but AIUI it was essentially a framework for allowing a simple Xen side driver to provide PV-MMU-like update operations for a set of PTs which were not the main-processor's PTs, with validation etc. See http://thread.gmane.org/gmane.comp.emulators.xen.devel/212945 The introductory email even mentions GPUs... That series does indeed seem to be very relevant. Paul I'm not familiar with Arm architecture, but based on a brief reading it's for the assigned case where the MMU is exclusive owned by a VM, so some type of MMU virtualization is required and it's straightforward. However XenGT is a shared GPU usage: - a global GPU page table is partitioned among VMs. a shared shadow global page table is maintained, containing translations for multiple VMs simultaneously based on partitioning information - multiple per-process GPU page tables are created by each VM, and multiple shadow per-process GPU page tables are created correspondingly. shadow page table is switched when