On Wed, 20 Apr 2022, Wei Chen wrote:
> > On Tue, 19 Apr 2022, Wei Chen wrote:
> > > > > ### 3.2. Xen Event Channel Support
> > > > >     In the current RFC patches we haven't enabled event channel
> > > > >     support, but I think it's a good opportunity to have some
> > > > >     discussion in advance.
> > > > >     On Armv8-R, all VMs are natively direct-mapped, because there
> > > > >     is no stage 2 MMU translation. The current event channel
> > > > >     implementation depends on some pages shared between Xen and the
> > > > >     guest: `shared_info` and the per-cpu `vcpu_info`.
> > > > >
> > > > >     For `shared_info`, in the current implementation, Xen allocates
> > > > >     a page from its heap to store the initial `shared_info` meta
> > > > >     data. When the guest wants to set up `shared_info`, it picks a
> > > > >     free gfn and uses a hypercall to set up the P2M mapping between
> > > > >     that gfn and the `shared_info` page.
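> > > > >
> > > > >     (For illustration, a minimal sketch of the guest side of this
> > > > >     flow as it works today on MMU systems; the page allocation and
> > > > >     virt_to_gfn() helpers are guest-OS specific placeholders:)
> > > > >
> > > > >     /* Guest-side sketch: ask Xen to place its shared_info page at
> > > > >      * a guest-chosen free gfn via XENMEM_add_to_physmap. */
> > > > >     struct xen_add_to_physmap xatp = {
> > > > >         .domid = DOMID_SELF,
> > > > >         .space = XENMAPSPACE_shared_info,
> > > > >         .idx   = 0,
> > > > >         .gpfn  = virt_to_gfn(shared_info_page), /* free gfn chosen by the guest */
> > > > >     };
> > > > >
> > > > >     if ( HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp) )
> > > > >         BUG();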
> > > > >
> > > > >     For a direct-mapped VM, this breaks the direct-mapping concept.
> > > > >     And on an MPU based system, like an Armv8-R system, this
> > > > >     operation is very unfriendly: Xen needs to pop the `shared_info`
> > > > >     page out of the Xen heap and insert it into the VM's P2M pages.
> > > > >     If this page is in the middle of the Xen heap, Xen needs to
> > > > >     split the current heap region and use extra MPU regions. Also,
> > > > >     on the P2M side, this page is unlikely to form a contiguous
> > > > >     memory region with the existing P2M pages, so Xen is likely to
> > > > >     need yet another MPU region to map it, which is an obvious waste
> > > > >     of the limited MPU regions. This kind of dynamic behaviour is
> > > > >     quite hard to accommodate on an MPU system.
> > > >
> > > > Yeah, it doesn't make any sense for MPU systems
> > > >
> > > >
> > > > >     For `vcpu_info`, in the current implementation, Xen stores the
> > > > >     `vcpu_info` meta data for all vCPUs inside `shared_info`. When
> > > > >     the guest wants to set up `vcpu_info`, it allocates memory for
> > > > >     `vcpu_info` on the guest side, and then uses a hypercall to copy
> > > > >     the meta data from `shared_info` into that guest page. After
> > > > >     that, both Xen's `vcpu_info` pointer and the guest's `vcpu_info`
> > > > >     point to the same page, allocated by the guest.
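> > > > >
> > > > >     (Again a minimal sketch of the guest side of this registration,
> > > > >     roughly as it looks today; the allocation and address
> > > > >     translation helpers are guest-OS specific placeholders:)
> > > > >
> > > > >     /* Guest-side sketch: register a per-vCPU vcpu_info page. Xen
> > > > >      * copies the existing meta data out of shared_info and then
> > > > >      * switches over to the guest-provided page. */
> > > > >     struct vcpu_info *vi = alloc_vcpu_info_page();  /* placeholder */
> > > > >     struct vcpu_register_vcpu_info info = {
> > > > >         .mfn    = virt_to_gfn(vi),
> > > > >         .offset = offset_in_page(vi),
> > > > >     };
> > > > >
> > > > >     if ( HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, vcpu_id, &info) )
> > > > >         BUG();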
> > > > >
> > > > >     This implementation has several benefits:
> > > > >     1. No memory is wasted: no extra memory is allocated from the
> > > > >        Xen heap.
> > > > >     2. There is no P2M remapping, so it does not break the
> > > > >        direct-mapping and is MPU-system friendly.
> > > > >     So, on an Armv8-R system, we can keep the current implementation
> > > > >     for the per-cpu `vcpu_info`.
> > > > >
> > > > >     So, our proposal is: can we reuse the current implementation
> > > > >     idea of `vcpu_info` for `shared_info`? We still allocate one
> > > > >     page for `d->shared_info` at domain construction to hold the
> > > > >     initial meta data, but using alloc_domheap_pages instead of
> > > > >     alloc_xenheap_pages and share_xen_page_with_guest. When the
> > > > >     guest allocates a page for `shared_info` and uses a hypercall
> > > > >     to set it up, we copy the initial data from `d->shared_info` to
> > > > >     that page, and after the copy we update `d->shared_info` to
> > > > >     point to the guest-allocated `shared_info` page. In this case,
> > > > >     we don't have to think about fragmentation of the Xen heap and
> > > > >     P2M, or about the extra MPU regions.
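> > > > >
> > > > >     (A rough sketch of the proposed hypervisor-side flow; the
> > > > >     helper names loosely follow existing Xen code but are meant as
> > > > >     illustration only, not an actual implementation:)
> > > > >
> > > > >     /* Sketch of the proposed XENMAPSPACE_shared_info handling for
> > > > >      * a direct-mapped (MPU) domain: gfn == mfn, so the guest-chosen
> > > > >      * page is plain RAM that Xen can address directly. */
> > > > >     static int set_shared_info_directmap(struct domain *d, gfn_t gfn)
> > > > >     {
> > > > >         shared_info_t *new_si = maddr_to_virt(gfn_to_gaddr(gfn));
> > > > >         shared_info_t *old_si = d->shared_info;
> > > > >
> > > > >         /* Copy the meta data set up at domain construction ... */
> > > > >         memcpy(new_si, old_si, PAGE_SIZE);
> > > > >
> > > > >         /* ... switch Xen's view to the guest-allocated page ... */
> > > > >         d->shared_info = new_si;
> > > > >
> > > > >         /* ... and return the temporary domheap page. */
> > > > >         free_domheap_page(virt_to_page(old_si));
> > > > >
> > > > >         return 0;
> > > > >     }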
> > > >
> > > > Yes, I think that would work.
> > > >
> > > > Also I think it should be possible to get rid of the initial
> > > > d->shared_info allocation in Xen, given that d->shared_info is for the
> > > > benefit of the guest and the guest cannot access it until it makes the
> > > > XENMAPSPACE_shared_info hypercall.
> > > >
> > >
> > > While working on the event channel PoC for Xen on Armv8-R, we found
> > > another issue after we dropped the d->shared_info allocation in Xen.
> > > Both shared_info and vcpu_info are now allocated by the guest at
> > > runtime, which means the addresses of shared_info and vcpu_info are
> > > effectively random. For an MMU system this is OK, because Xen has a
> > > full view of system memory at runtime. But for an MPU system the
> > > situation becomes a little tricky: we have to set up extra MPU
> > > regions for a remote domain's shared_info and vcpu_info while
> > > handling an event channel hypercall. That's because, in the current
> > > Xen hypercall model, a hypercall does not cause a vCPU context
> > > switch: when the hypercall traps to EL2, the vCPU's P2M view is
> > > kept. On an MMU system we have vttbr_el2 for the vCPU's P2M view
> > > and ttbr_el2 for Xen's own view, so in EL2 Xen has full permission
> > > to access any memory it wants. But on an MPU system we only have
> > > one EL2 MPU. Before entering the guest, Xen programs the vCPU's P2M
> > > view into the EL2 MPU. So when we enter EL2 through a hypercall,
> > > the EL2 MPU still holds the current vCPU's P2M view plus the
> > > permissions for Xen's essential memory (code, data, heap), but it
> > > does not give EL2 access to other domains' memory. For an event
> > > channel hypercall, if we want to update the pending bitmap in a
> > > remote domain's vcpu_info, this causes a data abort in EL2. To
> > > solve this data abort, we have two possible methods:
> > > 1. Map the remote domain's whole memory, or just the pages holding
> > >    shared_info + vcpu_info, in the EL2 MPU temporarily, so the
> > >    hypercall can update the pending bits or do whatever other
> > >    accesses it needs.
> > >
> > >    This method doesn't need an EL2 MPU context switch, but it has
> > >    some disadvantages:
> > >    1. We have to reserve MPU regions for hypercalls.
> > >    2. Different hypercalls may need different MPU region
> > >       reservations.
> > >    3. We have to handle hypercalls one by one, both the existing
> > >       ones and new ones in the future.
> > >
> > > 2. Switch to Xen's memory view in the EL2 MPU when trapping from
> > >    EL1 to EL2. In this case Xen has full memory access permission
> > >    to update the pending bits in EL2. This only changes the EL2 MPU
> > >    context and does not require a vCPU context switch, because the
> > >    trapping vCPU is used for the whole hypercall flow. After the
> > >    hypercall, before returning to EL1, the EL2 MPU switches back to
> > >    the scheduled vCPU's P2M view (see the sketch after this list).
> > >    This method needs an EL2 MPU context switch, but:
> > >    1. We don't need to reserve MPU regions for Xen's memory view
> > >       (Xen's memory view is set up during initialization).
> > >    2. We don't need to handle page mappings at the level of
> > >       individual hypercalls.
> > >    3. It also applies to other EL1-to-EL2 traps, like data aborts,
> > >       IRQs, etc.
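> > >
> > > A very rough sketch of what method 2 could look like around a trap;
> > > the region tables (including d->arch.p2m_regions) and the
> > > write_el2_mpu_region() helper are hypothetical placeholders, not
> > > existing Xen code. On Armv8-R AArch64 the helper would end up
> > > programming PRSELR_EL2/PRBAR_EL2/PRLAR_EL2:
> > >
> > >    struct mpu_region {
> > >        paddr_t base, limit;
> > >        uint64_t attr;
> > >    };
> > >
> > >    /* Xen's full memory view, built once at initialization. */
> > >    extern struct mpu_region xen_full_view[];
> > >    extern unsigned int nr_xen_full_view;
> > >
> > >    /* On EL1 -> EL2 trap entry: load Xen's full view into the EL2 MPU. */
> > >    void enter_xen_mpu_view(void)
> > >    {
> > >        for ( unsigned int i = 0; i < nr_xen_full_view; i++ )
> > >            write_el2_mpu_region(i, xen_full_view[i].base,
> > >                                 xen_full_view[i].limit,
> > >                                 xen_full_view[i].attr);
> > >    }
> > >
> > >    /* Before returning to EL1: restore the scheduled vCPU's P2M view. */
> > >    void restore_vcpu_mpu_view(const struct vcpu *v)
> > >    {
> > >        const struct mpu_region *r = v->domain->arch.p2m_regions;
> > >
> > >        for ( unsigned int i = 0; i < v->domain->arch.nr_p2m_regions; i++ )
> > >            write_el2_mpu_region(i, r[i].base, r[i].limit, r[i].attr);
> > >    }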
> > 
> > 
> > Both approaches 1) and 2) are acceptable and in fact I think we'll
> > probably have to do a combination of both.
> >
> > We don't need to do a full MPU context switch every time we enter Xen.
> > We can be flexible: only when Xen needs to access another guest's
> > memory, and that memory is not mappable using approach 1), would Xen
> > do a full MPU context switch. Basically, try 1) first; if it is not
> > possible, do 2) (see the sketch below).
> > 
> > This also solves the problem of "other hypercalls". We can always do 2)
> > if we cannot do 1).
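> >
> > Something like the following is what I have in mind; the helpers are
> > purely illustrative, none of them exist today:
> >
> >     /* Try the cheap single-page mapping first (approach 1), and fall
> >      * back to a full EL2 MPU context switch (approach 2). */
> >     void *xen_access_remote_page(struct domain *rd, paddr_t gaddr)
> >     {
> >         void *p = try_map_remote_page(rd, gaddr);  /* approach 1 */
> >
> >         if ( p )
> >             return p;
> >
> >         /* approach 2: switch the EL2 MPU to Xen's full memory view */
> >         enter_xen_mpu_view();
> >         return maddr_to_virt(gaddr);
> >     }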
> > 
> > So do we need to do 1) at all? It really depends on performance data.
> > Not all hypercalls are made equal. Some are very rare and it is fine if
> > they are slow. Some hypercalls are actually on the hot path. The event
> > channel hypercalls are on the hot path so they need to be fast. It
> > makes sense to implement 1) just for the event channel hypercalls if
> > the MPU context switch is slow.
> > 
> > Data would help a lot here to make a good decision. Specifically, how
> > much more expensive is an EL2 MPU context switch compared to adding or
> > removing a single MPU region, in nanoseconds or CPU cycles?
> > 
> 
> We will collect that data when we get a proper platform.
> 
> > 
> > The other aspect is how many extra MPU regions we need for each guest
> > to implement 1). Do we need one extra MPU region for each domU? If so,
> > I don't think approach 1) is feasible unless we come up with a smart
> > memory allocation scheme for shared_info and vcpu_info. For instance,
> > if the shared_info and vcpu_info of all guests were part of the Xen
> > data or heap region, or of one other special MPU region, then they
> > would become immediately accessible without the need for extra
> > mappings when switching to EL2.
> > 
> 
> Allocating shared_info and vcpu_info from the Xen data or heap region
> would cause memory fragmentation. We would have to split the Xen data
> or heap, populate the pages for shared_info and vcpu_info, and insert
> them into the guest P2M. Because the Armv8-R MPU doesn't allow memory
> regions to overlap, this costs at least 2 extra MPU regions: one page
> cannot live in a Xen MPU region and a guest P2M MPU region at the same
> time. And we definitely don't want to make the entire Xen data and heap
> accessible to EL1. This approach also doesn't solve the 100%
> direct-mapping problem. A special MPU region would have the same
> issues, unless we made that special region accessible to both EL1 and
> EL2 at runtime (which is unsafe) and updated the hypercalls to use
> pages from this special region for shared_info and vcpu_info (every
> guest can see this region, so it is still a 1:1 mapping).
> 
> For 1), the concern came from our current rough PoC, where we used
> extra MPU regions to map the whole memory of the remote domain, which
> may consist of several memory blocks in the worst case. Thinking about
> it further, we can reduce the mapping granularity to a single page: if
> Xen wants to update shared_info or vcpu_info, Xen must already know its
> address, so we can just map that one page temporarily. So I think
> reserving only 1 MPU region for runtime mappings is feasible on most
> platforms.
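>
> A rough sketch of that idea, with hypothetical names (the reserved
> region index and the MPU programming helpers would be whatever the MPU
> driver ends up providing):
>
>     /* One EL2 MPU region index is reserved at boot for short-lived
>      * mappings of a single remote page. */
>     #define MPU_TEMP_REGION_IDX  15   /* hypothetical reserved slot */
>
>     void *mpu_map_temp_page(paddr_t gaddr)
>     {
>         paddr_t base = gaddr & PAGE_MASK;
>
>         /* Cover exactly this one page, EL2 read/write, normal memory. */
>         write_el2_mpu_region(MPU_TEMP_REGION_IDX, base,
>                              base + PAGE_SIZE - 1, REGION_EL2_RW);
>
>         /* No MMU: Xen's VA == PA, so the page is now directly usable. */
>         return (void *)base;
>     }
>
>     void mpu_unmap_temp_page(void)
>     {
>         disable_el2_mpu_region(MPU_TEMP_REGION_IDX);
>     }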

Actually I think that it would be great if we can do that. It looks like
the best way forward.


> But an additional problem with this is that, if a hypercall modifies
> multiple variables, Xen may need to do multiple mappings when they are
> not on the same page (or within a single MPU region range).

There are not that many hypercalls that require Xen to map multiple
pages, and those might be OK if they are slow.
