Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Thu, Jan 10, 2019 at 1:41 PM Eric Chanudet wrote:
>
> On 06/01/19 at 11:42pm, Christopher Clark wrote:
> >+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> >+                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> >+                     uint32_t len)
> >+{
> >+    unsigned int mfns_index = offset >> PAGE_SHIFT;
> >+    void *dst;
> >+    int ret;
> >+    unsigned int src_offset = 0;
> >+
> >+    ASSERT(spin_is_locked(&ring_info->lock));
> >+
> >+    offset &= ~PAGE_MASK;
> >+
> >+    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> >+        return -EFAULT;
>
> With offset < PAGE_SIZE after the previous mask, shouldn't the sanity
> check be:
>     if ( (len + offset) > XEN_ARGO_MAX_RING_SIZE )

Yes, that's correct - thanks. I'll switch the len and offset arguments
to unsigned int while at it.

Christopher

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
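The flaw Eric spotted can be demonstrated in isolation. The sketch below mirrors the masking step from the quoted patch; `XEN_ARGO_MAX_RING_SIZE` here is an illustrative stand-in value, not the one from Xen's headers. The point is that checking `len` and `offset` independently admits combinations whose sum exceeds the ring size:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(PAGE_SIZE - 1u))
/* Stand-in value for illustration only. */
#define XEN_ARGO_MAX_RING_SIZE (16u << 20)

/* Original check: len and offset validated independently. */
static int bounds_ok_old(unsigned int offset, unsigned int len)
{
    offset &= ~PAGE_MASK;          /* offset is now < PAGE_SIZE */
    return !((len > XEN_ARGO_MAX_RING_SIZE) ||
             (offset > XEN_ARGO_MAX_RING_SIZE));
}

/* Corrected check from the thread: bound the sum, not each term. */
static int bounds_ok_new(unsigned int offset, unsigned int len)
{
    offset &= ~PAGE_MASK;
    return !((len + offset) > XEN_ARGO_MAX_RING_SIZE);
}
```

With `len` equal to the full ring size and a nonzero in-page offset, the old form accepts a copy that would run past the end of the ring, while the corrected form rejects it.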
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Thu, Jan 10, 2019 at 4:53 AM Jan Beulich wrote:
>
> >>> On 10.01.19 at 13:40, wrote:
> > On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich wrote:
> >>
> >> >>> On 10.01.19 at 13:01, wrote:
> >> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark wrote:
> >> >>
> >> >> The second reason is about avoiding exposing the Xen virtual memory
> >> >> allocator directly to frequent guest-supplied size requests for
> >> >> contiguous regions (of up to 16GB).
> >> >
> >> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> >> > The amount of internal memory required to track such rings is not
> >> > trivial given the arrays to store the mfns, the pages, and the virtual
> >> > mappings.
> >> >
> >> >> With single-page allocations to
> >> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> >> seems difficult.
> >> >
> >> > Hm, there's still a lot of big dynamic memory allocations in order to
> >> > support a 16GB ring, which makes me think that virtual address space
> >> > is not the only problem if you allow 16GB rings.
> >> >
> >> >> Changing it to issue requests for contiguous regions,
> >> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> >> significant fragmentation may be achievable. I don't know the
> >> >> practical impact of that but it seems worth avoiding. Are the other
> >> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> >> boot-time, infrequent operations (livepatch), or for actions by
> >> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> >> a frequent operation somewhere else?
> >> >
> >> > I haven't checked, but I would be quite surprised to find any vmap
> >> > usage with such size (16GB). Maybe someone more familiar with the mm
> >> > subsystem can provide some insight here.
> >>
> >> And indeed the vmap range reserved in VA space is just 64GB (on
> >> x86) at present.
> >>
> >> >> Given the context above, and Jason's simplification to the
> >> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> >> deadline, and the understanding that this loop and the related data
> >> >> structures supporting it have been tested and are working, would it be
> >> >> acceptable to omit making this contiguous mapping change from this
> >> >> current series?
> >> >
> >> > My opinion would be to just use vmap if it works, because that IMO
> >> > greatly simplifies the code by being able to have the whole ring
> >> > mapped at all the time. It would remove the iteration to copy
> >> > requests, and remove the usage of ring_map_page everywhere. That would
> >> > be my recommendation code-wise, but as said above someone more
> >> > familiar with the mm subsystem might have other opinions about how to
> >> > deal with accesses to 16GB of guest memory, and indeed your iterative
> >> > solution might be the best approach.
> >>
> >> No-one can allocate 16GB of physically contiguous memory.
> >
> > Right, my question/comment was whether it would make sense to limit
> > the size of the argo ring to something smaller and then use vmap to
> > map the whole ring in contiguous virtual space in order to ease
> > accesses.
>
> Whether you vmap() the ring in (page sized) pieces or in one blob is,
> for the purpose of the order of magnitude of VA space consumption,
> not overly relevant: you can't map more than at most three such
> gigantic rings anyway with the current VA layout. (In practice,
> mapping individual pages would halve the effectively usable VA
> space, due to the guard pages inserted between regions.) IOW -
> the ring size should be bounded at a lower value anyway imo.
>
> > TBH, I'm not sure virtual address space is the only issue if argo
> > allows 16GB rings to be used. 16GB rings will consume a non-trivial
> > amount of memory for the internal argo state tracking structures
> > AFAICT.
>
> Fully agree. It has taken us ages to eliminate all runtime
> allocations of order > 0, and it looks like we'd be gaining some
> back here. I consider this tolerable as long as the feature is
> experimental, but it would need fixing for it to become fully
> supported. Christopher - annotating such code with fixme
> comments right away helps later spotting (and addressing) them.

Sorry for blowing this thread up with the ring size statement based on
the incorrect comment (and thanks, Eric, for checking it). I don't
think 16MB rings introduce anywhere near the level of concern for
internal state, but I'll look at the allocations and see if fixmes are
appropriate.

Christopher
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On 06/01/19 at 11:42pm, Christopher Clark wrote:
>+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
>+                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
>+                     uint32_t len)
>+{
>+    unsigned int mfns_index = offset >> PAGE_SHIFT;
>+    void *dst;
>+    int ret;
>+    unsigned int src_offset = 0;
>+
>+    ASSERT(spin_is_locked(&ring_info->lock));
>+
>+    offset &= ~PAGE_MASK;
>+
>+    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
>+        return -EFAULT;

With offset < PAGE_SIZE after the previous mask, shouldn't the sanity
check be:
    if ( (len + offset) > XEN_ARGO_MAX_RING_SIZE )

--
Eric Chanudet
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
>>> On 10.01.19 at 13:40, wrote:
> On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich wrote:
>>
>> >>> On 10.01.19 at 13:01, wrote:
>> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark wrote:
>> >>
>> >> The second reason is about avoiding exposing the Xen virtual memory
>> >> allocator directly to frequent guest-supplied size requests for
>> >> contiguous regions (of up to 16GB).
>> >
>> > As said in another reply, I'm not sure allowing 16GB rings is safe.
>> > The amount of internal memory required to track such rings is not
>> > trivial given the arrays to store the mfns, the pages, and the virtual
>> > mappings.
>> >
>> >> With single-page allocations to
>> >> build a ring, fragmentation is not a problem, and mischief by a guest
>> >> seems difficult.
>> >
>> > Hm, there's still a lot of big dynamic memory allocations in order to
>> > support a 16GB ring, which makes me think that virtual address space
>> > is not the only problem if you allow 16GB rings.
>> >
>> >> Changing it to issue requests for contiguous regions,
>> >> with variable ring sizes up to the maximum of 16GB, it seems like
>> >> significant fragmentation may be achievable. I don't know the
>> >> practical impact of that but it seems worth avoiding. Are the other
>> >> users of __vmap (or vmap) for multi-gigabyte regions only either
>> >> boot-time, infrequent operations (livepatch), or for actions by
>> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> >> a frequent operation somewhere else?
>> >
>> > I haven't checked, but I would be quite surprised to find any vmap
>> > usage with such size (16GB). Maybe someone more familiar with the mm
>> > subsystem can provide some insight here.
>>
>> And indeed the vmap range reserved in VA space is just 64GB (on
>> x86) at present.
>>
>> >> Given the context above, and Jason's simplification to the
>> >> memcpy_to_guest_ring function, plus the imminent merge freeze
>> >> deadline, and the understanding that this loop and the related data
>> >> structures supporting it have been tested and are working, would it be
>> >> acceptable to omit making this contiguous mapping change from this
>> >> current series?
>> >
>> > My opinion would be to just use vmap if it works, because that IMO
>> > greatly simplifies the code by being able to have the whole ring
>> > mapped at all the time. It would remove the iteration to copy
>> > requests, and remove the usage of ring_map_page everywhere. That would
>> > be my recommendation code-wise, but as said above someone more
>> > familiar with the mm subsystem might have other opinions about how to
>> > deal with accesses to 16GB of guest memory, and indeed your iterative
>> > solution might be the best approach.
>>
>> No-one can allocate 16GB of physically contiguous memory.
>
> Right, my question/comment was whether it would make sense to limit
> the size of the argo ring to something smaller and then use vmap to
> map the whole ring in contiguous virtual space in order to ease
> accesses.

Whether you vmap() the ring in (page sized) pieces or in one blob is,
for the purpose of the order of magnitude of VA space consumption,
not overly relevant: you can't map more than at most three such
gigantic rings anyway with the current VA layout. (In practice,
mapping individual pages would halve the effectively usable VA
space, due to the guard pages inserted between regions.) IOW -
the ring size should be bounded at a lower value anyway imo.

> TBH, I'm not sure virtual address space is the only issue if argo
> allows 16GB rings to be used. 16GB rings will consume a non-trivial
> amount of memory for the internal argo state tracking structures
> AFAICT.

Fully agree. It has taken us ages to eliminate all runtime
allocations of order > 0, and it looks like we'd be gaining some
back here. I consider this tolerable as long as the feature is
experimental, but it would need fixing for it to become fully
supported. Christopher - annotating such code with fixme
comments right away helps later spotting (and addressing) them.

Jan
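Jan's "at most three such gigantic rings" figure follows directly from the numbers in the thread. A back-of-envelope check, using only values quoted above as assumptions (a 64GB vmap VA range, 16GB rings, 4KiB pages, one guard page per mapped region) rather than the actual Xen layout:

```c
#include <assert.h>
#include <stdint.h>

#define GB      (1ULL << 30)
#define PAGE_4K 4096ULL

/*
 * With one guard page inserted after each vmap region, a region of
 * 'size' bytes consumes size + PAGE_4K bytes of VA space.
 */
static uint64_t regions_fit(uint64_t va_space, uint64_t size)
{
    return va_space / (size + PAGE_4K);
}
```

Mapping each 16GB ring as one blob, only three fit in 64GB of VA (a fourth would need more than the whole range). Mapping page by page, every 4KiB data page pairs with a 4KiB guard page, so only half the VA space carries data, which is the halving Jan describes.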
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich wrote:
>
> >>> On 10.01.19 at 13:01, wrote:
> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark wrote:
> >>
> >> The second reason is about avoiding exposing the Xen virtual memory
> >> allocator directly to frequent guest-supplied size requests for
> >> contiguous regions (of up to 16GB).
> >
> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> > The amount of internal memory required to track such rings is not
> > trivial given the arrays to store the mfns, the pages, and the virtual
> > mappings.
> >
> >> With single-page allocations to
> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> seems difficult.
> >
> > Hm, there's still a lot of big dynamic memory allocations in order to
> > support a 16GB ring, which makes me think that virtual address space
> > is not the only problem if you allow 16GB rings.
> >
> >> Changing it to issue requests for contiguous regions,
> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> significant fragmentation may be achievable. I don't know the
> >> practical impact of that but it seems worth avoiding. Are the other
> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> boot-time, infrequent operations (livepatch), or for actions by
> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> a frequent operation somewhere else?
> >
> > I haven't checked, but I would be quite surprised to find any vmap
> > usage with such size (16GB). Maybe someone more familiar with the mm
> > subsystem can provide some insight here.
>
> And indeed the vmap range reserved in VA space is just 64GB (on
> x86) at present.
>
> >> Given the context above, and Jason's simplification to the
> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> deadline, and the understanding that this loop and the related data
> >> structures supporting it have been tested and are working, would it be
> >> acceptable to omit making this contiguous mapping change from this
> >> current series?
> >
> > My opinion would be to just use vmap if it works, because that IMO
> > greatly simplifies the code by being able to have the whole ring
> > mapped at all the time. It would remove the iteration to copy
> > requests, and remove the usage of ring_map_page everywhere. That would
> > be my recommendation code-wise, but as said above someone more
> > familiar with the mm subsystem might have other opinions about how to
> > deal with accesses to 16GB of guest memory, and indeed your iterative
> > solution might be the best approach.
>
> No-one can allocate 16GB of physically contiguous memory.

Right, my question/comment was whether it would make sense to limit
the size of the argo ring to something smaller and then use vmap to
map the whole ring in contiguous virtual space in order to ease
accesses.

TBH, I'm not sure virtual address space is the only issue if argo
allows 16GB rings to be used. 16GB rings will consume a non-trivial
amount of memory for the internal argo state tracking structures
AFAICT.

Thanks, Roger.
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
>>> On 10.01.19 at 13:01, wrote:
> On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark wrote:
>>
>> The second reason is about avoiding exposing the Xen virtual memory
>> allocator directly to frequent guest-supplied size requests for
>> contiguous regions (of up to 16GB).
>
> As said in another reply, I'm not sure allowing 16GB rings is safe.
> The amount of internal memory required to track such rings is not
> trivial given the arrays to store the mfns, the pages, and the virtual
> mappings.
>
>> With single-page allocations to
>> build a ring, fragmentation is not a problem, and mischief by a guest
>> seems difficult.
>
> Hm, there's still a lot of big dynamic memory allocations in order to
> support a 16GB ring, which makes me think that virtual address space
> is not the only problem if you allow 16GB rings.
>
>> Changing it to issue requests for contiguous regions,
>> with variable ring sizes up to the maximum of 16GB, it seems like
>> significant fragmentation may be achievable. I don't know the
>> practical impact of that but it seems worth avoiding. Are the other
>> users of __vmap (or vmap) for multi-gigabyte regions only either
>> boot-time, infrequent operations (livepatch), or for actions by
>> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> a frequent operation somewhere else?
>
> I haven't checked, but I would be quite surprised to find any vmap
> usage with such size (16GB). Maybe someone more familiar with the mm
> subsystem can provide some insight here.

And indeed the vmap range reserved in VA space is just 64GB (on
x86) at present.

>> Given the context above, and Jason's simplification to the
>> memcpy_to_guest_ring function, plus the imminent merge freeze
>> deadline, and the understanding that this loop and the related data
>> structures supporting it have been tested and are working, would it be
>> acceptable to omit making this contiguous mapping change from this
>> current series?
>
> My opinion would be to just use vmap if it works, because that IMO
> greatly simplifies the code by being able to have the whole ring
> mapped at all the time. It would remove the iteration to copy
> requests, and remove the usage of ring_map_page everywhere. That would
> be my recommendation code-wise, but as said above someone more
> familiar with the mm subsystem might have other opinions about how to
> deal with accesses to 16GB of guest memory, and indeed your iterative
> solution might be the best approach.

No-one can allocate 16GB of physically contiguous memory.

Jan
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark wrote:
>
> Thanks for the review, Roger. Replies inline below.
>
> On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné wrote:
> >
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark wrote:
> > >
> > > sendv operation is invoked to perform a synchronous send of buffers
> > > contained in iovs to a remote domain's registered ring.
> > >
> > > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > > index 59ce8c4..4548435 100644
> > > --- a/xen/common/argo.c
> > > +++ b/xen/common/argo.c
> > >
> > > +static int
> > > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > > +                     uint32_t len)
> > > +{
> > > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > > +    void *dst;
> > > +    int ret;
> > > +    unsigned int src_offset = 0;
> > > +
> > > +    ASSERT(spin_is_locked(&ring_info->lock));
> > > +
> > > +    offset &= ~PAGE_MASK;
> > > +
> > > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > > +        return -EFAULT;
> > > +
> > > +    while ( (offset + len) > PAGE_SIZE )
> >
> > I think you could map the whole ring in contiguous virtual address
> > space and then writing to it would be much more easy, you wouldn't
> > need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> > You could likely map this when the ring gets setup and keep it mapped
> > for the lifetime of the ring.
>
> You're right about that, because map_domain_page_global, which the
> current code uses, uses vmap itself. I think there's a couple of
> reasons why the code has been implemented the iterative way though.
>
> The first is that I think ring resize has been a consideration: it's
> useful to be able to increase the size of a live and active ring that
> is under load without having to tear down the mappings, find a new
> virtual address region of the right size and then remap it: you can
> just supply some more memory and map those pages onto the end of the
> ring, and ensure both sides know about the new ring size. Similarly,
> shrinking a quiet ring can be useful.

Is such on-the-fly expansion something common with argo? I'm not
saying it's something that shouldn't be supported, but the burden of
allowing such resizing doesn't seem trivial. You will have to
redimension a lot of the arrays used to store the pages used, and at
that point I wonder whether remapping the virtual address space is
really the biggest issue you are going to have if you allow such
run-time resizing.

> However, the "gfn race" that you (correctly) pointed out in an earlier
> review, and Jan's related request to drop the "revalidate an existing
> mapping on ring reregister" motivated removal of a section of the code
> involved, and then in v3 of the series, I've actually just blocked
> ring resize because it's missing a walk through the pending
> notifications to find any that have become untriggerable with the new
> ring size when a ring is shrunk and I'd like to defer implementing
> that for now. So the ring resize reason is more of a consideration for
> a possible later version of Argo than the current one.
>
> The second reason is about avoiding exposing the Xen virtual memory
> allocator directly to frequent guest-supplied size requests for
> contiguous regions (of up to 16GB).

As said in another reply, I'm not sure allowing 16GB rings is safe.
The amount of internal memory required to track such rings is not
trivial given the arrays to store the mfns, the pages, and the virtual
mappings.

> With single-page allocations to
> build a ring, fragmentation is not a problem, and mischief by a guest
> seems difficult.

Hm, there's still a lot of big dynamic memory allocations in order to
support a 16GB ring, which makes me think that virtual address space
is not the only problem if you allow 16GB rings.

> Changing it to issue requests for contiguous regions,
> with variable ring sizes up to the maximum of 16GB, it seems like
> significant fragmentation may be achievable. I don't know the
> practical impact of that but it seems worth avoiding. Are the other
> users of __vmap (or vmap) for multi-gigabyte regions only either
> boot-time, infrequent operations (livepatch), or for actions by
> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> a frequent operation somewhere else?

I haven't checked, but I would be quite surprised to find any vmap
usage with such size (16GB). Maybe someone more familiar with the mm
subsystem can provide some insight here.

> Given the context above, and Jason's simplification to the
> memcpy_to_guest_ring function, plus the imminent merge freeze
> deadline, and the understanding that this loop and the related data
> structures supporting it have been tested and are working, would it be
> acceptable to omit making this contiguous mapping change from this
> current series?
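Roger's point about the tracking structures can be made concrete with rough numbers. The per-page cost below is an assumption for illustration only (one mfn, one page pointer, and one mapping pointer per ring page, 8 bytes each); the real argo structures may differ:

```c
#include <assert.h>
#include <stdint.h>

#define GB      (1ULL << 30)
#define PAGE_4K 4096ULL

/* Assumed per-page tracking cost: mfn + page pointer + mapping pointer. */
#define TRACK_BYTES_PER_PAGE (3 * 8ULL)

static uint64_t ring_pages(uint64_t ring_bytes)
{
    return ring_bytes / PAGE_4K;
}

static uint64_t tracking_bytes(uint64_t ring_bytes)
{
    return ring_pages(ring_bytes) * TRACK_BYTES_PER_PAGE;
}
```

Under these assumptions a 16GB ring spans over four million pages, so the tracking arrays alone come to roughly 96MiB of Xen heap per ring, before considering allocation order or VA-space costs.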
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
Thanks for the review, Roger. Replies inline below.

On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark wrote:
> >
> > sendv operation is invoked to perform a synchronous send of buffers
> > contained in iovs to a remote domain's registered ring.
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 59ce8c4..4548435 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > +                     uint32_t len)
> > +{
> > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +    void *dst;
> > +    int ret;
> > +    unsigned int src_offset = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    offset &= ~PAGE_MASK;
> > +
> > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +        return -EFAULT;
> > +
> > +    while ( (offset + len) > PAGE_SIZE )
>
> I think you could map the whole ring in contiguous virtual address
> space and then writing to it would be much more easy, you wouldn't
> need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> You could likely map this when the ring gets setup and keep it mapped
> for the lifetime of the ring.

You're right about that, because map_domain_page_global, which the
current code uses, uses vmap itself. I think there's a couple of
reasons why the code has been implemented the iterative way though.

The first is that I think ring resize has been a consideration: it's
useful to be able to increase the size of a live and active ring that
is under load without having to tear down the mappings, find a new
virtual address region of the right size and then remap it: you can
just supply some more memory and map those pages onto the end of the
ring, and ensure both sides know about the new ring size. Similarly,
shrinking a quiet ring can be useful.

However, the "gfn race" that you (correctly) pointed out in an earlier
review, and Jan's related request to drop the "revalidate an existing
mapping on ring reregister" motivated removal of a section of the code
involved, and then in v3 of the series, I've actually just blocked
ring resize because it's missing a walk through the pending
notifications to find any that have become untriggerable with the new
ring size when a ring is shrunk and I'd like to defer implementing
that for now. So the ring resize reason is more of a consideration for
a possible later version of Argo than the current one.

The second reason is about avoiding exposing the Xen virtual memory
allocator directly to frequent guest-supplied size requests for
contiguous regions (of up to 16GB). With single-page allocations to
build a ring, fragmentation is not a problem, and mischief by a guest
seems difficult. Changing it to issue requests for contiguous regions,
with variable ring sizes up to the maximum of 16GB, it seems like
significant fragmentation may be achievable. I don't know the
practical impact of that but it seems worth avoiding. Are the other
users of __vmap (or vmap) for multi-gigabyte regions only either
boot-time, infrequent operations (livepatch), or for actions by
privileged (ie. somewhat trusted) domains (ioremap), or is it already
a frequent operation somewhere else?

Given the context above, and Jason's simplification to the
memcpy_to_guest_ring function, plus the imminent merge freeze
deadline, and the understanding that this loop and the related data
structures supporting it have been tested and are working, would it be
acceptable to omit making this contiguous mapping change from this
current series?

> > +    {
> > +        unsigned int head_len = PAGE_SIZE - offset;
> > +
> > +        ret = ring_map_page(ring_info, mfns_index, &dst);
> > +        if ( ret )
> > +            return ret;
> > +
> > +        if ( src )
> > +        {
> > +            memcpy(dst + offset, src + src_offset, head_len);
> > +            src_offset += head_len;
> > +        }
> > +        else
> > +        {
> > +            ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> > +                      -EFAULT : 0;
> > +            if ( ret )
> > +                return ret;
>
> You can simplify this to:
>
>     if ( copy_from_guest(...) )
>         return -EFAULT;

yes! ack - thanks

> > +/*
> > + * get_sanitized_ring creates a modified copy of the ring pointers where
> > + * the rx_ptr is rounded up to ensure it is aligned, and then ring
> > + * wrap is handled. Simplifies safe use of the rx_ptr for available
> > + * space calculation.
> > + */
> > +static int
> > +get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> > +{
> > +    uint32_t rx_ptr;
> > +    int ret;
> > +
> > +    ret = get_rx_ptr(ring_info, &rx_ptr);
> > +    if ( ret )
> > +        return ret;
> > +
> > +    ring->tx_ptr = ring_inf
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Wed, Jan 9, 2019 at 10:05 AM Jason Andryuk wrote:
>
> On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark wrote:
> >
> > @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> >      smp_wmb();
> >  }
> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > +                     const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > +                     uint32_t len)
> > +{
> > +    unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +    void *dst;
> > +    int ret;
> > +    unsigned int src_offset = 0;
> > +
> > +    ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +    offset &= ~PAGE_MASK;
> > +
> > +    if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +        return -EFAULT;
> > +
> > +    while ( (offset + len) > PAGE_SIZE )
> > +    {
> > +        unsigned int head_len = PAGE_SIZE - offset;
>
> I think this while loop could be re-written as:
>
>     while ( len )
>     {
>         head_len = len > PAGE_SIZE ? PAGE_SIZE - offset : len;
>
> and then the extra copying below outside the loop could be dropped.
>
> The first loop does a partial copy at offset and then sets offset=0.
> The next N loops copy exactly PAGE_SIZE.
> The final copy does the remaining len bytes.

That looks right to me and makes a nice simplification to that
function -- thanks.

> > +/*
> > + * iov_count returns its count on success via an out variable to avoid
> > + * potential for a negative return value to be used incorrectly
> > + * (eg. coerced into an unsigned variable resulting in a large incorrect
> > + * value)
> > + */
> > +static int
> > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> > +{
> > +    uint32_t sum_iov_lens = 0;
> > +
> > +    if ( niov > XEN_ARGO_MAXIOV )
> > +        return -EINVAL;
> > +
> > +    while ( niov-- )
> > +    {
> > +        /* valid iovs must have the padding field set to zero */
> > +        if ( piov->pad )
> > +        {
> > +            argo_dprintk("invalid iov: padding is not zero\n");
> > +            return -EINVAL;
> > +        }
> > +
> > +        /* check each to protect sum against integer overflow */
> > +        if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )
>
> Should this be MAX_ARGO_MESSAGE_SIZE? MAX_ARGO_MESSAGE_SIZE is less
> than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just
> fail the one below.

ack - I'll switch it to MAX_ARGO_MESSAGE_SIZE.

> > @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >          break;
> >      }
> >
> > +    case XEN_ARGO_OP_sendv:
> > +    {
> > +        xen_argo_send_addr_t send_addr;
> > +
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> > +            guest_handle_cast(arg1, xen_argo_send_addr_t);
> > +        XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> > +            guest_handle_cast(arg2, xen_argo_iov_t);
> > +        /* arg3 is niov */
> > +        /* arg4 is message_type. Must be a 32-bit value. */
> > +
> > +        rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> > +            send_addr.src.domain_id = currd->domain_id;
> > +
> > +        /* No domain is currently authorized to send on behalf of another */
> > +        if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
> > +        {
> > +            rc = -EPERM;
> > +            break;
> > +        }
> > +
> > +        /* Reject niov or message_type values that are outside 32 bit range. */
> > +        if ( unlikely((arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
> > +        {
> > +            rc = -EINVAL;
> > +            break;
> > +        }
>
> This needs to check send_addr.src.pad and send_addr.dst.pad == 0.
> sendv() does not check the padding either.

ack - will fix.

thanks

Christopher
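Jason's single-loop form can be checked outside Xen. The sketch below models the ring as an array of separately "mapped" pages and uses plain memcpy in place of copy_from_guest; map_page is a stand-in for ring_map_page, and the page size is shrunk to make wrapping easy to exercise. Note the boundary condition is written as `(offset + len) > PAGE_SIZE` so that a short copy that still straddles a page is also split, which Jason's quoted `len > PAGE_SIZE` shorthand leaves implicit:

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 16u          /* tiny page size, to exercise wrapping */
#define NPAGES    4u

static unsigned char ring[NPAGES][PAGE_SIZE];

/* Stand-in for ring_map_page(): returns the "mapping" of page i. */
static unsigned char *map_page(unsigned int i)
{
    return ring[i];
}

/*
 * Single-loop copy: the first pass copies a partial page at 'offset',
 * middle passes copy whole pages, the last pass copies the remainder.
 */
static void copy_to_ring(unsigned int offset, const unsigned char *src,
                         unsigned int len)
{
    unsigned int page = offset / PAGE_SIZE;

    offset %= PAGE_SIZE;
    while ( len )
    {
        unsigned int head_len =
            (offset + len) > PAGE_SIZE ? PAGE_SIZE - offset : len;

        memcpy(map_page(page) + offset, src, head_len);
        src += head_len;
        len -= head_len;
        offset = 0;
        page++;
    }
}
```

A copy of 40 bytes starting at offset 10 splits into 6 + 16 + 16 + 2 bytes across four pages, with no tail copy needed outside the loop.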
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark wrote:
>
> sendv operation is invoked to perform a synchronous send of buffers
> contained in iovs to a remote domain's registered ring.
>
> It takes:
>  * A destination address (domid, port) for the ring to send to.
>    It performs a most-specific match lookup, to allow for wildcard.
>  * A source address, used to inform the destination of where to reply.
>  * The address of an array of iovs containing the data to send
>  * .. and the length of that array of iovs
>  * and a 32-bit message type, available to communicate message context
>    data (eg. kernel-to-kernel, separate from the application data).
>
> If insufficient space exists in the destination ring, it will return
> -EAGAIN and Xen will notify the caller when sufficient space becomes
> available.
>
> Accesses to the ring indices are appropriately atomic. The rings are
> mapped into Xen's private address space to write as needed and the
> mappings are retained for later use.
>
> Fixed-size types are used in some areas within this code where caution
> around avoiding integer overflow is important.
>
> Notifications are sent to guests via VIRQ and send_guest_global_virq is
> exposed in the change to enable argo to call it. VIRQ_ARGO_MESSAGE is
> claimed from the VIRQ previously reserved for this purpose (#11).
>
> The VIRQ notification method is used rather than sending events using
> evtchn functions directly because:
>
> * no current event channel type is an exact fit for the intended
>   behaviour. ECS_IPI is closest, but it disallows migration to
>   other VCPUs which is not necessarily a requirement for Argo.
>
> * at the point of argo_init, allocation of an event channel is
>   complicated by none of the guest VCPUs being initialized yet
>   and the event channel logic expects that a valid event channel
>   has a present VCPU.
>
> * at the point of signalling a notification, the VIRQ logic is already
>   defensive: if d->vcpu[0] is NULL, the notification is just silently
>   dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
>   must not be NULL, otherwise a null pointer dereference occurs.
>
> Using a VIRQ removes the need for the guest to query to determine which
> event channel notifications will be delivered on. This is also likely to
> simplify establishing future L0/L1 nested hypervisor argo communication.
>
> Signed-off-by: Christopher Clark
> ---
> The previous double-read of iovs from guest memory has been removed.
>
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: pending_queue: reap stale ents when in need of space
> v2 self: pending_requeue: reclaim ents for stale domains
> v2 feedback Jan: only override sender domid if DOMID_ANY
> v2 feedback Jan: drop message from argo_message_op
> v2 self: check npending vs maximum limit
> v2 self: get_sanitized_ring instead of get_rx_ptr
> v2 feedback v1#13 Jan: remove double read from ringbuf insert, lower MAX_IOV
> v2 self: make iov_count const
> v2 self: iov_count: return EMSGSIZE for message too big
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1 feedback #13 Jan: drop guest_handle_okay when using copy_from_guest
> - reorder do_argo_op logic
> v2 self: add _hnd suffix to iovs variable name to indicate guest handle type
> v2 self: replace use of XEN_GUEST_HANDLE_NULL with two existing macros
>
> v1 #15 feedback, Jan: sendv op: s/ECONNREFUSED/ESRCH/
> v1 #5 (#15) feedback Paul: sendv: use currd in do_argo_message_op
> v1 #13 (#15) feedback Paul: sendv op: do/while reindent only
> v1 #13 (#15) feedback Paul:
sendv op: do/while: argo_ringbuf_insert to goto > style > v1 #13 (#15) feedback Paul: sendv op: do/while: reindent only again > v1 #13 (#15) feedback Paul: sendv op: do/while : goto > v1 #15 feedback Paul: sendv op: make page var: unsigned > v1 #15 feedback Paul: sendv op: new local var for PAGE_SIZE - offset > v1 #8 feedback Jan: XEN_GUEST_HANDLE : C89 compliance > v1 rebase after switching register op from pfns to page descriptors > v1 self: move iov DEFINE_XEN_GUEST_HANDLE out of public header into argo.c > v1 #13 (#15) feedback Paul: fix loglevel for guest-triggered messages > v1 : add compat xlat.lst entries > v1 self: switched notification to send_guest_global_virq instead of event > v1: fix gprintk use for ARM as its defn dislikes split format strings > v1: init len variable to satisfy ARM compiler initialized checking > v1 #13 feedback Jan: rename page var > v1:#14 feedback Jan: uint8_t* -> void* > v1: #13 feedback Jan: public namespace: prefix with xen > v1: #13 feedback Jan: blank line after case op in do_argo_message_op > v1: #15 feedback Jan: add comments explaining why the writes
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq
On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark wrote: > @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, > uint32_t tx_ptr) > smp_wmb(); > } > > +static int > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset, > + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd, > + uint32_t len) > +{ > +unsigned int mfns_index = offset >> PAGE_SHIFT; > +void *dst; > +int ret; > +unsigned int src_offset = 0; > + > +ASSERT(spin_is_locked(&ring_info->lock)); > + > +offset &= ~PAGE_MASK; > + > +if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) > ) > +return -EFAULT; > + > +while ( (offset + len) > PAGE_SIZE ) > +{ > +unsigned int head_len = PAGE_SIZE - offset; I think this while loop could be re-written as while (len) { head_len = len > PAGE_SIZE ? PAGE_SIZE - offset: len; and then the extra copying below outside the loop could be dropped. The first loop does a partial copy at offset and then sets offset=0. The next N loops copy exactly PAGE_SIZE. The Final copy does the remaining len bytes. > + > +ret = ring_map_page(ring_info, mfns_index, &dst); > +if ( ret ) > +return ret; > + > +if ( src ) > +{ > +memcpy(dst + offset, src + src_offset, head_len); > +src_offset += head_len; > +} > +else > +{ > +ret = copy_from_guest(dst + offset, src_hnd, head_len) ? > +-EFAULT : 0; > +if ( ret ) > +return ret; > + > +guest_handle_add_offset(src_hnd, head_len); > +} > + > +mfns_index++; > +len -= head_len; > +offset = 0; > +} > + > +ret = ring_map_page(ring_info, mfns_index, &dst); > +if ( ret ) > +{ > +argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page" > + " %d of %d\n", ring_info->id.domain_id, > ring_info->id.port, > + ring_info->id.partner_id, ring_info, mfns_index, > + ring_info->nmfns); > +return ret; > +} > + > +if ( src ) > +memcpy(dst + offset, src + src_offset, len); > +else > +ret = copy_from_guest(dst + offset, src_hnd, len) ? 
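The single-loop restructure suggested in this review can be sketched in isolation. This is a model under stated assumptions: a flat buffer stands in for the sequence of mapped ring pages (advancing by PAGE_SIZE models mfns_index++ plus ring_map_page()), the guest-handle copy path is omitted, and the per-iteration clamp is written as offset + len > PAGE_SIZE so that a short copy starting mid-page is still split correctly.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096u

/*
 * Single-loop form of the page-wise copy: each iteration copies at most up
 * to the end of the current page, so the trailing copy after the loop in
 * the original code can be dropped. The first iteration may start at a
 * nonzero offset; every later iteration starts at the top of a page.
 */
static void copy_pagewise(unsigned char *dst_pages, unsigned int offset,
                          const unsigned char *src, unsigned int len)
{
    unsigned int src_offset = 0;

    while ( len )
    {
        /* Clamp to the end of the current page. */
        unsigned int head_len =
            ((offset + len) > PAGE_SIZE) ? PAGE_SIZE - offset : len;

        memcpy(dst_pages + offset, src + src_offset, head_len);

        src_offset += head_len;
        len -= head_len;
        dst_pages += PAGE_SIZE;   /* models mfns_index++ / next mapped page */
        offset = 0;               /* later pages start at their top */
    }
}
```

With this shape, the final partial copy falls out of the same loop body, matching the reviewer's observation that the extra copying after the loop can be dropped.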
-EFAULT : 0; > + > +return ret; > +} > + > +/* > + * iov_count returns its count on success via an out variable to avoid > + * potential for a negative return value to be used incorrectly > + * (eg. coerced into an unsigned variable resulting in a large incorrect > value) > + */ > +static int > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count) > +{ > +uint32_t sum_iov_lens = 0; > + > +if ( niov > XEN_ARGO_MAXIOV ) > +return -EINVAL; > + > +while ( niov-- ) > +{ > +/* valid iovs must have the padding field set to zero */ > +if ( piov->pad ) > +{ > +argo_dprintk("invalid iov: padding is not zero\n"); > +return -EINVAL; > +} > + > +/* check each to protect sum against integer overflow */ > +if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE ) Should this be MAX_ARGO_MESSAGE_SIZE? MAX_ARGO_MESSAGE_SIZE is less than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just fail the one below. > +{ > +argo_dprintk("invalid iov_len: too big (%u)>%llu\n", > + piov->iov_len, XEN_ARGO_MAX_RING_SIZE); > +return -EINVAL; > +} > + > +sum_iov_lens += piov->iov_len; > + > +/* > + * Again protect sum from integer overflow > + * and ensure total msg size will be within bounds. > + */ > +if ( sum_iov_lens > MAX_ARGO_MESSAGE_SIZE ) > +{ > +argo_dprintk("invalid iov series: total message too big\n"); > +return -EMSGSIZE; > +} > + > +piov++; > +} > + > +*count = sum_iov_lens; > + > +return 0; > +} > + > @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, > XEN_GUEST_HANDLE_PARAM(void) arg1, > break; > } > > +case XEN_ARGO_OP_sendv: > +{ > +xen_argo_send_addr_t send_addr; > + > +XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd = > +guest_handle_cast(arg1, xen_argo_send_addr_t); > +XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd = > +guest_handle_cast(arg2, xen_argo_iov_t); > +/* arg3 is niov */ > +/* arg4 is message_type. Must be a 32-bit value. */ > + > +rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? 
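The overflow-protection pattern discussed here — bound each element before adding it, then re-check the running total — can be sketched standalone. The limit value below is an assumption for illustration, not the real MAX_ARGO_MESSAGE_SIZE definition, and the per-element bound is the tighter message-size limit, per the review comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed message-size limit, for illustration only. */
#define MSG_SIZE_LIMIT 0x40000000u

/*
 * Overflow-safe summation: every element is first checked against
 * MSG_SIZE_LIMIT, and the running total is re-checked after each addition,
 * so sum + *lens never exceeds 2 * MSG_SIZE_LIMIT and cannot wrap a
 * uint32_t. The total is returned via an out parameter so a negative
 * error code can never be mistaken for a large count.
 */
static int sum_iov_lens(const uint32_t *lens, unsigned int niov,
                        uint32_t *count)
{
    uint32_t sum = 0;

    while ( niov-- )
    {
        if ( *lens > MSG_SIZE_LIMIT )
            return -1;          /* one element alone exceeds the bound */

        sum += *lens++;

        if ( sum > MSG_SIZE_LIMIT )
            return -1;          /* running total exceeds the bound */
    }

    *count = sum;
    return 0;
}
```

Checking the per-element bound against the message limit (rather than the larger ring-size limit) fails bad input at the first check instead of passing it through to the total check, as the reviewer suggests.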
-EFAULT : 0; > +if ( rc ) > +break; > + > +if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY ) > +send_addr.src.domain_id = currd->domain_id; > + > +/* No domain is currently authorized to send on behalf of another */ > +if ( unlikely(send_addr.src.domain_id