Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Christopher Clark
On Thu, Jan 10, 2019 at 1:41 PM Eric Chanudet  wrote:
>
> On 06/01/19 at 11:42pm, Christopher Clark wrote:
> >+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> >+ const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> >+ uint32_t len)
> >+{
> >+unsigned int mfns_index = offset >> PAGE_SHIFT;
> >+void *dst;
> >+int ret;
> >+unsigned int src_offset = 0;
> >+
> >+ASSERT(spin_is_locked(&ring_info->lock));
> >+
> >+offset &= ~PAGE_MASK;
> >+
> >+if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> >+return -EFAULT;
> With offset < PAGE_SIZE with the previous mask, shouldn't the sanity
> check be:
> if (len + offset > XEN_ARGO_MAX_RING_SIZE)

Yes, that's correct - thanks.
I'll switch the len and offset arguments to unsigned int while at it.
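As a standalone illustration, the corrected bounds check might look like the sketch below (macro values are illustrative stand-ins for the real Argo constants, and the error code is returned as a plain negative int). Note that bounding len on its own first also keeps the unsigned addition from wrapping:

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
/* Illustrative bound only; the real constant lives in the Argo headers. */
#define XEN_ARGO_MAX_RING_SIZE (16u << 20)

/*
 * Model of the corrected sanity check: after masking, offset is below
 * PAGE_SIZE, so the check must bound the sum of len and offset rather
 * than each operand independently.  Checking len first means the sum
 * cannot exceed XEN_ARGO_MAX_RING_SIZE + PAGE_SIZE, so it cannot wrap.
 */
static int check_copy_bounds(unsigned int offset, unsigned int len)
{
    offset &= ~PAGE_MASK;               /* offset is now < PAGE_SIZE */

    if ( len > XEN_ARGO_MAX_RING_SIZE ||
         offset + len > XEN_ARGO_MAX_RING_SIZE )
        return -14;                     /* -EFAULT */

    return 0;
}
```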

Christopher

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Christopher Clark
On Thu, Jan 10, 2019 at 4:53 AM Jan Beulich  wrote:
>
> >>> On 10.01.19 at 13:40,  wrote:
> > On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich  wrote:
> >>
> >> >>> On 10.01.19 at 13:01,  wrote:
> >> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark  wrote:
> >> >>
> >> >> The second reason is about avoiding exposing the Xen virtual memory
> >> >> allocator directly to frequent guest-supplied size requests for
> >> >> contiguous regions (of up to 16GB).
> >> >
> >> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> >> > The amount of internal memory required to track such rings is not
> >> > trivial given the arrays to store the mfns, the pages, and the virtual
> >> > mappings.
> >> >
> >> >> With single-page allocations to
> >> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> >> seems difficult.
> >> >
> >> > Hm, there's still a lot of big dynamic memory allocations in order to
> >> > support a 16GB ring, which makes me think that virtual address space
> >> > is not the only problem if you allow 16GB rings.
> >> >
> >> >> Changing it to issue requests for contiguous regions,
> >> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> >> significant fragmentation may be achievable. I don't know the
> >> >> practical impact of that but it seems worth avoiding. Are the other
> >> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> >> boot-time, infrequent operations (livepatch), or for actions by
> >> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> >> a frequent operation somewhere else?
> >> >
> >> > I haven't checked, but I would be quite surprised to find any vmap
> >> > usage with such size (16GB). Maybe someone more familiar with the mm
> >> > subsystem can provide some insight here.
> >>
> >> And indeed the vmap range reserved in VA space is just 64GB (on
> >> x86) at present.
> >>
> >> >> Given the context above, and Jason's simplification to the
> >> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> >> deadline, and the understanding that this loop and the related data
> >> >> structures supporting it have been tested and are working, would it be
> >> >> acceptable to omit making this contiguous mapping change from this
> >> >> current series?
> >> >
> >> > My opinion would be to just use vmap if it works, because that IMO
> >> > greatly simplifies the code by being able to have the whole ring
> >> > mapped at all the time. It would remove the iteration to copy
> >> > requests, and remove the usage of ring_map_page everywhere. That would
> >> > be my recommendation code-wise, but as said above someone more
> >> > familiar with the mm subsystem might have other opinions about how to
> >> > deal with accesses to 16GB of guest memory, and indeed your iterative
> >> > solution might be the best approach.
> >>
> >> No-one can allocate 16GB physically contiguous memory.
> >
> > Right, my question/comment was whether it would make sense to limit
> > the size of the argo ring to something smaller and then use vmap to
> > map the whole ring in contiguous virtual space in order to ease
> > accesses.
>
> Whether you vmap() the ring in (page sized) pieces or in one blob is,
> for the purpose of the order of magnitude of VA space consumption,
> not overly relevant: You can't map more than at most three such
> gigantic rings anyway with the current VA layout. (In practice
> mapping individual pages would halve the effectively usable VA
> space, due to the guard pages inserted between regions.) IOW -
> the ring size should be bounded at a lower value anyway imo.
>
> > TBH, I'm not sure virtual address space is the only issue if argo
> > allows 16GB rings to be used. 16GB rings will consume a non-trivial
> > amount of memory for the internal argo state tracking structures
> > AFAICT.
>
> Fully agree. It has taken us ages to eliminate all runtime
> allocations of order > 0, and it looks like we'd be gaining some
> back here. I consider this tolerable as long as the feature is
> experimental, but it would need fixing for it to become fully
> supported. Christopher - annotating such code with fixme
> comments right away helps later spotting (and addressing) them.

Sorry for blowing this thread up with the ring size statement based on
the incorrect comment (and thanks, Eric, for checking it).
I don't think 16MB rings introduce anywhere near the level of concern
for internal state, but I'll look at the allocations and see if fixmes
are appropriate.

Christopher


Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Eric Chanudet

On 06/01/19 at 11:42pm, Christopher Clark wrote:

+memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
+ const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
+ uint32_t len)
+{
+unsigned int mfns_index = offset >> PAGE_SHIFT;
+void *dst;
+int ret;
+unsigned int src_offset = 0;
+
+ASSERT(spin_is_locked(&ring_info->lock));
+
+offset &= ~PAGE_MASK;
+
+if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
+return -EFAULT;

With offset < PAGE_SIZE with the previous mask, shouldn't the sanity
check be:
   if (len + offset > XEN_ARGO_MAX_RING_SIZE)

--
Eric Chanudet


Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Jan Beulich
>>> On 10.01.19 at 13:40,  wrote:
> On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich  wrote:
>>
>> >>> On 10.01.19 at 13:01,  wrote:
>> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark  wrote:
>> >>
>> >> The second reason is about avoiding exposing the Xen virtual memory
>> >> allocator directly to frequent guest-supplied size requests for
>> >> contiguous regions (of up to 16GB).
>> >
>> > As said in another reply, I'm not sure allowing 16GB rings is safe.
>> > The amount of internal memory required to track such rings is not
>> > trivial given the arrays to store the mfns, the pages, and the virtual
>> > mappings.
>> >
>> >> With single-page allocations to
>> >> build a ring, fragmentation is not a problem, and mischief by a guest
>> >> seems difficult.
>> >
>> > Hm, there's still a lot of big dynamic memory allocations in order to
>> > support a 16GB ring, which makes me think that virtual address space
>> > is not the only problem if you allow 16GB rings.
>> >
>> >> Changing it to issue requests for contiguous regions,
>> >> with variable ring sizes up to the maximum of 16GB, it seems like
>> >> significant fragmentation may be achievable. I don't know the
>> >> practical impact of that but it seems worth avoiding. Are the other
>> >> users of __vmap (or vmap) for multi-gigabyte regions only either
>> >> boot-time, infrequent operations (livepatch), or for actions by
>> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> >> a frequent operation somewhere else?
>> >
>> > I haven't checked, but I would be quite surprised to find any vmap
>> > usage with such size (16GB). Maybe someone more familiar with the mm
>> > subsystem can provide some insight here.
>>
>> And indeed the vmap range reserved in VA space is just 64GB (on
>> x86) at present.
>>
>> >> Given the context above, and Jason's simplification to the
>> >> memcpy_to_guest_ring function, plus the imminent merge freeze
>> >> deadline, and the understanding that this loop and the related data
>> >> structures supporting it have been tested and are working, would it be
>> >> acceptable to omit making this contiguous mapping change from this
>> >> current series?
>> >
>> > My opinion would be to just use vmap if it works, because that IMO
>> > greatly simplifies the code by being able to have the whole ring
>> > mapped at all the time. It would remove the iteration to copy
>> > requests, and remove the usage of ring_map_page everywhere. That would
>> > be my recommendation code-wise, but as said above someone more
>> > familiar with the mm subsystem might have other opinions about how to
>> > deal with accesses to 16GB of guest memory, and indeed your iterative
>> > solution might be the best approach.
>>
>> No-one can allocate 16GB physically contiguous memory.
> 
> Right, my question/comment was whether it would make sense to limit
> the size of the argo ring to something smaller and then use vmap to
> map the whole ring in contiguous virtual space in order to ease
> accesses.

Whether you vmap() the ring in (page sized) pieces or in one blob is,
for the purpose of the order of magnitude of VA space consumption,
not overly relevant: You can't map more than at most three such
gigantic rings anyway with the current VA layout. (In practice
mapping individual pages would halve the effectively usable VA
space, due to the guard pages inserted between regions.) IOW -
the ring size should be bounded at a lower value anyway imo.
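Jan's budget argument can be checked with a toy calculation (assumed figures: a 64GB vmap area, 4k pages, and one guard page per mapped region; this is a sketch of the arithmetic, not actual Xen code):

```c
#include <assert.h>
#include <stdint.h>

#define GB        (1ULL << 30)
#define PAGE_SIZE 4096ULL

/*
 * Toy model of the VA-space budget.  Mapping a ring as a single blob
 * costs its size plus one guard page; mapping it page by page costs a
 * guard page per data page, roughly doubling the VA consumed.
 */
static uint64_t rings_that_fit(uint64_t va_bytes, uint64_t ring_bytes,
                               int page_by_page)
{
    uint64_t cost = page_by_page ? 2 * ring_bytes
                                 : ring_bytes + PAGE_SIZE;

    return va_bytes / cost;
}
```

Under these assumptions the model reproduces both claims: at most three 16GB rings mapped as blobs, and half that effective capacity when mapping page by page.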

> TBH, I'm not sure virtual address space is the only issue if argo
> allows 16GB rings to be used. 16GB rings will consume a non-trivial
> amount of memory for the internal argo state tracking structures
> AFAICT.

Fully agree. It has taken us ages to eliminate all runtime
allocations of order > 0, and it looks like we'd be gaining some
back here. I consider this tolerable as long as the feature is
experimental, but it would need fixing for it to become fully
supported. Christopher - annotating such code with fixme
comments right away helps later spotting (and addressing) them.

Jan



Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Roger Pau Monné
On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich  wrote:
>
> >>> On 10.01.19 at 13:01,  wrote:
> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark  wrote:
> >>
> >> The second reason is about avoiding exposing the Xen virtual memory
> >> allocator directly to frequent guest-supplied size requests for
> >> contiguous regions (of up to 16GB).
> >
> > As said in another reply, I'm not sure allowing 16GB rings is safe.
> > The amount of internal memory required to track such rings is not
> > trivial given the arrays to store the mfns, the pages, and the virtual
> > mappings.
> >
> >> With single-page allocations to
> >> build a ring, fragmentation is not a problem, and mischief by a guest
> >> seems difficult.
> >
> > Hm, there's still a lot of big dynamic memory allocations in order to
> > support a 16GB ring, which makes me think that virtual address space
> > is not the only problem if you allow 16GB rings.
> >
> >> Changing it to issue requests for contiguous regions,
> >> with variable ring sizes up to the maximum of 16GB, it seems like
> >> significant fragmentation may be achievable. I don't know the
> >> practical impact of that but it seems worth avoiding. Are the other
> >> users of __vmap (or vmap) for multi-gigabyte regions only either
> >> boot-time, infrequent operations (livepatch), or for actions by
> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> >> a frequent operation somewhere else?
> >
> > I haven't checked, but I would be quite surprised to find any vmap
> > usage with such size (16GB). Maybe someone more familiar with the mm
> > subsystem can provide some insight here.
>
> And indeed the vmap range reserved in VA space is just 64GB (on
> x86) at present.
>
> >> Given the context above, and Jason's simplification to the
> >> memcpy_to_guest_ring function, plus the imminent merge freeze
> >> deadline, and the understanding that this loop and the related data
> >> structures supporting it have been tested and are working, would it be
> >> acceptable to omit making this contiguous mapping change from this
> >> current series?
> >
> > My opinion would be to just use vmap if it works, because that IMO
> > greatly simplifies the code by being able to have the whole ring
> > mapped at all the time. It would remove the iteration to copy
> > requests, and remove the usage of ring_map_page everywhere. That would
> > be my recommendation code-wise, but as said above someone more
> > > familiar with the mm subsystem might have other opinions about how to
> > deal with accesses to 16GB of guest memory, and indeed your iterative
> > solution might be the best approach.
>
> No-one can allocate 16GB physically contiguous memory.

Right, my question/comment was whether it would make sense to limit
the size of the argo ring to something smaller and then use vmap to
map the whole ring in contiguous virtual space in order to ease
accesses.

TBH, I'm not sure virtual address space is the only issue if argo
allows 16GB rings to be used. 16GB rings will consume a non-trivial
amount of memory for the internal argo state tracking structures
AFAICT.

Thanks, Roger.


Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Jan Beulich
>>> On 10.01.19 at 13:01,  wrote:
> On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark  wrote:
>>
>> The second reason is about avoiding exposing the Xen virtual memory
>> allocator directly to frequent guest-supplied size requests for
>> contiguous regions (of up to 16GB).
> 
> As said in another reply, I'm not sure allowing 16GB rings is safe.
> The amount of internal memory required to track such rings is not
> trivial given the arrays to store the mfns, the pages, and the virtual
> mappings.
> 
>> With single-page allocations to
>> build a ring, fragmentation is not a problem, and mischief by a guest
>> seems difficult.
> 
> Hm, there's still a lot of big dynamic memory allocations in order to
> support a 16GB ring, which makes me think that virtual address space
> is not the only problem if you allow 16GB rings.
> 
>> Changing it to issue requests for contiguous regions,
>> with variable ring sizes up to the maximum of 16GB, it seems like
>> significant fragmentation may be achievable. I don't know the
>> practical impact of that but it seems worth avoiding. Are the other
>> users of __vmap (or vmap) for multi-gigabyte regions only either
>> boot-time, infrequent operations (livepatch), or for actions by
>> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> a frequent operation somewhere else?
> 
> I haven't checked, but I would be quite surprised to find any vmap
> usage with such size (16GB). Maybe someone more familiar with the mm
> subsystem can provide some insight here.

And indeed the vmap range reserved in VA space is just 64GB (on
x86) at present.

>> Given the context above, and Jason's simplification to the
>> memcpy_to_guest_ring function, plus the imminent merge freeze
>> deadline, and the understanding that this loop and the related data
>> structures supporting it have been tested and are working, would it be
>> acceptable to omit making this contiguous mapping change from this
>> current series?
> 
> My opinion would be to just use vmap if it works, because that IMO
> greatly simplifies the code by being able to have the whole ring
> mapped at all the time. It would remove the iteration to copy
> requests, and remove the usage of ring_map_page everywhere. That would
> be my recommendation code-wise, but as said above someone more
> familiar with the mm subsystem might have other opinions about how to
> deal with accesses to 16GB of guest memory, and indeed your iterative
> solution might be the best approach.

No-one can allocate 16GB physically contiguous memory.

Jan




Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-10 Thread Roger Pau Monné
On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark  wrote:
>
> Thanks for the review, Roger. Replies inline below.
>
> On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné  wrote:
> >
> > On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark  wrote:
> > >
> > > sendv operation is invoked to perform a synchronous send of buffers
> > > contained in iovs to a remote domain's registered ring.
> > >
> > > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > > index 59ce8c4..4548435 100644
> > > --- a/xen/common/argo.c
> > > +++ b/xen/common/argo.c
>
> > >
> > > +static int
> > > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > > + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > > + uint32_t len)
> > > +{
> > > +unsigned int mfns_index = offset >> PAGE_SHIFT;
> > > +void *dst;
> > > +int ret;
> > > +unsigned int src_offset = 0;
> > > +
> > > +ASSERT(spin_is_locked(&ring_info->lock));
> > > +
> > > +offset &= ~PAGE_MASK;
> > > +
> > > +if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > > +return -EFAULT;
> > > +
> > > +while ( (offset + len) > PAGE_SIZE )
> >
> > I think you could map the whole ring in contiguous virtual address
> > space and then writing to it would be much more easy, you wouldn't
> > need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> > You could likely map this when the ring gets setup and keep it mapped
> > for the lifetime of the ring.
>
> You're right about that, because map_domain_page_global, which the
> current code uses, uses vmap itself. I think there's a couple of
> reasons why the code has been implemented the iterative way though.
>
> The first is that I think ring resize has been a consideration: it's
> useful to be able to increase the size of a live and active ring that
> is under load without having to tear down the mappings, find a new
> virtual address region of the right size and then remap it: you can
> just supply some more memory and map those pages onto the end of the
> ring, and ensure both sides know about the new ring size. Similarly,
> shrinking a quiet ring can be useful.

Is such on-the-fly expansion common with argo?

I'm not saying it's something that shouldn't be supported, but the
burden of allowing such resizing doesn't seem trivial. You will have
to redimension a lot of the arrays used to store the pages used, and
at that point I wonder whether remapping the virtual address space is
really the biggest issue you are going to have if you allow such run
time resizing.

> However, the "gfn race" that you (correctly) pointed out in an earlier
> review, and Jan's related request to drop the "revalidate an existing
> mapping on ring reregister" motivated removal of a section of the code
> involved, and then in v3 of the series, I've actually just blocked
> ring resize because it's missing a walk through the pending
> notifications to find any that have become untriggerable with the new
> ring size when a ring is shrunk and I'd like to defer implementing
> that for now. So the ring resize reason is more of a consideration for
> a possible later version of Argo than the current one.
>
> The second reason is about avoiding exposing the Xen virtual memory
> allocator directly to frequent guest-supplied size requests for
> contiguous regions (of up to 16GB).

As said in another reply, I'm not sure allowing 16GB rings is safe.
The amount of internal memory required to track such rings is not
trivial given the arrays to store the mfns, the pages, and the virtual
mappings.

> With single-page allocations to
> build a ring, fragmentation is not a problem, and mischief by a guest
> seems difficult.

Hm, there's still a lot of big dynamic memory allocations in order to
support a 16GB ring, which makes me think that virtual address space
is not the only problem if you allow 16GB rings.

> Changing it to issue requests for contiguous regions,
> with variable ring sizes up to the maximum of 16GB, it seems like
> significant fragmentation may be achievable. I don't know the
> practical impact of that but it seems worth avoiding. Are the other
> users of __vmap (or vmap) for multi-gigabyte regions only either
> boot-time, infrequent operations (livepatch), or for actions by
> privileged (ie. somewhat trusted) domains (ioremap), or is it already
> a frequent operation somewhere else?

I haven't checked, but I would be quite surprised to find any vmap
usage with such size (16GB). Maybe someone more familiar with the mm
subsystem can provide some insight here.

> Given the context above, and Jason's simplification to the
> memcpy_to_guest_ring function, plus the imminent merge freeze
> deadline, and the understanding that this loop and the related data
> structures supporting it have been tested and are working, would it be
> acceptable to omit making this contiguous mapping change from this
> current series?

Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-09 Thread Christopher Clark
Thanks for the review, Roger. Replies inline below.

On Wed, Jan 9, 2019 at 10:57 AM Roger Pau Monné  wrote:
>
> On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark  wrote:
> >
> > sendv operation is invoked to perform a synchronous send of buffers
> > contained in iovs to a remote domain's registered ring.
> >
> > diff --git a/xen/common/argo.c b/xen/common/argo.c
> > index 59ce8c4..4548435 100644
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c

> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > + uint32_t len)
> > +{
> > +unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +void *dst;
> > +int ret;
> > +unsigned int src_offset = 0;
> > +
> > +ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +offset &= ~PAGE_MASK;
> > +
> > +if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +return -EFAULT;
> > +
> > +while ( (offset + len) > PAGE_SIZE )
>
> I think you could map the whole ring in contiguous virtual address
> space and then writing to it would be much more easy, you wouldn't
> need to iterate with memcpy or copy_from_guest, take a look at __vmap.
> You could likely map this when the ring gets setup and keep it mapped
> for the lifetime of the ring.

You're right about that, because map_domain_page_global, which the
current code uses, uses vmap itself. I think there are a couple of
reasons why the code has been implemented the iterative way though.

The first is that I think ring resize has been a consideration: it's
useful to be able to increase the size of a live and active ring that
is under load without having to tear down the mappings, find a new
virtual address region of the right size and then remap it: you can
just supply some more memory and map those pages onto the end of the
ring, and ensure both sides know about the new ring size. Similarly,
shrinking a quiet ring can be useful.
However, the "gfn race" that you (correctly) pointed out in an earlier
review, and Jan's related request to drop the "revalidate an existing
mapping on ring reregister" motivated removal of a section of the code
involved, and then in v3 of the series, I've actually just blocked
ring resize because it's missing a walk through the pending
notifications to find any that have become untriggerable with the new
ring size when a ring is shrunk and I'd like to defer implementing
that for now. So the ring resize reason is more of a consideration for
a possible later version of Argo than the current one.

The second reason is about avoiding exposing the Xen virtual memory
allocator directly to frequent guest-supplied size requests for
contiguous regions (of up to 16GB). With single-page allocations to
build a ring, fragmentation is not a problem, and mischief by a guest
seems difficult. Changing it to issue requests for contiguous regions,
with variable ring sizes up to the maximum of 16GB, it seems like
significant fragmentation may be achievable. I don't know the
practical impact of that but it seems worth avoiding. Are the other
users of __vmap (or vmap) for multi-gigabyte regions only either
boot-time, infrequent operations (livepatch), or for actions by
privileged (ie. somewhat trusted) domains (ioremap), or is it already
a frequent operation somewhere else?

Given the context above, and Jason's simplification to the
memcpy_to_guest_ring function, plus the imminent merge freeze
deadline, and the understanding that this loop and the related data
structures supporting it have been tested and are working, would it be
acceptable to omit making this contiguous mapping change from this
current series?

>
> > +{
> > +unsigned int head_len = PAGE_SIZE - offset;
> > +
> > +ret = ring_map_page(ring_info, mfns_index, &dst);
> > +if ( ret )
> > +return ret;
> > +
> > +if ( src )
> > +{
> > +memcpy(dst + offset, src + src_offset, head_len);
> > +src_offset += head_len;
> > +}
> > +else
> > +{
> > +ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> > +-EFAULT : 0;
> > +if ( ret )
> > +return ret;
>
> You can simplify this to:
>
> if ( copy_from_guest(...) )
> return -EFAULT;

yes! ack - thanks


> > +/*
> > + * get_sanitized_ring creates a modified copy of the ring pointers where
> > + * the rx_ptr is rounded up to ensure it is aligned, and then ring
> > + * wrap is handled. Simplifies safe use of the rx_ptr for available
> > + * space calculation.
> > + */
> > +static int
> > +get_sanitized_ring(xen_argo_ring_t *ring, struct argo_ring_info *ring_info)
> > +{
> > +uint32_t rx_ptr;
> > +int ret;
> > +
> > +ret = get_rx_ptr(ring_info, &rx_ptr);
> > +if ( ret )
> > +return ret;
> > +
> > +ring->tx_ptr = ring_inf

Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-09 Thread Christopher Clark
On Wed, Jan 9, 2019 at 10:05 AM Jason Andryuk  wrote:
>
> On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark  wrote:
>
> 
>
> > @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
> >  smp_wmb();
> >  }
> >
> > +static int
> > +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> > + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> > + uint32_t len)
> > +{
> > +unsigned int mfns_index = offset >> PAGE_SHIFT;
> > +void *dst;
> > +int ret;
> > +unsigned int src_offset = 0;
> > +
> > +ASSERT(spin_is_locked(&ring_info->lock));
> > +
> > +offset &= ~PAGE_MASK;
> > +
> > +if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> > +return -EFAULT;
> > +
> > +while ( (offset + len) > PAGE_SIZE )
> > +{
> > +unsigned int head_len = PAGE_SIZE - offset;
>
> I think this while loop could be re-written as
> while (len) {
> head_len = len > PAGE_SIZE ? PAGE_SIZE - offset: len;
>
> and then the extra copying below outside the loop could be dropped.
>
> The first loop does a partial copy at offset and then sets offset=0.
> The next N loops copy exactly PAGE_SIZE.
> The Final copy does the remaining len bytes.

That looks right to me and makes a nice simplification to that
function -- thanks.

> 
>
> > +
> > +/*
> > + * iov_count returns its count on success via an out variable to avoid
> > + * potential for a negative return value to be used incorrectly
> > + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> > + */
> > +static int
> > +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> > +{
> > +uint32_t sum_iov_lens = 0;
> > +
> > +if ( niov > XEN_ARGO_MAXIOV )
> > +return -EINVAL;
> > +
> > +while ( niov-- )
> > +{
> > +/* valid iovs must have the padding field set to zero */
> > +if ( piov->pad )
> > +{
> > +argo_dprintk("invalid iov: padding is not zero\n");
> > +return -EINVAL;
> > +}
> > +
> > +/* check each to protect sum against integer overflow */
> > +if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )
>
> Should this be MAX_ARGO_MESSAGE_SIZE?  MAX_ARGO_MESSAGE_SIZE is less
> than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just
> fail the one below.

ack - I'll switch it to MAX_ARGO_MESSAGE_SIZE.
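A hedged sketch of what the adjusted helper could look like after that switch (the struct layout and constant values here are illustrative stand-ins, not the actual argo.c definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the real constants are defined by Argo. */
#define MAX_ARGO_MESSAGE_SIZE 0x40000000u
#define XEN_ARGO_MAXIOV 8u

/* Hypothetical iov layout standing in for xen_argo_iov_t. */
typedef struct
{
    uint64_t iov_hnd;
    uint32_t iov_len;
    uint32_t pad;
} iov_t;

/*
 * The total is returned via an out parameter so a negative error code
 * can never be misread as a huge unsigned count, and each element is
 * bounded against the message-size limit *before* the addition, so the
 * 32-bit running sum cannot wrap.
 */
static int iov_count(const iov_t *piov, unsigned long niov, uint32_t *count)
{
    uint32_t sum = 0;

    if ( niov > XEN_ARGO_MAXIOV )
        return -22;                                   /* -EINVAL */

    for ( ; niov--; piov++ )
    {
        /* valid iovs must have the padding field set to zero */
        if ( piov->pad || piov->iov_len > MAX_ARGO_MESSAGE_SIZE )
            return -22;

        sum += piov->iov_len;   /* both operands bounded: cannot wrap */
        if ( sum > MAX_ARGO_MESSAGE_SIZE )
            return -22;
    }

    *count = sum;
    return 0;
}
```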

> 
>
> > @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >  break;
> >  }
> >
> > +case XEN_ARGO_OP_sendv:
> > +{
> > +xen_argo_send_addr_t send_addr;
> > +
> > +XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> > +guest_handle_cast(arg1, xen_argo_send_addr_t);
> > +XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> > +guest_handle_cast(arg2, xen_argo_iov_t);
> > +/* arg3 is niov */
> > +/* arg4 is message_type. Must be a 32-bit value. */
> > +
> > +rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> > +if ( rc )
> > +break;
> > +
> > +if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> > +send_addr.src.domain_id = currd->domain_id;
> > +
> > +/* No domain is currently authorized to send on behalf of another */
> > +if ( unlikely(send_addr.src.domain_id != currd->domain_id) )
> > +{
> > +rc = -EPERM;
> > +break;
> > +}
> > +
> > +/* Reject niov or message_type values that are outside 32 bit range. */
> > +if ( unlikely((arg3 > XEN_ARGO_MAXIOV) || (arg4 & ~0xffffffffUL)) )
> > +{
> > +rc = -EINVAL;
> > +break;
> > +}
>
> This needs to check send_addr.src.pad and send_addr.dst.pad == 0.
> sendv() does not check the padding either.

ack - will fix.
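A minimal sketch of the requested padding validation (the struct layout here is a hypothetical stand-in, not the real xen_argo_addr_t from the public header):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts standing in for the real Argo address types. */
typedef struct
{
    uint32_t aport;
    uint16_t domain_id;
    uint16_t pad;
} argo_addr_t;

typedef struct
{
    argo_addr_t src;
    argo_addr_t dst;
} argo_send_addr_t;

/*
 * Reject any send address whose padding field is nonzero, so the unused
 * bits stay available for future interface extension.
 */
static int validate_send_addr(const argo_send_addr_t *addr)
{
    return (addr->src.pad || addr->dst.pad) ? -22 /* -EINVAL */ : 0;
}
```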

thanks

Christopher


Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-09 Thread Roger Pau Monné
On Mon, Jan 7, 2019 at 8:44 AM Christopher Clark  wrote:
>
> sendv operation is invoked to perform a synchronous send of buffers
> contained in iovs to a remote domain's registered ring.
>
> It takes:
>  * A destination address (domid, port) for the ring to send to.
>It performs a most-specific match lookup, to allow for wildcard.
>  * A source address, used to inform the destination of where to reply.
>  * The address of an array of iovs containing the data to send
>  * .. and the length of that array of iovs
>  * and a 32-bit message type, available to communicate message context
>data (eg. kernel-to-kernel, separate from the application data).
>
> If insufficient space exists in the destination ring, it will return
> -EAGAIN and Xen will notify the caller when sufficient space becomes
> available.
>
> Accesses to the ring indices are appropriately atomic. The rings are
> mapped into Xen's private address space to write as needed and the
> mappings are retained for later use.
>
> Fixed-size types are used in some areas within this code where caution
> around avoiding integer overflow is important.
>
> Notifications are sent to guests via VIRQ and send_guest_global_virq is
> exposed in the change to enable argo to call it. VIRQ_ARGO_MESSAGE is
> claimed from the VIRQ previously reserved for this purpose (#11).
>
> The VIRQ notification method is used rather than sending events using
> evtchn functions directly because:
>
> * no current event channel type is an exact fit for the intended
>   behaviour. ECS_IPI is closest, but it disallows migration to
>   other VCPUs which is not necessarily a requirement for Argo.
>
> * at the point of argo_init, allocation of an event channel is
>   complicated by none of the guest VCPUs being initialized yet
>   and the event channel logic expects that a valid event channel
>   has a present VCPU.
>
> * at the point of signalling a notification, the VIRQ logic is already
>   defensive: if d->vcpu[0] is NULL, the notification is just silently
>   dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
>   must not be NULL, otherwise a null pointer dereference occurs.
>
> Using a VIRQ removes the need for the guest to query to determine which
> event channel notifications will be delivered on. This is also likely to
> simplify establishing future L0/L1 nested hypervisor argo communication.
>
> Signed-off-by: Christopher Clark 
> ---
> The previous double-read of iovs from guest memory has been removed.
>
> v2 self: use ring_info backpointer in pending_ent to maintain npending
> v2 feedback Jan: drop cookie, implement teardown
> v2 self: pending_queue: reap stale ents when in need of space
> v2 self: pending_requeue: reclaim ents for stale domains
> v2 feedback Jan: only override sender domid if DOMID_ANY
> v2 feedback Jan: drop message from argo_message_op
> v2 self: check npending vs maximum limit
> v2 self: get_sanitized_ring instead of get_rx_ptr
> v2 feedback v1#13 Jan: remove double read from ringbuf insert, lower MAX_IOV
> v2 self: make iov_count const
> v2 self: iov_count : return EMSGSIZE for message too big
> v2 self: OVERHAUL
> v2 self: s/argo_pending_ent/pending_ent/g
> v2 feedback v1#13 Roger: use OS-supplied roundup; drop from public header
> v1,2 feedback Jan/Roger/Paul: drop errno returning guest access functions
> v1 feedback Roger, Jan: drop argo prefix on static functions
> v1 feedback #13 Jan: drop guest_handle_okay when using copy_from_guest
> - reorder do_argo_op logic
> v2 self: add _hnd suffix to iovs variable name to indicate guest handle type
> v2 self: replace use of XEN_GUEST_HANDLE_NULL with two existing macros
>
> v1 #15 feedback, Jan: sendv op : s/ECONNREFUSED/ESRCH/
> v1 #5 (#15) feedback Paul: sendv: use currd in do_argo_message_op
> v1 #13 (#15) feedback Paul: sendv op: do/while reindent only
> v1 #13 (#15) feedback Paul: sendv op: do/while: argo_ringbuf_insert to goto style
> v1 #13 (#15) feedback Paul: sendv op: do/while: reindent only again
> v1 #13 (#15) feedback Paul: sendv op: do/while : goto
> v1 #15 feedback Paul: sendv op: make page var: unsigned
> v1 #15 feedback Paul: sendv op: new local var for PAGE_SIZE - offset
> v1 #8 feedback Jan: XEN_GUEST_HANDLE : C89 compliance
> v1 rebase after switching register op from pfns to page descriptors
> v1 self: move iov DEFINE_XEN_GUEST_HANDLE out of public header into argo.c
> v1 #13 (#15) feedback Paul: fix loglevel for guest-triggered messages
> v1 : add compat xlat.lst entries
> v1 self: switched notification to send_guest_global_virq instead of event
> v1: fix gprintk use for ARM as its defn dislikes split format strings
> v1: init len variable to satisfy ARM compiler initialized checking
> v1 #13 feedback Jan: rename page var
> v1:#14 feedback Jan: uint8_t* -> void*
> v1: #13 feedback Jan: public namespace: prefix with xen
> v1: #13 feedback Jan: blank line after case op in do_argo_message_op
> v1: #15 feedback Jan: add comments explaining why the writes 

Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq

2019-01-09 Thread Jason Andryuk
On Mon, Jan 7, 2019 at 2:43 AM Christopher Clark wrote:



> @@ -342,6 +357,413 @@ update_tx_ptr(struct argo_ring_info *ring_info, uint32_t tx_ptr)
>  smp_wmb();
>  }
>
> +static int
> +memcpy_to_guest_ring(struct argo_ring_info *ring_info, uint32_t offset,
> + const void *src, XEN_GUEST_HANDLE(uint8_t) src_hnd,
> + uint32_t len)
> +{
> +unsigned int mfns_index = offset >> PAGE_SHIFT;
> +void *dst;
> +int ret;
> +unsigned int src_offset = 0;
> +
> +ASSERT(spin_is_locked(&ring_info->lock));
> +
> +offset &= ~PAGE_MASK;
> +
> +if ( (len > XEN_ARGO_MAX_RING_SIZE) || (offset > XEN_ARGO_MAX_RING_SIZE) )
> +return -EFAULT;
> +
> +while ( (offset + len) > PAGE_SIZE )
> +{
> +unsigned int head_len = PAGE_SIZE - offset;

I think this while loop could be re-written as
while (len) {
head_len = len > PAGE_SIZE ? PAGE_SIZE - offset : len;

and then the extra copying below outside the loop could be dropped.

The first loop does a partial copy at offset and then sets offset=0.
The next N loops copy exactly PAGE_SIZE.
The Final copy does the remaining len bytes.
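The unified loop described here can be sketched in isolation. The key observation is that the per-iteration chunk length is the smaller of the bytes still outstanding and the room left in the current page; the names below are illustrative, not taken from the patch:

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096u   /* stand-in for Xen's PAGE_SIZE */

/* Bytes to copy this iteration: the room left in the current page,
 * capped at the bytes still outstanding. */
static unsigned int chunk_len(unsigned int offset, unsigned int len)
{
    unsigned int room = SIM_PAGE_SIZE - offset;

    return len < room ? len : room;
}

/* Number of map-and-copy iterations the unified loop would perform. */
static unsigned int num_chunks(unsigned int offset, unsigned int len)
{
    unsigned int n = 0;

    while ( len )
    {
        unsigned int head_len = chunk_len(offset, len);

        len -= head_len;
        offset = 0;   /* every page after the first is copied from its start */
        ++n;
    }
    return n;
}
```

With this shape the first partial page, any number of full pages, and the final tail all fall out of the same loop body, so no separate copy step is needed after the loop.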

> +
> +ret = ring_map_page(ring_info, mfns_index, &dst);
> +if ( ret )
> +return ret;
> +
> +if ( src )
> +{
> +memcpy(dst + offset, src + src_offset, head_len);
> +src_offset += head_len;
> +}
> +else
> +{
> +ret = copy_from_guest(dst + offset, src_hnd, head_len) ?
> +-EFAULT : 0;
> +if ( ret )
> +return ret;
> +
> +guest_handle_add_offset(src_hnd, head_len);
> +}
> +
> +mfns_index++;
> +len -= head_len;
> +offset = 0;
> +}
> +
> +ret = ring_map_page(ring_info, mfns_index, &dst);
> +if ( ret )
> +{
> +argo_dprintk("argo: ring (vm%u:%x vm%d) %p attempted to map page"
> + " %d of %d\n", ring_info->id.domain_id, ring_info->id.port,
> + ring_info->id.partner_id, ring_info, mfns_index,
> + ring_info->nmfns);
> +return ret;
> +}
> +
> +if ( src )
> +memcpy(dst + offset, src + src_offset, len);
> +else
> +ret = copy_from_guest(dst + offset, src_hnd, len) ? -EFAULT : 0;
> +
> +return ret;
> +}



> +
> +/*
> + * iov_count returns its count on success via an out variable to avoid
> + * potential for a negative return value to be used incorrectly
> + * (eg. coerced into an unsigned variable resulting in a large incorrect value)
> + */
> +static int
> +iov_count(const xen_argo_iov_t *piov, unsigned long niov, uint32_t *count)
> +{
> +uint32_t sum_iov_lens = 0;
> +
> +if ( niov > XEN_ARGO_MAXIOV )
> +return -EINVAL;
> +
> +while ( niov-- )
> +{
> +/* valid iovs must have the padding field set to zero */
> +if ( piov->pad )
> +{
> +argo_dprintk("invalid iov: padding is not zero\n");
> +return -EINVAL;
> +}
> +
> +/* check each to protect sum against integer overflow */
> +if ( piov->iov_len > XEN_ARGO_MAX_RING_SIZE )

Should this be MAX_ARGO_MESSAGE_SIZE?  MAX_ARGO_MESSAGE_SIZE is less
than XEN_ARGO_MAX_RING_SIZE, so we can pass this check and then just
fail the one below.

> +{
> +argo_dprintk("invalid iov_len: too big (%u)>%llu\n",
> + piov->iov_len, XEN_ARGO_MAX_RING_SIZE);
> +return -EINVAL;
> +}
> +
> +sum_iov_lens += piov->iov_len;
> +
> +/*
> + * Again protect sum from integer overflow
> + * and ensure total msg size will be within bounds.
> + */
> +if ( sum_iov_lens > MAX_ARGO_MESSAGE_SIZE )
> +{
> +argo_dprintk("invalid iov series: total message too big\n");
> +return -EMSGSIZE;
> +}
> +
> +piov++;
> +}
> +
> +*count = sum_iov_lens;
> +
> +return 0;
> +}
> +
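The overflow-safety pattern in the quoted iov_count — bound each addend before adding, then bound the running sum after each addition — can be exercised in isolation with a small sketch. Constants and names here are invented for illustration, and the per-element bound follows the suggestion above of using the message-size limit:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SIM_MAXIOV 8u                 /* hypothetical iov-count limit */
#define SIM_MAX_MSG_SIZE (1u << 20)   /* hypothetical message-size limit */

/*
 * Overflow-safe summation of element lengths. Each addend is capped at
 * SIM_MAX_MSG_SIZE before it is added, so the running 32-bit sum can
 * exceed the limit by at most one addend and cannot wrap around before
 * the post-addition check rejects it.
 */
static int sum_lens(const uint32_t *lens, unsigned long n, uint32_t *count)
{
    uint32_t sum = 0;

    if ( n > SIM_MAXIOV )
        return -EINVAL;

    while ( n-- )
    {
        if ( *lens > SIM_MAX_MSG_SIZE )
            return -EINVAL;

        sum += *lens++;

        if ( sum > SIM_MAX_MSG_SIZE )
            return -EMSGSIZE;
    }

    *count = sum;
    return 0;
}
```

The result is reported through an out parameter, mirroring the quoted code's rationale: a negative return value can never be mistaken for a large unsigned length.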



> @@ -1073,6 +1683,49 @@ do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
>  break;
>  }
>
> +case XEN_ARGO_OP_sendv:
> +{
> +xen_argo_send_addr_t send_addr;
> +
> +XEN_GUEST_HANDLE_PARAM(xen_argo_send_addr_t) send_addr_hnd =
> +guest_handle_cast(arg1, xen_argo_send_addr_t);
> +XEN_GUEST_HANDLE_PARAM(xen_argo_iov_t) iovs_hnd =
> +guest_handle_cast(arg2, xen_argo_iov_t);
> +/* arg3 is niov */
> +/* arg4 is message_type. Must be a 32-bit value. */
> +
> +rc = copy_from_guest(&send_addr, send_addr_hnd, 1) ? -EFAULT : 0;
> +if ( rc )
> +break;
> +
> +if ( send_addr.src.domain_id == XEN_ARGO_DOMID_ANY )
> +send_addr.src.domain_id = currd->domain_id;
> +
> +/* No domain is currently authorized to send on behalf of another */
> +if ( unlikely(send_addr.src.domain_id