Re: [Xen-devel] [PATCH v2 11/18] argo: implement the register op

2018-12-21 Thread Julien Grall

Hi Christopher,

On 21/12/2018 01:17, Christopher Clark wrote:

On Thu, Dec 20, 2018 at 3:20 AM Julien Grall  wrote:


Hi Christopher,

On 12/20/18 6:39 AM, Christopher Clark wrote:

Used by a domain to register a region of memory for receiving messages from
either a specified other domain, or, if specifying a wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

In this code, the p2m type of the memory supplied by the guest for the ring
must be p2m_ram_rw, which is a conservative choice made to defer the need to
reason about the other p2m types with this commit.

xen_argo_page_descr_t type is introduced as a page descriptor, to convey
both the physical address of the start of the page and its granularity. The
smallest granularity page is assumed to be 4096 bytes and the lower twelve
bits of the type are used to indicate an enumerated page size.


I haven't seen any reply from you on my concern with this approach (see
[1]).

For convenience, I will duplicate the message here.


Hi Julien,

Thanks for the reminder.


If you let the user choose the granularity, then, I believe, you
will prevent the hypervisor from doing some optimizations.


OK, let's work through this then.


For instance, if the guest supplies only 4KB pages but the hypervisor uses
64KB pages, there is no way to easily map them contiguously in the hypervisor
(e.g. using vmap).


Right. So with the matrix:

4K guest, 4K xen : fine.
4K guest, 64K xen : contiguous guest physical chunks or region required.
64K guest, 4K xen : weird? seems doable.


It is not weird; 64KB splits nicely into 16 4KB chunks. Actually, Linux upstream
has all the support needed to run with 64KB pages on current Xen.



64K guest, 64K xen : fine (with some work).

as you note, the 4K guest, 64K hypervisor case is the one that
raises the question.


That's correct. To generalize: the problem will happen whenever the
guest page size is smaller than the Xen page size.




Is there a particular reason to allow the ring buffer to be
non-contiguous in the guest physical address?


It hasn't been a necessary restriction up to this point, and isn't
so on the platforms we're deploying on, so my preference is not to
introduce it as an additional requirement if it can be avoided. It
allows us to use vmalloc (rather than kmalloc) on Linux, which is
helpful.


vmalloc might be an issue on Arm if we request 64KB chunks of physical memory,
although I don't know the vmalloc implementation well enough to say whether this
can be addressed.




There can be high turnover in ring registration for a server with
many short-lived connections. While the rings are not necessarily
large -- the default is 128K in the current Linux driver, though
clients can change what they use -- contiguous memory regions are a
more limited resource for the kernel to manage, and avoiding
pressure on that contiguous region allocator when it isn't necessary
is preferable.

We also do not want to disincentivize a server that is seeking to
improve performance from registering larger rings -- so allowing
non-contiguous regions fits with that.

I'd have to study the Linux driver further to say whether there
are stronger additional requirements that I'm not currently aware
of, but I don't know of any at the moment.


Thank you for the detailed explanation. So I think my option 1) below would suit
you best here.





Depending on the answer, there are different ways to handle that:
1) Request the guest to allocate memory in 64KB (on Arm) chunks and
pass the base address of each chunk
2) Request the guest to allocate the buffer contiguously and pass the
base address and size


I understand that #2 would avoid the need to describe a contiguous
allocation of memory as a series of chunks; but I think #1 is the
option I would select. Do you think that would be acceptable?


1) is a good option for me. I forgot to mention the base address would need to 
be aligned to 64KB.
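
As a minimal sketch of what option 1) implies on the guest side (the constant
names and the allocation helper below are illustrative only, not taken from
this series): the ring is described as an array of 64KB-aligned chunk base
addresses, and the chunks themselves need not be contiguous with one another.

    #define CHUNK_SIZE   (64u << 10)                /* 64KB allocation unit (Arm) */
    #define RING_SIZE    (128u << 10)               /* example ring size          */
    #define NUM_CHUNKS   (RING_SIZE / CHUNK_SIZE)   /* 2 chunks for a 128KB ring  */

    uint64_t chunk_base[NUM_CHUNKS];

    for ( unsigned int i = 0; i < NUM_CHUNKS; i++ )
        /* each chunk: physically contiguous and 64KB-aligned */
        chunk_base[i] = alloc_aligned_chunk(CHUNK_SIZE);   /* hypothetical helper */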


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 11/18] argo: implement the register op

2018-12-20 Thread Christopher Clark
On Thu, Dec 20, 2018 at 3:20 AM Julien Grall  wrote:
>
> Hi Christopher,
>
> On 12/20/18 6:39 AM, Christopher Clark wrote:
> > Used by a domain to register a region of memory for receiving messages from
> > either a specified other domain, or, if specifying a wildcard, any domain.
> >
> > This operation creates a mapping within Xen's private address space that
> > will remain resident for the lifetime of the ring. In subsequent commits,
> > the hypervisor will use this mapping to copy data from a sending domain into
> > this registered ring, making it accessible to the domain that registered the
> > ring to receive data.
> >
> > In this code, the p2m type of the memory supplied by the guest for the ring
> > must be p2m_ram_rw, which is a conservative choice made to defer the need to
> > reason about the other p2m types with this commit.
> >
> > xen_argo_page_descr_t type is introduced as a page descriptor, to convey
> > both the physical address of the start of the page and its granularity. The
> > smallest granularity page is assumed to be 4096 bytes and the lower twelve
> > bits of the type are used to indicate an enumerated page size.
>
> I haven't seen any reply from you on my concern with this approach (see
> [1]).
>
> For convenience, I will duplicate the message here.

Hi Julien,

Thanks for the reminder.

> If you let the user choose the granularity, then, I believe, you
> will prevent the hypervisor from doing some optimizations.

OK, let's work through this then.

> For instance, if the guest supplies only 4KB pages but the hypervisor uses
> 64KB pages, there is no way to easily map them contiguously in the hypervisor
> (e.g. using vmap).

Right. So with the matrix:

4K guest, 4K xen : fine.
4K guest, 64K xen : contiguous guest physical chunks or region required.
64K guest, 4K xen : weird? seems doable.
64K guest, 64K xen : fine (with some work).

as you note, the 4K guest, 64K hypervisor case is the one that
raises the question.

> Is there a particular reason to allow the ring buffer to be
> non-contiguous in the guest physical address?

It hasn't been a necessary restriction up to this point, and isn't
so on the platforms we're deploying on, so my preference is not to
introduce it as an additional requirement if it can be avoided. It
allows us to use vmalloc (rather than kmalloc) on Linux, which is
helpful.

There can be high turnover in ring registration for a server with
many short-lived connections. While the rings are not necessarily
large -- the default is 128K in the current Linux driver, though
clients can change what they use -- contiguous memory regions are a
more limited resource for the kernel to manage, and avoiding
pressure on that contiguous region allocator when it isn't necessary
is preferable.

We also do not want to disincentivize a server that is seeking to
improve performance from registering larger rings -- so allowing
non-contiguous regions fits with that.

I'd have to study the Linux driver further to say whether there
are stronger additional requirements that I'm not currently aware
of, but I don't know of any at the moment.

> Depending on the answer, there are different ways to handle that:
> 1) Request the guest to allocate memory in 64KB (on Arm) chunks and
> pass the base address of each chunk
> 2) Request the guest to allocate the buffer contiguously and pass the
> base address and size

I understand that #2 would avoid the need to describe a contiguous
allocation of memory as a series of chunks; but I think #1 is the
option I would select. Do you think that would be acceptable?

Thanks again for your interest in this stuff.

Christopher


Re: [Xen-devel] [PATCH v2 11/18] argo: implement the register op

2018-12-20 Thread Julien Grall

Hi Christopher,

On 12/20/18 6:39 AM, Christopher Clark wrote:

Used by a domain to register a region of memory for receiving messages from
either a specified other domain, or, if specifying a wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

In this code, the p2m type of the memory supplied by the guest for the ring
must be p2m_ram_rw, which is a conservative choice made to defer the need to
reason about the other p2m types with this commit.

xen_argo_page_descr_t type is introduced as a page descriptor, to convey
both the physical address of the start of the page and its granularity. The
smallest granularity page is assumed to be 4096 bytes and the lower twelve
bits of the type are used to indicate an enumerated page size.


I haven't seen any reply from you on my concern with this approach (see 
[1]).


For convenience, I will duplicate the message here.

If you let the user choose the granularity, then, I believe, you
will prevent the hypervisor from doing some optimizations.


For instance, if the guest supplies only 4KB pages but the hypervisor uses
64KB pages, there is no way to easily map them contiguously in the hypervisor
(e.g. using vmap).


Is there a particular reason to allow the ring buffer to be 
non-contiguous in the guest physical address?


Depending on the answer, there are different ways to handle that:
1) Request the guest to allocate memory in 64KB (on Arm) chunks and
pass the base address of each chunk
2) Request the guest to allocate the buffer contiguously and pass the
base address and size


Cheers,

[1] https://lists.xen.org/archives/html/xen-devel/2018-12/msg01038.html

--
Julien Grall


[Xen-devel] [PATCH v2 11/18] argo: implement the register op

2018-12-19 Thread Christopher Clark
Used by a domain to register a region of memory for receiving messages from
either a specified other domain, or, if specifying a wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

In this code, the p2m type of the memory supplied by the guest for the ring
must be p2m_ram_rw, which is a conservative choice made to defer the need to
reason about the other p2m types with this commit.

xen_argo_page_descr_t type is introduced as a page descriptor, to convey
both the physical address of the start of the page and its granularity. The
smallest granularity page is assumed to be 4096 bytes and the lower twelve
bits of the type are used to indicate an enumerated page size.
xen_argo_pfn_t type is introduced here to create a pfn_t type that is 64-bit
on all architectures, to help avoid the need to add a compat ABI.
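
As an illustration of the descriptor encoding described above (a sketch only;
the exact constant names and size-enumeration values are defined in the public
header and may differ):

    /* bits 12+ : physical address of the start of the page (4K-aligned minimum)
     * bits 0-11: enumerated page granularity                                   */
    #define ARGO_PAGE_DESCR_SIZE_MASK  0xfffULL
    #define ARGO_PAGE_DESCR_SIZE_4K    0x0ULL   /* illustrative enumeration values */
    #define ARGO_PAGE_DESCR_SIZE_64K   0x1ULL

    /* hypothetical helper: build a descriptor from a page address + size code */
    static inline uint64_t argo_page_descr(uint64_t paddr, uint64_t size_code)
    {
        return (paddr & ~ARGO_PAGE_DESCR_SIZE_MASK) |
               (size_code & ARGO_PAGE_DESCR_SIZE_MASK);
    }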

copy_field_to_guest_errno is added for guest access, performing the same
operation as copy_field_to_guest, but returning -EFAULT if the copy is
incomplete. Added to common code to simplify code at call sites.
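
A sketch of the intended semantics, assuming the existing copy_field_to_guest()
which returns the number of bytes not copied (the definition added by the patch
may differ in detail):

    #define copy_field_to_guest_errno(hnd, ptr, field)               \
        ( copy_field_to_guest((hnd), (ptr), field) ? -EFAULT : 0 )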

Uses array_index_nospec to guard the result of the hash function.
This is out of an abundance of caution, since this is a very basic hash
function, chosen more for its bucket distribution properties (clustering
related rings) than for cryptographic strength or uniformity of
output, and it operates upon values supplied by the guest just before they are
used as an array index.
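
The pattern in question looks roughly like the following (identifiers other
than array_index_nospec are illustrative, not necessarily those used in argo.c):

    /* hash of guest-supplied ring id values selects a bucket ...             */
    unsigned int bucket = hash_index(&ring_id);
    /* ... clamped so a speculated out-of-bounds index cannot be dereferenced */
    bucket = array_index_nospec(bucket, ARGO_HTABLE_SIZE);
    ring_info = find_ring_info(&d->argo->ring_hash[bucket], &ring_id);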

Signed-off-by: Christopher Clark 
---
Changes since v1:

v1 #13 feedback, Jan: register op : s/ECONNREFUSED/ESRCH/
v1 #5 (#13) feedback Paul: register op: use currd in do_message_op
v1 #13 feedback, Paul: register op: use mfn_eq comparator
v1 #5 (#13) feedback Paul: register op: use currd in argo_register_ring
v1 #13 feedback Paul: register op: whitespace, unsigned, bounds check
v1 #13 feedback Paul: use of hex in limit constant definition
v1 #13 feedback Paul, register op: set nmfns on loop termination
v1 #13 feedback Paul: register op: do/while -> gotos, reindent
v1 argo_ring_map_page: drop uint32_t for unsigned int
v1. #13 feedback Julien: use page descriptors instead of gpfns.
   - adds ABI support for pages with different granularity.
v1 feedback #13, Paul: adjust log level of message
v1 feedback #13, Paul: use gprintk for guest-triggered warning
v1 feedback #13, Paul: gprintk and XENLOG_DEBUG for ring registration
v1 feedback #13, Paul: use gprintk for errs in argo_ring_map_page
v1 feedback #13, Paul: use ENOMEM if global mapping fails
v1 feedback Paul: overflow check before shift
v1: add define for copy_field_to_guest_errno
v1: fix gprintk use for ARM as its defn dislikes split format strings
v1: use copy_field_to_guest_errno
v1 feedback #13, Jan: argo_hash_fn: no inline, rename, change type
v1 feedback #13, Paul, Jan: EFAULT -> ENOMEM in argo_ring_map_page
v1 feedback #13, Jan: rename page var in argo_ring_map_page
v1 feedback #13, Jan: switch uint8_t* to void* and drop cast
v1 feedback #13, Jan: switch memory barrier to smp_wmb
v1 feedback #13, Jan: make 'ring' comment comply with single-line style
v1 feedback #13, Jan: use xzalloc_array, drop loop NULL init
v1 feedback #13, Jan: init bool with false rather than 0
v1 feedback #13 Jan: use __copy; define and use __copy_field_to_guest_errno
v1 feedback #13, Jan: use xzalloc, drop individual init zeroes
v1 feedback #13, Jan: prefix public namespace with xen
v1 feedback #13, Jan: blank line after op case in do_argo_message_op
v1 self: reflow comment in argo_ring_map_page to within 80 char len
v1 feedback #13, Roger: use true not 1 in assign to update_tx_ptr bool
v1 feedback #21, Jan: fold in the array_index_nospec hash function guards
v1 feedback #18, Jan: fold the max ring count limit into the series
v1 self: use unsigned long type for XEN_ARGO_REGISTER_FLAG_MASK
v1: feedback #15 Jan: handle upper-halves of hypercall args
v1. feedback #13 Jan: add comment re: page alignment
v1. self: confirm ring magic presence in supplied page array
v1. feedback #13 Jan: add comment re: minimum ring size
v1. feedback #13 Roger: use ASSERT_UNREACHABLE
v1. feedback Roger: add comment to hash function

 xen/common/argo.c  | 621 +
 xen/include/asm-arm/guest_access.h |  12 +
 xen/include/asm-x86/guest_access.h |  12 +
 xen/include/public/argo.h  |  71 +
 4 files changed, 716 insertions(+)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index abfc1f0..81f8341 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -23,14 +23,19 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #define ARGO_MAX_RINGS_PER_DOMAIN   128U
 
+DEFINE_XEN_GUEST_HANDLE(xen_argo_page_descr_t);