Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-02-10 Thread Konrad Rzeszutek Wilk
On Fri, Feb 10, 2017 at 12:09:36PM -0800, Stefano Stabellini wrote:
> On Fri, 10 Feb 2017, Konrad Rzeszutek Wilk wrote:
> > .snip..
> > > > > Request fields:
> > > > > 
> > > > > - **cmd** value: 0
> > > > > - additional fields:
> > > > >   - **id**: identifies the socket
> > > > >   - **addr**: address to connect to, see [Socket families and address 
> > > > > format]
> > > > 
> > > > 
> > > > Hm, so what do we do if we want to support AF_UNIX which has an addr of
> > > > 108 bytes?
> > > 
> > > We write a protocol extension and bump the protocol version. However, we
> > 
> > Right. How would you change the protocol for this?
> > 
> > I'm not asking to have this in this protocol, but I just want us to think
> > about what we could do so that if somebody were to implement this - how
> > could we make this easier for them?
> > 
> > My initial thought was to spread the request over two "old" structures.
> > And if so .. would it make sense to include an extra flag or such?
> 
> That's a possibility, but I don't think we need an extra flag. It would
> be easier to introduce a new command, such as PVCALLS_CONNECT_EXTENDED
> or PVCALLS_CONNECT_V2, with the appropriate flags to say that it will
> make use of two request slots instead of one.

Fair enough. Perhaps include a section in the document about how one
could expand the protocol and include this? That would make it easier
for folks to follow a 'paved' way?


> 
> 
> > > could make the addr array size larger now to be more future proof, but
> > > it takes up memory and I have no use for it, given that we can use
> > > loopback for the same purpose.
> > > 
> > 
> > ..snip..
> > > > >  Indexes Page Structure
> > > > > 
> > > > > typedef uint32_t PVCALLS_RING_IDX;
> > > > > 
> > > > > struct pvcalls_data_intf {
> > > > >   PVCALLS_RING_IDX in_cons, in_prod;
> > > > >   int32_t in_error;
> > > > 
> > > > You don't want to perhaps include in_event?
> > > > > 
> > > > >   uint8_t pad[52];
> > > > > 
> > > > >   PVCALLS_RING_IDX out_cons, out_prod;
> > > > >   int32_t out_error;
> > > > 
> > > > And out_event as way to do some form of interrupt mitigation
> > > > (similar to what you had proposed?)
> > > 
> > > Yes, the in_event / out_event optimization that I wrote for the 9pfs
> > > protocol could work here too. However, I thought you preferred to remove
> > > it for now as it is not required and increases complexity?
> > 
> > I did. But I am coming back to it after looking at the ring.h header.
> > 
> > My recollection was that your optimization was a bit different than
> > what ring.h has.
> 
> Right. They are similar, but different because in this protocol we have
> two rings: the `in' ring and the `out' ring. Each ring is
> mono-directional and there is no static request size: the producer
> writes opaque data to the ring. In ring.h they are combined together and
> the request size is static and well-known. In PVCalls:
> 
> in -> backend to frontend only
> out-> frontend to backend only
> 
> Let's talk about the `in' ring, where the frontend is the consumer
> and the backend is the producer. Everything is the same but mirrored for
> the `out' ring.
> 
> The producer doesn't need any notifications unless the ring is full.
> The producer, the backend in this case, never reads from the `in' ring.
> Thus, I disabled notifications to the producer by default and added an
> in_event field for the producer to ask for notifications only when
> necessary, that is when the ring is full.
> 
> On the other end, the consumer always requires notifications, unless it
> is already actively reading from the ring. The producer can figure that
> out without any additional fields in the protocol: it simply compares
> the indexes at the beginning and at the end of the function, similar to
> what the ring.h protocol does.

I like your description! Could you include this in a section
titled 'Why ring.h macros are not needed', please?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-02-10 Thread Stefano Stabellini
On Fri, 10 Feb 2017, Konrad Rzeszutek Wilk wrote:
> .snip..
> > > > Request fields:
> > > > 
> > > > - **cmd** value: 0
> > > > - additional fields:
> > > >   - **id**: identifies the socket
> > > >   - **addr**: address to connect to, see [Socket families and address 
> > > > format]
> > > 
> > > 
> > > Hm, so what do we do if we want to support AF_UNIX which has an addr of
> > > 108 bytes?
> > 
> > We write a protocol extension and bump the protocol version. However, we
> 
> Right. How would you change the protocol for this?
> 
> I'm not asking to have this in this protocol, but I just want us to think
> about what we could do so that if somebody were to implement this - how
> could we make this easier for them?
> 
> My initial thought was to spread the request over two "old" structures.
> And if so .. would it make sense to include an extra flag or such?

That's a possibility, but I don't think we need an extra flag. It would
be easier to introduce a new command, such as PVCALLS_CONNECT_EXTENDED
or PVCALLS_CONNECT_V2, with the appropriate flags to say that it will
make use of two request slots instead of one.


> > could make the addr array size larger now to be more future proof, but
> > it takes up memory and I have no use for it, given that we can use
> > loopback for the same purpose.
> > 
> 
> ..snip..
> > > >  Indexes Page Structure
> > > > 
> > > > typedef uint32_t PVCALLS_RING_IDX;
> > > > 
> > > > struct pvcalls_data_intf {
> > > > PVCALLS_RING_IDX in_cons, in_prod;
> > > > int32_t in_error;
> > > 
> > > You don't want to perhaps include in_event?
> > > > 
> > > > uint8_t pad[52];
> > > > 
> > > > PVCALLS_RING_IDX out_cons, out_prod;
> > > > int32_t out_error;
> > > 
> > > And out_event as way to do some form of interrupt mitigation
> > > (similar to what you had proposed?)
> > 
> > Yes, the in_event / out_event optimization that I wrote for the 9pfs
> > protocol could work here too. However, I thought you preferred to remove
> > it for now as it is not required and increases complexity?
> 
> I did. But I am coming back to it after looking at the ring.h header.
> 
> My recollection was that your optimization was a bit different than
> what ring.h has.

Right. They are similar, but different because in this protocol we have
two rings: the `in' ring and the `out' ring. Each ring is
mono-directional and there is no static request size: the producer
writes opaque data to the ring. In ring.h they are combined together and
the request size is static and well-known. In PVCalls:

in -> backend to frontend only
out-> frontend to backend only

Let's talk about the `in' ring, where the frontend is the consumer
and the backend is the producer. Everything is the same but mirrored for
the `out' ring.

The producer doesn't need any notifications unless the ring is full.
The producer, the backend in this case, never reads from the `in' ring.
Thus, I disabled notifications to the producer by default and added an
in_event field for the producer to ask for notifications only when
necessary, that is when the ring is full.

On the other end, the consumer always requires notifications, unless it
is already actively reading from the ring. The producer can figure that
out without any additional fields in the protocol: it simply compares
the indexes at the beginning and at the end of the function, similar to
what the ring.h protocol does.

 
> > We could always add it later, if we reserved some padding here for it.
> > Something like:
> > 
> >struct pvcalls_data_intf {
> > PVCALLS_RING_IDX in_cons, in_prod;
> > int32_t in_error;
> > 
> > uint8_t pad[52];
> > 
> > PVCALLS_RING_IDX out_cons, out_prod;
> > int32_t out_error;
> > 
> > uint8_t pad[52]; <--- this is new
> > 
> > uint32_t ring_order;
> > grant_ref_t ref[];
> >};
> > 
> > We have plenty of space for the grant refs anyway. This way, we can
> > introduce in_event and out_event by eating up 4 bytes from each pad
> > array.
> 
> That is true.

I think it makes sense to start simple. The optimization could be a
decent first feature flag :-)


> > > > 
> > > > uint32_t ring_order;
> > > > grant_ref_t ref[];
> > > > };
> > > > 
> > > > /* not actually C compliant (ring_order changes from socket to 
> > > > socket) */
> > > > struct pvcalls_data {
> > > > > char in[((1<<ring_order)<<PAGE_SHIFT)/2];
> > > > > char out[((1<<ring_order)<<PAGE_SHIFT)/2];
> > > > > };
> > > > 
> > > > - **ring_order**
> > > >   It represents the order of the data ring. The following list of grant
> > > >   references is of `(1 << ring_order)` elements. It cannot be greater 
> > > > than
> > > >   **max-page-order**, as specified by the backend on XenBus. It has to
> > > >   be one at minimum.
> > > 
> > > Oh? Why not zero? (4KB) as the 'max-page-order' has an example of zero 
> > > order?
> > > Perhaps if it MUST be one or more then the 'max-page-order' should say
> > > that at least it MUST be one?

Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-02-10 Thread Konrad Rzeszutek Wilk
.snip..
> > > Request fields:
> > > 
> > > - **cmd** value: 0
> > > - additional fields:
> > >   - **id**: identifies the socket
> > >   - **addr**: address to connect to, see [Socket families and address 
> > > format]
> > 
> > 
> > Hm, so what do we do if we want to support AF_UNIX which has an addr of
> > 108 bytes?
> 
> We write a protocol extension and bump the protocol version. However, we

Right. How would you change the protocol for this?

I'm not asking to have this in this protocol, but I just want us to think
about what we could do so that if somebody were to implement this - how
could we make this easier for them?

My initial thought was to spread the request over two "old" structures.
And if so .. would it make sense to include an extra flag or such?

> could make the addr array size larger now to be more future proof, but
> it takes up memory and I have no use for it, given that we can use
> loopback for the same purpose.
> 

..snip..
> > >  Indexes Page Structure
> > > 
> > > typedef uint32_t PVCALLS_RING_IDX;
> > > 
> > > struct pvcalls_data_intf {
> > >   PVCALLS_RING_IDX in_cons, in_prod;
> > >   int32_t in_error;
> > 
> > You don't want to perhaps include in_event?
> > > 
> > >   uint8_t pad[52];
> > > 
> > >   PVCALLS_RING_IDX out_cons, out_prod;
> > >   int32_t out_error;
> > 
> > And out_event as way to do some form of interrupt mitigation
> > (similar to what you had proposed?)
> 
> Yes, the in_event / out_event optimization that I wrote for the 9pfs
> protocol could work here too. However, I thought you preferred to remove
> it for now as it is not required and increases complexity?

I did. But I am coming back to it after looking at the ring.h header.

My recollection was that your optimization was a bit different than
what ring.h has.

> 
> We could always add it later, if we reserved some padding here for it.
> Something like:
> 
>struct pvcalls_data_intf {
>   PVCALLS_RING_IDX in_cons, in_prod;
>   int32_t in_error;
> 
>   uint8_t pad[52];
> 
>   PVCALLS_RING_IDX out_cons, out_prod;
>   int32_t out_error;
> 
>   uint8_t pad[52]; <--- this is new
> 
>   uint32_t ring_order;
>   grant_ref_t ref[];
>};
> 
> We have plenty of space for the grant refs anyway. This way, we can
> introduce in_event and out_event by eating up 4 bytes from each pad
> array.

That is true.
> 
> 
> > > 
> > >   uint32_t ring_order;
> > >   grant_ref_t ref[];
> > > };
> > > 
> > > /* not actually C compliant (ring_order changes from socket to 
> > > socket) */
> > > struct pvcalls_data {
> > > char in[((1<<ring_order)<<PAGE_SHIFT)/2];
> > > char out[((1<<ring_order)<<PAGE_SHIFT)/2];
> > > };
> > > 
> > > - **ring_order**
> > >   It represents the order of the data ring. The following list of grant
> > >   references is of `(1 << ring_order)` elements. It cannot be greater than
> > >   **max-page-order**, as specified by the backend on XenBus. It has to
> > >   be one at minimum.
> > 
> > Oh? Why not zero? (4KB) as the 'max-page-order' has an example of zero 
> > order?
> > Perhaps if it MUST be one or more then the 'max-page-order' should say
> > that at least it MUST be one?
> 
> So that each in and out array gets to have its own dedicated page,
> although I don't think it's strictly necessary. With zero, they would
> get half a page each.

That is fine. Just pls document 'max-page-order' to make it clear it MUST
be 1 or higher.



Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-02-09 Thread Stefano Stabellini
On Tue, 7 Feb 2017, Konrad Rzeszutek Wilk wrote:
> .snip..
> >  Frontend XenBus Nodes
> > 
> > version
> >  Values: 
> > 
> >  Protocol version, chosen among the ones supported by the backend
> >  (see **versions** under [Backend XenBus Nodes]). Currently the
> >  value must be "1".
> > 
> > port
> >  Values: 
> > 
> >  The identifier of the Xen event channel used to signal activity
> >  in the command ring.
> > 
> > ring-ref
> >  Values: 
> > 
> >  The Xen grant reference granting permission for the backend to map
> >  the sole page in a single page sized command ring.
> > 
> >  Backend XenBus Nodes
> > 
> > versions
> >  Values: 
> > 
> >  List of comma separated protocol versions supported by the backend.
> >  For example "1,2,3". Currently the value is just "1", as there is
> >  only one version.
> > 
> > max-page-order
> >  Values: 
> > 
> >  The maximum supported size of a memory allocation in units of
> >  log2n(machine pages), e.g. 0 == 1 page, 1 == 2 pages, 2 == 4 pages,
> >  etc.
> 
> .. for the **data rings** (not to be confused with the command ring).
> 
> > 
> > function-calls
> >  Values: 
> > 
> >  Value "0" means that no calls are supported.
> >  Value "1" means that socket, connect, release, bind, listen, accept
> >  and poll are supported.
> > 
> ..snip..
> > ### Commands Ring
> > 
> > The shared ring is used by the frontend to forward POSIX function calls
> > to the backend. We shall refer to this ring as **commands ring** to
> > distinguish it from other rings which can be created later in the
> > lifecycle of the protocol (see [Indexes Page and Data ring]). The grant
> > reference for the shared page for this ring is published on xenstore (see
> > [Frontend XenBus Nodes]). The ring format is defined using the familiar
> > `DEFINE_RING_TYPES` macro (`xen/include/public/io/ring.h`).  Frontend
> > requests are allocated on the ring using the `RING_GET_REQUEST` macro.
> > The list of commands below is in calling order.
> > 
> > The format is defined as follows:
> > 
> > #define PVCALLS_SOCKET 0
> > #define PVCALLS_CONNECT1
> > #define PVCALLS_RELEASE2
> > #define PVCALLS_BIND   3
> > #define PVCALLS_LISTEN 4
> > #define PVCALLS_ACCEPT 5
> > #define PVCALLS_POLL   6
> > 
> > struct xen_pvcalls_request {
> > uint32_t req_id; /* private to guest, echoed in response */
> > uint32_t cmd;/* command to execute */
> > union {
> > struct xen_pvcalls_socket {
> > uint64_t id;
> > uint32_t domain;
> > uint32_t type;
> > uint32_t protocol;
> > #ifdef CONFIG_X86_32
> > uint8_t pad[4];
> 
> Could that be shifted to the right?

Tabs vs Spaces, sigh. I fixed it.


> > #endif
> > } socket;
> > struct xen_pvcalls_connect {
> > uint64_t id;
> > uint8_t addr[28];
> > uint32_t len;
> > uint32_t flags;
> > grant_ref_t ref;
> > uint32_t evtchn;
> > #ifdef CONFIG_X86_32
> > uint8_t pad[4];
> > #endif
> > } connect;
> > struct xen_pvcalls_release {
> > uint64_t id;
> > uint8_t reuse;
> > #ifdef CONFIG_X86_32
> > uint8_t pad[7];
> 
> Could that be shifted to the right?

yep


> > #endif
> > } release;
> > struct xen_pvcalls_bind {
> > uint64_t id;
> > uint8_t addr[28];
> > uint32_t len;
> > } bind;
> > struct xen_pvcalls_listen {
> > uint64_t id;
> > uint32_t backlog;
> > #ifdef CONFIG_X86_32
> > uint8_t pad[4];
> 
> Could that be shifted to the right?

yep


> > #endif
> > } listen;
> > struct xen_pvcalls_accept {
> > uint64_t id;
> > uint64_t id_new;
> > grant_ref_t ref;
> > uint32_t evtchn;
> > } accept;
> > struct xen_pvcalls_poll {
> > uint64_t id;
> > } poll;
> > /* dummy member to force sizeof(struct 
> > xen_pvcalls_request) to match across archs */
> >

Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-02-07 Thread Konrad Rzeszutek Wilk
.snip..
>  Frontend XenBus Nodes
> 
> version
>  Values: 
> 
>  Protocol version, chosen among the ones supported by the backend
>  (see **versions** under [Backend XenBus Nodes]). Currently the
>  value must be "1".
> 
> port
>  Values: 
> 
>  The identifier of the Xen event channel used to signal activity
>  in the command ring.
> 
> ring-ref
>  Values: 
> 
>  The Xen grant reference granting permission for the backend to map
>  the sole page in a single page sized command ring.
> 
>  Backend XenBus Nodes
> 
> versions
>  Values: 
> 
>  List of comma separated protocol versions supported by the backend.
>  For example "1,2,3". Currently the value is just "1", as there is
>  only one version.
> 
> max-page-order
>  Values: 
> 
>  The maximum supported size of a memory allocation in units of
>  log2n(machine pages), e.g. 0 == 1 page, 1 == 2 pages, 2 == 4 pages,
>  etc.

.. for the **data rings** (not to be confused with the command ring).

> 
> function-calls
>  Values: 
> 
>  Value "0" means that no calls are supported.
>  Value "1" means that socket, connect, release, bind, listen, accept
>  and poll are supported.
> 
..snip..
> ### Commands Ring
> 
> The shared ring is used by the frontend to forward POSIX function calls
> to the backend. We shall refer to this ring as **commands ring** to
> distinguish it from other rings which can be created later in the
> lifecycle of the protocol (see [Indexes Page and Data ring]). The grant
> reference for the shared page for this ring is published on xenstore (see
> [Frontend XenBus Nodes]). The ring format is defined using the familiar
> `DEFINE_RING_TYPES` macro (`xen/include/public/io/ring.h`).  Frontend
> requests are allocated on the ring using the `RING_GET_REQUEST` macro.
> The list of commands below is in calling order.
> 
> The format is defined as follows:
> 
> #define PVCALLS_SOCKET 0
> #define PVCALLS_CONNECT1
> #define PVCALLS_RELEASE2
> #define PVCALLS_BIND   3
> #define PVCALLS_LISTEN 4
> #define PVCALLS_ACCEPT 5
> #define PVCALLS_POLL   6
> 
> struct xen_pvcalls_request {
>   uint32_t req_id; /* private to guest, echoed in response */
>   uint32_t cmd;/* command to execute */
>   union {
>   struct xen_pvcalls_socket {
>   uint64_t id;
>   uint32_t domain;
>   uint32_t type;
>   uint32_t protocol;
> #ifdef CONFIG_X86_32
> uint8_t pad[4];

Could that be shifted to the right?
> #endif
>   } socket;
>   struct xen_pvcalls_connect {
>   uint64_t id;
>   uint8_t addr[28];
>   uint32_t len;
>   uint32_t flags;
>   grant_ref_t ref;
>   uint32_t evtchn;
> #ifdef CONFIG_X86_32
> uint8_t pad[4];
> #endif
>   } connect;
>   struct xen_pvcalls_release {
>   uint64_t id;
>   uint8_t reuse;
> #ifdef CONFIG_X86_32
> uint8_t pad[7];

Could that be shifted to the right?
> #endif
>   } release;
>   struct xen_pvcalls_bind {
>   uint64_t id;
>   uint8_t addr[28];
>   uint32_t len;
>   } bind;
>   struct xen_pvcalls_listen {
>   uint64_t id;
>   uint32_t backlog;
> #ifdef CONFIG_X86_32
> uint8_t pad[4];

Could that be shifted to the right?
> #endif
>   } listen;
>   struct xen_pvcalls_accept {
>   uint64_t id;
>   uint64_t id_new;
>   grant_ref_t ref;
>   uint32_t evtchn;
>   } accept;
>   struct xen_pvcalls_poll {
>   uint64_t id;
>   } poll;
>   /* dummy member to force sizeof(struct xen_pvcalls_request) to 
> match across archs */
>   struct xen_pvcalls_dummy {
>   uint8_t dummy[56];
>   } dummy;
>   } u;
> };
> 
> The first two fields are common for every command. Their binary layout
> is:
> 
> 0   4   8
> +---+---+
> |req_id |  cmd  |
> +---+---+
> 
> - **req_id** is generated by the frontend and is a cookie used to
>   identify one specific request/response pair of commands. Not to be
>   confused with any command **id** which are used to identify a socket
>   across multiple commands, see [Socket].
> - **cmd** is the command to execute, from the list defined above.

Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-01-29 Thread Oleksandr Andrushchenko



On 01/27/2017 08:48 PM, Stefano Stabellini wrote:

On Fri, 27 Jan 2017, Oleksandr Andrushchenko wrote:

Hi, Stefano!

 Error numbers

The numbers corresponding to the error names specified by POSIX are:

  [EPERM] -1
  [ENOENT]-2


Don't you want to use Xen's errno.h here as described in [1]?
So we have error codes consistent for all PV protocols?

Thanks,
Oleksandr

[1] https://marc.info/?l=xen-devel=148545604312317=2


Hi Oleksandr,

PVCalls is a bit different, because the protocol is meant to send POSIX
calls to the backend, therefore, I have to use POSIX error names in the
protocol.

I could assign any numbers to the names though. It makes sense to use
the Xen/Linux error numbers for simplicity. Whether I declare them
directly as numbers as I have done here, or indirectly as XEN_ERRNO, I
don't think it matters much. But I think that using numbers is clearer,
that's why I did it that way.

got it, thanks



Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-01-27 Thread Stefano Stabellini
On Fri, 27 Jan 2017, Oleksandr Andrushchenko wrote:
> Hi, Stefano!
> >  Error numbers
> > 
> > The numbers corresponding to the error names specified by POSIX are:
> > 
> >  [EPERM] -1
> >  [ENOENT]-2
> > 
> Don't you want to use Xen's errno.h here as described in [1]?
> So we have error codes consistent for all PV protocols?
> 
> Thanks,
> Oleksandr
> 
> [1] https://marc.info/?l=xen-devel=148545604312317=2
> 

Hi Oleksandr,

PVCalls is a bit different, because the protocol is meant to send POSIX
calls to the backend, therefore, I have to use POSIX error names in the
protocol.

I could assign any numbers to the names though. It makes sense to use
the Xen/Linux error numbers for simplicity. Whether I declare them
directly as numbers as I have done here, or indirectly as XEN_ERRNO, I
don't think it matters much. But I think that using numbers is clearer,
that's why I did it that way.



Re: [Xen-devel] [DOC v8] PV Calls protocol design

2017-01-26 Thread Oleksandr Andrushchenko

Hi, Stefano!

 Error numbers

The numbers corresponding to the error names specified by POSIX are:

 [EPERM] -1
 [ENOENT]-2


Don't you want to use Xen's errno.h here as described in [1]?
So we have error codes consistent for all PV protocols?

Thanks,
Oleksandr

[1] https://marc.info/?l=xen-devel=148545604312317=2



[Xen-devel] [DOC v8] PV Calls protocol design

2017-01-24 Thread Stefano Stabellini
Changes in v8:
- introduce the concept of indexes page
- many clarifications
- add a diagram
- introduce support for multiple versions of the protocol

Changes in v7:
- add a glossary of Xen terms 
- add a paragraph on why Xen was chosen 
- wording improvements
- add links to xenstore documents and headers
- specify that the current version is "1"
- rename max-dataring-page-order to max-page-order
- rename networking-calls to function-calls
- add links to [Data ring] throughout the document
- explain the difference between req_id and id
- mention that future commands larger than 56 bytes will require a
  version increase
- mention that the list of commands is in calling order
- clarify that reuse data rings are found by ref
- rename ENOTSUPP to ENOTSUP
- add padding in struct pvcalls_data_intf for cachelining
- rename pvcalls_ring_queued to pvcalls_ring_unconsumed


Changes in v6:
- add reuse field in release command
- add "networking-calls" backend node on xenstore
- fixed tab/whitespace indentation

Changes in v5:
- clarify text
- rename id to req_id
- rename sockid to id
- move id to request and response specific fields
- add version node to xenstore

Changes in v4:
- rename xensock to pvcalls

Changes in v3:
- add a dummy element to struct xen_xensock_request to make sure the
  size of the struct is the same on both x86_32 and x86_64

Changes in v2:
- add max-dataring-page-order
- add "Publish backend features and transport parameters" to backend
  xenbus workflow
- update new cmd values
- update xen_xensock_request
- add backlog parameter to listen and binary layout
- add description of new data ring format (interface+data)
- modify connect and accept to reflect new data ring format
- add link to POSIX docs
- add error numbers
- add address format section and relevant numeric definitions
- add explicit mention of unimplemented commands
- add protocol node name
- add xenbus shutdown diagram
- add socket operation

---

# PV Calls Protocol version 1

## Glossary

The following is a list of terms and definitions used in the Xen
community. If you are a Xen contributor you can skip this section.

* PV

  Short for paravirtualized.

* Dom0

  First virtual machine that boots. In most configurations Dom0 is
  privileged and has control over hardware devices, such as network
  cards, graphic cards, etc.

* DomU

  Regular unprivileged Xen virtual machine.

* Domain

  A Xen virtual machine. Dom0 and all DomUs are all separate Xen
  domains.

* Guest

  Same as domain: a Xen virtual machine.

* Frontend

  Each DomU has one or more paravirtualized frontend drivers to access
  disks, network, console, graphics, etc. The presence of PV devices is
  advertised on XenStore, a cross-domain key-value database. Frontends
  are similar in intent to the virtio drivers in Linux.

* Backend

  A Xen paravirtualized backend typically runs in Dom0 and it is used to
  export disks, network, console, graphics, etc., to DomUs. Backends can
  live both in kernel space and in userspace. For example xen-blkback
  lives under drivers/block in the Linux kernel and xen_disk lives under
  hw/block in QEMU. Paravirtualized backends are similar in intent to
  virtio device emulators.

* VMX and SVM
  
  On Intel processors, VMX is the CPU flag for VT-x, hardware
  virtualization support. It corresponds to SVM on AMD processors.



## Rationale

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which acts on them and
returns a value to the frontend.

This version of the document covers networking function calls, such as
connect, accept, bind, release, listen, poll, recvmsg and sendmsg; but
the protocol is meant to be easily extended to cover different sets of
calls. Unimplemented commands return ENOTSUP.

PV Calls provide the following benefits:
* full visibility of the guest behavior on the backend domain, allowing
  for inexpensive filtering and manipulation of any guest calls
* excellent performance

Specifically, PV Calls for networking offer these advantages:
* guest networking works out of the box with VPNs, wireless networks and
  any other complex configurations on the host
* guest services listen on ports bound directly to the backend domain IP
  addresses
* localhost becomes a secure host-wide network for inter-VM
  communications


## Design

### Why Xen?

PV Calls are part of an effort to create a secure runtime environment
for containers (Open Containers Initiative images to be precise). PV
Calls are based on Xen, although porting them to other hypervisors is
possible. Xen was chosen because of its security and isolation
properties and because it supports PV guests, a type of virtual machine
that does not require hardware virtualization extensions (VMX on Intel
processors and SVM on AMD processors). This is important because PV
Calls is meant for