On Thu, Aug 04, 2016 at 04:24:37PM +0100, Stefan Hajnoczi wrote:
> The virtio vsock device is a zero-configuration socket communications
> device.  It is designed as a guest<->host management channel suitable
> for communicating with guest agents.
> 
> vsock is designed with the sockets API in mind and the driver is
> typically implemented as an address family (at the same level as
> AF_INET).  Applications written for the sockets API can be ported with
> minimal changes (similar amount of effort as adding IPv6 support to an
> IPv4 application).
> 
> Unlike the existing console device, which is also used for guest<->host
> communication, vsock lets multiple clients connect to a server at the
> same time.  The console's single-connection limitation requires
> console-based users to arbitrate access through a single client.  In
> vsock they can connect directly and do not have to synchronize with
> each other.
> 
> Unlike network devices, no configuration is necessary because the device
> comes with its address in the configuration space.
> 
> The vsock device was prototyped by Gerd Hoffmann and Asias He.  I picked
> the code and design up from them.
> 
> VIRTIO-151
> 
> Cc: Gerd Hoffmann <kra...@redhat.com>
> Cc: Asias He <asias.he...@gmail.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>

It's been a while. Stefan, has the implementation changed since then?
If not, shouldn't we vote on this?


> ---
> v7:
>  * Add virtqueue flow control section to explain how deadlock is avoided
>    when rings are full [Ian]
> 
> v6:
>  * Make CIDs 64-bits but reserve upper 32 bits for now [Michael]
>  * Specify SHUTDOWN -> RST clean disconnect process [Ian]
> 
> v5:
>  * Switch to new, unused Device ID 19 [Ian]
>  * Drop unused ctrl virtqueue, no need to reserve last virtqueue [Ian]
>  * Document that VIRTIO_VSOCK_OP_CREDIT_UPDATE packets are valid even if
>    no VIRTIO_VSOCK_OP_CREDIT_REQUEST was previously received. [Ian]
>  * Document that only payload bytes are counted for buffer space
>    management, not header bytes [Ian]
>  * List the reserved CIDs [Ian]
> 
> v4:
>  * Add event virtqueue and "Device Events" device operation section that
>    explains how transport reset works for migration.
>  * Reorder virtqueues with rx/tx first, then ctrl/event (similar to
>    virtio-net)
>  * __le32/16 -> le32/16 for consistency with existing code snippets
>  * Add missing conformance.tex subsections for socket device entry in
>    table of contents
> 
> v3:
>  * "VSock device" -> "Virtio socket device" in free text [Michael]
>  * Extract normative statements and add references from conformance
>    chapter [Michael]
> v2:
>  * Document guest_cid field
>  * Use MAY/MUST/CAN according to RFC 2119
>  * Remove datagram socket type for the time being.  This can be added in
>    the future but there are currently no applications.
>  * Drop 3-way handshake for stream sockets.  It is not needed since
>    virtio-vsock is reliable, in-order delivery and spoofing source
>    addresses is impossible.
>  * Drop max_virtqueue_pairs configuration space field.  This field was
>    never defined and Linux code does not support multiqueue.  It can be
>    added back later, if necessary.
> ---
>  trunk/conformance.tex |  23 ++++-
>  trunk/content.tex     | 280 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 301 insertions(+), 2 deletions(-)
> 
> diff --git a/trunk/conformance.tex b/trunk/conformance.tex
> index f59e360..7ee63ed 100644
> --- a/trunk/conformance.tex
> +++ b/trunk/conformance.tex
> @@ -15,13 +15,13 @@ Conformance targets:
>    \begin{itemize}
>      \item Clause \ref{sec:Conformance / Driver Conformance},
>      \item One of clauses \ref{sec:Conformance / Driver Conformance / PCI Driver Conformance}, \ref{sec:Conformance / Driver Conformance / MMIO Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Channel I/O Driver Conformance}.
> -    \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance} or \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance}.
> +    \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance}, \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Socket Driver Conformance}.
>    \end{itemize}
>  \item[Device] A device MUST conform to three conformance clauses:
>    \begin{itemize}
>      \item Clause \ref{sec:Conformance / Device Conformance},
>      \item One of clauses \ref{sec:Conformance / Device Conformance / PCI Device Conformance}, \ref{sec:Conformance / Device Conformance / MMIO Device Conformance} or \ref{sec:Conformance / Device Conformance / Channel I/O Device Conformance}.
> -    \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance} or \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance}.
> +    \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance}, \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance} or \ref{sec:Conformance / Device Conformance / Socket Device Conformance}.
>    \end{itemize}
>  \end{description}
>  
> @@ -146,6 +146,16 @@ An SCSI host driver MUST conform to the following normative statements:
>  \item \ref{drivernormative:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
>  \end{itemize}
>  
> +\subsection{Socket Driver Conformance}\label{sec:Conformance / Driver Conformance / Socket Driver Conformance}
> +
> +A socket driver MUST conform to the following normative statements:
> +
> +\begin{itemize}
> +\item \ref{drivernormative:Device Types / Socket Device / Device Operation / Buffer Space Management}
> +\item \ref{drivernormative:Device Types / Socket Device / Device Operation / Receive and Transmit}
> +\item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events}
> +\end{itemize}
> +
>  \section{Device Conformance}\label{sec:Conformance / Device Conformance}
>  
>  A device MUST conform to the following normative statements:
> @@ -267,6 +277,15 @@ An SCSI host device MUST conform to the following normative statements:
>  \item \ref{devicenormative:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
>  \end{itemize}
>  
> +\subsection{Socket Device Conformance}\label{sec:Conformance / Device Conformance / Socket Device Conformance}
> +
> +A socket device MUST conform to the following normative statements:
> +
> +\begin{itemize}
> +\item \ref{devicenormative:Device Types / Socket Device / Device Operation / Buffer Space Management}
> +\item \ref{devicenormative:Device Types / Socket Device / Device Operation / Receive and Transmit}
> +\end{itemize}
> +
>  \section{Legacy Interface: Transitional Device and
>  Transitional Driver Conformance}\label{sec:Conformance / Legacy
>  Interface: Transitional Device and 
> diff --git a/trunk/content.tex b/trunk/content.tex
> index 4eebfc6..71567eb 100644
> --- a/trunk/content.tex
> +++ b/trunk/content.tex
> @@ -5752,6 +5752,286 @@ descriptor for the \field{sense_len}, \field{residual},
>  \field{status_qualifier}, \field{status}, \field{response} and
>  \field{sense} fields.
>  
> +\section{Socket Device}\label{sec:Device Types / Socket Device}
> +
> +The virtio socket device is a zero-configuration socket communications device.
> +It facilitates data transfer between the guest and device without using the
> +Ethernet or IP protocols.
> +
> +\subsection{Device ID}\label{sec:Device Types / Socket Device / Device ID}
> +  19
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> +\begin{description}
> +\item[0] rx
> +\item[1] tx
> +\item[2] event
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_vsock_config {
> +     le64 guest_cid;
> +};
> +\end{lstlisting}
> +
> +The \field{guest_cid} field contains the guest's context ID, which uniquely
> +identifies the device for its lifetime.  The upper 32 bits of the CID are
> +reserved and zeroed.
> +
> +The following CIDs are reserved and cannot be used as the guest's context ID:
> +
> +\begin{tabular}{|l|l|}
> +\hline
> +CID    & Notes \\
> +\hline \hline
> +0                 & Reserved \\
> +\hline
> +1                 & Reserved \\
> +\hline
> +2                 & Well-known CID for the host \\
> +\hline
> +0xffffffff        & Reserved \\
> +\hline
> +0xffffffffffffffff        & Reserved \\
> +\hline
> +\end{tabular}
> +
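
To make sure I read the reserved CID table the way it is intended, here is a minimal sketch of the check a driver or device might apply to guest_cid. The macro and function names are my own and do not appear in the patch; the only facts taken from the text are the reserved values and the rule that the upper 32 bits are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; the draft lists the values but does not name them. */
#define VSOCK_CID_HOST        UINT64_C(2)
#define VSOCK_CID_RESERVED_32 UINT64_C(0xffffffff)

/* Returns true if cid is usable as a guest CID under this draft. */
static bool vsock_guest_cid_is_valid(uint64_t cid)
{
    /* Upper 32 bits are reserved and zeroed; this also rejects
     * the reserved value 0xffffffffffffffff. */
    if ((cid >> 32) != 0)
        return false;
    /* 0 and 1 are reserved, 2 is the well-known host CID. */
    if (cid <= VSOCK_CID_HOST)
        return false;
    return cid != VSOCK_CID_RESERVED_32;
}

/* Narrow a validated CID to the 32 bits that are currently meaningful. */
static uint32_t vsock_cid_low32(uint64_t cid)
{
    assert(vsock_guest_cid_is_valid(cid)); /* caller validates first */
    return (uint32_t)cid;
}
```

If this matches the intent, CID 3 is the smallest value a guest can be assigned.
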
> +\subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
> +
> +\begin{enumerate}
> +\item The guest's CID is read from \field{guest_cid}.
> +
> +\item Buffers are added to the event virtqueue to receive events from the device.
> +
> +\item Buffers are added to the rx virtqueue to start receiving packets.
> +\end{enumerate}
> +
> +\subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Operation}
> +
> +Packets transmitted or received contain a header before the payload:
> +
> +\begin{lstlisting}
> +struct virtio_vsock_hdr {
> +     le64 src_cid;
> +     le64 dst_cid;
> +     le32 src_port;
> +     le32 dst_port;
> +     le32 len;
> +     le16 type;
> +     le16 op;
> +     le32 flags;
> +     le32 buf_alloc;
> +     le32 fwd_cnt;
> +};
> +\end{lstlisting}
> +
> +The upper 32 bits of \field{src_cid} and \field{dst_cid} are reserved and zeroed.
> +
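
As a sanity check on the header layout, here is a rough sketch of serializing it field by field. By my count the fields add up to 44 bytes with no padding. The struct below is a host-order mirror with names of my own choosing (vsock_hdr, put_le, vsock_hdr_encode); only the field order and widths come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VSOCK_HDR_WIRE_SIZE 44 /* 2*8 + 3*4 + 2*2 + 3*4 bytes */

/* Host-order mirror of struct virtio_vsock_hdr (hypothetical name). */
struct vsock_hdr {
    uint64_t src_cid;
    uint64_t dst_cid;
    uint32_t src_port;
    uint32_t dst_port;
    uint32_t len;
    uint16_t type;
    uint16_t op;
    uint32_t flags;
    uint32_t buf_alloc;
    uint32_t fwd_cnt;
};

/* Store the low nbytes of v in little-endian order; returns the next slot. */
static uint8_t *put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

/* Encode h into out in wire order; returns the number of bytes written. */
static size_t vsock_hdr_encode(const struct vsock_hdr *h,
                               uint8_t out[VSOCK_HDR_WIRE_SIZE])
{
    uint8_t *p = out;

    p = put_le(p, h->src_cid, 8);
    p = put_le(p, h->dst_cid, 8);
    p = put_le(p, h->src_port, 4);
    p = put_le(p, h->dst_port, 4);
    p = put_le(p, h->len, 4);
    p = put_le(p, h->type, 2);
    p = put_le(p, h->op, 2);
    p = put_le(p, h->flags, 4);
    p = put_le(p, h->buf_alloc, 4);
    p = put_le(p, h->fwd_cnt, 4);
    assert((size_t)(p - out) == VSOCK_HDR_WIRE_SIZE);
    return (size_t)(p - out);
}
```

Writing the bytes explicitly avoids depending on compiler struct packing or host endianness.
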
> +Most packets simply transfer data but control packets are also used for
> +connection and buffer space management.  \field{op} is one of the following
> +operation constants:
> +
> +\begin{lstlisting}
> +enum {
> +     VIRTIO_VSOCK_OP_INVALID = 0,
> +
> +     /* Connect operations */
> +     VIRTIO_VSOCK_OP_REQUEST = 1,
> +     VIRTIO_VSOCK_OP_RESPONSE = 2,
> +     VIRTIO_VSOCK_OP_RST = 3,
> +     VIRTIO_VSOCK_OP_SHUTDOWN = 4,
> +
> +     /* To send payload */
> +     VIRTIO_VSOCK_OP_RW = 5,
> +
> +     /* Tell the peer our credit info */
> +     VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
> +     /* Request the peer to send the credit info to us */
> +     VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
> +};
> +\end{lstlisting}
> +
> +\subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> +
> +The tx virtqueue carries packets initiated by applications and replies to
> +received packets.  The rx virtqueue carries packets initiated by the device and
> +replies to previously transmitted packets.
> +
> +If both rx and tx virtqueues are filled by the driver and device at the same
> +time then it appears that a deadlock is reached.  The driver has no free tx
> +descriptors to send replies.  The device has no free rx descriptors to send
> +replies either.  Therefore neither device nor driver can process virtqueues
> +since that may involve sending new replies.
> +
> +This is solved using additional resources outside the virtqueue to hold
> +packets.  With additional resources, it becomes possible to process incoming
> +packets even when outgoing packets cannot be sent.
> +
> +Eventually even the additional resources will be exhausted and further
> +processing is not possible until the other side processes the virtqueue that
> +it has neglected.  This stop to processing prevents one side from causing
> +unbounded resource consumption in the other side.
> +
> +\drivernormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> +
> +The rx virtqueue MUST be processed even when the tx virtqueue is full so long as there are additional resources available to hold packets outside the tx virtqueue.
> +
> +\devicenormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> +
> +The tx virtqueue MUST be processed even when the rx virtqueue is full so long as there are additional resources available to hold packets outside the rx virtqueue.
> +
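
The flow control rule reads to me roughly like the driver-side model below: replies that do not fit in the tx virtqueue are parked in a bounded backlog, and rx processing stops only once that backlog is full too. This is a sketch under my own assumptions, not a description of the Linux code; BACKLOG_MAX and all the names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_MAX 4 /* hypothetical bound on replies held outside the ring */

struct flow_state {
    size_t tx_free;     /* free tx descriptors */
    size_t backlog_len; /* replies parked outside the tx virtqueue */
};

/* Place one reply; returns false when neither ring nor backlog has room. */
static bool queue_reply(struct flow_state *s)
{
    assert(s != NULL);
    /* Use the ring directly only if nothing is parked, to keep ordering. */
    if (s->backlog_len == 0 && s->tx_free > 0) {
        s->tx_free--;
        return true;
    }
    if (s->backlog_len < BACKLOG_MAX) {
        s->backlog_len++;
        return true;
    }
    return false;
}

/* Process up to n rx packets that each need a reply; returns how many
 * were processed before resources ran out. */
static size_t process_rx(struct flow_state *s, size_t n)
{
    size_t done = 0;

    while (done < n && queue_reply(s))
        done++;
    return done;
}

/* The device returned tx descriptors: drain the backlog into the ring. */
static void tx_completed(struct flow_state *s, size_t freed)
{
    assert(s != NULL);
    s->tx_free += freed;
    while (s->backlog_len > 0 && s->tx_free > 0) {
        s->backlog_len--;
        s->tx_free--;
    }
}
```

The bound is what keeps one side from forcing unbounded memory use on the other; the exact limit is left to the implementation.
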
> +\subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Operation / Addressing}
> +
> +Flows are identified by a (source, destination) address tuple.  An address
> +consists of a (cid, port number) tuple. The header fields used for this are
> +\field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> +
> +Currently only stream sockets are supported. \field{type} is 1 for stream
> +socket types.
> +
> +Stream sockets provide in-order, guaranteed, connection-oriented delivery
> +without message boundaries.
> +
> +\subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> +\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> +stream sockets. The guest and the device publish how much buffer space is
> +available per socket. Only payload bytes are counted; header bytes are not
> +included. This facilitates flow control so data is never dropped.
> +
> +\field{buf_alloc} is the total receive buffer space, in bytes, for this socket.
> +This includes both free and in-use buffers. \field{fwd_cnt} is the free-running
> +bytes received counter. The sender calculates the amount of free receive buffer
> +space as follows:
> +
> +\begin{lstlisting}
> +/* tx_cnt is the sender's free-running bytes transmitted counter */
> +u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> +\end{lstlisting}
> +
> +If there is insufficient buffer space, the sender waits until virtqueue buffers
> +are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
> +the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> +available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> +It is also valid to send a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet without
> +previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> +communicating updates any time a change in buffer space occurs.
> +
> +\drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> +VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload.
> +
> +All packets associated with a stream flow MUST contain valid information in
> +\field{buf_alloc} and \field{fwd_cnt} fields.
> +
> +\devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> +VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload.
> +
> +All packets associated with a stream flow MUST contain valid information in
> +\field{buf_alloc} and \field{fwd_cnt} fields.
> +
> +\subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / Device Operation / Receive and Transmit}
> +The driver queues outgoing packets on the tx virtqueue and incoming packet
> +receive buffers on the rx virtqueue. Packets are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_vsock_packet {
> +    struct virtio_vsock_hdr hdr;
> +    u8 data[];
> +};
> +\end{lstlisting}
> +
> +Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
> +incoming packets are write-only.
> +
> +\drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> +
> +The \field{guest_cid} configuration field MUST be used as the source CID when
> +sending outgoing packets.
> +
> +A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +unknown \field{type} value.
> +
> +\devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> +
> +The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
> +
> +A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +unknown \field{type} value.
> +
> +\subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> +
> +Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
> +listening socket exists on the destination a VIRTIO_VSOCK_OP_RESPONSE reply is
> +sent and the connection is established.  A VIRTIO_VSOCK_OP_RST reply is sent if
> +a listening socket does not exist on the destination or the destination has
> +insufficient resources to establish the connection.
> +
> +When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN the header
> +\field{flags} field bit 0 indicates that the peer will not receive any more
> +data and bit 1 indicates that the peer will not send any more data.  These
> +hints are permanent once sent and successive packets with bits clear do not
> +reset them.
> +
> +The VIRTIO_VSOCK_OP_RST packet aborts the connection process or forcibly
> +disconnects a connected socket.
> +
> +Clean disconnect is achieved by one or more VIRTIO_VSOCK_OP_SHUTDOWN packets
> +that indicate no more data will be sent and received, followed by a
> +VIRTIO_VSOCK_OP_RST response from the peer.  If no VIRTIO_VSOCK_OP_RST response
> +is received within an implementation-specific amount of time, a
> +VIRTIO_VSOCK_OP_RST packet is sent to forcibly disconnect the socket.
> +
> +The clean disconnect process ensures that neither peer reuses the (source,
> +destination) address tuple for a new connection while the other peer is still
> +processing the old connection.
> +
> +\subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> +
> +Certain events are communicated by the device to the driver using the event
> +virtqueue.
> +
> +The event buffer is as follows:
> +
> +\begin{lstlisting}
> +enum virtio_vsock_event_id {
> +        VIRTIO_VSOCK_EVENT_TRANSPORT_RESET = 0,
> +};
> +
> +struct virtio_vsock_event {
> +        le32 id;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event indicates that communication has
> +been interrupted.  This usually occurs if the guest has been physically
> +migrated.  The driver shuts down established connections and the
> +\field{guest_cid} configuration field is fetched again.  Existing listen
> +sockets remain but their CID is updated to reflect the current
> +\field{guest_cid}.
> +
> +\drivernormative{\paragraph}{Device Operation: Device Events}{Device Types / Socket Device / Device Operation / Device Events}
> +
> +Event virtqueue buffers SHOULD be replenished quickly so that no events are
> +missed.
> +
> +The \field{guest_cid} configuration field MUST be fetched to determine the
> +current CID when a VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event is received.
> +
> +Existing connections MUST be shut down when a
> +VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event is received.
> +
> +Listen sockets MUST remain operational with the current CID when a
> +VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event is received.
> +
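
To check my understanding of the transport reset requirements, here is a rough model of what the driver does on VIRTIO_VSOCK_EVENT_TRANSPORT_RESET. The socket structure and function name are hypothetical, and new_guest_cid stands for the value the driver has just re-read from the guest_cid configuration field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal per-socket state for illustration (hypothetical layout). */
struct vsock_socket {
    uint64_t local_cid;
    bool listening;
    bool connected;
};

/* Apply a transport reset to every socket the driver tracks. */
static void vsock_transport_reset(struct vsock_socket *socks, size_t n,
                                  uint64_t new_guest_cid)
{
    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Established connections MUST be shut down. */
        if (socks[i].connected)
            socks[i].connected = false;
        /* Listen sockets remain, rebound to the current CID. */
        if (socks[i].listening)
            socks[i].local_cid = new_guest_cid;
    }
}
```

A real driver would also wake up blocked readers and writers on the connections it shuts down; that part is omitted here.
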
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  Currently there are three device-independent feature bits defined:
> -- 
> 2.7.4
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

