On Wed, Jun 22, 2022 at 11:52:30AM +0800, Jason Wang wrote: > > 在 2022/6/21 18:54, Michael S. Tsirkin 写道: > > On Mon, Jun 20, 2022 at 11:13:47AM +0800, Xuan Zhuo wrote: > > > On Fri, 17 Jun 2022 05:58:11 -0400, "Michael S. Tsirkin" > > > <m...@redhat.com> wrote: > > > > On Fri, Jun 17, 2022 at 03:16:01PM +0800, Xuan Zhuo wrote: > > > > > The purpose of this feature is to split the header and the payload of > > > > > the packet. > > > > > > > > > > | receive buffer > > > > > | > > > > > | 0th descriptor | 1th descriptor > > > > > | > > > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload > > > > > | > > > > > > > > > > We can use a buffer plus a separate page when allocating the receive > > > > > buffer. In this way, we can ensure that all payloads can be > > > > > independently in a page, which is very beneficial for the zerocopy > > > > > implemented by the upper layer. > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanz...@linux.alibaba.com> > > > > > Reviewed-by: Heng Qi <hen...@linux.alibaba.com> > > > > > Reviewed-by: Kangjie Xu <kangjie...@linux.alibaba.com> > > > > Could you describe the zerocopy in more detail? I'd like to understand > > e.g. whether it mixes with mergeable buffers. > > > > I actually have an idea. Driver could specify an offset from start of buffer > > at which payload is written. This works with mergeable and without, > > > I don't see any difference compared to the current proposal. The only > difference is whether or not we should explicit tell the device about the > offset.
The current proposal seems to rely on descriptor framing, this is using an offset instead. > > > and if header is > offset then flag in packet will simply not be set. > > And I think we will want another knob to specify an offset for packets > > which can't be split. > > > > What do you think? > > > > > > > > > > > --- > > > > > v4: > > > > > 1. fix typo @Cornelia Huck @Jason Wang > > > > > 2. do not split header for IP fragmentation packet. @Jason Wang > > > > > > > > > > v3: > > > > > 1. Fix some syntax issues > > > > > 2. Fix some terminology issues > > > > > 3. It is not unified with ip alignment, so ip alignment is not > > > > > included > > > > > 4. Make it clear that the device must support four types, in the > > > > > case of > > > > > successful negotiation. > > > > > > > > > > conformance.tex | 2 ++ > > > > > content.tex | 88 > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 90 insertions(+) > > > > > > > > > > diff --git a/conformance.tex b/conformance.tex > > > > > index 663e7c3..6f561fb 100644 > > > > > --- a/conformance.tex > > > > > +++ b/conformance.tex > > > > > @@ -149,6 +149,7 @@ \section{Conformance > > > > > Targets}\label{sec:Conformance / Conformance Targets} > > > > > \item \ref{drivernormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Automatic receive steering in > > > > > multiqueue mode} > > > > > \item \ref{drivernormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Offloads State Configuration / > > > > > Setting Offloads State} > > > > > \item \ref{drivernormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) } > > > > > +\item \ref{drivernormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Split Header} > > > > > \end{itemize} > > > > > > > > > > \conformance{\subsection}{Block Driver > > > > > Conformance}\label{sec:Conformance / Driver Conformance / Block > > > > > Driver Conformance} > > > > > @@ -411,6 +412,7 @@ \section{Conformance > > > > > Targets}\label{sec:Conformance / Conformance Targets} > > > > > \item \ref{devicenormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Gratuitous Packet Sending} > > > > > \item \ref{devicenormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Automatic receive steering in > > > > > multiqueue mode} > > > > > \item \ref{devicenormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS > > > > > processing} > > > > > +\item \ref{devicenormative:Device Types / Network Device / Device > > > > > Operation / Control Virtqueue / Split Header} > > > > > \end{itemize} > > > > > > > > > > \conformance{\subsection}{Block Device > > > > > Conformance}\label{sec:Conformance / Device Conformance / Block > > > > > Device Conformance} > > > > > diff --git a/content.tex b/content.tex > > > > > index 7508dd1..78e0ce8 100644 > > > > > --- a/content.tex > > > > > +++ b/content.tex > > > > > @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device > > > > > Types / Network Device / Feature bits > > > > > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through > > > > > control > > > > > channel. > > > > > > > > > > +\item[VIRTIO_NET_F_SPLIT_HEADER (55)] Device supports splitting the > > > > > protocol > > > > > + header and the payload. > > > > > + > > > > > > > > What is split exactly? The name makes it sound like header is split, > > > > and you repeat this everywhere. I suspect it's actually something like > > > > protocol split. > > > > > > This name comes from other network cards, if you think other names are > > > better, I > > > also feel ok. > > So ethtool calls it "TCP header/data split" and the macro is > > ETHTOOL_TCP_DATA_SPLIT. > > > Just FYI, e810 called it "header split" and it support split at various > layers (l2/l3/l4 or even inner header). > > > > > > > > So something like "supports TCP/UDP protocol split > > > > - writing out protocol payload > > > > and header parts of incoming packets into separate buffers"? > > > I don't want to stress tcp/udp here I want to leave room for other > > > protocols in > > > the future. > > Then we'll add more feature flags (we'll need to, anyway). > > Separating TCP and UDP is a good idea, e.g. ethtool only seems > > to support TCP ... > > > > > > I propose we just add 4 bits: > > > > > > > > VIRTIO_NET_F_SPLIT_TCP4 > > > > VIRTIO_NET_F_SPLIT_TCP6 > > > > VIRTIO_NET_F_SPLIT_UDP4 > > > > VIRTIO_NET_F_SPLIT_UDP6 > > > > > > > > Accordingly below we say everywhere: > > > > if one of > > > > VIRTIO_NET_F_SPLIT_TCP4,VIRTIO_NET_F_SPLIT_TCP6,VIRTIO_NET_F_SPLIT_UDP4,VIRTIO_NET_F_SPLIT_UDP6 > > > > > > > > > > > > > \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. > > > > > Unlike UFO > > > > > (fragmenting the packet) the USO splits large UDP packet > > > > > to several segments when each of these smaller packets has UDP > > > > > header. > > > > > @@ -3131,6 +3134,7 @@ \subsubsection{Feature bit > > > > > requirements}\label{sec:Device Types / Network Device > > > > > \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ. > > > > > \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or > > > > > VIRTIO_NET_F_HOST_TSO6. > > > > > \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. > > > > > +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. > > > > > \end{description} > > > > > > > > > > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device > > > > > Types / Network Device / Feature bits / Legacy Interface: Feature > > > > > bits} > > > > > @@ -3362,6 +3366,7 @@ \subsection{Device Operation}\label{sec:Device > > > > > Types / Network Device / Device O > > > > > #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 > > > > > #define VIRTIO_NET_HDR_F_DATA_VALID 2 > > > > > #define VIRTIO_NET_HDR_F_RSC_INFO 4 > > > > > +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 > > > > > u8 flags; > > > > > #define VIRTIO_NET_HDR_GSO_NONE 0 > > > > > #define VIRTIO_NET_HDR_GSO_TCPV4 1 > > > > > @@ -4463,6 +4468,89 @@ \subsubsection{Control > > > > > Virtqueue}\label{sec:Device Types / Network Device / Devi > > > > > according to the native endian of the guest rather than > > > > > (necessarily when not using the legacy interface) little-endian. > > > > > > > > > > +\paragraph{Split Header}\label{sec:Device Types / Network Device / > > > > > Device Operation / Control Virtqueue / Split Header} > > > > > + > > > > > +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated, > > > > > +the device supports splitting the protocol header and the payload. > > > > > +The protocol header and the payload will be separated into different > > > > > +descriptors. > > > > Descriptors or buffers? I think for mergeable buffers you want > > > > buffers. For non mergeable trickier. Maybe just make these > > > > flags depend on mergeable? > > > > > > > > > > > > > + > > > > > +\subparagraph{Split Header}\label{sec:Device Types / Network Device > > > > > / Device Operation / Control Virtqueue / Split Header / Setting Split > > > > > Header} > > > > > + > > > > > +To configure the split header, the following layout structure and > > > > > definitions > > > > > +are used: > > > > > + > > > > > +\begin{lstlisting} > > > > > +struct virtio_net_split_header_config { > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4 (1 << 0) > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6 (1 << 1) > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4 (1 << 2) > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6 (1 << 3) > > > > > + le64 type; > > > > > +}; > > > > > + > > > > > +#define VIRTIO_NET_CTRL_SPLIT_HEADER 6 > > > > > + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET 0 > > > > > +\end{lstlisting} > > > > > + > > > > > +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command: > > > > > +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header > > > > > configuration. > > > > > + > > > > > +The driver can enable or disable split header for different > > > > > protocols by > > > > > +setting or clearing corresponding bits in \field{type}. > > > > > + > > > > > +\begin{itemize} > > > > > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp > > > > > header > > > > > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp > > > > > header > > > > > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp > > > > > header > > > > > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp > > > > > header > > > > > +\end{itemize} > > > > and what is the default? > > > > > > > > > + > > > > > +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / > > > > > Network Device / Device Operation / Control Virtqueue / Split Header} > > > > > + > > > > > +A device MUST initialize \field{type} to 0, and MUST set it to 0 > > > > > +upon device reset. > > > > > + > > > > > +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support > > > > > +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6, > > > > > +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6. > > > > pls move or copy all MUST/SHALL etc to a conformance section. > > > > > > > > > > > > > > > > > + > > > > > +A device MUST NOT split the header in the following cases: > > > > if all of the following apply? if one of these applies? > > > > > > > > > +\begin{itemize} > > > > > + \item the device does not recognize the protocol of the packet. > > > > > + \item the packet is an IP fragmentation. > > > > > + \item \field{type} does not include the protocol of the packet. > > > > > + \item the buffer consists of only one descriptor. > > > > > + \item the total size of the virtio net header and the protocol > > > > > header exceeds > > > > > + the size of the first descriptor. > > > > > + \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size > > > > > of the > > > > > + payload exceeds the size of the descriptor chain starting > > > > > from the 2nd > > > > > + descriptor. > > > > > +\end{itemize} > > > > > > > > Pls spell out what it should do in this case. > > > > > > > > > + > > > > > +If the header is split by the device, the \field{flags} of structure > > > > > +virtio_net_hdr MUST contain VIRTIO_NET_HDR_F_SPLIT_HEADER. > > > > Not contain. This bit must be set. > > > Yes, will fix. > > > > > > > > > > > The protocol header > > > > > +MUST be on the buffer specified by the first descriptor, > > > > buffers?descriptors? this uses both of them, we need clarity. > > > Yes. > > > > > > > > following the virtio > > > > > +net header. The payload MUST start from the second descriptor. The > > > > > device MUST > > > > > +set \field{hdr_len} of structure virtio_net_hdr to the length of the > > > > > protocol > > > > > +header. > > > > > + > > > > > +If the header is split by the device, VIRTIO_NET_F_MRG_RXBUF is > > > > > negotiated, > > > > > +and the received packet is spread over multiple buffers, when the > > > > > device uses a > > > > > +buffer other than the first, if the buffer consists of multiple > > > > > descriptors, the > > > > > +device MUST skip the first descriptor of the buffer, because the > > > > > first > > > > > +descriptor is used to carry the protocol header. > > > > This is a layering violation in that device suddenly cares how > > > > buffers are described in the ring structure. We don't even guarantee > > > > we will have descriptors going forward ... > > > > > > The problem with using merge to implement this function is that the size > > > of the > > > hdr buf prepared by the driver and the payload buffer are inconsistent. > > > For > > > example, the size of the hdr buf is 128 and the payload buf is 4k. > > > > > > If you use merge to implement it, the avali queue will look like the > > > following, > > > so if you want to process a package that cannot be split, but there is an > > > hdr > > > buf in the queue, it will be very inappropriate to process. > > > > > > | 128 buf | 4k buf | 128 buf | 4k buf | 128 buf | 4k buf | 128 buf | 4k > > > buf | > > > > > > If the size of all bufs in the queue is 4k, using one page to store a > > > small > > > protocol header is also a waste of memory. > > Right. However > > - we can use learning/statistics tricks like we do currently to guess > > whether splitting is good or not and size buffers accordingly > > > I'm not sure this is good design. And it basically means if the driver can't > do learning stuffs, header split doesn't make too much sense for them. > > What's more, taking Linux driver as an example, if we learn the packet > should be 64K, do we want to allocate 64K buffer? (which is not sub-optimal > because of the high order allocation). And in the future we may want to > support big ipv6 TCP packets (larger than 64K). > > Thanks > > > - if buffer s too big headers can be copied and buffer released (they are > > hot in cache > > anyway) > > > > > So using descriptor to concatenate hdr buf and payload buf is a better > > > way. > > Well with options there's no real limit on the header size, is there? > > > > > Do you have other better ideas? > > Not at the moment > > > > > Or maybe I don't understand what you mean. > > > > > > > > > As for the problem of desciptors without guarantees, it should be a very > > > big > > > reform. I don't know what it will look like. > > > > > > But I think that there are always scenarios where a large buffer is > > > formed by > > > multiple small buffers. > > > > > > > Switching to next descriptor before first one is used is also > > > > problematic since it's unclear what will the length be. > > > Yes, this is a problem, I don't have a clear definition of this behavior. > > > > > > > > > > How does driver find out the header length? > > > +The device MUST > > > +set \field{hdr_len} of structure virtio_net_hdr to the length of the > > > protocol > > > +header. > > > > > > > > > > > > I propose we just assume VIRTIO_NET_F_MRG_RXBUF is set, and > > > > drop the multiple descriptors in the buffer idea. > > > > > > > > > > > > What happens if header does not fit in a single buffer? > > > + \item the total size of the virtio net header and the protocol > > > header exceeds > > > + the size of the first descriptor. > > might be tricky to always guarantee. what's the plan? > > > > > > > > How does driver > > > > find out where does the header end? > > > +The device MUST > > > +set \field{hdr_len} of structure virtio_net_hdr to the length of the > > > protocol > > > +header. > > So we need to clarify virtio_net_hdr fits in the 1st buffer, and audit > > references to hdr_len to ensure there are no contradictions. > > E.g. we say "the driver MUST NOT rely on \field{hdr_len} to be correct." > > > > > > > > > > > + > > > > > +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / > > > > > Network Device / Device Operation / Control Virtqueue / Split Header} > > > > > + > > > > > +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, > > > > > the driver > > > > > +MUST treat the contents of \field{hdr_len} as the length of the > > > > > protocol header > > > > > +inside the first descriptor. > > > > > + > > > > > +If the split header is enabled, the buffers submitted to receiveq by > > > > > the > > > > > +driver SHOULD be composed of at least two descriptors. The buffer > > > > > specified by > > > > > +the first descriptor SHOULD be able to accommodate the virtio net > > > > > header and the > > > > > +protocol header. > > > > What does this mean? How is driver supposed to know? > > > > > > > > > > > > > \subsubsection{Legacy Interface: Framing > > > > > Requirements}\label{sec:Device > > > > > Types / Network Device / Legacy Interface: Framing Requirements} > > > > > -- > > > > > 2.31.0 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org