On Fri, 23 Sep 2022 07:04:10 -0400, "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Fri, Sep 23, 2022 at 06:48:56PM +0800, Xuan Zhuo wrote: > > On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin" <m...@redhat.com> > > wrote: > > > On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote: > > > > > > > Michael doesn't want to use desc chain, it's not just a > > > > > > > performance issue. In an > > > > > > > early email, he mentioned that desc chain may be abandoned in the > > > > > > > future. So we > > > > > > > have been trying not to rely on desc chain. > > > > > > > > > > > > This seems to be a very large change which seems a little bit too > > > > > > early to be considered. > > > > > > > > > > I'd like to put it in other terms. Fundamentally devices are not > > > > > supposed to talk about descriptors at all. Descriptors are > > > > > a way to describe buffers, and devices should all work in terms > > > > > of buffers. I am working on cleaning up the spec from confusion > > > > > and terminology mixups. We have several major sources all over the > > > > > spec: > > > > > - descriptor/buffer used inconsistently > > > > > - feature negotiated/offered used inconsistently > > > > > - field exists/valid used inconsistently > > > > > > > > > > My way to address the first issue is to make sure devices all work > > > > > with buffers. And buffers are described by descriptors (makes sense, > > > > > right?) and made available to device by driver and used by device. > > > > > > > > Can we try to keep desc info? I think it's a very important feature for > > > > virtio-net, and many NICs are designed based on this. > > > > > > > > Our current solutions B and C will waste a lot of memory. If the page > > > > occupied > > > > by the header can be quickly reclaimed, then we have to do a copy in > > > > the driver. > > > > > > Not sure I understand. Can you explain? > > > > > > For example B, we allocate two consecutive pages, the second page is used > > for > > payload. The first page is used to reserve the header. The first page is too > > wasteful, 4k of space, only the header is saved. In this way, we can copy > > away > > the data of the first page, so that the first page can be quickly reclaimed. > > > > Why two consecutive pages? The payload achieves page alignment, and there is > > still some space in front of the page, so we must allocate two connected > > pages. > > Are we talking without mergeable here?
I think B(offset) is not dependent on mergeable . C must depend on mergeable. Thanks. > > > > > > > > > > > > > > > The advantage of this is layering - we can change the way buffers > > > > > are passed around without changing devices. And, it matches > > > > > the virtio API nicely. > > > > > > > > > > Existing devices are all fine with this - they do not pass any > > > > > information in the descriptor. Yes, this seems like an option to > > > > > pass some information around, but I am not convinced it is worth > > > > > the layering violation. > > > > > > > > > > By comparison, ability to write data at an offset seems generally > > > > > useful, in particular we have a very old issue even without > > > > > the split header feature where with mergeable buffers > > > > > if we attempt to align the data in the 1st buffer at a cache line > > > > > boundary by adding an offset before ETH header, then when it spills > > > > > over to the second buffer it will be misaligned there. Wastes > > > > > an extra cache line for such packets. Offsets can allow fixing this. > > > > > > > > > > > > > Scheme B In addition to the memory problem, under this scheme, if we > > > > want to > > > > implement tcp zerocopy, then we must apply for two consecutive pages > > > > for a > > > > buffer. The second page is used to place the payload. The first is used > > > > to place > > > > the header. > > > > > > > > > I just don't understand why there would be a difference. Explanation? > > > > I'm concerned that allocating two contiguous pages in large numbers will > > cause > > unknowable problems(such perfermance) > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > I don't see architecturally what is wrong with making feature just > > > > > depend on mergeable buffers for now. We can always allow a combination > > > > > down the road. Let's just make it clear that if drivers see SPLIT && > > > > > !MERGEABLE they should not fail probe, they should instead clear the > > > > > split header feature. > > > > > > > > > > > > > According to my understanding, for B, it does not depend on mergeable. > > > > > > > > If we give up desc info completely, then we prefer B. > > > > > > > > Hi Jason, what are your thoughts? > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we can't make a feature depend only on mergeable, should we > > > > > > > use solution B? > > > > > > > > > > > > > > 2. Scheme B ( refer to your suggestion ) > > > > > > > > > > > > > > Our rethinking approach is no longer based on descriptor > > > > > > > chain. > > > > > > > > > > > > > > We refer to your proposed offset-based scheme as scheme B. > > > > > > > > > > > > The offset seems to be the suggestion of Michael. > > > > > > > > > > > > I think I like the design of v7 for several reasons: > > > > > > > > > > > > 1) easy to reserve head/tailroom without any extension of the spec > > > > > > 2) easy to work with mergeable rx buffer > > > > > > 3) it is the model used by modern NIC like e810 [1] > > > > > > > > > > > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate > > > > > > how > > > > > > it works which is similar to v7 > > > > > > > > > > > > Thanks > > > > > > > > > > > > > As you suggested, scheme B gives the device a buffer, using > > > > > > > offset to > > > > > > > indicate where to place the payload. Like this: > > > > > > > > > > > > > > <header>...<padding>... <beginning of page><data> > > > > > > > > > > > > > > But how to apply for this buffer? > > > > > > > Since we want the payload to be placed on a separate page, > > > > > > > the method > > > > > > > we consider is to directly alloc two pages from driver of > > > > > > > contiguous memory. > > > > > > > > > > > > > > Then the beginning of this contiguous memory is used to > > > > > > > store the headroom, > > > > > > > and the contiguous memory after the headroom is directly > > > > > > > handed over to the device. > > > > > > > Similar to the following: > > > > > > > > > > > > > > [------------------ receive buffer(2 pages) > > > > > > > ------------------------------] > > > > > > > [<------------first page -------------------><------ second > > > > > > > page -------->] > > > > > > > [<-----><virtnet hdr> <mac,ip,tcp>..<padding>< payload > > > > > > > >] > > > > > > > ^ ^ > > > > > > > | | > > > > > > > | pointer to device > > > > > > > | > > > > > > > | > > > > > > > Driver reserved, the later part is filled > > > > > > > > > > > > > > We have already entered v8, but we have not been able to reach an > > > > > > > agreement on > > > > > > > the basic capabilities. I want to solve this problem first. > > > > > > > > > > > > > > @Jason @Michael > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >+ \item The total size of the virtio net header and the > > > > > > > > > > >transport header exceeds \field{max_len}. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't get the reason why we need max_len. Can't it > > > > > > > > > > implied in the > > > > > > > > > > length of the first descriptor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Split transport header is usually used in high-throughput > > > > > > > > > scenarios, such as GSO-enabled cases. > > > > > > > > > Therefore, it is best to reserve tailroom with $ (the length > > > > > > > > > of the buffer) - (\field{offset} + \filed{max_len}) $ > > > > > > > > > in the first buffer to build the non-linear data area of the > > > > > > > > > socket buffer. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >+\end{itemize} > > > > > > > > > > >+ > > > > > > > > > > >+If the transport header is split by the device, the > > > > > > > > > > >VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER > > > > > > > > > > >+bit in \field{flags} MUST be set. The transport header > > > > > > > > > > >MUST be on the first > > > > > > > > > > >+buffer, following the virtio net header. The payload MUST > > > > > > > > > > >start from the > > > > > > > > > > >+second buffer. The device MUST set \field{hdr_len} of > > > > > > > > > > >structure > > > > > > > > > > >+virtio_net_hdr to the length of the transport header. > > > > > > > > > > >+The used length still reports the number of bytes it has > > > > > > > > > > >written to memory. > > > > > > > > > > >+ > > > > > > > > > > >+\field{offset} and \field{max_len} are valid when device > > > > > > > > > > >uses the first buffer. > > > > > > > > > > >+The device MUST reserve space in the first buffer using > > > > > > > > > > >\filed{offset}. > > > > > > > > > > >+If \field{offset} exceeds the length of the buffer, the > > > > > > > > > > >device MUST drop > > > > > > > > > > >+the receive packets. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can the device simply don't split the packet in this case? > > > > > > > > > > Anyhow we > > > > > > > > > > need synchronize the driver with the device in the case > > > > > > > > > > (e.g when > > > > > > > > > > driver is try to having a new max_len). > > > > > > > > > > > > > > > > > > > > > > > > > > > > We think that \field{offset} is actively set by the driver, > > > > > > > > > so the driver > > > > > > > > > will also receive packets according to this offset. > > > > > > > > > But if the case is considered to be caused by driver error > > > > > > > > > settings, > > > > > > > > > the device can do not split the packet. > > > > > > > > > > > > > > > > Note that protocol like ipv6 allows variable length of the > > > > > > > > header, > > > > > > > > falling back to not split the header seems better to me. > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > (I wonder if the offset deserves a independent feature (but > > > > > > > > > > depends > > > > > > > > > > on the merge able) in this case). > > > > > > > > > > > > > > > > > > > > > > > > > > > > Okay, we can consider later. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The maximum available length of the first buffer > > > > > > > > > > >+used by the device is specified by \field{max_len}. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Similarly the max length seems to be implied by length - > > > > > > > > > > offset? > > > > > > > > > > > > > > > > > > > > > > > > > > > > You can refer to the above answer about \field{max_len} > > > > > > > > > similarly. > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If \field{max_len} is 0 or > > > > > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the > > > > > > > > > > >length of the buffer, > > > > > > > > > > >+the device can use the entire buffer starting at > > > > > > > > > > >\field{offset}. > > > > > > > > > > >+ > > > > > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport > > > > > > > > > > >Header}{Device Types / Network Device / Device Operation / > > > > > > > > > > >Control Virtqueue / Split Transport Header} > > > > > > > > > > >+ > > > > > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in > > > > > > > > > > >\field{flags} is set, the driver > > > > > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the > > > > > > > > > > >length of the transport header > > > > > > > > > > >+inside the first buffer. > > > > > > > > > > >+ > > > > > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater > > > > > > > > > > >than the size of the virtio net header. > > > > > > > > > > > \paragraph{Notifications Coalescing}\label{sec:Device > > > > > > > > > > > Types / Network Device / Device Operation / Control > > > > > > > > > > > Virtqueue / Notifications Coalescing} > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > > > To unsubscribe, e-mail: > > > > > > > > virtio-dev-unsubscr...@lists.oasis-open.org > > > > > > > > For additional commands, e-mail: > > > > > > > > virtio-dev-h...@lists.oasis-open.org > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > > > > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org