On Tue, Feb 12, 2019 at 03:03:06PM -0500, Rob Miller wrote:
> On Tue, Feb 12, 2019 at 1:55 PM Michael S. Tsirkin <m...@redhat.com> wrote:
> > On Fri, Feb 01, 2019 at 09:43:02AM -0800, Rob Miller wrote:
> > > Agreed that this is needed.
> > >
> > > I would also like to suggest splitting F_IN_ORDER into
> > > F_RX_IN_ORDER and F_TX_IN_ORDER to support hw LRO implementations,
> > > which can be more scatter/gather than tx. This would allow batch
> > > mode for tx, at least in packed rings.
> >
> > I'm not sure what this buys us. Are you interested in out-of-order
> > tx with in-order rx then?
>
> Other way around. It is easier to guarantee that TX pkts are processed
> in order, but rx is quite a bit more difficult. The way some rx
> aggregators work is that as pkts are received from the wire, a "timer"
> and flow detector are set up. This allows other incoming packets with
> the same TCP header (flow) to be aggregated into one large packet with
> multiple segments. When a timeout fires (no other segments arrive
> within time) or the TCP header changes, the whole list of RX buffers
> is then sent up to the driver for processing. However, during this
> gathering period, multiple flows, say from other TCP sources, could be
> arriving. The hardware allocates rx buffers (descriptors) as the
> fragments hit the receiver, hence they will be used out-of-order and
> batch mode on RX isn't possible.
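To make the aggregation behaviour described above concrete, here is a
minimal sketch of per-flow gathering with out-of-order buffer
completion. All names (rx_fragment, flow_timeout, the flow table) are
hypothetical illustrations, not any real NIC or driver interface:

```c
/*
 * Sketch of the rx aggregation ("LRO") behaviour described above.
 * Buffers are allocated in arrival order; completion happens per
 * flow, when its timer expires or its header changes, so the ring's
 * descriptors complete out of order whenever flows interleave.
 */
#include <stdio.h>

#define RING_SIZE 8
#define MAX_FLOWS 4

struct flow_ctx {
    int nbufs;            /* fragments gathered so far */
    int bufs[RING_SIZE];  /* descriptor indices, in arrival order */
};

static struct flow_ctx flows[MAX_FLOWS];
static int next_desc;     /* next free rx descriptor */

/* A fragment hits the receiver: take the next descriptor in ring
 * order and attach it to the fragment's flow. */
static void rx_fragment(int flow)
{
    struct flow_ctx *f = &flows[flow];
    f->bufs[f->nbufs++] = next_desc++;
}

/* Timer fired (or TCP header changed): hand the flow's whole buffer
 * list to the driver as one large multi-segment packet. */
static void flow_timeout(int flow)
{
    struct flow_ctx *f = &flows[flow];
    printf("flow %d completes descriptors:", flow);
    for (int i = 0; i < f->nbufs; i++)
        printf(" %d", f->bufs[i]);
    printf("\n");
    f->nbufs = 0;
}

int main(void)
{
    /* Two nicely interleaved flows: descriptors 0 and 2 go to
     * flow 0, descriptors 1 and 3 go to flow 1. */
    rx_fragment(0);
    rx_fragment(1);
    rx_fragment(0);
    rx_fragment(1);

    flow_timeout(0);   /* completes 0,2 -- out of ring order */
    flow_timeout(1);   /* completes 1,3 */
    return 0;
}
```

Flow 0 completes descriptors 0 and 2 before flow 1 completes 1 and 3,
even though the descriptors were consumed in ring order -- exactly the
interleaving Rob lays out further down.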
OK so that's for in-order. Sounds reasonable enough; would you like to
propose a spec enhancement on virtio-comment? From what you describe it
would seem that it is not up to the driver, so it should be up to the
device, where some VQs describe themselves as out-of-order on a device
which is otherwise in-order.

> > > Finally, I would suggest a means to specify a given ring's ring
> > > mode. Packed leans more towards TX, whilst split can be either,
> > > really, depending upon LRO, jumbo frames, rx buffer size, etc.
> > > Just like F_IN_ORDER, we can have RX & TX split out.
> >
> > Before we jump there a bit more justification would be nice.
>
> See the above description of rx aggregation, where buffers are
> allocated for each rx pkt segment as it hits the receiver. If there
> were 2 ingressing flows from different sources, say nicely
> interleaved, then I would expect to see rx desc usage as follows:
>
> buffer 0 - flow 0
> buffer 1 - flow 1
> buffer 2 - flow 0
> buffer 3 - flow 1
>
> I would think this lends itself to a split virtqueue mode of
> operation better than a packed ring.

I'm not sure I follow why. Split virtqueue uses less cache when only a
small number of descriptors are outstanding. It pays for this with
indirection and more cache line bounces. Its main advantage is thus in
the PV case, when guest and host share a cache and there is generally
~1 descriptor in flight.

> > E.g. doing this change in software isn't a lot of work. How about
> > a software patch with some performance gains measured? Failing
> > that, some back-of-the-napkin calculations showing the potential
> > gains and costs?
>
> Yeah, I could do that. I wanted to bounce the idea around a bit first
> to understand whether there are holes in my logic.
>
> > > On Feb 1, 2019, at 9:23 AM, David Riddoch <dridd...@solarflare.com> wrote:
> > > >
> > > > All,
> > > >
> > > > I'd like to propose a small extension to the packed virtqueue
> > > > mode. My proposal is to add an offset/wrap field, written by
> > > > the driver, indicating how many available descriptors have been
> > > > added to the ring.
> > > >
> > > > The reason for wanting this is to improve performance of
> > > > hardware devices. Because of high read latency over a PCIe bus,
> > > > it is important for hardware devices to read multiple ring
> > > > entries in parallel. It is desirable to know how many
> > > > descriptors are available prior to issuing these reads, else
> > > > you risk fetching descriptors that are not yet available. As
> > > > well as wasting bus bandwidth, this adds complexity.
> > > >
> > > > I'd previously hoped that VIRTIO_F_NOTIFICATION_DATA would
> > > > solve this problem, but we still have a problem. If you rely on
> > > > doorbells to tell you how many descriptors are available, then
> > > > you have to keep doorbells enabled at all times. This can
> > > > result in a very high rate of doorbells with some drivers,
> > > > which can become a severe bottleneck (because x86 CPUs can't
> > > > emit MMIOs at very high rates).
> > > >
> > > > The proposed offset/wrap field allows devices to disable
> > > > doorbells when appropriate, and determine the latest fill level
> > > > via a PCIe read.
> > > >
> > > > I suggest the best place to put this would be in the driver
> > > > area, immediately after the event suppression structure.
> > > >
> > > > Presumably we would like this to be an optional feature, as
> > > > implementations of packed mode already exist in the wild. How
> > > > about VIRTIO_F_RING_PACKED_AVAIL_IDX?
> > > >
> > > > If I prepare a patch to the spec is there still time to get
> > > > this into v1.1?
> > > >
> > > > Best,
> > > > David
> > > >
> > > > --
> > > > David Riddoch <dridd...@solarflare.com> -- Chief Architect, Solarflare
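For concreteness, here is one way the proposed field and the
device-side fill-level calculation might look. Everything below is a
sketch assuming details the thread leaves open: the field name
(avail_idx), the 15-bit-offset-plus-wrap-bit encoding (borrowed from
the packed ring's existing event suppression format), and the
comparison logic are illustrative assumptions, not spec text.

```c
/*
 * Sketch of the proposed driver-written "available index" for the
 * packed ring, placed in the driver area after the event suppression
 * structure as David suggests. Names and encoding are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

struct pvq_driver_area {
    uint16_t off_wrap;   /* existing: event suppression offset/wrap */
    uint16_t flags;      /* existing: event suppression flags */
    uint16_t avail_idx;  /* proposed: bits 0..14 = offset of the next
                          * descriptor the driver will make available,
                          * bit 15 = driver ring wrap counter */
};

/*
 * Device side: a single read of avail_idx tells the device how many
 * descriptors it may fetch, instead of leaving doorbells enabled and
 * counting notifications. dev_off/dev_wrap track the next descriptor
 * the device would read.
 */
static uint16_t pvq_num_avail(const struct pvq_driver_area *da,
                              uint16_t dev_off, int dev_wrap,
                              uint16_t ring_size)
{
    uint16_t drv_off  = da->avail_idx & 0x7fff;
    int      drv_wrap = da->avail_idx >> 15;

    if (drv_off == dev_off)
        /* Equal offsets: ring is empty if the wrap counters agree,
         * completely full if they differ. */
        return drv_wrap == dev_wrap ? 0 : ring_size;
    if (drv_off > dev_off)
        return drv_off - dev_off;
    /* Driver offset has wrapped past the end of the ring. */
    return (uint16_t)(ring_size - dev_off + drv_off);
}

int main(void)
{
    /* Driver has made descriptors 0..4 available (offset 5, wrap 0);
     * device will next read offset 2 (wrap 0): 3 left to fetch. */
    struct pvq_driver_area da = { .off_wrap = 0, .flags = 0,
                                  .avail_idx = 5 };
    printf("descriptors available: %u\n",
           pvq_num_avail(&da, 2, 0, 256));
    return 0;
}
```

The payoff is the one David describes: with doorbells disabled, the
device recovers the current fill level with a single PCIe read of this
field before issuing a batch of descriptor fetches.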