On 05/11/18 08:53, Stefan Hajnoczi wrote:
On Wed, Sep 12, 2018 at 05:41:07PM +0100, David Riddoch wrote:
The VIRTIO_F_NOTIFICATION_DATA feature helps hardware implementations by
telling them directly how many descriptors are available in the ring, saving
speculative reads. There is a related issue with interrupts: As things
stand, after writing 'used' descriptors into the ring (packed or split) the
device has to read the event idx to determine whether a device-to-driver
notification (interrupt) is needed.
It would be very helpful to a hardware implementation if the event idx could
be written directly to the device. Has anyone proposed an extension to
address this already? (I have some thoughts on how to do this if we don't
have a proposal yet).
Hi David,
Has there been any further discussion on this topic? It seems like a
useful feature to have in VIRTIO.
Hi Stefan,
Thanks for the nudge, no further discussion. So here are my thoughts:
To provide context here is an outline of how drivers typically process
completions from a device. This applies to both para-virtualised and
real hardware. (I think we only need to worry about packed-ring).
disable_cb();
again:
while( budget_left && desc_is_used() )
handle_used();
if( budget_left &&enable_cb() )
goto again;
The budget stuff relates to napi polling, and isn't always there. But if
it is, and budget is exhausted, then an outer loop will invoke this code
again and so we don't want an interrupt.
In our current packed-ring design enable_cb() does roughly this:
enable_cb_packed() {
guest_mem->event_suppress = last_used_idx;
virtio_mb();
return desc_is_used();
}
This tells the device to raise an interrupt when moving used_idx over
the given idx. (And you can set that idx to a position further in the
ring if you're not interested in getting woken until a bunch of
descriptors have been consumed).
This design is unpleasant for hardware devices, because it forces them
to issue a read over the PCIe bus after writing used descriptors, in
order to determine whether an interrupt is needed. Such a read adds
complexity, adds overhead to the PCIe bus and memory system, and adds
latency.
In hardware devices interrupts are typically enabled like this:
enable_cb_hw() {
writel(last_used_idx, &dev_bar->int_en_doorbell);
return 0;
}
This tells the device to raise an interrupt once the descriptor at the
given index has been used. If it has already been used, then an
interrupt is raised immediately, else the interrupt is enabled and
raised after the used descriptor has been written. The race condition
is handled without having to check the ring again. Note that interrupts
are fire-once: For each write to int_en_doorbell you get a single
interrupt, and disable_cb() is therefore a no-op.
When you emulate a device using this model, you get a vmexit every time
int_en_doorbell is written, which is expensive. Therefore this approach
is not desirable when emulated.
So perhaps we should introduce an option to use this hardware-oriented
model. The problem is that with vDPA the implementation of a device can
shift under the feet of a driver between hardware and emulation. We'd
like to get the best performance we can in both cases.
My proposal is that we add an option to use both models at the same
time. enable_cb() would become:
30 enable_cb_new() {
31 if( int_en_doorbell_enabled )
32 writel(last_used_idx, p_int_en_doorbell);
33 guest_mem->event_suppress = last_used_idx;
34 virtio_mb();
35 return desc_is_used();
36 }
p_int_en_doorbell it notionally a pointer into a memory mapping onto a
device BAR, and when backed by hardware that is exactly what it is (and
the information in the event suppression structure is ignored). But
when backed by an emulated device the p_int_en_doorbell mapping should
just point at normal host memory (which is otherwise unused). This
means that we avoid vmexits in the emulated case.
Comments?
David
--
David Riddoch <dridd...@solarflare.com> -- Chief Architect, Solarflare
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org