Jan,

You said:

"
The proposal to work around the problem by using multiple vhost-user queues per 
port cannot solve the problem as it has two prerequisites that are not 
generally fulfilled:

1. The guest application needs to be able to use multiple queues and spread its 
Tx traffic across them.
"

But the QEMU vhost multi-queue feature does exist, and it could address
your issue - assuming the regression was indeed introduced by some
development decision in QEMU. We cannot affirm that yet, since we still
need to find the root cause (bisection is the best approach here,
because I don't have access to your environment AND you are using
packages/patches that are not generally available to the Ubuntu
community - they come from upstream or are customized).


About this one:

"
2. The OpenStack environment must support configuration of vhost multi-queue.
"

According to these documents:

https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
https://github.com/openstack/nova/commit/9a09674220a071e51fdca7911b52c0027c01ff64

It is already supported. You have to set the property
hw_vif_multiqueue_enabled=True on the image. The libvirt XML will then
be generated at instantiation time with multiple vhost queues (one per
vCPU).
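
To make that concrete, a minimal sketch of the workflow, assuming the
standard OpenStack client and a 4-vCPU guest (image name, interface
name and queue count are just placeholders):

    # set the multi-queue property on the guest image
    openstack image set --property hw_vif_multiqueue_enabled=true my-guest-image

    # nova/libvirt should then generate an interface definition along
    # these lines for a 4-vCPU guest:
    <interface type='vhostuser'>
      ...
      <model type='virtio'/>
      <driver queues='4'/>
    </interface>

    # inside the guest, the extra queues still have to be enabled,
    # e.g.:
    ethtool -L eth0 combined 4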

For this one:

"
The work-around to increase the queue length to 1024, in contrast, is 
completely transparent for applications and reduces the likelihood of packet 
drops for all types of sub-ms scale load fluctuations, no matter their cause.


In general we believe that it would be good to dimension the virtio queue size 
roughly equal to the typical queue sizes of physical interfaces (typically ~1K 
packets) to avoid that the virtio queues are the weakest link in the end-to-end 
data path.
"

The problem didn't happen in 2.2 (or 2.3) but it does happen in 2.5,
and the queue size - not of the virtio device in general but of the
virtio-net device (using vhost) - has been 256 in all of those
versions. Increasing it now would only be mitigating an unknown cause,
and that will be hard to get accepted upstream if you want to go there
directly.

IMHO we should bisect between those versions - roughly 12 steps - and
find the cause. Once the cause is found I can fix it (in the best
possible way for you) and we can go upstream together, if needed.
Sometimes the cause turns out to have been fixed already in the
development tree.
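
For reference, this is roughly the bisect workflow I have in mind,
assuming v2.3.0 is good and v2.5.0 is bad (tag names and step count
are approximate):

    # inside a checkout of the upstream QEMU tree
    git bisect start
    git bisect bad  v2.5.0     # first version known to show the regression
    git bisect good v2.3.0     # last version known to be good
    # build the commit git suggests, publish it in the PPA, run the
    # test, then mark the result and repeat (~12 iterations) until the
    # offending commit is found:
    git bisect good            # or: git bisect bad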

For this last comment:

"
To this end we do support the idea of making the virtio-net queue size 
configurable in both directions (Rx and Tx) in upstream Qemu.
"

On THAT I do agree with you. Changing the default is tricky, but
providing a mechanism to configure it - up to a hardcoded maximum of
1024 - could be beneficial. Still, I think we are working on hypotheses
without doing the one tangible thing we can: bisect the test and find
the exact cause.
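
Just to illustrate the kind of knob I have in mind - a hypothetical
command-line shape only, not an existing or agreed-upon interface
(the property names and netdev id are made up):

    -device virtio-net-pci,netdev=vhostuser1,rx_queue_size=1024,tx_queue_size=1024

Capping it at a hardcoded 1024 would keep the default behaviour
unchanged for everyone who does not opt in.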

What do you think? Can I start bisecting QEMU and providing you new
packages in a PPA to test? You comment with #good or #bad based on the
test results, I upload the next version, you upgrade qemu, test again,
and so on.

How does that sound ?
