Jan, you said:
" The proposal to work around the problem by using multiple vhost-user queues per port cannot solve the problem as it has two prerequisites that are not generally fulfilled: 1. The guest application needs to be able to use multiple queues and spread its Tx traffic across them. " But QEMU vhost multiple queue feature is there and it could solve your issue - if some development decision in qemu (that caused this) was made. We still cannot affirm that, since we need to find the cause (bisection is best since I don't have access to your environment AND you're using packages/patches not generally available for Ubuntu community - from upstream/customized). About this one: " 2. The OpenStack environment must support configuration of vhost multi-queue. " According to this documents: https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html https://github.com/openstack/nova/commit/9a09674220a071e51fdca7911b52c0027c01ff64 It is already supported. You have to specify hw_vif_multiqueue_enabled=True in the image. The libvirt xml will be generated - on the instantiation - with vhost queues (one per CPU). For this one: " The work-around to increase the queue length to 1024, in contrast, is completely transparent for applications and reduces the likelihood of packet drops for all types of sub-ms scale load fluctuations, no matter their cause. In general we believe that it would be good to dimension the virtio queue size roughly equal to the typical queue sizes of physical interfaces (typically ~1K packets) to avoid that the virtio queues are the weakest link in the end-to-end data path. " It didn't happen in 2.2 (or 2.3) but it happens in 2.5. Queue size - not for the virtio device but for the virtio net device (using vhost) - has always been 256 in those versions. We would be mitigating an unknown cause. That will be hard to be accepted upstream if you want to go there directly. IMHO I think we should bisect your tests - 12 steps - and find the cause. After the cause is found I can fix it (in the best possible way for you) and we can go upstream together, if needed. Sometimes the cause has already been fixed in development tree. For this last comment: " To this end we do support the idea of making the virtio-net queue size configurable in both directions (Rx and Tx) in upstream Qemu. " THAT I do agree with you. Changing the default is tricky, but, providing a mechanism to configure it - up to hardcoded 1024 - could be beneficial. Although I still think that we are going in hypothesis without doing the tangible thing we can: bisect the test and find exact cause. What do you think ? Can I start bisecting QEMU and providing you new packages in a PPA for you to test ? You provide comments saying #good or #bad based on test results. I upload another version, you upgrade qemu, test again, and so on. How does that sound ? -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1668829 Title: Performance regression from qemu 2.3 to 2.5 for vhost-user with ovs + dpdk To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1668829/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs