This is my initial code analysis:

Between 2.3 and 2.5 there are about 80 vhost changes (no merges, no
tests), of which ~30 are for vhost-user.

The most important vhost-user ones are these:

48854f57 vhost-user: fix log size
dc3db6ad vhost-user: start/stop all rings
5421f318 vhost-user: print original request on error
2b8819c6 vhost-user: modify SET_LOG_BASE to pass mmap size and offset
f6f56291 vhost user: add support of live migration
9a78a5dd vhost-user: send log shm fd along with log_base
1be0ac21 vhost-user: add vhost_user_requires_shm_log()
7263a0ad vhost-user: add a new message to disable/enable a specific virt queue.
* b931bfbf vhost-user: add multiple queue support
fc57fd99 vhost: introduce vhost_backend_get_vq_index method
e2051e9e vhost-user: add VHOST_USER_GET_QUEUE_NUM message
dcb10c00 vhost-user: add protocol feature negotiation
7305483a vhost-user: use VHOST_USER_XXX macro for switch statement
d345ed2d Revert "vhost-user: add multi queue support"
830d70db vhost-user: add multi queue support
294ce717 vhost-user: Send VHOST_RESET_OWNER on vhost stop

And these for vhost:

12b8cbac3c8 vhost: don't send RESET_OWNER at stop
25a2a920ddd vhost: set the correct queue index in case of migration with multiqueue
* 15324404f68 vhost: alloc shareable log
2ce68e4cf5b vhost: add vhost_has_free_slot() interface
0cf33fb6b49 virtio-net: correctly drop truncated packets
fc57fd9900d vhost: introduce vhost_backend_get_vq_index method
06c4670ff6d Revert "virtio-net: enable virtio 1.0"
dfb8e184db7 virtio-pci: initial virtio 1.0 support
b1506132001 vhost_net: add version_1 feature
df91055db5c virtio-net: enable virtio 1.0
* 309750fad51 vhost: logs sharing
9718e4ae362 arm_gicv2m: set kvm_gsi_direct_mapping and kvm_msi_via_irqfd_allowed

The starred vhost-user change (b931bfbf) refactors the multiple queue
support for vhost-user. I'm not entirely sure this change is related to
the problem at hand, since they are not using queues=XX on the
"-netdev" command line.

They have changed the number of virtio device queues (virtio) -
http://pastebin.ubuntu.com/24087865/ - but not the number of queues for
the virtio-net-pci device (i.e. vhost-user multiqueue, in this example).
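
For reference, this is roughly how a vhost-user multiqueue setup would
look on the QEMU command line (illustrative values only, not the
reporter's actual command line; the socket path, IDs and queue counts
are made up):

-chardev socket,id=char0,path=/var/run/vhostuser/sock0 \
-netdev type=vhost-user,id=net0,chardev=char0,queues=2 \
-device virtio-net-pci,netdev=net0,mq=on,vectors=6

Without "queues=" on the -netdev and "mq=on" on the device, the guest
only gets a single queue pair, regardless of the multiqueue refactoring.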

Possible causes of such behavior (based on QEMU changes):

- vhost-user multiple queue support refactored
  They are not using "queues=XX" on the "-netdev" command line, but the
  refactoring could still have changed some logic (to be checked).

- tx queue callback scheduling (either a timer or a QEMU AIO bottom half)
  This would happen if there wasn't enough context switching (for the
  QEMU and vhost-user threads). It could be caused by lock contention
  or by system overload (due to some other change unrelated to virtio).

* By raising the tx queue size we make the flushes longer in time, and
  that is possibly yielding a bigger throughput (by stopping the queue
  overrun). This tells us that either the buffer is small OR the flush
  is being called fewer times than it should be.
* That is why I'm focusing on this part: something either reduced the
  buffer size or is creating a bottleneck in the buffer flush, which is
  typical of the observed "burst" behavior. (See the toy flush sketch
  at the end of this list.)


- There was also a change in the vhost logging system:

* vhost logs sharing, commit: 309750fad51

* For live migration (309750fad51) they started to write the vhost
  dirty log into anonymous pages from malloc(), and, in specific cases,
  into anonymous pages from memfd_create() or pages backed by a file.
  (A minimal allocation sketch follows at the end of this list.)

* I'm not sure whether the log backend is also used when there is no
  live migration occurring (which could cause lock contention, for
  example).
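
To illustrate the shareable-log part, here is a minimal sketch of
allocating a dirty log with memfd_create() and mapping it MAP_SHARED so
the fd could be handed to a vhost-user backend. This is not the actual
QEMU code; the function name and sizes are made up:

/* Minimal sketch only -- not the actual QEMU code.
 * Allocate a vhost dirty log in an anonymous memfd and map it
 * MAP_SHARED, so the fd could be passed to a vhost-user backend
 * (e.g. together with the log base) and mapped on both sides.
 * Requires Linux >= 3.17 and glibc >= 2.27 for memfd_create(). */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void *alloc_shareable_log(size_t size, int *fd_out)
{
    int fd = memfd_create("vhost-log", 0);      /* hypothetical name */
    if (fd < 0) {
        perror("memfd_create");
        return NULL;
    }
    if (ftruncate(fd, size) < 0) {              /* size the log region */
        perror("ftruncate");
        close(fd);
        return NULL;
    }
    void *log = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);        /* shared with backend */
    if (log == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return log;
}

int main(void)
{
    int fd;
    size_t size = 1 << 20;                      /* 1 MiB, made-up size */
    void *log = alloc_shareable_log(size, &fd);
    if (!log) {
        return 1;
    }
    printf("shareable log at %p, fd=%d\n", log, fd);
    munmap(log, size);
    close(fd);
    return 0;
}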
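
And to make the tx flush reasoning concrete, a toy model (not QEMU's
actual virtio-net code; all names and numbers are invented) of a
burst-limited flush run from a deferred callback ("bottom half"). If
the ring is small or the callback runs too rarely for the guest's tx
rate, the ring overruns and packets are dropped; a larger tx queue size
hides exactly that:

/* Toy model only -- not QEMU code.  A tx ring is drained in bursts by
 * a deferred callback; too small a ring or too few callback runs per
 * time window leads to overruns and dropped packets. */
#include <stdbool.h>
#include <stdio.h>

#define TX_RING_SIZE 256          /* hypothetical tx queue depth      */
#define TX_BURST      64          /* packets flushed per callback run */

struct txq {
    unsigned head, tail;          /* ring indices                     */
    unsigned dropped;
};

static unsigned txq_len(const struct txq *q)
{
    return q->tail - q->head;
}

/* Guest side: enqueue one packet, dropping it on overrun. */
static void txq_enqueue(struct txq *q)
{
    if (txq_len(q) == TX_RING_SIZE) {
        q->dropped++;             /* queue overrun: packet lost       */
        return;
    }
    q->tail++;
}

/* Backend side: one callback run flushes at most TX_BURST packets and
 * must be rescheduled to drain whatever is left. */
static bool txq_flush_bh(struct txq *q)
{
    unsigned n = txq_len(q);
    if (n > TX_BURST) {
        n = TX_BURST;
    }
    q->head += n;                 /* "transmit" n packets             */
    return txq_len(q) > 0;        /* true: another flush is needed    */
}

int main(void)
{
    struct txq q = {0};

    /* The guest produces faster than the backend flushes: 512 packets
     * arrive while only two flush callbacks run in the same window. */
    for (int i = 0; i < 512; i++) {
        txq_enqueue(&q);
    }
    txq_flush_bh(&q);
    txq_flush_bh(&q);

    printf("still queued=%u flushed=%u dropped=%u\n",
           txq_len(&q), q.head, q.dropped);
    return 0;
}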
