Hi,

Our application currently uses DPDK 17.11.10 (CentOS 7.9, Mellanox OFED 5.2-2.2.0.0-rhel7.9-x86_64). It is quite similar to the test-pipeline example, with io_rx, worker and io_tx threads running on separate lcores.

After upgrading to either DPDK 19.11.2 or DPDK 20.11.1, we observe a few packets being dropped between the worker threads and the tx thread. The tx thread cannot keep up with the rate at which the workers enqueue packets into the ring, so the ring eventually becomes full. This happens only briefly, right at the start of traffic generation (at 500 Mbps); no packet drops are seen after that.

The bottleneck seems to be the rte_eth_tx_burst() call in the tx thread, which appears to be much slower in DPDK 19.11 and 20.11: it takes ~150 milliseconds for the first few packets and later improves to processing times in the nanosecond range. With DPDK 17.11.10, where no drops are seen, the same call completes in ~50 microseconds for the first few packets and later in nanoseconds. The burst size used for transmission is 1.

Is there any change in implementation, or any configuration (offloads?) required for rte_eth_tx_burst() in DPDK 19.11.2, that could affect this initial performance? Only DPDK was upgraded; everything else on the platform (OS, drivers) is the same, so I am not sure what could explain the higher execution time of the burst API in DPDK 19.11/20.11.

Thanks,
Kaustubh
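
For anyone who wants to reproduce the timing, here is a minimal sketch of how per-call latency of the transmit call could be collected. It is not the exact code from the application. The names `timed_tx`, `tx_stats` and `fake_tx_burst` are hypothetical, and `fake_tx_burst` only stands in for the real `rte_eth_tx_burst(port_id, queue_id, pkts, 1)` call so that the snippet compiles without DPDK. Inside a DPDK app, `rte_rdtsc()` together with `rte_get_tsc_hz()` could be used in place of `clock_gettime()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict C modes */
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback type wrapping the transmit call under test. */
typedef uint16_t (*tx_fn_t)(void *ctx);

/* Per-thread timing statistics; zero-initialize before first use. */
struct tx_stats {
    uint64_t calls;     /* number of timed calls */
    uint64_t first_ns;  /* duration of the very first call */
    uint64_t max_ns;    /* slowest call seen so far */
    uint64_t total_ns;  /* sum of all call durations */
};

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Run one transmit call and record how long it took. */
static uint16_t timed_tx(tx_fn_t fn, void *ctx, struct tx_stats *st)
{
    uint64_t t0 = now_ns();
    uint16_t sent = fn(ctx);
    uint64_t dt = now_ns() - t0;

    if (st->calls == 0)
        st->first_ns = dt;      /* the initial call is the one of interest here */
    if (dt > st->max_ns)
        st->max_ns = dt;
    st->total_ns += dt;
    st->calls++;
    return sent;
}

/* Stand-in for rte_eth_tx_burst() with burst size 1 (assumption for this sketch). */
static uint16_t fake_tx_burst(void *ctx)
{
    (void)ctx;
    return 1;   /* pretend one packet was sent */
}
```

In the tx thread, each dequeue from the ring would be followed by `timed_tx(fake_tx_burst, NULL, &st)` (with the real transmit call substituted), and `st.first_ns`, `st.max_ns` and `st.total_ns / st.calls` printed after the run to compare the first-packet cost against the steady state across DPDK versions.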
