Hi Stephen,
When I replace rte_eth_tx_burst() with a bulk mbuf free, I do not see the tx ring
fill up. I think this is valuable information. Also, perf analysis of the tx
thread shows common_ring_mp_enqueue and rte_atomic32_cmpset, which I did not
expect to see since I created all the Tx rings as SP and SC (and the worker and
ack rings as well, essentially all 16 rings).
Perf report snippet:
+   57.25%  DPDK_TX_1  test  [.] common_ring_mp_enqueue
+   25.51%  DPDK_TX_1  test  [.] rte_atomic32_cmpset
+    9.13%  DPDK_TX_1  test  [.] i40e_xmit_pkts
+    6.50%  DPDK_TX_1  test  [.] rte_pause
     0.21%  DPDK_TX_1  test  [.] rte_mempool_ops_enqueue_bulk.isra.0
     0.20%  DPDK_TX_1  test  [.] dpdk_tx_thread
The traffic load is a constant 10 Gbps of 84-byte packets with no idle periods.
The burst size of 512 is the desired number of mbufs per dequeue; however, the
tx thread will transmit whatever it can get from the Tx ring.
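For context, the tx thread loop is roughly the following (a simplified sketch
only; TX_BURST_SIZE and the identifiers here are illustrative, not the exact
code):

#include <rte_ring.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define TX_BURST_SIZE 512

static void
tx_thread_loop(struct rte_ring *tx_ring, uint16_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *pkts[TX_BURST_SIZE];

        for (;;) {
                /* May return fewer than TX_BURST_SIZE mbufs; transmit
                 * whatever was obtained from the Tx ring. */
                unsigned int n = rte_ring_dequeue_burst(tx_ring,
                                (void **)pkts, TX_BURST_SIZE, NULL);
                if (n == 0)
                        continue;

                uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);

                /* Free anything the PMD did not accept so mbufs are not
                 * leaked. */
                if (sent < n)
                        rte_pktmbuf_free_bulk(&pkts[sent], n - sent);
        }
}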
I think resolving why the perf analysis shows the ring as MP, when it was
created as SP / SC, should resolve this issue.
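For reference, all of the rings are created along these lines (the name and
size below are illustrative, not the exact values we use):

#include <rte_ring.h>
#include <rte_lcore.h>

/* Single-producer enqueue and single-consumer dequeue requested explicitly. */
struct rte_ring *ring = rte_ring_create("tx_ring_1", 4096, rte_socket_id(),
                                        RING_F_SP_ENQ | RING_F_SC_DEQ);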
Thanks,
ed
-----Original Message-----
From: Stephen Hemminger <[email protected]>
Sent: Tuesday, July 8, 2025 9:47 AM
To: Lombardo, Ed <[email protected]>
Cc: Ivan Malov <[email protected]>; users <[email protected]>
Subject: Re: dpdk Tx falling short
On Tue, 8 Jul 2025 04:10:05 +0000
"Lombardo, Ed" <[email protected]> wrote:
> Hi Stephen,
> I ensured that every pipeline stage that enqueues or dequeues mbufs uses the
> burst version; perf showed the repercussions of doing single-mbuf dequeues
> and enqueues.
> For the receive stage rte_eth_rx_burst() is used, and in the Tx stage we use
> rte_eth_tx_burst(). The burst size used in tx_thread for the dequeue burst is
> 512 mbufs.
You might try buffering like rte_eth_tx_buffer does.
You would need to add an additional mechanism to ensure that the buffer gets
flushed when you detect an idle period.
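Something along these lines (a sketch only; TX_BUFFER_PKTS, FLUSH_TIMEOUT_US
and the helper names are illustrative assumptions, not existing code):

#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_cycles.h>
#include <rte_lcore.h>

#define TX_BUFFER_PKTS   512   /* match the desired burst size */
#define FLUSH_TIMEOUT_US 100   /* example idle threshold */

/* One-time setup of the per-queue software Tx buffer. */
static struct rte_eth_dev_tx_buffer *
setup_tx_buffer(void)
{
        struct rte_eth_dev_tx_buffer *txb;

        txb = rte_zmalloc_socket("tx_buffer",
                        RTE_ETH_TX_BUFFER_SIZE(TX_BUFFER_PKTS),
                        0, rte_socket_id());
        if (txb != NULL)
                rte_eth_tx_buffer_init(txb, TX_BUFFER_PKTS);
        return txb;
}

/* Call for each mbuf taken from the Tx ring; rte_eth_tx_buffer() only
 * invokes rte_eth_tx_burst() once TX_BUFFER_PKTS packets have
 * accumulated, so small dequeues are coalesced into full bursts. */
static void
tx_buffer_one(uint16_t port, uint16_t queue,
              struct rte_eth_dev_tx_buffer *txb, struct rte_mbuf *m,
              uint64_t *last_pkt_tsc)
{
        rte_eth_tx_buffer(port, queue, txb, m);
        *last_pkt_tsc = rte_get_timer_cycles();
}

/* The additional mechanism: flush whatever is buffered once nothing new
 * has arrived for FLUSH_TIMEOUT_US, so packets are not stranded when the
 * input goes idle. */
static void
flush_if_idle(uint16_t port, uint16_t queue,
              struct rte_eth_dev_tx_buffer *txb, uint64_t last_pkt_tsc)
{
        uint64_t timeout = rte_get_timer_hz() / US_PER_S * FLUSH_TIMEOUT_US;

        if (rte_get_timer_cycles() - last_pkt_tsc > timeout)
                rte_eth_tx_buffer_flush(port, queue, txb);
}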