Hi Stephen,
When I replace rte_eth_tx_burst() with an mbuf bulk free, I do not see the Tx ring 
fill up.  I think this is valuable information.  Also, perf analysis of the Tx 
thread shows common_ring_mp_enqueue and rte_atomic32_cmpset, which I did not 
expect to see, since I created all the Tx rings as SP and SC (and the worker and 
ack rings as well, essentially all 16 rings).

Perf report snippet:
+   57.25%  DPDK_TX_1  test  [.] common_ring_mp_enqueue
+   25.51%  DPDK_TX_1  test  [.] rte_atomic32_cmpset
+    9.13%  DPDK_TX_1  test  [.] i40e_xmit_pkts
+    6.50%  DPDK_TX_1  test  [.] rte_pause
     0.21%  DPDK_TX_1  test  [.] rte_mempool_ops_enqueue_bulk.isra.0
     0.20%  DPDK_TX_1  test  [.] dpdk_tx_thread

The traffic load is a constant 10 Gbps of 84-byte packets with no idle periods.  
The burst size of 512 is the desired number of mbufs per burst; however, the Tx 
thread will transmit whatever it can get from the Tx ring.

I think resolving why the perf analysis shows the ring as MP when it was 
created as SP/SC should resolve this issue.

Thanks,
ed

-----Original Message-----
From: Stephen Hemminger <step...@networkplumber.org> 
Sent: Tuesday, July 8, 2025 9:47 AM
To: Lombardo, Ed <ed.lomba...@netscout.com>
Cc: Ivan Malov <ivan.ma...@arknetworks.am>; users <users@dpdk.org>
Subject: Re: dpdk Tx falling short

On Tue, 8 Jul 2025 04:10:05 +0000
"Lombardo, Ed" <ed.lomba...@netscout.com> wrote:

> Hi Stephen,
> I ensured that every pipeline stage that enqueues or dequeues mbufs uses 
> the burst version; perf showed the repercussions of doing single-mbuf dequeue 
> and enqueue.
> For the receive stage rte_eth_rx_burst() is used, and for the Tx stage we use 
> rte_eth_tx_burst().  The burst size used in tx_thread for the dequeue burst is 
> 512 mbufs.

You might try buffering like rte_eth_tx_buffer does.
You would need to add an additional mechanism to ensure that the buffer gets 
flushed when you detect an idle period.
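
To make the buffering idea concrete, here is a minimal sketch in plain C of a
Tx buffer that sends only when full, plus an idle counter that forces a flush
of a partial buffer. It is a model, not DPDK code: stub_tx_burst() is a
hypothetical stand-in for rte_eth_tx_burst() that accepts every packet, and
struct tx_buffer, txb_add(), txb_idle(), TXB_CAPACITY and TXB_IDLE_LIMIT are
names and values invented for illustration. In a real application
rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() would presumably take the
place of txb_add() and txb_flush().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXB_CAPACITY   32  /* hypothetical buffer size, in packets */
#define TXB_IDLE_LIMIT 4   /* hypothetical: empty polls tolerated before a forced flush */

/* Counters kept by the stub so the behaviour can be observed. */
static unsigned long sent_total;
static unsigned long burst_calls;

/* Hypothetical stand-in for rte_eth_tx_burst(): accepts every packet offered. */
static uint16_t stub_tx_burst(void **pkts, uint16_t n)
{
    (void)pkts;
    sent_total += n;
    burst_calls++;
    return n;
}

struct tx_buffer {
    uint16_t length;      /* packets currently buffered */
    uint16_t idle_polls;  /* consecutive polls that found no new packets */
    void *pkts[TXB_CAPACITY];
};

/* Send whatever is buffered. A real driver may accept fewer than requested,
 * so real code must retry or free the remainder; the stub never does that. */
static uint16_t txb_flush(struct tx_buffer *b)
{
    uint16_t sent = 0;

    if (b->length > 0)
        sent = stub_tx_burst(b->pkts, b->length);
    b->length = 0;
    b->idle_polls = 0;
    return sent;
}

/* Buffer one packet; transmit only once the buffer is full. */
static uint16_t txb_add(struct tx_buffer *b, void *pkt)
{
    assert(b->length < TXB_CAPACITY);
    b->pkts[b->length++] = pkt;
    b->idle_polls = 0;
    if (b->length == TXB_CAPACITY)
        return txb_flush(b);
    return 0;
}

/* Call when a poll of the Tx ring returned nothing. After TXB_IDLE_LIMIT
 * consecutive empty polls, flush the partial buffer so packets do not sit
 * in it indefinitely. */
static uint16_t txb_idle(struct tx_buffer *b)
{
    if (b->length == 0)
        return 0;
    if (++b->idle_polls < TXB_IDLE_LIMIT)
        return 0;
    return txb_flush(b);
}
```

The Tx thread would call txb_add() for each mbuf it dequeues and txb_idle()
whenever the dequeue returns zero. Counting empty polls is only one way to
detect an idle period; a timestamp comparison would work equally well.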
