Hi Ed,

On Tue, 8 Jul 2025, Lombardo, Ed wrote:

Hi Stephen,
When I replace rte_eth_tx_burst() with a bulk mbuf free I do not see the tx ring 
fill up.  I think this is valuable information.  Also, perf analysis of the tx 
thread shows common_ring_mp_enqueue and rte_atomic32_cmpset, which I did not 
expect to see since I created all the Tx rings as SP and SC (and the worker and 
ack rings as well, essentially all 16 rings).

Perf report snippet:
+   57.25%  DPDK_TX_1  test  [.] common_ring_mp_enqueue
+   25.51%  DPDK_TX_1  test  [.] rte_atomic32_cmpset
+    9.13%  DPDK_TX_1  test  [.] i40e_xmit_pkts
+    6.50%  DPDK_TX_1  test  [.] rte_pause
     0.21%  DPDK_TX_1  test  [.] rte_mempool_ops_enqueue_bulk.isra.0
     0.20%  DPDK_TX_1  test  [.] dpdk_tx_thread

The traffic load is a constant 10 Gbps of 84-byte packets with no idles.  The 
burst size of 512 is the desired burst of mbufs; however, the tx thread will 
transmit whatever it can get from the Tx ring.
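
For reference, the tx thread's dequeue-and-transmit step is roughly the 
following (a simplified sketch, not the actual code; tx_ring, port_id and 
queue_id are illustrative names):

    /* Drain up to 512 mbufs from the app-level Tx ring and hand
     * whatever was obtained to the NIC in one burst. */
    struct rte_mbuf *pkts[512];
    unsigned int n = rte_ring_dequeue_burst(tx_ring, (void **)pkts, 512, NULL);
    if (n > 0) {
            uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);
            if (sent < n) /* free what the NIC queue did not accept */
                    rte_pktmbuf_free_bulk(&pkts[sent], n - sent);
    }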

I think resolving why the perf analysis shows an MP ring when it has been 
created as SP / SC should resolve this issue.

The 'common_ring_mp_enqueue' is the enqueue method of the mempool variant 'ring',
that is, a mempool based on an RTE Ring internally. When you say that the ring has
been created as SP / SC, you seemingly refer to the regular RTE rings created by
your application logic, not to the internal ring of the mempool. Am I missing
something?
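
If so, note that the mempool's internal ring stays MP/MC by default. One way to
make it SP/SC as well is to create the pool with the 'ring_sp_sc' ops; a hedged
sketch with illustrative sizes (only safe if exactly one thread ever puts to and
one thread ever gets from the pool ring):

    struct rte_mempool *mp = rte_pktmbuf_pool_create_by_ops(
            "mbuf_pool", 8192, 256 /* per-lcore cache */, 0,
            RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id(), "ring_sp_sc");
    if (mp == NULL)
            rte_exit(EXIT_FAILURE, "cannot create mbuf pool\n");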

Thank you.


Thanks,
ed

-----Original Message-----
From: Stephen Hemminger <step...@networkplumber.org>
Sent: Tuesday, July 8, 2025 9:47 AM
To: Lombardo, Ed <ed.lomba...@netscout.com>
Cc: Ivan Malov <ivan.ma...@arknetworks.am>; users <users@dpdk.org>
Subject: Re: dpdk Tx falling short


On Tue, 8 Jul 2025 04:10:05 +0000
"Lombardo, Ed" <ed.lomba...@netscout.com> wrote:

Hi Stephen,
I ensured that every pipeline stage that enqueues or dequeues mbufs uses the 
burst version; perf showed the repercussions of doing single-mbuf dequeues and 
enqueues.
The receive stage uses rte_eth_rx_burst() and the Tx stage uses 
rte_eth_tx_burst().  The burst size used in the tx_thread dequeue burst is 512 
mbufs.

You might try buffering like rte_eth_tx_buffer does.
You would need to add an additional mechanism to ensure that the buffer gets 
flushed when you detect an idle period.
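
Roughly along these lines (a sketch using the standard rte_eth_tx_buffer API; 
the idle-detection policy is yours to add):

    /* Setup: room to buffer up to 512 packets for this (port, queue). */
    struct rte_eth_dev_tx_buffer *buf = rte_zmalloc_socket("tx_buf",
            RTE_ETH_TX_BUFFER_SIZE(512), 0, rte_socket_id());
    rte_eth_tx_buffer_init(buf, 512);

    /* Fast path: queues pkt; transmits automatically once 512 accumulate. */
    rte_eth_tx_buffer(port_id, queue_id, buf, pkt);

    /* When an idle period is detected, push out whatever is pending. */
    rte_eth_tx_buffer_flush(port_id, queue_id, buf);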
