Hi VPP maintainers,
VPP recently upgraded its DPDK version to DPDK 21.08, which includes two
optimization patches [1][2] from the Arm DPDK team. When the mbuf-fast-free flag
is set, the two patches add a code path that accelerates mbuf freeing in the PMD
TX path of the i40e driver, which shows a clear performance improvement in DPDK
L3FWD benchmarking results.
I tried to verify the benefit these patches bring to VPP, but found that the
mbuf-fast-free flag is not enabled in VPP+DPDK by default.
Using DPDK mbuf-fast-free comes with some constraints, e.g.,
* All mbufs to be freed must come from the same mempool
* ref_cnt == 1 must always hold in the mbuf metadata when user applications
  call DPDK rte_eth_tx_burst()
* No TX checksum offload
* No jumbo frames
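For illustration, the constraints above can be written as a per-burst check. This is only a sketch: struct sketch_mbuf and burst_is_fast_free_safe() are hypothetical names, not DPDK or VPP APIs, and "no jumbo frames" is interpreted here as "single-segment mbufs", which is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a DPDK mbuf; not the real rte_mbuf. */
struct sketch_mbuf {
    const void *pool;    /* mempool the buffer was allocated from */
    uint16_t refcnt;     /* reference count */
    uint16_t nb_segs;    /* number of chained segments */
    bool needs_tx_csum;  /* TX checksum offload requested for this packet */
};

/* Returns true if every buffer in the burst meets the listed constraints.
 * An empty burst is trivially safe. */
static bool burst_is_fast_free_safe(const struct sketch_mbuf *pkts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].pool != pkts[0].pool)
            return false;  /* mixed mempools */
        if (pkts[i].refcnt != 1)
            return false;  /* buffer is shared or cloned */
        if (pkts[i].nb_segs != 1)
            return false;  /* multi-segment (assumed meaning of "jumbo") */
        if (pkts[i].needs_tx_csum)
            return false;  /* TX checksum offload in use */
    }
    return true;
}
```

A driver that can rely on these conditions does not need to inspect each mbuf before returning it to the mempool, which is where the saving in the TX free path would come from.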
However, VPP vector mode (set by adding the 'no-tx-checksum-offload' and
'no-multi-seg' parameters to the dpdk section of startup.conf) seems to satisfy
all of these requirements. So I made the small code change shown below to set
the mbuf-fast-free flag by default in VPP vector mode, and benchmarked IPv4
routing test cases with 1 flow and 10k flows. The results show both a throughput
improvement and fewer CPU cycles spent in the DPDK transmit function.
Any thoughts on enabling the mbuf-fast-free TX offload flag in VPP vector mode?
Any feedback is welcome :)
Code Changes:
diff --git a/src/plugins/dpdk/device/init.c b/src/plugins/dpdk/device/init.c
index f7c1cc106..0fbdd2317 100644
--- a/src/plugins/dpdk/device/init.c
+++ b/src/plugins/dpdk/device/init.c
@@ -398,6 +398,8 @@ dpdk_lib_init (dpdk_main_t * dm)
           xd->port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
           xd->flags |= DPDK_DEVICE_FLAG_MAYBE_MULTISEG;
         }
+      if (dm->conf->no_multi_seg && dm->conf->no_tx_checksum_offload)
+        xd->port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
       xd->tx_q_used = clib_min (dev_info.max_tx_queues, tm->n_vlib_mains);
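
One possible refinement of the change above would be to set the flag only when the device also advertises it (DPDK reports TX offload capabilities in the tx_offload_capa field of struct rte_eth_dev_info). The sketch below is standalone and does not use VPP or DPDK headers: struct sketch_dpdk_conf, sketch_tx_offloads() and the placeholder bit value are hypothetical, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration only; real code would use
 * DEV_TX_OFFLOAD_MBUF_FAST_FREE from DPDK's rte_ethdev.h. */
#define SKETCH_TX_OFFLOAD_MBUF_FAST_FREE (UINT64_C(1) << 16)

/* Hypothetical subset of the dpdk options from startup.conf. */
struct sketch_dpdk_conf {
    bool no_multi_seg;
    bool no_tx_checksum_offload;
};

/* Returns the TX offload mask with fast-free added when both
 * vector-mode options are set and the device reports support. */
static uint64_t sketch_tx_offloads(const struct sketch_dpdk_conf *conf,
                                   uint64_t tx_offload_capa,
                                   uint64_t offloads)
{
    bool vector_mode = conf->no_multi_seg && conf->no_tx_checksum_offload;
    bool supported =
        (tx_offload_capa & SKETCH_TX_OFFLOAD_MBUF_FAST_FREE) != 0;

    if (vector_mode && supported)
        offloads |= SKETCH_TX_OFFLOAD_MBUF_FAST_FREE;
    return offloads;
}
```

Whether such a capability check is needed depends on how VPP already filters the requested offloads, which I have not verified here.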
Benchmark Results:
1 flow, bidirectional
Throughput (Mpps):
             Original   Patched   Ratio
N1SDP        11.62      12.44     +7.06%
ThunderX2     9.52      10.16     +6.30%
Dell 8268    17.82      18.20     +2.13%
CPU cycles overhead of the DPDK transmit function (recorded with perf):
             Original   Patched   Delta (percentage points)
N1SDP        13.08%     5.53%     -7.55
ThunderX2    11.01%     6.68%     -4.33
Dell 8268    10.78%     7.35%     -3.43
10k flows, bidirectional
Throughput (Mpps):
             Original   Patched   Ratio
N1SDP         8.48       9.0      +6.13%
ThunderX2     8.84       9.26     +4.75%
Dell 8268    15.04      15.40     +2.39%
CPU cycles overhead of the DPDK transmit function (recorded with perf):
             Original   Patched   Delta (percentage points)
N1SDP        10.58%     4.54%     -6.04
ThunderX2    12.92%     6.63%     -6.29
Dell 8268    10.36%     7.97%     -2.39
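
For readers who want to reproduce the last column of each table: the throughput figures appear to be relative changes, while the CPU-cycle figures appear to be plain differences between the two percentages. A minimal sketch of both calculations follows; the function names are hypothetical and the interpretation is inferred from the numbers rather than stated anywhere.

```c
/* Relative throughput change in percent (e.g. Mpps before vs. after). */
static double relative_gain_pct(double original, double patched)
{
    return (patched - original) / original * 100.0;
}

/* Change in a share-of-cycles figure, in percentage points. */
static double share_delta_pp(double original_pct, double patched_pct)
{
    return patched_pct - original_pct;
}
```

With the N1SDP 1-flow figures, relative_gain_pct(11.62, 12.44) gives about 7.06 and share_delta_pp(13.08, 5.53) gives about -7.55.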
[1] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=be8ff6210851fdacbe00033259b7dc5426e95589
[2] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=95e7bb6a5fc9e371e763b11ec15786e4d574ef8e
Best Regards,
Jieqiang Wang