Hi Ben,

Thanks for your quick feedback. A few comments inline.

Best Regards,
Jieqiang Wang

-----Original Message-----
From: Benoit Ganne (bganne) <bga...@cisco.com>
Sent: Friday, September 17, 2021 3:34 PM
To: Jieqiang Wang <jieqiang.w...@arm.com>; vpp-dev <vpp-dev@lists.fd.io>
Cc: Lijian Zhang <lijian.zh...@arm.com>; Honnappa Nagarahalli 
<honnappa.nagaraha...@arm.com>; Govindarajan Mohandoss 
<govindarajan.mohand...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Tianyu 
Li <tianyu...@arm.com>; Feifei Wang <feifei.wa...@arm.com>; nd <n...@arm.com>
Subject: RE: Enable DPDK tx offload flag mbuf-fast-free on VPP vector mode

Hi Jieqiang,

This looks like an interesting optimization but you need to check that the 
'mbufs to be freed should be coming from the same mempool' rule holds true. 
This won't be the case on NUMA systems (VPP creates 1 buffer pool per NUMA 
node). This should be easy to check with e.g. 'vec_len (vm->buffer_main->buffer_pools) == 1'.
>>> Jieqiang: That's a really good point. As you said, it holds true on SMP 
>>> systems, and we can check it by verifying that the number of buffer pools 
>>> equals 1. But I am wondering whether this check is too strict: if the 
>>> worker CPUs and the NICs used reside in the same NUMA node, I think the 
>>> mbufs come from the same mempool and we still meet the requirement. What 
>>> do you think?
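>>> To make that explicit, here is a minimal sketch of the guard we could use 
>>> (the vec_len() check is your suggestion; the rest mirrors the patch quoted 
>>> below; the placement inside dpdk_lib_init() is an assumption):
>>>
>>>   /* hypothetical placement inside dpdk_lib_init () */
>>>   vlib_main_t *vm = vlib_get_main ();
>>>
>>>   /* Enable fast-free only when provably safe: a single buffer pool
>>>      (no per-NUMA split), no TX checksum offload, no multi-seg. */
>>>   if (dm->conf->no_multi_seg && dm->conf->no_tx_checksum_offload
>>>       && vec_len (vm->buffer_main->buffer_pools) == 1)
>>>     xd->port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;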

For the rest, I think we do not use DPDK mbuf refcounting at all as we maintain 
our own anyway, but someone more knowledgeable than me should confirm.
>>> Jieqiang: This echoes the experiments (IPv4 multicast and L2 flood) I 
>>> have done: all the mbufs in the two test cases are copied instead of 
>>> reference-counted. But this also needs a double-check from VPP experts, 
>>> as you mentioned.
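>>> For the experiments, one way to validate that assumption at runtime is a 
>>> debug-only check just before rte_eth_tx_burst() (a hypothetical 
>>> instrumentation sketch, not part of the proposed patch; 'mb' stands for 
>>> each mbuf about to be enqueued):
>>>
>>>   /* Fast-free contract: refcnt must be 1 and each mbuf must be a
>>>      single segment. */
>>>   ASSERT (rte_mbuf_refcnt_read (mb) == 1);
>>>   ASSERT (mb->nb_segs == 1);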

I'd be curious to see if we can measure a real performance difference in CSIT.
>>> Jieqiang: Let me trigger some performance test cases in CSIT and come back 
>>> to you with related performance figures.

Best
ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Jieqiang
> Wang
> Sent: Friday, September 17, 2021 6:07 AM
> To: vpp-dev <vpp-dev@lists.fd.io>
> Cc: Lijian Zhang <lijian.zh...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Govindarajan Mohandoss
> <govindarajan.mohand...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>;
> Tianyu Li <tianyu...@arm.com>; Feifei Wang <feifei.wa...@arm.com>; nd
> <n...@arm.com>
> Subject: [vpp-dev] Enable DPDK tx offload flag mbuf-fast-free on VPP
> vector mode
>
> Hi VPP maintainers,
>
>
>
> Recently VPP upgraded its DPDK version to DPDK 21.08, which includes two
> optimization patches [1][2] from the Arm DPDK team. When the
> mbuf-fast-free flag is set, the two patches add a code path that
> accelerates mbuf freeing in the i40e PMD TX path, which shows a clear
> performance improvement in DPDK L3FWD benchmarking results.
>
>
>
> I tried to verify the benefits that those optimization patches can
> bring to VPP, but found that the mbuf-fast-free flag is not enabled in
> VPP+DPDK by default.
>
> Applying DPDK mbuf-fast-free comes with some constraints (a simplified
> sketch of what they enable follows this list), e.g.:
>
> *     mbufs to be freed must come from the same mempool
> *     ref_cnt == 1 must always hold in the mbuf metadata when the
>       application calls DPDK rte_eth_tx_burst ()
> *     no TX checksum offload
> *     no multi-segment mbufs (i.e. no jumbo frames)
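> Under these constraints the PMD can skip per-mbuf refcount handling and
> segment walking entirely. As a rough illustration (a simplified sketch,
> not the actual i40e code; fast_free_done_mbufs is a made-up name), the
> free path collapses to one bulk put:
>
>   #include <rte_mbuf.h>
>   #include <rte_mempool.h>
>
>   /* Return a batch of completed TX mbufs straight to their mempool.
>      Safe only because the app guarantees refcnt == 1, single-segment
>      mbufs, and a single shared mempool. */
>   static void
>   fast_free_done_mbufs (struct rte_mbuf **mbufs, unsigned int n)
>   {
>     if (n == 0)
>       return;
>     /* All mbufs share one mempool by contract: one bulk put suffices. */
>     rte_mempool_put_bulk (mbufs[0]->pool, (void **) mbufs, n);
>   }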
>
> But VPP vector mode (set by adding the 'no-tx-checksum-offload' and
> 'no-multi-seg' parameters to the dpdk section of startup.conf) seems to
> satisfy all of these requirements. So I made a few code changes, shown
> below, to set the mbuf-fast-free flag by default in VPP vector mode and
> benchmarked IPv4 routing test cases with 1 flow/10k flows. The results
> show both a throughput improvement and CPU cycles saved in the DPDK
> transmit function.
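> For reference, the vector-mode configuration assumed throughout is just
> the following startup.conf fragment (a minimal sketch; all other
> settings left at their defaults):
>
>   dpdk {
>     no-tx-checksum-offload
>     no-multi-seg
>   }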
>
>
>
> So, any thoughts on enabling the mbuf-fast-free tx offload flag in VPP
> vector mode? Any feedback is welcome :)
>
>
>
> Code Changes:
>
>
>
> diff --git a/src/plugins/dpdk/device/init.c b/src/plugins/dpdk/device/init.c
> index f7c1cc106..0fbdd2317 100644
> --- a/src/plugins/dpdk/device/init.c
> +++ b/src/plugins/dpdk/device/init.c
> @@ -398,6 +398,8 @@ dpdk_lib_init (dpdk_main_t * dm)
>            xd->port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
>            xd->flags |= DPDK_DEVICE_FLAG_MAYBE_MULTISEG;
>          }
> +      if (dm->conf->no_multi_seg && dm->conf->no_tx_checksum_offload)
> +       xd->port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
>
>        xd->tx_q_used = clib_min (dev_info.max_tx_queues, tm->n_vlib_mains);
>
>
>
> Benchmark Results:
>
>
>
> 1 flow, bidirectional
>
> Throughput (Mpps):
>
>               Original   Patched   Change
> N1SDP            11.62     12.44   +7.06%
> ThunderX2         9.52     10.16   +6.30%
> Dell 8268        17.82     18.20   +2.13%
>
>
>
> CPU cycles overhead for the DPDK transmit function (recorded with perf):
>
>               Original   Patched   Change
> N1SDP           13.08%     5.53%   -7.55%
> ThunderX2       11.01%     6.68%   -4.33%
> Dell 8268       10.78%     7.35%   -3.43%
>
>
>
> 10k flows, bidirectional
>
> Throughput (Mpps):
>
>               Original   Patched   Change
> N1SDP             8.48      9.0    +6.13%
> ThunderX2         8.84      9.26   +4.75%
> Dell 8268        15.04     15.40   +2.39%
>
>
>
> CPU cycles overhead for the DPDK transmit function (recorded with perf):
>
>               Original   Patched   Change
> N1SDP           10.58%     4.54%   -6.04%
> ThunderX2       12.92%     6.63%   -6.29%
> Dell 8268       10.36%     7.97%   -2.39%
>
>
>
> [1] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=be8ff6210851fdacbe00033259b7dc5426e95589
>
> [2] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=95e7bb6a5fc9e371e763b11ec15786e4d574ef8e
>
>
>
> Best Regards,
>
> Jieqiang Wang
>