Re: [vpp-dev] Questions about no-multi-seg

2022-10-10 Thread Benoit Ganne (bganne) via lists.fd.io
> From an old message, you said that setting no-multi-seg gives good
> performance: https://lists.fd.io/g/vpp-dev/message/18489
> My question is: when I do not need Jumbo MTU in my scenario, is setting
> no-multi-seg and no-tx-checksum-offload in startup.conf a good option?

It depends on the drivers you use and your workload, but yes, this should help
DPDK select the highest-performance code.
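For reference, a minimal sketch of the two options in the dpdk section of startup.conf. The comments paraphrase the stock startup.conf template shipped with VPP releases from this period; whether a vectorized RX/TX path is actually selected still depends on the PMD, as noted above.

```
dpdk {
  ## Single-buffer (unchained) packets only: many PMDs can then use their
  ## vectorized RX/TX paths, but jumbo MTU is no longer supported.
  no-multi-seg

  ## Disable UDP/TCP TX checksum offload; typically needed together with
  ## no-multi-seg to get the faster vector PMDs.
  no-tx-checksum-offload
}
```

After restarting VPP, the 'rx burst mode' and 'tx burst mode' lines of 'show hardware-interfaces' show which burst functions the PMD selected.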

Best
ben

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#21982): https://lists.fd.io/g/vpp-dev/message/21982
Mute This Topic: https://lists.fd.io/mt/94231354/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Questions about no-multi-seg

2022-10-10 Thread Guangming
Hi Ben,
   From an old message, you said that setting no-multi-seg gives good
performance: https://lists.fd.io/g/vpp-dev/message/18489
  My question is: when I do not need Jumbo MTU in my scenario, is setting
no-multi-seg and no-tx-checksum-offload in startup.conf a good option?

Thanks 
Guangming


zhangguangm...@baicells.com




Re: [vpp-dev] Questions about no-multi-seg option in startup.conf

2021-01-11 Thread Jieqiang Wang
Hi Ben,

Thank you for your quick reply and your insightful explanation. Your answers 
are really helpful.

Thanks,
Jieqiang Wang
-----Original Message-----
From: Benoit Ganne (bganne) 
Sent: Friday, January 8, 2021 4:20 PM
To: Jieqiang Wang ; vpp-dev 
Cc: Lijian Zhang ; Tianyu Li ; 
Govindarajan Mohandoss ; nd 
Subject: RE: Questions about no-multi-seg option in startup.conf


Re: [vpp-dev] Questions about no-multi-seg option in startup.conf

2021-01-08 Thread Benoit Ganne (bganne) via lists.fd.io
Yes, this is expected: 'no-multi-seg' tells DPDK that every packet will consist
of exactly one buffer (no chained buffers). Many DPDK PMDs support
vectorization (SSE, NEON...) only for this simpler case. When you set this
option, DPDK can select the vectorized PMD instead of the more generic,
non-vectorized (and hence slower) version.
You can see in the 'show hardware' output that with 'no-multi-seg' you
get 'Vector NEON' for both RX and TX, whereas otherwise you only get it for RX,
not for TX. So in the 2nd case, DPDK TX is slower. This is reflected in the
'show run' output, where in the 2nd case the cycles/packet for 'eth0-tx' and
'eth1-tx' grow from 0.8 to 1.8. As the TX cost is higher, VPP is slower,
processes bigger vectors and achieves fewer pps overall.

ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io  On Behalf Of Jieqiang Wang
> Sent: Friday, January 8, 2021 04:26
> To: vpp-dev 
> Cc: Lijian Zhang ; Tianyu Li ;
> Govindarajan Mohandoss ; nd 
> Subject: [vpp-dev] Questions about no-multi-seg option in startup.conf

[vpp-dev] Questions about no-multi-seg option in startup.conf

2021-01-07 Thread Jieqiang Wang
Hi VPP dev,

I was doing some benchmarking on VPP and found that the no-multi-seg option
in startup.conf affects both the performance and the 'show runtime' output.
The VPP version is v21.01-rc0~547-gf0419a0c8, and the DPDK version is DPDK 20.11.0.
With the no-multi-seg option set in startup.conf, the runtime output is as
follows:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
  vector rates in 1.2537e7, out 1.2537e7, drop 0.e0, punt 0.e0
Name                   State     Calls    Vectors  Suspends   Clocks  Vectors/Call
dpdk-input             polling  112527   14403456         0  9.34e-1        128.00
eth0-output            active   112527    7201728         0  2.04e-1         64.00
eth0-tx                active   112527    7201728         0  7.86e-1         64.00
eth1-output            active   112527    7201728         0  1.91e-1         64.00
eth1-tx                active   112527    7201728         0  7.93e-1         64.00
ethernet-input         active   225054   14403456         0  5.65e-1         64.00
ip4-input-no-checksum  active   112527   14403456         0  3.83e-1        128.00
ip4-lookup             active   112527   14403456         0  5.34e-1        128.00
ip4-rewrite            active   112527   14403456         0  5.73e-1        128.00
unix-epoll-input       polling     110          0         0   2.84e1          0.00

Output for command 'show hardware-interfaces':
vpp# sh hardware-interfaces
Name                Idx    Link  Hardware
eth0                 1      up   eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: none
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
                       ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active:        none
    tx burst mode: Vector Neon
    rx burst mode: Vector Neon

Without the no-multi-seg option in startup.conf, the runtime output is as follows:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
  vector rates in 1.0186e7, out 1.0186e7, drop 0.e0, punt 0.e0
Name                   State     Calls    Vectors  Suspends   Clocks  Vectors/Call
dpdk-input             polling   34157   17488384         0  9.51e-1        512.00
eth0-output            active    34157    8744192         0  1.66e-1        256.00
eth0-tx                active    34157    8744192         0   1.84e0        256.00
eth1-output            active    34157    8744192         0  1.71e-1        256.00
eth1-tx                active    34157    8744192         0   1.88e0        256.00
ethernet-input         active    68314   17488384         0  4.60e-1        256.00
ip4-input-no-checksum  active    68314   17488384         0  3.58e-1        256.00
ip4-lookup             active    68314   17488384         0  5.29e-1        256.00
ip4-rewrite            active    68314   17488384         0  5.78e-1        256.00
unix-epoll-input       polling      33          0         0   3.39e1          0.00

Output for command 'show hardware-interfaces':
vpp# sh