Re: [vpp-dev] Questions about no-multi-seg
> From an old message, you said that setting no-multi-seg will give good
> performance: https://lists.fd.io/g/vpp-dev/message/18489
> My question is: when I do not need a jumbo MTU in my scenario, is setting
> no-multi-seg and no-tx-checksum-offload in startup.conf a good option?

It depends on the drivers you use and your workload, but yes, this should help DPDK select the highest-performance code path.

Best
ben

View/Reply Online (#21982): https://lists.fd.io/g/vpp-dev/message/21982
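For reference, a minimal sketch of the startup.conf fragment being discussed; the `dpdk { ... }` stanza and both options are from the thread, while the PCI address is a placeholder you would replace with your own device:

```
dpdk {
  # assume a single segment per packet (no chained buffers);
  # lets DPDK pick vectorized PMD code paths
  no-multi-seg

  # skip TX checksum offload setup
  no-tx-checksum-offload

  # placeholder device address - use your NIC's PCI address
  dev 0000:01:00.0
}
```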
[vpp-dev] Questions about no-multi-seg
Hi Ben,

From an old message, you said that setting no-multi-seg will give good performance: https://lists.fd.io/g/vpp-dev/message/18489

My question is: when I do not need a jumbo MTU in my scenario, is setting no-multi-seg and no-tx-checksum-offload in startup.conf a good option?

Thanks,
Guangming
zhangguangm...@baicells.com

View/Reply Online (#21981): https://lists.fd.io/g/vpp-dev/message/21981
Re: [vpp-dev] Questions about no-multi-seg option in startup.conf
Hi Ben,

Thank you for your quick reply and your insightful explanation. Your answers are really helpful.

Thanks,
Jieqiang Wang

-----Original Message-----
From: Benoit Ganne (bganne)
Sent: Friday, January 8, 2021 4:20 PM
To: Jieqiang Wang; vpp-dev
Cc: Lijian Zhang; Tianyu Li; Govindarajan Mohandoss; nd
Subject: RE: Questions about no-multi-seg option in startup.conf
Re: [vpp-dev] Questions about no-multi-seg option in startup.conf
Yes, this is expected: 'no-multi-seg' tells DPDK that all packets will consist of one and only one buffer (no chained buffers). Lots of DPDK PMDs support vectorization (SSE, NEON...) only for this simpler case. When you set this option, DPDK can select the vectorized PMD instead of the more generic, non-vectorized (and hence slower) version.

You can see in the 'show hardware' output that with 'no-multi-seg' you get 'Vector Neon' for both RX and TX, whereas otherwise you get it for RX only. So in the 2nd case the DPDK TX is slower - this is reflected in the 'show run' output, where in the 2nd case the cycles/packet for 'eth0-tx' and 'eth1-tx' grow from ~0.8 to ~1.8. As the TX cost is bigger, VPP is slower, processes bigger vectors, and delivers fewer pps overall.

ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io On Behalf Of Jieqiang Wang
> Sent: vendredi 8 janvier 2021 04:26
> To: vpp-dev
> Cc: Lijian Zhang; Tianyu Li; Govindarajan Mohandoss; nd
> Subject: [vpp-dev] Questions about no-multi-seg option in startup.conf
>
> Hi VPP dev,
>
> I was trying to do some benchmarking on VPP and found out that the
> no-multi-seg option in startup.conf has an impact on both the performance
> and what the runtime stats show.
> The VPP version is v21.01-rc0~547-gf0419a0c8, the DPDK version is
> DPDK 20.11.0.
[vpp-dev] Questions about no-multi-seg option in startup.conf
Hi VPP dev,

I was trying to do some benchmarking on VPP and found out that the no-multi-seg option in startup.conf has an impact on both the performance and what the runtime stats show.

The VPP version is v21.01-rc0~547-gf0419a0c8, the DPDK version is DPDK 20.11.0.

With the no-multi-seg option set in startup.conf, the runtime shows the following:

Thread 1 vpp_wk_0 (lcore 2)
Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
  vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls     Vectors  Suspends   Clocks  Vectors/Call
dpdk-input                       polling      112527    14403456         0  9.34e-1        128.00
eth0-output                       active      112527     7201728         0  2.04e-1         64.00
eth0-tx                           active      112527     7201728         0  7.86e-1         64.00
eth1-output                       active      112527     7201728         0  1.91e-1         64.00
eth1-tx                           active      112527     7201728         0  7.93e-1         64.00
ethernet-input                    active      225054    14403456         0  5.65e-1         64.00
ip4-input-no-checksum             active      112527    14403456         0  3.83e-1        128.00
ip4-lookup                        active      112527    14403456         0  5.34e-1        128.00
ip4-rewrite                       active      112527    14403456         0  5.73e-1        128.00
unix-epoll-input                 polling         110           0         0   2.84e1          0.00

Output for command 'show hardware-interfaces':

vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
eth0                               1     up   eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: none
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
                       ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active:        none
    tx burst mode: Vector Neon
    rx burst mode: Vector Neon

Without the no-multi-seg option in startup.conf, the runtime shows as below:

Thread 1 vpp_wk_0 (lcore 2)
Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
  vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
             Name                 State        Calls     Vectors  Suspends   Clocks  Vectors/Call
dpdk-input                       polling       34157    17488384         0  9.51e-1        512.00
eth0-output                       active       34157     8744192         0  1.66e-1        256.00
eth0-tx                           active       34157     8744192         0   1.84e0        256.00
eth1-output                       active       34157     8744192         0  1.71e-1        256.00
eth1-tx                           active       34157     8744192         0   1.88e0        256.00
ethernet-input                    active       68314    17488384         0  4.60e-1        256.00
ip4-input-no-checksum             active       68314    17488384         0  3.58e-1        256.00
ip4-lookup                        active       68314    17488384         0  5.29e-1        256.00
ip4-rewrite                       active       68314    17488384         0  5.78e-1        256.00
unix-epoll-input                 polling          33           0         0   3.39e1          0.00

Output for command 'show hardware-interfaces':

vpp# sh
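The per-node stats quoted in this thread quantify Ben's explanation: only the TX nodes get markedly more expensive without no-multi-seg. A quick back-of-the-envelope check (all numbers copied from the 'show run' outputs above; nothing else assumed):

```python
# Clocks/packet for the TX nodes, from the two 'show run' outputs.
tx_single_seg = {"eth0-tx": 7.86e-1, "eth1-tx": 7.93e-1}  # with no-multi-seg
tx_multi_seg  = {"eth0-tx": 1.84e0,  "eth1-tx": 1.88e0}   # without no-multi-seg

# TX cost more than doubles when the non-vectorized PMD path is used.
for node in tx_single_seg:
    ratio = tx_multi_seg[node] / tx_single_seg[node]
    print(f"{node}: {ratio:.2f}x clocks/packet without no-multi-seg")

# Which shows up as an overall throughput drop in the 'vector rates' lines.
pps_single, pps_multi = 1.2537e7, 1.0186e7
print(f"throughput drop: {(1 - pps_multi / pps_single) * 100:.0f}%")
```

This prints ratios of about 2.3-2.4x on the TX nodes and a throughput drop of about 19%, consistent with the reply's observation that the slower TX makes VPP process bigger vectors at fewer pps.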