Re: [vpp-dev] ip4-load-balance
+1. Check "vpp# show hardware" and "$ cat /proc/cpuinfo" to make sure that the selected cores are on the right NUMA socket.

HTH,
Dave

-----Original Message-----
From: vpp-dev@lists.fd.io On Behalf Of Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via Lists.Fd.Io
Sent: Wednesday, November 14, 2018 11:51 AM
To: Ray Kinsella; vpp-dev@lists.fd.io; Neale Ranns (nranns); Damjan Marion (damarion)
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] ip4-load-balance

> Any clues as to what might be causing the delta?
>> using FD.io VPP 18.07

If you are still using that, the performance might depend on the NUMA node used. There was a placement bug, and the fix [1] was merged just before 18.10.

Vratko.

[1] https://gerrit.fd.io/r/15483
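As a rough illustration of the core-placement check suggested above, the sketch below groups logical CPUs by socket from the text of /proc/cpuinfo. It assumes the x86 Linux layout with "processor" and "physical id" fields; the function name and the sample text are made up for illustration, and the socket id is only a stand-in for the NUMA node (the two can differ on some systems).

```python
def cores_by_socket(cpuinfo_text):
    """Map each 'physical id' (socket) to the 'processor' numbers found on it."""
    sockets = {}
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU stanzas
        key, value = key.strip(), value.strip()
        if key == "processor":
            processor = int(value)
        elif key == "physical id" and processor is not None:
            sockets.setdefault(int(value), []).append(processor)
    return sockets


# Hypothetical sample, not taken from any system in this thread.
SAMPLE = """\
processor\t: 0
physical id\t: 0

processor\t: 1
physical id\t: 0

processor\t: 2
physical id\t: 1

processor\t: 3
physical id\t: 1
"""

layout = cores_by_socket(SAMPLE)
print(layout)
```

On a real machine the input would be the contents of /proc/cpuinfo, and the resulting map can be compared against the worker cores listed in startup.conf and the socket the NIC is attached to.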
Re: [vpp-dev] ip4-load-balance
> Any clues as to what might be causing the delta?
>> using FD.io VPP 18.07

If you are still using that, the performance might depend on the NUMA node used. There was a placement bug, and the fix [1] was merged just before 18.10.

Vratko.

[1] https://gerrit.fd.io/r/15483

-----Original Message-----
From: vpp-dev@lists.fd.io On Behalf Of Ray Kinsella
Sent: Wednesday, 2018-November-14 15:03
To: vpp-dev@lists.fd.io; Neale Ranns (nranns); Damjan Marion (damarion)
Subject: Re: [vpp-dev] ip4-load-balance

CSIT is measuring roughly 12mpps on HSW. When I measure with an equivalent system, I get 10.5mpps on HSW. What I am finding is that DPDK is more efficient on CSIT HSW.

CSIT
  Graph Node                     Clocks/Vector
  dpdk-input                     6.07e1
  TenGigabitEtherneta/0/0-tx     4.09e1

My System
  Graph Node                     Clocks/Vector
  dpdk-input                     8.30e1
  TenGigabitEthernet83/0/1-tx    5.51e1

So between TX and Input, I am burning roughly 36 extra clocks per vector (22.3 in dpdk-input plus 14.2 in TX). That ends up being the essential difference between 10.5mpps and 12mpps.

I tried this with both binaries I built myself and the binaries in package cloud, and I get the same result.

Any clues as to what might be causing the delta? (I verified that my system setup was equivalent to CSIT.)

startup.conf @ https://pastebin.com/tuYZ6xt8

Ray K
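The clock figures quoted in the message above can be turned into a quick sanity check. The sketch below computes the per-node difference and then estimates the throughput that difference would cost. The 2.3 GHz core frequency is an assumption (it is not stated in the thread), as is the idea that a single saturated worker spends its whole cycle budget on packets.

```python
# Clocks per packet for the two nodes quoted in the message above.
csit = {"dpdk-input": 60.7, "tx": 40.9}
mine = {"dpdk-input": 83.0, "tx": 55.1}

# Extra cost per node, and in total, on the slower system.
extra = {node: round(mine[node] - csit[node], 1) for node in csit}
total_extra = round(sum(extra.values()), 1)

# ASSUMPTION: a 2.3 GHz core, fully busy, forwarding 12 Mpps on the CSIT side.
CORE_HZ = 2.3e9
baseline_clocks = CORE_HZ / 12e6           # cycle budget per packet at 12 Mpps
slower_clocks = baseline_clocks + total_extra
estimated_mpps = CORE_HZ / slower_clocks / 1e6

print(extra, total_extra)
print(f"estimated rate with the extra clocks: {estimated_mpps:.2f} Mpps")
```

Under these assumptions the estimate comes out at about 10 Mpps, which is in the same range as the 10.5mpps measured, so the input and TX nodes plausibly account for most of the gap.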
Re: [vpp-dev] ip4-load-balance
Hi Ray,

By way of explanation: without the interface the route is recursive, i.e. traffic for 20.20.20.0/24 is sent via 1.1.1.2. So the forwarding can be thought of as happening in two stages: first the lookup for the packet’s destination, which matches 20.20.20.0/24, then the ‘lookup’ on the result of 1.1.1.2. One of the prime functions of the FIB is to resolve and cache that second lookup during route programming, so the data-plane can simply follow the result. The ip4-load-balance node is where this occurs.

/neale

-----Original Message-----
From: Ray Kinsella
Date: Tuesday, 13 November 2018 at 19:09
To: "vpp-dev@lists.fd.io", "Neale Ranns (nranns)"
Subject: Re: [vpp-dev] ip4-load-balance

Mystery solved,

I was missing the interface on the IP Route.

ip route add count 1 20.20.20.0/24 via 1.1.1.2 TenGigabitEthernet83/0/1

Ray K
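The two-stage behaviour described above can be sketched with a toy model. This is not VPP's FIB implementation: the class, its methods, and the 1.1.1.2/32 adjacency entry are all hypothetical, chosen only to show a recursive route being resolved when it is programmed, so that forwarding just follows the cached result through one extra step.

```python
import ipaddress


class ToyFib:
    """Toy FIB that resolves recursive routes at programming time."""

    def __init__(self):
        self.entries = {}  # ip_network -> forwarding result (dict)

    def _lookup(self, addr):
        """Longest-prefix match over the programmed entries."""
        ip = ipaddress.ip_address(addr)
        matches = [net for net in self.entries if ip in net]
        if not matches:
            raise LookupError(f"no route to {addr}")
        return self.entries[max(matches, key=lambda net: net.prefixlen)]

    def add_route(self, prefix, next_hop, interface=None):
        net = ipaddress.ip_network(prefix)
        if interface is not None:
            # Next hop plus interface: the entry points straight at an adjacency.
            self.entries[net] = {"kind": "adjacency", "interface": interface}
        else:
            # Recursive route: resolve the next hop now and cache the result.
            self.entries[net] = {"kind": "recursive",
                                 "resolved": self._lookup(next_hop)}

    def forward(self, dst):
        """Return the graph nodes visited and the egress interface."""
        nodes = ["ip4-lookup"]
        result = self._lookup(dst)
        while result["kind"] == "recursive":
            nodes.append("ip4-load-balance")  # follow the cached resolution
            result = result["resolved"]
        nodes.append("ip4-rewrite")
        return nodes, result["interface"]


fib = ToyFib()
# ASSUMPTION: an adjacency for the next hop exists on this interface.
fib.add_route("1.1.1.2/32", "1.1.1.2", interface="TenGigabitEthernet83/0/1")

fib.add_route("20.20.20.0/24", "1.1.1.2")  # no interface: recursive
recursive_path = fib.forward("20.20.20.5")

fib.add_route("20.20.20.0/24", "1.1.1.2",
              interface="TenGigabitEthernet83/0/1")  # with interface
direct_path = fib.forward("20.20.20.5")

print(recursive_path)
print(direct_path)
```

In this model the recursive form visits the extra node between lookup and rewrite, while the form that names the interface goes from lookup to rewrite directly, which mirrors the difference between the two pipelines reported in the thread.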
Re: [vpp-dev] ip4-load-balance
Mystery solved,

I was missing the interface on the IP Route.

ip route add count 1 20.20.20.0/24 via 1.1.1.2 TenGigabitEthernet83/0/1

Ray K

On 13/11/2018 15:39, Ray Kinsella wrote:
> Folks,
>
> I have been configuring my system to get something comparable to CSIT
> performance and I am a few mpps off at the moment, using FD.io VPP 18.07.
>
> I duplicated the IPv4 Base and Scale Test Cases (and environment)
> locally and I end up with an extra graph node, 'ip4-load-balance', in my
> pipeline.
>
> CSIT records the following pipeline in Test Operational Data.
> https://docs.fd.io/csit/rls1807/report/test_operational_data
>
> Thread 1 vpp_wk_0 (lcore 2)
> Time 5.7, average vectors/node 245.79, last 128 main loops 13.03 per node 151.64
>   vector rates in 1.2082e7, out 1.2082e7, drop 0.e0, punt 0.e0
> Name                            State       Calls     Vectors Suspends  Clocks  Vectors/Call
> TenGigabitEtherneta/0/0-output  active     140125    34429184        0  8.41e0        245.70
> TenGigabitEtherneta/0/0-tx      active     140125    34429184        0  4.09e1        245.70
> TenGigabitEtherneta/0/1-output  active     140071    34428928        0  8.58e0        245.79
> TenGigabitEtherneta/0/1-tx      active     140071    34428928        0  3.93e1        245.79
> dpdk-input                      polling    140580    68858112        0  6.07e1        489.81
> ip4-input-no-checksum           active     280127    68858112        0  2.05e1        245.81
> ip4-lookup                      active     280127    68858112        0  3.03e1        245.81
> ip4-rewrite                     active     280127    68858112        0  2.92e1        245.81
>
> I get the following pipeline, with the additional graph node
> ip4-load-balance.
>
> Thread 2 vpp_wk_1 (lcore 20)
> Time 188.9, average vectors/node 256.00, last 128 main loops 14.00 per node 256.00
>   vector rates in 9.3287e6, out 9.3287e6, drop 0.e0, punt 0.e0
> Name                            State       Calls     Vectors Suspends  Clocks  Vectors/Call
> TenGigabitEthernet83/0/1-outpu  active    6881842  1761751552        0  8.46e0        256.00
> TenGigabitEthernet83/0/1-tx     active    6881842  1761751552        0  5.53e1        256.00
> dpdk-input                      polling   6881842  1761751552        0  8.58e1        256.00
> ip4-input-no-checksum           active    6881842  1761751552        0  2.19e1        256.00
> ip4-load-balance                active    6881842  1761751552        0  1.68e1        256.00
> ip4-lookup                      active    6881842  1761751552        0  2.80e1        256.00
> ip4-rewrite                     active    6881842  1761751552        0  2.89e1        256.00
>
> Any idea where ip4-load-balance is coming from?
>
> Ray K
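One way to read the two runtime tables above is to add up the per-node cost into a clocks-per-packet figure for each worker. The sketch below does that, weighting each node by the vectors it handled, and assumes the Clocks column is an average per packet (vector element). The helper name is made up for illustration. Multiplying by the reported vector rate gives the cycles each worker spends per second.

```python
# (node, vectors, clocks) copied from the two runtime tables above.
CSIT = [
    ("TenGigabitEtherneta/0/0-output", 34429184, 8.41),
    ("TenGigabitEtherneta/0/0-tx", 34429184, 40.9),
    ("TenGigabitEtherneta/0/1-output", 34428928, 8.58),
    ("TenGigabitEtherneta/0/1-tx", 34428928, 39.3),
    ("dpdk-input", 68858112, 60.7),
    ("ip4-input-no-checksum", 68858112, 20.5),
    ("ip4-lookup", 68858112, 30.3),
    ("ip4-rewrite", 68858112, 29.2),
]
MINE = [
    ("TenGigabitEthernet83/0/1-outpu", 1761751552, 8.46),
    ("TenGigabitEthernet83/0/1-tx", 1761751552, 55.3),
    ("dpdk-input", 1761751552, 85.8),
    ("ip4-input-no-checksum", 1761751552, 21.9),
    ("ip4-load-balance", 1761751552, 16.8),
    ("ip4-lookup", 1761751552, 28.0),
    ("ip4-rewrite", 1761751552, 28.9),
]


def clocks_per_packet(rows):
    """Average clocks per received packet, weighting each node by its vectors."""
    received = next(v for name, v, _ in rows if name == "dpdk-input")
    return sum(v * c for _, v, c in rows) / received


csit_cpp = clocks_per_packet(CSIT)
mine_cpp = clocks_per_packet(MINE)

# Cycles consumed per second = clocks per packet * packets per second.
csit_hz = csit_cpp * 1.2082e7
mine_hz = mine_cpp * 9.3287e6

# Share of the local per-packet cost spent in ip4-load-balance.
lb_share = 16.8 / mine_cpp

print(f"CSIT: {csit_cpp:.1f} clocks/packet, {csit_hz / 1e9:.3f} GHz consumed")
print(f"Mine: {mine_cpp:.1f} clocks/packet, {mine_hz / 1e9:.3f} GHz consumed")
print(f"ip4-load-balance share: {lb_share:.1%}")
```

Both workers come out at roughly the same number of cycles per second, which would be consistent with each being limited by a single fully busy core. On that reading the lower packet rate follows from the higher per-packet cost, of which ip4-load-balance is only one part.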