Hi Florin,

Thanks for the response. Please see inline:
On Fri, Sep 11, 2020 at 10:42 AM Florin Coras <fcoras.li...@gmail.com> wrote:

> Hi Vijay,
>
> Cool experiment. More inline.
>
> > On Sep 11, 2020, at 9:42 AM, Vijay Sampath <vsamp...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am using iperf3 as a client on an Ubuntu 18.04 Linux machine
> > connected to another server running VPP using 100G NICs. Both servers
> > are Intel Xeon with 24 cores.
>
> May I ask the frequency for those cores? Also what type of nic are you
> using?

2700 MHz. The nic is a Pensando DSC100.

> > I am trying to push 100G traffic from the iperf Linux TCP client by
> > starting 10 parallel iperf connections on different port numbers and
> > pinning them to different cores on the sender side. On the VPP
> > receiver side I have 10 worker threads and 10 rx-queues in dpdk, and
> > running iperf3 using the VCL library as follows:
> >
> > taskset 0x00400 sh -c "LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libvcl_ldpreload.so VCL_CONFIG=/etc/vpp/vcl.conf iperf3 -s -4 -p 9000" &
> > taskset 0x00800 sh -c "LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libvcl_ldpreload.so VCL_CONFIG=/etc/vpp/vcl.conf iperf3 -s -4 -p 9001" &
> > taskset 0x01000 sh -c "LD_PRELOAD=/usr/lib/x86_64
> > ...
> >
> > MTU is set to 9216 everywhere, and TCP MSS set to 8200 on the client:
> >
> > taskset 0x0001 iperf3 -c 10.1.1.102 -M 8200 -Z -t 6000 -p 9000
> > taskset 0x0002 iperf3 -c 10.1.1.102 -M 8200 -Z -t 6000 -p 9001
> > ...
>
> Could you try first with only 1 iperf server/client pair, just to see
> where performance is with that?

These are the numbers I get:

rx-fifo-size 65536:    ~8G
rx-fifo-size 524288:    22G
rx-fifo-size 4000000:   25G
rx-fifo-size 8000000:   25G

> > I see that I am not able to push beyond 50-60G. I tried different
> > sizes for the vcl rx-fifo-size - 64K, 256K and 1M. With 1M fifo size,
> > I see that tcp latency as reported on the client increases, but not a
> > significant improvement in bandwidth. Are there any suggestions to
> > achieve 100G bandwidth? I am using a vpp build from master.
>
> Depends a lot on how many connections you’re running in parallel. With
> only one connection, buffer occupancy might go up, so 1-2MB might be
> better.

With the current run I increased this to 8000000.

> Could you also check how busy vpp is with “clear run”, wait at least 1
> second and then “show run”. That will give you per node/worker vector
> rates. If they go above 100 vectors/dispatch the workers are busy so you
> could increase their number and implicitly the number of sessions. Note
> however that RSS is not perfect so you can get more connections on one
> worker.

I am attaching the output of this to the email (10 iperf connections, 4
worker threads).

> > Pasting below the output of vpp and vcl conf files:
> >
> > cpu {
> >   main-core 0
> >   workers 10
>
> You can pin vpp’s workers to cores with: corelist-workers c1,c3-cN to
> avoid overlap with iperf. You might want to start with 1 worker and work
> your way up from there. In my testing, 1 worker should be enough to
> saturate a 40Gbps nic with 1 iperf connection. Maybe you need a couple
> more to reach 100, but I wouldn’t expect more.

I changed this to 4 cores and pinned them as you suggested.

> > }
> >
> > buffers {
> >   buffers-per-numa 65536
>
> Unless you need the buffers for something else, 16k might be enough.
>
> >   default data-size 9216
>
> Hm, no idea about the impact of this on performance. Session layer can
> build chained buffers so you can also try with this removed to see if it
> changes anything.

For now, I kept this setting.
> > }
> >
> > dpdk {
> >   dev 0000:15:00.0 {
> >     name eth0
> >     num-rx-queues 10
>
> Keep this in sync with the number of workers.
>
> >   }
> >   enable-tcp-udp-checksum
>
> This enables sw checksum. For better performance, you’ll have to remove
> it. It will be needed however if you want to turn tso on.

OK, removed.

> > }
> >
> > session {
> >   evt_qs_memfd_seg
> > }
> > socksvr { socket-name /tmp/vpp-api.sock }
> >
> > tcp {
> >   mtu 9216
> >   max-rx-fifo 262144
>
> This is only used to compute the window scale factor. Given that your
> fifos might be larger, I would remove it. By default the value is 32MB
> and gives a wnd_scale of 10 (should be okay).

When I was testing with the Linux TCP stack on both sides, I was
restricting the receive window per socket using net.ipv4.tcp_rmem to get
better latency numbers. I want to mimic that with VPP. What is the right
way to restrict the rcv_wnd on VPP?

> > }
> >
> > vcl.conf:
> > vcl {
> >   max-workers 1
>
> No need to constrain it.
>
> >   rx-fifo-size 262144
> >   tx-fifo-size 262144
>
> As previously mentioned you can configure them to be larger.

Made them 8000000. Attaching the show run output with 4 workers to this
email. Still getting about 50G.

Thanks,
Vijay
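For reference, the combined config after the changes above should look
roughly like this. It is a sketch rather than a verbatim dump: the
corelist-workers values are taken from the lcores in the attached show run,
num-rx-queues is assumed to have been brought in sync with the 4 workers as
suggested, and max-rx-fifo / vcl max-workers are left out here pending the
rcv_wnd question.

cpu {
  main-core 0
  corelist-workers 1-4
}

buffers {
  buffers-per-numa 65536
  default data-size 9216
}

dpdk {
  dev 0000:15:00.0 {
    name eth0
    num-rx-queues 4
  }
  # enable-tcp-udp-checksum removed, so checksum is offloaded to the nic
}

session { evt_qs_memfd_seg }
socksvr { socket-name /tmp/vpp-api.sock }

tcp {
  mtu 9216
}

vcl.conf:

vcl {
  rx-fifo-size 8000000
  tx-fifo-size 8000000
}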
vpp# show run
Thread 0 vpp_main (lcore 0)
Time 3.0, 10 sec internal node vector rate 0.00   loops/sec 1279589.36
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls     Vectors   Suspends     Clocks   Vectors/Call
cnat-scanner-process             any wait           0           0          3     5.74e3           0.00
dpdk-process                     any wait           0           0          1     1.84e4           0.00
fib-walk                         any wait           0           0          2     9.05e3           0.00
ikev2-manager-process            any wait           0           0          3     7.19e3           0.00
ip6-mld-process                  any wait           0           0          3     1.58e3           0.00
ip6-ra-process                   any wait           0           0          3     3.69e3           0.00
session-queue-main                polling      526430           0          0     1.06e2           0.00
session-queue-process            any wait           0           0          3     4.99e3           0.00
unix-cli-local:1                   active           0           0          9     3.41e5           0.00
unix-epoll-input                  polling      526430           0          0     1.22e4           0.00
wg-timer-manager                 any wait           0           0        300     3.33e2           0.00
---------------
Thread 1 vpp_wk_0 (lcore 1)
Time 3.0, 10 sec internal node vector rate 140.92   loops/sec 655.72
  vector rates in 1.6822e5, out 1.9057e3, drop 6.6564e-1, punt 0.0000e0
             Name                 State         Calls     Vectors   Suspends     Clocks   Vectors/Call
dpdk-input                        polling        1952      499712          0     4.82e2         256.00
drop                               active           2           2          0     4.20e3           1.00
dsc1-output                        active        1952        5726          0     5.18e2           2.93
dsc1-tx                            active        1952        5726          0     8.42e2           2.93
error-drop                         active           2           2          0     3.13e3           1.00
ethernet-input                     active        1952      499712          0     2.49e1         256.00
ip4-input-no-checksum              active        1952      499710          0     2.24e1         255.99
ip4-local                          active        1952      499710          0     2.62e1         255.99
ip4-lookup                         active        3904      505436          0     2.87e1         129.47
ip4-rewrite                        active        1952        5726          0     2.77e2           2.93
llc-input                          active           2           2          0     5.83e3           1.00
session-queue                     polling        1952        5726          0     2.24e3           2.93
snap-input                         active           1           1          0     8.42e3           1.00
tcp4-established                   active        1952      499710          0     1.25e4         255.99
tcp4-input                         active        1952      499710          0     7.05e1         255.99
tcp4-output                        active        1952        5726          0     1.00e3           2.93
unix-epoll-input                  polling           2           0          0     3.30e3           0.00
---------------
Thread 2 vpp_wk_1 (lcore 2)
Time 3.0, 10 sec internal node vector rate 124.65   loops/sec 838.86
  vector rates in 1.7466e5, out 8.1308e2, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls     Vectors   Suspends     Clocks   Vectors/Call
dpdk-input                        polling        2443      522342          0     4.95e2         213.81
dsc1-output                        active        2443        2443          0     1.25e3           1.00
dsc1-tx                            active        2443        2443          0     1.71e3           1.00
ethernet-input                     active        2443      522342          0     2.56e1         213.81
ip4-input-no-checksum              active        2443      522342          0     2.42e1         213.81
ip4-local                          active        2443      522342          0     2.69e1         213.81
ip4-lookup                         active        3253      524785          0     2.79e1         161.32
ip4-rewrite                        active        2443        2443          0     5.84e2           1.00
session-queue                     polling        2443        2443          0     3.77e3           1.00
tcp4-established                   active        2443      522342          0     1.19e4         213.81
tcp4-input                         active        2443      522342          0     6.78e1         213.81
tcp4-output                        active        2443        2443          0     2.03e3           1.00
unix-epoll-input                  polling           3           0          0     2.88e3           0.00
---------------
Thread 3 vpp_wk_2 (lcore 3)
Time 3.0, 10 sec internal node vector rate 140.97   loops/sec 663.92
  vector rates in 1.6899e5, out 1.9101e3, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls     Vectors   Suspends     Clocks   Vectors/Call
dpdk-input                        polling        1961      502016          0     4.82e2         256.00
dsc1-output                        active        1961        5739          0     5.05e2           2.93
dsc1-tx                            active        1961        5739          0     9.02e2           2.93
ethernet-input                     active        1961      502016          0     2.41e1         256.00
ip4-input-no-checksum              active        1961      502016          0     2.28e1         256.00
ip4-local                          active        1961      502016          0     2.58e1         256.00
ip4-lookup                         active        3922      507755          0     2.86e1         129.46
ip4-rewrite                        active        1961        5739          0     2.90e2           2.93
session-queue                     polling        1961        5739          0     2.12e3           2.93
tcp4-established                   active        1961      502016          0     1.24e4         256.00
tcp4-input                         active        1961      502016          0     7.00e1         256.00
tcp4-output                        active        1961        5739          0     7.43e2           2.93
unix-epoll-input                  polling           2           0          0     3.58e3           0.00
---------------
Thread 4 vpp_wk_3 (lcore 4)
Time 3.0, 10 sec internal node vector rate 140.97   loops/sec 1016.21
  vector rates in 2.5521e5, out 2.8436e3, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls     Vectors   Suspends     Clocks   Vectors/Call
dpdk-input                        polling        2962      758272          0     4.67e2         256.00
dsc1-output                        active        2962        8544          0     3.96e2           2.88
dsc1-tx                            active        2962        8544          0     7.67e2           2.88
ethernet-input                     active        2962      758272          0     2.17e1         256.00
ip4-input-no-checksum              active        2962      758272          0     2.11e1         256.00
ip4-local                          active        2962      758272          0     2.55e1         256.00
ip4-lookup                         active        5924      766816          0     2.59e1         129.44
ip4-rewrite                        active        2962        8544          0     2.49e2           2.88
session-queue                     polling        2962        8544          0     1.66e3           2.88
tcp4-established                   active        2962      758272          0     8.03e3         256.00
tcp4-input                         active        2962      758272          0     7.07e1         256.00
tcp4-output                        active        2962        8544          0     5.75e2           2.88
unix-epoll-input                  polling           2           0          0     4.58e3           0.00
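A rough back-of-the-envelope check on the numbers above (my arithmetic,
assuming close to one full ~8200-byte MSS of payload per received packet):

  aggregate input rate:  1.6822e5 + 1.7466e5 + 1.6899e5 + 2.5521e5 ~= 7.7e5 pkts/sec
  approximate goodput:   7.7e5 pkts/sec * 8200 bytes * 8 bits ~= 50 Gbit/s

which lines up with the ~50G iperf reports. tcp4-established is the dominant
per-packet cost (~1.2e4 clocks on three of the four workers); at 2.7 GHz that
alone limits a worker to roughly 2.2e5 packets/sec, so the bottleneck looks
like per-worker CPU in the TCP path rather than fifo sizing.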