OK, now I provisioned 4 RX queues for 4 worker threads, and yes, all workers
are processing traffic, but the lookup rate has dropped; I am getting fewer
packets than when there were 2 workers.

I tried configuring 4 TX queues as well, but the problem is the same (fewer
packets received compared to 2 workers).
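
In case it helps to narrow this down, here is a minimal diagnostic sketch using standard VPP CLI commands (the measurement interval is arbitrary): clearing the counters and re-reading them after a short burst of traffic makes the per-worker vector rates and any drops easier to compare between the 2-worker and 4-worker runs.

vpp# clear runtime
vpp# clear errors
vpp# clear interfaces
  (let traffic run for a few seconds)
vpp# show run          <- per-thread vector rates and clocks per node
vpp# show errors       <- per-node error/drop counters
vpp# show interface    <- rx/tx/drop counters per interface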



Thanks,

Pragash Vijayaragavan
Grad Student at Rochester Institute of Technology
email : pxv3...@rit.edu
ph : 585 764 4662


On Mon, Nov 6, 2017 at 8:00 AM, Pragash Vijayaragavan <pxv3...@g.rit.edu>
wrote:

> Just 1, let me change it to 2 may be 3 and get back to you.
>
> Thanks,
>
> Pragash Vijayaragavan
> Grad Student at Rochester Institute of Technology
> email : pxv3...@rit.edu
> ph : 585 764 4662
>
>
> On Mon, Nov 6, 2017 at 7:48 AM, Dave Barach (dbarach) <dbar...@cisco.com>
> wrote:
>
>> How many RX queues did you provision? One per worker, or no supper...
>>
>>
>>
>> Thanks… Dave
>>
>>
>>
>> *From:* Pragash Vijayaragavan [mailto:pxv3...@rit.edu]
>> *Sent:* Monday, November 6, 2017 7:36 AM
>>
>> *To:* Dave Barach (dbarach) <dbar...@cisco.com>
>> *Cc:* vpp-dev@lists.fd.io; John Marshall (jwm) <j...@cisco.com>; Neale
>> Ranns (nranns) <nra...@cisco.com>; Minseok Kwon <mxk...@rit.edu>
>> *Subject:* Re: multi-core multi-threading performance
>>
>>
>>
>> Hi Dave,
>>
>>
>>
>> As per your suggestion I tried sending different traffic, and I could see
>> that one worker acts per port (hardware NIC).
>>
>>
>>
>> Is it true that multiple workers cannot work on the same port at the same
>> time?
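>>
>> For context, a port with a single RX queue is polled by exactly one worker; giving the port one RX queue per worker lets the NIC's RSS hashing spread flows across those queues, so several workers can service the same physical port. A minimal sketch of the relevant startup.conf stanza, where the PCI address and queue counts are placeholders rather than the exact values from this setup:
>>
>> dpdk {
>>   dev 0000:04:00.0 {
>>     # one RX queue per worker; NIC RSS spreads flows across the queues
>>     num-rx-queues 4
>>     # matching TX queues so workers do not share a TX queue
>>     num-tx-queues 4
>>   }
>> }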
>>
>>
>> Thanks,
>>
>>
>>
>> Pragash Vijayaragavan
>>
>> Grad Student at Rochester Institute of Technology
>>
>> email : pxv3...@rit.edu
>>
>> ph : 585 764 4662
>>
>>
>>
>>
>>
>> On Mon, Nov 6, 2017 at 7:13 AM, Pragash Vijayaragavan <pxv3...@g.rit.edu>
>> wrote:
>>
>> Thanks Dave,
>>
>>
>>
>> let me try it out real quick and get back to you.
>>
>>
>> Thanks,
>>
>>
>>
>> Pragash Vijayaragavan
>>
>> Grad Student at Rochester Institute of Technology
>>
>> email : pxv3...@rit.edu
>>
>> ph : 585 764 4662
>>
>>
>>
>>
>>
>> On Mon, Nov 6, 2017 at 7:11 AM, Dave Barach (dbarach) <dbar...@cisco.com>
>> wrote:
>>
>> Incrementing / random src/dst addr/port....
>>
>>
>>
>> Thanks… Dave
>>
>>
>>
>> *From:* Pragash Vijayaragavan [mailto:pxv3...@rit.edu]
>> *Sent:* Monday, November 6, 2017 7:06 AM
>> *To:* Dave Barach (dbarach) <dbar...@cisco.com>
>> *Cc:* vpp-dev@lists.fd.io; John Marshall (jwm) <j...@cisco.com>; Neale
>> Ranns (nranns) <nra...@cisco.com>; Minseok Kwon <mxk...@rit.edu>
>> *Subject:* Re: multi-core multi-threading performance
>>
>>
>>
>> Hi Dave,
>>
>>
>>
>> Thanks for the mail
>>
>>
>>
>> a "show run" command shows dpdk-input process on 2 of the workers but the
>> ip6-lookup process is running only on 1 worker.
>>
>>
>>
>> What configuration should be done so that all threads process traffic?
>>
>>
>>
>> This is for 4 workers and 1 main core.
>>
>>
>>
>> Pasted output :
>>
>>
>>
>>
>>
>> vpp# sh run
>> Thread 0 vpp_main (lcore 1)
>> Time 7.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
>>   vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
>>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
>> acl-plugin-fa-cleaner-process   any wait                 0               0              15          4.97e3            0.00
>> api-rx-from-ring                 active                  0               0              79          1.07e5            0.00
>> cdp-process                     any wait                 0               0               3          2.65e3            0.00
>> dpdk-process                    any wait                 0               0               2          6.77e7            0.00
>> fib-walk                        any wait                 0               0            7474          6.74e2            0.00
>> gmon-process                    time wait                0               0               1          4.24e3            0.00
>> ikev2-manager-process           any wait                 0               0               7          7.04e3            0.00
>> ip6-icmp-neighbor-discovery-ev  any wait                 0               0               7          4.67e3            0.00
>> lisp-retry-service              any wait                 0               0               3          7.21e3            0.00
>> unix-epoll-input                 polling          21655148               0               0          5.43e2            0.00
>> vpe-oam-process                 any wait                 0               0               4          5.28e3            0.00
>> ---------------
>> Thread 1 vpp_wk_0 (lcore 2)
>> Time 7.5, average vectors/node 255.99, last 128 main loops 14.00 per node 256.00
>>   vector rates in 4.1903e6, out 4.1903e6, drop 0.0000e0, punt 0.0000e0
>>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
>> FortyGigabitEthernet4/0/0-outp   active             123334        31572992               0          6.58e0          255.99
>> FortyGigabitEthernet4/0/0-tx     active             123334        31572992               0          7.20e1          255.99
>> dpdk-input                       polling            124347        31572992               0          5.49e1          253.91
>> ip6-input                        active             123334        31572992               0          2.28e1          255.99
>> ip6-load-balance                 active             123334        31572992               0          1.61e1          255.99
>> ip6-lookup                       active             123334        31572992               0          3.77e2          255.99
>> ip6-rewrite                      active             123334        31572992               0          2.02e1          255.99
>> ---------------
>> Thread 2 vpp_wk_1 (lcore 3)
>> Time 7.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
>>   vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
>>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
>> dpdk-input                       polling          83188682               0               0          1.11e2            0.00
>> ---------------
>> Thread 3 vpp_wk_2 (lcore 18)
>> Time 7.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
>>   vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
>>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
>> ---------------
>> Thread 4 vpp_wk_3 (lcore 19)
>> Time 7.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
>>   vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
>>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
>>
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Pragash Vijayaragavan
>>
>> Grad Student at Rochester Institute of Technology
>>
>> email : pxv3...@rit.edu
>>
>> ph : 585 764 4662
>>
>>
>>
>>
>>
>> On Mon, Nov 6, 2017 at 6:47 AM, Dave Barach (dbarach) <dbar...@cisco.com>
>> wrote:
>>
>> Have you verified that all of the worker threads are processing traffic?
>> Sufficiently poor RSS statistics could mean - in the limit - that only one
>> worker thread is processing traffic.
>>
>>
>>
>> Thanks… Dave
>>
>>
>>
>> *From:* Pragash Vijayaragavan [mailto:pxv3...@rit.edu]
>> *Sent:* Sunday, November 5, 2017 10:03 PM
>> *To:* vpp-dev@lists.fd.io
>> *Cc:* John Marshall (jwm) <j...@cisco.com>; Neale Ranns (nranns) <
>> nra...@cisco.com>; Dave Barach (dbarach) <dbar...@cisco.com>; Minseok
>> Kwon <mxk...@rit.edu>
>> *Subject:* multi-core multi-threading performance
>>
>>
>>
>> Hi ,
>>
>>
>>
>> We are measuring the performance of ip6 lookup in multi-core, multi-worker
>> environments, and we don't see good scaling when we keep increasing the
>> number of cores/workers.
>>
>>
>>
>> We are just changing the startup.conf file to create more workers,
>> rx-queues, socket-mem, etc. Should we do anything else to see an increase in
>> performance?
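>>
>> As a concrete reference for the startup.conf items mentioned above, a minimal sketch might look like the following (core numbers, memory sizes, and stanza contents are placeholders, not the exact values from this setup):
>>
>> cpu {
>>   main-core 1
>>   # 4 worker threads, one per core
>>   corelist-workers 2-5
>> }
>> dpdk {
>>   # hugepage memory per NUMA socket
>>   socket-mem 2048,2048
>>   # per-NIC RX/TX queue counts go in a per-device dev { ... } sub-stanza
>> }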
>>
>>
>>
>> Is there a limitation on performance even if we increase the number of
>> workers?
>>
>>
>>
>> Is it dependent on the number of hardware NICs we have? We only have 1 NIC
>> to receive the traffic.
>>
>>
>>
>>
>>
>> TIA,
>>
>>
>> Thanks,
>>
>>
>>
>> Pragash Vijayaragavan
>>
>> Grad Student at Rochester Institute of Technology
>>
>> email : pxv3...@rit.edu
>>
>> ph : 585 764 4662
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
