Re: [vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread "Zhou You(Joe Zhou)
It sounds weird. Have you commented out "if ((counter % 600000000) == 0)"? The mod
operation may be expensive with a constant as large as 600,000,000, but it
shouldn't do so much harm to throughput :-(
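If the modulo itself were the concern, a power-of-two threshold turns the check
into a single AND; a minimal standalone sketch, not from the original mail (the
2^29 threshold, roughly 537 million, is just an example):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t counter = 0;
  const uint64_t mask = (1ULL << 29) - 1;   /* power-of-two threshold */

  for (uint64_t pkt = 0; pkt < (1ULL << 30); pkt++)
    {
      counter++;
      /* one AND + compare instead of a 64-bit modulo */
      if ((counter & mask) == 0)
        printf ("Received packets: %lu\n", (unsigned long) counter);
    }
  return 0;
}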



 --
 Best Regard
 Joe








------ Original ------
From: "Nguyễn Thế Hiếu"


[vpp-dev] rx misses observed on dpdk interface

2020-05-06 Thread zhangtj03
Hi all;
I am using an Intel 82599 (10G) NIC running VPP v20.01-release at a 10G line rate
with 128-byte packets, and I am observing Rx misses on the interfaces.

The VPP-related config is as follows:
vpp# show hardware-interfaces
Name                Idx   Link  Hardware
TenGigabitEthernet3/0/1            1     up   TenGigabitEthernet3/0/1
Link speed: 10 Gbps
Ethernet address 6c:92:bf:4d:e2:fb
Intel 82599
carrier up full duplex mtu 9206
flags: admin-up pmd maybe-multiseg subif tx-offload intel-phdr-cksum 
rx-ip4-cksum
rx: queues 1 (max 128), desc 4096 (min 32 max 4096 align 8)
tx: queues 1 (max 64), desc 4096 (min 32 max 4096 align 8)
pci: device 8086:15ab subsystem 8086: address :03:00.01 numa 0
max rx packet len: 15872
promiscuous: unicast off all-multicast on
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum outer-ipv4-cksum
vlan-filter vlan-extend jumbo-frame scatter keep-crc
rx offload active: ipv4-cksum jumbo-frame scatter
tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
tcp-tso outer-ipv4-cksum multi-segs
tx offload active: udp-cksum tcp-cksum tcp-tso multi-segs
rss avail:         ipv4-tcp ipv4-udp ipv4 ipv6-tcp-ex ipv6-udp-ex ipv6-tcp
ipv6-udp ipv6-ex ipv6
rss active:        none
tx burst function: ixgbe_xmit_pkts
rx burst function: ixgbe_recv_scattered_pkts_vec

tx frames ok                                    61218472
tx bytes ok                                   7591090528
rx frames ok                                    61218472
rx bytes ok                                   7591090528
*rx missed                                          59536*
extended stats:
rx good packets                               61218472
tx good packets                               61218472
rx good bytes                               7591090528
tx good bytes                               7591090528
rx missed errors                                 59536
rx q0packets                                  61218472
rx q0bytes                                  7591090528
tx q0packets                                  61218472
tx q0bytes                                  7591090528
rx size 128 to 255 packets                    66097941
rx total packets                              66097927
rx total bytes                              8196143588
tx total packets                              61218472
tx size 128 to 255 packets                    61218472
rx l3 l4 xsum error                          101351297
out pkts untagged                             61218472
*rx priority0 dropped                             59536*
local0                             0    down  local0
Link speed: unknown
local

cpu {
## In the VPP there is one main thread and optionally the user can create 
worker(s)
## The main thread and worker thread(s) can be pinned to CPU core(s) manually 
or automatically

## Manual pinning of thread(s) to CPU core(s)

## Set logical CPU core where main thread runs, if main core is not set
## VPP will use core 1 if available
#main-core 1

## Set logical CPU core(s) where worker threads are running
#corelist-workers 2-3,18-19
#corelist-workers 4-3,5-7

## Automatic pinning of thread(s) to CPU core(s)

## Sets number of CPU core(s) to be skipped (1 ... N-1)
## Skipped CPU core(s) are not used for pinning main thread and working 
thread(s).
## The main thread is automatically pinned to the first available CPU core and 
worker(s)
## are pinned to next free CPU core(s) after core assigned to main thread
#skip-cores 4

## Specify a number of workers to be created
## Workers are pinned to N consecutive CPU cores while skipping "skip-cores" 
CPU core(s)
## and main thread's CPU core
# workers 4

## Set scheduling policy and priority of main and worker threads

## Scheduling policy options are: other (SCHED_OTHER), batch (SCHED_BATCH)
## idle (SCHED_IDLE), fifo (SCHED_FIFO), rr (SCHED_RR)
scheduler-policy fifo

## Scheduling priority is used only for "real-time" policies (fifo and rr),
## and has to be in the range of priorities supported for a particular policy
scheduler-priority 50
}

buffers {
## Increase number of buffers allocated, needed only in scenarios with
## large number of interfaces and worker threads. Value is per numa node.
## Default is 16384 (8192 if running unprivileged)
buffers-per-numa 3

## Size of buffer data area
## Default is 2048
default data-size 2048
}

dpdk {
## Change default settings for all interfaces
dev default {
## Number of receive queues, enables RSS
## Default is 1
#num-rx-queues 4

## Number of transmit queues, Default is equal
## to number of worker threads or 1 if there are no worker threads
#num-tx-queues 4

## Number of descriptors in transmit and receive rings
## increasing or reducing number can impact performance
## Default is 1024 for both rx and tx
num-rx-desc 4096
num-tx-desc 4096

## VLAN strip offload mode for interface
## Default is off
#vlan-strip-offload on

## TCP Segment Offload
## Default is off
## To 
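The dpdk section above is truncated in the archive. For illustration only, here
is a sketch of the knobs already shown in this config that are commonly adjusted
when a single RX queue shows rx-missed at line rate; the core and queue numbers
are examples, not a recommendation for this particular box:

cpu {
  main-core 1
  corelist-workers 2-5
}

dpdk {
  dev default {
    num-rx-queues 4
    num-tx-queues 4
    num-rx-desc 2048
    num-tx-desc 2048
  }
}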

Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Elias Rudberg
OK now I updated it (https://gerrit.fd.io/r/c/vpp/+/26934).
Thanks again for your help!
/ Elias


On Thu, 2020-05-07 at 01:58 +0200, Damjan Marion wrote:
> I already pushed one, can you update it instead?
> 
> Thanks
> 


Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Damjan Marion via lists.fd.io
I already pushed one, can you update it instead?

Thanks

-- 
Damjan

> On 7 May 2020, at 01:56, Elias Rudberg  wrote:
> 
> Hi Dave and Damjan,
> 
> Here is instruction and register info:
> 
> (gdb) x/i $pc
> => 0x7fffabbbdd67 :vmovdqa64
> -0x30a0(%rbp),%ymm0
> (gdb) info registers rbp ymm0
> rbp            0x7417daf0          0x7417daf0
> ymm0   {v8_float = {0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0,
> 0xfffd}, v4_double = {0x0, 0x37, 0x0, 0xff85}, v32_int8
> = {0x0, 0x0, 0x0, 0x10, 
>0x3f, 0xf6, 0x41, 0x80, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x4b,
> 0x40, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x55, 0x0, 0x0, 0x0, 0x0, 0x10,
> 0x3f, 0xf6, 0x5e, 
>0xc0}, v16_int16 = {0x0, 0x1000, 0xf63f, 0x8041, 0x0, 0x1000,
> 0xf63f, 0x404b, 0x0, 0x1000, 0xf63f, 0x55, 0x0, 0x1000, 0xf63f,
> 0xc05e}, v8_int32 = {
>0x1000, 0x8041f63f, 0x1000, 0x404bf63f, 0x1000,
> 0x55f63f, 0x1000, 0xc05ef63f}, v4_int64 = {0x8041f63f1000,
> 0x404bf63f1000, 
>0x55f63f1000, 0xc05ef63f1000}, v2_int128 =
> {0x404bf63f10008041f63f1000,
> 0xc05ef63f1055f63f1000}}
> 
> Not sure if I understand all this but perhaps it means that the value
> in %rbp is used as a memory address, but that address 0x7417daf0 is
> not 32-byte aligned as it needs to be.
> 
> Adding __attribute__((aligned(32))) as Damjan suggests indeed seems to
> help. After that there was again a segfault in another place in the
> same file, where the same trick of adding __attribute__((aligned(32)))
> again helped.
> 
> So it seems the problem can be fixed by adding that alignment attribute
> in two places, like this:
> 
> diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
> index cf0b6bffe..324436f01 100644
> --- a/src/plugins/rdma/input.c
> +++ b/src/plugins/rdma/input.c
> @@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm,
> rdma_device_t * rd,
> 
>   if (is_mlx5dv)
> {
> -  u64 va[8];
> +  u64 va[8] __attribute__((aligned(32)));
>   mlx5dv_rwq_t *wqe = rxq->wqes + slot;
> 
>   while (n >= 1)
> @@ -488,7 +488,7 @@ rdma_device_input_inline (vlib_main_t * vm,
> vlib_node_runtime_t * node,
>   rdma_rxq_t *rxq = vec_elt_at_index (rd->rxqs, qid);
>   vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b = bufs;
>   struct ibv_wc wc[VLIB_FRAME_SIZE];
> -  u32 byte_cnts[VLIB_FRAME_SIZE];
> +  u32 byte_cnts[VLIB_FRAME_SIZE] __attribute__((aligned(32)));
>   vlib_buffer_t bt;
>   u32 next_index, *to_next, n_left_to_next, n_rx_bytes = 0;
>   int n_rx_packets, skip_ip4_cksum = 0;
> 
> Many thanks for your help!
> 
> Should I push the above as a patch to gerrit?
> 
> / Elias
> 
> 
> 
>> On Wed, 2020-05-06 at 20:38 +0200, Damjan Marion wrote:
>> Can you try this:
>> 
>> diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
>> index cf0b6bffe..b461ee27b 100644
>> --- a/src/plugins/rdma/input.c
>> +++ b/src/plugins/rdma/input.c
>> @@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm,
>> rdma_device_t * rd,
>> 
>>   if (is_mlx5dv)
>> {
>> -  u64 va[8];
>> +  u64 va[8] __attribute__((aligned(32)));
>>   mlx5dv_rwq_t *wqe = rxq->wqes + slot;
>> 
>>   while (n >= 1)
>> 
>> 
>> Thanks!
>> 
>>> On 6 May 2020, at 19:45, Elias Rudberg 
>>> wrote:
>>> 
>>> Hello VPP experts,
>>> 
>>> When trying to use the current master branch, we get a segmentation
>>> fault error. Here is what it looks like in gdb:
>>> 
>>> Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7fedf91fe700 (LWP 21309)]
>>> rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
>>> rxq=0x77edea80, is_mlx5dv=1)
>>>   at vpp/src/plugins/rdma/input.c:115
>>> 115  *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
>>> + 4));
> 


Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Elias Rudberg
Hi Dave and Damjan,

Here is instruction and register info:

(gdb) x/i $pc
=> 0x7fffabbbdd67 :   vmovdqa64
-0x30a0(%rbp),%ymm0
(gdb) info registers rbp ymm0
rbp            0x7417daf0   0x7417daf0
ymm0   {v8_float = {0x0, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0,
0xfffd}, v4_double = {0x0, 0x37, 0x0, 0xff85}, v32_int8
= {0x0, 0x0, 0x0, 0x10, 
0x3f, 0xf6, 0x41, 0x80, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x4b,
0x40, 0x0, 0x0, 0x0, 0x10, 0x3f, 0xf6, 0x55, 0x0, 0x0, 0x0, 0x0, 0x10,
0x3f, 0xf6, 0x5e, 
0xc0}, v16_int16 = {0x0, 0x1000, 0xf63f, 0x8041, 0x0, 0x1000,
0xf63f, 0x404b, 0x0, 0x1000, 0xf63f, 0x55, 0x0, 0x1000, 0xf63f,
0xc05e}, v8_int32 = {
0x1000, 0x8041f63f, 0x1000, 0x404bf63f, 0x1000,
0x55f63f, 0x1000, 0xc05ef63f}, v4_int64 = {0x8041f63f1000,
0x404bf63f1000, 
0x55f63f1000, 0xc05ef63f1000}, v2_int128 =
{0x404bf63f10008041f63f1000,
0xc05ef63f1055f63f1000}}

Not sure if I understand all this but perhaps it means that the value
in %rbp is used as a memory address, but that address 0x7417daf0 is
not 32-byte aligned as it needs to be.

Adding __attribute__((aligned(32))) as Damjan suggests indeed seems to
help. After that there was again a segfault in another place in the
same file, where the same trick of adding __attribute__((aligned(32)))
again helped.

So it seems the problem can be fixed by adding that alignment attribute
in two places, like this:

diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
index cf0b6bffe..324436f01 100644
--- a/src/plugins/rdma/input.c
+++ b/src/plugins/rdma/input.c
@@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm,
rdma_device_t * rd,
 
   if (is_mlx5dv)
 {
-  u64 va[8];
+  u64 va[8] __attribute__((aligned(32)));
   mlx5dv_rwq_t *wqe = rxq->wqes + slot;
 
   while (n >= 1)
@@ -488,7 +488,7 @@ rdma_device_input_inline (vlib_main_t * vm,
vlib_node_runtime_t * node,
   rdma_rxq_t *rxq = vec_elt_at_index (rd->rxqs, qid);
   vlib_buffer_t *bufs[VLIB_FRAME_SIZE], **b = bufs;
   struct ibv_wc wc[VLIB_FRAME_SIZE];
-  u32 byte_cnts[VLIB_FRAME_SIZE];
+  u32 byte_cnts[VLIB_FRAME_SIZE] __attribute__((aligned(32)));
   vlib_buffer_t bt;
   u32 next_index, *to_next, n_left_to_next, n_rx_bytes = 0;
   int n_rx_packets, skip_ip4_cksum = 0;

Many thanks for your help!

Should I push the above as a patch to gerrit?

/ Elias



On Wed, 2020-05-06 at 20:38 +0200, Damjan Marion wrote:
> Can you try this:
> 
> diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
> index cf0b6bffe..b461ee27b 100644
> --- a/src/plugins/rdma/input.c
> +++ b/src/plugins/rdma/input.c
> @@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm,
> rdma_device_t * rd,
> 
>if (is_mlx5dv)
>  {
> -  u64 va[8];
> +  u64 va[8] __attribute__((aligned(32)));
>mlx5dv_rwq_t *wqe = rxq->wqes + slot;
> 
>while (n >= 1)
> 
> 
> Thanks!
> 
> > On 6 May 2020, at 19:45, Elias Rudberg 
> > wrote:
> > 
> > Hello VPP experts,
> > 
> > When trying to use the current master branch, we get a segmentation
> > fault error. Here is what it looks like in gdb:
> > 
> > Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fedf91fe700 (LWP 21309)]
> > rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
> > rxq=0x77edea80, is_mlx5dv=1)
> >at vpp/src/plugins/rdma/input.c:115
> > 115   *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
> > + 4));



Re: [vpp-dev] can the pointer of a used-pool-element change before it's put back ?

2020-05-06 Thread Dave Wallace

From the developer documentation [0]:

"Standard programming error: memorize a pointer to the ith element of a 
vector, and then expand the vector. Vectors expand by 3/2, so such code 
may appear to work for a period of time. Correct code almost always 
memorizes vector indices which are invariant across reallocations."


"Pools
Vppinfra pools combine vectors and bitmaps to rapidly allocate and free 
fixed-size data structures with independent lifetimes. Pools are perfect 
for allocating per-session structures."
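As a minimal illustration of the "memorize indices, not pointers" rule (this is
not from the documentation; the session type and the surrounding locking are
hypothetical):

#include <vppinfra/pool.h>

/* Hypothetical per-session structure kept in a shared pool. */
typedef struct
{
  u32 id;
} my_session_t;

my_session_t *session_pool;   /* grows (and may move) on pool_get() */
u32 stored_index;             /* safe to memorize across reallocations */

static void
remember_session (my_session_t * s)
{
  /* Store the index, never the pointer: expanding the pool may realloc
     the underlying vector and move every element. */
  stored_index = s - session_pool;
}

static my_session_t *
lookup_session (void)
{
  /* Re-derive the pointer each time it is needed (under whatever lock
     protects the pool). */
  return pool_elt_at_index (session_pool, stored_index);
}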


Hope this helps,
-daw-

[0] 
https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/developers/infrastructure.html


On 5/6/2020 2:22 PM, Satya Murthy wrote:

Hi,

We are seeing an issue in our plugin that seems to be caused by the
change of a pointer to a pool element.

The scenario is as below. Can you please let us know if this can really
occur?


1. We have multiple workers.
2. We have one global pool of custom structures (this is a non-fixed pool).
3. This global pool is protected by a lock for addition and deletion.
4. However, it is not protected for reading. So, all workers read entries
from this pool using pool_elt_at_index().
5. All the workers keep adding and deleting elements in this pool while
taking the lock.
6. What we are seeing is that the pointer returned by
pool_elt_at_index(global_pool, index) is getting changed in between,
causing issues in our logic.

A couple of questions we have:
1) If a pool element is in use (not a free element), can we assume that,
until that element is put back into the pool, the address of the element
remains the same and will not get changed by pool resizing?

2) Will having pools that span across threads cause any issues like this
while reading the elements (even though we protect the pool for add/del of
elements using a spin lock)?


--
Thanks & Regards,
Murthy






Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Damjan Marion via lists.fd.io

Can you try this:

diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
index cf0b6bffe..b461ee27b 100644
--- a/src/plugins/rdma/input.c
+++ b/src/plugins/rdma/input.c
@@ -103,7 +103,7 @@ rdma_device_input_refill (vlib_main_t * vm, rdma_device_t * 
rd,

   if (is_mlx5dv)
 {
-  u64 va[8];
+  u64 va[8] __attribute__((aligned(32)));
   mlx5dv_rwq_t *wqe = rxq->wqes + slot;

   while (n >= 1)


Thanks!

> On 6 May 2020, at 19:45, Elias Rudberg  wrote:
> 
> Hello VPP experts,
> 
> When trying to use the current master branch, we get a segmentation
> fault error. Here is what it looks like in gdb:
> 
> Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fedf91fe700 (LWP 21309)]
> rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
> rxq=0x77edea80, is_mlx5dv=1)
>at vpp/src/plugins/rdma/input.c:115
> 115 *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
> + 4));
> (gdb) bt
> #0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
> rxq=0x77edea80, is_mlx5dv=1)
>at vpp/src/plugins/rdma/input.c:115
> #1  0x7fffa84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0,
> node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
>at vpp/src/plugins/rdma/input.c:622
> #2  0x7fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0,
> node=0x7ff5ccdfee00, frame=0x0)
>at vpp/src/plugins/rdma/input.c:647
> #3  0x760e3155 in dispatch_node (vm=0x7ff8a5d2f4c0,
> node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
>last_time_stamp=66486783453597600) at vpp/src/vlib/main.c:1235
> #4  0x760ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0,
> is_main=0) at vpp/src/vlib/main.c:1815
> #5  0x760dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0) at
> vpp/src/vlib/main.c:1996
> #6  0x761345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980) at
> vpp/src/vlib/threads.c:1795
> #7  0x75531954 in clib_calljmp () at
> vpp/src/vppinfra/longjmp.S:123
> #8  0x7fedf91fdce0 in ?? ()
> #9  0x7612cd53 in vlib_worker_thread_bootstrap_fn
> (arg=0x7fffb74ea980) at vpp/src/vlib/threads.c:584
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> This segmentation fault happens the same way every time I try to start
> VPP.
> 
> This is in Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs
> and an Intel Xeon Gold 6126 CPU.
> 
> I have looked back at recent changes and found that this problem
> started with the commit 4ba16a44 "misc: switch to clang-9" dated April
> 28. Before that we could use the master branch without this problem.
> 
> Changing back to gcc by removing clang in src/CMakeLists.txt makes the
> error go away. However, there is then instead a problem with a "symbol
> lookup error" for crypto_native_plugin.so: undefined symbol:
> crypto_native_aes_cbc_init_avx512 (that problem disappears if disabling
> the crypto_native plugin)
> 
> So, two problems:
> 
> (1) The segmentation fault itself, perhaps indicating a bug somewhere
> but seems to appear only with clang and not with gcc
> 
> (2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem
> when trying to use gcc instead of clang
> 
> What do you think about these?
> 
> As a short-term fix, is removing clang in src/CMakeLists.txt reasonable
> or is there a better/easier workaround?
> 
> Does anyone else use the rdma plugin when compiling using clang --
> perhaps that combination triggers this problem?
> 
> Best regards,
> Elias
> 



[vpp-dev] can the pointer of a used-pool-element change before it's put back ?

2020-05-06 Thread Satya Murthy
Hi,

We are seeing an issue in our plugin that seems to be caused by the change of
a pointer to a pool element.

The scenario is as below. Can you please let us know if this can really occur?

1. We have multiple workers.
2. We have one global pool of custom structures (this is a non-fixed pool).
3. This global pool is protected by a lock for addition and deletion.
4. However, it is not protected for reading. So, all workers read entries
from this pool using pool_elt_at_index().
5. All the workers keep adding and deleting elements in this pool while
taking the lock.
6. What we are seeing is that the pointer returned by
pool_elt_at_index(global_pool, index) is getting changed in between,
causing issues in our logic.

A couple of questions we have:
1) If a pool element is in use (not a free element), can we assume that, until
that element is put back into the pool, the address of the element remains the
same and will not get changed by pool resizing?

2) Will having pools that span across threads cause any issues like this while
reading the elements (even though we protect the pool for add/del of elements
using a spin lock)?

--
Thanks & Regards,
Murthy


Re: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Dave Barach via lists.fd.io
Could we please see the faulting instruction, as well as the vector register 
contents involved?

As in "x/i $pc", and the ymmX registers involved?

If the vector instruction requires alignment, "movaps" or similar, it wouldn't 
be a shock to discover an unaligned address. We've already found and fixed a 
few of those since switching to clang, and I have to say that "va + 4" raises 
all sorts of aligned vector instruction red flags...
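For illustration (not from the original mail), a tiny standalone program showing
why an explicit aligned(32) attribute matters for a stack array like va[8]:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* A plain u64 array is only guaranteed 8-byte alignment, so an aligned
     256-bit load (movaps/vmovdqa style) from it can fault; the attribute
     forces the 32-byte alignment such loads require. */
  uint64_t va_plain[8];
  uint64_t va_aligned[8] __attribute__ ((aligned (32)));

  printf ("plain   %p  32B-aligned: %d\n", (void *) va_plain,
          (int) (((uintptr_t) va_plain % 32) == 0));
  printf ("aligned %p  32B-aligned: %d\n", (void *) va_aligned,
          (int) (((uintptr_t) va_aligned % 32) == 0));
  return 0;
}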


FWIW... Dave

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Elias Rudberg
Sent: Wednesday, May 6, 2020 1:46 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Segmentation fault in rdma_device_input_refill when using 
clang compiler

Hello VPP experts,

When trying to use the current master branch, we get a segmentation fault 
error. Here is what it looks like in gdb:

Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fedf91fe700 (LWP 21309)] rdma_device_input_refill 
(vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
115   *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
+ 4));
(gdb) bt
#0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, 
rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
#1  0x7fffa84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:622
#2  0x7fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, frame=0x0)
at vpp/src/plugins/rdma/input.c:647
#3  0x760e3155 in dispatch_node (vm=0x7ff8a5d2f4c0, 
node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT, 
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
last_time_stamp=66486783453597600) at vpp/src/vlib/main.c:1235
#4  0x760ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0,
is_main=0) at vpp/src/vlib/main.c:1815
#5  0x760dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0) at
vpp/src/vlib/main.c:1996
#6  0x761345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980) at
vpp/src/vlib/threads.c:1795
#7  0x75531954 in clib_calljmp () at
vpp/src/vppinfra/longjmp.S:123
#8  0x7fedf91fdce0 in ?? ()
#9  0x7612cd53 in vlib_worker_thread_bootstrap_fn
(arg=0x7fffb74ea980) at vpp/src/vlib/threads.c:584 Backtrace stopped: previous 
frame inner to this frame (corrupt stack?)

This segmentation fault happens the same way every time I try to start VPP.

This is in Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs and an
Intel Xeon Gold 6126 CPU.

I have looked back at recent changes and found that this problem started with 
the commit 4ba16a44 "misc: switch to clang-9" dated April 28. Before that we 
could use the master branch without this problem.

Changing back to gcc by removing clang in src/CMakeLists.txt makes the error go 
away. However, there is then instead a problem with a "symbol lookup error" for 
crypto_native_plugin.so: undefined symbol:
crypto_native_aes_cbc_init_avx512 (that problem disappears if disabling the 
crypto_native plugin)

So, two problems:

(1) The segmentation fault itself, perhaps indicating a bug somewhere but seems 
to appear only with clang and not with gcc

(2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem when 
trying to use gcc instead of clang

What do you think about these?

As a short-term fix, is removing clang in src/CMakeLists.txt reasonable or is 
there a better/easier workaround?

Does anyone else use the rdma plugin when compiling using clang -- perhaps that 
combination triggers this problem?

Best regards,
Elias


[vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

2020-05-06 Thread Elias Rudberg
Hello VPP experts,

When trying to use the current master branch, we get a segmentation
fault error. Here is what it looks like in gdb:

Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fedf91fe700 (LWP 21309)]
rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
115   *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va
+ 4));
(gdb) bt
#0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
rxq=0x77edea80, is_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:115
#1  0x7fffa84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
at vpp/src/plugins/rdma/input.c:622
#2  0x7fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, frame=0x0)
at vpp/src/plugins/rdma/input.c:647
#3  0x760e3155 in dispatch_node (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
last_time_stamp=66486783453597600) at vpp/src/vlib/main.c:1235
#4  0x760ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0,
is_main=0) at vpp/src/vlib/main.c:1815
#5  0x760dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0) at
vpp/src/vlib/main.c:1996
#6  0x761345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980) at
vpp/src/vlib/threads.c:1795
#7  0x75531954 in clib_calljmp () at
vpp/src/vppinfra/longjmp.S:123
#8  0x7fedf91fdce0 in ?? ()
#9  0x7612cd53 in vlib_worker_thread_bootstrap_fn
(arg=0x7fffb74ea980) at vpp/src/vlib/threads.c:584
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This segmentation fault happens the same way every time I try to start
VPP.

This is in Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs
and an Intel Xeon Gold 6126 CPU.

I have looked back at recent changes and found that this problem
started with the commit 4ba16a44 "misc: switch to clang-9" dated April
28. Before that we could use the master branch without this problem.

Changing back to gcc by removing clang in src/CMakeLists.txt makes the
error go away. However, there is then instead a problem with a "symbol
lookup error" for crypto_native_plugin.so: undefined symbol:
crypto_native_aes_cbc_init_avx512 (that problem disappears if disabling
the crypto_native plugin)

So, two problems:

(1) The segmentation fault itself, perhaps indicating a bug somewhere
but seems to appear only with clang and not with gcc

(2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem
when trying to use gcc instead of clang

What do you think about these?

As a short-term fix, is removing clang in src/CMakeLists.txt reasonable
or is there a better/easier workaround?

Does anyone else use the rdma plugin when compiling using clang --
perhaps that combination triggers this problem?

Best regards,
Elias


Re: [vpp-dev] #hoststack

2020-05-06 Thread Florin Coras
Hi Igor, 

Could you try [1] instead of the echo apps?

Regards,
Florin

[1] https://wiki.fd.io/view/VPP/HostStack/LDP/iperf
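For reference, the approach on that wiki page amounts to running iperf3 over
VPP's host stack by LD_PRELOADing the VCL shim. A rough sketch, with illustrative
paths and addresses (the exact library location and the vcl.conf contents depend
on your build and install):

# server side
sudo VCL_CONFIG=/etc/vpp/vcl.conf \
     LD_PRELOAD=/path/to/libvcl_ldpreload.so iperf3 -s

# client side
sudo VCL_CONFIG=/etc/vpp/vcl.conf \
     LD_PRELOAD=/path/to/libvcl_ldpreload.so iperf3 -c 10.0.0.1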

> On May 6, 2020, at 9:41 AM, igorkor.3v...@gmail.com wrote:
> 
> Hi
> 
> Guys
> 
> I'm learning HostStack and have tried the example from this page:
> https://wiki.fd.io/view/VPP/HostStack
> For some unclear reason it doesn't work. At the same time, the example from
> https://wiki.fd.io/view/VPP/HostStack/EchoClientServer
> works correctly.
> 
>
> My configuration is:
> 
> Ubuntu 16.04 on vagrant.
> 
> 2 linked veths, 2 vpp instances. Configuration files are attached
> (startup1.conf and startup2.conf use vpp1.conf and vpp2.conf respectively).
> 
>
> I cloned the latest version of vpp from git at https://gerrit.fd.io/r/vpp.
> The branch is master.
> 
> When I try to initialize the server with the command:
> 
> sudo build-root/install-vpp_debug-native/vpp/bin/vpp_echo server uri 
> tcp://10.0.0.1/1000
> 
> I receive the following error log:
> 
> Timing qconnect:lastbyte
> Missing Start Timing Event (qconnect)!
> Missing End Timing Event (lastbyte)!
>  TX 
> 0 bytes (0 mbytes, 0 gbytes) in 0.00 seconds
>  RX 
> 0 bytes (0 mbytes, 0 gbytes) in 0.00 seconds
> 
> Received close on 0 streams (and 0 Quic conn)
> Received reset on 0 streams (and 0 Quic conn)
> Sent close on 0 streams (and 0 Quic conn)
> Discarded 0 streams (and 0 Quic conn)
> 
> Got accept on 0 streams (and 0 Quic conn)
> Got connected on  0 streams (and 0 Quic conn)
> 
> Failure Return Status: 42
> ECHO_FAIL_VL_API_APP_ATTACH (23): attach failed: Unsupported application 
> config | ECHO_FAIL_ATTACH_TO_VPP (17): Couldn't attach to vpp, did you run 
>  ? | ECHO_FAIL_MISSING_START_EVENT (41): Expected event 
> qconnect to happen, but it did not! | ECHO_FAIL_MISSING_END_EVENT (42): 
> Expected event lastbyte to happen, but it did not!
> 
> 1. Can you advise on this issue, please?
> 
> 2. What is the difference between vpp_echo application and "test echo" 
> command in vppctl?
> 
>
> Regards
> Igor
> 
> 



[vpp-dev] #hoststack

2020-05-06 Thread igorkor.3vium
Hi

Guys

I'm learning HostStack and have tried the example from this page:
https://wiki.fd.io/view/VPP/HostStack

For some unclear reason it doesn't work. At the same time, the example from
https://wiki.fd.io/view/VPP/HostStack/EchoClientServer works correctly.

My configuration is:

Ubuntu 16.04 on vagrant.

2 linked veths, 2 vpp instances. Configuration files are attached
(startup1.conf and startup2.conf use vpp1.conf and vpp2.conf respectively).

I cloned the latest version of vpp from git at https://gerrit.fd.io/r/vpp.
The branch is master.

When I try to initialize the server with the command:

sudo build-root/install-vpp_debug-native/vpp/bin/vpp_echo server uri 
tcp://10.0.0.1/1000

I receive the following error log:

Timing qconnect:lastbyte
Missing Start Timing Event (qconnect)!
Missing End Timing Event (lastbyte)!
 TX 
0 bytes (0 mbytes, 0 gbytes) in 0.00 seconds
 RX 
0 bytes (0 mbytes, 0 gbytes) in 0.00 seconds

Received close on 0 streams (and 0 Quic conn)
Received reset on 0 streams (and 0 Quic conn)
Sent close on 0 streams (and 0 Quic conn)
Discarded 0 streams (and 0 Quic conn)

Got accept on 0 streams (and 0 Quic conn)
Got connected on  0 streams (and 0 Quic conn)

Failure Return Status: 42
ECHO_FAIL_VL_API_APP_ATTACH (23): attach failed: Unsupported application config 
| ECHO_FAIL_ATTACH_TO_VPP (17): Couldn't attach to vpp, did you run  ? | ECHO_FAIL_MISSING_START_EVENT (41): Expected event qconnect to 
happen, but it did not! | ECHO_FAIL_MISSING_END_EVENT (42): Expected event 
lastbyte to happen, but it did not!

1. Can you advise on this issue, please?

2. What is the difference between vpp_echo application and "test echo" command 
in vppctl?

Regards
Igor


startup1.conf
Description: Binary data


startup2.conf
Description: Binary data


vpp1.conf
Description: Binary data


vpp2.conf
Description: Binary data


Re: [vpp-dev] DPO leak in various tunnel types (gtpu, geneve, vxlan, ...)

2020-05-06 Thread Andrew Yourtchenko
Nick,

Fixes are always good, especially with UTs, so thanks a lot!

I took a glance at the UTs...

one question and the bigger remark:

1) The UT stuff looks fairly straightforward, except the tests are IPv4-only -
is it enough to test only one address family? If yes, a comment inside the UT
would be cool.

2) The deletion of the ipv6 parameter in vpp_papi_provider.py seems like a stray,
unrelated change that sneaked in? (That's actually what made me -1.)

Those are the only two things I could see; the UTs seem fairly straightforward.

--a

>> On 6 May 2020, at 13:55, Nick Zavaritsky  wrote:
> 
> Dear VPP hackers,
> 
> May I kindly ask to do a code review of the proposed fix?
> 
> Thanks,
> N
> 
> 
>> On 21. Apr 2020, at 15:15, Neale Ranns (nranns)  wrote:
>>
>> Hi Nick,
>>
>> A +1 from me for the VPP change, thank you.
>> I’m all for UT too, but I’ll let some others review the UT first before I 
>> merge.
>>
>> /neale
>>
>> From:  on behalf of Nick Zavaritsky 
>> 
>> Date: Tuesday 21 April 2020 at 14:57
>> To: "vpp-dev@lists.fd.io" 
>> Subject: [vpp-dev] DPO leak in various tunnel types (gtpu, geneve, vxlan, 
>> ...)
>>
>> Dear VPP hackers,
>> 
>> We are spawning and destroying GTPU tunnels at a high rate. Only 10K tunnels 
>> ever exist simultaneously in our test.
>> 
>> With default settings, we observe out of memory error in load_balance_create 
>> after approximately .5M tunnel create commands. Apparently, load balancers 
>> are leaking.
>> 
>> As far as my understanding goes, a load_balancer is first created in 
>> fib_entry_track, to get notifications about the route changes. This is only 
>> created once for a unique DIP and the refcount is correctly decremented once 
>> the last subscription ceases.
>> 
>> The refcount is also bumped in gtpu_tunnel_restack_dpo, when next_dpo is 
>> updated. Since the latter is never reset, the refcount never drops to zero.
>> 
>> This is straightforward to exercise in CLI: create and immediately destroy 
>> several GTPU tunnels. Compare `show dpo memory` output before and after.
>> 
>> It looks like other tunnel types, namely geneve, vxlan, vxlan-gpe and 
>> vxlan-gbp are also susceptible.
>> 
>> My take was to add a call to dpo_reset in add_del_tunnel delete path. Please 
>> take a look at the patch: https://gerrit.fd.io/r/c/vpp/+/26617
>> 
>> Note: was unable to make a test case for vxlan and vxlan-gbp since they 
>> don't point next_dpo at a load balancer but rather at a dpo picked up from a 
>> bucket in the load balancer.
>> 
>> Best,
>> N 
> 
> 


Re: [vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread Nguyễn Thế Hiếu
Hi Dave &  Joe.
Thanks for your both answers. I'm glad to see them.
I will try to use vlib_node_increment_counter() function instead.

But I still wonder. I know printf() can be a performance bottleneck;
that's why I only call printf() every 600,000,000 packets.
Even if throughput is 7 Gbps, 600,000,000 packets means VPP only calls
the printf() function about once per minute.
So I think VPP throughput should be at least 7 Gbps at the start of the test
and only drop after 1 minute.
But in my case, the throughput is 300 Mbps from the start of the test.
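(As a rough sanity check of that rate, assuming small 64-byte frames, which is
not stated here: 7 Gbps over 84 bytes on the wire per frame is about 10.4 Mpps,
so 600,000,000 packets take roughly 600,000,000 / 10,400,000 ≈ 58 seconds, i.e.
about one printf() per minute.)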

I have increased 600,000,000 to 1,200,000,000 and got the same result.

Best regards!
HieuNT

On Wed, May 6, 2020 at 19:08 Dave Barach (dbarach) <
dbar...@cisco.com> wrote:

> If you want to count things in data plane nodes, use a per-node counter
> and the “show error” debug CLI to inspect it.
>
>
>
> To count every packet fed to the node dispatch function, you can bump a
> node counter once per frame:
>
>
>
>   vlib_node_increment_counter (vm, myplugin_node.index,
> MYPLUGIN_ERROR_WHATEVER, frame->n_vectors);
>
>
>
> A single printf call costs roughly the same number of clock cycles as
> processing O(10) packets from start to finish. It’s really expensive.
>
>
>
> Dave
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Nguyễn
> Thế Hiếu
> *Sent:* Wednesday, May 6, 2020 5:26 AM
> *To:* vpp-dev@lists.fd.io
> *Subject:* [vpp-dev] Why VPP performance down very much when I use
> print() function.
>
>
>
> Hi VPP team.
> I created a simple VPP node named "swap_mac". The "swap_mac" node just swaps
> the source and destination MAC addresses and sends the packet back.
> Then, I use the Pktgen tool to send packets to VPP. In VPP, the packets go
> through swap_mac -> interface-output and are finally sent back to Pktgen.
>
> I found that with this test setup, VPP throughput can reach *7 Gbps* in my
> lab. But VPP throughput is only *300 Mbps* when I add a *counter* variable
> to count the number of received packets and a printf() to print the value of
> *counter* in the "swap_mac" node function.
> My code:
>
> counter++;
> if ((counter % 600000000) == 0)
> {
>    printf ("Received packets: %ld", counter);
> }
>
> So, why does VPP throughput change from 7 Gbps to 300 Mbps when I only call
> printf() every 600,000,000 packets?
> (I have tried commenting out printf(), and VPP throughput goes up to 7 Gbps
> again.)
>
> Please help me look into it. I'm sorry for my bad English.
>


[vpp-dev] IPsec tunnel interfaces?

2020-05-06 Thread Christian Hopps
Hi, vpp-dev,

Post 19.08 seems to have removed IPsec logical interfaces.

One cannot always use transport mode IPsec.

How can I get the efficiency of route-based (FIB) IPsec without transport mode?
Adding superfluous encapsulations (wasting bandwidth) to replace this
(seemingly lost, I hope not) functionality is not an option.

Thanks,
Chris.


Re: [vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread Dave Barach via lists.fd.io
If you want to count things in data plane nodes, use a per-node counter and the 
“show error” debug CLI to inspect it.

To count every packet fed to the node dispatch function, you can bump a node 
counter once per frame:

  vlib_node_increment_counter (vm, myplugin_node.index, 
MYPLUGIN_ERROR_WHATEVER, frame->n_vectors);

A single printf call costs roughly the same number of clock cycles as 
processing O(10) packets from start to finish. It’s really expensive.
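A rough sketch of that wiring, based on the standard VPP plugin node skeleton;
the node and counter names below are placeholders for something like the
"swap_mac" node from the original mail, and the actual packet processing is
elided:

#include <vlib/vlib.h>

#define foreach_swap_mac_error _(SWAPPED, "mac-swapped packets")

typedef enum
{
#define _(sym, str) SWAP_MAC_ERROR_##sym,
  foreach_swap_mac_error
#undef _
    SWAP_MAC_N_ERROR,
} swap_mac_error_t;

static char *swap_mac_error_strings[] = {
#define _(sym, str) str,
  foreach_swap_mac_error
#undef _
};

extern vlib_node_registration_t swap_mac_node;

static uword
swap_mac_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node,
                  vlib_frame_t * frame)
{
  /* ... swap MAC addresses and enqueue the packets here ... */

  /* One counter bump per frame instead of a per-packet printf; the value
     shows up under "show error" in the debug CLI. */
  vlib_node_increment_counter (vm, swap_mac_node.index,
                               SWAP_MAC_ERROR_SWAPPED, frame->n_vectors);
  return frame->n_vectors;
}

VLIB_REGISTER_NODE (swap_mac_node) = {
  .function = swap_mac_node_fn,
  .name = "swap_mac",
  .vector_size = sizeof (u32),
  .n_errors = SWAP_MAC_N_ERROR,
  .error_strings = swap_mac_error_strings,
};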

Dave

From: vpp-dev@lists.fd.io  On Behalf Of Nguyễn Thế Hiếu
Sent: Wednesday, May 6, 2020 5:26 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Why VPP performance down very much when I use print() 
function.

Hi VPP team.
I created a simple VPP node named "swap_mac". The "swap_mac" node just swaps the
source and destination MAC addresses and sends the packet back.
Then, I use the Pktgen tool to send packets to VPP. In VPP, the packets go to the
swap_mac -> interface-output node and are finally sent back to Pktgen.

I found that with this test setup, VPP throughput can reach 7 Gbps in my lab. But
VPP throughput is only 300 Mbps when I add a counter variable to count the number
of received packets and a printf() to print the value of the counter in the
"swap_mac" node function.
My code:

counter++;
if ((counter % 600000000) == 0)
{
   printf ("Received packets: %ld", counter);
}

So, why does VPP throughput change from 7 Gbps to 300 Mbps when I only call
printf() every 600,000,000 packets?
(I have tried commenting out printf(), and VPP throughput goes up to 7 Gbps again.)

Please help me look into it. I'm sorry for my bad English.


Re: [vpp-dev] DPO leak in various tunnel types (gtpu, geneve, vxlan, ...)

2020-05-06 Thread Nick Zavaritsky
Dear VPP hackers,

May I kindly ask to do a code review of the proposed fix?

Thanks,
N


On 21. Apr 2020, at 15:15, Neale Ranns (nranns) <nra...@cisco.com> wrote:

Hi Nick,

A +1 from me for the VPP change, thank you.
I’m all for UT too, but I’ll let some others review the UT first before I merge.

/neale

From: <vpp-dev@lists.fd.io> on behalf of Nick Zavaritsky <nick.zavarit...@emnify.com>
Date: Tuesday 21 April 2020 at 14:57
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: [vpp-dev] DPO leak in various tunnel types (gtpu, geneve, vxlan, ...)

Dear VPP hackers,

We are spawning and destroying GTPU tunnels at a high rate. Only 10K tunnels 
ever exist simultaneously in our test.

With default settings, we observe out of memory error in load_balance_create 
after approximately .5M tunnel create commands. Apparently, load balancers are 
leaking.

As far as my understanding goes, a load_balancer is first created in 
fib_entry_track, to get notifications about the route changes. This is only 
created once for a unique DIP and the refcount is correctly decremented once 
the last subscription ceases.

The refcount is also bumped in gtpu_tunnel_restack_dpo, when next_dpo is 
updated. Since the latter is never reset, the refcount never drops to zero.

This is straightforward to exercise in CLI: create and immediately destroy 
several GTPU tunnels. Compare `show dpo memory` output before and after.

It looks like other tunnel types, namely geneve, vxlan, vxlan-gpe and vxlan-gbp 
are also susceptible.

My take was to add a call to dpo_reset in add_del_tunnel delete path. Please 
take a look at the patch: https://gerrit.fd.io/r/c/vpp/+/26617
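For readers following along, a sketch of the pattern that fix applies (this is
not the literal patch in the gerrit change; the tunnel type below is a stand-in
for gtpu_tunnel_t and its relatives):

#include <vnet/dpo/dpo.h>

typedef struct
{
  dpo_id_t next_dpo;		/* per-tunnel forwarding DPO */
  /* ... other tunnel state ... */
} my_tunnel_t;

static void
my_tunnel_delete (my_tunnel_t * t)
{
  /* ... unhook the tunnel from FIB / stop tracking the route ... */

  /* Without this, the DPO keeps its reference on the load balancer and
     load_balance objects leak until the heap is exhausted. */
  dpo_reset (&t->next_dpo);
}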

Note: was unable to make a test case for vxlan and vxlan-gbp since they don't 
point next_dpo at a load balancer but rather at a dpo picked up from a bucket 
in the load balancer.

Best,
N



Re: [vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread "Zhou You(Joe Zhou)
Hi Hiếu,

From your description, printf is the bottleneck of performance. You need
to understand what's going on behind "printf".
printf involves an I/O operation, and I/O operations are really slow on a
computer. When you call printf, it makes a syscall through glibc and traps
into the kernel, which then calls write() in the tty device driver.
Keep in mind that VPP/DPDK should always run in user mode to achieve
high performance; trapping into the kernel may cause low performance.

That's my opinion; you can test it by doubling or tripling 600,000,000 and
watching the corresponding result.


 --
 Best Regard
 Joe








------ Original ------
From: "Nguyễn Thế Hiếu"


[vpp-dev] Why VPP performance down very much when I use print() function.

2020-05-06 Thread Nguyễn Thế Hiếu
Hi VPP team.
I created a simple VPP node named "swap_mac". The "swap_mac" node just swaps the
source and destination MAC addresses and sends the packet back.
Then, I use the Pktgen tool to send packets to VPP. In VPP, the packets go to the
swap_mac -> interface-output node and are finally sent back to Pktgen.

I found that with this test setup, VPP throughput can reach *7 Gbps* in my lab.
But VPP throughput is only *300 Mbps* when I add a counter variable to count the
number of received packets and a printf() to print the value of the counter in
the "swap_mac" node function.
My code:

counter++;
if ((counter % 600000000) == 0)
{
    printf ("Received packets: %ld", counter);
}

So, why does VPP throughput change from 7 Gbps to 300 Mbps when I only call
printf() every 600,000,000 packets?
(I have tried commenting out printf(), and VPP throughput goes up to 7 Gbps again.)

Please help me look into it. I'm sorry for my bad English.


Re: [vpp-dev] worker barrier state

2020-05-06 Thread Christian Hopps


> On May 4, 2020, at 3:59 AM, Neale Ranns via lists.fd.io 
>  wrote:
> 
> 
> Hi Chris,
> 
> With SAs there are two scenarios to consider for inflight packets
> 1) the SA is unlinked
> 2) the SA is deleted.
> 
> We've talked at length about how to deal with 2).
> By 'unlinked' I mean that whatever config dictated that an SA be used has now 
> gone (like tunnel protection or SPD policy). An inflight packet that is 
> processed by an IPSec node would (by either scheme we discussed for 1)) 
> retrieve the SA do encrypt/decrypt and then attempt to send the packet on its 
> merry way; this is the point at which it could fail. I say 'could' because it 
> depends on how the unlink affected the vlib graph. In today's tunnel 
> protection esp_encrpyt does vnet_feature_next(), this is not going to end 
> well once esp_encrypt is no longer a feature on the arc. In tomorrow's tunnel 
> protection we'll change that:
>   https://gerrit.fd.io/r/c/vpp/+/26265
> and it should be safe. But, what if the next API removes the tunnel whilst 
> there are still inflight packets? Is it still safe? Is it still correct to 
> send encrypted tunnelled packets?

It's safe to send in-flight encrypted packets for an SA that was deleted after
they were encrypted; this reduces to buffering. After the SA is deleted, new
packets won't follow that path (they won't have the index+gen associated with
them at all because of policy). If the user configures things such that packets
which used to be encrypted should no longer be encrypted, well, then that's what
they asked for for *newly arrived* packets.

The security issue to be careful of here would be to have in-flight packets 
that progressed through valid policy such that they were "in the tunnel" and 
waiting to be encrypted, and then send them un-encrypted. Hopefully the change 
you reference above does not do that.

> 
> I think I'm coming round to the opinion that the safest way to approach this 
> is to ensure that if the SA can be found, whatever state it is in (created, 
> unlinked or deleted) then it needs to have a flag that states whether it 
> should be used or the packet dropped. We'd update this state when the SA is 
> [un]linked, with the barrier held.

Why does the SA object need to be kept around? The index's generation number
being unequal is enough to say the object is deleted. Per the concern I
mentioned above, we should make sure the packet doesn't *forget* that it's been
associated with an index+generation number (i.e., it's in the tunnel), but
that's orthogonal to whether the SA state itself is still in the pool.

> 
> On a somewhat related topic, you probably saw:
>  https://gerrit.fd.io/r/c/vpp/+/26811
> as an example of getting MP safe APIs wrong.

I made a similar change locally back when we started talking about this. :)

Thanks,
Chris.



> /neale
> 
> On 24/04/2020 16:34, "Christian Hopps"  wrote:
> 
> 
>Hi Neale,
> 
>Comments also inline...
> 
>Neale Ranns (nranns)  writes:
> 
>> Hi Chris,
>> 
>> Comments inline...
>> 
>> On 15/04/2020 15:14, "Christian Hopps"  wrote:
>> 
>>Hi Neale,
>> 
>>I agree that something like 4, is probably the correct approach. I had a 
>> side-meeting with some of the ARM folks (Govind and Honnappa), and we 
>> thought using a generation number for the state rather than just waiting 
>> "long-enough" to recycle it could work. The generation number would be the 
>> atomic value associated with the state. So consider this API:
>> 
>> - MP-safe pools store generation numbers alongside each object.
>> - When you allocate a new object from the pool you get an index and 
>> generation number.
>> - When storing the object index you also save the generation number.
>> - When getting a pointer to the object you pass the API the index and 
>> generation number and it will return NULL if the generation number did not 
>> match the one stored with the object in the pool.
>> - When you delete a pool object its generation number is incremented 
>> (with barrier).
>> 
>>The size of the generation number needs to be large enough to guarantee 
>> there is no wrap with objects still in the system that have stored the 
>> generation number. Technically this is a "long-enough" aspect of the scheme. 
>> :) One could imagine using less than 64 bits for the combination of index 
>> and generation, if that was important.
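For concreteness, a minimal standalone sketch of the index+generation handle
described above (plain C arrays stand in for vppinfra pools, and the
barrier/atomics discussed in the thread are omitted; purely illustrative):

#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024

typedef struct { uint32_t index, generation; } handle_t;

typedef struct
{
  uint32_t generation[POOL_SIZE];
  int in_use[POOL_SIZE];
  uint64_t element[POOL_SIZE];	/* the stored objects */
} gen_pool_t;

static handle_t
gen_pool_alloc (gen_pool_t * p)
{
  for (uint32_t i = 0; i < POOL_SIZE; i++)
    if (!p->in_use[i])
      {
        p->in_use[i] = 1;
        return (handle_t) { .index = i, .generation = p->generation[i] };
      }
  return (handle_t) { .index = ~0u, .generation = 0 };	/* pool full */
}

static uint64_t *
gen_pool_get (gen_pool_t * p, handle_t h)
{
  /* Stale handle: the element was freed (and maybe reused) since the
     handle was stored, so refuse to hand out a pointer. */
  if (h.index >= POOL_SIZE || p->generation[h.index] != h.generation)
    return NULL;
  return &p->element[h.index];
}

static void
gen_pool_free (gen_pool_t * p, handle_t h)
{
  if (gen_pool_get (p, h) == NULL)
    return;
  p->in_use[h.index] = 0;
  p->generation[h.index]++;	/* invalidates all outstanding handles */
}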
>> 
>> It's a good scheme, I like it.
>> I assume the pool indices would be 64 bit and the separation between vector 
>> index and generation would be hidden from the user. Maybe a 32 bit value 
>> would suffice in most cases, but why skimp...
> 
>I was thinking to keep the index and generation number separate at the 
> most basic API, to allow for selecting the size of each independently and 
> for efficient storage. I'm thinking for some applications one might want to 
> do something like
> 
>cacheline_packed_struct {
>...
>u32 foo_index;
>u32 bar_index;
>