Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

2017-09-01 Thread Sergio Gonzalez Monroy

Hi Eric,

Are you building against an external DPDK or using the one built by VPP?
VPP does add -fPIC to DPDK CPU_CFLAGS when it builds it from source.
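For reference, a hedged example of rebuilding an external DPDK tree so that its static libraries are built as PIC objects; the target name and paths below are illustrative only and will differ for the marvell/dpaa2 trees:

cd ~/work/git_work/dpdk
make config T=arm64-armv8a-linuxapp-gcc
make -j4 EXTRA_CFLAGS='-fPIC'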

Thanks,
Sergio

On 01/09/2017 05:36, Eric Chen wrote:


Hi Damjan,

Following your suggestion, I upgraded my Ubuntu to Zesty (17.04),

gcc version 6.3.0 20170406 (Ubuntu/Linaro 6.3.0-12ubuntu2)

the previous issue is gone;

however, did you meet the issue below before?

It happens both when I build dpaa2 (over dpdk) and marvell (over
marvell-dpdk).

I checked DPDK when building the lib; there is no -fPIC option.

So how do I fix it?

/usr/bin/ld: 
/home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o): 
relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol 
`__stack_chk_guard@@GLIBC_2.17' can not be used when making a shared 
object; recompile with -fPIC


/usr/bin/ld: 
/home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o)(.text+0x44): 
unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol 
`__stack_chk_guard@@GLIBC_2.17'


/usr/bin/ld: final link failed: Bad value

collect2: error: ld returned 1 exit status

Thanks

Eric

*From:* Damjan Marion [mailto:dmarion.li...@gmail.com]
*Sent:* 27 August 2017 3:11
*To:* Eric Chen 
*Cc:* Dave Barach ; Sergio Gonzalez Monroy 
; vpp-dev 
*Subject:* Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 
box for fd.io_vpp


Hi Eric,

The same code compiles perfectly fine on ARM64 with a newer gcc version.

If you are starting a new development cycle, it makes sense to me that
you pick up the latest Ubuntu release, especially when new hardware is
involved, instead of trying to chase this kind of bug.

Do you have any strong reason to stay on Ubuntu 16.04? Both 17.04 and the
upcoming 17.10 are working fine on arm64, and compiling VPP works without
issues.

Thanks,

Damjan

On 26 Aug 2017, at 15:23, Eric Chen <eri...@marvell.com> wrote:

Dave,

Thanks for your answer.

I tried the variation below; it doesn’t help.

Btw, there is more than one place reporting “error: unable to
generate reloads for:”.

I will try to check out version 17.01.1,

since with the same native compiler, I succeeded to build
fd.io_odp4vpp (which is based on fd.io 17.01.1).

will keep you posted.

Thanks

Eric

*From:* Dave Barach (dbarach) [mailto:dbar...@cisco.com]
*Sent:* 26 August 2017 20:08
*To:* Eric Chen <eri...@marvell.com>; Sergio Gonzalez Monroy <sergio.gonzalez.mon...@intel.com>; vpp-dev <vpp-dev@lists.fd.io>
*Subject:* RE: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

Just so everyone knows, the function in question is almost too
simple for its own good:

always_inline uword
vlib_process_suspend_time_is_zero (f64 dt)
{
  return dt < 10e-6;
}

What happens if you try this variation?

always_inline int
vlib_process_suspend_time_is_zero (f64 dt)
{
  if (dt < 10e-6)
    return 1;
  return 0;
}

This does look like a gcc bug, but it may not be hard to work
around...

Thanks… Dave

*From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] *On Behalf Of* Eric Chen
*Sent:* Friday, August 25, 2017 11:02 PM
*To:* Eric Chen <eri...@marvell.com>; Sergio Gonzalez Monroy <sergio.gonzalez.mon...@intel.com>; vpp-dev <vpp-dev@lists.fd.io>
*Subject:* Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

Hi Sergio,

I upgraded to Ubuntu 16.04,

succeeded to natively build fd.io_odp4vpp (w/ odp-linux).

However, when building fd.io_vpp (w/ dpdk), it reported the error below

(almost the same; the only difference is over dpdk or
odp-linux).

Has anyone met this before? Seems like a bug in gcc.

In file included from

/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/error_funcs.h:43:0,

from
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/vlib.h:70,

from
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vnet/l2/l2_fib.c:19:

/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:
In function ‘vlib_process_suspend_time_is_zero’:


/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:442:1:
error: unable to generate reloads for:

}

^

(insn 11 37 12 2 (set (reg:CCFPE 66 cc)

(compare:CCFPE (reg:DF 79)

(reg:DF 80)))
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:441
395 {*cmpedf}

(expr_list:REG_DEAD (reg:DF 80)

(expr_list:REG_DEAD (reg:DF 79)

(nil


/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:442:1:
internal compiler error: in curr_insn_transform, at
lra-constraints.c:3509

Please submit a full bug report,

with preprocessed source if appropriate.

See > 

Re: [vpp-dev] Query for IPSec support on VPP

2017-09-01 Thread Sergio Gonzalez Monroy

FYI I updated the doc, hopefully everything is correct and up to date now.
https://gerrit.fd.io/r/#/c/8273/

Thanks,
Sergio

On 31/08/2017 10:00, Sergio Gonzalez Monroy wrote:

On 31/08/2017 09:37, Mukesh Yadav (mukyadav) wrote:


Thanks a lot Sergio for all the patience and help,



No problem at all. As I said before, it is great that someone else goes 
through the docs/wiki to double-check everything is working as described.



With your latest comments, I can see dpdk IPSec is happening.
There is an issue I am hitting where, post decryption, ip4-input
is not called.
I have kept the IPSec config the same as when it was working with VPP
core IPSec.

I need to dig further; it seems the packet is getting dropped in
dpdk-esp-decrypt().

Is there some way to find out the related errors?



There are a few reasons why the packet would be dropped in that node, but 
I was expecting the trace to show the drop node for those packets.

What is the output of 'show error'?

I see you are setting up IPSec as an interface feature, with SPD and 
transport mode SA, I will double check that there are no bugs.


Thanks,
Sergio



I am using VPP v17.10 and DPDK 17.05.0

Currently trace is looking like below:
00:01:30:013871: dpdk-input

GigabitEthernet0/8/0 rx queue 0

buffer 0x4d67: current data 14, length 136, free-list 0, clone-count 
0, totlen-nifb 0, trace 0x0


PKT MBUF: port 0, nb_segs 1, pkt_len 150

buf_len 2176, data_len 150, ol_flags 0x0, data_off 128, phys_addr 
0x9cd35a00


packet_type 0x0

IP4: 08:00:27:ba:35:60 -> 08:00:27:67:d4:f0

IPSEC_ESP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 136, checksum 0x9d0f

fragment id 0x44f2, flags DONT_FRAGMENT

00:01:30:013902: ip4-input

IPSEC_ESP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 136, checksum 0x9d0f

fragment id 0x44f2, flags DONT_FRAGMENT

00:01:30:013909: ipsec-input-ip4

esp: sa_id 20 spi 1000 seq 7

00:01:30:013911: dpdk-esp-decrypt

esp: crypto aes-cbc-128 integrity sha1-96

Earlier, when I was using VPP core IPSec, the trace looked like:

00:03:41:528507: dpdk-input

GigabitEthernet0/8/0 rx queue 0

buffer 0x4d19: current data 14, length 136, free-list 0, clone-count 
0, totlen-nifb 0, trace 0x0


PKT MBUF: port 0, nb_segs 1, pkt_len 150

buf_len 2176, data_len 150, ol_flags 0x0, data_off 128, phys_addr 
0x7de34680


packet_type 0x0

IP4: 08:00:27:ba:35:60 -> 08:00:27:67:d4:f0

IPSEC_ESP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 136, checksum 0xc1c0

fragment id 0x2041, flags DONT_FRAGMENT

00:03:41:528548: ip4-input

IPSEC_ESP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 136, checksum 0xc1c0

fragment id 0x2041, flags DONT_FRAGMENT

00:03:41:528556: ipsec-input-ip4

esp: sa_id 20 spi 1000 seq 1

00:03:41:528559: esp-decrypt

esp: crypto aes-cbc-128 integrity sha1-96

00:03:41:528648: ip4-input

ICMP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 84, checksum 0x2267

fragment id 0x

ICMP echo_request checksum 0x4201

00:03:41:528649: ip4-lookup

fib 0 dpo-idx 6 flow hash: 0x

ICMP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 84, checksum 0x2267

fragment id 0x

ICMP echo_request checksum 0x4201

00:03:41:528680: ip4-local

ICMP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 84, checksum 0x2267

fragment id 0x

ICMP echo_request checksum 0x4201

00:03:41:528684: ip4-icmp-input

ICMP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 84, checksum 0x2267

fragment id 0x

ICMP echo_request checksum 0x4201

00:03:41:528685: ip4-icmp-echo-request

ICMP: 172.28.128.4 -> 172.28.128.5

tos 0x00, ttl 64, length 84, checksum 0x2267

fragment id 0x

ICMP echo_request checksum 0x4201

00:03:41:528686: ip4-load-balance

fib 0 dpo-idx 13 flow hash: 0x

ICMP: 172.28.128.5 -> 172.28.128.4

tos 0x00, ttl 64, length 84, checksum 0xea0f

fragment id 0x3857

ICMP echo_reply checksum 0x4a01

00:03:41:528688: ip4-rewrite

tx_sw_if_index 1 dpo-idx 1 : ipv4 via 172.28.128.4 
GigabitEthernet0/8/0: 080027ba356008002767d4f00800 flow hash: 0x


: 
080027ba356008002767d4f00800455438574001ea0fac1c8005ac1c


0020: 80044a0107850001d6be8e59898d01001011

00:03:41:528691: ipsec-output-ip4

spd 1

00:03:41:528697: esp-encrypt

esp: spi 1001 seq 0 crypto aes-cbc-128 integrity sha1-96

My configuration is as below:

sudo ifconfig enp0s8 down

sudo service vpp restart

set int ip address GigabitEthernet0/8/0 172.28.128.5/24

set int state GigabitEthernet0/8/0 up

ipsec sa add 10 spi 1001 esp crypto-alg aes-cbc-128 crypto-key 
4a506a794f574265564551694d653768 integ-alg sha1-96 integ-key 
4339314b55523947594d6d3547666b45764e6a58


ipsec sa add 20 spi 1000 esp crypto-alg aes-cbc-128 crypto-key 
4a506a794f574265564551694d653768 integ-alg sha1-96 integ-key 
4339314b55523947594d6d3547666b45764e6a58


ipsec spd add 1

set interface ipsec spd GigabitEthernet0/8/0 1

ipsec policy add spd 1 priority 100 inbound action bypas

Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

2017-09-01 Thread Eric Chen
External DPDK. I checked: by default, DPDK is built as a static library. (def_xxx)

So how do I set fd.io to not use shared libraries? Is there any easy configuration?

Thanks
Eric

From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.mon...@intel.com]
Sent: 1 September 2017 16:19
To: Eric Chen ; Damjan Marion 
Cc: Dave Barach ; vpp-dev 
Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
fd.io_vpp

Hi Eric,

Are you building against an external DPDK or using the one built by VPP?
VPP does add -fPIC to DPDK CPU_CFLAGS when it builds it from source.

Thanks,
Sergio

On 01/09/2017 05:36, Eric Chen wrote:
Hi Damjan,

Following your suggestion, I upgrade my Ubuntu to Zesty (17.04),

gcc version 6.3.0 20170406 (Ubuntu/Linaro 6.3.0-12ubuntu2)

the previous issue gone,

however did you meet below issue before:
it happens both when I build dpaa2(over dpdk) and marvell (over marvell-dpdk).
I checked dpdk when build lib, there is no –FPIC option,
So how to fix it?


/usr/bin/ld: 
/home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o): 
relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol 
`__stack_chk_guard@@GLIBC_2.17' can not be used when making a shared object; 
recompile with -fPIC
/usr/bin/ld: 
/home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o)(.text+0x44):
 unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol 
`__stack_chk_guard@@GLIBC_2.17'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status


Thanks
Eric

From: Damjan Marion [mailto:dmarion.li...@gmail.com]
Sent: 27 August 2017 3:11
To: Eric Chen 
Cc: Dave Barach ; Sergio Gonzalez 
Monroy 
; 
vpp-dev 
Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
fd.io_vpp

Hi Eric,

Same code compiles perfectly fine on ARM64 with newer gcc version.

If you are starting new development cycle it makes sense to me that you pick up 
latest ubuntu release,
specially when new hardware is involved instead of trying to chase this kind of 
bugs.

Do you have any strong reason to stay on ubuntu 16.04? Both 17.04 and upcoming 
17.10 are working fine on arm64 and
compiling of VPP works without issues.

Thanks,

Damjan


On 26 Aug 2017, at 15:23, Eric Chen 
mailto:eri...@marvell.com>> wrote:

Dave,

Thanks for your answer.
I tried below variation, it doesn’t help.

Btw, there is not only one place reporting “error: unable to generate reloads 
for:”,

I will try to checkout the version of 17.01.1,
since with the same native compiler, I succeeded to build fd.io_odp4vpp (which 
is based on fd.io 17.01.1).

will keep you posted.

Thanks
Eric

From: Dave Barach (dbarach) [mailto:dbar...@cisco.com]
Sent: 26 August 2017 20:08
To: Eric Chen mailto:eri...@marvell.com>>; Sergio Gonzalez 
Monroy 
mailto:sergio.gonzalez.mon...@intel.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
fd.io_vpp

Just so everyone knows, the function in question is almost too simple for its 
own good:

always_inline uword
vlib_process_suspend_time_is_zero (f64 dt)
{
  return dt < 10e-6;
}

What happens if you try this variation?

always_inline int
vlib_process_suspend_time_is_zero (f64 dt)
{
  if (dt < 10e-6)
 return 1;
  return 0;
}

This does look like a gcc bug, but it may not be hard to work around...

Thanks… Dave

From: vpp-dev-boun...@lists.fd.io 
[mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Eric Chen
Sent: Friday, August 25, 2017 11:02 PM
To: Eric Chen mailto:eri...@marvell.com>>; Sergio Gonzalez 
Monroy 
mailto:sergio.gonzalez.mon...@intel.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
fd.io_vpp

Hi Sergio,

I upgrading to Ubuntu 16.04,

Succedd to Nativly build fd.io_odp4vpp (w / odp-linux),
However when buidl fd.io_vpp (w/ dpdk),  it reported below error,
(almost the same , only difference is over dpdk or odp-linux)

Anyone met before? Seem a bug of gcc.

In file included from 
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/error_funcs.h:43:0,
 from 
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/vlib.h:70,
 from 
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vnet/l2/l2_fib.c:19:
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h: In 
function ‘vlib_process_suspend_time_is_zero’:
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:442:1: 
error: unable to generate reloads for:
}
^
(insn 11 37 12 2 (set (reg:CCFPE 66 cc)
(compare:CCFPE (reg:DF 79)
(reg:DF 80))) 
/home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:441 
395 {*cmpedf}
 (expr_list:REG_DEAD (reg:DF 80)
(expr_list:REG_DEAD (reg:DF 79)
(ni

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Ole Troan
Colin,

Good investigation!

A good first step would be to make all APIs and CLIs thread safe.
When an API/CLI is thread safe, that must be flagged through the is_mp_safe 
flag.
It is quite likely that many already are, but haven't been flagged as such.
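For reference, a minimal sketch of what that flagging looks like in the API hookup code, assuming the usual api_main_t layout; the message id below is only an example, and the handler must genuinely be thread safe before it is flagged:

api_main_t *am = &api_main;

/* declare this handler safe to run without pausing the workers */
am->is_mp_safe[VL_API_IP_NEIGHBOR_ADD_DEL] = 1;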

Best regards,
Ole


> On 31 Aug 2017, at 19:07, Colin Tregenza Dancer via vpp-dev 
>  wrote:
> 
> I’ve been doing quite a bit of investigation since my last email, in 
> particular adding instrumentation on barrier calls to report 
> open/lowering/closed/raising times, along with calling trees and nesting 
> levels.
> 
> As a result I believe I now have a clearer understanding of what’s leading to 
> the packet loss I’m observing when using the API, along with some code 
> changes which in my testing reliably eliminate the 500K packet loss I was 
> previously observing.
> 
> Would either of you (or anyone else on the list) be able to offer their 
> opinions on my understanding of the causes, along with my proposed solutions?
> 
> Thanks in advance,
> 
> Colin.
> -
> In terms of observed barrier hold times, I’m seeing two main issues related 
> to API calls:
> 
>   • When I issue a long string of async API commands, there is no logic 
> (at least in the version of VPP I’m using) to space out their processing.  As 
> a result, if there is a queue of requests, the barrier is opened for just a 
> few us between API calls, before lowering again.  This is enough to start one 
> burst of packet processing per worker thread (I can see the barrier lower 
> ends up taking ~100us), but over time not enough to keep up with the input 
> traffic.
> 
>   • Whilst many API calls close the barrier for between a few 10’s of 
> microseconds and a few hundred microseconds, there are a number of calls 
> where this extends from 500us+ into the multiple ms range (which obviously 
> causes the Rx ring buffers to overflow).  The particular API calls where I’ve 
> seen this include:  ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
> sw_interface_set_l2_bridge & sw_interface_add_del_address (though there may 
> be others which I’m not currently calling).
> 
> Digging into the call stacks, I can see that in each case there are multiple 
> calls to vlib_node_runtime_update()  (I assume one for each node changed), 
> and each of these calls invokes vlib_worker_thread_node_runtime_update() just 
> before returning (I assume to sync the per thread datastructures with the 
> updated graph).  The observed execution time for 
> vlib_worker_thread_node_runtime_update() seems to vary with load, config 
> size, etc, but times of between 400us and 800us per call are not atypical in 
> my setup.  If there are 5 or 6 invocations of this function per API call, we 
> therefore rapidly get to a situation where the barrier is held for multiple 
> ms.
> 
> The two workarounds I’ve been using are both changes to vlib/vlib/threads.c :
> 
>   • When closing the barrier in vlib_worker_thread_barrier_sync (but not 
> for recursive invocations), if it hasn’t been open for at least a certain 
> minimum period of time (I’ve been running with 300us), then spin until this 
> minimum is reached, before closing.  This ensures that whatever the source of 
> the barrier sync (API, command line, etc), the datapath is always allowed a 
> fair fraction of time to run. (I’ve got in mind various adaptive ways to 
> setting the delay, including a rolling measure of open period over say the 
> last 1ms, and/or Rx ring state, but for initial testing a fixed value seemed 
> easiest.)
> 
>   • From my (potentially superficial) code read, it looks like 
> vlib_worker_thread_node_runtime_update() could be called once to update the 
> workers with multiple node changes (as long as the barrier remains closed 
> between changes), rather than having to be called for each individual change.
> 
> I have therefore tweaked vlib_worker_thread_node_runtime_update(), so that 
> instead of doing the update to the per thread data structures, by default it 
> simply increments a count and returns.  The count is cleared each time the 
> barrier is closed in vlib_worker_thread_barrier_sync()  (but not for 
> recursive invocations), and if it is non-zero when 
> vlib_worker_thread_barrier_release() is about to open the barrier, then 
> vlib_worker_thread_barrier_release() is called with a flag which causes it to 
> actually do the updating.  This means that the per thread data structures are 
> only updated once per API call, rather than for each individual node change.
> 
> In my testing this change has reduced the period for which the problem API 
> calls close the barrier, from mutiple ms, to sub-ms (generally under 500us).  
> I have not yet observed any negative consequences (though I fully accept I 
> might well have missed something).
> 
> Together these two changes eliminate the packet loss I was seeing when using 
> the API under load.
> 
> Views?
> 
> (Whilst the API packet loss is currently most 

Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

2017-09-01 Thread Damjan Marion

Hi Eric,

We are building DPDK statically for several reasons:
- to keep control on dpdk version used (i.e. on ubuntu 17.04 dpdk version 
packaged is 16.11.1.)
- to allow us to quickly apply custom patches

Are you planning to use dpdk version which includes marvell drivers[1] ?

[1] https://github.com/MarvellEmbeddedProcessors/dpdk-marvell 




> On 1 Sep 2017, at 10:27, Eric Chen  wrote:
> 
> External DPDK,  I checked by default, DPDK built as static library. (def_xxx)
>  
> So how to set in Fdio to not use shared,  any easy configuration?
>  
> Thanks
> Eric
>  
> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.mon...@intel.com] 
> Sent: 1 September 2017 16:19
> To: Eric Chen ; Damjan Marion 
> Cc: Dave Barach ; vpp-dev 
> Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
> fd.io_vpp
>  
> Hi Eric,
> 
> Are you building against an external DPDK or using the one built by VPP?
> VPP does add -fPIC to DPDK CPU_CFLAGS when it builds it from source.
> 
> Thanks,
> Sergio
> 
> On 01/09/2017 05:36, Eric Chen wrote:
> Hi Damjan,
>  
> Following your suggestion, I upgrade my Ubuntu to Zesty (17.04),
>  
> gcc version 6.3.0 20170406 (Ubuntu/Linaro 6.3.0-12ubuntu2)
>  
> the previous issue gone,
>  
> however did you meet below issue before:
> it happens both when I build dpaa2(over dpdk) and marvell (over marvell-dpdk).
> I checked dpdk when build lib, there is no –FPIC option,
> So how to fix it?
>  
>  
> /usr/bin/ld: 
> /home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o): 
> relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol 
> `__stack_chk_guard@@GLIBC_2.17' can not be used when making a shared object; 
> recompile with -fPIC
> /usr/bin/ld: 
> /home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o)(.text+0x44):
>  unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol 
> `__stack_chk_guard@@GLIBC_2.17'
> /usr/bin/ld: final link failed: Bad value
> collect2: error: ld returned 1 exit status
>  
>  
> Thanks
> Eric
>  
> From: Damjan Marion [mailto:dmarion.li...@gmail.com 
> ] 
> Sent: 27 August 2017 3:11
> To: Eric Chen  
> Cc: Dave Barach  ; Sergio 
> Gonzalez Monroy  
> ; vpp-dev  
> 
> Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
> fd.io_vpp
>  
> Hi Eric,
>  
> Same code compiles perfectly fine on ARM64 with newer gcc version.
>  
> If you are starting new development cycle it makes sense to me that you pick 
> up latest ubuntu release,
> specially when new hardware is involved instead of trying to chase this kind 
> of bugs.
>  
> Do you have any strong reason to stay on ubuntu 16.04? Both 17.04 and 
> upcoming 17.10 are working fine on arm64 and
> compiling of VPP works without issues.
>  
> Thanks,
>  
> Damjan
>  
>  
> On 26 Aug 2017, at 15:23, Eric Chen  > wrote:
>  
> Dave,
>  
> Thanks for your answer.
> I tried below variation, it doesn’t help.
>  
> Btw, there is not only one place reporting “error: unable to generate reloads 
> for:”,
>  
> I will try to checkout the version of 17.01.1,
> since with the same native compiler, I succeeded to build fd.io_odp4vpp 
> (which is based on fd.io  17.01.1).
>  
> will keep you posted.
>  
> Thanks
> Eric
>  
> From: Dave Barach (dbarach) [mailto:dbar...@cisco.com 
> ] 
> Sent: 26 August 2017 20:08
> To: Eric Chen mailto:eri...@marvell.com>>; Sergio 
> Gonzalez Monroy  >; vpp-dev  >
> Subject: RE: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
> fd.io_vpp
>  
> Just so everyone knows, the function in question is almost too simple for its 
> own good: <>
>  
> always_inline uword
> vlib_process_suspend_time_is_zero (f64 dt)
> {
>   return dt < 10e-6;
> }
>  
> What happens if you try this variation?
>  
> always_inline int
> vlib_process_suspend_time_is_zero (f64 dt)
> {
>   if (dt < 10e-6)
>  return 1;
>   return 0;
> }
>  
> This does look like a gcc bug, but it may not be hard to work around...
>  
> Thanks… Dave
>  
> From: vpp-dev-boun...@lists.fd.io  
> [mailto:vpp-dev-boun...@lists.fd.io ] On 
> Behalf Of Eric Chen
> Sent: Friday, August 25, 2017 11:02 PM
> To: Eric Chen mailto:eri...@marvell.com>>; Sergio 
> Gonzalez Monroy  >; vpp-dev  >
> Subject: Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for 
> fd.io_vpp
>  
> Hi Sergio, 
>  
> I upgrading to Ubuntu 16.04,
>  
> Succedd to Nativly build fd.io_odp4vpp (w / odp-linux), 
> However when buidl fd.io_vpp (w/ dpdk),  it reported below error,
> (almost the same , only

Re: [vpp-dev] why passive time must be greater than active timer in ipfix plugin?

2017-09-01 Thread Ole Troan
Hi Chu,

>   I found the ipfix plugin declares that the passive timer must be greater than 
> the active timer. At a Cisco router, we configure a cache active timeout or inactive 
> timeout for ipfix or netflow. I think that the active timer in vpp corresponds to the 
> active timeout in a Cisco router and the passive timer in vpp corresponds to the 
> inactive timeout in a Cisco router. However, at a Cisco router, we always configure 
> the inactive timeout smaller than the active timeout.
> Why is vpp different from a Cisco router for ipfix? Please help me! 
> Thanks.

These two timers determine flow expiration.
The active timer denotes how often the metering process expires a flow record 
for an active flow.
It does not delete the flow record from the cache. E.g. with an active timer of 5 
seconds, an active flow will have a flow record exported roughly every 5 seconds.

The passive timer states the time that an inactive flow record should be kept 
in the cache.

If the packet arrival rate of a flow is one packet every 10 seconds, the active 
timer is 30 seconds and the passive timer is 5 seconds, then the passive timer 
would win and expire the flow record before the active timer fired, i.e. you'd 
get a single packet per flow.

You are right though, these two timers are independent so we can probably 
remove that check.
Feel like adding a JIRA ticket?

For full disclosure, the active timer isn't really a timer; it's a time stamp 
that we check on packet arrival, which means that you might be left with the 
last packet in a flow being dealt with by the passive timer.
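To make that concrete, here is a small standalone C sketch of the two checks, with made-up names (not the plugin's code): the active check is a timestamp comparison done when a packet arrives, and the passive check is done by a sweep over idle flows.

#include <stdio.h>

typedef struct { double last_pkt, last_export; } flow_entry_t;

/* active "timer": checked on packet arrival against the last export time */
static int export_due_on_packet (const flow_entry_t *f, double now, double active)
{
  return (now - f->last_export) >= active;
}

/* passive timer: expires flow records that have been idle too long */
static int delete_due_on_sweep (const flow_entry_t *f, double now, double passive)
{
  return (now - f->last_pkt) >= passive;
}

int main (void)
{
  /* the example above: one packet every 10 s, active 30 s, passive 5 s */
  flow_entry_t f = { 0.0, 0.0 };
  printf ("passive sweep at t=5 expires the record: %d\n",
          delete_due_on_sweep (&f, 5.0, 5.0));
  printf ("active export due at t=10 (next packet): %d\n",
          export_due_on_packet (&f, 10.0, 30.0));
  return 0;
}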

Best regards,
Ole



Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Colin Tregenza Dancer via vpp-dev
Hi Ole,

Thanks for the quick reply.

I did think about making all the commands we use is_mp_safe, but was both 
concerned about the extent of the work, and the potential for introducing 
subtle bugs.  I also didn't think it would help my key problem, which is the 
multi-ms commands which make multiple calls to vlib_node_runtime_update(), not 
least because it seemed likely that I'd need to hold the barrier across the 
multiple node changes in a single API call (to avoid inconsistent intermediate 
states).

Do you have any thoughts on the change to call 
vlib_worker_thread_node_runtime_update() a single time just before releasing 
the barrier?  It seems to work fine, but I'm keen to get input from someone who 
has been working on the codebase for longer.


More generally, even with my changes, vlib_worker_thread_node_runtime_update() 
is the single function which holds the barrier for longer than all other 
elements, and is the one which therefore most runs the risk of causing Rx 
overflow.  

Detailed profiling showed that for my setup, ~40-50% of the time is taken in 
"/* re-fork nodes */" with the memory functions used to allocate the new clone 
nodes, and free the old clones.  Given that we know the number of nodes at the 
start of the loop, and given that (as far as I can tell) new clone nodes aren't 
altered between calls to the update function, I tried a change to allocate/free 
all the nodes as a single block (whilst still cloning and inserting them as 
before). I needed to make a matching change in the "/* fork nodes */" code in 
start_workers(), (and probably need to make a matching change in the 
termination code,) but in testing this almost halves the execution time of 
vlib_worker_thread_node_runtime_update() without any obvious problems. 

Having said that, the execution time of the node cloning remains O(M.N), where 
M is the number of threads and N the number of nodes.  This is reflected in the 
fact that when I try on a larger system (i.e. more workers and more nodes) I 
again suffer packet loss because this one function is holding the barrier for 
multiple ms.

The change I'm currently working on is to try and reduce the delay to O(N) by 
getting the worker threads to clone their own data structures in parallel.  I'm 
doing this by extending their busy wait on the barrier to also include looking 
for a flag telling them to rebuild their data structures.  When the main thread 
is about to release the barrier and decides it needs a rebuild, I was going to 
get it to do the relatively quick stats scraping, then set the flag telling 
the workers to rebuild their clones.  The rebuild will then happen on all the 
workers in parallel (which, looking at the code, seems to be safe), and only when 
all the cloning is done will the main thread actually release the barrier.  
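To make the approach concrete, below is a standalone pthread sketch of that pattern (not VPP code: all names, the 300 us minimum-open threshold and the sleeps are illustrative assumptions). The main thread refuses to close the barrier until it has been open for a minimum period; when a node-graph rebuild is needed it sets a per-worker flag while the workers are held at the barrier, every worker re-forks its own clone in parallel, and the barrier only opens once all of them have finished.

/* Standalone sketch, not VPP code: names, the 300 us threshold and the
 * sleeps are illustrative.  Build with: cc -O2 -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define N_WORKERS   3
#define MIN_OPEN_US 300.0

static atomic_int hold_at_barrier;          /* 1 while the barrier is closed */
static atomic_int workers_parked;           /* workers currently spinning    */
static atomic_int refork_needed[N_WORKERS]; /* per-worker rebuild request    */
static atomic_int reforks_done;             /* rebuilds completed this round */
static double barrier_opened_at;            /* when the barrier last opened  */

static double now_us (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

static void refork_my_clone (int id)
{
  usleep (100);            /* stand-in for cloning this worker's node runtimes */
}

static void *worker_main (void *arg)
{
  int id = (int) (intptr_t) arg;
  for (;;)
    {
      if (atomic_load (&hold_at_barrier))
        {
          atomic_fetch_add (&workers_parked, 1);
          while (atomic_load (&hold_at_barrier))
            if (atomic_exchange (&refork_needed[id], 0))
              {
                refork_my_clone (id);       /* all workers re-fork in parallel */
                atomic_fetch_add (&reforks_done, 1);
              }
          atomic_fetch_sub (&workers_parked, 1);
        }
      usleep (10);          /* stand-in for one burst of packet processing */
    }
  return NULL;
}

/* main-thread side: refuse to close the barrier until it has been open for
 * at least MIN_OPEN_US, so the datapath always gets a fair share of time */
static void barrier_sync (void)
{
  while (now_us () - barrier_opened_at < MIN_OPEN_US)
    ;
  atomic_store (&hold_at_barrier, 1);
  while (atomic_load (&workers_parked) != N_WORKERS)
    ;
}

static void barrier_release (int graph_changed)
{
  if (graph_changed)
    {
      atomic_store (&reforks_done, 0);
      for (int i = 0; i < N_WORKERS; i++)
        atomic_store (&refork_needed[i], 1);
      while (atomic_load (&reforks_done) != N_WORKERS)
        ;                   /* stay closed until every clone is rebuilt */
    }
  barrier_opened_at = now_us ();
  atomic_store (&hold_at_barrier, 0);
}

int main (void)
{
  pthread_t tid[N_WORKERS];
  barrier_opened_at = now_us ();
  for (intptr_t i = 0; i < N_WORKERS; i++)
    pthread_create (&tid[i], NULL, worker_main, (void *) i);
  barrier_sync ();          /* e.g. around an API call that edits the graph */
  barrier_release (1);      /* graph changed: workers re-fork in parallel   */
  usleep (10000);
  return 0;
}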

I hope to get results from this soon, and will let you know how it goes, but 
again I'm very keen to get other people's views.

Cheers,

Colin.

-Original Message-
From: Ole Troan [mailto:otr...@employees.org] 
Sent: 01 September 2017 09:37
To: Colin Tregenza Dancer 
Cc: Neale Ranns (nranns) ; Florin Coras 
; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Colin,

Good investigation!

A good first step would be to make all APIs and CLIs thread safe.
When an API/CLI is thread safe, that must be flagged through the is_mp_safe 
flag.
It is quite likely that many already are, but haven't been flagged as such.

Best regards,
Ole


> On 31 Aug 2017, at 19:07, Colin Tregenza Dancer via vpp-dev 
>  wrote:
> 
> I’ve been doing quite a bit of investigation since my last email, in 
> particular adding instrumentation on barrier calls to report 
> open/lowering/closed/raising times, along with calling trees and nesting 
> levels.
> 
> As a result I believe I now have a clearer understanding of what’s leading to 
> the packet loss I’m observing when using the API, along with some code 
> changes which in my testing reliably eliminate the 500K packet loss I was 
> previously observing.
> 
> Would either of you (or anyone else on the list) be able to offer their 
> opinions on my understanding of the causes, along with my proposed solutions?
> 
> Thanks in advance,
> 
> Colin.
> -
> In terms of observed barrier hold times, I’m seeing two main issues related 
> to API calls:
> 
>   • When I issue a long string of async API commands, there is no logic 
> (at least in the version of VPP I’m using) to space out their processing.  As 
> a result, if there is a queue of requests, the barrier is opened for just a 
> few us between API calls, before lowering again.  This is enough to start one 
> burst of packet processing per worker thread (I can see the barrier lower 
> ends up taking ~100us), but over time not enough to keep up with the input 
> traffic.
> 
>   • Whilst many API calls close the barrier for between a few 10’s of 
> microseconds and a few hundred microseconds, there are a number

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Dave Barach (dbarach)
Dear Colin,

Please describe the scenario which leads to vlib_node_runtime_update(). I 
wouldn't mind having a good long stare at the situation. 

I do like the parallel data structure update approach that you've described, 
tempered with the realization that it amounts to "collective brain surgery." I 
had more than enough trouble making the data structure fork-and-update code 
work reliably in the first place. 

Thanks… Dave

-Original Message-
From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Colin Tregenza Dancer via vpp-dev
Sent: Friday, September 1, 2017 6:12 AM
To: Ole Troan 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Hi Ole,

Thanks for the quick reply.

I did think about making all the commands we use is_mp_safe, but was both 
concerned about the extent of the work, and the potential for introducing 
subtle bugs.  I also didn't think it would help my key problem, which is the 
multi-ms commands which make multiple calls to vlib_node_runtime_update(), not 
least because it seemed likely that I'd need to hold the barrier across the 
multiple node changes in a single API call (to avoid inconsistent intermediate 
states).

Do you have any thoughts on the change to call 
vlib_worker_thread_node_runtime_update() a single time just before releasing 
the barrier?  It seems to work fine, but I'm keen to get input from someone who 
has been working on the codebase for longer.


More generally, even with my changes, vlib_worker_thread_node_runtime_update() 
is the single function which holds the barrier for longer than all other 
elements, and is the one which therefore most runs the risk of causing Rx 
overflow.  

Detailed profiling showed that for my setup, ~40-50% of the time is taken in 
"/* re-fork nodes */" with the memory functions used to allocate the new clone 
nodes, and free the old clones.  Given that we know the number of nodes at the 
start of the loop, and given that (as far as I can tell) new clone nodes aren't 
altered between calls to the update function, I tried a change to allocate/free 
all the nodes as a single block (whilst still cloning and inserting them as 
before). I needed to make a matching change in the "/* fork nodes */" code in 
start_workers(), (and probably need to make a matching change in the 
termination code,) but in testing this almost halves the execution time of 
vlib_worker_thread_node_runtime_update() without any obvious problems. 

Having said that, the execution time of the node cloning remains O(M.N), where 
M is the number of threads and N the number of nodes.  This is reflected in the 
fact that when I try on larger system (i.e. more workers and more nodes) I 
again suffer packet loss because this one function is holding the barrier for 
multiple ms.

The change I'm currently working on is to try and reduce to delay to O(N) by 
getting the worker threads to clone their own data structures in parallel.  I'm 
doing this by extending their busy wait on the barrier, to also include looking 
for a flag telling them to rebuild their data structures.  When the main thread 
is about to release the barrier, and decides it needs a rebuild, I was going to 
get it to do the relatively quick stats scraping, then sets the flag telling 
the workers to rebuild their clones.  The rebuild will then happen on all the 
workers in parallel (which looking at the code seems to be safe), and only when 
all the cloning is done, will the main thread actually release the barrier.  

I hope to get results from this soon, and will let you know how it goes, but 
again I'm very keen to get other people's views.

Cheers,

Colin.

-Original Message-
From: Ole Troan [mailto:otr...@employees.org] 
Sent: 01 September 2017 09:37
To: Colin Tregenza Dancer 
Cc: Neale Ranns (nranns) ; Florin Coras 
; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Colin,

Good investigation!

A good first step would be to make all APIs and CLIs thread safe.
When an API/CLI is thread safe, that must be flagged through the is_mp_safe 
flag.
It is quite likely that many already are, but haven't been flagged as such.

Best regards,
Ole


> On 31 Aug 2017, at 19:07, Colin Tregenza Dancer via vpp-dev 
>  wrote:
> 
> I’ve been doing quite a bit of investigation since my last email, in 
> particular adding instrumentation on barrier calls to report 
> open/lowering/closed/raising times, along with calling trees and nesting 
> levels.
> 
> As a result I believe I now have a clearer understanding of what’s leading to 
> the packet loss I’m observing when using the API, along with some code 
> changes which in my testing reliably eliminate the 500K packet loss I was 
> previously observing.
> 
> Would either of you (or anyone else on the list) be able to offer their 
> opinions on my understanding of the causes, along with my proposed solutions?
> 
> Thanks in advance,
> 
> Colin.
> -
> In terms o

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Colin Tregenza Dancer via vpp-dev
Hi Dave,

Thanks for looking at this.

I get repeated vlib_node_runtime_update() calls when I use the API functions:  
ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
sw_interface_set_l2_bridge & sw_interface_add_del_address (though there may be 
others which I’m not currently calling).

To illustrate, I've included below a formatted version of my barrier trace from 
when I make an ip_neighbor_add_del API call (raw traces for the other commands 
are included at the end).  

At the point this call was made there were 3 worker threads, ~425 nodes in the 
system, and a load of ~3Mpps saturating two 10G NICs.

It shows the API function name, followed by a tree of the recursive calls to 
barrier_sync/release.  On each line I show the calling function name, current 
recursion depth, and elapsed timing from the point the barrier was actually 
closed.  

[50]: ip_neighbor_add_del

<2(80us)adj_nbr_update_rewrite_internal
<3(82us)vlib_node_runtime_update{(86us)}
(86us)>
<3(87us)vlib_node_runtime_update{(90us)}
(90us)>
<3(91us)vlib_node_runtime_update{(94us)}
(95us)>
(95us)>
<2(119us)adj_nbr_update_rewrite_internal
(120us)>
(135us)>
(136us)>
{(137us)vlib_worker_thread_node_runtime_update
[179us]
[256us]
worker=1
worker=2
worker=3
(480us)}

This trace is taken on my dev branch, where I am delaying the worker thread 
updates till just before the barrier release.  In the vlib_node_runtime_update 
functions, the time stamp within the {} braces shows the point at which the 
rework_required flag is set (instead of the mainline behaviour of repeatedly 
invoking vlib_worker_thread_node_runtime_update()) 

At the end you can also see the additional profiling stamps I've added at 
various points within vlib_worker_thread_node_runtime_update().  The first two 
stamps are after the two stats sync loops, then there are three lines of 
tracing for the invocations of the function I've added to contain the code for 
the per worker re-fork.  Those function calls are further profiled at various 
points, where the gap between B & C is where the clone node alloc/copying is 
occurring, and between C & D is where the old clone nodes are being freed.  As 
you might guess from the short C-D gap, this branch also included my 
optimization to allocate/free all the clone nodes in a single block.

Having successfully tested the move of the per thread re-fork into a separate 
function, I'm about to try the "collective brain surgery" version, where I will get 
the workers to re-fork their own clones (with the barrier still held) rather 
than having it done sequentially by main.

I'll let you know how it goes...

Colin.

_Raw traces of other calls_

Sep  1 12:57:38 pocvmhost vpp[6315]: [155]: gre_add_del_tunnel
Sep  1 12:57:38 pocvmhost vpp[6315]: 
<1(96us)vlib_node_runtime_update{(99us)}(99us)><1(100us)vlib_node_runtime_update{(102us)}(103us)><1(227us)vlib_node_runtime_update{(232us)}(233us)><1(235us)vlib_node_runtime_update{(237us)}(238us)><1(308us)vlib_node_runtime_update{(313us)}(314us)><1(316us)adj_nbr_update_rewrite_internal(317us)><1(349us)adj_nbr_update_rewrite_internal(350us)>(353us)>{(354us)vlib_worker_thread_node_runtime_update[394us][462us]worker=1[423:425]worker=2[423:425]worker=3[423:425](708us)}
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42822  - O   300  D 5  C 
  708  U 0 - nested   8
Sep  1 12:57:38 pocvmhost vpp[6315]: [13]: sw_interface_set_flags
Sep  1 12:57:38 pocvmhost vpp[6315]: 
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42823  - O  1143  D70  C 
   46  U 0 - nested   0
Sep  1 12:57:38 pocvmhost vpp[6315]: [85]: create_loopback
Sep  1 12:57:38 pocvmhost vpp[6315]: 
<1(25us)vlib_node_runtime_update{(27us)}(27us)><1(28us)vlib_node_runtime_update{(30us)}(30us)><1(44us)vlib_node_runtime_update{(47us)}(48us)><1(50us)vlib_node_runtime_update{(51us)}(52us)>(54us)>{(54us)vlib_worker_thread_node_runtime_update[70us][103us]worker=1[425:427]worker=2[425:427]worker=3[425:427](307us)}
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42824  - O   300  D67  C 
  307  U 0 - nested   5
Sep  1 12:57:38 pocvmhost vpp[6315]: [113]: sw_interface_set_l2_bridge
Sep  1 12:57:38 pocvmhost vpp[6315]: 
(25us)>{(25us)vlib_worker_thread_node_runtime_update[43us][76us]worker=1[427:427]worker=2[427:427]worker=3[427:427](259us)}
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42825  - O  1140  D10  C 
  259  U 0 - nested   1
Sep  1 12:57:38 pocvmhost vpp[6315]: [16]: sw_interface_add_del_address
Sep  1 12:57:38 pocvmhost vpp[6315]: 
(70us)>{(71us)vlib_worker_thread_node_runtime_update[87us][115us]worker=1[427:427]worker=2[427:427]worker=3[427:427](307us)}

-Original Message-
From: Dave Barach (dbarach) [mailto:dbar...@cisco.com] 
Sent: 0

[vpp-dev] Cannot view the code

2017-09-01 Thread Алексей Болдырев
When trying to open: https://gerrit.fd.io/r/
it says Working...
Then:
Code Review - Error
Server Unavailable
504 Gateway Time-out

What is the cause?

Re: [vpp-dev] Query for IPSec support on VPP

2017-09-01 Thread Mukesh Yadav (mukyadav)
HI Sergio,

Do I have to fetch a fresh clone using
git clone https://gerrit.fd.io/r/vpp
or take some explicit release?

Which document page shall I refer to:
https://docs.fd.io/vpp/17.10/dpdk_crypto_ipsec_doc.html
or some other?

Thanks
Mukesh






Re: [vpp-dev] Cannot view the code

2017-09-01 Thread Damjan Marion (damarion)

If you send your email in English, someone may be able to help you ....


> On 1 Sep 2017, at 15:11, Алексей Болдырев  
> wrote:
> 
> When trying to open: https://gerrit.fd.io/r/
> it says Working...
> Then:
> Code Review - Error
> Server Unavailable
> 504 Gateway Time-out
> 
> What is the cause?

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Dave Barach (dbarach)
Dear Colin, 

Of all of these, ip_neighbor_add_del seems like the one to tune right away.

The API message handler itself can be marked mp-safe right away. Both the ip4 
and the ip6 underlying routines are thread-aware (mumble RPC mumble).

We should figure out why the FIB thinks it needs to pull the node runtime 
update lever. AFAICT, adding ip arp / ip6 neighbor adjacencies shouldn’t 
require a node runtime update, at least not in the typical case. 

Copying Neale Ranns. I don't expect to hear back immediately; he's on PTO until 
9/11. 

Thanks… Dave

-Original Message-
From: Colin Tregenza Dancer [mailto:c...@metaswitch.com] 
Sent: Friday, September 1, 2017 8:51 AM
To: Dave Barach (dbarach) ; Ole Troan 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Hi Dave,

Thanks for looking at this.

I get repeated vlib_node_runtime_update() calls when I use the API functions:  
ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
sw_interface_set_l2_bridge & sw_interface_add_del_address (thought there may be 
others which I’m not currently calling).

To illustrate, I've included below a formatted version of my barrier trace from 
when I make an ip_neighbor_add_del API call (raw traces for the other commands 
are included at the end).  

At the point this call was made there were 3 worker threads, ~425nodes in the 
system, and a load of ~3Mpps saturating two 10G NICs.

It shows the API function name, followed by a tree of the recursive calls to 
barrier_sync/release.  On each line I show the calling function name, current 
recursion depth, and elapsed timing from the point the barrier was actually 
closed.  

[50]: ip_neighbor_add_del

<2(80us)adj_nbr_update_rewrite_internal
<3(82us)vlib_node_runtime_update{(86us)}
(86us)>
<3(87us)vlib_node_runtime_update{(90us)}
(90us)>
<3(91us)vlib_node_runtime_update{(94us)}
(95us)>
(95us)>
<2(119us)adj_nbr_update_rewrite_internal
(120us)>
(135us)>
(136us)>
{(137us)vlib_worker_thread_node_runtime_update
[179us]
[256us]
worker=1
worker=2
worker=3
(480us)}

This trace is taken on my dev branch, where I am delaying the worker thread 
updates till just before the barrier release.  In the vlib_node_runtime_update 
functions, the time stamp within the {} braces show the point as which the 
rework_required flag is set (instead of the mainline behaviour of repeatedly 
invoking vlib_worker_thread_node_runtime_update()) 

At the end you can also see the additional profiling stamps I've added at 
various points within vlib_worker_thread_node_runtime_update().  The first two 
stamps are after the two stats sync loops, then there are three lines of 
tracing for the invocations of the function I've added to contain the code for 
the per worker re-fork.  Those functions calls are further profiled at various 
points, where the gap between B & C is where the clone node alloc/copying is 
occurring, and between C & D is where the old clone nodes are being freed.  As 
you might guess from the short C-D gap, this branch also included my 
optimization to allocate/free all the clone nodes in a single block.

Having successfully tested the move of the per thread re-fork into a separate 
function, I'm about try the "collective brainsurgery" version, where I will get 
the workers to re-fork their own clone (with the barrier still held) rather 
than having in done sequentially by main.

I'll let you know how it goes...

Colin.

_Raw traces of other calls_

Sep  1 12:57:38 pocvmhost vpp[6315]: [155]: gre_add_del_tunnel
Sep  1 12:57:38 pocvmhost vpp[6315]: 
<1(96us)vlib_node_runtime_update{(99us)}(99us)><1(100us)vlib_node_runtime_update{(102us)}(103us)><1(227us)vlib_node_runtime_update{(232us)}(233us)><1(235us)vlib_node_runtime_update{(237us)}(238us)><1(308us)vlib_node_runtime_update{(313us)}(314us)><1(316us)adj_nbr_update_rewrite_internal(317us)><1(349us)adj_nbr_update_rewrite_internal(350us)>(353us)>{(354us)vlib_worker_thread_node_runtime_update[394us][462us]worker=1[423:425]worker=2[423:425]worker=3[423:425](708us)}
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42822  - O   300  D 5  C 
  708  U 0 - nested   8
Sep  1 12:57:38 pocvmhost vpp[6315]: [13]: sw_interface_set_flags
Sep  1 12:57:38 pocvmhost vpp[6315]: 
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us) # 42823  - O  1143  D70  C 
   46  U 0 - nested   0
Sep  1 12:57:38 pocvmhost vpp[6315]: [85]: create_loopback
Sep  1 12:57:38 pocvmhost vpp[6315]: 
<1(25us)vlib_node_runtime_update{(27us)}(27us)><1(28us)vlib_node_runtime_update{(30us)}(30us)><1(44us)vlib_node_runtime_update{(47us)}(48us)><1(50us)vlib_node_runtime_update{(51us)}(52us)>(54us)>{(54us)vlib_worker_thread_node_runtime_update[70us][103us]worker=1[425:427]worker=2[425

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Colin Tregenza Dancer via vpp-dev
I think there is something special in this case related to the fact that we're 
adding a new tunnel / subnet before we issue our 63 ip_neighbor_add_del 
calls, because it is only the first ip_neighbor_add_del call which updates the 
nodes, with all of the others just doing a rewrite.

I'll mail you guys the full (long) trace offline so you can see the overall 
sequence.

Cheers,

Colin. 
-Original Message-
From: Dave Barach (dbarach) [mailto:dbar...@cisco.com] 
Sent: 01 September 2017 15:15
To: Colin Tregenza Dancer ; Ole Troan 
; Neale Ranns (nranns) 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Dear Colin, 

Of all of these, ip_neighbor_add_del seems like the one to tune right away.

The API message handler itself can be marked mp-safe right away. Both the ip4 
and the ip6 underlying routines are thread-aware (mumble RPC mumble).

We should figure out why the FIB thinks it needs to pull the node runtime 
update lever. AFAICT, adding ip arp / ip6 neighbor adjacencies shouldn’t 
require a node runtime update, at least not in the typical case. 

Copying Neale Ranns. I don't expect to hear back immediately; he's on PTO until 
9/11. 

Thanks… Dave

-Original Message-
From: Colin Tregenza Dancer [mailto:c...@metaswitch.com]
Sent: Friday, September 1, 2017 8:51 AM
To: Dave Barach (dbarach) ; Ole Troan 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Hi Dave,

Thanks for looking at this.

I get repeated vlib_node_runtime_update() calls when I use the API functions:  
ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
sw_interface_set_l2_bridge & sw_interface_add_del_address (thought there may be 
others which I’m not currently calling).

To illustrate, I've included below a formatted version of my barrier trace from 
when I make an ip_neighbor_add_del API call (raw traces for the other commands 
are included at the end).  

At the point this call was made there were 3 worker threads, ~425nodes in the 
system, and a load of ~3Mpps saturating two 10G NICs.

It shows the API function name, followed by a tree of the recursive calls to 
barrier_sync/release.  On each line I show the calling function name, current 
recursion depth, and elapsed timing from the point the barrier was actually 
closed.  

[50]: ip_neighbor_add_del

<2(80us)adj_nbr_update_rewrite_internal
<3(82us)vlib_node_runtime_update{(86us)}
(86us)>
<3(87us)vlib_node_runtime_update{(90us)}
(90us)>
<3(91us)vlib_node_runtime_update{(94us)}
(95us)>
(95us)>
<2(119us)adj_nbr_update_rewrite_internal
(120us)>
(135us)>
(136us)>
{(137us)vlib_worker_thread_node_runtime_update
[179us]
[256us]
worker=1
worker=2
worker=3
(480us)}

This trace is taken on my dev branch, where I am delaying the worker thread 
updates till just before the barrier release.  In the vlib_node_runtime_update 
functions, the time stamp within the {} braces show the point as which the 
rework_required flag is set (instead of the mainline behaviour of repeatedly 
invoking vlib_worker_thread_node_runtime_update()) 

At the end you can also see the additional profiling stamps I've added at 
various points within vlib_worker_thread_node_runtime_update().  The first two 
stamps are after the two stats sync loops, then there are three lines of 
tracing for the invocations of the function I've added to contain the code for 
the per worker re-fork.  Those functions calls are further profiled at various 
points, where the gap between B & C is where the clone node alloc/copying is 
occurring, and between C & D is where the old clone nodes are being freed.  As 
you might guess from the short C-D gap, this branch also included my 
optimization to allocate/free all the clone nodes in a single block.

Having successfully tested the move of the per thread re-fork into a separate 
function, I'm about try the "collective brainsurgery" version, where I will get 
the workers to re-fork their own clone (with the barrier still held) rather 
than having in done sequentially by main.

I'll let you know how it goes...

Colin.

_Raw traces of other calls_

Sep  1 12:57:38 pocvmhost vpp[6315]: [155]: gre_add_del_tunnel Sep  1 12:57:38 
pocvmhost vpp[6315]: 
<1(96us)vlib_node_runtime_update{(99us)}(99us)><1(100us)vlib_node_runtime_update{(102us)}(103us)><1(227us)vlib_node_runtime_update{(232us)}(233us)><1(235us)vlib_node_runtime_update{(237us)}(238us)><1(308us)vlib_node_runtime_update{(313us)}(314us)><1(316us)adj_nbr_update_rewrite_internal(317us)><1(349us)adj_nbr_update_rewrite_internal(350us)>(353us)>{(354us)vlib_worker_thread_node_runtime_update[394us][462us]worker=1[423:425]worker=2[423:425]worker=3[423:425](708us)}
Sep  1 12:57:38 pocvmhost vpp[6315]: Barrier(us

Re: [vpp-dev] [EXT] Re: compiling error natively on an am64 box for fd.io_vpp

2017-09-01 Thread Burt Silverman
Eric,

Have you updated your Ubuntu after the initial upgrade to 17.04? You have
libc version 2.17. I imagine it should be closer to 2.24, although I am
looking at the x86_64 architecture. There was a glibc bug, and we saw what
looks like the same problem back in April, see *Build failure with latest
VPP* on the mailing list archive. I thought it was in 2.22 and fixed in
2.23. See if you can get a newer libc and make the problem disappear. I
also found some workaround with -pie but that seemed to be needed only with
the miscreant libc. Hope this helps.

Burt

On Fri, Sep 1, 2017 at 12:36 AM, Eric Chen  wrote:

> Hi Damjan,
>
>
>
> Following your suggestion, I upgrade my Ubuntu to Zesty (17.04),
>
>
>
> gcc version 6.3.0 20170406 (Ubuntu/Linaro 6.3.0-12ubuntu2)
>
>
>
> the previous issue gone,
>
>
>
> however did you meet below issue before:
>
> it happens both when I build dpaa2(over dpdk) and marvell (over
> marvell-dpdk).
>
> I checked dpdk when build lib, there is no –FPIC option,
>
> So how to fix it?
>
>
>
>
>
> /usr/bin/ld: 
> /home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_ena.a(ena_ethdev.o):
> relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol
> `__stack_chk_guard@@GLIBC_2.17' can not be used when making a shared
> object; recompile with -fPIC
>
> /usr/bin/ld: /home/ericxh/work/git_work/dpdk/build//lib/librte_pmd_
> ena.a(ena_ethdev.o)(.text+0x44): unresolvable R_AARCH64_ADR_PREL_PG_HI21
> relocation against symbol `__stack_chk_guard@@GLIBC_2.17'
>
> /usr/bin/ld: final link failed: Bad value
>
> collect2: error: ld returned 1 exit status
>
>
>
>
>
> Thanks
>
> Eric
>
>
>
> *From:* Damjan Marion [mailto:dmarion.li...@gmail.com]
> *Sent:* 27 August 2017 3:11
> *To:* Eric Chen 
> *Cc:* Dave Barach ; Sergio Gonzalez Monroy <
> sergio.gonzalez.mon...@intel.com>; vpp-dev 
> *Subject:* Re: [vpp-dev] [EXT] Re: compiling error natively on an am64
> box for fd.io_vpp
>
>
>
> Hi Eric,
>
>
>
> Same code compiles perfectly fine on ARM64 with newer gcc version.
>
>
>
> If you are starting new development cycle it makes sense to me that you
> pick up latest ubuntu release,
>
> specially when new hardware is involved instead of trying to chase this
> kind of bugs.
>
>
>
> Do you have any strong reason to stay on ubuntu 16.04? Both 17.04 and
> upcoming 17.10 are working fine on arm64 and
>
> compiling of VPP works without issues.
>
>
>
> Thanks,
>
>
>
> Damjan
>
>
>
>
>
> On 26 Aug 2017, at 15:23, Eric Chen  wrote:
>
>
>
> Dave,
>
>
>
> Thanks for your answer.
>
> I tried below variation, it doesn’t help.
>
>
>
> Btw, there is not only one place reporting “error: unable to generate
> reloads for:”,
>
>
>
> I will try to checkout the version of 17.01.1,
>
> since with the same native compiler, I succeeded to build fd.io_odp4vpp
> (which is based on fd.io 17.01.1).
>
>
>
> will keep you posted.
>
>
>
> Thanks
>
> Eric
>
>
>
> *From:* Dave Barach (dbarach) [mailto:dbar...@cisco.com
> ]
> *Sent:* 26 August 2017 20:08
> *To:* Eric Chen ; Sergio Gonzalez Monroy <
> sergio.gonzalez.mon...@intel.com>; vpp-dev 
> *Subject:* RE: [vpp-dev] [EXT] Re: compiling error natively on an am64
> box for fd.io_vpp
>
>
>
> Just so everyone knows, the function in question is almost too simple for
> its own good:
>
>
>
> always_inline uword
>
> vlib_process_suspend_time_is_zero (f64 dt)
>
> {
>
>   return dt < 10e-6;
>
> }
>
>
>
> What happens if you try this variation?
>
>
>
> always_inline int
>
> vlib_process_suspend_time_is_zero (f64 dt)
>
> {
>
>   if (dt < 10e-6)
>
>  return 1;
>
>   return 0;
>
> }
>
>
>
> This does look like a gcc bug, but it may not be hard to work around...
>
>
>
> Thanks… Dave
>
>
>
> *From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io
> ] *On Behalf Of *Eric Chen
> *Sent:* Friday, August 25, 2017 11:02 PM
> *To:* Eric Chen ; Sergio Gonzalez Monroy <
> sergio.gonzalez.mon...@intel.com>; vpp-dev 
> *Subject:* Re: [vpp-dev] [EXT] Re: compiling error natively on an am64
> box for fd.io_vpp
>
>
>
> Hi Sergio,
>
>
>
> I upgrading to Ubuntu 16.04,
>
>
>
> Succedd to Nativly build fd.io_odp4vpp (w / odp-linux),
>
> However when buidl fd.io_vpp (w/ dpdk),  it reported below error,
>
> (almost the same , only difference is over dpdk or odp-linux)
>
>
>
> Anyone met before? Seem a bug of gcc.
>
>
>
> In file included from /home/ericxh/work/git_work/fd.
> io_vpp/build-data/../src/vlib/error_funcs.h:43:0,
>
>  from /home/ericxh/work/git_work/fd.
> io_vpp/build-data/../src/vlib/vlib.h:70,
>
>  from /home/ericxh/work/git_work/fd.
> io_vpp/build-data/../src/vnet/l2/l2_fib.c:19:
>
> /home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:
> In function ‘vlib_process_suspend_time_is_zero’:
>
> /home/ericxh/work/git_work/fd.io_vpp/build-data/../src/vlib/node_funcs.h:442:1:
> error: unable to generate reloads for:
>
> }
>
> ^
>
> (insn 11 37 12 2 (set (reg:CCFPE 66 cc)
>
> (compare:CCFPE (reg:DF 79)
>
> (reg:DF 80))) /home/ericx

Re: [vpp-dev] Packet loss on use of API & cmdline

2017-09-01 Thread Dave Barach (dbarach)
Dear Colin,

That makes total sense... Tunnels are modelled as "magic interfaces" 
[especially] in the encap direction. Each tunnel has an output node, which 
means that the [first] FIB entry will need to add a graph arc.

A bit of "show vlib graph" action will confirm that... 

Thanks… Dave

-Original Message-
From: Colin Tregenza Dancer [mailto:c...@metaswitch.com] 
Sent: Friday, September 1, 2017 11:01 AM
To: Dave Barach (dbarach) ; Ole Troan 
; Neale Ranns (nranns) 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

I think there is something special in this case related to the fact that we're 
adding of a new tunnel / subnet, before we issue our 63 ip_neighbor_add_del 
calls, because it is only the first call ip_neighbor_add_del which updates the 
nodes, with all of the other just doing a rewrite.

I'll mail you guys the full (long) trace offline so you can see the overall 
sequence.

Cheers,

Colin. 
-Original Message-
From: Dave Barach (dbarach) [mailto:dbar...@cisco.com] 
Sent: 01 September 2017 15:15
To: Colin Tregenza Dancer ; Ole Troan 
; Neale Ranns (nranns) 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Dear Colin, 

Of all of these, ip_neighbor_add_del seems like the one to tune right away.

The API message handler itself can be marked mp-safe right away. Both the ip4 
and the ip6 underlying routines are thread-aware (mumble RPC mumble).

We should figure out why the FIB thinks it needs to pull the node runtime 
update lever. AFAICT, adding ip arp / ip6 neighbor adjacencies shouldn’t 
require a node runtime update, at least not in the typical case. 

Copying Neale Ranns. I don't expect to hear back immediately; he's on PTO until 
9/11. 

Thanks… Dave

-Original Message-
From: Colin Tregenza Dancer [mailto:c...@metaswitch.com]
Sent: Friday, September 1, 2017 8:51 AM
To: Dave Barach (dbarach) ; Ole Troan 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Hi Dave,

Thanks for looking at this.

I get repeated vlib_node_runtime_update() calls when I use the API functions:  
ip_neighbor_add_del, gre_add_del_tunnel, create_loopback, 
sw_interface_set_l2_bridge & sw_interface_add_del_address (thought there may be 
others which I’m not currently calling).

To illustrate, I've included below a formatted version of my barrier trace from 
when I make an ip_neighbor_add_del API call (raw traces for the other commands 
are included at the end).  

At the point this call was made there were 3 worker threads, ~425nodes in the 
system, and a load of ~3Mpps saturating two 10G NICs.

It shows the API function name, followed by a tree of the recursive calls to 
barrier_sync/release.  On each line I show the calling function name, current 
recursion depth, and elapsed timing from the point the barrier was actually 
closed.  

[50]: ip_neighbor_add_del

<2(80us)adj_nbr_update_rewrite_internal
<3(82us)vlib_node_runtime_update{(86us)}
(86us)>
<3(87us)vlib_node_runtime_update{(90us)}
(90us)>
<3(91us)vlib_node_runtime_update{(94us)}
(95us)>
(95us)>
<2(119us)adj_nbr_update_rewrite_internal
(120us)>
(135us)>
(136us)>
{(137us)vlib_worker_thread_node_runtime_update
[179us]
[256us]
worker=1
worker=2
worker=3
(480us)}

This trace is taken on my dev branch, where I am delaying the worker thread 
updates till just before the barrier release.  In the vlib_node_runtime_update 
functions, the time stamp within the {} braces show the point as which the 
rework_required flag is set (instead of the mainline behaviour of repeatedly 
invoking vlib_worker_thread_node_runtime_update()) 

At the end you can also see the additional profiling stamps I've added at 
various points within vlib_worker_thread_node_runtime_update().  The first two 
stamps are after the two stats sync loops, then there are three lines of 
tracing for the invocations of the function I've added to contain the code for 
the per worker re-fork.  Those functions calls are further profiled at various 
points, where the gap between B & C is where the clone node alloc/copying is 
occurring, and between C & D is where the old clone nodes are being freed.  As 
you might guess from the short C-D gap, this branch also included my 
optimization to allocate/free all the clone nodes in a single block.

Having successfully tested the move of the per thread re-fork into a separate 
function, I'm about try the "collective brainsurgery" version, where I will get 
the workers to re-fork their own clone (with the barrier still held) rather 
than having in done sequentially by main.

I'll let you know how it goes...

Colin.

_Raw traces of other calls_

Sep  1 12:57:38 pocvmhost vpp[6315]: [155]: gre_add_d

[vpp-dev] Can I use VPP as a P / PE router in large MPLS networks.

2017-09-01 Thread Алексей Болдырев
I am interested in the following:
1. Can VPP be used as a P / PE router in large MPLS networks, and what can be 
the limitations?
2. How many MPLS labels can be put on one MPLS packet?
3. Is it possible to use the ldpd protocol from frrouting for exchange of 
labels through tap-eject?
4. How does the tap-eject mechanism work?


[vpp-dev] how to increase maximum number of sessions which are supported in ACL session hash table (default value is 1,000,000)

2017-09-01 Thread khers
Hi Andrew

I increased the value of the "ACL_FA_CONN_TABLE_DEFAULT_MAX_ENTRIES" parameter
in fa_node.h, because I want to improve the performance of stateful ACL
when there are more than 10,000,000 sessions.

After making this change, I compiled the source code and ran the vpp binary,
but it seems that vpp doesn't have enough memory. I suppose that the
ACL_FA_CONN_TABLE_DEFAULT_HASH_MEMORY_SIZE parameter should be changed
too, but I don't know what change I should make so that the value of this
parameter would be valid.

I changed it to 2^31 but I have problems after testing. Indeed, I have not
understood the relation between max entries and memory size. Can you help
me in this situation? Thanks in advance for your answer.
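One general C-level note that may be relevant here (hedged; not a statement about fa_node.h internals): the hash arena has to grow roughly in step with the number of entries it is expected to hold, and a constant of 2^31 written with plain int literals overflows a signed 32-bit integer. A sketch with made-up macro names:

/* hypothetical names, for illustration only */
#define MY_CONN_TABLE_MAX_ENTRIES        10000000ULL        /* 10 M sessions */
#define MY_CONN_TABLE_HASH_MEMORY_SIZE   (4ULL << 30)       /* 4 GB, 64-bit literal */
/* (1 << 31) with int literals overflows; (1ULL << 31) does not */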


Best Regards,

Kenny