Re: [vpp-dev] Contribution of DPDK plugin in VPP virtual memory size

2019-03-25 Thread Kingwel Xie
Looks like you are running an old VPP…

You can check physmem allocation by running the CLI:

vpp# show physmem
vpp# show dpdk physmem

The master code manages physmem by itself; in other words, it allocates
buffers from huge pages held by VPP instead of DPDK. Typically, it will use 2
huge pages unless you want many buffers per NUMA node.
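
For reference, with the master code the buffer count is tuned from startup.conf rather than through DPDK; a minimal sketch of the stanza (the 128000 value is only an example, size it to your own deployment):

  buffers {
    buffers-per-numa 128000
  }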

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of siddarth rai
Sent: March 25, 2019 20:27
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Contribution of DPDK plugin in VPP virtual memory size

Hello,

I was trying to figure out the contribution of different components to the virtual
memory size of VPP. I am using pmap -x to check this.

I see that the heap size directly contributes to it.
I am using the DPDK plugin and I can see dpdk_mbuf_pool taking up 1.2 GB per
socket (number of mbufs: 500K per socket).

2ac0 1220608   0   0 rw-s- dpdk_mbuf_pool_socket0
2aaaf540 1220608   0   0 rw-s- dpdk_mbuf_pool_socket1


However, I see around 4 GB of memory being consumed by an anonymous mapping. This goes
away as soon as I disable the DPDK plugin.

2b066800 4194304   0   0 -   [ anon ]

Can anyone tell me how the DPDK plugin is using this extra 4 GB of memory?
Any help will be much appreciated.

Regards,
Siddarth



Re: [vpp-dev] vpp build 19.01.1 IPSec crash

2019-03-18 Thread Kingwel Xie
Well, I cannot open the xz file. It is always 32 B…

Anyway, patch 17889 should always be included if you want to use IPsec
cryptodev.

From: vpp-dev@lists.fd.io  On Behalf Of Jan Gelety via 
Lists.Fd.Io
Sent: March 19, 2019 0:25
To: Dave Barach (dbarach) ; Kingwel Xie 
; yuwei1.zh...@intel.com
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] vpp build 19.01.1 IPSec crash

+ Simon Zhang and Kingwel Xie

Hello Dave,

I apologize that I haven't clearly stated that we used the 19.01.1-release_amd64
xenial VPP build for testing.

Anyway, I created Jira ticket VPP-1597<https://jira.fd.io/browse/VPP-1597> to
cover this issue.

@Kingwel, Simon - could you check if your commits [4] and [5] to the 19.01 vpp
branch could be a fix for the reported issue?

Thanks,
Jan

[4] 
https://gerrit.fd.io/r/gitweb?p=vpp.git;a=commit;h=50a392f5a0981fb442449864c479511c54145a29
[5] 
https://gerrit.fd.io/r/gitweb?p=vpp.git;a=commit;h=3e59d2eb20e73e0216c12964159c0a708482c548

From: Dave Barach (dbarach) mailto:dbar...@cisco.com>>
Sent: Monday, March 18, 2019 2:56 PM
To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) 
mailto:jgel...@cisco.com>>; 
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: RE: vpp build 19.01.1 IPSec crash

Do these tarballs include the .deb packages which correspond to the coredumps?

Thanks... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Jan Gelety via 
Lists.Fd.Io
Sent: Monday, March 18, 2019 9:45 AM
To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: [vpp-dev] vpp build 19.01.1 IPSec crash

Hello,

During CSIT tests with vpp build 19.01.1 we found out that the first, and
sometimes also the second, IPSec scale test is failing because of a VPP crash - see
the CSIT log [0].

Collected core-dump files are available here [1] and here [2].

Could you please let us know if this issue is already fixed by some of the later
commits to the vpp stable/1901 branch [3]?

Thanks,
Jan

PS: Instructions to extract the coredump file:
$ unxz 155255350764.tar.lzo.lrz.xz
$ lrunzip 155255350764.tar.lzo.lrz
$ lzop -d 155255350764.tar.lzo
$ tar xf 155255350764.tar

[0] 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-1901-3n-hsw/80/archives/log.html.gz#s1-s1-s1-s1-s1-t1
[1] 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-1901-3n-hsw/80/archives/155255350764.tar.lzo.lrz.xz
[2] 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-1901-3n-hsw/80/archives/155255351965.tar.lzo.lrz.xz
[3] 
https://gerrit.fd.io/r/gitweb?p=vpp.git;a=shortlog;h=refs%2Fheads%2Fstable%2F1901



Re: [vpp-dev] Published: Tech paper - "Benchmarking SW Data Planes Intel Xeon Skylake vs. Broadwell"

2019-03-12 Thread Kingwel Xie
Thanks. Very helpful.

-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Maciek 
Konstantynowicz (mkonstan) via Lists.Fd.Io
Sent: March 12, 2019 23:55
To: csit-dev ; vpp-dev ; 
disc...@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] Published: Tech paper - "Benchmarking SW Data Planes Intel 
Xeon Skylake vs. Broadwell"

Hi,

We finally managed to get the go-ahead to publish the above technical paper.

  Workloads tested: VPP, OVS-DPDK, DPDK(testpmd,l3fwd), CoreMark.
  Processors tested: Intel Xeon Skylake vs. Broadwell.
  Compared side-by-side: performance and efficiency of workloads/processors.

Thanks to all involved in peer reviews!

Here are the LF links:

  LFN sites:
On "Resources" page: 
  https://www.lfnetworking.org/resources/ 
Direct pdf download link: 
  
https://www.lfnetworking.org/wp-content/uploads/sites/55/2019/03/benchmarking_sw_data_planes_skx_bdx_mar07_2019.pdf
 

  FD.io sites:
On "Resources" page: 
  https://fd.io/resources/
Direct pdf download link: 
  
https://fd.io/wp-content/uploads/sites/34/2019/03/benchmarking_sw_data_planes_skx_bdx_mar07_2019.pdf

It is a sequel to our original “Benchmarking and Analysis of Software Data 
Planes” paper published back in December 2017:
  
https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf

All comments are welcome; please send them to the authors :)

Cheers,
-Maciek


Re: [vpp-dev] ip-rewrite bug?

2019-03-05 Thread Kingwel Xie
Thanks. Looking forward to the fix…

From: Dave Barach (dbarach) 
Sent: Tuesday, March 05, 2019 8:25 PM
To: Kingwel Xie ; vpp-dev@lists.fd.io
Subject: RE: ip-rewrite bug?

Let’s try to fix the underlying code in vnet_rewrite_one_header, and friends. 
The code already checks for the magic rewrite length 14. Anyhow, the 
header-rewrite scheme hasn’t had a serious performance tuning pass in a long 
time. Since it’s used everywhere, shaving a couple of clock cycles from it 
would be a Good Thing...

D.

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Tuesday, March 5, 2019 4:40 AM
To: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: [vpp-dev] ip-rewrite bug?

Hi vpp-dev,

I’m looking at the ip-rewrite node, and I think it might be a bug at:

vnet_rewrite_one_header (adj0[0], ip0, sizeof (ethernet_header_t));

The adj->rewrite_header.data_bytes might be 0 when it comes to a tunnel
interface, e.g., gtpu or ipsec. Thus, at least 8 bytes of garbage are written to
the buffer unexpectedly. I came across this because I'm doing handoff for tunnel
interfaces, and in this case the 8-byte memcpy leads to some performance loss and
cache thrashing…
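
For illustration only, a hedged sketch of the kind of guard being discussed here (this is not the patch under review, just the idea): skip the fixed-size copy when the adjacency carries no rewrite data, as is the case for these tunnel interfaces.

  /* illustrative guard only - the real fix is up for discussion in the patch below */
  if (PREDICT_TRUE (adj0->rewrite_header.data_bytes != 0))
    vnet_rewrite_one_header (adj0[0], ip0, sizeof (ethernet_header_t));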

Temporarily I created a patch for more discussion. Please take a look. There 
should be a better way to fix it…

https://gerrit.fd.io/r/18014

Regards,
Kingwel


[vpp-dev] ip-rewrite bug?

2019-03-05 Thread Kingwel Xie
Hi vpp-dev,

I’m looking at the ip-rewrite node, and I think it might be a bug at:

vnet_rewrite_one_header (adj0[0], ip0, sizeof (ethernet_header_t));

The adj->rewrite_header.data_bytes might be 0 when it comes to a tunnel
interface, e.g., gtpu or ipsec. Thus, at least 8 bytes of garbage are written to
the buffer unexpectedly. I came across this because I'm doing handoff for tunnel
interfaces, and in this case the 8-byte memcpy leads to some performance loss and
cache thrashing…

Temporarily I created a patch for more discussion. Please take a look. There 
should be a better way to fix it…

https://gerrit.fd.io/r/18014

Regards,
Kingwel


[vpp-dev] ipsec dpdk backend

2019-02-04 Thread Kingwel Xie
Has anyone noticed that dpdk ipsec doesn't work on the master code?

A quick debug session shows it might be a problem with the DPDK 19.02 API changes.

Will come back with a patch later. Probably need a week. Holiday...


 Original Message 
Subject: Re: [vpp-dev] VPP register node change upper limit
From: "Damjan Marion via Lists.Fd.Io" 
Sent: February 4, 2019 3:43 PM
Cc: Abeeha Aqeel

It is a bit of a shame that that plugin doesn't scale. Somebody will need to
rewrite that plugin to make it right, i.e. simple use of sub-interfaces will
likely make this limitation disappear...

―
Damjan

On Feb 4, 2019, at 5:56 AM, Abeeha Aqeel 
mailto:abeeha.aq...@xflowresearch.com>> wrote:


I am using the vpp pppoe plugin and that's how it's working. I do see an option
in vnet/interface.c to create interfaces that do not need TX nodes, but I
am not sure how to use it.

Also, I cannot figure out where the nodes created along with the PPPoE sessions
are being used, as they do not show up in "show runtime" or in the packet
trace.

Regards,

Abeeha

From: Abeeha Aqeel
Sent: Friday, February 1, 2019 5:36 PM
Cc: vpp-dev@lists.fd.io
Subject: FW: [vpp-dev] VPP register node change upper limit




From: Abeeha Aqeel
Sent: Friday, February 1, 2019 5:32 PM
To: dmar...@me.com
Subject: RE: [vpp-dev] VPP register node change upper limit

I am using the vpp pppoe plugin and that's how it's working. I do see an option
in vnet/interface.c to create interfaces that do not need TX nodes, but I
am not sure how to use it.

Also, I cannot figure out where the nodes created along with the PPPoE sessions
are being used, as they do not show up in "show runtime" or in the packet
trace.



From: Damjan Marion via Lists.Fd.Io
Sent: Friday, February 1, 2019 5:23 PM
To: Abeeha Aqeel
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP register node change upper limit



On 1 Feb 2019, at 11:32, Abeeha Aqeel 
mailto:abeeha.aq...@xflowresearch.com>> wrote:

Dear All,

I am trying to create 64k PPPoE sessions with VPP but VPP crashes after 
creating 216 sessions each time. From the system logs it seems that it crashes 
while trying to register a node and that node’s index is greater than the limit 
(1024). (attached screenshot of the trace)

From the “show vlib graph”, I can see that two new nodes are registered for 
each session i.e. pppoe_session0-tx and pppoe_session0-output.

Can someone guide me to how to increase the upper limit on the number of nodes?

Currently the number of nodes is limited by buffer metadata space, and by the way
we calculate node errors (vlib_error_t).
Currently vlib_error_t is a u16, and 10 bits are used for the node index. That
gives you 1 << 10 node indices, so roughly
300-400 interfaces (2 nodes per interface + other registered nodes < 1024).
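
A hedged sketch of the arithmetic behind that ceiling, with illustrative names rather than the actual VPP macros: a 16-bit vlib_error_t with 10 bits reserved for the node index leaves at most 1024 node slots.

  /* illustrative only - the real encoding lives in vlib's error handling */
  typedef unsigned short example_error_t;            /* 16 bits, like vlib_error_t */
  #define EXAMPLE_NODE_BITS 10                       /* 1 << 10 = 1024 node slots  */
  #define EXAMPLE_CODE_BITS (16 - EXAMPLE_NODE_BITS) /* remaining bits: error code */

  static inline example_error_t
  example_pack_error (unsigned node_index, unsigned code)
  {
    /* node_index must stay below 1024, hence the node-count ceiling */
    return (example_error_t) ((node_index << EXAMPLE_CODE_BITS) |
                              (code & ((1 << EXAMPLE_CODE_BITS) - 1)));
  }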

This is something we can improve, but the real question is: do you really want
to go that way?
Have you considered using some lighter way to deal with a large number of
sessions...

--
Damjan






Re: [vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-25 Thread Kingwel Xie
Thanks for the clarification. Makes sense.


 Original Message 
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership
From: Damjan Marion 
Sent: January 25, 2019 10:56 PM
Cc: Dave Barach

Just to avoid potential confusion, my recent ethernet-input change mandates that
frames are not aggregated when they are coming from input nodes. The reason for
that is much faster processing of the frame when we know in advance that all
packets in the frame share the same sw_if_index.

In other words, the input node calls vlib_get_new_next_frame() instead of
vlib_get_next_frame(), and it stores the flags and sw_if_index inside the frame
scalar data.

At the moment this is the only case in the current codebase which doesn't aggregate...
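
A rough sketch of that pattern as it could look inside a device-input node; the struct, flag and frame-accessor names below are assumptions based on the description above and may differ between VPP versions:

  /* assumed names: ethernet_input_frame_t, ETH_INPUT_FRAME_F_SINGLE_SW_IF_IDX */
  u32 *to_next, n_left_to_next;

  /* ask for a fresh frame instead of appending to a pending one */
  vlib_get_new_next_frame (vm, node, next_index, to_next, n_left_to_next);

  if (next_index == VNET_DEVICE_INPUT_NEXT_ETHERNET_INPUT)
    {
      vlib_next_frame_t *nf = vlib_node_runtime_get_next_frame (vm, node, next_index);
      vlib_frame_t *f = vlib_get_frame (vm, nf->frame_index); /* assumption: frame referenced by index */
      ethernet_input_frame_t *ef = vlib_frame_scalar_args (f);

      /* every buffer in this frame comes from one interface */
      ef->sw_if_index = sw_if_index;
      ef->hw_if_index = hw_if_index;
      f->flags |= ETH_INPUT_FRAME_F_SINGLE_SW_IF_IDX;
    }

  /* ... enqueue buffer indices into to_next ... */
  vlib_put_next_frame (vm, node, next_index, n_left_to_next);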


On 25 Jan 2019, at 14:03, Dave Barach via Lists.Fd.Io 
mailto:dbarach=cisco@lists.fd.io>> wrote:

Dear Kingwei,

On a per-thread basis, only one input node is active at a time. In the 2x 
active input node case you sketched, the second input node will take frame 
ownership of the ethernet input frame - which should be pending but not yet 
dispatched - and add more buffer indices to it. It’s possible that the first 
frame will fill and that a second frame will be allocated and [partially] 
filled; that’s not the typical case.

One lap around the track later, the first input node will take back ownership.

Let’s say that input node 1 adds 50 packets to the ethernet-input frame, and 
input node 2 adds another 50 packets. The frame ownership change dance yields a 
vector of size 100 in ethernet-input. My guess is that the increase in 
efficiency from ethernet-input forward in the graph more than compensates for 
the fixed cost of the frame change ownership dance. This is to confirm what you 
wrote.

I usually call this effect “input aggregation.” Also holds true for the handoff 
node, especially when handing off frames from multiple threads to a single 
thread.

The alternative: dispatch two smaller frames instead of one big one. Doing that 
might not be awful if all input nodes produced a fair number of packets. The 
situation becomes awful when the 2..N input nodes produce very different 
numbers of packets, e.g. 99 and 1. Anyhow, the code doesn’t work that way and 
isn’t going to work that way, so I digress...

Thanks... Dave

From: Kingwel Xie mailto:kingwel@ericsson.com>>
Sent: Friday, January 25, 2019 5:13 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Hi Dave,

After checking the code and some debug sessions, I realized where the bug is –
crypto-input, which is calling vlib_trace_buffer() earlier than
get_next_frame/put_next_frame. Therefore, the next_frame->flag is overwritten
by get_next_frame/change_ownership. I've made a patch for it:
https://gerrit.fd.io/r/17079; I'd appreciate your comments.

Again, about vlib_next_frame_change_ownership, as I understand, this mechanism 
will always try to enqueue buffers to the owner’s frame_index, so please 
consider this scenario:

We have 2 input nodes running at the same time – dpdk-input and avf-input – and
they will compete with each other to enqueue to ethernet-input. Consequently, on
the bad side, there is a memcpy per frame as we discussed; on the good side,
assuming they both have 10 buffers per frame, ethernet-input will have a chance
to get 20 buffers for batch processing, which is good for performance. Please
confirm whether this is the expected behavior of frame ownership.

Regards,
Kingwel


From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Dave Barach via 
Lists.Fd.Io
Sent: Friday, January 25, 2019 4:52 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

The vpp packet trace which I extracted from your dispatch trace seems exactly 
as I would have expected. See below. In a pg test like this one using a 
loopback interface, anything past loopN-tx is irrelevant. The ipsec packet 
turns into an ARP request for 18.1.0.241.

In non-cyclic graph cases, we don’t end up changing frame ownership at all. In 
this case, you’re doing a double lookup. One small memcpy per frame is a 
trivial cost, especially when one remembers that the cost is amortized over all 
the packets in the frame.

Until you produce a repeatable demonstration of the claimed issue, there’s 
nothing that we can do.

Thanks... Dave

VPP Buffer Trace
Trace:
Trace: 00:00:53:959410: pg-input
Trace:   stream ipsec0, 100 bytes, 0 sw_if_index
Trace:   current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, ch

Re: [vpp-dev] RFC: buffer manager rework

2019-01-25 Thread Kingwel Xie
Thanks for the optimizations. Will try verifying it on my setup.

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Saturday, January 26, 2019 1:09 AM
To: vpp-dev 
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] RFC: buffer manager rework


I am very close to the finish line with the buffer management rework patch, and
would like to
ask people to take a look before it is merged.

https://gerrit.fd.io/r/16638

It significantly improves the performance of buffer alloc/free and introduces NUMA
awareness.
On my Skylake Platinum 8180 system, with the native AVF driver, the observed
performance improvement is:

- single core, 2 threads, ipv4 base forwarding test, CPU running at 2.5GHz (TB 
off):

old code - dpdk buffer manager: 20.4 Mpps
old code - old native buffer manager: 19.4 Mpps
new code: 24.9 Mpps

With DPDK drivers, performance stays the same, as DPDK maintains its own internal
buffer cache.
So the major perf gain should be observed in native code like vhost-user, memif,
AVF, and the host stack.

user facing changes:
to change number of buffers:
  old startup.conf:
dpdk { num-mbufs  }
  new startup.conf:
buffers { buffers-per-numa }

Internal changes:
 - free lists are deprecated
 - buffer metadata is always initialised.
 - first 64-bytes of metadata are initialised on free, so buffer alloc is very 
fast
 - DPDK mempools are not used anymore, we register custom mempool ops, and dpdk 
is taking buffers from VPP
 - to support such operation plugin can request external header space - in case 
of DPDK it stores rte_mbuf + rte_mempool_objhdr

I'm still running some tests, so minor changes are possible, but nothing major is
expected.

--
Damjan


Re: [vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-25 Thread Kingwel Xie
Thanks, Dave. Crystal clear! It is a deliberate design, indeed.

Regards,
Kingwel


 Original Message 
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership
From: "Dave Barach (dbarach)" 
Sent: January 25, 2019 9:03 PM
Cc: Kingwel Xie, vpp-dev
Dear Kingwei,

On a per-thread basis, only one input node is active at a time. In the 2x 
active input node case you sketched, the second input node will take frame 
ownership of the ethernet input frame - which should be pending but not yet 
dispatched - and add more buffer indices to it. It’s possible that the first 
frame will fill and that a second frame will be allocated and [partially] 
filled; that’s not the typical case.

One lap around the track later, the first input node will take back ownership.

Let’s say that input node 1 adds 50 packets to the ethernet-input frame, and 
input node 2 adds another 50 packets. The frame ownership change dance yields a 
vector of size 100 in ethernet-input. My guess is that the increase in 
efficiency from ethernet-input forward in the graph more than compensates for 
the fixed cost of the frame change ownership dance. This is to confirm what you 
wrote.

I usually call this effect “input aggregation.” Also holds true for the handoff 
node, especially when handing off frames from multiple threads to a single 
thread.

The alternative: dispatch two smaller frames instead of one big one. Doing that 
might not be awful if all input nodes produced a fair number of packets. The 
situation becomes awful when the 2..N input nodes produce very different 
numbers of packets, e.g. 99 and 1. Anyhow, the code doesn’t work that way and 
isn’t going to work that way, so I digress...

Thanks... Dave

From: Kingwel Xie 
Sent: Friday, January 25, 2019 5:13 AM
To: Dave Barach (dbarach) ; vpp-dev 
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Hi Dave,

After checking the code and some debug sessions, I realized where the bug is –
crypto-input, which is calling vlib_trace_buffer() earlier than
get_next_frame/put_next_frame. Therefore, the next_frame->flag is overwritten
by get_next_frame/change_ownership. I've made a patch for it:
https://gerrit.fd.io/r/17079; I'd appreciate your comments.

Again, about vlib_next_frame_change_ownership, as I understand, this mechanism 
will always try to enqueue buffers to the owner’s frame_index, so please 
consider this scenario:

We have 2 input nodes running at the same time – dpdk-input and avf-input – and
they will compete with each other to enqueue to ethernet-input. Consequently, on
the bad side, there is a memcpy per frame as we discussed; on the good side,
assuming they both have 10 buffers per frame, ethernet-input will have a chance
to get 20 buffers for batch processing, which is good for performance. Please
confirm whether this is the expected behavior of frame ownership.

Regards,
Kingwel


From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Dave Barach via 
Lists.Fd.Io
Sent: Friday, January 25, 2019 4:52 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

The vpp packet trace which I extracted from your dispatch trace seems exactly 
as I would have expected. See below. In a pg test like this one using a 
loopback interface, anything past loopN-tx is irrelevant. The ipsec packet 
turns into an ARP request for 18.1.0.241.

In non-cyclic graph cases, we don’t end up changing frame ownership at all. In 
this case, you’re doing a double lookup. One small memcpy per frame is a 
trivial cost, especially when one remembers that the cost is amortized over all 
the packets in the frame.

Until you produce a repeatable demonstration of the claimed issue, there’s 
nothing that we can do.

Thanks... Dave

VPP Buffer Trace
Trace:
Trace: 00:00:53:959410: pg-input
Trace:   stream ipsec0, 100 bytes, 0 sw_if_index
Trace:   current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959426: ip4-input
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959519: ip4-lookup
Trace:   fib 0 dpo-idx 2 flow hash: 0x
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959598: ip4-rewrite
Trace:   tx_sw_if_index 2 dpo-idx 2 : ipv4 

Re: [vpp-dev] one question about VPP IPsec implementaions

2019-01-25 Thread Kingwel Xie
First, performance. Second, still performance: you can use QAT with DPDK IPsec.

Also note that DPDK IPsec doesn't support AH. And DPDK IPsec supports GCM, but the
native one doesn't.

There is a third one, the ipsecmb backend, which provides even better performance
when using SW ciphering/hashing only. It currently supports AES-CBC only.

Hope this helps.


From: vpp-dev@lists.fd.io  On Behalf Of Zhiyong Yang
Sent: Friday, January 25, 2019 7:15 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] one question about VPP IPsec implementaions

Hi VPP expert,
When I look at the IPsec code, it looks like two implementations
coexist in VPP. One is DPDK IPsec, and the other is native IPsec.
I wonder what the difference is between them from a functionality perspective?
Any input is appreciated.

Regards
Zhiyong



Re: [vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-25 Thread Kingwel Xie
Hi Dave,

After checking the code and some debug sessions, I realized where the bug is –
crypto-input, which is calling vlib_trace_buffer() earlier than
get_next_frame/put_next_frame. Therefore, the next_frame->flag is overwritten
by get_next_frame/change_ownership. I've made a patch for it:
https://gerrit.fd.io/r/17079; I'd appreciate your comments.

Again, about vlib_next_frame_change_ownership, as I understand, this mechanism 
will always try to enqueue buffers to the owner’s frame_index, so please 
consider this scenario:

We have 2 input nodes running at the same time – dpdk-input and avf-input – and
they will compete with each other to enqueue to ethernet-input. Consequently, on
the bad side, there is a memcpy per frame as we discussed; on the good side,
assuming they both have 10 buffers per frame, ethernet-input will have a chance
to get 20 buffers for batch processing, which is good for performance. Please
confirm whether this is the expected behavior of frame ownership.

Regards,
Kingwel


From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach via 
Lists.Fd.Io
Sent: Friday, January 25, 2019 4:52 AM
To: Kingwel Xie ; vpp-dev 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

The vpp packet trace which I extracted from your dispatch trace seems exactly 
as I would have expected. See below. In a pg test like this one using a 
loopback interface, anything past loopN-tx is irrelevant. The ipsec packet 
turns into an ARP request for 18.1.0.241.

In non-cyclic graph cases, we don’t end up changing frame ownership at all. In 
this case, you’re doing a double lookup. One small memcpy per frame is a 
trivial cost, especially when one remembers that the cost is amortized over all 
the packets in the frame.

Until you produce a repeatable demonstration of the claimed issue, there’s 
nothing that we can do.

Thanks... Dave

VPP Buffer Trace
Trace:
Trace: 00:00:53:959410: pg-input
Trace:   stream ipsec0, 100 bytes, 0 sw_if_index
Trace:   current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959426: ip4-input
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959519: ip4-lookup
Trace:   fib 0 dpo-idx 2 flow hash: 0x
Trace:   UDP: 192.168.2.255 -> 1.2.3.4
Trace: tos 0x00, ttl 64, length 28, checksum 0xb324
Trace: fragment id 0x
Trace:   UDP: 4321 -> 1234
Trace: length 80, checksum 0x30d9
Trace: 00:00:53:959598: ip4-rewrite
Trace:   tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 
flow hash: 0x
Trace:   : 
451c3f11b424c0a802ff0102030410e104d2005030d900010203
Trace:   0020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
Trace: 00:00:53:959687: ipsec0-output
Trace:   ipsec0
Trace:   : 
451c3f11b424c0a802ff0102030410e104d2005030d900010203
Trace:   0020: 
0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223
Trace:   0040: 
2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243
Trace:   0060: 44454647
Trace: 00:00:53:959802: ipsec0-tx
Trace:   IPSec: spi 1 seq 1
Trace: 00:00:53:959934: esp4-encrypt
Trace:   esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96
Trace: 00:00:53:960084: ip4-lookup
Trace:   fib 0 dpo-idx 0 flow hash: 0x
Trace:   IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
Trace: tos 0x00, ttl 254, length 168, checksum 0x96ea
Trace: fragment id 0x
Trace: 00:00:53:960209: ip4-glean
Trace: IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
Trace:   tos 0x00, ttl 254, length 168, checksum 0x96ea
Trace:   fragment id 0x
Trace: 00:00:53:960336: loop0-output
Trace:   loop0
Trace:   ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
Trace:   request, type ethernet/IP4, address size 6/4
Trace:   de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
Trace: 00:00:53:960491: error-drop
Trace:   ip4-glean: ARP requests sent


From: Kingwel Xie mailto:kingwel@ericsson.com>>
Sent: Thursday, January 24, 2019 2:43 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Ok. As requested, pcap trace & test script attached. Actually I made some 
simplification to indicate the problem – using native IPSEC instead of DPDK.

You can see in the buffer trace that ip-lookup is referred 

Re: [vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-23 Thread Kingwel Xie
Ok. As requested, the pcap trace & test script are attached. Actually I made some
simplifications to illustrate the problem – using native IPsec instead of DPDK.

You can see in the buffer trace that ip-lookup is referenced by ip-input in the
beginning and then by esp-encrypt later. It means the ownership of ip-lookup will
be changed back and forth, a 16x3=48 byte memcpy, on a per-frame basis. In some
cases, the trace flag in next_frame will be lost, which leads to broken buffer
traces. I made a patch for further discussion about it:
https://gerrit.fd.io/r/17037

Test log shown below:

DBGvpp# show version
vpp v19.04-rc0~24-g0702554 built by root on ubuntu89 at Sat Jan 19 22:13:50 EST 
2019
DBGvpp#
DBGvpp# exec ipsec
loop0
DBGvpp#
DBGvpp# pcap dispatch trace on max 1000 file vpp.pcap buffer-trace pg-input 10
Buffer tracing of 10 pkts from pg-input enabled...
pcap dispatch capture on...
DBGvpp#
DBGvpp#
DBGvpp# packet-generator enable-stream ipsec0
DBGvpp#
DBGvpp# pcap dispatch trace off
captured 14 pkts...
saved to /tmp/vpp.pcap...
DBGvpp#
DBGvpp# show trace
--- Start of thread 0 kw_main ---
Packet 1

00:00:53:959410: pg-input
  stream ipsec0, 100 bytes, 0 sw_if_index
  current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
  UDP: 192.168.2.255 -> 1.2.3.4
tos 0x00, ttl 64, length 28, checksum 0xb324
fragment id 0x
  UDP: 4321 -> 1234
length 80, checksum 0x30d9
00:00:53:959426: ip4-input
  UDP: 192.168.2.255 -> 1.2.3.4
tos 0x00, ttl 64, length 28, checksum 0xb324
fragment id 0x
  UDP: 4321 -> 1234
length 80, checksum 0x30d9
00:00:53:959519: ip4-lookup
  fib 0 dpo-idx 2 flow hash: 0x
  UDP: 192.168.2.255 -> 1.2.3.4
tos 0x00, ttl 64, length 28, checksum 0xb324
fragment id 0x
  UDP: 4321 -> 1234
length 80, checksum 0x30d9
00:00:53:959598: ip4-rewrite
  tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 flow hash: 
0x
  : 451c3f11b424c0a802ff0102030410e104d2005030d900010203
  0020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
00:00:53:959687: ipsec0-output
  ipsec0
  : 451c3f11b424c0a802ff0102030410e104d2005030d900010203
  0020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223
  0040: 2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243
  0060: 44454647
00:00:53:959802: ipsec0-tx
  IPSec: spi 1 seq 1
00:00:53:959934: esp4-encrypt
  esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96
00:00:53:960084: ip4-lookup
  fib 0 dpo-idx 0 flow hash: 0x
  IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
tos 0x00, ttl 254, length 168, checksum 0x96ea
fragment id 0x
00:00:53:960209: ip4-glean
IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
  tos 0x00, ttl 254, length 168, checksum 0x96ea
  fragment id 0x
00:00:53:960336: loop0-output
  loop0
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:960491: error-drop
  ip4-glean: ARP requests sent
00:00:53:960780: ethernet-input
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
00:00:53:960927: arp-input
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:961126: error-drop
  arp-input: IP4 source address matches local interface


From: Dave Barach (dbarach) 
Sent: Wednesday, January 23, 2019 11:33 PM
To: Kingwel Xie ; vpp-dev 
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Please write up the issue and share the config and pg input script as I asked. 
You might find that the issue disappears pretty rapidly, with no further action 
on your part... (😉)...

The basic graph engine is not a place to start hacking based on “I think I get 
it...”

Thanks... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Wednesday, January 23, 2019 10:18 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

Thanks. I think I get it. By maintaining the ownership, VPP is able to enqueue
all buffers destined for the same target node into the owner's next frame at
one time. This avoids dispatching the node function multiple times.

The bug is still there. I will create a patch later for further discussion.

And maybe there is some room for improvement: considering we have two input nodes
which will both add elements to a third node, we will see the ownership of
this node being switched on a per-frame basis.

- Kingwel

 Original Message 
Subject: RE: Question about vlib_next_frame_change_ownership
From: "Dave Barach (dbarach)" mailto:dbar...@cisco.com>>
Sent: January 23, 2019 8:49 PM
Cc: Kingwel Xi

Re: [vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-23 Thread Kingwel Xie
Thanks. I think I get it. By maintaining the ownership, VPP is able to enqueue
all buffers destined for the same target node into the owner's next frame at
one time. This avoids dispatching the node function multiple times.

The bug is still there. I will create a patch later for further discussion.

And maybe there is some room for improvement: considering we have two input nodes
which will both add elements to a third node, we will see the ownership of
this node being switched on a per-frame basis.

- Kingwel


 Original Message 
Subject: RE: Question about vlib_next_frame_change_ownership
From: "Dave Barach (dbarach)" 
Sent: January 23, 2019 8:49 PM
Cc: Kingwel Xie, vpp-dev

As you've probably noticed, the buffer manager has been under active 
development. That may or may not have anything to do with the issue.



Please follow the bug reporting process: 
https://wiki.fd.io/view/VPP/BugReports. In this case, using master/latest, 
please create a Jira ticket including the exact configuration, packet generator 
input script, and a dispatch pcap trace:



  *   "pcap dispatch trace on file dtrace max 1 buffer-trace pg-input 1000",
  *   start the pg stream
  *   "pcap dispatch trace off".
  *   Results in /tmp/dtrace.



I'm not going to speculate on what's going on at this point. Please write up 
the issue so we can look at it.



For a decent explanation of the frame ownership scheme, take a look at 
https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/developers/vlib.html 
under "Complications".



HTH... Dave



-Original Message-
From: Kingwel Xie 
Sent: Wednesday, January 23, 2019 2:16 AM
To: Dave Barach (dbarach) ; vpp-dev 
Subject: Question about vlib_next_frame_change_ownership



Hi Dave and all,



I'm looking at a buffer trace issue with DPDK IPSEC. It turns out the flag 
VLIB_FRAME_TRACE is broken in vlib_next_frame_change_ownership().



The node path in my setup is: pg-input -> ip-input -> ip-lookup -> ... ->
dpdk-esp-encrypt -> cryptodev -> crypto-input -> ip-lookup -> ...



As you can see, the ip-lookup node has the owner node ip-input in the
beginning, then the owner is changed to crypto-input shortly after. This change
causes us to swap the current next_frame with the owner's in
vlib_next_frame_change_ownership(). As a result, the VLIB_FRAME_TRACE flag in
next_frame->flags will be overwritten.



The fix could be very simple, but I'm wondering why we have to change the
ownership of the next_frame. Actually, I can observe the ownership being changed
back and forth between ip-input and crypto-input for every frame, which leads
to performance degradation. However, it looks fine to me even if we don't care
about the ownership: in that case, ip-lookup will be dispatched by either
ip-input or crypto-input, with different next_frames. I guess I must have missed
something; I'd appreciate it if you could elaborate.



Regards,

Kingwel


[vpp-dev] Question about vlib_next_frame_change_ownership

2019-01-22 Thread Kingwel Xie
Hi Dave and all,

I'm looking at a buffer trace issue with DPDK IPSEC. It turns out the flag 
VLIB_FRAME_TRACE is broken in vlib_next_frame_change_ownership().

The node path in my setup is: pg-input -> ip-input -> ip-lookup -> ... ->
dpdk-esp-encrypt -> cryptodev -> crypto-input -> ip-lookup -> ...

As you can see, the ip-lookup node has the owner node ip-input in the
beginning, then the owner is changed to crypto-input shortly after. This change
causes us to swap the current next_frame with the owner's in
vlib_next_frame_change_ownership(). As a result, the VLIB_FRAME_TRACE flag in
next_frame->flags will be overwritten.

The fix could be very simple, but I'm wondering why we have to change the
ownership of the next_frame. Actually, I can observe the ownership being changed
back and forth between ip-input and crypto-input for every frame, which leads
to performance degradation. However, it looks fine to me even if we don't care
about the ownership: in that case, ip-lookup will be dispatched by either
ip-input or crypto-input, with different next_frames. I guess I must have missed
something; I'd appreciate it if you could elaborate.

Regards,
Kingwel


Re: [vpp-dev] About lock-free operation in policer

2019-01-14 Thread Kingwel Xie
Simply because most CLI commands and API handlers are marked as not is_mp_safe;
hence they are actually protected by vlib_worker_thread_barrier_sync.

So to speak, node functions are temporarily stopped and waiting while a CLI/API
call is being executed.

You should take a look at vlib_worker_thread_barrier_sync /
vlib_worker_thread_barrier_release.
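
A minimal sketch of what that protection amounts to around a non-mp-safe handler (simplified; the barrier calls are made by the CLI/API dispatch code, not by each handler):

  /* workers stop polling and park at the barrier */
  vlib_worker_thread_barrier_sync (vm);

  /* safe to mutate shared state here, e.g. the per-interface table from the
     policer code quoted below */
  pcm->classify_table_index_by_sw_if_index[ti][sw_if_index] = pct[ti];

  /* workers resume packet processing */
  vlib_worker_thread_barrier_release (vm);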

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of blankspot
Sent: Tuesday, January 15, 2019 3:00 PM
To: vpp-dev 
Subject: [vpp-dev] About lock-free operation in policer

hi all,

I have a question about policer classify.
The code that binds/unbinds a policer to an interface is:

  if (is_add)
pcm->classify_table_index_by_sw_if_index[ti][sw_if_index] = pct[ti];
  else
pcm->classify_table_index_by_sw_if_index[ti][sw_if_index] = ~0;

and the code checking the value in "ip4-policer-classify" node is:

table_index0 =
pcm->classify_table_index_by_sw_if_index[tid][sw_if_index0];

There is no lock protecting the table index.

The policer works fine in multi-threaded VPP, but I don't know why it is safe
without a lock.
Can anyone help?

Thanks.

yonggong


Re: [vpp-dev] the way of using sub-interface

2019-01-13 Thread Kingwel Xie
Just like the regular interface you are using:

create sub-interfaces eth0 181
set interface state eth0.181 up
set interface ip address eth0.181 18.1.0.141/24
ip route add 0.0.0.0/0 via 18.1.0.1 eth0.181

Note there is some performance penalty when using VLAN sub-interfaces. As I can
see on my setup, > 10% in the IPv4 forwarding case. Therefore, we should avoid
VLAN sub-interfaces if possible. This might be improved by Damjan's patch
https://gerrit.fd.io/r/#/c/16173/. I haven't tested it, but I know he keeps
improving it.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of 
wangchuan...@163.com
Sent: Monday, January 14, 2019 11:28 AM
To: vpp-dev 
Subject: [vpp-dev] the way of using sub-interface

Hi all,
How can I use a sub-interface after creating it?
#create sub-interfaces GigabitEthernet2/0/0 11
Does it mean the VLAN ID?


I want GigabitEthernet2/0/0 && GigabitEthernet2/0/0 to be in the same VLAN
(ID: 11).
How should I do this?
Thanks.


wangchuan...@163.com


Re: [vpp-dev] ethernet-input on master branch

2019-01-07 Thread Kingwel Xie
Many thanks for sharing. Yes, as you pointed out, it might not be worthwhile
to do offload for VLAN.

Regards,
Kingwel

From: Damjan Marion 
Sent: Monday, January 07, 2019 9:18 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] ethernet-input on master branch




On 7 Jan 2019, at 12:38, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Thanks, Damjan, that is very clear. I checked the device-input and
ethernet-input code, and yes, I totally understand your point.

Two more questions: I can see you have spent some time on the avf plugin; do you
think it will eventually become mainstream and replace the DPDK drivers some day?

I don't think we want to be in the device driver business, unless device vendors
pick it up, but on the other side it is good to have one native implementation,
and AVF is a good candidate due to compatibility with future Intel cards and a
quite simple communication channel with the PF driver.

DPDK is suboptimal for our use case, and with a native implementation it is easy
to show that...


For vlan tagging, would you like to consider using the HW rx and tx offload to 
optimize the vlan sub-interface?

The question here is: is it really cheaper to parse DPDK rte_mbuf metadata to
extract the VLAN, or to simply parse the ethernet header, as we need to parse the
ethernet header anyway?

Currently the code is optimised for the untagged case, but we simply store a u64
of data which follows the ethertype. It is just one load + store per packet,
but it allows us to optionally do VLAN processing if we detect a dot1q or dot1ad
frame.
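
A hedged sketch of that idea in plain C (standard ethertype constants, not VPP symbol names), just to show the single compare on the bytes that follow the destination and source MACs:

  /* 0x8100 = 802.1Q (dot1q), 0x88a8 = 802.1ad (dot1ad / QinQ outer tag) */
  static inline int
  example_frame_is_tagged (const unsigned char *eth)
  {
    unsigned ethertype = ((unsigned) eth[12] << 8) | eth[13];
    return ethertype == 0x8100 || ethertype == 0x88a8;
  }

  /* untagged frames stay on the fast path; tagged frames take the optional
     vlan processing described above */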

On the tx side, it is also questionable, especially for the L3 path, as we need to
apply the rewrite string anyway, so the only difference is whether we memcpy 14,
18 or 22 bytes...

--
Damjan



Re: [vpp-dev] ethernet-input on master branch

2019-01-07 Thread Kingwel Xie
Thanks, Damjan, that is very clear. I checked the device-input and
ethernet-input code, and yes, I totally understand your point.

Two more questions: I can see you have spent some time on the avf plugin; do you
think it will eventually become mainstream and replace the DPDK drivers some day?

For vlan tagging, would you like to consider using the HW rx and tx offload to 
optimize the vlan sub-interface?

Regards,
Kingwel


 Original Message 
Subject: Re: [vpp-dev] ethernet-input on master branch
From: "Damjan Marion via Lists.Fd.Io" 
Sent: January 7, 2019 5:23 PM
Cc: Kingwel Xie


On 5 Jan 2019, at 04:55, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

I noticed you removed the quick path from dpdk-input to ip-input/mpls-input,
after you merged the patch with the ethernet-input optimization. Therefore, all
packets now have to go through ethernet-input. It would take a few more CPU
clocks than before.

Please elaborate on why you made this change.

Dear Kingwei,

The old bypass code, beside the fact that it was doing the ethertype lookup in the
device driver code, which is architecturally wrong, was broken for some corner
cases (i.e. it was not doing the dMAC check when the interface is in promisc mode,
or when the interface doesn't do the dMAC check at all).
Also, the bypass code was not dealing properly with VLAN 0 packets.

Keeping things like that means that we would need to maintain separate ethertype
lookup code in each VPP interface type (i.e. memif, vhost, avf).

With that patch the ethertype lookup was moved to its one natural place, which is
the ethernet-input node, and as you noticed
there is a small cost to doing that (1-2 clocks in my setup).

So with this patch there is a small perf hit for L3 untagged traffic, but it also
brings a ~10 clock improvement for L2 traffic. It also
improves L3 performance for memif and vhost-user interfaces.

In addition, there is another patch on top of this one which improves tagged
packet handling and reduces the cost of the VLAN single/double lookup from 70
clocks to less than 30.

Hope this explains,

--
Damjan



[vpp-dev] ethernet-input on master branch

2019-01-04 Thread Kingwel Xie
Hi Damjan,

I noticed you removed the quick path from dpdk-input to ip-input/mpls-input,
after you merged the patch with the ethernet-input optimization. Therefore, all
packets now have to go through ethernet-input. It would take a few more CPU
clocks than before.

Please elaborate on why you made this change.

Regards,
Kingwel


Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of socket-mem

2018-12-30 Thread Kingwel Xie
Hi Dave and Damjan,

Yes, as I confirmed before, the patch works, but I just discovered a side
effect introduced by this patch.

That is, DPDK in-memory mode implies no shared config, so
rte_eal_init/rte_eal_config_create will not check whether another DPDK process is
running. Therefore, we could run multiple VPP processes.

I think this is not the expected behavior.

Regards,
Kingwel


 Original Message 
Subject: Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of socket-mem
From: "Dave Barach via Lists.Fd.Io" 
Sent: December 23, 2018 12:37 AM
Cc: Kingwel Xie, dmar...@me.com, Matthew Smith

Patch is in master/latest as of a couple of minutes ago... D.

From: vpp-dev@lists.fd.io  On Behalf Of Kingwel Xie
Sent: Friday, December 21, 2018 9:59 PM
To: Kingwel Xie ; dmar...@me.com; Matthew Smith 

Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of 
socket-mem

Hi,

After checking the doc and code and making some tests, I have confirmed that patch
16543 made by Damjan is working well, for both the 2MB and 1GB cases. DPDK itself
can grow or shrink based on request; the VPP master code base doesn't allow
this simply because VPP unmounts & rmdirs the hugepage mount point.

Patch 16543 corrects this behavior and makes life easier.

Hi Matthew,

You must be running the master code without the patch, and you probably only have
2MB hugepages, so you see the allocation error when you run out of
socket-mem. Please switch to patch 16543, or specify how much you need with the
socket-mem option.

Hi Damjan,

Thanks for the heads-up.

Regards,
Kingwel


From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Friday, December 21, 2018 11:04 AM
To: dmar...@me.com<mailto:dmar...@me.com>; Matthew Smith 
mailto:mgsm...@netgate.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Non-compliant mail - action required, contact 
dm...@ericsson.se<mailto:dm...@ericsson.se> : Re: [vpp-dev] dpdk: switch to 
in-memory mode, deprecate use of socket-mem

Hi Matthew,

The patch (https://gerrit.fd.io/r/#/c/16287/) was intended to allocate the crypto
mempool from DPDK instead of from VPP. I guess you are using 2MB huge pages,
so you are experiencing out-of-memory with the new patch created by Damjan. Please
switch to 1GB to see if it still happens.

Hi Damjan,

As I understand, DPDK can dynamically allocate memory segments when the huge page
size is 1GB and there is still free space. In this case socket-mem is always valid.
Therefore, please do not merge the patch (https://gerrit.fd.io/r/#/c/16543/);
it won't work for the 2MB case.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion 
via Lists.Fd.Io
Sent: Friday, December 21, 2018 2:02 AM
To: Matthew Smith mailto:mgsm...@netgate.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of 
socket-mem



On 20 Dec 2018, at 18:46, Matthew Smith 
mailto:mgsm...@netgate.com>> wrote:


Hi Damjan,

There is a comment that says  "preallocate at least 16MB of hugepages per 
socket, if more is needed it is up to consumer to preallocate more". What does 
a consumer need to do in order to preallocate more?

This is just equivalent of "sysctl -w vm.nr_hugepages=X", if you set more with 
sysctl or kernel boot parameters, vpp will not try to change that.
We can make this configurable from startup.conf, but in reality it is just 
convenience feature.


I've recently had problems using AES-GCM with IPsec on a test system. Mempool 
allocation for the DPDK aesni_gcm crypto vdev fails during initialization. An 
error message like this is logged:

vnet[4765]: crypto_create_session_drv_pool: failed to create session drv mempool

I found that the DPDK memory allocation set with socket-mem is 64 MB by default 
and this is being exhausted by the various mempool allocations. This behavior 
seems to have started with change https://gerrit.fd.io/r/#/c/16287/ 
("dpdk-ipsec-mempool: allocate from dpdk mem specified by socket-mem in 
startup.conf"). When I increase socket-mem in startup.conf to 128 MB, the 
configured crypto devices successfully initialize.

you can see what exactly dpdk allocated with "show dpdk physmem".


When support for socket-mem goes away, what will need to be done to ensure that 
enough memory is available for crypto devices?

DPDK is supposed to dynamically alloc more memory if needed, Unless I missed 
something...



Thanks,
-Matt



On Thu, Dec 20, 2018 at 3:55 AM Damjan Marion via 
Lists.Fd.Io<http://lists.fd.io/> 
mailto:me@lists.fd.io>> wrote:

Regarding:

https://gerrit.fd.io/r/#/c/16543/

This patch switches dpdk to new in-memory mode, and reduces dpdk memory 
footprint
as pages are allocated dynamically on-demand.

Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of socket-mem

2018-12-21 Thread Kingwel Xie
Hi,

After checking the doc and code and making some tests, I have confirmed that patch
16543 made by Damjan is working well, for both the 2MB and 1GB cases. DPDK itself
can grow or shrink based on request; the VPP master code base doesn't allow
this simply because VPP unmounts & rmdirs the hugepage mount point.

Patch 16543 corrects this behavior and makes life easier.

Hi Matthew,

You must be running the master code without the patch, and you probably only have
2MB hugepages, so you see the allocation error when you run out of
socket-mem. Please switch to patch 16543, or specify how much you need with the
socket-mem option.

Hi Damjan,

Thanks for the heads-up.

Regards,
Kingwel


From: vpp-dev@lists.fd.io  On Behalf Of Kingwel Xie
Sent: Friday, December 21, 2018 11:04 AM
To: dmar...@me.com; Matthew Smith 
Cc: vpp-dev@lists.fd.io
Subject: Non-compliant mail - action required, contact dm...@ericsson.se : Re: 
[vpp-dev] dpdk: switch to in-memory mode, deprecate use of socket-mem

Hi Matthew,

The patch (https://gerrit.fd.io/r/#/c/16287/) was intended to allocate the crypto
mempool from DPDK instead of from VPP. I guess you are using 2MB huge pages,
so you are experiencing out-of-memory with the new patch created by Damjan. Please
switch to 1GB to see if it still happens.

Hi Damjan,

As I understand, DPDK can dynamically allocate memory segments when the huge page
size is 1GB and there is still free space. In this case socket-mem is always valid.
Therefore, please do not merge the patch (https://gerrit.fd.io/r/#/c/16543/);
it won't work for the 2MB case.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion 
via Lists.Fd.Io
Sent: Friday, December 21, 2018 2:02 AM
To: Matthew Smith mailto:mgsm...@netgate.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of 
socket-mem



On 20 Dec 2018, at 18:46, Matthew Smith 
mailto:mgsm...@netgate.com>> wrote:


Hi Damjan,

There is a comment that says  "preallocate at least 16MB of hugepages per 
socket, if more is needed it is up to consumer to preallocate more". What does 
a consumer need to do in order to preallocate more?

This is just equivalent of "sysctl -w vm.nr_hugepages=X", if you set more with 
sysctl or kernel boot parameters, vpp will not try to change that.
We can make this configurable from startup.conf, but in reality it is just 
convenience feature.


I've recently had problems using AES-GCM with IPsec on a test system. Mempool 
allocation for the DPDK aesni_gcm crypto vdev fails during initialization. An 
error message like this is logged:

vnet[4765]: crypto_create_session_drv_pool: failed to create session drv mempool

I found that the DPDK memory allocation set with socket-mem is 64 MB by default 
and this is being exhausted by the various mempool allocations. This behavior 
seems to have started with change https://gerrit.fd.io/r/#/c/16287/ 
("dpdk-ipsec-mempool: allocate from dpdk mem specified by socket-mem in 
startup.conf"). When I increase socket-mem in startup.conf to 128 MB, the 
configured crypto devices successfully initialize.

you can see what exactly dpdk allocated with "show dpdk physmem".


When support for socket-mem goes away, what will need to be done to ensure that 
enough memory is available for crypto devices?

DPDK is supposed to dynamically allocate more memory if needed, unless I missed 
something...



Thanks,
-Matt



On Thu, Dec 20, 2018 at 3:55 AM Damjan Marion via 
Lists.Fd.Io<http://lists.fd.io/> 
mailto:me@lists.fd.io>> wrote:

Regarding:

https://gerrit.fd.io/r/#/c/16543/

This patch switches dpdk to new in-memory mode, and reduces dpdk memory 
footprint
as pages are allocated dynamically on-demand.

I tested on both Ubuntu and Centos 7.5 and everything looks good but will 
appreciate
feedback from people using non-standard configs before it is merged...

Thanks,

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11724): https://lists.fd.io/g/vpp-dev/message/11724
Mute This Topic: https://lists.fd.io/mt/28809973/675725
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev%2bow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[mgsm...@netgate.com<mailto:mgsm...@netgate.com>]
-=-=-=-=-=-=-=-=-=-=-=-

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11757): https://lists.fd.io/g/vpp-dev/message/11757
Mute This Topic: https://lists.fd.io/mt/28809973/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of socket-mem

2018-12-20 Thread Kingwel Xie
Hi Matthew,

The patch (https://gerrit.fd.io/r/#/c/16287/) was intended to allocate crypto 
mem pool from DPDK, instead of from vPP. I guess you are using 2MB huge page, 
so you are experiencing out of memory with new patch created by Damjan. Please 
switch to 1GB, to see if it still happens.

Hi Damjan,

As I understand, DPDK could dynamically allocate mem segment when huge page is 
1GB and there is still free space. In this case the socket-mem is valid always. 
Therefore, please do not merge the patch (https://gerrit.fd.io/r/#/c/16543/), 
it won't work for 2MB case.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Friday, December 21, 2018 2:02 AM
To: Matthew Smith 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] dpdk: switch to in-memory mode, deprecate use of 
socket-mem




On 20 Dec 2018, at 18:46, Matthew Smith 
mailto:mgsm...@netgate.com>> wrote:


Hi Damjan,

There is a comment that says  "preallocate at least 16MB of hugepages per 
socket, if more is needed it is up to consumer to preallocate more". What does 
a consumer need to do in order to preallocate more?

This is just equivalent of "sysctl -w vm.nr_hugepages=X", if you set more with 
sysctl or kernel boot parameters, vpp will not try to change that.
We can make this configurable from startup.conf, but in reality it is just a 
convenience feature.



I've recently had problems using AES-GCM with IPsec on a test system. Mempool 
allocation for the DPDK aesni_gcm crypto vdev fails during initialization. An 
error message like this is logged:

vnet[4765]: crypto_create_session_drv_pool: failed to create session drv mempool

I found that the DPDK memory allocation set with socket-mem is 64 MB by default 
and this is being exhausted by the various mempool allocations. This behavior 
seems to have started with change https://gerrit.fd.io/r/#/c/16287/ 
("dpdk-ipsec-mempool: allocate from dpdk mem specified by socket-mem in 
startup.conf"). When I increase socket-mem in startup.conf to 128 MB, the 
configured crypto devices successfully initialize.

you can see what exactly dpdk allocated with "show dpdk physmem".



When support for socket-mem goes away, what will need to be done to ensure that 
enough memory is available for crypto devices?

DPDK is supposed to dynamically allocate more memory if needed, unless I missed 
something...



Thanks,
-Matt



On Thu, Dec 20, 2018 at 3:55 AM Damjan Marion via 
Lists.Fd.Io 
mailto:me@lists.fd.io>> wrote:

Regarding:

https://gerrit.fd.io/r/#/c/16543/

This patch switches dpdk to new in-memory mode, and reduces dpdk memory 
footprint
as pages are allocated dynamically on-demand.

I tested on both Ubuntu and Centos 7.5 and everything looks good but will 
appreciate
feedback from people using non-standard configs before it is merged...

Thanks,

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11724): https://lists.fd.io/g/vpp-dev/message/11724
Mute This Topic: https://lists.fd.io/mt/28809973/675725
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[mgsm...@netgate.com]
-=-=-=-=-=-=-=-=-=-=-=-

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11743): https://lists.fd.io/g/vpp-dev/message/11743
Mute This Topic: https://lists.fd.io/mt/28809973/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] vPP handoff discussion

2018-12-18 Thread Kingwel Xie
Unfortunately no. However, we did suffer from the packet drop issue due to the queue 
being full, even when system load was not heavy. In the end we discovered that some 
vectors in the queue had only very few buffers in them, and if we increased the queue 
size, the drop rate went down.

We have a dpdk ring based mechanism which will hand off gtpu packets from 
ip-gtpu-bypass to gtpu-input; there are two new nodes created for this purpose: 
handoff-node and handoff-input node. A very preliminary measurement shows these 
two nodes take around < 30 clocks in total when handoff only happens within one 
worker thread. Next, we’ll try to measure the overhead of handoff between 
workers. I’m expecting we’ll have significant performance loss due to cache 
misses. Anyway, the good side is, the code is much easier to understand and 
maintain.
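
A minimal sketch of such a ring based handoff, assuming DPDK's rte_ring API; the
ring name, size and the handoff_* helper names are illustrative, not the actual
plugin code:

#include <rte_ring.h>
#include <rte_errno.h>

/* one ring per consumer worker; single-producer/single-consumer here,
   MP/SC could be selected with the flags below */
static struct rte_ring *handoff_ring;

static int
handoff_ring_init (unsigned socket_id)
{
  handoff_ring = rte_ring_create ("gtpu-handoff-ring", 4096, socket_id,
                                  RING_F_SP_ENQ | RING_F_SC_DEQ);
  return handoff_ring ? 0 : -rte_errno;
}

/* producer side, e.g. called from an ip-gtpu-bypass style node */
static unsigned
handoff_enqueue (void **pkts, unsigned n)
{
  return rte_ring_enqueue_burst (handoff_ring, pkts, n, NULL);
}

/* consumer side, e.g. polled from a handoff-input style node */
static unsigned
handoff_dequeue (void **pkts, unsigned n_max)
{
  return rte_ring_dequeue_burst (handoff_ring, pkts, n_max, NULL);
}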


See below for ‘show run’ output:

VirtualFunctionEthernet81/10/7   active    41054   10509824   0   1.37e1   256.00
VirtualFunctionEthernet81/10/7   active    41054   10509824   0   7.15e1   256.00
dpdk-input                       polling   41054          0   0   1.83e2     0.00
ip4-input                        active    41054   10509824   0   3.61e1   256.00
ip4-lookup                       active    41054   10509824   0   2.47e1   256.00
ip4-ppf-gtpu-bypass              active    41054   10509824   0   2.93e1   256.00
ip4-rewrite                      active    41054   10509824   0   2.50e1   256.00
pg-input                         polling   41054   10509824   0   7.65e1   256.00
ppf-gtpu4-encap                  active    41054   10509824   0   2.91e1   256.00
ppf-gtpu4-input                  active    41054   10509824   0   2.79e1   256.00
ppf-handoff-input                polling   41054   10509824   0   1.39e1   256.00
ppf-handoff                      active    41054   10509824   0   1.17e1   256.00
ppf-pdcp-encap                   active    41054   10509824   0   2.69e1   256.00
ppf-pdcp-encrypt                 active    41054   10509824   0   1.69e1   256.00
ppf-sb-path-lb                   active    41054   10509824   0   1.19e1   256.00
ppf-sdap-encap                   active    41054   10509824   0   2.64e1   256.00



From: Damjan Marion 
Sent: Tuesday, December 18, 2018 5:18 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: vPP handoff discussion


Possibly, do you have any numbers to support your statement?

--
Damjan


On 18 Dec 2018, at 10:14, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

My fault that I should have made it clear. What I want to say is that I wonder 
if the existing handoff mechanism needs some improvement. Using a ring seems to 
be simpler, and better from performance perspective. Even more, I think it 
could help with the packet drop issue due to bigger and more flexible ring size.

Sorry I changed the subject, it doesn’t strictly follow the original one any 
more.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion 
via Lists.Fd.Io
Sent: Tuesday, December 18, 2018 3:12 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:


Dear Kingwei,

I don't think VPP handoff is the right solution for this problem. It can be solved 
in a much simpler way.
We can simply have a ring per worker thread where new packets pending 
encryption/decryption are enqueued.
Then we can have an input node which runs on all threads and polls those rings. 
When there is a new packet on the ring,
that input node simply uses an atomic compare-and-swap to declare that it is 
responsible for enc/dec of the specific packet.
When encryption is completed, the owning thread enqueues packets to the next node 
in preserved packet order...
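
A minimal sketch of the claim step described above, using a GCC atomic
compare-and-swap builtin; the slot layout and field names are illustrative only:

#include <stdint.h>

typedef struct
{
  uint32_t buffer_index;
  volatile int32_t owner;   /* -1 = unclaimed, otherwise worker thread index */
} crypto_slot_t;

static inline int
try_claim_slot (crypto_slot_t *s, int32_t my_thread_index)
{
  int32_t expected = -1;
  /* returns 1 if this worker won the slot, 0 if another worker already did */
  return __atomic_compare_exchange_n (&s->owner, &expected, my_thread_index,
                                      0 /* strong */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}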

Does this make sense?

--
Damjan



On 18 Dec 2018, at 03:22, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

Yes, agree with you.

Here I got a thought about handoff mechanism in vPP. If looking into the DPDK 
crypto scheduler, you will find out that it heavily depends on DPDK rings, for 
buffer delivery among CPU cores and even for the packet reordering. Therefore, 
something comes to my mind, why can’t we use a ring for handoff?

First, a

Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:

2018-12-18 Thread Kingwel Xie
Hi Avinash,

My question is about the MP/MC ring flag in the patch you made to DPDK; any 
comments?

I’d like them to be MP/SC, as we always have only one consumer.
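
For illustration, a hedged sketch of creating such a ring as MP/SC with the plain
DPDK API (the ring name and size are made up):

#include <rte_ring.h>

static struct rte_ring *
create_mp_sc_ring (int socket_id)
{
  /* default enqueue mode is multi-producer; passing only RING_F_SC_DEQ
     gives multi-producer enqueue with single-consumer dequeue */
  return rte_ring_create ("crypto-sched-ring", 1024, socket_id, RING_F_SC_DEQ);
}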

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Gonsalves, Avinash 
(Nokia - IN/Bangalore)
Sent: Tuesday, December 18, 2018 3:33 PM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:


Hi Damjan, Kingwel,

Most of the changes in the dpdk plugin is generic and works for other crypto 
devices as well, so can we have this patch integrated while the design with VPP 
Native support is being discussed?

The DPDK patch has been forwarded separately to the DPDK community.

Please find scaling numbers for a sample IMIX profile:

[inline image: IMIX scaling numbers]




-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11664): https://lists.fd.io/g/vpp-dev/message/11664
Mute This Topic: https://lists.fd.io/mt/28779969/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] vPP handoff discussion

2018-12-18 Thread Kingwel Xie
Hi Damjan,

My fault that I should have made it clear. What I want to say is that I wonder 
if the existing handoff mechanism needs some improvement. Using a ring seems to 
be simpler, and better from performance perspective. Even more, I think it 
could help with the packet drop issue due to bigger and more flexible ring size.

Sorry I changed the subject, it doesn’t strictly follow the original one any 
more.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Tuesday, December 18, 2018 3:12 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:


Dear Kingwei,

I don't think VPP handoff is the right solution for this problem. It can be solved 
in a much simpler way.
We can simply have a ring per worker thread where new packets pending 
encryption/decryption are enqueued.
Then we can have an input node which runs on all threads and polls those rings. 
When there is a new packet on the ring,
that input node simply uses an atomic compare-and-swap to declare that it is 
responsible for enc/dec of the specific packet.
When encryption is completed, the owning thread enqueues packets to the next node 
in preserved packet order...

Does this make sense?

--
Damjan


On 18 Dec 2018, at 03:22, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

Yes, agree with you.

Here I got a thought about handoff mechanism in vPP. If looking into the DPDK 
crypto scheduler, you will find out that it heavily depends on DPDK rings, for 
buffer delivery among CPU cores and even for the packet reordering. Therefore, 
something comes to my mind, why can’t we use a ring for handoff?

First, as you know, the existing handoff is somewhat limited – the queue size 
is 32 by default, very little, and each queue item is a vector with up to 256 
buffer indices, but each vector might only have very few buffers when system 
load is not so high. It is not efficient as far as I can see, and the system 
might drop packets due to the queue being full.

Second, I think the technique used in vlib_get_frame_queue_elt might be slower 
or less efficient than compare-and-swap in a dpdk ring.

Even more, this 2-dimensional data structure also brings up complexity when it 
comes to coding. E.g., handoff-dispatch needs to consolidate buffers into a 
size-128 vector.

In general, I’d believe a ring-like mechanism probably makes handoff easier. I 
understand the ring requires a compare-and-swap instruction which definitely 
introduces a performance penalty, but on the other hand, handoff itself always 
introduces massive data cache misses, even worse than compare-and-swap. However, 
handoff is always worthwhile in some cases even if there is a penalty.

Appreciate you can share your opinion.

Regards,
Kingwel





From: Damjan Marion mailto:dmar...@me.com>>
Sent: Tuesday, December 18, 2018 1:03 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: Gonsalves, Avinash (Nokia - IN/Bangalore) 
mailto:avinash.gonsal...@nokia.com>>; 
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:


Hi Kingwei,

I agree this is useful feature, that's why i believe it should be implemented 
as native code instead of relying on external implementation which is from our 
perspective sub-optimal
due to dpdk dependency, time spent on buffer metadata conversion, etc..

--
Damjan



On 17 Dec 2018, at 15:19, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Avinash,

I happened to look at the patch recently. To my understanding, it is valuable, 
cool stuff, as it allows offloading crypto to other cpu cores. Therefore, more 
throughput can be achieved. A question: you patched the dpdk ring to MP and 
MC, why not MP and SC?

Hi Damjan,

I guess the native ipsec mb plugin does not support offloading? Or maybe we can 
do a handoff, but anyhow we cannot hand off one ipsec session to multiple cores. 
Am I right?

Regards,
Kingwel

 Original Message 
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:
From: "Damjan Marion via Lists.Fd.Io" 
mailto:dmarion=me@lists.fd.io>>
Sent: December 17, 2018, 4:45 PM
Cc: "Gonsalves, Avinash (Nokia - IN/Bangalore)" 
mailto:avinash.gonsal...@nokia.com>>

Dear Avinash,

First, please use public mailing list for such requests, instead of unicasting 
people.

Regarding your patch, I don't feel comfortable to code review it, as I'm not 
familiar with dpdk crypto scheduler.

Personally, I believe such things should be implemented as native VPP code 
instead. We are already in process of moving from
DPDK AES-NI into native code (still dependant on ipsec MB lib) so this stuff 
will not be much usable in this form anyway.

But this is just my opinion, will leave it to others...

--
Damjan



On 13 Dec 2018, at 05:52, Gonsalves, Avinash (Nokia - IN/Bangalore) 
mailto:avinash.gonsal...@nokia.com>> wrote:

Hi Dave, Damjan,

This wa

Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:

2018-12-17 Thread Kingwel Xie
Hi Damjan,

Yes, agree with you.

Here I got a thought about handoff mechanism in vPP. If looking into the DPDK 
crypto scheduler, you will find out that it heavily depends on DPDK rings, for 
buffer delivery among CPU cores and even for the packet reordering. Therefore, 
something comes to my mind, why can’t we use a ring for handoff?

First, as you know, the existing handoff is somewhat limited – the queue size 
is 32 by default, very little, and each queue item is a vector with up to 256 
buffer indices, but each vector might only have very few buffers when system 
load is not so high. It is not efficient as far as I can see, and the system 
might drop packets due to the queue being full.

Second, I think the technique used in vlib_get_frame_queue_elt might be slower 
or less efficient than compare-and-swap in a dpdk ring.

Even more, this 2-dimensional data structure also brings up complexity when it 
comes to coding. E.g., handoff-dispatch needs to consolidate buffers into a 
size-128 vector.

In general, I’d believe a ring-like mechanism probably makes handoff easier. I 
understand the ring requires a compare-and-swap instruction which definitely 
introduces a performance penalty, but on the other hand, handoff itself always 
introduces massive data cache misses, even worse than compare-and-swap. However, 
handoff is always worthwhile in some cases even if there is a penalty.

Appreciate you can share your opinion.

Regards,
Kingwel





From: Damjan Marion 
Sent: Tuesday, December 18, 2018 1:03 AM
To: Kingwel Xie 
Cc: Gonsalves, Avinash (Nokia - IN/Bangalore) ; 
vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:


Hi Kingwei,

I agree this is useful feature, that's why i believe it should be implemented 
as native code instead of relying on external implementation which is from our 
perspective sub-optimal
due to dpdk dependency, time spent on buffer metadata conversion, etc..

--
Damjan


On 17 Dec 2018, at 15:19, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Avinash,

I happened to look at the patch recently. To my understanding, it is valuable, 
cool stuff, as it allows offloading crypto to other cpu cores. Therefore, more 
throughput can be achieved. A question: you patched the dpdk ring to MP and 
MC, why not MP and SC?

Hi Damjan,

I guess the native ipsec mb plugin does not support offloading? Or maybe we can 
do a handoff, but anyhow we cannot hand off one ipsec session to multiple cores. 
Am I right?

Regards,
Kingwel

 Original Message 
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:
From: "Damjan Marion via Lists.Fd.Io" 
mailto:dmarion=me@lists.fd.io>>
Sent: December 17, 2018, 4:45 PM
Cc: "Gonsalves, Avinash (Nokia - IN/Bangalore)" 
mailto:avinash.gonsal...@nokia.com>>

Dear Avinash,

First, please use public mailing list for such requests, instead of unicasting 
people.

Regarding your patch, I don't feel comfortable to code review it, as I'm not 
familiar with dpdk crypto scheduler.

Personally, I believe such things should be implemented as native VPP code 
instead. We are already in process of moving from
DPDK AES-NI into native code (still dependant on ipsec MB lib) so this stuff 
will not be much usable in this form anyway.

But this is just my opinion, will leave it to others...

--
Damjan


On 13 Dec 2018, at 05:52, Gonsalves, Avinash (Nokia - IN/Bangalore) 
mailto:avinash.gonsal...@nokia.com>> wrote:

Hi Dave, Damjan,

This was verified earlier, but didn’t get integrated. Could you please have a 
look?

Thanks,
Avinash

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11635): https://lists.fd.io/g/vpp-dev/message/11635
Mute This Topic: https://lists.fd.io/mt/28779969/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11640): https://lists.fd.io/g/vpp-dev/message/11640
Mute This Topic: https://lists.fd.io/mt/28779969/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:

2018-12-17 Thread Kingwel Xie
Hi Avinash,

I happened to look at the patch recently. To my understanding, it is valuable, 
cool stuff, as it allows offloading crypto to other cpu cores. Therefore, more 
throughput can be achieved. A question: you patched the dpdk ring to MP and 
MC, why not MP and SC?

Hi Damjan,

I guess the native ipsec mb plugin does not support offloading? Or maybe we can 
do a handoff, but anyhow we cannot hand off one ipsec session to multiple cores. 
Am I right?

Regards,
Kingwel


 Original Message 
Subject: Re: [vpp-dev] VPP Review: https://gerrit.fd.io/r/#/c/15084/:
From: "Damjan Marion via Lists.Fd.Io" 
Sent: December 17, 2018, 4:45 PM
Cc: "Gonsalves, Avinash (Nokia - IN/Bangalore)" 

Dear Avinash,

First, please use public mailing list for such requests, instead of unicasting 
people.

Regarding your patch, I don't feel comfortable to code review it, as I'm not 
familiar with dpdk crypto scheduler.

Personally, I believe such things should be implemented as native VPP code 
instead. We are already in process of moving from
DPDK AES-NI into native code (still dependant on ipsec MB lib) so this stuff 
will not be much usable in this form anyway.

But this is just my opinion, will leave it to others...

--
Damjan

On 13 Dec 2018, at 05:52, Gonsalves, Avinash (Nokia - IN/Bangalore) 
mailto:avinash.gonsal...@nokia.com>> wrote:

Hi Dave, Damjan,

This was verified earlier, but didn’t get integrated. Could you please have a 
look?

Thanks,
Avinash

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11635): https://lists.fd.io/g/vpp-dev/message/11635
Mute This Topic: https://lists.fd.io/mt/28779969/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] perfmon plugin

2018-12-11 Thread Kingwel Xie
Got it. Thanks!

My CPU is “Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz”. As far as I can see, the CPU is 
kind of running slower when the ‘cpu-cycles’ event is enabled, while it is normal 
when the ‘instructions’ event is enabled.

You can see the difference shown below. The texts in red are the vector counts for the 
two events, in 3s. I got this by running ‘set pmc instructions-per-clock’. 
Therefore, it is hard to believe that ‘rdpmc’ causes the big difference – 80%.

t1-ip4-input
  instructions-per-clo   1609959354   510202112   3.16E+00
  instructions           1609959354   15727872    1.02E+02
  cpu-cycles             510202112    12838400    3.97E+01


Hope somebody can point out the root cause.

Regards,
Kingwel


From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Wednesday, December 12, 2018 12:30 AM
To: Dave Barach 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] perfmon plugin


The rdtsc instruction reads the timestamp counter, which ticks at a constant nominal 
frequency. That means it counts ticks at the same rate even when the CPU is in deep 
sleep or when the CPU is in turbo burst
mode. This gives a very precise time measurement (i.e. on a 2.5GHz CPU a tick will 
always be 0.4ns), but
it is not appropriate for counting real cpu clock cycles.
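
For reference, a minimal sketch of the two reads being compared; it assumes a PMU
counter (here counter 0) has already been programmed, so the counter index is
illustrative:

#include <stdint.h>
#include <x86intrin.h>

static inline uint64_t
read_tsc (void)
{
  return __rdtsc ();   /* constant-rate timestamp counter */
}

static inline uint64_t
read_pmc0 (void)
{
  return __rdpmc (0);  /* whatever event is programmed on counter 0,
                          e.g. actual core cycles */
}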

--
Damjan


On 11 Dec 2018, at 06:02, Dave Barach via Lists.Fd.Io 
mailto:dbarach=cisco@lists.fd.io>> wrote:

The “show run” stats use rdtsc instructions, vs. “show pmc” which uses rdpmc 
instructions with a specific counter selected.

Perhaps Damjan or one of our colleagues from Intel can explain the difference.

D.

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Tuesday, December 11, 2018 5:18 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>; 
vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: [vpp-dev] perfmon plugin

Hi Dave,

I’m looking at the perfmon plugin. It is a fantastic tool for tuning node 
performance, extremely helpful. Thanks for the great contribution.

Here I got a question about the ‘cpu-cycles’ event. It shows very different results 
compared with the clocks from ‘show run’ – around 20% slower. I checked the code 
but didn’t find anything wrong. I also tried ‘instructions’; it is normal. 
Please check below:

Show run:
  ip4-input        active   41843   10711808   0   3.32E+01   256
  ppf-pdcp-encap   active   41843   10711808   0   2.69E+01   256

Show pmc:
  t1-ip4-input
    instructions-per-clo   1609959354   510202112   3.16E+00
    instructions           1609959354   15727872    1.02E+02
    cpu-cycles             510202112    12838400    3.97E+01
  t1-ppf-pdcp-encap
    instructions-per-clo   1229445506   415212396   2.96E+00
    instructions           1229445506   15727872    7.82E+01
    cpu-cycles             415212396    12838400    3.23E+01


So the question is: will turning on a perf event impact system performance? 
If so, why does the ‘instructions’ event not? You might notice that the packet count 
recorded by ‘instructions’ is very different from that by ‘cpu-cycles’, even though 
they are both measured under the same circumstances – running for 3s.

Regards,
Kingwel

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11562): https://lists.fd.io/g/vpp-dev/message/11562
Mute This Topic: https://lists.fd.io/mt/28717911/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11570): https://lists.fd.io/g/vpp-dev/message/11570
Mute This Topic: https://lists.fd.io/mt/28717911/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] perfmon plugin

2018-12-11 Thread Kingwel Xie
Hi Dave,

I’m looking at the perfmon plugin. It is a fantastic tool for tuning node 
performance, extremely helpful. Thanks for the great contribution.

Here I got a question about the ‘cpu-cycles’ event. It shows very different results 
compared with the clocks from ‘show run’ – around 20% slower. I checked the code 
but didn’t find anything wrong. I also tried ‘instructions’; it is normal. 
Please check below:

Show run:
  ip4-input        active   41843   10711808   0   3.32E+01   256
  ppf-pdcp-encap   active   41843   10711808   0   2.69E+01   256

Show pmc:
  t1-ip4-input
    instructions-per-clo   1609959354   510202112   3.16E+00
    instructions           1609959354   15727872    1.02E+02
    cpu-cycles             510202112    12838400    3.97E+01
  t1-ppf-pdcp-encap
    instructions-per-clo   1229445506   415212396   2.96E+00
    instructions           1229445506   15727872    7.82E+01
    cpu-cycles             415212396    12838400    3.23E+01


So the question is: will turning on a perf event impact system performance? 
If so, why does the ‘instructions’ event not? You might notice that the packet count 
recorded by ‘instructions’ is very different from that by ‘cpu-cycles’, even though 
they are both measured under the same circumstances – running for 3s.
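
As a rough illustration of what "turning on" such an event involves, a hedged
sketch of opening a cpu-cycles counter with perf_event_open; error handling and
the rdpmc self-monitoring mmap page are omitted, and this is not the actual
plugin code:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int
open_cpu_cycles_counter (void)
{
  struct perf_event_attr attr;
  memset (&attr, 0, sizeof (attr));
  attr.size = sizeof (attr);
  attr.type = PERF_TYPE_HARDWARE;
  attr.config = PERF_COUNT_HW_CPU_CYCLES;
  attr.exclude_kernel = 1;
  /* pid = 0 (calling thread), cpu = -1 (any cpu), no group, no flags */
  return syscall (__NR_perf_event_open, &attr, 0, -1, -1, 0);
}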

Regards,
Kingwel

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11555): https://lists.fd.io/g/vpp-dev/message/11555
Mute This Topic: https://lists.fd.io/mt/28717911/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] dpdk socket-mem and dpdk_pool_create

2018-11-26 Thread Kingwel Xie
thanks. then those pools can be moved to dpdk socket-mem, I guess.

I'll submit a patch soon.


 Original Message 
Subject: Re: [vpp-dev] dpdk socket-mem and dpdk_pool_create
From: Damjan Marion 
Sent: November 25, 2018, 8:06 PM
Cc: Kingwel Xie 


> On 25 Nov 2018, at 05:23, Kingwel Xie  wrote:
>
> Hi Damjan and vPPers,
>
> I got a question about the physical mem management in vPP.
>
> As I understand, we specify socket-mem in startup.conf for DPDK to allocate 
> mem for itself, but we are using vPP phymem allocator for mbuf and crypto 
> PMDs if vdev is specified. The latter consist of crypto op , session, and 
> driver pools, and they are quite small in general. However, at least one 
> hugepage will be allocated for each pool, which could be 1GB. Big waste in a 
> way.
> DPDK instead handles it in a better way using a mem segment based mechanism.
>
> Question: why use a customized dpdk_pool_create instead of asking DPDK to 
> manage mempools just like rte_pktmbuf_pool_create_by_ops?
> 


We care only about buffer mempools being allocated by the vpp physmem allocator, no 
need for the others. They can be allocated by dpdk, or
alternatively the other mempools can share a single physmem region with the dpdk 
buffer mempools.


--
Damjan
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11406): https://lists.fd.io/g/vpp-dev/message/11406
Mute This Topic: https://lists.fd.io/mt/28308134/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] dpdk socket-mem and dpdk_pool_create

2018-11-24 Thread Kingwel Xie
Hi Damjan and vPPers,

I got a question about the physical mem management in vPP.

As I understand, we specify socket-mem in startup.conf for DPDK to allocate mem 
for itself, but we are using vPP phymem allocator for mbuf and crypto PMDs if 
vdev is specified. The latter consist of crypto op , session, and driver pools, 
and they are quite small in general. However, at least one hugepage will be 
allocated for each pool, which could be 1GB. Big waste in a way. DPDK instead 
handles it in a better way using a mem segment based mechanism.

Question: why use a customized dpdk_pool_create instead of asking DPDK to manage 
mempools just like rte_pktmbuf_pool_create_by_ops?
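
For comparison, a hedged sketch of what the DPDK-managed alternative could look
like with rte_pktmbuf_pool_create_by_ops; the pool name, sizes and the
"ring_mp_mc" ops name are illustrative defaults, not anything taken from vPP:

#include <rte_mbuf.h>
#include <rte_mempool.h>

static struct rte_mempool *
create_dpdk_managed_pool (int socket_id)
{
  /* memory comes from DPDK's own segments instead of a vPP physmem region */
  return rte_pktmbuf_pool_create_by_ops ("mbuf-pool-socket0",
                                         16384 /* mbufs */,
                                         256   /* per-core cache */,
                                         0     /* priv size */,
                                         RTE_MBUF_DEFAULT_BUF_SIZE,
                                         socket_id, "ring_mp_mc");
}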

Regards,
Kingwel


 Original Message 
Subject: [vpp-dev] runing testpmd inside vm without installing guest OS
From: "Damjan Marion via Lists.Fd.Io" 
Sent: November 23, 2018, 11:47 PM
Cc: Vpp-dev 

In case anybody is interested, this is quick way to run testpmd inside vm, for 
vhost-user testing.

No guest userspace needed, no disk image, guest kernel just runs bash script 
which starts testpmd

https://gist.github.com/dmarion/161d83165d27af7c39ab807beae4746c


--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11394): https://lists.fd.io/g/vpp-dev/message/11394
Mute This Topic: https://lists.fd.io/mt/28308134/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] pmalloc: clib_pmalloc_create_shared_arena

2018-11-09 Thread Kingwel Xie
Hi dear Damjan,

I'm looking at the DPDK IPSec code, and occasionally I got a strange behavior 
of the memory pools created by dpdk_pool_create.

After some hard debugging, I think there might be a potential overrun 
issue in clib_pmalloc_create_shared_arena.

This is the modified code:
return pm->base + ((uword) pp->index << pm->def_log2_page_sz);

pp->index is u32, so the shift would wrap after 4 arena creations when the page size is 
1GB. The overlapping VA leads to very unexpected results.
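
A small illustration of the overflow, with types assumed to match vppinfra
(def_log2_page_sz is 30 for 1GB pages); the helper name is made up:

#include <stdint.h>

typedef uintptr_t uword;

static void *
arena_va (void *base, uint32_t index, int def_log2_page_sz)
{
  /* buggy: base + (index << 30)          -- 32-bit shift, wraps once index reaches 4 */
  /* fixed: base + ((uword) index << 30)  -- full-width shift, no wrap               */
  return (uint8_t *) base + ((uword) index << def_log2_page_sz);
}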

A patch was created for correction:
https://gerrit.fd.io/r/15847

Please take a look at it. Appreciate your comments.

Regards,
Kingwel


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11189): https://lists.fd.io/g/vpp-dev/message/11189
Mute This Topic: https://lists.fd.io/mt/28070034/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Incrementing node counters

2018-11-03 Thread Kingwel Xie
Thanks for the education. Will look into the assembly code.


 Original Message 
Subject: Re: [vpp-dev] Incrementing node counters
From: "Dave Barach via Lists.Fd.Io" 
Sent: November 3, 2018, 7:55 PM
Cc: Kingwel Xie ,vpp-dev@lists.fd.io
You’ll have to look at the instruction stream in gdb or else “gcc –S” and look 
at the generated assembly-language code. Even at that, the difference in clock 
cycles won’t be completely obvious due to subarchitectural CPU implementation 
details. “Node” is not a typical hot variable which would end up in a register. 
It probably makes exactly zero difference.

FWIW… Dave

From: Kingwel Xie 
Date: Saturday, November 3, 2018 at 3:57 AM
To: Recipient Suppressed , "vpp-dev@lists.fd.io" 

Subject: RE: Incrementing node counters

Thanks for the comments.

I know is_ip6 will be optimized by the compiler.

I am still wondering how different it is between using xxx_node.index and 
node->node_index. Anyway, thanks, now I understand it is a performance 
consideration.

Best Regards,
Kingwel

 Original Message 
Subject: RE: Incrementing node counters
From: "Dave Barach (dbarach)" 
Sent: November 2, 2018, 7:00 PM
Cc: Kingwel Xie ,vpp-dev@lists.fd.io
Yes, you missed something. This pattern is used in inline functions called with 
compile-time constant values for is_ip6:

always_inline uword
ah_encrypt_inline (vlib_main_t * vm,
  vlib_node_runtime_t * node, vlib_frame_t * from_frame,
  int is_ip6)


VLIB_NODE_FN (ah4_encrypt_node) (vlib_main_t * vm,
 vlib_node_runtime_t * node,
 vlib_frame_t * from_frame)
{
  return ah_encrypt_inline (vm, node, from_frame, 0 /* is_ip6 */ );
}



VLIB_NODE_FN (ah6_encrypt_node) (vlib_main_t * vm,
 vlib_node_runtime_t * node,
 vlib_frame_t * from_frame)
{
  return ah_encrypt_inline (vm, node, from_frame, 1 /* is_ip6 */ );
}

The compiler discards either the “if” clause or the “else” clause, and 
(certainly) never tests is_ip6 at runtime. It might be marginally worth 
s/xxx_node.index/node->node_index/.

Another instance of this game may make sense in performance-critical nodes. 
Here, we remove packet-tracer code:

always_inline uword
nsim_inline (vlib_main_t * vm,
  vlib_node_runtime_t * node, vlib_frame_t * frame, int is_trace)
{
  
  if (is_trace)
 {
   if (b[0]->flags & VLIB_BUFFER_IS_TRACED)
 {
   nsim_trace_t *t = vlib_add_trace (vm, node, b[0], sizeof (*t));
   t->expires = expires;
   t->is_drop = is_drop0;
   t->tx_sw_if_index = (is_drop0 == 0) ? ep->tx_sw_if_index : 0;
 }
 }
  
}

VLIB_NODE_FN (nsim_node) (vlib_main_t * vm, vlib_node_runtime_t * node,
  vlib_frame_t * frame)
{
  if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE))
return nsim_inline (vm, node, frame, 1 /* is_trace */ );
  else
return nsim_inline (vm, node, frame, 0 /* is_trace */ );
}

From: vpp-dev@lists.fd.io  On Behalf Of Kingwel Xie
Sent: Thursday, November 1, 2018 8:43 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Incrementing node counters

Hi vPPers,

I’m looking at the latest changes in IPSEC, and I notice ip4 and ip6 nodes are 
separated. So there are a lot of code in the node function like this:

  if (is_ip6)
vlib_node_increment_counter (vm, esp6_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);
  else
vlib_node_increment_counter (vm, esp4_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);


I’m wondering why not like this:

vlib_node_increment_counter (vm, node->node_index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);

My understanding is that node functions are always dispatched with the correct 
node instances. Or do I miss something? BTW, nt just ipsec, quite some other 
nodes are written as the former.

Regards,
Kingwel


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11084): https://lists.fd.io/g/vpp-dev/message/11084
Mute This Topic: https://lists.fd.io/mt/27823101/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Incrementing node counters

2018-11-03 Thread Kingwel Xie
Thanks for the comments.

I know is_ip6 will be optimized by the compiler.

I am still wondering how different it is between using xxx_node.index and 
node->node_index. Anyway, thanks, now I understand it is a performance 
consideration.

Best Regards,
Kingwel


 Original Message 
Subject: RE: Incrementing node counters
From: "Dave Barach (dbarach)" 
Sent: November 2, 2018, 7:00 PM
Cc: Kingwel Xie ,vpp-dev@lists.fd.io
Yes, you missed something. This pattern is used in inline functions called with 
compile-time constant values for is_ip6:

always_inline uword
ah_encrypt_inline (vlib_main_t * vm,
  vlib_node_runtime_t * node, vlib_frame_t * from_frame,
  int is_ip6)


VLIB_NODE_FN (ah4_encrypt_node) (vlib_main_t * vm,
 vlib_node_runtime_t * node,
 vlib_frame_t * from_frame)
{
  return ah_encrypt_inline (vm, node, from_frame, 0 /* is_ip6 */ );
}



VLIB_NODE_FN (ah6_encrypt_node) (vlib_main_t * vm,
 vlib_node_runtime_t * node,
 vlib_frame_t * from_frame)
{
  return ah_encrypt_inline (vm, node, from_frame, 1 /* is_ip6 */ );
}

The compiler discards either the “if” clause or the “else” clause, and 
(certainly) never tests is_ip6 at runtime. It might be marginally worth 
s/xxx_node.index/node->node_index/.

Another instance of this game may make sense in performance-critical nodes. 
Here, we remove packet-tracer code:

always_inline uword
nsim_inline (vlib_main_t * vm,
  vlib_node_runtime_t * node, vlib_frame_t * frame, int is_trace)
{
  
  if (is_trace)
 {
   if (b[0]->flags & VLIB_BUFFER_IS_TRACED)
 {
   nsim_trace_t *t = vlib_add_trace (vm, node, b[0], sizeof (*t));
   t->expires = expires;
   t->is_drop = is_drop0;
   t->tx_sw_if_index = (is_drop0 == 0) ? ep->tx_sw_if_index : 0;
 }
 }
  
}

VLIB_NODE_FN (nsim_node) (vlib_main_t * vm, vlib_node_runtime_t * node,
  vlib_frame_t * frame)
{
  if (PREDICT_FALSE (node->flags & VLIB_NODE_FLAG_TRACE))
return nsim_inline (vm, node, frame, 1 /* is_trace */ );
  else
return nsim_inline (vm, node, frame, 0 /* is_trace */ );
}

From: vpp-dev@lists.fd.io  On Behalf Of Kingwel Xie
Sent: Thursday, November 1, 2018 8:43 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Incrementing node counters

Hi vPPers,

I’m looking at the latest changes in IPSEC, and I notice ip4 and ip6 nodes are 
separated. So there are a lot of code in the node function like this:

  if (is_ip6)
vlib_node_increment_counter (vm, esp6_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);
  else
vlib_node_increment_counter (vm, esp4_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);


I’m wondering why not like this:

vlib_node_increment_counter (vm, node->node_index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);

My understanding is that node functions are always dispatched with the correct 
node instances. Or do I miss something? BTW, nt just ipsec, quite some other 
nodes are written as the former.

Regards,
Kingwel
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11082): https://lists.fd.io/g/vpp-dev/message/11082
Mute This Topic: https://lists.fd.io/mt/27823101/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Incrementing node counters

2018-11-01 Thread Kingwel Xie
Hi vPPers,

I’m looking at the latest changes in IPSEC, and I notice ip4 and ip6 nodes are 
separated. So there are a lot of code in the node function like this:

  if (is_ip6)
vlib_node_increment_counter (vm, esp6_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);
  else
vlib_node_increment_counter (vm, esp4_decrypt_node.index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);


I’m wondering why not like this:

vlib_node_increment_counter (vm, node->node_index,

ESP_DECRYPT_ERROR_RX_PKTS,

from_frame->n_vectors);

My understanding is that node functions are always dispatched with the correct 
node instances. Or do I miss something? BTW, nt just ipsec, quite some other 
nodes are written as the former.

Regards,
Kingwel
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11065): https://lists.fd.io/g/vpp-dev/message/11065
Mute This Topic: https://lists.fd.io/mt/27823101/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] A bug in IP reassembly?

2018-09-25 Thread Kingwel Xie
Ok. I'll find some time tomorrow to push a patch fixing both v4 and v6.

-Original Message-
From: Klement Sekera  
Sent: Tuesday, September 25, 2018 6:02 PM
To: Kingwel Xie ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] A bug in IP reassembly?

Hi Kingwel,

thanks for finding this bug. Your patch looks fine - would you mind making a 
similar fix in ip4_reassembly.c? The logic suffers from the same flaw there.

Thanks,
Klement

Quoting Kingwel Xie (2018-09-25 11:06:49)
>Hi,
> 
> 
> 
>    I worked on testing IP reassembly recently, then hit a crash when testing
>    IP reassembly with IPSec. It took me some time to figure out why.
> 
> 
> 
>The crash only happens when there are >1 feature node enabled under
>ip-unicast and ip reassembly is working, like below.
> 
> 
> 
>ip4-unicast:
> 
>  ip4-reassembly-feature
> 
>  ipsec-input-ip4
> 
> 
> 
>It looks like there is a bug in the reassembly code as below:
>    vnet_feature_next will operate on buffer b0 to update next0 and the
>    current_config_index of b0, but b0 is pointing to some fragment buffer
>    which in most cases is not the first buffer in the chain indicated by bi0.
>    Actually bi0, pointing to the first buffer, is returned by ip6_reass_update
>    when reassembly is finalized. As I can see, this is a mismatch: bi0 and
>    b0 are not the same buffer. In the end the quick fix is what I added:
>    b0 = vlib_get_buffer (vm, bi0); to make it right.
> 
> 
> 
>      if (~0 != bi0)
>        {
>        skip_reass:
>          to_next[0] = bi0;
>          to_next += 1;
>          n_left_to_next -= 1;
>          if (is_feature && IP6_ERROR_NONE == error0)
>            {
>              b0 = vlib_get_buffer (vm, bi0);  --> added by Kingwel
>              vnet_feature_next (&next0, b0);
>            }
>          vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
>                                           n_left_to_next, bi0, next0);
>        }
> 
> 
> 
>Probably this is not the perfect fix, but it works at least. Wonder if
>committers have better thinking about it? I can of course push a patch if
>you think it is ok.
> 
> 
> 
>Regards,
> 
>Kingwel
> 
> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10648): https://lists.fd.io/g/vpp-dev/message/10648
Mute This Topic: https://lists.fd.io/mt/26218556/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] A bug in IP reassembly?

2018-09-25 Thread Kingwel Xie
Hi,

I worked on testing IP reassembly recently, then hit a crash when testing IP 
reassembly with IPSec. It took me some time to figure out why.

The crash only happens when there are >1 feature node enabled under ip-unicast 
and ip reassembly is working, like below.

ip4-unicast:
  ip4-reassembly-feature
  ipsec-input-ip4

It looks like there is a bug in the reassembly code as below: vnet_feature_next 
will operate on buffer b0 to update next0 and the current_config_index of b0, 
but b0 is pointing to some fragment buffer which in most cases is not the first 
buffer in the chain indicated by bi0. Actually bi0, pointing to the first buffer, 
is returned by ip6_reass_update when reassembly is finalized. As I can see, this 
is a mismatch: bi0 and b0 are not the same buffer. In the end the quick fix is 
what I added: b0 = vlib_get_buffer (vm, bi0); to make it right.

  if (~0 != bi0)
    {
    skip_reass:
      to_next[0] = bi0;
      to_next += 1;
      n_left_to_next -= 1;
      if (is_feature && IP6_ERROR_NONE == error0)
        {
          b0 = vlib_get_buffer (vm, bi0);  --> added by Kingwel
          vnet_feature_next (&next0, b0);
        }
      vlib_validate_buffer_enqueue_x1 (vm, node, next_index, to_next,
                                       n_left_to_next, bi0, next0);
    }

Probably this is not the perfect fix, but it works at least. Wonder if 
committers have better thinking about it? I can of course push a patch if you 
think it is ok.

Regards,
Kingwel

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10638): https://lists.fd.io/g/vpp-dev/message/10638
Mute This Topic: https://lists.fd.io/mt/26218556/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: RE: [E] [vpp-dev] Build a telecom-class Security gateway device with VPP

2018-09-21 Thread Kingwel Xie
Well, you should really look for the discussion about IKE/IPSec this mailing 
list had before.

I can put it this way:


  1.  vPP IKEv2/IPSEC is PoC quality, meaning it is far from a carrier-grade 
product
  2.  No SNMP support, you have got to do it by yourself
  3.  Performance is good though, but can still be further improved. Using a DPDK 
crypto PMD can get doubled performance when it comes to AES-CBC
  4.  Many bugs. You can't expect too much from a PoC
  5.  It is really not about IKE/IPSEC, you have got to understand how vPP works
  6.  If so, please consider replacing IKEv2 in vPP with strongSwan or something 
else, but keep IPSEC in vPP

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of tianye@sina
Sent: Thursday, September 20, 2018 6:56 PM
To: hagb...@gmail.com; vpp-dev@lists.fd.io
Subject: FW: RE: [E] [vpp-dev] Build a telecom-class Security gateway device 
with VPP

Hello

Is there someone who knows something about this?


From: tianye@sina [mailto:tiany...@sina.com]
Sent: Wednesday, September 19, 2018 5:03 PM
To: 'Ed Warnicke'; 'vpp-dev@lists.fd.io'; 'Kevin Yan'
Subject: RE: RE: [E] [vpp-dev] Build a telecom-class Security gateway device 
with VPP


Hello Dear VPP developers:



I am planning to develop a SeGW product with VPP.

I have a question for Ipsec within VPP platform.

According to 3GPP TS 33.320 V13.0.0 specification, (Annex A.2 Combined Device 
and HP Authentication Call-flow Example, page 37)



21.The SeGW checks the correctness of the AUTH received from the H(e)NB.

The SeGW should send the assigned Remote IP address in the configuration 
payload (CFG_REPLY), if the H(e)NB requested for H(e)NB’s and/or L-GW’s Remote 
IP address through the CFG_REQUEST.

If the SeGW allocates different remote IP addresses to the L-GW and to the 
H(e)NB, then the SeGW can  include information to differentiate the IP address 
assigned to the H(e)NB and the L-GW,

in order to avoid any misconfiguration.A possible mechanism to inform which IP 
address is to be used for H(e)NB or L-GW is implementation specific and out of 
scope of the present document.



The red font above is the requirement from 3GPP standard.

Can anybody tell me if the IPsec in VPP supports this?

If so, which code implements this?



-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10593): https://lists.fd.io/g/vpp-dev/message/10593
Mute This Topic: https://lists.fd.io/mt/25840496/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] cmake is on

2018-09-09 Thread Kingwel Xie
Many thx!

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Friday, September 07, 2018 4:30 PM
To: Damjan Marion 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] cmake is on


I just did it:

https://gerrit.fd.io/r/#/c/14711/


--
Damjan


On 7 Sep 2018, at 09:25, Damjan Marion via Lists.Fd.Io 
mailto:dmarion=me@lists.fd.io>> wrote:


Dear Kingwei,

That should be easy to fix. Can you just remove it and submit patch, so i can 
merge it?

Thanks,

--
Damjan


On 7 Sep 2018, at 02:39, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

Thanks for the great job done. It is now much faster.

We noticed a difference between using cmake and automake in the latest code:

Vppinfra/qsort.c is included in vppinfra/CMakeLists.txt but not in vppinfra.am, 
which creates a situation where the cmake image is linked to qsort.c but the 
automake image is linked to glibc.

The reason we noticed is that there is a buffer overrun bug in qsort.c that 
causes the CLI to crash occasionally.

Please comment on how to fix it. Personally I’d like to remove qsort.c, like before.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion 
via Lists.Fd.Io
Sent: Sunday, September 02, 2018 8:48 PM
To: vpp-dev mailto:vpp-dev@lists.fd.io>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: [vpp-dev] cmake is on


Dear all,

We just switched from autotools to cmake and retired all autotools related 
files in src/.

All verify jobs are ok, and we also tried it on 3 different x86 and 2 different 
ARM Aarch64 machines.

Due to blast radius, i will not be surprised that some small issues pop out, 
but i don't expect anything hard to fix.

Let us know if you hit something...

PS As a part of this change, CentOS 7 build are now using devtoolset-7, so they 
are compiled with gcc-7, which also means images have support for Skylake 
Servers (AVX512).

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10420): https://lists.fd.io/g/vpp-dev/message/10420
Mute This Topic: https://lists.fd.io/mt/25155374/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10421): https://lists.fd.io/g/vpp-dev/message/10421
Mute This Topic: https://lists.fd.io/mt/25155374/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10449): https://lists.fd.io/g/vpp-dev/message/10449
Mute This Topic: https://lists.fd.io/mt/25155374/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] cmake is on

2018-09-06 Thread Kingwel Xie
Hi Damjan,

Thanks for the great job done. It is now much faster.

We noticed a difference between using cmake and automake in the latest code:

Vppinfra/qsort.c is included in vppinfra/CMakeLists.txt but not in vppinfra.am, 
which creates a situation where the cmake image is linked to qsort.c but the 
automake image is linked to glibc.

The reason we noticed is that there is a buffer overrun bug in qsort.c that 
causes the CLI to crash occasionally.

Please comment on how to fix it. Personally I'd like to remove qsort.c, like before.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
Lists.Fd.Io
Sent: Sunday, September 02, 2018 8:48 PM
To: vpp-dev 
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] cmake is on


Dear all,

We just switched from autotools to cmake and retired all autotools related 
files in src/.

All verify jobs are ok, and we also tried it on 3 different x86 and 2 different 
ARM Aarch64 machines.

Due to blast radius, i will not be surprised that some small issues pop out, 
but i don't expect anything hard to fix.

Let us know if you hit something...

PS As a part of this change, CentOS 7 build are now using devtoolset-7, so they 
are compiled with gcc-7, which also means images have support for Skylake 
Servers (AVX512).

--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10420): https://lists.fd.io/g/vpp-dev/message/10420
Mute This Topic: https://lists.fd.io/mt/25155374/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Is VPP IPSec implementation thread safe?

2018-07-06 Thread Kingwel Xie
Sorry, Damjan. Maybe I confused you.

This is what I am talking about:

In esp_encrypt_node_fn(), the logic is like this:

u32 *recycle = 0;
…
vec_add1 (recycle, i_bi0);
…
free_buffers_and_exit:
  if (recycle)
vlib_buffer_free (vm, recycle, vec_len (recycle));
  vec_free (recycle);






From: Damjan Marion 
Sent: Friday, July 06, 2018 6:14 PM
To: Kingwel Xie 
Cc: Vamsi Krishna ; Jim Thompson ; Dave 
Barach ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Is VPP IPSec implementation thread safe?


We don't use recycle anymore (except at one place), mainly due to the issue of how 
dpdk works.
--
Damjan


On 6 Jul 2018, at 11:27, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Well, there is a vector named recycle to remember all old buffers, which 
consequently means a lot of mem resizes and mem_cpys when the vector rate is 256 
or so. Counting all of this overhead, I’d say I see around a 7~10% impact, after 
fixing the openssl usage issue.

We don't use recycle anymore (except at one place), mainly due to the issue of how 
dpdk works.
--
Damjan

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9789): https://lists.fd.io/g/vpp-dev/message/9789
Mute This Topic: https://lists.fd.io/mt/22720913/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Is VPP IPSec implementation thread safe?

2018-07-06 Thread Kingwel Xie
Well, there is a vector named recycle to remember all old buffers, which 
consequently means a lot of mem resizes and mem_cpys when the vector rate is 256 
or so. Counting all of this overhead, I’d say I see around a 7~10% impact, after 
fixing the openssl usage issue.

BTW, the openssl issue means we should fully initialize the cipher and hmac 
contexts only once, instead of doing it every time we handle a packet.

Taking AES-CBC as an example, when encrypting packet:

  EVP_CipherInit_ex (ctx, NULL, NULL, NULL, iv, -1);   // only do it with iv, 
iv is changed per every packet
  EVP_CipherUpdate (ctx, in, &out_len, in, in_len);

On the other hand, we do full initialization when creating contexts. Note keys 
should be specified here, but not IV.

  HMAC_Init_ex (sa->context[thread_id].hmac_ctx, sa->integ_key, 
sa->integ_key_len, md, NULL);
  EVP_CipherInit_ex (sa->context[thread_id].cipher_ctx, cipher, NULL, 
sa->crypto_key, NULL, is_outbound > 0 ? 1 : 0);

Initialization with keys takes quite a long time because openssl does a lot 
of math. It is not necessary since, as we know, keys are kept unchanged in most 
cases for an SA.
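
Putting the two fragments above together, a hedged sketch of the intended
pattern with the OpenSSL EVP API; the AES-128-CBC cipher choice and the
sa_cipher_* function names are illustrative, not the actual code:

#include <openssl/evp.h>

/* once per SA (per thread): run the expensive key schedule */
EVP_CIPHER_CTX *
sa_cipher_ctx_create (const unsigned char *key, int is_outbound)
{
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new ();
  EVP_CipherInit_ex (ctx, EVP_aes_128_cbc (), NULL, key, NULL,
                     is_outbound ? 1 : 0);
  return ctx;
}

/* per packet: only the IV changes, no key re-expansion */
int
sa_cipher_one_packet (EVP_CIPHER_CTX *ctx, const unsigned char *iv,
                      unsigned char *data, int len)
{
  int out_len;
  EVP_CipherInit_ex (ctx, NULL, NULL, NULL, iv, -1);
  return EVP_CipherUpdate (ctx, data, &out_len, data, len);
}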

Regards,
Kingwel


From: Damjan Marion 
Sent: Tuesday, July 03, 2018 5:14 PM
To: Kingwel Xie 
Cc: Vamsi Krishna ; Jim Thompson ; Dave 
Barach ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Is VPP IPSec implementation thread safe?


On 3 Jul 2018, at 02:36, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan,

Thanks for the heads-up. That never occurred to me. I’m still thinking it is 
acceptable if we are doing IPSec; buffer copying is a significant overhead.

What I wanted to say by copying is writing encrypted data into a new buffer 
instead of overwriting the payload of the existing buffer. I would not call that a 
significant overhead.


We are working on the code, will contribute when we think it is ready. There 
are so many corner cases of IPSec, hard to say we can cover all of them.

I know that other people are also working on the code, so it will be good if 
we are all in sync to avoid throwaway work.

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9786): https://lists.fd.io/g/vpp-dev/message/9786
Mute This Topic: https://lists.fd.io/mt/22720913/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Is VPP IPSec implementation thread safe?

2018-07-02 Thread Kingwel Xie
Hi Damjan,

Thanks for the heads-up. That never occurred to me. I’m still thinking it is 
acceptable if we are doing IPSec; buffer copying is a significant overhead.

We are working on the code, will contribute when we think it is ready. There 
are so many corner cases of IPSec, hard to say we can cover all of them.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
Sent: Monday, July 02, 2018 7:43 PM
To: Kingwel Xie 
Cc: Vamsi Krishna ; Jim Thompson ; Dave 
Barach ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Is VPP IPSec implementation thread safe?


--
Damjan


On 2 Jul 2018, at 11:14, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Vamsi, Damjan,

I’d like to contribute my two cents about IPSEC. We have been working on the 
improvement for quite some time.


  1.  Great that vPP supports IPSEC, but the code is mainly for PoC. It lacks 
many features: buffer chains, AES-GCM/AES-CTR, UDP encap (seems already there 
in the master track?), many hardcoded values, broken packet trace, SEQ handling, etc.
  2.  Performance is not good, because of wrong usage of openssl and buffer 
copying.

Buffer copying is needed, otherwise you have a problem with cloned buffers, i.e. 
you still want the original packet to be SPANed.



  1.  We can see a 100% improvement after fixing all these issues.
  2.  DPDK IPsec has better performance but the quality of the code is not good: 
many bugs.

If you are looking for a production IPSEC, vpp is a good start, but you still
have a lot of things to do.

Contributions are welcome :)


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9766): https://lists.fd.io/g/vpp-dev/message/9766
Mute This Topic: https://lists.fd.io/mt/22720913/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Is VPP IPSec implementation thread safe?

2018-07-02 Thread Kingwel Xie
Hi Vamsi, Damjan,

I’d like to contribute my two cents about IPSEC. We have been working on the 
improvement for quite some time.


  1.  Great that vPP supports IPSEC, but the code is mainly for PoC. It lacks
many features: buffer chains, AES-GCM/AES-CTR, UDP encap (seems already there
in the master track?), and it has a lot of hardcoding, a broken packet trace,
SEQ handling issues, etc.
  2.  Performance is not good, because of wrong usage of openssl and buffer
copying. We can see a 100% improvement after fixing all these issues.
  3.  DPDK IPsec has better performance, but the quality of the code is not
good; there are many bugs.

If you are looking for a production IPSEC, vpp is a good start, but you still
have a lot of things to do.

Regards,
Kingwel

From: vpp-dev@lists.fd.io  On Behalf Of Vamsi Krishna
Sent: Monday, July 02, 2018 12:05 PM
To: Damjan Marion 
Cc: Jim Thompson ; Dave Barach ; 
vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Is VPP IPSec implementation thread safe?

Thanks Damjan.

Is the IPSec code using spd attached to an interface ("set interface ipsec spd 
 ") production quality?
How is the performance of this code in terms of throughput, are there any 
benchmarks that can be referred to?

Thanks
Vamsi

On Sat, Jun 30, 2018 at 1:35 AM, Damjan Marion 
mailto:dmar...@me.com>> wrote:

Dear Vamsi,

It was long long time ago when I wrote original ipsec code, so I don't remember 
details anymore.
For IKEv2, i would suggest using external implementation, as ikev2 stuff is 
just PoC quality code.

--
Damjan


On 29 Jun 2018, at 18:38, Vamsi Krishna 
mailto:vamsi...@gmail.com>> wrote:

Hi Damjan, Dave,

Can you please also answer the questions I had in the email just before Jim 
hijacked the thread.

Thanks
Vamsi

On Fri, Jun 29, 2018 at 3:06 PM, Damjan Marion 
mailto:dmar...@me.com>> wrote:

Hi Jim,

Atomic add sounds like a reasonable solution to me...

--
Damjan


On 28 Jun 2018, at 09:26, Jim Thompson 
mailto:j...@netgate.com>> wrote:

All,

I don't know if any of the previously-raised issues occur in real-life.  
Goodness knows we've run billions of IPsec packets in the test harnesses 
(harnessi?) here without seeing them.

There are a couple issues with IPsec and multicore that haven't been raised, 
however, so I'm gonna hijack the thread.

If multiple worker threads are configured in VPP, it seems like there's the
potential for problems with IPsec where the sequence number or replay window
for an SA could get stomped on by two threads trying to update them at the
same time. We assume that this issue is well known since the following comment
occurs at line 173 in src/vnet/ipsec/esp.h

/* TODO seq increment should be atomic to be accessed by multiple workers */

See: https://github.com/FDio/vpp/blob/master/src/vnet/ipsec/esp.h#L173

We've asked if anyone is working on this, and are willing to try and fix it, 
but would need some direction on what is the best way to accomplish same.

We could try to use locking, which would be straightforward but would add 
overhead.  Maybe that overhead could be offset some by requesting a block of 
sequence numbers upfront for all of the packets being processed instead of 
getting a sequence number and incrementing as each packet is processed.

There is also the clib_smp_atomic_add() call, which invokes
__sync_fetch_and_add(addr,increment).  This is a GCC built-in that uses a
memory barrier to avoid obtaining a lock.  We're not sure if there are
drawbacks to using this.

See: http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html

GRE uses clib_smp_atomic_add() for sequence number processing, see 
src/vnet/gre/gre.c#L409 and src/vnet/gre/gre.c#L421
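
For illustration, the reservation itself boils down to something like this (a
minimal sketch built on the GCC built-in mentioned above; the function and
parameter names are placeholders, not VPP code):

  #include <stdint.h>

  /* 'sa_seq' stands for the per-SA ESP sequence counter */
  static inline uint32_t
  esp_next_seq (uint32_t * sa_seq)
  {
    /* atomically returns the old value and increments it, no lock needed */
    return __sync_fetch_and_add (sa_seq, 1);
  }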

Finally, there seem to be issues around AES-GCM nonce processing when operating 
multi-threaded.  If it is nonce processing, it can probably (also) be addressed 
via clib_smp_atomic_add(), but.. don't know yet.

We've raised these before, but haven't received much in the way of response.  
Again, we're willing to work on these, but would like a bit of 'guidance' from 
vpp-dev.

Thanks,

Jim (and the rest of Netgate)

On Thu, Jun 28, 2018 at 1:44 AM, Vamsi Krishna 
mailto:vamsi...@gmail.com>> wrote:
Hi Damjan, Dave,

Thanks for the quick reply.

It is really helpful. So the barrier ensures that the IPSec data structure 
access is thread safe.

Have a few more question on the IPSec implementation.
1. The inbound SA lookup (in ipsec-input) actually goes through the inbound
policies for the given spd id linearly and matches a policy. The SA is picked
based on the matching policy.
 This could have been an SAD hash table keyed on (SPI, dst address,
proto (ESP or AH)), so that the SA can be looked up from the hash on receiving
an ESP packet (see the sketch after these questions).
 Is there a particular reason it is implemented using a linear policy match?

2. There is also an IKEv2 responder implementation that adds/deletes IPSec 
tunnel interfaces. How does this work? Is there any documentation that can be 
referred to?
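
As a rough illustration of the hash-based SAD suggested in question 1 (generic
C, purely hypothetical - this is not an existing VPP structure or API; a real
implementation would more likely use the vppinfra bihash):

  #include <stdint.h>

  /* hypothetical SAD key: (SPI, destination address, protocol) */
  typedef struct
  {
    uint32_t spi;
    uint32_t dst_ip4;
    uint8_t proto;              /* 50 = ESP, 51 = AH */
  } sad_key_t;

  /* toy mixing hash, just to show the O(1) lookup idea */
  static inline uint32_t
  sad_key_hash (const sad_key_t * k)
  {
    uint64_t x = ((uint64_t) k->spi << 32) ^ k->dst_ip4
      ^ ((uint64_t) k->proto << 8);
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return (uint32_t) x;
  }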

Thanks
Krishna

On Wed, Jun 27, 2018 at 6:23 PM, Dav

Re: [vpp-dev] mheap performance issue and fixup

2018-07-02 Thread Kingwel Xie
Hi Dave,

I noticed you made some improvements to mheap in the latest code. I'm afraid
they will not work as you expected. Actually the same idea came to my mind,
and then I realized it doesn't work well.

Here is your code:

  if (align > MHEAP_ELT_OVERHEAD_BYTES)
n_user_data_bytes = clib_max (n_user_data_bytes,
  align - 
MHEAP_ELT_OVERHEAD_BYTES);

Instead of allocating a very small object, you round it up to something a bit
bigger. Let's take a typical case: you want 8 bytes, aligned to 64B. Then you
are actually allocating from the free bin starting at 64-8=56 bytes.

This is problematic if none of the mheap elements in the 56B free bin happen
to be 64B aligned. It becomes a performance issue when this 56B free bin has a
lot of elements, because you then have to go through all of them one by one.

It would not do better even if you turned to a bigger free bin, because you
have only rounded the requested block size up to a bigger size.

As you can see in my patch, I add a modifier to the requested block size when
looking up an appropriate free bin:

In mheap_get_search_free_list:

  /* kingwel, lookup a free bin which is big enough to hold everything 
align+align_offset+lo_free_size+overhead */
  word modifier =
(align >
 MHEAP_USER_DATA_WORD_BYTES ? align + align_offset +
 sizeof (mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

This is to ensure we can always locate an element without going through the 
whole list. Also I take the align_offset into consideration.
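
To make the difference concrete, here is a small worked example (illustrative
only; it assumes an 8-byte mheap element header - the exact constants live in
mheap.h):

  /* request: n_user_data_bytes = 8, align = 64, align_offset = 0
   *
   * rounding the size up only:
   *   n_user_data_bytes = max (8, 64 - 8) = 56
   *   -> the 56B bin is searched, but whether any element there can serve
   *      a 64B-aligned block depends purely on where it happens to sit in
   *      memory, so the search may still walk the whole bin and fail.
   *
   * adding the modifier instead:
   *   modifier = align + align_offset + sizeof (mheap_elt_t) = 64 + 0 + 8
   *   bin = user_data_size_to_bin_index (8 + 72)   -> roughly 80B and up
   *   -> every element in that bin is large enough to carve out an aligned
   *      8B block and still leave room for the leading free element, so in
   *      practice the first element found can be used.
   */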

BTW, I probably need more time to work out a test case in test_mheap to prove
my fix. It seems the latest code doesn't generate test_mheap any more. I have
to figure that out first.

Regards,
Kingwel


From: Dave Barach (dbarach) 
Sent: Thursday, June 28, 2018 9:06 PM
To: Kingwel Xie ; Damjan Marion 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] mheap performance issue and fixup

Allocating a large number of 16 byte objects @ 64 byte alignment will never 
work very well. If you pad the object such that the mheap header plus the 
object is exactly 64 bytes, the issue may go away.

With that hint, however, I’ll go build a test vector. It sounds like the mheap 
required size calculation might be a brick shy of a load.

D.

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Thursday, June 28, 2018 2:25 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>; Damjan 
Marion mailto:dmar...@me.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup

No problem. I’ll do that later.

Actually there has been a discussion about mheap performance which describes
the issue we are talking about. Please also check it again:

https://lists.fd.io/g/vpp-dev/topic/10642197#6399


From: Dave Barach (dbarach) mailto:dbar...@cisco.com>>
Sent: Thursday, June 28, 2018 3:38 AM
To: Damjan Marion mailto:dmar...@me.com>>; Kingwel Xie 
mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: RE: [vpp-dev] mheap performance issue and fixup

+1.

It would be super-helpful if you were to add test cases to 
.../src/vppinfra/test_mheap.c, and push a draft patch so we can reproduce / fix 
the problem(s).

Thanks... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion
Sent: Wednesday, June 27, 2018 3:27 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup


Dear Kingwei,

We finally managed to look at your mheap patches, sorry for delay.

Still, we are not 100% convinced that there are bugs in the mheap code.
Please note that the mheap code is stable, not changed frequently, and has
been used for years.

It will really help if you can provide test vectors for each issue you observed.
It will be much easier to understand the problem and confirm the fix if we are
able to reproduce it in a controlled environment.

thanks,

Damjan


From: mailto:vpp-dev@lists.fd.io>> on behalf of Kingwel 
Xie mailto:kingwel@ericsson.com>>
Date: Thursday, 19 April 2018 at 03:19
To: "Damjan Marion (damarion)" mailto:damar...@cisco.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Hi Damjan,

We will do it asap. Actually we are quite new to vPP and don't even know how
to make a bug report or a code contribution yet.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan Marion
Sent: Wednesday, April 18, 2018 11:30 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>&g

Re: [vpp-dev] mheap performance issue and fixup

2018-06-27 Thread Kingwel Xie
No problem. I’ll do that later.

Actually there has been a discussion about mheap performance which describes
the issue we are talking about. Please also check it again:

https://lists.fd.io/g/vpp-dev/topic/10642197#6399


From: Dave Barach (dbarach) 
Sent: Thursday, June 28, 2018 3:38 AM
To: Damjan Marion ; Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] mheap performance issue and fixup

+1.

It would be super-helpful if you were to add test cases to 
.../src/vppinfra/test_mheap.c, and push a draft patch so we can reproduce / fix 
the problem(s).

Thanks... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion
Sent: Wednesday, June 27, 2018 3:27 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup


Dear Kingwei,

We finally managed to look at your mheap patches, sorry for delay.

Still, we are not 100% convinced that there are bugs in the mheap code.
Please note that the mheap code is stable, not changed frequently, and has
been used for years.

It will really help if you can provide test vectors for each issue you observed.
It will be much easier to understand the problem and confirm the fix if we are
able to reproduce it in a controlled environment.

thanks,

Damjan


From: mailto:vpp-dev@lists.fd.io>> on behalf of Kingwel 
Xie mailto:kingwel@ericsson.com>>
Date: Thursday, 19 April 2018 at 03:19
To: "Damjan Marion (damarion)" mailto:damar...@cisco.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Hi Damjan,

We will do it asap. Actually we are quite new to vPP and don't even know how
to make a bug report or a code contribution yet.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan Marion
Sent: Wednesday, April 18, 2018 11:30 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Dear Kingwel,

Thank you for your email. It will be really appreciated if you can submit your 
changes to gerrit, preferably each point in separate patch.
That will be best place to discuss those changes...

Thanks in Advance,

--
Damjan

On 16 Apr 2018, at 10:13, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi all,

We recently worked on the GTPU tunnel and our target is to create 2M tunnels.
It is not as easy as it looks, and it took us quite some time to figure it
out. The biggest problem we found is in mheap, which as you know is the
low-level memory management function of vPP. We believe it makes sense to
share what we found and what we've done to improve the performance of mheap.

First of all, mheap is fast. It has a well-designed small object cache and
multi-level free lists to speed up get/put. However, as discussed on the
mailing list before, it has a performance issue when dealing with
align/align_offset allocations. We managed to trace the problem to the pointer
'rewrite' in gtp_tunnel_t. This rewrite is a vector and is required to be
aligned to a 64B cache line, therefore with a 4-byte align offset. We realized
that the free list must be very long, meaning a great many mheap_elts, but
unfortunately it doesn't contain an element which fits all 3 prerequisites:
size, align, and align offset. In this case, each allocation has to traverse
all elements till it reaches the end of the list. As a result, you might
observe each allocation is greater than 10 clocks/call with 'show memory
verbose'. It indicates the allocation takes too long, while it should be
200~300 clocks/call in general. Also you should have noticed 'per-attempt' is
quite high, even more than 100.

The fix is straightforward and simple: as discussed in this mailing list
before, allocate 'rewrite' from a pool instead of from mheap. Frankly
speaking, that looks like a workaround rather than a real fix, so we spent
some time fixing the problem thoroughly. The idea is to add a few more bytes
to the originally required block size so that mheap always looks in a bigger
free list, where most likely a suitable block can be easily located. Well, now
the problem becomes: how big is this extra size? It should be at least
align+align_offset, which is not hard to understand. But after careful
analysis we think it is better to be like this, see the code below:

Mheap.c:545
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

The reason for the extra sizeof(mheap_elt_t) is to avoid lo_free_size being
too small to hold a complete free element. You will understand it if you real

Re: [vpp-dev] Packet rate & throughput satistics for SW interfaces & Error Counters

2018-06-27 Thread Kingwel Xie
Well, I’ll check it further to see how to make use of the new infra.

From: Ole Troan 
Sent: Wednesday, June 27, 2018 5:15 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet rate & throughput satistics for SW interfaces & 
Error Counters

Hi Kingwei,

Did you see the new stats infrastructure where counters are exported to a
shared memory segment?

Based on that you could write an external agent, looking for example like
top, that would scrape the interface counters and display pps in real time.

I think that would be a cleaner approach than adding it to the debug cli.

Cheers
Ole

On 27 Jun 2018, at 11:09, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:
Hi all,

I’m proposing some improvements to CLI commands ‘show interface’ and ‘show 
error’.

As you may know, these two commands show the cumulative counters of
interfaces and errors. Unfortunately there are no rate statistics for these
counters, which makes it hard to know how fast the vPP workers are doing
packet forwarding or internal error processing. You probably have to calculate
the rates by running scripts on the vPP CLI or via the API. This is quite
inefficient, particularly when you are working on performance tuning.

We added rate statistics for these two CLI commands, but with some hardcoding
embedded. In other words, the code is not perfect but effective. Do you think
it is worth merging it to the master track?
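
The underlying calculation is simple; roughly something like this (a minimal
sketch - the names and structures are placeholders, not the actual counter
code):

  #include <stdint.h>

  typedef struct
  {
    uint64_t last_packets;      /* cumulative count at the previous sample */
    double last_time;           /* seconds */
  } rate_state_t;

  /* packets-per-second from two samples of a cumulative counter */
  static inline double
  update_pps (rate_state_t * rs, uint64_t packets_now, double now)
  {
    double dt = now - rs->last_time;
    double pps = dt > 0 ? (double) (packets_now - rs->last_packets) / dt : 0;
    rs->last_packets = packets_now;
    rs->last_time = now;
    return pps;
  }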

Regards,
Kingwel

See screenshots of these improved commands:

The data show DPDK IPsec encryption. We enabled two pg streams which generate
plain IP/UDP flows. Note in the last column, you can see the rates of the two
IPsec interfaces are 2.7Mpps/2.14Gbps and 1.8Mpps/7.3Gbps respectively. All
the traffic is eventually forwarded to the VF ethernet interfaces as tx
packets @4.4Mpps/12Gbps in total.

vpp# show int
  Name Idx State  MTU (L3/IP4/IP6/MPLS) 
Counter  Count   Rate
VirtualFunctionEthernet81/11/7.11   up   0/0/0/0   rx 
packets  1811  0.000 pps
   rx bytes 
  85490  0.000 bps
   tx 
packets 553780750  4.463M pps
   tx bytes 
   187452927008 12.082G bps
   drops
   1792  0.000
   punts
 14  0.000
   ip4  
 14  0.000
VirtualFunctionEthernet81/11/7.12   up   0/0/0/0   drops
  1  0.000
VirtualFunctionEthernet81/11/7  up  1500/0/0/0 rx 
packets  1811  0.000 pps
   rx bytes 
  85490  0.000 bps
   tx 
packets 553780750  4.463M pps
   tx bytes 
   187452927008 12.082G bps
   drops
   1793  0.000
   punts
 14  0.000
   tx-error 
  1  0.000
ipsec0  3   up   0/0/0/0   tx 
packets 332680709  2.682M pps
   tx bytes 
33268070900  2.146G bps
ipsec   4   up   0/0/0/0   tx 
packets 221122821  1.781M pps
   tx bytes 
   113214884352  7.295G bps
local0  0  down  0/0/0/0   rx 
packets 553804032  4.463M pps
   rx bytes 
   146483108864  9.440G bps
   drops
  29949  0.000
   ip4  
  553803530  4.463M
pg0 5   up  9000/0/0/0
vpp# show int verbose
  Name Idx State  MTU (L3/IP4/IP6/MPLS) 
Counter  Count   Rate
VirtualFunctionEthernet81/11/7.11   up   0/

[vpp-dev] Packet rate & throughput satistics for SW interfaces & Error Counters

2018-06-27 Thread Kingwel Xie
Hi all,

I'm proposing some improvements to CLI commands 'show interface' and 'show 
error'.

As you may know, these two commands show the cumulative counters of
interfaces and errors. Unfortunately there are no rate statistics for these
counters, which makes it hard to know how fast the vPP workers are doing
packet forwarding or internal error processing. You probably have to calculate
the rates by running scripts on the vPP CLI or via the API. This is quite
inefficient, particularly when you are working on performance tuning.

We added rate statistics for these two CLI commands, but with some hardcoding
embedded. In other words, the code is not perfect but effective. Do you think
it is worth merging it to the master track?

Regards,
Kingwel

See screenshots of these improved commands:

The data show DPDK IPsec encryption. We enabled two pg streams which generate
plain IP/UDP flows. Note in the last column, you can see the rates of the two
IPsec interfaces are 2.7Mpps/2.14Gbps and 1.8Mpps/7.3Gbps respectively. All
the traffic is eventually forwarded to the VF ethernet interfaces as tx
packets @4.4Mpps/12Gbps in total.

vpp# show int
  Name Idx State  MTU (L3/IP4/IP6/MPLS) 
Counter  Count   Rate
VirtualFunctionEthernet81/11/7.11   up   0/0/0/0   rx 
packets  1811  0.000 pps
   rx bytes 
  85490  0.000 bps
   tx 
packets 553780750  4.463M pps
   tx bytes 
   187452927008 12.082G bps
   drops
   1792  0.000
   punts
 14  0.000
   ip4  
 14  0.000
VirtualFunctionEthernet81/11/7.12   up   0/0/0/0   drops
  1  0.000
VirtualFunctionEthernet81/11/7  up  1500/0/0/0 rx 
packets  1811  0.000 pps
   rx bytes 
  85490  0.000 bps
   tx 
packets 553780750  4.463M pps
   tx bytes 
   187452927008 12.082G bps
   drops
   1793  0.000
   punts
 14  0.000
   tx-error 
  1  0.000
ipsec0  3   up   0/0/0/0   tx 
packets 332680709  2.682M pps
   tx bytes 
33268070900  2.146G bps
ipsec   4   up   0/0/0/0   tx 
packets 221122821  1.781M pps
   tx bytes 
   113214884352  7.295G bps
local0  0  down  0/0/0/0   rx 
packets 553804032  4.463M pps
   rx bytes 
   146483108864  9.440G bps
   drops
  29949  0.000
   ip4  
  553803530  4.463M
pg0 5   up  9000/0/0/0
vpp# show int verbose
  Name Idx State  MTU (L3/IP4/IP6/MPLS) 
Counter  Count   Rate
VirtualFunctionEthernet81/11/7.11   up   0/0/0/0   rx 
packets  1811  0.000 pps
   rx bytes 
  85490  0.000 bps
   tx 
packets 563321358  4.446M pps

Thread 1 kw_wk_0  :  2.669M

Thread 2 kw_wk_1  :  1.778M
   tx bytes 
   190684525600 12.048G bps
   

Re: [vpp-dev] GTPu

2018-06-19 Thread Kingwel Xie
Hi,

My colleague Liu Anhua has done the patch:

https://gerrit.fd.io/r/#/c/13134/

Couldn't add Chengqiang as a reviewer; gerrit says he/she cannot be identified.

Regards,
Kingwel

From: Ni, Hongjun 
Sent: Friday, June 08, 2018 5:54 PM
To: Edward Warnicke ; Kingwel Xie 
; vpp-dev@lists.fd.io
Cc: Yao, Chengqiang 
Subject: RE: [vpp-dev] GTPu

Thanks Kingwei for your effort on improving GTPU plugin.

Please submit some patches and add 
chengqiang@intel.com<mailto:chengqiang@intel.com> as one of reviewers.

Thanks,
Hongjun

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Edward Warnicke
Sent: Friday, June 8, 2018 10:18 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] GTPu

Please do push a patch :)

Ed


On June 8, 2018 at 4:16:19 AM, Kingwel Xie 
(kingwel@ericsson.com<mailto:kingwel@ericsson.com>) wrote:
Hi all,

We are working on improving the gtpu plugin, to make it better comply with
the 3GPP standard.

Here is the brief of what we have done:


  1.  Path management – gtpu-process was added
  2.  Error indication
  3.  Bi-direction TEID
  4.  Some bug fixes

I’m thinking of pushing up-stream the improvement. We can make the patch soon. 
Any comments?

Regards,
Kingwel


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9649): https://lists.fd.io/g/vpp-dev/message/9649
Mute This Topic: https://lists.fd.io/mt/21792620/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Integration with libbacktrace

2018-06-15 Thread Kingwel Xie
Done patch:

https://gerrit.fd.io/r/13070

1. make use of the glibc backtrace in execinfo.h; the old clib_backtrace is
changed to clib_backtrace_no_good (a minimal sketch of the approach follows
after this list).
   The old one uses __builtin_frame_address, which cannot retrieve a stack
trace in the release version (don't know why).
2. install a handler for SIGABRT, but remove it once the backtrace is done.
   The reason is that we want to capture the stack trace caused by SIGABRT:
vPP ASSERT always calls os_exit and then abort(), and we definitely want to
know the trace in that situation. It is a little tricky to avoid a SIGABRT
infinite loop.
3. A text log file is generated as well as the output to stderr. We'd like to
have a simple file to trace the crash;
   this file generation can be removed if vPP thinks it is not suitable.
4. always load symbols by calling clib_elf_main_init () in main(). Otherwise,
only addresses can be displayed.
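
For reference, a bare-bones, self-contained sketch of the glibc approach from
point 1 (illustrative only; the actual patch in gerrit 13070 also writes the
log file and re-arms SIGABRT as described above):

  #include <execinfo.h>
  #include <signal.h>
  #include <unistd.h>

  static void
  crash_handler (int signum)
  {
    void *frames[32];
    int n = backtrace (frames, 32);

    /* backtrace_symbols_fd does not call malloc, so it is safe enough for a
       last-gasp dump; symbols are resolved as "function+offset" only */
    backtrace_symbols_fd (frames, n, STDERR_FILENO);

    /* restore default handling so abort () does not loop back into us */
    signal (signum, SIG_DFL);
    raise (signum);
  }

  int
  main (void)
  {
    signal (SIGSEGV, crash_handler);
    signal (SIGABRT, crash_handler);
    /* ... application ... */
    return 0;
  }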

In the end, this is what we got:

……
load_one_plugin:189: Loaded plugin: srv6ad_plugin.so (Dynamic SRv6 proxy)
load_one_plugin:189: Loaded plugin: srv6am_plugin.so (Masquerading SRv6 proxy)
load_one_plugin:189: Loaded plugin: srv6as_plugin.so (Static SRv6 proxy)
load_one_plugin:189: Loaded plugin: stn_plugin.so (VPP Steals the NIC for 
Container integration)
load_one_plugin:189: Loaded plugin: tlsmbedtls_plugin.so (mbedtls based TLS 
Engine)
load_one_plugin:189: Loaded plugin: tlsopenssl_plugin.so (openssl based TLS 
Engine)
Writing crashdump to ./crashdump-2018-06-15-04-55-45.log ...
#0  0x7f872383d0bc generate_crash_backtrace + 0xfe
#1  0x7f872383d366 unix_signal_handler + 0x1b3
#2  0x7f872206f390 0x7f872206f390
#3  0x7f87222c89a8 hash_header_bytes + 0x2a
#4  0x7f87222c8a04 hash_header + 0x27
#5  0x7f87222ca554 _hash_get + 0x2b
#6  0x7f87222ed17a mheap_get_trace + 0x18e
#7  0x7f87222ea9db mheap_get_aligned + 0x236
#8  0x7f872233fe6a clib_mem_alloc_aligned_at_offset + 0xa5
#9  0x7f872234020a vec_resize_allocate_memory + 0x6f
#10 0x7f87222bbf4a _vec_resize_inline + 0x136
#11 0x7f87222bcf9a do_percent + 0xcb6
#12 0x7f87222bd407 va_format + 0x113
#13 0x7f87222bd5a0 format + 0xb8
#14 0x7f872259614a shm_name_from_svm_map_region_args + 0x32d
Aborted (core dumped)
Makefile:446: recipe for target 'run' failed


From: Damjan Marion 
Sent: Thursday, June 14, 2018 8:31 PM
To: Dave Barach ; Kingwel Xie 
Cc: vpp-dev 
Subject: Re: [vpp-dev] Integration with libbacktrace


+1, send to stderr so systemd can pick it up

Thanks for your contribution...


On 14 Jun 2018, at 14:21, Dave Barach via Lists.Fd.Io 
mailto:dbarach=cisco@lists.fd.io>> wrote:

Sure. Personally, I'd send the backtrace to syslog vs. creating yet-another 
logfile.

D.

-Original Message-
From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Thursday, June 14, 2018 2:54 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
Damjan Marion mailto:dmar...@me.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Integration with libbacktrace

Hi,

Finally I removed the libbacktrace dependency and instead use the glibc
'backtrace', which can be found in execinfo.h.

The generated crashdump will look like below. I can create a patch if you think 
it is valuable.

Regards,
Kingwel

---
DBGvpp#
DBGvpp# test crash

Thread 1 "kw_main" received signal SIGSEGV, Segmentation fault.
0x00407ace in test_crash_command_fn (vm=0x77b89ac0 
, input=0x7fffbceabee0, cmd=0x7fffbcdda1a0)
   at /home/vppshare/kingwel/vpp/build-data/../src/vpp/vnet/main.c:387
/home/vppshare/kingwel/vpp/src/vpp/vnet/main.c:387:8300:beg:0x407ace
(gdb) c
Continuing.
Writing crashdump to ./crashdump-2018-06-14-06-51-53.log ...
#0  0x7794a428 unix_signal_handler + 0x2e7
#1  0x76168390 0x76168390
#2  0x00407ace test_crash_command_fn + 0x53
#3  0x7789bd47 vlib_cli_dispatch_sub_commands + 0xc27
#4  0x7789bc55 vlib_cli_dispatch_sub_commands + 0xb35
#5  0x7789c02c vlib_cli_input + 0xc0
#6  0x77940ac0 unix_cli_process_input + 0x2dc
#7  0x779415e2 unix_cli_process + 0x94
#8  0x778db84d vlib_process_bootstrap + 0x66
#9  0x763ca8a8 0x763ca8a8

Thread 1 "kw_main" received signal SIGABRT, Aborted.
0x75bbe428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x75bbe428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x75bc002a in __GI_abort () at abort.c:89
#2  0x004079c9 in os_exit (code=1) at 
/home/vppshare/kingwel/vpp/build-data/../src/vpp/vnet/main.c:334
#3  0x7794a528 in unix_signal_handler (signum=11, si=0x7fffbceab6f0, 
uc=0x7fffbce

Re: [vpp-dev] Integration with libbacktrace

2018-06-13 Thread Kingwel Xie
Hi,

Finally I removed the libbacktrace dependency and instead use the glibc
'backtrace', which can be found in execinfo.h.

The generated crashdump will look like below. I can create a patch if you think 
it is valuable.

Regards,
Kingwel

---
DBGvpp# 
DBGvpp# test crash 

Thread 1 "kw_main" received signal SIGSEGV, Segmentation fault.
0x00407ace in test_crash_command_fn (vm=0x77b89ac0 
, input=0x7fffbceabee0, cmd=0x7fffbcdda1a0)
at /home/vppshare/kingwel/vpp/build-data/../src/vpp/vnet/main.c:387
/home/vppshare/kingwel/vpp/src/vpp/vnet/main.c:387:8300:beg:0x407ace
(gdb) c
Continuing.
Writing crashdump to ./crashdump-2018-06-14-06-51-53.log ...
#0  0x7794a428 unix_signal_handler + 0x2e7
#1  0x76168390 0x76168390
#2  0x00407ace test_crash_command_fn + 0x53
#3  0x7789bd47 vlib_cli_dispatch_sub_commands + 0xc27
#4  0x7789bc55 vlib_cli_dispatch_sub_commands + 0xb35
#5  0x7789c02c vlib_cli_input + 0xc0
#6  0x77940ac0 unix_cli_process_input + 0x2dc
#7  0x779415e2 unix_cli_process + 0x94
#8  0x778db84d vlib_process_bootstrap + 0x66
#9  0x763ca8a8 0x763ca8a8

Thread 1 "kw_main" received signal SIGABRT, Aborted.
0x75bbe428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x75bbe428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x75bc002a in __GI_abort () at abort.c:89
#2  0x004079c9 in os_exit (code=1) at 
/home/vppshare/kingwel/vpp/build-data/../src/vpp/vnet/main.c:334
#3  0x7794a528 in unix_signal_handler (signum=11, si=0x7fffbceab6f0, 
uc=0x7fffbceab5c0)
at /home/vppshare/kingwel/vpp/build-data/../src/vlib/unix/main.c:181
#4  
#5  0x00407ace in test_crash_command_fn (vm=0x77b89ac0 
, input=0x7fffbceabee0, cmd=0x7fffbcdda1a0)
at /home/vppshare/kingwel/vpp/build-data/../src/vpp/vnet/main.c:387
#6  0x7789bd47 in vlib_cli_dispatch_sub_commands (vm=0x77b89ac0 
, cm=0x77b89ca0 , 
input=0x7fffbceabee0, parent_command_index=143) at 
/home/vppshare/kingwel/vpp/build-data/../src/vlib/cli.c:589
#7  0x7789bc55 in vlib_cli_dispatch_sub_commands (vm=0x77b89ac0 
, cm=0x77b89ca0 , 
input=0x7fffbceabee0, parent_command_index=0) at 
/home/vppshare/kingwel/vpp/build-data/../src/vlib/cli.c:567
#8  0x7789c02c in vlib_cli_input (vm=0x77b89ac0 , 
input=0x7fffbceabee0, 
function=0x7793afe8 , function_arg=0) at 
/home/vppshare/kingwel/vpp/build-data/../src/vlib/cli.c:663
#9  0x77940ac0 in unix_cli_process_input (cm=0x77b89880 
, cli_file_index=0)
at /home/vppshare/kingwel/vpp/build-data/../src/vlib/unix/cli.c:2419
#10 0x779415e2 in unix_cli_process (vm=0x77b89ac0 
, rt=0x7fffbce9b000, f=0x0)
at /home/vppshare/kingwel/vpp/build-data/../src/vlib/unix/cli.c:2531
#11 0x778db84d in vlib_process_bootstrap (_a=140736265931088) at 
/home/vppshare/kingwel/vpp/build-data/../src/vlib/main.c:1231
#12 0x763ca8a8 in clib_calljmp () at 
/home/vppshare/kingwel/vpp/build-data/../src/vppinfra/longjmp.S:110
#13 0x7fffb7234920 in ?? ()
#14 0x778db978 in vlib_process_startup (vm=0x28, p=0x1, f=0x0) at 
/home/vppshare/kingwel/vpp/build-data/../src/vlib/main.c:1253


-Original Message-
From: vpp-dev@lists.fd.io  On Behalf Of Kingwel Xie
Sent: Monday, June 11, 2018 7:48 PM
To: Damjan Marion 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Integration with libbacktrace

No idea. Hope someone can point it out.

libbacktrace can locate the line number in the code; otherwise we'd like to
use the glibc backtrace, which generates output like:

mheap_put+0x230
clib_mem_free+0x190

It would take extra time to convert the offset to the line number by using 
addr2line.

It is a bit heavy to integrate something like libbacktrace. This is why I'm
wondering. Probably we can take a light-weight approach and add the glibc
backtrace shown above. No dependency :-)

Regards,
Kingwel

-Original Message-
From: Damjan Marion  
Sent: Monday, June 11, 2018 7:27 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Integration with libbacktrace


What is the difference between libunwind and libbacktrace?

I can see that libunwind is available as package on ubuntu, and libbacktrace is 
not...


> On 11 Jun 2018, at 11:41, Kingwel Xie  wrote:
> 
> Hi all,
>  
> I’m wondering if it can be accepted by the community.
>  
> Every time when vPP crashes for some reason, core dump file can be used for 
> further analysis. However, core file is usually big and even unfortunately 
> isn’t generated. A trick we used before is to generate a crash dump file to 
> indicate some basic information. This file is a 

Re: [vpp-dev] Integration with libbacktrace

2018-06-11 Thread Kingwel Xie
No idea. Hope someone can point it out.

libbacktrace can locate the line number in the code; otherwise we'd like to
use the glibc backtrace, which generates output like:

mheap_put+0x230
clib_mem_free+0x190

It would take extra time to convert the offset to the line number by using 
addr2line.

It is a bit heavy to integrate something like libbacktrace. This is why I'm
wondering. Probably we can take a light-weight approach and add the glibc
backtrace shown above. No dependency :-)

Regards,
Kingwel

-Original Message-
From: Damjan Marion  
Sent: Monday, June 11, 2018 7:27 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Integration with libbacktrace


What is the difference between libunwind and libbacktrace?

I can see that libunwind is available as package on ubuntu, and libbacktrace is 
not...


> On 11 Jun 2018, at 11:41, Kingwel Xie  wrote:
> 
> Hi all,
>  
> I’m wondering if it can be accepted by the community.
>  
> Every time when vPP crashes for some reason, core dump file can be used for 
> further analysis. However, core file is usually big and even unfortunately 
> isn’t generated. A trick we used before is to generate a crash dump file to 
> indicate some basic information. This file is a text file like below, not 
> intended to replace core dump, but useful for a quick analysis.
>  
> To make it more friendly, we use libbacktrace instead of glic backtrace 
> function which can only indicate the function name plus offset. Therefore, 
> integration of libbacktrace is now in our code branch.
>  
> Can this be pushed upstream? Libbacktrace can be found at:
> https://github.com/ianlancetaylor/libbacktrace
>  
> Regards,
> Kingwel
>  
> --
> 
> DBGvpp# write crashdump to ../crashdump-2018-06-11-05-14-31.log
> /home/vppshare/rich/vpp/build-root/install-vpp_debug-native/vpp/bin/vp
> p: libbacktrace: DWARF underflow in .debug_abbrev at 402002 
> 7f8907da336b unix_signal_handler 
> /home/vppshare/rich/vpp/build-data/../src/vlib/unix/main.c:204
> 7f89065f238f ?? ??:0
> 7f8905e32428 gsignal ??:0
> 7f8905e34029 abort ??:0
> 407f7d os_panic 
> /home/vppshare/rich/vpp/build-data/../src/vpp/vnet/main.c:310
> 7f890686d890 mheap_put 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/mheap.c:815
> 7f89068c1531 clib_mem_free 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/mem.h:186
> 7f89068c1845 vec_resize_allocate_memory 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/vec.c:96
> 7f890683ea09 _vec_resize_inline 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/vec.h:145
> 7f890683fa59 do_percent 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/format.c:341
> 7f890683fec6 va_format 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/format.c:404
> 7f890682bfc3 elog_string 
> /home/vppshare/rich/vpp/build-data/../src/vppinfra/elog.c:541
> 7f8907ff0dc7 elog_id_for_msg_name 
> /home/vppshare/rich/vpp/build-data/../src/vlibapi/api_shared.c:408
> 7f8907ff1250 vl_msg_api_handler_with_vm_node 
> /home/vppshare/rich/vpp/build-data/../src/vlibapi/api_shared.c:540
> 7f8907ffbcec void_mem_api_handle_msg_i 
> /home/vppshare/rich/vpp/build-data/../src/vlibmemory/memory_api.c:675
> 7f8907ffbd5b vl_mem_api_handle_msg_main 
> /home/vppshare/rich/vpp/build-data/../src/vlibmemory/memory_api.c:685
> 7f89080168fe vl_api_clnt_process 
> /home/vppshare/rich/vpp/build-data/../src/vlibmemory/vlib_api.c:380
> 7f8907d346e0 vlib_process_bootstrap 
> /home/vppshare/rich/vpp/build-data/../src/vlib/main.c:1280
>  
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9584): https://lists.fd.io/g/vpp-dev/message/9584
Mute This Topic: https://lists.fd.io/mt/21987007/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Leave: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Integration with libbacktrace

2018-06-11 Thread Kingwel Xie
Hi all,

I'm wondering if it can be accepted by the community.

Every time vPP crashes for some reason, the core dump file can be used for
further analysis. However, the core file is usually big and sometimes,
unfortunately, isn't generated at all. A trick we used before is to generate a
crash dump file that records some basic information. This file is a text file
like the one below, not intended to replace the core dump, but useful for a
quick analysis.

To make it more friendly, we use libbacktrace instead of the glibc backtrace
function, which can only indicate the function name plus an offset. Therefore,
integration of libbacktrace is now in our code branch.

Can this be pushed upstream? Libbacktrace can be found at:
https://github.com/ianlancetaylor/libbacktrace

Regards,
Kingwel

--
DBGvpp# write crashdump to ../crashdump-2018-06-11-05-14-31.log
/home/vppshare/rich/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp: 
libbacktrace: DWARF underflow in .debug_abbrev at 402002
7f8907da336b unix_signal_handler 
/home/vppshare/rich/vpp/build-data/../src/vlib/unix/main.c:204
7f89065f238f ?? ??:0
7f8905e32428 gsignal ??:0
7f8905e34029 abort ??:0
407f7d os_panic /home/vppshare/rich/vpp/build-data/../src/vpp/vnet/main.c:310
7f890686d890 mheap_put 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/mheap.c:815
7f89068c1531 clib_mem_free 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/mem.h:186
7f89068c1845 vec_resize_allocate_memory 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/vec.c:96
7f890683ea09 _vec_resize_inline 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/vec.h:145
7f890683fa59 do_percent 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/format.c:341
7f890683fec6 va_format 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/format.c:404
7f890682bfc3 elog_string 
/home/vppshare/rich/vpp/build-data/../src/vppinfra/elog.c:541
7f8907ff0dc7 elog_id_for_msg_name 
/home/vppshare/rich/vpp/build-data/../src/vlibapi/api_shared.c:408
7f8907ff1250 vl_msg_api_handler_with_vm_node 
/home/vppshare/rich/vpp/build-data/../src/vlibapi/api_shared.c:540
7f8907ffbcec void_mem_api_handle_msg_i 
/home/vppshare/rich/vpp/build-data/../src/vlibmemory/memory_api.c:675
7f8907ffbd5b vl_mem_api_handle_msg_main 
/home/vppshare/rich/vpp/build-data/../src/vlibmemory/memory_api.c:685
7f89080168fe vl_api_clnt_process 
/home/vppshare/rich/vpp/build-data/../src/vlibmemory/vlib_api.c:380
7f8907d346e0 vlib_process_bootstrap 
/home/vppshare/rich/vpp/build-data/../src/vlib/main.c:1280


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9581): https://lists.fd.io/g/vpp-dev/message/9581
Mute This Topic: https://lists.fd.io/mt/21987007/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Leave: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] GTPu

2018-06-10 Thread Kingwel Xie
Well, it would take a few days. I just realized we have to pass the review
process in Ericsson…

Will get it done asap.

From: Ni, Hongjun 
Sent: Friday, June 08, 2018 5:54 PM
To: Edward Warnicke ; Kingwel Xie 
; vpp-dev@lists.fd.io
Cc: Yao, Chengqiang 
Subject: RE: [vpp-dev] GTPu

Thanks Kingwei for your effort on improving GTPU plugin.

Please submit some patches and add 
chengqiang@intel.com<mailto:chengqiang@intel.com> as one of reviewers.

Thanks,
Hongjun

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Edward Warnicke
Sent: Friday, June 8, 2018 10:18 AM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] GTPu

Please do push a patch :)

Ed


On June 8, 2018 at 4:16:19 AM, Kingwel Xie 
(kingwel@ericsson.com<mailto:kingwel@ericsson.com>) wrote:
Hi all,

We are working on improving gtpu plugin, to make it better comply with 3GPP 
standard.

Here is the brief of what we have done:


  1.  Path management – gtpu-process was added
  2.  Error indication
  3.  Bi-direction TEID
  4.  Some bug fixes

I’m thinking of pushing up-stream the improvement. We can make the patch soon. 
Any comments?

Regards,
Kingwel


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9579): https://lists.fd.io/g/vpp-dev/message/9579
Mute This Topic: https://lists.fd.io/mt/21792620/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Leave: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] GTPu

2018-06-08 Thread Kingwel Xie
Hi all,

We are working on improving gtpu plugin, to make it better comply with 3GPP 
standard.

Here is the brief of what we have done:


  1.  Path management – gtpu-process was added
  2.  Error indication
  3.  Bi-direction TEID
  4.  Some bug fixes

I’m thinking of pushing up-stream the improvement. We can make the patch soon. 
Any comments?

Regards,
Kingwel

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9568): https://lists.fd.io/g/vpp-dev/message/9568
Mute This Topic: https://lists.fd.io/mt/21792620/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Leave: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] show trace caused "out of memory"

2018-05-28 Thread Kingwel Xie
Hi,

You should increase the heap size. In startup.conf, set heapsize 1g or
something like that.

When running in a multi-core environment, vpp definitely needs more memory,
because some global variables have to be expanded to have multiple copies -
one per worker thread. E.g., interface counters, error counters ...
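
For example (a minimal startup.conf fragment; the size shown is only an
example):

  # main heap used by vppinfra; increase it for many workers/routes/tunnels
  heapsize 1G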

Regards,
Kingwel

From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of xulang
Sent: Monday, May 28, 2018 6:16 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] show trace caused "out of memory"

Hi all,
When we only use one CPU core, the cmd "show trace max 5000" works well.
But it will crash when we use four CPU cores because of "out of memory".
Below is some information; any guidance?

root@vBRAS:~# cat /proc/meminfo
MemTotal:4028788 kB
MemFree:  585636 kB
MemAvailable: 949116 kB
Buffers:   22696 kB
Cached:   592600 kB
SwapCached:0 kB
Active:  1773520 kB
Inactive: 118616 kB
Active(anon):1295912 kB
Inactive(anon):45640 kB
Active(file): 477608 kB
Inactive(file):72976 kB
Unevictable:3656 kB
Mlocked:3656 kB
SwapTotal:976380 kB
SwapFree: 976380 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages:   1280520 kB
Mapped:   112324 kB
Shmem: 62296 kB
Slab:  84456 kB
SReclaimable:  35976 kB
SUnreclaim:48480 kB
KernelStack:5968 kB
PageTables:   267268 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit: 2466484 kB
Committed_AS:   5368769328 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages:348160 kB
CmaTotal:  0 kB
CmaFree:   0 kB
HugePages_Total: 512
HugePages_Free:  384
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:   96064 kB
DirectMap2M: 3049472 kB
DirectMap1G: 3145728 kB


0: load_one_plugin:63: Loaded plugin: 
/usr/lib/vpp_api_test_plugins/ioam_vxlan_gpe_test_plugin.so
0: vlib_pci_bind_to_uio: Skipping PCI device :02:0e.0 as host interface 
ens46 is up
EAL: Detected 4 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
[New Thread 0x7b0019efa700 (LWP 5207)]
[New Thread 0x7b00196f9700 (LWP 5208)]
[New Thread 0x7b0018ef8700 (LWP 5209)]
EAL: PCI device :02:01.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:06.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:07.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:08.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:09.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:0a.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:0b.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:0c.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:0d.0 on NUMA socket -1
EAL:   probe driver: 8086:100f net_e1000_em
EAL: PCI device :02:0e.0 on NUMA socket -1
EAL:   Device is blacklisted, not initializing
DPDK physical memory layout:
Segment 0: phys:0x7d40, len:2097152, virt:0x7b001500, socket_id:0, 
hugepage_sz:2097152, nchannel:0, nrank:0
Segment 1: phys:0x7d80, len:266338304, virt:0x7affe460, socket_id:0, 
hugepage_sz:2097152, nchannel:0, nrank:0
[New Thread 0x7b00186f7700 (LWP 5210)]
/usr/bin/vpp[5202]: dpdk_ipsec_process:241: DPDK Cryptodev support is disabled, 
default to OpenSSL IPsec
/usr/bin/vpp[5202]: dpdk_lib_init:1084: 16384 mbufs allocated but total rx/tx 
ring size is 18432
/usr/bin/vpp[5202]: svm_client_scan_this_region_nolock:1139: /vpe-api: cleanup 
ghost pid 4719
/usr/bin/vpp[5202]: svm_client_scan_this_region_nolock:1139: /global_vm: 
cleanup ghost pid 4719
Thread 1 "vpp_main" received signal SIGABRT, Aborted.
0x7fffef655428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
(gdb)
(gdb)
(gdb) p errno /*there are only 81 opened fd belong to progress VPP*/
$1 = 9
(gdb) bt
#0  0x7fffef655428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fffef65702a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0040724e in os_panic () at 
/home/vbras/new_trunk/VBRASV100R001_new_trunk/vpp1704/build-data/../src/vpp/vnet/main.c:290
#3  0x7fffefe6b49b in clib_mem_alloc_aligned_at_offset 
(os_out_of_memory_on_failure=1, align_offset=, align=4, 
size=18768606)   /*mmap*/
at 
/home/vbras/new_trunk/VBRASV100R001_new_trunk/vpp1704/build-data/../src/vppinfra/mem.h:102
#4  vec_resize_allocate_memory (v=, 
length_increment=length_increment@entry=1, data_bytes=, 
header_bytes=, header_bytes@entry=0, 
data_align=data_align@entry=4)
at 
/home/vbras/new_t

Re: [vpp-dev] Using custom openssl with vpp #vpp

2018-05-14 Thread Kingwel Xie
Hi,

We managed to link with openssl 1.1 successfully. The OS is ubuntu 16.04.
Basically we downloaded v1.1 and built it, then made some changes to the vpp
makefile pointing to the correct header and lib files.

V1.0.2 is still there for the other apps, but vpp is working with v1.1.

Regards,
Kingwel


From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Florin Coras
Sent: Monday, May 14, 2018 11:18 PM
To: duct...@viettel.com.vn
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Using custom openssl with vpp #vpp

Hi DucTM,

Did you try changing src/plugin/tlsopenssl.am to link against openssl 1.1? I’ve 
never tried it, so no idea what the end result may be :-)

Florin


On May 14, 2018, at 3:52 AM, 
duct...@viettel.com.vn wrote:

Hi,
I'm trying to customize the openssl plugin so that it works with openssl 1.1
(with some modifications as well).
Applying the new openssl version to the system is not possible, since some
other apps rely on openssl and they do not work with openssl 1.1.
Is there any configuration I can make to use vpp with a separately built
openssl? Or just some idea about how to achieve that.
Any help will be highly appreciated.

DucTM




Re: [vpp-dev] Question of worker thread handoff

2018-04-23 Thread Kingwel Xie
Hi Damjan,

I would say there is nothing we can do but drop the packets. Developers have
to take care of how they use the handoff mechanism. In our case, or in the NAT
case, it has to be 'no wait', while it could be 'wait forever' in a
stream-line (pipeline) mode – worker A -> B -> C.
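
For what it's worth, the 'no wait' behaviour can be sketched with a tiny
single-producer/single-consumer ring that simply refuses to enqueue when full,
so the caller drops instead of spinning (illustrative only - this is not the
actual vlib frame-queue code, and real code needs proper memory ordering):

  #include <stdint.h>

  #define RING_SIZE 64          /* power of two */

  typedef struct
  {
    volatile uint64_t head;     /* consumer position */
    volatile uint64_t tail;     /* producer position */
    uint32_t slots[RING_SIZE];  /* buffer indices handed to the other worker */
  } handoff_ring_t;

  /* returns 0 when the ring is full, so the caller can drop the packet
     instead of spinning - two workers handing off to each other can then
     never dead-lock */
  static inline int
  handoff_enqueue (handoff_ring_t * r, uint32_t buffer_index)
  {
    if (r->tail - r->head >= RING_SIZE)
      return 0;                 /* full: caller drops */
    r->slots[r->tail % RING_SIZE] = buffer_index;
    __sync_synchronize ();      /* publish the slot before moving the tail */
    r->tail++;
    return 1;
  }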

BTW, my colleague Lollita thought there might be some improvements we could
make to the handoff queue. Lollita, please share what you found.

Regards,
Kingwel

From: Damjan Marion (damarion) [mailto:damar...@cisco.com]
Sent: Monday, April 23, 2018 8:32 PM
To: Kingwel Xie 
Cc: Ole Troan ; vpp-dev ; Lollita 
Liu 
Subject: Re: [vpp-dev] Question of worker thread handoff


Dear Kingwel,

What would you expect us to do if A waits for B to take stuff from the queue
and at the same time B waits for A for the same reason, besides what we
already do in the NAT code, which is to drop instead of wait?

--
Damjan


On 23 Apr 2018, at 14:14, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Ole, Damjan,

Thanks, for the comments.

But I'm afraid this is the typical case where workers hand off to each other,
if we don't want to create an I/O thread, which might become the bottleneck in
the end.

Regards,
Kingwel

From: Damjan Marion (damarion) [mailto:damar...@cisco.com]
Sent: Monday, April 23, 2018 6:25 PM
To: Ole Troan mailto:otr...@employees.org>>; Kingwel Xie 
mailto:kingwel@ericsson.com>>
Cc: vpp-dev mailto:vpp-dev@lists.fd.io>>; Lollita Liu 
mailto:lollita@ericsson.com>>
Subject: Re: [vpp-dev] Question of worker thread handoff


Yes, there are 2 options when the handoff queue is full: drop or wait.
Wait gives you a nice back-pressure mechanism as it will slow down the input
worker, but it will not work in the case where A hands off to B and B hands
off to A.

--
Damjan



On 23 Apr 2018, at 10:50, Ole Troan 
mailto:otr...@employees.org>> wrote:

Kingwei,

Yes, it's possible to dead lock in this case.
We had a similar issue with the NAT implementation. While testing I think we 
ended up dropping when the queue was full.

Best regards,
Ole



On 23 Apr 2018, at 10:33, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan and all,

We are currently thinking of how to utilize the handoff mechanism to serve our 
application logic – to run a packet re-ordering and re-transmit queue in the 
same worker context to avoid any lock between threads. We came across a
question when looking into the implementation of handoff. We hope you can
figure it out for us.

In vlib_get_worker_handoff_queue_elt -> vlib_get_frame_queue_elt  :

 /* Wait until a ring slot is available */
 while (new_tail >= fq->head_hint + fq->nelts)
vlib_worker_thread_barrier_check ();

We understand that the worker has to wait for an available slot from the
other worker before putting a buffer into it. Then the question is: is there a
potential deadlock with two workers waiting for each other? E.g., workers A
and B: A is going to hand off to B, but unfortunately at the same time B has
the same thing to do towards A, so they both wait forever. If that is true, is
it better to drop the packet when the ring is full?

I copied my colleague Lollita into this discussion, she is working on it and 
knows better about this hypothesis.

Regards,
Kingwel







Re: [vpp-dev] Question of worker thread handoff

2018-04-23 Thread Kingwel Xie
Hi Ole, Damjan,

Thanks, for the comments.

But I'm afraid this is the typical case where workers hand off to each other,
if we don't want to create an I/O thread, which might become the bottleneck in
the end.

Regards,
Kingwel

From: Damjan Marion (damarion) [mailto:damar...@cisco.com]
Sent: Monday, April 23, 2018 6:25 PM
To: Ole Troan ; Kingwel Xie 
Cc: vpp-dev ; Lollita Liu 
Subject: Re: [vpp-dev] Question of worker thread handoff


Yes, there are 2 options when the handoff queue is full: drop or wait.
Wait gives you a nice back-pressure mechanism as it will slow down the input
worker, but it will not work in the case where A hands off to B and B hands
off to A.

--
Damjan


On 23 Apr 2018, at 10:50, Ole Troan 
mailto:otr...@employees.org>> wrote:

Kingwei,

Yes, it's possible to dead lock in this case.
We had a similar issue with the NAT implementation. While testing I think we 
ended up dropping when the queue was full.

Best regards,
Ole


On 23 Apr 2018, at 10:33, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi Damjan and all,

We are currently thinking of how to utilize the handoff mechanism to serve our 
application logic – to run a packet re-ordering and re-transmit queue in the 
same worker context to avoid any lock between threads. We came across a
question when looking into the implementation of handoff. We hope you can
figure it out for us.

In vlib_get_worker_handoff_queue_elt -> vlib_get_frame_queue_elt  :

 /* Wait until a ring slot is available */
 while (new_tail >= fq->head_hint + fq->nelts)
vlib_worker_thread_barrier_check ();

We understand that the worker has to wait for an available slot from the
other worker before putting a buffer into it. Then the question is: is there a
potential deadlock with two workers waiting for each other? E.g., workers A
and B: A is going to hand off to B, but unfortunately at the same time B has
the same thing to do towards A, so they both wait forever. If that is true, is
it better to drop the packet when the ring is full?

I copied my colleague Lollita into this discussion, she is working on it and 
knows better about this hypothesis.

Regards,
Kingwel








[vpp-dev] Question of worker thread handoff

2018-04-23 Thread Kingwel Xie
Hi Damjan and all,

We are currently thinking of how to utilize the handoff mechanism to serve our 
application logic – to run a packet re-ordering and re-transmit queue in the 
same worker context to avoid any lock between threads. We came across a
question when looking into the implementation of handoff. We hope you can
figure it out for us.

In vlib_get_worker_handoff_queue_elt -> vlib_get_frame_queue_elt  :

  /* Wait until a ring slot is available */
  while (new_tail >= fq->head_hint + fq->nelts)
vlib_worker_thread_barrier_check ();

We understand that the worker has to wait for an available slot from the
other worker before putting a buffer into it. Then the question is: is there a
potential deadlock with two workers waiting for each other? E.g., workers A
and B: A is going to hand off to B, but unfortunately at the same time B has
the same thing to do towards A, so they both wait forever. If that is true, is
it better to drop the packet when the ring is full?

I copied my colleague Lollita into this discussion, she is working on it and 
knows better about this hypothesis.

Regards,
Kingwel





Re: [vpp-dev] mheap performance issue and fixup

2018-04-23 Thread Kingwel Xie
Did you delete the handles of the shared memory? These two files:
/dev/shm/global_vm and /dev/shm/vpe-api.

The mheap header structure has changed, so you have to let vPP re-create the
shared memory heap.

Sorry, I forgot to mention that before.


From: 薛欣颖 [mailto:xy...@fiberhome.com]
Sent: Monday, April 23, 2018 3:25 PM
To: Kingwel Xie ; Damjan Marion ; 
nranns 
Cc: vpp-dev 
Subject: Re: Re: [vpp-dev] mheap performance issue and fixup

Hi Kingwel,

After I merged the three patches, there is a SIGSEGV when I start up vpp (not
every time), and the error did not appear before.
Is there anything I can do to fix it?

Program received signal SIGSEGV, Segmentation fault.
clib_mem_alloc_aligned_at_offset (size=54, align=4, align_offset=4, 
os_out_of_memory_on_failure=1) at /home/vpp/build-data/../src/vppinfra/mem.h:90
90 cpu = os_get_thread_index ();
(gdb) bt
#0 clib_mem_alloc_aligned_at_offset (size=54, align=4, align_offset=4, 
os_out_of_memory_on_failure=1) at /home/vpp/build-data/../src/vppinfra/mem.h:90
#1 0x7697fcde in vec_resize_allocate_memory (v=0x0, 
length_increment=50, data_bytes=54, header_bytes=4, data_align=4)
at /home/vpp/build-data/../src/vppinfra/vec.c:59
#2 0x769313c7 in _vec_resize (v=0x0, length_increment=50, 
data_bytes=50, header_bytes=0, data_align=0)
at /home/vpp/build-data/../src/vppinfra/vec.h:142
#3 0x769322bb in do_percent (_s=0x7fffb6cee348, fmt=0x76995c90 
"%s:%d (%s) assertion `%s' fails", va=0x7fffb6cee3e0)
at /home/vpp/build-data/../src/vppinfra/format.c:339
#4 0x76932703 in va_format (s=0x0, fmt=0x76995c90 "%s:%d (%s) 
assertion `%s' fails", va=0x7fffb6cee3e0)
at /home/vpp/build-data/../src/vppinfra/format.c:402
#5 0x7692ce4e in _clib_error (how_to_die=2, function_name=0x0, 
line_number=0, fmt=0x76995c90 "%s:%d (%s) assertion `%s' fails")
at /home/vpp/build-data/../src/vppinfra/error.c:127
#6 0x769496a3 in mheap_get_search_free_bin (v=0x3000a000, bin=12, 
n_user_data_bytes_arg=0x7fffb6cee6b0, align=4, align_offset=0)
at /home/vpp/build-data/../src/vppinfra/mheap.c:401
#7 0x76949e86 in mheap_get_search_free_list (v=0x3000a000, 
n_user_bytes_arg=0x7fffb6cee6b0, align=4, align_offset=0)
at /home/vpp/build-data/../src/vppinfra/mheap.c:569
#8 0x7694a326 in mheap_get_aligned (v=0x3000a000, n_user_data_bytes=56, 
align=4, align_offset=0, offset_return=0x7fffb6cee758)
at /home/vpp/build-data/../src/vppinfra/mheap.c:700
#9 0x7697f91e in clib_mem_alloc_aligned_at_offset (size=54, align=4, 
align_offset=4, os_out_of_memory_on_failure=1)
at /home/vpp/build-data/../src/vppinfra/mem.h:92
#10 0x7697fcde in vec_resize_allocate_memory (v=0x0, 
length_increment=50, data_bytes=54, header_bytes=4, data_align=4)
at /home/vpp/build-data/../src/vppinfra/vec.c:59
#11 0x769313c7 in _vec_resize (v=0x0, length_increment=50, 
data_bytes=50, header_bytes=0, data_align=0)
at /home/vpp/build-data/../src/vppinfra/vec.h:142
#12 0x769322bb in do_percent (_s=0x7fffb6ceea78, fmt=0x76995c90 
"%s:%d (%s) assertion `%s' fails", va=0x7fffb6ceeb10)
at /home/vpp/build-data/../src/vppinfra/format.c:339
#13 0x76932703 in va_format (s=0x0, fmt=0x76995c90 "%s:%d (%s) 
assertion `%s' fails", va=0x7fffb6ceeb10)
at /home/vpp/build-data/../src/vppinfra/format.c:402
#14 0x7692ce4e in _clib_error (how_to_die=2, function_name=0x0, 
line_number=0, fmt=0x76995c90 "%s:%d (%s) assertion `%s' fails")
at /home/vpp/build-data/../src/vppinfra/error.c:127
#15 0x769496a3 in mheap_get_search_free_bin (v=0x3000a000, bin=12, 
n_user_data_bytes_arg=0x7fffb6ceede0, align=4, align_offset=0)
at /home/vpp/build-data/../src/vppinfra/mheap.c:401
#16 0x76949e86 in mheap_get_search_free_list (v=0x3000a000, 
n_user_bytes_arg=0x7fffb6ceede0, align=4, align_offset=0)
at /home/vpp/build-data/../src/vppinfra/mheap.c:569
#17 0x7694a326 in mheap_get_aligned (v=0x3000a000, 
n_user_data_bytes=56, align=4, align_offset=0, offset_return=0x7fffb6ceee88)
at /home/vpp/build-data/../src/vppinfra/mheap.c:700
#18 0x7697f91e in clib_mem_alloc_aligned_at_offset (size=54, align=4, 
align_offset=4, os_out_of_memory_on_failure=1)
at /home/vpp/build-data/../src/vppinfra/mem.h:92
#19 0x7697fcde in vec_resize_allocate_memory (v=0x0, 
length_increment=50, data_bytes=54, header_bytes=4, data_align=4)
---Type  to continue, or q  to quit---
at /home/vpp/build-data/../src/vppinfra/vec.c:59
#20 0x769313c7 in _vec_resize (v=0x0, length_increment=50, 
data_bytes=50, header_bytes=0, data_align=0)
at /home/vpp/build-data/../src/vppinfra/vec.h:142
#21 0x769322bb in do_percent (_s=0x7fffb6cef1a8, fmt=0x76995c90 
"%s:%d (%s) assertion `%s' fails", va=0x7fffb6cef240)
at /home/vpp/build-data/../src/vppinfra/format.c:339
#22 0x76932703 in va_format (s=0x0, fmt=0x76995c90 &

Re: [vpp-dev] mheap performance issue and fixup

2018-04-20 Thread Kingwel Xie
Hi,

Finally I managed to create 3 patches to include all modifications to mheap. 
Please check below for details. I’ll do some other patches later…

https://gerrit.fd.io/r/11950
https://gerrit.fd.io/r/11952
https://gerrit.fd.io/r/11957

Hi Xue, you need at least the first one for your test.

Regards,
Kingwel

From: Kingwel Xie
Sent: Thursday, April 19, 2018 9:20 AM
To: Damjan Marion 
Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] mheap performance issue and fixup

Hi Damjan,

We will do it ASAP. Actually, we are quite new to vPP and don't yet know how to
file a bug report or contribute code.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan Marion
Sent: Wednesday, April 18, 2018 11:30 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Dear Kingwel,

Thank you for your email. It will be really appreciated if you can submit your 
changes to gerrit, preferably each point in separate patch.
That will be best place to discuss those changes...

Thanks in Advance,

--
Damjan

On 16 Apr 2018, at 10:13, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi all,

We recently worked on GTPU tunnels, and our target was to create 2M of them. It
is not as easy as it looks, and it took us quite some time to figure it out. The
biggest problem we found is in mheap, which, as you know, is the low-level
memory management layer of vPP. We believe it makes sense to share what we found
and what we’ve done to improve the performance of mheap.

First of all, mheap is fast. It has a well-designed small-object cache and
multi-level free lists to speed up get/put. However, as discussed on this
mailing list before, it has a performance issue when dealing with
align/align_offset allocations. We traced the problem to the pointer ‘rewrite’
in gtp_tunnel_t. This rewrite is a vector that must be aligned to a 64B cache
line, hence with a 4-byte align offset. We realized that the free list must be
very long, meaning many mheap_elts, but unfortunately it has no element that
fits all 3 prerequisites: size, align, and align offset. In this case, each
allocation has to traverse all elements until it reaches the end of the list.
As a result, you might observe each allocation is greater than 10 clocks/call
with ‘show memory verbose’. It indicates the allocation takes too long, while it
should be 200~300 clocks/call in general. You should also have noticed that
‘per-attempt’ is quite high, even more than 100.

The fix is straightforward: as discussed on this mailing list before, allocate
‘rewrite’ from a pool instead of from mheap. Frankly speaking, that looks like a
workaround rather than a real fix, so we spent some time fixing the problem
thoroughly. The idea is to add a few more bytes to the originally required block
size so that mheap always looks in a bigger free list, where a suitable block
can most likely be located right away. Well, now the problem becomes: how big
should this extra size be? It should be at least align+align_offset, which is
not hard to understand, but after careful analysis we think it is better like
this, see the code below:

Mheap.c:545
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

The reason for the extra sizeof(mheap_elt_t) is to avoid lo_free_size being too
small to hold a complete free element. You will understand this if you know how
mheap_get_search_free_bin works; I am not going to go through the details here.
In short, every lookup in the free list will locate a suitable element, in other
words the free-list hit rate will be almost 100% and ‘per-attempt’ will always
be around 1. The test results look very promising; please see below, after
adding 2M gtpu tunnels and 2M routing entries:

Thread 0 vpp_main
13689507 objects, 3048367k of 3505932k used, 243663k free, 243656k reclaimed, 
106951k overhead, 4194300k capacity
  alloc. from small object cache: 47325868 hits 65271210 attempts (72.51%) 
replacements 8266122
  alloc. from free-list: 21879233 attempts, 21877898 hits (99.99%), 21882794 
considered (per-attempt 1.00)
  alloc. low splits: 13355414, high splits: 512984, combined: 281968
  alloc. from vector-expand: 81907
  allocs: 69285673 276.00 clocks/call
  frees: 55596166 173.09 clocks/call
Free list:
bin 3:
20(82220170 48)
total 1
bin 273:
28340k(80569efc 60)
total 1
bin 276:
215323k(8c88df6c 44)
total 1
Total count in free bin: 3

You can see, as pointed out before, the hit rate is very high, >99.9%, and 
per-attempt is ~1. Furthermore, the total elements in free list is only 3.
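
To make the effect of the size modifier concrete, here is a small,
self-contained toy program. It is not the real mheap code: size_to_bin,
WORD_BYTES and ELT_OVERHEAD are illustrative stand-ins for
user_data_size_to_bin_index, MHEAP_USER_DATA_WORD_BYTES and sizeof(mheap_elt_t);
it only shows how padding the requested size moves the lookup into a larger bin:

#include <stddef.h>
#include <stdio.h>

#define WORD_BYTES   8          /* stand-in for MHEAP_USER_DATA_WORD_BYTES */
#define ELT_OVERHEAD 16         /* stand-in for sizeof (mheap_elt_t)       */

/* toy log2-style bucketing, just to show that the lookup bin gets bigger */
static unsigned
size_to_bin (size_t n)
{
  unsigned bin = 0;
  n /= WORD_BYTES;
  while (n >>= 1)
    bin++;
  return bin;
}

int
main (void)
{
  size_t n_user_bytes = 64, align = 64, align_offset = 4;

  /* original lookup: bin chosen from the raw size only */
  unsigned bin_plain = size_to_bin (n_user_bytes);

  /* patched lookup: over-provision by align + align_offset + element
   * overhead, so the first block found in the bin always fits */
  size_t modifier =
    align > WORD_BYTES ? align + align_offset + ELT_OVERHEAD : 0;
  unsigned bin_padded = size_to_bin (n_user_bytes + modifier);

  printf ("request %zu bytes: plain bin %u, padded bin %u\n",
          n_user_bytes, bin_plain, bin_padded);
  return 0;
}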

Apart from what we discussed above, we also made some other improvements/bug
fixes to

Re: [vpp-dev] questions in configuring tunnel

2018-04-19 Thread Kingwel Xie
, source=FIB_SOURCE_RR, 
old_flags=FIB_ENTRY_FLAG_NONE) at 
/home/vppshare/kingwel/vpp/build-data/../src/vnet/fib/fib_entry.c:749
#6  0x77386ba9 in fib_entry_source_change (fib_entry=0x7fffb54f88c4, 
old_source=FIB_SOURCE_RR, new_source=FIB_SOURCE_RR)
at /home/vppshare/kingwel/vpp/build-data/../src/vnet/fib/fib_entry.c:795
#7  0x77386c3c in fib_entry_special_add (fib_entry_index=7, 
source=FIB_SOURCE_RR, flags=FIB_ENTRY_FLAG_NONE, 
dpo=0x7fffb6e86760) at 
/home/vppshare/kingwel/vpp/build-data/../src/vnet/fib/fib_entry.c:811
#8  0x77371bdc in fib_table_entry_special_dpo_add (fib_index=0, 
prefix=0x7fffb6e868f0, source=FIB_SOURCE_RR, 
flags=FIB_ENTRY_FLAG_NONE, dpo=0x7fffb6e86760) at 
/home/vppshare/kingwel/vpp/build-data/../src/vnet/fib/fib_table.c:325
#9  0x77371e0f in fib_table_entry_special_add (fib_index=0, 
prefix=0x7fffb6e868f0, source=FIB_SOURCE_RR, 
flags=FIB_ENTRY_FLAG_NONE) at 
/home/vppshare/kingwel/vpp/build-data/../src/vnet/fib/fib_table.c:390
#10 0x7fffb359b006 in vnet_gtpu_add_del_tunnel (a=0x7fffb6e86a50, 
sw_if_indexp=0x7fffb6e869dc)
at /home/vppshare/kingwel/vpp/build-data/../src/plugins/gtpu/gtpu.c:490



The tentative solution is simple: comment out fib_table_entry_special_add,
fib_entry_child_add, and gtpu_tunnel_restack_dpo. That is OK for a GTP tunnel,
for which missing a route-change notification may not be so critical. We are not
very familiar with the FIB part of the vPP code, so we would really appreciate
it if someone could point out a better solution.

Regards,
Kingwel








-Original Message-
From: Neale Ranns (nranns) [mailto:nra...@cisco.com] 
Sent: Thursday, April 19, 2018 9:06 PM
To: Kingwel Xie ; xyxue 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] questions in configuring tunnel


Hi Kingwei,

 
[nr] if you skip this then the tunnels are not part of the FIB graph and hence 
any updates in the forwarding to the tunnel’s destination will go unnoticed and 
hence you potentially black hole the tunnel traffic indefinitely (since the 
tunnel is not re-stacked). It is a linked list, but apart from the pool 
allocation of the list element, the list element insertion is O(1), no?
[kingwel] You are right that the update will not be noticed, but we think it is 
acceptable for a p2p tunnel interface. The list element itself is ok when being 
inserted, but the following restack operation will walk through all inserted 
elements. This is the point I’m talking about.
    
restacking will indeed walk all of the child objects of a parent, but this is 
an operation that occurs only when the forwarding of that parent changes (which 
in this case is the route to the tunnel’s destination) and this walk is done 
asynchronously, since the child uses a recursive path. You should not see these 
walks occurring whilst you are creating tunnels (please confirm) so they should 
not affect the tunnel setup time. 

/neale



-=-=-=-=-=-=-=-=-=-=-=-
Links:

You receive all messages sent to this group.

View/Reply Online (#9004): https://lists.fd.io/g/vpp-dev/message/9004
View All Messages In Topic (11): https://lists.fd.io/g/vpp-dev/topic/17543966
Mute This Topic: https://lists.fd.io/mt/17543966/21656
New Topic: https://lists.fd.io/g/vpp-dev/post

Change Your Subscription: https://lists.fd.io/g/vpp-dev/editsub/21656
Group Home: https://lists.fd.io/g/vpp-dev
Contact Group Owner: vpp-dev+ow...@lists.fd.io
Terms of Service: https://lists.fd.io/static/tos
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] questions in configuring tunnel

2018-04-19 Thread Kingwel Xie
Hi Xue,

As I said, there are some other things you have to do.


  1.  Better to use the socket-based API. Otherwise you may suffer the 10ms
sleep time of linux_epoll_input; alternatively, hard-code the sleep time to
100~200us at vl_api_clnt_process:430, vlib_process_wait_for_event_or_clock. I
don't know if this is clear; it is hard to explain in a few words, but it is the
behaviour of the vPP process scheduler. I would not say it is a perfect design,
but that is how it works.
  2.  Comment out the calls to fib_table_entry_special_add and
fib_entry_child_add in vnet_gtpu_add_del_tunnel in gtpu.c. The former creates
the fib entry for the tunnel endpoint and can build a long linked list on the
covering fib entry (the default route in most cases) when you have many tunnel
endpoints, while the latter creates a child-node linked list for each fib entry
created by fib_table_entry_special_add, when you have just a few tunnel
endpoints. We discussed why in a previous email. I would guess this fib entry is
not so important for a GTP tunnel, because the GTP traffic will eventually hit a
valid fib entry and get sent out.
  3.  You should really change the mheap as I mentioned. Otherwise the slow 
memory allocation will kill you in the end.
  4.  We made some other improvements to avoid frequent memory
resizes/allocations. We managed to start with a very large sw_if_index (2M in
our case), so that the many vectors indexed by sw_if_index are validated to an
appropriate size up front (a minimal sketch of the idea follows after this
list). However, I wouldn't recommend you do the same, because you only need
100K.
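
To illustrate point 4, here is a minimal, self-contained sketch of pre-sizing an
index-keyed array once instead of growing it on every new interface. It is not
VPP code (inside VPP this is roughly what vec_validate on the relevant vectors
achieves); MAX_SW_IF_INDEX and per_if_data_t are made-up names, and error
handling is omitted for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SW_IF_INDEX (1u << 20)      /* assumption: plan for ~1M interfaces */

typedef struct { unsigned long rx_packets; } per_if_data_t;

/* grow-on-demand: the slow path this item tries to avoid repeating */
static per_if_data_t *
validate (per_if_data_t * v, size_t * len, size_t index)
{
  if (index >= *len)
    {
      size_t new_len = index + 1;
      v = realloc (v, new_len * sizeof (*v));          /* no error check here */
      memset (v + *len, 0, (new_len - *len) * sizeof (*v));
      *len = new_len;
    }
  return v;
}

int
main (void)
{
  size_t len = 0;
  per_if_data_t *per_if = NULL;
  size_t sw_if_index;

  /* pre-validate once up to the expected maximum index ... */
  per_if = validate (per_if, &len, MAX_SW_IF_INDEX);

  /* ... so later per-interface work never triggers a realloc/memset */
  for (sw_if_index = 0; sw_if_index < 100000; sw_if_index++)
    per_if[sw_if_index].rx_packets = 0;

  printf ("array sized for %zu entries\n", len);
  free (per_if);
  return 0;
}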

Hope it helps, instead of confusing you.

Regards,
Kingwel


From: 薛欣颖 [mailto:xy...@fiberhome.com]
Sent: Thursday, April 19, 2018 7:43 PM
To: Kingwel Xie ; nranns 
Cc: vpp-dev 
Subject: Re: Re: [vpp-dev] questions in configuring tunnel

Hi Kingwel,

Thank you very much for sharing the solution to the 'cache line alignment'
issue. I saw you configured 2M gtpu tunnels in 200s. When I merge patch 10216,
configuring 100K gtpu tunnels costs 7 minutes. How does your configuration rate
get so fast?


Thanks,
Xyxue
____

From: Kingwel Xie<mailto:kingwel@ericsson.com>
Date: 2018-04-19 17:11
To: 薛欣颖<mailto:xy...@fiberhome.com>; nranns<mailto:nra...@cisco.com>
CC: vpp-dev<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] questions in configuring tunnel
Hi Xue,

I’m afraid it will take a few days to commit the code.

For now I copied the key changes for your reference. It should work.

Regards,
Kingwel


/* Search free lists for object with given size and alignment. */
static uword
mheap_get_search_free_list (void *v, uword * n_user_bytes_arg,
                            uword align, uword align_offset)
{
  mheap_t *h = mheap_header (v);
  uword bin, n_user_bytes, i, bi;

  n_user_bytes = *n_user_bytes_arg;
  bin = user_data_size_to_bin_index (n_user_bytes);

  if (MHEAP_HAVE_SMALL_OBJECT_CACHE
  && (h->flags & MHEAP_FLAG_SMALL_OBJECT_CACHE)
  && bin < 255
  && align == STRUCT_SIZE_OF (mheap_elt_t, user_data[0])
  && align_offset == 0)
{
  uword r = mheap_get_small_object (h, bin);
  h->stats.n_small_object_cache_attempts += 1;
  if (r != MHEAP_GROUNDED)
{
  h->stats.n_small_object_cache_hits += 1;
  return r;
}
}

  /* kingwel, lookup a free bin which is big enough to hold everything 
align+align_offset+lo_free_size+overhead */
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);
  for (i = bin / BITS (uword); i < ARRAY_LEN (h->non_empty_free_elt_heads);
   i++)
{
  uword non_empty_bin_mask = h->non_empty_free_elt_heads[i];

  /* No need to search smaller bins. */
  if (i == bin / BITS (uword))
non_empty_bin_mask &= ~pow2_mask (bin % BITS (uword));

  /* Search each occupied free bin which is large enough. */
  /* *INDENT-OFF* */
  foreach_set_bit (bi, non_empty_bin_mask,
  ({
uword r =
  mheap_get_search_free_bin (v, bi + i * BITS (uword),
 n_user_bytes_arg,
 align,
 align_offset);
if (r != MHEAP_GROUNDED) return r;
  }));
  /* *INDENT-ON* */
}

  return MHEAP_GROUNDED;
}



From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of xyxue
Sent: Thursday, April 19, 2018 4:02 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
nranns mailto:nra...@cisco.com>>
Cc: vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi,

Thank you all for your help . I've lea

Re: [vpp-dev] questions in configuring tunnel

2018-04-19 Thread Kingwel Xie
Hi Xue,

I’m afraid it will take a few days to commit the code.

For now I copied the key changes for your reference. It should work.

Regards,
Kingwel


/* Search free lists for object with given size and alignment. */
static uword
mheap_get_search_free_list (void *v, uword * n_user_bytes_arg,
                            uword align, uword align_offset)
{
  mheap_t *h = mheap_header (v);
  uword bin, n_user_bytes, i, bi;

  n_user_bytes = *n_user_bytes_arg;
  bin = user_data_size_to_bin_index (n_user_bytes);

  if (MHEAP_HAVE_SMALL_OBJECT_CACHE
  && (h->flags & MHEAP_FLAG_SMALL_OBJECT_CACHE)
  && bin < 255
  && align == STRUCT_SIZE_OF (mheap_elt_t, user_data[0])
  && align_offset == 0)
{
  uword r = mheap_get_small_object (h, bin);
  h->stats.n_small_object_cache_attempts += 1;
  if (r != MHEAP_GROUNDED)
{
  h->stats.n_small_object_cache_hits += 1;
  return r;
}
}

  /* kingwel, lookup a free bin which is big enough to hold everything 
align+align_offset+lo_free_size+overhead */
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);
  for (i = bin / BITS (uword); i < ARRAY_LEN (h->non_empty_free_elt_heads);
   i++)
{
  uword non_empty_bin_mask = h->non_empty_free_elt_heads[i];

  /* No need to search smaller bins. */
  if (i == bin / BITS (uword))
non_empty_bin_mask &= ~pow2_mask (bin % BITS (uword));

  /* Search each occupied free bin which is large enough. */
  /* *INDENT-OFF* */
  foreach_set_bit (bi, non_empty_bin_mask,
  ({
uword r =
  mheap_get_search_free_bin (v, bi + i * BITS (uword),
 n_user_bytes_arg,
 align,
 align_offset);
if (r != MHEAP_GROUNDED) return r;
  }));
  /* *INDENT-ON* */
}

  return MHEAP_GROUNDED;
}



From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of xyxue
Sent: Thursday, April 19, 2018 4:02 PM
To: Kingwel Xie ; nranns 
Cc: vpp-dev 
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi,

Thank you all for your help. I've learned a lot from your discussion. There are
some questions I'd like to ask for advice on:

About the solution for the 3rd suggestion: can you commit this part, or tell us
how to handle it?


Patch 10216 is the solution for 'gtpu geneve vxlan vxlan-gre'. When we create a
gtpu tunnel, vpp adds a 'virtual node', but when we create mpls and gre tunnels,
vpp adds a 'true node' for transmission.
We can avoid the gtpu 'virtual node' but not the 'true node'. Is there any
solution for mpls and gre?

Thanks,
Xyxue


From: Kingwel Xie<mailto:kingwel@ericsson.com>
Date: 2018-04-19 13:44
To: Neale Ranns (nranns)<mailto:nra...@cisco.com>; 
薛欣颖<mailto:xy...@fiberhome.com>
CC: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] questions in configuring tunnel
Thanks for the comments. Please see mine in line.


From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Wednesday, April 18, 2018 9:18 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>; 
xyxue mailto:xy...@fiberhome.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi Kingwei,

Thank you for your analysis. Some comments inline (on subjects I know a bit 
about ☺ )

Regards,
neale

From: Kingwel Xie mailto:kingwel@ericsson.com>>
Date: Wednesday, 18 April 2018 at 13:49
To: "Neale Ranns (nranns)" mailto:nra...@cisco.com>>, xyxue 
mailto:xy...@fiberhome.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] questions in configuring tunnel

Hi,

As we understand, this patch would bypass the node replication, so that adding 
tunnel would not cause main thread to wait for workers  synchronizing the nodes.

However, in addition to that, you have to do more things to be able to add 40k 
or more tunnels in a predictable time period. Here is what we did for adding 2M 
gtp tunnels, for your reference. Mpls tunnel should be pretty much the same.


  1.  Don’t call fib_entry_child_add after adding fib entry to the tunnel 
(fib_table_entry_special_add ). This will create a linked list for all child 
nodes belonged to the fib entry pointed to the tunnel endpoint. As a result, 
adding tunnel would become slower and slower. BTW, it is not a good fix, but it 
works.
  #if 0
  t->sibling_index = fib_entry_child_add
   

Re: [vpp-dev] mheap performance issue and fixup

2018-04-19 Thread Kingwel Xie
Get it. Will look into it. It will take a few days…

I’ll ask someone in the team to commit the code, then ask for your review.

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Thursday, April 19, 2018 4:30 PM
To: Kingwel Xie ; Damjan Marion (damarion) 

Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] mheap performance issue and fixup

Hi Kingwei,

The instructions are here:
  
https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code#Pushing

you can also file a bug report here:
  https://jira.fd.io
but we don’t insist on bug reports when making changes to code on the master 
branch.

Regards,
Neale


From: mailto:vpp-dev@lists.fd.io>> on behalf of Kingwel 
Xie mailto:kingwel@ericsson.com>>
Date: Thursday, 19 April 2018 at 03:19
To: "Damjan Marion (damarion)" mailto:damar...@cisco.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Hi Damjan,

We will do it ASAP. Actually, we are quite new to vPP and don't yet know how to
file a bug report or contribute code.

Regards,
Kingwel

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan Marion
Sent: Wednesday, April 18, 2018 11:30 PM
To: Kingwel Xie mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] mheap performance issue and fixup

Dear Kingwel,

Thank you for your email. It will be really appreciated if you can submit your 
changes to gerrit, preferably each point in separate patch.
That will be best place to discuss those changes...

Thanks in Advance,

--
Damjan

On 16 Apr 2018, at 10:13, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi all,

We recently worked on GTPU tunnel and our target is to create 2M tunnels. It is 
not as easy as it looks like, and it took us quite some time to figure it out. 
The biggest problem we found is about mheap, which as you know is the low layer 
memory management function of vPP. We believe it makes sense to share what we 
found and what we’ve done to improve the performance of mheap.

First of all, mheap is fast. It has a well-designed small-object cache and
multi-level free lists to speed up get/put. However, as discussed on this
mailing list before, it has a performance issue when dealing with
align/align_offset allocations. We traced the problem to the pointer ‘rewrite’
in gtp_tunnel_t. This rewrite is a vector that must be aligned to a 64B cache
line, hence with a 4-byte align offset. We realized that the free list must be
very long, meaning many mheap_elts, but unfortunately it has no element that
fits all 3 prerequisites: size, align, and align offset. In this case, each
allocation has to traverse all elements until it reaches the end of the list.
As a result, you might observe each allocation is greater than 10 clocks/call
with ‘show memory verbose’. It indicates the allocation takes too long, while it
should be 200~300 clocks/call in general. You should also have noticed that
‘per-attempt’ is quite high, even more than 100.

The fix is straightforward: as discussed on this mailing list before, allocate
‘rewrite’ from a pool instead of from mheap. Frankly speaking, that looks like a
workaround rather than a real fix, so we spent some time fixing the problem
thoroughly. The idea is to add a few more bytes to the originally required block
size so that mheap always looks in a bigger free list, where a suitable block
can most likely be located right away. Well, now the problem becomes: how big
should this extra size be? It should be at least align+align_offset, which is
not hard to understand, but after careful analysis we think it is better like
this, see the code below:

Mheap.c:545
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

The reason of extra sizeof(mheap_elt_t) is to avoid lo_free_size is too small 
to hold a complete free element. You will understand it if you really know how 
mheap_get_search_free_bin is working. I am not going to go through the detail 
of it. In short, every lookup in free list will always locate a suitable 
element, in other words, the hit rate of free list will be almost 100%, and the 
‘per-attempt’ will be always around 1. The test result looks very promising, 
please see below after adding 2M gtpu tunnels and 2M routing entries:

Thread 0 vpp_main
13689507 objects, 3048367k of 3505932k used, 243663k free, 243656k reclaimed, 
106951k overhead, 4194300k capacity
  alloc. from small object cache: 47325868 hits 65271210 attempts (72.51%) 
replacements 8266122
  alloc. from free-list: 21879233 attempts, 21877898 hits (99.99%), 21882794 
considered (per-attempt 1.00)
  alloc. low

Re: [vpp-dev] questions in configuring tunnel

2018-04-18 Thread Kingwel Xie
Thanks for the comments. Please see mine in line.


From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Wednesday, April 18, 2018 9:18 PM
To: Kingwel Xie ; xyxue 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi Kingwei,

Thank you for your analysis. Some comments inline (on subjects I know a bit 
about ☺ )

Regards,
neale

From: Kingwel Xie mailto:kingwel@ericsson.com>>
Date: Wednesday, 18 April 2018 at 13:49
To: "Neale Ranns (nranns)" mailto:nra...@cisco.com>>, xyxue 
mailto:xy...@fiberhome.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] questions in configuring tunnel

Hi,

As we understand, this patch would bypass the node replication, so that adding 
tunnel would not cause main thread to wait for workers  synchronizing the nodes.

However, in addition to that, you have to do more things to be able to add 40k 
or more tunnels in a predictable time period. Here is what we did for adding 2M 
gtp tunnels, for your reference. Mpls tunnel should be pretty much the same.


  1.  Don’t call fib_entry_child_add after adding fib entry to the tunnel 
(fib_table_entry_special_add ). This will create a linked list for all child 
nodes belonged to the fib entry pointed to the tunnel endpoint. As a result, 
adding tunnel would become slower and slower. BTW, it is not a good fix, but it 
works.
  #if 0
  t->sibling_index = fib_entry_child_add
(t->fib_entry_index, gtm->fib_node_type, t - gtm->tunnels);
  #endif

[nr] if you skip this then the tunnels are not part of the FIB graph and hence 
any updates in the forwarding to the tunnel’s destination will go unnoticed and 
hence you potentially black hole the tunnel traffic indefinitely (since the 
tunnel is not re-stacked). It is a linked list, but apart from the pool 
allocation of the list element, the list element insertion is O(1), no?
[kingwel] You are right that the update will not be noticed, but we think it is 
acceptable for a p2p tunnel interface. The list element itself is ok when being 
inserted, but the following restack operation will walk through all inserted 
elements. This is the point I’m talking about.


  1.  The bihash for Adj_nbr. Each tunnel interface would create one bihash 
which by default is 32MB, mmap and memset then. Typically you don’t need that 
many adjacencies for a p2p tunnel interface. We change the code to use a common 
heap for all p2p interfaces

[nr] if you would push these changes upstream, I would be grateful.
[kingwel] The fix is quite ugly. Let’s see what we can do to make it better.


  1.  As mentioned in my email, rewrite requires cache line alignment, which 
mheap cannot handle very well. Mheap might be super slow when you add too many 
tunnels.
  2.  In vl_api_clnt_process, make sleep_time always 100us. This is to avoid 
main thread yielding to linux_epoll_input_inline 10ms wait time. This is not a 
perfect fix either. But if don’t do this, probably each API call would probably 
have to wait for 10ms until main thread has chance to polling API events.
  3.  Be careful with the counters. It would eat up your memory very quick. 
Each counter will be expanded to number of thread multiply number of tunnels. 
In other words, 1M tunnels means 1M x 8 x 8B = 64MB, if you have 8 workers. The 
combined counter will take double size because it has 16 bytes. Each interface 
has 9 simple and 2 combined counters. Besides, load_balance_t and adjacency_t 
also have some counters. You will have at least that many objects if you have 
that many interfaces. The solution is simple – to make a dedicated heap for all 
counters.

[nr] this would also be a useful addition to the upstream
[kingwel] will do later.


  1.  We also did some other fixes to speed up memory allocation, e.g.,
pre-allocating a big enough pool for gtpu_tunnel_t

[nr] I understand why you would do this and knobs in the startup.conf to enable 
might be a good approach, but for general consumption, IMHO, it’s too specific 
– others may disagree.
[kingwel] agree☺

To be honest, it is not easy. It took us quite some time to figure it out. In
the end, we managed to add 2M tunnels & 2M routes in 250s.

Hope it helps.

Regard,
Kingwel


From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
[mailto:vpp-dev@lists.fd.io] On Behalf Of Neale Ranns
Sent: Wednesday, April 18, 2018 4:33 PM
To: xyxue mailto:xy...@fiberhome.com>>; Kingwel Xie 
mailto:kingwel@ericsson.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi Xyxue,

Try applying the changes in this patch:
   https://gerrit.fd.io/r/#/c/10216/
to MPLS tunnels. Please contribute any changes back to the community so we can 
all benefit.

Regards,
Neale


From: mailto:vpp-dev@lists.fd.io>> 

Re: [vpp-dev] mheap performance issue and fixup

2018-04-18 Thread Kingwel Xie
Hi Damjan,

We will do it ASAP. Actually, we are quite new to vPP and don't yet know how to
file a bug report or contribute code.

Regards,
Kingwel

From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Damjan 
Marion
Sent: Wednesday, April 18, 2018 11:30 PM
To: Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] mheap performance issue and fixup

Dear Kingwel,

Thank you for your email. It will be really appreciated if you can submit your 
changes to gerrit, preferably each point in separate patch.
That will be best place to discuss those changes...

Thanks in Advance,

--
Damjan

On 16 Apr 2018, at 10:13, Kingwel Xie 
mailto:kingwel@ericsson.com>> wrote:

Hi all,

We recently worked on GTPU tunnel and our target is to create 2M tunnels. It is 
not as easy as it looks like, and it took us quite some time to figure it out. 
The biggest problem we found is about mheap, which as you know is the low layer 
memory management function of vPP. We believe it makes sense to share what we 
found and what we’ve done to improve the performance of mheap.

First of all, mheap is fast. It has a well-designed small-object cache and
multi-level free lists to speed up get/put. However, as discussed on this
mailing list before, it has a performance issue when dealing with
align/align_offset allocations. We traced the problem to the pointer 'rewrite'
in gtp_tunnel_t. This rewrite is a vector that must be aligned to a 64B cache
line, hence with a 4-byte align offset. We realized that the free list must be
very long, meaning many mheap_elts, but unfortunately it has no element that
fits all 3 prerequisites: size, align, and align offset. In this case, each
allocation has to traverse all elements until it reaches the end of the list.
As a result, you might observe each allocation is greater than 10 clocks/call
with 'show memory verbose'. It indicates the allocation takes too long, while it
should be 200~300 clocks/call in general. You should also have noticed that
'per-attempt' is quite high, even more than 100.

The fix is straightforward: as discussed on this mailing list before, allocate
'rewrite' from a pool instead of from mheap. Frankly speaking, that looks like a
workaround rather than a real fix, so we spent some time fixing the problem
thoroughly. The idea is to add a few more bytes to the originally required block
size so that mheap always looks in a bigger free list, where a suitable block
can most likely be located right away. Well, now the problem becomes: how big
should this extra size be? It should be at least align+align_offset, which is
not hard to understand, but after careful analysis we think it is better like
this, see the code below:

Mheap.c:545
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

The reason of extra sizeof(mheap_elt_t) is to avoid lo_free_size is too small 
to hold a complete free element. You will understand it if you really know how 
mheap_get_search_free_bin is working. I am not going to go through the detail 
of it. In short, every lookup in free list will always locate a suitable 
element, in other words, the hit rate of free list will be almost 100%, and the 
‘per-attempt’ will be always around 1. The test result looks very promising, 
please see below after adding 2M gtpu tunnels and 2M routing entries:

Thread 0 vpp_main
13689507 objects, 3048367k of 3505932k used, 243663k free, 243656k reclaimed, 
106951k overhead, 4194300k capacity
  alloc. from small object cache: 47325868 hits 65271210 attempts (72.51%) 
replacements 8266122
  alloc. from free-list: 21879233 attempts, 21877898 hits (99.99%), 21882794 
considered (per-attempt 1.00)
  alloc. low splits: 13355414, high splits: 512984, combined: 281968
  alloc. from vector-expand: 81907
  allocs: 69285673 276.00 clocks/call
  frees: 55596166 173.09 clocks/call
Free list:
bin 3:
20(82220170 48)
total 1
bin 273:
28340k(80569efc 60)
total 1
bin 276:
215323k(8c88df6c 44)
total 1
Total count in free bin: 3

You can see, as pointed out before, the hit rate is very high, >99.9%, and 
per-attempt is ~1. Furthermore, the total elements in free list is only 3.

Apart from what we discussed above, we also made some other improvements/bug
fixes to mheap:


  1.  Bug fix: macros MHEAP_ELT_OVERHEAD_BYTES & MHEAP_MIN_USER_DATA_BYTES are 
wrongly defined. In fact MHEAP_ELT_OVERHEAD_BYTES should be (STRUCT_OFFSET_OF 
(mheap_elt_t, user_data))
  2.  mheap_bytes_overhead is wrongly calculating the total overhead – should 
be number of elements * MHEAP_ELT_OVERHEAD_BYTES
  3.  Do not make an element if hi_free_size is smaller than 4 times of 
MHEAP_MIN_USER_DATA_BYTES. This is to avoid memory fragmentation
  4.  Bug fix: register_node.c:336 is wrongly using vector memory,  should be 
like this: clib_mem_is_heap_object (vec_header (

Re: [vpp-dev] questions in configuring tunnel

2018-04-18 Thread Kingwel Xie
Hi,

As we understand it, this patch bypasses node replication, so that adding a
tunnel does not cause the main thread to wait for the workers to synchronize the
node graph.

However, in addition to that, you have to do more to be able to add 40k or more
tunnels in a predictable time. Here is what we did to add 2M GTP tunnels, for
your reference. MPLS tunnels should be pretty much the same.


  1.  Don’t call fib_entry_child_add after adding the fib entry for the tunnel
(fib_table_entry_special_add). This creates a linked list of all the child nodes
belonging to the fib entry that points to the tunnel endpoint; as a result,
adding tunnels becomes slower and slower. BTW, it is not a good fix, but it
works.
  #if 0
  t->sibling_index = fib_entry_child_add
(t->fib_entry_index, gtm->fib_node_type, t - gtm->tunnels);
  #endif


  2.  The bihash for Adj_nbr. Each tunnel interface creates one bihash, which by
default is 32MB, mmapped and memset at creation. Typically you don’t need that
many adjacencies for a p2p tunnel interface, so we changed the code to use a
common heap for all p2p interfaces.
  3.  As mentioned in my email, rewrite requires cache-line alignment, which
mheap cannot handle very well; mheap can be super slow when you add too many
tunnels.
  4.  In vl_api_clnt_process, make sleep_time always 100us. This is to avoid the
main thread yielding to linux_epoll_input_inline’s 10ms wait. It is not a
perfect fix either, but without it each API call will probably have to wait up
to 10ms before the main thread gets a chance to poll API events.
  5.  Be careful with the counters; they can eat up your memory very quickly.
Each counter is expanded to (number of threads) x (number of tunnels) entries.
In other words, 1M tunnels means 1M x 8 x 8B = 64MB per simple counter if you
have 8 workers, and a combined counter takes double that because each entry is
16 bytes. Each interface has 9 simple and 2 combined counters. Besides,
load_balance_t and adjacency_t also have counters, and you will have at least
that many objects if you have that many interfaces. The solution is simple: make
a dedicated heap for all counters (a back-of-envelope sketch follows after this
list).
  6.  We also did some other fixes to speed up memory allocation, e.g.,
pre-allocating a big enough pool for gtpu_tunnel_t.
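
As a back-of-envelope check of the numbers in items 2 and 5, here is a small,
self-contained program; the thread/tunnel counts and counter sizes are just the
assumptions stated above, nothing more:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t n_threads = 8, n_tunnels = 1ull << 20;       /* ~1M tunnels */

  /* item 5: per-counter memory, 8B simple entries, 16B combined entries */
  uint64_t simple_b = n_threads * n_tunnels * 8;
  uint64_t combined_b = n_threads * n_tunnels * 16;
  printf ("one simple counter:   %llu MB\n",
          (unsigned long long) (simple_b >> 20));
  printf ("one combined counter: %llu MB\n",
          (unsigned long long) (combined_b >> 20));

  /* 9 simple + 2 combined counters per interface */
  uint64_t per_if_b = 9 * simple_b + 2 * combined_b;
  printf ("all interface counters: %llu MB\n",
          (unsigned long long) (per_if_b >> 20));

  /* item 2: a private 32MB adj_nbr bihash per tunnel interface, at 2M tunnels */
  uint64_t bihash_b = (uint64_t) 2 * n_tunnels * (32ull << 20);
  printf ("per-tunnel 32MB bihash at 2M tunnels: %llu GB\n",
          (unsigned long long) (bihash_b >> 30));
  return 0;
}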

To be honest, it is not easy. It took us quite some time to figure it out. In
the end, we managed to add 2M tunnels & 2M routes in 250s.

Hope it helps.

Regard,
Kingwel


From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Neale Ranns
Sent: Wednesday, April 18, 2018 4:33 PM
To: xyxue ; Kingwel Xie 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] questions in configuring tunnel

Hi Xyxue,

Try applying the changes in this patch:
   https://gerrit.fd.io/r/#/c/10216/
to MPLS tunnels. Please contribute any changes back to the community so we can 
all benefit.

Regards,
Neale


From: mailto:vpp-dev@lists.fd.io>> on behalf of xyxue 
mailto:xy...@fiberhome.com>>
Date: Wednesday, 18 April 2018 at 09:48
To: Xie mailto:kingwel@ericsson.com>>
Cc: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: [vpp-dev] questions in configuring tunnel


Hi,

We are testing MPLS tunnels. The problems below appear in our configuration:
1. Configuring one tunnel adds two nodes (this leads to very high memory
consumption).
2. The more nodes there are, the more time vlib_node_runtime_update and the node
info traversal take.

When we configured 40 thousand MPLS tunnels, the configuration time was 10+
minutes, and we ran out of memory.
How can you configure 2M gtpu tunnels? Can you share the configuration speed and
the memory usage?

Thanks,
Xyxue




[vpp-dev] mheap performance issue and fixup

2018-04-16 Thread Kingwel Xie
Hi all,

We recently worked on GTPU tunnels, and our target was to create 2M of them. It
is not as easy as it looks, and it took us quite some time to figure it out. The
biggest problem we found is in mheap, which, as you know, is the low-level
memory management layer of vPP. We believe it makes sense to share what we found
and what we've done to improve the performance of mheap.

First of all, mheap is fast. It has a well-designed small-object cache and
multi-level free lists to speed up get/put. However, as discussed on this
mailing list before, it has a performance issue when dealing with
align/align_offset allocations. We traced the problem to the pointer 'rewrite'
in gtp_tunnel_t. This rewrite is a vector that must be aligned to a 64B cache
line, hence with a 4-byte align offset. We realized that the free list must be
very long, meaning many mheap_elts, but unfortunately it has no element that
fits all 3 prerequisites: size, align, and align offset. In this case, each
allocation has to traverse all elements until it reaches the end of the list.
As a result, you might observe each allocation is greater than 10 clocks/call
with 'show memory verbose'. It indicates the allocation takes too long, while it
should be 200~300 clocks/call in general. You should also have noticed that
'per-attempt' is quite high, even more than 100.

The fix is straightforward: as discussed on this mailing list before, allocate
'rewrite' from a pool instead of from mheap. Frankly speaking, that looks like a
workaround rather than a real fix, so we spent some time fixing the problem
thoroughly. The idea is to add a few more bytes to the originally required block
size so that mheap always looks in a bigger free list, where a suitable block
can most likely be located right away. Well, now the problem becomes: how big
should this extra size be? It should be at least align+align_offset, which is
not hard to understand, but after careful analysis we think it is better like
this, see the code below:

Mheap.c:545
  word modifier = (align > MHEAP_USER_DATA_WORD_BYTES ? align + align_offset + 
sizeof(mheap_elt_t) : 0);
  bin = user_data_size_to_bin_index (n_user_bytes + modifier);

The reason for the extra sizeof(mheap_elt_t) is to avoid lo_free_size being too
small to hold a complete free element. You will understand this if you know how
mheap_get_search_free_bin works; I am not going to go through the details here.
In short, every lookup in the free list will locate a suitable element, in other
words the free-list hit rate will be almost 100% and 'per-attempt' will always
be around 1. The test results look very promising; please see below, after
adding 2M gtpu tunnels and 2M routing entries:

Thread 0 vpp_main
13689507 objects, 3048367k of 3505932k used, 243663k free, 243656k reclaimed, 
106951k overhead, 4194300k capacity
  alloc. from small object cache: 47325868 hits 65271210 attempts (72.51%) 
replacements 8266122
  alloc. from free-list: 21879233 attempts, 21877898 hits (99.99%), 21882794 
considered (per-attempt 1.00)
  alloc. low splits: 13355414, high splits: 512984, combined: 281968
  alloc. from vector-expand: 81907
  allocs: 69285673 276.00 clocks/call
  frees: 55596166 173.09 clocks/call
Free list:
bin 3:
20(82220170 48)
total 1
bin 273:
28340k(80569efc 60)
total 1
bin 276:
215323k(8c88df6c 44)
total 1
Total count in free bin: 3

You can see, as pointed out before, the hit rate is very high, >99.9%, and 
per-attempt is ~1. Furthermore, the total elements in free list is only 3.

Apart from what we discussed above, we also made some other improvements/bug
fixes to mheap:


  1.  Bug fix: the macros MHEAP_ELT_OVERHEAD_BYTES & MHEAP_MIN_USER_DATA_BYTES
are wrongly defined. In fact MHEAP_ELT_OVERHEAD_BYTES should be
(STRUCT_OFFSET_OF (mheap_elt_t, user_data))
  2.  mheap_bytes_overhead wrongly calculates the total overhead; it should be
the number of elements * MHEAP_ELT_OVERHEAD_BYTES
  3.  Do not make a new element if hi_free_size is smaller than 4 times
MHEAP_MIN_USER_DATA_BYTES. This avoids memory fragmentation
  4.  Bug fix: register_node.c:336 wrongly uses vector memory; it should be:
clib_mem_is_heap_object (vec_header (r->name, 0))
  5.  Bug fix: dpo_stack_from_node in dpo.c: memory leak of parent_indices
  6.  Some fixes and improvements to format_mheap to show more information about
the heap

The code including all fixes is tentatively in our private code base. It can be 
of course shared if wanted.

Really appreciate any comments!

Regards,
Kingwel