[vpp-dev] Build vpp on openwrt

2017-09-08 Thread yug...@telincn.com
Hi all,
Has anyone tried to build VPP on OpenWrt?

Regards,
Ewan



yug...@telincn.com
___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] FD.io Jenkins Maintenance: 2017-09-07 @ 0415 UTC (9:15am PDT)

2017-09-08 Thread Vanessa Valderrama
While the build timeouts issue appears to be resolved, there does seem
to be an increase in build times for various VPP jobs.  We are aware of
and looking into this issue.

Thank you,
Vanessa


On 09/07/2017 11:27 AM, Vanessa Valderrama wrote:
>
> Change is complete.  The new nodes appear to be spinning up
> successfully.  We'll be monitoring jobs throughout the day.  Please
> report issues via IRC fdio-infra.
>
> Thank you,
> Vanessa
>
> On 09/07/2017 11:18 AM, Vanessa Valderrama wrote:
>>
>> Starting change now.
>>
>>
>> On 09/07/2017 10:51 AM, Vanessa Valderrama wrote:
>>>
>>> What:
>>>
>>> LF is switching VPP jobs to use new Jenkins nodes with dedicated
>>> core instances
>>>
>>> When:
>>>
>>> 2017-09-07 @ 0415 UTC (9:15am PDT)
>>>
>>> Impact:
>>>
>>> No restart is required for this change.  Once the change is made new
>>> instances will spin up on the new nodes.
>>>
>>>
>>> Why:
>>>
>>> Various FD.io projects have been intermittently experiencing build
>>> timeout issues.  Dedicated core instances should alleviate these
>>> issues by reducing CPU oversubscription.
>>>
>>>
>>
>


Re: [vpp-dev] mheap performance

2017-09-08 Thread Dave Barach (dbarach)
Dear Jacek,

 

Oh, heck, we don’t need to use a sledgehammer to kill a fly. It will take five 
minutes to fix this problem. Copying Ole Troan for his input, and / or to 
simply fix the problem as follows:

 

Make a set of pools whose elements are n * CLIB_CACHE_LINE_BYTES in size. It's 
easy enough to dynamically create a fresh pool if [all of a sudden] you need k 
* CLIB_CACHE_LINE_BYTES.

 

Allocate d->rules from the appropriate pool by rounding the request up to a 
multiple of CLIB_CACHE_LINE_BYTES. It's also worth asking whether the alignment 
constraint on d->rules is necessary in the first place. Have you tried dropping 
the alignment constraint?
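For illustration, the size-class-pool idea could be sketched outside VPP roughly as below. This is a simplified, hypothetical sketch in plain C with a per-class free list; inside VPP itself one would presumably build it on vppinfra's pool macros (e.g. pool_get_aligned) rather than aligned_alloc.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_LINE  64
#define NUM_CLASSES 16    /* size classes: 1..16 cache lines */
#define SLAB_ELTS   1024  /* elements carved per slab */

/* One free list per size class; class k serves (k+1)*CACHE_LINE-byte
 * objects.  Allocation and free are O(1) pushes/pops, so there is no
 * hole scanning regardless of alignment. */
static void *free_lists[NUM_CLASSES];

static size_t size_class(size_t size)
{
    return (size + CACHE_LINE - 1) / CACHE_LINE - 1;  /* round up */
}

static void *class_alloc(size_t size)
{
    size_t k = size_class(size);
    assert(k < NUM_CLASSES);
    if (!free_lists[k]) {
        /* Grow the pool: one cache-line-aligned slab, threaded into
         * the class's free list. */
        size_t elt = (k + 1) * CACHE_LINE;
        char *slab = aligned_alloc(CACHE_LINE, elt * SLAB_ELTS);
        for (size_t i = 0; i < SLAB_ELTS; i++) {
            *(void **)(slab + i * elt) = free_lists[k];
            free_lists[k] = slab + i * elt;
        }
    }
    void *p = free_lists[k];
    free_lists[k] = *(void **)p;
    return p;
}

static void class_free(void *p, size_t size)
{
    size_t k = size_class(size);
    *(void **)p = free_lists[k];
    free_lists[k] = p;
}
```

The slabs are never returned to the system in this sketch; a real implementation would track them and would free to the correct class by element size.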

 

Thanks… Dave

 

From: Jacek Siuda [mailto:j...@semihalf.com] 
Sent: Friday, September 8, 2017 10:39 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io; Michał Dubiel 
Subject: Re: [vpp-dev] mheap performance

 

Hi Dave,

The perf backtrace (taken from "control-only" lcore 0) is as follows:
-  91.87% vpp_main  libvppinfra.so.0.0.0[.] mheap_get_aligned
   - mheap_get_aligned
  - 99.48% map_add_del_psid
   vl_api_map_add_del_rule_t_handler
   vl_msg_api_handler_with_vm_node
   memclnt_process
   vlib_process_bootstrap
   clib_calljmp

Using DPDK's rte_malloc_socket(), CPU consumption drops to around 0.5%.

From my (somewhat brief) mheap code analysis, it looks like mheap might not 
take alignment into account when looking for free space in which to allocate a 
structure. So, in my case, when I keep allocating 16B objects with 64B 
alignment, it examines each hole left over from a previous object's allocation 
alignment and only then realizes the hole cannot be used because of the 
alignment. But of course I might be wrong and the root cause is entirely 
elsewhere...

In my test, I'm just adding 300,000 tunnels (one domain+one rule).

Unfortunately, rte_malloc() provides only aligned memory allocation, not 
aligned-at-offset. Theoretically we could provide a wrapper around it, but that 
would need some careful coding and a lot of testing. I made an attempt to 
quickly replace mheap globally, but of course it ended up in utter failure.
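For what it's worth, such an aligned-at-offset wrapper might look roughly like the following sketch: return p such that (p + offset) is align-aligned, over-allocating and stashing the base pointer for free. This is a hypothetical illustration; malloc() stands in for rte_malloc(), which would be substituted the same way.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return p such that (p + offset) is `align`-aligned, mimicking
 * mheap's aligned-at-offset semantics on top of a plain allocator. */
static void *alloc_aligned_at_offset(size_t size, size_t align, size_t offset)
{
    /* Worst-case adjustment (align - 1) plus room to stash the base. */
    char *raw = malloc(size + align + sizeof(void *));
    if (!raw)
        return NULL;
    char *candidate = raw + sizeof(void *);
    uintptr_t misalign = ((uintptr_t)candidate + offset) % align;
    char *ret = candidate + (misalign ? align - misalign : 0);
    memcpy(ret - sizeof(void *), &raw, sizeof raw);  /* remember base */
    return ret;
}

static void free_aligned_at_offset(void *p)
{
    if (p) {
        void *raw;
        memcpy(&raw, (char *)p - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

The extra align bytes per object are the cost of the shim; whether that overhead beats teaching the underlying allocator about offsets would need measurement.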

 

Right now, I have added the concept of an external allocator to clib (via 
function pointers); I'm enabling it only upon DPDK plugin initialization. 
However, such an approach requires using it directly instead of the clib 
alloc (e.g. I did so upon rule adding). While it does not add a dependency on 
DPDK, I'm not fully satisfied, because it would need manual replacement of all 
allocation calls. If you want, I can share the patch.
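A minimal sketch of the function-pointer approach described above (hypothetical names, not the actual patch): the allocator defaults to the C library and can be overridden once at init time, e.g. by a DPDK-backed allocator.

```c
#include <stdlib.h>

/* Pluggable allocator installed via function pointers. */
typedef struct {
    void *(*alloc)(size_t size);
    void  (*free)(void *p);
} ext_allocator_t;

static ext_allocator_t ext_allocator = { malloc, free };

/* Override the default pair, e.g. with rte_malloc/rte_free wrappers. */
static void ext_allocator_install(void *(*a)(size_t), void (*f)(void *))
{
    ext_allocator.alloc = a;
    ext_allocator.free  = f;
}

static void *ext_alloc(size_t n) { return ext_allocator.alloc(n); }
static void  ext_free(void *p)   { ext_allocator.free(p); }
```

The indirection costs one function-pointer call per allocation; the open question from the thread, replacing every clib alloc call site, remains either way.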

Best Regards,

Jacek.

 

2017-09-05 15:30 GMT+02:00 Dave Barach (dbarach):

Dear Jacek,

 

Use of the clib memory allocator is mainly historical. It’s elegant in a couple 
of ways - including built-in leak-finding - but it has been known to backfire 
in terms of performance. Individual mheaps are limited to 4gb in a [typical] 
32-bit vector length image. 

 

Note that the idiosyncratic mheap API functions “tell me how long this object 
really is” and “allocate N bytes aligned to a boundary at a certain offset” are 
used all over the place.

 

I wouldn’t mind replacing it - so long as we don’t create a hard dependency on 
the dpdk - but before we go there...: Tell me a bit about the scenario at hand. 
What are we repeatedly allocating / freeing? That’s almost never necessary...

 

Can you easily share the offending backtrace?  

 

Thanks… Dave

 

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Jacek Siuda
Sent: Tuesday, September 5, 2017 9:08 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] mheap performance

 

Hi,

I'm conducting a tunnel test using VPP (vnet) map with the following parameters:

ea_bits_len=0, psid_offset=16, psid=length, single rule for each domain; total 
number of tunnels: 300k, total number of control messages: 600k.

My problem is with simple adding tunnels. After adding more than ~150k-200k, 
performance drops significantly: first 100k is added in ~3s (on asynchronous C 
client), next 100k in another ~5s, but the last 100k takes ~37s to add; in 
total: ~45s. Python clients are performing even worse: 32 minutes(!) for 300k 
tunnels with synchronous (blocking) version and ~95s with asynchronous. The 
python clients are expected to perform a bit worse according to vpp docs, but I 
was worried by non-linear time of single tunnel addition that is visible even 
on C client.

While investigating this using perf, I found the culprit: it is the memory 
allocation done for the ip address by the rule addition request. 
The memory is allocated by clib, which is using the mheap library (~98% of 
cpu consumption).

Re: [vpp-dev] Rearrangement of graph nodes

2017-09-08 Thread Ngo Doan Lap
Hi Dave,
I'm trying to create a flexible way to build a new network app without
touching the source code.
Initially, we would develop basic/atomic nodes such as SetMac, SetVLAN,
SetIP, LookUpIP, LookUpMac, and so on.
After that, we could create a new app by using those atomic nodes to define
a graph in a file, which VPP parses to build the graph.
Does this break any VPP philosophies?
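As a rough illustration of the parsing step only (the file format and function names here are hypothetical, not actual VPP code), splitting a graph line such as "dpdk-input-->ethernet-input->change-mac" into node names might look like:

```c
#include <string.h>

#define MAX_NODES 16
#define NAME_LEN  64

/* Split a graph line on "->" arrows into node names.  Any extra
 * trailing '-' is trimmed so both "->" and "-->" arrows work; node
 * names may themselves contain hyphens. */
static int parse_graph_line(const char *line, char names[][NAME_LEN])
{
    int n = 0;
    const char *p = line;
    while (n < MAX_NODES) {
        const char *arrow = strstr(p, "->");
        size_t len = arrow ? (size_t)(arrow - p) : strlen(p);
        while (len > 0 && p[len - 1] == '-')  /* drop "-->"'s extra dash */
            len--;
        if (len >= NAME_LEN)
            len = NAME_LEN - 1;
        memcpy(names[n], p, len);
        names[n][len] = '\0';
        n++;
        if (!arrow)
            break;
        p = arrow + 2;
    }
    return n;  /* number of node names parsed */
}
```

A plugin would then resolve each name to a node and wire the arcs, which is exactly where the buffer-metadata contract between adjacent nodes becomes the hard part.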


On Fri, Sep 8, 2017 at 7:09 PM, Dave Barach (dbarach) 
wrote:

> One could do that, but what problem are you trying to solve? The data
> structures involved are not super-complicated, but what you’ve described is
> neither a beginner project nor a worthwhile project IMO.
>
>
>
> If you want to spoof MAC addresses in the L2 path, add an L2 feature node
> which does that. Generally speaking, two nodes A and B have a contract in
> terms of buffer metadata setup. Arbitrary graph rewiring would result in
> either gross or subtle malfunction.
>
>
>
> In terms of how to build a plugin: look at .../src/examples/sample-plugin.
>
>
>
>
> I maintain (sporadically) a set of emacs skeletons in .../extras/emacs. If
> you M-x eval-buffer all-skel.el, a subsequent M-x make-plugin will create a
> boilerplate plugin for you.
>
>
>
> Thanks… Dave
>
>
>
> *From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] *On
> Behalf Of *Ngo Doan Lap
> *Sent:* Thursday, September 7, 2017 10:18 PM
> *To:* vpp-dev@lists.fd.io
> *Subject:* [vpp-dev] Rearrangement of graph nodes
>
>
>
> Hi,
>
> From this page https://wiki.fd.io/view/VPP/What_is_VPP%3F
>
> There is an option to create a plugin to rearrange graph nodes.
>
> I want to write a plugin that builds graph nodes from a file, for example
>
> graph.txt:
>
> dpdk-input-->ethernet-input->change-mac
>
>
>
> I would like to know your opinion: is it possible with VPP?
>
> And if yes, can you tell me how to write a plugin to rearrange graph nodes?
>
> (I'm unable to find the example/doc to build a plugin to rearrange graph
> nodes)
>
> --
>
> Thanks and Best Regards,
> Ngo Doan Lap
>
>
>



-- 
Thanks and Best Regards,
Ngo Doan Lap
Mobile: 0977.833.757

Re: [vpp-dev] mheap performance

2017-09-08 Thread Jacek Siuda
Hi Ole,

I need roughly the number of tunnels I ended up testing with - ~300-500k.

I thought about extending generic VPP API with a (blocking?) method
accepting burst (array) of messages and returning burst of responses.

One of the challenges in map is that to configure a tunnel I need two
request messages, with the second one depending on the first's result (index).
It would be much smoother if I could just send a single message to do both
domain and rule(s) addition...
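As a sketch of that idea (a purely hypothetical message layout, not the real VPP MAP API), a combined request could carry the domain parameters and its rules inline, so the client never waits for the domain index before sending rules:

```c
#include <stdint.h>

/* Hypothetical combined message: add a MAP domain and its rules in one
 * request.  Field names are illustrative only. */
typedef struct {
    uint8_t  ip6_prefix[16];
    uint8_t  ip4_prefix[4];
    uint8_t  ea_bits_len;
    uint8_t  psid_offset;
    uint8_t  psid_length;
    uint8_t  n_rules;      /* number of rules carried inline */
    struct {
        uint8_t  ip6_dst[16];
        uint16_t psid;
    } rules[];             /* flexible array member */
} map_add_domain_and_rules_t;
```

Variable-length messages do exist elsewhere in the VPP binary API, so something in this spirit seems feasible; the exact shape would be for the MAP maintainers to judge.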

Best Regards,
Jacek.

2017-09-05 17:06 GMT+02:00 Ole Troan:

> Jacek,
>
> It's also been on my list for a while to add a better bulk add for MAP
> domains / rules.
> Any idea of the scale you are looking at here?
>
> Best regards,
> Ole
>
>
> > On 5 Sep 2017, at 15:07, Jacek Siuda  wrote:
> >
> > Hi,
> >
> > I'm conducting a tunnel test using VPP (vnet) map with the following
> parameters:
> > ea_bits_len=0, psid_offset=16, psid=length, single rule for each domain;
> total number of tunnels: 300k, total number of control messages: 600k.
> >
> > My problem is with simple adding tunnels. After adding more than
> ~150k-200k, performance drops significantly: first 100k is added in ~3s (on
> asynchronous C client), next 100k in another ~5s, but the last 100k takes
> ~37s to add; in total: ~45s. Python clients are performing even worse: 32
> minutes(!) for 300k tunnels with synchronous (blocking) version and ~95s
> with asynchronous. The python clients are expected to perform a bit worse
> according to vpp docs, but I was worried by non-linear time of single
> tunnel addition that is visible even on C client.
> >
> > While investigating this using perf, I found the culprit: it is the
> memory allocation done for ip address by rule addition request.
> > The memory is allocated by clib, which is using mheap library (~98% of
> cpu consumption). I looked into mheap and it looks a bit complicated for
> allocating a short object.
> > I've done a short experiment by replacing (in vnet/map/ only) clib
> allocation with DPDK rte_malloc() and achieved a way better performance:
> 300k tunnels in ~5-6s with the same C-client, and respectively ~70s and
> ~30-40s with Python clients. Also, I haven't noticed any negative impact on
> packet throughput with my experimental allocator.
> >
> > So, here are my questions:
> > 1) Did someone other reported performance penalties for using mheap
> library? I've searched the list archive and could not find any related
> questions.
> > 2) Why mheap library was chosen to be used in clib? Are there any
> performance benefits in some scenarios?
> > 3) Are there any (long- or short-term) plans to replace memory
> management in clib with some other library?
> > 4) I wonder, if I'd like to upstream my solution, how should I approach
> customization of memory allocation, so it would be accepted by community.
> Installable function pointers defaulting to clib?
> >
> > Best Regards,
> > Jacek Siuda.
> >
> >
> > ___
> > vpp-dev mailing list
> > vpp-dev@lists.fd.io
> > https://lists.fd.io/mailman/listinfo/vpp-dev
>
>

Re: [vpp-dev] mheap performance

2017-09-08 Thread Jacek Siuda
Hi Dave,

The perf backtrace (taken from "control-only" lcore 0) is as follows:
-  91.87% vpp_main  libvppinfra.so.0.0.0[.] mheap_get_aligned
   - mheap_get_aligned
  - 99.48% map_add_del_psid
   vl_api_map_add_del_rule_t_handler
   vl_msg_api_handler_with_vm_node
   memclnt_process
   vlib_process_bootstrap
   clib_calljmp

Using DPDK's rte_malloc_socket(), CPU consumption drops to around 0.5%.

From my (somewhat brief) mheap code analysis, it looks like mheap might not
take alignment into account when looking for free space in which to allocate
a structure. So, in my case, when I keep allocating 16B objects with 64B
alignment, it examines each hole left over from a previous object's
allocation alignment and only then realizes the hole cannot be used because
of the alignment. But of course I might be wrong and the root cause is
entirely elsewhere...

In my test, I'm just adding 300,000 tunnels (one domain+one rule).

Unfortunately, rte_malloc() provides only aligned memory allocation, not
aligned-at-offset. Theoretically we could provide a wrapper around it, but
that would need some careful coding and a lot of testing. I made an attempt
to quickly replace mheap globally, but of course it ended up in utter
failure.

Right now, I have added the concept of an external allocator to clib (via
function pointers); I'm enabling it only upon DPDK plugin initialization.
However, such an approach requires using it directly instead of the clib
alloc (e.g. I did so upon rule adding). While it does not add a dependency
on DPDK, I'm not fully satisfied, because it would need manual replacement
of all allocation calls. If you want, I can share the patch.

Best Regards,
Jacek.

2017-09-05 15:30 GMT+02:00 Dave Barach (dbarach):

> Dear Jacek,
>
>
>
> Use of the clib memory allocator is mainly historical. It’s elegant in a
> couple of ways - including built-in leak-finding - but it has been known to
> backfire in terms of performance. Individual mheaps are limited to 4gb in a
> [typical] 32-bit vector length image.
>
>
>
> Note that the idiosyncratic mheap API functions “tell me how long this
> object really is” and “allocate N bytes aligned to a boundary at a certain
> offset” are used all over the place.
>
>
>
> I wouldn’t mind replacing it - so long as we don’t create a hard
> dependency on the dpdk - but before we go there...: Tell me a bit about the
> scenario at hand. What are we repeatedly allocating / freeing? That’s
> almost never necessary...
>
>
>
> Can you easily share the offending backtrace?
>
>
>
> Thanks… Dave
>
>
>
> *From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] *On
> Behalf Of *Jacek Siuda
> *Sent:* Tuesday, September 5, 2017 9:08 AM
> *To:* vpp-dev@lists.fd.io
> *Subject:* [vpp-dev] mheap performance
>
>
>
> Hi,
>
> I'm conducting a tunnel test using VPP (vnet) map with the following
> parameters:
>
> ea_bits_len=0, psid_offset=16, psid=length, single rule for each domain;
> total number of tunnels: 300k, total number of control messages: 600k.
>
> My problem is with simple adding tunnels. After adding more than
> ~150k-200k, performance drops significantly: first 100k is added in ~3s (on
> asynchronous C client), next 100k in another ~5s, but the last 100k takes
> ~37s to add; in total: ~45s. Python clients are performing even worse: 32
> minutes(!) for 300k tunnels with synchronous (blocking) version and ~95s
> with asynchronous. The python clients are expected to perform a bit worse
> according to vpp docs, but I was worried by non-linear time of single
> tunnel addition that is visible even on C client.
>
> While investigating this using perf, I found the culprit: it is the memory
> allocation done for ip address by rule addition request.
> The memory is allocated by clib, which is using mheap library (~98% of cpu
> consumption). I looked into mheap and it looks a bit complicated for
> allocating a short object.
> I've done a short experiment by replacing (in vnet/map/ only) clib
> allocation with DPDK rte_malloc() and achieved a way better performance:
> 300k tunnels in ~5-6s with the same C-client, and respectively ~70s and
> ~30-40s with Python clients. Also, I haven't noticed any negative impact on
> packet throughput with my experimental allocator.
>
> So, here are my questions:
>
> 1) Did someone other reported performance penalties for using mheap
> library? I've searched the list archive and could not find any related
> questions.
>
> 2) Why mheap library was chosen to be used in clib? Are there any
> performance benefits in some scenarios?
>
> 3) Are there any (long- or short-term) plans to replace memory management
> in clib with some other library?
>
> 4) I wonder, if I'd like to upstream my solution, how should I approach
> customization of memory allocation, so it would be accepted by community.
> Installable function pointers defaulting to clib?
>
>
>
> Best Regards,
>
> Jacek Siuda.
>
>
>
>
>

Re: [vpp-dev] Rearrangement of graph nodes

2017-09-08 Thread Dave Barach (dbarach)
One could do that, but what problem are you trying to solve? The data 
structures involved are not super-complicated, but what you’ve described is 
neither a beginner project nor a worthwhile project IMO.

If you want to spoof MAC addresses in the L2 path, add an L2 feature node which 
does that. Generally speaking, two nodes A and B have a contract in terms of 
buffer metadata setup. Arbitrary graph rewiring would result in either gross or 
subtle malfunction.

In terms of how to build a plugin: look at .../src/examples/sample-plugin.

I maintain (sporadically) a set of emacs skeletons in .../extras/emacs. If you 
M-x eval-buffer all-skel.el, a subsequent M-x make-plugin will create a 
boilerplate plugin for you.

Thanks… Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Ngo Doan Lap
Sent: Thursday, September 7, 2017 10:18 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Rearrangement of graph nodes

Hi,
From this page https://wiki.fd.io/view/VPP/What_is_VPP%3F
There is an option to create a plugin to rearrange graph nodes.
I want to write a plugin that builds graph nodes from a file, for example
graph.txt:
dpdk-input-->ethernet-input->change-mac

I would like to know your opinion: is it possible with VPP?
And if yes, can you tell me how to write a plugin to rearrange graph nodes?
(I'm unable to find the example/doc to build a plugin to rearrange graph nodes)
--
Thanks and Best Regards,
Ngo Doan Lap
