Hi Ole,

I need roughly the number of tunnels I ended up testing with: ~300-500k.
I thought about extending the generic VPP API with a (blocking?) method accepting a burst (array) of messages and returning a burst of responses. One of the challenges in MAP is that configuring a tunnel requires two request messages, with the second depending on the first's result (the domain index). It would be much smoother if I could send a single message to do both the domain and rule addition(s)...

Best Regards,
Jacek.

2017-09-05 17:06 GMT+02:00 Ole Troan <otr...@employees.org>:
> Jacek,
>
> It's also been on my list for a while to add a better bulk add for MAP
> domains / rules.
> Any idea of the scale you are looking at here?
>
> Best regards,
> Ole
>
> > On 5 Sep 2017, at 15:07, Jacek Siuda <j...@semihalf.com> wrote:
> >
> > Hi,
> >
> > I'm conducting a tunnel test using VPP (vnet) MAP with the following
> > parameters: ea_bits_len=0, psid_offset=16, psid=length, a single rule
> > for each domain; total number of tunnels: 300000, total number of
> > control messages: 600k.
> >
> > My problem is with simply adding tunnels. After adding more than
> > ~150k-200k, performance drops significantly: the first 100k are added
> > in ~3s (with an asynchronous C client), the next 100k in another ~5s,
> > but the last 100k take ~37s; ~45s in total. The Python clients perform
> > even worse: 32 minutes(!) for 300k tunnels with the synchronous
> > (blocking) version and ~95s with the asynchronous one. The Python
> > clients are expected to perform somewhat worse according to the VPP
> > docs, but I was worried by the non-linear time of a single tunnel
> > addition, which is visible even with the C client.
> >
> > While investigating this with perf, I found the culprit: the memory
> > allocation done for the IP address by the rule addition request.
> > The memory is allocated by clib, which uses the mheap library (~98% of
> > CPU consumption). I looked into mheap and it seems rather complicated
> > for allocating a short object.
> > I've done a short experiment by replacing (in vnet/map/ only) the clib
> > allocation with DPDK rte_malloc() and achieved much better performance:
> > 300k tunnels in ~5-6s with the same C client, and ~70s and ~30-40s
> > respectively with the Python clients. Also, I haven't noticed any
> > negative impact on packet throughput with my experimental allocator.
> >
> > So, here are my questions:
> > 1) Has anyone else reported performance penalties from using the mheap
> > library? I've searched the list archive and could not find any related
> > questions.
> > 2) Why was the mheap library chosen for clib? Does it have performance
> > benefits in some scenarios?
> > 3) Are there any (long- or short-term) plans to replace the memory
> > management in clib with some other library?
> > 4) If I'd like to upstream my solution, how should I approach making
> > the memory allocation customizable so that it would be accepted by the
> > community? Installable function pointers defaulting to clib?
> >
> > Best Regards,
> > Jacek Siuda.
> >
> > _______________________________________________
> > vpp-dev mailing list
> > vpp-dev@lists.fd.io
> > https://lists.fd.io/mailman/listinfo/vpp-dev