Hi all,

TLDR: when running tests vs vpp with multiple workers, roughly 25% of
tests fail or crash vpp. It looks like buffer management is still not
completely thread safe.

I've pushed a work-in-progress modification to make test which runs the
tests against both single-thread and multiple-worker vpp. There are
quite a few failures and/or coredumps when running against
multiple-worker vpp.

These test cases are failing at this time:

ACLPluginConnTestCase
BFD4TestCase
BFDFIBTestCase
TestDHCP
Datapath
DisableFP
DisableIPFIX
Flowprobe
ReenableFP
ReenableIPFIX
TestGRE
TestIPv4FibCrud
TestIp4VrfMultiInst
TestIP6VrfMultiInst
TestL2fib
TestL2bdArpTerm
TestL2bdMultiInst
TestLB
TestNAT64
TestSNAT
TestSpan
TestVxlanGpe

Based on the TestSpan crash, it seems that there are still some thread
safety issues in the buffer management:

#2  0x0000000000406d1e in os_exit (code=code@entry=1)
    at /home/ksekera/vpp/build-data/../src/vpp/vnet/main.c:287
#3  0x00007f139af0c2fa in unix_signal_handler (signum=<optimized out>, si=<optimized out>, uc=<optimized out>)
    at /home/ksekera/vpp/build-data/../src/vlib/unix/main.c:118
#4  <signal handler called>
#5  mheap_put (v=0x7f1356bdf000, uoffset=18446744073709549696)
    at /home/ksekera/vpp/build-data/../src/vppinfra/mheap.c:797
#6  0x00007f139aeb6574 in vlib_buffer_add_to_free_list (do_init=1 '\001', buffer_index=<optimized out>, f=0x7f1359d0f780, vm=0x7f139b1252e0 <vlib_global_main>)
    at /home/ksekera/vpp/build-data/../src/vlib/buffer_funcs.h:861
#7  vlib_buffer_free_inline (follow_buffer_next=1, n_buffers=256, buffers=<optimized out>, vm=0x7f139b1252e0 <vlib_global_main>)
    at /home/ksekera/vpp/build-data/../src/vlib/buffer.c:705
#8  vlib_buffer_free_internal (vm=0x7f139b1252e0 <vlib_global_main>, buffers=0x7f135b504110, n_buffers=<optimized out>)
    at /home/ksekera/vpp/build-data/../src/vlib/buffer.c:730
#9  0x00007f139aaba427 in vlib_buffer_free (n_buffers=256, buffers=<optimized out>, vm=0x7f139b1252e0 <vlib_global_main>)
    at /home/ksekera/vpp/build-data/../src/vlib/buffer_funcs.h:327
#10 pg_output (vm=0x7f139b1252e0 <vlib_global_main>, node=<optimized out>, frame=<optimized out>)
    at /home/ksekera/vpp/build-data/../src/vnet/pg/output.c:83

(gdb)
#5  mheap_put (v=0x7f1356bdf000, uoffset=18446744073709549696)
    at /home/ksekera/vpp/build-data/../src/vppinfra/mheap.c:797
797       if (e->n_user_data != n->prev_n_user_data)
(gdb) p *n
Cannot access memory at address 0x7f1556bde87c
(gdb)
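
A side note on the uoffset value in frame #5: it is very close to 2^64,
so it may be easier to read as a signed number. Below is a quick Python
check. The helper name is made up for illustration, and this is only
arithmetic on the value gdb printed, not an analysis of the mheap
internals.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set means negative
        return value - (1 << 64)
    return value


# Value copied from the mheap_put frame in the backtrace above.
UOFFSET = 18446744073709549696

if __name__ == "__main__":
    print(as_signed_64(UOFFSET))    # prints -1920
```

Read this way, the offset passed to mheap_put is a small negative
number, which would be consistent with it having been computed from bad
data. That is an interpretation only; I have not confirmed it.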

Here is the patch set if you want to try it out:

https://gerrit.fd.io/r/#/c/8090

It's still a bit clunky, as the testing is done in two phases: first
the full suite is run vs single-thread vpp, then vs multiple-worker vpp.
It's not straightforward to do this in one run, i.e. instead of running
A, B, C vs single and then A, B, C vs multi, to run A (vs single), A
(vs multi), B (vs single), B (vs multi), C (vs single), C (vs multi).
That's why for now it's implemented this way.
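
To make the difference between the two orderings concrete, here is a
small Python sketch. The test names and both helper functions are purely
illustrative and are not taken from the patch set.

```python
from itertools import product

TESTS = ["A", "B", "C"]          # placeholder test names
MODES = ["single", "multi"]      # single-thread vs multiple-worker vpp


def two_phase_order(tests, modes):
    """Whole suite against one mode, then the whole suite against the next."""
    return [(test, mode) for mode, test in product(modes, tests)]


def interleaved_order(tests, modes):
    """Each test against every mode before moving on to the next test."""
    return list(product(tests, modes))


if __name__ == "__main__":
    print("two-phase:  ", two_phase_order(TESTS, MODES))
    print("interleaved:", interleaved_order(TESTS, MODES))
```

The first ordering is what the patch does today; the second is the one
that would be nicer to have.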

If you want to skip the single-thread tests to speed up your own
testing, run it like this:

env VPP_TEST_SKIP_SINGLE_THREAD=y make test

Currently, the number of worker threads is set to the core count minus
two, with a cap of 8. A higher number causes the ACL plugin to freak out
(memory allocation failure) and vpp refuses to start, ruining the day
for everybody.
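
The worker count rule can be sketched in Python roughly as follows. The
function and constant names are hypothetical, and clamping at zero for
machines with fewer than three cores is an assumption of this sketch,
not something the patch is confirmed to do.

```python
import os

MAX_WORKERS = 8       # the cap mentioned above
RESERVED_CORES = 2    # the "minus two" mentioned above


def worker_count(core_count):
    """Core count minus two, capped at MAX_WORKERS.

    Assumption: never return a negative number on small machines.
    """
    return max(0, min(core_count - RESERVED_CORES, MAX_WORKERS))


if __name__ == "__main__":
    cores = os.cpu_count() or 1   # os.cpu_count() may return None
    print(f"{cores} cores -> {worker_count(cores)} workers")
```

For example, a 4-core machine would get 2 workers and a 16-core machine
would get 8.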

Regards,
Klement