Hello VPP experts,

When trying to use the current master branch, we get a segmentation
fault error. Here is what it looks like in gdb:

Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fedf91fe700 (LWP 21309)]
rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
rxq=0x7ffff7edea80, is_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:115
115               *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va + 4));
(gdb) bt
#0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0,
rxq=0x7ffff7edea80, is_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:115
#1  0x00007fffabbbb84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:622
#2  0x00007fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, frame=0x0)
    at vpp/src/plugins/rdma/input.c:647
#3  0x00007ffff60e3155 in dispatch_node (vm=0x7ff8a5d2f4c0,
node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
    last_time_stamp=66486783453597600) at vpp/src/vlib/main.c:1235
#4  0x00007ffff60ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0,
is_main=0) at vpp/src/vlib/main.c:1815
#5  0x00007ffff60dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0) at
vpp/src/vlib/main.c:1996
#6  0x00007ffff61345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980) at
vpp/src/vlib/threads.c:1795
#7  0x00007ffff5531954 in clib_calljmp () at
vpp/src/vppinfra/longjmp.S:123
#8  0x00007fedf91fdce0 in ?? ()
#9  0x00007ffff612cd53 in vlib_worker_thread_bootstrap_fn
(arg=0x7fffb74ea980) at vpp/src/vlib/threads.c:584
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This segmentation fault happens the same way every time I try to start
VPP.

This is on Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs
and an Intel Xeon Gold 6126 CPU.

I have looked back at recent changes and found that this problem
started with commit 4ba16a44 "misc: switch to clang-9" dated April
28. Before that we could use the master branch without this problem.

Changing back to gcc by removing clang in src/CMakeLists.txt makes the
segmentation fault go away. However, a different problem then appears:
a "symbol lookup error" for crypto_native_plugin.so: undefined symbol:
crypto_native_aes_cbc_init_avx512 (that problem disappears if the
crypto_native plugin is disabled).

So, two problems:

(1) The segmentation fault itself, which perhaps indicates a bug
somewhere but seems to appear only with clang and not with gcc

(2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem
when trying to use gcc instead of clang

What do you think about these?

As a short-term fix, is removing clang in src/CMakeLists.txt reasonable
or is there a better/easier workaround?

Does anyone else use the rdma plugin when compiling with clang?
Perhaps that combination triggers this problem.

Best regards,
Elias
