Could we please see the faulting instruction, as well as the vector register contents involved?
As in "x/i $pc", and the ymmX registers involved?

If the vector instruction requires alignment, "movaps" or similar, it wouldn't be a shock to discover an unaligned address.

We've already found and fixed a few of those since switching to clang, and I have to say that "va + 4" raises all sorts of aligned vector instruction red flags...

FWIW... Dave

-----Original Message-----
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Elias Rudberg
Sent: Wednesday, May 6, 2020 1:46 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Segmentation fault in rdma_device_input_refill when using clang compiler

Hello VPP experts,

When trying to use the current master branch, we get a segmentation fault error. Here is what it looks like in gdb:

Thread 3 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fedf91fe700 (LWP 21309)]
rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, rxq=0x7ffff7edea80, is_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:115
115       *(u64x4 *) (va + 4) = u64x4_byte_swap (*(u64x4 *) (va + 4));
(gdb) bt
#0  rdma_device_input_refill (vm=0x7ff8a5d2f4c0, rd=0x7fedd35ed5c0, rxq=0x7ffff7edea80, is_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:115
#1  0x00007fffabbbb84d in rdma_device_input_inline (vm=0x7ff8a5d2f4c0, node=0x7ff5ccdfee00, frame=0x0, rd=0x7fedd35ed5c0, qid=0, use_mlx5dv=1)
    at vpp/src/plugins/rdma/input.c:622
#2  0x00007fffabbbae44 in rdma_input_node_fn_skx (vm=0x7ff8a5d2f4c0, node=0x7ff5ccdfee00, frame=0x0)
    at vpp/src/plugins/rdma/input.c:647
#3  0x00007ffff60e3155 in dispatch_node (vm=0x7ff8a5d2f4c0, node=0x7ff5ccdfee00, type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=66486783453597600)
    at vpp/src/vlib/main.c:1235
#4  0x00007ffff60ddbf5 in vlib_main_or_worker_loop (vm=0x7ff8a5d2f4c0, is_main=0)
    at vpp/src/vlib/main.c:1815
#5  0x00007ffff60dd227 in vlib_worker_loop (vm=0x7ff8a5d2f4c0)
    at vpp/src/vlib/main.c:1996
#6  0x00007ffff61345a1 in vlib_worker_thread_fn (arg=0x7fffb74ea980)
    at vpp/src/vlib/threads.c:1795
#7  0x00007ffff5531954 in clib_calljmp ()
    at vpp/src/vppinfra/longjmp.S:123
#8  0x00007fedf91fdce0 in ?? ()
#9  0x00007ffff612cd53 in vlib_worker_thread_bootstrap_fn (arg=0x7fffb74ea980)
    at vpp/src/vlib/threads.c:584
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This segmentation fault happens the same way every time I try to start VPP.

This is in Ubuntu 18.04.4 using the rdma plugin with Mellanox mlx5 NICs and an Intel Xeon Gold 6126 CPU.

I have looked back at recent changes and found that this problem started with the commit 4ba16a44 "misc: switch to clang-9" dated April 28. Before that we could use the master branch without this problem.

Changing back to gcc by removing clang in src/CMakeLists.txt makes the error go away. However, there is then instead a problem with a "symbol lookup error" for crypto_native_plugin.so: undefined symbol: crypto_native_aes_cbc_init_avx512 (that problem disappears if disabling the crypto_native plugin)

So, two problems:

(1) The segmentation fault itself, perhaps indicating a bug somewhere but seems to appear only with clang and not with gcc

(2) The "undefined symbol: crypto_native_aes_cbc_init_avx512" problem when trying to use gcc instead of clang

What do you think about these?

As a short-term fix, is removing clang in src/CMakeLists.txt reasonable or is there a better/easier workaround?

Does anyone else use the rdma plugin when compiling using clang -- perhaps that combination triggers this problem?

Best regards,
Elias
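
To illustrate the alignment concern raised in the reply: a cast such as "*(u64x4 *) (va + 4)" lets the compiler assume the address is aligned for a 32-byte vector, so it may emit an aligned vector move that faults if the address is not actually 32-byte aligned. The following is only a minimal sketch, not the VPP code or the eventual fix. The type u64x4_sketch and both helper functions are hypothetical names introduced here, and the vector type relies on the GCC/clang vector extension. It shows how an address can be checked for 32-byte alignment, and how the same four-lane byte swap can be written through memcpy so that no alignment is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4 x u64 vector type (32 bytes wide),
   built with the GCC/clang vector extension. Not VPP's own typedef. */
typedef uint64_t u64x4_sketch __attribute__ ((vector_size (32)));

/* Return 1 if p is a multiple of align bytes; align must be a power of two. */
static inline int
is_aligned (const void *p, uintptr_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Byte-swap four consecutive 64-bit values in place.
   The data is copied into a local vector with memcpy, so the compiler
   cannot assume that p is 32-byte aligned. */
static inline void
byte_swap_4x_u64_unaligned (uint64_t *p)
{
  u64x4_sketch v;

  memcpy (&v, p, sizeof (v));
  for (int i = 0; i < 4; i++)
    v[i] = __builtin_bswap64 (v[i]);
  memcpy (p, &v, sizeof (v));
}
```

With an array of u64 that is itself 32-byte aligned, "va + 4" is also 32-byte aligned (4 * 8 bytes), while "va + 1" is not. If the array is only 8-byte aligned, neither address is guaranteed to be. Calling is_aligned (va + 4, 32) just before the faulting line, or inspecting the address in gdb, would show which case applies here.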