Hoi,

I am running VPP on a few aarch64 machines and have observed regular crashes with a stacktrace that suggests an IPv6 FIB lookup issue:

Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #0  0x0000fc1b8ac808f8
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from linux-vdso.so.1
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #1  0x0000fc1b8998ac3c ip6_fib_table_lookup_exact_match + 0x3c
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libvnet.so.26.06
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #2  0x0000fc1b89a21d2c proxy_arp_intfc_walk + 0x4030
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libvnet.so.26.06
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #3  0x0000fc1b89101658 vlib_exit_with_status + 0x808
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libvlib.so.26.06
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #4  0x0000fc1b89103eb0 vlib_exit_with_status + 0x3060
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libvlib.so.26.06
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #5  0x0000fc1b8912d88c vlib_worker_thread_bootstrap_fn + 0x6c
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libvlib.so.26.06
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #6  0x0000fc1b88de595c pthread_condattr_setpshared + 0x5bc
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libc.so.6
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]: #7  0x0000fc1b88e4bb4c __clone + 0x2cc
Jun 04 19:42:29 dpu0-ddln0 vpp[1179016]:      from /lib/aarch64-linux-gnu/libc.so.6

I was lucky enough to get two coredumps out of it, which I fed to Claude. It came back with an analysis pointing at my having changed the link-local address on an interface:

   The interface did have IPv6/ND enabled — otherwise the
   icmp6_neighbor_solicitation node would never have run. What it
   transiently lacked was its per-interface link-local FIB table
   (ilt_fibs[sw_if_index]), which is a structure distinct from "IPv6 is
   enabled." That table is created lazily when a link-local address is
   added (ip6_ll_fib_create) and freed, with the slot reset to ~0, when
   the last FIB_SOURCE_IP6_ND entry is removed
   (ip6_ll_table.c:150-154). Crucially, ip6_link_set_local_address()
   (ip6_link.c:359-362) changes a link-local address by doing
   delete-then-update: it removes the old LL prefix (which frees the
   FIB and sets ilt_fibs = ~0), then re-adds the new one (which
   recreates it). The ND node stays enabled across this whole sequence.
   So the NS did not arrive anywhere unexpected — it hit a
   normally-configured interface during the brief window in which its
   LL FIB had been torn down and not yet rebuilt. A worker thread
   forwarding an ordinary link-local NS in that window read ilt_fibs ==
   ~0 and segfaulted; two crashes 31 minutes apart is consistent with
   two separate LL-address-change events, matching your note that you'd
   been changing link-local addresses.
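
To make the window described in the analysis concrete, here is a small single-threaded toy model in C. It is not VPP code: every name in it (toy_ll_fibs, toy_set_ll_address, the observer hook, and so on) is hypothetical and only stands in for the ilt_fibs slot and the delete-then-add sequence the analysis attributes to ip6_link_set_local_address(). The observer callback plays the part of a worker that happens to read the slot between the two steps.

```c
#include <stddef.h>
#include <stdint.h>

/* Stands in for the ~0 "no link-local FIB" marker (assumption: 32-bit index). */
#define TOY_NO_FIB UINT32_MAX
#define TOY_N_INTERFACES 4

/* Hypothetical per-interface slot array, standing in for ilt_fibs[]. */
static uint32_t toy_ll_fibs[TOY_N_INTERFACES] = {
  TOY_NO_FIB, TOY_NO_FIB, TOY_NO_FIB, TOY_NO_FIB
};
static uint32_t toy_next_fib_index = 0;

/* Hook invoked between delete and add; models a reader in the window. */
typedef void (*toy_observer_fn) (uint32_t slot_value, void *ctx);

/* Lazily "create" the table when a link-local address is added. */
static void
toy_ll_add (uint32_t sw_if_index)
{
  if (toy_ll_fibs[sw_if_index] == TOY_NO_FIB)
    toy_ll_fibs[sw_if_index] = toy_next_fib_index++;
}

/* Removing the last entry "frees" the table and resets the slot. */
static void
toy_ll_delete_last (uint32_t sw_if_index)
{
  toy_ll_fibs[sw_if_index] = TOY_NO_FIB;
}

/* Change the address as delete-then-add; the slot is empty in between. */
static void
toy_set_ll_address (uint32_t sw_if_index, toy_observer_fn observe, void *ctx)
{
  toy_ll_delete_last (sw_if_index);
  if (observe != NULL)
    observe (toy_ll_fibs[sw_if_index], ctx);
  toy_ll_add (sw_if_index);
}

/* Example observer: store the slot value it saw into *ctx. */
static void
toy_record (uint32_t slot_value, void *ctx)
{
  *(uint32_t *) ctx = slot_value;
}
```

Calling toy_set_ll_address (1, toy_record, &seen) on an interface that already has a table leaves seen equal to TOY_NO_FIB, even though the slot is valid both before and after the call. The model only shows that the intermediate state exists; it says nothing about how likely a real worker is to land in it.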

The debugging session made my head spin a little, as I'm not very good with gdb, but the cores do prove this much: an unguarded ~0 from ip6_ll_fib_get() in the link-local NS branch causes an out-of-bounds pool_elt_at_index; the trigger is an NS for a link-local target arriving while ilt_fibs[sw_if_index] == ~0.

The fix is a simple check of the ip6_ll_fib_get() result before doing the FIB lookup, just as happens a few lines further down in the same vnet/ip6-nd/ip6_nd.c file. A candidate fix is in https://gerrit.fd.io/r/c/vpp/+/46038, and I have not observed crashes since applying it. I'm not certain, though, whether returning FIB_NODE_INDEX_INVALID and dropping the packet is the right call in this case, or whether we have to do something more.
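
For readers who do not want to open the Gerrit change, the shape of such a guard can be sketched in plain C. This is an illustration only, not the patch itself and not the VPP API: demo_ll_fibs, demo_fib_pool, demo_ll_fib_get and demo_lookup_guarded are made-up names, and DEMO_NODE_INDEX_INVALID merely plays the role of FIB_NODE_INDEX_INVALID.

```c
#include <stdint.h>

#define DEMO_NO_FIB UINT32_MAX             /* the ~0 "no table" marker */
#define DEMO_NODE_INDEX_INVALID UINT32_MAX /* role of FIB_NODE_INDEX_INVALID */
#define DEMO_N_INTERFACES 4
#define DEMO_POOL_SIZE 4

/* Hypothetical per-interface slots; all start without a link-local table. */
static uint32_t demo_ll_fibs[DEMO_N_INTERFACES] = {
  DEMO_NO_FIB, DEMO_NO_FIB, DEMO_NO_FIB, DEMO_NO_FIB
};

/* Toy "pool": element i holds the entry index that table i resolves to. */
static uint32_t demo_fib_pool[DEMO_POOL_SIZE] = { 100, 101, 102, 103 };

/* Return the table index for an interface, or DEMO_NO_FIB if none.
   The caller must pass sw_if_index < DEMO_N_INTERFACES. */
static uint32_t
demo_ll_fib_get (uint32_t sw_if_index)
{
  return demo_ll_fibs[sw_if_index];
}

/* Guarded lookup: check the slot before indexing the pool. Without the
   check, index 0xFFFFFFFF would be used on a 4-element array. */
static uint32_t
demo_lookup_guarded (uint32_t sw_if_index)
{
  uint32_t fib_index = demo_ll_fib_get (sw_if_index);

  if (fib_index == DEMO_NO_FIB)
    return DEMO_NODE_INDEX_INVALID; /* caller treats this as "drop" */

  return demo_fib_pool[fib_index];
}
```

With every slot empty, demo_lookup_guarded (2) returns DEMO_NODE_INDEX_INVALID instead of touching the pool; once demo_ll_fibs[2] is set to 1 it returns 101. The sketch is single-threaded, so it does not answer whether the slot can still change between the check and the pool access in a real worker.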

I have the coredumps and symbols here if somebody wants to take a closer look.

groet,
Pim

--
Pim van Pelt <[email protected]>
PBVP1-RIPE https://ipng.ch/
View/Reply Online (#27063): https://lists.fd.io/g/vpp-dev/message/27063