The crash has not reappeared since. Does this fix make sense? If it does, I will submit a patch later.
Zhang Dongya via lists.fd.io <fortitude.zhang=gmail....@lists.fd.io> wrote on Tue, Nov 29, 2022 at 22:51:
> Hi Ben,
>
> In the beginning I also thought it was a barrier issue, but it turned out not to be.
>
> The packet whose sw_if_index[VLIB_RX] is set to the to-be-deleted interface is actually put into the ip4-lookup node by my process node, which adds packets in a timer-driven way.
>
> Since the packet is added by my process node, I think it is not affected by the worker barrier. In my case the sub-interface is deleted via the API, which is processed in the linux_epoll_input PRE_INPUT node. Consider the following sequence:
>
> 1. My process node adds a packet to ip4-lookup, and the packet refers to a valid sw_if_index.
> 2. linux_epoll_input processes an API request to delete that sw_if_index.
> 3. VPP schedules the ip4-lookup node, which crashes because the interface has been deleted: ip4-lookup can no longer use sw_if_index[VLIB_RX] to get a valid FIB index (the value it finds is now ~0).
>
> Some existing code does the same thing (ikev2_send_ike and others), and I don't think it is feasible to update the pending frame when the interface is deleted.
>
> Benoit Ganne (bganne) via lists.fd.io <bganne=cisco....@lists.fd.io> wrote on Tue, Nov 29, 2022 at 22:22:
>
>> Hi Zhang,
>>
>> I'd expect the interface deletion to happen under the worker barrier. VPP workers should drain all their in-flight packets before entering the barrier, so it should not be possible for the interface to disappear between your node and ip4-lookup. Or am I missing something?
>> What I have seen happening is that you keep the interface index used by your node in some data structure, and this data is not updated when the interface is removed.
>> Regarding your proposal, I suspect an issue could arise when the sw_if_index is reused: if you delete a sw_interface and then add a new one, chances are you'll be reusing the same index, but the fib_index might be different.
>>
>> Best
>> ben
>>
>> > -----Original Message-----
>> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Zhang Dongya
>> > Sent: Tuesday, November 29, 2022 3:45
>> > To: vpp-dev@lists.fd.io
>> > Subject: Re: [vpp-dev] possible use deleted sw if index in ip4-lookup and cause crash
>> >
>> > I have found a solution, and it fixes the crash.
>> >
>> > In ip4_sw_interface_add_del, which is the callback for interface deletion, we can set the FIB index of the removed interface to 0 (the default FIB) instead of ~0. This matches the behavior on interface creation.
>> >
>> > Zhang Dongya via lists.fd.io <http://lists.fd.io> <fortitude.zhang=gmail....@lists.fd.io <mailto:gmail....@lists.fd.io>> wrote on Mon, Nov 28, 2022 at 19:41:
>> >
>> >     Hi list,
>> >
>> >     Recently I encountered a VPP crash with my plugin enabled. After some investigation I found it may be related to an L3 sub-interface being deleted while my process node adds work to the ip4-lookup node.
>> >
>> >     Intuitively I thought it was related to barrier usage, so I tried to fix it by adding a check in my process node to guard against the L3 sub-interface having been deleted. However, the crash still occurs.
>> >
>> >     Finally I think it is related to a pattern like this:
>> >
>> >     1. My process node adds a packet directly to ip4-lookup using put_frame_to_node, setting the RX interface to the L3 sub-interface created earlier.
>> >     2. My control-plane agent (using govpp) deletes the L3 sub-interface. (This should be handled in the VPP api-process node.)
>> >     3. VPP schedules the pending nodes. Since the RX interface has been deleted, VPP cannot get a valid FIB index, and there is no check in the subsequent ip4_fib_forwarding_lookup, so it crashes with an abort.
>> >
>> >     I think VPP may schedule my process node (timeout-driven) and the api-process node one after the other, and then schedule the pending nodes.
>> >
>> >     Should I add a check in ip4-lookup, or is my way of sending packets from a control process incorrect, and is there a better way?
>> >
>> >     Thanks a lot.