Hi list,

During testing, when an l3 sub interface is deleted, I got a new abort in the
interface drop node. It seems the packet refers to a deleted interface.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007face8d17859 in __GI_abort () at abort.c:79
> #2  0x0000000000407397 in os_exit (code=1) at
> /home/fortitude/glx/vpp/src/vpp/vnet/main.c:440
> #3  0x00007face922dd57 in unix_signal_handler (signum=6,
> si=0x7faca2891170, uc=0x7faca2891040) at
> /home/fortitude/glx/vpp/src/vlib/unix/main.c:188
> #4  <signal handler called>
> #5  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #6  0x00007face8d17859 in __GI_abort () at abort.c:79
> #7  0x0000000000407333 in os_panic () at
> /home/fortitude/glx/vpp/src/vpp/vnet/main.c:416
> #8  0x00007face9067039 in debugger () at
> /home/fortitude/glx/vpp/src/vppinfra/error.c:84
> #9  0x00007face9066dfa in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0, fmt=0x7face9f7a208 "%s:%d (%s) assertion `%s' fails") at
> /home/fortitude/glx/vpp/src/vppinfra/error.c:143
> #10 0x00007face9b28358 in vnet_get_sw_interface (vnm=0x7facea243f38
> <vnet_main>, sw_if_index=14) at
> /home/fortitude/glx/vpp/src/vnet/interface_funcs.h:60
> #11 0x00007face9b2a4ba in interface_drop_punt (vm=0x7facac8e5b00,
> node=0x7faca95c8840, frame=0x7facc2004a40,
> disposition=VNET_ERROR_DISPOSITION_DROP)
>     at /home/fortitude/glx/vpp/src/vnet/interface_output.c:1061
> #12 0x00007face9b29a96 in interface_drop_fn_hsw (vm=0x7facac8e5b00,
> node=0x7faca95c8840, frame=0x7facc2004a40) at
> /home/fortitude/glx/vpp/src/vnet/interface_output.c:1215
> #13 0x00007face91cd50d in dispatch_node (vm=0x7facac8e5b00,
> node=0x7faca95c8840, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7facc2004a40,
>     last_time_stamp=404307411779413) at
> /home/fortitude/glx/vpp/src/vlib/main.c:961
> #14 0x00007face91cdfb0 in dispatch_pending_node (vm=0x7facac8e5b00,
> pending_frame_index=3, last_time_stamp=404307411779413) at
> /home/fortitude/glx/vpp/src/vlib/main.c:1120
> #15 0x00007face91c921f in vlib_main_or_worker_loop (vm=0x7facac8e5b00,
> is_main=0) at /home/fortitude/glx/vpp/src/vlib/main.c:1589
> #16 0x00007face91c8947 in vlib_worker_loop (vm=0x7facac8e5b00) at
> /home/fortitude/glx/vpp/src/vlib/main.c:1723
> #17 0x00007face92080a4 in vlib_worker_thread_fn (arg=0x7facaa227d00) at
> /home/fortitude/glx/vpp/src/vlib/threads.c:1579
> #18 0x00007face9203195 in vlib_worker_thread_bootstrap_fn
> (arg=0x7facaa227d00) at /home/fortitude/glx/vpp/src/vlib/threads.c:418
> #19 0x00007face9121609 in start_thread (arg=<optimized out>) at
> pthread_create.c:477
> #20 0x00007face8e14133 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
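
Frames #10 and #11 suggest the drop node asserts because sw_if_index 14 no longer names a live interface. The following is a minimal, self-contained sketch of drop accounting that tolerates a stale rx index. It is a toy model, not VPP code: `toy_if_pool_t`, `toy_pool_add`, `toy_pool_del`, `toy_pool_is_live` and `toy_count_drop` are hypothetical names introduced only for illustration. In the real tree a null-returning lookup such as `vnet_get_sw_interface_or_null` may serve the same purpose, but that is an assumption to verify against the source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_POOL_SIZE 32

/* Toy stand-in for the sw interface pool (hypothetical, not VPP's pool). */
typedef struct
{
  uint8_t in_use[TOY_POOL_SIZE];
  uint64_t drops[TOY_POOL_SIZE]; /* per-interface drop counters */
  uint64_t orphan_drops;         /* drops whose rx interface is gone */
} toy_if_pool_t;

static void
toy_pool_add (toy_if_pool_t *p, uint32_t sw_if_index)
{
  assert (sw_if_index < TOY_POOL_SIZE);
  p->in_use[sw_if_index] = 1;
}

static void
toy_pool_del (toy_if_pool_t *p, uint32_t sw_if_index)
{
  assert (sw_if_index < TOY_POOL_SIZE);
  p->in_use[sw_if_index] = 0;
}

/* Returns 1 if the index still names a live interface, 0 otherwise. */
static int
toy_pool_is_live (const toy_if_pool_t *p, uint32_t sw_if_index)
{
  return sw_if_index < TOY_POOL_SIZE && p->in_use[sw_if_index];
}

/* Drop accounting that tolerates a stale rx index instead of asserting.
   Returns 1 when the drop is counted against the interface,
   0 when it is counted as an orphan. */
static int
toy_count_drop (toy_if_pool_t *p, uint32_t rx_sw_if_index)
{
  if (!toy_pool_is_live (p, rx_sw_if_index))
    {
      p->orphan_drops++;
      return 0;
    }
  p->drops[rx_sw_if_index]++;
  return 1;
}
```

With this shape, a packet dropped after its rx interface was deleted is still counted, just not against the deleted interface.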

From the first mail, I want to know whether the following sequence can happen or not:

1, my process node adds a pkt to ip4-lookup directly by using
put_frame_to_node, which sets the rx interface to the l3 sub interface
created before.
2, my control plane agent (using govpp) deletes the l3 sub interface. (it
should be handled in the vpp api-process node)
3, vpp schedules the pending nodes. Since the rx interface is deleted, vpp
can't get a valid fib index, and there is no check in the following
ip4_fib_forwarding_lookup, so it crashes with an abort.

I don't think an api barrier in step 2 can solve this, since the pkt is
already in the pending frame.
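
The three steps can be replayed in a small, self-contained model. This is only a sketch under stated assumptions: `toy_ip4_main_t`, `toy_ip4_init`, `toy_if_add`, `toy_if_del` and `toy_lookup_fib` are hypothetical names, and the table merely imitates a per-interface fib index vector. It contrasts the two delete policies discussed in this thread (leave ~0 behind, or fall back to default fib 0) and adds the kind of validity check asked about for ip4-lookup.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_IF 32
#define TOY_INVALID (~0u)

/* Toy per-interface fib index table (hypothetical, not VPP's ip4_main). */
typedef struct
{
  uint32_t fib_index_by_sw_if_index[TOY_MAX_IF];
} toy_ip4_main_t;

typedef enum
{
  TOY_DEL_INVALIDATE,  /* leave ~0 behind on delete */
  TOY_DEL_DEFAULT_FIB, /* fall back to fib 0 on delete */
} toy_del_policy_t;

static void
toy_ip4_init (toy_ip4_main_t *im)
{
  for (uint32_t i = 0; i < TOY_MAX_IF; i++)
    im->fib_index_by_sw_if_index[i] = TOY_INVALID;
}

static void
toy_if_add (toy_ip4_main_t *im, uint32_t sw_if_index, uint32_t fib_index)
{
  assert (sw_if_index < TOY_MAX_IF);
  im->fib_index_by_sw_if_index[sw_if_index] = fib_index;
}

/* Step 2: the interface is deleted while a frame is still pending. */
static void
toy_if_del (toy_ip4_main_t *im, uint32_t sw_if_index, toy_del_policy_t policy)
{
  assert (sw_if_index < TOY_MAX_IF);
  im->fib_index_by_sw_if_index[sw_if_index] =
    (policy == TOY_DEL_DEFAULT_FIB) ? 0 : TOY_INVALID;
}

/* Step 3: the lookup runs later on the pending frame.
   Returns 0 and sets *fib_index when the rx index maps to a usable fib,
   -1 when the packet should be dropped instead of looked up. */
static int
toy_lookup_fib (const toy_ip4_main_t *im, uint32_t rx_sw_if_index,
                uint32_t *fib_index)
{
  if (rx_sw_if_index >= TOY_MAX_IF)
    return -1;
  uint32_t fi = im->fib_index_by_sw_if_index[rx_sw_if_index];
  if (fi == TOY_INVALID)
    return -1;
  *fib_index = fi;
  return 0;
}
```

In this model the default-fib policy keeps the lookup from failing, but a pending packet then resolves in fib 0 rather than in the fib it was queued for, which is close to the index-reuse concern raised earlier in the thread.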

Zhang Dongya via lists.fd.io <fortitude.zhang=gmail....@lists.fd.io>
wrote on Thu, Dec 8, 2022 at 00:17:

> The crash has not been seen anymore.
>
> Does this fix make any sense? If it does, I will submit a patch later.
>
> Zhang Dongya via lists.fd.io <fortitude.zhang=gmail....@lists.fd.io>
> wrote on Tue, Nov 29, 2022 at 22:51:
>
>> Hi ben,
>>
>> In the beginning I also thought it was a barrier issue, however it
>> turned out not to be the case.
>>
>> The pkt which had sw_if_index[VLIB_RX] set to the to-be-deleted interface
>> is actually put to the ip4-lookup node by my process node; the process
>> node adds pkts in a timer-driven way.
>>
>> Since the pkt is added by my process node, I think it is not affected by
>> the worker barrier. In my case the sub if is deleted by the API, which is
>> processed in the linux_epoll_input PRE_INPUT node. Let's consider the
>> following sequence:
>>
>>
>>    1. my process node adds a pkt to the ip4-lookup node, and the pkt
>>    refers to a valid sw if index.
>>    2. linux_epoll_input processes an API request to delete the above sw
>>    if index.
>>    3. vpp schedules the ip4-lookup node, which then crashes because the sw
>>    if index is deleted and the ip4-lookup node can't use
>>    sw_if_index[VLIB_RX], whose fib index is now ~0, to get a valid fib
>>    index.
>>
>>
>> There is some code that works this way (ikev2_send_ike and others). I
>> think it's not feasible to update the pending frame when the interface is
>> deleted.
>>
>> Benoit Ganne (bganne) via lists.fd.io <bganne=cisco....@lists.fd.io>
>> wrote on Tue, Nov 29, 2022 at 22:22:
>>
>>> Hi Zhang,
>>>
>>> I'd expect the interface deletion to happen under the worker barrier.
>>> VPP workers should drain all their in-flight packets before entering the
>>> barrier, so it should not be possible for the interface to disappear
>>> between your node and ip4-lookup. Or am I missing something?
>>> What I have seen happening is you'd have some data structure where you
>>> keep the interface index that you use in your node, and this data is not
>>> updated when the interface is removed.
>>> Regarding your proposal, I suspect an issue could be when we reuse the
>>> sw_if_index: if you del a sw_interface and then add a new one, chances are
>>> you'll be reusing the same index, but fib_index might be different.
>>>
>>> Best
>>> ben
>>>
>>> > -----Original Message-----
>>> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Zhang Dongya
>>> > Sent: Tuesday, November 29, 2022 3:45
>>> > To: vpp-dev@lists.fd.io
>>> > Subject: Re: [vpp-dev] possible use deleted sw if index in ip4-lookup and cause crash
>>> >
>>> >
>>> > I have found a solution that fixes the crash.
>>> >
>>> > In ip4_sw_interface_add_del, which is a callback for interface
>>> > deletion, we may set the fib index of the removed interface to 0
>>> > (default fib) instead of ~0. This is the same behavior as on interface
>>> > creation.
>>> >
>>> >
>>> >
>>> > Zhang Dongya via lists.fd.io <fortitude.zhang=gmail....@lists.fd.io>
>>> > wrote on Mon, Nov 28, 2022 at 19:41:
>>> >
>>> >
>>> >       Hi list,
>>> >
>>> >       Recently I encountered a vpp crash with my plugin enabled. After
>>> > some investigation I found it may be related to an l3 sub interface
>>> > being deleted while my process node adds work to the ip4-lookup node.
>>> >
>>> >       Intuitively I thought it might be related to barrier usage, so I
>>> > tried to fix it by adding some checks in my process node to guard
>>> > against the case that the l3 sub interface is deleted. However, the
>>> > crash still exists.
>>> >
>>> >       Finally I think it is related to a pattern like this:
>>> >
>>> >       1, my process node adds a pkt to ip4-lookup directly by using
>>> > put_frame_to_node, which sets the rx interface to the l3 sub interface
>>> > created before.
>>> >
>>> >       2, my control plane agent (using govpp) deletes the l3 sub
>>> > interface. (it should be handled in the vpp api-process node)
>>> >
>>> >       3, vpp schedules the pending nodes. Since the rx interface is
>>> > deleted, vpp can't get a valid fib index, and there is no check in the
>>> > following ip4_fib_forwarding_lookup, so it crashes with an abort.
>>> >
>>> >       I think vpp may schedule my process node (timeout driven) and the
>>> > api-process node one after the other, and then it will schedule the
>>> > pending nodes.
>>> >
>>> >       Should I add some check in ip4-lookup, or is my way of sending
>>> > pkts from a ctrl process not correct and there is a better way?
>>> >
>>> >       Thanks a lot.
>>> >
View/Reply Online (#22325): https://lists.fd.io/g/vpp-dev/message/22325