Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Florin Coras
Hi, 

Inline.

> On Mar 19, 2023, at 6:47 PM, Zhang Dongya  wrote:
> 
> Hi,
> 
> It can be aborted in either the established state or the half-open state, 
> because we enforce a timeout in our app layer. 

[fc] Okay! Is the issue present irrespective of the state of the session, or 
does it happen only after a disconnect in half-open state? More below. 

> 
> Regarding your question,
> 
> - Yes, we added a builtin app that relies on C APIs; it mainly uses 
> vnet_connect/disconnect to connect or disconnect sessions.

[fc] Understood

> - We call these APIs in a vpp ctrl process, which should be running on the 
> master thread; we never do session setup/teardown on a worker thread. (The 
> environment where this issue was found is configured with a 1 master + 1 
> worker setup.)

[fc] With vpp latest it’s possible for connects to be done from the first 
worker. It’s an optimization meant to avoid 1) taking the worker barrier on 
syns and 2) entering poll mode on main (consuming less cpu).

> - We started developing the app on 22.06 and I keep merging upstream 
> changes from the latest vpp by cherry-picking. The reason for the line 
> mismatch is that I added some comments to the session-layer code; it should 
> otherwise match the master branch now.

[fc] Ack

> 
> Reading the code, I understand that the half-open entry is mainly cleaned up 
> from the bihash in session_stream_connect_notify. However, in syn-sent state 
> the session might be closed by my app due to a session setup timeout (on the 
> order of seconds); in that case the session will be marked as half_open_done 
> and the half-open session will be freed shortly afterwards on the ctrl 
> thread (the 1st worker?).

[fc] Actually, this might be the issue. We did start to provide a half-open 
session handle to apps which, if closed, does clean up the session, but 
apparently it is missing the cleanup of the session lookup table. Could you try 
this patch [1]? It might need additional work.

Having said that, forcing a close/cleanup will not free the port synchronously. 
So, if you’re using fixed ports, you’ll have to wait for the half-open cleanup 
notification.
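
To make that concrete, here is a minimal sketch of how a builtin app could gate 
reconnects on the half-open cleanup notification. It is only an illustration: 
the half_open_cleanup_callback member name is assumed from recent 
session_cb_vft_t definitions and may differ (or be absent) in older trees, and 
my_app_index plus the endpoint filling stand in for the app's own plumbing.

#include <vnet/session/application_interface.h>

static volatile u8 half_open_cleanup_pending; /* set when we force-close a half-open */
static u32 my_app_index;                      /* saved at vnet_application_attach() time */

/* Registered in the app's session_cb_vft_t (member name assumed, see above). */
static void
my_half_open_cleanup_cb (session_t *s)
{
  /* Only now is the fixed 5-tuple really gone from the session lookup table. */
  half_open_cleanup_pending = 0;
}

static int
my_try_reconnect (void)
{
  vnet_connect_args_t _a, *a = &_a;

  if (half_open_cleanup_pending)
    return -1; /* still waiting for the cleanup notification */

  clib_memset (a, 0, sizeof (*a));
  /* fill a->sep_ext here: remote ip/port and the fixed local source port
   * (app specific, not shown) */
  a->app_index = my_app_index;
  return vnet_connect (a);
}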

> 
> Should I also register a half-open callback, or is there some other reason 
> that leads to this failure?
> 

[fc] Yes, see above.

Regards, 
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/38526

> 
> Florin Coras <fcoras.li...@gmail.com> 
> wrote on Mon, Mar 20, 2023 at 06:22:
>> Hi, 
>> 
>> When you abort the connection, is it fully established or half-open? 
>> Half-opens are cleaned up by the owner thread after a timeout, but the 
>> 5-tuple should be assigned to the fully established session by that point. 
>> tcp_half_open_connection_cleanup does not clean up the bihash; instead, 
>> session_stream_connect_notify does, once tcp connect returns either success 
>> or failure. 
>> 
>> So a few questions:
>> - is it accurate to assume you have a builtin vpp app and rely only on C 
>> apis to interact with host stack?
>> - on what thread (main or first worker) do you call vnet_connect?
>> - what api do you use to close the session? 
>> - what version of vpp is this because lines don’t match vpp latest?
>> 
>> Regards,
>> Florin
>> 
>> > On Mar 19, 2023, at 2:08 AM, Zhang Dongya wrote:
>> > 
>> > Hi list,
>> > 
>> > recently in our application, we constantly triggered this abort issue, 
>> > which interrupts our connectivity for a while:
>> > 
>> > Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 
>> > 0x7fefd3b2000b
>> > Mar 19 16:11:26 ubuntu vnet[2565933]: 
>> > /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline) 
>> > assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
>> > 
>> > Our scenario is quite simple: we make 4 parallel tcp connections (using 
>> > 4 fixed source ports) to a remote vpp stack (fixed ip and port), and 
>> > do some keepalive in our application layer. Since we only use the vpp tcp 
>> > stack to make the middle boxes happy with the connection, we do not 
>> > actually use the data transport of the tcp stack.
>> > 
>> > However, since the network conditions are complex, we always need 
>> > to abort the connection and reconnect.
>> > 
>> > I keep merging upstream session and tcp fixes, however the issue is still 
>> > not fixed. What I found is that in some cases 
>> > tcp_half_open_connection_cleanup may not delete the half-open session 
>> > from the lookup table (bihash), and the session index is then reallocated 
>> > by another connection.
>> > 
>> > Hope the list can provide some hints about how to overcome this issue, 
>> > thanks a lot.
>> > 
>> > 
>> > 
>> 
>> 
>> 
>> 
> 
> 



Re: [vpp-dev] VPP not dropping packets with incorrect vlan tags on untagged interface

2023-03-19 Thread Krishna, Parameswaran via lists.fd.io
Hi,
Did anyone get a chance to look at this issue? If anyone has any sort of input, 
that will be of great help.
Please let me know if any additional information is needed. Thank you.

Best regards,
Parameswaran




Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Zhang Dongya
Hi,

It can be aborted in either the established state or the half-open state,
because we enforce a timeout in our app layer.

Regarding your question,

- Yes, we added a builtin app that relies on C APIs; it mainly uses
vnet_connect/disconnect to connect or disconnect sessions.
- We call these APIs in a vpp ctrl process, which should be running on the
master thread; we never do session setup/teardown on a worker thread. (The
environment where this issue was found is configured with a 1 master + 1
worker setup.)
- We started developing the app on 22.06 and I keep merging upstream
changes from the latest vpp by cherry-picking. The reason for the line
mismatch is that I added some comments to the session-layer code; it should
otherwise match the master branch now.

Reading the code, I understand that the half-open entry is mainly cleaned up
from the bihash in session_stream_connect_notify. However, in syn-sent state
the session might be closed by my app due to a session setup timeout (on the
order of seconds); in that case the session will be marked as half_open_done
and the half-open session will be freed shortly afterwards on the ctrl
thread (the 1st worker?).

Should I also register a half-open callback, or is there some other reason
that leads to this failure?


Florin Coras wrote on Mon, Mar 20, 2023 at 06:22:

> Hi,
>
> When you abort the connection, is it fully established or half-open?
> Half-opens are cleaned up by the owner thread after a timeout, but the
> 5-tuple should be assigned to the fully established session by that point.
> tcp_half_open_connection_cleanup does not clean up the bihash; instead,
> session_stream_connect_notify does, once tcp connect returns either success
> or failure.
>
> So a few questions:
> - is it accurate to assume you have a builtin vpp app and rely only on C
> apis to interact with host stack?
> - on what thread (main or first worker) do you call vnet_connect?
> - what api do you use to close the session?
> - what version of vpp is this because lines don’t match vpp latest?
>
> Regards,
> Florin
>
> > On Mar 19, 2023, at 2:08 AM, Zhang Dongya 
> wrote:
> >
> > Hi list,
> >
> > recently in our application, we constantly triggered this abort issue,
> which interrupts our connectivity for a while:
> >
> > Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC
> 0x7fefd3b2000b
> > Mar 19 16:11:26 ubuntu vnet[2565933]:
> /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline)
> assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
> >
> > Our scenario is quite simple: we make 4 parallel tcp connections
> (using 4 fixed source ports) to a remote vpp stack (fixed ip and port), and
> do some keepalive in our application layer. Since we only use the vpp
> tcp stack to make the middle boxes happy with the connection, we do not
> actually use the data transport of the tcp stack.
> >
> > However, since the network conditions are complex, we always need
> to abort the connection and reconnect.
> >
> > I keep merging upstream session and tcp fixes, however the issue is still
> not fixed. What I found is that in some cases
> tcp_half_open_connection_cleanup may not delete the half-open session from
> the lookup table (bihash), and the session index is then reallocated by
> another connection.
> >
> > Hope the list can provide some hints about how to overcome this issue,
> thanks a lot.
> >
> >
> >
>
>
> 
>
>




Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-19 Thread Florin Coras
I just tried iperf3 in cut-through mode, i.e., server and client attached to 
the same vpp instance running 4 workers, with 128 connections, and this seems 
to be working fine.

Could you try that out and see if it’s also working for you? It might be that 
this is something specific to how Redis uses sockets, so to reproduce we’ll 
need to replicate your testbed.

Regards,
Florin

> On Mar 19, 2023, at 2:58 PM, Florin Coras via lists.fd.io 
>  wrote:
> 
> Hi, 
> 
> That may very well be a problem introduced by the move of connects to the 
> first worker. Unfortunately, we don’t have tests for all of those corner 
> cases yet.
> 
> However, to replicate this issue, could you provide a bit more detail about 
> your setup and the exact backtrace? It looks like you’re leveraging 
> cut-through sessions, so the server and client are attached to the same vpp 
> instance? Also, could you try vpp latest to check if the issue still 
> persists? 
> 
> Regards,
> Florin
> 
>> On Mar 19, 2023, at 1:53 AM, chenwei...@outlook.com wrote:
>> 
>> Hi vpp-team,
>> I'm new to VPP and I'm trying to run Redis 6.0.18 in VCL with LD_PRELOAD 
>> using VPP 22.10 and VPP 23.02. I found that an assert fails frequently in 
>> VPP 23.02, and after checking, I found that the assert fails in the 
>> session_get function in vnet/session/session.h. The cause was an invalid 
>> session_id with a value of -1 (or ~0).
>> This function is called by the session_half_open_migrate_notify function in 
>> vnet/session/session.c, which is called by ct_accept_one in 
>> vnet/session/application_local.c. Function ct_accept_one is called because 
>> of an accept RPC request, handled by the session_event_dispatch_ctrl 
>> function, issued from the ct_connect function in 
>> vnet/session/application_local.c. Function ct_connect allocates and 
>> initializes a half-open transport object. However, its c_s_index value is 
>> -1 (or ~0), i.e., no session has been allocated yet. Allocating a session 
>> is done by calling session_alloc_for_half_open in session_open_vc (located 
>> in vnet/session/session.c), the caller of ct_connect. Therefore, I think 
>> the assertion failure is a case where ct_accept_one accesses the half-open 
>> tc before a session has been allocated.
>> I found that this problem does not exist on VPP 22.10. I checked the 
>> patches between 22.10 and 23.02 and found “session: move connects to first 
>> worker (https://gerrit.fd.io/r/c/vpp/+/35713)”, which might be related to 
>> this issue, but I can't say for sure and I don’t know how to fix it. I 
>> would be very grateful if you could address this issue.
>> Thanks,
>> 
>> 
>> 
> 
> 
> 





Re: [vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Florin Coras
Hi, 

When you abort the connection, is it fully established or half-open? Half-opens 
are cleaned up by the owner thread after a timeout, but the 5-tuple should be 
assigned to the fully established session by that point. 
tcp_half_open_connection_cleanup does not clean up the bihash; instead, 
session_stream_connect_notify does, once tcp connect returns either success or 
failure. 
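
As an app-side illustration of that ordering, a minimal connected callback 
sketch (the signature is the one in recent session_cb_vft_t trees; older 
versions pass a u8 is_fail instead of an error code, so treat this as a sketch, 
not the exact API of every release):

#include <vnet/session/application_interface.h>

/* Runs once tcp connect has resolved either way; by that point
 * session_stream_connect_notify has already updated the lookup table. */
static int
my_connected_cb (u32 app_wrk_index, u32 opaque, session_t *s, session_error_t err)
{
  if (err)
    {
      /* Connect failed: s is not usable, schedule an app-level retry. */
      return 0;
    }
  /* Connect succeeded: stash the session handle for a later disconnect. */
  return 0;
}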

So a few questions:
- is it accurate to assume you have a builtin vpp app and rely only on C apis 
to interact with host stack?
- on what thread (main or first worker) do you call vnet_connect?
- what api do you use to close the session? 
- what version of vpp is this because lines don’t match vpp latest?

Regards,
Florin

> On Mar 19, 2023, at 2:08 AM, Zhang Dongya  wrote:
> 
> Hi list,
> 
> recently in our application, we constantly triggered this abort issue, which 
> interrupts our connectivity for a while:
> 
> Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 
> 0x7fefd3b2000b
> Mar 19 16:11:26 ubuntu vnet[2565933]: 
> /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline) 
> assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails
> 
> Our scenario is quite simple: we make 4 parallel tcp connections (using 4 
> fixed source ports) to a remote vpp stack (fixed ip and port), and do 
> some keepalive in our application layer. Since we only use the vpp tcp stack 
> to make the middle boxes happy with the connection, we do not actually use 
> the data transport of the tcp stack.
> 
> However, since the network conditions are complex, we always need to 
> abort the connection and reconnect.
> 
> I keep merging upstream session and tcp fixes, however the issue is still 
> not fixed. What I found is that in some cases 
> tcp_half_open_connection_cleanup may not delete the half-open session from 
> the lookup table (bihash), and the session index is then reallocated by 
> another connection.
> 
> Hope the list can provide some hints about how to overcome this issue, thanks 
> a lot.
> 
> 
> 





Re: [vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-19 Thread Florin Coras
Hi, 

That may very well be a problem introduced by the move of connects to the first 
worker. Unfortunately, we don’t have tests for all of those corner cases yet.

However, to replicate this issue, could you provide a bit more detail about 
your setup and the exact backtrace? It looks like you’re leveraging cut-through 
sessions, so the server and client are attached to the same vpp instance? Also, 
could you try vpp latest to check if the issue still persists? 

Regards,
Florin

> On Mar 19, 2023, at 1:53 AM, chenwei...@outlook.com wrote:
> 
> Hi vpp-team,
> I'm new to VPP and I'm trying to run Redis 6.0.18 in VCL with LD_PRELOAD 
> using VPP 22.10 and VPP 23.02. I found that an assert fails frequently in VPP 
> 23.02, and after checking, I found that the assert fails in the session_get 
> function in vnet/session/session.h. The cause was an invalid session_id with 
> a value of -1 (or ~0).
> This function is called by the session_half_open_migrate_notify function in 
> vnet/session/session.c, which is called by ct_accept_one in 
> vnet/session/application_local.c. Function ct_accept_one is called because of 
> an accept RPC request, handled by the session_event_dispatch_ctrl function, 
> issued from the ct_connect function in vnet/session/application_local.c. 
> Function ct_connect allocates and initializes a half-open transport object. 
> However, its c_s_index value is -1 (or ~0), i.e., no session has been 
> allocated yet. Allocating a session is done by calling 
> session_alloc_for_half_open in session_open_vc (located in 
> vnet/session/session.c), the caller of ct_connect. Therefore, I think the 
> assertion failure is a case where ct_accept_one accesses the half-open tc 
> before a session has been allocated.
> I found that this problem does not exist on VPP 22.10. I checked the 
> patches between 22.10 and 23.02 and found “session: move connects to first 
> worker (https://gerrit.fd.io/r/c/vpp/+/35713)”, which might be related to 
> this issue, but I can't say for sure and I don’t know how to fix it. I 
> would be very grateful if you could address this issue.
> Thanks,
> 
> 
> 





[vpp-dev] bridge domain is not forwarding to all ports #vpp

2023-03-19 Thread Praveen Singh
Hi All,
I have created a bridge domain with three interfaces: a host-interface and two 
physical interfaces (SR-IOV VFs).
vpp# sh bridge-domain 20 detail
BD-ID  Index  BSN  Age(min)  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  arp-ufwd  BVI-Intf
20     1      0    off       on        on        flood     on        off       off       N/A

Interface   If-idx  ISN  SHG  BVI  TxFlood  VLAN-Tag-Rewrite
eth1        2       2    0    -    *        none
eth0        1       3    0    -    *        none
host-net1   3       1    0    -    *        none

Traffic for two networks (10.42.x.x and 10.32.x.x) is coming from the 
application POD into the host-interface. The 10.32.x.x traffic works well and 
passes through eth0. But the 10.42.x.x traffic is also passing through 
eth0-output, so ping is not working for the 10.42.x.x network.

vpp# sh trace
Limiting display to 50 packets. To display more specify max.
--- Start of thread 0 vpp_main ---
Packet 1

02:09:36:193834: af-packet-input
af_packet: hw_if_index 3 next-index 4
tpacket2_hdr:
status 0x2009 len 55 snaplen 55 mac 66 net 80
sec 0x64172af6 nsec 0x47a18f6 vlan 0 vlan_tpid 0
02:09:36:193837: ethernet-input
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
02:09:36:193838: l2-input
l2-input: sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0
02:09:36:193840: l2-learn
l2-learn: sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 bd_index 1
02:09:36:193842: l2-fwd
l2-fwd:   sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 bd_index 1 
result [0x36a01, 1] none
02:09:36:193844: l2-output
l2-output: sw_if_index 1 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 data 08 00 
45 00 00 29 0c 0c 40 00 40 11
02:09:36:193844: eth0-output
eth0
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
UDP: 10.22.119.38 -> 10.32.31.4
tos 0x00, ttl 64, length 41, checksum 0x8458 dscp CS0 ecn NON_ECN
fragment id 0x0c0c, flags DONT_FRAGMENT
UDP: 12001 -> 2900
length 21, checksum 0x
02:09:36:193846: eth0-tx
eth0 tx queue 0
buffer 0xfd7be6: current data 0, length 55, buffer-pool 0, ref-count 1, 
totlen-nifb 0, trace handle 0x0
ip4 offload-udp-cksum l2-hdr-offset 0 l3-hdr-offset 14
PKT MBUF: port 65535, nb_segs 1, pkt_len 55
buf_len 2176, data_len 55, ol_flags 0xb0, data_off 128, phys_addr 
0x3f5efa00
packet_type 0x0 l2_len 14 l3_len 20 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
Packet Offload Flags
PKT_TX_TCP_CKSUM (0x) TCP cksum of TX pkt. computed by NIC
PKT_TX_SCTP_CKSUM (0x) SCTP cksum of TX pkt. computed by NIC
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
UDP: 10.22.119.38 -> 10.32.31.4
tos 0x00, ttl 64, length 41, checksum 0x8458 dscp CS0 ecn NON_ECN
fragment id 0x0c0c, flags DONT_FRAGMENT
UDP: 12001 -> 2900
length 21, checksum 0xaa86

Packet 2

02:09:39:23: af-packet-input
af_packet: hw_if_index 3 next-index 4
tpacket2_hdr:
status 0x2009 len 55 snaplen 55 mac 66 net 80
sec 0x64172af9 nsec 0x6e5780a vlan 0 vlan_tpid 0
02:09:39:234446: ethernet-input
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
02:09:39:234447: l2-input
l2-input: sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0
02:09:39:234448: l2-learn
l2-learn: sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 bd_index 1
02:09:39:234448: l2-fwd
l2-fwd:   sw_if_index 3 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 bd_index 1 
result [0x36a01, 1] none
02:09:39:234450: l2-output
l2-output: sw_if_index 1 dst 6e:a7:f1:e0:39:bc src 2e:5d:06:ef:df:f0 data 08 00 
45 00 00 29 0d 47 40 00 40 11
02:09:39:234450: eth0-output
eth0
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
UDP: 10.22.119.38 -> 10.32.31.4
tos 0x00, ttl 64, length 41, checksum 0x831d dscp CS0 ecn NON_ECN
fragment id 0x0d47, flags DONT_FRAGMENT
UDP: 12001 -> 2900
length 21, checksum 0x
02:09:39:234451: eth0-tx
eth0 tx queue 0
buffer 0xff638d: current data 0, length 55, buffer-pool 0, ref-count 1, 
totlen-nifb 0, trace handle 0x1
ip4 offload-udp-cksum l2-hdr-offset 0 l3-hdr-offset 14
PKT MBUF: port 65535, nb_segs 1, pkt_len 55
buf_len 2176, data_len 55, ol_flags 0xb0, data_off 128, phys_addr 
0x3fd8e3c0
packet_type 0x0 l2_len 14 l3_len 20 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
Packet Offload Flags
PKT_TX_TCP_CKSUM (0x) TCP cksum of TX pkt. computed by NIC
PKT_TX_SCTP_CKSUM (0x) SCTP cksum of TX pkt. computed by NIC
IP4: 2e:5d:06:ef:df:f0 -> 6e:a7:f1:e0:39:bc
UDP: 10.22.119.38 -> 10.32.31.4
tos 0x00, ttl 64, length 41, checksum 0x831d dscp CS0 ecn NON_ECN
fragment id 0x0d47, flags DONT_FRAGMENT
UDP: 12001 -> 2900
length 21, checksum 0xaa86

When the eth0 interface is shut down, ping for 10.42.x.x works fine. Can you 
please explain why the 10.42.x.x traffic is transmitted through eth0 only?
Thanks,
Praveen


Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

2023-03-19 Thread Hao Tian
Hi,

Fix confirmed. The git master branch runs the test code without any error. 
Thanks to everyone participating in this thread!

Best regards,
Hao Tian

-Original Message-
From: vpp-dev@lists.fd.io  on behalf of Hao Tian 

Sent: Friday, March 17, 2023 11:17 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Hi Dave,

I tested the change and found an issue. Please check the gerrit comments. Thanks!

Best regards,
Hao Tian

-Original Message-
From: vpp-dev@lists.fd.io  on behalf of Dave Barach 

Sent: Friday, March 17, 2023 1:50 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?

Please see https://gerrit.fd.io/r/c/vpp/+/38507



-Original Message-

From: vpp-dev@lists.fd.io 
<vpp-dev@lists.fd.io> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@lists.fd.io

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hi Dave,



Thanks for your work. I am ready to test whenever needed.



Best regards,

Hao Tian





From: vpp-dev@lists.fd.io 
<vpp-dev@lists.fd.io> on behalf of Dave Barach 
<v...@barachs.net>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@lists.fd.io

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



I'm doing a bit of work to straighten out the template, hopefully without 
causing a measurable performance regression.



Hao's test code is a bit of a corner-case: there is exactly one record in the 
database which the code thrashes as hard as possible.



D.
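
For concreteness, a standalone sketch of that corner case: one thread 
add/deletes a single record as fast as it can while another searches the same 
key. This is only an illustration built against vppinfra (bucket and heap sizes 
are arbitrary), and depending on how your vppinfra build instantiates the 16_8 
bihash, the bihash_template.c include may or may not be needed.

#include <pthread.h>
#include <vppinfra/mem.h>
#include <vppinfra/error.h>
#include <vppinfra/bihash_16_8.h>
#include <vppinfra/bihash_template.c> /* instantiate 16_8 methods if not already built in */

static clib_bihash_16_8_t h;

static void *
writer (void *arg)
{
  clib_bihash_kv_16_8_t kv = { .key = { 1, 2 }, .value = 42 };
  for (;;)
    {
      clib_bihash_add_del_16_8 (&h, &kv, 1 /* is_add */);
      clib_bihash_add_del_16_8 (&h, &kv, 0 /* is_add */);
    }
  return 0;
}

static void *
reader (void *arg)
{
  clib_bihash_kv_16_8_t kv = { .key = { 1, 2 } }, res;
  for (;;)
    if (clib_bihash_search_16_8 (&h, &kv, &res) == 0 && res.value != 42)
      clib_warning ("search returned a record that is mid-delete, value %llu",
		    (unsigned long long) res.value);
  return 0;
}

int
main (void)
{
  pthread_t w, r;
  clib_mem_init (0, 64 << 20);
  clib_bihash_init_16_8 (&h, "thrash", 1024 /* buckets */, 32 << 20);
  pthread_create (&w, 0, writer, 0);
  pthread_create (&r, 0, reader, 0);
  pthread_join (w, 0);
  return 0;
}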



-Original Message-

From: vpp-dev@lists.fd.io 
<vpp-dev@lists.fd.io> On Behalf Of Andrew 
Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@lists.fd.io

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - 
misuse or bug?



Hao,



I noticed the same behavior when stress-testing the multi-thread session 
handling for the ACL plugin a while ago. I thought this trade-off was there to 
avoid having to take hard locks in the bihash code, rather than it being a bug.



As you say - the special value comes back only if a deletion is in progress, 
and it is always the same. So I just treated that case in my code the same as 
“not found”.



My logic was: if an entry is just in the process of being deleted, there is 
very little use for its old value anyway.



--a



> On 15 Mar 2023, at 14:45, Hao Tian 
> <tianhao...@outlook.com> wrote:

>

> Hi,

>

> I tried but could not come up with any way to ensure the kvp is valid upon 
> return without taking the full bucket lock.

>

> Maybe we can make a copy of the value before returning, validate the copy and 
> return that copy instead. The critical section can be shrunk to cover only 
> the copying, which seems to perform better, but I'm not sure if this is the 
> best approach.

>

> Could you please shed some light here? Thanks!

>

> Regards,

> Hao Tian

>







[vpp-dev] Sigabrt in tcp46_input_inline for tcp_lookup_is_valid

2023-03-19 Thread Zhang Dongya
Hi list,

recently in our application, we constantly triggered this abort issue, which
interrupts our connectivity for a while:

Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC
0x7fefd3b2000b
Mar 19 16:11:26 ubuntu vnet[2565933]:
/home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 (tcp46_input_inline)
assertion `tcp_lookup_is_valid (tc0, b[0], tcp_buffer_hdr (b[0]))' fails

Our scenario is quite simple: we make 4 parallel tcp connections (using 4
fixed source ports) to a remote vpp stack (fixed ip and port), and do
some keepalive in our application layer. Since we only use the vpp tcp
stack to make the middle boxes happy with the connection, we do not
actually use the data transport of the tcp stack.

However, since the network conditions are complex, we always need to
abort the connection and reconnect.

I keep merging upstream session and tcp fixes, however the issue is still not
fixed. What I found is that in some cases
tcp_half_open_connection_cleanup may not delete the half-open session from
the lookup table (bihash), and the session index is then reallocated by
another connection.

Hope the list can provide some hints about how to overcome this issue,
thanks a lot.




[vpp-dev] #vnet A bug which may cause assertion error in vnet/session

2023-03-19 Thread chenweihao
Hi vpp-team,

I'm new to VPP and I'm trying to run Redis 6.0.18 in VCL with LD_PRELOAD using 
VPP 22.10 and VPP 23.02. I found that an assert fails frequently in VPP 23.02, 
and after checking, I found that the assert fails in the session_get function 
in vnet/session/session.h. The cause was an invalid session_id with a value of 
-1 (or ~0).

This function is called by the session_half_open_migrate_notify function in 
vnet/session/session.c, which is called by ct_accept_one in 
vnet/session/application_local.c. Function ct_accept_one is called because of 
an accept RPC request, handled by the session_event_dispatch_ctrl function, 
issued from the ct_connect function in vnet/session/application_local.c. 
Function ct_connect allocates and initializes a half-open transport object. 
However, its c_s_index value is -1 (or ~0), i.e., no session has been allocated 
yet. Allocating a session is done by calling session_alloc_for_half_open in 
session_open_vc (located in vnet/session/session.c), the caller of ct_connect. 
Therefore, I think the assertion failure is a case where ct_accept_one accesses 
the half-open tc before a session has been allocated.
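
To make the described ordering concrete, here is a small sketch (illustrative 
only, not actual VPP code and not a proposed fix): ct_connect() leaves the 
transport's s_index at ~0 until session_open_vc() calls 
session_alloc_for_half_open(), so anything running off the accept RPC would 
have to guard the lookup roughly like this before calling session_get():

#include <vnet/session/session.h>

/* Returns the attached session, or 0 if the half-open transport has no
 * session yet (s_index still SESSION_INVALID_INDEX, i.e. ~0). */
static session_t *
half_open_session_or_null (transport_connection_t *tc)
{
  if (tc->s_index == SESSION_INVALID_INDEX)
    return 0;
  return session_get (tc->s_index, tc->thread_index);
}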

I found that this problem does not exist on VPP 22.10. I checked the patches 
between 22.10 and 23.02 and found “session: move connects to first worker 
(https://gerrit.fd.io/r/c/vpp/+/35713)”, which might be related to this issue, 
but I can't say for sure and I don’t know how to fix it. I would be 
very grateful if you could address this issue.
Thanks,
