Hi, 

Okay, resetting of half-opens definitely not supported. I updated the patch to 
just clean them up on forced reset, without sending a reset to make sure 
session lookup table cleanup still happens. 

Regards,
Florin

> On Mar 20, 2023, at 9:13 PM, Zhang Dongya <fortitude.zh...@gmail.com> wrote:
> 
> Hi,
> 
> After review my code, I found that I have add a flag to the vnet_disconnect 
> API which will call session_reset instead of session_close, the reason I do 
> this is to make intermediate firewall just flush the state and reconstruct if 
> I later reconnect.
> 
> It seems in session_reset logic, for half open session, it also missing to 
> remove the session from the lookup hash which may cause the issue too.
> 
> I change my code and will test with your patch along, will provide feedback 
> later.
> 
> I also noticed the bihash issue discussed in the list recently, I will merge 
> later.
> 
> Florin Coras <fcoras.li...@gmail.com <mailto:fcoras.li...@gmail.com>> 
> 于2023年3月21日周二 11:56写道:
>> Hi, 
>> 
>> That last thing is pretty interesting. It’s either the issue fixed by this 
>> patch [1] or sessions are somehow cleaned up multiple times. If it’s the 
>> latter, I’d really like to understand how that happens. 
>> 
>> Regards,
>> Florin
>> 
>> [1] https://gerrit.fd.io/r/c/vpp/+/38507 
>> 
>>> On Mar 20, 2023, at 6:52 PM, Zhang Dongya <fortitude.zh...@gmail.com 
>>> <mailto:fortitude.zh...@gmail.com>> wrote:
>>> 
>>> Hi,
>>> 
>>> After merge this patch and update the test environment, the issue still 
>>> persists.
>>> 
>>> Let me clear my client app config:
>>> 1. register a reset callback, which will call vnet_disconnect there and 
>>> also trigger reconnect by send event to the ctrl process.)
>>> 2. register a connected callback, which will handle connect err by trigger 
>>> reconnect, on success, it will record session handle and extract tcp 
>>> sequence for our app usage.
>>> 3. register a disconnect callback, which basically do same as reset 
>>> callback.
>>> 4. register a cleanup callback and accept callback, which basically make 
>>> the session layer happy without actually relevant work to do.
>>> 
>>> There is a ctrl process in mater, which will handle periodically reconnect 
>>> or triggered by event.
>>> 
>>> BTW, I also see frequently warning 'session %u hash delete rv -3' in 
>>> session_delete in my environment, hope this helps to investigate.
>>> 
>>> Florin Coras <fcoras.li...@gmail.com <mailto:fcoras.li...@gmail.com>> 
>>> 于2023年3月20日周一 23:29写道:
>>>> Hi, 
>>>> 
>>>> Understood and yes, connect will synchronously fail if port is not 
>>>> available, so you should be able to retry it later. 
>>>> 
>>>> Regards, 
>>>> Florin
>>>> 
>>>>> On Mar 20, 2023, at 1:58 AM, Zhang Dongya <fortitude.zh...@gmail.com 
>>>>> <mailto:fortitude.zh...@gmail.com>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> It seems the issue occurs when there are disconnect called because our 
>>>>> network can't guarantee a tcp can't be reset even when 3 ways handshake 
>>>>> is completed (firewall issue :( ).
>>>>> 
>>>>> When we find the app layer timeout, we will first disconnect (because we 
>>>>> record the session handle, this session might be a half open session), 
>>>>> does vnet session layer guarantee that if we reconnect from master thread 
>>>>> when the half open session still not be released yet (due to asynchronous 
>>>>> logic) that the reconnect fail? if then we can retry connect later.
>>>>> 
>>>>> I prefer to not registered half open callback because I think it make app 
>>>>> complicated from a TCP programming prospective.
>>>>> 
>>>>> For your patch, I think it should be work because I can't delete the half 
>>>>> open session immediately because there is worker configured, so the half 
>>>>> open will be removed from bihash when syn retrans timeout. I have merged 
>>>>> the patch and will provide feedback later.
>>>>> 
>>>>> Florin Coras <fcoras.li...@gmail.com <mailto:fcoras.li...@gmail.com>> 
>>>>> 于2023年3月20日周一 13:09写道:
>>>>>> Hi, 
>>>>>> 
>>>>>> Inline.
>>>>>> 
>>>>>>> On Mar 19, 2023, at 6:47 PM, Zhang Dongya <fortitude.zh...@gmail.com 
>>>>>>> <mailto:fortitude.zh...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> It can be aborted both in established state or half open state because 
>>>>>>> I will do timeout in our app layer. 
>>>>>> 
>>>>>> [fc] Okay! Is the issue present irrespective of the state of the session 
>>>>>> or does it happen only after a disconnect in hanf-open state? More 
>>>>>> lower. 
>>>>>> 
>>>>>>> 
>>>>>>> Regarding your question,
>>>>>>> 
>>>>>>> - Yes we add a builtin in app relys on C apis that  mainly use 
>>>>>>> vnet_connect/disconnect to connect or disconnect session.
>>>>>> 
>>>>>> [fc] Understood
>>>>>> 
>>>>>>> - We call these api in a vpp ctrl process which should be running on 
>>>>>>> the master thread, we never do session setup/teardown on worker thread. 
>>>>>>> (the environment that found this issue is configured with 1 master + 1 
>>>>>>> worker setup.)
>>>>>> 
>>>>>> [fc] With vpp latest it’s possible to connect from first workers. It’s 
>>>>>> an optimization meant to avoid 1) worker barrier on syns and 2) entering 
>>>>>> poll mode on main (consume less cpu)
>>>>>> 
>>>>>>> - We started to develop the app using 22.06 and I keep to merge 
>>>>>>> upstream changes to latest vpp by cherry-picking. The reason for line 
>>>>>>> mismatch is that I added some comment to the session layer code, it 
>>>>>>> should be equal to the master branch now.
>>>>>> 
>>>>>> [fc] Ack
>>>>>> 
>>>>>>> 
>>>>>>> When reading the code I understand that we mainly want to cleanup half 
>>>>>>> open from bihash in session_stream_connect_notify, however, in syn-sent 
>>>>>>> state if I choose to close the session, the session might be closed by 
>>>>>>> my app due to session setup timeout (in second scale), in that case, 
>>>>>>> session will be marked as half_open_done and half open session will be 
>>>>>>> freed shortly in the ctrl thread (the 1st worker?).
>>>>>> 
>>>>>> [fc] Actually, this might be the issue. We did start to provide a 
>>>>>> half-open session handle to apps which if closed does clean up the 
>>>>>> session but apparently it is missing the cleanup of the session lookup 
>>>>>> table. Could you try this patch [1]? It might need additional work.
>>>>>> 
>>>>>> Having said that, forcing a close/cleanup will not free the port 
>>>>>> synchronously. So, if you’re using fixed ports, you’ll have to wait for 
>>>>>> the half-open cleanup notification.
>>>>>> 
>>>>>>> 
>>>>>>> Should I also registered half open callback or there are some other 
>>>>>>> reason that lead to this failure?
>>>>>>> 
>>>>>> 
>>>>>> [fc] Yes, see above.
>>>>>> 
>>>>>> Regards, 
>>>>>> Florin
>>>>>> 
>>>>>> [1] https://gerrit.fd.io/r/c/vpp/+/38526
>>>>>> 
>>>>>>> 
>>>>>>> Florin Coras <fcoras.li...@gmail.com <mailto:fcoras.li...@gmail.com>> 
>>>>>>> 于2023年3月20日周一 06:22写道:
>>>>>>>> Hi, 
>>>>>>>> 
>>>>>>>> When you abort the connection, is it fully established or half-open? 
>>>>>>>> Half-opens are cleaned up by the owner thread after a timeout, but the 
>>>>>>>> 5-tuple should be assigned to the fully established session by that 
>>>>>>>> point. tcp_half_open_connection_cleanup does not cleanup the bihash 
>>>>>>>> instead session_stream_connect_notify does once tcp connect returns 
>>>>>>>> either success or failure. 
>>>>>>>> 
>>>>>>>> So a few questions:
>>>>>>>> - is it accurate to assume you have a builtin vpp app and rely only on 
>>>>>>>> C apis to interact with host stack?
>>>>>>>> - on what thread (main or first worker) do you call vnet_connect?
>>>>>>>> - what api do you use to close the session? 
>>>>>>>> - what version of vpp is this because lines don’t match vpp latest?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Florin
>>>>>>>> 
>>>>>>>> > On Mar 19, 2023, at 2:08 AM, Zhang Dongya <fortitude.zh...@gmail.com 
>>>>>>>> > <mailto:fortitude.zh...@gmail.com>> wrote:
>>>>>>>> > 
>>>>>>>> > Hi list,
>>>>>>>> > 
>>>>>>>> > recently in our application, we constantly triggered such abrt issue 
>>>>>>>> > which make our connectivity interrupt for a while:
>>>>>>>> > 
>>>>>>>> > Mar 19 16:11:26 ubuntu vnet[2565933]: received signal SIGABRT, PC 
>>>>>>>> > 0x7fefd3b2000b
>>>>>>>> > Mar 19 16:11:26 ubuntu vnet[2565933]: 
>>>>>>>> > /home/fortitude/glx/vpp/src/vnet/tcp/tcp_input.c:3004 
>>>>>>>> > (tcp46_input_inline) assertion `tcp_lookup_is_valid (tc0, b[0], 
>>>>>>>> > tcp_buffer_hdr (b[0]))' fails
>>>>>>>> > 
>>>>>>>> > Our scenario is quite simple, we will make 4 parallel tcp connection 
>>>>>>>> > (use 4 fixed source ports) to a remote vpp stack (fixed ip and 
>>>>>>>> > port), and will do some keepalive in our application layer, since we 
>>>>>>>> > only use the vpp tcp stack to make the middle box happy with the 
>>>>>>>> > connection, we do not use the data transport of tcp statck actually.
>>>>>>>> > 
>>>>>>>> > However, since the network condition is complex, we have to  always 
>>>>>>>> > need to abrt the connection and reconnect.
>>>>>>>> > 
>>>>>>>> > I keep to merge upstream session and tcp fix however the issue still 
>>>>>>>> > not fixed, what I found now it may be in some case 
>>>>>>>> > tcp_half_open_connection_cleanup may not deleted the half open 
>>>>>>>> > session from the lookup table (bihash) and the session index is 
>>>>>>>> > realloced by other connection.
>>>>>>>> > 
>>>>>>>> > Hope the list can provide some hint about how to overcome this 
>>>>>>>> > issue, thanks a lot.
>>>>>>>> > 
>>>>>>>> > 
>>>>>>>> > 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22741): https://lists.fd.io/g/vpp-dev/message/22741
Mute This Topic: https://lists.fd.io/mt/97707823/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to