Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-29 Thread Florin Coras
Hi Ivan, Inline.
> On Jul 29, 2020, at 9:40 AM, Ivan Shvedunov wrote:
>
> Hi Florin,
>
> while trying to fix the proxy cleanup issue, I've spotted another problem in
> the TCP stack, namely RSTs being ignored in SYN_SENT (half-open) connection
> state:
> https://gerrit.fd.io/r/c/vpp/+/28103

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-29 Thread Ivan Shvedunov
Hi Florin, while trying to fix the proxy cleanup issue, I've spotted another problem in the TCP stack, namely RSTs being ignored in SYN_SENT (half-open) connection state: https://gerrit.fd.io/r/c/vpp/+/28103 The following fix for handling failed active connections in the proxy has worked for me,
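
For readers unfamiliar with the SYN_SENT rule mentioned above, here is a minimal sketch of how the TCP specification (RFC 793, carried over into RFC 9293) treats an incoming RST in that state: the reset is honored only if the segment carries an acceptable ACK (ISS < SEG.ACK <= SND.NXT), otherwise the segment is dropped. The types and the function name `syn_sent_handle_rst` are hypothetical and are not VPP's; this is not the patch from the Gerrit change above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration; not VPP's structures. */
typedef enum { CONN_SYN_SENT, CONN_ESTABLISHED, CONN_CLOSED } conn_state_t;

typedef struct {
  conn_state_t state;
  uint32_t iss;     /* initial send sequence number */
  uint32_t snd_nxt; /* next sequence number to be sent */
} conn_t;

typedef struct {
  bool ack;         /* ACK flag set */
  bool rst;         /* RST flag set */
  uint32_t ack_num; /* SEG.ACK */
} seg_t;

/* Wraparound-safe sequence number comparisons. */
static bool seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* In SYN-SENT an ACK is acceptable when ISS < SEG.ACK <= SND.NXT. */
static bool ack_acceptable(const conn_t *c, const seg_t *s) {
  return seq_lt(c->iss, s->ack_num) && seq_leq(s->ack_num, c->snd_nxt);
}

/* Returns true if the segment reset the half-open connection. */
bool syn_sent_handle_rst(conn_t *c, const seg_t *s) {
  assert(c->state == CONN_SYN_SENT);
  if (!s->rst)
    return false; /* not a reset; other processing would apply */
  if (!s->ack || !ack_acceptable(c, s))
    return false; /* RST without an acceptable ACK is dropped */
  c->state = CONN_CLOSED; /* "connection reset": tear down the half-open state */
  return true;
}
```

In a real stack the closing step would also notify the application that the active open failed, which is the case the proxy has to handle.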

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-28 Thread Florin Coras
Hi Ivan, Inline.
> On Jul 28, 2020, at 8:45 AM, Ivan Shvedunov wrote:
>
> Hi Florin,
> thanks, the fix has worked and http_static no longer crashes.
Perfect, thanks for confirming!
>
> I still get a number of messages like this when using release build:
> /usr/bin/vpp[39]: state_sent_ok:95

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-28 Thread Ivan Shvedunov
Hi Florin, thanks, the fix has worked and http_static no longer crashes. I still get a number of messages like this when using a release build:
/usr/bin/vpp[39]: state_sent_ok:954: BUG: couldn't send response header!
Not sure if it's actually a bug or the queue being actually full because of the pac

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-27 Thread Florin Coras
Hi Ivan, Took a look at the static http server and, as far as I can tell, it has the same type of issue the proxy had, i.e., premature session cleanup/reuse. Does this solve the problem for you [1]? Also, merged your elog fix patch. Thanks! Regards, Florin [1] https://gerrit.fd.io/r/c/vpp/+

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-27 Thread Ivan Shvedunov
Hi. I've debugged the http server issue a bit more and here are my observations: if I add an ASSERT(0) in the place of "No http session for thread 0 session_index 54", I get a stack trace along the lines of
Program received signal SIGABRT, Aborted.
0x7470bf47 in raise () from /lib/x86_64-linux-gn

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-24 Thread Florin Coras
Great! Thanks for confirming! Let me know how it goes with the static http server.
Cheers,
Florin
> On Jul 24, 2020, at 2:00 PM, Ivan Shvedunov wrote:
>
> Hi Florin,
> I re-verified the patches and the modified patch doesn't crash either, so I
> think it's safe to merge it.
> Thanks!
>
> I

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-24 Thread Ivan Shvedunov
Hi Florin, I re-verified the patches and the modified patch doesn't crash either, so I think it's safe to merge it. Thanks! I will try to see what the remaining problem with http_static is.
On Fri, Jul 24, 2020 at 8:15 PM Florin Coras wrote:
> Hi Ivan,
>
> Adding Vanessa to see if she can help w

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-24 Thread Florin Coras
Hi Ivan, Adding Vanessa to see if she can help with the account issues. Thanks a lot for the patches! Pushed them here [1] and [2]. I took the liberty of slightly changing [2], so if you get a chance, do try it out again. Finally, the static http server still needs fixes. Most probably it mi

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-24 Thread Ivan Shvedunov
I did a bit more debugging and found an issue that was causing invalid TCP connection lookups. Basically, if session_connected_callback was failing for an app (in case of proxy, e.g. b/c the other corresponding connection got closed), it was leaving an invalid entry in the session lookup table. Ano
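
The failure mode described above can be illustrated with a toy model: a lookup-table entry is added before the application's connected callback runs, so if the callback fails the entry has to be removed again, or later packets will resolve to a dead session. This is a sketch under stated assumptions, not the actual patch: the table, `notify_connected`, `app_accepts` and `app_rejects` are hypothetical names and do not correspond to VPP's session layer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8
#define INVALID_INDEX UINT32_MAX

/* Hypothetical toy lookup table: connection key -> session index. */
typedef struct {
  uint64_t key;
  uint32_t session_index;
  bool used;
} entry_t;

static entry_t table[TABLE_SIZE];

static entry_t *table_find(uint64_t key) {
  for (size_t i = 0; i < TABLE_SIZE; i++)
    if (table[i].used && table[i].key == key)
      return &table[i];
  return NULL;
}

static bool table_add(uint64_t key, uint32_t session_index) {
  for (size_t i = 0; i < TABLE_SIZE; i++)
    if (!table[i].used) {
      table[i] = (entry_t){key, session_index, true};
      return true;
    }
  return false; /* table full */
}

static void table_del(uint64_t key) {
  entry_t *e = table_find(key);
  if (e)
    e->used = false;
}

uint32_t table_lookup(uint64_t key) {
  entry_t *e = table_find(key);
  return e ? e->session_index : INVALID_INDEX;
}

/* Application callback: returns 0 on success, nonzero on failure. */
typedef int (*connected_cb_t)(uint32_t session_index);

/* Hypothetical example callbacks. */
int app_accepts(uint32_t session_index) { (void)session_index; return 0; }
int app_rejects(uint32_t session_index) { (void)session_index; return -1; }

/* Register the connection, notify the app, and undo the registration if
 * the app rejects it, so no stale entry is left in the lookup table. */
int notify_connected(uint64_t key, uint32_t session_index, connected_cb_t cb) {
  assert(cb != NULL);
  if (!table_add(key, session_index))
    return -1;
  int rv = cb(session_index);
  if (rv != 0)
    table_del(key);
  return rv;
}
```

With `app_rejects` as the callback, a subsequent `table_lookup` for the same key returns `INVALID_INDEX`; without the `table_del` call it would keep returning the index of a session that no longer exists.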

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-23 Thread Florin Coras
Ah, I didn’t try running test.sh 80. The only difference in how I’m running the test is that I start vpp outside of start.sh straight from binaries.
Regards,
Florin
> On Jul 23, 2020, at 8:22 AM, Ivan Shvedunov wrote:
>
> Well, I always run the same test, the difference being only
> "test.sh

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-23 Thread Ivan Shvedunov
Well, I always run the same test, the difference being only "test.sh 80" for http_static (it's configured to be listening on that port) or just "test.sh" for the proxy. As far as I understand, you run the tests without using the containers, does that include setting up netem like this [1] ? [1] ht

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-23 Thread Florin Coras
Hi Ivan, Updated [1] but I’m not seeing [3] after several test iterations. Probably the static server needs the same treatment as the proxy. Are you running a slightly different test? All of the builtin apps have the potential to crash vpp or leave the host stack in an unwanted state since th

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-23 Thread Ivan Shvedunov
http_static produces some errors:
/usr/bin/vpp[40]: http_static_server_rx_tx_callback:1010: No http session for thread 0 session_index 4124
/usr/bin/vpp[40]: http_static_server_rx_tx_callback:1010: No http session for thread 0 session_index 4124
/usr/bin/vpp[40]: tcp_input_dispatch_buffer:2812: tcp

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-23 Thread Ivan Shvedunov
Hi, I've found a problem with the timer fix and commented in Gerrit [1] accordingly. Basically this change [2] makes the tcp_prepare_retransmit_segment() issue go away for me. Concerning the proxy example, I can no longer see the SVM FIFO crashes, but when using debug build, VPP crashes with this

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Florin Coras
Hi Ivan, Thanks for the test. After modifying it a bit to run straight from binaries, I managed to repro the issue. As expected, the proxy is not cleaning up the sessions correctly (example apps do run out of sync ..). Here’s a quick patch that solves some of the obvious issues [1] (note that

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Ivan Shvedunov
Concerning the CI: I'd be glad to add that test to "make test", but not sure how to approach it. The test is not about containers but more about using network namespaces and some tools like wrk to create a lot of TCP connections to do some "stress testing" of VPP host stack (and as it was noted, it

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Florin Coras
I missed the point about the CI in my other reply. If we can somehow integrate some container based tests into the “make test” infra, I wouldn’t mind at all! :-)
Regards,
Florin
> On Jul 22, 2020, at 4:17 AM, Ivan Shvedunov wrote:
>
> Hi,
> sadly the patch apparently didn't work. It should ha

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Florin Coras
Hi Ivan, Will try to reproduce but given the types of crashes, it could be that the proxy app is not cleanly releasing the connections.
Regards,
Florin
> On Jul 22, 2020, at 8:29 AM, Ivan Shvedunov wrote:
>
> Some preliminary observations concerning the crashes in the proxy example:
> * !rb

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Ivan Shvedunov
Some preliminary observations concerning the crashes in the proxy example:
* !rb_tree_is_init(...) assertion failures are likely caused by multiple active_open_connected_callback() invocations for the same connection
* f_update_ooo_deq() SIGSEGV crash is possibly caused by late callbacks for conne

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-22 Thread Ivan Shvedunov
Hi, sadly the patch apparently didn't work. It should have worked but for some reason it didn't ... On the bright side, I've made a test case [1] using fresh upstream VPP code with no UPF that reproduces the issues I mentioned, including both the timer and the TCP retransmit ones along with some other poss

Re: [vpp-dev] TCP timer race and another possible TCP issue

2020-07-16 Thread Florin Coras
Hi Ivan, Thanks for the detailed report! I assume this is a situation where most of the connections time out and the rate limiting we apply on the pending timer queue delays handling for long enough to be in a situation like the one you described. Here’s a draft patch that starts tracking pen

[vpp-dev] TCP timer race and another possible TCP issue

2020-07-16 Thread ivan4th
Hi, I'm working on the Travelping UPF project https://github.com/travelping/vpp For a variety of reasons, it's presently maintained as a fork of UPF that's rebased on top of upstream master from time to time, but really it's just a plugin. During 40K TCP conne