On Tue, Jan 05, 2021 at 10:30:33AM +0100, Claudio Jeker wrote:
> On Tue, Jan 05, 2021 at 10:16:04AM +0100, Jan Klemkow wrote:
> > On Wed, Dec 23, 2020 at 11:59:13AM +0000, Stuart Henderson wrote:
> > > On 2020/12/17 20:50, Jan Klemkow wrote:
> > > > ping
> > > > 
> > > > On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:
> > > > > bluhm and I made some network performance measurements and did
> > > > > some kernel profiling.
> > > 
> > > I've been running this on my workstation since you sent it out - lots
> > > of long-running ssh connections, hourly reposync, daily rsync of base
> > > snapshots.
> > > 
> > > I don't know enough about TCP stack behaviour to really give a
> > > meaningful OK, but I'm certainly not seeing any problems with it.
> > 
> > Thanks, Stuart.  Has anyone else tested this diff?  Are there any
> > opinions or objections about it?  Even bike-shedding is welcome :-)
> 
> From my memory, TCP uses the ACKs during startup to increase the send
> window, so your diff could slow down the initial startup.  Not sure if
> that actually matters.  It can have some impact if userland reads in
> big blocks at infrequent intervals, since then the ACK clock slows
> down.
> 
> I guess the best way to get coverage is to commit this and then monitor
> the lists for possible slowdowns.
Is there a way to commit this, or to test the diff in snapshots?

bye,
Jan

> > > > > Setup:	Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> > > > > 
> > > > > We found that the kernel uses a huge amount of processing time
> > > > > for sending ACKs back to the sender on the receiving interface.
> > > > > After receiving a data segment, we send up to two ACKs.  The
> > > > > first one is sent in tcp_input() directly after receiving.  The
> > > > > second ACK is sent out after userland or the sosplice task has
> > > > > read some data out of the socket buffer.
> > > > > 
> > > > > The first ACK in tcp_input() is sent after receiving every other
> > > > > data segment, as described in RFC 1122:
> > > > > 
> > > > > 	4.2.3.2  When to Send an ACK Segment
> > > > > 		A TCP SHOULD implement a delayed ACK, but an ACK should
> > > > > 		not be excessively delayed; in particular, the delay
> > > > > 		MUST be less than 0.5 seconds, and in a stream of
> > > > > 		full-sized segments there SHOULD be an ACK for at least
> > > > > 		every second segment.
> > > > > 
> > > > > This advice is based on the paper "Congestion Avoidance and
> > > > > Control":
> > > > > 
> > > > > 	4  THE GATEWAY SIDE OF CONGESTION CONTROL
> > > > > 		The 8 KBps senders were talking to 4.3+BSD receivers
> > > > > 		which would delay an ack for at most one packet (because
> > > > > 		of an ack's 'clock' role, the authors believe that the
> > > > > 		minimum ack frequency should be every other packet).
> > > > > 
> > > > > Sending the first ACK (on every other packet) costs us too much
> > > > > processing time, so we run into a full socket buffer earlier.
> > > > > The first ACK just acknowledges the received data but does not
> > > > > update the window.  The second ACK, caused by the socket buffer
> > > > > reader, also acknowledges the data and additionally updates the
> > > > > window.  So the second ACK is worth much more for fast packet
> > > > > processing than the first one.
> > > > > 
> > > > > The performance improvement is about 33% with splicing and 20%
> > > > > without splicing:
> > > > > 
> > > > > 			splicing	relaying
> > > > > 
> > > > > 	current		3.1 GBit/s	2.6 GBit/s
> > > > > 	w/o first ack	4.1 GBit/s	3.1 GBit/s
> > > > > 
> > > > > As far as I understand the implementations of other operating
> > > > > systems:  Linux has implemented a custom TCP_QUICKACK socket
> > > > > option to turn this kind of feature on and off.  FreeBSD and
> > > > > NetBSD still depend on it when using the New Reno
> > > > > implementation.
> > > > > 
> > > > > The following diff turns off the direct ACK on every other
> > > > > segment.  We have been running this diff in production on our
> > > > > own machines at genua and on our products for several months
> > > > > now.  We have not noticed any problems, neither with interactive
> > > > > network sessions (ssh) nor with bulk traffic.
> > > > > 
> > > > > Another solution could be a sysctl(3) or an additional socket
> > > > > option, similar to Linux, to control this behavior per socket or
> > > > > system wide.  Also, a counter to ACK every 3rd, 4th, ... data
> > > > > segment could beat the problem.
> > > > > 
> > > > > bye,
> > > > > Jan
> > > > > 
> > > > > Index: netinet/tcp_input.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> > > > > retrieving revision 1.365
> > > > > diff -u -p -r1.365 tcp_input.c
> > > > > --- netinet/tcp_input.c	19 Jun 2020 22:47:22 -0000	1.365
> > > > > +++ netinet/tcp_input.c	5 Nov 2020 23:00:34 -0000
> > > > > @@ -165,8 +165,8 @@ do { \
> > > > >  #endif
> > > > >  
> > > > >  /*
> > > > > - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> > > > > - * we have already delayed an ACK (must send an ACK every two segments).
> > > > > + * Macro to compute ACK transmission behavior.  Delay the ACK until
> > > > > + * a read from the socket buffer or the delayed ACK timer causes one.
> > > > >   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
> > > > >   * option is enabled or when the packet is coming from a loopback
> > > > >   * interface.
> > > > > @@ -176,8 +176,7 @@ do { \
> > > > >  	struct ifnet *ifp = NULL; \
> > > > >  	if (m && (m->m_flags & M_PKTHDR)) \
> > > > >  		ifp = if_get(m->m_pkthdr.ph_ifidx); \
> > > > > -	if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> > > > > -	    (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > > > +	if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > > >  	    (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
> > > > >  		tp->t_flags |= TF_ACKNOW; \
> > > > >  	else \
> 
> -- 
> :wq Claudio