On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:

> Hi,
> 
> bluhm and I did some network performance measurements and kernel
> profiling.
> 
> Setup:        Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> 
> We found that the kernel spends a huge amount of processing time
> sending ACKs to the sender on the receiving interface.  After
> receiving a data segment, we send out two ACKs.  The first one is sent
> in tcp_input() directly after receiving.  The second ACK is sent out
> after userland or the sosplice task has read some data out of the
> socket buffer.
> 
> The first ACK in tcp_input() is sent after receiving every other data
> segment, as described in RFC 1122:
> 
>       4.2.3.2  When to Send an ACK Segment
>               A TCP SHOULD implement a delayed ACK, but an ACK should
>               not be excessively delayed; in particular, the delay
>               MUST be less than 0.5 seconds, and in a stream of
>               full-sized segments there SHOULD be an ACK for at least
>               every second segment.
> 
> This advice is based on the paper "Congestion Avoidance and Control":
> 
>       4 THE GATEWAY SIDE OF CONGESTION CONTROL
>               The 8 KBps senders were talking to 4.3+BSD receivers
>               which would delay an ack for at most one packet (because
>               of an ack's 'clock' role, the authors believe that the
>               minimum ack frequency should be every other packet).
> 
> Sending the first ACK (on every other packet) costs us too much
> processing time.  Thus, we run into a full socket buffer earlier.  The
> first ACK just acknowledges the received data, but does not update the
> window.  The second ACK, caused by the socket buffer reader, both
> acknowledges the data and updates the window.  So the second ACK is
> worth much more for fast packet processing than the first one.
> 
> The performance improvement is about 33% with splicing and 20% without
> splicing:
> 
>                       splicing        relaying
> 
>       current         3.1 GBit/s      2.6 GBit/s
>       w/o first ack   4.1 GBit/s      3.1 GBit/s
> 
> As far as I understand the implementations of other operating systems:
> Linux has implemented a custom TCP_QUICKACK socket option to turn this
> kind of feature on and off.  FreeBSD and NetBSD still depend on it when
> using the New Reno implementation.
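
For reference, a minimal sketch of how an application toggles that
Linux option.  The helper name set_quickack() is made up for
illustration.  TCP_QUICKACK itself is Linux-specific, and according to
tcp(7) the flag is not permanent, so a receiver may have to set it
again later.  On systems without the option the helper simply fails
with ENOPROTOOPT.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: ask the Linux TCP stack to send ACKs
 * immediately (on = 1) or to allow delayed ACKs again (on = 0).
 * Returns 0 on success, -1 with errno set on failure.
 */
int
set_quickack(int fd, int on)
{
#ifdef TCP_QUICKACK
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
#else
	/* Option not available on this system (e.g. OpenBSD). */
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```
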
> 
> The following diff turns off the direct ACK on every other segment.  We
> have been running this diff in production on our own machines at genua
> and on our products for several months now.  We haven't noticed any
> problems, either with interactive network sessions (ssh) or with bulk
> traffic.
> 
> Another solution could be a sysctl(3) or an additional socket option,
> similar to Linux, to control this behavior per socket or system-wide.
> Also, a counter to ACK only every 3rd, 4th... data segment could solve
> the problem.
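
The counter idea from the paragraph above could look roughly like the
sketch below.  It is a standalone model, not kernel code: struct
ack_counter, ack_counter_segment() and the threshold field are made-up
names, and a real implementation would presumably still need the
delayed ACK timer as a fallback.

```c
/*
 * Toy model of "ACK every Nth data segment".
 * All names here are hypothetical; this is not OpenBSD kernel code.
 */
struct ack_counter {
	unsigned int threshold;	/* send an ACK every 'threshold' segments */
	unsigned int pending;	/* data segments received since last ACK */
};

/*
 * Account for one received data segment.
 * Returns 1 if an ACK should be sent now, 0 if it may be delayed.
 */
int
ack_counter_segment(struct ack_counter *ac)
{
	ac->pending++;
	if (ac->pending >= ac->threshold) {
		ac->pending = 0;
		return 1;
	}
	return 0;
}
```

With threshold = 2 this roughly matches the current every-other-segment
behavior; with threshold = 4, eight segments trigger two ACKs.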

I am wondering if you also looked at another scenario: the process
reading the socket is sleeping, so the receive buffer fills up without
any ACKs being sent. Won't that lead to a lot of retransmissions
containing data?

        -Otto

> 
> bye,
> Jan
> 
> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.365
> diff -u -p -r1.365 tcp_input.c
> --- netinet/tcp_input.c       19 Jun 2020 22:47:22 -0000      1.365
> +++ netinet/tcp_input.c       5 Nov 2020 23:00:34 -0000
> @@ -165,8 +165,8 @@ do { \
>  #endif
>  
>  /*
> - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> - * we have already delayed an ACK (must send an ACK every two segments).
> + * Macro to compute ACK transmission behavior.  Delay the ACK until
> + * a read from the socket buffer or the delayed ACK timer causes one.
>   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
>   * option is enabled or when the packet is coming from a loopback
>   * interface.
> @@ -176,8 +176,7 @@ do { \
>       struct ifnet *ifp = NULL; \
>       if (m && (m->m_flags & M_PKTHDR)) \
>               ifp = if_get(m->m_pkthdr.ph_ifidx); \
> -     if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> -         (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> +     if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
>           (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
>               tp->t_flags |= TF_ACKNOW; \
>       else \
> 
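
To make the effect of the diff easier to see, the condition of the
macro can be written as two plain functions.  This is a standalone
sketch with made-up names (ack_now_old(), ack_now_new()) and boolean
parameters standing in for the kernel state.  It only models when
TF_ACKNOW gets set, not the else branch, which is cut off in the diff
context above.

```c
#include <stdbool.h>

/* Condition before the diff: ACK at once if a delayed ACK is pending. */
bool
ack_now_old(bool delack_armed, bool ack_on_push, bool push, bool loopback)
{
	return delack_armed || (ack_on_push && push) || loopback;
}

/* Condition after the diff: a pending delayed ACK no longer forces one. */
bool
ack_now_new(bool delack_armed, bool ack_on_push, bool push, bool loopback)
{
	(void)delack_armed;	/* no longer part of the decision */
	return (ack_on_push && push) || loopback;
}
```

The two functions only differ when the delayed ACK timer is armed and
neither the PUSH case nor the loopback case applies.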
