Re: TSO Large Send Offloading for ix(4)

Claudio Jeker Thu, 30 Jun 2022 04:01:27 -0700

On Wed, Jun 29, 2022 at 02:24:35PM +0200, Jan Klemkow wrote:
> Hi,
> 
> This diff introduces the sending side of TSO to our TCP/IP stack.
> If the hardware has TSO capabilities tcp_output() will send huge TCP
> segments down the stack to the interface.  ip{6}_output() will ignore
> the size is greater then eny MTU in this case.
> 
> I tested it with IPv4, IPv6 and VLAN.  VLAN sending is not offloaded
> now, because this interface does not inherited the TSO capability.
> I will do this in a later diff.
> 
> If you have an ix(4) NIC of 82599 or newer, just enable TSO with
> ifconfig(8):
> 
> # ifconfig ix0 tso
> 
> Thanks for testing,
> Jan


A few comments below: 

> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /mount/openbsd/cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.375
> diff -u -p -r1.375 tcp_input.c
> --- netinet/tcp_input.c       4 Jan 2022 06:32:39 -0000       1.375
> +++ netinet/tcp_input.c       28 Jun 2022 17:12:43 -0000
> @@ -2851,6 +2851,15 @@ tcp_mss(struct tcpcb *tp, int offer)
>               mssopt = ifp->if_mtu - iphlen - sizeof(struct tcphdr);
>               mssopt = max(tcp_mssdflt, mssopt);
>       }
> +
> +     if (ISSET(ifp->if_xflags, IFXF_TSO)) {
> +             tp->t_flags |= TF_TSO;
> +
> +             if (ifp->if_hw_tsomax < MAXMCLBYTES)
> +                     tp->t_tsomax = ifp->if_hw_tsomax;
> +             else
> +                     tp->t_tsomax = MAXMCLBYTES;
> +     }

Why is there a limit on MAXMCLBYTES? I guess the card must support chained
buffers because a 64k mbuf is not linearly mapped.

>   out:
>       if_put(ifp);
>       /*

> Index: sys/mbuf.h
> ===================================================================
> RCS file: /mount/openbsd/cvs/src/sys/sys/mbuf.h,v
> retrieving revision 1.254
> diff -u -p -r1.254 mbuf.h
> --- sys/mbuf.h        14 Feb 2022 04:33:18 -0000      1.254
> +++ sys/mbuf.h        28 Jun 2022 17:29:00 -0000
> @@ -133,6 +133,7 @@ struct    pkthdr {
>       u_int16_t                ph_flowid;     /* pseudo unique flow id */
>       u_int16_t                csum_flags;    /* checksum flags */
>       u_int16_t                ether_vtag;    /* Ethernet 802.1p+Q vlan tag */
> +     u_int16_t                ph_mss;        /* max. seg. size. */
>       u_int                    ph_rtableid;   /* routing table id */
>       u_int                    ph_ifidx;      /* rcv interface index */
>       u_int8_t                 ph_loopcnt;    /* mbuf is looping in kernel */

Please move the two u_int fields above the u_int16_t block so that the
pkthdr packs nicely. You add a 5th uint16_t and so that would insert an
extra pad.

> @@ -226,13 +227,14 @@ struct mbuf {
>  #define      M_IPV6_DF_OUT           0x1000  /* don't fragment outgoing IPv6 
> */
>  #define      M_TIMESTAMP             0x2000  /* ph_timestamp is set */
>  #define      M_FLOWID                0x4000  /* ph_flowid is set */
> +#define      M_TCP_TSO               0x8000  /* TCP Segmentation Offload 
> needed */
>  
>  #ifdef _KERNEL
>  #define MCS_BITS \
>      ("\20\1IPV4_CSUM_OUT\2TCP_CSUM_OUT\3UDP_CSUM_OUT\4IPV4_CSUM_IN_OK" \
>      "\5IPV4_CSUM_IN_BAD\6TCP_CSUM_IN_OK\7TCP_CSUM_IN_BAD\10UDP_CSUM_IN_OK" \
>      
> "\11UDP_CSUM_IN_BAD\12ICMP_CSUM_OUT\13ICMP_CSUM_IN_OK\14ICMP_CSUM_IN_BAD" \
> -    "\15IPV6_NODF_OUT" "\16TIMESTAMP" "\17FLOWID")
> +    "\15IPV6_NODF_OUT" "\16TIMESTAMP" "\17FLOWID" "\20TCP_TSO")
>  #endif
>  
>  /* mbuf types */
> 

What does happen when pf reroutes the packets to a non TSO capable
interface? I think this should be done like HW CSUM offloading where the
stack fixies up packets in ip_output() when it realizes that the HW does
not support it. Only then this becomes generally more usable.
Also doing TSO LSO in ip_output would allow to use LRO / LSO in forwarding
and in many more places. It will also improve performance since we can
batch send TCP packets.

-- 
:wq Claudio

Re: TSO Large Send Offloading for ix(4)

Reply via email to