On Wed, Jun 29, 2022 at 02:24:35PM +0200, Jan Klemkow wrote:
> Hi,
>
> This diff introduces the sending side of TSO to our TCP/IP stack.
> If the hardware has TSO capabilities tcp_output() will send huge TCP
> segments down the stack to the interface. ip{6}_output() will ignore
> the size is greater then eny MTU in this case.
>
> I tested it with IPv4, IPv6 and VLAN. VLAN sending is not offloaded
> now, because this interface does not inherited the TSO capability.
> I will do this in a later diff.
>
> If you have an ix(4) NIC of 82599 or newer, just enable TSO with
> ifconfig(8):
>
> # ifconfig ix0 tso
>
> Thanks for testing,
> Jan
A few comments below:
> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /mount/openbsd/cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.375
> diff -u -p -r1.375 tcp_input.c
> --- netinet/tcp_input.c 4 Jan 2022 06:32:39 -0000 1.375
> +++ netinet/tcp_input.c 28 Jun 2022 17:12:43 -0000
> @@ -2851,6 +2851,15 @@ tcp_mss(struct tcpcb *tp, int offer)
> mssopt = ifp->if_mtu - iphlen - sizeof(struct tcphdr);
> mssopt = max(tcp_mssdflt, mssopt);
> }
> +
> + if (ISSET(ifp->if_xflags, IFXF_TSO)) {
> + tp->t_flags |= TF_TSO;
> +
> + if (ifp->if_hw_tsomax < MAXMCLBYTES)
> + tp->t_tsomax = ifp->if_hw_tsomax;
> + else
> + tp->t_tsomax = MAXMCLBYTES;
> + }
Why is there a limit on MAXMCLBYTES? I guess the card must support chained
buffers because a 64k mbuf is not linearly mapped.
> out:
> if_put(ifp);
> /*
> Index: sys/mbuf.h
> ===================================================================
> RCS file: /mount/openbsd/cvs/src/sys/sys/mbuf.h,v
> retrieving revision 1.254
> diff -u -p -r1.254 mbuf.h
> --- sys/mbuf.h 14 Feb 2022 04:33:18 -0000 1.254
> +++ sys/mbuf.h 28 Jun 2022 17:29:00 -0000
> @@ -133,6 +133,7 @@ struct pkthdr {
> u_int16_t ph_flowid; /* pseudo unique flow id */
> u_int16_t csum_flags; /* checksum flags */
> u_int16_t ether_vtag; /* Ethernet 802.1p+Q vlan tag */
> + u_int16_t ph_mss; /* max. seg. size. */
> u_int ph_rtableid; /* routing table id */
> u_int ph_ifidx; /* rcv interface index */
> u_int8_t ph_loopcnt; /* mbuf is looping in kernel */
Please move the two u_int fields above the u_int16_t block so that the
pkthdr packs nicely. You add a 5th uint16_t and so that would insert an
extra pad.
> @@ -226,13 +227,14 @@ struct mbuf {
> #define M_IPV6_DF_OUT 0x1000 /* don't fragment outgoing IPv6
> */
> #define M_TIMESTAMP 0x2000 /* ph_timestamp is set */
> #define M_FLOWID 0x4000 /* ph_flowid is set */
> +#define M_TCP_TSO 0x8000 /* TCP Segmentation Offload
> needed */
>
> #ifdef _KERNEL
> #define MCS_BITS \
> ("\20\1IPV4_CSUM_OUT\2TCP_CSUM_OUT\3UDP_CSUM_OUT\4IPV4_CSUM_IN_OK" \
> "\5IPV4_CSUM_IN_BAD\6TCP_CSUM_IN_OK\7TCP_CSUM_IN_BAD\10UDP_CSUM_IN_OK" \
>
> "\11UDP_CSUM_IN_BAD\12ICMP_CSUM_OUT\13ICMP_CSUM_IN_OK\14ICMP_CSUM_IN_BAD" \
> - "\15IPV6_NODF_OUT" "\16TIMESTAMP" "\17FLOWID")
> + "\15IPV6_NODF_OUT" "\16TIMESTAMP" "\17FLOWID" "\20TCP_TSO")
> #endif
>
> /* mbuf types */
>
What does happen when pf reroutes the packets to a non TSO capable
interface? I think this should be done like HW CSUM offloading where the
stack fixies up packets in ip_output() when it realizes that the HW does
not support it. Only then this becomes generally more usable.
Also doing TSO LSO in ip_output would allow to use LRO / LSO in forwarding
and in many more places. It will also improve performance since we can
batch send TCP packets.
--
:wq Claudio