On 07/05/18 17:14, Rick Macklem wrote:
Andrew Gallatin wrote:
On 7/4/18 9:20 PM, Rodney W. Grimes wrote:
[stuff snipped]

It is using a magic constant twice, where one has a
derived value that is dependent on the value of the other.
That is bad and error prone and does not document that
one depends on the other.  Please fix this.  Or at least
make 65536 a #define so that it only needs to be changed in
one place and clearly shows the interdependence of these
values.
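
A minimal sketch of what such a #define could look like. The EX_* names
and the struct are hypothetical, and the derived value is assumed to be
the 64K limit minus the Ethernet and VLAN header lengths, which is a
common driver idiom but is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not taken from the driver under discussion. */
#define EX_TSO_MAX_BYTES   65536u  /* the 64K limit, spelled out once */
#define EX_ETHER_HDR_LEN   14u     /* Ethernet header length */
#define EX_VLAN_ENCAP_LEN  4u      /* 802.1Q tag length */

/* Derived value: the dependence on the 64K limit is now explicit. */
#define EX_TSO_MAX_PAYLOAD \
    (EX_TSO_MAX_BYTES - (EX_ETHER_HDR_LEN + EX_VLAN_ENCAP_LEN))

static_assert(EX_TSO_MAX_PAYLOAD < EX_TSO_MAX_BYTES,
    "derived limit must stay below the base limit");

/* Stand-in for the two ifnet fields named in the thread. */
struct ex_tso_limits {
    uint32_t tsomax;        /* plays the role of if_hw_tsomax */
    uint32_t tsomaxsegsize; /* plays the role of if_hw_tsomaxsegsize */
};

static struct ex_tso_limits
ex_tso_limits_init(void)
{
    struct ex_tso_limits l;

    l.tsomax = EX_TSO_MAX_PAYLOAD;
    l.tsomaxsegsize = EX_TSO_MAX_BYTES;
    return (l);
}
```

With this layout, changing the limit means editing EX_TSO_MAX_BYTES only;
both fields follow from it.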

To me, 65536 is one of the few cases where the magic number is
more meaningful than a name.  But fine, if you feel that
strongly about it, I'll change it for you.

Btw, in general, if_hw_tsomax and if_hw_tsomaxsegsize are not
related or the same value. It just happens that they both appear
to be related to 64K in this case. (I believe this is fairly common,
since the original Microsoft "standard" used 64K as a limit, since
it was stored in 16bits.)

Yes; exactly.

if_hw_tsomax is the maximum size of the entire TSO segment,
including MAC level headers (commonly 64K, due to Microsoft...
but could be larger if the hardware guys chose to do so).

Given that we do TSO like Linux, and not like MS (meaning
we express the size of the pre-segmented packet using the
16-bit value in the IPv4/IPv6 header), supporting more
than 64K is not possible in FreeBSD, so I'm basically
saying "nerf this constraint".

MS Windows does it better / differently; they express the
size of the pre-segmented packet in packet metadata,
leaving ip->ip_len = 0.  This is better, since
then the pseudo hdr checksum in the template header can be
re-used (with the len added) for every segment by the NIC.
If you've ever seen a driver set ip->ip_len = 0, and re-calc
the pseudo-hdr checksum, that's why.   This is also why
MS LSOv2 can support TSO of packets larger than 64K, since they're
not constrained by the 16-bit value in the IP{4,6} header.
The value of TSO larger than 64K is questionable at best though.
Without pacing, you'd just get more packets dropped when
talking across the Internet.
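
The two points above (the 16-bit length limit, and re-using a pseudo-hdr
checksum that leaves the length out) can be sketched roughly as follows.
This is an illustration only: the ex_* helpers are hypothetical, IPv4 is
assumed, and all values are kept in host byte order for simplicity, which
is not what a real driver or NIC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A length can be carried in a 16-bit header field only up to 65535. */
static int
ex_fits_ip_len(size_t len)
{
    return (len <= UINT16_MAX);
}

/* Narrow a length to 16 bits, insisting that it actually fits. */
static uint16_t
ex_ip_len_for(size_t len)
{
    assert(ex_fits_ip_len(len));
    return ((uint16_t)len);
}

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t
ex_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)sum);
}

/* Pseudo-header sum over src, dst and proto, with the length left out. */
static uint16_t
ex_pseudo_nolen(uint32_t src, uint32_t dst, uint8_t proto)
{
    return (ex_fold((src >> 16) + (src & 0xffff) +
        (dst >> 16) + (dst & 0xffff) + proto));
}

/* Per segment: add that segment's length to the shared partial sum. */
static uint16_t
ex_pseudo_with_len(uint16_t nolen, uint16_t seglen)
{
    return (ex_fold((uint32_t)nolen + seglen));
}

/* Reference: the same sum computed in one pass, length included. */
static uint16_t
ex_pseudo_full(uint32_t src, uint32_t dst, uint8_t proto, uint16_t seglen)
{
    return (ex_fold((src >> 16) + (src & 0xffff) +
        (dst >> 16) + (dst & 0xffff) + proto + seglen));
}
```

Because ones-complement addition is associative, adding the length last
gives the same result as including it up front, so the length-free sum
can be computed once and shared by every segment.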

if_hw_tsomaxsegsize is the maximum size of contiguous memory
that a "chunk" of the TSO segment can be stored in for handling by
the driver's transmit side. Since higher

And this is what I object to.  TCP should not care about
this.  Drivers should use busdma, or otherwise be capable of
chopping large contig regions down to chunks that they can
handle.   If a driver can really only handle 2K, then it should
be having busdma give it an s/g list that is 2x as long, not having
TCP call m_dupcl() 2x as often on page-sized data generated by
sendfile (or more on non-x86 with larger pages).
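
A rough sketch of the chopping described above, done on the driver side
rather than in TCP. The ex_* names are hypothetical and this is not the
busdma implementation; in a real driver the limit would normally be given
to busdma as the maxsegsz argument of bus_dma_tag_create(), and busdma
would build the s/g list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One scatter/gather entry: a bus address and a length. */
struct ex_sg_entry {
    uint64_t addr;
    uint32_t len;
};

/*
 * Split one contiguous region into entries of at most maxseg bytes.
 * Returns the number of entries written, or 0 if maxseg is 0 or the
 * region needs more than sg_cap entries.
 */
static size_t
ex_split_region(uint64_t addr, size_t len, uint32_t maxseg,
    struct ex_sg_entry *sg, size_t sg_cap)
{
    size_t n = 0;

    assert(sg != NULL);
    if (maxseg == 0)
        return (0);
    while (len > 0) {
        uint32_t chunk;

        if (n == sg_cap)
            return (0);
        chunk = (len < maxseg) ? (uint32_t)len : maxseg;
        sg[n].addr = addr;
        sg[n].len = chunk;
        addr += chunk;
        len -= chunk;
        n++;
    }
    return (n);
}
```

For a 4K page and a 2K hardware limit this yields two entries, i.e. an
s/g list twice as long, with nothing above the driver needing to know
about the limit.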

level code such as NFS (and iSCSI, I think?) uses MCLBYTES clusters,
anything 2K or higher normally works the same.  Not sure about
sosend(), but I think it also copies the data into MCLBYTES clusters?
This would change if someday jumbo mbuf clusters become the norm.
(I tried changing the NFS code to use jumbo clusters, but it would
  result in fragmentation of the memory used for mbuf cluster allocation,
  so I never committed it.)


At least for sendfile(), vm pages are wrapped up and attached to
mbufs, so you have 4K (and potentially much more on non-x86).
Doesn't NFS do something similar when sending data, or do you copy
into clusters?

I have changes which I have not upstreamed yet which enhance mbufs to
carry TLS metadata and a vector of physical addresses (which I call
unmapped mbufs) for sendfile and kernel TLS.  As part of that,
sosend (for kTLS) can allocate many pages and attach them to one mbuf.
The idea (for kTLS) is that you can keep an entire TLS record (with
framing information) in a single unmapped mbuf, which saves a
huge amount of CPU which would be lost to cache misses doing
pointer-chasing of really long mbuf chains (TLS hdrs and trailers
are generally 13 and 16 bytes).  The goal was to regain CPU
during Netflix's transition to https streaming.  However, it
is unintentionally quite helpful on i386, since it reduces
overhead from having to map/unmap sf_bufs. FWIW, these mbufs
have been in production at Netflix for over a year, and carry
a large fraction of the world's internet traffic :)


rick
ps: And I'll admit I don't find 65536 very magic;-)


:)

Drew