On 30.10.2012 03:25, YongHyeon PYUN wrote:
On Mon, Oct 29, 2012 at 09:20:59AM +0100, Andre Oppermann wrote:
On 29.10.2012 22:40, YongHyeon PYUN wrote:
On Mon, Oct 29, 2012 at 09:21:00AM +0400, Gleb Smirnoff wrote:
On Mon, Oct 29, 2012 at 01:41:04PM -0700, YongHyeon PYUN wrote:
Y> On Sun, Oct 28, 2012 at 02:01:37AM +0400, Gleb Smirnoff wrote:
Y> > On Sat, Oct 27, 2012 at 12:58:52PM +0200, Andre Oppermann wrote:
Y> > A> On 26.10.2012 23:06, Gleb Smirnoff wrote:
Y> > A> > Author: glebius
Y> > A> > Date: Fri Oct 26 21:06:33 2012
Y> > A> > New Revision: 242161
Y> > A> > URL: http://svn.freebsd.org/changeset/base/242161
Y> > A> >
Y> > A> > Log:
Y> > A> >    o Remove last argument to ip_fragment(), and obtain all needed information
Y> > A> >      on checksums directly from mbuf flags. This simplifies code.
Y> > A> >    o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
Y>
Y> I'm not sure whether ti(4)'s checksum offloading for IP-fragmented
Y> packets (CSUM_IP_FRAGS) still works after this change. ti(4)
Y> requires CSUM_IP to be set for IP-fragmented packets. Not sure
Y> whether it's a bug or not. I have a ti(4) controller but I don't
Y> remember where it is, and I don't have a link
Y> partner (1000baseSX) to test it. :-(

ti(4) declares both CSUM_IP and CSUM_IP_FRAGS, so ip_fragment() won't do

Because it supports both CSUM_IP and CSUM_IP_FRAGS. ti(4) is probably
the only controller that supports TCP/UDP checksum offloading for
IP-fragmented packets.

This is a bit weird if it doesn't do the fragmentation itself.
Computing the IP header checksum doesn't differ for normal and
fragmented packets.  The protocol checksum (TCP or UDP) stays
the same in the case of IP-level fragmentation.  It is only
visible in the first fragment, which includes the protocol header.

My interpretation of CSUM_IP_FRAGS is as follows.
  - Only the pseudo-header checksum for TCP/UDP is computed by the
    upper stack.
  - The controller cannot fragment the packet, so fragmentation has
    to be done in the upper stack (i.e. ip_output()).
  - When ip_output() has to fragment the packet, it just fragments
    the packet without completing the TCP/UDP and IP checksums. If the
    controller does not support the CSUM_IP_FRAGS feature, ip_output()
    can't defer the TCP/UDP checksum at this stage.
  - The fragmented packets are sent to the driver. The driver sets the
    appropriate bits of the DMA descriptor based on the fragmentation
    flags of the mbuf (M_FRAG, M_LASTFRAG) and issues the frame to the
    controller.
  - The controller's firmware queues the fragmented frames in its
    internal memory and holds off sending them, since it has to compute
    the TCP/UDP checksum. When it sees the frame that marks the end of
    the fragmented packet, it finally computes the TCP/UDP checksum and
    sends each frame out to the wire, computing the IP checksum on the
    fly.
The difference is which one (upper stack vs. controller) computes
the TCP/UDP/IP checksums.
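The firmware step in this interpretation can be modelled in userland C to show why the controller must hold frames back: the TCP/UDP checksum sits in the first fragment but covers the payload of every fragment. This is a sketch under stated assumptions, not ti(4) firmware. `struct demo_fw` and `demo_fw_enqueue()` are hypothetical, the `last` argument stands in for a descriptor bit derived from M_LASTFRAG, and the checksum is the RFC 1071 Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add bytes to a 32-bit ones'-complement accumulator (RFC 1071).  Chunks
 * must have even length except the final one; IP fragment payloads other
 * than the last are multiples of 8 bytes, so this holds for fragments.
 * A datagram is at most 65535 bytes, so the accumulator cannot overflow.
 */
static uint32_t
cksum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad odd trailing byte */
	return (sum);
}

/* Fold the carries back in and complement to get the final checksum. */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Hypothetical firmware state: fragments are held until the last one. */
struct demo_fw {
	uint32_t sum;		/* running TCP/UDP checksum accumulator */
	size_t	 nqueued;	/* fragments queued, not yet on the wire */
};

/*
 * Queue one fragment's IP payload.  Returns -1 while fragments are still
 * held back.  When the fragment marked `last` arrives, returns the TCP/UDP
 * checksum the firmware would patch into the first fragment before
 * computing each IP header checksum and transmitting all queued frames.
 * Assumes the stack already seeded the L4 checksum field with the
 * pseudo-header sum, so summing the payload bytes covers it.
 */
static int32_t
demo_fw_enqueue(struct demo_fw *fw, const uint8_t *payload, size_t len,
    int last)
{
	uint16_t ck;

	assert(fw != NULL && payload != NULL);
	fw->sum = cksum_add(fw->sum, payload, len);
	fw->nqueued++;
	if (!last)
		return (-1);
	ck = cksum_fold(fw->sum);
	fw->sum = 0;
	fw->nqueued = 0;
	return (ck);
}
```

Because non-final fragments carry a multiple of 8 bytes, summing fragment by fragment gives the same result as summing the unfragmented payload in one pass.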

Such behavior doesn't make much sense and probably wasn't used at all
in practice.  It's very complex as well.  Plus, in an SMP world you can't
guarantee that other packets won't slip into the interface queue between
the fragments.

IP fragmentation really isn't done for TCP within the kernel.  We try
to prevent it as it would have a huge performance impact. Hence the
internal MTU discovery and the Don't Fragment bit set on TCP packets.
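As a worked example of why TCP segments normally fit the link without IP fragmentation, the segment size is derived from the MTU so that headers plus payload fit in one frame. A minimal sketch, assuming a 20-byte IPv4 header and a 20-byte TCP header with no options; `tcp_mss_for_mtu()` is a hypothetical helper, not the kernel's own MSS code.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_HDR_MIN	20	/* IPv4 header without options */
#define TCP_HDR_MIN	20	/* TCP header without options */

/*
 * Largest TCP payload that fits in one frame of the given MTU without
 * IP fragmentation, assuming neither IP nor TCP options are present.
 */
static uint32_t
tcp_mss_for_mtu(uint32_t mtu)
{
	assert(mtu > IPV4_HDR_MIN + TCP_HDR_MIN);
	return (mtu - IPV4_HDR_MIN - TCP_HDR_MIN);
}
```

For a 1500-byte Ethernet MTU this gives 1460 bytes; TCP options such as timestamps (12 bytes with padding) reduce it further, to 1448.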

IP fragmentation does happen for large, locally generated UDP packets.
There, however, the past absence of UDP fragmentation offload, coupled
with UDP checksum offloading, caused all fragmentation to be done at the
UDP level before the packet hits ip_output.

The remaining use of IP fragmentation is when the machine is acting
as a router and it has to send packets out on an interface with a
smaller MTU than the one it came in on.
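Both the local UDP case and the router case rely on the same IPv4 fragmentation arithmetic (RFC 791): the fragment offset field counts 8-byte units, so every fragment except the last carries a multiple of 8 payload bytes. A minimal sketch, assuming a 20-byte IPv4 header without options; `ipv4_frag_sizes()` is a hypothetical helper, not ip_fragment() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN	20	/* assume no IP options */

/*
 * Compute the IP payload size of each fragment for a datagram carrying
 * ip_payload bytes on a link with the given MTU.  Returns the number of
 * fragments written to sizes[], or 0 if sizes[] is too small.
 */
static size_t
ipv4_frag_sizes(uint32_t ip_payload, uint32_t mtu, uint32_t *sizes,
    size_t max)
{
	assert(mtu >= IPV4_HDR_LEN + 8);
	/* Non-final fragments must carry a multiple of 8 bytes. */
	uint32_t per_frag = (mtu - IPV4_HDR_LEN) & ~7u;
	size_t n = 0;

	while (ip_payload > 0) {
		uint32_t chunk = ip_payload < per_frag ? ip_payload : per_frag;

		if (n == max)
			return (0);
		sizes[n++] = chunk;
		ip_payload -= chunk;
	}
	return (n);
}
```

For example, an 8-byte UDP header plus 4000 bytes of data (4008 bytes of IP payload) on a 1500-byte MTU splits into 1480 + 1480 + 1048 bytes, at fragment offsets 0, 185 and 370 (in 8-byte units).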

So the only two useful features regarding UDP+IP fragmentation are:

 1. IP fragmentation including UDP checksum calculation for locally
    generated large UDP packets.  This is the TSO for UDP.

 2. Pure IP fragmentation for in-transit packets.  Here only the
    IP header checksum needs to be recalculated for each fragment.
    The layer 4 checksums (UDP, TCP and others) stay the same.
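The second case can be illustrated by rewriting the IPv4 header for each fragment: relative to the header being forwarded, only the total length, the MF flag and fragment offset, and the header checksum change, while the UDP/TCP checksum inside the payload is left untouched. A minimal sketch, assuming a 20-byte header without options on an original packet that was not itself a fragment and does not have DF set; `ipv4_frag_header()` and `in_cksum_buf()` are hypothetical helpers, the latter a plain RFC 1071 checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF_BIT	0x2000u	/* "more fragments" flag in bytes 6-7 */

/* RFC 1071 Internet checksum over an even-length buffer. */
static uint16_t
in_cksum_buf(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Store a 16-bit value in network byte order. */
static void
put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Rewrite a copy of the original 20-byte IPv4 header for one fragment.
 * The layer 4 header and its checksum live in the payload and are not
 * touched here.
 */
static void
ipv4_frag_header(uint8_t hdr[20], uint16_t frag_payload_len,
    uint16_t offset_bytes, int more_fragments)
{
	uint16_t off;

	assert(offset_bytes % 8 == 0);	/* offset is in 8-byte units */
	off = (uint16_t)(offset_bytes / 8);
	if (more_fragments)
		off |= IP_MF_BIT;
	put16(hdr + 2, (uint16_t)(20 + frag_payload_len));	/* total length */
	put16(hdr + 6, off);			/* flags + fragment offset */
	put16(hdr + 10, 0);			/* zero before recomputing */
	put16(hdr + 10, in_cksum_buf(hdr, 20));	/* new header checksum */
}
```

A receiver verifies each fragment header by summing all 20 bytes, checksum field included; a correct header yields 0 from `in_cksum_buf()`.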

--
Andre


software checksums, and thus won't clear these flags.

Potentially, a driver that announces one flag in if_hwassist but relies on
a couple of flags being set on the mbuf is not correct. If a driver can't
do a single kind of checksum processing independently of the others, then
it should set or clear the appropriate flags in if_hwassist as a group.
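One way to read this advice: a driver whose hardware can only offload a set of checksums together toggles them in if_hwassist as a unit, so the stack never sees partial support. The sketch uses hypothetical `DEMO_CSUM_*` values that are not taken from FreeBSD's <sys/mbuf.h>, and `demo_fragment_csum_flags()` is only a simplified model of the rule in the r242161 log as intended (compute the IP header checksum in software and clear CSUM_IP when the interface does not offload it), not the actual ip_fragment() code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical flag values for illustration only; they are not taken
 * from FreeBSD's <sys/mbuf.h>.
 */
#define DEMO_CSUM_IP		0x01u
#define DEMO_CSUM_TCP		0x02u
#define DEMO_CSUM_UDP		0x04u
#define DEMO_CSUM_IP_FRAGS	0x08u

/* Offloads a hypothetical controller can only perform together. */
#define DEMO_FRAG_GROUP		(DEMO_CSUM_IP | DEMO_CSUM_IP_FRAGS)

/*
 * Set or clear an interdependent group of offloads in if_hwassist in one
 * step, so the driver never advertises only part of what it relies on.
 */
static uint32_t
demo_hwassist_set(uint32_t hwassist, uint32_t group, int enable)
{
	assert(group != 0);
	return (enable ? (hwassist | group) : (hwassist & ~group));
}

/*
 * Simplified model: if the interface does not offload CSUM_IP, the IP
 * header checksum is computed in software and CSUM_IP is cleared from
 * the mbuf's csum_flags.  Returns the new csum_flags.
 */
static uint32_t
demo_fragment_csum_flags(uint32_t hwassist, uint32_t csum_flags)
{
	if ((csum_flags & DEMO_CSUM_IP) != 0 &&
	    (hwassist & DEMO_CSUM_IP) == 0) {
		/* ...software IP header checksum would be computed here... */
		csum_flags &= ~DEMO_CSUM_IP;
	}
	return (csum_flags);
}
```

With the whole group enabled, CSUM_IP survives on the mbuf; with the group disabled, it is cleared, and the driver never sees one flag without the other.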

Hmm, then what would be the best way to support CSUM_IP_FRAGS in a
driver? I don't have a clear idea how to utilize the hardware
feature. The stack should indicate that the mbuf needs TCP/UDP checksum
offloading for an IP-fragmented packet (i.e. CSUM_IP_FRAGS is not set by
the upper stack).

As I said there can't be fragment checksumming without hardware

It's up to the controller's firmware. It does not send the fragmented
frames until it has computed the TCP/UDP checksum.

based fragmentation.  We have three cases here:

  1. TSO, where the hardware does the segmentation and computes the TCP
     and IP header checksums for each generated packet.
  2. IP packet fragmentation, where a packet is split and the IP header
     checksum is recomputed for each fragment, but the protocol csum
     stays the same and is not modified.
  3. UDP fragmentation, where a large packet is sent to the hardware,
     which first generates the UDP checksum and then splits the packet
     into IP fragments, each with its own IP header checksum.

So we end up with these possible large send hardware offload capabilities:
  TSO: including IPv4hdr and TCP checksumming
  UDP fragmentation: including IPv4hdr and UDP checksumming
  IP fragmentation: including IPv4hdr checksumming

Besides that, we have the offload capabilities for packets of at most MTU size:
  TCP checksumming
  UDP checksumming
  SCTP checksumming
  IPv4hdr checksumming
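The large-send taxonomy above could be expressed as a capability bitmask that the stack consults before deciding whether to split an oversized packet itself. All names here (`CAP_*`, `enum pkt_kind`, `demo_hw_can_split()`) are hypothetical and do not correspond to FreeBSD's CSUM_* or IFCAP_* flags; the sketch only encodes the three large-send cases listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits mirroring the lists above. */
#define CAP_LSO_TCP	(1u << 0)	/* TSO: IPv4 hdr + TCP checksums */
#define CAP_LSO_UDP	(1u << 1)	/* UDP frag: IPv4 hdr + UDP checksums */
#define CAP_LSO_IP	(1u << 2)	/* IP frag: IPv4 hdr checksum only */
#define CAP_CSUM_TCP	(1u << 3)	/* packets <= MTU */
#define CAP_CSUM_UDP	(1u << 4)
#define CAP_CSUM_SCTP	(1u << 5)
#define CAP_CSUM_IPHDR	(1u << 6)

enum pkt_kind { PKT_LOCAL_TCP, PKT_LOCAL_UDP, PKT_FORWARDED };

/*
 * Can an oversized packet of this kind be handed to the hardware whole?
 * Not modelled: a stack could also compute the UDP checksum in software
 * and then fall back to CAP_LSO_IP for a local UDP packet.
 */
static bool
demo_hw_can_split(uint32_t caps, enum pkt_kind kind)
{
	switch (kind) {
	case PKT_LOCAL_TCP:
		return ((caps & CAP_LSO_TCP) != 0);
	case PKT_LOCAL_UDP:
		return ((caps & CAP_LSO_UDP) != 0);
	case PKT_FORWARDED:
		/* L4 checksum is already final; only IP headers change. */
		return ((caps & CAP_LSO_IP) != 0);
	}
	return (false);
}
```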

Y> > A> >      hardware. Some driver may not announce CSUM_IP in their if_hwassist,
               ^^^^^^^^

Oh, that was a typo! Software was meant.

That explains quite a bit of confusion.

--
Andre




_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
