Hello,

I've tried Richard's patch on sparc. I took a brief look at its source code.
It's essentially what PF is doing on Solaris.

Checksum handling in PF on systems with HW-assisted checksums gets tricky
for local (out)bound packets. The approach we take on Solaris is as
follows:

        - for inbound packets PF always trusts HW: if HW says the checksum
          is correct, then the checksum is correct. If HW is not able to
          verify the checksum (HW checksum verification is off), PF falls
          back to SW verification (1)

        - PF does not verify the checksum for outbound packets. An outbound
          packet is either

                - forwarded, so the checksum has been verified on the
                  inbound side (2a)

                - local outbound, in which case the checksum is either
                  valid or to be calculated by HW (2b)

Things get pretty wild in 2b when PF is doing PBR (policy based routing) on
outbound packets. Consider a situation where the IP stack routes a packet
via a NIC which is able to calculate the checksum in HW. The IP stack sets
flags and fields and passes the packet to PF. PF then changes the interface
the packet is bound to, to a NIC which is not able to calculate the checksum,
so the HW-cksum flags set by the IP stack are no longer valid. In this case
we always revert to calculation in SW.
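
To make the 2b case concrete, here is a minimal sketch of that fallback
decision. It is not the Solaris code: every name in it (sk_pkt, sk_iface,
sk_reroute, the SK_CSUM_* flags) is made up for illustration, and it assumes
the offload request and the NIC capabilities are simple bit masks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offload bits; real stacks use their own flag names. */
#define SK_CSUM_TCP 0x1u
#define SK_CSUM_UDP 0x2u

/* Hypothetical interface descriptor: what the NIC can checksum in HW. */
struct sk_iface {
    uint32_t hw_csum_caps;
};

/* Hypothetical packet metadata as left behind by the IP stack. */
struct sk_pkt {
    uint32_t csum_req;              /* offloads the IP stack asked for */
    const struct sk_iface *out_if;  /* interface the packet is bound to */
    bool sw_csum_needed;            /* checksum must be computed in SW */
};

/*
 * Rebind the packet to new_if (the PBR step). If the new NIC cannot do
 * an offload that the IP stack requested, drop the stale request bits
 * and mark the packet for SW checksum calculation.
 */
static void sk_reroute(struct sk_pkt *p, const struct sk_iface *new_if)
{
    uint32_t missing = p->csum_req & ~new_if->hw_csum_caps;

    p->out_if = new_if;
    if (missing != 0) {
        p->csum_req &= ~missing;
        p->sw_csum_needed = true;
    }
}
```

The point is only that the HW-cksum request has to be re-validated against
the interface PF picked, not the one the IP stack picked.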

I have not looked at the current checksum handling in PF on OpenBSD, so I
can't tell exactly what's going on there. My impression is that PF does not
bother much with updating the checksum when it changes the packet. It seems
to me that in_proto_cksum_out() gets called as soon as an outbound packet
has been inspected by pf_test(), to calculate/fix checksums. It looks like
in_proto_cksum_out() has to recalculate the checksum in SW for the entire
packet when the underlying HW does not offer checksum offload. Is that
right? Or am I missing some piece?

Richard's patch, on the other hand, adjusts checksums by the delta caused by
the update. The adjustment takes a few operations (add/and/not) on a very
small chunk of memory. The price should be the same as what we pay for the
extra logic to decide whether HW will compute the checksum for us or we have
to do it on our own. However, we will save plenty of cycles whenever we
would otherwise have to revert to SW.
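
For reference, the delta adjustment can be sketched with the incremental
update formula from RFC 1624 (eqn. 3), HC' = ~(~HC + ~m + m'). This is not
Richard's code; the function names below are made up, and the sketch assumes
a single aligned 16-bit word changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t cksum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full recomputation: one's-complement sum over all 16-bit words. */
static uint16_t cksum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~cksum_fold(sum);
}

/*
 * Incremental update: one 16-bit word changed from old_word to new_word.
 * Touches three values, no matter how long the packet is.
 */
static uint16_t cksum_fixup(uint16_t cksum, uint16_t old_word,
    uint16_t new_word)
{
    uint32_t sum = (uint16_t)~cksum;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~cksum_fold(sum);
}
```

cksum_full() walks the whole buffer, while cksum_fixup() does a constant
amount of work. That is the saving when the SW fallback would otherwise
kick in.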


I currently have one small suggestion to improve Richard's patch. The
PF_ALGNMNT() macro in pfvar.h uses modulo:

    #define PF_HI (true)
    #define PF_LO (!PF_HI)
    #define PF_ALGNMNT(off) (((off) % 2) == 0 ? PF_HI : PF_LO)

I think we can get away with a simple AND operation (& 1), which will be
faster than % on many platforms.

    #define PF_ALGNMNT(off) (((off) & 1) == 0 ? PF_HI : PF_LO)
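
A quick way to check that the two forms agree is sketched below. The _MOD and
_AND suffixes are mine, only there so both variants can sit side by side, and
the sketch assumes offsets are non-negative integers.

```c
#include <stdbool.h>

#define PF_HI (true)
#define PF_LO (!PF_HI)

/* Variant from the patch: modulo. */
#define PF_ALGNMNT_MOD(off) (((off) % 2) == 0 ? PF_HI : PF_LO)

/* Suggested variant: test the lowest bit. */
#define PF_ALGNMNT_AND(off) (((off) & 1) == 0 ? PF_HI : PF_LO)

/* Returns true if both variants agree for every offset in [0, limit). */
static bool pf_algnmnt_agree(int limit)
{
    for (int off = 0; off < limit; off++) {
        if (PF_ALGNMNT_MOD(off) != PF_ALGNMNT_AND(off))
            return false;
    }
    return true;
}
```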

regards
sasha
