On 14/09/2015, at 11:51 PM, Henning Brauer wrote: > * Martin Pieuchot <m...@openbsd.org> [2015-09-11 13:54]: >> On 11/09/15(Fri) 13:28, Henning Brauer wrote: >>> Ryan pointed me to this diff and we briefly discussed it; we remain >>> convinced that the in-tree approach is better than this. >> Could you elaborate why? > > [ elaboration, quoted further down ] > > And given that, the approach that has less and simpler code and makes > better use of offloading wins.
Well, I agree that less and simpler code is better, all else being equal. In fact if you'd like my opinion, the in-tree approach actually places a greater burden of complexity on the programmer. The patch needs one line to alter a value rewrite += pf_change_16(pd, &pd->hdr.icmp->icmp_id, icmpid); ; the in-tree code requires all packet alterations to occur between two calls, analogous to the way locking must enclose a critical section: if (pd->csum_status == PF_CSUM_UNKNOWN) pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off, pd->proto, pd->af); [...] header->field = new_value; [...] pf_cksum(pd, pd->m); This is the stuff mild headaches are made of. Especially as 1) pf_cksum() usually occurs elsewhere and as 2) the idiom tempts programmers to rely on existing pf_check_proto_cksum() calls. This last occurs three places in the existing code. I presume these are correct but it is much easier to avoid the question by replacing them with the one-liner. The new interface also provides a place to put all the altered-value guards, replacing, e.g.: if (icmpid != pd->hdr.icmp->icmp_id) { if (pd->csum_status == PF_CSUM_UNKNOWN) pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off, pd->proto, pd->af); pd->hdr.icmp->icmp_id = icmpid; rewrite = 1; } with: rewrite += pf_change_16(pd, &pd->hdr.icmp->icmp_id, icmpid); This saves code. Here are some numbers. First, non-comment[0] lines of code for all affected files: pf.c:1.944 pfvar.h:1.420 pf_norm.c:1.182 current 5724 1613 1030 patched 5682 1614 1028 ---- ---- ---- -88 +1 -2 (amd64) pf.o pf_norm.o current 123520 bytes 30816 bytes patched 123584 bytes 30912 bytes ----- ----- +64 +96 (i386) pf.o pf_norm.o current 95904 bytes 21096 bytes patched 94956 bytes 21156 bytes ----- ----- -948 -60 The patch's numbers include the new pf_change_* interface and the checksum modification code. So the proposed interface pays its way. Ok, let's move on to handling checksums: > Well we've been thru it more than once; the argument presented here > was that modifying the cksum instead of verify + recalc is better as > it wouldn't hide cksum mismatches if the cksum engines on the NICs we > offload to misbehave. After many years with the verify + recalc > approach I think it is pretty safe to say that this is of no > concern... I'm sorry, that's not what I'm saying. Even supposing perfection of every offload engine, offload engines have no monopoly on faults; regenerated checksums also mask bus, stack and memory faults in pf-altered packets. Regeneration always hides faults in /something/, unfortunately. pf_check_proto_cksum() takes care to mitigate that by preventing it from hiding faults in the routers a packet passes through --- except for the current one, which it can't. That is, either router faults are a concern or they're not. pf can't have it both ways, concerned to detect faults in every router other than the one on which it is running. The proposed patch fixes that. And, as I wrote in a previous post, we know that routers corrupt data: >> Right now my home firewall shows 30 TCP segments dropped for bad checksums. >> As checks at least as strong are used by every sane link-layer this >> virtually implies the dropped packets suffered router or end-point faults. Another (home) router I administer was seeing IIRC five a day on average over 42 days, something we expect of a globe-spanning internetwork. Passing one of these damaged segments to the user sufficies to break MACs and drop secure connections. Nonetheless, one might still argue that the reduced reliability is acceptable. If there were some compelling upside, perhaps, maybe it might be, but I've seen no evidence of performance benefits and the proposed patch addresses the code complexity from which the more reliable method suffered. Lastly, I don't want to sound like I'm trying to beat up on your checksum work. I know very well how much effort is involved and it's very much appreciated; you've cleaned up the code considerably. The patch on the table builds on that and it would be far uglier without you having moved checksum calculation to the output paths, stopped partially checksummed packets passing through the stack, eliminated the possibility of rdr-to-localhost errors, and generally simplified things all over the place. I also appreciate that you aim to avoid knarly problems by doing something completely different. Too often people just grit their teeth and keep chewing mindlessly on the knot, or tie a bigger one to workaround it. That they don't around here is one of the reasons I like OpenBSD. best, Richard. [0] #!/bin/sh gcc -P -fpreprocessed -dD -E $1 [1] pf.c:1.944 Index: net/pf.c =================================================================== --- net.orig/pf.c +++ net/pf.c @@ -5092,21 +5092,24 @@ pf_test_state_icmp(struct pf_pdesc *pd, #ifdef INET6 case AF_INET6: m_copyback(pd->m, pd->off, sizeof(struct icmp6_hdr), pd->hdr.icmp6, M_NOWAIT); m_copyback(pd2.m, ipoff2, sizeof(h2_6), &h2_6, M_NOWAIT); break; #endif /* INET6 */ } - uh.uh_sum = 0; + /* Avoid recomputing quoted UDP checksum. + * note: udp6 0 csum invalid per rfc2460 p27. + * but presumed nothing cares in this context */ + pf_change_16(pd, &uh.uh_sum, 0); m_copyback(pd2.m, pd2.off, sizeof(uh), &uh, M_NOWAIT); copyback = 1; } break; } case IPPROTO_ICMP: { struct icmp iih; if (pd2.af != AF_INET) { @@ -5158,21 +5161,22 @@ pf_test_state_icmp(struct pf_pdesc *pd, pd->hdr.icmp6, M_NOWAIT); if (pf_change_icmp_af(pd->m, ipoff2, pd, &pd2, &nk->addr[sidx], &nk->addr[didx], pd->af, nk->af)) return (PF_DROP); pd->proto = IPPROTO_ICMPV6; if (pf_translate_icmp_af(nk->af, &iih)) return (PF_DROP); if (virtual_type == htons(ICMP_ECHO) && nk->port[iidx] != iih.icmp_id) - iih.icmp_id = nk->port[iidx]; + pf_change_16(pd, &iih.icmp_id, + nk->port[iidx]); m_copyback(pd2.m, pd2.off, ICMP_MINLEN, &iih, M_NOWAIT); pd->m->m_pkthdr.ph_rtableid = nk->rdomain; pd->destchg = 1; PF_ACPY(&pd->nsaddr, &nk->addr[pd2.sidx], nk->af); PF_ACPY(&pd->ndaddr, &nk->addr[pd2.didx], nk->af); pd->naf = nk->af; @@ -5270,21 +5274,22 @@ pf_test_state_icmp(struct pf_pdesc *pd, if (pf_change_icmp_af(pd->m, ipoff2, pd, &pd2, &nk->addr[sidx], &nk->addr[didx], pd->af, nk->af)) return (PF_DROP); pd->proto = IPPROTO_ICMP; if (pf_translate_icmp_af(nk->af, &iih)) return (PF_DROP); if (virtual_type == htons(ICMP6_ECHO_REQUEST) && nk->port[iidx] != iih.icmp6_id) - iih.icmp6_id = nk->port[iidx]; + pf_change_16(pd, &iih.icmp6_id, + nk->port[iidx]); m_copyback(pd2.m, pd2.off, sizeof(struct icmp6_hdr), &iih, M_NOWAIT); pd->m->m_pkthdr.ph_rtableid = nk->rdomain; pd->destchg = 1; PF_ACPY(&pd->nsaddr, &nk->addr[pd2.sidx], nk->af); PF_ACPY(&pd->ndaddr, &nk->addr[pd2.didx], nk->af); >