Hello,

patch below makes life easier for clients, which always use same source port,
when talking to server (e.g. think of NFS). The scenario we are dealing with
is as follows:

        - client mounts remote NFS share

        - there is a PF sitting between client and NFS server. the mount
          operation in earlier step has created a state in PF

        - client panics/loses power or whatever catastrophe happens preventing
          client to properly cease NFS sessions

        - client boots up and attempts to mount same NFS share

        - the SYN packet sent by client gets blocked by PF as it hits
          existing session, which is in established state.

The patch makes PF to send 'challenge ACK' for SYN packet, which matches
session in established state. Challenge ACK will have the last sequence numbers
firewall remembers for existing session. Client should response with RST to
challenge ACK. The RST will make firewall and remote server to kill old
session. Then client will retransmit SYN packet. The retransmitted SYN will
create a new state in PF and world will be nice and shiny again.

The challenge ACK sent by firewall essentially emulates behavior of NFS
server TCP stack. NFS server would also response with challenge ACK upon
reception of SYN, which matches existing connection.

The patch also comes with test case. I've just learned it's bit tricky
to use scapy for sending TCP packets. There are basically two pitfalls:

        - the session established by scapy interferes with host's TCP stack,
        where scapy is running. Host's TCP answers with RST to SYN-ACK,
        which is sent as a response to scapy's SYN. Adding PF rule, which
        blocks outbound RST solves the problem.

        - the scapy's sr() function (send and receive) is very smart not
        to receive a challenge_ack sent by remote PF. Test case uses
        sniffer to verify remote PF answers with challenge ACK. If you
        are not interested in further details stop reading here.

All packets (I should rather say protocol data units) are objects in scapy.
There is a method .answers(). Every protocol payload has its specific
answers() method. There is answers() method for IP, for UDP, for TCP, ...
Looking at answers() method for TCP we see something disappointing for
challenge ACK:

        475    def answers(self, other):
        476        if not isinstance(other, TCP):
        477            return 0
        478        if conf.checkIPsrc:
        479            if not ((self.sport == other.dport) and
        480                    (self.dport == other.sport)):
        481                return 0
        482        if (abs(other.seq-self.ack) > 2+len(other.payload)):
        483            return 0
        484        return 1

there is a simple check for sequence number at line 482. The test prevents
sr() function to work for test case. The test sends SYN packet with
SeqNo 100000, however PF answers with ACK 1000. I've tried to derive
TCP_OOW class (TCP out of window) with answers() method without check for
sequence numbers. Unfortunately I could not figure out a way to tell sr() to
use my TCP_OOW instead of default TCP class. I gave up and opted for poor man's
solution to use a sniffer instead.

thanks and
regards
sasha

--------8<---------------8<---------------8<------------------8<--------
diff -r a4cc143dcba1 src/sys/net/pf.c
--- a/src/sys/net/pf.c  Fri Sep 30 22:21:48 2016 +0200
+++ b/src/sys/net/pf.c  Fri Sep 30 22:22:49 2016 +0200
@@ -2815,6 +2815,25 @@ pf_send_tcp(const struct pf_rule *r, sa_
        }
 }
 
+static void
+pf_send_challenge_ack(struct pf_pdesc *pd, struct pf_state *s,
+    struct pf_state_peer *src, struct pf_state_peer *dst)
+{
+       /*
+        * We are sending challenge ACK as a response to SYN packet, which
+        * matches existing state (modulo TCP window check). Therefore packet
+        * must be sent on behalf of destination.
+        *
+        * We expect sender to remain either silent, or send RST packet
+        * so both, firewall and remote peer, can purge dead state from
+        * memory.
+        */
+       pf_send_tcp(s->rule.ptr, pd->af, pd->dst, pd->src,
+           pd->hdr.tcp->th_dport, pd->hdr.tcp->th_sport, dst->seqlo,
+           src->seqlo, TH_ACK, 0, 0, s->rule.ptr->return_ttl, 1, 0,
+           pd->rdomain);
+}
+
 void
 pf_send_icmp(struct mbuf *m, u_int8_t type, u_int8_t code, sa_family_t af,
     struct pf_rule *r, u_int rdomain)
@@ -4637,20 +4656,34 @@ pf_test_state(struct pf_pdesc *pd, struc
        case IPPROTO_TCP:
                if ((action = pf_synproxy(pd, state, reason)) != PF_PASS)
                        return (action); 
-               if (((pd->hdr.tcp->th_flags & (TH_SYN|TH_ACK)) == TH_SYN) &&
-                   dst->state >= TCPS_FIN_WAIT_2 &&
-                   src->state >= TCPS_FIN_WAIT_2) {
-                       if (pf_status.debug >= LOG_NOTICE) {
-                               log(LOG_NOTICE, "pf: state reuse ");
-                               pf_print_state(*state);
-                               pf_print_flags(pd->hdr.tcp->th_flags);
-                               addlog("\n");
+               if ((pd->hdr.tcp->th_flags & (TH_SYN|TH_ACK)) == TH_SYN) {
+
+                       if (dst->state >= TCPS_FIN_WAIT_2 &&
+                           src->state >= TCPS_FIN_WAIT_2) {
+                               if (pf_status.debug >= LOG_NOTICE) {
+                                       log(LOG_NOTICE, "pf: state reuse ");
+                                       pf_print_state(*state);
+                                       pf_print_flags(pd->hdr.tcp->th_flags);
+                                       addlog("\n");
+                               }
+                               /* XXX make sure it's the same direction ?? */
+                               (*state)->src.state = TCPS_CLOSED;
+                               (*state)->dst.state = TCPS_CLOSED;
+                               pf_remove_state(*state);
+                               *state = NULL;
+                               pd->m->m_pkthdr.pf.inp = inp;
+                       } else if (dst->state >= TCPS_ESTABLISHED &&
+                           src->state >= TCPS_ESTABLISHED) {
+                                /*
+                                 * SYN matches existing state???
+                                * Typically happens when sender boots up after
+                                * sudden panic. Certain protocols (NFSv3) are
+                                * always using same port numbers. Challenge
+                                * ACK enables all parties (firewall and peers)
+                                * to get in sync again.
+                                 */
+                                pf_send_challenge_ack(pd, *state, src, dst);
                        }
-                       /* XXX make sure it's the same direction ?? */
-                       (*state)->src.state = (*state)->dst.state = TCPS_CLOSED;
-                       pf_remove_state(*state);
-                       *state = NULL;
-                       pd->m->m_pkthdr.pf.inp = inp;
                        return (PF_DROP);
                }
 
diff -r a4cc143dcba1 src/regress/sys/net/pf_forward/Makefile
--- a/src/regress/sys/net/pf_forward/Makefile   Fri Sep 30 22:21:48 2016 +0200
+++ b/src/regress/sys/net/pf_forward/Makefile   Fri Sep 30 22:22:49 2016 +0200
@@ -247,6 +247,16 @@ TARGETS += tcp  tcp6
 
 run-regress-tcp: stamp-pfctl
        @echo '\n======== $@ ========'
+       #
+       # The TCP handshake in challenge_ack.py is initiated by scapy,
+       # SRC's local TCP stack is very surprised to see SYN-ACK coming
+       # from ECO, so it answers with RST. The RST must not leave SRC,
+       # otherwise testing would be spoiled.
+       #
+       echo "block out on ${SRC_IF} proto tcp from ${SRC_OUT} to ${ECO_IN}\
+               port=echo flags R/R" | ${SUDO} pfctl -e -f -
+       ${SUDO} ${PYTHON}challenge_ack.py ${SRC_OUT} ${ECO_IN}
+       ${SUDO} pfctl -d -Fa
 .for ip in ECO_IN ECO_OUT RDR_IN RDR_OUT RTT_IN
        @echo Check tcp ${ip}:
        ${SUDO} route -n delete -host -inet ${${ip}} || true
diff -r a4cc143dcba1 src/regress/sys/net/pf_forward/challenge_ack.py
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/regress/sys/net/pf_forward/challenge_ack.py   Fri Sep 30 22:22:49 
2016 +0200
@@ -0,0 +1,67 @@
+#!/usr/local/bin/python2.7
+# check wether path mtu to dst is as expected
+
+import os
+import threading
+from addr import *
+from scapy.all import *
+
+# usage: challenge_ack.py src dst
+
+#
+# we can not use scapy's sr() function as receive side
+# ignores the packet we expect to see. Packet is ignored
+# due to mismatching sequence numbers. 'bogus_syn' is using
+# seq = 1000000, while response sent back by PF has ack,
+# which fits regular session opened by 'syn'.
+#
+class Sniff(threading.Thread):
+       captured = None
+       def run(self):
+               self.captured = sniff(iface=SRC_IF,
+                    filter='tcp src port 7', timeout=3)
+
+srcaddr=sys.argv[1]
+dstaddr=sys.argv[2]
+port=os.getpid() & 0xffff
+
+ip=IP(src=srcaddr, dst=dstaddr)
+
+print "Send SYN packet, receive SYN+ACK"
+syn=TCP(sport=port, dport='echo', seq=1, flags='S', window=(2**16)-1)
+synack=sr1(ip/syn, iface=SRC_IF, timeout=5)
+
+print "Send ACK packet to finish handshake."
+ack=TCP(sport=synack.dport, dport=synack.sport, seq=2, flags='A',
+    ack=synack.seq+1)
+send(ip/ack, iface=SRC_IF)
+
+print "Connection is established, send bogus SYN, expect challenge ACK"
+bogus_syn=TCP(sport=port, dport='echo', seq=1000000, flags='S',
+    window=(2**16)-1)
+sniffer = Sniff();
+sniffer.start()
+challenge_ack=send(ip/bogus_syn, iface=SRC_IF)
+sniffer.join(timeout=5)
+
+if sniffer.captured == None:
+       print "ERROR: no packet received"
+        exit(1)
+
+challenge_ack = None
+
+for p in sniffer.captured:
+       if p.haslayer(TCP) and p.getlayer(TCP).sport == 7:
+               challenge_ack = p
+               break
+
+if challenge_ack == None:
+       print "No ACK has been seen"
+        exit(1)
+
+if challenge_ack.getlayer(TCP).seq != (synack.seq + 1):
+       print "ERROR: expecting seq %d got %d in challange ack" % \
+                (challenge_ack.getlayer(TCP).seq, (synack.seq + 1))
+       exit(1)
+
+exit(0)

Reply via email to