On Wed, Mar 13, 2019 at 11:32:36AM +0100, Mike Belopuhov wrote:
>
> David Gwynne writes:
>
> > On Tue, Mar 05, 2019 at 12:03:05PM +1000, David Gwynne wrote:
> >> this extends the fildrop mechanism so you can drop the packets with bpf
> >> using the existing fildrop method, but with an extra tweak so you can
> >> avoid the cost of copying packets to userland.
> >>
> >> i wanted to quickly drop some packets in the rx interrupt path to try
> >> and prioritise some traffic getting processed by the system. the initial
> >> version was going to use weird custom DLTs and extra bpf interface
> >> pointers and stuff, but most of the glue is already in place with
> >> the fildrop functionality.
> >>
> >> this also adds a bit to tcpdump so you can set a fildrop action. it
> >> means tcpdump can be used as a quick and dirty firewall.
> >
> > there's a bit more discussion about this that i should have included in
> > my original email.
> >
> > firstly, the functionality it offers. this effectively offers a firewall
> > with the ability to filter arbitrary packets. this has significant
> > overlap with the functionality that pf offers, but there are a couple of
> > important differences. pf only handles IP traffic, but we don't
> > really have a good story when it comes to filtering non-ip. we could
> > implement something like pf for the next protocol that people need to
> > manage, but what is that next protocol? pf like implies a highly
> > optimised but constrained set of filters that deeply understands the
> > protocol it is handling. is that next protol ieee1905p? cdp? ipx?
> > macsec? where should that protocol be filtered in the stack?
> >
> > im arguing that bpf with fildrop has the benefit of already existing,
> > it's in place, and it already has the ability to be configured with
> > arbitrary policy. considering we've got this far without handling
> > non-ip, spending more time on it seems unjustified.
> >
> > secondly, the performance aspects of this diff.
> >
> > bpf allows for arbitrarily complicated filters, so it is entirely
> > possible to slow your box down a lot by writing really complicated
> > filters. this is in comparison to pf where each rule has a limit
> > on how much work it will do, which is also mitigated by the ruleset
> > optimiser and skip steps. i don't have a good answer to that except to
> > say you can already add such filters to bpf, they just don't do anything
> > except copy packets at the moment.
> >
> > another interesting performance consideration is that bpf runs a lot
> > earlier than pf, so filtering packets with bpf can avoid a lot of work
> > in the stack. if you want to pass IP statefully, pf is a much better
> > hammer, but to drop packets up front bpf is interesting.
> >
> > for example, thanks to hrvoje popovski i now have a setup where im
> > pushing ~7 million packets per second through a box to do performance
> > measurements. those packets are udp from random ips to port 7 on
> > another set of random ips. if i have the following rule in pf.conf:
> >
> > block in quick proto udp to port 7
> >
> > i can rx and drop about 550kpps. if im sshed in using another
> > interface, the system is super sluggish over that shell.
> >
> > if i use this diff and run the following;
> >
> > # tcpdump -B drop -i ix1 udp and port 7
> >
> > i'm dropping about 1.2 million pps, and the box is responsive when sshed
> > in using another interface.
> >
> > so, to summarise, bpf can already be used to drop packets, this is just
> > a tweak to make it faster, and a tweak so tcpdump can be used to set up
> > that filtering.
> >
>
> I think this is a great development. Diff looks good as well.
I agree. OK claudio@
> >> Index: sys/net/bpf.c
> >> ===================================================================
> >> RCS file: /cvs/src/sys/net/bpf.c,v
> >> retrieving revision 1.170
> >> diff -u -p -r1.170 bpf.c
> >> --- sys/net/bpf.c 13 Jul 2018 08:51:15 -0000 1.170
> >> +++ sys/net/bpf.c 4 Mar 2019 22:30:32 -0000
> >> @@ -926,9 +926,20 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t
> >> *(u_int *)addr = d->bd_fildrop;
> >> break;
> >>
> >> - case BIOCSFILDROP: /* set "filter-drop" flag */
> >> - d->bd_fildrop = *(u_int *)addr ? 1 : 0;
> >> + case BIOCSFILDROP: { /* set "filter-drop" flag */
> >> + unsigned int fildrop = *(u_int *)addr;
> >> + switch (fildrop) {
> >> + case BPF_FILDROP_PASS:
> >> + case BPF_FILDROP_CAPTURE:
> >> + case BPF_FILDROP_DROP:
> >> + d->bd_fildrop = fildrop;
> >> + break;
> >> + default:
> >> + error = EINVAL;
> >> + break;
> >> + }
> >> break;
> >> + }
> >>
> >> case BIOCGDIRFILT: /* get direction filter */
> >> *(u_int *)addr = d->bd_dirfilt;
> >> @@ -1261,23 +1272,26 @@ _bpf_mtap(caddr_t arg, const struct mbuf
> >> pktlen += m0->m_len;
> >>
> >> SRPL_FOREACH(d, &sr, &bp->bif_dlist, bd_next) {
> >> + struct srp_ref bsr;
> >> + struct bpf_program *bf;
> >> + struct bpf_insn *fcode = NULL;
> >> +
> >> atomic_inc_long(&d->bd_rcount);
> >>
> >> - if ((direction & d->bd_dirfilt) != 0)
> >> - slen = 0;
> >> - else {
> >> - struct srp_ref bsr;
> >> - struct bpf_program *bf;
> >> - struct bpf_insn *fcode = NULL;
> >> -
> >> - bf = srp_enter(&bsr, &d->bd_rfilter);
> >> - if (bf != NULL)
> >> - fcode = bf->bf_insns;
> >> - slen = bpf_mfilter(fcode, m, pktlen);
> >> - srp_leave(&bsr);
> >> - }
> >> + if (ISSET(d->bd_dirfilt, direction))
> >> + continue;
> >> +
> >> + bf = srp_enter(&bsr, &d->bd_rfilter);
> >> + if (bf != NULL)
> >> + fcode = bf->bf_insns;
> >> + slen = bpf_mfilter(fcode, m, pktlen);
> >> + srp_leave(&bsr);
> >>
> >> - if (slen > 0) {
> >> + if (slen == 0)
> >> + continue;
> >> + if (d->bd_fildrop != BPF_FILDROP_PASS)
> >> + drop = 1;
> >> + if (d->bd_fildrop != BPF_FILDROP_DROP) {
> >> if (!gottime++)
> >> microtime(&tv);
> >>
> >> @@ -1285,9 +1299,6 @@ _bpf_mtap(caddr_t arg, const struct mbuf
> >> bpf_catchpacket(d, (u_char *)m, pktlen, slen, cpfn,
> >> &tv);
> >> mtx_leave(&d->bd_mtx);
> >> -
> >> - if (d->bd_fildrop)
> >> - drop = 1;
> >> }
> >> }
> >> SRPL_LEAVE(&sr);
> >> Index: sys/net/bpf.h
> >> ===================================================================
> >> RCS file: /cvs/src/sys/net/bpf.h,v
> >> retrieving revision 1.65
> >> diff -u -p -r1.65 bpf.h
> >> --- sys/net/bpf.h 3 Feb 2018 13:37:37 -0000 1.65
> >> +++ sys/net/bpf.h 4 Mar 2019 22:30:32 -0000
> >> @@ -126,6 +126,13 @@ struct bpf_version {
> >> #define BPF_DIRECTION_IN 1
> >> #define BPF_DIRECTION_OUT (1<<1)
> >>
> >> +/*
> >> + * Values for BIOCGFILDROP/BIOCSFILDROP
> >> + */
> >> +#define BPF_FILDROP_PASS 0 /* capture, pass */
> >> +#define BPF_FILDROP_CAPTURE 1 /* capture, drop */
> >> +#define BPF_FILDROP_DROP 2 /* no capture, drop */
> >> +
> >> struct bpf_timeval {
> >> u_int32_t tv_sec;
> >> u_int32_t tv_usec;
> >> Index: share/man/man4/bpf.4
> >> ===================================================================
> >> RCS file: /cvs/src/share/man/man4/bpf.4,v
> >> retrieving revision 1.38
> >> diff -u -p -r1.38 bpf.4
> >> --- share/man/man4/bpf.4 28 Apr 2016 19:07:19 -0000 1.38
> >> +++ share/man/man4/bpf.4 4 Mar 2019 22:30:32 -0000
> >> @@ -391,11 +391,24 @@ This flag is initialized to zero by defa
> >> .Pp
> >> .It Dv BIOCSFILDROP Fa "u_int *"
> >> .It Dv BIOCGFILDROP Fa "u_int *"
> >> -Sets or gets the status of the
> >> +Sets or gets the
> >> .Dq filter drop
> >> -flag.
> >> -If non-zero, packets matching any filters will be reported to the
> >> -associated interface so that they can be dropped.
> >> +action.
> >> +The supported actions for packets matching the filter are:
> >> +.Pp
> >> +.Bl -tag -width "BPF_FILDROP_CAPTURE" -compact
> >> +.It Dv BPF_FILDROP_PASS
> >> +Accept and capture
> >> +.It Dv BPF_FILDROP_CAPTURE
> >> +Drop and capture
> >> +.It Dv BPF_FILDROP_DROP
> >> +Drop and do not capture
> >> +.El
> >> +.Pp
> >> +Packets matching any filter configured to drop packets will be
> >> +reported to the associated interface so that they can be dropped.
> >> +The default action is
> >> +.Dv BPF_FILDROP_PASS .
> >> .Pp
> >> .It Dv BIOCSDIRFILT Fa "u_int *"
> >> .It Dv BIOCGDIRFILT Fa "u_int *"
> >> Index: usr.sbin/tcpdump/privsep.c
> >> ===================================================================
> >> RCS file: /cvs/src/usr.sbin/tcpdump/privsep.c,v
> >> retrieving revision 1.52
> >> diff -u -p -r1.52 privsep.c
> >> --- usr.sbin/tcpdump/privsep.c 17 Nov 2018 16:52:02 -0000 1.52
> >> +++ usr.sbin/tcpdump/privsep.c 4 Mar 2019 22:30:32 -0000
> >> @@ -224,7 +224,7 @@ priv_exec(int argc, char *argv[])
> >> /* parse the arguments for required options */
> >> opterr = 0;
> >> while ((i = getopt(argc, argv,
> >> - "ac:D:deE:fF:i:lLnNOopPqr:s:StT:vw:xXy:Y")) != -1) {
> >> + "aB:c:D:deE:fF:i:lLnNOopPqr:s:StT:vw:xXy:Y")) != -1) {
> >> switch (i) {
> >> case 'n':
> >> nflag++;
> >> @@ -366,7 +366,7 @@ static void
> >> impl_open_bpf(int fd, int *bpfd)
> >> {
> >> int snaplen, promisc, err;
> >> - u_int dlt, dirfilt;
> >> + u_int dlt, dirfilt, fildrop;
> >> char device[IFNAMSIZ];
> >> size_t iflen;
> >>
> >> @@ -376,10 +376,11 @@ impl_open_bpf(int fd, int *bpfd)
> >> must_read(fd, &promisc, sizeof(int));
> >> must_read(fd, &dlt, sizeof(u_int));
> >> must_read(fd, &dirfilt, sizeof(u_int));
> >> + must_read(fd, &fildrop, sizeof(fildrop));
> >> iflen = read_string(fd, device, sizeof(device), __func__);
> >> if (iflen == 0)
> >> errx(1, "Invalid interface size specified");
> >> - *bpfd = pcap_live(device, snaplen, promisc, dlt, dirfilt);
> >> + *bpfd = pcap_live(device, snaplen, promisc, dlt, dirfilt, fildrop);
> >> err = errno;
> >> if (*bpfd < 0)
> >> logmsg(LOG_DEBUG,
> >> Index: usr.sbin/tcpdump/privsep.h
> >> ===================================================================
> >> RCS file: /cvs/src/usr.sbin/tcpdump/privsep.h,v
> >> retrieving revision 1.11
> >> diff -u -p -r1.11 privsep.h
> >> --- usr.sbin/tcpdump/privsep.h 8 Nov 2018 14:06:09 -0000 1.11
> >> +++ usr.sbin/tcpdump/privsep.h 4 Mar 2019 22:30:32 -0000
> >> @@ -45,11 +45,11 @@ __dead void priv_exec(int, char **);
> >> void priv_init_done(void);
> >>
> >> int setfilter(int, int, char *);
> >> -int pcap_live(const char *, int, int, u_int, u_int);
> >> +int pcap_live(const char *, int, int, u_int, u_int, u_int);
> >>
> >> struct bpf_program *priv_pcap_setfilter(pcap_t *, int, u_int32_t);
> >> pcap_t *priv_pcap_live(const char *, int, int, int, char *, u_int,
> >> - u_int);
> >> + u_int, u_int);
> >> pcap_t *priv_pcap_offline(const char *, char *);
> >>
> >> size_t priv_gethostbyaddr(char *, size_t, int, char *, size_t);
> >> Index: usr.sbin/tcpdump/privsep_pcap.c
> >> ===================================================================
> >> RCS file: /cvs/src/usr.sbin/tcpdump/privsep_pcap.c,v
> >> retrieving revision 1.23
> >> diff -u -p -r1.23 privsep_pcap.c
> >> --- usr.sbin/tcpdump/privsep_pcap.c 17 Nov 2018 16:52:02 -0000
> >> 1.23
> >> +++ usr.sbin/tcpdump/privsep_pcap.c 4 Mar 2019 22:30:32 -0000
> >> @@ -173,7 +173,7 @@ priv_pcap_setfilter(pcap_t *hpcap, int o
> >> /* privileged part of priv_pcap_live */
> >> int
> >> pcap_live(const char *device, int snaplen, int promisc, u_int dlt,
> >> - u_int dirfilt)
> >> + u_int dirfilt, u_int fildrop)
> >> {
> >> int fd;
> >> struct ifreq ifr;
> >> @@ -201,6 +201,9 @@ pcap_live(const char *device, int snaple
> >> if (ioctl(fd, BIOCSDIRFILT, &dirfilt) < 0)
> >> goto error;
> >>
> >> + if (ioctl(fd, BIOCSFILDROP, &fildrop) < 0)
> >> + goto error;
> >> +
> >> /* lock the descriptor */
> >> if (ioctl(fd, BIOCLOCK, NULL) < 0)
> >> goto error;
> >> @@ -218,7 +221,7 @@ pcap_live(const char *device, int snaple
> >> */
> >> pcap_t *
> >> priv_pcap_live(const char *dev, int slen, int prom, int to_ms,
> >> - char *ebuf, u_int dlt, u_int dirfilt)
> >> + char *ebuf, u_int dlt, u_int dirfilt, u_int fildrop)
> >> {
> >> int fd, err;
> >> struct bpf_version bv;
> >> @@ -247,6 +250,7 @@ priv_pcap_live(const char *dev, int slen
> >> must_write(priv_fd, &prom, sizeof(int));
> >> must_write(priv_fd, &dlt, sizeof(u_int));
> >> must_write(priv_fd, &dirfilt, sizeof(u_int));
> >> + must_write(priv_fd, &fildrop, sizeof(fildrop));
> >> write_string(priv_fd, dev);
> >>
> >> fd = receive_fd(priv_fd);
> >> Index: usr.sbin/tcpdump/tcpdump.8
> >> ===================================================================
> >> RCS file: /cvs/src/usr.sbin/tcpdump/tcpdump.8,v
> >> retrieving revision 1.99
> >> diff -u -p -r1.99 tcpdump.8
> >> --- usr.sbin/tcpdump/tcpdump.8 6 Jul 2018 09:59:12 -0000 1.99
> >> +++ usr.sbin/tcpdump/tcpdump.8 4 Mar 2019 22:30:32 -0000
> >> @@ -29,6 +29,7 @@
> >> .Nm tcpdump
> >> .Op Fl AadefILlNnOopqStvXx
> >> .Op Fl c Ar count
> >> +.Op Fl B Ar fildrop
> >> .Op Fl D Ar direction
> >> .Op Fl E Oo Ar espalg : Oc Ns Ar espkey
> >> .Op Fl F Ar file
> >> @@ -58,6 +59,23 @@ The smaller of the entire packet or
> >> bytes will be printed.
> >> .It Fl a
> >> Attempt to convert network and broadcast addresses to names.
> >> +.It Fl B Ar fildrop
> >> +Configure the drop action specified by
> >> +.A fildrop
> >> +to be used when the filter expression matches a packet.
> >> +The actions are:
> >> +.Pp
> >> +.Bl -tag -width "capture" -offset indent -compact
> >> +.It Cm pass
> >> +Matching packets are accepted and captured.
> >> +.It Cm capture
> >> +Matching packets are dropped and captured.
> >> +.It Cm drop
> >> +Matching packets are dropped and not captured.
> >> +.El
> >> +.Pp
> >> +The default action is
> >> +.Cm pass .
> >> .It Fl c Ar count
> >> Exit after receiving
> >> .Ar count
> >> Index: usr.sbin/tcpdump/tcpdump.c
> >> ===================================================================
> >> RCS file: /cvs/src/usr.sbin/tcpdump/tcpdump.c,v
> >> retrieving revision 1.88
> >> diff -u -p -r1.88 tcpdump.c
> >> --- usr.sbin/tcpdump/tcpdump.c 8 Nov 2018 14:06:09 -0000 1.88
> >> +++ usr.sbin/tcpdump/tcpdump.c 4 Mar 2019 22:30:32 -0000
> >> @@ -61,6 +61,7 @@
> >>
> >> int Aflag; /* dump ascii */
> >> int aflag; /* translate network and broadcast
> >> addresses */
> >> +int Bflag; /* BPF fildrop setting */
> >> int dflag; /* print filter code */
> >> int eflag; /* print ethernet header */
> >> int fflag; /* don't translate "foreign" IP address
> >> */
> >> @@ -231,7 +232,7 @@ main(int argc, char **argv)
> >>
> >> opterr = 0;
> >> while ((op = getopt(argc, argv,
> >> - "Aac:D:deE:fF:i:IlLnNOopqr:s:StT:vw:xXy:Y")) != -1)
> >> + "AaB:c:D:deE:fF:i:IlLnNOopqr:s:StT:vw:xXy:Y")) != -1)
> >> switch (op) {
> >>
> >> case 'A':
> >> @@ -243,6 +244,19 @@ main(int argc, char **argv)
> >> aflag = 1;
> >> break;
> >>
> >> + case 'B':
> >> + if (strcasecmp(optarg, "pass") == 0)
> >> + Bflag = BPF_FILDROP_PASS;
> >> + else if (strcasecmp(optarg, "capture") == 0)
> >> + Bflag = BPF_FILDROP_CAPTURE;
> >> + else if (strcasecmp(optarg, "drop") == 0)
> >> + Bflag = BPF_FILDROP_DROP;
> >> + else {
> >> + error("invalid BPF fildrop option: %s",
> >> + optarg);
> >> + }
> >> + break;
> >> +
> >> case 'c':
> >> cnt = strtonum(optarg, 1, INT_MAX, &errstr);
> >> if (errstr)
> >> @@ -440,7 +454,7 @@ main(int argc, char **argv)
> >> error("%s", ebuf);
> >> }
> >> pd = priv_pcap_live(device, snaplen, !pflag, 1000, ebuf,
> >> - dlt, dirfilt);
> >> + dlt, dirfilt, Bflag);
> >> if (pd == NULL)
> >> error("%s", ebuf);
> >>
> >> @@ -700,7 +714,7 @@ __dead void
> >> usage(void)
> >> {
> >> (void)fprintf(stderr,
> >> -"Usage: %s [-AadefILlNnOopqStvXx] [-c count] [-D direction]\n",
> >> +"Usage: %s [-AadefILlNnOopqStvXx] [-B fildrop] [-c count] [-D
> >> direction]\n",
> >> program_name);
> >> (void)fprintf(stderr,
> >> "\t [-E [espalg:]espkey] [-F file] [-i interface] [-r file]\n");
> >>
>
--
:wq Claudio