Re: bpf_mtap(9) w/o KERNEL_LOCK

Martin Pieuchot Mon, 12 Dec 2016 05:51:08 -0800

On 07/12/16(Wed) 00:44, Alexander Bluhm wrote:
> On Mon, Nov 28, 2016 at 04:01:14PM +0100, Martin Pieuchot wrote:
> > @@ -717,7 +753,9 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t 
> >                             *(u_int *)addr = size = bpf_maxbufsize;
> >                     else if (size < BPF_MINBUFSIZE)
> >                             *(u_int *)addr = size = BPF_MINBUFSIZE;
> > +                   mtx_enter(&d->bd_mtx);
> >                     d->bd_bufsize = size;
> > +                   mtx_leave(&d->bd_mtx);
> >             }
> >             break;
> >  
> 
> bd_bufsize needs protection, but you only add it in BIOCSBLEN,
> you should also put a mutex in BIOCGBLEN.


I'm not sure to understand why.  The value is only written in ioctl(2)
context.  The mutex is here to make sure the value wont change while
bpf_catchpacket() is executed.

> > @@ -778,30 +815,36 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t 
> >      * Get device parameters.
> >      */
> >     case BIOCGDLT:
> > +           mtx_enter(&d->bd_mtx);
> >             if (d->bd_bif == NULL)
> >                     error = EINVAL;
> >             else
> >                     *(u_int *)addr = d->bd_bif->bif_dlt;
> > +           mtx_leave(&d->bd_mtx);
> >             break;
> 
> bd_bif is not protected elsewhere.  bif_dlt is only set during attach.
> So I think you need no mutex here.

Indeed.

> >  
> >     /*
> >      * Set device parameters.
> >      */
> >     case BIOCSDLT:
> > +           mtx_enter(&d->bd_mtx);
> >             if (d->bd_bif == NULL)
> >                     error = EINVAL;
> >             else
> >                     error = bpf_setdlt(d, *(u_int *)addr);
> > +           mtx_leave(&d->bd_mtx);
> >             break;
> 
> bd_bif needs not protection, so put the mutex only in the else path.
> Then it is obvious why it is there.

Ack.

> >  
> >     /*
> >      * Set interface name.
> >      */
> >     case BIOCGETIF:
> > +           mtx_enter(&d->bd_mtx);
> >             if (d->bd_bif == NULL)
> >                     error = EINVAL;
> >             else
> >                     bpf_ifname(d->bd_bif->bif_ifp, (struct ifreq *)addr);
> > +           mtx_leave(&d->bd_mtx);
> >             break;
> 
> The interface name is not used by bpf without kernel lock.  You do
> not need a mutex here.

Ack.

> > @@ -1151,6 +1197,8 @@ filt_bpfread(struct knote *kn, long hint
> >  {
> >     struct bpf_d *d = kn->kn_hook;
> >  
> > +   KERNEL_ASSERT_LOCKED();
> > +
> >     kn->kn_data = d->bd_hlen;
> >     if (d->bd_immediate)
> >             kn->kn_data += d->bd_slen;
> 
> You need the mutex here to access bd_slen.

Ack.

> > @@ -1232,12 +1279,10 @@ _bpf_mtap(caddr_t arg, const struct mbuf
> >                     if (!gottime++)
> >                             microtime(&tv);
> >  
> > -                   KERNEL_LOCK();
> > -                   s = splnet();
> > +                   mtx_enter(&d->bd_mtx);
> >                     bpf_catchpacket(d, (u_char *)m, pktlen, slen, cpfn,
> >                         &tv);
> > -                   splx(s);
> > -                   KERNEL_UNLOCK();
> > +                   mtx_leave(&d->bd_mtx);
> >  
> >                     if (d->bd_fildrop)
> >                             drop = 1;
> 
> bd_fildrop can be changed by ioctl.  Can we expect to read the
> correct value without a memory barrier?
> 
> Same question applies to bd_dirfilt.

We probably need some.  I guess that the actual code works because these
events happen occasionally.

> bd_rcount is a long value incremented atomically in _bpf_mtap().
> It is read from an other CPU with kernel lock during an ioctl.  Is
> this sufficient to get a consisten value?  Do we also need an atomic
> operation when reading it?

Do we need a consistent value?

> > --- net/bpfdesc.h   22 Aug 2016 10:40:36 -0000      1.31
> > +++ net/bpfdesc.h   28 Nov 2016 14:51:06 -0000
> > @@ -56,15 +56,17 @@ struct bpf_d {
> >      *   fbuf (free) - When read is done, put cluster here.
> >      * On receiving, if sbuf is full and fbuf is 0, packet is dropped.
> >      */
> > +   struct mutex    bd_mtx;         /* protect buffer slots below */
> >     caddr_t         bd_sbuf;        /* store slot */
> >     caddr_t         bd_hbuf;        /* hold slot */
> >     caddr_t         bd_fbuf;        /* free slot */
> >     int             bd_slen;        /* current length of store buffer */
> >     int             bd_hlen;        /* current length of hold buffer */
> > -
> >     int             bd_bufsize;     /* absolute length of buffers */
> 
> So bd_mtx is protecting the fields above.  Everything below must
> either be not accessed without kernel lock, or only be set during
> attach.  Is that correct or do I have a wrong understanding of the
> SMP model?

I'm not sure I understood your sentence, but I think you got the model
right :)

> > -   struct bpf_if * bd_bif;         /* interface descriptor */
> > +   int             __in_uiomove;
> 
> Why do you prefix it with __ ?  bd_uiomove would be more consistent.

Because it's a debug trick, I'm not convinced we should keep it.

Re: bpf_mtap(9) w/o KERNEL_LOCK

Reply via email to