On Sat, Jun 27, 2020 at 12:10:24PM +0200, Martin Pieuchot wrote:
> On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> > On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > > if_clone_create() has the races caused by context switch.
> > > 
> > > Can you share a backtrace of such race?  Where does the kernel panic?
> > >
> > 
> > This diff was inspired by thread [1]. As I explained [2] here is 3
> > issues that cause panics produced by command below:
> > 
> > ---- cut begin ----
> > for i in 1 2 3; do while true; do ifconfig bridge0 create& \
> >     ifconfig bridge0 destroy& done& done
> > ---- cut end ----
> 
> Thanks, I couldn't reproduce it on any of the machines I tried.  Did you
> managed to reproduce it with other pseudo-devices or just with bridge0?
> 
> > My system was stable with the last diff I did for thread [1]. But since
> > this final diff [3] which include fixes for tun(4) is quick and dirty
> > and not for commit I decided to make the diff to fix the races caused by
> > if_clone_create() at first.
> > 
> > I included screenshot with panic.
> 
> Thanks, interesting that the corruption happens on a list that should be
> initialized.  Does that mean the context switch on Thread 1 is happening
> before if_attach_common() is called?
> 
> You said your previous email that there's a context switch.  Do you know
> when it happens?  You could see that in ddb by looking at the backtrace
> of the other CPU.
> 
> Is the context switch leading to the race common to all pseudo-drivers
> or is it in the bridge(4) driver?
> 
> Regarding your solution, do I understand correctly that the goal is to
> serialize all if_clone_create()?  Is it really needed to remember which
> unit is being currently created or can't we just serialize all of them?
> 
> The fact that a lock is not held over the cloning operation is imho
> positive.
> 

I reworked tool for reproduce. Now I avoided fork()/exec() route and it
takes couple of minutes to take panic on 4 cores. I attached some
screenshots with panics caused by various pseudo-interfaces but my
previous mail was banned. I will try to attach them with separate mails.

I hope anyone else will try it. Now switch(4) is the bast way to
reproduce.

---- cut begin ----

#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static struct ifreq ifr;

static void *clone_create(void *arg)
{
        int s;

        if((s=socket(AF_INET, SOCK_DGRAM, 0))<0)
                err(1, "socket()");
        while(1){
                if(ioctl(s, SIOCIFCREATE, &ifr)<0)
                        if(errno==EINVAL)
                                exit(1);
        }

        return NULL;
}

static void *clone_destroy(void *arg)
{
        int s;

        if((s=socket(AF_INET, SOCK_DGRAM, 0))<0)
                err(1, "socket()");
        while(1){
                if(ioctl(s, SIOCIFDESTROY, &ifr)<0)
                        if(errno==EINVAL)
                                exit(1);
        }

        return NULL;
}

int main(int argc, char *argv[])
{
        pthread_t thr;
        int i;

        if(argc!=2){
                fprintf(stderr, "usage: %s ifname\n", getprogname());
                return 1;
        }

        if(getuid()!=0){
                fprintf(stderr, "should be root\n");
                return 1;
        }

        memset(&ifr, 0, sizeof(ifr));
        strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name));

        for(i=0; i<8*4; ++i){
                if(pthread_create(&thr, NULL, clone_create, NULL)!=0)
                        errx(1, "pthread_create(clone_create)");
        }

        clone_destroy(NULL);

        return 0;
}

---- cut end ----

Reply via email to