On Sat, Jun 27, 2020 at 12:10:24PM +0200, Martin Pieuchot wrote: > On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote: > > On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote: > > > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote: > > > > if_clone_create() has the races caused by context switch. > > > > > > Can you share a backtrace of such race? Where does the kernel panic? > > > > > > > This diff was inspired by thread [1]. As I explained [2] here is 3 > > issues that cause panics produced by command below: > > > > ---- cut begin ---- > > for i in 1 2 3; do while true; do ifconfig bridge0 create& \ > > ifconfig bridge0 destroy& done& done > > ---- cut end ---- > > Thanks, I couldn't reproduce it on any of the machines I tried. Did you > managed to reproduce it with other pseudo-devices or just with bridge0? > > > My system was stable with the last diff I did for thread [1]. But since > > this final diff [3] which include fixes for tun(4) is quick and dirty > > and not for commit I decided to make the diff to fix the races caused by > > if_clone_create() at first. > > > > I included screenshot with panic. > > Thanks, interesting that the corruption happens on a list that should be > initialized. Does that mean the context switch on Thread 1 is happening > before if_attach_common() is called? > > You said your previous email that there's a context switch. Do you know > when it happens? You could see that in ddb by looking at the backtrace > of the other CPU. > > Is the context switch leading to the race common to all pseudo-drivers > or is it in the bridge(4) driver? > > Regarding your solution, do I understand correctly that the goal is to > serialize all if_clone_create()? Is it really needed to remember which > unit is being currently created or can't we just serialize all of them? > > The fact that a lock is not held over the cloning operation is imho > positive. >
I reworked tool for reproduce. Now I avoided fork()/exec() route and it takes couple of minutes to take panic on 4 cores. I attached some screenshots with panics caused by various pseudo-interfaces but my previous mail was banned. I will try to attach them with separate mails. I hope anyone else will try it. Now switch(4) is the bast way to reproduce. ---- cut begin ---- #include <sys/socket.h> #include <sys/ioctl.h> #include <net/if.h> #include <stdio.h> #include <stdlib.h> #include <err.h> #include <errno.h> #include <pthread.h> #include <string.h> #include <unistd.h> static struct ifreq ifr; static void *clone_create(void *arg) { int s; if((s=socket(AF_INET, SOCK_DGRAM, 0))<0) err(1, "socket()"); while(1){ if(ioctl(s, SIOCIFCREATE, &ifr)<0) if(errno==EINVAL) exit(1); } return NULL; } static void *clone_destroy(void *arg) { int s; if((s=socket(AF_INET, SOCK_DGRAM, 0))<0) err(1, "socket()"); while(1){ if(ioctl(s, SIOCIFDESTROY, &ifr)<0) if(errno==EINVAL) exit(1); } return NULL; } int main(int argc, char *argv[]) { pthread_t thr; int i; if(argc!=2){ fprintf(stderr, "usage: %s ifname\n", getprogname()); return 1; } if(getuid()!=0){ fprintf(stderr, "should be root\n"); return 1; } memset(&ifr, 0, sizeof(ifr)); strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name)); for(i=0; i<8*4; ++i){ if(pthread_create(&thr, NULL, clone_create, NULL)!=0) errx(1, "pthread_create(clone_create)"); } clone_destroy(NULL); return 0; } ---- cut end ----