On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > if_clone_create() has the races caused by context switch.
> > 
> > Can you share a backtrace of such race?  Where does the kernel panic?
> >
> 
> This diff was inspired by thread [1]. As I explained [2] here is 3
> issues that cause panics produced by command below:
> 
> ---- cut begin ----
> for i in 1 2 3; do while true; do ifconfig bridge0 create& \
>       ifconfig bridge0 destroy& done& done
> ---- cut end ----

Thanks, I couldn't reproduce it on any of the machines I tried.  Did you
manage to reproduce it with other pseudo-devices or just with bridge0?

> My system was stable with the last diff I did for thread [1]. But since
> this final diff [3] which include fixes for tun(4) is quick and dirty
> and not for commit I decided to make the diff to fix the races caused by
> if_clone_create() at first.
> 
> I included screenshot with panic.

Thanks, interesting that the corruption happens on a list that should be
initialized.  Does that mean the context switch on Thread 1 is happening
before if_attach_common() is called?

You said in your previous email that there's a context switch.  Do you know
when it happens?  You could see that in ddb by looking at the backtrace
of the other CPU.

Is the context switch leading to the race common to all pseudo-drivers
or is it specific to the bridge(4) driver?

Regarding your solution, do I understand correctly that the goal is to
serialize all if_clone_create()?  Is it really necessary to remember which
unit is currently being created, or can we just serialize all of them?
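
To make the question concrete, here is a rough userland sketch of what
"just serialize all of them" could look like.  It is not kernel code and
not taken from your diff: a pthread mutex stands in for whatever lock
the kernel would use, and clone_lock, units[], clone_create_serialized()
and race_create() are hypothetical names made up for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_UNITS	8
#define MAX_THREADS	16

/* Hypothetical: one lock for every create, no per-unit bookkeeping. */
static pthread_mutex_t clone_lock = PTHREAD_MUTEX_INITIALIZER;
static bool units[MAX_UNITS];	/* true once the unit exists */

/* Returns 0 on success, EEXIST if the unit exists, EINVAL if out of range. */
int
clone_create_serialized(int unit)
{
	int error = 0;

	if (unit < 0 || unit >= MAX_UNITS)
		return EINVAL;

	pthread_mutex_lock(&clone_lock);
	if (units[unit])
		error = EEXIST;
	else
		units[unit] = true;	/* lookup and insert under one lock */
	pthread_mutex_unlock(&clone_lock);

	return error;
}

static void *
create_worker(void *arg)
{
	int unit = *(int *)arg;

	/* Report 1 if this thread created the unit, 0 otherwise. */
	return (void *)(intptr_t)(clone_create_serialized(unit) == 0);
}

/* Let nthreads race to create the same unit; return how many succeeded. */
int
race_create(int nthreads, int unit)
{
	pthread_t tids[MAX_THREADS];
	void *res;
	int i, started = 0, wins = 0;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[started], NULL, create_worker,
		    &unit) == 0)
			started++;
	}
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], &res);
		wins += (int)(intptr_t)res;
	}
	return wins;
}
```

With the check and the insert done under the same lock, only one of the
racing creators can succeed and the others get EEXIST.  The cost is that
the lock is held for the whole operation.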

The fact that a lock is not held over the cloning operation is imho
positive.
