On 16/05/21(Sun) 15:56, Vitaliy Makkoveev wrote:
>
>
> > On 14 May 2021, at 14:43, Martin Pieuchot <m...@openbsd.org> wrote:
> >
> > On 13/05/21(Thu) 14:50, Vitaliy Makkoveev wrote:
> >> On Thu, May 13, 2021 at 01:15:05PM +0200, Hrvoje Popovski wrote:
> >>> On 13.5.2021. 1:25, Vitaliy Makkoveev wrote:
> >>>> It seems this lock order issue is not parallel diff specific.
> >>>
> >>> Yes, you are right ... it seemed familiar but i couldn't reproduce it
> >>> on lapc trunk or without this diff so i thought that parallel diff is
> >>> one to blame ..
> >>>
> >>> sorry for noise ..
> >>>
> >>
> >> Timeout thread and interface destroy thread are both serialized by
> >> kernel lock so it's hard to catch this issue. So your report is
> >> useful :)
> >
> > The use of the NET_LOCK() in *clone_destroy() is problematic. tpmr(4)
> > has a similar problem as reported by Hrvoje in a different thread. I
> > don't know what it is serializing, hopefully David can tell us more.
> >
>
> It serializes detach hook and clone_detach. Detach hooks are executed
> with netlock held. Unfortunately this problem is much complicated,
> and we can’t just introduce new lock to solve it because this will
> introduce lock order issue.

We're talking about different uses of the NET_LOCK().  if_detach() and
if_deactivate() internally grab the NET_LOCK() for the reason you
mentioned.  I'm asking what in aggr_down() and aggr_p_dtor(), or
respectively tpmr_down() and tpmr_p_dtor(), requires the NET_LOCK() and
whether this could be done differently.
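
To make the lock order concern from the quoted text concrete, here is a
rough userland sketch.  It is not kernel code and does not use the real
NET_LOCK().  LK_NEW stands for a hypothetical extra lock, and the two
paths are assumptions for illustration only: one path takes the new lock
while the net lock is held, the other takes them in the opposite order.
A tiny order recorder (hypothetical helpers lock_enter() and lock_exit())
flags the reversal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock identifiers; LK_NEW stands for a proposed extra lock. */
enum lock_id { LK_NET, LK_NEW, LK_COUNT };

/* seen[a][b] != 0 means lock b was once taken while lock a was held. */
static int seen[LK_COUNT][LK_COUNT];
static int held[LK_COUNT];

/* Forget all recorded orders and held locks. */
static void
lock_order_reset(void)
{
	memset(seen, 0, sizeof(seen));
	memset(held, 0, sizeof(held));
}

/* Record taking lock l; return -1 if this reverses a recorded order. */
static int
lock_enter(enum lock_id l)
{
	int i, error = 0;

	assert(l < LK_COUNT && !held[l]);
	for (i = 0; i < LK_COUNT; i++) {
		if (!held[i])
			continue;
		if (seen[l][i])
			error = -1;	/* i was taken after l before */
		seen[i][l] = 1;
	}
	held[l] = 1;
	return (error);
}

static void
lock_exit(enum lock_id l)
{
	assert(held[l]);
	held[l] = 0;
}

/* Assumed path 1: runs with the net lock held, then takes the new lock. */
static int
detach_hook_path(void)
{
	int error = 0;

	error |= lock_enter(LK_NET);
	error |= lock_enter(LK_NEW);
	lock_exit(LK_NEW);
	lock_exit(LK_NET);
	return (error);
}

/* Assumed path 2: takes the new lock, then calls code that grabs the net lock. */
static int
clone_destroy_path(void)
{
	int error = 0;

	error |= lock_enter(LK_NEW);
	error |= lock_enter(LK_NET);
	lock_exit(LK_NET);
	lock_exit(LK_NEW);
	return (error);
}
```

After lock_order_reset(), running detach_hook_path() and then
clone_destroy_path() makes the second call report the reversal.  Either
path alone is fine, which is why such a problem can stay hidden until
both orders are exercised.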