On Sun, Jun 02, 2019 at 10:29:01AM -0300, Martin Pieuchot wrote: > On 02/06/19(Sun) 04:30, Visa Hankala wrote: > > On Sat, Jun 01, 2019 at 07:04:23PM -0300, Martin Pieuchot wrote: > > > On 01/06/19(Sat) 23:22, Mark Kettenis wrote: > > > > > Date: Sat, 1 Jun 2019 17:32:52 -0300 > > > > > From: Martin Pieuchot <m...@openbsd.org> > > > > > > > > > > Currently it isn't safe to call mtx_enter_try(9) if you're already > > > > > holding the mutex. That means it isn't safe to call that function > > > > > in hardclock(9), like with `windup_mtx'. That's why the mutex needs > > > > > to be initialized as IPL_CLOCK. > > > > > > > > > > I'm working on removing the SCHED_LOCK() from inside hardclock(9). > > > > > That leads me to wonder if I should initialize all mutexes to > > > > > IPL_SCHED, > > > > > possibly blocking clock interrupts, or if we should change the mutex > > > > > API > > > > > to allow mtx_enter_try(9) to deal with recursion. > > > > > > > > > > The diff below removes the recursion check for mtx_enter_try(9). > > > > > > > > > > Comments? Oks? > > > > > > > > My initial reaction is that if you're trying to lock when you already > > > > have the lock, there is something wrong with your locking strategy and > > > > that this is something we don't want. > > > > > > Could you elaborate? Are you saying that preventing hardclock(9) to run > > > is the way to move forward to unlock its internals? Why isn't that > > > strategy wrong? > > > > > > In the `windup_mtx' case, does it matter if the mutex is taken by > > > another CPU or by myself? What's the problem when CPU0 is one holding > > > the lock? > > > > mutex(9) is not and should not become recursive. Recursive locking > > works when it is voluntary. If recursion was allowed with interrupts, > > the CPU could re-enter the critical section at any moment, possibly > > seeing inconsistent state or breaking assumptions made by the original > > entry. > > I don't understand how your answer is related to my diff :( > > Why allowing mtx_enter_try(9) to fail and not to panic(9) when the CPU > executing the code already owns the mutex is wrong? > > Is that what you call recursion? Not entering the critical section in > the interrupt instead of blocking the interrupt? > > How is that different from not entering the critical section because > another CPU is holding the mutex?
To me, it looks very suspicious if a CPU tries to lock a mutex that it already holds because it indicates that there is a potential recursion. My stance is that such a locking behaviour is a design error in general and the current way of checking recursion in mtx_enter_try() is sound. If a mutex can be taken on multiple interrupt priority levels, the lock has to be configured to block the highest level, and mutexes used in hardclock() are no exception.