Thank you both for the detailed explanations and audits!

I'm currently running an 8.0_STABLE kernel on the machine (with 6.1_STABLE userland) and have seen no panics so far. This may be
-- luck
-- different timing that doesn't trigger the race
-- a bug fixed since 6.1

If someone remembers a bug in this area fixed since 6.1, we can stop here.

> if indeed the problem is a use-after-free arising from
> soclose/sofree/soput that happened too early.

What I'm sure about is that the first solock() in unp_gc() calls mutex_enter(), and that the preemption-enabling loop in mutex_vector_enter() evaluates mutex_oncpu() (the one in the while condition) on an owner value of -16L.

What I guess is that this -16L is MUTEX_THREAD and was put there by MUTEX_DESTROY(), called by mutex_destroy(), called by soput() in another thread that ran during the preemption-enabled phase. Any other ideas on how that -16L could get there?

Could I install some hack that, in soput(), would panic if the socket to be freed is the one unp_gc() is currently working on? If that triggers, we'd get a useful traceback, no? And if that panic doesn't trigger but the other one does, we'd know that some of my assumptions were wrong.
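
To make the proposed hack concrete, here is a rough userland mock of what I have in mind. It is only a sketch: struct socket is a stand-in, and unp_gc_current_so, unp_gc_mark(), unp_gc_unmark(), soput_would_race() and soput_checked() are names I made up, not existing kernel symbols. In the kernel the check would sit at the top of soput() and call panic() instead of assert(), with the mark/unmark placed around the solock()/sounlock() pair in unp_gc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct socket; only its address matters here. */
struct socket {
	int so_dummy;
};

/*
 * Hypothetical debug variable: the socket unp_gc() is currently working
 * on, or NULL.  Deliberately unlocked; for a debugging hack a racy read
 * is acceptable, since a missed hit only means a missed report.
 */
static struct socket *volatile unp_gc_current_so;

/* Would be called in unp_gc() just before solock(so). */
static void
unp_gc_mark(struct socket *so)
{
	unp_gc_current_so = so;
}

/* Would be called in unp_gc() right after the matching sounlock(so). */
static void
unp_gc_unmark(void)
{
	unp_gc_current_so = NULL;
}

/* Nonzero if freeing so now would pull it out from under unp_gc(). */
static int
soput_would_race(const struct socket *so)
{
	return so != NULL && so == unp_gc_current_so;
}

/* Mock of soput() with the check added; the kernel version would panic(). */
static void
soput_checked(struct socket *so)
{
	assert(!soput_would_race(so) && "soput: socket in use by unp_gc");
	free(so);
}
```

If the check fires, the traceback shows who called soput() on that socket. If it never fires but the original panic still occurs, the -16L must have come from somewhere else.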