Well, it's definitely a race, and the potential deadlock is handled
in mi_switch(), so my vote is for "good intention, bad judgement"
on the assertions in kern_lock.c
On Fri, Nov 05, 2010 at 05:52:23PM +0100, Mike Belopuhov wrote:
> hi,
>
> we've finally got an interesting panic that shed some light on the
> 'kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed'
> panics that plagued some heavily loaded systems.
>
> apparently, there's a window in mi_switch (right after cpu_switchto) that
> has sched_lock held but kernel_lock released, thus allowing other cpus
> to take advantage of the kernel_lock:
>
> cpu_switchto(p, nextproc);
> } else {
> p->p_stat = SONPROC;
> }
>
> clear_resched(curcpu());
>
> SCHED_ASSERT_LOCKED();
>
> /*
> * To preserve lock ordering, we need to release the sched lock
> * and grab it after we grab the big lock.
> * In the future, when the sched lock isn't recursive, we'll
> * just release it here.
> */
> #ifdef MULTIPROCESSOR
> __mp_unlock(&sched_lock);
> #endif
>
> unfortunately they can't do this because we check for a sched_lock
> being held:
>
> void
> _kernel_lock(void)
> {
> SCHED_ASSERT_UNLOCKED();
> __mp_lock(&kernel_lock);
> }
>
> it should be safe to take kernel_lock there because mi_switch:
> 1) doesn't do anything nasty there;
> 2) figures out whether process should take a kernel_lock later on:
>
> #ifdef MULTIPROCESSOR
> /*
> * Reacquire the kernel_lock now. We do this after we've
> * released the scheduler lock to avoid deadlock, and before
> * we reacquire the interlock and the scheduler lock.
> */
> if (p->p_flag & P_BIGLOCK)
> __mp_acquire_count(&kernel_lock, hold_count);
> __mp_acquire_count(&sched_lock, sched_count + 1);
> #endif
>
> opinions? should we just remove an assertion?
> or am i trippin' on some high quality drugs? (:
>
> ddb log session:
>
> ddb{0}> tr
> Debugger(d08cc9e6,e08b7e34,d08aaf94,e08b7e34,d0a44e28) at Debugger+0x4
> panic(d08aaf94,d08346ee,d08a80e0,d08a83ed,16a) at panic+0x5d
> __assert(d08346ee,d08a83ed,16a,d08a80e0,0) at __assert+0x2e
> _kernel_lock(1,0,d09d15cc,7a120,0) at _kernel_lock+0x48
> trap() at trap+0x5fb
> --- trap (number -547639296) ---
> Bad frame pointer: 0xd0ad48a0
> 0:
> ddb{0}> show panic
> kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed:
> file "..
> /../../../kern/kern_lock.c", line 362
> ddb{0}> show ps
> No such command
> ddb{0}> ps
> PID PPID PGRP UID S FLAGS WAIT COMMAND
> 2111 1 2111 0 3 0x2040180 select sendmail
> 12050 1 12050 0 3 0x2004080 ttyin getty
> 3065 1 3065 0 3 0x2004080 ttyin getty
> 2937 1 2937 0 3 0x2004080 ttyin getty
> 15153 1 15153 0 3 0x2004080 ttyin getty
> 24343 1 24343 0 3 0x2004080 ttyin getty
> 631 1 631 0 3 0x2004080 ttyin getty
> 18404 1 18404 0 3 0x2000080 select cron
> 7380 1 7380 0 3 0x2000180 select inetd
> 23123 1 23123 0 3 0x2000080 select sshd
> 8900 27266 27156 83 3 0x2000180 poll ntpd
> 27266 27156 27156 83 3 0x2000180 poll ntpd
> 27156 1 27156 0 3 0x2000080 poll ntpd
> 22867 11112 11112 74 3 0x2000180 bpf pflogd
> 11112 1 11112 0 3 0x2000080 netio pflogd
> 15707 20626 20626 73 3 0x2000180 poll syslogd
> 20626 1 20626 0 3 0x2000088 netio syslogd
> 29035 1 29035 77 3 0x2000180 poll dhclient
> 19499 1 27610 0 3 0x2000080 poll dhclient
> 7362 1 7362 77 3 0x2000180 poll dhclient
> 22676 1 27610 0 3 0x2000080 poll dhclient
> 19 0 0 0 3 0x2100200 aiodoned aiodoned
> 18 0 0 0 3 0x2100200 syncer update
> 17 0 0 0 3 0x2100200 cleaner cleaner
> 16 0 0 0 3 0x100200 reaper reaper
> 15 0 0 0 3 0x2100200 pgdaemon pagedaemon
> 14 0 0 0 3 0x2100200 bored crypto
> 13 0 0 0 3 0x2100200 pftm pfpurge
> 12 0 0 0 3 0x2100200 usbtsk usbtask
> 11 0 0 0 3 0x2100200 usbatsk usbatsk
> 10 0 0 0 3 0x2100200 acpi0 acpi0
> 9 0 0 0 7 0x40100200 idle5
> 8 0 0 0 7 0x40100200 idle4
> 7 0 0 0 7 0x40100200 idle3
> 6 0 0 0 7 0x40100200 idle2
> 5 0 0 0 7 0x40100200 idle1
> 4 0 0 0 3 0x2100200 bored syswq
> * 3 0 0 0 7 0x40100200 idle0
> 2 0 0 0 3 0x2100200 kmalloc kmthread
> 1 0 1 0 3 0x2004080 wait init
> 0 -1 0 0 3 0x2080200 scheduler swapper
> ddb{0}> ddb cpu 1
> No such command
> ddb{0}> mach ddbcpu 1
> Stopped at Debugger+0x4: popl %ebp
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS
> PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS,
> TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{1}> tr
> Debugger(d4320000,d082b116,1,202,1) at Debugger+0x4
> i386_ipi_handler(c0,58,10,10,10) at i386_ipi_handler+0x5f
> Xintripi() at Xintripi+0x47
> --- interrupt ---
> __mp_lock(d0a1d10c,7fffffff,da4154f0,d0a184a0,e08bbf5c) at __mp_lock+0x52
> __mp_acquire_count(d0a1d10c,1,10,d4320000,d4320034,e08bbf8c,0,2,1,0,2f6d)
> at __
> mp_acquire_count+0x20
> mi_switch(d0a1d10c,d4320000,da4154f0,e08bbf90,d03d3a53) at mi_switch+0x168
> sched_idle(d4320000) at sched_idle+0x1cc
> Bad frame pointer: 0xd0b8ae48
> ddb{1}>