Well, it's definitely a race, and the potential deadlock is handled
in mi_switch(), so my vote is for "good intention, bad judgement"
on the assertions in kern_lock.c

On Fri, Nov 05, 2010 at 05:52:23PM +0100, Mike Belopuhov wrote:
> hi,
> 
> we've finally got an interesting panic that shed some light on the
> 'kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed'
> panics that plagued some heavily loaded systems.
> 
> apparently, there's a window in mi_switch (right after cpu_switchto) that
> has sched_lock held but kernel_lock released, thus allowing other cpus
> to take advantage of the kernel_lock:
> 
>                       cpu_switchto(p, nextproc);
>               } else {
>                       p->p_stat = SONPROC;
>               }
> 
>               clear_resched(curcpu());
> 
>               SCHED_ASSERT_LOCKED();
> 
>               /*
>                * To preserve lock ordering, we need to release the sched lock
>                * and grab it after we grab the big lock.
>                * In the future, when the sched lock isn't recursive, we'll
>                * just release it here.
>                */
>       #ifdef MULTIPROCESSOR
>               __mp_unlock(&sched_lock);
>       #endif
> 
> unfortunately they can't do this because we check for a sched_lock
> being held:
> 
>       void
>       _kernel_lock(void)
>       {
>               SCHED_ASSERT_UNLOCKED();
>               __mp_lock(&kernel_lock);
>       }
> 
> it should be safe to take kernel_lock there because mi_switch:
> 1) doesn't do anything nasty there;
> 2) figures out whether process should take a kernel_lock later on:
> 
>       #ifdef MULTIPROCESSOR
>               /*
>                * Reacquire the kernel_lock now.  We do this after we've
>                * released the scheduler lock to avoid deadlock, and before
>                * we reacquire the interlock and the scheduler lock.
>                */
>               if (p->p_flag & P_BIGLOCK)
>                       __mp_acquire_count(&kernel_lock, hold_count);
>               __mp_acquire_count(&sched_lock, sched_count + 1);
>       #endif
> 
> opinions?  should we just remove an assertion?
> or am i trippin' on some high quality drugs? (:
> 
> ddb log session:
> 
> ddb{0}> tr
> Debugger(d08cc9e6,e08b7e34,d08aaf94,e08b7e34,d0a44e28) at Debugger+0x4
> panic(d08aaf94,d08346ee,d08a80e0,d08a83ed,16a) at panic+0x5d
> __assert(d08346ee,d08a83ed,16a,d08a80e0,0) at __assert+0x2e
> _kernel_lock(1,0,d09d15cc,7a120,0) at _kernel_lock+0x48
> trap() at trap+0x5fb
> --- trap (number -547639296) ---
> Bad frame pointer: 0xd0ad48a0
> 0:
> ddb{0}> show panic
> kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed:
> file "..
> /../../../kern/kern_lock.c", line 362
> ddb{0}> show ps
> No such command
> ddb{0}> ps          
>    PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>   2111      1   2111      0  3   0x2040180  select        sendmail
>  12050      1  12050      0  3   0x2004080  ttyin         getty
>   3065      1   3065      0  3   0x2004080  ttyin         getty
>   2937      1   2937      0  3   0x2004080  ttyin         getty
>  15153      1  15153      0  3   0x2004080  ttyin         getty
>  24343      1  24343      0  3   0x2004080  ttyin         getty
>    631      1    631      0  3   0x2004080  ttyin         getty
>  18404      1  18404      0  3   0x2000080  select        cron
>   7380      1   7380      0  3   0x2000180  select        inetd
>  23123      1  23123      0  3   0x2000080  select        sshd
>   8900  27266  27156     83  3   0x2000180  poll          ntpd
>  27266  27156  27156     83  3   0x2000180  poll          ntpd
>  27156      1  27156      0  3   0x2000080  poll          ntpd
>  22867  11112  11112     74  3   0x2000180  bpf           pflogd
>  11112      1  11112      0  3   0x2000080  netio         pflogd
>  15707  20626  20626     73  3   0x2000180  poll          syslogd
>  20626      1  20626      0  3   0x2000088  netio         syslogd
>  29035      1  29035     77  3   0x2000180  poll          dhclient
>  19499      1  27610      0  3   0x2000080  poll          dhclient
>   7362      1   7362     77  3   0x2000180  poll          dhclient
>  22676      1  27610      0  3   0x2000080  poll          dhclient
>     19      0      0      0  3   0x2100200  aiodoned      aiodoned
>     18      0      0      0  3   0x2100200  syncer        update
>     17      0      0      0  3   0x2100200  cleaner       cleaner
>     16      0      0      0  3    0x100200  reaper        reaper
>     15      0      0      0  3   0x2100200  pgdaemon      pagedaemon
>     14      0      0      0  3   0x2100200  bored         crypto
>     13      0      0      0  3   0x2100200  pftm          pfpurge
>     12      0      0      0  3   0x2100200  usbtsk        usbtask
>     11      0      0      0  3   0x2100200  usbatsk       usbatsk
>     10      0      0      0  3   0x2100200  acpi0         acpi0
>      9      0      0      0  7  0x40100200                idle5
>      8      0      0      0  7  0x40100200                idle4
>      7      0      0      0  7  0x40100200                idle3
>      6      0      0      0  7  0x40100200                idle2
>      5      0      0      0  7  0x40100200                idle1
>      4      0      0      0  3   0x2100200  bored         syswq
> *    3      0      0      0  7  0x40100200                idle0
>      2      0      0      0  3   0x2100200  kmalloc       kmthread
>      1      0      1      0  3   0x2004080  wait          init
>      0     -1      0      0  3   0x2080200  scheduler     swapper
> ddb{0}> ddb cpu 1
> No such command
> ddb{0}> mach ddbcpu 1 
> Stopped at      Debugger+0x4:   popl    %ebp
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS
> PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS,
> TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{1}> tr 
> Debugger(d4320000,d082b116,1,202,1) at Debugger+0x4
> i386_ipi_handler(c0,58,10,10,10) at i386_ipi_handler+0x5f
> Xintripi() at Xintripi+0x47
> --- interrupt ---
> __mp_lock(d0a1d10c,7fffffff,da4154f0,d0a184a0,e08bbf5c) at __mp_lock+0x52
> __mp_acquire_count(d0a1d10c,1,10,d4320000,d4320034,e08bbf8c,0,2,1,0,2f6d)
> at __
> mp_acquire_count+0x20
> mi_switch(d0a1d10c,d4320000,da4154f0,e08bbf90,d03d3a53) at mi_switch+0x168
> sched_idle(d4320000) at sched_idle+0x1cc
> Bad frame pointer: 0xd0b8ae48
> ddb{1}> 

Reply via email to