hi,

> Hello,
> working with source code based on the matt-nb5-mips64 branch,
> I can reproduce this panic:
> panic: kernel diagnostic assertion "spc->spc_migrating == NULL" failed: file
> "/dsk/l1/misc/bouyer/tmp/src/sys/kern/kern_synch.c", line 656
> mttycn_pollc 1 ipl 0x6
> Stopped in pid 0.4 (system) at netbsd:cpu_Debugger+0x4: jr ra
> bdslot: nop
> db{0}> tr
> cpu_Debugger+4 (c04bd000,b300,10,c0407c00) ra c02192ac sz 0
> panic+1d4 (c04bd000,c02de430,c02f1450,c02f1360) ra c02cac78 sz 48
> __kernassert+48 (c04bd000,c02de430,c02f1450,c02f1360) ra c01f74a4 sz 32
> mi_switch+640 (c04bd000,c02de430,c02f1450,c02f1360) ra c01f3130 sz 64
> sleepq_block+f0 (c04bd000,c02de430,c02f1450,c02f1360) ra c0202f54 sz 48
> turnstile_block+2d0 (c04bd000,c02de430,c02f1450,c02f1360) ra c01e254c sz 56
> mutex_vector_enter+268 (c04bd000,c02de430,c02f1450,c02f1360) ra c026e2cc sz 64
> wapbl_biodone+48 (c04bd000,c02de430,c02f1450,c02f1360) ra c0255638 sz 48
> biodone2+a4 (c04bd000,c02de430,c02f1450,c02f1360) ra c02557c8 sz 32
> biointr+ac (c04bd000,c02de430,c02f1450,c02f1360) ra c01f3acc sz 32
> softint_dispatch+c4 (c04bd000,c02de430,c02f1450,c02f1360) ra c0295fe4 sz 72
> softint_fast_dispatch+80 (0,c02de430,c02f1450,c02f1360) ra 0 sz 24
> User-level: pid 0.4
>
>
> (The soft int may vary). Looking at the sources, I see that
> sched_nextlwp() is careful not to propose a new lwp if a migration is in
> progress. But when this KASSERT fires we're not necessarily about to
> switch to a new (non-idle) lwp; rather, the current lwp got woken up by
> another CPU while it was about to switch.
>
> Shouldn't
> 	KASSERT(spc->spc_migrating == NULL);
> 	if (l->l_target_cpu != NULL) {
> 		spc->spc_migrating = l;
> 	}
> be instead:
> 	if (l->l_target_cpu != NULL) {
> 		KASSERT(spc->spc_migrating == NULL);
> 		spc->spc_migrating = l;
> 	}
>
> I did the above change and it seems to work, can someone confirm this is
> correct?

i think you're correct.
i've had the attached patch sitting in my local tree for a long time.
i haven't committed it because i haven't been able to reproduce the
panic on my machine yet.

YAMAMOTO Takashi

>
> --
> Manuel Bouyer <bou...@antioche.eu.org>
>      NetBSD: 26 ans d'experience feront toujours la difference
> --

Index: kern_synch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_synch.c,v
retrieving revision 1.284
diff -u -p -r1.284 kern_synch.c
--- kern_synch.c	2 Nov 2010 15:17:37 -0000	1.284
+++ kern_synch.c	23 Nov 2010 22:16:57 -0000
@@ -654,9 +654,22 @@ mi_switch(lwp_t *l)
 			l->l_stat = LSRUN;
 			lwp_setlock(l, spc->spc_mutex);
 			sched_enqueue(l, true);
-			/* Handle migration case */
-			KASSERT(spc->spc_migrating == NULL);
-			if (l->l_target_cpu != NULL) {
+#if 1
+			if (spc->spc_migrating != NULL) {
+				printf("%s: bug %p %p %p\n", __func__, l, newl, spc);
+			}
+#endif
+			/*
+			 * Handle migration case
+			 *
+			 * spc_migrating != NULL here means that a softint
+			 * which interrupted the idle lwp is blocking.
+			 */
+			KASSERT(spc->spc_migrating == NULL ||
+			    ((l->l_pflag & LP_INTR) != 0 &&
+			    newl != NULL && (newl->l_flag & LW_IDLE) != 0));
+			if (l->l_target_cpu != NULL) {
+				KASSERT((l->l_pflag & LP_INTR) == 0);
 				spc->spc_migrating = l;
 			}
 		} else
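
To compare the three variants of the check discussed in this thread, here is a rough userland sketch. It is not kernel code: the toy_* structures, the boolean fields standing in for LP_INTR and LW_IDLE, and the check_* helpers are all made up for illustration, and each helper simply returns false where the corresponding KASSERT would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for lwp_t and struct schedstate_percpu. */
struct toy_lwp {
	void *l_target_cpu;	/* non-NULL when this lwp wants to migrate */
	bool l_is_softint;	/* models (l_pflag & LP_INTR) != 0 */
	bool l_is_idle;		/* models (l_flag & LW_IDLE) != 0 */
};

struct toy_spc {
	struct toy_lwp *spc_migrating;	/* migration already pending, if any */
};

/* Original code: assert unconditionally, whatever l is doing. */
bool
check_original(const struct toy_spc *spc, const struct toy_lwp *l)
{
	(void)l;
	return spc->spc_migrating == NULL;
}

/* Proposed change: assert only when l itself is about to migrate. */
bool
check_proposed(const struct toy_spc *spc, const struct toy_lwp *l)
{
	if (l->l_target_cpu != NULL)
		return spc->spc_migrating == NULL;
	return true;
}

/*
 * Attached patch: a pending migration is tolerated only when l is a
 * softint and the next lwp is the idle lwp; a softint must never be
 * the one that migrates.
 */
bool
check_patch(const struct toy_spc *spc, const struct toy_lwp *l,
    const struct toy_lwp *newl)
{
	if (!(spc->spc_migrating == NULL ||
	    (l->l_is_softint && newl != NULL && newl->l_is_idle)))
		return false;
	if (l->l_target_cpu != NULL && l->l_is_softint)
		return false;
	return true;
}

/* Walks through the scenario from the backtrace plus two neighbours. */
void
toy_selfcheck(void)
{
	int cpu = 0;
	struct toy_lwp other = { &cpu, false, false };
	struct toy_lwp softint = { NULL, true, false };
	struct toy_lwp plain = { NULL, false, false };
	struct toy_lwp idle = { NULL, false, true };
	struct toy_spc busy = { &other };
	struct toy_spc quiet = { NULL };

	/* Blocking softint while a migration is pending: the reported panic. */
	assert(!check_original(&busy, &softint));
	assert(check_proposed(&busy, &softint));
	assert(check_patch(&busy, &softint, &idle));

	/* Same state, but l is an ordinary lwp: only the patch still objects. */
	assert(check_proposed(&busy, &plain));
	assert(!check_patch(&busy, &plain, &idle));

	/* No migration pending and l wants to migrate: every variant passes. */
	assert(check_original(&quiet, &other));
	assert(check_proposed(&quiet, &other));
	assert(check_patch(&quiet, &other, &idle));
}
```

In this model the proposed reordering and the patch both accept the blocking-softint case, while the patch keeps asserting in the other cases where spc_migrating is unexpectedly set.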