Hello David,

I'm still working on porting your ufs_rename patches to NetBSD-5.x, and I've made more progress. I can now run without filesystem corruption of any kind, either without logging or with WAPBL logging enabled. Softdep still isn't working, but I have a feeling I'm close to resolving that issue.

However, I'm intermittently running into the same issue you reported in July. Namely, when running your dirconc test with WAPBL logging enabled, I pretty reliably get:

    panic: lockdebug_barrier: holding 1 shared locks (curlwp = 0xcbc6fcc0)

Note that I've patched the lockdebug code to show me the relevant curlwp at the time of the panic. In looking at my crash dump with gdb, I find I have a question.

This panic comes from subr_lockdebug.c, line 664 or thereabouts, in the NetBSD-5.x sources. It says, in part:

    if (l->l_shlocks != 0) {
        panic("lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)",
            l->l_shlocks, (unsigned int)l);
    }

This comes after the function has checked for an actual lock structure, and just before it declares success. The arguments to the function in this case are all 0, meaning there's no actual lock to be checked, or at least I don't think there is. In fact, the call that triggers the panic is from sys/sys/userret.h, line 104, which reads:

    LOCKDEBUG_BARRIER(NULL, 0);

I notice that in earlier parts of subr_lockdebug.c, l_shlocks gets set at the same time as ld->ld_shares. Does anyone know why, on this particular check, we test for a shared lock in the struct lwp without matching it up with an actual lock? In the backtrace below, you'll notice that lockdebug_barrier is passed 0 for every argument, meaning all the code above the lines that panic is rendered moot, unless I'm misunderstanding something in a big way, which is highly probable.

Also, everywhere else we manipulate l_shlocks, we do it while holding splhigh. We do this check, and the subsequent panic, without holding splhigh. Is it possible something is changing under us while we're still checking things out? I don't get the panic every time, and sometimes it takes a while to hit after I start the tests, but when it panics, the traces always look the same. Shouldn't this check be skipped when there is no matching ld structure to go with the lwp in question?

Any light anyone could shed on these questions would be greatly appreciated.
-thanks
-Brian

(gdb) target kvm netbsd.22.core
#0  0xc050f8e2 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c:924
924     /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c: No such file or directory.
        in /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c
(gdb) kvm proc 0xcbc6fcc0
#0  0xc0448bdf in mi_switch (l=0xcbc6fcc0)
    at /usr/local/netbsd/src/sys/kern/kern_synch.c:765
765     /usr/local/netbsd/src/sys/kern/kern_synch.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/kern_synch.c
(gdb) bt
#0  0xc0448bdf in mi_switch (l=0xcbc6fcc0)
    at /usr/local/netbsd/src/sys/kern/kern_synch.c:765
#1  0xc044594b in sleepq_block (timo=0, catch=false)
    at /usr/local/netbsd/src/sys/kern/kern_sleepq.c:269
#2  0xc0424f5c in cv_wait (cv=0xc122fcb0, mtx=0xca4fca10)
    at /usr/local/netbsd/src/sys/kern/kern_condvar.c:201
#3  0xc0492f52 in biowait (bp=0xc122fc18)
    at /usr/local/netbsd/src/sys/kern/vfs_bio.c:1515
#4  0xc04a665c in wapbl_doio (data=0xc135be00, len=512, devvp=0xca4fca10,
    pbn=1301769, flags=0) at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:745
#5  0xc04a773f in wapbl_circ_write (wl=0xc1294700, data=0xc135be00, len=512,
    offp=0xcbc838c8) at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:800
#6  0xc04a7e59 in wapbl_flush (wl=0xc1294700, waitfor=0)
    at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:2000
#7  0xc03aea08 in ffs_sync (mp=0xcb38c644, waitfor=2, cred=0xcb926900)
    at /usr/local/netbsd/src/sys/ufs/ffs/ffs_vfsops.c:1823
#8  0xc049b44c in VFS_SYNC (mp=0xcb38c644, a=2, b=0xcb926900)
    at /usr/local/netbsd/src/sys/kern/vfs_subr.c:3064
#9  0xc04a2d9c in sys_sync (l=0xcbc6fcc0, v=0x0, retval=0x0)
    at /usr/local/netbsd/src/sys/kern/vfs_syscalls.c:825
#10 0xc049c12e in vfs_shutdown ()
    at /usr/local/netbsd/src/sys/kern/vfs_subr.c:2383
#11 0xc050f95b in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c:910
#12 0xc015f1e9 in db_sync_cmd (addr=-876070228, have_addr=false, count=-1,
    modif=0xcbc83a18 "\003ÚÀc")
    at /usr/local/netbsd/src/sys/ddb/db_command.c:1304
#13 0xc015f9a8 in db_command (last_cmdp=0xc0a4917c)
    at /usr/local/netbsd/src/sys/ddb/db_command.c:926
#14 0xc015fc22 in db_command_loop ()
    at /usr/local/netbsd/src/sys/ddb/db_command.c:583
#15 0xc0162b30 in db_trap (type=1, code=0)
    at /usr/local/netbsd/src/sys/ddb/db_trap.c:101
#16 0xc050a89b in kdb_trap (type=1, code=0, regs=0xcbc83c3c)
    at /usr/local/netbsd/src/sys/arch/i386/i386/db_interface.c:229
#17 0xc05125c3 in trap (frame=0xcbc83c3c)
    at /usr/local/netbsd/src/sys/arch/i386/i386/trap.c:351
#18 0xc010cb60 in calltrap ()
#19 0xc0508f7c in breakpoint ()
#20 0xc0463ac0 in panic (
    fmt=0xc09db7cc "lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)")
    at /usr/local/netbsd/src/sys/kern/subr_prf.c:250
#21 0xc045dba5 in lockdebug_barrier (spinlock=0x0, slplocks=0)
    at /usr/local/netbsd/src/sys/kern/subr_lockdebug.c:664
#22 0xc05120c7 in syscall (frame=0xcbc83d48)
    at /usr/local/netbsd/src/sys/sys/userret.h:104
#23 0xc0100505 in syscall1 ()
(gdb)
(gdb)
(gdb) up
#1  0xc044594b in sleepq_block (timo=0, catch=false)
    at /usr/local/netbsd/src/sys/kern/kern_sleepq.c:269
269     /usr/local/netbsd/src/sys/kern/kern_sleepq.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/kern_sleepq.c
[ . . .]
(gdb)
#20 0xc0463ac0 in panic (
    fmt=0xc09db7cc "lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)")
    at /usr/local/netbsd/src/sys/kern/subr_prf.c:250
250     /usr/local/netbsd/src/sys/kern/subr_prf.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/subr_prf.c
(gdb)
#21 0xc045dba5 in lockdebug_barrier (spinlock=0x0, slplocks=0)
    at /usr/local/netbsd/src/sys/kern/subr_lockdebug.c:664
664     /usr/local/netbsd/src/sys/kern/subr_lockdebug.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/subr_lockdebug.c
(gdb) print l
$1 = (struct lwp *) 0xcbc6fcc0
(gdb) print l->l_shlocks
$2 = 1
(gdb) print ld
$3 = (volatile struct lockdebug *) 0x0
(gdb) quit
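
P.S. To make the l_shlocks question concrete, here is a toy userspace model of the bookkeeping as I understand it. It is only a sketch, not the NetBSD code: every model_* name is made up for illustration, and only the field names l_shlocks and ld_shares are taken from subr_lockdebug.c. It assumes a shared acquire bumps both counters together, and that the barrier, when called with no lock, can only look at the per-LWP tally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lwp; only l_shlocks mirrors a real field. */
struct model_lwp {
	int l_shlocks;		/* shared holds currently charged to this LWP */
};

/* Hypothetical stand-in for struct lockdebug; only ld_shares mirrors a real field. */
struct model_lockdebug {
	int ld_shares;		/* shared holds currently recorded on this lock */
};

/* Shared acquire: the per-LWP and per-lock counters move together. */
void
model_locked_shared(struct model_lwp *l, struct model_lockdebug *ld)
{
	l->l_shlocks++;
	ld->ld_shares++;
}

/* Shared release: both counters drop together; neither may go negative. */
void
model_unlocked_shared(struct model_lwp *l, struct model_lockdebug *ld)
{
	assert(l->l_shlocks > 0 && ld->ld_shares > 0);
	l->l_shlocks--;
	ld->ld_shares--;
}

/*
 * Barrier with no lock argument (the NULL, 0 case): there is no lock
 * record to compare against, so all that can be examined is the LWP's
 * own tally.  Returns the number of shared holds still charged to the
 * LWP; the real check panics when this is nonzero.
 */
int
model_barrier(const struct model_lwp *l)
{
	return l->l_shlocks;
}
```

In this model the per-LWP counter is independent of any single lock record, so the barrier can report a leftover hold even when no ld pointer is available, which is the combination the gdb session shows (l_shlocks is 1, ld is 0x0). Whether the real counter can drift for another reason, such as the unprotected read, is exactly what I'm asking.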