On Wed, Oct 24, 2012 at 04:07:34PM +0200, Manuel Bouyer wrote:
> Hello,
> I just got this panic on a NFS server:
> uvm_fault(0xfffffe9069ecf468, 0x0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip ffffffff804bd391 cs 8 rflags 10246 cr2 c8 cpl 0 rsp
> fffffe817503b660
> panic: trap
> cpu20: Begin traceback...
> printf_nolog() at netbsd:printf_nolog
> startlwp() at netbsd:startlwp
> alltraps() at netbsd:alltraps+0x9e
> ffs_fhtovp() at netbsd:ffs_fhtovp+0x55
> VFS_FHTOVP() at netbsd:VFS_FHTOVP+0x1c
> nfsrv_fhtovp() at netbsd:nfsrv_fhtovp+0x9a
> nfsrv_write() at netbsd:nfsrv_write+0x502
> nfssvc_nfsd() at netbsd:nfssvc_nfsd+0x1ce
> sys_nfssvc() at netbsd:sys_nfssvc+0x22d
> syscall() at netbsd:syscall+0xc4
> cpu20: End traceback...
>
> Does it ring a bell to someone ?

I forgot to add: it does to me. I think I debugged (and fixed) something
similar in netbsd-5:
http://mail-index.netbsd.org/tech-kern/2009/09/04/msg006026.html
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=41147

I think this analysis still holds: I can't see what would prevent vget()
from returning a CLEAN vnode. If the nfsd thread, in vn_lock(LK_EXCLUSIVE),
gets preempted between mutex_exit(vp->v_interlock) and
VOP_LOCK(vp, (flags & ~LK_RETRY)), there is time for the vcleaner thread to
start cleaning the vnode. nfsd then sleeps in VOP_LOCK(), gets woken up when
the cleaner releases the exclusive lock, and wins the race with the cleaner
for grabbing the interlock. At this point VI_XLOCK is still set but VI_CLEAN
is not set yet, and we check only for VI_CLEAN. When nfsd releases the
interlock, the cleaner finishes cleaning the vnode, and nfsd hits a NULL
v_data in ffs_fhtovp().

I think we should check for VI_XLOCK in addition to VI_CLEAN at the end of
vn_lock(). What do you think ?

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
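
To illustrate the proposed check, here is a minimal userland sketch in C. It
is not the kernel's vn_lock() code: struct model_vnode, the flag values and
both check functions are hypothetical names made up for this model. It only
shows that a vnode caught in the window described above (VI_XLOCK set,
VI_CLEAN not yet set) passes a test for VI_CLEAN alone and is rejected once
VI_XLOCK is tested as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model, not the kernel's definitions. */
#define VI_XLOCK 0x1u	/* the cleaner has started working on the vnode */
#define VI_CLEAN 0x2u	/* the cleaner has finished */

/* Reduced stand-in for struct vnode, holding only what the model needs. */
struct model_vnode {
	unsigned v_iflag;
	void *v_data;
};

/* Check as described in the mail: only VI_CLEAN is tested. */
bool
check_clean_only(const struct model_vnode *vp)
{
	return (vp->v_iflag & VI_CLEAN) == 0;
}

/* Proposed check: also reject a vnode that is still being cleaned. */
bool
check_clean_or_xlock(const struct model_vnode *vp)
{
	return (vp->v_iflag & (VI_CLEAN | VI_XLOCK)) == 0;
}

/* Vnode state in the race window: cleaning started but not finished. */
int
race_window_demo(void)
{
	int fs_private = 0;	/* stands in for the per-filesystem data */
	struct model_vnode vp = { .v_iflag = VI_XLOCK, .v_data = &fs_private };

	/* Passes the VI_CLEAN-only test, although v_data is about to be
	 * set to NULL by the cleaner. */
	assert(check_clean_only(&vp));

	/* Rejected once VI_XLOCK is tested too. */
	assert(!check_clean_or_xlock(&vp));

	/* The cleaner finishes: both tests now reject the vnode. */
	vp.v_iflag = VI_CLEAN;
	vp.v_data = NULL;
	assert(!check_clean_only(&vp));
	assert(!check_clean_or_xlock(&vp));
	return 0;
}
```

In the real kernel the flags would have to be examined with v_interlock
held; the model leaves locking out and only compares the two predicates.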