Hello,
some time ago I reported that an NFS server, rock solid while running a 5.0_STABLE kernel from February 2010, would lock up when upgraded to 5.1_STABLE. I couldn't reproduce this on a testbed, but I got the opportunity to try the faulty kernel again on a production server, and I can now report how it deadlocks.
The root cause is an NFS process closing a socket:

db{0}> tr/a 0x00000000d01585c0
trace: pid 232 lid 19 at 0xd01dc9ac
sleepq_block(1,0,c063c259,c06a513c,c2027108,1,c0723d00,0,1,2df948) at netbsd:sleepq_block+0xeb
kpause(c063c259,0,1,0,c0715eca,c0387e83,c569f858,1000,d4263000,c569f800) at netbsd:kpause+0xe3
uvm_unloanpage(c569f858,1,2,c0123b65,c569f800,0,1000,0,c58902d8,4dfabc08) at netbsd:uvm_unloanpage+0x69
sodopendfreel(c0715ec8,0,d01dcabc,c03880a2,c58902d8,6,0,0,0,0) at netbsd:sodopendfreel+0xe1
sodopendfree(c58902d8,6,0,0,0,0,d01dcadc,cd5a4f80,0,c58902d8) at netbsd:sodopendfree+0x2a
sodisconnect(c58902d8,d01585c0,7,c0723e60,d6fdd700,4dfabc08,c0723e60,c032c4c2,d6fdd700,d6fdd700) at netbsd:sodisconnect+0x62
soclose(c58902d8,0,d01dcb4c,c031a3cd,d6fdd700,c0724060,d01dcb2c,0,c0272501,d01585c0) at netbsd:soclose+0x1b9
closef(d6fdd700,c02737e4,0,d4261d40,d4261d40,d01d93cc,d01dcbec,c027313d,d4261d40,d01d93cc) at netbsd:closef+0x5d
nfsrv_slpderef(d4261d40,d01d93cc,d01dcbd4,d01dcbda,c07277a0,c0724060,0,c06a7b18,c0724060,0) at netbsd:nfsrv_slpderef+0x9e
nfssvc_nfsd(d01dcc38,b8fffc28,d01585c0,0,0,0,c0724002,c0368cd4,0,d01585c0) at netbsd:nfssvc_nfsd+0x1cd
sys_nfssvc(d01585c0,d01dcd00,d01dcd28,0,d0153c80,0,0,4,b8fffc28,7c) at netbsd:sys_nfssvc+0x332
syscall(d01dcd48,b3,ab,1f,1f,0,b8e00000,b8fffcac,b8fffc28,4) at netbsd:syscall+0

The page passed to uvm_unloanpage() has a uobject, and for some reason the lock was not available when uvm_unloanpage() tried to get it, so it went to kpause. Note that the lock is now free:

db{0}> x/x c569f858
0xc569f858:     c2027108
db{0}> show page c2027108
PAGE 0xc2027108:
  flags=c<TABLED,CLEAN>, pqflags=200<PRIVATE2>, wire_count=0, pa=0x55ef8000
  uobject=0xd38d80b8, uanon=0x0, offset=0x0 loan_count=2
  [page ownership tracking disabled]

(gdb) print &((struct uvm_object*)0xd38d80b8)->vmobjlock
$3 = (kmutex_t *) 0xd38d80b8

db{0}> sh lock 0xd38d80b8
lock address : 0x00000000d38d80b8  type     :     sleep/adaptive
initialized  : 0x00000000c03a0746
shared holds :                  0  exclusive:                  0
shares wanted:                  0  exclusive:                  0
current cpu  :                  0  last held:                  2
current lwp  : 0x00000000cd5a7c80  last held: 000000000000000000
last locked  : 0x00000000c03a121e  unlocked : 0x00000000c03a15d9
owner field  : 000000000000000000  wait/spin:                0/0

Turnstile chain at 0xc0723ce0.
=> No active turnstile for this lock.

It looks like this is a vnode interlock:

0xc03a121e is in vrelel (/home/src/src-5/src/sys/kern/vfs_subr.c:1497).
1492             *
1493             * Note that VOP_INACTIVE() will drop the vnode lock.
1494             */
1495            VOP_INACTIVE(vp, &recycle);
1496            mutex_enter(&vp->v_interlock);
1497            vp->v_iflag &= ~VI_INACTNOW;
1498            cv_broadcast(&vp->v_cv);
1499            if (!recycle) {
1500                    if (vtryrele(vp)) {
1501                            mutex_exit(&vp->v_interlock);

(gdb) l *(0x00000000c03a15d9)
0xc03a15d9 is in vrelel (/home/src/src-5/src/sys/kern/vfs_subr.c:1567).
1562            } else {
1563                    vp->v_freelisthd = &vnode_free_list;
1564            }
1565            TAILQ_INSERT_TAIL(vp->v_freelisthd, vp, v_freelist);
1566            mutex_exit(&vnode_free_list_lock);
1567            mutex_exit(&vp->v_interlock);
1568    }
1569 }

(Side question: is it OK to put a vnode on a free list while a loan is still active?)

soclose() got the socket lock, which happens to be the softnet_lock (from a quick look this looks OK; I couldn't find where in nfs or tcp the socket's lock would be changed to something other than softnet_lock).
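To make the uvm_unloanpage() part explicit: as far as I can tell from the trace, it can't take the page owner's lock in the usual locking order, so it does a try-lock and falls back to a one-tick kpause() when that fails. In userland terms the pattern is roughly this (a toy sketch with made-up names, not the kernel source):

#include <pthread.h>
#include <time.h>

/* Toy stand-in for the page owner's lock (the uobject's vmobjlock here). */
static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;

/* Roughly one clock tick, standing in for a one-tick kpause(). */
static void
nap_one_tick(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	nanosleep(&ts, NULL);
}

/*
 * The unloan pattern: we can't take the owner's lock in the normal
 * order, so try-lock and nap until it becomes free.  This is only safe
 * if nothing we already hold is needed to make the nap end -- which is
 * exactly what goes wrong below.
 */
static void
unloan_page(void)
{
	while (pthread_mutex_trylock(&object_lock) != 0)
		nap_one_tick();		/* the kernel kpause()s here */

	/* ... drop the loan ... */
	pthread_mutex_unlock(&object_lock);
}

int
main(void)
{
	unloan_page();
	return 0;
}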
So we are kpause()ing with softnet_lock held, and this is bad because the CPU's softint thread can run, call one of the nfs or tcp timer routines and sleep on the softnet_lock:

 19 3 0       4 d01585c0  slave      livelock
 46 3 6     204 cd5bd860  softclk/6  tstile
  5 3 0     204 cd5a7500  softclk/0  tstile

db{0}> tr/a cd5bd860
trace: pid 0 lid 46 at 0xcece0bc0
sleepq_block(0,0,c0647bf4,c06a4a9c,cd5a4f80,c0724000,ced28a78,dc,40,1000001) at netbsd:sleepq_block+0xeb
turnstile_block(ced28a60,1,cd5a4f80,c06a4a9c,cec6724c,cece4000,cece0c80,ced28a60,0,0) at netbsd:turnstile_block+0x261
mutex_vector_enter(cd5a4f80,cece4000,cece0ce0,0,c0349cc2,cd5bd860,7,cece4000,cece4000,10c) at netbsd:mutex_vector_enter+0x317
tcp_timer_rexmt(c573e408,cd5bd860,7,cece4000,cecd6074,cece4060,cece4868,cece5068,cece5868,c0135fb0) at netbsd:tcp_timer_rexmt+0x1f
callout_softclock(0,10,30,10,10,0,156320,d01cca6c,30,cece0010) at netbsd:callout_softclock+0x25f
softint_dispatch(cd5bb0c0,2,0,0,0,0,cece0d90,cece0b68,cece0bc0,0) at netbsd:softint_dispatch+0x9f

And now that our softclk threads are sleeping, waiting for the lock held by the nfsd in kpause, the kpause will never be woken up and the kernel locks up.

Now the question is why this doesn't happen with the February 2010 5.0_STABLE kernel; nfsd_use_loan is also set to 1 in that kernel. One change that may be relevant, because it defers more work to a callout that needs softnet_lock, is tcp_input.c 1.291.4.3. I guess this can happen with any TCP application using page loaning, not only nfsd.

Any idea on how to properly fix this? A workaround could be to use yield() in uvm_loan.c, because it would not require a clock tick to wake up. I'm not sure if it's possible to drop the socket lock before unloaning the pages. But I wonder if this could be a more general issue with callouts: maybe we should have one thread per callout, a la fast_softint, which would be used when a callout needs to sleep? (A stripped-down userland sketch of the deadlock is appended after my signature.)

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
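Appendix: the stripped-down userland sketch of the deadlock shape, in case it helps to see why the kpause() can never end. This is toy code with made-up names, not kernel source; the "softclock" thread stands in for callout_softclock() and has to get through a TCP timer callout (which needs softnet_lock) before it can deliver the wakeup the sleeper is waiting for. Built with cc -pthread it just hangs, like the machine does:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Toy stand-ins, not NetBSD code. */
static pthread_mutex_t softnet_lock = PTHREAD_MUTEX_INITIALIZER;

static pthread_mutex_t tick_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  tick_cv  = PTHREAD_COND_INITIALIZER;
static unsigned        tick_gen;

static void
msleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = ms * 1000000L };
	nanosleep(&ts, NULL);
}

/*
 * "softclock": runs the pending callouts in turn.  The first one is the
 * TCP retransmit timer, which needs softnet_lock; while it is blocked
 * there, the later callouts (including the one ending the kpause) never
 * run.
 */
static void *
softclock(void *arg)
{
	(void)arg;
	for (;;) {
		msleep_ms(10);				/* wait for "hardclock" */

		pthread_mutex_lock(&softnet_lock);	/* tcp_timer_rexmt(): blocks
							   once the nfsd below holds it */
		pthread_mutex_unlock(&softnet_lock);

		pthread_mutex_lock(&tick_mtx);		/* deliver the kpause wakeup */
		tick_gen++;
		pthread_cond_broadcast(&tick_cv);
		pthread_mutex_unlock(&tick_mtx);
	}
	return NULL;
}

/* "kpause": sleep until softclock delivers the next tick. */
static void
kpause_one_tick(void)
{
	pthread_mutex_lock(&tick_mtx);
	unsigned gen = tick_gen;
	while (tick_gen == gen)
		pthread_cond_wait(&tick_cv, &tick_mtx);
	pthread_mutex_unlock(&tick_mtx);
}

/* "nfsd": soclose() takes the socket lock, then uvm_unloanpage() kpause()s. */
int
main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, softclock, NULL);

	pthread_mutex_lock(&softnet_lock);	/* soclose() takes softnet_lock */
	kpause_one_tick();			/* never returns: the tick never arrives */
	pthread_mutex_unlock(&softnet_lock);

	printf("page unloaned\n");		/* never printed */
	return 0;
}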