> On 11 Dec 2016, at 22:33, Nick Hudson <sk...@netbsd.org> wrote:
>
> On 12/11/16 21:05, J. Hannken-Illjes wrote:
>>> On 11 Dec 2016, at 21:01, David Holland <dholland-t...@netbsd.org> wrote:
>>>
>>> On a low-memory machine Nick ran into the following deadlock:
>>>
>>>   (a) rename -> vrele on child -> inactive -> truncate -> getblk ->
>>>       no memory in buffer pool -> wait for syncer
>>>   (b) syncer waiting for locked parent vnode from the rename
<snip>
>> Where is the syncer waiting for the parent?
>
> db> bt/a ffffffff8ff28060
> pid 0.37 at 0x9800000410960000
> 0x9800000410961bb0: kernel_text+dc (0,0,0,0) ra ffffffff803ad484 sz 0
> 0x9800000410961bb0: mi_switch+1c4 (0,0,0,0) ra ffffffff803a9ef8 sz 96
> 0x9800000410961c10: sleepq_block+b0 (0,0,0,0) ra ffffffff803b8edc sz 64
> 0x9800000410961c50: turnstile_block+2e4 (0,0,0,0) ra ffffffff803a487c sz 96
> 0x9800000410961cb0: rw_enter+17c (0,0,0,0) ra ffffffff8044862c sz 112
> 0x9800000410961d20: genfs_lock+8c (0,0,0,0) ra ffffffff8043fd60 sz 48
> 0x9800000410961d50: VOP_LOCK+30 (ffffffff8049d4c8,2,0,0) ra ffffffff80436c8c sz 48
> 0x9800000410961d80: vn_lock+94 (ffffffff8049d4c8,2,0,0) ra ffffffff803367d8 sz 64
> 0x9800000410961dc0: ffs_sync+c8 (ffffffff8049d4c8,2,0,0) ra ffffffff80428f4c sz 112
> 0x9800000410961e30: sched_sync+1c4 (ffffffff8049d4c8,2,0,0) ra ffffffff80228dd0 sz 112
> 0x9800000410961ea0: mips64r2_lwp_trampoline+18 (ffffffff8049d4c8,2,0,0) ra 0 sz 32
>
>> Which file system?
>
> ffs

Looks like a bug introduced by myself.  Calling ffs_sync() from the
syncer (MNT_LAZY set) will write back modified inodes only, fsync is
handled by individual synclist entries.

Some time ago I unconditionally removed LK_NOWAIT from vn_lock().
Suppose we need this patch:

RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.341
diff -p -u -2 -r1.341 ffs_vfsops.c
--- ffs_vfsops.c	20 Oct 2016 19:31:32 -0000	1.341
+++ ffs_vfsops.c	12 Dec 2016 09:45:17 -0000
@@ -1918,5 +1918,6 @@ ffs_sync(struct mount *mp, int waitfor,
 	while ((vp = vfs_vnode_iterator_next(marker, ffs_sync_selector, &ctx)))
 	{
-		error = vn_lock(vp, LK_EXCLUSIVE);
+		error = vn_lock(vp, LK_EXCLUSIVE |
+		    (waitfor == MNT_LAZY ? LK_NOWAIT : 0));
 		if (error) {
 			vrele(vp);

Is it reproducible so you can test it?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)