In article <20150619083656.gt19...@homeworld.netbsd.org>,
Emmanuel Dreyfus <m...@netbsd.org> wrote:
>Hi
>
>I have encountered a bug with NetBSD NFS client. Despite a mount with
>-o intr,soft, we can hit situation where a process can remain hung in
>kernel because the NFS server is gone.
>
>This happens when the ioflush does its duty, with the following code path:
>sync_fsync / nfs_sync / VOP_FSYNC / nfs_fsync / nfs_flush / VOP_PUTPAGES
>
>VOP_PUTPAGES has flags = PGO_ALLPAGES|PGO_FREE. It then goes through
>genfs_putpages and genfs_do_putpages, and gets stuck in:
>
>	/* Wait for output to complete. */
>	if (!wasclean && !async && vp->v_numoutput != 0) {
>		while (vp->v_numoutput != 0)
>			cv_wait(&vp->v_cv, slock);
>	}
>
>This cv_wait() is timeout-less and uninterruptible. ioflush will
>sleep there forever, holding vnode lock. Any other process doing
>I/O on the filesystem will sleep in tstile waiting for the vnode
>lock with this path:
>sys_write / dofilewrite / vn_write / vn_lock / VOP_LOCK / rw_enter
Yes, but ioflush is not a user process... An interruptible mount means
that a user process can interrupt a syscall doing an NFS operation.
No other operating system I know of takes this to mean that you can
unmount the filesystem or make delayed writes abort and fail.

Having said that, yes it is a problem that you need to reboot because
an NFS server is gone, and we should make umount -f work properly in
that case. I don't think that we should introduce umount -l (like Linux)
unless there is a compelling reason to do so.

christos