Hi

I have encountered a bug in the NetBSD NFS client. Despite mounting with -o intr,soft, we can hit a situation where a process remains hung in the kernel because the NFS server is gone.
This happens when ioflush does its duty, through the following code path:
sync_fsync / nfs_sync / VOP_FSYNC / nfs_fsync / nfs_flush / VOP_PUTPAGES

VOP_PUTPAGES is called with flags = PGO_ALLPAGES|PGO_FREE. It then goes through genfs_putpages and genfs_do_putpages, and gets stuck in:

        /* Wait for output to complete. */
        if (!wasclean && !async && vp->v_numoutput != 0) {
                while (vp->v_numoutput != 0)
                        cv_wait(&vp->v_cv, slock);
        }

This cv_wait() has no timeout and is uninterruptible. ioflush will sleep there forever, holding the vnode lock. Any other process doing I/O on the filesystem will then sleep in tstile waiting for the vnode lock, through this path:
sys_write / dofilewrite / vn_write / vn_lock / VOP_LOCK / rw_enter

This is another timeout-less, uninterruptible wait, this time for the vnode lock, which means -o intr,soft are not honoured. If the NFS server does not come back, the only way out is reboot -n. Even umount -f -R will hang in tstile.

How can we fix it?

1) ioflush should not sleep forever awaiting I/O completion on an NFS mount that was mounted with -o soft. A PGO_SOFT flag could be added to VOP_PUTPAGES so that cv_timedwait() is used instead of cv_wait(), but how do we get the timeout? Should we introduce a VOP_PUTPAGES2 with an additional argument? Use a sane default? Get it from the filesystem through a new VFS_GETTIMEOUT method (or a more general VFS_GETMNTINFO, able to query various kinds of information)?

2) Honouring -o intr seems to require either introducing a real nfs_lock (currently it is genfs_lock) or changing genfs_lock. The goal is an interruptible sleep on vp->v_lock. How can this be achieved? We have no rw_(try)enter_sig; should we introduce it? Or should we loop, sleeping interruptibly and retrying at regular intervals? And how could the -o soft timeout be honoured here?

Last question: is there any hope of getting this fixed in netbsd-7, or has the VFS interface changed too much?

--
Emmanuel Dreyfus
m...@netbsd.org
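For illustration, here is a userspace analogue of options 1) and 2). It uses POSIX threads rather than kernel primitives: pthread_cond_timedwait() stands in for cv_timedwait(), and a pthread_rwlock_trywrlock() retry loop with a short nanosleep() stands in for an interruptible retry on vp->v_lock. The names struct fake_vnode, fake_vnode_init(), output_done(), io_completer(), wait_output_timed(), lock_interruptible() and the interrupted flag are all hypothetical. The timeout and poll values are arbitrary assumptions, not NFS defaults. This is a sketch of the waiting logic under those assumptions, not a kernel patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Hypothetical stand-in: slock ~ interlock, v_cv ~ vp->v_cv, v_lock ~ vp->v_lock. */
struct fake_vnode {
    pthread_mutex_t  slock;
    pthread_cond_t   v_cv;
    int              v_numoutput;   /* writes still in flight */
    pthread_rwlock_t v_lock;
};

/* Would be set from a signal handler in a real program (hypothetical). */
static volatile sig_atomic_t interrupted;

static void
fake_vnode_init(struct fake_vnode *vp, int numoutput)
{
    pthread_mutex_init(&vp->slock, NULL);
    pthread_cond_init(&vp->v_cv, NULL);
    pthread_rwlock_init(&vp->v_lock, NULL);
    vp->v_numoutput = numoutput;
}

/* I/O completion path: one write finished, wake waiters when none remain. */
static void
output_done(struct fake_vnode *vp)
{
    pthread_mutex_lock(&vp->slock);
    assert(vp->v_numoutput > 0);
    if (--vp->v_numoutput == 0)
        pthread_cond_broadcast(&vp->v_cv);
    pthread_mutex_unlock(&vp->slock);
}

/* Simulated server reply: completes one write after 100 ms. */
static void *
io_completer(void *arg)
{
    struct timespec delay = { 0, 100 * 1000000L };

    nanosleep(&delay, NULL);
    output_done(arg);
    return NULL;
}

/*
 * Option 1: the genfs_do_putpages() wait loop, but with a deadline as a
 * PGO_SOFT-style flag might request. Returns 0 once all output has
 * completed, or ETIMEDOUT if the deadline passed first.
 */
static int
wait_output_timed(struct fake_vnode *vp, int timeout_sec)
{
    struct timespec deadline;
    int error = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&vp->slock);
    while (vp->v_numoutput != 0 && error == 0)
        error = pthread_cond_timedwait(&vp->v_cv, &vp->slock, &deadline);
    if (vp->v_numoutput == 0)
        error = 0;
    pthread_mutex_unlock(&vp->slock);
    return error;
}

/*
 * Option 2: take the lock exclusively by polling, checking between tries
 * for an interruption (-o intr) and an overall timeout (-o soft).
 * poll_ms must be below 1000 because it is stored in tv_nsec.
 */
static int
lock_interruptible(struct fake_vnode *vp, int timeout_ms, int poll_ms)
{
    struct timespec nap = { 0, poll_ms * 1000000L };
    int waited = 0;

    while (pthread_rwlock_trywrlock(&vp->v_lock) != 0) {
        if (interrupted)
            return EINTR;
        if (waited >= timeout_ms)
            return ETIMEDOUT;
        nanosleep(&nap, NULL);
        waited += poll_ms;
    }
    return 0;
}
```

The polling loop avoids needing a signal-aware rwlock primitive, at the cost of up to poll_ms of extra latency per acquisition. A dedicated interruptible lock-enter primitive would avoid that latency, but it would have to be written first.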