Chuck Silvers <c...@chuq.com> wrote:

> we shouldn't need to change the genfs code to make "soft" work.
> if the underlying RPCs time out and all the retries are exhausted,
> the NFS code should report the error back to the genfs code by doing
> the usual B_ERROR/b_error thing with the buffer, and the genfs code
> should handle that by unlocking pages, etc, just like it would for
> a failed write to a scsi or ata device, and eventually that should
> percolate back up the stack until the cv_wait() returns.
> does this not work currently?

It almost works with a few fixes. I tried without touching genfs beyond
changing cv_wait() into cv_timedwait() with a debug message, like this:

         /* Wait for output to complete. */
         if (!wasclean && !async && vp->v_numoutput != 0) {
-                while (vp->v_numoutput != 0)
-                        cv_wait(&vp->v_cv, slock);
+                while (vp->v_numoutput != 0) {
+                        int cv_error;
+
+                        cv_error = cv_timedwait(&vp->v_cv, slock, 2 * hz);
+                        if (cv_error) {
+                                printf("%s: failed to complete I/O on %s, "
+                                    "vp = %p, numoutput = %d, error = %d\n",
+                                    l->l_name? l->l_name : l->l_proc->p_comm,
+                                    vp->v_mount->mnt_stat.f_mntonname,
+                                    vp, vp->v_numoutput, cv_error);
+                        }
+
+                }
         }
         onworklst = (vp->v_iflag & VI_ONWORKLST) != 0;
         mutex_exit(slock);

I also have changes in the NFS code so that RPCs on soft mounts can
time out and set bp->b_error.

What happens here is that we loop in the while (vp->v_numoutput != 0)
block with vp->v_numoutput draining down to 2, and then we loop forever
with this value. I note we have this in genfs_do_io(), and I suspect
this is the same 2 value:

        if (iowrite) {
                mutex_enter(vp->v_interlock);
                vp->v_numoutput += 2;
                mutex_exit(vp->v_interlock);
        }
        mbp = ge

Why the vp->v_numoutput += 2?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org