FWIW, I have had a problem with my server getting stuck in "tstile". I could not reproduce the problem easily, but I saw it in production often enough that it was a headache. The Intel port (as opposed to PPC) seems not to have the problem.
If there is no timeout on this loop, and it theoretically only has a problem on HW errors, I have doubts. The machine with the hangs does not have any other symptoms of HW errors. HOWEVER, I have a persistent suspicion that the PPC port drops interrupts on occasion. Just sayin.

If this hang happens, I think a panic is far better than a hang.

What I would see is the machine lock up hard, with zillions of processes "stuck" in tstile, and no new procs could start. If I caught this early, I could get a couple of ps outputs done. Otherwise, I could get into the kernel debugger - sometimes.

Just my opinion.

-dgl-

On Jun 20, 2015, at 3:27 PM, Christos Zoulas <chris...@zoulas.com> wrote:

> On Jun 20, 10:09pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
> -- Subject: Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr
>
> | Christos Zoulas <chris...@astron.com> wrote:
> |
> | > Ok, what is it supposed to do? Does it fail? Give up? Get interrupted
> | > and keep looping?
> |
> | The process stuck in tstile waiting for vnode lock is a consequence of
> | the initial problem: ioflush stuck in cv_wait().
> |
> | What about this: we introduce a mnt_timeo in struct mount, and use
> | cv_timedwait() instead of cv_wait() in genfs_do_putpages(). If timeout
> | expires, we get a failure: the page was not put to storage.
> |
> | mnt_timeo should have a sane default (which one) for all filesystems,
> | not only NFS: that way we fix process stuck in tstile because of hardware
> | failure (I already saw that). For NFS we use the NFS timeout.
> |
> | That way ioflush never holds a vnode lock forever, and umount -f should
> | work.
> |
>
> This is not that simple. There is at least one more place where
> it does while (vp->v_numoutput != 0) cv_wait().. And I am not
> sure what happens if you make VOP_PUTPAGES timeout.
>
> christos