On Tue, 31 May 2011, Rick Macklem wrote:

It is only an optimization. Any number of syncs are useless for
actually syncing the system, since sync(2) only does an async sync (it
returns without waiting for most writes to complete). As you pointed
out later in this thread, unmount(2) does a sync that works -- a sync
sync -- before doing the unmount proper. It does this irrespective of
the force flag:

% if (((mp->mnt_flag & MNT_RDONLY) ||
% (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
% error = VFS_UNMOUNT(mp, flags);

The force flag doesn't mean that i/o's are forced to complete. It
only means that open files are forced to be closed (and a few related
things). This can be seen here. We wait (potentially forever) for
any existing writes to complete. Only then do we look at the force
flag and bale out if the sync failed and the force flag is _not_ set.
Most i/o problems will cause hangs in this sync, and the force flag
won't help. But if we somehow get past this sync, and have i/o

The sync should actually return an error (after syncing everything
that doesn't hang).  Perhaps it actually does.  Then we have got past
the sync.  But for nfs, it will tend to hang just detecting the i/o
errors (need large timeouts which are not very different from hangs).

problems, then honoring the force flag and continuing gives even more
fragility, since we have almost passed the point of being able to give
up after an error (this point is typically very early in
VFS_UNMOUNT()).

Well, in vgonel(), which is called from vflush() and seems to be what
gets rid of the vnode even when open for forced dismounts, there is:

   /*
    * Clear out any buffers associated with the vnode.
    * If the flush fails, just toss the buffers.
    */
   - followed shortly by
   if (vinvalbuf(vp, V_SAVE, 0, 0) != 0)
            vinvalbuf(vp, 0, 0, 0);

It seems to me that this code is willing to give up when it can't write
dirty buffers, but???

That looks like a hack to me (collateral from vfs_bio.c 1.196).  In
4.4BSD-Lite, the corresponding code is in vclean() and it is just:

%       /*
%        * Clean out any buffers associated with the vnode.
%        */
%       if (flags & DOCLOSE)
%               vinvalbuf(vp, V_SAVE, NOCRED, p, 0, 0);

In 4.4BSD, the buffers would have been tossed at the bwrite() level
before we got here.  Now bwrite() is more careful, and we are a bit
more careful here -- we actually check if vinvalbuf() succeeded and
clean up if it didn't.  There must be a place that tosses buffers
with unrecoverable write errors, and forced unmounts is a good place.
But vgonel() is a bad place, since it has a much wider scope than
forced unmounts.  I think vgonel() is normally called when a vnode
is recycled.  So we can have a vnode unhappily holding buffers with
unrecoverable write errors for several minutes, with syncs trying
to flush them every ~30 seconds, but then when the vnode is recycled
for some reason the above forcibly tosses its buffers with no further
error messages.  Similarly for recoverable write errors (maybe an
nfs server that is down for longer than the recycling time).

To me, a forced dismount of an NFS mount that just wedges on an
unresponsive server isn't very useful. The only case I can think

I agree.  My point is that no one tried hard to make forced unmounts
very forceful, so making it work is even harder than might first appear.
The problem is related to making removed removable devices work properly.

of where a forced dismount that isn't willing to throw away dirty
buffers might be useful would be when an app. (userland process)
is for some reason unkillable and is sitting with an open NFS file.
Is this a problem in practice? (I do know that a mount stuck on an
unresponsive server is a real problem. There have been several PRs,
the most recent being kern/157365.)

The server can easily crash while the client has dirty buffers.

I brought up the concept of a "hard forced dismount" at the developer
summit a year ago, but no one seemed to see a difference between that
and what they believed a forced dismount already did. After that, I
thought about it a little and couldn't see much use for a "soft forced
dismount that fails instead of throwing away dirty buffers, but does
close down open files".

It's useful to be able to forcibly close all open files, even ones open
for writing or with dirty buffers (there can also be dirty buffers for
closed files).  I think umount -f flag gives this, and I wouldn't want
to to also discard dirty buffers on devices where the buffers are
hopefully-transiently unwritable.  Discarding of unwritable buffers
really has little to do with unmount -- if there is 1 with an i/o
error that I know is unrecoverable, then I would like to discard it
immediately instead of letting the system sit there and try to flush
it quite often.

- For example, suppose a userland process is writing a file through
 the stdio library:
 A - fopen()s a file for writing
 B - does fwrite()s of small amounts of data
 C - does an fflush()

If the open is closed by a forced dismount before C, the written data
will be lost, since it isn't even in the kernel yet. If the file is on
a responsive NFS server, it will be written to the server shortly after C.
So, there seems to be a narrow window around C where the data will be lost
for a forced dismount that throws away dirty buffers vs one that doesn't
"but only if the server is responsive".

In other words, "Is there much difference between ripping the open files
out from under a app. before it does a write syscall vs throwing the data
away when it's in the kernel buffer but not yet written to an NFS server?"

Not much, sinc a careful application should do an fsync() as well as the
above and not depend on the data reaching the file system until fsync()
returns successfully, which it must not do if the kernel discarded the
data.  [But the tossing in vgonel() seems to break this: most write()s
are async so they can't return an error.  Errors live in the buffers.
But when buffers are tossed, don't we lose all trace of the errors?
Then a subsequent fsync won't see any errors and will return success.]

Anyhow, the way I now have the NFS clients, they do a "hard forced dismount"
where they can throw away dirty buffers not yet written to the server.
Recent patches to nfs_sync() make it return without attempting writes for
the forced dismount case and the vgonel() { see above code snippet } throws
away the data when the first vinvalbuf(.., V_SAVE,.) fails.

Did you have to change VFS_SYNC() to pass the forced-dismount flag?  I
don't like that, but see the problem: VFS_SYNC(XXX_WAIT) should normally
just try to sync everything, but return an error if this is impossible,
but for an nfs server it would take too long to determine that it is
impossible.

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to