On 10/6/06, Erik Trimble <[EMAIL PROTECTED]> wrote:
David Dyer-Bennet wrote:
> On 10/6/06, Nicolas Williams <[EMAIL PROTECTED]> wrote:
>
>> > >Maybe Erik would find it confusing.  I know I would find it
>> > >_annoying_.
>> >
>> > Then leave it set to 1 version
>>
>> Per-directory?  Per-filesystem?
>
> Whatever.  What's the actual issue here?
>
> I don't recall that on TOPS-20 it was possible to not version.  What
> you could do is set your logout.cmd file to purge your space down to
> one copy when you logged out.
But see, that assumes you have a logout-type functionality to use. Which
indeed is possible for command-line usage, but then only in a very
limited way.   During a typical session, I access almost 20 NFS-mounted
directories. And anyone using autofs/automount trees gets even more.
You're saying that my logout script has to know about all of them to
keep things clean?  That's unrealistic.  And that still doesn't solve
the problem of people who use SAMBA or NFS from machines which don't
have an interactive shell logout system (e.g. Windows).

Seems entirely realistic to me that your logout script would know
about the things you routinely use.  People who don't log into any
system are more of a problem, though.  Various things come to mind,
like having a default number of files (so it doesn't expand without
limits), and maybe a regular cron job; but I've never worked in an
environment doing versioning for non-login users over the network, so
they're all theory, no idea how they'd work in practice.
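
To make the logout-purge and cron ideas concrete, here is a minimal
sketch in Python. It is not anything ZFS provides: it assumes old
versions are stored as ordinary directory entries named
"<file>;<version>", and the function name purge_old_versions and its
keep parameter are made up for illustration. With keep=1 it behaves
like the TOPS-20 logout purge; a larger keep run from cron would act as
the "default number of files" cap.

```python
import os
import re
from collections import defaultdict

# Assumed naming convention (hypothetical, not a ZFS feature): "<file>;<version>"
VERSIONED = re.compile(r"(?P<base>.+);(?P<ver>[0-9]+)")


def purge_old_versions(directory, keep=1):
    """Delete all but the `keep` highest-numbered versions of each file.

    Returns the list of entry names that were removed.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    # Group version numbers by base name; unversioned entries are ignored.
    versions = defaultdict(list)
    for entry in os.listdir(directory):
        m = VERSIONED.fullmatch(entry)
        if m:
            versions[m.group("base")].append(int(m.group("ver")))

    removed = []
    for base, nums in sorted(versions.items()):
        # Everything except the last `keep` numbers is stale.
        for n in sorted(nums)[:-keep]:
            name = f"{base};{n}"
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed
```

A logout script or cron entry would call this once per directory it
knows about; whether that list of directories can be kept current is
exactly the open question above.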


> This worked fine for the users I knew; even on a system that didn't
> have as much as a gigabyte of disk storage total to support a few
> dozen software engineers.
>
The problem is we are comparing apples to oranges in user bases here.
TOPS-20 systems had a couple of dozen users (or, at most, a few
hundred).  VMS only slightly more.  UNIX/POSIX systems have tens of
thousands.  Plus, the number of files being created under typical modern
systems is at least two (and probably three or four) orders of magnitude
greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000
under my home directory.  And I don't have anything significant in my
/home (no source code, no build/test trees, just misc business stuff).
What is manageable with a few files quickly becomes unwieldy with more
than a few dozen.

I have to ask again -- is this theory?  Or have you actually worked on
a versioning filesystem?  And specifically on TOPS-20?  (I remember,
vaguely, that people found VMS versioning MUCH less comfortable to
work with than TOPS-20, and I don't know at this distance if that was
just because it was different, or because of subtle UI differences).

I don't think the number of files under /usr is relevant; how often do
you edit them by hand?  I'd expect an installation procedure to clean
up old versions when it was done installing new software; but if not a
simple purge would settle the matter.

I don't recall my directories having many fewer files then than now.
I have more *directories* now, but the number of files in a directory
is set by human issues and by development process issues, not by disk
space available.

This is what Nico and I are talking about:  if you turn on file
versioning automatically (even for just a directory, and not a whole
filesystem), the number of files being created explodes geometrically.

I don't see it; new versions are created *when you do something* to a
file; not from the file just sitting there.  And the number of files I
poke in a day, again, isn't controlled much by the disk space
available, it's controlled by *my time*, and so has stayed more
constant over the years.

>> > The above should be simple to do however -- a program does an open of
>> > a file name "foo.bar".  ZFS / the file system routine would use the
>> > most recent version by default if no version info is given.
>>
>> How can version information be given without changing the APIs or
>> putting the version number/string into the file name?
>
> The version number is part of the file name in all the examples I know
> about.  I'd find it useless without that; it has to be a real part of
> the filesystem, usable by everybody, not a special addon accessible
> only with one or two dedicated applications.
>
>> Putting the version number/string into the file name is hard for me to
>> accept.  It's what would lead to polluting my directories.
>
> Set your ls default to not show versions.  Isn't the problem then
> solved?  Maybe add that option to the GUI filesystem explorer as well.
>
But this requires modifying all the relevant apps, which is the same
amount of work as modifying them to use a new FV API.  It's not
transparent to the end-user.

I think the relevant apps are very different in the two cases.  File
listing tools are much rarer than file using tools, and in my case you
only need to modify the file listing tools.  In your case, you have to
modify every single file using tool.
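
As a rough illustration of how small the change to a listing tool
could be, here is a sketch in Python. It assumes the
"<file>;<version>" naming discussed in this thread; the function name
visible_names and the show_versions flag are invented for the example
and do not correspond to any real ls option.

```python
import re

# Assumed naming convention (hypothetical): "<file>;<version>"
VERSIONED = re.compile(r"(?P<base>.+);(?P<ver>[0-9]+)")


def visible_names(entries, show_versions=False):
    """Return the names a version-aware listing tool might print.

    By default every run of "name;1", "name;2", ... collapses to a single
    "name"; with show_versions=True the raw entries are listed instead.
    Sorting is plain lexicographic, which is enough for a sketch.
    """
    if show_versions:
        return sorted(entries)

    names = set()
    for entry in entries:
        m = VERSIONED.fullmatch(entry)
        names.add(m.group("base") if m else entry)
    return sorted(names)
```

Programs that open files would need no change under this scheme; only
the handful of tools that enumerate directories would apply a filter
like this.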

> In practice, it never was a problem that I noticed, or that other
> people noticed.  And remember that this was on slower systems with
> smaller screens and often rather slower screen update.
>
> Do you not like the idea based on theory, or did you actually use
> TOPS-20 for a while and find the versioning troublesome?
>
Putting the file version number as part of the file name breaks things.
Apps unaware of the special significance of this format will tend to
write similar names, which can screw everything royally.

Example:

Say we use <file>;<version>

In emacs, I edit FOO;2

it will write out a temp file "FOO;2~".  So, how does the FS deal with
this the next time they need to create a new version?

Whatever.  None of the choices are a disaster.   None of them "break"
anything.  I essentially never have to look at these, any version of
them, so it doesn't matter very much what their names are.

Possibly some clever definitions of how things are handled could make
the results cleaner, and that's worth looking at, but the worst
results I can imagine from this scenario are unimportant, they don't
hurt anything.
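
One possible "clever definition", sketched in Python under the
<file>;<version> convention from the example above: treat a name as
versioned only if it ends in ";" followed by digits and nothing else.
The helper name split_version is hypothetical, and this is one rule
among several a filesystem could pick, not a description of any
existing implementation.

```python
import re

# Strict rule (an assumption for this sketch): a version suffix is ";"
# followed by one or more digits, and it must be the very end of the name.
VERSIONED = re.compile(r"(?P<base>.+);(?P<ver>[0-9]+)")


def split_version(name):
    """Return (base, version) for "base;N", or (name, None) otherwise."""
    m = VERSIONED.fullmatch(name)
    if m:
        return m.group("base"), int(m.group("ver"))
    return name, None


# The editor backup name does not end in digits, so under this rule it is
# just an ordinary file with an odd name rather than a version of FOO.
print(split_version("FOO;2"))
print(split_version("FOO;2~"))
```

Under that rule the backup file would simply get versions of its own
if it were rewritten; ugly, perhaps, but nothing is lost.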

The problem lies in that under VMS, the ';' was a special character, and
unusable in normal naming. I suspect a similar situation exists under
TOPS-20.  No such luck in a POSIX filesystem - all printable (and many
unprintable) characters are valid for use in filenames. So you _CAN'T_
use them to delineate File Versioning, without risking blowing the
entire scheme when some random app decides to either use your FV marker
for its own needs, or something similar to the emacs case above.

This is theory again.  In practice, there aren't such schemes in use
anywhere I can find.  If there are, yes, some file-versioning schemes
would break them, and those apps would have to be updated.

A theoretically clean approach is desirable, but an approach that
actually works is more important.  An approach that requires programs
to be updated before they can use file versioning doesn't, by my
standards, "work"; I wouldn't be able to use it with the files and
applications it's valuable to me for any time soon.

When you talk about a new API for versioning -- how do you envision
information being conveyed from the command lines of programs to this
new API?  Isn't it likely that it would end up becoming a part of file
name syntax, and changing the rules about allowable characters in
filenames?  And in that case, you can make the whole change in the
"open" and "link" calls, and get the same end effect.
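
Here is a sketch, in Python, of the name resolution an "open" call
would have to do if versions live in the file name. It models versions
as ordinary directory entries named "<file>;<version>" purely for
illustration; the function name resolve is hypothetical and nothing
here is an actual ZFS or POSIX interface.

```python
import os
import re

# Assumed naming convention (hypothetical): "<file>;<version>"
VERSIONED = re.compile(r"(?P<base>.+);(?P<ver>[0-9]+)")


def resolve(directory, name):
    """Map a requested name to the stored entry that should be opened.

    An explicit "base;N" is returned unchanged.  A bare name resolves to
    its highest-numbered version, or None if no version exists.
    """
    if VERSIONED.fullmatch(name):
        return name

    best = None
    for entry in os.listdir(directory):
        m = VERSIONED.fullmatch(entry)
        if m and m.group("base") == name:
            ver = int(m.group("ver"))
            if best is None or ver > best:
                best = ver
    return None if best is None else f"{name};{best}"
```

An unmodified program that opens "foo.bar" gets the newest version,
and one that is handed "foo.bar;3" on its command line gets exactly
that version, without the program knowing versions exist.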

>> > one UI is the command line shell
>>
>> Indeed!  And command-line tools, like ls(1), find(1), etc...
>>
>> What I'm saying is that I'd like to be able to keep multiple versions of
>> my files without "echo *" or "ls" showing them to me by default.
>
> And I find that completely unacceptable; useless.  The whole point of
> putting versioning in the filesystem is that that makes it accessible
> to all programs.
>
But, because of the explosion in the number of files, you CAN'T
automatically show all versions. Users will NEVER accept this. The only
clean way to do this is to show file versions only upon request. Not by
default.

Is this theory, or do you have some experience to support it?  You say
"can't"; I'm not at all worried about it, myself.  I've worked in
these environments, and liked it very much.  I've watched new people
get introduced to them.  People like this when they see it
well-implemented.

I don't accept your assertion that directories people edit files in
have more files in them today than they used to, in general.  I also
don't accept the assertion that the number of extra versions scales
with the number of files in the directory -- it scales with the number
of files you re-write in the directory, which is limited more by human
working speed and time in the day, not by number of files there.

>> > >What if an application deals in multiple files?
>> >
>> > so?
>>
>> So, file versions aren't useful unless the application explicitly
>> tells the OS when to make them.
>
> File versions are created when a file is created.  In the scenario
> where, today, an existing file would be overwritten (deleted), instead
> the old file is kept and the new file is given the version number +1
> of the old file.
>
>> Similarly with applications that keep files open but keep writing
>> transactions in ways that the OS can't isolate without input from the
>> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
>> fsync(2)s would result in no useful versioning.
>
> None of those are candidates for file versioning, and a darned good
> thing, too.

Honestly, as far as file versioning goes, the time to make a new version
is when calling open() with the appropriate arguments to allow for
append or modification. You obviously don't want to create a new version
if you are only opening a file for read-only access, and changing
version on fsync() is ludicrous, and on close() doesn't differentiate
between a file which has been modified or not.

Yes, versioning is a file-create feature.
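
A minimal Python sketch of that create-time rule, again modelling
versions as directory entries named "<file>;<version>": where a save
would overwrite an existing file, it writes version N+1 instead. The
function name save_new_version is invented for this example; a real
filesystem would do this inside the create path rather than in user
code.

```python
import os
import re

# Assumed naming convention (hypothetical): "<file>;<version>"
VERSIONED = re.compile(r"(?P<base>.+);(?P<ver>[0-9]+)")


def save_new_version(directory, name, data):
    """Write `data` as the next version of `name`; never overwrite.

    Returns the entry name that was created.
    """
    highest = 0
    for entry in os.listdir(directory):
        m = VERSIONED.fullmatch(entry)
        if m and m.group("base") == name:
            highest = max(highest, int(m.group("ver")))

    created = f"{name};{highest + 1}"
    # Mode "x" refuses to open an existing file, so an earlier version
    # cannot be clobbered even if two saves race for the same number.
    with open(os.path.join(directory, created), "x") as f:
        f.write(data)
    return created
```

Reading a file, or writing through a descriptor that is already open,
creates nothing here; only an explicit save does.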

Given this, we're back into the problem FV is supposed to solve.   It is
entirely possible for an editor to keep open a file for a long time,
periodically writing out your changes without issuing a new open().

You describe this as a problem, but *I* see it as the exact thing that
makes file versioning useful.  It DOESN'T save random magically chosen
moments; it saves exactly all the versions that *you*, the user, saved
at some point of the editing session.

Word with auto-save turned off is a prime example.   Given this, you've
only created a new version when you first load the document, and all
your intermediary changes are lost, since it only saves the document on
close().

You're forgetting that the user, unless he's stupid, will save
regularly during the editing session.

Thus, in order to get benefits from FV, your editor must
issue periodic close() and open() commands on the same file, as you
edit, all without your intervention.  Exactly how many editors do this?
I have no idea.  So, the only way to enable FV is to require the user to
periodically push the "Save" button. How is that any different from
the current situation?

It is completely and utterly different from the current situation.  In
the current situation, when I type the "save" command *I am deleting a
previous version*.  That's dangerous, because people don't think of it
as performing a destructive operation, and hence don't give it the
care and consideration they give to an explicit "rm".  And that's
precisely what file versioning fixes; saving a file is no longer a
destructive operation.
--
David Dyer-Bennet, <mailto:[EMAIL PROTECTED]>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
