On Dec 27, 2017, at 11:14 AM, Simon Slavin <[email protected]> wrote:
>
> On 27 Dec 2017, at 4:05pm, Warren Young <[email protected]> wrote:
>
>> DVCSes...by their very nature...want to clone the entire history of the
>> whole project to every machine, then make a second copy of the tip of each
>> working branch
>
> Apple recently moved to APFS, a file system which supports file and folder
> cloning. If you copy a file or folder it doesn’t duplicate the data, it just
> creates a pointer that points to the existing copy. However, if you then
> change one of the copies (e.g. change one byte of a huge file) it makes a new
> copy (of the affected sectors) at that point, so that only that one copy of
> the file has changed.
In addition to da Silva’s point about needing to use the OS-specific API to do
this,[1] we couldn’t easily use it in Fossil anyway, for multiple reasons:
1. One of the duplicates is in the repository clone, which is
delta-compressed,[2] and thus is not in the exact same form as when checked
out, so you’ll still have at least two near-duplicate copies. Only a version
control system that really does keep pure duplicates hidden away could get you
around this problem. Subversion does so with the pristine copies in its
working-copy metadata, but RCS and CVS store deltas in their ,v files, too.
2. Even if delta compression were disabled on purpose for some files, the
repository copy is stuck in Fossil’s SQLite database file. It is not an
independent file that the filesystem could track anyway.
3. In the Fossil model, checkouts are independent. (And this is one of the
best things about Fossil relative to Git.) Although Fossil keeps track of
which repositories you’ve got open on your system so that “fossil all” can
work, it currently makes no logical or filesystem ties between these
independent checkouts, which it would need to do to make use of these
OS-specific file cloning APIs you talk about. Example:
$ mkdir ../x ; cd ../x ; fossil open ~/museum/x.fossil trunk
$ mkdir ../y ; cd ../y ; fossil open ~/museum/x.fossil y-branch
Some of the files in x and y are almost certainly identical, so they could be
cloned from one checkout to the other at the filesystem level with one of these
OS-specific API calls, but it would require a lot of bookkeeping on Fossil’s
part to pull this off. And at the end of the day, you’d only be getting the
feature on macOS 10.13+ boxes.
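To give a feel for the simplest part of that bookkeeping, here is a minimal Python sketch that finds files with identical contents in two checkout directories. It is not how Fossil works; the function names (file_digest, index_checkout, clone_candidates) are hypothetical, and it assumes hashing every file on each run is acceptable, which a real implementation would want to avoid.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_checkout(root):
    """Map content digest -> list of file paths relative to root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index


def clone_candidates(checkout_a, checkout_b):
    """Return sorted (path_in_a, path_in_b) pairs with identical contents."""
    index_a = index_checkout(checkout_a)
    index_b = index_checkout(checkout_b)
    pairs = []
    for digest, paths_a in index_a.items():
        for path_b in index_b.get(digest, []):
            for path_a in paths_a:
                pairs.append((path_a, path_b))
    return sorted(pairs)
```

Calling clone_candidates("../x", "../y") would list the pairs that could share storage. The hard part is everything this sketch leaves out: keeping the list current as files change, and actually issuing the OS-specific clone call for each pair.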
Git isn’t much better on these points:
1. Git also uses delta compression in the repo.
2. Git’s “pile of files” repo format still keeps most repo data in “packfiles”,
which are not loose independent copies of the checked-out files.
3. This is the only place where Git’s design helps. Its git-worktree feature
is a cheesy hack compared to the way Fossil separates the checkout and
repository clone,[3] but that design inherently keeps track of which working
trees are linked to which repository, so the bookkeeping of figuring out which
files to call clonefile(2) or similar on would be easier. It’s a high cost to
pay to save some disk space, though.
[1]: https://developer.apple.com/library/content/samplecode/APFSCloning/
[2]: https://fossil-scm.org/index.html/doc/trunk/www/delta_encoder_algorithm.wiki
[3]: https://www.mail-archive.com/[email protected]/msg25686.html
> I understand that ZFS does this too, though I’ve never used ZFS.
I’ve used ZFS for years, so I can tell you that in almost every way, ZFS is
greatly superior to APFS. One of the few ways in which APFS is superior is
this clonefile(2) syscall. There is no equivalent under ZFS: cloning is done
at the filesystem level, not per-file.
Apple’s model is traditional “applications,” and they want “File > Save As” to
make clones where possible, rather than duplicate most of the bytes on disk.
ZFS’s model is snapshots and clones of entire filesystems. A single file can
then be modified and only the updated blocks are tracked separately from the
blocks that all clones share, but I don’t believe you can simply tell the
filesystem to clone a file under a new name, short of link(2).
And no, we can’t just use link(2) for this, because the two names refer to
separate sets of block data on disk only when the application writes a new copy
of the file rather than rewriting the data in place. So, if you edit a source
file with a text editor that rewrites in place, all linked versions get changed,
which is almost certainly not what you want when you’re using it to save space
between independent VCS repository checkouts. ZFS- and APFS-style cloning have
different semantics, more appropriate to this proposed usage.
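The link(2) behavior described above is easy to see for yourself. This is a small Python sketch; the function name and the file names a.txt and b.txt are made up for the demonstration, and it assumes the working directory is on a filesystem that supports hard links.

```python
import os


def demo_hard_link_semantics(workdir):
    """Show how in-place rewrites and rename-style saves treat hard links."""
    original = os.path.join(workdir, "a.txt")
    linked = os.path.join(workdir, "b.txt")
    with open(original, "w") as f:
        f.write("version 1\n")
    os.link(original, linked)  # link(2): a second name for the same inode

    # Editor style 1: rewrite in place. Both names see the change.
    with open(linked, "w") as f:
        f.write("version 2\n")
    with open(original) as f:
        after_in_place = f.read()

    # Editor style 2: write a new file, then rename it over the old name.
    # This silently breaks the link; the other name keeps the old data.
    tmp = linked + ".tmp"
    with open(tmp, "w") as f:
        f.write("version 3\n")
    os.replace(tmp, linked)
    with open(original) as f:
        after_replace = f.read()

    still_linked = os.path.samefile(original, linked)
    return after_in_place, after_replace, still_linked
```

The first returned value shows the in-place edit leaking through to the other name, and the last shows that the two names no longer share an inode after the rename-style save. Which of the two you get depends entirely on the editor, which is why hard links are a poor fit here.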
Linux got something similar in kernel 4.5, ioctl(FIDEDUPERANGE),[4] which
incidentally shows the problem we face here: without a standard, applications
have to be coded for each OS specifically. I don’t know if ZFS-on-Linux
understands this ioctl yet; last I heard, the answer was, “no,” but that might
have changed. (This ioctl was originally a btrfs-specific feature, but was
generalized for other filesystems in Linux kernel 4.5.)
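To illustrate the per-OS coding problem, here is a rough Python sketch of a "clone if possible, else copy" helper. Treat it as a sketch under assumptions, not a tested portable implementation: on macOS it calls clonefile(2) through ctypes, on Linux it uses FICLONE, a whole-file sibling of the FIDEDUPERANGE ioctl mentioned above, and everywhere else it just copies. The helper names are hypothetical, and the FICLONE value is the one I believe <linux/fs.h> defines.

```python
import ctypes
import ctypes.util
import os
import shutil
import sys

FICLONE = 0x40049409  # assumed value of _IOW(0x94, 9, int) from <linux/fs.h>


def _clone_macos(src, dst):
    """Clone via clonefile(2); dst must not already exist."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # int clonefile(const char *src, const char *dst, uint32_t flags);
    libc.clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                               ctypes.c_uint32]
    libc.clonefile.restype = ctypes.c_int
    if libc.clonefile(os.fsencode(src), os.fsencode(dst), 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _clone_linux(src, dst):
    """Clone via ioctl(FICLONE); needs a filesystem that supports reflinks."""
    import fcntl  # imported here because the module is Unix-only
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())


def clone_or_copy(src, dst):
    """Try a copy-on-write clone, falling back to an ordinary copy.

    Returns "clone" or "copy" to say which path was taken.
    """
    try:
        if sys.platform == "darwin":
            _clone_macos(src, dst)
            return "clone"
        if sys.platform.startswith("linux"):
            _clone_linux(src, dst)
            return "clone"
    except (OSError, AttributeError):
        pass  # unsupported filesystem, cross-device, older OS, and so on
    shutil.copyfile(src, dst)
    return "copy"
```

Either way the destination ends up with the same bytes as the source, and later writes to one do not affect the other. Note that two of the three branches are OS-specific, and that the Linux branch only pays off on filesystems that implement the ioctl.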
I seem to recall that FreeBSD was talking about adding a similar syscall, but
then you probably still wouldn’t get it on Solaris, OpenZFS-on-Linux, O3X… Or,
you’d get it, but then years late, as we see with built-in crypto.
And even if one were to add such a feature to Fossil using the macOS 10.13+,
Linux 4.5+, and FreeBSD.next syscalls for this, it would probably still cover
only about 2% of the client systems Fossil is currently used on.
[4]: http://man7.org/linux/man-pages/man2/ioctl_fideduperange.2.html
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users