On Dec 27, 2017, at 11:14 AM, Simon Slavin <slav...@bigfraud.org> wrote:
> 
> On 27 Dec 2017, at 4:05pm, Warren Young <war...@etr-usa.com> wrote:
> 
>> DVCSes...by their very nature...want to clone the entire history of the 
>> whole project to every machine, then make a second copy of the tip of each 
>> working branch
> 
> Apple recently moved to APFS, a file system which supports file and folder 
> cloning.  If you copy a file or folder it doesn’t duplicate the data, it just 
> creates a pointer that points to the existing copy.  However, if you then 
> change one of the copies (e.g. change one byte of a huge file) it makes a new 
> copy (of the affected sectors) at that point, so that only that one copy of 
> the file has changed.

In addition to da Silva’s point about needing to use the OS-specific API to do 
this,[1] we couldn’t easily use it in Fossil anyway, for multiple reasons:

1. One of the duplicates is in the repository clone, which is 
delta-compressed,[2] and thus is not in the exact same form as when checked 
out, so you’ll still have at least two near-duplicate copies.  Only a more 
primitive version control system that really does keep pure duplicates hidden 
away, like RCS, CVS, or *maybe* Subversion, could get you around this problem.

2. Even if delta compression were disabled on purpose for some files, the 
repository copy is stuck in Fossil’s SQLite database file.  It is not an 
independent file that the filesystem could track anyway.

3. In the Fossil model, checkouts are independent.  (And this is one of the 
best things about Fossil relative to Git.)  Although Fossil keeps track of 
which repositories you’ve got open on your system so that “fossil all” can 
work, it currently makes no logical or filesystem ties between these 
independent checkouts, which it would need to do to make use of these 
OS-specific file cloning APIs you talk about.  Example:

    $ mkdir ../x ; cd ../x ; fossil open ~/museum/x.fossil trunk
    $ mkdir ../y ; cd ../y ; fossil open ~/museum/x.fossil y-branch

It is almost certainly the case that some of the files in x and y are identical 
so that those files could be cloned from those in the other at the filesystem 
level by making one of these OS-specific API calls, but it would require a lot 
of bookkeeping on Fossil’s part to pull this off.  And at the end of the day, 
you’d only be getting the feature on macOS 10.13+ boxes.
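
To give a feel for the simplest part of that bookkeeping, here is a minimal 
Python sketch that finds files with identical content at the same relative 
path in two checkout directories.  The function names are hypothetical and 
nothing here is Fossil code; a real implementation would also have to notice 
when either checkout changes and decide which copy is the clone source.

```python
import hashlib
import os


def file_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks; only real file content matters here
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def clone_candidates(checkout_a, checkout_b):
    """Return sorted relative paths whose content is identical in both trees."""
    a = file_digests(checkout_a)
    b = file_digests(checkout_b)
    return sorted(p for p in a.keys() & b.keys() if a[p] == b[p])
```

Calling clone_candidates("../x", "../y") would list the files that one 
checkout could in principle share with the other.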

Git isn’t much better on these points:

1. Git also uses delta compression in the repo.

2. Git’s “pile of files” repo format still keeps most repo data in “packfiles”, 
which are not loose independent copies of the checked-out files.

3. This is the only place where Git’s design helps.  Its git-worktree feature 
is a cheesy hack compared to the way Fossil separates the checkout and 
repository clone,[3] but that design inherently keeps track of which working 
trees are linked to which repository, so the bookkeeping of figuring out which 
files to call clonefile(2) or similar on would be easier.  It’s a high cost to 
pay to save some disk space, though.


[1]: https://developer.apple.com/library/content/samplecode/APFSCloning/
[2]: https://fossil-scm.org/index.html/doc/trunk/www/delta_encoder_algorithm.wiki
[3]: https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg25686.html

> I understand that ZFS does this too, though I’ve never used ZFS.

I’ve used ZFS for years, so I can tell you that in almost every way, ZFS is 
greatly superior to APFS.  One of the few ways where APFS is superior is in 
this clonefile(2) syscall.  There is no equivalent under ZFS: cloning is done 
at the filesystem level, not per-file.

Apple’s model is traditional “applications,” and they want “File > Save As” to 
make copies where possible, rather than duplicate most of the bytes on disk.

ZFS’s model is snapshots and clones of entire filesystems.  A single file can 
then be modified and only the updated blocks are tracked separately from the 
blocks that all clones share, but I don’t believe you can simply tell the 
filesystem to clone a file under a new name, short of link(2).

And no, we can’t just use link(2) for this, because the two names end up 
referring to separate sets of block data on disk only when the application 
saves by writing a new copy of the file, rather than rewriting the data in 
place.  So, if you edit a source file with a text editor that rewrites in 
place, all linked versions get changed, which is almost certainly not what you 
want when you’re using it to save space between independent VCS repository 
checkouts.  ZFS and APFS style cloning have different semantics, more 
appropriate to this proposed usage.
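
The hard link behavior is easy to demonstrate.  The following is a small 
Python sketch for a POSIX system; the file names and the hard_link_demo 
helper are made up for illustration.  It edits one name in place, then saves 
through a temporary file and rename, and reports what the other name sees 
each time.

```python
import os


def hard_link_demo(workdir):
    """Show how in-place edits and rename-style saves affect a hard link."""
    original = os.path.join(workdir, "a.txt")
    linked = os.path.join(workdir, "b.txt")
    with open(original, "w") as fh:
        fh.write("version 1\n")
    os.link(original, linked)  # link(2): a second name for the same inode

    # In-place rewrite through one name: the inode is shared, so the
    # content seen through the other name changes too.
    with open(linked, "r+") as fh:
        fh.write("version 2\n")
    with open(original) as fh:
        after_in_place = fh.read()

    # Save-by-rename through one name: b.txt now points at a new inode,
    # and a.txt keeps the old content.
    tmp = linked + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("version 3\n")
    os.replace(tmp, linked)
    with open(original) as fh:
        after_rename = fh.read()

    still_linked = os.path.samefile(original, linked)
    return after_in_place, after_rename, still_linked
```

Whether a given editor behaves like the first case or the second is up to 
the editor, which is exactly why link(2) is not a safe substitute.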

Linux got something similar in kernel 4.5, ioctl(FIDEDUPERANGE),[4] which 
incidentally shows the problem we face here: without a standard, applications 
have to be coded for each OS specifically.  I don’t know if ZFS-on-Linux 
understands this ioctl yet; last I heard, the answer was, “no,” but that might 
have changed.  (This ioctl was originally a btrfs-specific feature, but was 
generalized for other filesystems in Linux kernel 4.5.)
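
As a rough illustration of that per-OS coding burden, here is a Python sketch 
of a "clone if possible, else copy" helper.  Several things in it are my 
assumptions rather than anything from the APIs cited above: it uses FICLONE, 
a sibling of FIDEDUPERANGE that also arrived in Linux 4.5 and clones a whole 
file, it hard-codes that ioctl’s request number from <linux/fs.h>, and it 
reaches clonefile(2) on macOS through ctypes.  The clone_or_copy name is 
hypothetical, and the fcntl module means this only runs on Unix-like systems.

```python
import ctypes
import ctypes.util
import fcntl
import os
import shutil
import sys

FICLONE = 0x40049409  # assumed value of _IOW(0x94, 9, int) from <linux/fs.h>


def _clone_linux(src, dst):
    # ioctl(dest_fd, FICLONE, src_fd); raises OSError if the filesystem
    # cannot share blocks between the two files.
    with open(src, "rb") as s, open(dst, "wb") as d:
        fcntl.ioctl(d.fileno(), FICLONE, s.fileno())


def _clone_macos(src, dst):
    # clonefile(2) refuses to overwrite an existing destination.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.clonefile(os.fsencode(src), os.fsencode(dst), 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def clone_or_copy(src, dst):
    """Try an OS-specific copy-on-write clone, else do an ordinary copy.

    Returns the name of the method that succeeded.
    """
    try:
        if sys.platform.startswith("linux"):
            _clone_linux(src, dst)
            return "ficlone"
        if sys.platform == "darwin":
            _clone_macos(src, dst)
            return "clonefile"
    except (OSError, AttributeError):
        pass  # unsupported filesystem or missing symbol: fall through
    shutil.copyfile(src, dst)
    return "copy"
```

Every additional OS means another branch like these, each with its own 
failure modes, and the plain copy fallback still has to stay.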

I seem to recall that FreeBSD was talking about adding a similar syscall, but 
then you probably still wouldn’t get it on Solaris, OpenZFS-on-Linux, O3X…  Or 
you’d get it, but years late, as we see with built-in crypto.

And even if one were to add such a feature to Fossil making use of the macOS 
10.13+, Linux 4.5+, and FreeBSD.next syscalls for this, you’re probably still 
only covering about 2% of the client systems Fossil is currently used on.


[4]: http://man7.org/linux/man-pages/man2/ioctl_fideduperange.2.html
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
