On Dec 27, 2017, at 11:14 AM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 27 Dec 2017, at 4:05pm, Warren Young <war...@etr-usa.com> wrote:
>
>> DVCSes...by their very nature...want to clone the entire history of the
>> whole project to every machine, then make a second copy of the tip of each
>> working branch
>
> Apple recently moved to APFS, a file system which supports file and folder
> cloning. If you copy a file or folder it doesn’t duplicate the data, it just
> creates a pointer that points to the existing copy. However, if you then
> change one of the copies (e.g. change one byte of a huge file) it makes a new
> copy (of the affected sectors) at that point, so that only that one copy of
> the file has changed.
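The copy-on-write behavior described in that quote can be illustrated with a toy in-memory model. This is a minimal sketch with loudly stated assumptions. The names `Block`, `File`, `file_clone` and `file_write_byte` are hypothetical. The blocks are only 4 bytes so the effect is easy to see. Nothing here reflects how APFS actually lays out data on disk. A clone copies only block pointers, and the first write to a shared block copies just that one block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4 /* toy block size; real filesystems use far larger blocks */
#define NBLOCKS 4    /* every toy file is exactly NBLOCKS blocks long */

/* A block of data that any number of files may share; refs counts sharers. */
typedef struct {
    int refs;
    char data[BLOCK_SIZE];
} Block;

/* A toy "file" is nothing but an array of pointers to blocks. */
typedef struct {
    Block *blocks[NBLOCKS];
} File;

static Block *block_new(const char *src) {
    Block *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->refs = 1;
    memcpy(b->data, src, BLOCK_SIZE);
    return b;
}

/* Fill f from text, which must hold at least NBLOCKS * BLOCK_SIZE chars. */
static void file_init(File *f, const char *text) {
    for (int i = 0; i < NBLOCKS; i++)
        f->blocks[i] = block_new(text + i * BLOCK_SIZE);
}

/* Clone: share every block and bump its reference count; no data is copied. */
static void file_clone(const File *src, File *dst) {
    for (int i = 0; i < NBLOCKS; i++) {
        dst->blocks[i] = src->blocks[i];
        dst->blocks[i]->refs++;
    }
}

/* Write one byte. If that block is shared, copy only that block first. */
static void file_write_byte(File *f, size_t offset, char c) {
    assert(offset < (size_t)NBLOCKS * BLOCK_SIZE);
    size_t i = offset / BLOCK_SIZE;
    Block *b = f->blocks[i];
    if (b->refs > 1) {
        b->refs--;              /* this file stops sharing the old block */
        b = block_new(b->data); /* private copy of just this one block */
        f->blocks[i] = b;
    }
    b->data[offset % BLOCK_SIZE] = c;
}

/* Count how many blocks two files still share. */
static int shared_blocks(const File *a, const File *b) {
    int n = 0;
    for (int i = 0; i < NBLOCKS; i++)
        n += a->blocks[i] == b->blocks[i];
    return n;
}
```

Suppose you initialize a 16-byte file from "0123456789abcdef", clone it, and then change byte 5 in the clone. Afterwards the original and the clone still share 3 of their 4 blocks. Only the block holding that byte was duplicated.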
In addition to da Silva’s point about needing to use the OS-specific API to do this,[1] we couldn’t easily use it in Fossil anyway, for multiple reasons:

1. One of the duplicates is in the repository clone, which is delta-compressed,[2] and thus is not in the exact same form as when checked out, so you’ll still have at least two near-duplicate copies. Only a more primitive version control system like RCS, CVS, or *maybe* Subversion, which really does keep pure duplicates hidden away, could get you around this problem.

2. Even if delta compression were disabled on purpose for some files, the repository copy is stuck inside Fossil’s SQLite database file. It is not an independent file that the filesystem could track anyway.

3. In the Fossil model, checkouts are independent. (And this is one of the best things about Fossil relative to Git.) Although Fossil keeps track of which repositories you’ve got open on your system so that “fossil all” can work, it currently makes no logical or filesystem ties between these independent checkouts, which it would need to do to make use of these OS-specific file cloning APIs you talk about. Example:

       $ mkdir ../x ; cd ../x ; fossil open ~/museum/x.fossil trunk
       $ mkdir ../y ; cd ../y ; fossil open ~/museum/x.fossil y-branch

   It is almost certainly the case that some of the files in x and y are identical, so that one copy could be cloned from the other at the filesystem level by making one of these OS-specific API calls, but it would require a lot of bookkeeping on Fossil’s part to pull this off. And at the end of the day, you’d only be getting the feature on macOS 10.13+ boxes.

Git isn’t much better on these points:

1. Git also uses delta compression in the repo.

2. Git’s “pile of files” repo format still keeps most repo data in “packfiles”, which are not loose, independent copies of the checked-out files.

3. This is the only place where Git’s design helps, because its git-worktree feature is a cheesy hack compared to the way Fossil separates the checkout and repository clone.[3] This design inherently keeps track of which repos are linked to which others, so the bookkeeping of figuring out which files to call clonefile(2) or similar on would be easier. It’s a high cost to pay to save some disk space, though.

[1]: https://developer.apple.com/library/content/samplecode/APFSCloning/
[2]: https://fossil-scm.org/index.html/doc/trunk/www/delta_encoder_algorithm.wiki
[3]: https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg25686.html

> I understand that ZFS does this too, though I’ve never used ZFS.

I’ve used ZFS for years, so I can tell you that in almost every way, ZFS is greatly superior to APFS. One of the few ways where APFS is superior is this clonefile(2) syscall. There is no equivalent under ZFS: cloning is done at the filesystem level, not per file.

Apple’s model is traditional “applications,” and they want “File > Save As” to make cheap clones where possible, rather than duplicate most of the bytes on disk. ZFS’s model is snapshots and clones of entire filesystems. A single file in a clone can then be modified, and only the updated blocks are tracked separately from the blocks that all clones share, but I don’t believe you can simply tell the filesystem to clone a file under a new name, short of link(2).

And no, we can’t just use link(2) for this, because the two names only refer to separate sets of block data on disk when the application makes a copy of the file rather than rewriting the data in place. So, if you edit a source file with a text editor that rewrites in place, all linked versions get changed, which is almost certainly not what you want when you’re using it to save space between independent VCS repository checkouts. ZFS- and APFS-style cloning have different semantics from link(2), ones more appropriate to this proposed usage.
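The difference between link(2) and a real clone can be seen directly. The sketch below assumes a POSIX system. The helpers `clone_or_copy`, `write_in_place`, `read_file` and `demo_link_vs_clone` are hypothetical. On macOS the sketch calls Apple’s clonefile(2), which requires macOS 10.13+ and is declared in <sys/clonefile.h>. On other systems it falls back to an ordinary byte copy. That fallback has the same independent-copy semantics but saves no disk space. The `#ifdef __APPLE__` is itself a small instance of needing OS-specific code for this feature.

```c
#ifdef __APPLE__
#define _DARWIN_C_SOURCE
#else
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#ifdef __APPLE__
#include <sys/attr.h>
#include <sys/clonefile.h>
#endif

/* Make dst (which must not exist yet) an independent copy of src.
 * On macOS, try a copy-on-write clone first; otherwise copy the bytes. */
static int clone_or_copy(const char *src, const char *dst) {
#ifdef __APPLE__
    if (clonefile(src, dst, 0) == 0)
        return 0; /* dst shares blocks with src until one of them is written */
    /* e.g. ENOTSUP on a non-APFS volume: fall through to a plain copy */
#endif
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (out < 0) { close(in); return -1; }
    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) { rc = -1; break; }
    }
    if (n < 0) rc = -1;
    close(in);
    close(out);
    return rc;
}

/* Overwrite a file's existing data in place (same inode), as some editors do. */
static int write_in_place(const char *path, const char *text) {
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0) return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Read up to cap - 1 bytes of a file into buf as a NUL-terminated string. */
static int read_file(const char *path, char *buf, size_t cap) {
    assert(cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0) return -1;
    buf[n] = '\0';
    return 0;
}

/* In dir: create a.c, hard-link it as b.c, clone it as c.c, then rewrite a.c
 * in place. Returns 1 if b.c saw the edit and c.c did not, 0 if not, -1 on
 * error (files may be left behind on error in this sketch). */
static int demo_link_vs_clone(const char *dir) {
    char a[512], b[512], c[512], buf[16];
    snprintf(a, sizeof a, "%s/a.c", dir);
    snprintf(b, sizeof b, "%s/b.c", dir);
    snprintf(c, sizeof c, "%s/c.c", dir);
    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    close(fd);
    if (write_in_place(a, "v1") != 0 || link(a, b) != 0 || clone_or_copy(a, c) != 0)
        return -1;
    if (write_in_place(a, "v2") != 0) return -1; /* editor saves in place */
    int ok = read_file(b, buf, sizeof buf) == 0 && strcmp(buf, "v2") == 0;
    ok = ok && read_file(c, buf, sizeof buf) == 0 && strcmp(buf, "v1") == 0;
    unlink(a);
    unlink(b);
    unlink(c);
    return ok;
}
```

Rewriting a.c in place changes what b.c reads, because b.c is a hard link to the same inode. The clone (or copy) c.c keeps the original contents.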
Linux got something similar in kernel 4.5, ioctl(FIDEDUPERANGE),[4] which incidentally shows the problem we face here: without a standard, applications have to be coded for each OS specifically. I don’t know if ZFS-on-Linux understands this ioctl yet; last I heard, the answer was “no,” but that might have changed. (This ioctl was originally a btrfs-specific feature, but was generalized for other filesystems in Linux kernel 4.5.)

I seem to recall that FreeBSD was talking about adding a similar syscall, but then you probably still wouldn’t get it on Solaris, OpenZFS-on-Linux, O3X… Or you’d get it, but only years later, as we saw with built-in crypto.

And even if one were to add such a feature to Fossil making use of the macOS 10.13+, Linux 4.5+, and FreeBSD.next syscalls for this, you’re probably still only covering about 2% of the client systems Fossil is currently used on.

[4]: http://man7.org/linux/man-pages/man2/ioctl_fideduperange.2.html

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users