On 08/16/2016 09:17 AM, Stefan Hett wrote:
> Just to have this mentioned: Be aware that the working copy (aka: the
> checked out data of the repository) will have a 2x storage requirement
> on the data since it will keep a copy of the pristine version of the
> file in addition to the "actual" file.

The type of system that I am imagining might typically have several
terabytes of instrumentation data in a repository[1]. Various client
machines might need to check-out a few gigabytes or a few hundred
gigabytes at a time to run data analysis (automated compute jobs) or to
perform a study (scientist/human-interest).

[1]: Version control isn't a requirement in this
use-case/hypothetical-system. Sophisticated access control is much more
of a concern. Mandatory audit trails and distributed contract based data
handling are examples of more relevant architectural characteristics.

I am currently looking at the possibility of using Subversion (in a
non-traditional, off-label fashion) to bootstrap a [very] simplified
demonstration-of-concept type of setup. My current data-set is only
about 25GB and growing at a rate of about 1GB/week. A desktop server and
laptop client shouldn't have any storage space problems (in this case as
a small demonstration system).

> If this is a concern for your use-case, you could export the files and
> only use a working copy in cases where you need to commit or reorder files.

By "export the files" do you mean something like an NFS share of the
repository, thus bypassing svnserve and the check-in/check-out process?
That seems like a clever possibility worth remembering, but for now the
system I am currently building/imagining is headed in a different
direction.

> To clarify: This is purely a client side storage requirement. It does
> not apply to the storage requirements on the server side.

To reduce network load, are there any client-side caching options for
Subversion? Does the svn program account for the files already in the
working copy (on the local disk) and avoid transferring those files over
the network during a subsequent check-out [that requires those files]?

Is it possible to clone or mirror all or part of a Subversion
repository?
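
As a rough sanity check on the storage point above, here is a
back-of-envelope sketch in Python. It assumes the 2x figure quoted above
holds exactly (working-copy metadata is ignored) and that the data-set
keeps growing linearly at about 1GB/week; both are simplifications, and
the function names are made up for illustration.

```python
# Rough disk-footprint estimate for a Subversion working copy.
# Assumption: the working copy keeps one pristine copy per file, so it
# needs roughly twice the size of the checked-out data (metadata ignored).

PRISTINE_FACTOR = 2.0  # "actual" file plus its pristine copy


def dataset_size_gb(initial_gb: float, growth_gb_per_week: float, weeks: int) -> float:
    """Projected data-set size after `weeks` of (assumed) linear growth."""
    return initial_gb + growth_gb_per_week * weeks


def working_copy_gb(checked_out_gb: float) -> float:
    """Approximate client-side disk usage for a working copy of that data."""
    return checked_out_gb * PRISTINE_FACTOR


if __name__ == "__main__":
    # 25 GB today, growing by about 1 GB per week.
    for weeks in (0, 52, 104):
        size = dataset_size_gb(25.0, 1.0, weeks)
        print(f"week {weeks:3d}: data {size:6.1f} GB, "
              f"full working copy ~{working_copy_gb(size):6.1f} GB")
```

Under those assumptions a full check-out of today's 25GB would take
about 50GB on the client, and about 154GB after a year, which is still
comfortable for a laptop in a small demonstration setup.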

<speculative fun>
This probably isn't relevant to Subversion, but in the system I am
imagining it might be reasonable for clients to check-out data-sets via
torrent connections with other full/partial repositories.