On Mon, Aug 15, 2011 at 18:40, Russell N. Nelson - rnnelson
<rnnel...@clarkson.edu> wrote:
> The problem is that 1) the files are bulky,

That's expected. :-)

> 2) there are many of them, 3) they are in constant flux,

That is not really a problem: since there are so many of them,
statistically most of them are not in flux at any given moment.

> and 4) it's likely that your connection would close for whatever reason 
> part-way through the download.

I seem not to have forgotten to mention zsync/rsync. ;-)

> Even taking a snapshot of the filenames is dicey. By the time you finish, 
> it's likely that there will be new ones, and possible that some will be 
> deleted. Probably the best way to make this work is to 1) make a snapshot of 
> files periodically,

Since I've been told they're backed up, such a snapshot should
naturally exist already.
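
To illustrate why new and deleted files need not be a showstopper,
here is a rough sketch of comparing two snapshots of the file list.
The manifest layout (file name mapped to size and mtime) and the
function name are made up for illustration; nothing like this is known
to exist on the server side.

```python
def diff_snapshots(old, new):
    """Compare two manifests mapping file name -> (size, mtime).

    Returns three sorted lists: names added, deleted, and changed.
    """
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A file counts as changed when its (size, mtime) pair differs.
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, deleted, changed
```

A mirror would then only need to fetch the added and changed names and
drop the deleted ones, rather than start over.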

> 2) create an API which returns a tarball using the snapshot of files that 
> also implements Range requests.

I would very much prefer a ready-to-use format over a tarball, not to
mention that creating a tarball just for this is pretty
resource-consuming.

> Of course, this would result in a 12-terabyte file on the recipient's host. 
> That wouldn't work very well. I'm pretty sure that the recipient would need 
> an http client which would 1) keep track of the place in the bytestream and 
> 2) split out files and write them to disk as separate files. It's possible 
> that a program like getbot already implements this.

I'd make a snapshot without tar, especially because partial transfers
aren't possible with a tarball.
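
With plain files, a dropped connection only costs the remainder of one
file. A minimal sketch of a per-file resume using the standard HTTP
Range header follows. It assumes the server honours Range requests and
that the remote file has not changed in between; the function names are
hypothetical, and treating a 416 response as "already complete" is a
simplification.

```python
import os
import urllib.error
import urllib.request


def resume_offset(path):
    """Bytes of `path` already on disk; 0 if the file does not exist yet."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def build_request(url, offset):
    """GET request asking only for the bytes from `offset` onward."""
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", "bytes=%d-" % offset)
    return req


def fetch(url, path, chunk=64 * 1024):
    """Download `url` to `path`, continuing a partial file if one exists."""
    try:
        resp = urllib.request.urlopen(build_request(url, resume_offset(path)))
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range starts at or past the end: assume the file is complete.
            return
        raise
    with resp:
        # 206: the server honoured Range, so append to the partial file.
        # 200: it sent the whole file again, so overwrite.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            for block in iter(lambda: resp.read(chunk), b""):
                out.write(block)
```

Calling fetch() again after an interrupted transfer picks up where the
local file ends, which is roughly what rsync or "wget -c" already do.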

-- 
 byte-byte,
    grin

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
