So I wrote up my ideas regarding blob chunking as code; see attached.
This is against git-0.4 (I know, ancient, but I had to start somewhere).
The idea here is that blobs are chunked using a rolling checksum (so the
chunk boundaries are content-dependent and stay fixed even if you mutate
pieces
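
The technique being described is content-defined chunking. A minimal
sketch of the idea, assuming a simple byte-sum rolling window (the
window size, modulus, and function name are illustrative, not what the
attached patch actually uses):

#include <stddef.h>

#define WINDOW   48     /* bytes the rolling sum covers */
#define MODULUS  8192   /* ~8kB expected chunk size */

/* Return the offset just past the first boundary, or len if none. */
static size_t next_boundary(const unsigned char *buf, size_t len)
{
        unsigned long sum = 0;

        for (size_t i = 0; i < len; i++) {
                sum += buf[i];                  /* byte enters window */
                if (i >= WINDOW)
                        sum -= buf[i - WINDOW]; /* byte leaves window */
                if (i >= WINDOW && sum % MODULUS == 0)
                        return i + 1;           /* boundary after byte i */
        }
        return len;
}

Because the boundary test depends only on the last 48 bytes, an insertion
early in a file shifts everything after it, yet identical downstream
content still produces the same boundaries, so unchanged chunks keep
their old hashes.
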
On Wed, 20 Apr 2005, Chris Mason wrote:
With the basic changes I described before, the 100-patch time only goes down
to 40s. Certainly not fast enough to justify the changes. In this case, the
bulk of the extra time comes from write-tree writing the index file, so I
split write-tree.c up into
On Wed, 20 Apr 2005, Linus Torvalds wrote:
I was considering using a chunked representation for *all* files (not just
blobs), which would avoid the original 'trees must reference other trees
or they become too large' issue -- and maybe the performance issue you're
referring to, as well?
No. The
On Wed, 20 Apr 2005, David Greaves wrote:
In doing this I noticed a couple of points:
* update-cache won't accept ./file or fred/./file (see the sketch below)
The comment in update-cache.c reads:
/*
 * We fundamentally don't like some paths: we don't want
 * dot or dot-dot anywhere, and in fact, we don't even want
 *

store. This way
 * similar files will be expected to share chunks, saving space.
 * Files less than one disk block long are expected to fit in a single
 * chunk, so there is no extra indirection overhead for this case.
 *
 * Copyright (C) 2005 C. Scott Ananian [EMAIL PROTECTED]
 */

/*
 * We assume
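
On the update-cache point above: a sketch of the sort of path check that
comment implies (illustrative, not the actual git-0.4 code) would reject
any empty, "." or ".." component, so "./file" and "fred/./file" both fail:

#include <stddef.h>

static int verify_path(const char *path)
{
        const char *p = path;

        while (*p) {
                const char *start = p;
                while (*p && *p != '/')
                        p++;
                size_t len = p - start;
                if (len == 0)
                        return 0;       /* leading '/' or "//" */
                if (start[0] == '.' &&
                    (len == 1 || (len == 2 && start[1] == '.')))
                        return 0;       /* "." or ".." component */
                if (*p == '/')
                        p++;
        }
        return 1;                       /* path looks acceptable */
}
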
On Wed, 20 Apr 2005, Petr Baudis wrote:
I think one thing git's object database is not very well suited for is
network transport. You want to have something smart doing the
transports, comparing trees so that it can do some delta compression;
that could probably reduce the amount of data needed
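
A toy version of the delta compression being suggested (nothing like a
real wire format; emit_delta and its copy/insert model are made up for
illustration): send only the bytes that differ between the old and new
versions of a blob, as a common prefix, a literal middle, and a common
suffix:

#include <stdio.h>
#include <stddef.h>

static void emit_delta(const unsigned char *old_buf, size_t old_len,
                       const unsigned char *new_buf, size_t new_len)
{
        size_t prefix = 0, suffix = 0;

        /* longest shared prefix */
        while (prefix < old_len && prefix < new_len &&
               old_buf[prefix] == new_buf[prefix])
                prefix++;
        /* longest shared suffix that doesn't overlap the prefix */
        while (suffix < old_len - prefix && suffix < new_len - prefix &&
               old_buf[old_len - 1 - suffix] == new_buf[new_len - 1 - suffix])
                suffix++;

        printf("copy %zu bytes, insert %zu literal bytes, copy %zu bytes\n",
               prefix, new_len - prefix - suffix, suffix);
}

For a one-line change to a large file, almost everything falls into the
prefix and suffix, so the transfer shrinks to roughly the size of the
change.
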
On Wed, 20 Apr 2005, Linus Torvalds wrote:
What are the disk usage results? I'm on ext3, for example, which means that
even small files invariably take up 4.125kB on disk (with the inode).
Even uncompressed, most source files tend to be small. Compressed, I'm
seeing a median blob size of ~1.6kB
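
That 4.125kB figure is presumably one 4KiB data block plus a 128-byte
inode: 4096 + 128 = 4224 bytes = 4.125 x 1024. So a ~1.6kB compressed
blob still occupies a full 4KiB block, wasting more than half of it.
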
On Tue, 19 Apr 2005, Linus Torvalds wrote:
(*) Actually, I think it's the compression that ends up being the most
expensive part.
You're also using the equivalent of '-9' -- and *that's slow*.
Changing to Z_NORMAL_COMPRESSION would probably help a lot
(but would break all existing
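
For reference, zlib's actual names for these levels are Z_BEST_COMPRESSION
(9, the '-9' equivalent) and Z_DEFAULT_COMPRESSION (internally level 6).
A minimal sketch of the trade-off using zlib's compress2():

#include <zlib.h>
#include <stdio.h>

int main(void)
{
        const unsigned char src[] = "int main(void) { return 0; }\n";
        unsigned char dst[128];
        uLongf dstlen;

        /* Level 9: smallest output, slowest -- the '-9' equivalent. */
        dstlen = sizeof(dst);
        compress2(dst, &dstlen, src, sizeof(src), Z_BEST_COMPRESSION);
        printf("level 9: %lu bytes\n", (unsigned long)dstlen);

        /* The default level is markedly faster for little size cost. */
        dstlen = sizeof(dst);
        compress2(dst, &dstlen, src, sizeof(src), Z_DEFAULT_COMPRESSION);
        printf("default: %lu bytes\n", (unsigned long)dstlen);
        return 0;
}

Decompression is level-agnostic, so existing readers of the objects
would still work either way.
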
On Tue, 19 Apr 2005, David Meybohm wrote:
But doesn't this require assuming the distribution of MD5 is uniform,
and don't the papers finding collisions in fewer operations show it's not? So, your
birthday-argument for calculating the probability wouldn't apply, because
it rests on the assumption MD5 is
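
For reference, the birthday bound being questioned: with n uniformly
random 128-bit digests,

    P(collision) ~= 1 - exp(-n^2 / 2^129)

which only reaches 1/2 around n ~= 2^64 objects. The 2004 collision
attacks find MD5 collisions far faster than that generic bound, which
is exactly why the uniformity assumption is being challenged here.
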
On Mon, 18 Apr 2005, Andy Isaacson wrote:
If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%
But in the
On Thu, 14 Apr 2005, Paul Jackson wrote:
To me, rename is a special case of the more general case of a
big chunk of code (a portion of a file) that was in one place
either being moved or copied to another place.
I wonder if there might be some way to use the tools that biologists use
to analyze DNA
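
The biologists' tools in question are sequence-alignment algorithms
(Needleman-Wunsch, Smith-Waterman, BLAST). A toy relative built on the
same dynamic-programming idea, sketched below, finds the longest run of
text shared by two files wherever it moved; the 1024-column limit is an
arbitrary simplification:

#include <string.h>
#include <stddef.h>

static size_t longest_common_substring(const char *a, const char *b)
{
        size_t n = strlen(a), m = strlen(b), best = 0;
        static size_t prev[1024], cur[1024];    /* assumes strlen(b) < 1024 */

        memset(prev, 0, sizeof(prev));
        memset(cur, 0, sizeof(cur));
        for (size_t i = 1; i <= n; i++) {
                for (size_t j = 1; j <= m; j++) {
                        /* extend a match diagonally, or reset to zero */
                        cur[j] = (a[i-1] == b[j-1]) ? prev[j-1] + 1 : 0;
                        if (cur[j] > best)
                                best = cur[j];
                }
                memcpy(prev, cur, sizeof(cur));
        }
        return best;
}

A moved or copied region would show up as a long common substring
between the old and new blobs, which is the generalization of rename
detection being described.
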