Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: I'll finish off the patch once you ok the basics below. My current code works like this: Chris, before you do anything further, let me re-consider. Assuming that the real cost of write-tree is the compression (and I think it is), I really suspect

Re: [PATCH] write-tree performance problems

2005-04-20 Thread H. Peter Anvin
Linus Torvalds wrote: So I'll see if I can turn the current fsck into a convert into uncompressed format, and do a nice clean format conversion. Just let me know what you want to do, and I can trivially change the conversion scripts I've already written to do what you want. -hpa

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Ingo Molnar
* Linus Torvalds [EMAIL PROTECTED] wrote: So to convert your old git setup to a new git setup, do the following: [...] did this for two repositories (git and kernel-git), it works as advertised. Ingo - To unsubscribe from this list: send the line unsubscribe git in the body of a

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote: I converted my git archives (kernel and git itself) to do the SHA1 hash _before_ the compression phase. Linus, Am I correct to understand that with this change, all the objects in the database are still being compressed (so no net

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:11:10PM +1000, Jon Seymour wrote: On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote: I converted my git archives (kernel and git itself) to do the SHA1 hash _before_ the compression phase. Linus, Am I correct to understand that with this change,

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Morten Welinder
On 4/20/05, Martin Uecker [EMAIL PROTECTED] wrote: The storage method of the database of a collection of files in the underlying file system. Because of the random nature of the hashes this leads to a horrible amount of seeking for all operations which walk the logical structure of some tree

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
The main point is not about trying different compression techniques but that you don't need to compress at all just to calculate the hash of some data. (to know if it is unchanged for example) Ah, ok, I didn't understand that there were extra compresses being performed for that reason.
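
A minimal sketch of the point made above, assuming a toy object store: the object name is the SHA1 of the uncompressed data, so checking whether content is already known needs no compression at all. The dict-based store and the function names are hypothetical and are not git's actual code or on-disk format.

```python
import hashlib
import zlib


def object_name(data):
    """Name an object by the SHA1 of its *uncompressed* bytes."""
    return hashlib.sha1(data).hexdigest()


def store(db, data):
    """Store data in db (a plain dict standing in for the object directory).

    Returns (name, written). Compression only happens when the object
    is not already present, so an unchanged file costs one hash.
    """
    name = object_name(data)
    if name in db:
        return name, False          # already stored: no compression done
    db[name] = zlib.compress(data)  # compress only on a real write
    return name, True
```

With hashing done after compression, the same existence check would have to compress the data first just to learn its name.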

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 02:08 -0700, Linus Torvalds wrote: I converted my git archives (kernel and git itself) to do the SHA1 hash _before_ the compression phase. I'm happy to see that -- because I'm going to be asking you to make another change which will also require a simple repository

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Jon Seymour wrote: Am I correct to understand that with this change, all the objects in the database are still being compressed (so no net performance benefit), but by doing the SHA1 calculations before compression you are keeping open the possibility that at some

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:30:15AM -0400, C. Scott Ananian wrote: Hi, your code looks pretty cool. thank you! On Wed, 20 Apr 2005, Martin Uecker wrote: The other thing I don't like is the use of a sha1 for a complete file. Switching to some kind of hash tree would allow to introduce

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 02:43, Linus Torvalds wrote: On Tue, 19 Apr 2005, Chris Mason wrote: I'll finish off the patch once you ok the basics below. My current code works like this: Chris, before you do anything further, let me re-consider. Assuming that the real cost of write-tree is

Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Chris Mason wrote: With the basic changes I described before, the 100 patch time only goes down to 40s. Certainly not fast enough to justify the changes. In this case, the bulk of the extra time comes from write-tree writing the index file, so I split write-tree.c up into

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, C. Scott Ananian wrote: Hmm. Are our index files too large, or is there some other factor? They _are_ pretty large, but they have to be. For the kernel, the index file is about 1.6MB. That's - 17,000+ files and filenames - stat information for all of them - the

Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Linus Torvalds wrote: I was considering using a chunked representation for *all* files (not just blobs), which would avoid the original 'trees must reference other trees or they become too large' issue -- and maybe the performance issue you're referring to, as well? No. The

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote: Hi, A merkle-tree (which I think you initially pointed me at) makes the hash of the internal nodes be a hash of the chunk's hashes; ie not a straight content hash. This is roughly what my current implementation does, but I
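
A rough sketch of the hash-of-chunk-hashes idea described above, assuming fixed-size chunks and a single level of internal node. The chunk size and function names are hypothetical; the implementations discussed in the thread are not shown here and may chunk differently.

```python
import hashlib


def chunk_hashes(data, chunk_size=4):
    """SHA1 of each fixed-size chunk (the leaves of the hash tree)."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def root_hash(data, chunk_size=4):
    """Hash of the concatenated chunk hashes, not of the content itself."""
    return hashlib.sha1(b"".join(chunk_hashes(data, chunk_size))).hexdigest()
```

A change inside one chunk changes that chunk's hash and the root, while the other chunk hashes stay the same. Note that the root is not equal to a straight SHA1 of the whole file.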

Re: [PATCH] write-tree performance problems

2005-04-20 Thread David Willmore
On 4/20/05, Linus Torvalds [EMAIL PROTECTED] wrote: It really _shouldn't_ be faster. It still does the compression, and throws the end result away. Am I misunderstanding or is the problem that doing: file with unknown status -> compress -> sha1 -> compare with existing hash is expensive? What

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, C. Scott Ananian wrote: OK, sure. But how 'bout chunking trees? Are you grown happy with the new trees-reference-other-trees paradigm, or is there a deep longing in your heart for the simplicity of 'trees-reference-blobs-period'? I'm pretty sure we do better

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Linus Torvalds wrote: To actually go faster, it _should_ need this patch. Untested. See if it works.. NO! Don't see if this works. For the sha1 file already exists file, it forgot to return the SHA1 value in returnsha1, and would thus corrupt the trees it wrote. So

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 11:40, Linus Torvalds wrote: On Wed, 20 Apr 2005, Chris Mason wrote: Thanks for looking at this. Your new tree is faster, it gets the commit 100 patches time down from 1m5s to 50s. It really _shouldn't_ be faster. It still does the compression, and throws the

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Linus Torvalds wrote: NO! Don't see if this works. For the sha1 file already exists file, it forgot to return the SHA1 value in returnsha1, and would thus corrupt the trees it wrote. Proper version with fixes checked in. For me, it brings down the time to write a

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: At any rate, the time for a single write-tree is pretty consistent. Before it was around .5 seconds, and with this change it goes down to .128s. Oh, wow. I bet your SHA1 implementation is done with hand-optimized and scheduled x86 MMX code or

Re: [PATCH] write-tree performance problems

2005-04-20 Thread David S. Miller
On Wed, 20 Apr 2005 10:06:15 -0700 (PDT) Linus Torvalds [EMAIL PROTECTED] wrote: I bet your SHA1 implementation is done with hand-optimized and scheduled x86 MMX code or something, while my poor G5 is probably using some slow generic routine. As a result, it only improved by 33% for me since

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: Well, the difference there should be pretty hard to see with any benchmark. But I was being lazy...new patch attached. This one gets the same perf numbers, if this is still wrong then I really need some more coffee. I did my preferred version.

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 07:59 -0700, Linus Torvalds wrote: external-parent commit-hash external-parent-ID comment for this parent and the nice thing about that is that now that information allows you to add external parents at any point. Why do it like this? First

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Olivier Galibert
On Tue, Apr 19, 2005 at 10:36:06AM -0700, Linus Torvalds wrote: In fact, git has all the same issues that BK had, and for the same fundamental reason: if you do distributed work, you have to always append stuff, and that means that you can never re-order anything after the fact. You can,

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: Very true, you can't replace quilt with git without ruining both of them. But it would be nice to take a quilt tree and turn it into a git tree for merging purposes, or to make use of whatever visualization tools might exist someday. Fair

Re: [PATCH] write-tree performance problems

2005-04-19 Thread David Lang
On Tue, 19 Apr 2005, Linus Torvalds wrote: On Tue, 19 Apr 2005, Chris Mason wrote: Very true, you can't replace quilt with git without ruining both of them. But it would be nice to take a quilt tree and turn it into a git tree for merging purposes, or to make use of whatever visualization tools

Re: [PATCH] write-tree performance problems

2005-04-19 Thread C. Scott Ananian
On Tue, 19 Apr 2005, Linus Torvalds wrote: (*) Actually, I think it's the compression that ends up being the most expensive part. You're also using the equivalent of '-9', too -- and *that's slow*. Changing to Z_NORMAL_COMPRESSION would probably help a lot (but would break all existing
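
A small sketch of the compression-level trade-off raised above, using Python's zlib bindings. zlib's named constants are Z_BEST_COMPRESSION (level 9, the '-9' equivalent) and Z_DEFAULT_COMPRESSION; the sketch assumes the default level is what "Z_NORMAL_COMPRESSION" refers to. The helper name is hypothetical.

```python
import zlib


def compress_at_levels(data):
    """Compress the same bytes at the best and the default zlib level."""
    best = zlib.compress(data, zlib.Z_BEST_COMPRESSION)
    default = zlib.compress(data, zlib.Z_DEFAULT_COMPRESSION)
    return best, default
```

Both outputs decompress to the same bytes, but the compressed streams themselves can differ between levels, so any name derived from the compressed form would depend on the level chosen.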

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: 5) right before exiting, write-tree updates the index if it made any changes. This part won't work. It needs to do the proper locking, which means that it needs to create index.lock _before_ it reads the index file, and write everything to that one
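
A minimal sketch of the locking order described above: create index.lock exclusively before reading the index, then write the new contents to the lock file. The final rename over the index is the usual lock-file pattern and is an assumption here, since the message is cut off. The function name and the transform callback are hypothetical.

```python
import os


def update_index(index_path, transform):
    """Rewrite the index under a lock file taken *before* the read."""
    lock_path = index_path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as lock_file:
            try:
                with open(index_path, "rb") as index_file:
                    old = index_file.read()
            except FileNotFoundError:
                old = b""  # no index yet: start from empty
            lock_file.write(transform(old))
        # Assumed final step: atomically move the new file into place.
        os.replace(lock_path, index_path)
    except BaseException:
        # Drop our own lock on failure so later writers are not blocked.
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        raise
```

Taking the lock first means no other writer can change the index between the read and the write. If the lock already exists, os.open raises FileExistsError and the existing lock is left alone.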