Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-03-02 Thread Ævar Arnfjörð Bjarmason
On Tue, Feb 24, 2015 at 1:44 PM, Michael Haggerty mhag...@alum.mit.edu wrote: On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote: On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri,

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-03-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 10:04 PM, Junio C Hamano gits...@pobox.com wrote: Ævar Arnfjörð Bjarmason ava...@gmail.com writes: I actually ran this a few times while testing it, so this is a before and after on a hot cache of linux.git with 406 tags vs. ~140k. I ran the gc + repack + bitmaps for

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-03-02 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason ava...@gmail.com writes: On Fri, Feb 20, 2015 at 10:04 PM, Junio C Hamano gits...@pobox.com wrote: Ævar Arnfjörð Bjarmason ava...@gmail.com writes: I actually ran this a few times while testing it, so this is a before and after on a hot cache of linux.git with 406

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-25 Thread Duy Nguyen
On Sat, Feb 21, 2015 at 11:01 AM, Duy Nguyen pclo...@gmail.com wrote: I wonder how efficient rsync is for transferring these refs: the client generates a file containing all refs, the server does the same with their refs, then the client rsync their file to the server.. The changes between the

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-24 Thread Michael Haggerty
On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote: On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-23 Thread David Turner
On Fri, 2015-02-20 at 12:59 -0800, Junio C Hamano wrote: David Turner dtur...@twopensource.com writes: On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: * 'git push'? This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick..

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's: * Around 500k commits * Around 100k tags * Around 5k branches

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: Anecdotally I work on a repo at work (where I'm mostly the Git

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: Anecdotally I work on a repo at work (where I'm mostly the Git

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes: On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: * 'git push'? This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick.. Ah the number of refs may affect both git-push and git-pull. I

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Sebastian Schuberth
On 20.02.2015 01:03, brian m. carlson wrote: If you want good performance, I'd recommend the latest version of Git both client- and server-side. Newer versions of Git provide pack bitmaps, which can dramatically speed up clones and fetches, and Git Do you happen to know which version, if at all,

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason ava...@gmail.com writes: I actually ran this a few times while testing it, so this is a before and after on a hot cache of linux.git with 406 tags vs. ~140k. I ran the gc + repack + bitmaps for both repos noted in an earlier reply of mine, and took the fastest run out

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread brian m. carlson
On Fri, Feb 20, 2015 at 11:08:55PM +0100, Sebastian Schuberth wrote: On 20.02.2015 01:03, brian m. carlson wrote: If you want good performance, I'd recommend the latest version of Git both client- and server-side. Newer versions of Git provide pack bitmaps, which can dramatically speed up

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Martin Fick
On Friday, February 20, 2015 01:29:12 PM David Turner wrote: ... For a more general solution, perhaps a log of ref updates could be used. Every time a ref is updated on the server, that ref would be written into an append-only log. Every time a client pulls, their pull data includes an index
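
The ref-update log proposed in the quoted message could look roughly like the sketch below. This is only an illustration, not anything Git implements: the `RefLog` class, its method names, and the idea that the client remembers the log position it last saw are all assumptions, since the preview above cuts off before saying how the index is used.

```python
from dataclasses import dataclass, field


@dataclass
class RefLog:
    """Hypothetical server-side append-only log of ref updates."""
    entries: list = field(default_factory=list)

    def record(self, ref, new_sha):
        # Append only; earlier entries are never rewritten.
        self.entries.append((ref, new_sha))
        # The new log length is the position a client could remember.
        return len(self.entries)

    def updates_since(self, index):
        # Collapse everything after `index` so each ref appears once,
        # carrying its most recent value.
        latest = {}
        for ref, sha in self.entries[index:]:
            latest[ref] = sha
        return latest, len(self.entries)


log = RefLog()
log.record("refs/heads/master", "a1")
log.record("refs/tags/v1.0", "b2")
seen = 2  # assumed: the client pulled after the first two updates

log.record("refs/heads/master", "c3")
changed, seen = log.updates_since(seen)
print(changed, seen)
```

With a scheme like this the server would send only the refs that changed since the client's last pull, so the cost would scale with the number of recent updates rather than the total number of refs. A deleted ref would need its own marker (for example a `None` value), which this sketch leaves out.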

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 7:42 AM, David Turner dtur...@twopensource.com wrote: On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: * 'git push'? This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick.. Ah the number of refs may affect

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread David Turner
On Fri, 2015-02-20 at 13:37 -0700, Martin Fick wrote: On Friday, February 20, 2015 01:29:12 PM David Turner wrote: ... For a more general solution, perhaps a log of ref updates could be used. Every time a ref is updated on the server, that ref would be written into an append-only log.

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 7:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: But actually most of git fetch is spent in the reachability check subsequently done by git-rev-list which takes several seconds. I I wonder if reachability bitmap could help here.. I could have sworn I had that

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Stephen Morton
This is fantastic. I really appreciate all the answers. And it's great that I think I've sparked some general discussion that could lead somewhere too. Notes: I'm currently using 2.1.3. I'll move to 2.3.x I'm experimenting with git-annex to reduce repo size on disk. We'll see. I could remove

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Matthieu Moy
Stephen Morton stephen.c.mor...@gmail.com writes: 1. Ævar : I'm a bit concerned by your statement that git rebases take about 1-2 s per commit. Does that mean that a git pull --rebase, if it is picking up say 120 commits (not at all unrealistic), could potentially take 4 minutes to complete?

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread brian m. carlson
On Fri, Feb 20, 2015 at 11:06:44AM -0500, Stephen Morton wrote: 2. I'd not heard about bitmap indexes before this thread but it sounds like they should help me. In limited searching I can't find much useful documentation about them. It is also not clear to me if I have to explicitly run git

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Sebastian Schuberth
On 20.02.2015 15:25, Ævar Arnfjörð Bjarmason wrote: tl;dr: After some more testing it turns out the performance issues we have are almost entirely due to the number of refs. Some of these I Interesting. We currently have similar performance issues when pushing to a Git repo hosted on Gerrit.

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread David Turner
On Thu, 2015-02-19 at 23:57 -0700, Martin Fick wrote: On Feb 19, 2015 5:42 PM, David Turner dtur...@twopensource.com wrote: On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: * 'git push'? This one is not affected by how deep your repo's history is, or how wide your tree

RE: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Randall S. Becker
On Feb 20, 2015 1:58AM Martin Fick wrote: On Feb 19, 2015 5:42 PM, David Turner dtur...@twopensource.com wrote: This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick.. Good to hear that others are starting to

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stefan Beller
On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton stephen.c.mor...@gmail.com wrote: I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here. I'm evaluating the feasibility of moving my team from SVN to

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Martin Fick
On Feb 19, 2015 5:42 PM, David Turner dtur...@twopensource.com wrote: On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: * 'git push'? This one is not affected by how deep your repo's history is, or how wide your tree is, so should be quick.. Ah the number of refs may

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stephen Morton
On Thu, Feb 19, 2015 at 5:21 PM, Stefan Beller sbel...@google.com wrote: On Thu, Feb 19, 2015 at 1:26 PM, Stephen Morton stephen.c.mor...@gmail.com wrote: I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread brian m. carlson
On Thu, Feb 19, 2015 at 04:26:58PM -0500, Stephen Morton wrote: I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here. I'm evaluating the feasibility of moving my team from SVN to git. We have a

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Ævar Arnfjörð Bjarmason
On Thu, Feb 19, 2015 at 10:26 PM, Stephen Morton stephen.c.mor...@gmail.com wrote: I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here. I'm evaluating the feasibility of moving my team from SVN to

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's: * Around 500k commits * Around 100k tags * Around 5k branches * Around 500 commits/day, almost entirely to the same branch * 1.5

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Stefan Beller
On Thu, Feb 19, 2015 at 3:06 PM, Stephen Morton stephen.c.mor...@gmail.com wrote: I think I addressed most of this in my original post with the paragraph Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo, 50k tags, 1,000 branches. (Due to historical code

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Duy Nguyen
On Fri, Feb 20, 2015 at 4:26 AM, Stephen Morton stephen.c.mor...@gmail.com wrote: By 'performance', I guess I mean speed of day to day operations for devs. * (Obviously, trivially, a (non-local) clone will be slow with a large repo.) * Will a few simultaneous clones from the central