Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-10 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 10 2018, SZEDER Gábor wrote: > On Thu, Oct 04, 2018 at 11:09:58PM -0700, Junio C Hamano wrote: >> SZEDER Gábor writes: >> >> >> git-gc - Cleanup unnecessary files and optimize the local repository >> >> >> >> Creating these indexes like the commit-graph falls under "optimize

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-10 Thread SZEDER Gábor
On Thu, Oct 04, 2018 at 11:09:58PM -0700, Junio C Hamano wrote: > SZEDER Gábor writes: > > >> git-gc - Cleanup unnecessary files and optimize the local repository > >> > >> Creating these indexes like the commit-graph falls under "optimize the > >> local repository", > > > > But it doesn't

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-09 Thread SZEDER Gábor
On Mon, Oct 08, 2018 at 11:08:03PM -0400, Jeff King wrote: > I'd have done it as one fixed-size filter per commit. Then you should be > able to hash the path keys once, and apply the result as a bitwise query > to each individual commit (I'm assuming that it's constant-time to > access the filter
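
The "one fixed-size filter per commit" idea quoted above could be sketched roughly as follows. This is only an illustration, not code from the thread: the filter size, the number of hash functions, the SHA-1-based hashing and all function names are assumptions chosen for the example.

```python
import hashlib

FILTER_BITS = 512  # assumed fixed filter size per commit (hypothetical value)
NUM_HASHES = 7     # assumed number of hash functions (hypothetical value)


def path_mask(path: str) -> int:
    """Hash a path key once into a bitmask with up to NUM_HASHES bits set."""
    mask = 0
    for i in range(NUM_HASHES):
        digest = hashlib.sha1(f"{i}:{path}".encode()).digest()
        bit = int.from_bytes(digest[:8], "big") % FILTER_BITS
        mask |= 1 << bit
    return mask


def build_filter(changed_paths) -> int:
    """Build one commit's filter by OR-ing the masks of the paths it changed."""
    bits = 0
    for path in changed_paths:
        bits |= path_mask(path)
    return bits


def maybe_touches(commit_filter: int, mask: int) -> bool:
    """Bitwise query: False means 'definitely not changed', True means 'maybe'."""
    return (commit_filter & mask) == mask


# Hash the queried path once, then apply the same mask to every commit.
filters = {
    "commit-a": build_filter(["Makefile", "t/valgrind/valgrind.sh"]),
    "commit-b": build_filter([]),
}
query = path_mask("t/valgrind/valgrind.sh")
candidates = [name for name, f in filters.items() if maybe_touches(f, query)]
```

Because the mask is computed once per path, the per-commit cost is a single AND and compare. A "maybe" answer still has to be confirmed with a real tree comparison, since Bloom filters can return false positives but never false negatives.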

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 03:03:08PM -0400, Derrick Stolee wrote: > > I wonder if Roaring does better here. > > In these sparse cases, usually Roaring will organize the data as "array > chunks" which are simply lists of the values. The thing that makes this > still compressible is that we store

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
On 10/9/2018 2:46 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: [I snipped all of the parts about bloom filters that seemed entirely reasonable to me ;) ] Imagine we have that list. Is a bloom filter still the best data structure for each commit? At

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: > [I snipped all of the parts about bloom filters that seemed entirely > reasonable to me ;) ] > > Imagine we have that list. Is a bloom filter still the best data > > structure for each commit? At the point that we have the

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Ævar Arnfjörð Bjarmason
On Tue, Oct 09 2018, Derrick Stolee wrote: > The filter needs to store every path that would be considered "not > TREESAME". It can't store wildcards, so you would need to evaluate the > wildcard and test all of those paths individually (not a good idea). If full paths are stored, yes. But

Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
(Changing title to reflect the new topic.) On 10/8/2018 11:08 PM, Jeff King wrote: On Mon, Oct 08, 2018 at 02:29:47PM -0400, Derrick Stolee wrote: There are two questions that I was hoping to answer by looking at your code: 1. How do you store your Bloom filter? Is it connected to the

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Jeff King
On Mon, Oct 08, 2018 at 02:29:47PM -0400, Derrick Stolee wrote: > > > > But I'm afraid it will take a while until I get around to turn it into > > > > something presentable... > > > Do you have the code pushed somewhere public where one could take a look? > > > I > > > Do you have the code

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Junio C Hamano
SZEDER Gábor writes: > There is certainly potential there. With a (very) rough PoC > experiment, an 8MB bloom filter, and a carefully chosen path I can > achieve a nice, almost 25x speedup: > > $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh > 6 > > real 0m1.563s > user

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee
On 10/8/2018 2:10 PM, SZEDER Gábor wrote: On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote: Nice! These numbers make sense to me, in terms of how many TREESAME queries we actually need to perform for such a query. Yeah... because you didn't notice that I deliberately cheated :)

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread SZEDER Gábor
On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote: > On 10/8/2018 12:41 PM, SZEDER Gábor wrote: > >On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote: > >>I'm still excited about the prospect of a bloom filter for paths which > >>each commit touches. I think that's the next big

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee
On 10/8/2018 12:41 PM, SZEDER Gábor wrote: On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote: I'm still excited about the prospect of a bloom filter for paths which each commit touches. I think that's the next big frontier in getting things like "git log -- path" to a reasonable

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote: > I'm still excited about the prospect of a bloom filter for paths which > each commit touches. I think that's the next big frontier in getting > things like "git log -- path" to a reasonable run-time. There is certainly potential there.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 10:01:31PM +0200, Ævar Arnfjörð Bjarmason wrote: > > There's unfortunately not a fast way of doing that. One option would be > > to keep a counter of "ungraphed commit objects", and have callers update > > it. Anybody admitting a pack via index-pack or unpack-objects can

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 04:00:12PM -0400, Derrick Stolee wrote: > On 10/5/2018 3:47 PM, Jeff King wrote: > > On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote: > > > > > > So can we really just take (total_objects - commit_graph_objects) and > > > > compare it to some threshold? > >

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason
On Fri, Oct 05 2018, Jeff King wrote: > On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote: > >> > So can we really just take (total_objects - commit_graph_objects) and >> > compare it to some threshold? >> >> The commit-graph only stores the number of _commits_, not total objects.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 3:47 PM, Jeff King wrote: On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote: So can we really just take (total_objects - commit_graph_objects) and compare it to some threshold? The commit-graph only stores the number of _commits_, not total objects. Oh, right, of

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote: > > So can we really just take (total_objects - commit_graph_objects) and > > compare it to some threshold? > > The commit-graph only stores the number of _commits_, not total objects. Oh, right, of course. That does throw a monkey

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 3:21 PM, Jeff King wrote: On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote: My misunderstanding was that your proposed change to gc computes the commit-graph in either of these two cases: (1) The auto-GC threshold is met. (2) There is no commit-graph file. And

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote: > My misunderstanding was that your proposed change to gc computes the > commit-graph in either of these two cases: > > (1) The auto-GC threshold is met. > > (2) There is no commit-graph file. > > And what I hope to have instead

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason
On Fri, Oct 05 2018, Derrick Stolee wrote: > On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote: >> On Fri, Oct 05 2018, Derrick Stolee wrote: >> >>> On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: I don't have time to polish this up for submission now, but here's a WIP patch

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote: On Fri, Oct 05 2018, Derrick Stolee wrote: On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: I don't have time to polish this up for submission now, but here's a WIP patch that implements this, highlights: * There's a

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason
On Fri, Oct 05 2018, Derrick Stolee wrote: > On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: >> I don't have time to polish this up for submission now, but here's a WIP >> patch that implements this, highlights: >> >> * There's a gc.clone.autoDetach=false default setting which overrides

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: I don't have time to polish this up for submission now, but here's a WIP patch that implements this, highlights: * There's a gc.clone.autoDetach=false default setting which overrides gc.autoDetach if 'git gc --auto' is run via git-clone

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Junio C Hamano
SZEDER Gábor writes: >> git-gc - Cleanup unnecessary files and optimize the local repository >> >> Creating these indexes like the commit-graph falls under "optimize the >> local repository", > > But it doesn't fall under "cleanup unnecessary files", which the > commit-graph file is, since,

[RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-04 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, Ævar Arnfjörð Bjarmason wrote: > Don't have time to patch this now, but thought I'd send a note / RFC > about this. > > Now that we have the commit graph it's nice to be able to set > e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or > /etc/gitconfig

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, Jeff King wrote: > On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote: > >> I share these concerns in a slightly more abstract way, as >> I would bucket the actions into two separate bins: >> >> One bin that throws away information. >> this would include removing

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote: > I share these concerns in a slightly more abstract way, as > I would bucket the actions into two separate bins: > > One bin that throws away information. > this would include removing expired reflog entries (which > I do not think

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 02:59:34PM -0400, Derrick Stolee wrote: > > They don't help yet, and there's no good reason to enable bitmaps for > > clients. I have a few patches that use bitmaps for things like > > ahead/behind and --contains checks, but the utility of those may be > > lessened quite a

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Stefan Beller
> > But you thought right, I do have an objection against that. 'git gc' > should, well, collect garbage. Any non-gc stuff is already violating > separation of concerns. I share these concerns in a slightly more abstract way, as I would bucket the actions into two separate bins: One bin that

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Derrick Stolee
On 10/3/2018 2:51 PM, Jeff King wrote: On Wed, Oct 03, 2018 at 08:47:11PM +0200, Ævar Arnfjörð Bjarmason wrote: On Wed, Oct 03 2018, Stefan Beller wrote: So we wouldn't be spending 5 minutes repacking linux.git right after cloning it, just ~10s generating the commit graph, and the same would

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 08:47:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Oct 03 2018, Stefan Beller wrote: > > >> So we wouldn't be spending 5 minutes repacking linux.git right after > >> cloning it, just ~10s generating the commit graph, and the same would > >> happen if you rm'd

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, Stefan Beller wrote: >> So we wouldn't be spending 5 minutes repacking linux.git right after >> cloning it, just ~10s generating the commit graph, and the same would >> happen if you rm'd .git/objects/info/commit-graph and ran "git commit", >> which would kick off "gc

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Stefan Beller
> So we wouldn't be spending 5 minutes repacking linux.git right after > cloning it, just ~10s generating the commit graph, and the same would > happen if you rm'd .git/objects/info/commit-graph and ran "git commit", > which would kick off "gc --auto" in the background and do the same thing. Or

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 05:19:41PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> >> So we should make "git gc --auto" be run on clone, > >> >> > > >> >> > There is no garbage after 'git clone'... > >> >> > >> >> "git gc" is really "git gc-or-create-indexes" these days. > >> > > >> > Because it

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Duy Nguyen
On Wed, Oct 3, 2018 at 3:23 PM Ævar Arnfjörð Bjarmason wrote: > > Don't have time to patch this now, but thought I'd send a note / RFC > about this. > > Now that we have the commit graph it's nice to be able to set > e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or >

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, SZEDER Gábor wrote: > On Wed, Oct 03, 2018 at 04:22:12PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Oct 03 2018, SZEDER Gábor wrote: >> >> > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> On Wed, Oct 03 2018, SZEDER Gábor

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 04:22:12PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Oct 03 2018, SZEDER Gábor wrote: > > > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> On Wed, Oct 03 2018, SZEDER Gábor wrote: > >> > >> > On Wed, Oct 03, 2018 at 03:23:57PM

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Duy Nguyen
On Wed, Oct 3, 2018 at 4:01 PM Ævar Arnfjörð Bjarmason wrote: > >> and change the > >> need_to_gc() / cmd_gc() behavior so that we detect that the > >> gc.writeCommitGraph=true setting is on, but we have no commit graph, and > >> then just generate that without doing a full repack. > > > > Or
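
The check proposed in the quoted text (the setting is on, but there is no commit graph yet, so generate only that and skip the full repack) might look something like the sketch below. It is an illustration of the decision only, not git's actual need_to_gc() logic; the function names are hypothetical, and the file location is taken from the .git/objects/info/commit-graph path mentioned elsewhere in the thread.

```python
import os


def commit_graph_path(git_dir: str) -> str:
    """Location of the commit-graph file inside a git directory."""
    return os.path.join(git_dir, "objects", "info", "commit-graph")


def should_write_commit_graph(write_enabled: bool, git_dir: str) -> bool:
    """Hypothetical check: the setting is on but no commit-graph file exists.

    `write_enabled` stands in for the gc.writeCommitGraph=true setting;
    reading the real configuration is left out of this sketch.
    """
    return write_enabled and not os.path.exists(commit_graph_path(git_dir))


# A caller could then generate only the graph, for example by running
# `git commit-graph write --reachable`, instead of a full repack.
```

Keeping the decision separate from the repack thresholds means a fresh clone would pay only for graph generation, which is the behavior the original proposal asks for.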

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, SZEDER Gábor wrote: > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Oct 03 2018, SZEDER Gábor wrote: >> >> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> Don't have time to patch this now, but

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, Derrick Stolee wrote: > On 10/3/2018 9:36 AM, SZEDER Gábor wrote: >> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: >>> Don't have time to patch this now, but thought I'd send a note / RFC >>> about this. >>> >>> Now that we have the commit graph

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Oct 03 2018, SZEDER Gábor wrote: > > > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Don't have time to patch this now, but thought I'd send a note / RFC > >> about this. > >> >

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 03 2018, SZEDER Gábor wrote: > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: >> Don't have time to patch this now, but thought I'd send a note / RFC >> about this. >> >> Now that we have the commit graph it's nice to be able to set >> e.g.

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Derrick Stolee
On 10/3/2018 9:36 AM, SZEDER Gábor wrote: On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: Don't have time to patch this now, but thought I'd send a note / RFC about this. Now that we have the commit graph it's nice to be able to set e.g. core.commitGraph=true &

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: > Don't have time to patch this now, but thought I'd send a note / RFC > about this. > > Now that we have the commit graph it's nice to be able to set > e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig

We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason
Don't have time to patch this now, but thought I'd send a note / RFC about this. Now that we have the commit graph it's nice to be able to set e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or /etc/gitconfig to apply them to all repos. But when I clone e.g. linux.git stuff