Re: [RFC] Generation Number v2

2018-11-03 Thread Jakub Narebski
Jakub Narebski writes: > Jakub Narebski writes: >> Stefan Beller writes: > [...] >>> How would this impact creation of a commit? >>> >>> The current generation numbers can be lazily updated or not >>> updated at all. In my understanding of the ma

Re: [RFC] Generation Number v2

2018-11-03 Thread Jakub Narebski
Derrick Stolee writes: > On 11/1/2018 8:27 AM, Jakub Narebski wrote: >> Derrick Stolee writes: >> >>> Please also let me know about any additional tests that I could >>> run. Now that I've got a lot of test scripts built up, I can re-run >>> the te

Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Derrick Stolee writes: > On 10/31/2018 8:54 AM, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Oct 30 2018, Junio C Hamano wrote: >>> Derrick Stolee writes: In contrast, maximum generation numbers and corrected commit dates both performed quite well. They are frequently the top

Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Derrick Stolee writes: > On 10/29/2018 11:59 PM, Junio C Hamano wrote: >> Derrick Stolee writes: [...] >>> * **Compatible?** In our test implementation, we use a previously unused >>>byte of data in the commit-graph format to indicate which reachability >>>index version we are using.

Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Jakub Narebski writes: > Stefan Beller writes: [...] >> How would this impact creation of a commit? >> >> The current generation numbers can be lazily updated or not >> updated at all. In my understanding of the maximum generation >> numbers, a new commit wo

Re: [RFC] Generation Number v2

2018-11-01 Thread Jakub Narebski
Derrick Stolee writes: > Here is a re-formatted version of the tables I introduced earlier. > The tables were too wide for public-inbox to render correctly (when > paired with my email client). Hopefully this bulleted-list format > works better. Thanks, Stefan, for pointing out the rendering >

Re: [RFC] Generation Number v2

2018-11-01 Thread Jakub Narebski
Stefan Beller writes: >> Based on the performance results alone, we should remove minimum >> generation numbers, (epoch, date) pairs, and FELINE index from >> consideration. There are enough examples of these indexes performing >> poorly. >> >> In contrast, maximum generation numbers and

Re: [RFC] Generation Number v2

2018-11-01 Thread Jakub Narebski
[I have noticed that in some places I wrote A..B instead of B..A. Sorry about that] Derrick Stolee writes: > We've discussed in several places how to improve upon generation > numbers. This RFC is a report based on my investigation into a > few new options, and how they compare for Git's

Re: [PATCH v3 10/12] Add a base implementation of SHA-256 support

2018-10-27 Thread Jakub Narebski
"brian m. carlson" writes: > SHA-1 is weak and we need to transition to a new hash function. For > some time, we have referred to this new function as NewHash. Recently, > we decided to pick SHA-256 as NewHash. Even if we have decided to not repeat the reasoning behind the need to switch away

Re: [PATCH] Poison gettext with the Ook language

2018-10-27 Thread Jakub Narebski
SZEDER Gábor writes: > On Mon, Oct 22, 2018 at 05:36:33PM +0200, Nguyễn Thái Ngọc Duy wrote: >> >> The current gettext() function just replaces all strings with >> '# GETTEXT POISON #' including format strings and hides the things >> that we should be allowed to grep (like branch names, or some

Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-26 Thread Jakub Narebski
Junio C Hamano writes: > Derrick Stolee writes: > >>     time git log --topo-order -10 master >/dev/null >> >>     time git log --topo-order -10 maint..master >/dev/null >> >> I get 0.39s for the first call and 0.01s for the second. (Note: I >> specified "-10" to ensure we are only writing 10

Re: [PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-26 Thread Jakub Narebski
Derrick Stolee writes: > On 10/22/2018 9:37 AM, Jakub Narebski wrote: >> "Derrick Stolee via GitGitGadget" writes: [...] >> Sidenote: the new algorithm looks a bit like Unix pipeline, where each >> step of pipeline does not output much more than next step

Re: [PATCH v4 7/7] t6012: make rev-list tests more interesting

2018-10-23 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > As we are working to rewrite some of the revision-walk machinery, > there could easily be some interesting interactions between the > options that force topological constraints (--topo-order, > --date-order, and

Re: [PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-22 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > The current --topo-order algorithm requires walking all > reachable commits up front, topo-sorting them, all before > outputting the first value. This patch introduces a new > algorithm which uses stored generation numbers to

Re: [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring

2018-10-21 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > There are a few things that need to move around a little before > making a big refactoring in the topo-order logic: > > 1. We need access to record_author_date() and >compare_commits_by_author_date() in revision.c. These

Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-21 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > When running 'git rev-list --topo-order' and its kin, the topo_order > setting in struct rev_info implies the limited setting. This means > that the following things happen during prepare_revision_walk(): > > * revs->limited

Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-21 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > This patch series performs a decently-sized refactoring of the revision-walk > machinery. Well, "refactoring" is probably the wrong word, as I don't > actually remove the old code. Instead, when we see certain options in the > 'rev_info' struct, we

Re: [PATCH v4 3/7] test-reach: add rev-list tests

2018-10-21 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > The rev-list command is critical to Git's functionality. Ensure it > works in the three commit-graph environments constructed in > t6600-test-reach.sh. Here are a few important types of rev-list > operations: > > * Basic: git

Re: [PATCH v3 7/7] revision.c: refactor basic topo-order logic

2018-10-06 Thread Jakub Narebski
Derrick Stolee writes: > On 9/21/2018 1:39 PM, Derrick Stolee via GitGitGadget wrote: > Hello, Git contributors. > > I understand that this commit message and patch are pretty > daunting. There is a lot to read and digest. I would like to see if > anyone is willing to put the work in to review

Re: [PATCH 1/1] commit: don't use generation numbers if not needed

2018-10-06 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > In 3afc679b "commit: use generations in paint_down_to_common()", > the queue in paint_down_to_common() was changed to use a priority > order based on generation number before commit date. This served > two purposes: > > 1.

Re: [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base'

2018-10-06 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > As I was testing the release candidate, I stumbled across a regression in > 'git merge-base' as a result of the switch to generation numbers. The commit > message in [PATCH 1/1] describes the topology involved, but you can test it > yourself by

Re: Git Evolve

2018-10-04 Thread Jakub Narebski
Junio C Hamano writes: > Stefan Xenos writes: > >> What is the evolve command? >> ... >> - Systems like gerrit would no longer need to rely on "change-id" tags >> in commit comments to associate commits with the change that they >> edit, since git itself would have that information. >> ... >> Is

Re: What's cooking in git.git (Aug 2018, #01; Thu, 2)

2018-08-07 Thread Jakub Narebski
Junio C Hamano writes: > * ds/commit-graph-with-grafts (2018-07-19) 8 commits > (merged to 'next' on 2018-08-02 at 0ee624e329) > + commit-graph: close_commit_graph before shallow walk > + commit-graph: not compatible with uninitialized repo > + commit-graph: not compatible with grafts > +

Re: [PATCH] DO-NOT-MERGE: write and read commit-graph always

2018-07-31 Thread Jakub Narebski
Stefan Beller writes: >> I wonder though if all those changes to the testsuite shouldn't be >> merged. > > I think Stolee doesn't want this to be merged after rereading > subject and the commit message. Yes, I understand that, and for the most part I agree with it. This commit main purpose is

Re: [PATCH 8/8] commit-graph: close_commit_graph before shallow walk

2018-07-30 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > Make close_commit_graph() work for arbitrary repositories. The first line (above) does not match the rest of the commit message. > Call close_commit_graph() when about to start a rev-list walk that > includes shallow

Re: [PATCH] DO-NOT-MERGE: write and read commit-graph always

2018-07-30 Thread Jakub Narebski
Derrick Stolee writes: > This commit is not intended to be merged, but is only to create a test > environment to see what works with the commit-graph feature and what > does not. The tests that fail are primarily related to corrupting the > object store to remove a commit from visibility and

Re: [PATCH 7/8] commit-graph: not compatible with uninitialized repo

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: [I hope that the rest of replies would make it all right through GitGitGadget] > From: Derrick Stolee > > Signed-off-by: Derrick Stolee > --- > commit-graph.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/commit-graph.c b/commit-graph.c

Re: [PATCH 6/8] commit-graph: not compatible with grafts

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > Augment commit_graph_compatible(r) to return false when the given > repository r has commit grafts or is a shallow clone. Test that in these > situations we ignore existing commit-graph files and we do not write new >

Re: [PATCH 4/8] test-repository: properly init repo

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > Signed-off-by: Derrick Stolee > --- > t/helper/test-repository.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c > index

Re: [PATCH 5/8] commit-graph: not compatible with replace objects

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > Create new method commit_graph_compatible(r) to check if a given > repository r is compatible with the commit-graph feature. Fill the > method with a check to see if replace-objects exist. All right, looks sensible. Did you

Re: [PATCH 3/8] commit-graph: update design document

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: > From: Derrick Stolee > > As it exists right now, the commit-graph feature may provide > inconsistent results when combined with commit grafts, replace objects, > and shallow clones. Update the design document to discuss why these > interactions are

Re: [PATCH 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-07-29 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget" writes: Sidenote: the GitGitGadget doesn't fold/wrap lines properly in the cover-letter. Not that it is much of a problem. > One unresolved issue with the commit-graph feature is that it can > cause issues when combined with replace objects, commit grafts, or

Re: [RFC PATCH 3/6] commit-graph: enable replace-object and grafts

2018-06-09 Thread Jakub Narebski
Derrick Stolee writes: First, this commit seems to try to do two things at once, and in its current form it should be split into adding destroy_commit_graph() and into grafts / replacements check. Those are separate and unconnected features, and should be in separate patches, in my opinion. >

Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-06-08 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file stores a condensed version of the commit history. > This helps speed up several operations involving commit walks. This > feature does not work well if those commits "change" using features like > commit grafts, replace objects, or shallow clones.

Re: [PATCH v3 11/20] commit-graph: verify root tree OIDs

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > On 5/30/2018 6:24 PM, Jakub Narebski wrote: [...] >> NOTE: we will be checking Commit Data chunk; I think it would be good >> idea to verify that size of Commit Data chunk matches (N * (H + 16) bytes) >> that format gives us, so that we don't acc

Re: [PATCH v3 07/20] commit-graph: verify catches corrupt signature

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > On 5/28/2018 10:05 AM, Jakub Narebski wrote: >> Derrick Stolee writes: [...] >>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh >>> index 6ca451dfd2..bd64481c7a 100755 >>> --- a/t/t5318-commit-graph.sh >>>

Re: [PATCH v3 06/20] commit-graph: add 'verify' subcommand

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > On 5/27/2018 6:55 PM, Jakub Narebski wrote: >> Derrick Stolee writes: [...] >>> +static int verify_commit_graph_error; >>> + >>> +static void graph_report(const char *fmt, ...) >>> +{ >>>

Re: [PATCH v3 02/20] commit-graph: fix GRAPH_MIN_SIZE

2018-06-02 Thread Jakub Narebski
"brian m. carlson" writes: > On Sat, May 26, 2018 at 08:46:09PM +0200, Jakub Narebski wrote: >> One issue: in the future when Git moves to NewHash, it could encounter >> then both commit-graph files using SHA-1 and using NewHash. What about >> GRPH_OID_LEN

Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > On 5/31/2018 10:30 PM, Junio C Hamano wrote: >> Derrick Stolee writes: >> >>> Shallow clones do not interact well with the commit-graph feature for >>> several reasons. Instead of doing the hard thing to fix those >>> interactions, instead prevent reading or writing a

Re: [PATCH v3 20/20] commit-graph: update design document

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph feature is now integrated with 'fsck' and 'gc', > so remove those items from the "Future Work" section of the > commit-graph design document. It is always nice to have such commit as a summary what was done in the series, and to have up to date roadmap.

Re: [PATCH v3 19/20] gc: automatically write commit-graph files

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file is a very helpful feature for speeding up git > operations. In order to make it more useful, write the commit-graph file > by default during standard garbage collection operations. I think you meant here "make it possible to write the commit-graph

Re: [PATCH v3 18/20] commit-graph: add '--reachable' option

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > When writing commit-graph files, it can be convenient to ask for all > reachable commits (starting at the ref set) in the resulting file. This > is particularly helpful when writing to stdin is complicated, such as a > future integration with 'git gc' which will call >

Re: [PATCH v3 17/20] fsck: verify commit-graph

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > If core.commitGraph is true, verify the contents of the commit-graph > during 'git fsck' using the 'git commit-graph verify' subcommand. Run > this check on all alternates, as well. All right, so we have one config variable to control the use of serialized commit-graph

Re: [PATCH v3 16/20] commit-graph: verify contents match checksum

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file ends with a SHA1 hash of the previous contents. If > a commit-graph file has errors but the checksum hash is correct, then we > know that the problem is a bug in Git and not simply file corruption > after-the-fact. > > Compute the checksum right

Re: [PATCH v3 15/20] commit-graph: test for corrupted octopus edge

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file has an extra chunk to store the parent int-ids for > parents beyond the first parent for octopus merges. Our test repo has a > single octopus merge that we can manipulate to demonstrate the 'verify' > subcommand detects incorrect values in that

Re: [PATCH v3 14/20] commit-graph: verify commit date

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: Nice and simple. The only possible question may be the ordering of patches in the series, namely whether this change should be before or after test checking generation numbers. > Signed-off-by: Derrick Stolee > --- > commit-graph.c | 6 ++ >

Re: [PATCH v3 13/20] commit-graph: verify generation number

2018-06-02 Thread Jakub Narebski
Derrick Stolee writes: > While iterating through the commit parents, perform the generation > number calculation and compare against the value stored in the > commit-graph. All right, that's good. What about commit-graph files that have GENERATION_NUMBER_ZERO for all its commits (because we

Re: [PATCH v3 12/20] commit-graph: verify parent list

2018-06-01 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file stores parents in a two-column portion of the > commit data chunk. If there is only one parent, then the second column > stores 0x to indicate no second parent. All right, it is certainly nice to have this information, but isn't it

Re: [PATCH v3 11/20] commit-graph: verify root tree OIDs

2018-05-30 Thread Jakub Narebski
Derrick Stolee writes: > The 'verify' subcommand must compare the commit content parsed from the > commit-graph and compare it against the content in the object database. You have "compare" twice in the above sentence. > Use lookup_commit() and parse_commit_in_graph_one() to parse the commits

Re: [PATCH v3 10/20] commit-graph: verify objects exist

2018-05-30 Thread Jakub Narebski
Derrick Stolee writes: > In the 'verify' subcommand, load commits directly from the object > database to ensure they exist. Parse by skipping the commit-graph. All right, before we check that the commit data matches, we need to check that all the commits in cache (in the serialized commit

Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-05-30 Thread Jakub Narebski
Derrick Stolee writes: > In the commit-graph file, the OID fanout chunk provides an index into > the OID lookup. The 'verify' subcommand should find incorrect values > in the fanout. > > Similarly, the 'verify' subcommand should find out-of-order values in > the OID lookup. O.K., the OID Lookup

Re: [PATCH v3 08/20] commit-graph: verify required chunks are present

2018-05-28 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file requires the following three chunks: > > * OID Fanout > * OID Lookup > * Commit Data > > If any of these are missing, then the 'verify' subcommand should > report a failure. This includes the chunk IDs malformed or the > chunk count is truncated.

Re: [PATCH v3 07/20] commit-graph: verify catches corrupt signature

2018-05-28 Thread Jakub Narebski
Derrick Stolee writes: > This is the first of several commits that add a test to check that > 'git commit-graph verify' catches corruption in the commit-graph > file. The first test checks that the command catches an error in > the file signature. This is a check that

Re: [PATCH v3 06/20] commit-graph: add 'verify' subcommand

2018-05-27 Thread Jakub Narebski
Derrick Stolee writes: > If the commit-graph file becomes corrupt, we need a way to verify > that its contents match the object database. In the manner of > 'git fsck' we will implement a 'git commit-graph verify' subcommand > to report all issues with the file. > > Add

Re: [PATCH v3 05/20] commit-graph: load a root tree from specific graph

2018-05-27 Thread Jakub Narebski
Derrick Stolee writes: > When lazy-loading a tree for a commit, it will be important to select > the tree from a specific struct commit_graph. Create a new method that > specifies the commit-graph file and use that in > get_commit_tree_in_graph(). Is this for the same

Re: [PATCH v3 04/20] commit: force commit to parse from object database

2018-05-27 Thread Jakub Narebski
Derrick Stolee writes: > In anticipation of verifying commit-graph file contents against the > object database, create parse_commit_internal() to allow side-stepping > the commit-graph file and parse directly from the object database. > > Due to the use of generation

Re: [PATCH v3 03/20] commit-graph: parse commit from chosen graph

2018-05-27 Thread Jakub Narebski
Derrick Stolee writes: > Before verifying a commit-graph file against the object database, we > need to parse all commits from the given commit-graph file. Create > parse_commit_in_graph_one() to target a given struct commit_graph. If I understand it properly the problem

Re: [PATCH v3 02/20] commit-graph: fix GRAPH_MIN_SIZE

2018-05-26 Thread Jakub Narebski
Derrick Stolee writes: > The GRAPH_MIN_SIZE macro should be the smallest size of a parsable > commit-graph file. However, the minimum number of chunks was wrong. > It is possible to write a commit-graph file with zero commits, and > that violates this macro's value. > >

Re: RFC: New reference iteration paradigm

2018-05-26 Thread Jakub Narebski
Jeff King writes: > On Thu, Mar 31, 2016 at 11:01:44AM -0700, Junio C Hamano wrote: >> Michael Haggerty writes: >> >>> the backend now has to implement >>> struct ref_iterator *ref_iterator_begin_fn(const char *submodule,

Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-24 Thread Jakub Narebski
Derrick Stolee writes: > On 5/22/2018 1:39 AM, Michael Haggerty wrote: >> On 05/21/2018 08:10 PM, Derrick Stolee wrote: >>> [...] >>> In the Discussion section of the `git merge-base` docs [1], we have the >>> following: >>> >>>     When the history involves criss-cross merges,

Re: [PATCH v2 03/12] commit-graph: test that 'verify' finds corruption

2018-05-21 Thread Jakub Narebski
Derrick Stolee writes: > Add test cases to t5318-commit-graph.sh that corrupt the commit-graph > file and check that the 'git commit-graph verify' command fails. These > tests verify the header and chunk information is checked carefully. > > Helped-by: Martin Ågren

Re: [PATCH v2 02/12] commit-graph: verify file header information

2018-05-20 Thread Jakub Narebski
Derrick Stolee writes: > During a run of 'git commit-graph verify', list the issues with the > header information in the commit-graph file. Some of this information > is inferred from the loaded 'struct commit_graph'. Some header > information is checked as part of

Re: [PATCH v2 01/12] commit-graph: add 'verify' subcommand

2018-05-20 Thread Jakub Narebski
Derrick Stolee writes: > If the commit-graph file becomes corrupt, we need a way to verify > that its contents match the object database. In the manner of > 'git fsck' we will implement a 'git commit-graph verify' subcommand > to report all issues with the file. > > Add

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-14 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 5/12/2018 10:00 AM, Jakub Narebski wrote: >> Derrick Stolee <sto...@gmail.com> writes: >>> On 5/4/2018 3:40 PM, Jakub Narebski wrote: >>>> >>>> With early parts of commit-graph feature (ds/co

Re: [PATCH 00/12] Die commit->util, die!

2018-05-12 Thread Jakub Narebski
Nguyễn Thái Ngọc Duy writes: > There's not much to write here. It's basically a copy from 12/12: > > This 'util' pointer can be used for many different purposes, > controlled in different ways. Some are not even contained in a command > code, but buried deep in common code

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-12 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 5/4/2018 3:40 PM, Jakub Narebski wrote: >> >> With early parts of commit-graph feature (ds/commit-graph and >> ds/lazy-load-trees) close to being merged into "master", see >> https://public-inbox.org/git

[RFC] Other chunks for commit-graph, part 2 - reachability indexes

2018-05-06 Thread Jakub Narebski
Hello, In previous email I wrote: JN> As this post is getting long, I'll post other ideas, about commit JN> labeling for faster reachability queries in a separate email. This is this email. Here is second part of series dedicated to discussing what other data, like various reachability

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-05 Thread Jakub Narebski
Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes: > On Fri, May 04 2018, Jakub Narebski wrote: > > (Just off-the cuff here and I'm surely about to be corrected by > Derrick...) > >> * What to do about merge commits, and octopus merges in particular? >> Shoul

[RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-04 Thread Jakub Narebski
Hello, With early parts of commit-graph feature (ds/commit-graph and ds/lazy-load-trees) close to being merged into "master", see https://public-inbox.org/git/xmqq4ljtz87g@gitster-ct.c.googlers.com/ I think it would be good idea to think what other data could be added there to make Git even

Re: [PATCH v5 09/11] commit: use generation number in remove_redundant()

2018-05-03 Thread Jakub Narebski
me is spent walking the history to find the > set of merge bases before the remove_redundant() call. The > starting commits are closer together in the second example, > therefore more time is spent in remove_redundant(). The relative > change in performance differs as expected. > > Rep

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Jakub Narebski
Junio C Hamano writes: > Shin Kojima writes: > >> Offset positions should not be counted by byte length, but by actual >> character length. >> ... >> # escape tabs (convert tabs to spaces) >> sub untabify { >> -my $line = shift; >> +my $line =

Re: [PATCH v5 00/11] Compute and consume generation numbers

2018-05-03 Thread Jakub Narebski
Derrick Stolee writes: > Most of the changes from v4 are cosmetic, but there is one new commit: > > commit: use generation number in remove_redundant() > > Other changes are non-functional, but do clarify things. I wonder if out perf framework in t/perf could help

Re: [PATCH v4 03/10] commit-graph: compute generation numbers

2018-05-02 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/29/2018 5:08 AM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: [...] >> It is a bit strange to me that the code uses get_be32 for reading, but >> htonl for writing. Is Git tested on non li

Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()

2018-05-02 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/30/2018 6:19 PM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: [...] >>> @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct >>> commit *one, int n, struc >

Re: [PATCH v4 09/10] merge: check config before loading commits

2018-05-02 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/30/2018 6:54 PM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: >> >>> Now that we use generation numbers from the commit-graph, we must >>> ensure that all commits that

Re: [PATCH v4 10/10] commit-graph.txt: update design document

2018-05-02 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/30/2018 7:32 PM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: [...] >>> - After computing and storing generation numbers, we must make graph >>> walks aware of generation n

Re: [PATCH v4 10/10] commit-graph.txt: update design document

2018-04-30 Thread Jakub Narebski
Derrick Stolee writes: > We now calculate generation numbers in the commit-graph file and use > them in paint_down_to_common(). > > Expand the section on generation numbers to discuss how the three > special generation numbers GENERATION_NUMBER_INFINITY, _ZERO, and > _MAX

Re: [PATCH v4 09/10] merge: check config before loading commits

2018-04-30 Thread Jakub Narebski
Derrick Stolee writes: > Now that we use generation numbers from the commit-graph, we must > ensure that all commits that exist in the commit-graph are loaded > from that file instead of from the object database. Since the > commit-graph file is only checked if

Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()

2018-04-30 Thread Jakub Narebski
Derrick Stolee writes: > When running 'git branch --contains', the in_merge_bases_many() > method calls paint_down_to_common() to discover if a specific > commit is reachable from a set of branches. Commits with lower > generation number are not needed to correctly answer

Re: [PATCH v4 07/10] commit: use generation numbers for in_merge_bases()

2018-04-30 Thread Jakub Narebski
Derrick Stolee writes: > The containment algorithm for 'git branch --contains' is different > from that for 'git tag --contains' in that it uses is_descendant_of() > instead of contains_tag_algo(). The expensive portion of the branch > algorithm is computing merge bases. >

Re: [PATCH v4 06/10] ref-filter: use generation number for --contains

2018-04-30 Thread Jakub Narebski
Derrick Stolee writes: > A commit A can reach a commit B only if the generation number of A > is strictly larger than the generation number of B. This condition > allows significantly short-circuiting commit-graph walks. > > Use generation number for '--contains' type

Re: [PATCH v4 05/10] commit-graph: always load commit-graph information

2018-04-29 Thread Jakub Narebski
[Forgot about one thing] Derrick Stolee writes: > Create new load_commit_graph_info() method to fill in the information > for a commit that exists only in the commit-graph file. The above sentence is a bit hard to parse because of ambiguity: is it "the information" that

Re: [PATCH v4 05/10] commit-graph: always load commit-graph information

2018-04-29 Thread Jakub Narebski
Derrick Stolee writes: > Most code paths load commits using lookup_commit() and then > parse_commit(). And this automatically loads commit graph if needed, thanks to changes in parse_commit_gently(), which parse_commit() uses. > In some cases, including

Re: [PATCH v4 04/10] commit: use generations in paint_down_to_common()

2018-04-29 Thread Jakub Narebski
Derrick Stolee writes: > Define compare_commits_by_gen_then_commit_date(), which uses generation > numbers as a primary comparison and commit date to break ties (or as a > comparison when both commits do not have computed generation numbers). All right, this looks

Re: [PATCH v4 03/10] commit-graph: compute generation numbers

2018-04-29 Thread Jakub Narebski
Derrick Stolee writes: > While preparing commits to be written into a commit-graph file, compute > the generation numbers using a depth-first strategy. Sidenote: for generation numbers it does not matter if we use depth-first or breadth-first strategy, but it is more

Re: [PATCH v4 02/10] commit: add generation number to struct commmit

2018-04-28 Thread Jakub Narebski
Derrick Stolee writes: > The generation number of a commit is defined recursively as follows: > > * If a commit A has no parents, then the generation number of A is one. > * If a commit A has parents, then the generation number of A is one > more than the maximum

Re: [PATCH v4 01/10] ref-filter: fix outdated comment on in_commit_list

2018-04-28 Thread Jakub Narebski
Derrick Stolee <dsto...@microsoft.com> writes: > The in_commit_list() method does not check the parents of > the candidate for containment in the list. Fix the comment > that incorrectly states that it does. > > Reported-by: Jakub Narebski <jna...@gmail.com> > Signe

Re: [PATCH v4 00/10] Compute and consume generation numbers

2018-04-28 Thread Jakub Narebski
Derrick Stolee writes: > As promised, here is the diff from v3. What is this strange string "   " in place of tabs in the interdiff? " " here is Unicode Character 'NO-BREAK SPACE' (U+00A0). Though it doesn't matter for viewing, my newsreader (Gnus from GNU Emacs) thinks

Re: [PATCH v4 04/10] commit: use generations in paint_down_to_common()

2018-04-28 Thread Jakub Narebski
Jakub Narebski <jna...@gmail.com> writes: > Junio C Hamano <gits...@pobox.com> writes: >> Derrick Stolee <dsto...@microsoft.com> writes: [...] >>> +int compare_commits_by_gen_then_commit_date(const void *a_, const void >>> *b_, void *unused) >&g

Re: [PATCH v4 04/10] commit: use generations in paint_down_to_common()

2018-04-26 Thread Jakub Narebski
Junio C Hamano writes: > Derrick Stolee writes: > >> Define compare_commits_by_gen_then_commit_date(), which uses generation >> numbers as a primary comparison and commit date to break ties (or as a >> comparison when both commits do not have computed

Re: [PATCH v3 5/9] ref-filter: use generation number for --contains

2018-04-24 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/18/2018 5:02 PM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: >> >>> A commit A can reach a commit B only if the generation number of A >>> is larger than the generation number

Re: [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common()

2018-04-23 Thread Jakub Narebski
Derrick Stolee <sto...@gmail.com> writes: > On 4/18/2018 7:19 PM, Jakub Narebski wrote: >> Derrick Stolee <dsto...@microsoft.com> writes: >> > [...] >>> [...], and this saves time during 'git branch --contains' queries >>> that would othe

Re: [PATCH 0/6] Compute and consume generation numbers

2018-04-21 Thread Jakub Narebski
Jakub Narebski <jna...@gmail.com> writes: > Derrick Stolee <sto...@gmail.com> writes: >> On 4/11/2018 3:32 PM, Jakub Narebski wrote: > >>> What would you suggest as a good test that could imply performance? The >>> Google Colab notebook linked to abo

Re: [RFC PATCH 12/12] commit-graph: update design document

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph feature is now integrated with 'fsck' and 'gc', > so remove those items from the "Future Work" section of the > commit-graph design document. See comments below, but this looks good to me. What is missing from the "Future Work"

Re: [RFC PATCH 11/12] gc: automatically write commit-graph files

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > The commit-graph file is a very helpful feature for speeding up git > operations. In order to make it more useful, write the commit-graph file > by default during standard garbage collection operations. > > Add a 'gc.commitGraph' config setting

Re: [RFC PATCH 10/12] commit-graph: add '--reachable' option

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > When writing commit-graph files, it can be convenient to ask for all > reachable commits (starting at the ref set) in the resulting file. This > is particularly helpful when writing to stdin is complicated, such as a > future integration with 'git

Re: [RFC PATCH 09/12] fsck: check commit-graph

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > If a commit-graph file exists, check its contents during 'git fsck'. Is it "if a commit-graph file exists", or is it core.commitGraph feature is turned on? > > Signed-off-by: Derrick Stolee > --- > builtin/fsck.c | 13

Re: [RFC PATCH 08/12] commit-graph: verify commit contents against odb

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: One more check that could been done, and which do not require accessing the object database, would be testing correctness of the Large Edge List (EDGE) chunk. For each commit in the commit-graph (in the Commit Data (CDAT) chunk), check if it has

Re: [RFC PATCH 07/12] commit-graph: load a root tree from specific graph

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > When lazy-loading a tree for a commit, it will be important to select > the tree from a specific struct commit_graph. Create a new method that > specifies the commit-graph file and use that in > get_commit_tree_in_graph(). > > Signed-off-by:

Re: [RFC PATCH 06/12] commit: force commit to parse from object database

2018-04-20 Thread Jakub Narebski
Derrick Stolee writes: > In anticipation of checking commit-graph file contents against the > object database, create parse_commit_internal() to allow side-stepping > the commit-graph file and parse directly from the object database. Nitpick/Bikeshed painting: do we have

  1   2   >