[PATCH v2 5/6] pack-objects: create pack.useSparse setting

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory. The sparse algorithm is not recommended for a server, which likely

[PATCH v2 0/6] Add a new "sparse" tree walk algorithm

2018-11-29 Thread Derrick Stolee via GitGitGadget
One of the biggest remaining pain points for users of very large repositories is the time it takes to run 'git push'. We inspected some slow pushes by our developers and found that the "Enumerating Objects" phase of a push was very slow. This is unsurprising, because this is why reachability

[PATCH v2 1/6] revision: add mark_tree_uninteresting_sparse

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to

[PATCH v2 2/6] list-objects: consume sparse tree walk

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new

[PATCH v2 6/6] pack-objects: create GIT_TEST_PACK_SPARSE

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Create a test variable GIT_TEST_PACK_SPARSE to enable the sparse object walk algorithm by default during the test suite. Enabling this variable ensures coverage in many interesting cases, such as shallow clones, partial clones, and missing objects. Signed-off-by: Derrick

[PATCH v2 4/6] revision: implement sparse algorithm

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When enumerating objects to place in a pack-file during 'git pack-objects --revs', we discover the "frontier" of commits that we care about and the boundary with commit we find uninteresting. From that point, we walk trees to discover which trees and blobs are uninteresting.

[PATCH v2 3/6] pack-objects: add --sparse option

2018-11-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Add a '--sparse' option flag to the pack-objects builtin. This allows the user to specify that they want to use the new logic for walking trees. This logic currently does not differ from the existing output, but will in a later change. Create a new test script,

[PATCH 4/5] revision: implement sparse algorithm

2018-11-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When enumerating objects to place in a pack-file during 'git pack-objects --revs', we discover the "frontier" of commits that we care about and the boundary with commit we find uninteresting. From that point, we walk trees to discover which trees and blobs are uninteresting.

[PATCH 0/5] Add a new "sparse" tree walk algorithm

2018-11-28 Thread Derrick Stolee via GitGitGadget
One of the biggest remaining pain points for users of very large repositories is the time it takes to run 'git push'. We inspected some slow pushes by our developers and found that the "Enumerating Objects" phase of a push was very slow. This is unsurprising, because this is why reachability

[PATCH 3/5] pack-objects: add --sparse option

2018-11-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Add a '--sparse' option flag to the pack-objects builtin. This allows the user to specify that they want to use the new logic for walking trees. This logic currently does not differ from the existing output, but will in a later change. Create a new test script,

[PATCH 5/5] pack-objects: create pack.useSparse setting

2018-11-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory. The sparse algorithm is not recommended for a server, which likely

[PATCH 2/5] list-objects: consume sparse tree walk

2018-11-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new

[PATCH 1/5] revision: add mark_tree_uninteresting_sparse

2018-11-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to

[PATCH 1/1] revision.c: use new topo-order logic in tests

2018-11-19 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The revision-walk machinery is being rewritten to use generation numbers in the commit-graph when availble. Due to some problematic commit histories, the new logic can be slower than the previous method due to how commit dates and generation numbers interact. Thus, the new

[PATCH 0/1] Use new topo-order logic with GIT_TEST_COMMIT_GRAPH

2018-11-19 Thread Derrick Stolee via GitGitGadget
The recent Git test report for v2.20.0-rc0 shows that the logic around UNINTERESTING commits is not covered by the test suite. This is because the code is currently unreachable! See the commit message for details. An alternate approach would be to delete the code around UNINTERESTING commits, but

[PATCH v2 1/1] pack-objects: ignore ambiguous object warnings

2018-11-06 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee A git push process runs several processes during its run, but one includes git send-pack which calls git pack-objects and passes the known have/wants into stdin using object ids. However, the default setting for core.warnAmbiguousRefs requires git pack-objects to check for

[PATCH v2 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
I've been looking into the performance of git push for very large repos. Our users are reporting that 60-80% of git push time is spent during the "Enumerating objects" phase of git pack-objects. A git push process runs several processes during its run, but one includes git send-pack which calls

[PATCH 1/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee During a 'git push' command, we run 'git send-pack' inside of our transport helper. This creates a 'git pack-objects' process and passes a list of object ids. If this list is large, then the pack-objects process can spend a lot of time checking the possible refs these

[PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
I've been looking into the performance of git push for very large repos. Our users are reporting that 60-80% of git push time is spent during the "Enumerating objects" phase of git pack-objects. A git push process runs several processes during its run, but one includes git send-pack which calls

[PATCH v2 0/3] Make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has quadratic performance. Some of that performance is curbed due to the generation-number cutoff in in_merge_bases_many(). However, that fix doesn't help users without a commit-graph, and it can still be painful if that cutoff is

[PATCH v2 1/3] commit-reach: implement get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least

[PATCH v2 3/3] remote: make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The add_missing_tags() method currently has quadratic behavior. This is due to a linear number (based on number of tags T) of calls to in_merge_bases_many, which has linear performance (based on number of commits C in the repository). Replace this O(T * C) algorithm with an

[PATCH v2 2/3] test-reach: test get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The get_reachable_subset() method returns the list of commits in the 'to' array that are reachable from at least one commit in the 'from' array. Add tests that check this method works in a few cases: 1. All commits in the 'to' list are reachable. This exercises the

[PATCH 2/3] test-reach: test get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The get_reachable_subset() method returns the list of commits in the 'to' array that are reachable from at least one commit in the 'from' array. Add tests that check this method works in a few cases: 1. All commits in the 'to' list are reachable. This exercises the

[PATCH 3/3] remote: make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The add_missing_tags() method currently has quadratic behavior. This is due to a linear number (based on number of tags T) of calls to in_merge_bases_many, which has linear performance (based on number of commits C in the repository). Replace this O(T * C) algorithm with an

[PATCH 0/3] Make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has quadratic performance. Some of that performance is curbed due to the generation-number cutoff in in_merge_bases_many(). However, that fix doesn't help users without a commit-graph, and it can still be painful if that cutoff is

[PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least

[PATCH 1/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The algorithm in can_all_from_reach_with_flags() performs a depth- first-search, terminated by generation number, intending to use a hueristic that "important" commits are found in the first-parent history. This heuristic is valuable in scenarios like fetch negotiation.

[PATCH 0/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
I originally reported this fix [1] after playing around with the trace2 series for measuring performance. Since trace2 isn't merging quickly, I pulled the performance fix patch out and am sending it on its own. The only difference here is that we don't have the tracing to verify the performance

[PATCH 3/3] commit-graph: Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The config setting "core.commitGraph" enables using the commit-graph file to accelerate commit walks through parsing speed and generation numbers. The setting "gc.writeCommitGraph" enables writing the commit-graph file on every non-trivial 'git gc' operation. Together, they

[PATCH 1/3] t6501: use --quiet when testing gc stderr

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The test script t6501-freshen-objects.sh has some tests that care if 'git gc' has any output to stderr. This is intended to say that no warnings occurred related to broken links. However, when we have operations that output progress (like writing the commit-graph) this

[PATCH 0/3] Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
The commit-graph feature is starting to stabilize. Based on what is in master right now, we have: Git 2.18: * Ability to write commit-graph (requires user interaction). * Commit parsing is faster when commit-graph exists. * Must have core.commitGraph true to use. Git

[PATCH 2/3] t: explicitly turn off core.commitGraph as needed

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few tests that already require GIT_TEST_COMMIT_GRAPH=0 as they rely on an interaction with the commits in the object store that is circumvented by parsing commit information from the commit-graph instead. Before enabling core.commitGraph as true by default,

[PATCH 0/1] Run GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX during CI

2018-10-17 Thread Derrick Stolee via GitGitGadget
Our CI scripts include a step to run the test suite with certain optional variables enabled. Now that all branches should build and have tests succeed with GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX enabled, add those variables to that stage. Note: the GIT_TEST_MULTI_PACK_INDEX variable

[PATCH 1/1] ci: add optional test variables

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The commit-graph and multi-pack-index features introduce optional data structures that are not required for normal Git operations. It is important to run the normal test suite without them enabled, but it is helpful to also run the test suite using them. Our continuous

[PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire

[PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as

[PATCH v4 3/7] test-reach: add rev-list tests

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list

[PATCH v4 1/7] prio-queue: add 'peek' operation

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in

[PATCH v4 7/7] t6012: make rev-list tests more interesting

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee As we are working to rewrite some of the revision-walk machinery, there could easily be some interesting interactions between the options that force topological constraints (--topo-order, --date-order, and --author-date-order) along with specifying a path. Add extra tests

[PATCH v4 5/7] commit/revisions: bookkeeping before refactoring

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in

[PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-16 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk machinery. Well, "refactoring" is probably the wrong word, as I don't actually remove the old code. Instead, when we see certain options in the 'rev_info' struct, we redirect the commit-walk logic to a new set of methods

[PATCH v4 2/7] test-reach: add run_three_modes method

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

[PATCH v2 2/3] midx: close multi-pack-index on repack

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

[PATCH v2 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH v2 1/3] midx: fix broken free() in close_midx()

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH v2 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-12 Thread Derrick Stolee via GitGitGadget
To increase coverage of the multi-pack-index feature, add a GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_* variables. After creating the environment variable and running the test suite with it enabled, I found a few bugs in the multi-pack-index implementation. These

[PATCH 1/3] midx: fix broken free() in close_midx()

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-08 Thread Derrick Stolee via GitGitGadget
To increase coverage of the multi-pack-index feature, add a GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_* variables. After creating the environment variable and running the test suite with it enabled, I found a few bugs in the multi-pack-index implementation. These

[PATCH 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH 2/3] midx: close multi-pack-index on repack

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

[PATCH v4 1/1] contrib: add coverage-diff script

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an

[PATCH v4 0/1] contrib: Add script to show uncovered "new" lines

2018-10-08 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage

[PATCH v2 1/3] commit-graph: clean up leaked memory during write

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the

[PATCH v2 3/3] commit-graph: reduce initial oid allocation

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for

[PATCH v2 0/3] Clean up leaks in commit-graph.c

2018-10-03 Thread Derrick Stolee via GitGitGadget
While looking at the commit-graph code, I noticed some memory leaks. These can be found by running valgrind --leak-check=full ./git commit-graph write --reachable The impact of these leaks are small, as we never call write_commit_graph _reachable in a loop, but it is best to be diligent here.

[PATCH 2/2] commit-graph: reduce initial oid allocation

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for

[PATCH 1/2] commit-graph: clean up leaked memory during write

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Running 'valgrind --leak-check=full git commit-graph

[PATCH 0/2] Clean up leaks in commit-graph.c

2018-10-02 Thread Derrick Stolee via GitGitGadget
While looking at the commit-graph code, I noticed some memory leaks. These can be found by running valgrind --leak-check=full ./git commit-graph write --reachable The impact of these leaks are small, as we never call write_commit_graph _reachable in a loop, but it is best to be diligent here.

[PATCH 1/1] read-cache: update index format default to v4

2018-09-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The index v4 format has been available since 2012 with 9d22778 "reach-cache.c: write prefix-compressed names in the index". Since the format has been stable for so long, almost all versions of Git in use today understand version 4, removing one barrier to upgrade -- that

[PATCH 0/1] read-cache: update index format default to v4

2018-09-24 Thread Derrick Stolee via GitGitGadget
After discussing this with several people off-list, I thought I would open the question up to the list: Should we update the default index version to v4? The .git/index file stores the list of every path tracked by Git in the working directory, including merge information, staged paths, and

[PATCH v4 0/1] Properly peel tags in can_all_from_reach_with_flags()

2018-09-24 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method does not properly peel tags. Since the helper method can_all_from_reach() and code in t/helper/test-reach.c all peel tags before getting to this method, it is not super-simple to create a test that demonstrates this. I

[PATCH v4 1/1] commit-reach: properly peel tags and clear flags

2018-09-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e "commit-reach: make can_all_from_reach... linear" but incorrectly assumed that all objects provided were commits. During a fetch negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags to

[PATCH v3 5/7] commit/revisions: bookkeeping before refactoring

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in

[PATCH v3 7/7] revision.c: refactor basic topo-order logic

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING,

[PATCH v3 3/7] test-reach: add rev-list tests

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list

[PATCH v3 6/7] revision.h: add whitespace in flag definitions

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In anticipation of adding longer flag names in the next change, add an extra tab to each flag definition in revision.h. Signed-off-by: Derrick Stolee --- revision.h | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git

[PATCH v3 4/7] revision.c: begin refactoring --topo-order logic

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire

[PATCH v3 1/7] prio-queue: add 'peek' operation

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in

[PATCH v3 2/7] test-reach: add run_three_modes method

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

[PATCH v3 0/7] Use generation numbers for --topo-order

2018-09-21 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk machinery. Well, "refactoring" is probably the wrong word, as I don't actually remove the old code. Instead, when we see certain options in the 'rev_info' struct, we redirect the commit-walk logic to a new set of methods

[PATCH v3 1/1] contrib: add coverage-diff script

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an

[PATCH v3 0/1] contrib: Add script to show uncovered "new" lines

2018-09-21 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage

[PATCH v3 2/2] commit-reach: fix memory and flag leaks

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() method uses 'assign_flag' as a value we can use to mark objects temporarily during our commit walk. The intent is that these flags are removed from all objects before returning. However, this is not the case. The 'from' array could also

[PATCH v3 0/2] Properly peel tags in can_all_from_reach_with_flags()

2018-09-21 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method does not properly peel tags. Since the helper method can_all_from_reach() and code in t/helper/test-reach.c all peel tags before getting to this method, it is not super-simple to create a test that demonstrates this. I

[PATCH v3 1/2] commit-reach: properly peel tags

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e "commit-reach: make can_all_from_reach... linear" but incorrectly assumed that all objects provided were commits. During a fetch negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags to

[PATCH v2 5/6] commit/revisions: bookkeeping before refactoring

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in

[PATCH v2 0/6] Use generation numbers for --topo-order

2018-09-17 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk machinery. Well, "refactoring" is probably the wrong word, as I don't actually remove the old code. Instead, when we see certain options in the 'rev_info' struct, we redirect the commit-walk logic to a new set of methods

[PATCH v2 3/6] test-reach: add rev-list tests

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list

[PATCH v2 1/6] prio-queue: add 'peek' operation

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in

[PATCH v2 6/6] revision.c: refactor basic topo-order logic

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING,

[PATCH v2 4/6] revision.c: begin refactoring --topo-order logic

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire

[PATCH v2 2/6] test-reach: add run_three_modes method

2018-09-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

[PATCH v2 01/11] multi-pack-index: add 'verify' verb

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index builtin writes multi-pack-index files, and uses a 'write' verb to do so. Add a 'verify' verb that checks this file matches the contents of the pack-indexes it replaces. The current implementation is a no-op, but will be extended in small increments in

[PATCH v2 10/11] multi-pack-index: report progress during 'verify'

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When verifying a multi-pack-index, the only action that takes significant time is checking the object offsets. For example, to verify a multi-pack-index containing 6.2 million objects in the Linux kernel repository takes 1.3 seconds on my machine. 99% of that time is spent

[PATCH v2 04/11] multi-pack-index: verify packname order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The final check we make while loading a multi-pack-index is that the packfile names are in lexicographical order. Make this error be a die() instead. In order to test this condition, we need multiple packfiles. Earlier in t5319-multi-pack-index.sh, we tested the interaction

[PATCH v2 09/11] multi-pack-index: verify object offsets

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'git multi-pack-index verify' command must verify the object offsets stored in the multi-pack-index are correct. There are two ways the offset chunk can be incorrect: the pack-int-id and the object offset. Replace the BUG() statement with a die() statement, now that we

[PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When loading a 64-bit offset, we intend to check that off_t can store the resulting offset. However, the condition accidentally checks the 32-bit offset to see if it is smaller than a 64-bit value. Fix it, and this will be covered by a test in the 'git multi-pack-index

[PATCH v2 05/11] multi-pack-index: verify missing pack

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 16 t/t5319-multi-pack-index.sh | 5 + 2 files changed, 21 insertions(+) diff --git a/midx.c b/midx.c index e655a15aed..a02b19efc1 100644 --- a/midx.c +++ b/midx.c @@ -926,13 +926,29 @@

[PATCH v2 11/11] fsck: verify multi-pack-index

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When core.multiPackIndex is true, we may have a multi-pack-index in our object directory. Add calls to 'git multi-pack-index verify' at the end of 'git fsck' if so. Signed-off-by: Derrick Stolee --- builtin/fsck.c | 18 ++

[PATCH v2 07/11] multi-pack-index: verify oid lookup order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 11 +++ t/t5319-multi-pack-index.sh | 8 2 files changed, 19 insertions(+) diff --git a/midx.c b/midx.c index dfd26b4d74..06d5cfc826 100644 --- a/midx.c +++ b/midx.c @@ -959,5 +959,16 @@ int

[PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 3 +++ t/t5319-multi-pack-index.sh | 13 + 2 files changed, 16 insertions(+) diff --git a/midx.c b/midx.c index ec78254bb6..8b054b39ab 100644 --- a/midx.c +++ b/midx.c @@ -100,6 +100,9 @@ struct

[PATCH v2 02/11] multi-pack-index: verify bad header

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When verifying if a multi-pack-index file is valid, we want the command to fail to signal an invalid file. Previously, we wrote an error to stderr and continued as if we had no multi-pack-index. Now, die() instead of error(). Add tests that check corrupted headers in a few

[PATCH v2 06/11] multi-pack-index: verify oid fanout order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 9 + t/t5319-multi-pack-index.sh | 8 2 files changed, 17 insertions(+) diff --git a/midx.c b/midx.c index a02b19efc1..dfd26b4d74 100644 --- a/midx.c +++ b/midx.c @@ -950,5 +950,14 @@ int

[PATCH v2 00/11] Add 'git multi-pack-index verify' command

2018-09-13 Thread Derrick Stolee via GitGitGadget
The multi-pack-index file provides faster lookups in repos with many packfiles by duplicating the information from multiple pack-indexes into a single file. This series allows us to verify a multi-pack-index using 'git multi-pack-index verify' and 'git fsck' (when core.multiPackIndex is true).

[PATCH v2 1/1] commit-reach: properly peel tags

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e "commit-reach: make can_all_from_reach... linear" but incorrectly assumed that all objects provided were commits. During a fetch negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags to

[PATCH v2 0/1] Properly peel tags in can_all_from_reach_with_flags()

2018-09-13 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method does not properly peel tags. Since the helper method can_all_from_reach() and code in t/helper/test-reach.c all peel tags before getting to this method, it is not super-simple to create a test that demonstrates this. I

[PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an

[PATCH v2 0/1] contrib: Add script to show uncovered "new" lines

2018-09-13 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage

[PATCH 0/1] contrib: Add script to show uncovered "new" lines

2018-09-12 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage

  1   2   >