[PATCH 0/3] Make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
(). Thanks, -Stolee [1] https://public-inbox.org/git/cabpp-becpsoxudovjbdg_3w9wus102rw+e+qpmd4g3qyd-q...@mail.gmail.com/ Derrick Stolee (3): commit-reach: implement get_reachable_subset test-reach: test get_reachable_subset remote: make add_missing_tags() linear commit-reach.c| 70

[PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least

Git Test Coverage Report (Tuesday, Oct 30)

2018-10-30 Thread Derrick Stolee
introducing uncovered code: Ævar Arnfjörð Bjarmason  879b6a9e6: i18n: remote.c: mark error(...) messages for translation Antonio Ospite  45f5ef3d7: submodule: factor out a config_set_in_gitmodules_file_gently function Antonio Ospite  bcbc780d1: submodule: add a print_config_from_gitmodules() helper Anton

Re: [RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee
Here is a re-formatted version of the tables I introduced earlier. The tables were too wide for public-inbox to render correctly (when paired with my email client). Hopefully this bulleted-list format works better. Thanks, Stefan, for pointing out the rendering problems! ### Test 1: `git log

Re: [RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee
On 10/29/2018 3:22 PM, Stefan Beller wrote: Based on the performance results alone, we should remove minimum generation numbers, (epoch, date) pairs, and FELINE index from consideration. There are enough examples of these indexes performing poorly. In contrast, maximum generation numbers and

[RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee
We've discussed in several places how to improve upon generation numbers. This RFC is a report based on my investigation into a few new options, and how they compare for Git's purposes on several existing open-source repos. You can find this report and the associated test scripts at

Re: [PATCH] packfile: close multi-pack-index in close_all_packs

2018-10-29 Thread Derrick Stolee
On 10/29/2018 7:10 AM, SZEDER Gábor wrote: On Thu, Oct 25, 2018 at 12:54:05PM +, Derrick Stolee wrote: Whenever we delete pack-files from the object directory, we need to also delete the multi-pack-index that may refer to those objects. Sometimes, this connection is obvious, like during

Re: Git Test Coverage Report (Saturday, Oct 27)

2018-10-27 Thread Derrick Stolee
On 10/27/2018 9:55 AM, Junio C Hamano wrote: Derrick Stolee writes: Uncovered in mater not in master@{1} Does this typo indicate that some part of the process to produce and send out this report involve manual editing? I kick off four builds

Git Test Coverage Report (Saturday, Oct 27)

2018-10-27 Thread Derrick Stolee
o_list() to work on a todo_list Alban Gruin  b8dac44d1: sequencer: refactor skip_unnecessary_picks() to work on a todo_list Antonio Ospite  45f5ef3d7: submodule: factor out a config_set_in_gitmodules_file_gently function Antonio Ospite  bcbc780d1: submodule: add a print_config_from_gitmodules() hel

Re: [PATCH v3 7/7] revision.c: refactor basic topo-order logic

2018-10-25 Thread Derrick Stolee
On 10/25/2018 5:43 AM, Jeff King wrote: On Thu, Oct 11, 2018 at 12:21:44PM -0400, Derrick Stolee wrote: 2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in- degree of each parent for each commit that is walked. Since

[PATCH] packfile: close multi-pack-index in close_all_packs

2018-10-25 Thread Derrick Stolee
failure when running t6500-gc.sh with GIT_TEST_MULTI_PACK_INDEX=1. Reported-by: Szeder Gábor Signed-off-by: Derrick Stolee --- Thanks for the report, Szeder! I think this is the correct fix, and more likely to be permanent across all builtins that run auto-GC. I based it on ds/test-multi-pack-index so

Recommended configurations (was Re: [PATCH v1 2/2] reset: add new reset.quietDefault config setting)

2018-10-24 Thread Derrick Stolee
On 10/23/2018 4:03 PM, Ævar Arnfjörð Bjarmason wrote: [snip] The --ahead-behind config setting stalled on-list before: https://public-inbox.org/git/36e3a9c3-f7e2-4100-1bfc-647b809a0...@jeffhostetler.com/ Now we have this similarly themed thing. I think we need to be mindful of how changes like

Re: [PATCH v3 09/12] commit-graph: convert to using the_hash_algo

2018-10-24 Thread Derrick Stolee
On 10/21/2018 10:43 PM, brian m. carlson wrote: Instead of using hard-coded constants for object sizes, use the_hash_algo to look them up. In addition, use a function call to look up the object ID version and produce the correct value. For now, we use version 1, which means to use the default

Re: [PATCH] commit-reach: fix sorting commits by generation

2018-10-24 Thread Derrick Stolee
On 10/23/2018 4:32 PM, Thomas Gummerer wrote: On 10/22, René Scharfe wrote: Am 22.10.2018 um 23:10 schrieb Thomas Gummerer: Anyway, your implied question was discussed back then. Derrick wrote: The reason to sort is to hopefully minimize the amount we walk by exploring the "lower"

Re: [PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-23 Thread Derrick Stolee
On 10/22/2018 9:37 AM, Jakub Narebski wrote: "Derrick Stolee via GitGitGadget" writes: From: Derrick Stolee The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new

[RFC PATCH] revision.c: use new algorithm in A..B case

2018-10-21 Thread Derrick Stolee
Signed-off-by: Derrick Stolee --- I just wanted to mention that in order to use the new logic for 'git log --topo-order A..B', we just need the following patch. It is an extra time that sets 'revs->limited' to 1, triggering the old logic. You can use this for comparison purposes, but

Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-21 Thread Derrick Stolee
On 10/21/2018 9:12 PM, Junio C Hamano wrote: Jakub Narebski writes: So if revs->limited is set (but not because revs->topo_order is set), which means A..B queries, we will be still using the old algorithm. All right, though I wonder if it could be improved in the future (perhaps with the help

Re: [PATCH v4 3/7] test-reach: add rev-list tests

2018-10-21 Thread Derrick Stolee
On 10/21/2018 6:21 AM, Jakub Narebski wrote: "Derrick Stolee via GitGitGadget" writes: From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few impor

Re: What's cooking in git.git (Oct 2018, #04; Fri, 19)

2018-10-19 Thread Derrick Stolee
On 10/19/2018 2:02 AM, Junio C Hamano wrote: * ds/ci-commit-graph-and-midx (2018-10-19) 1 commit - ci: add optional test variables One of our CI tests to run with "unusual/experimental/random" settings now also uses commti-graph and midx. Will merge to 'next'.

Re: [PATCH 1/1] commit-reach: fix first-parent heuristic

2018-10-19 Thread Derrick Stolee
On 10/19/2018 1:24 AM, Junio C Hamano wrote: "Derrick Stolee via GitGitGadget" writes: We can also re-run the performance tests from commit 4fbcca4e "commit-reach: make can_all_from_reach... linear". Performance was measured on the Linux repository using 'test-tool reac

Re: [PATCH] multi-pack-index: avoid dead store for struct progress

2018-10-18 Thread Derrick Stolee
On 10/18/2018 2:59 PM, Carlo Marcelo Arenas Belón wrote: it is initialized unconditionally by a call to start_progress below. Signed-off-by: Carlo Marcelo Arenas Belón --- midx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/midx.c b/midx.c index ea2f3ffe2e..4fac0cd08a

[PATCH 1/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The algorithm in can_all_from_reach_with_flags() performs a depth- first-search, terminated by generation number, intending to use a hueristic that "important" commits are found in the first-parent history. This heuristic is valuable in scenarios like fetch n

[PATCH 0/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
fix in the test script. See the patch message for details about the fix. Thanks, -Stolee [1] https://public-inbox.org/git/20180906151309.66712-7-dsto...@microsoft.com/ [RFC PATCH 6/6] commit-reach: fix first-parent heuristic Derrick Stolee (1): commit-reach: fix first-parent heuristic commit

Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-18 Thread Derrick Stolee
On 10/17/2018 8:06 PM, brian m. carlson wrote: On Wed, Oct 17, 2018 at 04:31:19PM +0200, Duy Nguyen wrote: On Wed, Oct 17, 2018 at 12:44 AM brian m. carlson wrote: Honestly, anything in the .git directory that is not the v3 pack indexes or the loose object file should be in exactly one hash

Re: [PATCH 0/3] Use commit-graph by default

2018-10-18 Thread Derrick Stolee
On 10/17/2018 11:47 PM, Junio C Hamano wrote: If I recall correctly, one more task that was discussed but hasn't been addressed well is how the generation and incremental update of it should integrate with the normal repository maintenance workflow (perhaps "gc --auto"). If we are going to turn

Re: [PATCH 1/3] t6501: use --quiet when testing gc stderr

2018-10-18 Thread Derrick Stolee
On 10/18/2018 1:23 AM, Junio C Hamano wrote: "Derrick Stolee via GitGitGadget" writes: From: Derrick Stolee The test script t6501-freshen-objects.sh has some tests that care if 'git gc' has any output to stderr. This is intended to say that no warnings occurred related to br

[PATCH 3/3] commit-graph: Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The config setting "core.commitGraph" enables using the commit-graph file to accelerate commit walks through parsing speed and generation numbers. The setting "gc.writeCommitGraph" enables writing the commit-graph file on every non-trivial 'git gc'

[PATCH 1/3] t6501: use --quiet when testing gc stderr

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The test script t6501-freshen-objects.sh has some tests that care if 'git gc' has any output to stderr. This is intended to say that no warnings occurred related to broken links. However, when we have operations that output progress (like writing the commit-graph

[PATCH 0/3] Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
a necessary evil, since we already had to disable GIT_TEST_COMMIT_GRAPH for some tests, we now also need to turn off core.commitGraph. Thanks, -Stolee Derrick Stolee (3): t6501: use --quiet when testing gc stderr t: explicitly turn off core.commitGraph as needed commit-graph: Use commit-graph

[PATCH 2/3] t: explicitly turn off core.commitGraph as needed

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few tests that already require GIT_TEST_COMMIT_GRAPH=0 as they rely on an interaction with the commits in the object store that is circumvented by parsing commit information from the commit-graph instead. Before enabling core.commitGraph as true by default

Re: commit-graph is cool (overcoming add_missing_tags() perf issues)

2018-10-17 Thread Derrick Stolee
On 10/17/2018 2:00 PM, Elijah Newren wrote: Hi, Just wanted to give a shout-out for the commit-graph work and how impressive it is. I had an internal report from a user that git pushes containing only one new tiny commit were taking over a minute (in a moderate size repo with good network

Git Test Coverage Report (Wednesday, Oct 17)

2018-10-17 Thread Derrick Stolee
gc_log_path); 3029970275 builtin/gc.c 470) ret = error_errno(_("cannot read '%s'"), gc_log_path); fec2ed2187 builtin/gc.c 495) die(FAILED_RUN, pack_refs_cmd.argv[0]); fec2ed2187 builtin/gc.c 498) die(FAILED_RUN, reflog.argv[0]); 3029970275 builtin/gc.c 585) exit(128); fec2ed2187 buil

Re: [PATCH 0/1] Run GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX during CI

2018-10-17 Thread Derrick Stolee
On 10/17/2018 9:00 AM, Derrick Stolee via GitGitGadget wrote: [1] https://git.visualstudio.com/git/_build?definitionId=4 Newlines are hard. Sorry for the formatting issues when translating from a PR description. Build definition that tests Git with different arrangements of GIT_TEST_

Re: [PATCH] test-tool: show tool list on error

2018-10-17 Thread Derrick Stolee
utors' lives! Thanks. Reviewed-by: Derrick Stolee

[PATCH 0/1] Run GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX during CI

2018-10-17 Thread Derrick Stolee via GitGitGadget
with different arrangements of GIT_TEST_* variables. Derrick Stolee (1): ci: add optional test variables ci/run-build-and-tests.sh | 2 ++ 1 file changed, 2 insertions(+) base-commit: d82963f34cf6921ed29d1fc2d96b16acf9005159 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-49

[PATCH 1/1] ci: add optional test variables

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The commit-graph and multi-pack-index features introduce optional data structures that are not required for normal Git operations. It is important to run the normal test suite without them enabled, but it is helpful to also run the test suite using them. Our continuous

Re: [PATCH 00/19] Bring more repository handles into our code base

2018-10-17 Thread Derrick Stolee
On 10/16/2018 7:35 PM, Stefan Beller wrote: This series takes another approach as it doesn't change the signature of functions, but introduces new functions that can deal with arbitrary repositories, keeping the old function signature around using a shallow wrapper.

Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-17 Thread Derrick Stolee
On 10/14/2018 10:19 PM, brian m. carlson wrote: Since the commit-graph code wants to serialize the hash algorithm into the data store, specify a version number for each supported algorithm. Note that we don't use the values of the constants themselves, as they are internal and could change in

[PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the ent

[PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits

[PATCH v4 3/7] test-reach: add rev-list tests

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list --topo

[PATCH v4 1/7] prio-queue: add 'peek' operation

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in t/helper

[PATCH v4 7/7] t6012: make rev-list tests more interesting

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee As we are working to rewrite some of the revision-walk machinery, there could easily be some interesting interactions between the options that force topological constraints (--topo-order, --date-order, and --author-date-order) along with specifying a path. Add extra tests

[PATCH v4 5/7] commit/revisions: bookkeeping before refactoring

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order

[PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-16 Thread Derrick Stolee via GitGitGadget
ee/topo-order/testA branch containing this series along with commits to compute commit-graph in entire test suite. Cc: avarab@gmail.comCc: szeder@gmail.com Derrick Stolee (7): prio-queue: add 'peek' operation test-reach: add run_three_modes method test-reach: add rev-list tests r

[PATCH v4 2/7] test-reach: add run_three_modes method

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread Derrick Stolee
On 10/16/2018 11:35 AM, Duy Nguyen wrote: On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson wrote: Since the commit-graph code wants to serialize the hash algorithm into the data store, specify a version number for each supported algorithm. Note that we don't use the values of the constants

Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee
On 10/16/2018 8:57 AM, Ævar Arnfjörð Bjarmason wrote: On Tue, Oct 16 2018, Derrick Stolee wrote: On 10/16/2018 12:45 AM, Junio C Hamano wrote: Derrick Stolee writes: 2. The filters are sized according to the number of changes in each commit, with a minimum of one 64-bit word. ... 6. When

Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee
On 10/16/2018 12:45 AM, Junio C Hamano wrote: Derrick Stolee writes: 2. The filters are sized according to the number of changes in each commit, with a minimum of one 64-bit word. ... 6. When we compute the Bloom filters, we don't store a filter for commits whose first-parent diff has more

Re: Git Test Coverage Report (Monday, Oct 15)

2018-10-15 Thread Derrick Stolee
On 10/15/2018 12:24 PM, Derrick Stolee wrote: Uncovered code in 'jch' (22f2f0f) and not in 'next' (152ad8e) - prio-queue.c 2d181390f3 94) return queue->array[queue->nr - 1].data; (I have a fix to cover this in my private

Git Test Coverage Report (Monday, Oct 15)

2018-10-15 Thread Derrick Stolee
(_list); b97e187364 4804) return -1; b97e187364 4808) return error(_("could not copy '%s' to '%s'."), todo_file, b97e187364 4812) return error(_("could not transform the todo list")); b97e187364 4841) return error(_("could not transform the todo list")); b97e187364 4844)

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-15 Thread Derrick Stolee
On 10/14/2018 10:29 AM, René Scharfe wrote: It still has some repetition, converted code is a bit longer than the current one, and I don't know how to build a Coccinelle rule that would do that conversion. Looked for a possibility to at least leave QSORT call-sites alone by enhancing that

Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-15 Thread Derrick Stolee
On 10/14/2018 10:19 PM, brian m. carlson wrote: Since the commit-graph code wants to serialize the hash algorithm into the data store, specify a version number for each supported algorithm. Note that we don't use the values of the constants themselves, as they are internal and could change in

Re: [PATCH v2 09/13] commit-graph: convert to using the_hash_algo

2018-10-15 Thread Derrick Stolee
On 10/14/2018 10:18 PM, brian m. carlson wrote: Instead of using hard-coded constants for object sizes, use the_hash_algo to look them up. In addition, use a function call to look up the object ID version and produce the correct value. This looks good and I can see already how this new

Re: [PATCH 0/4] Bloom filter experiment

2018-10-15 Thread Derrick Stolee
On 10/9/2018 3:34 PM, SZEDER Gábor wrote: To keep the ball rolling, here is my proof of concept in a somewhat cleaned-up form, with still plenty of rough edges. Peff, Szeder, and Jonathan, Thanks for giving me the kick in the pants to finally write a proof of concept for my personal take on

Re: Git Test Coverage Report (Friday, Oct 12)

2018-10-15 Thread Derrick Stolee
On 10/15/2018 4:09 AM, Johannes Schindelin wrote: Hi Stolee, On Fri, 12 Oct 2018, Derrick Stolee wrote: In an effort to ensure new code is reasonably covered by the test suite, we now have contrib/coverage-diff.sh to combine the gcov output from 'make coverage-test ; make coverage-report

Re: [PATCH v2 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-12 Thread Derrick Stolee
On 10/12/2018 1:34 PM, Derrick Stolee via GitGitGadget wrote: To increase coverage of the multi-pack-index feature, add a GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_* variables. After creating the environment variable and running the test suite with it enabled, I

[PATCH v2 2/3] midx: close multi-pack-index on repack

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

[PATCH v2 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH v2 1/3] midx: fix broken free() in close_midx()

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH v2 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-12 Thread Derrick Stolee via GitGitGadget
of the ongoing work. Eventually, we can add these variables to the Travis CI scripts. [1] https://git.visualstudio.com/git/_build?definitionId=4 Derrick Stolee (3): midx: fix broken free() in close_midx() midx: close multi-pack-index on repack multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX builtin

Re: [PATCH v3 1/2] commit-graph write: add progress output

2018-10-12 Thread Derrick Stolee
On 10/12/2018 11:07 AM, Ævar Arnfjörð Bjarmason wrote: On Fri, Oct 12 2018, Junio C Hamano wrote: Makes sense. If this second iteration were also time consuming, then it probably is a good idea to split these into two separate phases? "Counting 1...N" followed by "Inspecting 1...N" or

Git Test Coverage Report (Friday, Oct 12)

2018-10-12 Thread Derrick Stolee
3 314) ce_flags = ce->ce_flags; 568f3a6073 315) base_flags = base->ce_flags; 568f3a6073 317) ce->ce_flags   &= ondisk_flags; 568f3a6073 318) base->ce_flags &= ondisk_flags; 568f3a6073 319) ret = memcmp(>ce_stat_data, >ce_stat_data, 568f3a6073 322) ce->ce_flags = ce_flags; 568f3

Re: [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic

2018-10-12 Thread Derrick Stolee
On 10/12/2018 2:33 AM, Junio C Hamano wrote: "Derrick Stolee via GitGitGadget" writes: * revs->limited implies we run limit_list() to walk the entire reachable set. There are some short-cuts here, such as if we perform a range query like 'git rev-list COMPARE..HEAD' and w

Re: [PATCH v4 0/1] contrib: Add script to show uncovered "new" lines

2018-10-12 Thread Derrick Stolee
On 10/11/2018 11:01 PM, Junio C Hamano wrote: "Derrick Stolee via GitGitGadget" writes: CHANGES IN V4: I reduced the blame output using -s which decreases the width. I include a summary of the commit authors at the end to help people see the lines they wrote. This version is a

Re: [PATCH v3 7/7] revision.c: refactor basic topo-order logic

2018-10-11 Thread Derrick Stolee
On 10/11/2018 11:35 AM, Jeff King wrote: On Fri, Sep 21, 2018 at 10:39:36AM -0700, Derrick Stolee via GitGitGadget wrote: From: Derrick Stolee When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: [...] In the new algorithm, these three steps

Re: [PATCH 1/2] One filter per commit

2018-10-11 Thread Derrick Stolee
On 10/10/2018 9:21 PM, Jonathan Tan wrote: diff --git a/commit-graph.c b/commit-graph.c index f415d3b41f..90b0b3df90 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -715,13 +715,11 @@ static int add_ref_to_list(const char *refname, static void add_changes_to_bloom_filter(struct

Re: Bloom Filters

2018-10-11 Thread Derrick Stolee
On 10/9/2018 7:12 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote: Hmph. It really sounds like we could do better with a custom RLE solution. But that makes me feel like I'm missing something, because surely I can't invent something better than the state of the

Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-11 Thread Derrick Stolee
On 10/10/2018 1:43 AM, Junio C Hamano wrote: * ds/reachable-topo-order (2018-09-21) 7 commits - revision.c: refactor basic topo-order logic - revision.h: add whitespace in flag definitions - commit/revisions: bookkeeping before refactoring - revision.c: begin refactoring --topo-order

Re: [RFC PATCH 6/6] commit-reach: fix first-parent heuristic

2018-10-11 Thread Derrick Stolee
On 10/10/2018 9:50 PM, Jonathan Nieder wrote: Hi, Derrick Stolee wrote: commit-reach.c| 4 +++- t/t6600-test-reach.sh | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) I like this testing technique, and the test passes for me. Except: if I put CC = cc -m32

Re: [PATCH 0/4] Bloom filter experiment

2018-10-09 Thread Derrick Stolee
On 10/9/2018 3:34 PM, SZEDER Gábor wrote: To keep the ball rolling, here is my proof of concept in a somewhat cleaned-up form, with still plenty of rough edges. You can play around with it like this: $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write Computing commit

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
On 10/9/2018 2:46 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: [I snipped all of the parts about bloom filters that seemed entirely reasonable to me ;) ] Imagine we have that list. Is a bloom filter still the best data structure for each commit

Re: [PATCH 2/3] midx: close multi-pack-index on repack

2018-10-09 Thread Derrick Stolee
On 10/9/2018 5:10 AM, Junio C Hamano wrote: "Derrick Stolee via GitGitGadget" writes: diff --git a/builtin/repack.c b/builtin/repack.c index c6a7943d5c..7925bb976e 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -432,6 +432,10 @@ int cmd_repack(int argc, const char **argv,

Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
(Changing title to reflect the new topic.) On 10/8/2018 11:08 PM, Jeff King wrote: On Mon, Oct 08, 2018 at 02:29:47PM -0400, Derrick Stolee wrote: There are two questions that I was hoping to answer by looking at your code: 1. How do you store your Bloom filter? Is it connected to the commit

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee
On 10/8/2018 2:10 PM, SZEDER Gábor wrote: On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote: Nice! These numbers make sense to me, in terms of how many TREESAME queries we actually need to perform for such a query. Yeah... because you didn't notice that I deliberately cheated

Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-08 Thread Derrick Stolee
On 10/8/2018 1:05 PM, Ananya Krishna Maram wrote: Hi All, Hello, Ananya! Welcome. I was searching through #leftovers and found this. https://public-inbox.org/git/cabpp-bgvvxcbzx44er6to-pusfen_6gnyj1u5cuon9deaa4...@mail.gmail.com/ This patch address the task discussed in the above link. The

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee
On 10/8/2018 12:41 PM, SZEDER Gábor wrote: On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote: I'm still excited about the prospect of a bloom filter for paths which each commit touches. I think that's the next big frontier in getting things like "git log -- path" to a reasonable

[PATCH 1/3] midx: fix broken free() in close_midx()

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-08 Thread Derrick Stolee via GitGitGadget
of the ongoing work. Eventually, we can add these variables to the Travis CI scripts. [1] https://git.visualstudio.com/git/_build?definitionId=4 Derrick Stolee (3): midx: fix broken free() in close_midx() midx: close multi-pack-index on repack multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX builtin

[PATCH 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH 2/3] midx: close multi-pack-index on repack

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Derrick Stolee
On 10/8/2018 10:58 AM, Ævar Arnfjörð Bjarmason wrote: On Mon, Oct 08 2018, Derrick Stolee wrote: On 10/8/2018 9:43 AM, Ævar Arnfjörð Bjarmason wrote: On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote: From: Derrick Stolee The commit-graph feature is tested in isolation by t5318

[PATCH v4 1/1] contrib: add coverage-diff script

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has

[PATCH v4 0/1] contrib: Add script to show uncovered "new" lines

2018-10-08 Thread Derrick Stolee via GitGitGadget
e.p->pack_name); cc6af73c02 992) break; Commits introducing uncovered code: Derrick Stolee 56ee7ff15: multi-pack-index: add 'verify' verb Derrick Stolee 66ec0390e: fsck: verify multi-pack-index Derrick Stolee cc6af73c0: multi-pack-index: verify object offsets Junio C Hamano 76f2

Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Derrick Stolee
On 10/8/2018 9:43 AM, Ævar Arnfjörð Bjarmason wrote: On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote: From: Derrick Stolee The commit-graph feature is tested in isolation by t5318-commit-graph.sh and t6600-test-reach.sh, but there are many more interesting scenarios involving

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 3:47 PM, Jeff King wrote: On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote: So can we really just take (total_objects - commit_graph_objects) and compare it to some threshold? The commit-graph only stores the number of _commits_, not total objects. Oh, right

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 3:21 PM, Jeff King wrote: On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote: My misunderstanding was that your proposed change to gc computes the commit-graph in either of these two cases: (1) The auto-GC threshold is met. (2) There is no commit-graph file

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote: On Fri, Oct 05 2018, Derrick Stolee wrote: On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: I don't have time to polish this up for submission now, but here's a WIP patch that implements this, highlights: * There's

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Derrick Stolee
On 10/4/2018 6:59 PM, René Scharfe wrote: Am 01.10.2018 um 22:37 schrieb René Scharfe: Am 01.10.2018 um 21:26 schrieb Derrick Stolee: Good catch! I'm disappointed that we couldn't use type-checking here, as it is quite difficult to discover that the types are wrong here. Generics in C

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee
On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: I don't have time to polish this up for submission now, but here's a WIP patch that implements this, highlights: * There's a gc.clone.autoDetach=false default setting which overrides gc.autoDetach if 'git gc --auto' is run via git-clone

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Derrick Stolee
On 10/3/2018 2:51 PM, Jeff King wrote: On Wed, Oct 03, 2018 at 08:47:11PM +0200, Ævar Arnfjörð Bjarmason wrote: On Wed, Oct 03 2018, Stefan Beller wrote: So we wouldn't be spending 5 minutes repacking linux.git right after cloning it, just ~10s generating the commit graph, and the same would

[PATCH v2 1/3] commit-graph: clean up leaked memory during write

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find

[PATCH v2 3/3] commit-graph: reduce initial oid allocation

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count

[PATCH v2 0/3] Clean up leaks in commit-graph.c

2018-10-03 Thread Derrick Stolee via GitGitGadget
se changes. V2 includes feedback from V1 along with Martin's additional patches. Thanks, -Stolee Derrick Stolee (2): commit-graph: clean up leaked memory during write commit-graph: reduce initial oid allocation Martin Ågren (1): builtin/commit-graph.c: UNLEAK variables builtin/commit-gr

Re: [PATCH 0/2] commit-graph: more leak fixes

2018-10-03 Thread Derrick Stolee
On 10/3/2018 11:36 AM, Martin Ågren wrote: Hi Derrick, These two patches on top of yours make the test suite (i.e., the subset of it that I run) leak-free with respect to builtin/commit-graph.c and commit-graph.c. Thanks! The first could be squashed into your patch 1/2. It touches the same

Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Derrick Stolee
On 10/3/2018 9:36 AM, SZEDER Gábor wrote: On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote: Don't have time to patch this now, but thought I'd send a note / RFC about this. Now that we have the commit graph it's nice to be able to set e.g. core.commitGraph=true &

Re: [PATCH 1/2] commit-graph: clean up leaked memory during write

2018-10-03 Thread Derrick Stolee
On 10/2/2018 6:44 PM, Stefan Beller wrote: My preference is to avoid them in the name of simplicity. If you're using "make SANITIZE=leak test" to check for leaks, it will skip these cases. If you're using valgrind, I think these may be reported as "reachable". But that number already isn't

Re: [PATCH] more oideq/hasheq conversions

2018-10-02 Thread Derrick Stolee
t's a reasonable time to apply it. There are a few lingering cases in pu, so another option is to wait a week or two and see if they get merged. These conversions look good to me! Reviewed-by: Derrick Stolee

[PATCH 2/2] commit-graph: reduce initial oid allocation

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count

[PATCH 1/2] commit-graph: clean up leaked memory during write

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Running 'valgrind --leak-check=full git commit-graph

<    1   2   3   4   5   6   7   8   9   10   >