. If possible, I'd like to remove it and speed up MIDX writes.
-- >8 --
Add a technical documentation file describing the design
for the multi-pack index (MIDX). Includes current limitations
and future work.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Documentation/technical/m
g new prefetch packfiles using the following command:
git midx --write --update-head --delete-expired --pack-dir=
As that release deploys we will gather more specific numbers on the
performance improvements and report them in this thread.
Derrick Stolee (18):
docs: Multi-Pack Index (MIDX
e required to ensure collision counts were low.
- We need to identify the two lexicographically closest OIDs for
fast abbreviations. Binary search allows this.
The current solution presents multiple packfiles as if they were
packed into a single packfile with one pack-index.
Signed-off-by: Derrick St
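The point about the two lexicographically closest OIDs can be illustrated with a rough sketch. This is not code from the patch series: the function names and the use of hex strings (rather than Git's raw object ids and pack-index lookups) are assumptions made for illustration. In a sorted list, only the entries immediately before and after the binary-search position can share the longest prefix with a given OID, so two comparisons are enough once the position is known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading characters two equal-length hex strings share. */
static size_t common_prefix(const char *a, const char *b)
{
	size_t i = 0;

	while (a[i] && a[i] == b[i])
		i++;
	return i;
}

/*
 * Hypothetical helper: shortest prefix of `oid` that is unambiguous among
 * `sorted`, a lexicographically sorted list of `nr` distinct, full-length
 * hex object ids. Only the two neighbors of the search position are checked.
 */
static size_t min_unique_len(const char *const *sorted, size_t nr, const char *oid)
{
	size_t lo = 0, hi = nr, len = 0, n;

	assert(oid != NULL);

	/* Binary search for the first entry that is >= oid. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(sorted[mid], oid) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* If oid itself is in the list, its upper neighbor is one slot later. */
	if (lo < nr && !strcmp(sorted[lo], oid))
		hi = lo + 1;

	if (lo > 0 && (n = common_prefix(sorted[lo - 1], oid)) > len)
		len = n;
	if (hi < nr && (n = common_prefix(sorted[hi], oid)) > len)
		len = n;
	return len + 1;	/* one character past the longest shared prefix */
}
```

With a single sorted index spanning all packfiles, a lookup like this would run once instead of once per pack, which appears to be the motivation stated above.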
On 12/15/2017 1:30 PM, Junio C Hamano wrote:
Derrick Stolee <sto...@gmail.com> writes:
The biggest reason for the 20 seconds is not just the number of
commits in the ahead/behind but how many commits are walked (including
common to both branches) before paint_down_to_common() breaks its
On 12/15/2017 10:08 AM, Jeff Hostetler wrote:
On 12/15/2017 5:08 AM, Jeff King wrote:
On Thu, Dec 14, 2017 at 04:49:31PM -0500, Jeff Hostetler wrote:
[*] Sadly, the local repo was only about 20 days out of
date (including the Thanksgiving holidays)
Taking 20 seconds to traverse 20
On 12/11/2017 8:44 AM, George Papanikolaou wrote:
`git tag --points-at` can simply return if the given rev does not have
any tags pointing to it. It's not a failure but it shouldn't return
with 0 value.
I disagree. I think the 0 return means "I completed successfully" and
the empty output
There are several places in Git where we refer to the size of an object
by an 'unsigned long' instead of a 'size_t'. In 64-bit Linux, 'unsigned
long' is 8 bytes, but in 64-bit Windows it is 4 bytes.
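A small sketch of why the 4-byte 'unsigned long' matters, assuming nothing about Git's own code: both helper names below are hypothetical. The first checks whether a 64-bit size survives storage in an 'unsigned long' on the current platform; the second shows what a 32-bit 'unsigned long' would keep of a size above 4 GiB.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if `size` can be stored in an 'unsigned long' without loss. */
static int size_fits_ulong(uint64_t size)
{
	return size <= (uint64_t)ULONG_MAX;
}

/* The low 32 bits of `size`: what a 4-byte 'unsigned long' would retain. */
static uint32_t truncate_to_32(uint64_t size)
{
	return (uint32_t)(size & 0xffffffffu);
}
```

For example, a 5 GiB object size truncated this way comes out as 1 GiB, so any code that trusts the truncated value would read or allocate far too little.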
The main issue with this conversion is that large objects fail to load
(they seem to hash and
On 12/4/2017 11:56 AM, Jeff King wrote:
When you put your cover letter first, you need to use "scissors" like:
-- >8 --
to separate it from the commit message. Using three-dashes means "git
am" will include your cover letter as the commit message, and omit your
real commit message entirely.
-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_file.c | 12 +++-
t/perf/p4211-line-log.sh | 4
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 8ae6cb6285..2fc8fa93b4 100644
--- a/sha1_file.c
+++ b/sha1_
On 12/1/2017 1:22 PM, Jeff King wrote:
On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote:
[snip]
diff --git a/sha1_file.c b/sha1_file.c
index 8ae6cb6285..2160323c4a 100644
This overall looks good, but I noticed one bug and a few cosmetic
improvements.
Thanks for finding quality
:
HEAD~1            HEAD
7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_file.c | 7 ---
t/perf/p4211-line-log.sh | 4
2 files changed, 8 insertions(+), 3 deletions(-)
diff
On 11/10/2017 7:37 AM, Peter Krefting wrote:
Jeff King:
Can you get a backtrace? I'd do something like:
Seems that it spends most time in diffcore_count_changes(), that is
where it hits whenever I hit Ctrl+C (various line numbers 199-207 in
diffcore-delta.c; this is on the v2.15.0 tag).
On 10/20/2017 1:18 PM, Brandon Williams wrote:
Overview
========
This document presents a specification for a version 2 of Git's wire
protocol. Protocol v2 will improve upon v1 in the following ways:
* Instead of multiple service names, multiple commands will be
supported by a
On 10/13/2017 11:27 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 10:26:46AM -0400, Jeff King wrote:
On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:
This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
causing diff_can_quit_early() to return false. Due
On 10/13/2017 10:26 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:
This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
causing diff_can_quit_early() to return false. Due to the corner-case of the
bug it seems it will not be a huge
On 10/13/2017 10:20 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 10:10:18AM -0400, Jeff King wrote:
Hmm. So this patch makes it go fast:
diff --git a/revision.c b/revision.c
index d167223e69..b52ea4e9d8 100644
--- a/revision.c
+++ b/revision.c
@@ -409,7 +409,7 @@ static void
On 10/13/2017 9:50 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 09:39:14AM -0400, Derrick Stolee wrote:
Since I don't understand enough about the consumers to diff_tree_oid() (and
the fact that the recursive behavior may be wanted in some cases), I think
we can fix this in builtin/rev-list.c
On 10/13/2017 9:15 AM, Derrick Stolee wrote:
On 10/13/2017 8:44 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:
On 13.10.2017 15:04, Junio C Hamano wrote:
Christian Couder <christian.cou...@gmail.com> writes:
Yeah, but perhaps Git could be smarter wh
On 10/13/2017 8:44 AM, Jeff King wrote:
On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:
On 13.10.2017 15:04, Junio C Hamano wrote:
Christian Couder writes:
Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or
On 10/12/2017 8:02 AM, Derrick Stolee wrote:
Changes since previous version:
* Make 'pos' unsigned in get_hex_char_from_oid()
* Check response from open_pack_index()
* Small typos in commit messages
Thanks,
Stolee
I forgot to mention that I rebased on master this morning to be sure
three copies of the Linux repo:
| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|     1 |      0 |  41.27 s |  38.93 s | -4.8% |
|    24 |      0 |  98.04 s |  91.35 s | -5.7% |
|    23 | 323952 | 117.78 s | 112.18 s | -4.8% |
Signed-off-by: Derrick
Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time computing
diffs.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
t/perf
.
The focus of this change is to refactor the existing method in a way
that clearly does not change the current behavior. In some cases, the
new method is slower than the previous method. Later changes will
correct all performance loss.
Signed-off-by: Derrick Stolee <dsto...@microsoft.
Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 14 --
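As a rough illustration of the idea in this commit message (not the patch itself), a hex digit can be read directly from the raw hash bytes, so comparing prefixes never requires converting a whole object id to a hex string first. The struct and function names below are stand-ins invented for this sketch, and the 20-byte size assumes SHA-1.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RAWSZ 20	/* SHA-1: 20 raw bytes, 40 hex characters */

/* Stand-in for Git's struct object_id, used only in this sketch. */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Return the hex digit at 0-based position `pos` without building a string. */
static char sketch_hex_char(const struct sketch_oid *oid, unsigned int pos)
{
	static const char hex[] = "0123456789abcdef";
	unsigned char byte;

	assert(pos < 2 * SKETCH_RAWSZ);
	byte = oid->hash[pos >> 1];
	/* Even positions are the high nibble, odd positions the low nibble. */
	return (pos & 1) ? hex[byte & 0xf] : hex[byte >> 4];
}

/* Length of the common hex prefix of two object ids. */
static unsigned int sketch_common_prefix(const struct sketch_oid *a,
					 const struct sketch_oid *b)
{
	unsigned int i = 0;

	while (i < 2 * SKETCH_RAWSZ &&
	       sketch_hex_char(a, i) == sketch_hex_char(b, i))
		i++;
	return i;
}
```

The loop stops at the first differing digit, so two ids that diverge early cost only a few nibble lookups.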
| 41.27 s | 38.93 s | -4.8% |
|    24 |      0 |  98.04 s |  91.35 s | -5.7% |
|    23 | 323952 | 117.78 s | 112.18 s | -4.8% |
Derrick Stolee (4):
p4211-line-log.sh: add log --online --raw --parents perf test
sha1_name: unroll len loop in find_unique_abbrev_r
sha1_name: parse less while
On 10/10/2017 9:30 AM, Jeff King wrote:
On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote:
On 10/10/2017 8:56 AM, Junio C Hamano wrote:
Jeff King <p...@peff.net> writes:
OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:
On 10/10/2017 8:56 AM, Junio C Hamano wrote:
Jeff King writes:
OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:
git pack-objects .git/objects/pack/pack num_objects)
return;
Technically that also covers open_pack_index()
On 10/9/2017 9:49 AM, Jeff King wrote:
On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote:
@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid,
void *cb_data)
return 0;
}
+static void find_abbrev_len_for_pack(struct packed_git *p
Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time computing
diffs.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
t/perf
three copies of the Linux repo:
| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|     1 |      0 |  41.27 s |  38.93 s | -4.8% |
|    24 |      0 |  98.04 s |  91.35 s | -5.7% |
|    23 | 323952 | 117.78 s | 112.18 s | -4.8% |
Signed-off-by: Derrick
Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 14 --
% |
|-------|--------|----------|----------|-------|
|     1 |      0 |  41.27 s |  38.93 s | -4.8% |
|    24 |      0 |  98.04 s |  91.35 s | -5.7% |
|    23 | 323952 | 117.78 s | 112.18 s | -4.8% |
Derrick Stolee (4):
p4211-line-log.sh: add log --online --raw --parents perf test
sha1_name: Unroll len loop
.
The focus of this change is to refactor the existing method in a way
that clearly does not change the current behavior. In some cases, the
new method is slower than the previous method. Later changes will
correct all performance loss.
Signed-off-by: Derrick Stolee <dsto...@microsoft.
conditioned on "min < max". The included changes were found using
the following git grep:
git grep '/ *2;' '*.c'
Making this cleanup will prevent future review friction when a new
binary search is constructed based on existing code.
Signed-off-by: Derrick Stolee <dsto...@microsoft.c
On 10/6/2017 10:11 AM, Jeff King wrote:
On Thu, Oct 05, 2017 at 08:39:42AM -0400, Derrick Stolee wrote:
I'll run some perf numbers for these commands you recommend, and also see if
I can replicate some of the pain points that triggered this change using the
Linux repo.
Thanks!
-Peff
In my
On 10/6/2017 10:18 AM, Jeff King wrote:
On Fri, Oct 06, 2017 at 09:52:31AM -0400, Derrick Stolee wrote:
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:
mid = (min + max) / 2;
Instead, use the overflow-safe version
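For reference, a minimal sketch of a binary search written with the overflow-safe midpoint. The function and its array-of-unsigned input are hypothetical, chosen only to keep the example self-contained; the relevant part is that the midpoint is computed from the width of the range rather than from the sum of its endpoints.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): search a sorted array of
 * unsigned values and return the index of `key`, or -1 if it is absent.
 */
static int find_sorted(const unsigned *table, size_t nr, unsigned key)
{
	size_t min = 0, max = nr;	/* half-open range [min, max) */

	assert(table != NULL || nr == 0);

	while (min < max) {
		/* (min + max) / 2 can wrap for large indexes; this form cannot. */
		size_t mid = min + (max - min) / 2;

		if (table[mid] == key)
			return (int)mid;
		if (table[mid] < key)
			min = mid + 1;
		else
			max = mid;
	}
	return -1;
}
```

Because max is never smaller than min inside the loop, max - min cannot underflow, and adding half of it to min cannot exceed max.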
:
grep "/ 2;" *.c
grep "/ 2;" */*.c
grep "/2;" */*.c
Making this cleanup will prevent future review friction when a new
binary search is constructed based on existing code.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
builtin/ind
On 10/5/2017 6:00 AM, Jeff King wrote:
On Thu, Oct 05, 2017 at 06:48:10PM +0900, Junio C Hamano wrote:
Jeff King writes:
This is weirdly specific. Can we accomplish the same thing with existing
tools?
E.g., could:
git cat-file --batch-all-objects
On 10/3/2017 5:51 PM, Ramsay Jones wrote:
Signed-off-by: Ramsay Jones
---
Hi Derrick,
If you need to re-roll your 'ds/find-unique-abbrev-optim' branch,
could you please squash this into the relevant patch (commit 3792c78ba0,
"test-list-objects: list a subset of
On 10/4/2017 2:07 AM, Junio C Hamano wrote:
Derrick Stolee <dsto...@microsoft.com> writes:
- exists = has_sha1_file(sha1);
- while (len < GIT_SHA1_HEXSZ) {
- struct object_id oid_ret;
- status = get_short_oid(hex, len, &oid_ret, GET_OI
On 10/4/2017 2:10 AM, Junio C Hamano wrote:
Derrick Stolee <sto...@gmail.com> writes:
...
I understand that this patch on its own does not have good numbers. I
split the
patches 3 and 4 specifically to highlight two distinct changes:
Patch 3: Unroll the len loop that may inspect all
On 10/3/2017 11:55 AM, Stefan Beller wrote:
@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid,
void *cb_data)
return 0;
}
+static void find_abbrev_len_for_pack(struct packed_git *p,
+struct min_abbrev_data *mad)
+{
+
On 10/3/2017 6:49 AM, Junio C Hamano wrote:
Derrick Stolee <dsto...@microsoft.com> writes:
p0008.1: find_unique_abbrev() for existing objects
--
For 10 repeated tests, each checking 100,000 known objects, we find the
following result
o running in Windows Subsystem for Linux:
Pack Files: 50
Packed Objects: 22,385,898
Loose Objects: 492
Base Time: 38.9 s
New Time: 2.7 s
Rel %: -93.1%
Derrick Stolee (5):
test-list-objects: List a subset of object ids
p0008-abbrev.sh: Test find_unique_abbrev() perf
s
-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 57 ++---
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1ebe49 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -474,10 +
running in Windows Subsystem for Linux:
Pack Files: 50
Packed Objects: 22,385,898
Loose Objects: 492
Base Time: 3.91 s
New Time: 3.08 s
Rel %: -21.1%
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 14 --
1 file changed, 12 insertions
Time: 2.72 s
Rel %: -11.8
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 70 +
1 file changed, 66 insertions(+), 4 deletions(-)
diff --git a/sha1_name.c b/sha1_name.c
index 5081aeb71..54b3a37da
probability).
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Makefile | 1 +
t/helper/.gitignore| 1 +
t/helper/test-abbrev.c | 18 ++
t/perf/p0008-abbrev.sh | 22 ++
4 files changed, 42 insertions(+)
create mode 100644 t/
to avoid
clustering and therefore quite uniformly distributed.
If a command line argument "--missing" is given before the sample count,
then a list of OIDs is generated without examining the repo.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Makefile
My v3 patch is incoming, but I wanted to respond directly to this message.
On 9/25/2017 7:42 PM, Stefan Beller wrote:
On Mon, Sep 25, 2017 at 2:54 AM, Derrick Stolee <dsto...@microsoft.com> wrote:
Create get_hex_char_from_oid() to parse oids one hex character at a
time. This pr
Hi Junio,
On 9/29/2017 12:34 AM, Junio C Hamano wrote:
* ds/find-unique-abbrev-optim (2017-09-19) 4 commits
- SQUASH???
- sha1_name: parse less while finding common prefix
- sha1_name: unroll len loop in find_unique_abbrev_r()
- sha1_name: create perf test for find_unique_abbrev()
: 22,385,898
Loose Objects: 492
Base Time: 38.9 s
New Time: 2.7 s
Rel %: -93.1%
Derrick Stolee (5):
test-list-objects: List a subset of object ids
p0008-abbrev.sh: Test find_unique_abbrev() perf
sha1_name: Unroll len loop in find_unique_abbrev_r
sha1_name: Parse less while
running in Windows Subsystem for Linux:
Pack Files: 50
Packed Objects: 22,385,898
Loose Objects: 492
Base Time: 3.91 s
New Time: 3.08 s
Rel %: -21.1%
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
to avoid
clustering and therefore quite uniformly distributed.
If a second command line argument "--missing" is given, then a list of
OIDs is generated without examining the repo.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Makefile | 1 +
t/h
Time: 2.72 s
Rel %: -11.8
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 70 +
1 file changed, 66 insertions(+), 4 deletions(-)
diff --git a/sha1_name.c b/sha1_name.c
index bb47b6702..1566cd4fc
probability). For each test, use `sort -R`
to (deterministically) shuffle the sample of object ids to not check
abbreviations in lexicographic order.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Makefile | 1 +
t/helper/.gitignore| 1 +
t/helper/test-abbrev.
Stolee <dsto...@microsoft.com>
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 57 ++---
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1
On 9/17/2017 5:51 PM, Junio C Hamano wrote:
Derrick Stolee <dsto...@microsoft.com> writes:
+int cmd_main(int ac, const char **av)
+{
+ setup_git_directory();
As far as I recall, we do not (yet) allow declaration after
statement in our codebase. Move this down to make it aft
repos.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
Makefile | 1 +
t/helper/.gitignore| 1 +
t/helper/test-abbrev.c | 23 +++
t/perf/p0008-abbrev.sh | 12
4 files changed, 37 insertions(+)
create mode 100644 t/helper/test-ab
Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.
This change decreases the time to run test-abbrev by up to 40% on
large repos.
Signed-off-by: Derrick
Hello,
My name is Derrick Stolee and I just switched teams at Microsoft from
the VSTS Git Server to work on performance improvements in core Git.
This is my first patch submission, and I look forward to your feedback.
Thanks,
Stolee
When displaying object ids, we frequently want to see
.
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
sha1_name.c | 57 ++---
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1ebe49 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -