[PATCH v5 11/12] t1407: make hash size independent

2018-09-12 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1407-worktree-ref-store.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1407-worktree-ref-store.sh b/t/t1407-worktree-ref-store.sh
index 4623ae15c4..9a84858118 100755
--- a/t/t1407-worktree-ref-store.sh
+++ b/t/t1407-worktree-ref-store.sh
@@ -58,7 +58,7 @@ test_expect_success 'for_each_reflog()' '
mkdir -p .git/worktrees/wt/logs/refs/bisect &&
echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-   $RWT for-each-reflog | cut -c 42- | sort >actual &&
+   $RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
PSEUDO-WT 0x0
@@ -68,7 +68,7 @@ test_expect_success 'for_each_reflog()' '
EOF
test_cmp expected actual &&
 
-   $RMAIN for-each-reflog | cut -c 42- | sort >actual &&
+   $RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
PSEUDO-MAIN 0x0
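
As a small illustration of why the field-based cut is hash-agnostic (a
sketch, not part of the patch; the all-zeros ID below just stands in for a
40-character SHA-1):

    line="0000000000000000000000000000000000000000 HEAD 0x1"
    echo "$line" | cut -c 42-       # "HEAD 0x1", but only while IDs are 40 hex digits
    echo "$line" | cut -d" " -f 2-  # "HEAD 0x1" for any object ID length

With a 64-character SHA-256 ID the first form would start printing in the
middle of the ID, while the second keeps working unchanged.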


[PATCH v5 10/12] t1406: make hash-size independent

2018-09-12 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1406-submodule-ref-store.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index e093782cc3..d199d872fb 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -39,7 +39,7 @@ test_expect_success 'rename_refs() not allowed' '
 '
 
 test_expect_success 'for_each_ref(refs/heads/)' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
master 0x0
new-master 0x0
@@ -48,7 +48,7 @@ test_expect_success 'for_each_ref(refs/heads/)' '
 '
 
 test_expect_success 'for_each_ref() is sorted' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
sort actual > expected &&
test_cmp expected actual
 '
@@ -65,7 +65,7 @@ test_expect_success 'verify_ref(new-master)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-   $RUN for-each-reflog | sort | cut -c 42- >actual &&
+   $RUN for-each-reflog | sort | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
refs/heads/master 0x0


[PATCH v5 08/12] t1400: switch hard-coded object ID to variable

2018-09-12 Thread brian m. carlson
Switch a hard-coded all-zeros object ID to use a variable instead.

Signed-off-by: brian m. carlson 
---
 t/t1400-update-ref.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index 7c8df20955..6072650686 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -346,7 +346,7 @@ test_expect_success "verifying $m's log (logged by config)" '
 
 git update-ref $m $D
cat >.git/logs/$m <<EOF
-0000000000000000000000000000000000000000 $C $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 -0500
+$Z $C $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 -0500
 $C $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150350 -0500
 $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150380 -0500
 $F $Z $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150680 -0500


[PATCH v5 07/12] t1006: make hash size independent

2018-09-12 Thread brian m. carlson
Compute the size of the tree and commit objects we're creating by
checking for the size of an object ID and computing the resulting sizes
accordingly.

Signed-off-by: brian m. carlson 
---
 t/t1006-cat-file.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f19d591f2..a7c95bb785 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -140,15 +140,17 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
test_cmp expect actual
 '
 
+test_oid_init
+
 tree_sha1=$(git write-tree)
-tree_size=33
+tree_size=$(($(test_oid rawsz) + 13))
 tree_pretty_content="100644 blob $hello_sha1   hello"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
 commit_message="Initial commit"
commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
-commit_size=177
+commit_size=$(($(test_oid hexsz) + 137))
 commit_content="tree $tree_sha1
author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
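
For reference, here is where the two constants come from (a sketch based on
the single-entry tree and the commit this test creates; the byte counts
assume the test's fixed identities and dates):

    # tree: one entry "100644 hello\0" (13 bytes) plus the raw object ID
    echo "tree size: $((13 + $(test_oid rawsz)))"      # 33 for SHA-1, 45 for SHA-256
    # commit: fixed header and message add up to 137 bytes, plus one hex ID
    echo "commit size: $((137 + $(test_oid hexsz)))"   # 177 for SHA-1, 201 for SHA-256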


[PATCH v5 01/12] t: add test functions to translate hash-related values

2018-09-12 Thread brian m. carlson
Add several test functions to make working with various hash-related
values easier.

Add test_oid_init, which loads common hash-related constants and
placeholder object IDs from the newly added files in t/oid-info.
Provide values for these constants for both SHA-1 and SHA-256.

Add test_oid_cache, which accepts data on standard input in the form of
hash-specific key-value pairs that can be looked up later, using the
same format as the files in t/oid-info.  Document this format in a
t/oid-info/README file so that it's easier to use in the future.

Add test_oid, which is used to look up a per-hash value
(produced on standard output) based on the key specified as its
argument.  Usually the data to be looked up will be a hash-related
constant (such as the size of the hash in binary or hexadecimal), a
well-known or placeholder object ID (such as the all-zeros object ID or
one consisting of "deadbeef" repeated), or something similar.  For these
reasons, test_oid will usually be used within a command substitution.
Consequently, redirect the error output to standard error, since
otherwise it will not be displayed.

Add test_detect_hash, which currently only detects SHA-1, and
test_set_hash, which can be used to set a different hash algorithm for
test purposes.  In the future, test_detect_hash will learn to actually
detect the hash depending on how the testsuite is to be run.

Use the local keyword within these functions to avoid overwriting other
shell variables.  We have had a test balloon in place for a couple of
releases to catch shells that don't have this keyword and have not
received any reports of failure.  Note that the varying usages of local
used here are supported by all common open-source shells supporting the
local keyword.

Test these new functions as part of t0000, which also serves to
demonstrate basic usage of them.  In addition, add documentation on how
to format the lookup data and how to use the test functions.

Implement two basic lookup charts, one for common invalid or synthesized
object IDs, and one for various facts about the hash function in use.
Provide versions of the data for both SHA-1 and SHA-256.

Since we use shell variables for storage, names used for lookup can
currently consist only of shell identifier characters.  If this is a
problem in the future, we can hash the names before use.
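
As a rough sketch of how these pieces fit together in a test script (the
"obj" key and its deadbeef values here are illustrative, not entries taken
from t/oid-info):

test_oid_init

test_oid_cache <<\EOF
obj sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
obj sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
EOF

echo "object IDs here are $(test_oid hexsz) hex digits long"
oid=$(test_oid obj)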

Improved-by: Eric Sunshine 
Signed-off-by: Eric Sunshine 
Signed-off-by: brian m. carlson 
---
 t/README| 22 +
 t/oid-info/README   | 19 
 t/oid-info/hash-info|  8 +
 t/oid-info/oid  | 29 +
 t/t0000-basic.sh| 37 ++
 t/test-lib-functions.sh | 69 +
 6 files changed, 184 insertions(+)
 create mode 100644 t/oid-info/README
 create mode 100644 t/oid-info/hash-info
 create mode 100644 t/oid-info/oid

diff --git a/t/README b/t/README
index 9028b47d92..eed574dd9b 100644
--- a/t/README
+++ b/t/README
@@ -806,6 +806,28 @@ library for your script to use.
the symbolic link in the file system and a part that does; then only
the latter part need be protected by a SYMLINKS prerequisite (see below).
 
+ - test_oid_init
+
+   This function loads facts and useful object IDs related to the hash
+   algorithm(s) in use from the files in t/oid-info.
+
+ - test_oid_cache
+
+   This function reads per-hash algorithm information from standard
+   input (usually a heredoc) in the format described in
+   t/oid-info/README.  This is useful for test-specific values, such as
+   object IDs, which must vary based on the hash algorithm.
+
+   Certain fixed values, such as hash sizes and common placeholder
+   object IDs, can be loaded with test_oid_init (described above).
+
+ - test_oid <key>
+
+   This function looks up a value for the hash algorithm in use, based
+   on the key given.  The value must have been loaded using
+   test_oid_init or test_oid_cache.  Providing an unknown key is an
+   error.
+
 Prerequisites
 -
 
diff --git a/t/oid-info/README b/t/oid-info/README
new file mode 100644
index 0000000000..27f843fc00
--- /dev/null
+++ b/t/oid-info/README
@@ -0,0 +1,19 @@
+This directory contains various per-hash values that are used in the testsuite.
+
+Each file contains lines containing a key-value pair; blank lines and lines
+starting with `#` are ignored.  The key and value are separated by whitespace
+(specifically, those whitespace characters in the default `$IFS`).  The key consists only
+of shell identifier characters, and the value consists of a hash algorithm,
+colon, and value.  The hash algorithm also consists only of shell identifier
+characters; it should match the value in sha1-file.c.
+
+For example, the following lines map the key "rawsz" to "20" if SHA-1 is in use
+and to "32" if SHA-256 is in use:
+
+
+rawsz sha1:20
+rawsz sha256:32
+
+
+The keys and values used here are loaded by `test_oid_init` (see the README file
+in the "t" 

[PATCH v5 06/12] t0064: make hash size independent

2018-09-12 Thread brian m. carlson
Compute test values of the appropriate size instead of hard-coding
40-character values.  Rename the echo20 function to echoid, since the
values may be of varying sizes.

Signed-off-by: brian m. carlson 
---
 t/t0064-sha1-array.sh | 49 ---
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
index 67484502a0..5dda570b9a 100755
--- a/t/t0064-sha1-array.sh
+++ b/t/t0064-sha1-array.sh
@@ -3,30 +3,30 @@
 test_description='basic tests for the SHA1 array implementation'
 . ./test-lib.sh
 
-echo20 () {
+echoid () {
prefix="${1:+$1 }"
shift
while test $# -gt 0
do
-   echo "$prefix$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1"
+   echo "$prefix$ZERO_OID" | sed -e "s/00/$1/g"
shift
done
 }
 
 test_expect_success 'ordered enumeration' '
-   echo20 "" 44 55 88 aa >expect &&
+   echoid "" 44 55 88 aa >expect &&
{
-   echo20 append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
echo for_each_unique
} | test-tool sha1-array >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'ordered enumeration with duplicate suppression' '
-   echo20 "" 44 55 88 aa >expect &&
+   echoid "" 44 55 88 aa >expect &&
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
echo for_each_unique
} | test-tool sha1-array >actual &&
test_cmp expect actual
@@ -34,8 +34,8 @@ test_expect_success 'ordered enumeration with duplicate suppression' '
 
 test_expect_success 'lookup' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 55
+   echoid append 88 44 aa 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -eq 1
@@ -43,8 +43,8 @@ test_expect_success 'lookup' '
 
 test_expect_success 'lookup non-existing entry' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 33
+   echoid append 88 44 aa 55 &&
+   echoid lookup 33
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -lt 0
@@ -52,9 +52,9 @@ test_expect_success 'lookup non-existing entry' '
 
 test_expect_success 'lookup with duplicates' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 55
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -ge 2 &&
@@ -63,19 +63,24 @@ test_expect_success 'lookup with duplicates' '
 
 test_expect_success 'lookup non-existing entry with duplicates' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 66
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid lookup 66
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -lt 0
 '
 
 test_expect_success 'lookup with almost duplicate values' '
+   # n-1 5s
+   root=$(echoid "" 55) &&
+   root=${root%5} &&
{
-   echo "append " &&
-   echo "append 555f" &&
-   echo20 lookup 55
+   id1="${root}5" &&
+   id2="${root}f" &&
+   echo "append $id1" &&
+   echo "append $id2" &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -eq 0
@@ -83,8 +88,8 @@ test_expect_success 'lookup with almost duplicate values' '
 
 test_expect_success 'lookup with single duplicate value' '
{
-   echo20 append 55 55 &&
-   echo20 lookup 55
+   echoid append 55 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -ge 0 &&


[PATCH v5 02/12] t0000: use hash translation table

2018-09-12 Thread brian m. carlson
If the hash we're using is 32 bytes in size, attempting to insert a
20-byte object name won't work.  Since these are synthesized objects
that are almost all zeros, look them up in a translation table.

Signed-off-by: brian m. carlson 
---
 t/t0000-basic.sh | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index b94221b951..a9dc534048 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -1015,12 +1015,13 @@ test_expect_success SHA1 'validate object ID for a known tree' '
 
 test_expect_success 'put invalid objects into the index' '
rm -f .git/index &&
-   cat >badobjects <<-\EOF &&
-   100644 blob 1000000000000000000000000000000000000000	dir/file1
-   100644 blob 2000000000000000000000000000000000000000	dir/file2
-   100644 blob 3000000000000000000000000000000000000000	dir/file3
-   100644 blob 4000000000000000000000000000000000000000	dir/file4
-   100644 blob 5000000000000000000000000000000000000000	dir/file5
+   suffix=$(echo $ZERO_OID | sed -e "s/^.//") &&
+   cat >badobjects <<-EOF &&
+   100644 blob $(test_oid 001) dir/file1
+   100644 blob $(test_oid 002) dir/file2
+   100644 blob $(test_oid 003) dir/file3
+   100644 blob $(test_oid 004) dir/file4
+   100644 blob $(test_oid 005) dir/file5
EOF
git update-index --index-info <badobjects
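
For context: the 001..005 keys above resolve, through the translation
table, to hash-sized placeholder IDs, so the heredoc no longer bakes in a
40-character assumption.  Entries for such keys follow the t/oid-info
format described in patch 01/12; hypothetically (illustrative values only,
the real ones live in t/oid-info/oid):

    001 sha1:0000000000000000000000000000000000000001
    001 sha256:0000000000000000000000000000000000000000000000000000000000000001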

[PATCH v5 12/12] t5318: use test_oid for HASH_LEN

2018-09-12 Thread brian m. carlson
From: Derrick Stolee 

Signed-off-by: Derrick Stolee 
Signed-off-by: brian m. carlson 
---
 t/t5318-commit-graph.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0c500f7ca2..75fe09521f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -8,7 +8,8 @@ test_expect_success 'setup full repo' '
cd "$TRASH_DIRECTORY/full" &&
git init &&
git config core.commitGraph true &&
-   objdir=".git/objects"
+   objdir=".git/objects" &&
+   test_oid_init
 '
 
 test_expect_success 'verify graph with no graph file' '
@@ -273,7 +274,7 @@ test_expect_success 'git commit-graph verify' '
 
 NUM_COMMITS=9
 NUM_OCTOPUS_EDGES=2
-HASH_LEN=20
+HASH_LEN="$(test_oid rawsz)"
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6


[PATCH v5 03/12] t0000: update tests for SHA-256

2018-09-12 Thread brian m. carlson
Test t tests the "basics of the basics" and as such, checks that we
have various fixed hard-coded object IDs.  The tests relying on these
assertions have been marked with the SHA1 prerequisite, as they will
obviously not function in their current form with SHA-256.

Use the test_oid helper to update these assertions and provide values
for both SHA-1 and SHA-256.

These object IDs were synthesized using a set of scripts that created
the objects for both SHA-1 and SHA-256 using the same method to ensure
that they are indeed the correct values.

Signed-off-by: brian m. carlson 
---
 t/t0000-basic.sh | 163 +--
 1 file changed, 102 insertions(+), 61 deletions(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index a9dc534048..391f910c6a 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -861,6 +861,47 @@ test_expect_success 'test_oid can look up data for SHA-256' '
 
 # Basics of the basics
 
+test_oid_cache <<\EOF
+path0f sha1:f87290f8eb2cbbea7857214459a0739927eab154
+path0f sha256:638106af7c38be056f3212cbd7ac65bc1bac74f420ca5a436ff006a9d025d17d
+
+path0s sha1:15a98433ae33114b085f3eb3bb03b832b3180a01
+path0s sha256:3a24cc53cf68edddac490bbf94a418a52932130541361f685df685e41dd6c363
+
+path2f sha1:3feff949ed00a62d9f7af97c15cd8a30595e7ac7
+path2f sha256:2a7f36571c6fdbaf0e3f62751a0b25a3f4c54d2d1137b3f4af9cb794bb498e5f
+
+path2s sha1:d8ce161addc5173867a3c3c730924388daedbc38
+path2s sha256:18fd611b787c2e938ddcc248fabe4d66a150f9364763e9ec133dd01d5bb7c65a
+
+path2d sha1:58a09c23e2ca152193f2786e06986b7b6712bdbe
+path2d sha256:00e4b32b96e7e3d65d79112dcbea53238a22715f896933a62b811377e2650c17
+
+path3f sha1:0aa34cae68d0878578ad119c86ca2b5ed5b28376
+path3f sha256:09f58616b951bd571b8cb9dc76d372fbb09ab99db2393f5ab3189d26c45099ad
+
+path3s sha1:8599103969b43aff7e430efea79ca4636466794f
+path3s sha256:fce1aed087c053306f3f74c32c1a838c662bbc4551a7ac2420f5d6eb061374d0
+
+path3d sha1:21ae8269cacbe57ae09138dcc3a2887f904d02b3
+path3d sha256:9b60497be959cb830bf3f0dc82bcc9ad9e925a24e480837ade46b2295e47efe1
+
+subp3f sha1:00fb5908cb97c2564a9783c0c64087333b3b464f
+subp3f sha256:a1a9e16998c988453f18313d10375ee1d0ddefe757e710dcae0d66aa1e0c58b3
+
+subp3s sha1:6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c
+subp3s sha256:81759d9f5e93c6546ecfcadb560c1ff057314b09f93fe8ec06e2d8610d34ef10
+
+subp3d sha1:3c5e5399f3a333eddecce7a9b9465b63f65f51e2
+subp3d sha256:76b4ef482d4fa1c754390344cf3851c7f883b27cf9bc999c6547928c46aeafb7
+
+root sha1:087704a96baf1c2d1c869a8b084481e121c88b5b
+root sha256:9481b52abab1b2ffeedbf9de63ce422b929f179c1b98ff7bee5f8f1bc0710751
+
+simpletree sha1:7bb943559a305bdd6bdee2cef6e5df2413c3d30a
+simpletree sha256:1710c07a6c86f9a3c7376364df04c47ee39e5a5e221fcdd84b743bc9bb7e2bc5
+EOF
+
 # updating a new file without --add should fail.
 test_expect_success 'git update-index without --add should fail adding' '
test_must_fail git update-index should-be-empty
@@ -876,8 +917,8 @@ test_expect_success 'writing tree out with git write-tree' '
 '
 
 # we know the shape and contents of the tree and know the object ID for it.
-test_expect_success SHA1 'validate object ID of a known tree' '
-   test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a
+test_expect_success 'validate object ID of a known tree' '
+   test "$tree" = "$(test_oid simpletree)"
 '
 
 # Removing paths.
@@ -919,16 +960,16 @@ test_expect_success 'showing stage with git ls-files --stage' '
git ls-files --stage >current
 '
 
-test_expect_success SHA1 'validate git ls-files output for a known tree' '
-   cat >expected <<-\EOF &&
-   100644 f87290f8eb2cbbea7857214459a0739927eab154 0   path0
-   120000 15a98433ae33114b085f3eb3bb03b832b3180a01 0   path0sym
-   100644 3feff949ed00a62d9f7af97c15cd8a30595e7ac7 0   path2/file2
-   120000 d8ce161addc5173867a3c3c730924388daedbc38 0   path2/file2sym
-   100644 0aa34cae68d0878578ad119c86ca2b5ed5b28376 0   path3/file3
-   120000 8599103969b43aff7e430efea79ca4636466794f 0   path3/file3sym
-   100644 00fb5908cb97c2564a9783c0c64087333b3b464f 0   path3/subp3/file3
-   120000 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c 0   path3/subp3/file3sym
+test_expect_success 'validate git ls-files output for a known tree' '
+   cat >expected <<-EOF &&
+   100644 $(test_oid path0f) 0 path0
+   120000 $(test_oid path0s) 0 path0sym
+   100644 $(test_oid path2f) 0 path2/file2
+   120000 $(test_oid path2s) 0 path2/file2sym
+   100644 $(test_oid path3f) 0 path3/file3
+   120000 $(test_oid path3s) 0 path3/file3sym
+   100644 $(test_oid subp3f) 0 path3/subp3/file3
+   120000 $(test_oid subp3s) 0 path3/subp3/file3sym
EOF
test_cmp expected current
 '
@@ -937,20 +978,20 @@ test_expect_success 'writing tree out with git write-tree' '
tree=$(git write-tree)
 '
 

[PATCH v5 00/12] Hash-independent tests (part 3)

2018-09-12 Thread brian m. carlson
This is the next in the series of improvements to make tests
hash-independent.

A range-diff is below.

Changes from v4:
* Add local statements to the &&-chain.
* Fix a typo in the local statement.
* Add a helpful comment about why test_detect_hash is hard-coded to
  SHA-1.

Changes from v3:
* Update t/README to specify new helpers.
* Escape "$" in double quotes.
* Add documentation comments to test helpers.
* Use local instead of underscores in variable names.
* Use expr instead of egrep to match hash algorithm names.
* Improve clarity of unset variable check in test_oid.
* Wrap test_oid_init in a test_expect_success.
* Improved && chain handling in test functions.
* Add test_detect_hash in test_oid_init.
* Clean up extra blank line.
* Add patch from Derrick Stolee for t5318, modified slightly.

Changes from v2:
* Fix a typo in "zero_2".
* Provide better matching of expected output.
* Add and use test_oid_init instead of filename-based test_oid_cache.
* Add test_set_hash.
* Provide better error checking in newly added test functions.
* Move t0000 constants into the test, removing the separate file.
* Switch to using a differently named temporary file in t0027.

Changes from v1:
* Adopt pure shell approach for helper.
* Add tests for the helpers.
* Explicitly refer to SHA-256 now that we know it will be NewHash.
* Updated t0000 to remove SHA1 prerequisite.
* Change name of helper from test_translate to test_oid.
* Add helper to cache information in the shell.
* Simplified lookup of HEAD in t0002.
* Switched to using existing helper function in t0027.
* Simplified handling of IDs in t0064.

Derrick Stolee (1):
  t5318: use test_oid for HASH_LEN

brian m. carlson (11):
  t: add test functions to translate hash-related values
  t0000: use hash translation table
  t0000: update tests for SHA-256
  t0002: abstract away SHA-1 specific constants
  t0027: make hash size independent
  t0064: make hash size independent
  t1006: make hash size independent
  t1400: switch hard-coded object ID to variable
  t1405: make hash size independent
  t1406: make hash-size independent
  t1407: make hash size independent

 t/README   |  22 
 t/oid-info/README  |  19 +++
 t/oid-info/hash-info   |   8 ++
 t/oid-info/oid |  29 +
 t/t0000-basic.sh   | 213 ++---
 t/t0002-gitfile.sh |  27 +++--
 t/t0027-auto-crlf.sh   |   3 +-
 t/t0064-sha1-array.sh  |  49 
 t/t1006-cat-file.sh|   6 +-
 t/t1400-update-ref.sh  |   2 +-
 t/t1405-main-ref-store.sh  |   4 +-
 t/t1406-submodule-ref-store.sh |   6 +-
 t/t1407-worktree-ref-store.sh  |   4 +-
 t/t5318-commit-graph.sh|   5 +-
 t/test-lib-functions.sh|  69 +++
 15 files changed, 352 insertions(+), 114 deletions(-)
 create mode 100644 t/oid-info/README
 create mode 100644 t/oid-info/hash-info
 create mode 100644 t/oid-info/oid

Range-diff against v4:
 1:  fd13b542e7 !  1:  831a0df666 t: add test functions to translate hash-related values
@@ -222,6 +222,8 @@
 +
 +# Detect the hash algorithm in use.
 +test_detect_hash () {
++  # Currently we only support SHA-1, but in the future this function will
++  # actually detect the algorithm in use.
 +  test_hash_algo='sha1'
 +}
 +
@@ -241,9 +243,9 @@
 +# rawsz sha1:20
 +# rawsz sha256:32
 +test_oid_cache () {
-+  local tag reset k v
++  local tag rest k v &&
 +
-+  test -n "$test_hash_algo" || test_detect_hash &&
++  { test -n "$test_hash_algo" || test_detect_hash; } &&
 +  while read tag rest
 +  do
 +  case $tag in
@@ -271,7 +273,7 @@
+# Look up a per-hash value based on a key ($1).  The value must have been loaded
 +# by test_oid_init or test_oid_cache.
 +test_oid () {
-+  local var="test_oid_${test_hash_algo}_$1"
++  local var="test_oid_${test_hash_algo}_$1" &&
 +
 +  # If the variable is unset, we must be missing an entry for this
 +  # key-hash pair, so exit with an error.
 2:  335c75e1ec =  2:  75eb36456b t: use hash translation table
 3:  257b458ad9 =  3:  d1cebc5dba t: update tests for SHA-256
 4:  69080d3bfc =  4:  a75f4a049e t0002: abstract away SHA-1 specific constants
 5:  a4297d7ae8 =  5:  ab8598048e t0027: make hash size independent
 6:  7cf1221274 =  6:  ed540a08e8 t0064: make hash size independent
 7:  7e5c857c25 =  7:  e221995a27 t1006: make hash size independent
 8:  9271787d4b =  8:  3a168746ff t1400: switch hard-coded object ID to variable
 9:  43bd41156a =  9:  3e55ccdd15 t1405: make hash size independent
10:  875bbd7fc5 = 10:  df7d4eb112 t1406: make hash-size independent
11:  a9c2d31089 = 11:  b1a04c1986 t1407: make hash size independent
12:  10452cc352 = 12:  63d21a1eca t5318: use test_oid for HASH_LEN


[PATCH v5 04/12] t0002: abstract away SHA-1 specific constants

2018-09-12 Thread brian m. carlson
Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes.

Signed-off-by: brian m. carlson 
---
 t/t0002-gitfile.sh | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 3691023d51..0aa9908ea1 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -92,11 +92,12 @@ test_expect_success 'enter_repo non-strict mode' '
mv .git .realgit &&
echo "gitdir: .realgit" >.git
) &&
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote enter_repo >actual &&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9	HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '
@@ -106,21 +107,23 @@ test_expect_success 'enter_repo linked checkout' '
cd enter_repo &&
git worktree add  ../foo refs/tags/foo
) &&
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote foo >actual &&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9	HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '
 
 test_expect_success 'enter_repo strict mode' '
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote --upload-pack="git upload-pack --strict" foo/.git >actual &&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9	HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9	refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '


[PATCH v5 05/12] t0027: make hash size independent

2018-09-12 Thread brian m. carlson
We transform various object IDs into all-zero object IDs for comparison.
Adjust the length as well so that this works for all hash algorithms.

While it would be slightly more efficient to use a sed y/// expression
instead of both the tr and sed, retain the tr in both cases for
parallelism and compactness.

Signed-off-by: brian m. carlson 
---
 t/t0027-auto-crlf.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index beb5927f77..dfb46bbe86 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -15,7 +15,8 @@ compare_ws_file () {
pfx=$1
exp=$2.expect
act=$pfx.actual.$3
-   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
+   tr '\015\000abcdef0123456789' QN0 <"$2" |
+   sed -e "s/*/$ZERO_OID/" >"$exp" &&
tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
test_cmp "$exp" "$act" &&
rm "$exp" "$act"


[PATCH v5 09/12] t1405: make hash size independent

2018-09-12 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1405-main-ref-store.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index a74c38b5fb..331899ddc4 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -54,7 +54,7 @@ test_expect_success 'for_each_ref(refs/heads/)' '
 '
 
 test_expect_success 'for_each_ref() is sorted' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
sort actual > expected &&
test_cmp expected actual
 '
@@ -71,7 +71,7 @@ test_expect_success 'verify_ref(new-master)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-   $RUN for-each-reflog | sort -k2 | cut -c 42- >actual &&
+   $RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
refs/heads/master 0x0


Re: Multiple GIT Accounts & HTTPS Client Certificates - Config

2018-09-12 Thread brian m. carlson
On Tue, Sep 11, 2018 at 09:42:35AM +0200, Sergei Haller wrote:
> PS: while I was trying to send the mail to this mailing list, there is
> some update from the stack overflow side:
> 
> * I am using sslBackend schannel
> * the private key of my client certificate can be provided by using
> the http.sslKey config option
> * the private key is on a smart card, so there is no way I can copy it
> over to a file and use the openssl backend (at least no way that I am
> aware of :)
> 
> so basically this pins down to the fact that schannel implementation
> is picking the wrong key.

Git doesn't provide an option to query a smartcard outside of what your
typical TLS implementation provides by default.  Not being a Windows
user, I don't know if Schannel provides a way to automatically select
the right private key based on which certificate is used, or if that's
outside its capability.

> PS: the recent realization makes me believe this is a window specific
> problem. I think I have read somewhere
> about a separate windows mailing list (but not sure where I saw it)

If you're talking about Git for Windows, yes, this is probably better
addressed as an issue there: https://github.com/git-for-windows/git.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Compliment of the day to you Dear Friend.

2018-09-12 Thread Mrs. Amina Kadi
 Compliment of the day to you Dear Friend.

Dear Friend.
 
  I am Mrs. Amina Kadi. am sending this brief letter to solicit your
partnership to transfer $5.5 million US Dollars. I shall send you
more information and procedures when I receive positive response from
you.

Mrs. Amina Kadi


Re: [PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-12 Thread Johannes Schindelin
Hi Thomas,

[quickly, as I will go back to a proper vacation after this]

On Wed, 12 Sep 2018, Thomas Gummerer wrote:

> diff --git a/linear-assignment.c b/linear-assignment.c
> index 9b3e56e283..7700b80eeb 100644
> --- a/linear-assignment.c
> +++ b/linear-assignment.c
> @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
>   else if (j1 < -1)
>   row2column[i] = -2 - j1;
>   else {
> - int min = COST(!j1, i) - v[!j1];
> - for (j = 1; j < column_count; j++)
> + int min = INT_MAX;

I am worried about this, as I tried very hard to avoid integer overruns.

Wouldn't it be possible to replace the `else {` by an appropriate `else if
(...) { ... } else {`? E.g. `else if (column_count < 2)` or some such?

Ciao,
Dscho

> + for (j = 0; j < column_count; j++)
>   if (j != j1 && min > COST(j, i) - v[j])
>   min = COST(j, i) - v[j];
>   v[j1] -= min;
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 2237c7f4af..fb4c13a84a 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'no commits on one side' '
> + git commit --amend -m "new message" &&
> + git range-diff master HEAD@{1} HEAD
> +'
> +
>  test_done
> -- 
> 2.19.0.397.gdd90340f6a
> 
> 


[ANNOUNCE] Git for Windows 2.19.0

2018-09-12 Thread Johannes Schindelin
Dear Git users,

It is my pleasure to announce that Git for Windows 2.19.0 is available from:

https://gitforwindows.org/

Changes since Git for Windows v2.18.0 (June 22nd 2018)

New Features

  * Comes with Git v2.19.0.
  * There are now fast, built-in versions of git stash and git rebase,
available as experimental options.
  * The included OpenSSH client now enables modern ciphers.
  * The gitweb component was removed because it is highly unlikely to
be used on Windows.
  * The git archimport tool (which was probably used by exactly 0
users) is no longer included in Git for Windows.
  * Comes with tig v2.4.0.
  * Comes with Git LFS v2.5.1.
  * Comes with Git Credential Manager v1.17.1.
  * Comes with OpenSSL v1.0.2p.
  * Comes with cURL v7.61.1.
  * Comes with mingw-w64-nodejs v8.12.0.

Bug Fixes

  * The http.schannel.checkRevoke setting (which never worked) was
renamed to http.schannelCheckRevoke. In the same run,
http.schannel.useSSLCAInfo (which also did not work, for the same
reason) was renamed to http.schannelUseSSLCAInfo.
  * Avoids a stack overflow with recent Windows Insider versions.
  * Git GUI now handles hooks correctly in worktrees other than the
main one.
  * When using core.autocrlf, the bogus "LF will be replaced by CRLF"
warning is now suppressed.
  * The funny fatal error -cmalloc would have returned NULL problems
should be gone.

Filename | SHA-256
-------- | -------
Git-2.19.0-64-bit.exe | 2dd5824e29ca44e86016cdb3dae8446cb5b80b77f596b67e3d8754451fa4bbcb
Git-2.19.0-32-bit.exe | 66b29edc994838586e141b07394900426d59d86a3126cddc04d9ab26032eb7b4
PortableGit-2.19.0-64-bit.7z.exe | 1b8761ae57f589890a83995d0da7891efbddfee14e9f0c3ffda91f6add5b9351
PortableGit-2.19.0-32-bit.7z.exe | edaeb1962b7620bff33d45889eb4bcb6e4ac95021042910871b6025515785c16
MinGit-2.19.0-64-bit.zip | 424d24b5fc185a9c5488d7872262464f2facab4f1d4693ea8008196f14a3c19b
MinGit-2.19.0-32-bit.zip | 83cf018bd6f5c24e2b3088539bbeef9067fd632087d094d447a3a0ff676e7bd7
MinGit-2.19.0-busybox-64-bit.zip | 128b355e4321dbaf699ff4994ddbb6cde92783f4930be50aad507f6d8dbec0d0
MinGit-2.19.0-busybox-32-bit.zip | 2695578e2c53f32194986456ce714388149d051041f638e9ed43dd8342465fb2
Git-2.19.0-64-bit.tar.bz2 | c3ca6f9f2b6b800ade80ee4973cdaed9869b8c2c684c9a6a0940d5fb1e18bef7
Git-2.19.0-32-bit.tar.bz2 | 6f8426646e68ea2689b6a1cfc09ba221805f21540e7dc7af00335f27d3b333b8
pdbs-for-git-64-bit-2.19.0.1.d96bb8bc6c-1.zip | 5f4417ab5b362823b416ead4960e8050682349b603af9ef653bacbbb7eea408a
pdbs-for-git-32-bit-2.19.0.1.d96bb8bc6c-1.zip | 157f78f63d8df39b1ae6dedfcafbe1a9093a1fcc4768b123dae12ee2ae8c82c5

Ciao,
Johannes


Re: [Question] Signature calculation ignoring parts of binary files

2018-09-12 Thread Junio C Hamano
"Randall S. Becker"  writes:

>> author is important to our process. My objective is to keep the original file
>> 100% exact as supplied and then ignore any changes to the metadata that I
>> don't care about (like Creator) if the remainder of the file is the same.

That will *not* work.  If person A gave you a version of original,
which hashes to X after you strip the cruft you do not care about,
you would register that original with person A's fingerprint on
under the name of X.  What happens when person B gives you another
version, which is not byte-for-byte identical to the one you got
earlier from person A, but does hash to the same X after you strip
the cruft?  If you are going to store it in Git, and if by SHA-1 you
are calling what we perceive as "object name" in Git land, you must
store that one with person B's fingerprint on it also under the name
of X.  Now which version will you get from Git when you ask it to
give you the object that hashes to X?  


Re: [PATCH 1/1] contrib: add coverage-diff script

2018-09-12 Thread Junio C Hamano
Junio C Hamano  writes:

> "Derrick Stolee via GitGitGadget"  writes:
>
>>  contrib/coverage-diff.sh | 70 
>>  1 file changed, 70 insertions(+)
>>  create mode 100755 contrib/coverage-diff.sh
>
> I fully appreciate the motivation.  But it is a bit sad that this
> begins with "#!/bin/bash" but it seems that the script is full of
> bash-isms.  I haven't gone through the script to see if these are
> inevitable or gratuitous yet, but I'd assume it made it easier for
> you to write it to step outside the pure POSIX shell?
> ...
>> +elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
>> +path=${BASH_REMATCH[2]}
>
> OK, it probably is easier to write in bash than using expr if you
> want to do regexp.

Just to clarify. I am saying that it is OK to give up writing in
pure POSIX and relying on bash-isms after seeing these lines.


Re: [PATCH] linear-assignment: fix potential out of bounds memory access

2018-09-12 Thread Thomas Gummerer
On 09/12, Junio C Hamano wrote:
> Thomas Gummerer  writes:

> > --- >8 ---
> >
> > Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
> >
> > Currently the 'compute_assignment()' function may read memory out
> > of bounds, even if used correctly.  Namely this happens when we only
> > have one column.  In that case we try to calculate the initial
> > minimum cost using '!j1' as column in the reduction transfer code.
> > That in turn causes us to try and get the cost from column 1 in the
> > cost matrix, which does not exist, and thus results in an out of
> > bounds memory read.
> 
> This nicely explains what goes wrong.
> 
> > Instead of trying to initialize the minimum cost from another column,
> > just set it to INT_MAX.  This also matches what the example code in the
> > original paper for the algorithm [1] does (it initializes the value to
> > inf, for which INT_MAX is the closest match in C).
> 
> Yeah, if we really want to avoid INT_MAX we could use another "have
> we found any value yet?" boolean variable, but the caller in
> get_correspondences() does not even worry about integer overflows
> when stuffing diffsize to the cost[] array, and the other possible
> value that can go to cost[] array is COST_MAX that is mere 65k, so
> it would be OK to use INT_MAX as sentinel here, I guess.

Right, I'm not sure it would be worth introducing another boolean
variable here.  In the normal case we'll always enter the if condition
inside the loop, and set a reasonable 'min' value.

That does not happen if we only have one column, and the 'min' will
remain 'INT_MAX'.  Now in that case it doesn't matter much, as having
only one column means there's no possibility to assign anything, so
the actual values shouldn't matter (at least that's my understanding
of the algorithm so far).

Another improvement we may be able to make here is to not even try to
compute the assignment if there's only one column for that reason, but
I'm out of time today and the rest of my week looks a bit busy, so I
probably won't get to do anything before the beginning of next week.

> > Note that the test only fails under valgrind on Linux, but the same
> > command has been reported to segfault on Mac OS.
> >
> > Also start from 0 in the loop, which matches what the example code in
> > the original paper does as well.  Starting from 1 means we'd ignore
> > the first column during the reduction transfer phase.  Note that in
> > the original paper the loop does start from 1, but the implementation
> > is in Pascal, where arrays are 1 indexed.
> >
> > [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
> >  algorithm for dense and sparse linear assignment
> >  problems. Computing, 38(4), 325–340.
> >
> > Reported-by: ryenus 
> > Helped-by: Derrick Stolee 
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  linear-assignment.c   | 4 ++--
> >  t/t3206-range-diff.sh | 5 +
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
> > else if (j1 < -1)
> > row2column[i] = -2 - j1;
> > else {
> > -   int min = COST(!j1, i) - v[!j1];
> > -   for (j = 1; j < column_count; j++)
> > +   int min = INT_MAX;
> > +   for (j = 0; j < column_count; j++)
> > if (j != j1 && min > COST(j, i) - v[j])
> > min = COST(j, i) - v[j];
> > v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done


Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2

2018-09-12 Thread Stefan Beller
On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon  wrote:
>
> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c  |  8 +++-
>  http-backend.c | 10 +-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 73831887d..5fa75b3f7 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
> status = packet_reader_read();
> if (status != PACKET_READ_FLUSH)
> die(_("git archive: expected a flush"));
> -   }
> +   } else if (version == protocol_v2 &&
> +  starts_with(transport->url, "http"))

I assume this is a smart way to cover both http and https
as the latter is prefixed by the former.

How does this interact with remote helpers
(See gitremote-helpers(1), which doesn't mention archives,
but I would imagine that one would be able to use a remote
helper eventually, too?

git archive --remote=abc://example.org/test 

> +   /*
> +* Commands over HTTP require two requests, so there's an
> +* additional server response to parse.
> +*/
> +   discover_version();
>
> /* Now, start reading from fd[0] and spit it out to stdout */
> rv = recv_sideband("archive", fd[0], 1);
> diff --git a/http-backend.c b/http-backend.c
> index 458642ef7..d62d583c7 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
> { "upload-pack", "uploadpack", 1, 1 },
> { "receive-pack", "receivepack", 0, -1 },
> +   { "upload-archive", "uploadarchive", 1, 1 },

So git archives are not supported in protocol v0 via http?
Then it makes sense to see not mention of archives in
the remote helpers as well.

The rest of the code is a surprisingly small patch.

Stefan


Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-12 Thread Thomas Gummerer
On 09/12, Ben Peart wrote:
> Teach get_index_format_default() to support running the test suite
> with specific index versions.  In particular, this enables the test suite
> to be run using index version 4 which is not the default so gets less testing.

I found this commit message slightly misleading.  Running the test
suite with specific index versions is already supported, by defining
TEST_GIT_INDEX_VERSION in 'config.mak'.  What we're doing here is
introduce an additional environment variable that can also be used to
set the index format in tests.

Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the
test suite with index-v4.  Admittedly the name is a bit strange
compared to our usual GIT_TEST_* environment variable names, and it
should probably be documented better (it's only documented in the
Makefile currently), but I'm not sure we should introduce another
environment variable for this purpose?
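
For reference, the existing knob can be set either way, e.g. (sketch):

    # config.mak
    TEST_GIT_INDEX_VERSION = 4

    # or one-off on the command line
    make TEST_GIT_INDEX_VERSION=4 test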

> Signed-off-by: Ben Peart 
> ---
> 
> Notes:
> Base Ref: v2.19.0
> Web-Diff: https://github.com/benpeart/git/commit/52e733e2ce
> Checkout: git fetch https://github.com/benpeart/git 
> git-test-index-version-v1 && git checkout 52e733e2ce
> 
>  read-cache.c | 47 +--
>  t/README |  6 +-
>  2 files changed, 38 insertions(+), 15 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 7b1354d759..d140ce9989 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1570,26 +1570,45 @@ static unsigned int get_index_format_default(void)
>   char *envversion = getenv("GIT_INDEX_VERSION");
>   char *endp;
>   int value;
> - unsigned int version = INDEX_FORMAT_DEFAULT;
> + unsigned int version = -1;
> +
> + if (envversion) {
> + version = strtoul(envversion, , 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) 
> {
> + warning(_("GIT_INDEX_VERSION set, but the value is 
> invalid.\n"
> + "Using version %i"), INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
>  
> - if (!envversion) {
> - if (!git_config_get_int("index.version", ))
> + if (version == -1) {
> + if (!git_config_get_int("index.version", )) {
>   version = value;
> - if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("index.version set, but the value is 
> invalid.\n"
> -   "Using version %i"), INDEX_FORMAT_DEFAULT);
> - return INDEX_FORMAT_DEFAULT;
> + if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
> version) {
> + warning(_("index.version set, but the value is 
> invalid.\n"
> + "Using version %i"), 
> INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
>   }
> - return version;
>   }
>  
> - version = strtoul(envversion, , 10);
> - if (*endp ||
> - version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
> -   "Using version %i"), INDEX_FORMAT_DEFAULT);
> - version = INDEX_FORMAT_DEFAULT;
> + if (version == -1) {
> + envversion = getenv("GIT_TEST_INDEX_VERSION");
> + if (envversion) {
> + version = strtoul(envversion, , 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
> version) {
> + warning(_("GIT_TEST_INDEX_VERSION set, but the 
> value is invalid.\n"
> + "Using version %i"), 
> INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
>   }
> +
> + if (version == -1)
> + version = INDEX_FORMAT_DEFAULT;
> +
>   return version;
>  }
>  
> diff --git a/t/README b/t/README
> index 9028b47d92..f872638a78 100644
> --- a/t/README
> +++ b/t/README
> @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the object size is
>  over 2GB. This variable forces the code path on any object larger than
>  <n> bytes.
>  
> -GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncommon pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.
>  
> +GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
> +for the index version specified.  Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +
>  Naming Tests
>  
>  
> 
> base-commit: 

Re: [PATCH 2/3] archive: implement protocol v2 archive command

2018-09-12 Thread Stefan Beller
On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon  wrote:
>
> This adds a new archive command for protocol v2. The command expects
> arguments in the form "argument X" which are passed unmodified to
> git-upload-archive--writer.
>
> This command works over the file://, Git, and SSH transports. HTTP
> support will be added in a separate patch.
>
> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c| 45 +++-
>  builtin/upload-archive.c | 44 ---
>  t/t5000-tar-tree.sh  |  5 +
>  3 files changed, 77 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index e54fc39ad..73831887d 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -5,9 +5,11 @@
>  #include "cache.h"
>  #include "builtin.h"
>  #include "archive.h"
> +#include "connect.h"
>  #include "transport.h"
>  #include "parse-options.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>
>  static void create_output_file(const char *output_file)
> @@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
> }
>  }
>
> +static int do_v2_command_and_cap(int out)
> +{
> +   packet_write_fmt(out, "command=archive\n");
> +   /* Capability list would go here, if we had any. */
> +   packet_delim(out);
> +}
> +
>  static int run_remote_archiver(int argc, const char **argv,
>const char *remote, const char *exec,
>const char *name_hint)
> @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
> struct remote *_remote;
> struct packet_reader reader;
> enum packet_read_status status;
> +   enum protocol_version version;
>
> _remote = remote_get(remote);
> if (!_remote->url[0])
> @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
>
> packet_reader_init(, fd[0], NULL, 0, 
> PACKET_READ_CHOMP_NEWLINE);
>
> +   version = discover_version();
> +
> +   if (version == protocol_v2)
> +   do_v2_command_and_cap(fd[1]);
> +
> /*
>  * Inject a fake --format field at the beginning of the
>  * arguments, with the format inferred from our output
> @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char 
> **argv,
> packet_write_fmt(fd[1], "argument %s\n", argv[i]);
> packet_flush(fd[1]);
>
> -   status = packet_reader_read();
> -
> -   if (status == PACKET_READ_FLUSH)
> -   die(_("git archive: expected ACK/NAK, got a flush packet"));
> -   if (strcmp(reader.buffer, "ACK")) {
> -   if (starts_with(reader.buffer, "NACK "))
> -   die(_("git archive: NACK %s"), reader.buffer + 5);
> -   if (starts_with(reader.buffer, "ERR "))
> -   die(_("remote error: %s"), reader.buffer + 4);
> -   die(_("git archive: protocol error"));

Maybe we also want to support v1
(which is v0 prefixed with one pkt_line saying it is v1).

If (version == protocol_v1)
/* drop version v1 line, and then follow v0 logic. */
packet_reader_read();

Do we care about v1, or do we just ignore it here? why?
(Don't answer me here, but rather put it in the commit message)

> +   if (version == protocol_v0) {
> +   status = packet_reader_read();
> +
> +   if (status == PACKET_READ_FLUSH)
> +   die(_("git archive: expected ACK/NAK, got a flush 
> packet"));
> +   if (strcmp(reader.buffer, "ACK")) {
> +   if (starts_with(reader.buffer, "NACK "))
> +   die(_("git archive: NACK %s"), reader.buffer 
> + 5);
> +   if (starts_with(reader.buffer, "ERR "))
> +   die(_("remote error: %s"), reader.buffer + 4);
> +   die(_("git archive: protocol error"));
> +   }
> +
> +   status = packet_reader_read();
> +   if (status != PACKET_READ_FLUSH)
> +   die(_("git archive: expected a flush"));
> }
>
> -   status = packet_reader_read();
> -   if (status != PACKET_READ_FLUSH)
> -   die(_("git archive: expected a flush"));
> -
> /* Now, start reading from fd[0] and spit it out to stdout */
> rv = recv_sideband("archive", fd[0], 1);
> rv |= transport_disconnect(transport);
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d911635..534e8fd56 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -5,6 +5,7 @@
>  #include "builtin.h"
>  #include "archive.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "argv-array.h"
> @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
> 

RE: [Question] Signature calculation ignoring parts of binary files

2018-09-12 Thread Randall S. Becker
On September 12, 2018 4:54 PM, I wrote:
> On September 12, 2018 4:48 PM, Johannes Sixt wrote:
> > Am 12.09.18 um 21:16 schrieb Randall S. Becker:
> > > I feel really bad asking this, and I should know the answer, and yet.
> > >
> > > I have a binary file that needs to go into a repo intact (unchanged).
> > > I also have a program that interprets the contents, like a textconv,
> > > that can output the relevant portions of the file in whatever format
> > > I like - used for diff typically, dumps in 1K chunks by file section.
> > > What I'm looking for is to have the SHA1 signature calculated with
> > > just the relevant portions of the file so that two actually
> > > different files will be considered the same by git during a commit
> > > or status. In real terms, I'm trying to ignore the Creator metadata
> > > of a JPG because it is mutable and irrelevant to my repo contents.
> > >
> > > I'm sorry to ask, but I thought this was in .gitattributes but I
> > > can't confirm the SHA1 behaviour.
> >
> > You are looking for a clean filter. See the 'filter' attribute in 
> > gitattributes(5).
> > Your clean filter program or script should strip the unwanted metadata
> > or set it to a constant known-good value.
> >
> > (You shouldn't need a smudge filter.)
> >
> > -- Hannes
> 
> Thanks Hannes. I thought about the clean filter, but I don't actually want to
> modify the file when going into git, just for SHA calculation. I need to be 
> able
> to keep some origin metadata that might change with subsequent copies, so
> just cleaning the origin is not going to work - actually knowing the original
> author is important to our process. My objective is to keep the original file
> 100% exact as supplied and then ignore any changes to the metadata that I
> don't care about (like Creator) if the remainder of the file is the same.

I had a thought that might be workable, opinions are welcome on this.

The commit of my rather weird project is done by a script so I have flexibility 
in my approach. What I could do is set up a diff textconv configuration so that 
the text diff of the two JPG files will show no differences if the immutable 
fields and the image are the same. I can then trigger a git add and git commit 
for only those files where git diff reports no differences. That way the actual 
original file is stored in git with 100% fidelity (no cleaning). It's not as 
elegant as I'd like, but it does solve what I'm trying to do. Does this sound 
reasonable and/or is there a better way?

Cheers,
Randall

-- Brief whoami:
 NonStop developer since approximately 2112884442
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.





Re: [PATCH 1/1] contrib: add coverage-diff script

2018-09-12 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

>  contrib/coverage-diff.sh | 70 
>  1 file changed, 70 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh

I fully appreciate the motivation.  But it is a bit sad that this
begins with "#!/bin/bash" but it seems that the script is full of
bash-isms.  I haven't gone through the script to see if these are
inevitable or gratuitous yet, but I'd assume it made it easier for
you to write it to step outside the pure POSIX shell?

> +V1=$1
> +V2=$2
> +
> +diff-lines() {

Being able to use '-' in identifier is probably a bash-ism that you
did not have to use.

> +local path=
> +local line=
> +while read; do

Being able to omit variable to be read into and can use the implicit
variable $REPLY also is.

> + esc=$'\033'
> + if [[ $REPLY =~ ---\ (a/)?.* ]]; then
> + continue
> + elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
> + path=${BASH_REMATCH[2]}

OK, it probably is easier to write in bash than using expr if you
want to do regexp.  Where do these escape code come from in "git
diff" output, by the way?

> + elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; 
> then
> + line=${BASH_REMATCH[2]}
> + elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then
> + echo "$path:$line:$REPLY"
> + if [[ ${BASH_REMATCH[2]} != - ]]; then
> + ((line++))
> + fi
> + fi
> +done
> +}
> +
> +git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt

Hmph, not 

git diff --name-only "$V1" "$V2" -- "*.c"

Do we (or do we not) want "--no-renames"?

> +for file in $(cat files.txt)
> +do
> + hash_file=${file//\//\#}
> +
> + git diff $V1 $V2 -- $file \
> + | diff-lines \
> + | grep ":+" \
> + >"diff_file.txt"

Style:

cmd1 |
cmd2 |
cmd3 >output

is easier to read without backslashes.

> + cat diff_file.txt \
> + | sed -E 's/:/ /g' \
> + | awk '{print $2}' \
> + | sort \
> + >new_lines.txt
> +
> + cat "$hash_file.gcov" \
> + | grep \#\#\#\#\# \
> + | sed 's/#: //g' \
> + | sed 's/\:/ /g' \
> + | awk '{print $1}' \
> + | sort \
> + >uncovered_lines.txt

OK, so we assume that we have run coverage in $V2 checkout so that
we can pick up the postimage line numbers in "diff $V1 $V2" and find
corresponding record in .gcov file in the filesystem.  I did not
realize the significance of 'topic' being the later argument to the
script in this part

After creating the coverage statistics at a version (say,
'topic') you can then run

contrib/coverage-diff.sh base topic

of your description before I see this implementation.  Also the
comment at the beginning

    # Usage: 'contrib/coverage-diff.sh <version1> <version2>'
# Outputs a list of new lines in version2 compared to version1 that are
# not covered by the test suite. Assumes you ran
# 'make coverage-test coverage-report' from root first, so .gcov files 
exist.

would want to make it clear that we want coverage run from root
for version2 before using this script.

> + comm -12 uncovered_lines.txt new_lines.txt \
> + | sed -e 's/$/\)/' \
> + | sed -e 's/^/\t/' \
> + >uncovered_new_lines.txt
> +
> + grep -q '[^[:space:]]' < uncovered_new_lines.txt && \

Style: when you end a line with && (or || or | for that matter), the
shell knows that you have not finished speaking, and will wait to
listen to you to finish the sentence.  No need for backslash there.
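
E.g. the whole chain could be written without them as:

        grep -q '[^[:space:]]' <uncovered_new_lines.txt &&
        echo $file &&
        git blame -c $file |
        grep -f uncovered_new_lines.txt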

> + echo $file && \
> + git blame -c $file \
> + | grep -f uncovered_new_lines.txt
> +
> + rm -f diff_file.txt new_lines.txt \
> + uncovered_lines.txt uncovered_new_lines.txt
> +done
> +
> +rm -rf files.txt


Re: [PATCH 1/3] archive: use packet_reader for communications

2018-09-12 Thread Stefan Beller
On Tue, Sep 11, 2018 at 10:35 PM Josh Steadmon  wrote:
>
> Using packet_reader will simplify version detection and capability
> handling, which will make implementation of protocol v2 support in
> git-archive easier.
>
> Signed-off-by: Josh Steadmon 

Reviewed-by: Stefan Beller 


Re: [PATCH 1/2] fetch-object: provide only one fetching function

2018-09-12 Thread Junio C Hamano
Jonathan Tan  writes:

> fetch-object.h currently provides two functions (fetch_object() and
> fetch_objects()) that could be replaced by a single, more flexible
> function. Replace those two functions with the more flexible function.

The latter half of the first sentence and the second sentence are
pretty-much repeating the same thing.  

I think you wanted to justify two changes:

 (1) There are two helpers.  fetch_object() fetches a single object
 from a given remote; fetch_objects() fetches a set of objects
 from a given remote.  It is not like the former is implemented
 as a specialization of the latter (i.e. passing length=1
 array), and it is not like the former is optimized specially
 for a single object fetch that cannot be made into such a thin
 wrapper.  It is not like the latter is implemented as a
 repeated call to the former in a loop, either.  There is no
 justification to keep two independent implementations, and
 using a single helper that can be used by all the callers of
 these two would make sense.

 (2) The variant that fetches multiple objects takes oid_array.  To
 be used by all the existing callers of these two helpers, it is
     better to use a different calling convention, namely, an array[]
 and its length passed as separate parameters.
 
Instead of explaining why the new convention is better to justify
(2), the above three lines handwave by saying "more flexible"
twice.  We should do better.

fetch-object: unify fetch_object[s] functions

There are fetch_object() and fetch_objects() helpers in
fetch-object.h; as the latter takes "struct oid_array",
the former cannot be made into a thin wrapper around the
latter without an extra allocation and set-up cost.

Update fetch_objects() to take an array of "struct
object_id" and number of elements in it as separate
parameters, remove fetch_object(), and adjust all existing
callers of these functions to use the new fetch_objects().

perhaps?

> diff --git a/sha1-file.c b/sha1-file.c
> index 97b742384..2edf4564f 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1317,7 +1317,7 @@ int oid_object_info_extended(struct repository *r, 
> const struct object_id *oid,
>* TODO Pass a repository struct through fetch_object,
>* such that arbitrary repositories work.
>*/
> - fetch_object(repository_format_partial_clone, 
> real->hash);
> + fetch_objects(repository_format_partial_clone, real, 1);
>   already_retried = 1;
>   continue;
>   }
> diff --git a/unpack-trees.c b/unpack-trees.c
> index f25089b87..82a83dbc6 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -392,7 +392,7 @@ static int check_updates(struct unpack_trees_options *o)
>   }
>   if (to_fetch.nr)
>   fetch_objects(repository_format_partial_clone,
> -   &to_fetch);
> +   to_fetch.oid, to_fetch.nr);
>   fetch_if_missing = fetch_if_missing_store;
>   oid_array_clear(&to_fetch);
>   }

Changes to these two callers do explain why passing the array and its
count separately is an interface that is easier to use.  Those who
already have an oid_array can pass its fields as separate parameters
without too much trouble, and those who only have an oid (or an array
of oids and know how many are in the array) do not have to wrap them
into an oid_array.

Thanks.


Re: [PATCH 1/1] commit-reach: properly peel tags

2018-09-12 Thread Jeff King
On Wed, Sep 12, 2018 at 02:23:42PM -0700, Junio C Hamano wrote:

> > diff --git a/commit-reach.c b/commit-reach.c
> > index 86715c103c..6de72c6e03 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array 
> > *from,
> >  {
> > struct commit **list = NULL;
> > int i;
> > +   int nr_commits;
> > int result = 1;
> >  
> > ALLOC_ARRAY(list, from->nr);
> > +   nr_commits = 0;
> > for (i = 0; i < from->nr; i++) {
> > -   list[i] = (struct commit *)from->objects[i].item;
> > +   struct object *from_one = from->objects[i].item;
> >  
> > -   if (parse_commit(list[i]) ||
> > -   list[i]->generation < min_generation)
> > -   return 0;
> > +   from_one = deref_tag(the_repository, from_one,
> > +"a from object", 0);
> > +   if (!from_one || from_one->type != OBJ_COMMIT) {
> > +   from->objects[i].item->flags |= assign_flag;
> 
> I wondered why this is not futzing with "from_one->flags"; by going
> back to the original from->objects[] array, the code is setting the
> flags on the original tag object and not the non-commit object that
> was pointed at by the tag.

Note that from_one may even be NULL.

> > +   continue;
> > +   }
> > +
> > +   list[nr_commits] = (struct commit *)from_one;
> > +   if (parse_commit(list[nr_commits]) ||
> > +   list[nr_commits]->generation < min_generation)
> > +   return 0; /* is this a leak? */
> > +   nr_commits++;
> > }
> 
> In the original code, the flags bits were left unchanged if the loop
> terminated by hitting a commit whose generation is too young (or
> parse_commit() returns non-zero).  With this updated code, flags bit
> can be modified before the code notices the situation and leave the
> function, bypassing the "cleanup" we see below that clears the
> "assign_flag" bits.
> 
> Would it be a problem that we return early without cleaning up?
> 
> Even if we do not call this early return, the assign_flag bits added
> to the original tag in from->objects[i].item won't be cleaned in
> this new code, as "cleanup:" section now loops over the list[] that
> omits the object whose flags was smudged above before the "continue".
> 
> Would it be a problem that we leave the assign_flags without
> cleaning up?

Yeah, I hadn't thought about the bit cleanup when making my original
suggestion. In the original code (before 4fbcca4eff), I think we did set
flags as we iterated through the loop, and we could still do an early
return when we hit "!reachable(...)". But I don't see any cleanup of
assign_flag there at all.

So I guess I'm pretty confused about what the semantics are supposed to
be.

-Peff


Re: [RFC PATCH 1/1] builtin/remote: quote remote name on error to display empty name

2018-09-12 Thread Junio C Hamano
Shulhan  writes:

> Rationale: consistent error format

You can and should drop this line.

> When adding new remote name with empty string, git will print the
> following error message,
>
>   fatal: '' is not a valid remote name\n
>
> But when removing remote name with empty string as input, git did not
> print the empty string with quote,

"git shows the empty string without quote" would be easier to
understand which part of the behaviour you are disturbed by.

>   fatal: No such remote: \n

After stating the above observation, you can say something like

To make these error messages consistent, quote the name of
the remote that we tried and failed to find.

at the end.

> ---

Needs sign-off.  See Documentation/SubmittingPatches.

>  builtin/remote.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 40c6f8a1b..f7edf7f2c 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -626,7 +626,7 @@ static int mv(int argc, const char **argv)
>  
>   oldremote = remote_get(rename.old_name);
>   if (!remote_is_configured(oldremote, 1))
> - die(_("No such remote: %s"), rename.old_name);
> + die(_("No such remote: '%s'"), rename.old_name);
>  
>   if (!strcmp(rename.old_name, rename.new_name) && oldremote->origin != 
> REMOTE_CONFIG)
>   return migrate_file(oldremote);
> @@ -762,7 +762,7 @@ static int rm(int argc, const char **argv)
>  
>   remote = remote_get(argv[1]);
>   if (!remote_is_configured(remote, 1))
> - die(_("No such remote: %s"), argv[1]);
> + die(_("No such remote: '%s'"), argv[1]);
>  
>   known_remotes.to_delete = remote;
>   for_each_remote(add_known_remote, &known_remotes);
> @@ -861,7 +861,7 @@ static int get_remote_ref_states(const char *name,
>  
>   states->remote = remote_get(name);
>   if (!states->remote)
> - return error(_("No such remote: %s"), name);
> + return error(_("No such remote: '%s'"), name);
>  
>   read_branches();


[PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-12 Thread Ben Peart
Teach get_index_format_default() to support running the test suite
with specific index versions.  In particular, this enables the test suite
to be run using index version 4, which is not the default and so gets less testing.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: v2.19.0
Web-Diff: https://github.com/benpeart/git/commit/52e733e2ce
Checkout: git fetch https://github.com/benpeart/git 
git-test-index-version-v1 && git checkout 52e733e2ce

 read-cache.c | 47 +--
 t/README |  6 +-
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..d140ce9989 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1570,26 +1570,45 @@ static unsigned int get_index_format_default(void)
char *envversion = getenv("GIT_INDEX_VERSION");
char *endp;
int value;
-   unsigned int version = INDEX_FORMAT_DEFAULT;
+   unsigned int version = -1;
+
+   if (envversion) {
+   version = strtoul(envversion, &endp, 10);
+   if (*endp ||
+   version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) 
{
+   warning(_("GIT_INDEX_VERSION set, but the value is 
invalid.\n"
+   "Using version %i"), INDEX_FORMAT_DEFAULT);
+   version = INDEX_FORMAT_DEFAULT;
+   }
+   }
 
-   if (!envversion) {
-   if (!git_config_get_int("index.version", &value))
+   if (version == -1) {
+   if (!git_config_get_int("index.version", &value)) {
version = value;
-   if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
-   warning(_("index.version set, but the value is 
invalid.\n"
- "Using version %i"), INDEX_FORMAT_DEFAULT);
-   return INDEX_FORMAT_DEFAULT;
+   if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
version) {
+   warning(_("index.version set, but the value is 
invalid.\n"
+   "Using version %i"), 
INDEX_FORMAT_DEFAULT);
+   version = INDEX_FORMAT_DEFAULT;
+   }
}
-   return version;
}
 
-   version = strtoul(envversion, &endp, 10);
-   if (*endp ||
-   version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
-   warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
- "Using version %i"), INDEX_FORMAT_DEFAULT);
-   version = INDEX_FORMAT_DEFAULT;
+   if (version == -1) {
+   envversion = getenv("GIT_TEST_INDEX_VERSION");
+   if (envversion) {
+   version = strtoul(envversion, &endp, 10);
+   if (*endp ||
+   version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
version) {
+   warning(_("GIT_TEST_INDEX_VERSION set, but the 
value is invalid.\n"
+   "Using version %i"), 
INDEX_FORMAT_DEFAULT);
+   version = INDEX_FORMAT_DEFAULT;
+   }
+   }
}
+
+   if (version == -1)
+   version = INDEX_FORMAT_DEFAULT;
+
return version;
 }
 
diff --git a/t/README b/t/README
index 9028b47d92..f872638a78 100644
--- a/t/README
+++ b/t/README
@@ -315,10 +315,14 @@ packs on demand. This normally only happens when the 
object size is
 over 2GB. This variable forces the code path on any object larger than
  bytes.
 
-GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
+GIT_TEST_OE_DELTA_SIZE= exercises the uncommon pack-objects code
 path where deltas larger than this limit require extra memory
 allocation for bookkeeping.
 
+GIT_TEST_INDEX_VERSION= exercises the index read/write code path
+for the index version specified.  Can be set to any valid version
+but the non-default version 4 is probably the most beneficial.
+
 Naming Tests
 
 

base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670
-- 
2.18.0.windows.1
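
With this applied, exercising the v4 code paths would presumably be just
a matter of setting the knob when driving the suite, e.g.

        GIT_TEST_INDEX_VERSION=4 make test

or exporting it before running an individual script under t/.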



Re: [PATCH 1/1] commit-reach: properly peel tags

2018-09-12 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> diff --git a/commit-reach.c b/commit-reach.c
> index 86715c103c..6de72c6e03 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array 
> *from,
>  {
>   struct commit **list = NULL;
>   int i;
> + int nr_commits;
>   int result = 1;
>  
>   ALLOC_ARRAY(list, from->nr);
> + nr_commits = 0;
>   for (i = 0; i < from->nr; i++) {
> - list[i] = (struct commit *)from->objects[i].item;
> + struct object *from_one = from->objects[i].item;
>  
> - if (parse_commit(list[i]) ||
> - list[i]->generation < min_generation)
> - return 0;
> + from_one = deref_tag(the_repository, from_one,
> +  "a from object", 0);
> + if (!from_one || from_one->type != OBJ_COMMIT) {
> + from->objects[i].item->flags |= assign_flag;

I wondered why this is not futzing with "from_one->flags"; by going
back to the original from->objects[] array, the code is setting the
flags on the original tag object and not the non-commit object that
was pointed at by the tag.

> + continue;
> + }
> +
> + list[nr_commits] = (struct commit *)from_one;
> + if (parse_commit(list[nr_commits]) ||
> + list[nr_commits]->generation < min_generation)
> + return 0; /* is this a leak? */
> + nr_commits++;
>   }

In the original code, the flags bits were left unchanged if the loop
terminated by hitting a commit whose generation is too young (or
parse_commit() returns non-zero).  With this updated code, flags bit
can be modified before the code notices the situation and leave the
function, bypassing the "cleanup" we see below that clears the
"assign_flag" bits.

Would it be a problem that we return early without cleaning up?

Even if we do not call this early return, the assign_flag bits added
to the original tag in from->objects[i].item won't be cleaned in
this new code, as "cleanup:" section now loops over the list[] that
omits the object whose flags was smudged above before the "continue".

Would it be a problem that we leave the assign_flags without
cleaning up?

> - QSORT(list, from->nr, compare_commits_by_gen);
> + QSORT(list, nr_commits, compare_commits_by_gen);
>  
> - for (i = 0; i < from->nr; i++) {
> + for (i = 0; i < nr_commits; i++) {
>   /* DFS from list[i] */
>   struct commit_list *stack = NULL;
>  
> @@ -600,7 +611,7 @@ int can_all_from_reach_with_flag(struct object_array 
> *from,
>   }
>  
>  cleanup:
> - for (i = 0; i < from->nr; i++) {
> + for (i = 0; i < nr_commits; i++) {
>   clear_commit_marks(list[i], RESULT);
>   clear_commit_marks(list[i], assign_flag);
>   }


[PATCH] sequencer: fix --allow-empty-message behavior, make it smarter

2018-09-12 Thread Elijah Newren
In commit b00bf1c9a8dd ("git-rebase: make --allow-empty-message the
default", 2018-06-27), several arguments were given for transplanting
empty commits without halting and asking the user for confirmation on
each commit.  These arguments were incomplete because the logic clearly
assumed the only cases under consideration were transplanting of commits
with empty messages (see the comment about "There are two sources for
commits with empty messages").  It didn't discuss or even consider
rewords, squashes, etc. where the user is explicitly asked for a new
commit message and provides an empty one.  (My bad, I totally should
have thought about that at the time, but just didn't.)

Rewords and squashes are significantly different, though, as described
by SZEDER:

Let's suppose you start an interactive rebase, choose a commit to
squash, save the instruction sheet, rebase fires up your editor, and
then you notice that you mistakenly chose the wrong commit to
squash.  What do you do, how do you abort?

Before [that commit] you could clear the commit message, exit the
editor, and then rebase would say "Aborting commit due to empty
commit message.", and you get to run 'git rebase --abort', and start
over.

But [since that commit, ...] saving the commit message as is would
let rebase continue and create a bunch of unnecessary objects, and
then you would have to use the reflog to return to the pre-rebase
state.

Also, he states:

The instructions in the commit message template, which is shown for
'reword' and 'squash', too, still say...

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.

These are sound arguments that when editing commit messages during a
sequencer operation, that if the commit message is empty then the
operation should halt and ask the user to correct.  The arguments in
commit b00bf1c9a8dd (referenced above) still apply when transplanting
previously created commits with empty commit messages, so the sequencer
should not halt for those.

Furthermore, all rationale so far applies equally for cherry-pick as for
rebase.  Therefore, make the code default to --allow-empty-message when
transplanting an existing commit, and to default to halting when the
user is asked to edit a commit message and provides an empty one -- for
both rebase and cherry-pick.

Signed-off-by: Elijah Newren 
---
This patch cleanly applies to both 2.19.0 and pu.  There are some related
code cleanups that I'd like to make, but doing that cleanup conflicts with
the various rewrite-rebase-in-C topics sitting in pu; since those are
fairly lengthy, I really don't want to cause problems there, but I think
SZEDER really wants this 2.19.0 regression fix before 2.20.0 and thus
before those other topics.


 sequencer.c   |  4 ++--
 t/t3404-rebase-interactive.sh |  7 +++
 t/t3405-rebase-malformed.sh   |  2 +-
 t/t3505-cherry-pick-empty.sh  | 18 --
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..638ee151f2 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -899,7 +899,7 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
if ((flags & ALLOW_EMPTY))
argv_array_push(&cmd.args, "--allow-empty");
 
-   if (opts->allow_empty_message)
+   if (!(flags & EDIT_MSG))
argv_array_push(&cmd.args, "--allow-empty-message");
 
if (cmd.err == -1) {
@@ -1313,7 +1313,7 @@ static int try_to_commit(struct strbuf *msg, const char 
*author,
 
if (cleanup != COMMIT_MSG_CLEANUP_NONE)
strbuf_stripspace(msg, cleanup == COMMIT_MSG_CLEANUP_ALL);
-   if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
+   if ((flags & EDIT_MSG) && message_is_empty(msg, cleanup)) {
res = 1; /* run 'git commit' to display error message */
goto out;
}
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 86bba5ed7c..ff89b6341a 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -569,16 +569,15 @@ test_expect_success '--continue tries to commit, even for 
"edit"' '
 '
 
 test_expect_success 'aborted --continue does not squash commits after "edit"' '
-   test_when_finished "git rebase --abort" &&
old=$(git rev-parse HEAD) &&
test_tick &&
set_fake_editor &&
FAKE_LINES="edit 1" git rebase -i HEAD^ &&
echo "edited again" > file7 &&
git add file7 &&
-   echo all the things >>conflict &&
-   test_must_fail git rebase --continue &&
-   test $old = $(git rev-parse HEAD)
+   test_must_fail env FAKE_COMMIT_MESSAGE=" " git rebase --continue &&
+   test $old = $(git rev-parse HEAD) &&
+   git rebase --abort
 '
 
 test_expect_success 'auto-amend only edited commits after "edit"' '
diff --git 

RE: [Question] Signature calculation ignoring parts of binary files

2018-09-12 Thread Randall S. Becker
> -Original Message-
> From: git-ow...@vger.kernel.org  On Behalf
> Of Johannes Sixt
> Sent: September 12, 2018 4:48 PM
> To: Randall S. Becker 
> Cc: git@vger.kernel.org
> Subject: Re: [Question] Signature calculation ignoring parts of binary files
> 
> Am 12.09.18 um 21:16 schrieb Randall S. Becker:
> > I feel really bad asking this, and I should know the answer, and yet.
> >
> > I have a binary file that needs to go into a repo intact (unchanged).
> > I also have a program that interprets the contents, like a textconv,
> > that can output the relevant portions of the file in whatever format I
> > like - used for diff typically, dumps in 1K chunks by file section.
> > What I'm looking for is to have the SHA1 signature calculated with
> > just the relevant portions of the file so that two actually different
> > files will be considered the same by git during a commit or status. In
> > real terms, I'm trying to ignore the Creator metadata of a JPG because
> > it is mutable and irrelevant to my repo contents.
> >
> > I'm sorry to ask, but I thought this was in .gitattributes but I can't
> > confirm the SHA1 behaviour.
> 
> You are looking for a clean filter. See the 'filter' attribute in 
> gitattributes(5).
> Your clean filter program or script should strip the unwanted metadata or set
> it to a constant known-good value.
> 
> (You shouldn't need a smudge filter.)
> 
> -- Hannes

Thanks Hannes. I thought about the clean filter, but I don't actually want to 
modify the file when going into git, just for SHA calculation. I need to be 
able to keep some origin metadata that might change with subsequent copies, so 
just cleaning the origin is not going to work - actually knowing the original 
author is important to our process. My objective is to keep the original file 
100% exact as supplied and then ignore any changes to the metadata that I don't 
care about (like Creator) if the remainder of the file is the same.

Regards,
Randall




Re: [Question] Signature calculation ignoring parts of binary files

2018-09-12 Thread Johannes Sixt

Am 12.09.18 um 21:16 schrieb Randall S. Becker:

I feel really bad asking this, and I should know the answer, and yet.

I have a binary file that needs to go into a repo intact (unchanged). I also
have a program that interprets the contents, like a textconv, that can
output the relevant portions of the file in whatever format I like - used
for diff typically, dumps in 1K chunks by file section. What I'm looking for
is to have the SHA1 signature calculated with just the relevant portions of
the file so that two actually different files will be considered the same by
git during a commit or status. In real terms, I'm trying to ignore the
Creator metadata of a JPG because it is mutable and irrelevant to my repo
contents.

I'm sorry to ask, but I thought this was in .gitattributes but I can't
confirm the SHA1 behaviour.


You are looking for a clean filter. See the 'filter' attribute in 
gitattributes(5). Your clean filter program or script should strip the 
unwanted metadata or set it to a constant known-good value.


(You shouldn't need a smudge filter.)
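
A rough sketch of that setup, where "jpeg-clean" is an arbitrary driver
name and "strip-jpeg-creator" is a placeholder for whatever command
actually normalizes the Creator field (the clean command is fed the file
content on stdin and must write the cleaned bytes to stdout):

        # .gitattributes
        *.jpg filter=jpeg-clean

        # one-time configuration
        git config filter.jpeg-clean.clean strip-jpeg-creator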

-- Hannes


Re: [PATCH] linear-assignment: fix potential out of bounds memory access

2018-09-12 Thread Junio C Hamano
Thomas Gummerer  writes:

>> > I'm looking into why that fails.  Also adding Dscho to Cc here as the
>> > author of this code.
>> 
>> The diff below seems to fix it.  Not submitting this as a proper
>> patch [...]
>
> I found the time to actually have a look at the paper, so here's a
> proper patch:
>
> I'm still not entirely sure what the initial code tried to do here,

It looks to me like an attempt to optimize (instead of starting from a
value that is too big to be minimum, pick the first value and start
from there, and all the "found even smaller one, so let's replace"
later would work the same way) that went wrong (just that the "first
one" was written incorrectly), but it is not absolutely necessary to
find out why the code was written in a particular way that happened
to be buggy.

> but I think staying as close as possible to the original is probably
> our best option here, also for future readers of this code.

Thanks for digging.

> --- >8 ---
>
> Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
>
> Currently the 'compute_assignment()' function may read memory out
> of bounds, even if used correctly.  Namely this happens when we only
> have one column.  In that case we try to calculate the initial
> minimum cost using '!j1' as column in the reduction transfer code.
> That in turn causes us to try and get the cost from column 1 in the
> cost matrix, which does not exist, and thus results in an out of
> bounds memory read.

This nicely explains what goes wrong.

> Instead of trying to initialize the minimum cost from another column,
> just set it to INT_MAX.  This also matches what the example code in the
> original paper for the algorithm [1] does (it initializes the value to
> inf, for which INT_MAX is the closest match in C).

Yeah, if we really want to avoid INT_MAX we could use another "have
we found any value yet?" boolean variable, but the caller in
get_correspondences() does not even worry about integer overflows
when stuffing diffsize to the cost[] array, and the other possible
value that can go to cost[] array is COST_MAX that is mere 65k, so
it would be OK to use INT_MAX as sentinel here, I guess.

> Note that the test only fails under valgrind on Linux, but the same
> command has been reported to segfault on Mac OS.
>
> Also start from 0 in the loop, which matches what the example code in
> the original paper does as well.  Starting from 1 means we'd ignore
> the first column during the reduction transfer phase.  Note that in
> the original paper the loop does start from 1, but the implementation
> is in Pascal, where arrays are 1 indexed.
>
> [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
>  algorithm for dense and sparse linear assignment
>  problems. Computing, 38(4), 325–340.
>
> Reported-by: ryenus 
> Helped-by: Derrick Stolee 
> Signed-off-by: Thomas Gummerer 
> ---
>  linear-assignment.c   | 4 ++--
>  t/t3206-range-diff.sh | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/linear-assignment.c b/linear-assignment.c
> index 9b3e56e283..7700b80eeb 100644
> --- a/linear-assignment.c
> +++ b/linear-assignment.c
> @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, 
> int *cost,
>   else if (j1 < -1)
>   row2column[i] = -2 - j1;
>   else {
> - int min = COST(!j1, i) - v[!j1];
> - for (j = 1; j < column_count; j++)
> + int min = INT_MAX;
> + for (j = 0; j < column_count; j++)
>   if (j != j1 && min > COST(j, i) - v[j])
>   min = COST(j, i) - v[j];
>   v[j1] -= min;
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 2237c7f4af..fb4c13a84a 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'no commits on one side' '
> + git commit --amend -m "new message" &&
> + git range-diff master HEAD@{1} HEAD
> +'
> +
>  test_done


Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines

2018-09-12 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> There have been a few bugs in recent patches that would have been caught if
> the test suite covered those blocks (including a few of mine). I want to
> work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.

Nice.

> It is important to not measure the coverage of the codebase by what old code
> is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
> After creating the coverage statistics at a version (say, 'topic') you can
> then run
>
> contrib/coverage-diff.sh base topic
> ...
> Using this 'git blame' output, we can quickly inspect whether the uncovered
> lines are appropriate. For instance:
> ...
> I used this approach for 'next' over 'master' and got a larger list, some of
> which I have already submitted tests to increase coverage [2] or will be
> covered by topics not in 'next' [3].
>
> Thanks, -Stolee

Thanks for working on this.


Re: [PATCH v4] test_dir_is_empty: fix edge cases with newlines and hyphens

2018-09-12 Thread Junio C Hamano
William Chargin  writes:

> While the `test_dir_is_empty` function appears correct in most normal
> use cases, it can improperly pass if a directory contains a filename
> with a newline, and can improperly fail if an empty directory looks like
> an argument to `ls`. This patch changes the implementation to check that
> the output of `ls -a` has at most two lines (for `.` and `..`), which
> should be better behaved, and adds the `--` delimiter before the
> directory name when invoking `ls`.

AFAIK dot and dot-dot are allowed not to exist; "at most two" is not
a good test.

Quite honestly, our tests are still run inside a sort-of controlled
environment, so if it _requires_ use of things we have avoided
depending on, like "ls -A" and "xargs -0", or the fact that most
filesystems always have "." and ".." even in an empty directory, in
order to be resistant to funnily-named files like dot-LF-dot, I
would say it is not worth worrying about these funny names--instead
we can simply refrain from using such a pathological name, can't we?

In other words, is there a real-world need in the context of our
test suite for this change?

Also, I find its support for directories whose names begin with
a dash a red herring.  All the test scripts in our test suite know that
they can prefix "./" to avoid problems, i.e.

test_dir_is_empty ./--wat

So it appears that the only problematic case is when we create a
directory, create a file or a directory whose name is dot-LF-dot and
nothing else, and then do something that ought to cause that file to
disappear, and make sure that the directory is empty, e.g.

mkdir empty &&
echo foo >"empty/$dotLFdot" &&
git add "empty/$dotLFdot" &&
git reset --hard &&
test_dir_is_empty empty

We do want to make sure funny names can be added with "git add" and
"git reset --hard" to HEAD that lacked those paths with funny names
to remove them correctly.  But the funny names used in such a test
do not have to be $dotLFdot; you can use "${dotLFdot}X" instead in
the above and can ensure whatever the original test wanted to
ensure.
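
i.e. the earlier sketch with the name tweaked:

        mkdir empty &&
        echo foo >"empty/${dotLFdot}X" &&
        git add "empty/${dotLFdot}X" &&
        git reset --hard &&
        test_dir_is_empty empty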

So...





Re: [PATCH 1/1] commit-reach: properly peel tags

2018-09-12 Thread Jeff King
On Wed, Sep 12, 2018 at 07:22:56AM -0700, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee 
> 
> The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
> "commit-reach: make can_all_from_reach... linear" but incorrectly
> assumed that all objects provided were commits. During a fetch
> negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
> to the 'from' array. The current code creates a segfault.
> 
> Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
> and add a test in t6600-test-reach.sh that demonstrates this segfault.

Thanks, that makes a lot of sense for reproducing. I almost wonder if
the whole X_array of commits in test-reach could just go away, and we'd
feed whatever list of objects the caller happens to send. That may make
it simpler to include non-commit objects in a variety of tests.

That said, I didn't look closely at other fallout in the program from
that, so I'll defer to your judgement.

> Correct the issue by peeling tags when investigating the initial list
> of objects in the 'from' array.
> 
> Signed-off-by: Jeff King 

I'm not sure if this should just be Reported-by, since I don't know that
I actually contributed any code. ;) But for anything I might have
contributed, certainly you have my signoff.

>   for (i = 0; i < from->nr; i++) {
> - list[i] = (struct commit *)from->objects[i].item;
> + struct object *from_one = from->objects[i].item;
>  
> - if (parse_commit(list[i]) ||
> - list[i]->generation < min_generation)
> - return 0;
> + from_one = deref_tag(the_repository, from_one,
> +  "a from object", 0);
> + if (!from_one || from_one->type != OBJ_COMMIT) {
> + from->objects[i].item->flags |= assign_flag;
> + continue;
> + }

I didn't resurrect the comment from this conditional that was in the
original code (mostly because I wasn't sure if the reasoning was still
entirely valid, or what setting the flag here actually means). But it's
probably worth saying something here to explain why it's OK to punt on
this case, and what it means to just set the flag.

> [...]

The rest of the patch looks sane to me. I didn't go through the trouble
to reproduce the segfault with the test, but it sounds like you did.

-Peff


Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines

2018-09-12 Thread Ben Peart




On 9/12/2018 12:45 PM, Derrick Stolee via GitGitGadget wrote:

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report



Very nice, I was unaware of the coverage test make targets.  I like the 
new report; it makes it easier to verify any new changes are actually 
tested.


 
 4. The lines in read-cache.c are part of a new block for the condition
 "if (expand_name_field)" as part of an optimization. These lines should
 probably be covered before that series is merged to 'next'. I understand
 that Ben and Duy are continuing work in this direction [1].
 


This code is only exercised when the index format is V4, but the default
is version 2/3 [1].  Enabling the test suite to use version 4 and test
those code paths will require adding a new GIT_TEST_INDEX_VERSION
environment variable.  I'll add that to my TODO list.


[1] https://git-scm.com/docs/git/2.1.0



[PATCH v1 1/1] Make git_check_attr() a void function

2018-09-12 Thread tboegi
From: Torsten Bögershausen 

git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.

Signed-off-by: Torsten Bögershausen 
---
 archive.c  |  3 ++-
 attr.c |  8 +++-
 attr.h |  4 ++--
 builtin/check-attr.c   |  3 +--
 builtin/pack-objects.c |  3 +--
 convert.c  | 42 ++--
 ll-merge.c | 16 +++
 userdiff.c |  3 +--
 ws.c   | 44 +++---
 9 files changed, 57 insertions(+), 69 deletions(-)

diff --git a/archive.c b/archive.c
index 0a07b140fe..c1870105eb 100644
--- a/archive.c
+++ b/archive.c
@@ -110,7 +110,8 @@ static const struct attr_check *get_archive_attrs(struct 
index_state *istate,
static struct attr_check *check;
if (!check)
check = attr_check_initl("export-ignore", "export-subst", NULL);
-   return git_check_attr(istate, path, check) ? NULL : check;
+   git_check_attr(istate, path, check);
+   return check;
 }
 
 static int check_attr_export_ignore(const struct attr_check *check)
diff --git a/attr.c b/attr.c
index 98e4953f6e..60d284796d 100644
--- a/attr.c
+++ b/attr.c
@@ -1143,9 +1143,9 @@ static void collect_some_attrs(const struct index_state 
*istate,
fill(path, pathlen, basename_offset, check->stack, check->all_attrs, 
rem);
 }
 
-int git_check_attr(const struct index_state *istate,
-  const char *path,
-  struct attr_check *check)
+void git_check_attr(const struct index_state *istate,
+   const char *path,
+   struct attr_check *check)
 {
int i;
 
@@ -1158,8 +1158,6 @@ int git_check_attr(const struct index_state *istate,
value = ATTR__UNSET;
check->items[i].value = value;
}
-
-   return 0;
 }
 
 void git_all_attrs(const struct index_state *istate,
diff --git a/attr.h b/attr.h
index 2be86db36e..b0378bfe5f 100644
--- a/attr.h
+++ b/attr.h
@@ -63,8 +63,8 @@ void attr_check_free(struct attr_check *check);
  */
 const char *git_attr_name(const struct git_attr *);
 
-int git_check_attr(const struct index_state *istate,
-  const char *path, struct attr_check *check);
+void git_check_attr(const struct index_state *istate,
+   const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index c05573ff9c..30a2f84274 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -65,8 +65,7 @@ static void check_attr(const char *prefix,
if (collect_all) {
git_all_attrs(&the_index, full_path, check);
} else {
-   if (git_check_attr(&the_index, full_path, check))
-   die("git_check_attr died");
+   git_check_attr(&the_index, full_path, check);
}
output_attr(check, file);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d1144a8f7e..eb71dab5be 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -951,8 +951,7 @@ static int no_try_delta(const char *path)
 
if (!check)
check = attr_check_initl("delta", NULL);
-   if (git_check_attr(&the_index, path, check))
-   return 0;
+   git_check_attr(&the_index, path, check);
if (ATTR_FALSE(check->items[0].value))
return 1;
return 0;
diff --git a/convert.c b/convert.c
index 6057f1f580..e0848226d2 100644
--- a/convert.c
+++ b/convert.c
@@ -1297,6 +1297,7 @@ static void convert_attrs(const struct index_state 
*istate,
  struct conv_attrs *ca, const char *path)
 {
static struct attr_check *check;
+   struct attr_check_item *ccheck = NULL;
 
if (!check) {
check = attr_check_initl("crlf", "ident", "filter",
@@ -1306,30 +1307,25 @@ static void convert_attrs(const struct index_state 
*istate,
git_config(read_convert_config, NULL);
}
 
-   if (!git_check_attr(istate, path, check)) {
-   struct attr_check_item *ccheck = check->items;
-   ca->crlf_action = git_path_check_crlf(ccheck + 4);
-   if (ca->crlf_action == CRLF_UNDEFINED)
-   ca->crlf_action = git_path_check_crlf(ccheck + 0);
-   ca->ident = git_path_check_ident(ccheck + 1);
-   ca->drv = git_path_check_convert(ccheck + 2);
-   if (ca->crlf_action != CRLF_BINARY) {
-   enum eol eol_attr = git_path_check_eol(ccheck + 3);
-   if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
-   ca->crlf_action = CRLF_AUTO_INPUT;
-   else if (ca->crlf_action == CRLF_AUTO && eol_attr == 
EOL_CRLF)
-

Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> For Gerrit users that use submodules the invocation of fetch without a
> branch is their main use case.

That way under-explains this commit.  It is totally unclear how
that statement of fact relates to the problem this patch is trying
to address; it does not even make it clear what problem is being
addressed by the patch.

>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/fetch.c | 5 -
>  t/t5526-fetch-submodules.sh | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 95c44bf6ffa..ea6ecd123e7 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, 
> const char *remote_name,
>   rc |= update_local_ref(ref, what, rm, &note,
>  summary_width);
>   free(ref);
> - } else
> + } else {
> + check_for_new_submodule_commits(&rm->old_oid);

This happens when there is no "ref", which is set only when
rm->peer_ref exists, which is set only when we are using a
remote-tracking branch (or more generally storing the fetched rev
somewhere in our refs/ hierarchy); otherwise the rev is recorded only
in FETCH_HEAD.

What does rm->old_oid have in such a case?  Is this the tip of the
superproject history we just fetched?

When we keep record of what we saw in the previous attempt to fetch,
we can tell "we have seen their history up to this old commit
before, and now we fetched their history up to this new commit" and
the question "during that time, which submodules have been modified
in the history of the superproject" becomes answerable.  When we are
not keeping the record of previous fetch, how would we answer that
question without going through the whole history?

The answer is that check-for-new does not even do the "old
branch tip was X and new branch tip is Y, so we can look
only at X..Y"; it only cares about the new branch tip of the
superproject, and excludes the existing tips of all branches
in the superproject (i.e. computing something akin to "Y
--not --all" instead of "X..Y").
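
In rev-list terms the set it walks is roughly (with Y standing in for the
just-fetched superproject tip):

        git rev-list Y --not --all

rather than 'git rev-list X..Y', which would need the remembered old tip X.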

So, I guess this is probably reasonable.  But does the call to
"check-for-new submodule" need to be unconditional?  In this
codepath, do we know when we are not doing a recursive fetch in a
superproject?  If so, perhaps we can omit the cost of going through
all the refs to populate ref_tips_before_fetch array in such a case.

>   format_display(&note, '*',
>  *kind ? kind : "branch", NULL,
>  *what ? what : "HEAD",
>  "FETCH_HEAD", summary_width);
> + }
> +
>   if (note.len) {
>   if (verbosity >= 0 && !shown_url) {
>   fprintf(stderr, _("From %.*s\n"),
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index af12c50e7dd..a509eabb044 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when 
> they are not reachable" '
>   git update-ref refs/changes/2 $D &&
>   (
>   cd downstream &&
> - git fetch --recurse-submodules --recurse-submodules-default 
> on-demand origin refs/changes/2:refs/heads/my_branch &&
> + git fetch --recurse-submodules origin refs/changes/2 &&
>   git -C submodule cat-file -t $C &&
>   git checkout --recurse-submodules FETCH_HEAD
>   )


[Question] Signature calculation ignoring parts of binary files

2018-09-12 Thread Randall S. Becker
I feel really bad asking this, and I should know the answer, and yet.

I have a binary file that needs to go into a repo intact (unchanged). I also
have a program that interprets the contents, like a textconv, that can
output the relevant portions of the file in whatever format I like - used
for diff typically, dumps in 1K chunks by file section. What I'm looking for
is to have the SHA1 signature calculated with just the relevant portions of
the file so that two actually different files will be considered the same by
git during a commit or status. In real terms, I'm trying to ignore the
Creator metadata of a JPG because it is mutable and irrelevant to my repo
contents.

I'm sorry to ask, but I thought this was in .gitattributes but I can't
confirm the SHA1 behaviour.

Sheepishly,
Randall


-- Brief whoami:
 NonStop developer since approximately 2112884442
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.





Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines

2018-09-12 Thread Derrick Stolee

On 9/12/2018 12:45 PM, Derrick Stolee via GitGitGadget wrote:

For example, I ran this against the 'jch' branch (d3c0046)
versus 'next' (dd90340)


As another example, I ran this against the 'pu' branch (4c416a53) versus 
'jch' (d3c0046) and got the following output, submitted here without 
commentary:


builtin/bisect--helper.c
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
43) free((void *) terms->term_good);
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
162)    if (get_oid_commit(commit, ))
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
163)    return error(_("'%s' is not a valid commit"), 
commit);
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
164)    strbuf_addstr(, commit);
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
172)    error(_("Could not check out original HEAD '%s'. 
Try "
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
174)    strbuf_release();
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
175)    argv_array_clear();
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
176)    return -1;
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
215)    error(_("Bad bisect_write argument: %s"), state);
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
216)    goto fail;
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
220)    error(_("couldn't get the oid of the rev '%s'"), rev);
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
221)    goto fail;
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
226)    goto fail;
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
230)    error_errno(_("couldn't open the file '%s'"), 
git_path_bisect_log());
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 
231)    goto fail;

0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 242)fail:
0b1f0fd910c (Pranit Bauva   2017-10-27 15:06:37 + 243)    retval 
= -1;
a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 
323)    yesno = git_prompt(_("Are you sure [Y/n]? "), PROMPT_ECHO);
a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 
324)    if (starts_with(yesno, "N") || starts_with(yesno, "n"))
a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 
327)    goto finish;
a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 
336)    error(_("You need to start by \"git bisect start\". You "
a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 
341)    goto fail;

a919f328ba3 (Pranit Bauva   2017-10-27 15:06:37 + 345)fail:
35f7ca528ae (Pranit Bauva   2017-10-27 15:06:37 + 
387)    return error(_("--bisect-term requires exactly one 
argument"));
35f7ca528ae (Pranit Bauva   2017-10-27 15:06:37 + 
400)    error(_("BUG: invalid argument %s for 'git 
bisect terms'.\n"
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
416)    return -1;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
419)    goto fail;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
423)    goto fail;

5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 427)fail:
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 428)    retval 
= -1;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
466)    no_checkout = 1;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
488) !one_of(arg, "--term-good", "--term-bad", NULL)) {
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
489)    return error(_("unrecognised option: '%s'"), arg);
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
523)    if (get_oid("HEAD", _oid))
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
524)    return error(_("Bad HEAD - I need a HEAD"));
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
539)    error(_("checking out '%s' failed. Try 
'git "
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
559)    return error(_("won't bisect on 
cg-seek'ed tree"));
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
562)    return error(_("Bad HEAD - strange symbolic ref"));
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
570)    return -1;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
588)    goto fail;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
598)    goto fail;
5dfeec316b8 (Pranit Bauva   2017-10-27 15:06:37 + 
606)    goto fail;
3d3237b0e6b (Pranit Bauva   2017-10-27 15:06:37 + 
686)    return error(_("--bisect-reset requires either 
no argument or a commit"));
0b1f0fd910c

Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it

2018-09-12 Thread Stefan Beller
On Wed, Sep 12, 2018 at 11:18 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > We can string_list_insert() to maintain sorted-ness of the
> > list as we find new items, or we can string_list_append() to
> > build an unsorted list and sort it at the end just once.
> >
> > To pick which one is more appropriate, we notice the fact
> > that we discover new items more or less in the already
> > sorted order.  That makes "append then sort" more
> > appropriate.
>
> Sorry, but I still do not get the math you are implying in the
> second paragraph.  Are you saying that append-then-sort is efficient
> when items being appended is already sorted?  That depends on the
> sorting algorithm used, so the logic is incomplete unless you say
> "given that we use X for sorting,...", I think.
>
> Do we really discover new items in sorted order, by the way?  In a
> single diff invocation made inside collect_changed_submodules() for
> one commit in the superproject's history, we will grab changed paths
> in the pathname order (i.e. sorted); if the superproject's tip commit
> touches the submodules at paths A and Z, we will discover these two
> paths in sorted order.
>
> But because we are walking the superproject's history to collect all
> paths that have been affected in that function, and repeatedly
> calling diff as we discover commit in the superproject's history, I
> am not sure how well the resulting set of paths would be sorted.
>
> The tip commit in superproject's history may have modified the
> submodule at path X, the parent of that commit may have touched the
> submodule at path M, and its parent may have touched the submodule
> at path A.  Don't we end up grabbing these paths in that discoverd
> order, i.e. X, M and A?

That is true.

>
> I still think changing it from "insert as we find an item, keeping
> the list sorted" to "append all and then sort before we start
> looking things up from the result" makes sense, but I do not think
> the "we find things in sorted order" is either true, or it would
> affect the choice between the two.  A justification to choose the
> latter I can think of that makes sense is that we don't have to pay
> cost to keep the list sorted while building it because we do not do
> any look-up while building the list.

ok.

Thanks,
Stefan


Re: [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> Retry fetching a submodule if the object id that the superproject points
> to, cannot be found.

By object name?  By attempting to fetch all refs?  Or by doing
something else?  Fetching by the exact object name is the most
efficient approach, but the server side may not be prepared to
serve such a request, and that is why spelling it out here would
help the readers.

> This doesn't support fetching to FETCH_HEAD yet, but only into a local
> branch.

It is not clear if this sentence is talking about the fetch done at
the superproject level, or what happens in a submodule repository
when this "retrying" happens.  Assuming that it is the former,
perhaps

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step.

would help the readers understand what you are trying to say.

> To make fetching into FETCH_HEAD work, we need some refactoring
> in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
> that is coming in the next patch.
>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/fetch.c |  9 ++--
>  submodule.c | 87 -
>  t/t5526-fetch-submodules.sh | 16 +++
>  3 files changed, 104 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 61bec5d213d..95c44bf6ffa 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
>   what = _("[new ref]");
>   }
>  
> - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> - (recurse_submodules != RECURSE_SUBMODULES_ON))
> + if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>   check_for_new_submodule_commits(&ref->new_oid);
>   r = s_update_ref(msg, ref, 0);
>   format_display(display, r ? '!' : '*', what,
> @@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
>   strbuf_add_unique_abbrev(&quickref, &current->object.oid, 
> DEFAULT_ABBREV);
>   strbuf_addstr(&quickref, "..");
>   strbuf_add_unique_abbrev(&quickref, &ref->new_oid, 
> DEFAULT_ABBREV);
> - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> - (recurse_submodules != RECURSE_SUBMODULES_ON))
> + if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>   check_for_new_submodule_commits(&ref->new_oid);
>   r = s_update_ref("fast-forward", ref, 1);
>   format_display(display, r ? '!' : ' ', quickref.buf,
> @@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
>   strbuf_add_unique_abbrev(&quickref, &current->object.oid, 
> DEFAULT_ABBREV);
>   strbuf_addstr(&quickref, "...");
>   strbuf_add_unique_abbrev(&quickref, &ref->new_oid, 
> DEFAULT_ABBREV);
> - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> - (recurse_submodules != RECURSE_SUBMODULES_ON))
> + if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>   check_for_new_submodule_commits(&ref->new_oid);
>   r = s_update_ref("forced-update", ref, 1);
>   format_display(display, r ? '!' : '+', quickref.buf,

All of these used to react to any value set to recurse-submodules
that is not off or on (i.e. on-demand, default, none), but now
unless the value is explicitly set to off, we call into the check.
It is not immediately clear how that change is linked to the
retrying.  "When set to 'on', we did not check for new commits, but
now we do" can be read from the patch text but not the reasoning
behind it.

What was the reason why we didn't call "check-for-new" when recurse
is set to "on"?  Is it because "we are going to recurse anyway, so
there is no need to check to decide if we need to fetch in
submodule"?  And the reason why we now need to call when we are set
to recurse anyway is because check-for-new now learns much more than
just "do we need to cd there and run git-fetch? yes/no?"

The answers to these two questions would help readers in the log
message.

> diff --git a/submodule.c b/submodule.c
> index 1e6781504f0..a75146e89cf 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
>   int result;
>  
>   struct string_list changed_submodule_names;
> + struct string_list retry;
>  };
> -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
> STRING_LIST_INIT_DUP }
> +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
> +   STRING_LIST_INIT_DUP, \
> +   STRING_LIST_INIT_NODUP}
>  
>  static void calculate_changed_submodule_paths(
>   struct submodule_parallel_fetch *spf)
> @@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
>  {
>   int ret = 0;
>   struct 

[PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-12 Thread Thomas Gummerer
On 09/11, Thomas Gummerer wrote:
> On 09/11, Thomas Gummerer wrote:
> > I think you're on the right track here.  I can not test this on Mac
> > OS, but on Linux, the following fails when running the test under
> > valgrind:
> > 
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..a8b0ef8c1d 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'amend and check' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
> > 
> > valgrind gives me the following:
> > 
> > ==18232== Invalid read of size 4
> > ==18232==at 0x34D7B5: compute_assignment (linear-assignment.c:54)
> > ==18232==by 0x2A4253: get_correspondences (range-diff.c:245)
> > ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==by 0x122698: run_builtin (git.c:418)
> > ==18232==by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==by 0x122BCC: run_argv (git.c:689)
> > ==18232==by 0x122D90: cmd_main (git.c:766)
> > ==18232==by 0x1D55A3: main (common-main.c:45)
> > ==18232==  Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
> > ==18232==at 0x483777F: malloc (vg_replace_malloc.c:299)
> > ==18232==by 0x3381B0: do_xmalloc (wrapper.c:60)
> > ==18232==by 0x338283: xmalloc (wrapper.c:87)
> > ==18232==by 0x2A3F8C: get_correspondences (range-diff.c:207)
> > ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==by 0x122698: run_builtin (git.c:418)
> > ==18232==by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==by 0x122BCC: run_argv (git.c:689)
> > ==18232==by 0x122D90: cmd_main (git.c:766)
> > ==18232==by 0x1D55A3: main (common-main.c:45)
> > ==18232== 
> > 
> > I'm looking into why that fails.  Also adding Dscho to Cc here as the
> > author of this code.
> 
> The diff below seems to fix it.  Not submitting this as a proper
> patch [...]

I found the time to actually have a look at the paper, so here's a
proper patch:

I'm still not entirely sure what the initial code tried to do here,
but I think staying as close as possible to the original is probably
our best option here, also for future readers of this code.

--- >8 ---

Subject: [PATCH] linear-assignment: fix potential out of bounds memory access

Currently the 'compute_assignment()' function may read memory out
of bounds, even if used correctly.  Namely this happens when we only
have one column.  In that case we try to calculate the initial
minimum cost using '!j1' as column in the reduction transfer code.
That in turn causes us to try and get the cost from column 1 in the
cost matrix, which does not exist, and thus results in an out of
bounds memory read.

Instead of trying to initialize the minimum cost from another column,
just set it to INT_MAX.  This also matches what the example code in the
original paper for the algorithm [1] does (it initializes the value to
inf, for which INT_MAX is the closest match in C).

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

Also start from 0 in the loop, which matches what the example code in
the original paper does as well.  Starting from 1 means we'd ignore
the first column during the reduction transfer phase.  Note that in
the original paper the loop does start from 1, but the implementation
is in Pascal, where arrays are 1 indexed.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
 algorithm for dense and sparse linear assignment
 problems. Computing, 38(4), 325–340.

Reported-by: ryenus 
Helped-by: Derrick Stolee 
Signed-off-by: Thomas Gummerer 
---
 linear-assignment.c   | 4 ++--
 t/t3206-range-diff.sh | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..7700b80eeb 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int 
*cost,
else if (j1 < -1)
row2column[i] = -2 - j1;
else {
-   int min = COST(!j1, i) - v[!j1];
-   for (j = 1; j < column_count; j++)
+   int min = INT_MAX;
+   for (j = 0; j < column_count; j++)
if (j != j1 && min > COST(j, i) - v[j])
min = COST(j, i) - v[j];
v[j1] -= min;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..fb4c13a84a 100755
--- a/t/t3206-range-diff.sh
+++ 

[PATCH v4] test_dir_is_empty: fix edge cases with newlines and hyphens

2018-09-12 Thread William Chargin
While the `test_dir_is_empty` function appears correct in most normal
use cases, it can improperly pass if a directory contains a filename
with a newline, and can improperly fail if an empty directory looks like
an argument to `ls`. This patch changes the implementation to check that
the output of `ls -a` has at most two lines (for `.` and `..`), which
should be better behaved, and adds the `--` delimiter before the
directory name when invoking `ls`.

The newly added unit test fails before this change and passes after it.

Signed-off-by: William Chargin 
---
This patch depends on "t: factor out FUNNYNAMES as shared lazy prereq"
(2018-08-06), which is now in master.

I originally wrote this patch for the standalone Sharness library, but
that library advises that such patches be sent to the Git mailing list
first.

Tested on GNU/Linux (Mint 18.2) and macOS (10.13).

 t/t-basic.sh| 43 +
 t/test-lib-functions.sh |  2 +-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4e..a5c57c6aa5 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -821,6 +821,49 @@ test_expect_success 'tests clean up even on failures' "
EOF
 "
 
+test_expect_success FUNNYNAMES \
+   'test_dir_is_empty behaves even in pathological cases' "
+   run_sub_test_lib_test \
+   dir-empty 'behavior of test_dir_is_empty' <<-\\EOF &&
+   test_expect_success 'should pass with actually empty directory' '
+   mkdir empty_dir &&
+   test_dir_is_empty empty_dir
+   '
+   test_expect_success 'should fail with a normal filename' '
+   mkdir nonempty_dir &&
+   >nonempty_dir/some_file &&
+   ! test_dir_is_empty nonempty_dir
+   '
+   test_expect_success 'should fail with dot-newline-dot filename' '
+   mkdir pathological_dir &&
+   >\"pathological_dir/.
+   .\" &&
+   ! test_dir_is_empty pathological_dir
+   '
+   test_expect_success 'should pass with an empty directory \"-l\"' '
+   mkdir -- -l &&
+   test_dir_is_empty -l &&
+   rmdir -- -l
+   '
+   test_expect_success 'should pass with an empty directory \"--wat\"' '
+   mkdir -- --wat &&
+   test_dir_is_empty --wat &&
+   rmdir -- --wat
+   '
+   test_done
+   EOF
+   check_sub_test_lib_test dir-empty <<-\\EOF
+   > ok 1 - should pass with actually empty directory
+   > ok 2 - should fail with a normal filename
+   > ok 3 - should fail with dot-newline-dot filename
+   > ok 4 - should pass with an empty directory \"-l\"
+   > ok 5 - should pass with an empty directory \"--wat\"
+   > # passed all 5 test(s)
+   > 1..5
+   EOF
+"
+
+
 
 # Basics of the basics
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 4207af4077..3df6b8027f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -576,7 +576,7 @@ test_path_exists () {
 # Check if the directory exists and is empty as expected, barf otherwise.
 test_dir_is_empty () {
test_path_is_dir "$1" &&
-   if test -n "$(ls -a1 "$1" | egrep -v '^\.\.?$')"
+   if test "$(ls -a1 -- "$1" | wc -l)" -gt 2
then
echo "Directory '$1' is not empty, it contains:"
ls -la "$1"
-- 
2.18.0.549.gd66323a05



Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that git-fetch actually doesn't
> need to be run in the worktree. So let's run it in the git dir instead.

It may be clear to the author but not clear to the reader of the
above paragraph that "worktree", "fetch" and "git dir" all refer to
the recursively invoked operation that updates the submodules
repository.  s/git-fetch/"git fetch" for the submodule/ should be
sufficient to help the readers.

> That should pave the way towards fetching submodules that are currently
> not checked out.

Very good.

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> + prepare_submodule_repo_env_no_git_dir(out);
> + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> +}
> +
>  /* Helper function to display the submodule header line prior to the full
>   * summary output. If it can locate the submodule objects directory it will
>   * attempt to lookup both the left and right commits and put them into the
> @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct 
> submodule *submodule,
>   return spf->default_option;
>  }
>  
> +static const char *get_submodule_git_dir(struct repository *r, const char 
> *path)
> +{
> + struct repository subrepo;
> + const char *ret;
> +
> + if (repo_submodule_init(&subrepo, r, path)) {
> + /* no entry in .gitmodules? */
> + struct strbuf gitdir = STRBUF_INIT;
> + strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
> + if (repo_init(&subrepo, gitdir.buf, NULL)) {
> + strbuf_release(&gitdir);
> + return NULL;
> + }

This is for the modern "absorbed" layout?  Do we get a notice and
encouragement to migrate from the historical layout, or there is no
need to (e.g. the migration happens automatically in some other
codepaths)?

> + }
> +
> + ret = xstrdup(subrepo.gitdir);
> + repo_clear(&subrepo);
> +
> + return ret;
> +}

Returned value from this function is xstrdup()'ed so the caller
owns, not borrows.  There is no need to return "const char *" from
this function.  Also the caller needs to free it once done.
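
A minimal sketch of the shape this suggests (illustration only, not the actual
follow-up; the fallback for a missing .gitmodules entry is elided): return an
owned, non-const string and let the caller free() it once the child process no
longer needs it.

        /* Sketch: the caller owns the returned string and must free() it. */
        static char *get_submodule_git_dir(struct repository *r, const char *path)
        {
                struct repository subrepo;
                char *ret;

                if (repo_submodule_init(&subrepo, r, path))
                        return NULL;    /* the no-.gitmodules fallback is elided here */

                ret = xstrdup(subrepo.gitdir);
                repo_clear(&subrepo);
                return ret;
        }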

>  static int get_next_submodule(struct child_process *cp,
> struct strbuf *err, void *data, void **task_cb)
>  {
> @@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
>   struct submodule_parallel_fetch *spf = data;
>  
>   for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> - struct strbuf submodule_path = STRBUF_INIT;
> - struct strbuf submodule_git_dir = STRBUF_INIT;
>   struct strbuf submodule_prefix = STRBUF_INIT;
>   const struct cache_entry *ce = spf->r->index->cache[spf->count];
>   const char *git_dir, *default_argv;
> @@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process 
> *cp,
>   continue;
>   }
>  
> - strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
> - strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
>   strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
> - git_dir = read_gitfile(submodule_git_dir.buf);
> - if (!git_dir)
> - git_dir = submodule_git_dir.buf;
> - if (is_directory(git_dir)) {

In the old code, git_dir came from read_gitfile() which borrowed.

> + git_dir = get_submodule_git_dir(spf->r, ce->name);

In the new code, we own it, so we'd eventually need to get rid of
it.  How does it happen?

> + if (git_dir) {
>   child_process_init(cp);
> - cp->dir = strbuf_detach(&submodule_path, NULL);
> - prepare_submodule_repo_env(&cp->env_array);
> + prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> + cp->dir = git_dir;

Does cp now own it and cp->dir gets freed once run_processes_parallel()
is done with this task?  Or is cp->dir simply leaking?  The old code
gave the result of strbuf_detach(), so even if cp->dir is leaking,
the leak is not new in this patch.

>   cp->git_cmd = 1;
>   if (!spf->quiet)
>   strbuf_addf(err, "Fetching submodule %s%s\n",
> @@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
>   argv_array_push(&cp->args, submodule_prefix.buf);
>   ret = 1;
>   }
> - strbuf_release(&submodule_path);
> - strbuf_release(&submodule_git_dir);

But if it is a leak, it is easily plugged by freeing git_dir here, I
think.

Thanks.


>   strbuf_release(&submodule_prefix);
>   if (ret) {
>   spf->count++;
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 


Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order.  That makes "append then sort" more
> appropriate.

Sorry, but I still do not get the math you are implying in the
second paragraph.  Are you saying that append-then-sort is efficient
when the items being appended are already sorted?  That depends on the
sorting algorithm used, so the logic is incomplete unless you say
"given that we use X for sorting,...", I think.

Do we really discover new items in sorted order, by the way?  In a
single diff invocation made inside collect_changed_submodules() for
one commit in the superproject's history, we will grab changed paths
in the pathname order (i.e. sorted); if the superproject's tip commit
touches the submodules at paths A and Z, we will discover these two
paths in sorted order.

But because we are walking the superproject's history to collect all
paths that have been affected in that function, and repeatedly
calling diff as we discover commits in the superproject's history, I
am not sure how well the resulting set of paths would be sorted.

The tip commit in superproject's history may have modified the
submodule at path X, the parent of that commit may have touched the
submodule at path M, and its parent may have touched the submodule
at path A.  Don't we end up grabbing these paths in that discovered
order, i.e. X, M and A?

I still think changing it from "insert as we find an item, keeping
the list sorted" to "append all and then sort before we start
looking things up from the result" makes sense, but I do not think
the "we find things in sorted order" is either true, or it would
affect the choice between the two.  A justification to choose the
latter I can think of that makes sense is that we don't have to pay
cost to keep the list sorted while building it because we do not do
any look-up while building the list.
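
To make the trade-off concrete, here is a sketch of the append-then-sort shape
(illustration only, not code from the series; 'discovered' is a made-up input
and the list is assumed to be STRING_LIST_INIT_DUP):

        #include "cache.h"
        #include "string-list.h"

        /*
         * Append every discovered name without keeping the list sorted, then
         * sort once before any lookups happen; the per-item cost of
         * string_list_insert() is avoided because nothing is looked up while
         * the list is being built.
         */
        static void collect_names(struct string_list *names,
                                  const char **discovered, int nr)
        {
                int i;

                for (i = 0; i < nr; i++)
                        string_list_append(names, discovered[i]);

                string_list_sort(names);
        }

After that single sort, string_list_lookup() can use its binary search.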

> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index d29dfa3d1f5..c6eff7699f3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
>   case RECURSE_SUBMODULES_DEFAULT:
>   case RECURSE_SUBMODULES_ON_DEMAND:
>   if (!submodule ||
> - !unsorted_string_list_lookup(
> + !string_list_lookup(
>   &changed_submodule_names,
>   submodule->name))
>   continue;
> @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
>   /* default value, "--submodule-prefix" and its value are added later */
>  
>   calculate_changed_submodule_paths();
> + string_list_sort(&changed_submodule_names);
>   run_processes_parallel(max_parallel_jobs,
>  get_next_submodule,
>  fetch_start_failure,


Re: [PATCH 3/9] submodule.c: fix indentation

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.

Makes sense.  Thanks.

>
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index a2b266fbfae..d29dfa3d1f5 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
>   if (!submodule) {
>   const char *name = default_name_or_path(ce->name);
>   if (name) {
> - default_submodule.path = default_submodule.name = name;
> + default_submodule.path = name;
> + default_submodule.name = name;
>   submodule = _submodule;
>   }
>   }
> @@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
>   default:
>   case RECURSE_SUBMODULES_DEFAULT:
>   case RECURSE_SUBMODULES_ON_DEMAND:
> - if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
> -  submodule->name))
> + if (!submodule ||
> + !unsorted_string_list_lookup(
> + &changed_submodule_names,
> + submodule->name))
>   continue;
>   default_argv = "on-demand";
>   break;


Re: [PATCH 2/9] sha1-array: provide oid_array_filter

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> Helped-by: Junio C Hamano 
> Signed-off-by: Stefan Beller 
> ---
>  sha1-array.c | 18 ++
>  sha1-array.h |  5 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/sha1-array.c b/sha1-array.c
> index 265941fbf40..76323935dd7 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,21 @@ int oid_array_for_each_unique(struct oid_array *array,
>   }
>   return 0;
>  }
> +
> +int oid_array_filter(struct oid_array *array,
> +  for_each_oid_fn fn,

It probably makes sense to call it "want" instead of "fn" to match
object_array_filter().

> +  void *cbdata)
> +{
> + int src, dst;
> +
> + for (src = dst = 0; src < array->nr; src++) {
> + if (fn(&array->oid[src], cbdata)) {
> + if (dst < src)
> + oidcpy(&array->oid[dst], &array->oid[src]);
> + dst++;
> + }

In fact, matching the implementation of object_array_filter() may
also make sense, as I do not see a strong reason why the resulting
code would become better by rewriting "dst != src" over there to
"dst < src" here.

> + }
> + array->nr = dst;
> +
> + return 0;
> +}
> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..a2d7c210835 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -23,4 +23,9 @@ int oid_array_for_each_unique(struct oid_array *array,
> for_each_oid_fn fn,
> void *data);
>  
> +/* Call fn for each oid, and remove it if fn returns 0, retain it otherwise */

Also perhaps mimic the wording of object_array_filter()'s comment?
I find it easier that the latter says "retaining only if X" instead
of saying "remove if !X, retain otherwise"; it's both shorter and
more to the point.  It also is nicer that it notes that the order is
preserved.

> +int oid_array_filter(struct oid_array *array,
> +  for_each_oid_fn fn,
> +  void *cbdata);
> +

Other than that, the function makes sense very much, and the
callsite we see in patch 8/9 does, too.
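
As a usage sketch (illustration only; oid_array_filter() is the function added
by this patch, while the predicate below is made up), a caller could drop the
entries it already has locally:

        #include "cache.h"
        #include "sha1-array.h"

        /* for_each_oid_fn: return non-zero to retain the entry */
        static int want_missing(const struct object_id *oid, void *data)
        {
                return !has_object_file(oid);
        }

        static void drop_present_objects(struct oid_array *oids)
        {
                oid_array_filter(oids, want_missing, NULL);
        }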

Thanks.

>  #endif /* SHA1_ARRAY_H */


Re: [PATCH 1/9] string-list: add string_list_{pop, last} functions

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> Add a few functions to allow a string-list to be used as a stack:
>
>  - string_list_last() lets a caller peek the string_list_item at the
>end of the string list.  The caller needs to be aware that it is
>borrowing a pointer, which can become invalid if/when the
>string_list is resized.
>
>  - string_list_pop() removes the string_list_item at the end of
>the string list.
>
>  - _pop usually has a friend _push. This role is taken by
> string_list_append already, as they are not symmetrical
> in our code base: _append returns the pointer, such that
> adding a util is easy, but _pop doesn't return such a pointer.

This third item somehow is indented one level too deeply.

"our pop() does not return a pointer, so we won't introduce push()
that does return a pointer"?  Am I reading the third item right?

pop() and push() usually go together, but they are opposite
operations, they are expected to behave oppositely, and they are
expected to be interfaced differently.  pop() that does not return a
pointer is no better or no worse match to push() that returns a
pointer, no?

Lack of something_push(), when something_pop() exists, would always
be surprising to anybody new to the "something" API, and no amount
of mumbling would justify it, I would think, but this third item
sounds like it is trying to make a lame excuse when there is no need
to.

Wouldn't it be sufficient to say "there is no string_list_push();
string_list_append() can be used in its place" and stop there?

> You can use them in this pattern:
>
> while (list.nr) {
> struct string_list_item *item = string_list_last(&list);
>
> work_on(item);
> string_list_pop(&list);
> }

"free_util" as the second argument to this sample call is missing.

>  string-list.c | 14 ++
>  string-list.h | 11 +++
>  2 files changed, 25 insertions(+)

> diff --git a/string-list.c b/string-list.c
> index 771c4550980..04db2b537c0 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const 
> char *string,
>   }
>  }
>  
> +void string_list_pop(struct string_list *list, int free_util)
> +{
> + if (list->nr == 0)
> + BUG("tried to remove an item from empty string list");
> +
> + if (list->strdup_strings)
> + free(list->items[list->nr - 1].string);
> +
> + if (free_util)
> + free(list->items[list->nr - 1].util);
> +
> + list->nr--;
> +}
> +
>  int string_list_has_string(const struct string_list *list, const char 
> *string)
>  {
>   int exact_match;
> diff --git a/string-list.h b/string-list.h
> index ff8f6094a33..ce2528bbe15 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, 
> const char *string,
>   */
>  struct string_list_item *string_list_lookup(struct string_list *list, const 
> char *string);
>  
> +/**
> + * Removes the last item from the list.
> + * The caller must ensure that the list is not empty.
> + */
> +void string_list_pop(struct string_list *list, int free_util);
> +
> +static inline struct string_list_item *string_list_last(struct string_list 
> *list)

We may want to warn users, in an in-code comment or docstring, that
pop(), append(), etc. shouldn't be done while using the returned value
from this function.

> +{
> + return &list->items[list->nr - 1];
> +}
> +

Other than that, nicely done.

>  /*
>   * Remove all but the first of consecutive entries with the same
>   * string value.  If free_util is true, call free() on the util


Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)

2018-09-12 Thread Ævar Arnfjörð Bjarmason


On Tue, Sep 11 2018, Junio C Hamano wrote:

> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

Can you please pick up v5 of the patches René & I had for various fsck
skipList improvements, at:
https://public-inbox.org/git/20180903144928.30691-1-ava...@gmail.com/

Looks like v5 is ready to be picked up, and might have fallen through the
cracks during the release period.


Starting subshells via tags

2018-09-12 Thread Dave Marotti
Hello -

This is not a git bug or issue. It's just something I stumbled across
and didn't see any google results for. I thought others would benefit
from being aware of it.

I saw a makefile which included the output of "git describe --tags --dirty" to
define version information for a binary's --version command line
parameter. The commit/tag information was passed to g++ in a Makefile
via:

CXXFLAGS += -DBUILD_COMMIT="\"$(shell git describe --tags --dirty)\""

For fun (on Linux) I made a simple C++ program and Makefile with the
above CXXFLAGS, and a tag (backticks work too): git tag
'$(echo>/tmp/test.txt)'

Then I built it. Make executes the git command via a shell and the shell
executes the subshell. /tmp/test.txt was created.

The tags themselves don't allow spaces so the complexity of the
command is limited, though I didn't explore what I could do with
chaining shells or escape characters. It's easy enough to add a script
to the repository where the tag is located and execute that script
from the tag's subshell, such as $(./test.sh).

Again, this is not at all a git issue, git is just used as the
transport. As with every other shell attack, it comes down to "always
sanitize what you pass through to a shell". Or don't pass it to the
shell at all, use another mechanism.


[PATCH 0/1] contrib: Add script to show uncovered "new" lines

2018-09-12 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has an
X.c.gcov file containing the coverage counts for every line, and "#" at
every uncovered line.

There have been a few bugs in recent patches that would have been caught if
the test suite covered those blocks (including a few of mine). I want to
work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important not to measure the coverage of new topics by how much old code
is left uncovered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by
the test suite. For example, I ran this against the 'jch' branch (d3c0046)
versus 'next' (dd90340) and got the following output:

builtin/commit.c
859fdc0c3cf (Derrick Stolee 2018-08-29 05:49:04 -0700   1657)   
write_commit_graph_reachable(get_object_directory(), 0);
builtin/rev-list.c
250edfa8c87 (Harald Nordgren2018-04-18 23:05:35 +0200   431)
bisect_flags |= BISECT_FIND_ALL;
builtin/worktree.c
e5353bef550 (Eric Sunshine  2018-08-28 17:20:19 -0400   60) 
error_errno(_("failed to delete '%s'"), sb.buf);
e19831c94f9 (Eric Sunshine  2018-08-28 17:20:23 -0400   251)
die(_("unable to re-add worktree '%s'"), path);
68a6b3a1bd4 (Eric Sunshine  2018-08-28 17:20:24 -0400   793)
die(_("cannot move a locked working tree, lock reason: %s\nuse 'move -f 
-f' to override or unlock first"),
f4143101cbb (Eric Sunshine  2018-08-28 17:20:25 -0400   906)
die(_("cannot remove a locked working tree, lock reason: %s\nuse 
'remove -f -f' to override or unlock first"),
read-cache.c
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1754)   
const unsigned char *cp = (const unsigned char *)name;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1757)   
previous_len = previous_ce ? previous_ce->ce_namelen : 0;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1758)   
strip_len = decode_varint();
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1759)   
if (previous_len < strip_len) {
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1760)   
if (previous_ce)
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1761)   
die(_("malformed name field in the index, near path 
'%s'"),
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1762)   
previous_ce->name);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1764)   
die(_("malformed name field in the index in the first 
path"));
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1766)   
copy_len = previous_len - strip_len;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1767)   
name = (const char *)cp;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1773)   
len += copy_len;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1794)   
if (copy_len)
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1795)   
memcpy(ce->name, previous_ce->name, copy_len);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1796)   
memcpy(ce->name + copy_len, name, len + 1 - copy_len);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1797)   
*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
remote-curl.c
c3b9bc94b9b (Elijah Newren  2018-09-05 10:03:07 -0700   181)
options.filter = xstrdup(value);

Using this 'git blame' output, we can quickly inspect whether the uncovered
lines are appropriate. For instance:

 1. The line in builtin/commit.c is due to writing the commit-graph file
when GIT_TEST_COMMIT_GRAPH is enabled, which is not on by default in the
test suite. Being uncovered is expected here.


 2. The lines in builtin/worktree.c are all related to error conditions.
This is acceptable.


 3. The line in builtin/rev-list.c is a flag replacement in a block that is
otherwise unchanged. Evidently it is not covered by the test suite normally.
This could be worth adding a test to ensure the new logic maintains old
behavior.


 4. The lines in 

[PATCH 1/1] contrib: add coverage-diff script

2018-09-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#"
at every uncovered line.

There have been a few bugs in recent patches that would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important not to measure the coverage of new topics by how much old code
is left uncovered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.

Signed-off-by: Derrick Stolee 
---
 contrib/coverage-diff.sh | 70 
 1 file changed, 70 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 00..22acb13d38
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+
+# Usage: 'contrib/coverage-diff.sh <version1> <version2>'
+# Outputs a list of new lines in version2 compared to version1 that are
+# not covered by the test suite. Assumes you ran
+# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+
+V1=$1
+V2=$2
+
+diff-lines() {
+local path=
+local line=
+while read; do
+   esc=$'\033'
+   if [[ $REPLY =~ ---\ (a/)?.* ]]; then
+   continue
+   elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
+   path=${BASH_REMATCH[2]}
+   elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; then
+   line=${BASH_REMATCH[2]}
+   elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then
+   echo "$path:$line:$REPLY"
+   if [[ ${BASH_REMATCH[2]} != - ]]; then
+   ((line++))
+   fi
+   fi
+done
+}
+
+git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt
+
+for file in $(cat files.txt)
+do
+   hash_file=${file//\//\#}
+
+   git diff $V1 $V2 -- $file \
+   | diff-lines \
+   | grep ":+" \
+   >"diff_file.txt"
+
+   cat diff_file.txt \
+   | sed -E 's/:/ /g' \
+   | awk '{print $2}' \
+   | sort \
+   >new_lines.txt
+
+   cat "$hash_file.gcov" \
+   | grep \#\#\#\#\# \
+   | sed 's/#: //g' \
+   | sed 's/\:/ /g' \
+   | awk '{print $1}' \
+   | sort \
+   >uncovered_lines.txt
+
+   comm -12 uncovered_lines.txt new_lines.txt \
+   | sed -e 's/$/\)/' \
+   | sed -e 's/^/\t/' \
+   >uncovered_new_lines.txt
+
+   grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+   echo $file && \
+   git blame -c $file \
+   | grep -f uncovered_new_lines.txt
+
+   rm -f diff_file.txt new_lines.txt \
+   uncovered_lines.txt uncovered_new_lines.txt
+done
+
+rm -rf files.txt
-- 
gitgitgadget


[PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension

2018-09-12 Thread Ben Peart
The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:

SHA-1("TREE" +  +
"REUC" + )

Signed-off-by: Ben Peart 
---
 Documentation/technical/index-format.txt |  23 
 read-cache.c | 154 +--
 t/README |   5 +
 t/t1700-split-index.sh   |   1 +
 4 files changed, 175 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index db3572626b..6bc2d90f7f 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,26 @@ The remaining data of each directory block is grouped by 
type:
 
   - An ewah bitmap, the n-th bit indicates whether the n-th index entry
 is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the beginning of the extensions. Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - 160-bit SHA-1 over the extension types and their sizes (but not
+   their contents).  E.g. if we have "TREE" extension that is N-bytes
+   long, "REUC" extension that is M-bytes long, followed by "EOIE",
+   then the hash would be:
+
+   SHA-1("TREE" +  +
+   "REUC" + )
diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..858935f123 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -43,6 +43,7 @@
 #define CACHE_EXT_LINK 0x6c696e6b/* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452   /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E   /* "FSMN" */
+#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1693,6 +1694,9 @@ static int read_index_extension(struct index_state 
*istate,
case CACHE_EXT_FSMONITOR:
read_fsmonitor_extension(istate, data, sz);
break;
+   case CACHE_EXT_ENDOFINDEXENTRIES:
+   /* already handled in do_read_index() */
+   break;
default:
if (*ext < 'A' || 'Z' < *ext)
return error("index uses %.4s extension, which we do 
not understand",
@@ -1889,6 +1893,11 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+#ifndef NO_PTHREADS
+static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
+#endif
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, unsigned long offset);
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2198,11 +2207,15 @@ static int ce_write(git_hash_ctx *context, int fd, void 
*data, unsigned int len)
return 0;
 }
 
-static int write_index_ext_header(git_hash_ctx *context, int fd,
- unsigned int ext, unsigned int sz)
+static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx 
*eoie_context,
+ int fd, unsigned int ext, unsigned int sz)
 {
ext = htonl(ext);
sz = htonl(sz);
+   if (eoie_context) {
+   the_hash_algo->update_fn(eoie_context, &ext, 4);
+   the_hash_algo->update_fn(eoie_context, &sz, 4);
+   }
return ((ce_write(context, fd, &ext, 4) < 0) ||
(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
 }
@@ -2445,7 +2458,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 {
uint64_t start = getnanotime();
int newfd = tempfile->fd;
-   git_hash_ctx c;
+   git_hash_ctx c, eoie_c;
struct cache_header hdr;
int i, err = 0, removed, 

Re: [RFC PATCH 0/1] builtin/remote: quote remote name on error to display empty name

2018-09-12 Thread Junio C Hamano
Shulhan  writes:

> Rationale: consistent error format
>
> When adding new remote name with empty string, git will print the
> following error message,
>
>   fatal: '' is not a valid remote name\n
>
> But when removing remote name with empty string as input, git did not
> print the empty string with quote,
>
>   fatal: No such remote: \n

Thanks for noticing.  I think the consistency given by the patch 1/1
is good.

> Follow up question: If this is ok, should the po files also updated?

They should be updated, but not by you and not as part of this
change.  The po files will be updated by our l10n coordinator,
typically when it gets close to the release time, for all the
message changes done during the release cycle.

Thanks.



[PATCH v5 4/5] read-cache.c: optimize reading index format v4

2018-09-12 Thread Ben Peart
From: Nguyễn Thái Ngọc Duy 

Index format v4 requires some more computation to assemble a path
based on a previous one. The current code is not very efficient
because

 - it doubles memory copy, we assemble the final path in a temporary
   buffer first before putting it back to a cache_entry

 - strbuf_remove() in expand_name_field() is not exactly a good fit
   for stripping a part at the end, _setlen() would do the same job
   and is much cheaper.

 - the open-coded loop to find the end of the string in
   expand_name_field() can't beat an optimized strlen()

This patch avoids the temporary buffer and writes directly to the new
cache_entry, which addresses the first two points. The last point
could also be avoided if the total string length fits in the first 12
bits of ce_flags, if not we fall back to strlen().
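
As a worked example of the prefix compression being decoded here (illustration
only; the names and lengths are made up):

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                const char *previous = "dir/subdir/file1.c";   /* previous entry, 18 bytes */
                size_t previous_len = strlen(previous);
                size_t strip_len = 7;                          /* decoded varint: drop "file1.c" */
                size_t copy_len = previous_len - strip_len;    /* 11 bytes: "dir/subdir/" */
                char name[64];

                memcpy(name, previous, copy_len);
                strcpy(name + copy_len, "file2.c");            /* suffix stored in the entry */
                printf("%s\n", name);                          /* dir/subdir/file2.c */
                return 0;
        }

The patch performs that copy straight into the new cache_entry instead of
assembling the name in a temporary strbuf first.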

Running "test-tool read-cache 100" on webkit.git (275k files), reading
v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more
time. The patch reduces read time on v4 to 4.319 seconds.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ben Peart 
---
 read-cache.c | 132 ++-
 1 file changed, 67 insertions(+), 65 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 880f627b4c..40dc4723b2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1721,33 +1721,6 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
-  struct ondisk_cache_entry *ondisk,
-  unsigned int flags,
-  const char *name,
-  size_t len)
-{
-   struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len);
-
-   ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
-   ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
-   ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
-   ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
-   ce->ce_stat_data.sd_dev   = get_be32(&ondisk->dev);
-   ce->ce_stat_data.sd_ino   = get_be32(&ondisk->ino);
-   ce->ce_mode  = get_be32(&ondisk->mode);
-   ce->ce_stat_data.sd_uid   = get_be32(&ondisk->uid);
-   ce->ce_stat_data.sd_gid   = get_be32(&ondisk->gid);
-   ce->ce_stat_data.sd_size  = get_be32(&ondisk->size);
-   ce->ce_flags = flags & ~CE_NAMEMASK;
-   ce->ce_namelen = len;
-   ce->index = 0;
-   hashcpy(ce->oid.hash, ondisk->sha1);
-   memcpy(ce->name, name, len);
-   ce->name[len] = '\0';
-   return ce;
-}
-
 /*
  * Adjacent cache entries tend to share the leading paths, so it makes
  * sense to only store the differences in later entries.  In the v4
@@ -1762,22 +1735,24 @@ static unsigned long expand_name_field(struct strbuf 
*name, const char *cp_)
 
if (name->len < len)
die("malformed name field in the index");
-   strbuf_remove(name, name->len - len, len);
-   for (ep = cp; *ep; ep++)
-   ; /* find the end */
+   strbuf_setlen(name, name->len - len);
+   ep = cp + strlen((const char *)cp);
strbuf_add(name, cp, ep - cp);
return (const char *)ep + 1 - cp_;
 }
 
-static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+   unsigned int version,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
-   struct strbuf *previous_name)
+   const struct cache_entry 
*previous_ce)
 {
struct cache_entry *ce;
size_t len;
const char *name;
unsigned int flags;
+   size_t copy_len = 0;
+   int expand_name_field = version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(&ondisk->flags);
@@ -1797,21 +1772,50 @@ static struct cache_entry *create_from_disk(struct 
mem_pool *mem_pool,
else
name = ondisk->name;
 
-   if (!previous_name) {
-   /* v3 and earlier */
-   if (len == CE_NAMEMASK)
-   len = strlen(name);
-   ce = cache_entry_from_ondisk(mem_pool, ondisk, flags, name, 
len);
+   if (expand_name_field) {
+   const unsigned char *cp = (const unsigned char *)name;
+   size_t strip_len, previous_len;
 
-   *ent_size = ondisk_ce_size(ce);
-   } else {
-   unsigned long consumed;
-   consumed = expand_name_field(previous_name, name);
-   ce = cache_entry_from_ondisk(mem_pool, ondisk, flags,
-previous_name->buf,
- 

[PATCH v5 5/5] read-cache: clean up casting and byte decoding

2018-09-12 Thread Ben Peart
This patch does a clean up pass to minimize the casting required to work
with the memory mapped index (mmap).

It also makes the decoding of network byte order more consistent by using
get_be32() where possible.

Signed-off-by: Ben Peart 
---
 read-cache.c | 49 +++--
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 40dc4723b2..c05e887fc9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1655,7 +1655,7 @@ int verify_index_checksum;
 /* Allow fsck to force verification of the cache entry order. */
 int verify_ce_order;
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
git_hash_ctx c;
unsigned char hash[GIT_MAX_RAWSZ];
@@ -1679,7 +1679,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
 }
 
 static int read_index_extension(struct index_state *istate,
-   const char *ext, void *data, unsigned long sz)
+   const char *ext, const char *data, unsigned 
long sz)
 {
switch (CACHE_EXT(ext)) {
case CACHE_EXT_TREE:
@@ -1902,7 +1902,7 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
 }
 
 #ifndef NO_PTHREADS
-static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
+static unsigned long read_eoie_extension(const char *mmap, size_t mmap_size);
 #endif
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, unsigned long offset);
 
@@ -1912,14 +1912,14 @@ struct load_index_extensions
pthread_t pthread;
 #endif
struct index_state *istate;
-   void *mmap;
+   const char *mmap;
size_t mmap_size;
unsigned long src_offset;
 };
 
-static void *load_index_extensions(void *_data)
+static void *load_index_extensions(void *data)
 {
-   struct load_index_extensions *p = _data;
+   struct load_index_extensions *p = data;
unsigned long src_offset = p->src_offset;
 
while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
@@ -1930,13 +1930,12 @@ static void *load_index_extensions(void *_data)
 * in 4-byte network byte order.
 */
uint32_t extsize;
-   memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
-   extsize = ntohl(extsize);
+   extsize = get_be32(p->mmap + src_offset + 4);
if (read_index_extension(p->istate,
-   (const char *)p->mmap + src_offset,
-   (char *)p->mmap + src_offset + 8,
+   p->mmap + src_offset,
+   p->mmap + src_offset + 8,
extsize) < 0) {
-   munmap(p->mmap, p->mmap_size);
+   munmap((void *)p->mmap, p->mmap_size);
die("index file corrupt");
}
src_offset += 8;
@@ -1951,7 +1950,7 @@ static void *load_index_extensions(void *_data)
  * from the memory mapped file and add them to the given index.
  */
 static unsigned long load_cache_entry_block(struct index_state *istate,
-   struct mem_pool *ce_mem_pool, int offset, int nr, void 
*mmap,
+   struct mem_pool *ce_mem_pool, int offset, int nr, const 
char *mmap,
unsigned long start_offset, const struct cache_entry 
*previous_ce)
 {
int i;
@@ -1962,7 +1961,7 @@ static unsigned long load_cache_entry_block(struct 
index_state *istate,
struct cache_entry *ce;
unsigned long consumed;
 
-   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
set_index_entry(istate, i, ce);
 
@@ -1973,7 +1972,7 @@ static unsigned long load_cache_entry_block(struct 
index_state *istate,
 }
 
 static unsigned long load_all_cache_entries(struct index_state *istate,
-   void *mmap, size_t mmap_size, unsigned long src_offset)
+   const char *mmap, size_t mmap_size, unsigned long 
src_offset)
 {
unsigned long consumed;
 
@@ -2006,7 +2005,7 @@ struct load_cache_entries_thread_data
struct index_state *istate;
struct mem_pool *ce_mem_pool;
int offset, nr;
-   void *mmap;
+   const char *mmap;
unsigned long start_offset;
struct cache_entry *previous_ce;
unsigned long consumed; /* return # of bytes in index file processed */
@@ -2026,7 +2025,7 @@ static void *load_cache_entries_thread(void *_data)
 }
 
 static unsigned long load_cache_entries_threaded(int nr_threads, struct 
index_state *istate,
-   void *mmap, size_t mmap_size, unsigned 

[PATCH v5 3/5] read-cache: load cache entries on worker threads

2018-09-12 Thread Ben Peart
This patch helps address the CPU cost of loading the index by creating
multiple threads to divide the work of loading and converting the cache
entries across all available CPU cores.

It accomplishes this by having the primary thread loop across the index file,
tracking the offset and (for V4 indexes) expanding the name, and creating a
thread to process each block of entries as it reaches them.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files                  Baseline           Parallel entries
-------------------------------------------------------------------------------
read_cache/discard_cache 1000 times   14.08(0.01+0.10)   9.72(0.03+0.06) -31.0%

Test w/1,000,000 files                Baseline           Parallel entries
-------------------------------------------------------------------------------
read_cache/discard_cache 1000 times   202.95(0.01+0.07)  154.14(0.03+0.06) -24.1%

Signed-off-by: Ben Peart 
---
 read-cache.c | 242 +--
 t/README |   3 +
 2 files changed, 217 insertions(+), 28 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index b203eebb44..880f627b4c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1942,20 +1942,212 @@ static void *load_index_extensions(void *_data)
return NULL;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+   struct mem_pool *ce_mem_pool, int offset, int nr, void 
*mmap,
+   unsigned long start_offset, struct strbuf 
*previous_name)
+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   ce = create_from_disk(ce_mem_pool, disk_ce, &consumed, previous_name);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   }
+   return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state *istate,
+   void *mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   unsigned long consumed;
+
+   if (istate->version == 4) {
+   previous_name = &previous_name_buf;
+   mem_pool_init(&istate->ce_mem_pool,
+   estimate_cache_size_from_compressed(istate->cache_nr));
+   } else {
+   previous_name = NULL;
+   mem_pool_init(&istate->ce_mem_pool,
+   estimate_cache_size(mmap_size, istate->cache_nr));
+   }
+
+   consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
+   0, istate->cache_nr, mmap, src_offset, previous_name);
+   strbuf_release(&previous_name_buf);
+   return consumed;
+}
+
+#ifndef NO_PTHREADS
+
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to online_cpus() threads, and we want
+ * to have at least 1 cache entries per thread for it to
+ * be worth starting a thread.
+ */
+#define THREAD_COST(1)
+
+struct load_cache_entries_thread_data
+{
+   pthread_t pthread;
+   struct index_state *istate;
+   struct mem_pool *ce_mem_pool;
+   int offset, nr;
+   void *mmap;
+   unsigned long start_offset;
+   struct strbuf previous_name_buf;
+   struct strbuf *previous_name;
+   unsigned long consumed; /* return # of bytes in index file processed */
+};
+
+/*
+ * A thread proc to run the load_cache_entries() computation
+ * across multiple background threads.
+ */
+static void *load_cache_entries_thread(void *_data)
+{
+   struct load_cache_entries_thread_data *p = _data;
+
+   p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool,
+   p->offset, p->nr, p->mmap, p->start_offset, p->previous_name);
+   return NULL;
+}
+
+static unsigned long load_cache_entries_threaded(int nr_threads, struct 
index_state *istate,
+   void *mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   struct load_cache_entries_thread_data *data;
+   int ce_per_thread;
+   unsigned long consumed;
+   int i, thread;
+
+   /* a little sanity checking */
+   if (istate->name_hash_initialized)
+   BUG("the name hash isn't thread safe");
+
+   mem_pool_init(&istate->ce_mem_pool, 0);
+   if (istate->version == 4)
+   previous_name = &previous_name_buf;
+   else
+   previous_name = NULL;
+
+   ce_per_thread = DIV_ROUND_UP(istate->cache_nr, nr_threads);

[PATCH v5 2/5] read-cache: load cache extensions on a worker thread

2018-09-12 Thread Ben Peart
This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the cache
entries.

In some cases, loading the extensions takes longer than loading the
cache entries so this patch utilizes the new EOIE to start the thread to
load the extensions before loading all the cache entries in parallel.

This is possible because the current extensions don't access the cache
entries in the index_state structure, so it is fine that those entries don't
all exist yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.
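
A sketch of that flow (illustration only, not the patch itself; 'istate',
'mmap', 'mmap_size' and 'extension_offset' are assumed to come from the
surrounding do_read_index() code, NO_PTHREADS is assumed to be undefined, and
error handling is reduced to die()):

        static void read_extensions_in_parallel(struct index_state *istate,
                                                void *mmap, size_t mmap_size,
                                                unsigned long extension_offset)
        {
                struct load_index_extensions p;

                memset(&p, 0, sizeof(p));
                p.istate = istate;
                p.mmap = mmap;
                p.mmap_size = mmap_size;
                p.src_offset = extension_offset;        /* taken from the EOIE extension */

                if (pthread_create(&p.pthread, NULL, load_index_extensions, &p))
                        die("unable to create load_index_extensions thread");

                /* ... parse the cache entries here, on the calling thread ... */

                if (pthread_join(p.pthread, NULL))
                        die("unable to join load_index_extensions thread");
        }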

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files                  Baseline           Parallel Extensions
-------------------------------------------------------------------------------
read_cache/discard_cache 1000 times   14.08(0.01+0.10)   9.72(0.03+0.06) -31.0%

Test w/1,000,000 files                Baseline           Parallel Extensions
-------------------------------------------------------------------------------
read_cache/discard_cache 1000 times   202.95(0.01+0.07)  154.14(0.03+0.06) -24.1%

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  6 +++
 config.c | 18 
 config.h |  1 +
 read-cache.c | 94 
 4 files changed, 102 insertions(+), 17 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..79f8296d9c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2391,6 +2391,12 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.threads::
+   Specifies the number of threads to spawn when loading the index.
+   This is meant to reduce index load time on multiprocessor machines.
+   Specifying 0 or 'true' will cause Git to auto-detect the number of
+   CPUs and set the number of threads accordingly. Defaults to 'true'.
+
 index.version::
Specify the version with which new index files should be
initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 9a0b10d4bc..9bd79fb165 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
+/*
+ * You can disable multi-threaded code by setting index.threads
+ * to 'false' (or 1)
+ */
+int git_config_get_index_threads(void)
+{
+   int is_bool, val;
+
+   if (!git_config_get_bool_or_int("index.threads", _bool, )) {
+   if (is_bool)
+   return val ? 0 : 1;
+   else
+   return val;
+   }
+
+   return 0; /* auto-detect */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 858935f123..b203eebb44 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,6 +23,10 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#ifndef NO_PTHREADS
+#include 
+#include 
+#endif
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1898,6 +1902,46 @@ static unsigned long read_eoie_extension(void *mmap_, 
size_t mmap_size);
 #endif
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, unsigned long offset);
 
+struct load_index_extensions
+{
+#ifndef NO_PTHREADS
+   pthread_t pthread;
+#endif
+   struct index_state *istate;
+   void *mmap;
+   size_t mmap_size;
+   unsigned long src_offset;
+};
+
+static void *load_index_extensions(void *_data)
+{
+   struct load_index_extensions *p = _data;
+   unsigned long src_offset = p->src_offset;
+
+   while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+   /* After an array of active_nr index entries,
+* there can be arbitrary number of extended
+* sections, each of which is prefixed with
+* extension name (4-byte) and section length
+* in 4-byte network byte order.
+*/
+   uint32_t extsize;
+   memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
+

[PATCH v5 0/5] read-cache: speed up index load through parallelization

2018-09-12 Thread Ben Peart
This version of the patch merges in Duy's work to speed up index v4 decoding.
I had to massage it a bit to get it to work with the multi-threading but it is
still largely his code. I also responded to Junio's feedback on initializing
copy_len to avoid compiler warnings.

I also added a minor cleanup patch to minimize the casting required when
working with the memory mapped index and other minor changes based on the
feedback received.

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/dcf62005f8
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v5 
&& git checkout dcf62005f8


### Interdiff (v3..v5):

diff --git a/read-cache.c b/read-cache.c
index 8537a55750..c05e887fc9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1655,7 +1655,7 @@ int verify_index_checksum;
 /* Allow fsck to force verification of the cache entry order. */
 int verify_ce_order;
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
git_hash_ctx c;
unsigned char hash[GIT_MAX_RAWSZ];
@@ -1679,7 +1679,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
 }
 
 static int read_index_extension(struct index_state *istate,
-   const char *ext, void *data, unsigned long sz)
+   const char *ext, const char *data, unsigned 
long sz)
 {
switch (CACHE_EXT(ext)) {
case CACHE_EXT_TREE:
@@ -1721,33 +1721,6 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
-  struct ondisk_cache_entry 
*ondisk,
-  unsigned int flags,
-  const char *name,
-  size_t len)
-{
-   struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len);
-
-   ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
-   ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
-   ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
-   ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
-   ce->ce_stat_data.sd_dev   = get_be32(&ondisk->dev);
-   ce->ce_stat_data.sd_ino   = get_be32(&ondisk->ino);
-   ce->ce_mode  = get_be32(&ondisk->mode);
-   ce->ce_stat_data.sd_uid   = get_be32(&ondisk->uid);
-   ce->ce_stat_data.sd_gid   = get_be32(&ondisk->gid);
-   ce->ce_stat_data.sd_size  = get_be32(&ondisk->size);
-   ce->ce_flags = flags & ~CE_NAMEMASK;
-   ce->ce_namelen = len;
-   ce->index = 0;
-   hashcpy(ce->oid.hash, ondisk->sha1);
-   memcpy(ce->name, name, len);
-   ce->name[len] = '\0';
-   return ce;
-}
-
 /*
  * Adjacent cache entries tend to share the leading paths, so it makes
  * sense to only store the differences in later entries.  In the v4
@@ -1768,15 +1741,18 @@ static unsigned long expand_name_field(struct strbuf 
*name, const char *cp_)
return (const char *)ep + 1 - cp_;
 }
 
-static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+   unsigned int version,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
-   struct strbuf *previous_name)
+   const struct cache_entry 
*previous_ce)
 {
struct cache_entry *ce;
size_t len;
const char *name;
unsigned int flags;
+   size_t copy_len = 0;
+   int expand_name_field = version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(&ondisk->flags);
@@ -1796,21 +1772,50 @@ static struct cache_entry *create_from_disk(struct 
mem_pool *mem_pool,
else
name = ondisk->name;
 
-   if (!previous_name) {
-   /* v3 and earlier */
+   if (expand_name_field) {
+   const unsigned char *cp = (const unsigned char *)name;
+   size_t strip_len, previous_len;
+
+   previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+   strip_len = decode_varint(&cp);
+   if (previous_len < strip_len) {
+   if (previous_ce)
+   die(_("malformed name field in the index, near 
path '%s'"),
+   previous_ce->name);
+   else
+   die(_("malformed name field in the index in the 
first path"));
+   }
+   copy_len = previous_len - strip_len;
+   name = (const char *)cp;
+   }
+
if (len == CE_NAMEMASK)
-   len = 
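
For readers less familiar with index v4, the name field being decoded above
stores a varint N ("strip the last N bytes from the previous entry's name")
followed by a NUL-terminated suffix to append. A rough sketch of that
expansion, mirroring but not copying the code in this series:

static void expand_v4_name(struct strbuf *name,
                           const struct cache_entry *previous_ce,
                           const unsigned char **cp)
{
    size_t previous_len = previous_ce ? previous_ce->ce_namelen : 0;
    size_t strip_len = decode_varint(cp);
    size_t copy_len;

    if (previous_len < strip_len)
        die(_("malformed name field in the index"));
    copy_len = previous_len - strip_len;

    strbuf_reset(name);
    if (copy_len)
        strbuf_add(name, previous_ce->name, copy_len);
    /* *cp now points at the NUL-terminated suffix to append */
    strbuf_addstr(name, (const char *)*cp);
}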

Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-12 Thread Junio C Hamano
Jeff King  writes:

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

OK.


[PATCH 0/2] Bugfix for partial clones and ref-in-want servers

2018-09-12 Thread Jonathan Tan
A coworker discovered a bug when having a partial clone against a server
that supports ref-in-want - the lazy fetches no longer work, because the
client attempts to fetch with "want-ref ", which is not permitted
by the protocol.

The first patch deduplicates some code, so that the bugfix need only be
applied once, and the second patch contains the actual bugfix.
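
For context, the protocol-v2 request side boils down to one choice per wanted
object; roughly (an illustrative sketch, not fetch-pack.c's actual code, and
server_supports_ref_in_want is a made-up parameter name):

static void add_want_line(struct strbuf *req, const struct ref *ref,
                          int server_supports_ref_in_want)
{
    if (server_supports_ref_in_want && !ref->exact_oid)
        strbuf_addf(req, "want-ref %s\n", ref->name);
    else
        strbuf_addf(req, "want %s\n", oid_to_hex(&ref->old_oid));
}

The second patch makes the lazy-fetch path take the "want" branch by setting
exact_oid on the refs it fabricates.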

Jonathan Tan (2):
  fetch-object: provide only one fetching function
  fetch-object: set exact_oid when fetching

 fetch-object.c   | 17 ++---
 fetch-object.h   |  8 ++--
 sha1-file.c  |  2 +-
 t/t0410-partial-clone.sh | 12 
 unpack-trees.c   |  2 +-
 5 files changed, 22 insertions(+), 19 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog



[PATCH 2/2] fetch-object: set exact_oid when fetching

2018-09-12 Thread Jonathan Tan
fetch_objects() currently does not set exact_oid in struct ref when
invoking transport_fetch_refs(). If the server supports ref-in-want,
fetch_pack() uses this field to determine whether a wanted ref should be
requested as a "want-ref" line or a "want" line; without the setting of
exact_oid, the wrong line will be sent.

Set exact_oid, so that the correct line is sent.

Signed-off-by: Jonathan Tan 
---
 fetch-object.c   |  1 +
 t/t0410-partial-clone.sh | 12 
 2 files changed, 13 insertions(+)

diff --git a/fetch-object.c b/fetch-object.c
index 1af1bf857..426654880 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -32,6 +32,7 @@ void fetch_objects(const char *remote_name, const struct 
object_id *oids,
for (i = 0; i < oid_nr; i++) {
struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i]));
oidcpy(&new_ref->old_oid, &oids[i]);
+   new_ref->exact_oid = 1;
new_ref->next = ref;
ref = new_ref;
}
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..0ab02c337 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -170,6 +170,18 @@ test_expect_success 'fetching of missing objects' '
git verify-pack --verbose "$IDX" | grep "$HASH"
 '
 
+test_expect_success 'fetching of missing objects works with ref-in-want 
enabled' '
+   # ref-in-want requires protocol version 2
+   git -C server config protocol.version 2 &&
+   git -C server config uploadpack.allowrefinwant 1 &&
+   git -C repo config protocol.version 2 &&
+
+   rm -rf repo/.git/objects/* &&
+   rm -f trace &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C repo cat-file -p "$HASH" &&
+   grep "git< fetch=.*ref-in-want" trace
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised commit' '
rm -rf repo &&
test_create_repo repo &&
-- 
2.19.0.397.gdd90340f6a-goog



[PATCH 1/2] fetch-object: provide only one fetching function

2018-09-12 Thread Jonathan Tan
fetch-object.h currently provides two functions (fetch_object() and
fetch_objects()) that could be replaced by a single, more flexible
function. Replace those two functions with the more flexible function.

Signed-off-by: Jonathan Tan 
---
 fetch-object.c | 16 +---
 fetch-object.h |  8 ++--
 sha1-file.c|  2 +-
 unpack-trees.c |  2 +-
 4 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 853624f81..1af1bf857 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -23,21 +23,15 @@ static void fetch_refs(const char *remote_name, struct ref 
*ref)
fetch_if_missing = original_fetch_if_missing;
 }
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
-{
-   struct ref *ref = alloc_ref(sha1_to_hex(sha1));
-   hashcpy(ref->old_oid.hash, sha1);
-   fetch_refs(remote_name, ref);
-}
-
-void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+void fetch_objects(const char *remote_name, const struct object_id *oids,
+  int oid_nr)
 {
struct ref *ref = NULL;
int i;
 
-   for (i = 0; i < to_fetch->nr; i++) {
-   struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i]));
-   oidcpy(&new_ref->old_oid, &to_fetch->oid[i]);
+   for (i = 0; i < oid_nr; i++) {
+   struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i]));
+   oidcpy(&new_ref->old_oid, &oids[i]);
new_ref->next = ref;
ref = new_ref;
}
diff --git a/fetch-object.h b/fetch-object.h
index 4b269d07e..d2f996d4e 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,11 +1,7 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
-#include "sha1-array.h"
-
-extern void fetch_object(const char *remote_name, const unsigned char *sha1);
-
-extern void fetch_objects(const char *remote_name,
- const struct oid_array *to_fetch);
+void fetch_objects(const char *remote_name, const struct object_id *oids,
+  int oid_nr);
 
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 97b742384..2edf4564f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1317,7 +1317,7 @@ int oid_object_info_extended(struct repository *r, const 
struct object_id *oid,
 * TODO Pass a repository struct through fetch_object,
 * such that arbitrary repositories work.
 */
-   fetch_object(repository_format_partial_clone, 
real->hash);
+   fetch_objects(repository_format_partial_clone, real, 1);
already_retried = 1;
continue;
}
diff --git a/unpack-trees.c b/unpack-trees.c
index f25089b87..82a83dbc6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -392,7 +392,7 @@ static int check_updates(struct unpack_trees_options *o)
}
if (to_fetch.nr)
fetch_objects(repository_format_partial_clone,
- &to_fetch);
+ to_fetch.oid, to_fetch.nr);
fetch_if_missing = fetch_if_missing_store;
oid_array_clear(&to_fetch);
}
-- 
2.19.0.397.gdd90340f6a-goog



[RFC PATCH 1/1] builtin/remote: quote remote name on error to display empty name

2018-09-12 Thread Shulhan
Rationale: consistent error format

When adding new remote name with empty string, git will print the
following error message,

  fatal: '' is not a valid remote name\n

But when removing remote name with empty string as input, git did not
print the empty string with quote,

  fatal: No such remote: \n
---
 builtin/remote.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 40c6f8a1b..f7edf7f2c 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -626,7 +626,7 @@ static int mv(int argc, const char **argv)
 
oldremote = remote_get(rename.old_name);
if (!remote_is_configured(oldremote, 1))
-   die(_("No such remote: %s"), rename.old_name);
+   die(_("No such remote: '%s'"), rename.old_name);
 
if (!strcmp(rename.old_name, rename.new_name) && oldremote->origin != 
REMOTE_CONFIG)
return migrate_file(oldremote);
@@ -762,7 +762,7 @@ static int rm(int argc, const char **argv)
 
remote = remote_get(argv[1]);
if (!remote_is_configured(remote, 1))
-   die(_("No such remote: %s"), argv[1]);
+   die(_("No such remote: '%s'"), argv[1]);
 
known_remotes.to_delete = remote;
for_each_remote(add_known_remote, &known_remotes);
@@ -861,7 +861,7 @@ static int get_remote_ref_states(const char *name,
 
states->remote = remote_get(name);
if (!states->remote)
-   return error(_("No such remote: %s"), name);
+   return error(_("No such remote: '%s'"), name);
 
read_branches();
 
-- 
2.19.0.398.gef7f67bed



[RFC PATCH 0/1] builtin/remote: quote remote name on error to display empty name

2018-09-12 Thread Shulhan
Rationale: consistent error format

When adding new remote name with empty string, git will print the
following error message,

  fatal: '' is not a valid remote name\n

But when removing remote name with empty string as input, git did not
print the empty string with quote,

  fatal: No such remote: \n

Follow up question: If this is ok, should the po files also updated?

This is my first patch, sorry if I did some mistakes.

Shulhan (1):
  builtin/remote: quote remote name on error to display empty name

 builtin/remote.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.19.0.398.gef7f67bed



Re: [PATCH v4 0/5] read-cache: speed up index load through parallelization

2018-09-12 Thread Ben Peart




On 9/11/2018 7:26 PM, Ben Peart wrote:

This version of the patch merges in Duy's work to speed up index v4 decoding.
I had to massage it a bit to get it to work with the multi-threading but it's
still largely his code. It helps a little (3%-4%) when the cache entry thread(s)
take the longest, and not when loading the index extensions is the longest-running thread.

I also added a minor cleanup patch to minimize the casting required when
working with the memory mapped index and other minor changes based on the
feedback received.

Base Ref: v2.19.0
Web-Diff: https://github.com/benpeart/git/commit/9d31d5fb20
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v4 
&& git checkout 9d31d5fb20




A bad merge (mistake on my part, not a bug) means this is missing some 
of the changes from V3.  Please ignore, I'll send an updated series to 
address it.



### Patches

Ben Peart (4):
   eoie: add End of Index Entry (EOIE) extension
   read-cache: load cache extensions on a worker thread
   read-cache: speed up index load through parallelization
   read-cache: clean up casting and byte decoding

Nguyễn Thái Ngọc Duy (1):
   read-cache.c: optimize reading index format v4

  Documentation/config.txt |   6 +
  Documentation/technical/index-format.txt |  23 +
  config.c |  18 +
  config.h |   1 +
  read-cache.c | 581 +++
  5 files changed, 531 insertions(+), 98 deletions(-)


base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670



[PATCH v2 1/1] t3206-range-diff.sh: cover single-patch case

2018-09-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The commit 40ce4160 "format-patch: allow --range-diff to apply to
a lone-patch" added the ability to see a range-diff as commentary
after the commit message of a single patch series (i.e. [PATCH]
instead of [PATCH X/N]). However, this functionality was not
covered by a test case.

Add a simple test case that checks that a range-diff is written as
commentary to the patch.

Signed-off-by: Derrick Stolee 
---
 t/t3206-range-diff.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 3d7a2d8a4d..6babc0e276 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -154,4 +154,9 @@ do
'
 done
 
+test_expect_success 'format-patch --range-diff as commentary' '
+   git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual &&
+   test_i18ngrep "^Range-diff:$" actual
+'
+
 test_done
-- 
gitgitgadget


[PATCH v2 0/1] Add coverage for 'git format-patch --range-diff' single-patch case

2018-09-12 Thread Derrick Stolee via GitGitGadget
In es/format-patch-rangediff, we added a '--range-diff' option to
git-format-patch to automatically add a range-diff. We also added an option
to write the diff as commentary to a single patch submission. However, this
check was not tested.

I discovered this test gap by running 'make coverage-test coverage-report'
on 'next' and then comparing the uncovered lines with those in the diff
between 'master' and 'next'. I have a script that automates this process,
and I'm still working on polishing it. You can see an earlier version at
[1].

Based on es/format-patch-rangediff

Cc: sunsh...@sunshine.co

Cc: p...@peff.net

[1] 
https://github.com/derrickstolee/git/blob/coverage/contrib/coverage-diff.sh
A bash script to report uncovered lines that were added in a diff.

Derrick Stolee (1):
  t3206-range-diff.sh: cover single-patch case

 t/t3206-range-diff.sh | 5 +
 1 file changed, 5 insertions(+)


base-commit: 40ce41604daf200cdc85abded0133d40faafc2f8
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-37%2Fderrickstolee%2Frange-diff-test-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-37/derrickstolee/range-diff-test-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/37

Range-diff vs v1:

 1:  58347a9624 ! 1:  277a4d2bd8 t3206-range-diff.sh: cover single-patch case
 @@ -22,7 +22,7 @@
   
  +test_expect_success 'format-patch --range-diff as commentary' '
  + git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual &&
 -+ grep -A 1 -e "\-\-\-" actual | grep "Range-diff:"
 ++ test_i18ngrep "^Range-diff:$" actual
  +'
  +
   test_done

-- 
gitgitgadget


[PATCH 1/1] commit-reach: properly peel tags

2018-09-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
"commit-reach: make can_all_from_reach... linear" but incorrectly
assumed that all objects provided were commits. During a fetch
negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
to the 'from' array. The current code creates a segfault.

Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
and add a test in t6600-test-reach.sh that demonstrates this segfault.

Correct the issue by peeling tags when investigating the initial list
of objects in the 'from' array.

Signed-off-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 commit-reach.c| 25 ++---
 t/helper/test-reach.c | 22 +-
 t/t6600-test-reach.sh | 30 --
 3 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 86715c103c..6de72c6e03 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array 
*from,
 {
struct commit **list = NULL;
int i;
+   int nr_commits;
int result = 1;
 
ALLOC_ARRAY(list, from->nr);
+   nr_commits = 0;
for (i = 0; i < from->nr; i++) {
-   list[i] = (struct commit *)from->objects[i].item;
+   struct object *from_one = from->objects[i].item;
 
-   if (parse_commit(list[i]) ||
-   list[i]->generation < min_generation)
-   return 0;
+   from_one = deref_tag(the_repository, from_one,
+"a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   from->objects[i].item->flags |= assign_flag;
+   continue;
+   }
+
+   list[nr_commits] = (struct commit *)from_one;
+   if (parse_commit(list[nr_commits]) ||
+   list[nr_commits]->generation < min_generation)
+   return 0; /* is this a leak? */
+   nr_commits++;
}
 
-   QSORT(list, from->nr, compare_commits_by_gen);
+   QSORT(list, nr_commits, compare_commits_by_gen);
 
-   for (i = 0; i < from->nr; i++) {
+   for (i = 0; i < nr_commits; i++) {
/* DFS from list[i] */
struct commit_list *stack = NULL;
 
@@ -600,7 +611,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
}
 
 cleanup:
-   for (i = 0; i < from->nr; i++) {
+   for (i = 0; i < nr_commits; i++) {
clear_commit_marks(list[i], RESULT);
clear_commit_marks(list[i], assign_flag);
}
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index eb21103998..08d2ea68e8 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -31,6 +31,7 @@ int cmd__reach(int ac, const char **av)
struct object_id oid_A, oid_B;
struct commit *A, *B;
struct commit_list *X, *Y;
+   struct object_array X_obj = OBJECT_ARRAY_INIT;
struct commit **X_array;
int X_nr, X_alloc;
struct strbuf buf = STRBUF_INIT;
@@ -49,7 +50,8 @@ int cmd__reach(int ac, const char **av)
 
while (strbuf_getline(&buf, stdin) != EOF) {
struct object_id oid;
-   struct object *o;
+   struct object *orig;
+   struct object *peeled;
struct commit *c;
if (buf.len < 3)
continue;
@@ -57,14 +59,14 @@ int cmd__reach(int ac, const char **av)
if (get_oid_committish(buf.buf + 2, &oid))
die("failed to resolve %s", buf.buf + 2);
 
-   o = parse_object(r, &oid);
-   o = deref_tag_noverify(o);
+   orig = parse_object(r, &oid);
+   peeled = deref_tag_noverify(orig);
 
-   if (!o)
+   if (!peeled)
die("failed to load commit for input %s resulting in 
oid %s\n",
buf.buf, oid_to_hex(&oid));
 
-   c = object_as_type(r, o, OBJ_COMMIT, 0);
+   c = object_as_type(r, peeled, OBJ_COMMIT, 0);
 
if (!c)
die("failed to load commit for input %s resulting in 
oid %s\n",
@@ -85,6 +87,7 @@ int cmd__reach(int ac, const char **av)
commit_list_insert(c, &X);
ALLOC_GROW(X_array, X_nr + 1, X_alloc);
X_array[X_nr++] = c;
+   add_object_array(orig, NULL, &X_obj);
break;
 
case 'Y':
@@ -113,6 +116,15 @@ int cmd__reach(int ac, const char **av)
print_sorted_commit_ids(list);
} else if (!strcmp(av[1], "can_all_from_reach")) {
printf("%s(X,Y):%d\n", av[1], can_all_from_reach(X, Y, 1));
+ 

[PATCH 0/1] Properly peel tags in can_all_from_reach_with_flags()

2018-09-12 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method
does not properly peel tags. Since the helper method can_all_from_reach()
and code in t/helper/test-reach.c all peel tags before getting to this
method, it is not super-simple to create a test that demonstrates this.

I modified t/helper/test-reach.c to allow calling
can_all_from_reach_with_flags() directly, and added a test in
t6600-test-reach.sh that demonstrates the segfault without the fix. The fix
in commit-reach.c is Peff's fix verbatim.

[1] 
https://public-inbox.org/git/0bf9103c-9377-506b-7ad7-e5273d8e9...@gmail.com/T/#u

Derrick Stolee (1):
  commit-reach: properly peel tags

 commit-reach.c| 25 ++---
 t/helper/test-reach.c | 22 +-
 t/t6600-test-reach.sh | 30 --
 3 files changed, 63 insertions(+), 14 deletions(-)


base-commit: 6621c838743812aaba96e55cfec8524ea1144c2d
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-39%2Fderrickstolee%2Ftag-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-39/derrickstolee/tag-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/39
-- 
gitgitgadget


Re: [PATCH 1/1] t3206-range-diff.sh: cover single-patch case

2018-09-12 Thread Derrick Stolee

On 9/11/2018 5:34 PM, Eric Sunshine wrote:

On Tue, Sep 11, 2018 at 4:26 PM Derrick Stolee via GitGitGadget
 wrote:

The commit 40ce4160 "format-patch: allow --range-diff to apply to
a lone-patch" added the ability to see a range-diff as commentary
after the commit message of a single patch series (i.e. [PATCH]
instead of [PATCH X/N]). However, this functionality was not
covered by a test case.

Add a simple test case that checks that a range-diff is written as
commentary to the patch.

Signed-off-by: Derrick Stolee 
---
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
@@ -154,4 +154,9 @@ do
+test_expect_success 'format-patch --range-diff as commentary' '
+   git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual &&
+   grep -A 1 -e "\-\-\-" actual | grep "Range-diff:"
+'

Aside from Junio's and Stefan's comments...

Patch 6/14 [1], in addition to checking that a solo patch contains an
interdiff, takes the extra step of checking that individual patches
_don't_ contain an interdiff when --cover-letter is used. I wonder if
the same should be done here, though I don't feel too strongly about
it. If you do go that route, it might make sense to move this test to
t4014 as neighbor to the --interdiff tests. The reason 10/14 [2] added
the "git format-patch --range-diff" test to t3206 instead of t4014 was
so it could do a thorough check of the embedded range-diff by re-using
the specially crafted test repo set up by t3206. Your new test is much
looser, thus could be moved alongside the --interdiff tests. Not a big
deal, though. Either way is fine. Thanks for working on this.

[1]: 
https://public-inbox.org/git/20180722095717.17912-7-sunsh...@sunshineco.com/
[2]: 
https://public-inbox.org/git/20180722095717.17912-11-sunsh...@sunshineco.com/
Thanks for these links! In particular, [2] uses this line to test that the
interdiff appears:


+    test_i18ngrep "^Interdiff:$" 0001-fleep.patch &&

That's a better way to test, especially with the translation. It would 
be enough for my needs.


Thanks,

-Stolee

P.S. Resending because apparently I had HTML in the last response



Gitattributes not properly supported with add command

2018-09-12 Thread smaudet
The following:

git add -u :\(glob,attr:-someAttr\):src/**

Produces an error that, according to the source code, should never be visible 
to the user. This attribute/pathspec *should* be supported according to the 
documentation provided by git:

fatal: BUG:builtin/add.c:498: unsupported magic 40

It looks like the documentation claims more features than the source code
supports, though perhaps it is the code that is at fault, since I don't think
you should be restricted to a fixed set of attributes; either way, the source
code never handles this scenario.

This should be fixed, if I have any time I'll look into what it would take to 
submit a patch, but I don't have the time for this right now.




Re: [PATCH v2 17/18] commit-reach: make can_all_from_reach... linear

2018-09-12 Thread Derrick Stolee

On 9/12/2018 12:29 AM, Jeff King wrote:

On Wed, Sep 12, 2018 at 12:14:25AM -0400, Jeff King wrote:


+   ALLOC_ARRAY(list, from->nr);
for (i = 0; i < from->nr; i++) {
-   struct object *from_one = from->objects[i].item;
+   list[i] = (struct commit *)from->objects[i].item;

Some of the objects in my array are not commits, but rather tags, so
this is a bogus cast.

You can see that the original code peeled them and threw away
non-commits:

  
-		if (from_one->flags & assign_flag)

-   continue;
-   from_one = deref_tag(the_repository, from_one, "a from object", 
0);
-   if (!from_one || from_one->type != OBJ_COMMIT) {
-   /* no way to tell if this is reachable by
-* looking at the ancestry chain alone, so
-* leave a note to ourselves not to worry about
-* this object anymore.
-*/
-   from->objects[i].item->flags |= assign_flag;
-   continue;
-   }

So presumably we'd need to do something similar.

This patch seems to fix it for me. It's more or less a reversion of the
hunk above, though I didn't dig into whether I'm violating some other
assumption in your new code.

I think this function leaks "list" both from the location I noted here,
as well as from normal exit
Thanks for the report and the fix. I'll try to create a test that
demonstrates this and then push up a full patch.

---
  commit-reach.c | 25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 622eeb313d..abe90a2f55 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -547,20 +547,31 @@ int can_all_from_reach_with_flag(struct object_array 
*from,
  {
struct commit **list = NULL;
int i;
+   int nr_commits;
int result = 1;
  
  	ALLOC_ARRAY(list, from->nr);

+   nr_commits = 0;
for (i = 0; i < from->nr; i++) {
-   list[i] = (struct commit *)from->objects[i].item;
+   struct object *from_one = from->objects[i].item;
  
-		if (parse_commit(list[i]) ||

-   list[i]->generation < min_generation)
-   return 0;
+   from_one = deref_tag(the_repository, from_one,
+"a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   from->objects[i].item->flags |= assign_flag;
+   continue;
+   }
+
+   list[nr_commits] = (struct commit *)from_one;
+   if (parse_commit(list[nr_commits]) ||
+   list[nr_commits]->generation < min_generation)
+   return 0; /* is this a leak? */
+   nr_commits++;
}
  
-	QSORT(list, from->nr, compare_commits_by_gen);

+   QSORT(list, nr_commits, compare_commits_by_gen);
  
-	for (i = 0; i < from->nr; i++) {

+   for (i = 0; i < nr_commits; i++) {
/* DFS from list[i] */
struct commit_list *stack = NULL;
  
@@ -603,7 +614,7 @@ int can_all_from_reach_with_flag(struct object_array *from,

}
  
  cleanup:

-   for (i = 0; i < from->nr; i++) {
+   for (i = 0; i < nr_commits; i++) {
clear_commit_marks(list[i], RESULT);
clear_commit_marks(list[i], assign_flag);
}




Re: git-credential-libsecret not prompting to unlock keyring

2018-09-12 Thread Paul Jolly
Apologies, forgot the crucial details post that log:

$ git --version
git version 2.19.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 18.04.1 LTS
Release:18.04
Codename:   bionic


git-credential-libsecret not prompting to unlock keyring

2018-09-12 Thread Paul Jolly
Hi,

For some reason today, after an apt update, my combination of GNOME
keyring and git-credential-libsecret stopped working. Stopped working
in so far as I am no longer prompted by GNOME keyring to unlock my
keyring, instead I get prompted for the password that git is looking
for.

Trace output below; would appreciate any pointers.

Many thanks

==

$ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git push
13:38:59.324799 git.c:415   trace: built-in: git push
13:38:59.325070 run-command.c:637   trace: run_command:
GIT_DIR=.git git-remote-https origin
https://github.com/myitcv/go-modules-by-example
* Couldn't find host github.com in the .netrc file; using defaults
*   Trying 192.30.253.113...
* TCP_NODELAY set
* Connected to github.com (192.30.253.113) port 443 (#0)
* found 133 certificates in /etc/ssl/certs/ca-certificates.crt
* found 399 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*server certificate verification OK
*server certificate status verification SKIPPED
*common name: github.com (matched)
*server certificate expiration date OK
*server certificate activation date OK
*certificate public key: RSA
*certificate version: #3
*subject: businessCategory=Private
Organization,jurisdictionOfIncorporationCountryName=US,jurisdictionOfIncorporationStateOrProvinceName=Delaware,serialNumber=5157550,C=US,ST=California,L=San
Francisco,O=GitHub\, Inc.,CN=github.com
*start date: Tue, 08 May 2018 00:00:00 GMT
*expire date: Wed, 03 Jun 2020 12:00:00 GMT
*issuer: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert
SHA2 Extended Validation Server CA
*compression: NULL
* ALPN, server accepted to use http/1.1
> GET /myitcv/go-modules-by-example/info/refs?service=git-receive-pack HTTP/1.1
Host: github.com
User-Agent: git/2.19.0
Accept: */*
Accept-Encoding: deflate, gzip
Accept-Language: en-GB, *;q=0.9
Pragma: no-cache

< HTTP/1.1 401 Authorization Required
< Server: GitHub Babel 2.0
< Content-Type: text/plain
< Content-Length: 60
< WWW-Authenticate: Basic realm="GitHub"
< X-GitHub-Request-Id: EF1E:073A:6C8091:C2DB67:5B9908E8
< X-Frame-Options: DENY
<
* Connection #0 to host github.com left intact
13:39:05.084588 run-command.c:637   trace: run_command:
'/home/myitcv/dev/git/contrib/credential/libsecret/git-credential-libsecret
get'
Password for 'https://myi...@github.com':


Re: [PATCH 3/3] sequencer: use read_author_script()

2018-09-12 Thread Eric Sunshine
On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood  wrote:
> Use the new function to read the author script, updating
> read_env_script() and read_author_ident(). This means we now have a
> single code path that reads the author script and uses sq_dequote() to
> dequote it. This fixes potential problems with user edited scripts
> as read_env_script() which did not track quotes properly.
> [...]
> Signed-off-by: Phillip Wood 
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, 
> char **email, char **date,
>  static int read_env_script(struct argv_array *env)
>  {
> +   strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
> +   strbuf_addstr(&script, name);
> +   argv_array_push(env, script.buf);
> +   strbuf_reset(&script);
> +   strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
> +   strbuf_addstr(&script, email);
> +   argv_array_push(env, script.buf);
> +   strbuf_reset(&script);
> +   strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
> +   strbuf_addstr(&script, date);
> +   argv_array_push(env, script.buf);
> +   strbuf_release(&script);

I haven't read this series closely yet, but this caught my eye while
scanning it quickly. The above noisy code can all be collapsed to the
simpler:

argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);


Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)

2018-09-12 Thread Stephen & Linda Smith
On Wednesday, September 12, 2018 12:13:23 AM MST Junio C Hamano wrote:
> Stephen & Linda Smith  writes:
> > On Tuesday, September 11, 2018 3:20:19 PM MST Junio C Hamano wrote:
> >> * jc/wt-status-state-cleanup (2018-09-07) 1 commit
> >> 
> >>  - WIP: roll wt_status_state into wt_status and populate in the collect
> >> 
> >> phase (this branch uses ss/wt-status-committable.)
> >> 
> >> * ss/wt-status-committable (2018-09-07) 4 commits
> >> 
> >>  - wt-status.c: set the committable flag in the collect phase
> >>  - t7501: add test of "commit --dry-run --short"
> >>  - wt-status: rename commitable to committable
> >>  - wt-status.c: move has_unmerged earlier in the file
> >>  (this branch is used by jc/wt-status-state-cleanup.)
> > 
> > I note that the jc/wt-status-state-cleanup branch is a patch "for
> > illustration purposes only" [1].
> > 
> > I was about to update that patch to start dealing with the free() function
> > calls, but noted you added the patch.  Do you want me to take that patch
> > and continue on?  Or does someone else have something in progress?
> 
> I do not plan to.  
Ok ... from the wording I wasn't sure whether another developer was already
working on this.  I will pick up that patch and continue.

> In general, anything that is only in 'pu' is a
> fair game---when a better alternative appears, or a discussion leads
> to a conclusion that a change is unneeded, they are replaced and/or
> discarded.  Just think of them as being kept slightly better record
> of existence than merely being in the list archive, nothing more.

Thanks that confirmed my reading of the Documentation/gitworkflows.txt.





[PATCH 2/3] add read_author_script() to libgit

2018-09-12 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 57 ---
 sequencer.c  | 62 
 sequencer.h  |  3 +++
 3 files changed, 69 insertions(+), 53 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 8c165f747b..aa5de0ee73 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,32 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return -1;
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
-   }
-   strbuf_read(&buf, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, &kv))
-   goto finish;
+   if (read_author_script(filename, &state->author_name,
+  &state->author_email, &state->author_date, 1))
+   exit(128);
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
-   goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
-   retval = 0;
-finish:
-   string_list_clear(&kv, !!retval);
-   strbuf_release(&buf);
-   return retval;
+   return 0;
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..5d0ff8f1c1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,68 @@ static int write_author_script(const char *message)
return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where VALUE part is
+ * sq-quoted, and append <KEY, VALUE> at the end of the string list
+ */
+static int parse_key_value_squoted(char *buf, struct string_list *list)
+{
+   while (*buf) {
+   struct string_list_item *item;
+   char *np;
+   char *cp = strchr(buf, '=');
+   if (!cp)
+   return -1;
+   np = strchrnul(cp, '\n');
+   *cp++ = '\0';
+   item = string_list_append(list, buf);
+
+   buf = np + (*np == '\n');
+   *np = '\0';
+   cp = sq_dequote(cp);
+   if (!cp)
+   return -1;
+   item->util = xstrdup(cp);
+   }
+   return 0;
+}
+
+int read_author_script(const char *path, char **name, char **email, char 
**date,
+  int allow_missing)
+{
+   struct strbuf buf = STRBUF_INIT;
+   struct string_list kv = STRING_LIST_INIT_DUP;
+   int retval = -1;
+
+   if (strbuf_read_file(&buf, path, 256) <= 0) {
+   strbuf_release(&buf);
+   if (errno == ENOENT && allow_missing)
+   return 0;
+   else
+   return error_errno(_("could not open '%s' for reading"),
+  path);
+   }
+
+   if (parse_key_value_squoted(buf.buf, &kv)) {
+   error(_("unable to parse '%s'"), path);
+   goto finish;
+   }
+   if (kv.nr != 3 ||
+   

[PATCH 1/3] am: rename read_author_script()

2018-09-12 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..8c165f747b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -302,7 +302,7 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -411,7 +411,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.18.0



[PATCH 3/3] sequencer: use read_author_script()

2018-09-12 Thread Phillip Wood
From: Phillip Wood 

Use the new function to read the author script, updating
read_env_script() and read_author_ident(). This means we now have a
single code path that reads the author script and uses sq_dequote() to
dequote it. This fixes potential problems with user edited scripts
as read_env_script() which did not track quotes properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---
 sequencer.c | 108 +++-
 1 file changed, 30 insertions(+), 78 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 5d0ff8f1c1..630741cfe0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
return retval;
 }
 
-/*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
 /*
  * Read a list of environment variable assignments (such as the author-script
  * file) into an environment block. Returns -1 on error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  &name, &email, &date, 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", &p2))
-   strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", &p2))
-   strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
+   strbuf_addstr(&script, name);
+   argv_array_push(env, script.buf);
+   strbuf_reset(&script);
+   strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
+   strbuf_addstr(&script, email);
+   argv_array_push(env, script.buf);
+   strbuf_reset(&script);
+   strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
+   strbuf_addstr(&script, date);
+   argv_array_push(env, script.buf);
+   strbuf_release(&script);
+
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -790,54 +771,25 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
-   struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  &name, &email, &date, 0))
return NULL;
 
-   /* dequote values and construct ident line in-place */
-   for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-   if (!skip_prefix(in, keys[i], (const char **)&in)) {
-   warning(_("could not parse '%s' (looking for '%s')"),
-   rebase_path_author_script(), keys[i]);
-   return NULL;
-   }
-
-   eol = strchrnul(in, '\n');
-   *eol = '\0';
-   if (!sq_dequote(in)) {
-   warning(_("bad quoting on %s value in '%s'"),
-   keys[i], rebase_path_author_script());
-   return NULL;
-   }
- 

[PATCH 0/3] am/rebase: share read_author_script()

2018-09-12 Thread Phillip Wood
From: Phillip Wood 

This is a follow-up to pw/rebase-i-author-script-fix; it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.
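
With the helper shared in patch 2/3, a caller on either side reduces to
something like the following (a sketch of intended usage based on the
signature added in 2/3, not code taken from the series); passing 1 as the
last argument makes a missing file a non-error, as builtin/am.c wants:

    char *name, *email, *date;

    if (read_author_script(path, &name, &email, &date, 0))
        return -1;    /* read or parse error, already reported */

    /* ... export GIT_AUTHOR_NAME/EMAIL/DATE or build an ident line ... */

    free(name);
    free(email);
    free(date);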

Phillip Wood (3):
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  61 ++--
 sequencer.c  | 160 ---
 sequencer.h  |   3 +
 3 files changed, 96 insertions(+), 128 deletions(-)

-- 
2.18.0



Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)

2018-09-12 Thread Junio C Hamano
Stephen & Linda Smith  writes:

> On Tuesday, September 11, 2018 3:20:19 PM MST Junio C Hamano wrote:
>> 
>> * jc/wt-status-state-cleanup (2018-09-07) 1 commit
>>  - WIP: roll wt_status_state into wt_status and populate in the collect
>> phase (this branch uses ss/wt-status-committable.)
>> 
>> * ss/wt-status-committable (2018-09-07) 4 commits
>>  - wt-status.c: set the committable flag in the collect phase
>>  - t7501: add test of "commit --dry-run --short"
>>  - wt-status: rename commitable to committable
>>  - wt-status.c: move has_unmerged earlier in the file
>>  (this branch is used by jc/wt-status-state-cleanup.)
>> 
>
> I note that the jc/wt-status-state-cleanup branch is a patch "for 
> illustration 
> purposes only" [1].
>
> I was about to update that patch to start dealing with the free() function 
> calls, but noted you added the patch.  Do you want me to take that patch and 
> continue on?  Or does someone else have something in progress?

I do not plan to.  In general, anything that is only in 'pu' is a
fair game---when a better alternative appears, or a discussion leads
to a conclusion that a change is unneeded, they are replaced and/or
discarded.  Just think of them as being kept slightly better record
of existence than merely being in the list archive, nothing more.


Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero

2018-09-12 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:

>> I do not think we would mind terribly if we do not support
>> combinations like gzipped-and-then-chunked from day one.  An in-code
>> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
>> may not hurt, though.
>
> It's pretty common for Git to send gzip'd contents, so this might
> actually be necessary on day one. However, it looks like we do so by
> setting the content-encoding header.

Correct, we haven't been using Transfer-Encoding for that.

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

I wonder about the motivating IIS case.  The CGI spec says that
CONTENT_LENGTH is set if and only if the message has a message-body.
When discussing message-body, it says

  Request-Data   = [ request-body ] [ extension-data ]
[...]
   A request-body is supplied with the request if the CONTENT_LENGTH is
   not NULL.  The server MUST make at least that many bytes available
   for the script to read.  The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read or it MAY supply
   extension data.  Therefore, the script MUST NOT attempt to read more
   than CONTENT_LENGTH bytes, even if more data is available.

Does that mean that if CONTENT_LENGTH is not set, then we are
guaranteed to see EOF, because extension-data cannot be present?  If
so, then what we have in v2.19 (plus Max's test improvement that is in
"next") is already enough.

So I agree.
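
In code terms, the behaviour being debated is roughly the following (a generic
CGI-style sketch, not http-backend's actual implementation):

#include <stdlib.h>
#include <sys/types.h>

/*
 * -1 means "no declared length: read the request body until EOF";
 * the patch under discussion would instead map an empty value to 0.
 */
static ssize_t declared_content_length(void)
{
    const char *str = getenv("CONTENT_LENGTH");

    if (!str || !*str)
        return -1;
    return (ssize_t)strtoll(str, NULL, 10);
}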

 1. Junio, please eject this patch from "pu", since we don't have any
need for it.

 2. IIS users, please test v2.19 and let us know how it goes.

Do we have any scenarios that would use an empty POST (or other
non-GET) request?

Thanks,
Jonathan


Re: 2.19.0 regression: leaving editor with empty commit message doesn't stop rebase [was: Re: [RFC PATCH v5 9/9] git-rebase: make --allow-empty-message the default]

2018-09-12 Thread Elijah Newren
Hi,

On Tue, Sep 11, 2018 at 7:42 PM SZEDER Gábor  wrote:
> Lately I noticed that occasionally I ended up with an empty commit
> message after an interactive rebase...those empty commit messages are the
> consequence of a regression in v2.19.0, which bisects down to this
> patch.
>
> To reproduce the issue, just start an interactive rebase, choose a
> commit to reword, save, delete all the commit message, save, and BAM!
> there is the commit with the empty message.
>
>
> On Wed, Jun 27, 2018 at 12:23:19AM -0700, Elijah Newren wrote:
> > rebase backends currently behave differently with empty commit messages,
> > largely as a side-effect of the different underlying commands on which
> > they are based.  am-based rebases apply commits with an empty commit
...
> I agree that this is an issue that should be addressed, and also agree
> that it's reasonable to accept an empty commit message, if the
> original commit already had an empty commit message.  (Though perhaps
> not completely silently, but with a warning?  Dunno.)
>
> However, this should only be the case for existing commit messages
> that are taken verbatim, but never in the case when the user is asked
> for a commit message.
...
>   # Please enter the commit message for your changes. Lines starting
>   # with '#' will be ignored, and an empty message aborts the commit.
...
> Let's suppose you start an interactive rebase, choose a commit to
> squash, save the instruction sheet, rebase fires up your editor, and
> then you notice that you mistakenly chose the wrong commit to squash.
> What do you do, how do you abort?

All sound like reasonable arguments to me to differentiate between
commit messages that started empty (which I admit was what I had in
mind) vs. ones where we asked for user input and it came out empty.

Are you cooking up a patch?  I might be able to find a little time to
dig into this tomorrow if no one else is working on it already by
then.  It'll probably conflict a bunch with all the
rewrite-rebase-in-C topics in flight in pu.  :-(