[PATCH v5 11/12] t1407: make hash size independent
Instead of hard-coding a 40-based constant, split the output of for-each-ref and for-each-reflog by field. Signed-off-by: brian m. carlson --- t/t1407-worktree-ref-store.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t1407-worktree-ref-store.sh b/t/t1407-worktree-ref-store.sh index 4623ae15c4..9a84858118 100755 --- a/t/t1407-worktree-ref-store.sh +++ b/t/t1407-worktree-ref-store.sh @@ -58,7 +58,7 @@ test_expect_success 'for_each_reflog()' ' mkdir -p .git/worktrees/wt/logs/refs/bisect && echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random && - $RWT for-each-reflog | cut -c 42- | sort >actual && + $RWT for-each-reflog | cut -d" " -f 2- | sort >actual && cat >expected <<-\EOF && HEAD 0x1 PSEUDO-WT 0x0 @@ -68,7 +68,7 @@ test_expect_success 'for_each_reflog()' ' EOF test_cmp expected actual && - $RMAIN for-each-reflog | cut -c 42- | sort >actual && + $RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual && cat >expected <<-\EOF && HEAD 0x1 PSEUDO-MAIN 0x0
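The difference matters as soon as the object name is no longer 40 hex digits. A quick standalone sketch (with made-up reflog lines, not real test output) of why the field-based cut works for both hash sizes:

```shell
# Simulated "for-each-reflog" output: "<oid> <name> <flags>"
sha1_line="0000000000000000000000000000000000000000 HEAD 0x1"
sha256_line="0000000000000000000000000000000000000000000000000000000000000000 HEAD 0x1"

# Column-based: correct only when the oid is exactly 40 hex digits
echo "$sha1_line" | cut -c 42-          # HEAD 0x1
# Field-based: independent of the hash length
echo "$sha1_line" | cut -d" " -f 2-     # HEAD 0x1
echo "$sha256_line" | cut -d" " -f 2-   # HEAD 0x1
```

With the 64-character SHA-256 name, `cut -c 42-` would instead print the tail of the object ID itself, which is why the series switches to field-based splitting.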
[PATCH v5 10/12] t1406: make hash-size independent
Instead of hard-coding a 40-based constant, split the output of for-each-ref and for-each-reflog by field. Signed-off-by: brian m. carlson --- t/t1406-submodule-ref-store.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh index e093782cc3..d199d872fb 100755 --- a/t/t1406-submodule-ref-store.sh +++ b/t/t1406-submodule-ref-store.sh @@ -39,7 +39,7 @@ test_expect_success 'rename_refs() not allowed' ' ' test_expect_success 'for_each_ref(refs/heads/)' ' - $RUN for-each-ref refs/heads/ | cut -c 42- >actual && + $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual && cat >expected <<-\EOF && master 0x0 new-master 0x0 @@ -48,7 +48,7 @@ test_expect_success 'for_each_ref(refs/heads/)' ' ' test_expect_success 'for_each_ref() is sorted' ' - $RUN for-each-ref refs/heads/ | cut -c 42- >actual && + $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual && sort actual > expected && test_cmp expected actual ' @@ -65,7 +65,7 @@ test_expect_success 'verify_ref(new-master)' ' ' test_expect_success 'for_each_reflog()' ' - $RUN for-each-reflog | sort | cut -c 42- >actual && + $RUN for-each-reflog | sort | cut -d" " -f 2- >actual && cat >expected <<-\EOF && HEAD 0x1 refs/heads/master 0x0
[PATCH v5 08/12] t1400: switch hard-coded object ID to variable
Switch a hard-coded all-zeros object ID to use a variable instead. Signed-off-by: brian m. carlson --- t/t1400-update-ref.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh index 7c8df20955..6072650686 100755 --- a/t/t1400-update-ref.sh +++ b/t/t1400-update-ref.sh @@ -346,7 +346,7 @@ test_expect_success "verifying $m's log (logged by config)" ' git update-ref $m $D cat >.git/logs/$m <<EOF -0000000000000000000000000000000000000000 $C $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 -0500 +$Z $C $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 -0500 $C $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150350 -0500 $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150380 -0500 $F $Z $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150680 -0500
[PATCH v5 07/12] t1006: make hash size independent
Compute the size of the tree and commit objects we're creating by checking for the size of an object ID and computing the resulting sizes accordingly. Signed-off-by: brian m. carlson --- t/t1006-cat-file.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index 7f19d591f2..a7c95bb785 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -140,15 +140,17 @@ test_expect_success '--batch-check without %(rest) considers whole line' ' test_cmp expect actual ' +test_oid_init + tree_sha1=$(git write-tree) -tree_size=33 +tree_size=$(($(test_oid rawsz) + 13)) tree_pretty_content="100644 blob $hello_sha1 hello" run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content" commit_message="Initial commit" commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1) -commit_size=177 +commit_size=$(($(test_oid hexsz) + 137)) commit_content="tree $tree_sha1 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
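The arithmetic behind those constants is easy to check with SHA-1's sizes plugged in; the 13 and 137 are the byte counts of the fixed (non-hash) parts of this particular tree and commit fixture:

```shell
rawsz=20    # what $(test_oid rawsz) yields for SHA-1: binary oid size
hexsz=40    # what $(test_oid hexsz) yields for SHA-1: hex oid size

# One tree entry "100644 hello\0<raw oid>":
# 6 (mode) + 1 (space) + 5 ("hello") + 1 (NUL) = 13 bytes plus the raw oid
tree_size=$((rawsz + 13))

# The commit stores the tree's oid in hex; the surrounding
# "tree ...\n", author, committer, and message text total 137 bytes here
commit_size=$((hexsz + 137))

echo "$tree_size $commit_size"    # 33 177, matching the old hard-coded values
```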
[PATCH v5 01/12] t: add test functions to translate hash-related values
Add several test functions to make working with various hash-related values easier. Add test_oid_init, which loads common hash-related constants and placeholder object IDs from the newly added files in t/oid-info. Provide values for these constants for both SHA-1 and SHA-256. Add test_oid_cache, which accepts data on standard input in the form of hash-specific key-value pairs that can be looked up later, using the same format as the files in t/oid-info. Document this format in t/oid-info/README so that it's easier to use in the future. Add test_oid, which looks up a per-hash value (produced on standard output) based on the key specified as its argument. Usually the data to be looked up will be a hash-related constant (such as the size of the hash in binary or hexadecimal), a well-known or placeholder object ID (such as the all-zeros object ID or one consisting of "deadbeef" repeated), or something similar. For these reasons, test_oid will usually be used within a command substitution. Consequently, redirect the error output to standard error, since otherwise it will not be displayed. Add test_detect_hash, which currently only detects SHA-1, and test_set_hash, which can be used to set a different hash algorithm for test purposes. In the future, test_detect_hash will learn to actually detect the hash depending on how the testsuite is to be run. Use the local keyword within these functions to avoid overwriting other shell variables. We have had a test balloon in place for a couple of releases to catch shells that don't have this keyword and have not received any reports of failure. Note that the varying usages of local used here are supported by all common open-source shells supporting the local keyword. Test these new functions as part of t0000, which also serves to demonstrate basic usage of them. In addition, add documentation on how to format the lookup data and how to use the test functions.
Implement two basic lookup charts, one for common invalid or synthesized object IDs, and one for various facts about the hash function in use. Provide versions of the data for both SHA-1 and SHA-256. Since we use shell variables for storage, names used for lookup can currently consist only of shell identifier characters. If this is a problem in the future, we can hash the names before use. Improved-by: Eric Sunshine Signed-off-by: Eric Sunshine Signed-off-by: brian m. carlson --- t/README| 22 + t/oid-info/README | 19 t/oid-info/hash-info| 8 + t/oid-info/oid | 29 + t/t-basic.sh| 37 ++ t/test-lib-functions.sh | 69 + 6 files changed, 184 insertions(+) create mode 100644 t/oid-info/README create mode 100644 t/oid-info/hash-info create mode 100644 t/oid-info/oid diff --git a/t/README b/t/README index 9028b47d92..eed574dd9b 100644 --- a/t/README +++ b/t/README @@ -806,6 +806,28 @@ library for your script to use. the symbolic link in the file system and a part that does; then only the latter part need be protected by a SYMLINKS prerequisite (see below). + - test_oid_init + + This function loads facts and useful object IDs related to the hash + algorithm(s) in use from the files in t/oid-info. + + - test_oid_cache + + This function reads per-hash algorithm information from standard + input (usually a heredoc) in the format described in + t/oid-info/README. This is useful for test-specific values, such as + object IDs, which must vary based on the hash algorithm. + + Certain fixed values, such as hash sizes and common placeholder + object IDs, can be loaded with test_oid_init (described above). + + - test_oid + + This function looks up a value for the hash algorithm in use, based + on the key given. The value must have been loaded using + test_oid_init or test_oid_cache. Providing an unknown key is an + error. 
+ Prerequisites - diff --git a/t/oid-info/README b/t/oid-info/README new file mode 100644 index 00..27f843fc00 --- /dev/null +++ b/t/oid-info/README @@ -0,0 +1,19 @@ +This directory contains various per-hash values that are used in the testsuite. + +Each file contains lines containing a key-value pair; blank lines and lines +starting with `#` are ignored. The key and value are separated by whitespace +(specifically, those whitespace in the default `$IFS`). The key consists only +of shell identifier characters, and the value consists of a hash algorithm, +colon, and value. The hash algorithm also consists only of shell identifier +characters; it should match the value in sha1-file.c. + +For example, the following lines map the key "rawsz" to "20" if SHA-1 is in use +and to "32" if SHA-256 is in use: + + +rawsz sha1:20 +rawsz sha256:32 + + +The keys and values used here are loaded by `test_oid_init` (see the README file +in the "t"
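The key/value format described above can be parsed with nothing but shell builtins. Here is a minimal, self-contained sketch (not the actual test-lib-functions.sh implementation; function and variable names are invented) of how a test_oid_cache/test_oid-style lookup can work:

```shell
test_hash_algo=sha1

# Cache "key algo:value" lines in shell variables named oid_<algo>_<key>,
# skipping blank lines and comments, as the README format specifies.
oid_cache () {
	while read tag rest
	do
		case "$tag" in
		''|'#'*) continue ;;
		esac
		k="${rest%%:*}"
		v="${rest#*:}"
		eval "oid_${k}_${tag}=\"\$v\""
	done
}

# Look the key up for the hash algorithm currently in use.
oid_lookup () {
	eval "printf '%s\n' \"\$oid_${test_hash_algo}_$1\""
}

oid_cache <<EOF
rawsz sha1:20
rawsz sha256:32
EOF

oid_lookup rawsz    # 20
```

Because the values are stored in shell variables, keys are limited to shell identifier characters — exactly the restriction the commit message mentions.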
[PATCH v5 06/12] t0064: make hash size independent
Compute test values of the appropriate size instead of hard-coding 40-character values. Rename the echo20 function to echoid, since the values may be of varying sizes. Signed-off-by: brian m. carlson --- t/t0064-sha1-array.sh | 49 --- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh index 67484502a0..5dda570b9a 100755 --- a/t/t0064-sha1-array.sh +++ b/t/t0064-sha1-array.sh @@ -3,30 +3,30 @@ test_description='basic tests for the SHA1 array implementation' . ./test-lib.sh -echo20 () { +echoid () { prefix="${1:+$1 }" shift while test $# -gt 0 do - echo "$prefix$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1" + echo "$prefix$ZERO_OID" | sed -e "s/00/$1/g" shift done } test_expect_success 'ordered enumeration' ' - echo20 "" 44 55 88 aa >expect && + echoid "" 44 55 88 aa >expect && { - echo20 append 88 44 aa 55 && + echoid append 88 44 aa 55 && echo for_each_unique } | test-tool sha1-array >actual && test_cmp expect actual ' test_expect_success 'ordered enumeration with duplicate suppression' ' - echo20 "" 44 55 88 aa >expect && + echoid "" 44 55 88 aa >expect && { - echo20 append 88 44 aa 55 && - echo20 append 88 44 aa 55 && + echoid append 88 44 aa 55 && + echoid append 88 44 aa 55 && echo for_each_unique } | test-tool sha1-array >actual && test_cmp expect actual @@ -34,8 +34,8 @@ test_expect_success 'ordered enumeration with duplicate suppression' ' test_expect_success 'lookup' ' { - echo20 append 88 44 aa 55 && - echo20 lookup 55 + echoid append 88 44 aa 55 && + echoid lookup 55 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -eq 1 @@ -43,8 +43,8 @@ test_expect_success 'lookup' ' test_expect_success 'lookup non-existing entry' ' { - echo20 append 88 44 aa 55 && - echo20 lookup 33 + echoid append 88 44 aa 55 && + echoid lookup 33 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -lt 0 @@ -52,9 +52,9 @@ test_expect_success 'lookup non-existing entry' ' test_expect_success 'lookup with 
duplicates' ' { - echo20 append 88 44 aa 55 && - echo20 append 88 44 aa 55 && - echo20 lookup 55 + echoid append 88 44 aa 55 && + echoid append 88 44 aa 55 && + echoid lookup 55 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -ge 2 && @@ -63,19 +63,24 @@ test_expect_success 'lookup with duplicates' ' test_expect_success 'lookup non-existing entry with duplicates' ' { - echo20 append 88 44 aa 55 && - echo20 append 88 44 aa 55 && - echo20 lookup 66 + echoid append 88 44 aa 55 && + echoid append 88 44 aa 55 && + echoid lookup 66 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -lt 0 ' test_expect_success 'lookup with almost duplicate values' ' + # n-1 5s + root=$(echoid "" 55) && + root=${root%5} && { - echo "append " && - echo "append 555f" && - echo20 lookup 55 + id1="${root}5" && + id2="${root}f" && + echo "append $id1" && + echo "append $id2" && + echoid lookup 55 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -eq 0 @@ -83,8 +88,8 @@ test_expect_success 'lookup with almost duplicate values' ' test_expect_success 'lookup with single duplicate value' ' { - echo20 append 55 55 && - echo20 lookup 55 + echoid append 55 55 && + echoid lookup 55 } | test-tool sha1-array >actual && n=$(cat actual) && test "$n" -ge 0 &&
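The sed trick in echoid works because $ZERO_OID is an even-length run of zeros, so replacing every "00" with a two-character argument preserves the overall length for any hash. A standalone sketch with SHA-1's 40-character $ZERO_OID:

```shell
ZERO_OID=0000000000000000000000000000000000000000

echoid () {
	prefix="${1:+$1 }"
	shift
	while test $# -gt 0
	do
		# each "00" pair becomes the two-character pattern, so the
		# output is always exactly as long as $ZERO_OID
		echo "$prefix$ZERO_OID" | sed -e "s/00/$1/g"
		shift
	done
}

echoid "" 44        # 4444444444444444444444444444444444444444
echoid append 55    # append 5555555555555555555555555555555555555555
```

With a 64-character SHA-256 $ZERO_OID the same function emits 64-character IDs, which is why the rename from echo20 was warranted.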
[PATCH v5 02/12] t0000: use hash translation table
If the hash we're using is 32 bytes in size, attempting to insert a 20-byte object name won't work. Since these are synthesized objects that are almost all zeros, look them up in a translation table. Signed-off-by: brian m. carlson --- t/t-basic.sh | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/t/t-basic.sh b/t/t-basic.sh index b94221b951..a9dc534048 100755 --- a/t/t-basic.sh +++ b/t/t-basic.sh @@ -1015,12 +1015,13 @@ test_expect_success SHA1 'validate object ID for a known tree' ' test_expect_success 'put invalid objects into the index' ' rm -f .git/index && - cat >badobjects <<-\EOF && - 100644 blob 1000000000000000000000000000000000000000 dir/file1 - 100644 blob 2000000000000000000000000000000000000000 dir/file2 - 100644 blob 3000000000000000000000000000000000000000 dir/file3 - 100644 blob 4000000000000000000000000000000000000000 dir/file4 - 100644 blob 5000000000000000000000000000000000000000 dir/file5 + suffix=$(echo $ZERO_OID | sed -e "s/^.//") && + cat >badobjects <<-EOF && + 100644 blob $(test_oid 001) dir/file1 + 100644 blob $(test_oid 002) dir/file2 + 100644 blob $(test_oid 003) dir/file3 + 100644 blob $(test_oid 004) dir/file4 + 100644 blob $(test_oid 005) dir/file5 EOF git update-index --index-info
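The suffix computation strips one character off the all-zeros ID, so a single leading digit restores the full hash length. A sketch of how placeholder IDs of the right size can be synthesized (the real per-hash values live in t/oid-info/oid and are fetched with test_oid; this only shows the length arithmetic):

```shell
ZERO_OID=0000000000000000000000000000000000000000

# all zeros, one character short of a full object ID
suffix=$(echo $ZERO_OID | sed -e "s/^.//")

# prepend a distinct digit to get distinct, full-length invalid IDs
for n in 1 2 3 4 5
do
	echo "${n}${suffix}"
done
```

Each generated ID is exactly as long as $ZERO_OID, whatever hash algorithm is in use.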
[PATCH v5 12/12] t5318: use test_oid for HASH_LEN
From: Derrick Stolee Signed-off-by: Derrick Stolee Signed-off-by: brian m. carlson --- t/t5318-commit-graph.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 0c500f7ca2..75fe09521f 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -8,7 +8,8 @@ test_expect_success 'setup full repo' ' cd "$TRASH_DIRECTORY/full" && git init && git config core.commitGraph true && - objdir=".git/objects" + objdir=".git/objects" && + test_oid_init ' test_expect_success 'verify graph with no graph file' ' @@ -273,7 +274,7 @@ test_expect_success 'git commit-graph verify' ' NUM_COMMITS=9 NUM_OCTOPUS_EDGES=2 -HASH_LEN=20 +HASH_LEN="$(test_oid rawsz)" GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 GRAPH_BYTE_CHUNK_COUNT=6
[PATCH v5 03/12] t0000: update tests for SHA-256
Test t tests the "basics of the basics" and as such, checks that we have various fixed hard-coded object IDs. The tests relying on these assertions have been marked with the SHA1 prerequisite, as they will obviously not function in their current form with SHA-256. Use the test_oid helper to update these assertions and provide values for both SHA-1 and SHA-256. These object IDs were synthesized using a set of scripts that created the objects for both SHA-1 and SHA-256 using the same method to ensure that they are indeed the correct values. Signed-off-by: brian m. carlson --- t/t-basic.sh | 163 +-- 1 file changed, 102 insertions(+), 61 deletions(-) diff --git a/t/t-basic.sh b/t/t-basic.sh index a9dc534048..391f910c6a 100755 --- a/t/t-basic.sh +++ b/t/t-basic.sh @@ -861,6 +861,47 @@ test_expect_success 'test_oid can look up data for SHA-256' ' # Basics of the basics +test_oid_cache <<\EOF +path0f sha1:f87290f8eb2cbbea7857214459a0739927eab154 +path0f sha256:638106af7c38be056f3212cbd7ac65bc1bac74f420ca5a436ff006a9d025d17d + +path0s sha1:15a98433ae33114b085f3eb3bb03b832b3180a01 +path0s sha256:3a24cc53cf68edddac490bbf94a418a52932130541361f685df685e41dd6c363 + +path2f sha1:3feff949ed00a62d9f7af97c15cd8a30595e7ac7 +path2f sha256:2a7f36571c6fdbaf0e3f62751a0b25a3f4c54d2d1137b3f4af9cb794bb498e5f + +path2s sha1:d8ce161addc5173867a3c3c730924388daedbc38 +path2s sha256:18fd611b787c2e938ddcc248fabe4d66a150f9364763e9ec133dd01d5bb7c65a + +path2d sha1:58a09c23e2ca152193f2786e06986b7b6712bdbe +path2d sha256:00e4b32b96e7e3d65d79112dcbea53238a22715f896933a62b811377e2650c17 + +path3f sha1:0aa34cae68d0878578ad119c86ca2b5ed5b28376 +path3f sha256:09f58616b951bd571b8cb9dc76d372fbb09ab99db2393f5ab3189d26c45099ad + +path3s sha1:8599103969b43aff7e430efea79ca4636466794f +path3s sha256:fce1aed087c053306f3f74c32c1a838c662bbc4551a7ac2420f5d6eb061374d0 + +path3d sha1:21ae8269cacbe57ae09138dcc3a2887f904d02b3 +path3d sha256:9b60497be959cb830bf3f0dc82bcc9ad9e925a24e480837ade46b2295e47efe1 + +subp3f 
sha1:00fb5908cb97c2564a9783c0c64087333b3b464f +subp3f sha256:a1a9e16998c988453f18313d10375ee1d0ddefe757e710dcae0d66aa1e0c58b3 + +subp3s sha1:6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c +subp3s sha256:81759d9f5e93c6546ecfcadb560c1ff057314b09f93fe8ec06e2d8610d34ef10 + +subp3d sha1:3c5e5399f3a333eddecce7a9b9465b63f65f51e2 +subp3d sha256:76b4ef482d4fa1c754390344cf3851c7f883b27cf9bc999c6547928c46aeafb7 + +root sha1:087704a96baf1c2d1c869a8b084481e121c88b5b +root sha256:9481b52abab1b2ffeedbf9de63ce422b929f179c1b98ff7bee5f8f1bc0710751 + +simpletree sha1:7bb943559a305bdd6bdee2cef6e5df2413c3d30a +simpletree sha256:1710c07a6c86f9a3c7376364df04c47ee39e5a5e221fcdd84b743bc9bb7e2bc5 +EOF + # updating a new file without --add should fail. test_expect_success 'git update-index without --add should fail adding' ' test_must_fail git update-index should-be-empty @@ -876,8 +917,8 @@ test_expect_success 'writing tree out with git write-tree' ' ' # we know the shape and contents of the tree and know the object ID for it. -test_expect_success SHA1 'validate object ID of a known tree' ' - test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a +test_expect_success 'validate object ID of a known tree' ' + test "$tree" = "$(test_oid simpletree)" ' # Removing paths. 
@@ -919,16 +960,16 @@ test_expect_success 'showing stage with git ls-files --stage' ' git ls-files --stage >current ' -test_expect_success SHA1 'validate git ls-files output for a known tree' ' - cat >expected <<-\EOF && - 100644 f87290f8eb2cbbea7857214459a0739927eab154 0 path0 - 120000 15a98433ae33114b085f3eb3bb03b832b3180a01 0 path0sym - 100644 3feff949ed00a62d9f7af97c15cd8a30595e7ac7 0 path2/file2 - 120000 d8ce161addc5173867a3c3c730924388daedbc38 0 path2/file2sym - 100644 0aa34cae68d0878578ad119c86ca2b5ed5b28376 0 path3/file3 - 120000 8599103969b43aff7e430efea79ca4636466794f 0 path3/file3sym - 100644 00fb5908cb97c2564a9783c0c64087333b3b464f 0 path3/subp3/file3 - 120000 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c 0 path3/subp3/file3sym +test_expect_success 'validate git ls-files output for a known tree' ' + cat >expected <<-EOF && + 100644 $(test_oid path0f) 0 path0 + 120000 $(test_oid path0s) 0 path0sym + 100644 $(test_oid path2f) 0 path2/file2 + 120000 $(test_oid path2s) 0 path2/file2sym + 100644 $(test_oid path3f) 0 path3/file3 + 120000 $(test_oid path3s) 0 path3/file3sym + 100644 $(test_oid subp3f) 0 path3/subp3/file3 + 120000 $(test_oid subp3s) 0 path3/subp3/file3sym EOF test_cmp expected current ' @@ -937,20 +978,20 @@ test_expect_success 'writing tree out with git write-tree' ' tree=$(git write-tree) '
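The switch from <<-\EOF to <<-EOF in the hunk above is what allows the $(test_oid ...) substitutions to run; the heredoc quoting rule in isolation:

```shell
oid=1234abcd    # stand-in for a $(test_oid ...) result

# Quoted delimiter: the body is taken literally, no expansion
cat <<\EOF
entry $oid
EOF

# Unquoted delimiter: parameter and command substitution happen
cat <<EOF
entry $oid
EOF
```

The first heredoc prints the literal text "entry $oid"; the second prints "entry 1234abcd".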
[PATCH v5 00/12] Hash-independent tests (part 3)
This is the next in the series of improvements to make tests hash-independent. A range-diff is below. Changes from v4: * Add local statements to the &&-chain. * Fix a typo in the local statement. * Add a helpful comment about why test_detect_hash is hard-coded to SHA-1. Changes from v3: * Update t/README to specify new helpers. * Escape "$" in double quotes. * Add documentation comments to test helpers. * Use local instead of underscores in variable names. * Use expr instead of egrep to match hash algorithm names. * Improve clarity of unset variable check in test_oid. * Wrap test_oid_init in a test_expect_success. * Improved && chain handling in test functions. * Add test_detect_hash in test_oid_init. * Clean up extra blank line. * Add patch from Derrick Stolee for t5318, modified slightly. Changes from v2: * Fix a typo in "zero_2". * Provide better matching of expected output. * Add and use test_oid_init instead of filename-based test_oid_cache. * Add test_set_hash. * Provide better error checking in newly added test functions. * Move t constants into the test, removing the separate file. * Switch to using a differently named temporary file in t0027. Changes from v1: * Adopt pure shell approach for helper. * Add tests for the helpers. * Explicitly refer to SHA-256 now that we know it will be NewHash. * Updated t to remove SHA1 prerequisite. * Change name of helper from test_translate to test_oid. * Add helper to cache information in the shell. * Simplified lookup of HEAD in t0002. * Switched to using existing helper function in t0027. * Simplified handling of IDs in t0064. Derrick Stolee (1): t5318: use test_oid for HASH_LEN brian m. 
carlson (11): t: add test functions to translate hash-related values t: use hash translation table t: update tests for SHA-256 t0002: abstract away SHA-1 specific constants t0027: make hash size independent t0064: make hash size independent t1006: make hash size independent t1400: switch hard-coded object ID to variable t1405: make hash size independent t1406: make hash-size independent t1407: make hash size independent t/README | 22 t/oid-info/README | 19 +++ t/oid-info/hash-info | 8 ++ t/oid-info/oid | 29 + t/t-basic.sh | 213 ++--- t/t0002-gitfile.sh | 27 +++-- t/t0027-auto-crlf.sh | 3 +- t/t0064-sha1-array.sh | 49 t/t1006-cat-file.sh| 6 +- t/t1400-update-ref.sh | 2 +- t/t1405-main-ref-store.sh | 4 +- t/t1406-submodule-ref-store.sh | 6 +- t/t1407-worktree-ref-store.sh | 4 +- t/t5318-commit-graph.sh| 5 +- t/test-lib-functions.sh| 69 +++ 15 files changed, 352 insertions(+), 114 deletions(-) create mode 100644 t/oid-info/README create mode 100644 t/oid-info/hash-info create mode 100644 t/oid-info/oid Range-diff against v4: 1: fd13b542e7 ! 1: 831a0df666 t: add test functions to translate hash-related values @@ -222,6 +222,8 @@ + +# Detect the hash algorithm in use. +test_detect_hash () { ++ # Currently we only support SHA-1, but in the future this function will ++ # actually detect the algorithm in use. + test_hash_algo='sha1' +} + @@ -241,9 +243,9 @@ +# rawsz sha1:20 +# rawsz sha256:32 +test_oid_cache () { -+ local tag reset k v ++ local tag rest k v && + -+ test -n "$test_hash_algo" || test_detect_hash && ++ { test -n "$test_hash_algo" || test_detect_hash; } && + while read tag rest + do + case $tag in @@ -271,7 +273,7 @@ +# Look up a per-hash value based on a key ($1). The value must have been loaded +# by test_oid_init or test_oid_cache. +test_oid () { -+ local var="test_oid_${test_hash_algo}_$1" ++ local var="test_oid_${test_hash_algo}_$1" && + + # If the variable is unset, we must be missing an entry for this + # key-hash pair, so exit with an error. 
2: 335c75e1ec = 2: 75eb36456b t: use hash translation table 3: 257b458ad9 = 3: d1cebc5dba t: update tests for SHA-256 4: 69080d3bfc = 4: a75f4a049e t0002: abstract away SHA-1 specific constants 5: a4297d7ae8 = 5: ab8598048e t0027: make hash size independent 6: 7cf1221274 = 6: ed540a08e8 t0064: make hash size independent 7: 7e5c857c25 = 7: e221995a27 t1006: make hash size independent 8: 9271787d4b = 8: 3a168746ff t1400: switch hard-coded object ID to variable 9: 43bd41156a = 9: 3e55ccdd15 t1405: make hash size independent 10: 875bbd7fc5 = 10: df7d4eb112 t1406: make hash-size independent 11: a9c2d31089 = 11: b1a04c1986 t1407: make hash size independent 12: 10452cc352 = 12: 63d21a1eca t5318: use test_oid for HASH_LEN
[PATCH v5 04/12] t0002: abstract away SHA-1 specific constants
Adjust the test so that it computes variables for object IDs instead of using hard-coded hashes. Signed-off-by: brian m. carlson --- t/t0002-gitfile.sh | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh index 3691023d51..0aa9908ea1 100755 --- a/t/t0002-gitfile.sh +++ b/t/t0002-gitfile.sh @@ -92,11 +92,12 @@ test_expect_success 'enter_repo non-strict mode' ' mv .git .realgit && echo "gitdir: .realgit" >.git ) && + head=$(git -C enter_repo rev-parse HEAD) && git ls-remote enter_repo >actual && - cat >expected <<-\EOF && - 946e985ab20de757ca5b872b16d64e92ff3803a9 HEAD - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/heads/master - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/tags/foo + cat >expected <<-EOF && + $head HEAD + $head refs/heads/master + $head refs/tags/foo EOF test_cmp expected actual ' @@ -106,21 +107,23 @@ test_expect_success 'enter_repo linked checkout' ' cd enter_repo && git worktree add ../foo refs/tags/foo ) && + head=$(git -C enter_repo rev-parse HEAD) && git ls-remote foo >actual && - cat >expected <<-\EOF && - 946e985ab20de757ca5b872b16d64e92ff3803a9 HEAD - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/heads/master - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/tags/foo + cat >expected <<-EOF && + $head HEAD + $head refs/heads/master + $head refs/tags/foo EOF test_cmp expected actual ' test_expect_success 'enter_repo strict mode' ' + head=$(git -C enter_repo rev-parse HEAD) && git ls-remote --upload-pack="git upload-pack --strict" foo/.git >actual && - cat >expected <<-\EOF && - 946e985ab20de757ca5b872b16d64e92ff3803a9 HEAD - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/heads/master - 946e985ab20de757ca5b872b16d64e92ff3803a9 refs/tags/foo + cat >expected <<-EOF && + $head HEAD + $head refs/heads/master + $head refs/tags/foo EOF test_cmp expected actual '
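git ls-remote prints "<oid><TAB><refname>", so the whole expected file can be generated from a single rev-parse result. A sketch of the generation step, with the old hard-coded ID standing in for the $head that the patch computes:

```shell
# stand-in for: head=$(git -C enter_repo rev-parse HEAD)
head=946e985ab20de757ca5b872b16d64e92ff3803a9

# one tab-separated line per advertised ref, like ls-remote's output
for ref in HEAD refs/heads/master refs/tags/foo
do
	printf '%s\t%s\n' "$head" "$ref"
done >expected

cat expected
```

Because the ID is computed at runtime, the same expectation holds no matter which hash algorithm produced HEAD.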
[PATCH v5 05/12] t0027: make hash size independent
We transform various object IDs into all-zero object IDs for comparison. Adjust the length as well so that this works for all hash algorithms. While it would be slightly more efficient to use a sed y/// expression instead of both the tr and sed, retain the tr in both cases for parallelism and compactness. Signed-off-by: brian m. carlson --- t/t0027-auto-crlf.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index beb5927f77..dfb46bbe86 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -15,7 +15,8 @@ compare_ws_file () { pfx=$1 exp=$2.expect act=$pfx.actual.$3 - tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" && + tr '\015\000abcdef0123456789' QN0 <"$2" | + sed -e "s/*/$ZERO_OID/" >"$exp" && tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" && test_cmp "$exp" "$act" && rm "$exp" "$act"
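The idea is a two-step normalization: tr maps every hex digit to 0, and sed then canonicalizes the length of the resulting zero run, so a 40-character and a 64-character ID compare equal. A standalone sketch — the exact tr/sed arguments in the patch are mangled in this archive copy, so the ones below are illustrative, not the committed code:

```shell
ZERO_OID=0000000000000000000000000000000000000000

normalize () {
	# map all 16 hex digits to 0, then canonicalize zero runs to $ZERO_OID
	tr 'abcdef0123456789' '0000000000000000' |
	sed -e "s/000*/$ZERO_OID/g"
}

# A SHA-1-sized and a SHA-256-sized ID normalize to the same string
# ("X: " is used as a label because it contains no hex digits).
echo "X: deadbeefdeadbeefdeadbeefdeadbeefdeadbeef" | normalize
echo "X: deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef" | normalize
```

Both lines come out as "X: " followed by $ZERO_OID, which is what makes the comparison hash-size independent.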
[PATCH v5 09/12] t1405: make hash size independent
Instead of hard-coding a 40-based constant, split the output of for-each-ref and for-each-reflog by field. Signed-off-by: brian m. carlson --- t/t1405-main-ref-store.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh index a74c38b5fb..331899ddc4 100755 --- a/t/t1405-main-ref-store.sh +++ b/t/t1405-main-ref-store.sh @@ -54,7 +54,7 @@ test_expect_success 'for_each_ref(refs/heads/)' ' ' test_expect_success 'for_each_ref() is sorted' ' - $RUN for-each-ref refs/heads/ | cut -c 42- >actual && + $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual && sort actual > expected && test_cmp expected actual ' @@ -71,7 +71,7 @@ test_expect_success 'verify_ref(new-master)' ' ' test_expect_success 'for_each_reflog()' ' - $RUN for-each-reflog | sort -k2 | cut -c 42- >actual && + $RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual && cat >expected <<-\EOF && HEAD 0x1 refs/heads/master 0x0
Re: Multiple GIT Accounts & HTTPS Client Certificates - Config
On Tue, Sep 11, 2018 at 09:42:35AM +0200, Sergei Haller wrote: > PS: while I was trying to send the mail to this mailing list, there is > some update from the stack overflow side: > > * I am using sslBackend schannel > * the private key of my client certificate can be provided by using > the http.sslKey config option > * the private key is on a smart card, so there is no way I can copy it > over to a file and use the openssl backend (at least no way that I am > aware of :) > > so basically this pins down to the fact that schannel implementation > is picking the wrong key. Git doesn't provide an option to query a smartcard outside of what your typical TLS implementation provides by default. Not being a Windows user, I don't know if Schannel provides a way to automatically select the right private key based on which certificate is used, or if that's outside its capability. > PS: the recent realization makes me believe this is a window specific > problem. I think I have read somewhere > about a separate windows mailing list (but not sure where I saw it) If you're talking about Git for Windows, yes, this is probably better addressed as an issue there: https://github.com/git-for-windows/git. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
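For reference, the configuration under discussion looks like this (the paths are placeholders, not a working setup; as noted above, a smartcard-resident key cannot simply be pointed to as a file):

```shell
# Select the Windows Schannel TLS backend (Git for Windows)
git config --global http.sslBackend schannel

# With the OpenSSL backend one would instead point at certificate/key
# files; placeholder paths only:
git config --global http.sslCert /path/to/client-cert.pem
git config --global http.sslKey /path/to/client-key.pem
```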
Re: [PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)
Hi Thomas, [quickly, as I will go back to a proper vacation after this] On Wed, 12 Sep 2018, Thomas Gummerer wrote: > diff --git a/linear-assignment.c b/linear-assignment.c > index 9b3e56e283..7700b80eeb 100644 > --- a/linear-assignment.c > +++ b/linear-assignment.c > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, > int *cost, > else if (j1 < -1) > row2column[i] = -2 - j1; > else { > - int min = COST(!j1, i) - v[!j1]; > - for (j = 1; j < column_count; j++) > + int min = INT_MAX; I am worried about this, as I tried very hard to avoid integer overruns. Wouldn't it be possible to replace the `else {` by an appropriate `else if (...) { ... } else {`? E.g. `else if (column_count < 2)` or some such? Ciao, Dscho > + for (j = 0; j < column_count; j++) > if (j != j1 && min > COST(j, i) - v[j]) > min = COST(j, i) - v[j]; > v[j1] -= min; > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index 2237c7f4af..fb4c13a84a 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -142,4 +142,9 @@ test_expect_success 'changed message' ' > test_cmp expected actual > ' > > +test_expect_success 'no commits on one side' ' > + git commit --amend -m "new message" && > + git range-diff master HEAD@{1} HEAD > +' > + > test_done > -- > 2.19.0.397.gdd90340f6a > >
[ANNOUNCE] Git for Windows 2.19.0
Dear Git users, It is my pleasure to announce that Git for Windows 2.19.0 is available from: https://gitforwindows.org/ Changes since Git for Windows v2.18.0 (June 22nd 2018) New Features * Comes with Git v2.19.0. * There are now fast, built-in versions of git stash and git rebase, available as experimental options. * The included OpenSSH client now enables modern ciphers. * The gitweb component was removed because it is highly unlikely to be used on Windows. * The git archimport tool (which was probably used by exactly 0 users) is no longer included in Git for Windows. * Comes with tig v2.4.0. * Comes with Git LFS v2.5.1. * Comes with Git Credential Manager v1.17.1. * Comes with OpenSSL v1.0.2p. * Comes with cURL v7.61.1. * Comes with mingw-w64-nodejs v8.12.0. Bug Fixes * The http.schannel.checkRevoke setting (which never worked) was renamed to http.schannelCheckRevoke. In the same run, http.schannel.useSSLCAInfo (which also did not work, for the same reason) was renamed to http.schannelUseSSLCAInfo. * Avoids a stack overflow with recent Windows Insider versions. * Git GUI now handles hooks correctly in worktrees other than the main one. * When using core.autocrlf, the bogus "LF will be replaced by CRLF" warning is now suppressed. * The funny "fatal error - cmalloc would have returned NULL" problems should be gone.
Filename | SHA-256
--- | ---
Git-2.19.0-64-bit.exe | 2dd5824e29ca44e86016cdb3dae8446cb5b80b77f596b67e3d8754451fa4bbcb
Git-2.19.0-32-bit.exe | 66b29edc994838586e141b07394900426d59d86a3126cddc04d9ab26032eb7b4
PortableGit-2.19.0-64-bit.7z.exe | 1b8761ae57f589890a83995d0da7891efbddfee14e9f0c3ffda91f6add5b9351
PortableGit-2.19.0-32-bit.7z.exe | edaeb1962b7620bff33d45889eb4bcb6e4ac95021042910871b6025515785c16
MinGit-2.19.0-64-bit.zip | 424d24b5fc185a9c5488d7872262464f2facab4f1d4693ea8008196f14a3c19b
MinGit-2.19.0-32-bit.zip | 83cf018bd6f5c24e2b3088539bbeef9067fd632087d094d447a3a0ff676e7bd7
MinGit-2.19.0-busybox-64-bit.zip | 128b355e4321dbaf699ff4994ddbb6cde92783f4930be50aad507f6d8dbec0d0
MinGit-2.19.0-busybox-32-bit.zip | 2695578e2c53f32194986456ce714388149d051041f638e9ed43dd8342465fb2
Git-2.19.0-64-bit.tar.bz2 | c3ca6f9f2b6b800ade80ee4973cdaed9869b8c2c684c9a6a0940d5fb1e18bef7
Git-2.19.0-32-bit.tar.bz2 | 6f8426646e68ea2689b6a1cfc09ba221805f21540e7dc7af00335f27d3b333b8
pdbs-for-git-64-bit-2.19.0.1.d96bb8bc6c-1.zip | 5f4417ab5b362823b416ead4960e8050682349b603af9ef653bacbbb7eea408a
pdbs-for-git-32-bit-2.19.0.1.d96bb8bc6c-1.zip | 157f78f63d8df39b1ae6dedfcafbe1a9093a1fcc4768b123dae12ee2ae8c82c5
Ciao,
Johannes
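The checksums above can be verified with sha256sum; a quick illustration of the mechanism with a known input (the hash below is for the five bytes "hello" plus a newline, not for any installer):

```shell
printf 'hello\n' | sha256sum
# 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  -

# To check a real download, put "<published checksum>  <filename>" lines
# into a file and run:
#   sha256sum -c sums.txt
```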
Re: [Question] Signature calculation ignoring parts of binary files
"Randall S. Becker" writes:

>> author is important to our process. My objective is to keep the original file
>> 100% exact as supplied and then ignore any changes to the metadata that I
>> don't care about (like Creator) if the remainder of the file is the same.

That will *not* work. If person A gave you a version of the original, which hashes to X after you strip the cruft you do not care about, you would register that original with person A's fingerprint on it under the name of X.

What happens when person B gives you another version, which is not byte-for-byte identical to the one you got earlier from person A, but does hash to the same X after you strip the cruft? If you are going to store it in Git, and if by SHA-1 you mean what we perceive as the "object name" in Git land, you must store that one, with person B's fingerprint on it, also under the name of X.

Now which version will you get from Git when you ask it to give you the object that hashes to X?
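The ambiguity described here can be sketched in a few lines: if the object name is computed over a *cleaned* version of the content, two byte-for-byte different originals collapse onto the same name, and a content-addressed store can keep only one of them. (A hypothetical sketch; `strip_creator` stands in for whatever metadata-stripping the asker has in mind.)

```python
import hashlib
import re

def strip_creator(data: bytes) -> bytes:
    # Hypothetical cleaner: drop a "Creator: ..." metadata line before hashing.
    return re.sub(rb"Creator: [^\n]*\n", b"", data)

def object_name(data: bytes) -> str:
    # Name computed over the cleaned bytes, as proposed in the question.
    return hashlib.sha1(strip_creator(data)).hexdigest()

version_a = b"Creator: Alice\nIMAGE-PAYLOAD"
version_b = b"Creator: Bob\nIMAGE-PAYLOAD"

store = {}
store[object_name(version_a)] = version_a   # person A registers X
store[object_name(version_b)] = version_b   # person B silently overwrites X

assert version_a != version_b                            # the originals differ...
assert object_name(version_a) == object_name(version_b)  # ...but share one name
assert store[object_name(version_a)] == version_b        # A's exact bytes are gone
```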
Re: [PATCH 1/1] contrib: add coverage-diff script
Junio C Hamano writes: > "Derrick Stolee via GitGitGadget" writes: > >> contrib/coverage-diff.sh | 70 >> 1 file changed, 70 insertions(+) >> create mode 100755 contrib/coverage-diff.sh > > I fully appreciate the motivation. But it is a bit sad that this > begins with "#!/bin/bash" but it seems that the script is full of > bash-isms. I haven't gone through the script to see if these are > inevitable or gratuitous yet, but I'd assume it made it easier for > you to write it to step outside the pure POSIX shell? > ... >> +elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then >> +path=${BASH_REMATCH[2]} > > OK, it probably is easier to write in bash than using expr if you > want to do regexp. Just to clarify. I am saying that it is OK to give up writing in pure POSIX and relying on bash-isms after seeing these lines.
Re: [PATCH] linear-assignment: fix potential out of bounds memory access
On 09/12, Junio C Hamano wrote:
> Thomas Gummerer writes:
> > --- >8 ---
> >
> > Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
> >
> > Currently the 'compute_assignment()' function may read memory out
> > of bounds, even if used correctly. Namely this happens when we only
> > have one column. In that case we try to calculate the initial
> > minimum cost using '!j1' as column in the reduction transfer code.
> > That in turn causes us to try and get the cost from column 1 in the
> > cost matrix, which does not exist, and thus results in an out of
> > bounds memory read.
>
> This nicely explains what goes wrong.
>
> > Instead of trying to initialize the minimum cost from another column,
> > just set it to INT_MAX. This also matches what the example code in the
> > original paper for the algorithm [1] does (it initializes the value to
> > inf, for which INT_MAX is the closest match in C).
>
> Yeah, if we really want to avoid INT_MAX we could use another "have
> we found any value yet?" boolean variable, but the caller in
> get_correspondences() does not even worry about integer overflows
> when stuffing diffsize to the cost[] array, and the other possible
> value that can go to cost[] array is COST_MAX that is mere 65k, so
> it would be OK to use INT_MAX as sentinel here, I guess.

Right, I'm not sure it would be worth introducing another boolean variable here. In the normal case we'll always enter the if condition inside the loop, and set a reasonable 'min' value. That does not happen if we only have one column, and the 'min' will remain 'INT_MAX'. Now in that case it doesn't matter much, as having only one column means there's no possibility to assign anything, so the actual values shouldn't matter (at least that's my understanding of the algorithm so far).
Another improvement we may be able to make here is to not even try to compute the assignment if there's only one column for that reason, but I'm out of time today and the rest of my week looks a bit busy, so I probably won't get to do anything before the beginning of next week. > > Note that the test only fails under valgrind on Linux, but the same > > command has been reported to segfault on Mac OS. > > > > Also start from 0 in the loop, which matches what the example code in > > the original paper does as well. Starting from 1 means we'd ignore > > the first column during the reduction transfer phase. Note that in > > the original paper the loop does start from 1, but the implementation > > is in Pascal, where arrays are 1 indexed. > > > > [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path > > algorithm for dense and sparse linear assignment > > problems. Computing, 38(4), 325–340. > > > > Reported-by: ryenus > > Helped-by: Derrick Stolee > > Signed-off-by: Thomas Gummerer > > --- > > linear-assignment.c | 4 ++-- > > t/t3206-range-diff.sh | 5 + > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/linear-assignment.c b/linear-assignment.c > > index 9b3e56e283..7700b80eeb 100644 > > --- a/linear-assignment.c > > +++ b/linear-assignment.c > > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, > > int *cost, > > else if (j1 < -1) > > row2column[i] = -2 - j1; > > else { > > - int min = COST(!j1, i) - v[!j1]; > > - for (j = 1; j < column_count; j++) > > + int min = INT_MAX; > > + for (j = 0; j < column_count; j++) > > if (j != j1 && min > COST(j, i) - v[j]) > > min = COST(j, i) - v[j]; > > v[j1] -= min; > > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > > index 2237c7f4af..fb4c13a84a 100755 > > --- a/t/t3206-range-diff.sh > > +++ b/t/t3206-range-diff.sh > > @@ -142,4 +142,9 @@ test_expect_success 'changed message' ' > > test_cmp expected actual > > ' > > > > +test_expect_success 'no commits on 
one side' ' > > + git commit --amend -m "new message" && > > + git range-diff master HEAD@{1} HEAD > > +' > > + > > test_done
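The fix under discussion is small enough to model: initialize the running minimum to a sentinel (the paper's "inf", INT_MAX in C) and scan all columns except j1. With a single column the loop body never contributes a value and the sentinel survives, whereas the old `COST(!j1, i)` seed read a column that does not exist. A rough Python model of the fixed loop, assuming the usual `cost[j + column_count * i]` flattening:

```python
INT_MAX = 2**31 - 1

def reduction_min(cost, column_count, i, j1, v):
    """Minimum of COST(j, i) - v[j] over all columns j != j1 (the fixed code)."""
    min_ = INT_MAX            # was COST(!j1, i) - v[!j1]: out of bounds when column_count == 1
    for j in range(column_count):   # was range(1, column_count): skipped column 0
        if j != j1:
            min_ = min(min_, cost[j + column_count * i] - v[j])
    return min_

# Two columns: the minimum comes from the column other than j1.
assert reduction_min([5, 3], 2, 0, 0, [0, 0]) == 3
# One column: nothing to scan, the sentinel survives -- and no out-of-bounds access.
assert reduction_min([5], 1, 0, 0, [0]) == INT_MAX
```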
Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon wrote:
>
> Signed-off-by: Josh Steadmon
> ---
> builtin/archive.c | 8 +++-
> http-backend.c | 10 +-
> transport-helper.c | 5 +++--
> 3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 73831887d..5fa75b3f7 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
> status = packet_reader_read(&reader);
> if (status != PACKET_READ_FLUSH)
> die(_("git archive: expected a flush"));
> - }
> + } else if (version == protocol_v2 &&
> + starts_with(transport->url, "http"))

I assume this is a smart way to cover both http and https, as the latter is prefixed by the former. How does this interact with remote helpers? (See gitremote-helpers(1), which doesn't mention archives, but I would imagine that one would be able to use a remote helper eventually, too: git archive --remote=abc://example.org/test)

> + /*
> + * Commands over HTTP require two requests, so there's an
> + * additional server response to parse.
> + */
> + discover_version(&reader);
>
> /* Now, start reading from fd[0] and spit it out to stdout */
> rv = recv_sideband("archive", fd[0], 1);
> diff --git a/http-backend.c b/http-backend.c
> index 458642ef7..d62d583c7 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
> static struct rpc_service rpc_service[] = {
> { "upload-pack", "uploadpack", 1, 1 },
> { "receive-pack", "receivepack", 0, -1 },
> + { "upload-archive", "uploadarchive", 1, 1 },

So git archives are not supported in protocol v0 via http? Then it makes sense to see no mention of archives in the remote helpers as well.

The rest of the code is a surprisingly small patch.

Stefan
Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support
On 09/12, Ben Peart wrote:
> Teach get_index_format_default() to support running the test suite
> with specific index versions. In particular, this enables the test suite
> to be run using index version 4 which is not the default so gets less testing.

I found this commit message slightly misleading. Running the test suite with specific index versions is already supported, by defining TEST_GIT_INDEX_VERSION in 'config.mak'. What we're doing here is introduce an additional environment variable that can also be used to set the index format in tests. Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the test suite with index-v4.

Admittedly the name is a bit strange compared to our usual GIT_TEST_* environment variable names, and it should probably be documented better (it's only documented in the Makefile currently), but I'm not sure we should introduce another environment variable for this purpose?

> Signed-off-by: Ben Peart
> ---
>
> Notes:
>     Base Ref: v2.19.0
>     Web-Diff: https://github.com/benpeart/git/commit/52e733e2ce
>     Checkout: git fetch https://github.com/benpeart/git git-test-index-version-v1 && git checkout 52e733e2ce
>
> read-cache.c | 47 +--
> t/README | 6 +-
> 2 files changed, 38 insertions(+), 15 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 7b1354d759..d140ce9989 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1570,26 +1570,45 @@ static unsigned int get_index_format_default(void)
> char *envversion = getenv("GIT_INDEX_VERSION");
> char *endp;
> int value;
> - unsigned int version = INDEX_FORMAT_DEFAULT;
> + unsigned int version = -1;
> +
> + if (envversion) {
> + version = strtoul(envversion, &endp, 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> + warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
> + "Using version %i"), INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
>
> - if (!envversion) {
> - if (!git_config_get_int("index.version", &value))
> + if (version == -1) {
> + if (!git_config_get_int("index.version", &value)) {
> version = value;
> - if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("index.version set, but the value is invalid.\n"
> - "Using version %i"), INDEX_FORMAT_DEFAULT);
> - return INDEX_FORMAT_DEFAULT;
> + if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> + warning(_("index.version set, but the value is invalid.\n"
> + "Using version %i"), INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> }
> - return version;
> }
>
> - version = strtoul(envversion, &endp, 10);
> - if (*endp ||
> - version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
> - "Using version %i"), INDEX_FORMAT_DEFAULT);
> - version = INDEX_FORMAT_DEFAULT;
> + if (version == -1) {
> + envversion = getenv("GIT_TEST_INDEX_VERSION");
> + if (envversion) {
> + version = strtoul(envversion, &endp, 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> + warning(_("GIT_TEST_INDEX_VERSION set, but the value is invalid.\n"
> + "Using version %i"), INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
> }
> +
> + if (version == -1)
> + version = INDEX_FORMAT_DEFAULT;
> +
> return version;
> }
>
> diff --git a/t/README b/t/README
> index 9028b47d92..f872638a78 100644
> --- a/t/README
> +++ b/t/README
> @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the
> object size is
> over 2GB. This variable forces the code path on any object larger than
> <n> bytes.
>
> -GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncommon pack-objects code
> path where deltas larger than this limit require extra memory
> allocation for bookkeeping.
>
> +GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
> +for the index version specified. Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +
> Naming Tests
>
> base-commit:
Re: [PATCH 2/3] archive: implement protocol v2 archive command
On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon wrote:
>
> This adds a new archive command for protocol v2. The command expects
> arguments in the form "argument X" which are passed unmodified to
> git-upload-archive--writer.
>
> This command works over the file://, Git, and SSH transports. HTTP
> support will be added in a separate patch.
>
> Signed-off-by: Josh Steadmon
> ---
> builtin/archive.c | 45 +++-
> builtin/upload-archive.c | 44 ---
> t/t5000-tar-tree.sh | 5 +
> 3 files changed, 77 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index e54fc39ad..73831887d 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -5,9 +5,11 @@
> #include "cache.h"
> #include "builtin.h"
> #include "archive.h"
> +#include "connect.h"
> #include "transport.h"
> #include "parse-options.h"
> #include "pkt-line.h"
> +#include "protocol.h"
> #include "sideband.h"
>
> static void create_output_file(const char *output_file)
> @@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
> }
> }
>
> +static int do_v2_command_and_cap(int out)
> +{
> + packet_write_fmt(out, "command=archive\n");
> + /* Capability list would go here, if we had any.
*/
> + packet_delim(out);
> +}
> +
> static int run_remote_archiver(int argc, const char **argv,
> const char *remote, const char *exec,
> const char *name_hint)
> @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
> struct remote *_remote;
> struct packet_reader reader;
> enum packet_read_status status;
> + enum protocol_version version;
>
> _remote = remote_get(remote);
> if (!_remote->url[0])
> @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
>
> packet_reader_init(&reader, fd[0], NULL, 0,
> PACKET_READ_CHOMP_NEWLINE);
>
> + version = discover_version(&reader);
> +
> + if (version == protocol_v2)
> + do_v2_command_and_cap(fd[1]);
> +
> /*
> * Inject a fake --format field at the beginning of the
> * arguments, with the format inferred from our output
> @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
> packet_write_fmt(fd[1], "argument %s\n", argv[i]);
> packet_flush(fd[1]);
>
> - status = packet_reader_read(&reader);
> -
> - if (status == PACKET_READ_FLUSH)
> - die(_("git archive: expected ACK/NAK, got a flush packet"));
> - if (strcmp(reader.buffer, "ACK")) {
> - if (starts_with(reader.buffer, "NACK "))
> - die(_("git archive: NACK %s"), reader.buffer + 5);
> - if (starts_with(reader.buffer, "ERR "))
> - die(_("remote error: %s"), reader.buffer + 4);
> - die(_("git archive: protocol error"));

Maybe we also want to support v1 (which is v0 prefixed with one pkt_line saying it is v1).

    if (version == protocol_v1)
        /* drop version v1 line, and then follow v0 logic. */
        packet_reader_read(&reader);

Do we care about v1, or do we just ignore it here? why?
(Don't answer me here, but rather put it in the commit message)

> + if (version == protocol_v0) {
> + status = packet_reader_read(&reader);
> +
> + if (status == PACKET_READ_FLUSH)
> + die(_("git archive: expected ACK/NAK, got a flush packet"));
> + if (strcmp(reader.buffer, "ACK")) {
> + if (starts_with(reader.buffer, "NACK "))
> + die(_("git archive: NACK %s"), reader.buffer + 5);
> + if (starts_with(reader.buffer, "ERR "))
> + die(_("remote error: %s"), reader.buffer + 4);
> + die(_("git archive: protocol error"));
> + }
> +
> + status = packet_reader_read(&reader);
> + if (status != PACKET_READ_FLUSH)
> + die(_("git archive: expected a flush"));
> }
>
> - status = packet_reader_read(&reader);
> - if (status != PACKET_READ_FLUSH)
> - die(_("git archive: expected a flush"));
> -
> /* Now, start reading from fd[0] and spit it out to stdout */
> rv = recv_sideband("archive", fd[0], 1);
> rv |= transport_disconnect(transport);
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d911635..534e8fd56 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -5,6 +5,7 @@
> #include "builtin.h"
> #include "archive.h"
> #include "pkt-line.h"
> +#include "protocol.h"
> #include "sideband.h"
> #include "run-command.h"
> #include "argv-array.h"
> @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
>
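For reference when reading the v2 hunks above: pkt-line framing prefixes each payload with its total length as four hex digits, with "0000" (flush-pkt) and, in protocol v2, "0001" (delim-pkt) as special packets. The request this patch sends would be framed roughly like this:

```python
def pkt_line(payload: str) -> str:
    # The length prefix covers the 4 hex digits themselves plus the payload.
    return "%04x%s" % (len(payload) + 4, payload)

FLUSH_PKT = "0000"
DELIM_PKT = "0001"

# Sketch of the v2 archive request: the command from do_v2_command_and_cap()
# followed by the "argument" packets sent by run_remote_archiver().
request = (
    pkt_line("command=archive\n")
    + DELIM_PKT                          # capability list would precede this, if any
    + pkt_line("argument --format=tar\n")
    + FLUSH_PKT
)

assert request.startswith("0014command=archive\n0001")
assert request.endswith("0000")
```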
RE: [Question] Signature calculation ignoring parts of binary files
On September 12, 2018 4:54 PM, I wrote: > On September 12, 2018 4:48 PM, Johannes Sixt wrote: > > Am 12.09.18 um 21:16 schrieb Randall S. Becker: > > > I feel really bad asking this, and I should know the answer, and yet. > > > > > > I have a binary file that needs to go into a repo intact (unchanged). > > > I also have a program that interprets the contents, like a textconv, > > > that can output the relevant portions of the file in whatever format > > > I like - used for diff typically, dumps in 1K chunks by file section. > > > What I'm looking for is to have the SHA1 signature calculated with > > > just the relevant portions of the file so that two actually > > > different files will be considered the same by git during a commit > > > or status. In real terms, I'm trying to ignore the Creator metadata > > > of a JPG because it is mutable and irrelevant to my repo contents. > > > > > > I'm sorry to ask, but I thought this was in .gitattributes but I > > > can't confirm the SHA1 behaviour. > > > > You are looking for a clean filter. See the 'filter' attribute in > > gitattributes(5). > > Your clean filter program or script should strip the unwanted metadata > > or set it to a constant known-good value. > > > > (You shouldn't need a smudge filter.) > > > > -- Hannes > > Thanks Hannes. I thought about the clean filter, but I don't actually want to > modify the file when going into git, just for SHA calculation. I need to be > able > to keep some origin metadata that might change with subsequent copies, so > just cleaning the origin is not going to work - actually knowing the original > author is important to our process. My objective is to keep the original file > 100% exact as supplied and then ignore any changes to the metadata that I > don't care about (like Creator) if the remainder of the file is the same. I had a thought that might be workable, opinions are welcome on this. 
The commit of my rather weird project is done by a script so I have flexibility in my approach. What I could do is set up a diff textconv configuration so that the text diff of the two JPG files will show no differences if the immutable fields and the image are the same. I can then trigger a git add and git commit for only those files where git diff reports no differences. That way the actual original file is stored in git with 100% fidelity (no cleaning). It's not as elegant as I'd like, but it does solve what I'm trying to do. Does this sound reasonable and/or is there a better way? Cheers, Randall -- Brief whoami: NonStop developer since approximately 2112884442 UNIX developer since approximately 421664400 -- In my real life, I talk too much.
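The scheme sketched above — commit only when the fields we care about match — can be modeled without git: normalize both files through the textconv-style extractor, dropping the mutable fields, and compare. (`Creator` as the mutable field comes from the thread; the extractor itself is hypothetical.)

```python
MUTABLE_FIELDS = {"Creator"}  # metadata to ignore, per the thread

def textconv(metadata: dict) -> dict:
    """Stand-in for the textconv dump: keep only the immutable fields."""
    return {k: v for k, v in metadata.items() if k not in MUTABLE_FIELDS}

def should_skip_commit(supplied: dict, stored: dict) -> bool:
    # "git diff" with the textconv driver reports no change -> skip add/commit,
    # keeping the stored original byte-for-byte intact.
    return textconv(supplied) == textconv(stored)

stored   = {"Creator": "Alice", "Width": 800, "Payload": "abc123"}
supplied = {"Creator": "Bob",   "Width": 800, "Payload": "abc123"}

assert should_skip_commit(supplied, stored)                     # only Creator differs
assert not should_skip_commit({**supplied, "Width": 640}, stored)
```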
Re: [PATCH 1/1] contrib: add coverage-diff script
"Derrick Stolee via GitGitGadget" writes: > contrib/coverage-diff.sh | 70 > 1 file changed, 70 insertions(+) > create mode 100755 contrib/coverage-diff.sh I fully appreciate the motivation. But it is a bit sad that this begins with "#!/bin/bash" but it seems that the script is full of bash-isms. I haven't gone through the script to see if these are inevitable or gratuitous yet, but I'd assume it made it easier for you to write it to step outside the pure POSIX shell? > +V1=$1 > +V2=$2 > + > +diff-lines() { Being able to use '-' in identifier is probably a bash-ism that you did not have to use. > +local path= > +local line= > +while read; do Being able to omit variable to be read into and can use the implicit variable $REPLY also is. > + esc=$'\033' > + if [[ $REPLY =~ ---\ (a/)?.* ]]; then > + continue > + elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then > + path=${BASH_REMATCH[2]} OK, it probably is easier to write in bash than using expr if you want to do regexp. Where do these escape code come from in "git diff" output, by the way? > + elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; > then > + line=${BASH_REMATCH[2]} > + elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then > + echo "$path:$line:$REPLY" > + if [[ ${BASH_REMATCH[2]} != - ]]; then > + ((line++)) > + fi > + fi > +done > +} > + > +git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt Hmph, not git diff --name-only "$V1" "$V2" -- "*.c" Do we (or do we not) want "--no-renames"? > +for file in $(cat files.txt) > +do > + hash_file=${file//\//\#} > + > + git diff $V1 $V2 -- $file \ > + | diff-lines \ > + | grep ":+" \ > + >"diff_file.txt" Style: cmd1 | cmd2 | cmd3 >output is easier to read without backslashes. 
> + cat diff_file.txt \
> + | sed -E 's/:/ /g' \
> + | awk '{print $2}' \
> + | sort \
> + >new_lines.txt
> +
> + cat "$hash_file.gcov" \
> + | grep \#\#\#\#\# \
> + | sed 's/#: //g' \
> + | sed 's/\:/ /g' \
> + | awk '{print $1}' \
> + | sort \
> + >uncovered_lines.txt

OK, so we assume that we have run coverage in $V2 checkout so that we can pick up the postimage line numbers in "diff $V1 $V2" and find the corresponding record in the .gcov file in the filesystem. I did not realize the significance of 'topic' being the later argument to the script in this part

    After creating the coverage statistics at a version (say, 'topic')
    you can then run

        contrib/coverage-diff.sh base topic

of your description before I see this implementation. Also the comment at the beginning

    # Usage: 'contrib/coverage-diff.sh <version1> <version2>'
    # Outputs a list of new lines in version2 compared to version1 that are
    # not covered by the test suite. Assumes you ran
    # 'make coverage-test coverage-report' from root first, so .gcov files exist.

would want to make it clear that we want coverage run from root for version2 before using this script.

> + comm -12 uncovered_lines.txt new_lines.txt \
> + | sed -e 's/$/\)/' \
> + | sed -e 's/^/\t/' \
> + >uncovered_new_lines.txt
> +
> + grep -q '[^[:space:]]' < uncovered_new_lines.txt && \

Style: when you end a line with && (or || or | for that matter), the shell knows that you have not finished speaking, and will wait to listen to you to finish the sentence. No need for backslash there.

> + echo $file && \
> + git blame -c $file \
> + | grep -f uncovered_new_lines.txt
> +
> + rm -f diff_file.txt new_lines.txt \
> + uncovered_lines.txt uncovered_new_lines.txt
> +done
> +
> +rm -rf files.txt
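The pipeline under review (`comm -12` over two sorted files) is just a set intersection: line numbers added by the diff, intersected with lines that gcov marks as never executed (the `#####` records). A rough model:

```python
def uncovered_new_lines(added_lines, gcov_report):
    """added_lines: line numbers introduced by the diff for one file.
    gcov_report: {line_number: execution_count} parsed from the .gcov file."""
    uncovered = {ln for ln, count in gcov_report.items() if count == 0}
    # This intersection is what 'comm -12 uncovered_lines.txt new_lines.txt' computes.
    return sorted(set(added_lines) & uncovered)

gcov = {10: 5, 11: 0, 12: 0, 13: 2}   # lines 11-12 were never executed
assert uncovered_new_lines([11, 12, 13], gcov) == [11, 12]
assert uncovered_new_lines([10, 13], gcov) == []
```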
Re: [PATCH 1/3] archive: use packet_reader for communications
On Tue, Sep 11, 2018 at 10:35 PM Josh Steadmon wrote: > > Using packet_reader will simplify version detection and capability > handling, which will make implementation of protocol v2 support in > git-archive easier. > > Signed-off-by: Josh Steadmon Reviewed-by: Stefan Beller
Re: [PATCH 1/2] fetch-object: provide only one fetching function
Jonathan Tan writes:

> fetch-object.h currently provides two functions (fetch_object() and
> fetch_objects()) that could be replaced by a single, more flexible
> function. Replace those two functions with the more flexible function.

The latter half of the first sentence and the second sentence are pretty-much repeating the same thing. I think you wanted to justify two changes:

(1) There are two helpers. fetch_object() fetches a single object from a given remote; fetch_objects() fetches a set of objects from a given remote. It is not like the former is implemented as a specialization of the latter (i.e. passing length=1 array), and it is not like the former is optimized specially for a single object fetch that cannot be made into such a thin wrapper. It is not like the latter is implemented as a repeated call to the former in a loop, either. There is no justification to keep two independent implementations, and using a single helper that can be used by all the callers of these two would make sense.

(2) The variant that fetches multiple objects takes oid_array. To be used by all the existing callers of these two helpers, it is better to use a different calling convention, namely, an array[] and its length passed as separate parameters.

Instead of explaining why the new convention is better to justify (2), the above three lines handwave by saying "more flexible" twice. We should do better.

    fetch-object: unify fetch_object[s] functions

    There are fetch_object() and fetch_objects() helpers in
    fetch-object.h; as the latter takes "struct oid_array", the former
    cannot be made into a thin wrapper around the latter without an
    extra allocation and set-up cost.

    Update fetch_objects() to take an array of "struct object_id" and
    number of elements in it as separate parameters, remove
    fetch_object(), and adjust all existing callers of these functions
    to use the new fetch_objects().

perhaps?
> diff --git a/sha1-file.c b/sha1-file.c
> index 97b742384..2edf4564f 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1317,7 +1317,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
> * TODO Pass a repository struct through fetch_object,
> * such that arbitrary repositories work.
> */
> - fetch_object(repository_format_partial_clone, real->hash);
> + fetch_objects(repository_format_partial_clone, real, 1);
> already_retried = 1;
> continue;
> }
> diff --git a/unpack-trees.c b/unpack-trees.c
> index f25089b87..82a83dbc6 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -392,7 +392,7 @@ static int check_updates(struct unpack_trees_options *o)
> }
> if (to_fetch.nr)
> fetch_objects(repository_format_partial_clone,
> - &to_fetch);
> + to_fetch.oid, to_fetch.nr);
> fetch_if_missing = fetch_if_missing_store;
> oid_array_clear(&to_fetch);
> }

Changes to these two callers do explain why this is an interface that is easier to use. Those who already have an oid_array can pass its fields as separate parameters without too much trouble, and those who only have an oid (or an array of oid and know how many are in the array) do not have to wrap them into an oid_array.

Thanks.
Re: [PATCH 1/1] commit-reach: properly peel tags
On Wed, Sep 12, 2018 at 02:23:42PM -0700, Junio C Hamano wrote: > > diff --git a/commit-reach.c b/commit-reach.c > > index 86715c103c..6de72c6e03 100644 > > --- a/commit-reach.c > > +++ b/commit-reach.c > > @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array > > *from, > > { > > struct commit **list = NULL; > > int i; > > + int nr_commits; > > int result = 1; > > > > ALLOC_ARRAY(list, from->nr); > > + nr_commits = 0; > > for (i = 0; i < from->nr; i++) { > > - list[i] = (struct commit *)from->objects[i].item; > > + struct object *from_one = from->objects[i].item; > > > > - if (parse_commit(list[i]) || > > - list[i]->generation < min_generation) > > - return 0; > > + from_one = deref_tag(the_repository, from_one, > > +"a from object", 0); > > + if (!from_one || from_one->type != OBJ_COMMIT) { > > + from->objects[i].item->flags |= assign_flag; > > I wondered why this is not futzing with "from_one->flags"; by going > back to the original from->objects[] array, the code is setting the > flags on the original tag object and not the non-commit object that > was pointed at by the tag. Note that from_one may even be NULL. > > + continue; > > + } > > + > > + list[nr_commits] = (struct commit *)from_one; > > + if (parse_commit(list[nr_commits]) || > > + list[nr_commits]->generation < min_generation) > > + return 0; /* is this a leak? */ > > + nr_commits++; > > } > > In the original code, the flags bits were left unchanged if the loop > terminated by hitting a commit whose generation is too young (or > parse_commit() returns non-zero). With this updated code, flags bit > can be modified before the code notices the situation and leave the > function, bypassing the "cleanup" we see below that clears the > "assign_flag" bits. > > Would it be a problem that we return early without cleaning up? 
> > Even if we do not call this early return, the assign_flag bits added > to the original tag in from->objects[i].item won't be cleaned in > this new code, as "cleanup:" section now loops over the list[] that > omits the object whose flags was smudged above before the "continue". > > Would it be a problem that we leave the assign_flags without > cleaning up? Yeah, I hadn't thought about the bit cleanup when making my original suggestion. In the original code (before 4fbcca4eff), I think we did set flags as we iterated through the loop, and we could still do an early return when we hit "!reachable(...)". But I don't see any cleanup of assign_flag there at all. So I guess I'm pretty confused about what the semantics are supposed to be. -Peff
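The early-return concern both reviewers circle around is easier to see on a toy model than in the C: flag bits are smudged while the loop walks the array, and any return before the cleanup pass leaves them set for later callers that reuse the bit (a deliberately simplified sketch, not the real data structures):

```python
ASSIGN_FLAG = 0x1

def can_reach(objects, early_return_at):
    """Toy model of can_all_from_reach_with_flag(): smudge flags while
    scanning, bail early, never reach the cleanup loop."""
    for i, obj in enumerate(objects):
        if i == early_return_at:
            return False                 # early return: cleanup below is skipped
        obj["flags"] |= ASSIGN_FLAG      # smudged before we know we will bail
    for obj in objects:                  # the "cleanup:" pass -- full success only
        obj["flags"] &= ~ASSIGN_FLAG
    return True

objs = [{"flags": 0}, {"flags": 0}, {"flags": 0}]
assert can_reach(objs, early_return_at=2) is False
assert objs[0]["flags"] == ASSIGN_FLAG   # stale bits left behind
assert objs[1]["flags"] == ASSIGN_FLAG
```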
Re: [RFC PATCH 1/1] builtin/remote: quote remote name on error to display empty name
Shulhan writes:

> Rationale: consistent error format

You can and should drop this line.

> When adding new remote name with empty string, git will print the
> following error message,
>
> fatal: '' is not a valid remote name\n
>
> But when removing remote name with empty string as input, git did not
> print the empty string with quote,

"git shows the empty string without quote" would be easier to understand which part of the behaviour you are disturbed by.

> fatal: No such remote: \n

After stating the above observation, you can say something like

    To make these error messages consistent, quote the name of the
    remote that we tried and failed to find.

at the end.

> ---

Needs sign-off. See Documentation/SubmittingPatches.

> builtin/remote.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 40c6f8a1b..f7edf7f2c 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -626,7 +626,7 @@ static int mv(int argc, const char **argv)
>
> oldremote = remote_get(rename.old_name);
> if (!remote_is_configured(oldremote, 1))
> - die(_("No such remote: %s"), rename.old_name);
> + die(_("No such remote: '%s'"), rename.old_name);
>
> if (!strcmp(rename.old_name, rename.new_name) && oldremote->origin != REMOTE_CONFIG)
> return migrate_file(oldremote);
> @@ -762,7 +762,7 @@ static int rm(int argc, const char **argv)
>
> remote = remote_get(argv[1]);
> if (!remote_is_configured(remote, 1))
> - die(_("No such remote: %s"), argv[1]);
> + die(_("No such remote: '%s'"), argv[1]);
>
> known_remotes.to_delete = remote;
> for_each_remote(add_known_remote, &known_remotes);
> @@ -861,7 +861,7 @@ static int get_remote_ref_states(const char *name,
>
> states->remote = remote_get(name);
> if (!states->remote)
> - return error(_("No such remote: %s"), name);
> + return error(_("No such remote: '%s'"), name);
>
> read_branches();
[PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support
Teach get_index_format_default() to support running the test suite with specific index versions. In particular, this enables the test suite to be run using index version 4 which is not the default so gets less testing.

Signed-off-by: Ben Peart
---

Notes:
    Base Ref: v2.19.0
    Web-Diff: https://github.com/benpeart/git/commit/52e733e2ce
    Checkout: git fetch https://github.com/benpeart/git git-test-index-version-v1 && git checkout 52e733e2ce

 read-cache.c | 47 +--
 t/README | 6 +-
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..d140ce9989 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1570,26 +1570,45 @@ static unsigned int get_index_format_default(void)
 char *envversion = getenv("GIT_INDEX_VERSION");
 char *endp;
 int value;
- unsigned int version = INDEX_FORMAT_DEFAULT;
+ unsigned int version = -1;
+
+ if (envversion) {
+ version = strtoul(envversion, &endp, 10);
+ if (*endp ||
+ version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
+ warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
+ "Using version %i"), INDEX_FORMAT_DEFAULT);
+ version = INDEX_FORMAT_DEFAULT;
+ }
+ }

- if (!envversion) {
- if (!git_config_get_int("index.version", &value))
+ if (version == -1) {
+ if (!git_config_get_int("index.version", &value)) {
 version = value;
- if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
- warning(_("index.version set, but the value is invalid.\n"
- "Using version %i"), INDEX_FORMAT_DEFAULT);
- return INDEX_FORMAT_DEFAULT;
+ if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
+ warning(_("index.version set, but the value is invalid.\n"
+ "Using version %i"), INDEX_FORMAT_DEFAULT);
+ version = INDEX_FORMAT_DEFAULT;
+ }
 }
- return version;
 }

- version = strtoul(envversion, &endp, 10);
- if (*endp ||
- version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
- warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
- "Using version %i"), INDEX_FORMAT_DEFAULT);
- version = INDEX_FORMAT_DEFAULT;
+ if (version == -1) { + envversion = getenv("GIT_TEST_INDEX_VERSION"); + if (envversion) { + version = strtoul(envversion, , 10); + if (*endp || + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) { + warning(_("GIT_TEST_INDEX_VERSION set, but the value is invalid.\n" + "Using version %i"), INDEX_FORMAT_DEFAULT); + version = INDEX_FORMAT_DEFAULT; + } + } } + + if (version == -1) + version = INDEX_FORMAT_DEFAULT; + return version; } diff --git a/t/README b/t/README index 9028b47d92..f872638a78 100644 --- a/t/README +++ b/t/README @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the object size is over 2GB. This variable forces the code path on any object larger than bytes. -GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code +GIT_TEST_OE_DELTA_SIZE= exercises the uncommon pack-objects code path where deltas larger than this limit require extra memory allocation for bookkeeping. +GIT_TEST_INDEX_VERSION= exercises the index read/write code path +for the index version specified. Can be set to any valid version +but the non-default version 4 is probably the most beneficial. + Naming Tests base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670 -- 2.18.0.windows.1
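Restated outside of C, the lookup order the patched get_index_format_default() implements is: GIT_INDEX_VERSION, then index.version, then the new GIT_TEST_INDEX_VERSION, then the built-in default. A toy shell sketch of just that precedence (the function name, the stand-in config argument, and the default value 3 belong to this sketch only; the real code also range-checks each value):

```shell
#!/bin/sh
# Sketch of the lookup order in the patched get_index_format_default():
# GIT_INDEX_VERSION wins, then index.version, then the test-suite knob
# GIT_TEST_INDEX_VERSION, then the built-in default.
INDEX_FORMAT_DEFAULT=3
pick_index_version () {
	config_version=$1                 # stands in for index.version
	if test -n "$GIT_INDEX_VERSION"; then
		echo "$GIT_INDEX_VERSION"
	elif test -n "$config_version"; then
		echo "$config_version"
	elif test -n "$GIT_TEST_INDEX_VERSION"; then
		echo "$GIT_TEST_INDEX_VERSION"
	else
		echo "$INDEX_FORMAT_DEFAULT"
	fi
}
unset GIT_INDEX_VERSION
GIT_TEST_INDEX_VERSION=4
pick_index_version ""     # test knob applies when nothing else is set
pick_index_version 2      # but an explicit index.version still wins
```

This is why the new variable can steer the whole test suite to version 4 without overriding repositories that ask for a version explicitly.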
Re: [PATCH 1/1] commit-reach: properly peel tags
"Derrick Stolee via GitGitGadget" writes: > diff --git a/commit-reach.c b/commit-reach.c > index 86715c103c..6de72c6e03 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array > *from, > { > struct commit **list = NULL; > int i; > + int nr_commits; > int result = 1; > > ALLOC_ARRAY(list, from->nr); > + nr_commits = 0; > for (i = 0; i < from->nr; i++) { > - list[i] = (struct commit *)from->objects[i].item; > + struct object *from_one = from->objects[i].item; > > - if (parse_commit(list[i]) || > - list[i]->generation < min_generation) > - return 0; > + from_one = deref_tag(the_repository, from_one, > + "a from object", 0); > + if (!from_one || from_one->type != OBJ_COMMIT) { > + from->objects[i].item->flags |= assign_flag; I wondered why this is not futzing with "from_one->flags"; by going back to the original from->objects[] array, the code is setting the flags on the original tag object and not the non-commit object that was pointed at by the tag. > + continue; > + } > + > + list[nr_commits] = (struct commit *)from_one; > + if (parse_commit(list[nr_commits]) || > + list[nr_commits]->generation < min_generation) > + return 0; /* is this a leak? */ > + nr_commits++; > } In the original code, the flags bits were left unchanged if the loop terminated by hitting a commit whose generation is too young (or parse_commit() returns non-zero). With this updated code, flags bit can be modified before the code notices the situation and leave the function, bypassing the "cleanup" we see below that clears the "assign_flag" bits. Would it be a problem that we return early without cleaning up? Even if we do not call this early return, the assign_flag bits added to the original tag in from->objects[i].item won't be cleaned in this new code, as "cleanup:" section now loops over the list[] that omits the object whose flags was smudged above before the "continue". 
Would it be a problem that we leave the assign_flags without cleaning up? > - QSORT(list, from->nr, compare_commits_by_gen); > + QSORT(list, nr_commits, compare_commits_by_gen); > > - for (i = 0; i < from->nr; i++) { > + for (i = 0; i < nr_commits; i++) { > /* DFS from list[i] */ > struct commit_list *stack = NULL; > > @@ -600,7 +611,7 @@ int can_all_from_reach_with_flag(struct object_array > *from, > } > > cleanup: > - for (i = 0; i < from->nr; i++) { > + for (i = 0; i < nr_commits; i++) { > clear_commit_marks(list[i], RESULT); > clear_commit_marks(list[i], assign_flag); > }
[PATCH] sequencer: fix --allow-empty-message behavior, make it smarter
In commit b00bf1c9a8dd ("git-rebase: make --allow-empty-message the default", 2018-06-27), several arguments were given for transplanting empty commits without halting and asking the user for confirmation on each commit. These arguments were incomplete because the logic clearly assumed the only cases under consideration were transplants of commits with empty messages (see the comment about "There are two sources for commits with empty messages"). It didn't discuss or even consider rewords, squashes, etc., where the user is explicitly asked for a new commit message and provides an empty one. (My bad; I totally should have thought about that at the time, but just didn't.)

Rewords and squashes are significantly different, though, as described by SZEDER:

    Let's suppose you start an interactive rebase, choose a commit to
    squash, save the instruction sheet, rebase fires up your editor, and
    then you notice that you mistakenly chose the wrong commit to
    squash. What do you do, how do you abort?

    Before [that commit] you could clear the commit message, exit the
    editor, and then rebase would say "Aborting commit due to empty
    commit message.", and you get to run 'git rebase --abort', and start
    over.

    But [since that commit, ...] saving the commit message as is would
    let rebase continue and create a bunch of unnecessary objects, and
    then you would have to use the reflog to return to the pre-rebase
    state.

Also, he states:

    The instructions in the commit message template, which is shown for
    'reword' and 'squash', too, still say...

        # Please enter the commit message for your changes. Lines starting
        # with '#' will be ignored, and an empty message aborts the commit.

These are sound arguments that, when the user is asked to edit a commit message during a sequencer operation and provides an empty one, the operation should halt and ask the user to correct it.
The arguments in commit b00bf1c9a8dd (referenced above) still apply when transplanting previously created commits with empty commit messages, so the sequencer should not halt for those. Furthermore, all rationale so far applies equally to cherry-pick and rebase. Therefore, make the code default to --allow-empty-message when transplanting an existing commit, and default to halting when the user is asked to edit a commit message and provides an empty one -- for both rebase and cherry-pick.

Signed-off-by: Elijah Newren
---
This patch cleanly applies to both 2.19.0 and pu. There are some related code cleanups that I'd like to make, but doing that cleanup conflicts with the various rewrite-rebase-in-C topics sitting in pu; since those are fairly lengthy, I really don't want to cause problems there, but I think SZEDER really wants this 2.19.0 regression fix before 2.20.0 and thus before those other topics.

 sequencer.c                   |  4 ++--
 t/t3404-rebase-interactive.sh |  7 +++
 t/t3405-rebase-malformed.sh   |  2 +-
 t/t3505-cherry-pick-empty.sh  | 18 --
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..638ee151f2 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -899,7 +899,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
 	if ((flags & ALLOW_EMPTY))
 		argv_array_push(&cmd.args, "--allow-empty");
 
-	if (opts->allow_empty_message)
+	if (!(flags & EDIT_MSG))
 		argv_array_push(&cmd.args, "--allow-empty-message");
 
 	if (cmd.err == -1) {
@@ -1313,7 +1313,7 @@ static int try_to_commit(struct strbuf *msg, const char *author,
 	if (cleanup != COMMIT_MSG_CLEANUP_NONE)
 		strbuf_stripspace(msg, cleanup == COMMIT_MSG_CLEANUP_ALL);
 
-	if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) {
+	if ((flags & EDIT_MSG) && message_is_empty(msg, cleanup)) {
 		res = 1; /* run 'git commit' to display error message */
 		goto out;
 	}

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 86bba5ed7c..ff89b6341a 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -569,16 +569,15 @@ test_expect_success '--continue tries to commit, even for "edit"' '
 '
 
 test_expect_success 'aborted --continue does not squash commits after "edit"' '
-	test_when_finished "git rebase --abort" &&
 	old=$(git rev-parse HEAD) &&
 	test_tick &&
 	set_fake_editor &&
 	FAKE_LINES="edit 1" git rebase -i HEAD^ &&
 	echo "edited again" > file7 &&
 	git add file7 &&
-	echo all the things >>conflict &&
-	test_must_fail git rebase --continue &&
-	test $old = $(git rev-parse HEAD)
+	test_must_fail env FAKE_COMMIT_MESSAGE=" " git rebase --continue &&
+	test $old = $(git rev-parse HEAD) &&
+	git rebase --abort
 '
 
 test_expect_success 'auto-amend only edited commits after "edit"' '
diff --git
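The user-visible half of the rule can be sketched with stock commands in a throwaway repository (the identity, file names, and branch layout below are made up, and this assumes a git new enough to contain the fix): transplanting an existing commit whose message is empty completes without halting.

```shell
#!/bin/sh
set -e
# Sketch: cherry-pick (a transplant, no EDIT_MSG) of an empty-message
# commit should not stop and ask the user to fix the message.
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=A GIT_AUTHOR_EMAIL=a@example.com
export GIT_COMMITTER_NAME=A GIT_COMMITTER_EMAIL=a@example.com
git init -q demo && cd demo
echo base >file && git add file && git commit -q -m base
git checkout -q -b side
echo change >file
git commit -q -a --allow-empty-message -m ''   # existing empty-message commit
git checkout -q -                              # back to the original branch
git cherry-pick side >/dev/null                # transplant: no halt expected
echo "cherry-pick completed without halting"
```

A reword or squash that ends up with an empty message, by contrast, is the EDIT_MSG case and is meant to halt, as argued above.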
RE: [Question] Signature calculation ignoring parts of binary files
> -----Original Message-----
> From: git-ow...@vger.kernel.org On Behalf
> Of Johannes Sixt
> Sent: September 12, 2018 4:48 PM
> To: Randall S. Becker
> Cc: git@vger.kernel.org
> Subject: Re: [Question] Signature calculation ignoring parts of binary files
>
> On 12.09.18 at 21:16, Randall S. Becker wrote:
> > I feel really bad asking this, and I should know the answer, and yet.
> >
> > I have a binary file that needs to go into a repo intact (unchanged).
> > I also have a program that interprets the contents, like a textconv,
> > that can output the relevant portions of the file in whatever format I
> > like - used for diff typically, dumps in 1K chunks by file section.
> > What I'm looking for is to have the SHA1 signature calculated with
> > just the relevant portions of the file so that two actually different
> > files will be considered the same by git during a commit or status. In
> > real terms, I'm trying to ignore the Creator metadata of a JPG because
> > it is mutable and irrelevant to my repo contents.
> >
> > I'm sorry to ask, but I thought this was in .gitattributes but I can't
> > confirm the SHA1 behaviour.
>
> You are looking for a clean filter. See the 'filter' attribute in
> gitattributes(5). Your clean filter program or script should strip the
> unwanted metadata or set it to a constant known-good value.
>
> (You shouldn't need a smudge filter.)
>
> -- Hannes

Thanks Hannes. I thought about the clean filter, but I don't actually want to modify the file when going into git, just for SHA calculation. I need to be able to keep some origin metadata that might change with subsequent copies, so just cleaning the origin is not going to work - actually knowing the original author is important to our process. My objective is to keep the original file 100% exact as supplied and then ignore any changes to the metadata that I don't care about (like Creator) if the remainder of the file is the same.

Regards,
Randall
Re: [Question] Signature calculation ignoring parts of binary files
On 12.09.18 at 21:16, Randall S. Becker wrote: I feel really bad asking this, and I should know the answer, and yet. I have a binary file that needs to go into a repo intact (unchanged). I also have a program that interprets the contents, like a textconv, that can output the relevant portions of the file in whatever format I like - used for diff typically, dumps in 1K chunks by file section. What I'm looking for is to have the SHA1 signature calculated with just the relevant portions of the file so that two actually different files will be considered the same by git during a commit or status. In real terms, I'm trying to ignore the Creator metadata of a JPG because it is mutable and irrelevant to my repo contents. I'm sorry to ask, but I thought this was in .gitattributes but I can't confirm the SHA1 behaviour. You are looking for a clean filter. See the 'filter' attribute in gitattributes(5). Your clean filter program or script should strip the unwanted metadata or set it to a constant known-good value. (You shouldn't need a smudge filter.) -- Hannes
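For reference, Hannes' suggestion wires up as a filter attribute plus a filter.<name>.clean command. A minimal sketch (the filter name "stripcreator" and the strip-creator-metadata command are hypothetical; the command must read the blob on stdin and write the cleaned bytes to stdout):

```
# .gitattributes
*.jpg filter=stripcreator

# .git/config (or: git config filter.stripcreator.clean 'strip-creator-metadata')
[filter "stripcreator"]
	clean = strip-creator-metadata
```

Note that the clean result is what gets hashed and stored as the blob -- which is exactly the property Randall pushes back on in his reply, since he wants the original bytes preserved verbatim.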
Re: [PATCH] linear-assignment: fix potential out of bounds memory access
Thomas Gummerer writes: >> > I'm looking into why that fails. Also adding Dscho to Cc here as the >> > author of this code. >> >> The diff below seems to fix it. Not submitting this as a proper >> patch [...] > > I found the time to actually have a look at the paper, so here's a > proper patch: > > I'm still not entirely sure what the initial code tried to do here, It looks to me like an attempt to optimize (instead of starting from a value that is too big to be minimum, pick the first value and start from there, and all the "found even smaller one, so let's replace" later would work the same way) that went wrong (just that the "first one" was written incorrectly), but it is not absolutely necessary to find out why the code was written in a particular way that happened to be buggy. > but I think staying as close as possible to the original is probably > our best option here, also for future readers of this code. Thanks for digging. > --- >8 --- > > Subject: [PATCH] linear-assignment: fix potential out of bounds memory access > > Currently the 'compute_assignment()' function may read memory out > of bounds, even if used correctly. Namely this happens when we only > have one column. In that case we try to calculate the initial > minimum cost using '!j1' as column in the reduction transfer code. > That in turn causes us to try and get the cost from column 1 in the > cost matrix, which does not exist, and thus results in an out of > bounds memory read. This nicely explains what goes wrong. > Instead of trying to initialize the minimum cost from another column, > just set it to INT_MAX. This also matches what the example code in the > original paper for the algorithm [1] does (it initializes the value to > inf, for which INT_MAX is the closest match in C). Yeah, if we really want to avoid INT_MAX we could use another "have we found any value yet?"
boolean variable, but the caller in get_correspondences() does not even worry about integer overflows when stuffing diffsize to the cost[] array, and the other possible value that can go to cost[] array is COST_MAX that is mere 65k, so it would be OK to use INT_MAX as sentinel here, I guess. > Note that the test only fails under valgrind on Linux, but the same > command has been reported to segfault on Mac OS. > > Also start from 0 in the loop, which matches what the example code in > the original paper does as well. Starting from 1 means we'd ignore > the first column during the reduction transfer phase. Note that in > the original paper the loop does start from 1, but the implementation > is in Pascal, where arrays are 1 indexed. > > [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path > algorithm for dense and sparse linear assignment > problems. Computing, 38(4), 325–340. > > Reported-by: ryenus > Helped-by: Derrick Stolee > Signed-off-by: Thomas Gummerer > --- > linear-assignment.c | 4 ++-- > t/t3206-range-diff.sh | 5 + > 2 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/linear-assignment.c b/linear-assignment.c > index 9b3e56e283..7700b80eeb 100644 > --- a/linear-assignment.c > +++ b/linear-assignment.c > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, > int *cost, > else if (j1 < -1) > row2column[i] = -2 - j1; > else { > - int min = COST(!j1, i) - v[!j1]; > - for (j = 1; j < column_count; j++) > + int min = INT_MAX; > + for (j = 0; j < column_count; j++) > if (j != j1 && min > COST(j, i) - v[j]) > min = COST(j, i) - v[j]; > v[j1] -= min; > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index 2237c7f4af..fb4c13a84a 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -142,4 +142,9 @@ test_expect_success 'changed message' ' > test_cmp expected actual > ' > > +test_expect_success 'no commits on one side' ' > + git commit --amend -m "new message" && > + git range-diff 
master HEAD@{1} HEAD > +' > + > test_done
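The fixed loop can be restated with toy numbers: seed the minimum with an INT_MAX sentinel and scan every column j != j1, instead of seeding from column !j1, which does not exist when there is only one column. The cost row and column potentials below are made up for illustration:

```shell
# Toy restatement of the fixed reduction-transfer loop: min over all
# columns j != j1 of cost[j] - v[j], starting from an INT_MAX sentinel.
awk 'BEGIN {
	j1 = 1                              # column already assigned to this row
	split("5 2 9", cost); split("1 0 4", v)
	min = 2147483647                    # INT_MAX sentinel, as in the fix
	for (j = 1; j <= 3; j++)
		if (j != j1 && min > cost[j] - v[j])
			min = cost[j] - v[j]
	print min                           # min of 2-0 and 9-4
}'
```

With a single column the loop body never runs and min simply stays at the sentinel, so no out-of-bounds column is ever consulted.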
Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines
"Derrick Stolee via GitGitGadget" writes: > There have been a few bugs in recent patches that would have been caught if > the test suite covered those blocks (including a few of mine). I want to > work towards a "sensible" amount of coverage on new topics. In my opinion, > this means that any logic should be covered, but the 'die()' blocks in error > cases do not need to be covered. Nice. > It is important to not measure the coverage of the codebase by what old code > is not covered. To help, I created the 'contrib/coverage-diff.sh' script. > After creating the coverage statistics at a version (say, 'topic') you can > then run > > contrib/coverage-diff.sh base topic > ... > Using this 'git blame' output, we can quickly inspect whether the uncovered > lines are appropriate. For instance: > ... > I used this approach for 'next' over 'master' and got a larger list, some of > which I have already submitted tests to increase coverage [2] or will be > covered by topics not in 'next' [3]. > > Thanks, -Stolee Thanks for working on this.
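The core idea behind such a script can be sketched without git at all (the file names and line numbers below are made up): intersect the set of lines a topic added with the set of lines gcov reports as never executed, so only "new and uncovered" lines surface.

```shell
#!/bin/sh
# Toy sketch of the coverage-diff idea: the report is the intersection
# of "lines added by the topic" and "lines gcov marked as never run".
tmp=$(mktemp -d)
printf '%s\n' 12 13 40 41 >"$tmp/new_lines"    # added by the topic
printf '%s\n' 13 40 99 >"$tmp/uncovered"       # never executed per gcov
comm -12 "$tmp/new_lines" "$tmp/uncovered"     # new *and* uncovered
```

The real script additionally maps each surviving line back through git blame so the responsible commit is visible at a glance.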
Re: [PATCH v4] test_dir_is_empty: fix edge cases with newlines and hyphens
William Chargin writes: > While the `test_dir_is_empty` function appears correct in most normal > use cases, it can improperly pass if a directory contains a filename > with a newline, and can improperly fail if an empty directory looks like > an argument to `ls`. This patch changes the implementation to check that > the output of `ls -a` has at most two lines (for `.` and `..`), which > should be better behaved, and adds the `--` delimiter before the > directory name when invoking `ls`. AFAIK dot and dot-dot are allowed not to exist; "at most two" is not a good test. Quite honestly, our tests are still run inside a sort-of controlled environment, so if it _requires_ use of things we have avoided depending on, like "ls -A" and "xargs -0", or the fact that most filesystems always have "." and ".." even in an empty directory, in order to be resistant to funnily-named files like dot-LF-dot, I would say it is not worth worrying about these funny names--instead we can simply refrain from using such a pathological name, can't we? In other words, is there a real-world need in the context of our test suite for this change? Also, I find its support for directories whose names begin with a dash a red herring. All the test scripts in our test suite know that they can prefix "./" to avoid problems, i.e.

    test_dir_is_empty ./--wat

So it appears that the only problematic case is when we create a directory, create a file or a directory whose name is dot-LF-dot and nothing else, and then do something that ought to cause that file to disappear, and make sure that the directory is empty, e.g.

    mkdir empty &&
    echo foo >"empty/$dotLFdot" &&
    git add "empty/$dotLFdot" &&
    git reset --hard &&
    test_dir_is_empty empty

We do want to make sure funny names can be added with "git add" and "git reset --hard" to HEAD that lacked those paths with funny names to remove them correctly.
But the funny names used in such a test do not have to be $dotLFdot; you can use "${dotLFdot}X" instead in the above and can ensure whatever the original test wanted to ensure. So...
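For what it's worth, one newline-safe way to check directory emptiness -- a sketch, not what test_dir_is_empty actually does -- is to lean on find(1), which emits output for an entry regardless of which bytes appear in its name, so an empty directory produces no output at all:

```shell
#!/bin/sh
# Sketch: a newline-safe emptiness check; any entry at all, including a
# name like dot-LF-dot, makes find produce at least one output byte.
is_dir_empty () {
	test -z "$(find "$1" -mindepth 1 -maxdepth 1 -print | head -c 1)"
}

dir=$(mktemp -d)
is_dir_empty "$dir" && echo "empty"
printf x >"$dir/$(printf '.\n.')"      # create the pathological name
is_dir_empty "$dir" || echo "not empty"
rm -r "$dir"
```

Whether that robustness is worth the dependency on -mindepth/-maxdepth is exactly the kind of trade-off questioned above.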
Re: [PATCH 1/1] commit-reach: properly peel tags
On Wed, Sep 12, 2018 at 07:22:56AM -0700, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee > > The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e > "commit-reach: make can_all_from_reach... linear" but incorrectly > assumed that all objects provided were commits. During a fetch > negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags > to the 'from' array. The current code creates a segfault. > > Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach' > and add a test in t6600-test-reach.sh that demonstrates this segfault. Thanks, that makes a lot of sense for reproducing. I almost wonder if the whole X_array of commits in test-reach could just go away, and we'd feed whatever list of objects the caller happens to send. That may make it simpler to include non-commit objects in a variety of tests. That said, I didn't look closely at other fallout in the program from that, so I'll defer to your judgement. > Correct the issue by peeling tags when investigating the initial list > of objects in the 'from' array. > > Signed-off-by: Jeff King I'm not sure if this should just be Reported-by, since I don't know that I actually contributed any code. ;) But for anything I might have contributed, certainly you have my signoff. > for (i = 0; i < from->nr; i++) { > - list[i] = (struct commit *)from->objects[i].item; > + struct object *from_one = from->objects[i].item; > > - if (parse_commit(list[i]) || > - list[i]->generation < min_generation) > - return 0; > + from_one = deref_tag(the_repository, from_one, > + "a from object", 0); > + if (!from_one || from_one->type != OBJ_COMMIT) { > + from->objects[i].item->flags |= assign_flag; > + continue; > + } I didn't resurrect the comment from this conditional that was in the original code (mostly because I wasn't sure if the reasoning was still entirely valid, or what setting the flag here actually means). 
But it's probably worth saying something here to explain why it's OK to punt on this case, and what it means to just set the flag. > [...] The rest of the patch looks sane to me. I didn't go through the trouble to reproduce the segfault with the test, but it sounds like you did. -Peff
Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines
On 9/12/2018 12:45 PM, Derrick Stolee via GitGitGadget wrote:

    We have coverage targets in our Makefile for using gcov to display
    line coverage based on our test suite. The way I like to do it is to
    run:

        make coverage-test
        make coverage-report

Very nice; I was unaware of the coverage test make targets. I like the new report; it makes it easier to verify any new changes are actually tested.

    4. The lines in read-cache.c are part of a new block for the
    condition "if (expand_name_field)" as part of an optimization. These
    lines should probably be covered before that series is merged to
    'next'. I understand that Ben and Duy are continuing work in this
    direction [1].

This code is only exercised when the index format is V4 but the default is version 2/3 [1]. Enabling the test suite to use version 4 and test those code paths will require adding a new GIT_TEST_INDEX_VERSION environment variable. I'll add that to my TODO list.

[1] https://git-scm.com/docs/git/2.1.0
[PATCH v1 1/1] Make git_check_attr() a void function
From: Torsten Bögershausen git_check_attr() always returns 0. Remove all the error handling code of the callers, which is never executed. Change git_check_attr() to be a void function. Signed-off-by: Torsten Bögershausen --- archive.c | 3 ++- attr.c | 8 +++- attr.h | 4 ++-- builtin/check-attr.c | 3 +-- builtin/pack-objects.c | 3 +-- convert.c | 42 ++-- ll-merge.c | 16 +++ userdiff.c | 3 +-- ws.c | 44 +++--- 9 files changed, 57 insertions(+), 69 deletions(-) diff --git a/archive.c b/archive.c index 0a07b140fe..c1870105eb 100644 --- a/archive.c +++ b/archive.c @@ -110,7 +110,8 @@ static const struct attr_check *get_archive_attrs(struct index_state *istate, static struct attr_check *check; if (!check) check = attr_check_initl("export-ignore", "export-subst", NULL); - return git_check_attr(istate, path, check) ? NULL : check; + git_check_attr(istate, path, check); + return check; } static int check_attr_export_ignore(const struct attr_check *check) diff --git a/attr.c b/attr.c index 98e4953f6e..60d284796d 100644 --- a/attr.c +++ b/attr.c @@ -1143,9 +1143,9 @@ static void collect_some_attrs(const struct index_state *istate, fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem); } -int git_check_attr(const struct index_state *istate, - const char *path, - struct attr_check *check) +void git_check_attr(const struct index_state *istate, + const char *path, + struct attr_check *check) { int i; @@ -1158,8 +1158,6 @@ int git_check_attr(const struct index_state *istate, value = ATTR__UNSET; check->items[i].value = value; } - - return 0; } void git_all_attrs(const struct index_state *istate, diff --git a/attr.h b/attr.h index 2be86db36e..b0378bfe5f 100644 --- a/attr.h +++ b/attr.h @@ -63,8 +63,8 @@ void attr_check_free(struct attr_check *check); */ const char *git_attr_name(const struct git_attr *); -int git_check_attr(const struct index_state *istate, - const char *path, struct attr_check *check); +void git_check_attr(const struct index_state *istate, +
const char *path, struct attr_check *check); /* * Retrieve all attributes that apply to the specified path. diff --git a/builtin/check-attr.c b/builtin/check-attr.c index c05573ff9c..30a2f84274 100644 --- a/builtin/check-attr.c +++ b/builtin/check-attr.c @@ -65,8 +65,7 @@ static void check_attr(const char *prefix, if (collect_all) { git_all_attrs(&the_index, full_path, check); } else { - if (git_check_attr(&the_index, full_path, check)) - die("git_check_attr died"); + git_check_attr(&the_index, full_path, check); } output_attr(check, file); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d1144a8f7e..eb71dab5be 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -951,8 +951,7 @@ static int no_try_delta(const char *path) if (!check) check = attr_check_initl("delta", NULL); - if (git_check_attr(&the_index, path, check)) - return 0; + git_check_attr(&the_index, path, check); if (ATTR_FALSE(check->items[0].value)) return 1; return 0; diff --git a/convert.c b/convert.c index 6057f1f580..e0848226d2 100644 --- a/convert.c +++ b/convert.c @@ -1297,6 +1297,7 @@ static void convert_attrs(const struct index_state *istate, struct conv_attrs *ca, const char *path) { static struct attr_check *check; + struct attr_check_item *ccheck = NULL; if (!check) { check = attr_check_initl("crlf", "ident", "filter", @@ -1306,30 +1307,25 @@ static void convert_attrs(const struct index_state *istate, git_config(read_convert_config, NULL); } - if (!git_check_attr(istate, path, check)) { - struct attr_check_item *ccheck = check->items; - ca->crlf_action = git_path_check_crlf(ccheck + 4); - if (ca->crlf_action == CRLF_UNDEFINED) - ca->crlf_action = git_path_check_crlf(ccheck + 0); - ca->ident = git_path_check_ident(ccheck + 1); - ca->drv = git_path_check_convert(ccheck + 2); - if (ca->crlf_action != CRLF_BINARY) { - enum eol eol_attr = git_path_check_eol(ccheck + 3); - if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF) - ca->crlf_action = CRLF_AUTO_INPUT; - else if
(ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF) -
Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
Stefan Beller writes: > For Gerrit users that use submodules the invocation of fetch without a > branch is their main use case. That vastly under-explains this commit. It is totally unclear how that statement of fact relates to the problem this patch is trying to address; it does not even make it clear what problem is being addressed by the patch. > > Signed-off-by: Stefan Beller > --- > builtin/fetch.c | 5 - > t/t5526-fetch-submodules.sh | 2 +- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 95c44bf6ffa..ea6ecd123e7 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, > const char *remote_name, > rc |= update_local_ref(ref, what, rm, &note, > summary_width); > free(ref); > - } else > + } else { > + check_for_new_submodule_commits(&rm->old_oid); This happens when there is no "ref", which is set only when rm->peer_ref exists, which is set only when we are using remote tracking branch (or more generally storing the fetched rev somewhere in our refs/ hierarchy), e.g. the rev is recorded only in FETCH_HEAD. What does rm->old_oid have in such a case? Is this the tip of the superproject history we just fetched? When we keep record of what we saw in the previous attempt to fetch, we can tell "we have seen their history up to this old commit before, and now we fetched their history up to this new commit" and the question "during that time, which submodules have been modified in the history of the superproject" becomes answerable. When we are not keeping the record of previous fetch, how would we answer that question without going through the whole history? The answer is that check-for-new does not even do the "old branch tip was X and new branch tip is Y, so we can look only at X..Y"; it only cares about the new branch tip of the superproject, and excludes the existing tips of all branches in the superproject (i.e.
computing something akin to "Y --not --all" instead of "X..Y"). So, I guess this is probably reasonable. But does the call to "check-for-new submodule" need to be unconditional? In this codepath, do we know when we are not doing a recursive fetch in a superproject? If so, perhaps we can omit the cost of going through all the refs to populate ref_tips_before_fetch array in such a case. > format_display(&note, '*', > *kind ? kind : "branch", NULL, > *what ? what : "HEAD", > "FETCH_HEAD", summary_width); > + } > + > if (note.len) { > if (verbosity >= 0 && !shown_url) { > fprintf(stderr, _("From %.*s\n"), > diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh > index af12c50e7dd..a509eabb044 100755 > --- a/t/t5526-fetch-submodules.sh > +++ b/t/t5526-fetch-submodules.sh > @@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when > they are not reachable" ' > git update-ref refs/changes/2 $D && > ( > cd downstream && > - git fetch --recurse-submodules --recurse-submodules-default > on-demand origin refs/changes/2:refs/heads/my_branch && > + git fetch --recurse-submodules origin refs/changes/2 && > git -C submodule cat-file -t $C && > git checkout --recurse-submodules FETCH_HEAD > )
[Question] Signature calculation ignoring parts of binary files
I feel really bad asking this, and I should know the answer, and yet. I have a binary file that needs to go into a repo intact (unchanged). I also have a program that interprets the contents, like a textconv, that can output the relevant portions of the file in whatever format I like - used for diff typically, dumps in 1K chunks by file section. What I'm looking for is to have the SHA1 signature calculated with just the relevant portions of the file so that two actually different files will be considered the same by git during a commit or status. In real terms, I'm trying to ignore the Creator metadata of a JPG because it is mutable and irrelevant to my repo contents. I'm sorry to ask, but I thought this was in .gitattributes but I can't confirm the SHA1 behaviour. Sheepishly, Randall -- Brief whoami: NonStop developer since approximately 2112884442 UNIX developer since approximately 421664400 -- In my real life, I talk too much.
Re: [PATCH 0/1] contrib: Add script to show uncovered "new" lines
On 9/12/2018 12:45 PM, Derrick Stolee via GitGitGadget wrote:
> For example, I ran this against the 'jch' branch (d3c0046) versus 'next' (dd90340)

As another example, I ran this against the 'pu' branch (4c416a53)
versus 'jch' (d3c0046) and got the following output, submitted here
without commentary:

builtin/bisect--helper.c
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 +  43) free((void *) terms->term_good);
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 162) if (get_oid_commit(commit, ))
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 163) return error(_("'%s' is not a valid commit"), commit);
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 164) strbuf_addstr(, commit);
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 172) error(_("Could not check out original HEAD '%s'. Try "
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 174) strbuf_release();
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 175) argv_array_clear();
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 176) return -1;
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 215) error(_("Bad bisect_write argument: %s"), state);
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 216) goto fail;
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 220) error(_("couldn't get the oid of the rev '%s'"), rev);
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 221) goto fail;
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 226) goto fail;
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 230) error_errno(_("couldn't open the file '%s'"), git_path_bisect_log());
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 231) goto fail;
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 242) fail:
0b1f0fd910c (Pranit Bauva 2017-10-27 15:06:37 + 243) retval = -1;
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 323) yesno = git_prompt(_("Are you sure [Y/n]? "), PROMPT_ECHO);
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 324) if (starts_with(yesno, "N") || starts_with(yesno, "n"))
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 327) goto finish;
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 336) error(_("You need to start by \"git bisect start\". You "
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 341) goto fail;
a919f328ba3 (Pranit Bauva 2017-10-27 15:06:37 + 345) fail:
35f7ca528ae (Pranit Bauva 2017-10-27 15:06:37 + 387) return error(_("--bisect-term requires exactly one argument"));
35f7ca528ae (Pranit Bauva 2017-10-27 15:06:37 + 400) error(_("BUG: invalid argument %s for 'git bisect terms'.\n"
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 416) return -1;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 419) goto fail;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 423) goto fail;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 427) fail:
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 428) retval = -1;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 466) no_checkout = 1;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 488) !one_of(arg, "--term-good", "--term-bad", NULL)) {
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 489) return error(_("unrecognised option: '%s'"), arg);
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 523) if (get_oid("HEAD", _oid))
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 524) return error(_("Bad HEAD - I need a HEAD"));
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 539) error(_("checking out '%s' failed. Try 'git "
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 559) return error(_("won't bisect on cg-seek'ed tree"));
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 562) return error(_("Bad HEAD - strange symbolic ref"));
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 570) return -1;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 588) goto fail;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 598) goto fail;
5dfeec316b8 (Pranit Bauva 2017-10-27 15:06:37 + 606) goto fail;
3d3237b0e6b (Pranit Bauva 2017-10-27 15:06:37 + 686) return error(_("--bisect-reset requires either no argument or a commit"));
0b1f0fd910c
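The report above is, at its core, the intersection of two line sets: lines the new commits added, and lines the coverage run never executed. A tiny self-contained sketch of that intersection with synthetic line numbers (this is an illustration of the idea, not the contrib script itself):

```shell
set -e
cd "$(mktemp -d)"

# Line numbers in a source file that the coverage run marked as never
# executed (what gcov prints as "#####", already extracted here).
printf '3\n7\n' >uncovered.txt

# Line numbers the new commits added (what parsing a
# "git diff --unified=0 base..tip" would yield).
printf '2\n3\n4\n' >added.txt

# Lines that are both new and uncovered -- the interesting ones,
# since the new code was never exercised by the tests.
sort uncovered.txt added.txt | uniq -d
```

The blame-formatted lines in the report (`0b1f0fd910c (Pranit Bauva ...)`) suggest the script then attributes each surviving line, presumably by running `git blame` on it.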
Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
On Wed, Sep 12, 2018 at 11:18 AM Junio C Hamano wrote:
>
> Stefan Beller writes:
>
> > We can string_list_insert() to maintain sorted-ness of the
> > list as we find new items, or we can string_list_append() to
> > build an unsorted list and sort it at the end just once.
> >
> > To pick which one is more appropriate, we notice the fact
> > that we discover new items more or less in the already
> > sorted order. That makes "append then sort" more
> > appropriate.
>
> Sorry, but I still do not get the math you are implying in the
> second paragraph. Are you saying that append-then-sort is efficient
> when the items being appended are already sorted? That depends on
> the sorting algorithm used, so the logic is incomplete unless you
> say "given that we use X for sorting,...", I think.
>
> Do we really discover new items in sorted order, by the way? In a
> single diff invocation made inside collect_changed_submodules() for
> one commit in the superproject's history, we will grab changed paths
> in the pathname order (i.e. sorted); if the superproject's tip
> commit touches the submodules at paths A and Z, we will discover
> these two paths in sorted order.
>
> But because we are walking the superproject's history to collect all
> paths that have been affected in that function, and repeatedly
> calling diff as we discover commits in the superproject's history, I
> am not sure how well the resulting set of paths would be sorted.
>
> The tip commit in the superproject's history may have modified the
> submodule at path X, the parent of that commit may have touched the
> submodule at path M, and its parent may have touched the submodule
> at path A. Don't we end up grabbing these paths in that discovered
> order, i.e. X, M and A?

That is true.

> I still think changing it from "insert as we find an item, keeping
> the list sorted" to "append all and then sort before we start
> looking things up from the result" makes sense, but I do not think
> the "we find things in sorted order" is either true, or that it
> would affect the choice between the two. A justification to choose
> the latter I can think of that makes sense is that we don't have to
> pay the cost to keep the list sorted while building it, because we
> do not do any look-up while building the list.

ok.

Thanks,
Stefan
Re: [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched
Stefan Beller writes:

> Retry fetching a submodule if the object id that the superproject points
> to, cannot be found.

By object name? By attempting to fetch all refs? Or by doing something
else? Fetching by the exact object name is the most efficient approach,
but the server side may not be prepared to serve such a request, and
that is why spelling it out here would help the readers.

> This doesn't support fetching to FETCH_HEAD yet, but only into a local
> branch.

It is not clear if this sentence is talking about the fetch done at the
superproject level, or what happens in a submodule repository when this
"retrying" happens. Assuming that it is the former, perhaps

    This retrying does not happen when the "git fetch" done at the
    superproject is not storing the fetched results in remote tracking
    branches (i.e. instead just recording them to FETCH_HEAD) in this
    step.

would help the readers understand what you are trying to say.

> To make fetching into FETCH_HEAD work, we need some refactoring
> in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
> that is coming in the next patch.
>
> Signed-off-by: Stefan Beller
> ---
>  builtin/fetch.c | 9 ++--
>  submodule.c | 87 -
>  t/t5526-fetch-submodules.sh | 16 +++
>  3 files changed, 104 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 61bec5d213d..95c44bf6ffa 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
> 			what = _("[new ref]");
> 		}
>
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
> 			check_for_new_submodule_commits(>new_oid);
> 		r = s_update_ref(msg, ref, 0);
> 		format_display(display, r ? '!' : '*', what,
> @@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
> 		strbuf_add_unique_abbrev(, >object.oid,
> 					 DEFAULT_ABBREV);
> 		strbuf_addstr(, "..");
> 		strbuf_add_unique_abbrev(, >new_oid,
> 					 DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
> 			check_for_new_submodule_commits(>new_oid);
> 		r = s_update_ref("fast-forward", ref, 1);
> 		format_display(display, r ? '!' : ' ', quickref.buf,
> @@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
> 		strbuf_add_unique_abbrev(, >object.oid,
> 					 DEFAULT_ABBREV);
> 		strbuf_addstr(, "...");
> 		strbuf_add_unique_abbrev(, >new_oid,
> 					 DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
> 			check_for_new_submodule_commits(>new_oid);
> 		r = s_update_ref("forced-update", ref, 1);
> 		format_display(display, r ? '!' : '+', quickref.buf,

All of these used to react to any value set to recurse-submodules that
is not off or on (i.e. on-demand, default, none), but now unless the
value is explicitly set to off, we call into the check. It is not
immediately clear how that change is linked to the retrying.

"When set to 'on', we did not check for new commits, but now we do"
can be read from the patch text, but not the reasoning behind it. What
was the reason why we didn't call "check-for-new" when recurse is set
to "on"? Is it because "we are going to recurse anyway, so there is no
need to check to decide if we need to fetch in submodule"? And the
reason why we now need to call when we are set to recurse anyway is
because check-for-new now learns much more than just "do we need to cd
there and run git-fetch? yes/no?"

The answers to these two questions would help readers in the log
message.

> diff --git a/submodule.c b/submodule.c
> index 1e6781504f0..a75146e89cf 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
> 	int result;
>
> 	struct string_list changed_submodule_names;
> +	struct string_list retry;
> };
> -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
> +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
> +		  STRING_LIST_INIT_DUP, \
> +		  STRING_LIST_INIT_NODUP}
>
> static void calculate_changed_submodule_paths(
> 	struct submodule_parallel_fetch *spf)
> @@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
> {
> 	int ret = 0;
> 	struct
[PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)
On 09/11, Thomas Gummerer wrote:
> On 09/11, Thomas Gummerer wrote:
> > I think you're on the right track here. I can not test this on Mac
> > OS, but on Linux, the following fails when running the test under
> > valgrind:
> >
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..a8b0ef8c1d 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > 	test_cmp expected actual
> > '
> >
> > +test_expect_success 'amend and check' '
> > +	git commit --amend -m "new message" &&
> > +	git range-diff master HEAD@{1} HEAD
> > +'
> > +
> > test_done
> >
> > valgrind gives me the following:
> >
> > ==18232== Invalid read of size 4
> > ==18232==    at 0x34D7B5: compute_assignment (linear-assignment.c:54)
> > ==18232==    by 0x2A4253: get_correspondences (range-diff.c:245)
> > ==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==    by 0x122698: run_builtin (git.c:418)
> > ==18232==    by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==    by 0x122BCC: run_argv (git.c:689)
> > ==18232==    by 0x122D90: cmd_main (git.c:766)
> > ==18232==    by 0x1D55A3: main (common-main.c:45)
> > ==18232== Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
> > ==18232==    at 0x483777F: malloc (vg_replace_malloc.c:299)
> > ==18232==    by 0x3381B0: do_xmalloc (wrapper.c:60)
> > ==18232==    by 0x338283: xmalloc (wrapper.c:87)
> > ==18232==    by 0x2A3F8C: get_correspondences (range-diff.c:207)
> > ==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==    by 0x122698: run_builtin (git.c:418)
> > ==18232==    by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==    by 0x122BCC: run_argv (git.c:689)
> > ==18232==    by 0x122D90: cmd_main (git.c:766)
> > ==18232==    by 0x1D55A3: main (common-main.c:45)
> > ==18232==
> >
> > I'm looking into why that fails. Also adding Dscho to Cc here as the
> > author of this code.
>
> The diff below seems to fix it. Not submitting this as a proper
> patch [...]

I found the time to actually have a look at the paper, so here's a
proper patch. I'm still not entirely sure what the initial code tried
to do here, but I think staying as close as possible to the original is
probably our best option here, also for future readers of this code.

--- >8 ---
Subject: [PATCH] linear-assignment: fix potential out of bounds memory access

Currently the 'compute_assignment()' function may read memory out of
bounds, even if used correctly. Namely this happens when we only have
one column. In that case we try to calculate the initial minimum cost
using '!j1' as column in the reduction transfer code. That in turn
causes us to try and get the cost from column 1 in the cost matrix,
which does not exist, and thus results in an out of bounds memory
read.

Instead of trying to initialize the minimum cost from another column,
just set it to INT_MAX. This also matches what the example code in the
original paper for the algorithm [1] does (it initializes the value to
inf, for which INT_MAX is the closest match in C).

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

Also start from 0 in the loop, which matches what the example code in
the original paper does as well. Starting from 1 means we'd ignore the
first column during the reduction transfer phase. Note that in the
original paper the loop does start from 1, but the implementation is
in Pascal, where arrays are 1 indexed.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
algorithm for dense and sparse linear assignment problems. Computing,
38(4), 325–340.

Reported-by: ryenus
Helped-by: Derrick Stolee
Signed-off-by: Thomas Gummerer
---
 linear-assignment.c   | 4 ++--
 t/t3206-range-diff.sh | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..7700b80eeb 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
 	else if (j1 < -1)
 		row2column[i] = -2 - j1;
 	else {
-		int min = COST(!j1, i) - v[!j1];
-		for (j = 1; j < column_count; j++)
+		int min = INT_MAX;
+		for (j = 0; j < column_count; j++)
 			if (j != j1 && min > COST(j, i) - v[j])
 				min = COST(j, i) - v[j];
 		v[j1] -= min;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..fb4c13a84a 100755
--- a/t/t3206-range-diff.sh
+++
[PATCH v4] test_dir_is_empty: fix edge cases with newlines and hyphens
While the `test_dir_is_empty` function appears correct in most normal
use cases, it can improperly pass if a directory contains a filename
with a newline, and can improperly fail if an empty directory looks
like an argument to `ls`.

This patch changes the implementation to check that the output of
`ls -a` has at most two lines (for `.` and `..`), which should be
better behaved, and adds the `--` delimiter before the directory name
when invoking `ls`. The newly added unit test fails before this change
and passes after it.

Signed-off-by: William Chargin
---
This patch depends on "t: factor out FUNNYNAMES as shared lazy prereq"
(2018-08-06), which is now in master.

I originally wrote this patch for the standalone Sharness library, but
that library advises that such patches be sent to the Git mailing list
first. Tested on GNU/Linux (Mint 18.2) and macOS (10.13).

 t/t-basic.sh            | 43 +
 t/test-lib-functions.sh |  2 +-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4e..a5c57c6aa5 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -821,6 +821,49 @@ test_expect_success 'tests clean up even on failures' "
 	EOF
 "

+test_expect_success FUNNYNAMES \
+	'test_dir_is_empty behaves even in pathological cases' "
+	run_sub_test_lib_test \
+		dir-empty 'behavior of test_dir_is_empty' <<-\\EOF &&
+	test_expect_success 'should pass with actually empty directory' '
+		mkdir empty_dir &&
+		test_dir_is_empty empty_dir
+	'
+	test_expect_success 'should fail with a normal filename' '
+		mkdir nonempty_dir &&
+		>nonempty_dir/some_file &&
+		! test_dir_is_empty nonempty_dir
+	'
+	test_expect_success 'should fail with dot-newline-dot filename' '
+		mkdir pathological_dir &&
+		>\"pathological_dir/.
+	.\" &&
+		! test_dir_is_empty pathological_dir
+	'
+	test_expect_success 'should pass with an empty directory \"-l\"' '
+		mkdir -- -l &&
+		test_dir_is_empty -l &&
+		rmdir -- -l
+	'
+	test_expect_success 'should pass with an empty directory \"--wat\"' '
+		mkdir -- --wat &&
+		test_dir_is_empty --wat &&
+		rmdir -- --wat
+	'
+	test_done
+	EOF
+	check_sub_test_lib_test dir-empty <<-\\EOF
+	> ok 1 - should pass with actually empty directory
+	> ok 2 - should fail with a normal filename
+	> ok 3 - should fail with dot-newline-dot filename
+	> ok 4 - should pass with an empty directory \"-l\"
+	> ok 5 - should pass with an empty directory \"--wat\"
+	> # passed all 5 test(s)
+	> 1..5
+	EOF
+"
+
+
 # Basics of the basics

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 4207af4077..3df6b8027f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -576,7 +576,7 @@ test_path_exists () {
 # Check if the directory exists and is empty as expected, barf otherwise.
 test_dir_is_empty () {
 	test_path_is_dir "$1" &&
-	if test -n "$(ls -a1 "$1" | egrep -v '^\.\.?$')"
+	if test "$(ls -a1 -- "$1" | wc -l)" -gt 2
 	then
 		echo "Directory '$1' is not empty, it contains:"
 		ls -la "$1"
--
2.18.0.549.gd66323a05
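The dot-newline-dot failure mode this patch tests can be reproduced outside the test suite. A minimal sketch, assuming an `ls` that writes raw filenames when its output is piped (as GNU and BSD ls do):

```shell
set -e
dir=$(mktemp -d)

# A file literally named dot-newline-dot, as in the new test.
: > "$dir/.
."

# Old check: every output line of "ls -a1" is exactly "." or "..",
# so the egrep -v filter deletes them all and the directory
# incorrectly "looks" empty (zero bytes survive the filter).
old=$(ls -a1 "$dir" | grep -E -v '^\.\.?$' | wc -c | tr -d ' ')

# New check: count lines instead; "." and ".." contribute two lines,
# and the funny filename contributes two more, so 4 > 2 means
# "not empty", which is the correct answer.
new=$(ls -a1 "$dir" | wc -l | tr -d ' ')

echo "old filter kept $old bytes; new check counted $new lines"
```

(`grep -E` is used instead of the deprecated `egrep` spelling; the behavior is the same.)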
Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
Stefan Beller writes:

> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that git-fetch actually doesn't
> need to be run in the worktree. So let's run it in the git dir instead.

It may be clear to the author but not clear to the reader of the above
paragraph that "worktree", "fetch" and "git dir" all refer to the
recursively invoked operation that updates the submodule's repository.
s/git-fetch/"git fetch" for the submodule/ should be sufficient to help
the readers.

> That should pave the way towards fetching submodules that are currently
> not checked out.

Very good.

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> +	prepare_submodule_repo_env_no_git_dir(out);
> +	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> +}
> +
> /* Helper function to display the submodule header line prior to the full
>  * summary output. If it can locate the submodule objects directory it will
>  * attempt to lookup both the left and right commits and put them into the
> @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
> 	return spf->default_option;
> }
>
> +static const char *get_submodule_git_dir(struct repository *r, const char *path)
> +{
> +	struct repository subrepo;
> +	const char *ret;
> +
> +	if (repo_submodule_init(, r, path)) {
> +		/* no entry in .gitmodules? */
> +		struct strbuf gitdir = STRBUF_INIT;
> +		strbuf_repo_worktree_path(, r, "%s/.git", path);
> +		if (repo_init(, gitdir.buf, NULL)) {
> +			strbuf_release();
> +			return NULL;
> +		}

This is for the modern "absorbed" layout? Do we get a notice and
encouragement to migrate from the historical layout, or is there no
need to (e.g. the migration happens automatically in some other
codepaths)?

> +	}
> +
> +	ret = xstrdup(subrepo.gitdir);
> +	repo_clear();
> +
> +	return ret;
> +}

The returned value from this function is xstrdup()'ed, so the caller
owns it, not borrows it. There is no need to return "const char *"
from this function. Also the caller needs to free it once done.

> static int get_next_submodule(struct child_process *cp,
> 	struct strbuf *err, void *data, void **task_cb)
> {
> @@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
> 	struct submodule_parallel_fetch *spf = data;
>
> 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> -		struct strbuf submodule_path = STRBUF_INIT;
> -		struct strbuf submodule_git_dir = STRBUF_INIT;
> 		struct strbuf submodule_prefix = STRBUF_INIT;
> 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
> 		const char *git_dir, *default_argv;
> @@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
> 			continue;
> 		}
>
> -		strbuf_repo_worktree_path(_path, spf->r, "%s", ce->name);
> -		strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
> 		strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
> -		git_dir = read_gitfile(submodule_git_dir.buf);
> -		if (!git_dir)
> -			git_dir = submodule_git_dir.buf;
> -		if (is_directory(git_dir)) {

In the old code, git_dir came from read_gitfile(), which borrowed.

> +		git_dir = get_submodule_git_dir(spf->r, ce->name);

In the new code, we own it, so we'd eventually need to get rid of it.
How does that happen?

> +		if (git_dir) {
> 			child_process_init(cp);
> -			cp->dir = strbuf_detach(_path, NULL);
> -			prepare_submodule_repo_env(>env_array);
> +			prepare_submodule_repo_env_in_gitdir(>env_array);
> +			cp->dir = git_dir;

Does cp now own it, and does cp->dir get freed once
run_processes_parallel() is done with this task? Or is cp->dir simply
leaking? The old code gave the result of strbuf_detach(), so even if
cp->dir is leaking, the leak is not new in this patch.

> 			cp->git_cmd = 1;
> 			if (!spf->quiet)
> 				strbuf_addf(err, "Fetching submodule %s%s\n",
> @@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
> 			argv_array_push(>args, submodule_prefix.buf);
> 			ret = 1;
> 		}
> -		strbuf_release(_path);
> -		strbuf_release(_git_dir);

But if it is a leak, it is easily plugged by freeing git_dir here, I
think. Thanks.

> 		strbuf_release(_prefix);
> 		if (ret) {
> 			spf->count++;
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index
Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
Stefan Beller writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order. That makes "append then sort" more
> appropriate.

Sorry, but I still do not get the math you are implying in the second
paragraph. Are you saying that append-then-sort is efficient when the
items being appended are already sorted? That depends on the sorting
algorithm used, so the logic is incomplete unless you say "given that
we use X for sorting,...", I think.

Do we really discover new items in sorted order, by the way? In a
single diff invocation made inside collect_changed_submodules() for
one commit in the superproject's history, we will grab changed paths
in the pathname order (i.e. sorted); if the superproject's tip commit
touches the submodules at paths A and Z, we will discover these two
paths in sorted order.

But because we are walking the superproject's history to collect all
paths that have been affected in that function, and repeatedly calling
diff as we discover commits in the superproject's history, I am not
sure how well the resulting set of paths would be sorted.

The tip commit in the superproject's history may have modified the
submodule at path X, the parent of that commit may have touched the
submodule at path M, and its parent may have touched the submodule at
path A. Don't we end up grabbing these paths in that discovered order,
i.e. X, M and A?

I still think changing it from "insert as we find an item, keeping the
list sorted" to "append all and then sort before we start looking
things up from the result" makes sense, but I do not think the "we
find things in sorted order" is either true, or that it would affect
the choice between the two. A justification to choose the latter I can
think of that makes sense is that we don't have to pay the cost to
keep the list sorted while building it, because we do not do any
look-up while building the list.

> Signed-off-by: Stefan Beller
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index d29dfa3d1f5..c6eff7699f3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
> 	case RECURSE_SUBMODULES_DEFAULT:
> 	case RECURSE_SUBMODULES_ON_DEMAND:
> 		if (!submodule ||
> -		    !unsorted_string_list_lookup(
> +		    !string_list_lookup(
> 				_submodule_names,
> 				submodule->name))
> 			continue;
> @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
> 	/* default value, "--submodule-prefix" and its value are added later */
>
> 	calculate_changed_submodule_paths();
> +	string_list_sort(_submodule_names);
> 	run_processes_parallel(max_parallel_jobs,
> 			       get_next_submodule,
> 			       fetch_start_failure,
Re: [PATCH 3/9] submodule.c: fix indentation
Stefan Beller writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.

Makes sense. Thanks.

> Signed-off-by: Stefan Beller
> ---
>  submodule.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index a2b266fbfae..d29dfa3d1f5 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
> 	if (!submodule) {
> 		const char *name = default_name_or_path(ce->name);
> 		if (name) {
> -			default_submodule.path = default_submodule.name
> 				= name;
> +			default_submodule.path = name;
> +			default_submodule.name = name;
> 			submodule = _submodule;
> 		}
> 	}
> @@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
> 	default:
> 	case RECURSE_SUBMODULES_DEFAULT:
> 	case RECURSE_SUBMODULES_ON_DEMAND:
> -		if (!submodule || !unsorted_string_list_lookup(_submodule_names,
> -				submodule->name))
> +		if (!submodule ||
> +		    !unsorted_string_list_lookup(
> +				_submodule_names,
> +				submodule->name))
> 			continue;
> 		default_argv = "on-demand";
> 		break;
Re: [PATCH 2/9] sha1-array: provide oid_array_filter
Stefan Beller writes:

> Helped-by: Junio C Hamano
> Signed-off-by: Stefan Beller
> ---
>  sha1-array.c | 18 ++
>  sha1-array.h |  5 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/sha1-array.c b/sha1-array.c
> index 265941fbf40..76323935dd7 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,21 @@ int oid_array_for_each_unique(struct oid_array *array,
> 	}
> 	return 0;
> }
> +
> +int oid_array_filter(struct oid_array *array,
> +		     for_each_oid_fn fn,

It probably makes sense to call it "want" instead of "fn" to match
object_array_filter().

> +		     void *cbdata)
> +{
> +	int src, dst;
> +
> +	for (src = dst = 0; src < array->nr; src++) {
> +		if (fn(>oid[src], cbdata)) {
> +			if (dst < src)
> +				oidcpy(>oid[dst], >oid[src]);
> +			dst++;
> +		}

In fact, matching the implementation of object_array_filter() may also
make sense, as I do not see a strong reason why the resulting code
would become better by rewriting "dst != src" over there to "dst < src"
here.

> +	}
> +	array->nr = dst;
> +
> +	return 0;
> +}
> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..a2d7c210835 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -23,4 +23,9 @@ int oid_array_for_each_unique(struct oid_array *array,
> 			      for_each_oid_fn fn,
> 			      void *data);
>
> +/* Call fn for each oid, and remove it if fn returns 0, retain it otherwise */

Also perhaps mimic the wording of object_array_filter()'s comment? I
find it easier that the latter says "retaining only if X" instead of
saying "remove if !X, retain otherwise"; it's both shorter and more to
the point. It also is nicer that it notes that the order is preserved.

> +int oid_array_filter(struct oid_array *array,
> +		     for_each_oid_fn fn,
> +		     void *cbdata);
> +

Other than that, the function makes sense very much, and the callsite
we see in patch 8/9 does, too. Thanks.

> #endif /* SHA1_ARRAY_H */
Re: [PATCH 1/9] string-list: add string_list_{pop, last} functions
Stefan Beller writes:

> Add a few functions to allow a string-list to be used as a stack:
>
> - string_list_last() lets a caller peek the string_list_item at the
>   end of the string list. The caller needs to be aware that it is
>   borrowing a pointer, which can become invalid if/when the
>   string_list is resized.
>
> - string_list_pop() removes the string_list_item at the end of
>   the string list.
>
>   - _pop usually has a friend _push. This role is taken by
>     string_list_append already, as they are not symmetrical
>     in our code base: _append returns the pointer, such that
>     adding a util is easy, but _pop doesn't return such a pointer.

This third item somehow is indented one place too deeply.

"our pop() does not return a pointer, so we won't introduce push()
that does return a pointer"? Am I reading the third item right?

pop() and push() usually go together, but they are opposite
operations, they are expected to behave oppositely, and they are
expected to be interfaced differently. pop() that does not return a
pointer is no better or no worse a match to push() that returns a
pointer, no?

Lack of something_push(), when something_pop() exists, would always be
surprising to anybody new to the "something" API, and no amount of
mumbling would justify it, I would think, but this third item sounds
like it is trying to make a lame excuse when there is no need to.
Wouldn't it be sufficient to say "there is no string_list_push();
string_list_append() can be used in its place" and stop there?

> You can use them in this pattern:
>
> 	while (list.nr) {
> 		struct string_list_item *item = string_list_last();
>
> 		work_on(item);
> 		string_list_pop();
> 	}

"free_util" as the second argument to this sample call is missing.

>  string-list.c | 14 ++
>  string-list.h | 11 +++
>  2 files changed, 25 insertions(+)
>
> diff --git a/string-list.c b/string-list.c
> index 771c4550980..04db2b537c0 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
> 	}
> }
>
> +void string_list_pop(struct string_list *list, int free_util)
> +{
> +	if (list->nr == 0)
> +		BUG("tried to remove an item from empty string list");
> +
> +	if (list->strdup_strings)
> +		free(list->items[list->nr - 1].string);
> +
> +	if (free_util)
> +		free(list->items[list->nr - 1].util);
> +
> +	list->nr--;
> +}
> +
> int string_list_has_string(const struct string_list *list, const char *string)
> {
> 	int exact_match;
> diff --git a/string-list.h b/string-list.h
> index ff8f6094a33..ce2528bbe15 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, const char *string,
>  */
> struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
>
> +/**
> + * Removes the last item from the list.
> + * The caller must ensure that the list is not empty.
> + */
> +void string_list_pop(struct string_list *list, int free_util);
> +
> +static inline struct string_list_item *string_list_last(struct string_list *list)

We may want to warn users, in an in-code comment or docstring, that
pop(), append(), etc. shouldn't be done while using the value returned
from this function.

> +{
> +	return >items[list->nr - 1];
> +}
> +

Other than that, nicely done.

> /*
>  * Remove all but the first of consecutive entries with the same
>  * string value. If free_util is true, call free() on the util
Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)
On Tue, Sep 11 2018, Junio C Hamano wrote: > Here are the topics that have been cooking. Commits prefixed with > '-' are only in 'pu' (proposed updates) while commits prefixed with > '+' are in 'next'. The ones marked with '.' do not appear in any of > the integration branches, but I am still holding onto them. Can you please pick up v5 of the patches René & I had for various fsck skipList improvements, at: https://public-inbox.org/git/20180903144928.30691-1-ava...@gmail.com/ Looks like v5 is ready to be picked up, and might have fallen through the cracks during the release period.
Starting subshells via tags
Hello - This is not a git bug or issue. It's just something I stumbled across and didn't see any google results for. I thought others would benefit from being aware of it. I saw a Makefile which used the output of "git describe --tags --dirty" to define version information for a binary's --version command line parameter. The commit/tag information was passed to g++ in a Makefile via: CXXFLAGS += -DBUILD_COMMIT="\"$(shell git describe --tags --dirty)\"" For fun (on Linux) I made a simple C++ program and Makefile with the above CXXFLAGS, and a tag (backticks work too): git tag '$(echo>/tmp/test.txt)' Then built. Make executes the git command via a shell and the shell executes the subshell. /tmp/test.txt was created. The tags themselves don't allow spaces so the complexity of the command is limited, though I didn't explore what I could do with chaining shells or escape characters. It's easy enough to add a script to the repository where the tag is located and execute that script from a tag's subshell, with a tag such as $(./test.sh). Again, this is not at all a git issue, git is just used as the transport. As with every other shell attack, it comes down to "always sanitize what you pass through to a shell". Or don't pass it to the shell at all, use another mechanism.
[PATCH 0/1] contrib: Add script to show uncovered "new" lines
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage counts for every line, and "#" at every uncovered line. There have been a few bugs in recent patches that would have been caught if the test suite covered those blocks (including a few of mine). I want to work towards a "sensible" amount of coverage on new topics. In my opinion, this means that any logic should be covered, but the 'die()' blocks in error cases do not need to be covered. It is important to not measure the coverage of the codebase by what old code is not covered. To help, I created the 'contrib/coverage-diff.sh' script. After creating the coverage statistics at a version (say, 'topic') you can then run contrib/coverage-diff.sh base topic to see the lines added between 'base' and 'topic' that are not covered by the test suite.
For example, I ran this against the 'jch' branch (d3c0046) versus 'next' (dd90340) and got the following output: builtin/commit.c 859fdc0c3cf (Derrick Stolee 2018-08-29 05:49:04 -0700 1657) write_commit_graph_reachable(get_object_directory(), 0); builtin/rev-list.c 250edfa8c87 (Harald Nordgren 2018-04-18 23:05:35 +0200 431) bisect_flags |= BISECT_FIND_ALL; builtin/worktree.c e5353bef550 (Eric Sunshine 2018-08-28 17:20:19 -0400 60) error_errno(_("failed to delete '%s'"), sb.buf); e19831c94f9 (Eric Sunshine 2018-08-28 17:20:23 -0400 251) die(_("unable to re-add worktree '%s'"), path); 68a6b3a1bd4 (Eric Sunshine 2018-08-28 17:20:24 -0400 793) die(_("cannot move a locked working tree, lock reason: %s\nuse 'move -f -f' to override or unlock first"), f4143101cbb (Eric Sunshine 2018-08-28 17:20:25 -0400 906) die(_("cannot remove a locked working tree, lock reason: %s\nuse 'remove -f -f' to override or unlock first"), read-cache.c 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1754) const unsigned char *cp = (const unsigned char *)name; 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1757) previous_len = previous_ce ?
previous_ce->ce_namelen : 0; 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1758) strip_len = decode_varint(); 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1759) if (previous_len < strip_len) { 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1760) if (previous_ce) 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1761) die(_("malformed name field in the index, near path '%s'"), 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1762) previous_ce->name); 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1764) die(_("malformed name field in the index in the first path")); 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1766) copy_len = previous_len - strip_len; 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1767) name = (const char *)cp; 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1773) len += copy_len; 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1794) if (copy_len) 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1795) memcpy(ce->name, previous_ce->name, copy_len); 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1796) memcpy(ce->name + copy_len, name, len + 1 - copy_len); 67922a3 (Nguyễn Thái Ngọc Duy 2018-09-02 15:19:33 +0200 1797) *ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len; remote-curl.c c3b9bc94b9b (Elijah Newren 2018-09-05 10:03:07 -0700 181) options.filter = xstrdup(value); Using this 'git blame' output, we can quickly inspect whether the uncovered lines are appropriate. For instance: 1. The line in builtin/commit.c is due to writing the commit-graph file when GIT_TEST_COMMIT_GRAPH is enabled, which is not on by default in the test suite. Being uncovered is expected here. 2. The lines in builtin/worktree.c are all related to error conditions. This is acceptable. 3. The line in builtin/rev-list.c is a flag replacement in a block that is otherwise unchanged. It must not be covered by the test suite normally. 
This could be worth adding a test to ensure the new logic maintains old behavior. 4. The lines in
[PATCH 1/1] contrib: add coverage-diff script
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage counts for every line, and "#" at every uncovered line. There have been a few bugs in recent patches that would have been caught if the test suite covered those blocks (including a few of mine). I want to work towards a "sensible" amount of coverage on new topics. In my opinion, this means that any logic should be covered, but the 'die()' blocks in error cases do not need to be covered. It is important to not measure the coverage of the codebase by what old code is not covered. To help, I created the 'contrib/coverage-diff.sh' script. After creating the coverage statistics at a version (say, 'topic') you can then run contrib/coverage-diff.sh base topic to see the lines added between 'base' and 'topic' that are not covered by the test suite. The output uses 'git blame -c' format so you can find the commits responsible and view the line numbers for quick access to the context. Signed-off-by: Derrick Stolee --- contrib/coverage-diff.sh | 70 1 file changed, 70 insertions(+) create mode 100755 contrib/coverage-diff.sh diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh new file mode 100755 index 00..22acb13d38 --- /dev/null +++ b/contrib/coverage-diff.sh @@ -0,0 +1,70 @@ +#!/bin/bash + +# Usage: 'contrib/coverage-diff.sh <version1> <version2>' +# Outputs a list of new lines in version2 compared to version1 that are +# not covered by the test suite. Assumes you ran +# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+ +V1=$1 +V2=$2 + +diff-lines() { +local path= +local line= +while read; do + esc=$'\033' + if [[ $REPLY =~ ---\ (a/)?.* ]]; then + continue + elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then + path=${BASH_REMATCH[2]} + elif [[ $REPLY =~ @@\ -[0-9]+(,[0-9]+)?\ \+([0-9]+)(,[0-9]+)?\ @@.* ]]; then + line=${BASH_REMATCH[2]} + elif [[ $REPLY =~ ^($esc\[[0-9;]+m)*([\ +-]) ]]; then + echo "$path:$line:$REPLY" + if [[ ${BASH_REMATCH[2]} != - ]]; then + ((line++)) + fi + fi +done +} + +git diff --raw $V1 $V2 | grep \.c$ | awk 'NF>1{print $NF}' >files.txt + +for file in $(cat files.txt) +do + hash_file=${file//\//\#} + + git diff $V1 $V2 -- $file \ + | diff-lines \ + | grep ":+" \ + >"diff_file.txt" + + cat diff_file.txt \ + | sed -E 's/:/ /g' \ + | awk '{print $2}' \ + | sort \ + >new_lines.txt + + cat "$hash_file.gcov" \ + | grep \#\#\#\#\# \ + | sed 's/#: //g' \ + | sed 's/\:/ /g' \ + | awk '{print $1}' \ + | sort \ + >uncovered_lines.txt + + comm -12 uncovered_lines.txt new_lines.txt \ + | sed -e 's/$/\)/' \ + | sed -e 's/^/\t/' \ + >uncovered_new_lines.txt + + grep -q '[^[:space:]]' < uncovered_new_lines.txt && \ + echo $file && \ + git blame -c $file \ + | grep -f uncovered_new_lines.txt + + rm -f diff_file.txt new_lines.txt \ + uncovered_lines.txt uncovered_new_lines.txt +done + +rm -rf files.txt -- gitgitgadget
[PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension
The End of Index Entry (EOIE) is used to locate the end of the variable length index entries and the beginning of the extensions. Code can take advantage of this to quickly locate the index extensions without having to parse through all of the index entries. Because it must be able to be loaded before the variable length cache entries and other index extensions, this extension must be written last. The signature for this extension is { 'E', 'O', 'I', 'E' }. The extension consists of: - 32-bit offset to the end of the index entries - 160-bit SHA-1 over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: SHA-1("TREE" + <binary representation of N> + "REUC" + <binary representation of M>) Signed-off-by: Ben Peart --- Documentation/technical/index-format.txt | 23 read-cache.c | 154 +-- t/README | 5 + t/t1700-split-index.sh | 1 + 4 files changed, 175 insertions(+), 8 deletions(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index db3572626b..6bc2d90f7f 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -314,3 +314,26 @@ The remaining data of each directory block is grouped by type: - An ewah bitmap, the n-th bit indicates whether the n-th index entry is not CE_FSMONITOR_VALID. + +== End of Index Entry + + The End of Index Entry (EOIE) is used to locate the end of the variable + length index entries and the beginning of the extensions. Code can take + advantage of this to quickly locate the index extensions without having + to parse through all of the index entries. + + Because it must be able to be loaded before the variable length cache + entries and other index extensions, this extension must be written last. + The signature for this extension is { 'E', 'O', 'I', 'E' }.
+ + The extension consists of: + + - 32-bit offset to the end of the index entries + + - 160-bit SHA-1 over the extension types and their sizes (but not + their contents). E.g. if we have "TREE" extension that is N-bytes + long, "REUC" extension that is M-bytes long, followed by "EOIE", + then the hash would be: + + SHA-1("TREE" + <binary representation of N> + + "REUC" + <binary representation of M>) diff --git a/read-cache.c b/read-cache.c index 7b1354d759..858935f123 100644 --- a/read-cache.c +++ b/read-cache.c @@ -43,6 +43,7 @@ #define CACHE_EXT_LINK 0x6c696e6b/* "link" */ #define CACHE_EXT_UNTRACKED 0x554E5452 /* "UNTR" */ #define CACHE_EXT_FSMONITOR 0x46534D4E /* "FSMN" */ +#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */ /* changes that can be kept in $GIT_DIR/index (basically all extensions) */ #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \ @@ -1693,6 +1694,9 @@ static int read_index_extension(struct index_state *istate, case CACHE_EXT_FSMONITOR: read_fsmonitor_extension(istate, data, sz); break; + case CACHE_EXT_ENDOFINDEXENTRIES: + /* already handled in do_read_index() */ + break; default: if (*ext < 'A' || 'Z' < *ext) return error("index uses %.4s extension, which we do not understand", @@ -1889,6 +1893,11 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries) return ondisk_size + entries * per_entry; } +#ifndef NO_PTHREADS +static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size); +#endif +static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset); + /* remember to discard_cache() before reading a different cache!
*/ int do_read_index(struct index_state *istate, const char *path, int must_exist) { @@ -2198,11 +2207,15 @@ static int ce_write(git_hash_ctx *context, int fd, void *data, unsigned int len) return 0; } -static int write_index_ext_header(git_hash_ctx *context, int fd, - unsigned int ext, unsigned int sz) +static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx *eoie_context, + int fd, unsigned int ext, unsigned int sz) { ext = htonl(ext); sz = htonl(sz); + if (eoie_context) { + the_hash_algo->update_fn(eoie_context, &ext, 4); + the_hash_algo->update_fn(eoie_context, &sz, 4); + } return ((ce_write(context, fd, &ext, 4) < 0) || (ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0; } @@ -2445,7 +2458,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, { uint64_t start = getnanotime(); int newfd = tempfile->fd; - git_hash_ctx c; + git_hash_ctx c, eoie_c; struct cache_header hdr; int i, err = 0, removed,
Re: [RFC PATCH 0/1] builtin/remote: quote remote name on error to display empty name
Shulhan writes: > Rationale: consistent error format > > When adding new remote name with empty string, git will print the > following error message, > > fatal: '' is not a valid remote name\n > > But when removing remote name with empty string as input, git did not > print the empty string with quote, > > fatal: No such remote: \n Thanks for noticing. I think the consistency given by the patch 1/1 is good. > Follow up question: If this is ok, should the po files also updated? They should be updated, but not by you and not as part of this change. The po files will be updated by our l10n coordinator, typically when it gets close to the release time, for all the message changes done during the release cycle. Thanks.
[PATCH v5 4/5] read-cache.c: optimize reading index format v4
From: Nguyễn Thái Ngọc Duy Index format v4 requires some more computation to assemble a path based on a previous one. The current code is not very efficient because - it doubles memory copy, we assemble the final path in a temporary first before putting it back to a cache_entry - strbuf_remove() in expand_name_field() is not exactly a good fit for stripping a part at the end, _setlen() would do the same job and is much cheaper. - the open-coded loop to find the end of the string in expand_name_field() can't beat an optimized strlen() This patch avoids the temporary buffer and writes directly to the new cache_entry, which addresses the first two points. The last point could also be avoided if the total string length fits in the first 12 bits of ce_flags, if not we fall back to strlen(). Running "test-tool read-cache 100" on webkit.git (275k files), reading v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more time. The patch reduces read time on v4 to 4.319 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Ben Peart --- read-cache.c | 132 ++- 1 file changed, 67 insertions(+), 65 deletions(-) diff --git a/read-cache.c b/read-cache.c index 880f627b4c..40dc4723b2 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1721,33 +1721,6 @@ int read_index(struct index_state *istate) return read_index_from(istate, get_index_file(), get_git_dir()); } -static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool, - struct ondisk_cache_entry *ondisk, - unsigned int flags, - const char *name, - size_t len) -{ - struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len); - - ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec); - ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec); - ce->ce_stat_data.sd_ctime.nsec = get_be32(>ctime.nsec); - ce->ce_stat_data.sd_mtime.nsec = get_be32(>mtime.nsec); - ce->ce_stat_data.sd_dev = get_be32(>dev); - ce->ce_stat_data.sd_ino = get_be32(>ino); - ce->ce_mode = get_be32(>mode); - ce->ce_stat_data.sd_uid = 
get_be32(>uid); - ce->ce_stat_data.sd_gid = get_be32(>gid); - ce->ce_stat_data.sd_size = get_be32(>size); - ce->ce_flags = flags & ~CE_NAMEMASK; - ce->ce_namelen = len; - ce->index = 0; - hashcpy(ce->oid.hash, ondisk->sha1); - memcpy(ce->name, name, len); - ce->name[len] = '\0'; - return ce; -} - /* * Adjacent cache entries tend to share the leading paths, so it makes * sense to only store the differences in later entries. In the v4 @@ -1762,22 +1735,24 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_) if (name->len < len) die("malformed name field in the index"); - strbuf_remove(name, name->len - len, len); - for (ep = cp; *ep; ep++) - ; /* find the end */ + strbuf_setlen(name, name->len - len); + ep = cp + strlen((const char *)cp); strbuf_add(name, cp, ep - cp); return (const char *)ep + 1 - cp_; } -static struct cache_entry *create_from_disk(struct mem_pool *mem_pool, +static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, + unsigned int version, struct ondisk_cache_entry *ondisk, unsigned long *ent_size, - struct strbuf *previous_name) + const struct cache_entry *previous_ce) { struct cache_entry *ce; size_t len; const char *name; unsigned int flags; + size_t copy_len = 0; + int expand_name_field = version == 4; /* On-disk flags are just 16 bits */ flags = get_be16(>flags); @@ -1797,21 +1772,50 @@ static struct cache_entry *create_from_disk(struct mem_pool *mem_pool, else name = ondisk->name; - if (!previous_name) { - /* v3 and earlier */ - if (len == CE_NAMEMASK) - len = strlen(name); - ce = cache_entry_from_ondisk(mem_pool, ondisk, flags, name, len); + if (expand_name_field) { + const unsigned char *cp = (const unsigned char *)name; + size_t strip_len, previous_len; - *ent_size = ondisk_ce_size(ce); - } else { - unsigned long consumed; - consumed = expand_name_field(previous_name, name); - ce = cache_entry_from_ondisk(mem_pool, ondisk, flags, -previous_name->buf, -
[PATCH v5 5/5] read-cache: clean up casting and byte decoding
This patch does a clean up pass to minimize the casting required to work with the memory mapped index (mmap). It also makes the decoding of network byte order more consistent by using get_be32() where possible. Signed-off-by: Ben Peart --- read-cache.c | 49 +++-- 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/read-cache.c b/read-cache.c index 40dc4723b2..c05e887fc9 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1655,7 +1655,7 @@ int verify_index_checksum; /* Allow fsck to force verification of the cache entry order. */ int verify_ce_order; -static int verify_hdr(struct cache_header *hdr, unsigned long size) +static int verify_hdr(const struct cache_header *hdr, unsigned long size) { git_hash_ctx c; unsigned char hash[GIT_MAX_RAWSZ]; @@ -1679,7 +1679,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) } static int read_index_extension(struct index_state *istate, - const char *ext, void *data, unsigned long sz) + const char *ext, const char *data, unsigned long sz) { switch (CACHE_EXT(ext)) { case CACHE_EXT_TREE: @@ -1902,7 +1902,7 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries) } #ifndef NO_PTHREADS -static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size); +static unsigned long read_eoie_extension(const char *mmap, size_t mmap_size); #endif static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset); @@ -1912,14 +1912,14 @@ struct load_index_extensions pthread_t pthread; #endif struct index_state *istate; - void *mmap; + const char *mmap; size_t mmap_size; unsigned long src_offset; }; -static void *load_index_extensions(void *_data) +static void *load_index_extensions(void *data) { - struct load_index_extensions *p = _data; + struct load_index_extensions *p = data; unsigned long src_offset = p->src_offset; while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) { @@ -1930,13 +1930,12 @@ static void *load_index_extensions(void *_data) 
* in 4-byte network byte order. */ uint32_t extsize; - memcpy(, (char *)p->mmap + src_offset + 4, 4); - extsize = ntohl(extsize); + extsize = get_be32(p->mmap + src_offset + 4); if (read_index_extension(p->istate, - (const char *)p->mmap + src_offset, - (char *)p->mmap + src_offset + 8, + p->mmap + src_offset, + p->mmap + src_offset + 8, extsize) < 0) { - munmap(p->mmap, p->mmap_size); + munmap((void *)p->mmap, p->mmap_size); die("index file corrupt"); } src_offset += 8; @@ -1951,7 +1950,7 @@ static void *load_index_extensions(void *_data) * from the memory mapped file and add them to the given index. */ static unsigned long load_cache_entry_block(struct index_state *istate, - struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap, + struct mem_pool *ce_mem_pool, int offset, int nr, const char *mmap, unsigned long start_offset, const struct cache_entry *previous_ce) { int i; @@ -1962,7 +1961,7 @@ static unsigned long load_cache_entry_block(struct index_state *istate, struct cache_entry *ce; unsigned long consumed; - disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset); + disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset); ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, , previous_ce); set_index_entry(istate, i, ce); @@ -1973,7 +1972,7 @@ static unsigned long load_cache_entry_block(struct index_state *istate, } static unsigned long load_all_cache_entries(struct index_state *istate, - void *mmap, size_t mmap_size, unsigned long src_offset) + const char *mmap, size_t mmap_size, unsigned long src_offset) { unsigned long consumed; @@ -2006,7 +2005,7 @@ struct load_cache_entries_thread_data struct index_state *istate; struct mem_pool *ce_mem_pool; int offset, nr; - void *mmap; + const char *mmap; unsigned long start_offset; struct cache_entry *previous_ce; unsigned long consumed; /* return # of bytes in index file processed */ @@ -2026,7 +2025,7 @@ static void *load_cache_entries_thread(void *_data) } static unsigned long 
load_cache_entries_threaded(int nr_threads, struct index_state *istate, - void *mmap, size_t mmap_size, unsigned
[PATCH v5 3/5] read-cache: load cache entries on worker threads
This patch helps address the CPU cost of loading the index by creating multiple threads to divide the work of loading and converting the cache entries across all available CPU cores. It accomplishes this by having the primary thread loop across the index file tracking the offset and (for V4 indexes) expanding the name. It creates a thread to process each block of entries as it comes to them. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 filesBaseline Parallel entries --- read_cache/discard_cache 1000 times 14.08(0.01+0.10) 9.72(0.03+0.06) -31.0% Test w/1,000,000 files Baseline Parallel entries -- read_cache/discard_cache 1000 times 202.95(0.01+0.07) 154.14(0.03+0.06) -24.1% Signed-off-by: Ben Peart --- read-cache.c | 242 +-- t/README | 3 + 2 files changed, 217 insertions(+), 28 deletions(-) diff --git a/read-cache.c b/read-cache.c index b203eebb44..880f627b4c 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1942,20 +1942,212 @@ static void *load_index_extensions(void *_data) return NULL; } +/* + * A helper function that will load the specified range of cache entries + * from the memory mapped file and add them to the given index. 
+ */ +static unsigned long load_cache_entry_block(struct index_state *istate, + struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap, + unsigned long start_offset, struct strbuf *previous_name) +{ + int i; + unsigned long src_offset = start_offset; + + for (i = offset; i < offset + nr; i++) { + struct ondisk_cache_entry *disk_ce; + struct cache_entry *ce; + unsigned long consumed; + + disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset); + ce = create_from_disk(ce_mem_pool, disk_ce, &consumed, previous_name); + set_index_entry(istate, i, ce); + + src_offset += consumed; + } + return src_offset - start_offset; +} + +static unsigned long load_all_cache_entries(struct index_state *istate, + void *mmap, size_t mmap_size, unsigned long src_offset) +{ + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name; + unsigned long consumed; + + if (istate->version == 4) { + previous_name = &previous_name_buf; + mem_pool_init(&istate->ce_mem_pool, + estimate_cache_size_from_compressed(istate->cache_nr)); + } else { + previous_name = NULL; + mem_pool_init(&istate->ce_mem_pool, + estimate_cache_size(mmap_size, istate->cache_nr)); + } + + consumed = load_cache_entry_block(istate, istate->ce_mem_pool, + 0, istate->cache_nr, mmap, src_offset, previous_name); + strbuf_release(&previous_name_buf); + return consumed; +} + +#ifndef NO_PTHREADS + +/* + * Mostly randomly chosen maximum thread counts: we + * cap the parallelism to online_cpus() threads, and we want + * to have at least 10000 cache entries per thread for it to + * be worth starting a thread.
+ */ +#define THREAD_COST (10000) + +struct load_cache_entries_thread_data +{ + pthread_t pthread; + struct index_state *istate; + struct mem_pool *ce_mem_pool; + int offset, nr; + void *mmap; + unsigned long start_offset; + struct strbuf previous_name_buf; + struct strbuf *previous_name; + unsigned long consumed; /* return # of bytes in index file processed */ +}; + +/* + * A thread proc to run the load_cache_entries() computation + * across multiple background threads. + */ +static void *load_cache_entries_thread(void *_data) +{ + struct load_cache_entries_thread_data *p = _data; + + p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool, + p->offset, p->nr, p->mmap, p->start_offset, p->previous_name); + return NULL; +} + +static unsigned long load_cache_entries_threaded(int nr_threads, struct index_state *istate, + void *mmap, size_t mmap_size, unsigned long src_offset) +{ + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name; + struct load_cache_entries_thread_data *data; + int ce_per_thread; + unsigned long consumed; + int i, thread; + + /* a little sanity checking */ + if (istate->name_hash_initialized) + BUG("the name hash isn't thread safe"); + + mem_pool_init(&istate->ce_mem_pool, 0); + if (istate->version == 4) + previous_name = &previous_name_buf; + else + previous_name = NULL; + + ce_per_thread = DIV_ROUND_UP(istate->cache_nr, nr_threads);
[PATCH v5 2/5] read-cache: load cache extensions on a worker thread
This patch helps address the CPU cost of loading the index by loading the cache extensions on a worker thread in parallel with loading the cache entries. In some cases, loading the extensions takes longer than loading the cache entries so this patch utilizes the new EOIE to start the thread to load the extensions before loading all the cache entries in parallel. This is possible because the current extensions don't access the cache entries in the index_state structure so are OK that they don't all exist yet. The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED extensions don't even get a pointer to the index so don't have access to the cache entries. CACHE_EXT_LINK only uses the index_state to initialize the split index. CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last update and dirty flags. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 filesBaseline Parallel Extensions --- read_cache/discard_cache 1000 times 14.08(0.01+0.10) 9.72(0.03+0.06) -31.0% Test w/1,000,000 files Baseline Parallel Extensions -- read_cache/discard_cache 1000 times 202.95(0.01+0.07) 154.14(0.03+0.06) -24.1% Signed-off-by: Ben Peart --- Documentation/config.txt | 6 +++ config.c | 18 config.h | 1 + read-cache.c | 94 4 files changed, 102 insertions(+), 17 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 1c42364988..79f8296d9c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2391,6 +2391,12 @@ imap:: The configuration variables in the 'imap' section are described in linkgit:git-imap-send[1]. +index.threads:: + Specifies the number of threads to spawn when loading the index. + This is meant to reduce index load time on multiprocessor machines. + Specifying 0 or 'true' will cause Git to auto-detect the number of + CPU's and set the number of threads accordingly. Defaults to 'true'. + index.version:: Specify the version with which new index files should be initialized. 
This does not affect existing repositories. diff --git a/config.c b/config.c index 9a0b10d4bc..9bd79fb165 100644 --- a/config.c +++ b/config.c @@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void) return 0; } +/* + * You can disable multi-threaded code by setting index.threads + * to 'false' (or 1) + */ +int git_config_get_index_threads(void) +{ + int is_bool, val; + + if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) { + if (is_bool) + return val ? 0 : 1; + else + return val; + } + + return 0; /* auto-detect */ +} + NORETURN void git_die_config_linenr(const char *key, const char *filename, int linenr) { diff --git a/config.h b/config.h index ab46e0165d..a06027e69b 100644 --- a/config.h +++ b/config.h @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void); extern int git_config_get_split_index(void); extern int git_config_get_max_percent_split_change(void); extern int git_config_get_fsmonitor(void); +extern int git_config_get_index_threads(void); /* This dies if the configured or default date is in the future */ extern int git_config_get_expiry(const char *key, const char **output); diff --git a/read-cache.c b/read-cache.c index 858935f123..b203eebb44 100644 --- a/read-cache.c +++ b/read-cache.c @@ -23,6 +23,10 @@ #include "split-index.h" #include "utf8.h" #include "fsmonitor.h" +#ifndef NO_PTHREADS +#include +#include +#endif /* Mask for the name length in ce_flags in the on-disk index */ @@ -1898,6 +1902,46 @@ static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size); #endif static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset); +struct load_index_extensions +{ +#ifndef NO_PTHREADS + pthread_t pthread; +#endif + struct index_state *istate; + void *mmap; + size_t mmap_size; + unsigned long src_offset; +}; + +static void *load_index_extensions(void *_data) +{ + struct load_index_extensions *p = _data; + unsigned long src_offset = p->src_offset; + + while (src_offset <=
p->mmap_size - the_hash_algo->rawsz - 8) { + /* After an array of active_nr index entries, +* there can be arbitrary number of extended +* sections, each of which is prefixed with +* extension name (4-byte) and section length +* in 4-byte network byte order. +*/ + uint32_t extsize; + memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4); +
[PATCH v5 0/5] read-cache: speed up index load through parallelization
This version of the patch merges in Duy's work to speed up index v4 decoding. I had to massage it a bit to get it to work with the multi-threading but it is still largely his code. I also responded to Junio's feedback on initializing copy_len to avoid compiler warnings. I also added a minor cleanup patch to minimize the casting required when working with the memory mapped index and other minor changes based on the feedback received. Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/dcf62005f8 Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v5 && git checkout dcf62005f8 ### Interdiff (v3..v5): diff --git a/read-cache.c b/read-cache.c index 8537a55750..c05e887fc9 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1655,7 +1655,7 @@ int verify_index_checksum; /* Allow fsck to force verification of the cache entry order. */ int verify_ce_order; -static int verify_hdr(struct cache_header *hdr, unsigned long size) +static int verify_hdr(const struct cache_header *hdr, unsigned long size) { git_hash_ctx c; unsigned char hash[GIT_MAX_RAWSZ]; @@ -1679,7 +1679,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) } static int read_index_extension(struct index_state *istate, - const char *ext, void *data, unsigned long sz) + const char *ext, const char *data, unsigned long sz) { switch (CACHE_EXT(ext)) { case CACHE_EXT_TREE: @@ -1721,33 +1721,6 @@ int read_index(struct index_state *istate) return read_index_from(istate, get_index_file(), get_git_dir()); } -static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool, - struct ondisk_cache_entry *ondisk, - unsigned int flags, - const char *name, - size_t len) -{ - struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len); - - ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec); - ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec); - ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec); - ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec); - ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev); - ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino); - ce->ce_mode = get_be32(&ondisk->mode); - ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid); - ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid); - ce->ce_stat_data.sd_size = get_be32(&ondisk->size); - ce->ce_flags = flags & ~CE_NAMEMASK; - ce->ce_namelen = len; - ce->index = 0; - hashcpy(ce->oid.hash, ondisk->sha1); - memcpy(ce->name, name, len); - ce->name[len] = '\0'; - return ce; -} - /* * Adjacent cache entries tend to share the leading paths, so it makes * sense to only store the differences in later entries. In the v4 @@ -1768,15 +1741,18 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_) return (const char *)ep + 1 - cp_; } -static struct cache_entry *create_from_disk(struct mem_pool *mem_pool, +static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, + unsigned int version, struct ondisk_cache_entry *ondisk, unsigned long *ent_size, - struct strbuf *previous_name) + const struct cache_entry *previous_ce) { struct cache_entry *ce; size_t len; const char *name; unsigned int flags; + size_t copy_len = 0; + int expand_name_field = version == 4; /* On-disk flags are just 16 bits */ flags = get_be16(&ondisk->flags); @@ -1796,21 +1772,50 @@ static struct cache_entry *create_from_disk(struct mem_pool *mem_pool, else name = ondisk->name; - if (!previous_name) { - /* v3 and earlier */ + if (expand_name_field) { + const unsigned char *cp = (const unsigned char *)name; + size_t strip_len, previous_len; + + previous_len = previous_ce ? previous_ce->ce_namelen : 0; + strip_len = decode_varint(&cp); + if (previous_len < strip_len) { + if (previous_ce) + die(_("malformed name field in the index, near path '%s'"), + previous_ce->name); + else + die(_("malformed name field in the index in the first path")); + } + copy_len = previous_len - strip_len; + name = (const char *)cp; + } + if (len == CE_NAMEMASK) - len =
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jeff King writes: > I really wonder if this topic is worth pursuing further without finding > a real-world case that actually fails with the v2.19 code. I.e., is > there actually a server that doesn't set CONTENT_LENGTH and really can't > handle read-to-eof? It's plausible to me, but it's also equally > plausible that we'd be breaking some other case. OK.
[PATCH 0/2] Bugfix for partial clones and ref-in-want servers
A coworker discovered a bug when having a partial clone against a server that supports ref-in-want - the lazy fetches no longer work, because the client attempts to fetch with "want-ref ", which is not permitted by the protocol. The first patch deduplicates some code, so that the bugfix need only be applied once, and the second patch contains the actual bugfix. Jonathan Tan (2): fetch-object: provide only one fetching function fetch-object: set exact_oid when fetching fetch-object.c | 17 ++--- fetch-object.h | 8 ++-- sha1-file.c | 2 +- t/t0410-partial-clone.sh | 12 ++++++++++++ unpack-trees.c | 2 +- 5 files changed, 22 insertions(+), 19 deletions(-) -- 2.19.0.397.gdd90340f6a-goog
[PATCH 2/2] fetch-object: set exact_oid when fetching
fetch_objects() currently does not set exact_oid in struct ref when invoking transport_fetch_refs(). If the server supports ref-in-want, fetch_pack() uses this field to determine whether a wanted ref should be requested as a "want-ref" line or a "want" line; without the setting of exact_oid, the wrong line will be sent. Set exact_oid, so that the correct line is sent. Signed-off-by: Jonathan Tan --- fetch-object.c | 1 + t/t0410-partial-clone.sh | 12 ++++++++++++ 2 files changed, 13 insertions(+) diff --git a/fetch-object.c b/fetch-object.c index 1af1bf857..426654880 100644 --- a/fetch-object.c +++ b/fetch-object.c @@ -32,6 +32,7 @@ void fetch_objects(const char *remote_name, const struct object_id *oids, for (i = 0; i < oid_nr; i++) { struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i])); oidcpy(&new_ref->old_oid, &oids[i]); + new_ref->exact_oid = 1; new_ref->next = ref; ref = new_ref; } diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 128130066..0ab02c337 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -170,6 +170,18 @@ test_expect_success 'fetching of missing objects' ' git verify-pack --verbose "$IDX" | grep "$HASH" ' +test_expect_success 'fetching of missing objects works with ref-in-want enabled' ' + # ref-in-want requires protocol version 2 + git -C server config protocol.version 2 && + git -C server config uploadpack.allowrefinwant 1 && + git -C repo config protocol.version 2 && + + rm -rf repo/.git/objects/* && + rm -f trace && + GIT_TRACE_PACKET="$(pwd)/trace" git -C repo cat-file -p "$HASH" && + grep "git< fetch=.*ref-in-want" trace +' + test_expect_success 'rev-list stops traversal at missing and promised commit' ' rm -rf repo && test_create_repo repo && -- 2.19.0.397.gdd90340f6a-goog
[PATCH 1/2] fetch-object: provide only one fetching function
fetch-object.h currently provides two functions (fetch_object() and fetch_objects()) that could be replaced by a single, more flexible function. Replace those two functions with the more flexible function. Signed-off-by: Jonathan Tan --- fetch-object.c | 16 +--- fetch-object.h | 8 ++-- sha1-file.c | 2 +- unpack-trees.c | 2 +- 4 files changed, 9 insertions(+), 19 deletions(-) diff --git a/fetch-object.c b/fetch-object.c index 853624f81..1af1bf857 100644 --- a/fetch-object.c +++ b/fetch-object.c @@ -23,21 +23,15 @@ static void fetch_refs(const char *remote_name, struct ref *ref) fetch_if_missing = original_fetch_if_missing; } -void fetch_object(const char *remote_name, const unsigned char *sha1) -{ - struct ref *ref = alloc_ref(sha1_to_hex(sha1)); - hashcpy(ref->old_oid.hash, sha1); - fetch_refs(remote_name, ref); -} - -void fetch_objects(const char *remote_name, const struct oid_array *to_fetch) +void fetch_objects(const char *remote_name, const struct object_id *oids, + int oid_nr) { struct ref *ref = NULL; int i; - for (i = 0; i < to_fetch->nr; i++) { - struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i])); - oidcpy(&new_ref->old_oid, &to_fetch->oid[i]); + for (i = 0; i < oid_nr; i++) { + struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i])); + oidcpy(&new_ref->old_oid, &oids[i]); new_ref->next = ref; ref = new_ref; } diff --git a/fetch-object.h b/fetch-object.h index 4b269d07e..d2f996d4e 100644 --- a/fetch-object.h +++ b/fetch-object.h @@ -1,11 +1,7 @@ #ifndef FETCH_OBJECT_H #define FETCH_OBJECT_H -#include "sha1-array.h" - -extern void fetch_object(const char *remote_name, const unsigned char *sha1); - -extern void fetch_objects(const char *remote_name, - const struct oid_array *to_fetch); +void fetch_objects(const char *remote_name, const struct object_id *oids, + int oid_nr); #endif diff --git a/sha1-file.c b/sha1-file.c index 97b742384..2edf4564f 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1317,7 +1317,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid, * TODO Pass a repository struct through fetch_object, * such that arbitrary repositories work. */ - fetch_object(repository_format_partial_clone, real->hash); + fetch_objects(repository_format_partial_clone, real, 1); already_retried = 1; continue; } diff --git a/unpack-trees.c b/unpack-trees.c index f25089b87..82a83dbc6 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -392,7 +392,7 @@ static int check_updates(struct unpack_trees_options *o) } if (to_fetch.nr) fetch_objects(repository_format_partial_clone, - &to_fetch); + to_fetch.oid, to_fetch.nr); fetch_if_missing = fetch_if_missing_store; oid_array_clear(&to_fetch); } -- 2.19.0.397.gdd90340f6a-goog
[RFC PATCH 1/1] builtin/remote: quote remote name on error to display empty name
Rationale: consistent error format. When adding a new remote with an empty string as its name, git prints the following error message, fatal: '' is not a valid remote name\n But when removing a remote given an empty string as input, git did not print the empty string with quotes, fatal: No such remote: \n --- builtin/remote.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/builtin/remote.c b/builtin/remote.c index 40c6f8a1b..f7edf7f2c 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -626,7 +626,7 @@ static int mv(int argc, const char **argv) oldremote = remote_get(rename.old_name); if (!remote_is_configured(oldremote, 1)) - die(_("No such remote: %s"), rename.old_name); + die(_("No such remote: '%s'"), rename.old_name); if (!strcmp(rename.old_name, rename.new_name) && oldremote->origin != REMOTE_CONFIG) return migrate_file(oldremote); @@ -762,7 +762,7 @@ static int rm(int argc, const char **argv) remote = remote_get(argv[1]); if (!remote_is_configured(remote, 1)) - die(_("No such remote: %s"), argv[1]); + die(_("No such remote: '%s'"), argv[1]); known_remotes.to_delete = remote; for_each_remote(add_known_remote, &known_remotes); @@ -861,7 +861,7 @@ static int get_remote_ref_states(const char *name, states->remote = remote_get(name); if (!states->remote) - return error(_("No such remote: %s"), name); + return error(_("No such remote: '%s'"), name); read_branches(); -- 2.19.0.398.gef7f67bed
[RFC PATCH 0/1] builtin/remote: quote remote name on error to display empty name
Rationale: consistent error format. When adding a new remote with an empty string as its name, git prints the following error message, fatal: '' is not a valid remote name\n But when removing a remote given an empty string as input, git did not print the empty string with quotes, fatal: No such remote: \n Follow-up question: if this is OK, should the po files also be updated? This is my first patch, sorry if I made some mistakes. Shulhan (1): builtin/remote: quote remote name on error to display empty name builtin/remote.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.19.0.398.gef7f67bed
Re: [PATCH v4 0/5] read-cache: speed up index load through parallelization
On 9/11/2018 7:26 PM, Ben Peart wrote: This version of the patch merges in Duy's work to speed up index v4 decoding. I had to massage it a bit to get it to work with the multi-threading but it's still largely his code. It helps a little (3%-4%) when the cache entry thread(s) take the longest and not when the index extensions loading is the long thread. I also added a minor cleanup patch to minimize the casting required when working with the memory mapped index and other minor changes based on the feedback received. Base Ref: v2.19.0 Web-Diff: https://github.com/benpeart/git/commit/9d31d5fb20 Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v4 && git checkout 9d31d5fb20 A bad merge (mistake on my part, not a bug) means this is missing some of the changes from V3. Please ignore, I'll send an updated series to address it. ### Patches Ben Peart (4): eoie: add End of Index Entry (EOIE) extension read-cache: load cache extensions on a worker thread read-cache: speed up index load through parallelization read-cache: clean up casting and byte decoding Nguyễn Thái Ngọc Duy (1): read-cache.c: optimize reading index format v4 Documentation/config.txt | 6 + Documentation/technical/index-format.txt | 23 + config.c | 18 + config.h | 1 + read-cache.c | 581 +++ 5 files changed, 531 insertions(+), 98 deletions(-) base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670
[PATCH v2 1/1] t3206-range-diff.sh: cover single-patch case
From: Derrick Stolee The commit 40ce4160 "format-patch: allow --range-diff to apply to a lone-patch" added the ability to see a range-diff as commentary after the commit message of a single patch series (i.e. [PATCH] instead of [PATCH X/N]). However, this functionality was not covered by a test case. Add a simple test case that checks that a range-diff is written as commentary to the patch. Signed-off-by: Derrick Stolee --- t/t3206-range-diff.sh | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 3d7a2d8a4d..6babc0e276 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -154,4 +154,9 @@ do ' done +test_expect_success 'format-patch --range-diff as commentary' ' + git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual && + test_i18ngrep "^Range-diff:$" actual +' + test_done -- gitgitgadget
[PATCH v2 0/1] Add coverage for 'git format-patch --range-diff' single-patch case
In es/format-patch-rangediff, we added a '--range-diff' option to git-format-patch to automatically add a range-diff. We also added an option to write the diff as commentary to a single patch submission. However, this check was not tested. I discovered this test gap by running 'make coverage-test coverage-report' on 'next' and then comparing the uncovered lines with those in the diff between 'master' and 'next'. I have a script that automates this process, and I'm still working on polishing it. You can see an earlier version at [1]. Based on es/format-patch-rangediff Cc: sunsh...@sunshine.co Cc: p...@peff.net [1] https://github.com/derrickstolee/git/blob/coverage/contrib/coverage-diff.sh (a bash script to report uncovered lines that were added in a diff) Derrick Stolee (1): t3206-range-diff.sh: cover single-patch case t/t3206-range-diff.sh | 5 +++++ 1 file changed, 5 insertions(+) base-commit: 40ce41604daf200cdc85abded0133d40faafc2f8 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-37%2Fderrickstolee%2Frange-diff-test-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-37/derrickstolee/range-diff-test-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/37 Range-diff vs v1: 1: 58347a9624 ! 1: 277a4d2bd8 t3206-range-diff.sh: cover single-patch case @@ -22,7 +22,7 @@ +test_expect_success 'format-patch --range-diff as commentary' ' + git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual && -+ grep -A 1 -e "\-\-\-" actual | grep "Range-diff:" ++ test_i18ngrep "^Range-diff:$" actual +' + test_done -- gitgitgadget
[PATCH 1/1] commit-reach: properly peel tags
From: Derrick Stolee The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e "commit-reach: make can_all_from_reach... linear" but incorrectly assumed that all objects provided were commits. During a fetch negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags to the 'from' array. The current code creates a segfault. Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach' and add a test in t6600-test-reach.sh that demonstrates this segfault. Correct the issue by peeling tags when investigating the initial list of objects in the 'from' array. Signed-off-by: Jeff King Signed-off-by: Derrick Stolee --- commit-reach.c | 25 ++--- t/helper/test-reach.c | 22 +- t/t6600-test-reach.sh | 30 -- 3 files changed, 63 insertions(+), 14 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 86715c103c..6de72c6e03 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -544,20 +544,31 @@ int can_all_from_reach_with_flag(struct object_array *from, { struct commit **list = NULL; int i; + int nr_commits; int result = 1; ALLOC_ARRAY(list, from->nr); + nr_commits = 0; for (i = 0; i < from->nr; i++) { - list[i] = (struct commit *)from->objects[i].item; + struct object *from_one = from->objects[i].item; - if (parse_commit(list[i]) || - list[i]->generation < min_generation) - return 0; + from_one = deref_tag(the_repository, from_one, +"a from object", 0); + if (!from_one || from_one->type != OBJ_COMMIT) { + from->objects[i].item->flags |= assign_flag; + continue; + } + + list[nr_commits] = (struct commit *)from_one; + if (parse_commit(list[nr_commits]) || + list[nr_commits]->generation < min_generation) + return 0; /* is this a leak? */ + nr_commits++; } - QSORT(list, from->nr, compare_commits_by_gen); + QSORT(list, nr_commits, compare_commits_by_gen); - for (i = 0; i < from->nr; i++) { + for (i = 0; i < nr_commits; i++) { /* DFS from list[i] */ struct commit_list *stack = NULL; @@ -600,7 +611,7 @@ int can_all_from_reach_with_flag(struct object_array *from, } cleanup: - for (i = 0; i < from->nr; i++) { + for (i = 0; i < nr_commits; i++) { clear_commit_marks(list[i], RESULT); clear_commit_marks(list[i], assign_flag); } diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index eb21103998..08d2ea68e8 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -31,6 +31,7 @@ int cmd__reach(int ac, const char **av) struct object_id oid_A, oid_B; struct commit *A, *B; struct commit_list *X, *Y; + struct object_array X_obj = OBJECT_ARRAY_INIT; struct commit **X_array; int X_nr, X_alloc; struct strbuf buf = STRBUF_INIT; @@ -49,7 +50,8 @@ int cmd__reach(int ac, const char **av) while (strbuf_getline(&buf, stdin) != EOF) { struct object_id oid; - struct object *o; + struct object *orig; + struct object *peeled; struct commit *c; if (buf.len < 3) continue; @@ -57,14 +59,14 @@ if (get_oid_committish(buf.buf + 2, &oid)) die("failed to resolve %s", buf.buf + 2); - o = parse_object(r, &oid); - o = deref_tag_noverify(o); + orig = parse_object(r, &oid); + peeled = deref_tag_noverify(orig); - if (!o) + if (!peeled) die("failed to load commit for input %s resulting in oid %s\n", buf.buf, oid_to_hex(&oid)); - c = object_as_type(r, o, OBJ_COMMIT, 0); + c = object_as_type(r, peeled, OBJ_COMMIT, 0); if (!c) die("failed to load commit for input %s resulting in oid %s\n", @@ -85,6 +87,7 @@ int cmd__reach(int ac, const char **av) commit_list_insert(c, &X); ALLOC_GROW(X_array, X_nr + 1, X_alloc); X_array[X_nr++] = c; + add_object_array(orig, NULL, &X_obj); break; case 'Y': @@ -113,6 +116,15 @@ int cmd__reach(int ac, const char **av) print_sorted_commit_ids(list); } else if (!strcmp(av[1], "can_all_from_reach")) { printf("%s(X,Y):%d\n", av[1], can_all_from_reach(X, Y, 1)); +
[PATCH 0/1] Properly peel tags in can_all_from_reach_with_flags()
As Peff reported [1], the refactored can_all_from_reach_with_flags() method does not properly peel tags. Since the helper method can_all_from_reach() and code in t/helper/test-reach.c all peel tags before getting to this method, it is not super-simple to create a test that demonstrates this. I modified t/helper/test-reach.c to allow calling can_all_from_reach_with_flags() directly, and added a test in t6600-test-reach.sh that demonstrates the segfault without the fix. The fix in commit-reach.c is Peff's fix verbatim. [1] https://public-inbox.org/git/0bf9103c-9377-506b-7ad7-e5273d8e9...@gmail.com/T/#u Derrick Stolee (1): commit-reach: properly peel tags commit-reach.c | 25 ++--- t/helper/test-reach.c | 22 +- t/t6600-test-reach.sh | 30 -- 3 files changed, 63 insertions(+), 14 deletions(-) base-commit: 6621c838743812aaba96e55cfec8524ea1144c2d Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-39%2Fderrickstolee%2Ftag-fix-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-39/derrickstolee/tag-fix-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/39 -- gitgitgadget
Re: [PATCH 1/1] t3206-range-diff.sh: cover single-patch case
On 9/11/2018 5:34 PM, Eric Sunshine wrote: On Tue, Sep 11, 2018 at 4:26 PM Derrick Stolee via GitGitGadget wrote: The commit 40ce4160 "format-patch: allow --range-diff to apply to a lone-patch" added the ability to see a range-diff as commentary after the commit message of a single patch series (i.e. [PATCH] instead of [PATCH X/N]). However, this functionality was not covered by a test case. Add a simple test case that checks that a range-diff is written as commentary to the patch. Signed-off-by: Derrick Stolee --- diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh @@ -154,4 +154,9 @@ do +test_expect_success 'format-patch --range-diff as commentary' ' + git format-patch --stdout --range-diff=HEAD~1 HEAD~1 >actual && + grep -A 1 -e "\-\-\-" actual | grep "Range-diff:" +' Aside from Junio's and Stefan's comments... Patch 6/14 [1], in addition to checking that a solo patch contains an interdiff, takes the extra step of checking that individual patches _don't_ contain an interdiff when --cover-letter is used. I wonder if the same should be done here, though I don't feel too strongly about it. If you do go that route, it might make sense to move this test to t4014 as neighbor to the --interdiff tests. The reason 10/14 [2] added the "git format-patch --range-diff" test to t3206 instead of t4014 was so it could do a thorough check of the embedded range-diff by re-using the specially crafted test repo set up by t3206. Your new test is much looser, thus could be moved alongside the --interdiff tests. Not a big deal, though. Either way is fine. Thanks for working on this. [1]: https://public-inbox.org/git/20180722095717.17912-7-sunsh...@sunshineco.com/ [2]: https://public-inbox.org/git/20180722095717.17912-11-sunsh...@sunshineco.com/ Thanks for these links! In particular, [2] uses this line to test the inter-diff appears: + test_i18ngrep "^Interdiff:$" 0001-fleep.patch && That's a better way to test, especially with the translation. It would be enough for my needs. 
Thanks, -Stolee P.S. Resending because apparently I had HTML in the last response
Gitattributes not properly supported with add command
The following: git add -u :\(glob,attr:-someAttr\):src/** produces an error that, according to the source code, should never be visible to the user. This attribute/pathspec *should* be supported according to the documentation provided by git: fatal: BUG:builtin/add.c:498: unsupported magic 40 It looks like the documentation claims more features than the source code supports. Perhaps the restriction itself is wrong (I don't think you should be restricted to a fixed set of attributes), and in any case the source code never handles this scenario. This should be fixed; if I have any time I'll look into what it would take to submit a patch, but I don't have the time for this right now.
Re: [PATCH v2 17/18] commit-reach: make can_all_from_reach... linear
On 9/12/2018 12:29 AM, Jeff King wrote: On Wed, Sep 12, 2018 at 12:14:25AM -0400, Jeff King wrote: + ALLOC_ARRAY(list, from->nr); for (i = 0; i < from->nr; i++) { - struct object *from_one = from->objects[i].item; + list[i] = (struct commit *)from->objects[i].item; Some of the objects in my array are not commits, but rather tags, so this is a bogus cast. You can see that the original code peeled them and threw away non-commits: - if (from_one->flags & assign_flag) - continue; - from_one = deref_tag(the_repository, from_one, "a from object", 0); - if (!from_one || from_one->type != OBJ_COMMIT) { - /* no way to tell if this is reachable by -* looking at the ancestry chain alone, so -* leave a note to ourselves not to worry about -* this object anymore. -*/ - from->objects[i].item->flags |= assign_flag; - continue; - } So presumably we'd need to do something similar. This patch seems to fix it for me. It's more or less a reversion of the hunk above, though I didn't dig into whether I'm violating some other assumption in your new code. I think this function leaks "list" both from the location I noted here, as well as from normal exit Thanks for the report and the fix. I'll try to create test that demonstrates this and then push up a full patch. 
--- commit-reach.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 622eeb313d..abe90a2f55 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -547,20 +547,31 @@ int can_all_from_reach_with_flag(struct object_array *from, { struct commit **list = NULL; int i; + int nr_commits; int result = 1; ALLOC_ARRAY(list, from->nr); + nr_commits = 0; for (i = 0; i < from->nr; i++) { - list[i] = (struct commit *)from->objects[i].item; + struct object *from_one = from->objects[i].item; - if (parse_commit(list[i]) || - list[i]->generation < min_generation) - return 0; + from_one = deref_tag(the_repository, from_one, +"a from object", 0); + if (!from_one || from_one->type != OBJ_COMMIT) { + from->objects[i].item->flags |= assign_flag; + continue; + } + + list[nr_commits] = (struct commit *)from_one; + if (parse_commit(list[nr_commits]) || + list[nr_commits]->generation < min_generation) + return 0; /* is this a leak? */ + nr_commits++; } - QSORT(list, from->nr, compare_commits_by_gen); + QSORT(list, nr_commits, compare_commits_by_gen); - for (i = 0; i < from->nr; i++) { + for (i = 0; i < nr_commits; i++) { /* DFS from list[i] */ struct commit_list *stack = NULL; @@ -603,7 +614,7 @@ int can_all_from_reach_with_flag(struct object_array *from, } cleanup: - for (i = 0; i < from->nr; i++) { + for (i = 0; i < nr_commits; i++) { clear_commit_marks(list[i], RESULT); clear_commit_marks(list[i], assign_flag); }
Re: git-credential-libsecret not prompting to unlock keyring
Apologies, forgot the crucial details post that log: $ git --version git version 2.19.0 $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 18.04.1 LTS Release:18.04 Codename: bionic
git-credential-libsecret not prompting to unlock keyring
Hi, For some reason today, after an apt update, my combination of GNOME keyring and git-credential-libsecret stopped working. Stopped working in so far as I am no longer prompted by GNOME keyring to unlock my keyring, instead I get prompted for the password that git is looking for. Trace output below; would appreciate any pointers. Many thanks == $ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git push 13:38:59.324799 git.c:415 trace: built-in: git push 13:38:59.325070 run-command.c:637 trace: run_command: GIT_DIR=.git git-remote-https origin https://github.com/myitcv/go-modules-by-example * Couldn't find host github.com in the .netrc file; using defaults * Trying 192.30.253.113... * TCP_NODELAY set * Connected to github.com (192.30.253.113) port 443 (#0) * found 133 certificates in /etc/ssl/certs/ca-certificates.crt * found 399 certificates in /etc/ssl/certs * ALPN, offering http/1.1 * SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256 *server certificate verification OK *server certificate status verification SKIPPED *common name: github.com (matched) *server certificate expiration date OK *server certificate activation date OK *certificate public key: RSA *certificate version: #3 *subject: businessCategory=Private Organization,jurisdictionOfIncorporationCountryName=US,jurisdictionOfIncorporationStateOrProvinceName=Delaware,serialNumber=5157550,C=US,ST=California,L=San Francisco,O=GitHub\, Inc.,CN=github.com *start date: Tue, 08 May 2018 00:00:00 GMT *expire date: Wed, 03 Jun 2020 12:00:00 GMT *issuer: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert SHA2 Extended Validation Server CA *compression: NULL * ALPN, server accepted to use http/1.1 > GET /myitcv/go-modules-by-example/info/refs?service=git-receive-pack HTTP/1.1 Host: github.com User-Agent: git/2.19.0 Accept: */* Accept-Encoding: deflate, gzip Accept-Language: en-GB, *;q=0.9 Pragma: no-cache < HTTP/1.1 401 Authorization Required < Server: GitHub Babel 2.0 < Content-Type: text/plain < Content-Length: 60 < 
WWW-Authenticate: Basic realm="GitHub" < X-GitHub-Request-Id: EF1E:073A:6C8091:C2DB67:5B9908E8 < X-Frame-Options: DENY < * Connection #0 to host github.com left intact 13:39:05.084588 run-command.c:637 trace: run_command: '/home/myitcv/dev/git/contrib/credential/libsecret/git-credential-libsecret get' Password for 'https://myi...@github.com':
Re: [PATCH 3/3] sequencer: use read_author_script()
On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood wrote: > Use the new function to read the author script, updating > read_env_script() and read_author_ident(). This means we now have a > single code path that reads the author script and uses sq_dequote() to > dequote it. This fixes potential problems with user edited scripts > as read_env_script() which did not track quotes properly. > [...] > Signed-off-by: Phillip Wood > --- > diff --git a/sequencer.c b/sequencer.c > @@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, > char **email, char **date, > static int read_env_script(struct argv_array *env) > { > + strbuf_addstr(&script, "GIT_AUTHOR_NAME="); > + strbuf_addstr(&script, name); > + argv_array_push(env, script.buf); > + strbuf_reset(&script); > + strbuf_addstr(&script, "GIT_AUTHOR_EMAIL="); > + strbuf_addstr(&script, email); > + argv_array_push(env, script.buf); > + strbuf_reset(&script); > + strbuf_addstr(&script, "GIT_AUTHOR_DATE="); > + strbuf_addstr(&script, date); > + argv_array_push(env, script.buf); > + strbuf_release(&script); I haven't read this series closely yet, but this caught my eye while scanning it quickly. The above noisy code can all be collapsed to the simpler: argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name); argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email); argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)
On Wednesday, September 12, 2018 12:13:23 AM MST Junio C Hamano wrote: > Stephen & Linda Smith writes: > > On Tuesday, September 11, 2018 3:20:19 PM MST Junio C Hamano wrote: > >> * jc/wt-status-state-cleanup (2018-09-07) 1 commit > >> > >> - WIP: roll wt_status_state into wt_status and populate in the collect > >> > >> phase (this branch uses ss/wt-status-committable.) > >> > >> * ss/wt-status-committable (2018-09-07) 4 commits > >> > >> - wt-status.c: set the committable flag in the collect phase > >> - t7501: add test of "commit --dry-run --short" > >> - wt-status: rename commitable to committable > >> - wt-status.c: move has_unmerged earlier in the file > >> (this branch is used by jc/wt-status-state-cleanup.) > > > > I note that the jc/wt-status-state-cleanup branch is a patch "for > > illustration purposes only" [1]. > > > > I was about to update that patch to start dealing with the free() function > > calls, but noted you added the patch. Do you want me to take that patch > > and continue on? Or does someone else have something in progress? > > I do not plan to. Ok ... from the wording I wasn't sure if there wasn't another developer working this. I will pick up that patch and continue. > In general, anything that is only in 'pu' is a > fair game---when a better alternative appears, or a discussion leads > to a conclusion that a change is unneeded, they are replaced and/or > discarded. Just think of them as being kept slightly better record > of existence than merely being in the list archive, nothing more. Thanks that confirmed my reading of the Documentation/gitworkflows.txt.
[PATCH 2/3] add read_author_script() to libgit
From: Phillip Wood Add read_author_script() to sequencer.c based on the implementation in builtin/am.c and update read_am_author_script() to use read_author_script(). The sequencer code that reads the author script will be updated in the next commit. Signed-off-by: Phillip Wood --- builtin/am.c | 57 --- sequencer.c | 62 sequencer.h | 3 +++ 3 files changed, 69 insertions(+), 53 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index 8c165f747b..aa5de0ee73 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -260,32 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state, die_errno(_("could not read '%s'"), am_path(state, file)); } -/** - * Take a series of KEY='VALUE' lines where VALUE part is - * sq-quoted, and append at the end of the string list - */ -static int parse_key_value_squoted(char *buf, struct string_list *list) -{ - while (*buf) { - struct string_list_item *item; - char *np; - char *cp = strchr(buf, '='); - if (!cp) - return -1; - np = strchrnul(cp, '\n'); - *cp++ = '\0'; - item = string_list_append(list, buf); - - buf = np + (*np == '\n'); - *np = '\0'; - cp = sq_dequote(cp); - if (!cp) - return -1; - item->util = xstrdup(cp); - } - return 0; -} - /** * Reads and parses the state directory's "author-script" file, and sets * state->author_name, state->author_email and state->author_date accordingly. 
@@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct string_list *list) static int read_am_author_script(struct am_state *state) { const char *filename = am_path(state, "author-script"); - struct strbuf buf = STRBUF_INIT; - struct string_list kv = STRING_LIST_INIT_DUP; - int retval = -1; /* assume failure */ - int fd; assert(!state->author_name); assert(!state->author_email); assert(!state->author_date); - fd = open(filename, O_RDONLY); - if (fd < 0) { - if (errno == ENOENT) - return 0; - die_errno(_("could not open '%s' for reading"), filename); - } - strbuf_read(&buf, fd, 0); - close(fd); - if (parse_key_value_squoted(buf.buf, &kv)) - goto finish; + if (read_author_script(filename, &state->author_name, + &state->author_email, &state->author_date, 1)) + exit(128); - if (kv.nr != 3 || - strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") || - strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") || - strcmp(kv.items[2].string, "GIT_AUTHOR_DATE")) - goto finish; - state->author_name = kv.items[0].util; - state->author_email = kv.items[1].util; - state->author_date = kv.items[2].util; - retval = 0; -finish: - string_list_clear(&kv, !!retval); - strbuf_release(&buf); - return retval; + return 0; } /** diff --git a/sequencer.c b/sequencer.c index dc2c58d464..5d0ff8f1c1 100644 --- a/sequencer.c +++ b/sequencer.c @@ -660,6 +660,68 @@ static int write_author_script(const char *message) return res; } +/** + * Take a series of KEY='VALUE' lines where VALUE part is + * sq-quoted, and append at the end of the string list + */ +static int parse_key_value_squoted(char *buf, struct string_list *list) +{ + while (*buf) { + struct string_list_item *item; + char *np; + char *cp = strchr(buf, '='); + if (!cp) + return -1; + np = strchrnul(cp, '\n'); + *cp++ = '\0'; + item = string_list_append(list, buf); + + buf = np + (*np == '\n'); + *np = '\0'; + cp = sq_dequote(cp); + if (!cp) + return -1; + item->util = xstrdup(cp); + } + return 0; +} + +int read_author_script(const char *path, char **name, char **email, char **date, + int allow_missing) +{ + struct strbuf buf = STRBUF_INIT; + struct string_list kv = STRING_LIST_INIT_DUP; + int retval = -1; + + if (strbuf_read_file(&buf, path, 256) <= 0) { + strbuf_release(&buf); + if (errno == ENOENT && allow_missing) + return 0; + else + return error_errno(_("could not open '%s' for reading"), + path); + } + + if (parse_key_value_squoted(buf.buf, &kv)) { + error(_("unable to parse '%s'"), path); + goto finish; + } + if (kv.nr != 3 ||
[PATCH 1/3] am: rename read_author_script()
From: Phillip Wood

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..8c165f747b 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -302,7 +302,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
 	const char *filename = am_path(state, "author-script");
 	struct strbuf buf = STRBUF_INIT;
@@ -411,7 +411,7 @@ static void am_load(struct am_state *state)
 		BUG("state file 'last' does not exist");
 	state->last = strtol(sb.buf, NULL, 10);
 
-	if (read_author_script(state) < 0)
+	if (read_am_author_script(state) < 0)
 		die(_("could not parse author script"));
 
 	read_commit_msg(state);
-- 
2.18.0
[PATCH 3/3] sequencer: use read_author_script()
From: Phillip Wood

Use the new function to read the author script, updating
read_env_script() and read_author_ident(). This means we now have a
single code path that reads the author script and uses sq_dequote() to
dequote it. This fixes potential problems with user edited scripts, as
read_env_script() did not track quotes properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood
---
 sequencer.c | 108 ++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 78 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 5d0ff8f1c1..630741cfe0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -723,54 +723,35 @@ int read_author_script(const char *path, char **name, char **email, char **date,
 	return retval;
 }
 
-/*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-	/* Skip any empty lines in case the file was hand edited */
-	while (n > 0 && s[--n] == '\n')
-		; /* empty */
-	if (n > 0 && s[n] != '\'')
-		return 1;
-
-	return 0;
-}
-
 /*
  * Read a list of environment variable assignments (such as the author-script
  * file) into an environment block. Returns -1 on error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
 	struct strbuf script = STRBUF_INIT;
-	int i, count = 0, sq_bug;
-	const char *p2;
-	char *p;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return -1;
 
-	/* write_author_script() used to quote incorrectly */
-	sq_bug = quoting_is_broken(script.buf, script.len);
-	for (p = script.buf; *p; p++)
-		if (sq_bug && skip_prefix(p, "'''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (skip_prefix(p, "'\\''", &p2))
-			strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-		else if (*p == '\'')
-			strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-		else if (*p == '\n') {
-			*p = '\0';
-			count++;
-		}
-	for (i = 0, p = script.buf; i < count; i++) {
-		argv_array_push(env, p);
-		p += strlen(p) + 1;
-	}
+	strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
+	strbuf_addstr(&script, name);
+	argv_array_push(env, script.buf);
+	strbuf_reset(&script);
+	strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
+	strbuf_addstr(&script, email);
+	argv_array_push(env, script.buf);
+	strbuf_reset(&script);
+	strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
+	strbuf_addstr(&script, date);
+	argv_array_push(env, script.buf);
+	strbuf_release(&script);
+
+	free(name);
+	free(email);
+	free(date);
 
 	return 0;
 }
@@ -790,54 +771,25 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-	const char *keys[] = {
-		"GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-	};
-	struct strbuf out = STRBUF_INIT;
-	char *in, *eol;
-	const char *val[3];
-	int i = 0;
+	char *name, *email, *date;
 
-	if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+	if (read_author_script(rebase_path_author_script(),
+			       &name, &email, &date, 0))
 		return NULL;
 
-	/* dequote values and construct ident line in-place */
-	for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
-		if (!skip_prefix(in, keys[i], (const char **)&in)) {
-			warning(_("could not parse '%s' (looking for '%s')"),
-				rebase_path_author_script(), keys[i]);
-			return NULL;
-		}
-
-		eol = strchrnul(in, '\n');
-		*eol = '\0';
-		if (!sq_dequote(in)) {
-			warning(_("bad quoting on %s value in '%s'"),
-				keys[i], rebase_path_author_script());
-			return NULL;
-		}
-
[PATCH 0/3] am/rebase: share read_author_script()
From: Phillip Wood

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (3):
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  61 ++++----------
 sequencer.c  | 160 ++++++++++++++++---------------
 sequencer.h  |   3 +
 3 files changed, 96 insertions(+), 128 deletions(-)

-- 
2.18.0
Re: What's cooking in git.git (Sep 2018, #02; Tue, 11)
Stephen & Linda Smith writes:

> On Tuesday, September 11, 2018 3:20:19 PM MST Junio C Hamano wrote:
>>
>> * jc/wt-status-state-cleanup (2018-09-07) 1 commit
>>  - WIP: roll wt_status_state into wt_status and populate in the
>>    collect phase
>>    (this branch uses ss/wt-status-committable.)
>>
>> * ss/wt-status-committable (2018-09-07) 4 commits
>>  - wt-status.c: set the committable flag in the collect phase
>>  - t7501: add test of "commit --dry-run --short"
>>  - wt-status: rename commitable to committable
>>  - wt-status.c: move has_unmerged earlier in the file
>>    (this branch is used by jc/wt-status-state-cleanup.)
>
> I note that the jc/wt-status-state-cleanup branch is a patch "for
> illustration purposes only" [1].
>
> I was about to update that patch to start dealing with the free()
> function calls, but noted you added the patch.  Do you want me to take
> that patch and continue on?  Or does someone else have something in
> progress?

I do not plan to.  In general, anything that is only in 'pu' is fair
game---when a better alternative appears, or a discussion leads to a
conclusion that a change is unneeded, they are replaced and/or
discarded.  Just think of them as a slightly better record of existence
than merely being in the list archive, nothing more.
Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
Jeff King wrote:
> On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:
>> I do not think we would mind terribly if we do not support
>> combinations like gzipped-and-then-chunked from day one.  An in-code
>> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
>> may not hurt, though.
>
> It's pretty common for Git to send gzip'd contents, so this might
> actually be necessary on day one.  However, it looks like we do so by
> setting the content-encoding header.

Correct, we haven't been using Transfer-Encoding for that.

> I really wonder if this topic is worth pursuing further without
> finding a real-world case that actually fails with the v2.19 code.
> I.e., is there actually a server that doesn't set CONTENT_LENGTH and
> really can't handle read-to-eof?  It's plausible to me, but it's also
> equally plausible that we'd be breaking some other case.

I wonder about the motivating IIS case.

The CGI spec says that CONTENT_LENGTH is set if and only if the message
has a message-body.  When discussing message-body, it says

	Request-Data = [ request-body ] [ extension-data ]

	[...]

	A request-body is supplied with the request if the CONTENT_LENGTH
	is not NULL.  The server MUST make at least that many bytes
	available for the script to read.  The server MAY signal an
	end-of-file condition after CONTENT_LENGTH bytes have been read or
	it MAY supply extension data.  Therefore, the script MUST NOT
	attempt to read more than CONTENT_LENGTH bytes, even if more data
	is available.

Does that mean that if CONTENT_LENGTH is not set, then we are
guaranteed to see EOF, because extension-data cannot be present?  If
so, then what we have in v2.19 (plus Max's test improvement that is in
"next") is already enough.

So I agree:

1. Junio, please eject this patch from "pu", since we don't have any
   need for it.

2. IIS users, please test v2.19 and let us know how it goes.  Do we
   have any scenarios that would use an empty POST (or other non-GET)
   request?

Thanks,
Jonathan
Re: 2.19.0 regression: leaving editor with empty commit message doesn't stop rebase [was: Re: [RFC PATCH v5 9/9] git-rebase: make --allow-empty-message the default]
Hi,

On Tue, Sep 11, 2018 at 7:42 PM SZEDER Gábor wrote:
> Lately I noticed that occasionally I ended up with an empty commit
> message after an interactive rebase... those empty commit messages are
> the consequence of a regression in v2.19.0, which bisects down to this
> patch.
>
> To reproduce the issue, just start an interactive rebase, choose a
> commit to reword, save, delete all the commit message, save, and BAM!
> there is the commit with the empty message.
>
> On Wed, Jun 27, 2018 at 12:23:19AM -0700, Elijah Newren wrote:
>> rebase backends currently behave differently with empty commit
>> messages, largely as a side-effect of the different underlying
>> commands on which they are based.  am-based rebases apply commits
>> with an empty commit
...
> I agree that this is an issue that should be addressed, and also agree
> that it's reasonable to accept an empty commit message, if the
> original commit already had an empty commit message.  (Though perhaps
> not completely silently, but with a warning?  Dunno.)
>
> However, this should only be the case for existing commit messages
> that are taken verbatim, but never in the case when the user is asked
> for a commit message.
...
>	# Please enter the commit message for your changes. Lines starting
>	# with '#' will be ignored, and an empty message aborts the commit.
...
> Let's suppose you start an interactive rebase, choose a commit to
> squash, save the instruction sheet, rebase fires up your editor, and
> then you notice that you mistakenly chose the wrong commit to squash.
> What do you do, how do you abort?

These all sound like reasonable arguments to me for differentiating
between commit messages that started empty (which I admit was what I
had in mind) vs. ones where we asked for user input and it came out
empty.

Are you cooking up a patch?  I might be able to find a little time to
dig into this tomorrow if no one else is working on it already by then.
It'll probably conflict a bunch with all the rewrite-rebase-in-C topics
in flight in pu. :-(