Re: Duplicate safecrlf warning for racily clean index entry
On Wed, 2018-02-21 at 08:53 +0100, Torsten Bögershausen wrote: > I don't hava a pointer, but what should happen ? > 2 warnings for 2 "git add" should be OK, I think. > > 1 warning is part of the optimization, that Git does to handle > hundrets and thousands of files efficciently. > > Is the 1/2 warning real live problem ? As I've suggested, my opinion is that the nondeterministic second warning can result in significant user confusion and should be avoided. (If it were always shown, I'd be less concerned.) We'll see what the decision-makers think. Matt
Re: Duplicate safecrlf warning for racily clean index entry
On Tue, 2018-02-20 at 08:42 -0500, Matt McCutchen wrote: > In either case, if "git update-index --refresh" (or "git status") is > run before "git add", then "git add" does not print the warning. On > the other hand, if line endings in the working tree file are changed, > then git shows the file as having an unstaged change, even though the > content that would be added to the index after CRLF conversion is > identical. So it seems that git remembers the pre-conversion file > content and uses it for "git update-index --refresh" and would just > need to use it for "git add" as well. On further testing, this analysis is wrong. What I was seeing is that if the size of the working tree file has changed, git reports an unstaged change. (I suppose that reporting an unstaged change in this case without checking whether the post-conversion content has changed may be an important optimization.) If the line endings are changed without changing the size or post-conversion content, then no unstaged change is reported. It does not appear that git saves the pre- conversion content. Thus, if it were possible to create a file that doesn't need a safecrlf warning, add it to the index, and then modify it so that it does need a safecrlf warning without changing the size or post-conversion content, we would have a bug where no warning is shown in the case where "git status" is run before the second "git add". I believe this bug can't occur in the particular case of CRLF conversion without other filters because the file that doesn't need a safecrlf warning has a unique minimum (LF) or maximum (CRLF) size, though I presume it could occur with custom filters. My proposal would then be that "git add" should not show a safecrlf warning if the size and post-conversion content haven't changed; it would merely bring "git add" to parity with the potential bug in the "git status" case. Matt
Duplicate safecrlf warning for racily clean index entry
I noticed that if a file subject to a safecrlf warning is added to the index in the same second that it is created, resulting in a "racily clean" index entry, then a subsequent "git add" command prints another safecrlf warning. I reproduced this on the current "next" (499d7c4f91). The procedure: $ git init $ git config core.autocrlf true $ echo foo >file1 && git add file1 && git add file1 warning: LF will be replaced by CRLF in file1. The file will have its original line endings in your working directory. warning: LF will be replaced by CRLF in file1. The file will have its original line endings in your working directory. $ echo bar >file2 && sleep 1 && git add file2 && git add file2 warning: LF will be replaced by CRLF in file2. The file will have its original line endings in your working directory. This came up when I ran the test suite for Braid on Windows (https://github.com/cristibalan/braid/issues/77). The phenomenon actually seems to be more general: touching the file causes the next "git add" to print a safecrlf warning, suggesting that the warning occurs whenever the index entry is dirty. One could argue that a new warning is reasonable after touching the file, but it seems clear that "racy cleanliness" is an implementation detail that shouldn't have user-visible nondeterministic effects. In either case, if "git update-index --refresh" (or "git status") is run before "git add", then "git add" does not print the warning. On the other hand, if line endings in the working tree file are changed, then git shows the file as having an unstaged change, even though the content that would be added to the index after CRLF conversion is identical. So it seems that git remembers the pre-conversion file content and uses it for "git update-index --refresh" and would just need to use it for "git add" as well. Thoughts about the proposed change? Does someone want to work on it or give me a pointer to where to get started? Thanks, Matt
Tools that do an automatic fetch defeat "git push --force-with-lease"
When I'm rewriting history, "git push --force-with-lease" is a nice safeguard compared to "git push --force", but it still assumes the remote-tracking ref gives the old state the user wants to overwrite. Tools that do an implicit fetch, assuming it to be a safe operation, may break this assumption. In the worst case, Visual Studio Code does an automatic fetch every 3 minutes by default [1], making --force-with-lease pretty much reduce to --force. For a safer workflow, "git push" would check against a separate "old" ref that isn't updated by "git fetch", but is updated by "git push" the same way the remote-tracking ref is and maybe also by commands that update the local branch to take into account remote changes (I'm not sure what reasonable scenarios there are, if any). Has there been any work on this? I can write a wrapper script for the simple case of "git push" of a single branch to the same branch on a remote, which is pretty much all I use right now, but a native implementation would support all of the command-line usage forms, which would be nice. Thanks for reading! [1] https://github.com/Microsoft/vscode/blob/535a3de60023c81d75d0eac22044284f07dbcddf/extensions/git/src/autofetch.ts Matt
[PATCH v2 3/3] fetch-pack: add specific error for fetching an unadvertised object
Enhance filter_refs (which decides whether a request for an unadvertised object should be sent to the server) to record a new match status on the "struct ref" when a request is not allowed, and have report_unmatched_refs check for this status and print a special error message, "Server does not allow request for unadvertised object". Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- fetch-pack.c | 42 +++--- remote.h | 9 +++-- t/t5516-fetch-push.sh | 2 +- 3 files changed, 35 insertions(+), 18 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index 7c8d44c..f12bfcd 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -578,7 +578,7 @@ static void filter_refs(struct fetch_pack_args *args, break; /* definitely do not have it */ else if (cmp == 0) { keep = 1; /* definitely have it */ - sought[i]->matched = 1; + sought[i]->match_status = REF_MATCHED; } i++; } @@ -598,22 +598,24 @@ static void filter_refs(struct fetch_pack_args *args, } /* Append unmatched requests to the list */ - if ((allow_unadvertised_object_request & - (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { - for (i = 0; i < nr_sought; i++) { - unsigned char sha1[20]; + for (i = 0; i < nr_sought; i++) { + unsigned char sha1[20]; - ref = sought[i]; - if (ref->matched) - continue; - if (get_sha1_hex(ref->name, sha1) || - ref->name[40] != '\0' || - hashcmp(sha1, ref->old_oid.hash)) - continue; + ref = sought[i]; + if (ref->match_status != REF_NOT_MATCHED) + continue; + if (get_sha1_hex(ref->name, sha1) || + ref->name[40] != '\0' || + hashcmp(sha1, ref->old_oid.hash)) + continue; - ref->matched = 1; + if ((allow_unadvertised_object_request & + (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { + ref->match_status = REF_MATCHED; *newtail = copy_ref(ref); newtail = &(*newtail)->next; + } else { + ref->match_status = REF_UNADVERTISED_NOT_ALLOWED; } } *refs = newlist; @@ -1100,9 +1102,19 @@ int report_unmatched_refs(struct ref **sought, int nr_sought) int i, ret = 0; for (i = 0; i < nr_sought; i++) { - if (!sought[i] || sought[i]->matched) + if (!sought[i]) continue; - error(_("no such remote ref %s"), sought[i]->name); + switch (sought[i]->match_status) { + case REF_MATCHED: + continue; + case REF_NOT_MATCHED: + error(_("no such remote ref %s"), sought[i]->name); + break; + case REF_UNADVERTISED_NOT_ALLOWED: + error(_("Server does not allow request for unadvertised object %s"), + sought[i]->name); + break; + } ret = 1; } return ret; diff --git a/remote.h b/remote.h index 9248811..0b9d8c4 100644 --- a/remote.h +++ b/remote.h @@ -89,8 +89,13 @@ struct ref { force:1, forced_update:1, expect_old_sha1:1, - deletion:1, - matched:1; + deletion:1; + + enum { + REF_NOT_MATCHED = 0, /* initial value */ + REF_MATCHED, + REF_UNADVERTISED_NOT_ALLOWED + } match_status; /* * Order is important here, as we write to FETCH_HEAD diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 0d13a45..78f3b8e 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1099,7 +1099,7 @@ test_expect_success 'fetch exact SHA1' ' # fetching the hidden object should fail by default test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy 2>err && - test_i18ngrep "no such remote ref" err && + test_i18ngrep "Server does not allow request for unadvertised object" err && test_must_fail git rev-parse --verify refs/heads/copy && # the server side can allow it to succeed -- 2.9.3
[PATCH v2 2/3] fetch_refs_via_pack: call report_unmatched_refs
"git fetch" currently doesn't bother to check that it got all refs it sought, because the common case of requesting a nonexistent ref triggers a die() in get_fetch_map. However, there's at least one case that slipped through: "git fetch REMOTE SHA1" if the server doesn't allow requests for unadvertised objects. Make fetch_refs_via_pack (which is on the "git fetch" code path) call report_unmatched_refs so that we at least get an error message in that case. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- t/t5516-fetch-push.sh | 3 ++- transport.c | 14 +- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 26b2caf..0d13a45 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1098,7 +1098,8 @@ test_expect_success 'fetch exact SHA1' ' test_must_fail git cat-file -t $the_commit && # fetching the hidden object should fail by default - test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy && + test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy 2>err && + test_i18ngrep "no such remote ref" err && test_must_fail git rev-parse --verify refs/heads/copy && # the server side can allow it to succeed diff --git a/transport.c b/transport.c index 04e5d66..c377907 100644 --- a/transport.c +++ b/transport.c @@ -204,6 +204,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus static int fetch_refs_via_pack(struct transport *transport, int nr_heads, struct ref **to_fetch) { + int ret = 0; struct git_transport_data *data = transport->data; struct ref *refs; char *dest = xstrdup(transport->url); @@ -241,19 +242,22 @@ static int fetch_refs_via_pack(struct transport *transport, >pack_lockfile); close(data->fd[0]); close(data->fd[1]); - if (finish_connect(data->conn)) { - free_refs(refs); - refs = NULL; - } + if (finish_connect(data->conn)) + ret = -1; data->conn = NULL; data->got_remote_heads = 0; data->options.self_contained_and_connected = args.self_contained_and_connected; + if (refs == NULL) + ret = -1; + if (report_unmatched_refs(to_fetch, nr_heads)) + ret = -1; + free_refs(refs_tmp); free_refs(refs); free(dest); - return (refs ? 0 : -1); + return ret; } static int push_had_errors(struct ref *ref) -- 2.9.3
[PATCH v2 1/3] fetch-pack: move code to report unmatched refs to a function
We're preparing to reuse this code in transport.c for "git fetch". While I'm here, internationalize the existing error message. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- builtin/fetch-pack.c | 7 +-- fetch-pack.c | 13 + fetch-pack.h | 6 ++ t/t5500-fetch-pack.sh | 6 +++--- 4 files changed, 23 insertions(+), 9 deletions(-) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index cfe9e44..2a1c1c2 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -219,12 +219,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) * remote no-such-ref' would silently succeed without issuing * an error. */ - for (i = 0; i < nr_sought; i++) { - if (!sought[i] || sought[i]->matched) - continue; - error("no such remote ref %s", sought[i]->name); - ret = 1; - } + ret |= report_unmatched_refs(sought, nr_sought); while (ref) { printf("%s %s\n", diff --git a/fetch-pack.c b/fetch-pack.c index 601f077..7c8d44c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1094,3 +1094,16 @@ struct ref *fetch_pack(struct fetch_pack_args *args, clear_shallow_info(); return ref_cpy; } + +int report_unmatched_refs(struct ref **sought, int nr_sought) +{ + int i, ret = 0; + + for (i = 0; i < nr_sought; i++) { + if (!sought[i] || sought[i]->matched) + continue; + error(_("no such remote ref %s"), sought[i]->name); + ret = 1; + } + return ret; +} diff --git a/fetch-pack.h b/fetch-pack.h index c912e3d..a2d46e6 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -45,4 +45,10 @@ struct ref *fetch_pack(struct fetch_pack_args *args, struct sha1_array *shallow, char **pack_lockfile); +/* + * Print an appropriate error message for each sought ref that wasn't + * matched. Return 0 if all sought refs were matched, otherwise 1. + */ +int report_unmatched_refs(struct ref **sought, int nr_sought); + #endif diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 505e1b4..b5865b3 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -484,7 +484,7 @@ test_expect_success 'test lonely missing ref' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/xyzzy ) >/dev/null 2>error-m && - test_cmp expect-error error-m + test_i18ncmp expect-error error-m ' test_expect_success 'test missing ref after existing' ' @@ -492,7 +492,7 @@ test_expect_success 'test missing ref after existing' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/A refs/heads/xyzzy ) >/dev/null 2>error-em && - test_cmp expect-error error-em + test_i18ncmp expect-error error-em ' test_expect_success 'test missing ref before existing' ' @@ -500,7 +500,7 @@ test_expect_success 'test missing ref before existing' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/xyzzy refs/heads/A ) >/dev/null 2>error-me && - test_cmp expect-error error-me + test_i18ncmp expect-error error-me ' test_expect_success 'test --all, --depth, and explicit head' ' -- 2.9.3
Re: [PATCH 1/3] fetch-pack: move code to report unmatched refs to a function
On Wed, 2017-02-22 at 09:11 -0800, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > We're preparing to reuse this code in transport.c for "git fetch". > > > > While I'm here, internationalize the existing error message. > > --- > > Sounds good. Please just say it is OK for me to forge your sign-off > ;-) Oops. Given the other issue below, I'll just regenerate the patch series. > > diff --git a/fetch-pack.h b/fetch-pack.h > > index c912e3d..fd4d80e 100644 > > --- a/fetch-pack.h > > +++ b/fetch-pack.h > > @@ -45,4 +45,13 @@ struct ref *fetch_pack(struct fetch_pack_args > > *args, > > struct sha1_array *shallow, > > char **pack_lockfile); > > > > +/* > > + * Print an appropriate error message for each sought ref that > > wasn't > > + * matched. Return 0 if all sought refs were matched, otherwise > > 1. > > + * > > + * The type of "sought" should be "const struct ref *const *" but > > for > > + * http://stackoverflow.com/questions/5055655/double-pointer-const > > -correctness-warnings-in-c . > > + */ > > This is an unfinished sentence, but I wonder if we even need to have > it here? I'd be surprised if this function was unique in the > codebase that takes an array pointer whose type is looser than > necessary because of well-known language rules. You're probably right. I'm in the habit of documenting things that were unknown to me, but I'll take your word for what's well-known to the average git developer. I'll remove the remark. Matt
[PATCH 1/3] fetch-pack: move code to report unmatched refs to a function
We're preparing to reuse this code in transport.c for "git fetch". While I'm here, internationalize the existing error message. --- builtin/fetch-pack.c | 7 +-- fetch-pack.c | 13 + fetch-pack.h | 9 + t/t5500-fetch-pack.sh | 6 +++--- 4 files changed, 26 insertions(+), 9 deletions(-) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index cfe9e44..2a1c1c2 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -219,12 +219,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) * remote no-such-ref' would silently succeed without issuing * an error. */ - for (i = 0; i < nr_sought; i++) { - if (!sought[i] || sought[i]->matched) - continue; - error("no such remote ref %s", sought[i]->name); - ret = 1; - } + ret |= report_unmatched_refs(sought, nr_sought); while (ref) { printf("%s %s\n", diff --git a/fetch-pack.c b/fetch-pack.c index 601f077..7c8d44c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1094,3 +1094,16 @@ struct ref *fetch_pack(struct fetch_pack_args *args, clear_shallow_info(); return ref_cpy; } + +int report_unmatched_refs(struct ref **sought, int nr_sought) +{ + int i, ret = 0; + + for (i = 0; i < nr_sought; i++) { + if (!sought[i] || sought[i]->matched) + continue; + error(_("no such remote ref %s"), sought[i]->name); + ret = 1; + } + return ret; +} diff --git a/fetch-pack.h b/fetch-pack.h index c912e3d..fd4d80e 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -45,4 +45,13 @@ struct ref *fetch_pack(struct fetch_pack_args *args, struct sha1_array *shallow, char **pack_lockfile); +/* + * Print an appropriate error message for each sought ref that wasn't + * matched. Return 0 if all sought refs were matched, otherwise 1. + * + * The type of "sought" should be "const struct ref *const *" but for + * http://stackoverflow.com/questions/5055655/double-pointer-const-correctness-warnings-in-c . + */ +int report_unmatched_refs(struct ref **sought, int nr_sought); + #endif diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 505e1b4..b5865b3 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -484,7 +484,7 @@ test_expect_success 'test lonely missing ref' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/xyzzy ) >/dev/null 2>error-m && - test_cmp expect-error error-m + test_i18ncmp expect-error error-m ' test_expect_success 'test missing ref after existing' ' @@ -492,7 +492,7 @@ test_expect_success 'test missing ref after existing' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/A refs/heads/xyzzy ) >/dev/null 2>error-em && - test_cmp expect-error error-em + test_i18ncmp expect-error error-em ' test_expect_success 'test missing ref before existing' ' @@ -500,7 +500,7 @@ test_expect_success 'test missing ref before existing' ' cd client && test_must_fail git fetch-pack --no-progress .. refs/heads/xyzzy refs/heads/A ) >/dev/null 2>error-me && - test_cmp expect-error error-me + test_i18ncmp expect-error error-me ' test_expect_success 'test --all, --depth, and explicit head' ' -- 2.9.3
[PATCH 3/3] fetch-pack: add specific error for fetching an unadvertised object
Enhance filter_refs (which decides whether a request for an unadvertised object should be sent to the server) to record a new match status on the "struct ref" when a request is not allowed, and have report_unmatched_refs check for this status and print a special error message, "Server does not allow request for unadvertised object". --- fetch-pack.c | 42 +++--- remote.h | 9 +++-- t/t5516-fetch-push.sh | 2 +- 3 files changed, 35 insertions(+), 18 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index 7c8d44c..f12bfcd 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -578,7 +578,7 @@ static void filter_refs(struct fetch_pack_args *args, break; /* definitely do not have it */ else if (cmp == 0) { keep = 1; /* definitely have it */ - sought[i]->matched = 1; + sought[i]->match_status = REF_MATCHED; } i++; } @@ -598,22 +598,24 @@ static void filter_refs(struct fetch_pack_args *args, } /* Append unmatched requests to the list */ - if ((allow_unadvertised_object_request & - (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { - for (i = 0; i < nr_sought; i++) { - unsigned char sha1[20]; + for (i = 0; i < nr_sought; i++) { + unsigned char sha1[20]; - ref = sought[i]; - if (ref->matched) - continue; - if (get_sha1_hex(ref->name, sha1) || - ref->name[40] != '\0' || - hashcmp(sha1, ref->old_oid.hash)) - continue; + ref = sought[i]; + if (ref->match_status != REF_NOT_MATCHED) + continue; + if (get_sha1_hex(ref->name, sha1) || + ref->name[40] != '\0' || + hashcmp(sha1, ref->old_oid.hash)) + continue; - ref->matched = 1; + if ((allow_unadvertised_object_request & + (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { + ref->match_status = REF_MATCHED; *newtail = copy_ref(ref); newtail = &(*newtail)->next; + } else { + ref->match_status = REF_UNADVERTISED_NOT_ALLOWED; } } *refs = newlist; @@ -1100,9 +1102,19 @@ int report_unmatched_refs(struct ref **sought, int nr_sought) int i, ret = 0; for (i = 0; i < nr_sought; i++) { - if (!sought[i] || sought[i]->matched) + if (!sought[i]) continue; - error(_("no such remote ref %s"), sought[i]->name); + switch (sought[i]->match_status) { + case REF_MATCHED: + continue; + case REF_NOT_MATCHED: + error(_("no such remote ref %s"), sought[i]->name); + break; + case REF_UNADVERTISED_NOT_ALLOWED: + error(_("Server does not allow request for unadvertised object %s"), + sought[i]->name); + break; + } ret = 1; } return ret; diff --git a/remote.h b/remote.h index 9248811..0b9d8c4 100644 --- a/remote.h +++ b/remote.h @@ -89,8 +89,13 @@ struct ref { force:1, forced_update:1, expect_old_sha1:1, - deletion:1, - matched:1; + deletion:1; + + enum { + REF_NOT_MATCHED = 0, /* initial value */ + REF_MATCHED, + REF_UNADVERTISED_NOT_ALLOWED + } match_status; /* * Order is important here, as we write to FETCH_HEAD diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 0d13a45..78f3b8e 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1099,7 +1099,7 @@ test_expect_success 'fetch exact SHA1' ' # fetching the hidden object should fail by default test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy 2>err && - test_i18ngrep "no such remote ref" err && + test_i18ngrep "Server does not allow request for unadvertised object" err && test_must_fail git rev-parse --verify refs/heads/copy && # the server side can allow it to succeed -- 2.9.3
[PATCH 2/3] fetch_refs_via_pack: call report_unmatched_refs
"git fetch" currently doesn't bother to check that it got all refs it sought, because the common case of requesting a nonexistent ref triggers a die() in get_fetch_map. However, there's at least one case that slipped through: "git fetch REMOTE SHA1" if the server doesn't allow requests for unadvertised objects. Make fetch_refs_via_pack (which is on the "git fetch" code path) call report_unmatched_refs so that we at least get an error message in that case. --- t/t5516-fetch-push.sh | 3 ++- transport.c | 14 +- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 26b2caf..0d13a45 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1098,7 +1098,8 @@ test_expect_success 'fetch exact SHA1' ' test_must_fail git cat-file -t $the_commit && # fetching the hidden object should fail by default - test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy && + test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy 2>err && + test_i18ngrep "no such remote ref" err && test_must_fail git rev-parse --verify refs/heads/copy && # the server side can allow it to succeed diff --git a/transport.c b/transport.c index 04e5d66..c377907 100644 --- a/transport.c +++ b/transport.c @@ -204,6 +204,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus static int fetch_refs_via_pack(struct transport *transport, int nr_heads, struct ref **to_fetch) { + int ret = 0; struct git_transport_data *data = transport->data; struct ref *refs; char *dest = xstrdup(transport->url); @@ -241,19 +242,22 @@ static int fetch_refs_via_pack(struct transport *transport, >pack_lockfile); close(data->fd[0]); close(data->fd[1]); - if (finish_connect(data->conn)) { - free_refs(refs); - refs = NULL; - } + if (finish_connect(data->conn)) + ret = -1; data->conn = NULL; data->got_remote_heads = 0; data->options.self_contained_and_connected = args.self_contained_and_connected; + if (refs == NULL) + ret = -1; + if (report_unmatched_refs(to_fetch, nr_heads)) + ret = -1; + free_refs(refs_tmp); free_refs(refs); free(dest); - return (refs ? 0 : -1); + return ret; } static int push_had_errors(struct ref *ref) -- 2.9.3
Re: [PATCH] fetch: print an error when declining to request an unadvertised object
On Mon, 2017-02-20 at 22:36 -0800, Junio C Hamano wrote: > Hmph, I would have expected this to be done as a three-patch series, > > * move the loop at the end of cmd_fetch_pack() to a separate helper > function report_unmatched_refs() and call it; > > * add a call to report_unmatched_refs() to the transport layer; > > * enhance report_unmatched_refs() by introducing match_status > field and adding new code to filter_refs() to diagnose other > kinds of errors. Sure. > The result looks reasonable from a cursory read, though. > > Thanks for following it up to the completion. This remark led me to believe you were satisfied with the single patch, but the last "What's cooking in git.git" mail says "Expecting a split series?". Anyway, I made a split series and will send it in a moment. I don't know if all the commit messages include exactly the information you want; hopefully you're happy to edit them as desired. Compared to the previous patch, there is one fix in the net result: fixing t5500-fetch- pack.sh to deal with the internationalized "no such remote ref" message. Matt
[PATCH] fetch: print an error when declining to request an unadvertised object
Currently "git fetch REMOTE SHA1" silently exits 1 if the server doesn't allow requests for unadvertised objects by sha1. The more common case of requesting a nonexistent ref normally triggers a die() in get_fetch_map, so "git fetch" wasn't bothering to check after the fetch that it got all the refs it sought, like "git fetch-pack" does near the end of cmd_fetch_pack. Move the code from cmd_fetch_pack to a new function, report_unmatched_refs, that is called by fetch_refs_via_pack as part of "git fetch". Also, change filter_refs (which checks whether a request for an unadvertised object should be sent to the server) to set a new match status on the "struct ref" when the request is not allowed, and have report_unmatched_refs check for this status and print a special error message, "Server does not allow request for unadvertised object". Finally, add a simple test case for "git fetch REMOTE SHA1". Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- builtin/fetch-pack.c | 7 +-- fetch-pack.c | 51 ++- fetch-pack.h | 9 + remote.h | 9 +++-- t/t5516-fetch-push.sh | 3 ++- transport.c | 14 +- 6 files changed, 66 insertions(+), 27 deletions(-) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index cfe9e44..2a1c1c2 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -219,12 +219,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) * remote no-such-ref' would silently succeed without issuing * an error. */ - for (i = 0; i < nr_sought; i++) { - if (!sought[i] || sought[i]->matched) - continue; - error("no such remote ref %s", sought[i]->name); - ret = 1; - } + ret |= report_unmatched_refs(sought, nr_sought); while (ref) { printf("%s %s\n", diff --git a/fetch-pack.c b/fetch-pack.c index 601f077..f12bfcd 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -578,7 +578,7 @@ static void filter_refs(struct fetch_pack_args *args, break; /* definitely do not have it */ else if (cmp == 0) { keep = 1; /* definitely have it */ - sought[i]->matched = 1; + sought[i]->match_status = REF_MATCHED; } i++; } @@ -598,22 +598,24 @@ static void filter_refs(struct fetch_pack_args *args, } /* Append unmatched requests to the list */ - if ((allow_unadvertised_object_request & - (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { - for (i = 0; i < nr_sought; i++) { - unsigned char sha1[20]; + for (i = 0; i < nr_sought; i++) { + unsigned char sha1[20]; - ref = sought[i]; - if (ref->matched) - continue; - if (get_sha1_hex(ref->name, sha1) || - ref->name[40] != '\0' || - hashcmp(sha1, ref->old_oid.hash)) - continue; + ref = sought[i]; + if (ref->match_status != REF_NOT_MATCHED) + continue; + if (get_sha1_hex(ref->name, sha1) || + ref->name[40] != '\0' || + hashcmp(sha1, ref->old_oid.hash)) + continue; - ref->matched = 1; + if ((allow_unadvertised_object_request & + (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { + ref->match_status = REF_MATCHED; *newtail = copy_ref(ref); newtail = &(*newtail)->next; + } else { + ref->match_status = REF_UNADVERTISED_NOT_ALLOWED; } } *refs = newlist; @@ -1094,3 +1096,26 @@ struct ref *fetch_pack(struct fetch_pack_args *args, clear_shallow_info(); return ref_cpy; } + +int report_unmatched_refs(struct ref **sought, int nr_sought) +{ + int i, ret = 0; + + for (i = 0; i < nr_sought; i++) { + if (!sought[i]) + continue; + switch (sought[i]->match_status) { + case REF_MATCHED: + continue; + case REF_NOT_MATCHED: + error(_("no such remote ref %s"), sought[i]->name); + break; + case REF_UNADVERTISED_
Re: [PATCH] fetch: print an error when declining to request an unadvertised object
On Sun, 2017-02-12 at 15:49 -0800, Junio C Hamano wrote: > The fact that we have the above two choices tells me that a two-step > approach may be an appropriate approach. [...] > Even if you did only the first step, as long as the second step can > be done without reverting what the first step did [*4*] by somebody > who cares the "specific error" deeply enough, I am OK with that. Of > course if you did both steps, that is fine by me as well ;-) I appreciate the flexibility, but now that I've spent the time to understand all the code involved, it would be a pity not to go for the complete solution. New patch coming. Matt
Re: [PATCH] fetch: print an error when declining to request an unadvertised object
On Fri, 2017-02-10 at 10:36 -0800, Junio C Hamano wrote: > > > There is this piece of code near the end of builtin/fetch-pack.c: > > [...] > > that happens before the command shows the list of fetched refs, and > this code is prepared to inspect what happend to the requests it (in > response to the user request) made to the underlying fetch > machinery, and issue the error message. > If you change your command line to "git fetch-pack REMOTE SHA1", you > do see an error from the above. Yes, "error: no such remote ref ", which at least makes clear that the operation didn't work, though it would be nice to give a more specific error message. > This all happens in transport.c::fetch_refs_via_pack(). > I think that function is a much better place to error or die than > filter_refs(). I confirmed that checking the sought refs there works. However, in filter_refs, it's easy to give a more specific error message that the server doesn't allow requests for unadvertised objects, and that code works for "git fetch-pack" too. To do the same in fetch_refs_via_pack, we'd have to duplicate a few lines of code from filter_refs and expose the allow_unadvertised_object_request variable, or just set a flag on the "struct ref" in filter_refs and check it in fetch_refs_via_pack. What do you think? Do you not care about having a more specific error, in which case I can copy the code from builtin/fetch-pack.c to fetch_refs_via_pack? Or shall I add code to filter_refs to set a flag and add code to builtin/fetch-pack.c and fetch_refs_via_pack to check the flag? Or what? Matt
[PATCH] fetch: print an error when declining to request an unadvertised object
Currently "git fetch REMOTE SHA1" silently exits 1 if the server doesn't allow requests for unadvertised objects by sha1. Change it to print a meaningful error message. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- The fetch code looks unbelievably complicated to me and I don't understand all the cases that can arise. Hopefully this patch is an acceptable solution to the problem. fetch-pack.c | 31 --- t/t5516-fetch-push.sh | 3 ++- 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index 601f077..117874c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -598,23 +598,24 @@ static void filter_refs(struct fetch_pack_args *args, } /* Append unmatched requests to the list */ - if ((allow_unadvertised_object_request & - (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) { - for (i = 0; i < nr_sought; i++) { - unsigned char sha1[20]; + for (i = 0; i < nr_sought; i++) { + unsigned char sha1[20]; - ref = sought[i]; - if (ref->matched) - continue; - if (get_sha1_hex(ref->name, sha1) || - ref->name[40] != '\0' || - hashcmp(sha1, ref->old_oid.hash)) - continue; + ref = sought[i]; + if (ref->matched) + continue; + if (get_sha1_hex(ref->name, sha1) || + ref->name[40] != '\0' || + hashcmp(sha1, ref->old_oid.hash)) + continue; - ref->matched = 1; - *newtail = copy_ref(ref); - newtail = &(*newtail)->next; - } + if (!(allow_unadvertised_object_request & + (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) + die(_("Server does not allow request for unadvertised object %s"), ref->name); + + ref->matched = 1; + *newtail = copy_ref(ref); + newtail = &(*newtail)->next; } *refs = newlist; } diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 0fc5a7c..177897e 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1098,7 +1098,8 @@ test_expect_success 'fetch exact SHA1' ' test_must_fail git cat-file -t $the_commit && # fetching the hidden object should fail by default - test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy && + test_must_fail git fetch -v ../testrepo $the_commit:refs/heads/copy 2>err && + test_i18ngrep "Server does not allow request for unadvertised object" err && test_must_fail git rev-parse --verify refs/heads/copy && # the server side can allow it to succeed -- 2.9.3
Re: [PATCH] merge-recursive: make "CONFLICT (rename/delete)" message show both paths
On Wed, 2017-02-01 at 12:56 -0800, Junio C Hamano wrote: > Junio C Hamano <gits...@pobox.com> writes: > > > Matt McCutchen <m...@mattmccutchen.net> writes: > > ... > > > Please check that my refactoring is indeed correct! All the > > > existing tests pass > > > for me, but the existing test coverage of these conflict messages > > > looks poor. > > > > This unfortunately is doing a bit too many things at once from that > > point of view. I need to reserve a solid quiet 20-minutes without > > distraction to check it, which I am hoping to do tonight. > > Let me make sure if I understood your changes correctly: > > * conflict_rename_delete() knew which one is renamed and which one > is deleted (even though the deleted one was called "other"), but > because in the original code handle_change_delete() wants to > always see tree A first and then tree B in its parameter list, > the original code swapped a/b before calling it. In the original > code, handle_change_delete() needed to figure out which one is > the deleted one by looking at a_oid or b_oid. > > * In the updated code, the knowledge of which branch survives and > which branch is deleted is passed from the caller to > handle_change_delete(), which no longer needs to figure out by > looking at a_oid/b_oid. The updated API only passes the oid for > surviving branch (as deleted one would have been 0{40} anyway). > > * In the updated code, handle_change_delete() is told the names of > both branches (the one that survives and the other that was > deleted). It no longer has to switch between o->branch[12] > depending on the NULLness of a_oid/b_oid; it knows both names and > which one is which. > > * handle_modify_delete() also calls handle_change_delete(). Unlike > conflict_rename_delete(), it is not told by its caller which > branch keeps the path and which branch deletes the path, and > instead relies on handle_change_delete() to figure it out. > Because of the above change to the API, now it needs to sort it > out before calling handle_change_delete(). > > It all makes sense to me. > > The single call to update_file() that appears near the end of > handle_change_delete() in the updated code corresponds to calls to > the same function in 3 among 4 codepaths in the function in the > original code. It is a bit tricky to reason about, though. > > In the original code, update_file() was omitted when we didn't have > to come up with a unique alternate filename and the one that is left > is a_oid (i.e. our side). The way to tell if we did not come up > with a unique alternate filename used to be to see the "renamed" > variable but now it is the NULL-ness of "alt_path". "alt_path" is the same variable that used to be "renamed". I just renamed it to be less confusing. > And the way to > tell if the side that is left is ours, we check to see o->branch1 > is the change_branch, not delete_branch. > > So the condition to guard the call to update_file() also looks > correct to me. All of the above matches my understanding. Would it have saved you time if I had included some of this explanation in the patch "cover letter"? Matt
[PATCH] merge-recursive: make "CONFLICT (rename/delete)" message show both paths
The current message printed by "git merge-recursive" for a rename/delete conflict is like this: CONFLICT (rename/delete): new-path deleted in HEAD and renamed in other-branch. Version other-branch of new-path left in tree. To be more helpful, the message should show both paths of the rename and state that the deletion occurred at the old path, not the new path. So change the message to the following format: CONFLICT (rename/delete): old-path deleted in HEAD and renamed to new-path in other-branch. Version other-branch of new-path left in tree. Since this doubles the number of cases in handle_change_delete (modify vs. rename), refactor the code to halve the number of cases again by merging the cases where o->branch1 has the change and o->branch2 has the delete with the cases that are the other way around. Also add a simple test of the new conflict message. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- This came up at: https://github.com/cristibalan/braid/issues/41#issuecomment-275826716 Please check that my refactoring is indeed correct! All the existing tests pass for me, but the existing test coverage of these conflict messages looks poor. merge-recursive.c | 117 ++--- t/t6045-merge-rename-delete.sh | 23 2 files changed, 86 insertions(+), 54 deletions(-) create mode 100755 t/t6045-merge-rename-delete.sh diff --git a/merge-recursive.c b/merge-recursive.c index d327209..e8fce10 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1061,16 +1061,20 @@ static int merge_file_one(struct merge_options *o, } static int handle_change_delete(struct merge_options *o, -const char *path, +const char *path, const char *old_path, const struct object_id *o_oid, int o_mode, -const struct object_id *a_oid, int a_mode, -const struct object_id *b_oid, int b_mode, +const struct object_id *changed_oid, +int changed_mode, +const char *change_branch, +const char *delete_branch, const char *change, const char *change_past) { - char *renamed = NULL; + char *alt_path = NULL; + const char *update_path = path; int ret = 0; + if (dir_in_way(path, !o->call_depth, 0)) { - renamed = unique_path(o, path, a_oid ? o->branch1 : o->branch2); + update_path = alt_path = unique_path(o, path, change_branch); } if (o->call_depth) { @@ -1081,43 +1085,43 @@ static int handle_change_delete(struct merge_options *o, */ ret = remove_file_from_cache(path); if (!ret) - ret = update_file(o, 0, o_oid, o_mode, - renamed ? renamed : path); - } else if (!a_oid) { - if (!renamed) { - output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s " - "and %s in %s. Version %s of %s left in tree."), - change, path, o->branch1, change_past, - o->branch2, o->branch2, path); - ret = update_file(o, 0, b_oid, b_mode, path); - } else { - output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s " - "and %s in %s. Version %s of %s left in tree at %s."), - change, path, o->branch1, change_past, - o->branch2, o->branch2, path, renamed); - ret = update_file(o, 0, b_oid, b_mode, renamed); - } + ret = update_file(o, 0, o_oid, o_mode, update_path); } else { - if (!renamed) { - output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s " - "and %s in %s. Version %s of %s left in tree."), - change, path, o->branch2, change_past, - o->branch1, o->branch1, path); + if (!alt_path) { + if (!old_path) { + output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s " + "and %s in %s. Version %s of %s left in tree."), + change, path, delete_branch, change_past, + change_branch, change_branch, path); + } else { + output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s "
[PATCH] t0001: don't let a default ACL interfere with the umask test
The "init creates a new deep directory (umask vs. shared)" test expects the permissions of newly created files to be based on the umask, which fails if a default ACL is inherited from the working tree for git. So attempt to remove a default ACL if there is one. Same idea as 8ed0a740dd42bd0724aebed6e3b07c4ea2a2d5e8. (I guess I'm the only one who ever runs the test suite with a default ACL set.) Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- t/t0001-init.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/t/t0001-init.sh b/t/t0001-init.sh index b8fc588..e424de5 100755 --- a/t/t0001-init.sh +++ b/t/t0001-init.sh @@ -258,6 +258,9 @@ test_expect_success POSIXPERM 'init creates a new deep directory (umask vs. shar ( # Leading directories should honor umask while # the repository itself should follow "shared" + mkdir newdir && + # Remove a default ACL if possible. + (setfacl -k newdir 2>/dev/null || true) && umask 002 && git init --bare --shared=0660 newdir/a/b/c && test_path_is_dir newdir/a/b/c/refs && -- 2.9.3
"git diff --ignore-space-change --stat" lists files with only whitespace differences as "changed"
A bug report: I noticed that "git diff --ignore-space-change --stat" lists files with only whitespace differences as having changed with 0 differing lines. This is inconsistent with the behavior without -- stat, which doesn't list such files at all. (Same behavior with all the --ignore*space* flags.) I can reproduce this with the current "next", af746e4. Quick test case: echo ' ' >test1 && echo ' ' >test2 && git diff --stat --no-index --ignore-space-change test1 test2 This caused me some inconvenience in the following scenario: I was reading a commit diff that had a bulk license change in all files combined with code changes. I attempted to revert the bulk license change locally using "sed" to more easily read the code diff, but my reversion left some whitespace diffs where the original files had inconsistent whitespace. So the diffstat after my reversion was cluttered with these "0" entries. Regards, Matt
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:40 -0500, Jeff King wrote: > On Tue, Nov 15, 2016 at 12:33:04PM -0500, Matt McCutchen wrote: > > > > > On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > > > > > > - when an object write is optimized out because we already have the > > > object, git will update the mtime on the file (loose object or > > > packfile) to freshen it > > > > FWIW, I am not seeing this happen when I do "git read-tree --prefix" > > followed by "git write-tree" using the current master (3ab2281). See > > the attached test script. > > The optimization I'm thinking about is the one from write_sha1_file(), > which learned to freshen in 33d4221c7 (write_sha1_file: freshen existing > objects, 2014-10-15). > > I suspect the issue is that read-tree populates the cache-tree index > extension, and then write-tree omits the object write before it even > gets to write_sha1_file(). The solution is that it should probably be > calling one of the freshen() functions (possibly just replacing > has_sha1_file() with check_and_freshen(), but I haven't looked). > > I'd definitely welcome patches in this area. Cool, it's nice to have an idea of what's going on. I don't think I'm going to try to fix it myself though. By the way, thanks for the fast response to my original question! Matt
[PATCH] git-gc.txt: expand discussion of races with other processes
In general, "git gc" may delete objects that another concurrent process is using but hasn't created a reference to. Git has some mitigations, but they fall short of a complete solution. Document this in the git-gc(1) man page and add a reference from the documentation of the gc.pruneExpire config variable. Based on a write-up by Jeff King: http://marc.info/?l=git=147922960131779=2 Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- Documentation/config.txt | 4 +++- Documentation/git-gc.txt | 34 ++ 2 files changed, 29 insertions(+), 9 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 21fdddf..3f1d931 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1409,7 +1409,9 @@ gc.pruneExpire:: Override the grace period with this config variable. The value "now" may be used to disable this grace period and always prune unreachable objects immediately, or "never" may be used to - suppress pruning. + suppress pruning. This feature helps prevent corruption when + 'git gc' runs concurrently with another process writing to the + repository; see the "NOTES" section of linkgit:git-gc[1]. gc.worktreePruneExpire:: When 'git gc' is run, it calls diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index bed60f4..852b72c 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -63,11 +63,10 @@ automatic consolidation of packs. --prune=:: Prune loose objects older than date (default is 2 weeks ago, overridable by the config variable `gc.pruneExpire`). - --prune=all prunes loose objects regardless of their age (do - not use --prune=all unless you know exactly what you are doing. - Unless the repository is quiescent, you will lose newly created - objects that haven't been anchored with the refs and end up - corrupting your repository). --prune is on by default. + --prune=all prunes loose objects regardless of their age and + increases the risk of corruption if another process is writing to + the repository concurrently; see "NOTES" below. --prune is on by + default. --no-prune:: Do not prune any loose objects. @@ -138,17 +137,36 @@ default is "2 weeks ago". Notes - -'git gc' tries very hard to be safe about the garbage it collects. In +'git gc' tries very hard not to delete objects that are referenced +anywhere in your repository. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, refs saved by 'git filter-branch' in refs/original/, or reflogs (which may reference commits in branches that were later amended or rewound). - -If you are expecting some objects to be collected and they aren't, check +If you are expecting some objects to be deleted and they aren't, check all of those locations and decide whether it makes sense in your case to remove those references. +On the other hand, when 'git gc' runs concurrently with another process, +there is a risk of it deleting an object that the other process is using +but hasn't created a reference to. This may just cause the other process +to fail or may corrupt the repository if the other process later adds a +reference to the deleted object. Git has two features that significantly +mitigate this problem: + +. Any object with modification time newer than the `--prune` date is kept, + along with everything reachable from it. + +. Most operations that add an object to the database update the + modification time of the object if it is already present so that #1 + applies. + +However, these features fall short of a complete solution, so users who +run commands concurrently have to live with some risk of corruption (which +seems to be low in practice) unless they turn off automatic garbage +collection with 'git config gc.auto 0'. + HOOKS - -- 2.7.4
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > - when an object write is optimized out because we already have the > object, git will update the mtime on the file (loose object or > packfile) to freshen it FWIW, I am not seeing this happen when I do "git read-tree --prefix" followed by "git write-tree" using the current master (3ab2281). See the attached test script. > If you have long-running data (like, a temporary index file that might > literally sit around for days or weeks) I think that is a potential > problem. And the solution is probably to use refs in some way to point > to your objects. Agreed. This is not my current scenario. > If you're worried about a short-term operation where > somebody happens to run git-gc concurrently, I agree it's a possible > problem, but I suspect something you can ignore in practice. > > For the most part, a lot of the client-side git tools assume that one > operation is happening at a time in the repository. And I think that > largely holds for a developer working on a single clone, and things just > work in practice. > > Auto-gc makes that a little sketchier, but historically does not seem to > have really caused problems in practice. OK. I'll write a patch to add a summary of this information to the git-gc man page. Matt test-git-read-tree-write-tree-touch-object.sh Description: application/shellscript
Protecting old temporary objects being reused from concurrent "git gc"?
The Braid subproject management tool stores the subproject content in the main tree and is able to switch to a different upstream revision of a subproject by doing the equivalent of "git read-tree -m" on the superproject tree and the two upstream trees. The tricky part is preparing temporary trees with the upstream content moved to the path configured for the superproject. The usual method is "git read-tree --prefix", but using what index file? Braid currently uses the user's actual worktree, which can leave a mess if it gets interrupted: https://github.com/cristibalan/braid/blob/7d81da6e86e24de62a74f3ab8d880666cb343b04/lib/braid/commands/update.rb#L98 I want to change this to something that won't leave an inconsistent state if interrupted. I've written code for this kind of thing before that sets GIT_INDEX_FILE and uses a temporary index file and "git write-tree". But I realized that if "git gc" runs concurrently, the generated tree could be deleted before it is used and the tool would fail. If I had a need to run "git commit-tree", it seems like I might even end up with a commit object with a broken reference to a tree. "git gc" normally doesn't delete objects that were created in the last 2 weeks, but if an identical tree was added to the object database more than 2 weeks ago by another operation and is unreferenced, it could be reused without updating its mtime and it could still get deleted. Is there a recommended way to avoid this kind of problem in add-on tools? (I searched the Git documentation and the web for information about races with "git gc" and didn't find anything useful.) If not, it seems to be a significant design flaw in "git gc", even if the problem is extremely rare in practice. I wonder if some of the built-in commands may have the same problem, though I haven't tried to test them. If this is confirmed to be a known problem affecting built-in commands, then at least I won't feel bad about introducing the same problem into add-on tools. :/ Thanks, Matt
Re: "git subtree --squash" interacts poorly with revert, merge, and rebase
On Thu, 2016-11-10 at 16:53 -0500, Matt McCutchen wrote: > On Wed, 2016-10-26 at 19:07 -0400, Matt McCutchen wrote: > > > > Maybe we would never hit any of these problems in practice, but they > > give me a bad enough feeling that I'm planning to write my own tool > > that tracks the upstream commit ID in a file (like a submodule) and > > doesn't generate any extra commits. Without generating extra commits, > > the only place to store the upstream content in the superproject would > > be in another subtree, which would take up disk space in every working > > tree unless developers manually set skip-worktree. I think I prefer to > > not store the upstream content and just have the tool fetch it from a > > local subproject repository each time it's needed. > > > > I'll of course post the tool on the web and would be happy to see it > > integrated into "git subtree" if that makes sense, but I don't know how > > much time I'd be willing to put into making that happen. > > I have named my tool "git subtree-lite" and posted it here: > > https://mattmccutchen.net/utils/git-subtree-lite.git/ As I was doing additional research in preparation for adding git- subtree-lite to the tools page (https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools), by chance I found an existing tool, Braid (http://cristibalan.github.io/braid/), whose design meets my requirements. I have a few minor concerns, but assuming I'm able to fix them without too much work and upstream accepts my patches, I plan to switch to Braid. I've made a properly marked section on the tools page for subproject management tools: https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Subprojects_or_sets_of_repositories in the hope that the next person with the same requirements as me finds Braid. (I unfortunately didn't check that page before starting, but I will the next time I need something.) Matt
Re: [PATCH] fetch/push: document that private data can be leaked
On Mon, 2016-11-14 at 11:00 -0800, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > > > > > > > Yup, and then "do not push to untrustworthy place without > > > checking > > > what you are pushing", too? > > > > If there is no private data in the repository, then there is no > > need > > for the user to check what they are pushing. As I've indicated > > before, > > IMO manually checking each push would not be a workable security > > measure in the long term anyway. > > Then what is? Don't put private data in the same repository, then the whole issue becomes moot. Am I missing something? Matt
[PATCH] doc: mention transfer data leaks in more places
The "SECURITY" section of the gitnamespaces(7) man page described two ways for a client to steal data from a server that wasn't intended to be shared. Similar attacks can be performed by a server on a client, so adapt the section to cover both directions and add it to the git-fetch(1), git-pull(1), and git-push(1) man pages. Also add references to this section from the documentation of server configuration options that attempt to control data leakage but may not be fully effective. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- Documentation/config.txt | 17 ++--- Documentation/git-fetch.txt | 2 ++ Documentation/git-pull.txt| 2 ++ Documentation/git-push.txt| 2 ++ Documentation/gitnamespaces.txt | 20 +--- Documentation/transfer-data-leaks.txt | 30 ++ 6 files changed, 51 insertions(+), 22 deletions(-) create mode 100644 Documentation/transfer-data-leaks.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 21fdddf..fc2cf83 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2898,6 +2898,11 @@ is omitted from the advertisements but `refs/heads/master` and `refs/namespaces/bar/refs/heads/master` are still advertised as so-called "have" lines. In order to match refs before stripping, add a `^` in front of the ref name. If you combine `!` and `^`, `!` must be specified first. ++ +Even if you hide refs, a client may still be able to steal the target +objects via the techniques described in the "SECURITY" section of the +linkgit:gitnamespaces[7] man page; it's best to keep private data in a +separate repository. transfer.unpackLimit:: When `fetch.unpackLimit` or `receive.unpackLimit` are @@ -2907,7 +2912,7 @@ transfer.unpackLimit:: uploadarchive.allowUnreachable:: If true, allow clients to use `git archive --remote` to request any tree, whether reachable from the ref tips or not. See the - discussion in the `SECURITY` section of + discussion in the "SECURITY" section of linkgit:git-upload-archive[1] for more details. Defaults to `false`. @@ -2921,13 +2926,19 @@ uploadpack.allowTipSHA1InWant:: When `uploadpack.hideRefs` is in effect, allow `upload-pack` to accept a fetch request that asks for an object at the tip of a hidden ref (by default, such a request is rejected). - see also `uploadpack.hideRefs`. + See also `uploadpack.hideRefs`. Even if this is false, a client + may be able to steal objects via the techniques described in the + "SECURITY" section of the linkgit:gitnamespaces[7] man page; it's + best to keep private data in a separate repository. uploadpack.allowReachableSHA1InWant:: Allow `upload-pack` to accept a fetch request that asks for an object that is reachable from any ref tip. However, note that calculating object reachability is computationally expensive. - Defaults to `false`. + Defaults to `false`. Even if this is false, a client may be able + to steal objects via the techniques described in the "SECURITY" + section of the linkgit:gitnamespaces[7] man page; it's best to + keep private data in a separate repository. uploadpack.keepAlive:: When `upload-pack` has started `pack-objects`, there may be a diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt index 9e42169..b153aef 100644 --- a/Documentation/git-fetch.txt +++ b/Documentation/git-fetch.txt @@ -192,6 +192,8 @@ The first command fetches the `maint` branch from the repository at objects will eventually be removed by git's built-in housekeeping (see linkgit:git-gc[1]). +include::transfer-data-leaks.txt[] + BUGS Using --recurse-submodules can only fetch new commits in already checked diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt index d033b25..4470e4b 100644 --- a/Documentation/git-pull.txt +++ b/Documentation/git-pull.txt @@ -237,6 +237,8 @@ If you tried a pull which resulted in complex conflicts and would want to start over, you can recover with 'git reset'. +include::transfer-data-leaks.txt[] + BUGS Using --recurse-submodules can only fetch new commits in already checked diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt index 47b77e6..8eefabd 100644 --- a/Documentation/git-push.txt +++ b/Documentation/git-push.txt @@ -559,6 +559,8 @@ Commits A and B would no longer belong to a branch with a symbolic name, and so would be unreachable. As such, these commits would be removed by a `git gc` command on the origin repository. +include::transfer-data-leaks.txt[] + GIT --- Part of the linkgit:git[1] suite diff --git a/Documentation/gitnamespaces.txt b/Documentation/gitnamespaces.txt index 7685e36..b614969 100644 --- a/Documentation
Re: [PATCH] fetch/push: document that private data can be leaked
On Sun, 2016-11-13 at 18:57 -0800, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > > > Documentation/fetch-push-security.txt | 9 + > > A new (consolidated) piece like this that can be included in > multiple places is a good idea. I wonder if the original > description in "namespaces" thing can be moved here and then > "namespaces" page can be made to also borrow from this? I gave this a try. New patch coming. > > --- /dev/null > > +++ b/Documentation/fetch-push-security.txt > > @@ -0,0 +1,9 @@ > > +SECURITY > > + > > +The fetch and push protocols are not designed to prevent a > > malicious > > +server from stealing data from your repository that you did not > > intend to > > +share. The possible attacks are similar to the ones described in > > the > > +"SECURITY" section of linkgit:gitnamespaces[7]. If you have > > private data > > +that you need to protect from the server, keep it in a separate > > +repository. > > Yup, and then "do not push to untrustworthy place without checking > what you are pushing", too? If there is no private data in the repository, then there is no need for the user to check what they are pushing. As I've indicated before, IMO manually checking each push would not be a workable security measure in the long term anyway. Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Sun, 2016-10-30 at 01:16 -0700, Junio C Hamano wrote: > Jon Loeligerwrites: > > > > > Is there an existing protocol provision, or an extension to > > the protocol that would allow a distrustful client to say to > > the server, "Really, you have Y2? Prove it." > > There is not, but I do not think it would be an effective solution. > > The issue is not the lack of protocol support, but how to determine > that the other side needs such a proof for Y2 but not for other > commits. How does your side know what makes Y2 special and why does > yout side think they should not have Y2? > > Once you know how to determine Y2 is special, that knowledge can be > used to abort the "push" before even starting. When you are pushing > back the 'master' and that 'master' reaches Y2, which must be kept > secret, you shouldn't be pushing that 'master' to them, whether they > claim to have Y2 or not. FWIW, I can imagine a protocol that would prove possession for all objects, which would completely fix the problem. Each object would have a "private" hash computed recursively over the object graph, just like the ordinary object hash, but with a different seed. The object database would be extended to cache the private hash of every object. Then, during a fetch or push, when the two sides identify a matching object, the side that would otherwise have had to send the object sends the private hash. Support for storing multiple hashes per object might also be useful in some way for the migration to a stronger hash function than SHA-1. The next best solution, which doesn't require a protocol change but requires a little user intervention, is to have a configuration option per remote for a set of refs whose reachable objects are known to be safe to send to the server. This set presumably includes the remote's own remote-tracking refs. During fetch and push, the client looks for matches only among these "safe" objects, effectively emulating a repository containing only the safe objects. A fetch may update the remote-tracking refs to point to unsafe objects that were already in the local repository, effectively making them safe, but only after the server sends the content of these objects (and the client validates the hashes!). Unfortunately, I'm not signing up to implement either solution. :( Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Sun, 2016-10-30 at 01:03 -0700, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > > > On Fri, 2016-10-28 at 22:31 -0700, Junio C Hamano wrote: > > > > > > Not sending to the list, where mails from Gmail/phone is known to > > > get > > > rejected. > > > > [I guess I can go ahead and quote this to the list.] > > > > > > > > No. I'm saying that the scenario you gave is bad and people > > > should be > > > taught not to connect to untrustworthy sites. > > > > To clarify, are you saying: > > > > (1) don't connect to an untrusted server ever (e.g., we don't > > promise > > that the server can't execute arbitrary code on the client), or > > > > (2) don't connect to an untrusted server if the client repository > > has > > data that needs to be kept secret from the server? > > You sneaked "arbitrary code execution" into the discussion but I do > not know where it came from. In any case, "don't pull from or push > to untrustworthy place" would be a common sense advice that would > make sense in any scenario ;-) A blanket statement like that without explanation is not very helpful to users who do find themselves needing to pull from or push to a server they don't absolutely trust. The only "definitely safe" option it leaves them is to run the entire thing in a sandbox. A statement of the nature of the risk is much more helpful: users can determine that they don't care about the risk, or if it does, what the easiest workaround is. The new risk we discovered in this thread is of leakage of private data from the local repository. To avoid that risk, it's sufficient for users to move private data to a separate repository, so that's the advice I propose to give. Are you aware of issues with fetch/push with potential impact beyond leakage of private data, which would make my proposed text insufficient? I was giving "arbitrary code execution" as an example of what the impact of such an issue could be. > Just for future reference, when you have ideas/issues that might > have possible security ramifications, I'd prefer to see it first > discussed on a private list we created for that exact purpose, until > we can assess the impact (if any). Right now MaintNotes says this: > > If you think you found a security-sensitive issue and want to > disclose > it to us without announcing it to wider public, please contact us > at > our security mailing list <git-secur...@googlegroups.com>. This > is > a closed list that is limited to people who need to know early > about > vulnerabilities, including: > > - people triaging and fixing reported vulnerabilities > - people operating major git hosting sites with many users > - people packaging and distributing git to large numbers of > people > > where these issues are discussed without risk of the information > leaking out before we're ready to make public announcements. > > We may want to tweak the description from "disclose it to us" to > "have a discussion on it with us" (the former makes it sound as if > the topic has to be a definite problem, the latter can include an > idle speculation that may not be realistic attack vector). OK. I'll admit that I didn't even look for a policy on reporting of security issues because I believed the issue had low enough impact that a report to a dedicated security contact point would be unwelcome. Maybe that was reckless. The new text sounds good, if you put it in a place where people like me would see it. :/ Matt
[PATCH] fetch/push: document that private data can be leaked
A malicious server may be able to use the fetch and push protocols to steal data from a user's repository that the user did not intend to share, via attacks similar to those described in the gitnamespaces(7) man page. Mention this in the git-fetch(1), git-pull(1), and git-push(1) man pages and recommend using separate repositories for private data and interaction with untrusted servers. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- And here's a proposed patch. Based on the maint branch, ac84098. Documentation/fetch-push-security.txt | 9 + Documentation/git-fetch.txt | 2 ++ Documentation/git-pull.txt| 2 ++ Documentation/git-push.txt| 2 ++ 4 files changed, 15 insertions(+) create mode 100644 Documentation/fetch-push-security.txt diff --git a/Documentation/fetch-push-security.txt b/Documentation/fetch-push-security.txt new file mode 100644 index 000..00944ed --- /dev/null +++ b/Documentation/fetch-push-security.txt @@ -0,0 +1,9 @@ +SECURITY + +The fetch and push protocols are not designed to prevent a malicious +server from stealing data from your repository that you did not intend to +share. The possible attacks are similar to the ones described in the +"SECURITY" section of linkgit:gitnamespaces[7]. If you have private data +that you need to protect from the server, keep it in a separate +repository. + diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt index 9e42169..a461b4b 100644 --- a/Documentation/git-fetch.txt +++ b/Documentation/git-fetch.txt @@ -192,6 +192,8 @@ The first command fetches the `maint` branch from the repository at objects will eventually be removed by git's built-in housekeeping (see linkgit:git-gc[1]). +include::fetch-push-security.txt[] + BUGS Using --recurse-submodules can only fetch new commits in already checked diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt index d033b25..0af2de9 100644 --- a/Documentation/git-pull.txt +++ b/Documentation/git-pull.txt @@ -237,6 +237,8 @@ If you tried a pull which resulted in complex conflicts and would want to start over, you can recover with 'git reset'. +include::fetch-push-security.txt[] + BUGS Using --recurse-submodules can only fetch new commits in already checked diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt index 47b77e6..5ebef9e 100644 --- a/Documentation/git-push.txt +++ b/Documentation/git-push.txt @@ -559,6 +559,8 @@ Commits A and B would no longer belong to a branch with a symbolic name, and so would be unreachable. As such, these commits would be removed by a `git gc` command on the origin repository. +include::fetch-push-security.txt[] + GIT --- Part of the linkgit:git[1] suite -- 2.7.4
Re: "git subtree --squash" interacts poorly with revert, merge, and rebase
On Wed, 2016-10-26 at 19:07 -0400, Matt McCutchen wrote: > Maybe we would never hit any of these problems in practice, but they > give me a bad enough feeling that I'm planning to write my own tool > that tracks the upstream commit ID in a file (like a submodule) and > doesn't generate any extra commits. Without generating extra commits, > the only place to store the upstream content in the superproject would > be in another subtree, which would take up disk space in every working > tree unless developers manually set skip-worktree. I think I prefer to > not store the upstream content and just have the tool fetch it from a > local subproject repository each time it's needed. > > I'll of course post the tool on the web and would be happy to see it > integrated into "git subtree" if that makes sense, but I don't know how > much time I'd be willing to put into making that happen. I have named my tool "git subtree-lite" and posted it here: https://mattmccutchen.net/utils/git-subtree-lite.git/ For now, please email any bug reports, enhancement requests, or proposed patches to me. I have philosophical concerns about hosting my own projects on services I don't control and practical concerns about a few "forge" apps that I looked into installing on my own web site, but if people are seriously interested in collaborating on this, I'll work something out. Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Sat, 2016-10-29 at 09:39 -0400, Jeff King wrote: > I'm not sure I understand how connecting to a remote server to fetch is > a big problem. The server may learn about the existence of particular > sha1s in your repository, but cannot get their content. > > It's the subsequent push that is a problem. > > In the scenarios you've described, I'm mostly inclined to say that the > problem is not git or the protocol itself, but rather lax refspecs. > You mentioned earlier: > > the server can just generate another ref 'xx' pointing to Y2, assuming > it can entice the victim to set up a corresponding local branch > refs/heads/for-server1/xx and push it back. Or if the victim is for > some reason just mirroring back and forth: > > This sounds a lot like "I told git to push a bunch of things without > checking if they were really secret, and it turned out to push some > secret things". IOW I think the problem is not that the server may lie > about what it has, but that the user was not careful about what they > pushed. I dunno. I do not mind making a note in the documentation > explaining the implications of a server lying, but the scenarios seem > pretty contrived to me. Let's focus on the first scenario. There the user is just pulling and pushing a master branch. Are you saying that each time the user pulls, they need to look over all the commits they pulled before pushing them back? I think that's unrealistic, for example, on a busy project with centralized code review or if the user is publishing a project-specific modified version of an upstream library. The natural user expectation is that anything pulled from a public repository is public. But let's see what Junio says in the other subthread. > A much more interesting one, IMHO, is a server whose receive-pack lies > about which objects it has (possibly ones it found out about earlier via > fetch), which provokes the client to generate deltas against objects the > server doesn't have (and thereby leaking information about the base > objects). > > That is a problem no matter how careful your refspecs are. I suspect it > would be a hard attack to pull off in practice, just because it's going > to depend heavily on the content of the specific objects, what kinds of > deltas you can convince the other side to generate, etc. That might > merit a mention in the git-push documentation. Sure, if I end up doing a patch, I'll include this. Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Fri, 2016-10-28 at 22:31 -0700, Junio C Hamano wrote: > Not sending to the list, where mails from Gmail/phone is known to get > rejected. [I guess I can go ahead and quote this to the list.] > No. I'm saying that the scenario you gave is bad and people should be > taught not to connect to untrustworthy sites. To clarify, are you saying: (1) don't connect to an untrusted server ever (e.g., we don't promise that the server can't execute arbitrary code on the client), or (2) don't connect to an untrusted server if the client repository has data that needs to be kept secret from the server? The fetch/push attack relates only to #2. If #1, what are the other risks you are thinking of? Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Fri, 2016-10-28 at 18:11 -0700, Junio C Hamano wrote: > Ah, I see. My immediate reaction is that you can do worse things in > the reverse direction compared to this, but your scenario does sound > bad already. Are you saying that clients connecting to untrusted servers already face worse risks that people should know about, so there is no point in documenting this one? I guess I don't know about the other risks aside from accepting a corrupt object, which should be preventable by enabling fetch.fsckObjects. It seems we need either a statement that connecting to untrusted servers is officially unsupported or a description of the specific risks. Matt
Re: Fetch/push lets a malicious server steal the targets of "have" lines
On Fri, 2016-10-28 at 15:00 -0700, Junio C Hamano wrote: > Let me see if I understood your scenario correctly. > > Suppose we start from this history where 'O' are common, your victim > has a 'Y' branch with two commits that are private to it, as well as > a 'X' branch on which it has X1 that it previously obtained from the > server. On the other hand, the server does not know about Y1 or Y2, > and it added one commit X2 to the branch 'x' the victim is > following: > > victimserver > > Y1---Y2 > / > ---O---O---X1 ---O---O---X1---X2 > > Then when victim wants to fetch 'x' from the server, it would say > > have X1, have Y2, have Y1, have O > > and gets told to shut up by the server who heard enough. The > histories on these two parties will then become like this: > > > victimserver > > Y1---Y2 > / > ---O---O---X1---X2 ---O---O---X1---X2 Then the server generates a commit X3 that lists Y2 as a parent, even though it doesn't have Y2, and advances 'x' to X3. The victim fetches 'x': victim server Y1---Y2 (Y2) / \ \ ---O---O---X1---X2---X3 ---O---O---X1---X2---X3 Then the server rolls back 'x' to X2: victim server Y1---Y2 / \ ---O---O---X1---X2---X3 ---O---O---X1---X2 And the victim pushes: victim server Y1---Y2 Y1---Y2 / \ / \ ---O---O---X1---X2---X3 ---O---O---X1---X2---X3 Now the server has the content of Y2. If the victim is fetching and pulling a whole "directory" of refs, e.g: fetch: refs/heads/*:refs/remotes/server1/* push: refs/heads/for-server1/*:refs/heads/* then instead of generating a merge commit, the server can just generate another ref 'xx' pointing to Y2, assuming it can entice the victim to set up a corresponding local branch refs/heads/for-server1/xx and push it back. Or if the victim is for some reason just mirroring back and forth: fetch: refs/heads/*:refs/heads/for-server1/* push: refs/heads/for- server1/*:refs/heads/* then it doesn't have to set up a local branch as separate step. Matt
Fetch/push lets a malicious server steal the targets of "have" lines
I was studying the fetch protocol and I realized that in a scenario in which a client regularly fetches a set of refs from a server and pushes them back without careful scrutiny, the server can steal the targets of unrelated refs from the client repository by fabricating its own refs to the "have" objects specified by the client during the fetch. This is the reverse of attack #1 described in the "SECURITY" section of the gitnamespaces(7) man page, with the addition that the server doesn't have to know the object IDs in advance. Is this supposed to be well- known? I've been using git since 2006 and it was a surprise to me. Hopefully it isn't very common for a user to fetch and push with a server they don't trust to have all the data in their repository. I don't think I have any such cases myself; I have unfinished work that isn't meant for scrutiny by others, but nothing really damaging if it were released to the server. This attack presents no new risks if a user already runs code fetched from the server in such a way that it can read the repository. But there might be some users who just review embargoed security fixes from multiple sources (or something like that) without running code themselves, and their security expectations might be violated. If my analysis is correct, I'd argue for documenting the issue in a "SECURITY" section in the git-fetch man page. Shall I submit a patch? Thanks for your attention. Matt
Re: "git subtree --squash" interacts poorly with revert, merge, and rebase
On Wed, 2016-10-26 at 19:03 -0700, Stefan Beller wrote: > On Wed, Oct 26, 2016 at 6:52 PM, Matt McCutchen <m...@mattmccutchen.net> > wrote: > > 4. I pushed several dangling submodule pointers before I learned I > > could set push.recurseSubmodules = check. This isn't the default; each > > developer has to do it manually. (In theory, I could put such things > > in a setup script for them to run if they trust me.) > > There is a current series in flight/for review that makes "check" default. > (It is blocked as check has some performance issues when having lots > of commits to be pushed, so it may take a while and not show up in the > next release) Great! One other thing: IIRC, "check" does not distinguish between different remotes. For example, suppose I fork a project that already has a submodule and I have a pair of repositories that pull from the "upstream" repositories and push to "origin" repositories for my project. Suppose I upgrade to a new upstream version and find that I'm (temporarily) able to use the upstream submodule without modifications. The "check" feature won't stop me from pushing a pointer into the "origin" superproject that points to a commit that exists in the "upstream" subproject but not the "origin" subproject. > > 5. Stashing changes to both the superproject and the subproject takes > > more steps. > > True, so you'd want to have a `git stash --recurse-submodules={yes,no}` > where the command line option is configurable, so you don't have to type > it all the time? Sounds good. I'm sure you realize this is not just a matter of running "git stash" in each submodule because there are many ways the stash stacks could get out of lockstep. The submodule content needs to be incorporated into the superproject stash. > Thanks for pointing out the issues though. they align to what > we plan on doing for submodules, so ... the plan actually makes > sense :) Again, I'm thrilled you're working on this, even if I don't use it on my current project. Matt
Re: "git subtree --squash" interacts poorly with revert, merge, and rebase
Hi Stefan, I appreciate the effort to remove obstacles to the use of submodules! It looks like a custom tool is probably still our best option at this time, though we can always switch back to submodules later. On Wed, 2016-10-26 at 16:23 -0700, Stefan Beller wrote: > On Wed, Oct 26, 2016 at 4:07 PM, Matt McCutchen <m...@mattmccutchen.net> > wrote: > > - We have to make separate commits and manage corresponding topic > > branches for the superproject and subprojects. > > Well yeah, that is how submodule work on a conceptual level. > While having multiple commits may seem like overhead, note > the subtle difference for these commits. One if deep down in the > stack patching one of the submodules, the other is a high level > commit advancing the submodule pointer. > > Note that the target audience of these two commit messages > might be vastly different, hence can be worded differently. > (The submodule describing how you fixed e.g. a memleak or race condition > and the superproject describes on why you needed to include that submodule, > e.g. because you switched your toplevel application to use threads.) I understand one can adhere to that philosophy, but I don't see the practical benefit of doing so in our case compared to using a "git subtree"-like approach and making a single commit. It would just be us looking at both commits. If we do upstream any of the library changes, we'll probably have to rework them anyway. > > - A diff of the superproject doesn't include the content of > > subprojects. > Although this is just Git, you probably also have a code review system that > would need that change as well. Indeed. We currently use Bitbucket. I'd be open to switching, though maybe not just for this. > Is there anything else besides these 3 points that encourages you to > switch away from submodules? Those 3 are the ongoing pain points I can think of. There are a few other drawbacks compared to "git subtree" that come up less often: 1b. On another project, I was working with a teammate who was new to version control and not very careful, who forgot to run "git submodule update" and ended up committing back the old submodule pointer. Thankfully, this hasn't happened yet on my current project. 4. I pushed several dangling submodule pointers before I learned I could set push.recurseSubmodules = check. This isn't the default; each developer has to do it manually. (In theory, I could put such things in a setup script for them to run if they trust me.) 5. Stashing changes to both the superproject and the subproject takes more steps. 6. I use multiple worktrees (with "git worktree") because of #5 and also so that I can run two versions of the application at the same time and compare the behavior. Using "git worktree" with submodules is officially unsupported, though I was able to get things working by manually editing some files. 7. We have to set up a repository for each subproject on our hosting service. Anyone who forks our application and modifies a subproject has to set up a subproject repository and carry a change to .gitmodules to point to their repository. If we use relative URLs in .gitmodules, it's even worse: anyone who forks our application has to set up repositories for all the subprojects, even if they don't modify them. Matt
"git subtree --squash" interacts poorly with revert, merge, and rebase
I'm the lead developer of a research software application (https://bitb ucket.org/objsheets/objsheets) that uses modified versions of two third-party libraries, which we need to version and distribute along with our application. For better or for worse, we haven't made it a priority to upstream our changes, so for now we just want to optimize for ease of (1) making and reviewing changes and (2) upgrading to newer upstream versions. We've been using git submodules, but that's a pain for several reasons: - We have to run "git submodule update" manually. - We have to make separate commits and manage corresponding topic branches for the superproject and subprojects. - A diff of the superproject doesn't include the content of subprojects. Recently I looked into switching to the "git subtree" contrib tool in the --squash mode, but I identified a few drawbacks compared to submodules: 1. The upstream commit on which the subtree is based is assumed to be given by the latest squash commit in "git log". This means that (i) a change to a different upstream commit can't be reverted with "git revert" and (ii) a "git merge" of two superproject branches based on different upstream commits may successfully merge the content of the upstream commits but leave the tool thinking the subtree is based on an arbitrary one of the two commits. 2. Rebasing messes up the merge commits generated by "git subtree -- squash". --preserve-merges worked in a simple test but supposedly doesn't work if there are conflicts or I want to reorder commits with --interactive. Maybe we would never hit any of these problems in practice, but they give me a bad enough feeling that I'm planning to write my own tool that tracks the upstream commit ID in a file (like a submodule) and doesn't generate any extra commits. Without generating extra commits, the only place to store the upstream content in the superproject would be in another subtree, which would take up disk space in every working tree unless developers manually set skip-worktree. I think I prefer to not store the upstream content and just have the tool fetch it from a local subproject repository each time it's needed. I'll of course post the tool on the web and would be happy to see it integrated into "git subtree" if that makes sense, but I don't know how much time I'd be willing to put into making that happen. Any advice? Thanks, Matt
Making the "Note from the maintainer" information discoverable
On Mon, 2016-02-08 at 14:16 -0800, Junio C Hamano wrote: > Jeff Kingwrites: > > > FWIW, as the person who wrote that section, I think that is a good > > addition. We do have a link to Simon Tatham's bug-reporting guide, > > but > > this is a good place to put project-specific advice. > > > > In addition to "try it on next" you may want to also mention "try > > it on > > the latest version of git". That is another frequently given > > pointer to > > bug reporters. Trying "next" is obviously a superset, but I > > suspect > > trying a released version may be an easier first step for some > > people. > > Yes, definitely. > > I agree that testing with the latest released version would > typically be much easier to end users than building from the source. > It would reduce the need for "Ah, that's ancient issue, we know it > was fixed a few releases ago." responses by us; I do not recall many > of such responses in the recent history on the list, though. > > For the ones who are more into the spirit of helping each other who > can build from the source to help us even more, checking 'master' > and finding regressions before it gets too late is a very good > thing. Checking 'next' and confirming an upcoming fix is equally > valuable. While researching an unrelated issue, I stumbled upon http://marc.info/?l=git=142714670111063=2, which seems to have even more valuable information about community processes. Is there any interest in making this information discoverable from https://git-scm.com/community and/or the man pages? I'm happy to file an issue or to write a patch that adds a link, but I don't see myself spending more time on it than that. Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: update_linked_gitdir writes relative path to .git/worktrees//gitdir
On Mon, 2016-02-08 at 14:16 -0800, Junio C Hamano wrote: > Jeff Kingwrites: > > > FWIW, as the person who wrote that section, I think that is a good > > addition. We do have a link to Simon Tatham's bug-reporting guide, but > > this is a good place to put project-specific advice. > > > > In addition to "try it on next" you may want to also mention "try it on > > the latest version of git". That is another frequently given pointer to > > bug reporters. Trying "next" is obviously a superset, but I suspect > > trying a released version may be an easier first step for some people. > > Yes, definitely. > > I agree that testing with the latest released version would > typically be much easier to end users than building from the source. > It would reduce the need for "Ah, that's ancient issue, we know it > was fixed a few releases ago." responses by us; I do not recall many > of such responses in the recent history on the list, though. > > For the ones who are more into the spirit of helping each other who > can build from the source to help us even more, checking 'master' > and finding regressions before it gets too late is a very good > thing. Checking 'next' and confirming an upcoming fix is equally > valuable. OK, so this testing is an encouragement, not an expectation per se, even if bug reports may be less likely to get attention without it (I'm not familiar with the degree to which this may have been the case recently). See my revised proposed text here: https://github.com/git/git-scm.com/pull/676/files I'll send an analogous patch for the git(1) man page in a moment. I left a mention of providing feedback on pending fixes but thought it would be too much to go into the details of how to identify whether there is a pending fix. Is this sensible? Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git(1) man page has link to out-of-date googlecode site
I noticed that near the top of the git(1) man page, there is a link to http://git-htmldocs.googlecode.com/git/git.html, which apparently has not been updated since August 2015 (https://code.google.com/archive/p/git-htmldocs/source/default/commits). Should this link just be removed, or replaced with http://git-scm.com/docs (which lists a subset of the pages) or perhaps http://git-scm.com/docs/git? I will be happy to write the patch. Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] git.txt: encourage bug reporters to test recent versions
Specifically, the latest released version or the "next" branch, as reporters are willing. This is based on: http://marc.info/?l=git=145496979420513=2 Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- Documentation/git.txt | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index d987ad2..1a148bc 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -1226,7 +1226,11 @@ Reporting Bugs Report bugs to the Git mailing list <git@vger.kernel.org> where the development and maintenance is primarily done. You do not have to be -subscribed to the list to send a message there. +subscribed to the list to send a message there. You can help us out by +attempting to reproduce the bug in the latest released version of git, +or if you're willing to build git from source, the `next` branch. +Sometimes an attempted fix may be pending in this branch, in which case +your feedback as to whether the fix worked for you will be appreciated. SEE ALSO -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Documentation/git-clean.txt: don't mention deletion of .git/modules/*
On Tue, 2016-02-09 at 09:34 -0800, Junio C Hamano wrote: > Mikko Rapeli <mikko.rap...@iki.fi> writes: > > > Sorry, can't reproduce the problem where submodules stayed in the > > tree until > > git clean was called with two -f's. > > > > You are right in removing the second part. > > Thanks, then let's do this. Looks good except for a typo (below). I'll try to write better commit messages in the future. > -- >8 -- > From: Matt McCutchen <m...@mattmccutchen.net> > Date: Sat, 6 Feb 2016 15:25:41 -0500 > Subject: [PATCH] Documentation/git-clean.txt: don't mention deletion > of .git/modules/* > > The latter half fo of > this sentence, the removal of the submodules, was > never done with (or without) double -f back when it was written, and > we still do not do so. > > Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> > Acked-by: Mikko Rapeli <mikko.rap...@iki.fi> > Signed-off-by: Junio C Hamano <gits...@pobox.com> > --- > Documentation/git-clean.txt | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/Documentation/git-clean.txt b/Documentation/git- > clean.txt > index 641681f..51a7e26 100644 > --- a/Documentation/git-clean.txt > +++ b/Documentation/git-clean.txt > @@ -37,9 +37,7 @@ OPTIONS > to false, 'git clean' will refuse to delete files or > directories > unless given -f, -n or -i. Git will refuse to delete > directories > with .git sub directory or file unless a second -f > - is given. This affects also git submodules where the storage > area > - of the removed submodule under .git/modules/ is not removed > until > - -f is given twice. > + is given. > > -i:: > --interactive:: -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Documentation/git-clean.txt: don't mention deletion of .git/modules/*
On Mon, 2016-02-08 at 14:22 -0800, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > I found no evidence of such behavior in the source code. > > > > Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> > > --- > > That was added last year at bcd57cb9 (Documentation/git-clean.txt: > document that -f may need to be given twice, 2015-02-26). It would > be better to know what got changed since then--that is, was the > additional text unnecessary even back then, or we made changes to > the system since then and forgot to remove the added text. > > Mikko, is this need to give -f twice still the case? I know you probably want confirmation from Mikko, but I'll offer my understanding. There were two statements added in bcd57cb9: 1. -f may need to be given twice to delete nested worktrees and embedded repositories. This is still true. 2. Deletion of submodule repositories under .git/modules is conditional on -f being given twice. AFAICT, this was wrong even back then: "git clean" has never deleted such repositories under any conditions. My patch is only removing #2. Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
update_linked_gitdir writes relative path to .git/worktrees//gitdir
I noticed that when update_linked_gitdir chooses to update .git/worktrees//gitdir, the path it writes is relative, at least under some circumstances. This contradicts the gitrepository-layout man page, which says: worktrees//gitdir:: A text file containing the absolute path back to the .git file that points to here. IIUC, this behavior defeats one of the three safeguards that is supposed to prevent "git worktree prune" from pruning information for worktrees that still exist. A simple script to reproduce: #!/bin/bash set -e -x rm -rf repo worktree2 git init repo cd repo touch foo git add foo git commit -m 'dummy commit' git worktree add ../worktree2 -b branch2 cat .git/worktrees/worktree2/gitdir touch -d '2 days ago' .git/worktrees/worktree2/gitdir (cd ../worktree2 && git status) cat .git/worktrees/worktree2/gitdir Trying this on master as of earlier today (ff4ea60), I get: [...] /PATH/REDACTED/worktree2/.git [...] .git Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Documentation/git-clean.txt: don't mention deletion of .git/modules/*
I found no evidence of such behavior in the source code. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- This is based on the maint branch, a08595f. Try #2 to get correct email formatting. Documentation/git-clean.txt | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index 641681f..51a7e26 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -37,9 +37,7 @@ OPTIONS to false, 'git clean' will refuse to delete files or directories unless given -f, -n or -i. Git will refuse to delete directories with .git sub directory or file unless a second -f - is given. This affects also git submodules where the storage area - of the removed submodule under .git/modules/ is not removed until - -f is given twice. + is given. -i:: --interactive:: -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: update_linked_gitdir writes relative path to .git/worktrees//gitdir
On Sun, 2016-02-07 at 15:56 -0800, Junio C Hamano wrote: > Matt McCutchen <m...@mattmccutchen.net> writes: > > > I noticed that when update_linked_gitdir chooses to update > > .git/worktrees//gitdir, the path it writes is relative, at > > least > > under some circumstances. This contradicts the gitrepository- > > layout > > man page, which says: > > Duy, is it safe to say that the fix has already been cooking in > 'next' as nd/do-not-move-worktree-manually topic, Yes, looks like that topic removes the buggy functionality. > it is very much > appreciated when reporting bugs people check if a presumed fix is > already cooking in 'next', try it to verify if it really fixes their > problem, and send in a "OK fix is good" / "No that does not fix my > case"? Sorry to waste your time. This wasn't documented where I looked, namely the "Bug Reporting" section on http://git-scm.com/community . Here's a straw-man proposed update to that page: https://github.com/mattmccutchen/git-scm.com/compare/master...bug-reporting-next If you like it, I will submit it as a pull request. I can propose a similar update to the "REPORTING BUGS" section of the git(1) man page if you like. Matt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Documentation/git-clean.txt: don't mention deletion of .git/modules/*
I found no evidence of such behavior in the source code. Signed-off-by: Matt McCutchen <m...@mattmccutchen.net> --- This is based on the maint branch, a08595f. Documentation/git-clean.txt | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index 641681f..51a7e26 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -37,9 +37,7 @@ OPTIONS to false, 'git clean' will refuse to delete files or directories unless given -f, -n or -i. Git will refuse to delete directories with .git sub directory or file unless a second -f - is given. This affects also git submodules where the storage area - of the removed submodule under .git/modules/ is not removed until - -f is given twice. + is given. -i:: --interactive:: -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html