[PATCH 1/2] sha1_object_info_extended: provide delta base sha1s
A caller of sha1_object_info_extended technically has enough information to determine the base sha1 from the results of the call. It knows the pack, offset, and delta type of the object, which is sufficient to find the base. However, the functions to do so are not publicly available, and the code itself is intimate enough with the pack details that it should be abstracted away. We could add a public helper to allow callers to query the delta base separately, but it is simpler and slightly more efficient to optionally grab it along with the rest of the object_info data. For cases where the object is not stored as a delta, we write the null sha1 into the query field. A careful caller could check oi.whence == OI_PACKED oi.u.packed.is_delta before looking at the base sha1, but using the null sha1 provides a simple alternative (and gives a better sanity check for a non-careful caller than simply returning random bytes). Signed-off-by: Jeff King p...@peff.net --- cache.h | 1 + sha1_file.c | 53 + 2 files changed, 54 insertions(+) diff --git a/cache.h b/cache.h index ce377e1..67356db 100644 --- a/cache.h +++ b/cache.h @@ -1074,6 +1074,7 @@ struct object_info { enum object_type *typep; unsigned long *sizep; unsigned long *disk_sizep; + unsigned char *delta_base_sha1; /* Response */ enum { diff --git a/sha1_file.c b/sha1_file.c index daacc0c..4e8dd8b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1667,6 +1667,38 @@ static off_t get_delta_base(struct packed_git *p, return base_offset; } +/* + * Like get_delta_base above, but we return the sha1 instead of the pack + * offset. This means it is cheaper for REF deltas (we do not have to do + * the final object lookup), but more expensive for OFS deltas (we + * have to load the revidx to convert the offset back into a sha1). + */ +static const unsigned char *get_delta_base_sha1(struct packed_git *p, + struct pack_window **w_curs, + off_t curpos, + enum object_type type, + off_t delta_obj_offset) +{ + if (type == OBJ_REF_DELTA) { + unsigned char *base = use_pack(p, w_curs, curpos, NULL); + return base; + } else if (type == OBJ_OFS_DELTA) { + struct revindex_entry *revidx; + off_t base_offset = get_delta_base(p, w_curs, curpos, + type, delta_obj_offset); + + if (!base_offset) + return NULL; + + revidx = find_pack_revindex(p, base_offset); + if (!revidx) + return NULL; + + return nth_packed_object_sha1(p, revidx-nr); + } else + return NULL; +} + int unpack_object_header(struct packed_git *p, struct pack_window **w_curs, off_t *curpos, @@ -1824,6 +1856,22 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, } } + if (oi-delta_base_sha1) { + if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { + const unsigned char *base; + + base = get_delta_base_sha1(p, w_curs, curpos, + type, obj_offset); + if (!base) { + type = OBJ_BAD; + goto out; + } + + hashcpy(oi-delta_base_sha1, base); + } else + hashclr(oi-delta_base_sha1); + } + out: unuse_pack(w_curs); return type; @@ -2407,6 +2455,9 @@ static int sha1_loose_object_info(const unsigned char *sha1, git_zstream stream; char hdr[32]; + if (oi-delta_base_sha1) + hashclr(oi-delta_base_sha1); + /* * If we don't care about type or size, then we don't * need to look inside the object at all. Note that we @@ -2457,6 +2508,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) *(oi-sizep) = co-size; if (oi-disk_sizep) *(oi-disk_sizep) = 0; + if (oi-delta_base_sha1) + hashclr(oi-delta_base_sha1); oi-whence = OI_CACHED; return 0; } -- 1.8.5.1.399.g900e7cd -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] cat-file --batch-check='%(deltabase)'
This series lets you query the delta base for a particular object (or set of objects), like: $ git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(deltabase) %(rest)' The only other way I know of to get this information is using verify-pack (or index-pack), which is much slower (and less convenient to parse). I needed this recently to write tests for another (not yet published) series. But I think it stands on its own as a debugging / introspection tool. [1/2]: sha1_object_info_extended: provide delta base sha1s [2/2]: cat-file: provide %(deltabase) batch format -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] cat-file: provide %(deltabase) batch format
It can be useful for debugging or analysis to see which objects are stored as delta bases on top of others. This information is available by running `git verify-pack`, but that is extremely expensive (and is harder than necessary to parse). Instead, let's make it available as a cat-file query format, which makes it fast and simple to get the bases for a subset of the objects. Signed-off-by: Jeff King p...@peff.net --- Documentation/git-cat-file.txt | 12 +--- builtin/cat-file.c | 6 ++ t/t1006-cat-file.sh| 34 ++ 3 files changed, 49 insertions(+), 3 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 322f5ed..f6a16f4 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -109,6 +109,11 @@ newline. The available atoms are: The size, in bytes, that the object takes up on disk. See the note about on-disk sizes in the `CAVEATS` section below. +`deltabase`:: + If the object is stored as a delta on-disk, this expands to the + 40-hex sha1 of the delta base object. Otherwise, expands to the + null sha1 (40 zeroes). See `CAVEATS` below. + `rest`:: If this atom is used in the output string, input lines are split at the first whitespace boundary. All characters before that @@ -152,10 +157,11 @@ should be taken in drawing conclusions about which refs or objects are responsible for disk usage. The size of a packed non-delta object may be much larger than the size of objects which delta against it, but the choice of which object is the base and which is the delta is arbitrary -and is subject to change during a repack. Note also that multiple copies -of an object may be present in the object database; in this case, it is -undefined which copy's size will be reported. +and is subject to change during a repack. +Note also that multiple copies of an object may be present in the object +database; in this case, it is undefined which copy's size or delta base +will be reported. GIT --- diff --git a/builtin/cat-file.c b/builtin/cat-file.c index b2ca775..2e0af2e 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -118,6 +118,7 @@ struct expand_data { unsigned long size; unsigned long disk_size; const char *rest; + unsigned char delta_base_sha1[20]; /* * If mark_query is true, we do not expand anything, but rather @@ -174,6 +175,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, data-split_on_whitespace = 1; else if (data-rest) strbuf_addstr(sb, data-rest); + } else if (is_atom(deltabase, atom, len)) { + if (data-mark_query) + data-info.delta_base_sha1 = data-delta_base_sha1; + else + strbuf_addstr(sb, sha1_to_hex(data-delta_base_sha1)); } else die(unknown format element: %.*s, len, atom); } diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index 8a1bc5c..633dc82 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -240,4 +240,38 @@ test_expect_success --batch-check with multiple sha1s gives correct format ' $(echo_without_newline $batch_check_input | git cat-file --batch-check) ' +test_expect_success 'setup blobs which are likely to delta' ' + test-genrandom foo 10240 foo + { cat foo; echo plus; } foo-plus + git add foo foo-plus + git commit -m foo + cat blobs -\EOF + HEAD:foo + HEAD:foo-plus + EOF +' + +test_expect_success 'confirm that neither loose blob is a delta' ' + cat expect -EOF + $_z40 + $_z40 + EOF + git cat-file --batch-check=%(deltabase) blobs actual + test_cmp expect actual +' + +# To avoid relying too much on the current delta heuristics, +# we will check only that one of the two objects is a delta +# against the other, but not the order. We can do so by just +# asking for the base of both, and checking whether either +# sha1 appears in the output. +test_expect_success '%(deltabase) reports packed delta bases' ' + git repack -ad + git cat-file --batch-check=%(deltabase) blobs actual + { + grep $(git rev-parse HEAD:foo) actual || + grep $(git rev-parse HEAD:foo-plus) actual + } +' + test_done -- 1.8.5.1.399.g900e7cd -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] t0000 cleanups
When I want to debug a failing test, I often end up doing: cd t ./t4107-tab -v -i cd tratab The test names are long, so tab-completing on the trash directory is very helpful. Lately I've noticed that there are a bunch of crufty trash directories in my t/ directory, which makes my tab-completion more annoying. It turns out they're leftovers from t running, due to a bad interaction with some other fixes from last April. The first patch fixes that. The second patch is a follow-on cleanup enabled by the first. The third is an unrelated cleanup that I noticed when I ran t 47 times in a row. :) [1/3]: t: set TEST_OUTPUT_DIRECTORY for sub-tests [2/3]: t: simplify HARNESS_ACTIVE hack [3/3]: t: drop known breakage test t/t-basic.sh | 19 +++ t/test-lib.sh| 2 -- 2 files changed, 7 insertions(+), 14 deletions(-) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] t0000: simplify HARNESS_ACTIVE hack
Commit 517cd55 set HARNESS_ACTIVE unconditionally in sub-tests, because that value affects the output of --verbose. t needs stable output from its sub-tests, and we may or may not be running under a TAP harness. That commit made the decision to always set the variable, since it has another useful side effect, which is suppressing writes to t/test-results by the sub-tests (which would just pollute the real results). Since the last commit, though, the sub-tests have their own test-results directories, so this is no longer an issue. We can now update a few comments that are no longer accurate nor necessary. We can also revisit the choice of HARNESS_ACTIVE. Since we must choose one value for stability, it's probably saner to have it off. This means that future patches could test things like the test-results writing, or the --quiet option, which is currently ignored when run under a harness. Signed-off-by: Jeff King p...@peff.net --- I do not have any plans to write such tests, and I'd be OK if we wanted to stop this just at fixing up the comments. But it took me a while to figure out what is going on, and I believe unsetting HARNESS_ACTIVE in the sub-tests is the choice that is least likely to cause somebody in the future to have to re-figure it out. :) t/t-basic.sh | 14 +- t/test-lib.sh| 2 -- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/t/t-basic.sh b/t/t-basic.sh index bc4e3e2..e6c5b63 100755 --- a/t/t-basic.sh +++ b/t/t-basic.sh @@ -50,11 +50,11 @@ run_sub_test_lib_test () { shift 2 mkdir $name ( - # Pretend we're a test harness. This prevents - # test-lib from writing the counts to a file that will - # later be summarized, showing spurious failed tests - HARNESS_ACTIVE=t - export HARNESS_ACTIVE + # Pretend we're not running under a test harness, whether we + # are or not. The test-lib output depends on the setting of + # this variable, so we need a stable setting under which to run + # the sub-test. + sane_unset HARNESS_ACTIVE cd $name cat $name.sh -EOF #!$SHELL_PATH @@ -235,16 +235,13 @@ test_expect_success 'test --verbose' ' grep -v ^Initialized empty test-verbose/out+ test-verbose/out check_sub_test_lib_test test-verbose -\EOF expecting success: true -Z ok 1 - passing test Z expecting success: echo foo foo -Z ok 2 - test with output Z expecting success: false -Z not ok 3 - failing test # false Z @@ -267,7 +264,6 @@ test_expect_success 'test --verbose-only' ' Z expecting success: echo foo foo -Z ok 2 - test with output Z not ok 3 - failing test diff --git a/t/test-lib.sh b/t/test-lib.sh index 1cf78d5..1531c24 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -481,8 +481,6 @@ test_at_end_hook_ () { test_done () { GIT_EXIT_OK=t - # Note: t relies on $HARNESS_ACTIVE disabling the .counts - # output file if test -z $HARNESS_ACTIVE then test_results_dir=$TEST_OUTPUT_DIRECTORY/test-results -- 1.8.5.1.399.g900e7cd -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] t0000: drop known breakage test
Having a simulated known breakage test means that the test suite will always tell us there is a bug to be fixed, even though it is only simulated. The right way to test this is in a sub-test, that can also check that we provide the correct exit status and output. Fortunately, we already have such a test (added much later by 5ebf89e). We could arguably get rid of the simulated success test immediately above, as well, as it is also redundant with the tests added in 5ebf89e. However, it does not have the annoying behavior of the known breakage test. It may also be easier to debug if the test suite is truly broken, since it is not a test-within-a-test, as the later tests are. Signed-off-by: Jeff King p...@peff.net --- I am not _that_ bothered by the known breakage, but AFAICT there is zero benefit to keeping this redundant test. But maybe I am missing something. t/t-basic.sh | 3 --- 1 file changed, 3 deletions(-) diff --git a/t/t-basic.sh b/t/t-basic.sh index e6c5b63..a2bb63c 100755 --- a/t/t-basic.sh +++ b/t/t-basic.sh @@ -41,9 +41,6 @@ test_expect_success '.git/objects should have 3 subdirectories' ' test_expect_success 'success is reported like this' ' : ' -test_expect_failure 'pretend we have a known breakage' ' - false -' run_sub_test_lib_test () { name=$1 descr=$2 # stdin is the body of the test code -- 1.8.5.1.399.g900e7cd -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git:// protocol over SSL/TLS
On Fri, Dec 27, 2013 at 08:47:54PM +0600, Sergey Sharybin wrote: The web server software has nothing to do with HTTP[S] used by Git being smart, I think, it just has to be set up properly. Misunderstood git doc then which says it has to be Apache, currently - other CGI servers don't work, last I checked. Do you happen to remember where you saw that claim? If the manual in git's Documentation/ directory says that, I'd like to fix it. I added sample lighttpd config to git help http-backend a while back. I tested it at the time, but I do not currently run a lighttpd git server at all. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/12] Convert starts_with() to skip_prefix() for option parsing
On Thu, Dec 26, 2013 at 11:27:10AM -0800, Junio C Hamano wrote: I still need consensus on the name here guys, parse_prefix. remove_prefix or strip_prefix? If no other opinions i'll go with strip_prefix (Jeff's comment before parse_prefix() also uses strip) Yup, that comment is where I took strip from. When you name your thing as X, using too generic a word X, and then need to explain what X does using a bit more specific word Y, you are often better off naming it after Y. FWIW, the reason I shied away from strip is that I did not want to imply that the function mutates the string. But since nobody else seems concerned with that, I think strip is fine. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] t0000: set TEST_OUTPUT_DIRECTORY for sub-tests
On Sat, Dec 28, 2013 at 02:13:13PM -0800, Jonathan Nieder wrote: So the idea if I am reading correctly is Instead of relying on the implicit output directory chosen with chdir, which doesn't even work any more, set TEST_OUTPUT_DIRECTORY to decide where output for the sub-tests used by t's sanity checks for the test harness go. Right. I'm not sure I completely understand the regression caused by 38b074d. Is the idea that before that commit, TEST_OUTPUT_DIRECTORY was only used for the test-results/ directory so the only harm done was some mixing of test results? $TEST_OUTPUT_DIRECTORY was actually used in $TRASH_DIRECTORY, but some code paths properly used $TRASH_DIRECTORY, and some used another variable that (sometimes) contained a relative form of $TRASH_DIRECTORY. The creation of the repo was one such code-path. So there were already potentially problems before 38b074d (any sub-test looking at its $TRASH_DIRECTORY variable would find the wrong path), but I do not know offhand if it could trigger any bugs. Post-38b074d, we consistently use $TRASH_DIRECTORY (and therefore respect $TEST_OUTPUT_DIRECTORY) everywhere. What is the symptom this patch alleviates? As a result, t's sub-tests are now created in git's original test output directory rather than in our trash directory. This might be the source of my confusion. Is sub-tests an abbreviation for sub-test trash directories here? Yes, I should have said sub-test trash directories. And I think that answers your what is the symptom question. We could fix this by passing a new --root=$TRASH_DIRECTORY option to the sub-test. However, we do not want the sub-tests to write anything at all to git's directory (e.g., they should not be writing to t/test-results, either, although this is already handled by separate code). Ah, HARNESS_ACTIVE prevents output of test-results. Yes. My original notion was Oh, and this fixes broken test-results, too!. But then I noticed that it is already handled in a different way. :) Does the git test harness write something else to TEST_OUTPUT_DIRECTORY? Is the idea that using --root would be functionally equivalent but (1) more confusing and (2) less futureproof? Exactly. I do not think TEST_OUTPUT_DIRECTORY is used for anything else, but if someone were to ever add a new use, the sub-tests would almost certainly want that use to affect only the t trash directory. So, to sum up: if I understand correctly You answered these yourself in your follow-up. :) So the patch itself looks right. I think describing the symptoms up front would probably be enough to make the commit message less confusing to read. Would adding the missing trash directories wording above be sufficient? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] t0000: drop known breakage test
On Sat, Dec 28, 2013 at 12:51:04PM -0800, Jonathan Nieder wrote: Jeff King wrote: I am not _that_ bothered by the known breakage, but AFAICT there is zero benefit to keeping this redundant test. Devil's advocate: it ensures that anyone wrapping git's tests (like the old smoketest infrastructure experiment) is able to handle an expected failure. Thanks. One of the things I love about open source is that as soon as I say I can't see how..., the answer is crowd-sourced for me. :) That being said, even if the test has a non-zero possible value... But in practice I don't mind the behavior before or after this patch. If the test harness is that broken, we'll know. And people writing code that wraps git's tests can write their own custom sanity-checks. ...I think for these reasons that the value is smaller than the disruption caused by the test, and the patch is a net win. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: Runaway git remote if group definition contains a remote by the same name
On Sat, Dec 28, 2013 at 03:56:55PM +0100, Alex Riesen wrote: it is also a way to create a fork bomb out of the innocent tool on platforms where pressing Ctrl-C does not terminate subprocesses of the foreground process (like, of course, Windows). To reproduce, run git -c remotes.origin='origin other' remote update origin Hmm. This is a pretty straightforward reference cycle. We expand origin to contain itself, so it recurses forever on expansion. As with most such problems, the cycle path may be longer than one: git -c remotes.foo='bar baz' -c remotes.bar='foo baz' fetch foo Detecting the cycle can be done by keeping track of which names we've seen, or just by putting in a depth limit that no sane setup would hit. In either case, it's complicated slightly by the fact that we pass the expanded list to a sub-process (which then recurses on the expansion). So we'd have to communicate the depth (or the list of seen remotes) via the command line or the environment. One alternative would be to have the parent git fetch recursively expand the list itself down to scalar entries, and then invoke sub-processes on the result (and tell them not to expand at all). That would also let us cull duplicates if a remote is found via multiple groups. Interestingly, the problem does not happen with this: git -c remotes.foo=foo fetch foo Fetch sees that foo expands only to a single item and says oh, that must not be a group. And then treats it like a regular remote, rather than recursing. So it's not clear to me whether groups are meant to be recursive or not. They are in some cases: # fetch remotes 1-4 git -c remotes.parent='child1 child2' \ -c remotes.child1='remote1 remote2' \ -c remotes.child2='remote3 remote4' \ fetch parent but not in others: # foo should be an alias for bar, but it's not git -c remotes.foo=bar \ -c remotes.bar='remote1 remote2' \ fetch foo If they are not allowed to recurse, the problem is much easier; the parent fetch simply tells all of the sub-invocations not to expand the arguments further. However, whether it was planned or not, it has been this way for a long time. I would not be surprised if somebody is relying on the recursion to help organize their groups. So I think the sanest thing is probably: 1. Teach fetch to expand recursively in a single process, and then tell sub-processes (via a new command-line option) not to expand any further. 2. Teach fetch to detect cycles (probably just by a simple depth counter). 3. Teach the group-reading code to detect groups more robustly, so that a single-item group like remotes.foo=bar correctly recurses to bar. 4. (Optional) Teach the expansion code from step 1 to cull duplicates, so that we do not try to fetch from the same remote twice (e.g., if it is mentioned as part of two groups, and both are specified on the command line). I do not plan to work on this myself in the immediate future, but perhaps it is an interesting low-hanging fruit for somebody else. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: Runaway git remote if group definition contains a remote by the same name
On Mon, Dec 30, 2013 at 11:10:31AM -0800, Junio C Hamano wrote: So I think the sanest thing is probably: 1. Teach fetch to expand recursively in a single process, and then tell sub-processes (via a new command-line option) not to expand any further. 2. Teach fetch to detect cycles (probably just by a simple depth counter). I suspect that the expansion code will just accumulate remotes found into a string-list (as part of 4. below), so deduping would be fairly easily done without a depth counter. I don't think that will work (at least not naively). The end-product of step 1, and the string_list that is de-duped in step 4, is a list of the concrete remotes. The cycles occur between groups, which are not mentioned in the final list. You can keep a separate list of the groups we visit, of course, but we do not otherwise need it. One thing that does make such a list easier is that we do not need to care about order. E.g., in config like this: [remotes] a = c b = c c = d e you can mark c as seen after visiting it via a. It is not technically a cycle, but since we would want to suppress duplicates anyway, we can be overly broad. 3. Teach the group-reading code to detect groups more robustly, so that a single-item group like remotes.foo=bar correctly recurses to bar. A single-item remote group is somewhat annoying, but expanding it only at some places while ignoring it at other places is even more annoying, so this sounds like a right thing to do. The only configuration that I think would be negatively affected is something like: [remote] foo = foo [remote foo] url = ... that silently works now, but would become broken (because we would complain about the cycle). I think that's OK; that config is clearly stupid and broken. If it were remote.foo = foo bar, trying to expand the concrete foo and bar, that might make some sense, but then it is already broken in the current code (that is the example that started the discussion). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] t0000 cleanups
On Mon, Dec 30, 2013 at 10:51:25AM -0800, Jonathan Nieder wrote: I think it can be better, since the commit message left me scratching my head while the patch itself seems pretty simple. How about something like the following? I am fine with that format, though... Analysis and fix: These scratch areas for sub-tests should be under the t trash directory, but because the TEST_OUTPUT_DIRECTORY setting from the toplevel test leaks into the environment they are created under the toplevel output directory (typically t/) instead. Because some of the sub-tests simulate failures, their trash directories are kept around. This is not exactly true. The TEST_OUTPUT_DIRECTORY setting does not leak. t sets $TEST_DIRECTORY (which it must, so the sub-scripts can find test-lib.sh and friends), and then TEST_OUTPUT_DIRECTORY uses that as a default if it is not explicitly set. The rest of your rewrite looks correct. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drop unnecessary copying in credential_ask_one
On Thu, Jan 02, 2014 at 09:06:33AM +0800, Tay Ray Chuan wrote: We were leaking memory in there, as after obtaining a string from git_getpass, we returned a copy of it, yet no one else held the original string, apart from credential_ask_one. I don't think this change is correct by itself. credential_ask_one calls git_prompt. That function in turn calls git_terminal_prompt, which returns a pointer to a static buffer (because it may be backed by the system getpass() implementation). So there is no leak there, and dropping the strdup would be bad (the call to ask for the password would overwrite the value we got for the username). However, git_prompt may also call do_askpass if GIT_ASKPASS is set, and here there is a leak, as we duplicate the buffer. To stop the leak, we need to first harmonize the do_askpass and git_terminal_prompt code paths to either both allocate, or both return a static buffer (and then either strdup or not in the caller, depending on which way we go). It looks like what I originally wrote was correct, as both code paths matched. But then I stupidly broke it with 31b49d9, which failed to notice the static specifier on the strbuf in do_askpass, and started using strbuf_detach. I think this is the simplest fix: -- 8 -- Subject: Revert prompt: clean up strbuf usage This reverts commit 31b49d9b653803e7c7fd18b21c8bdd86e3421668. That commit taught do_askpass to hand ownership of our buffer back to the caller rather than simply return a pointer into our internal strbuf. What it failed to notice, though, was that our internal strbuf is static, because we are trying to emulate the getpass() interface. By handing off ownership, we created a memory leak that cannot be solved. Sometimes git_prompt returns a static buffer from getpass() (or our smarter git_terminal_prompt wrapper), and sometimes it returns an allocated string from do_askpass. Signed-off-by: Jeff King p...@peff.net --- prompt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/prompt.c b/prompt.c index d851807..d7bb17c 100644 --- a/prompt.c +++ b/prompt.c @@ -22,6 +22,7 @@ static char *do_askpass(const char *cmd, const char *prompt) if (start_command(pass)) return NULL; + strbuf_reset(buffer); if (strbuf_read(buffer, pass.out, 20) 0) err = 1; @@ -38,7 +39,7 @@ static char *do_askpass(const char *cmd, const char *prompt) strbuf_setlen(buffer, strcspn(buffer.buf, \r\n)); - return strbuf_detach(buffer, NULL); + return buffer.buf; } char *git_prompt(const char *prompt, int flags) -- 1.8.5.2.434.g63b1477 Signed-off-by: Tay Ray Chuan rcta...@gmail.com --- credential.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/credential.c b/credential.c index 86397f3..0d02ad8 100644 --- a/credential.c +++ b/credential.c @@ -54,7 +54,7 @@ static char *credential_ask_one(const char *what, struct credential *c) strbuf_release(desc); strbuf_release(prompt); - return xstrdup(r); + return r; } static void credential_getpass(struct credential *c) -- 1.8.5-rc2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drop unnecessary copying in credential_ask_one
On Wed, Jan 01, 2014 at 10:03:30PM -0500, Jeff King wrote: On Thu, Jan 02, 2014 at 09:06:33AM +0800, Tay Ray Chuan wrote: We were leaking memory in there, as after obtaining a string from git_getpass, we returned a copy of it, yet no one else held the original string, apart from credential_ask_one. I don't think this change is correct by itself. credential_ask_one calls git_prompt. That function in turn calls git_terminal_prompt, which returns a pointer to a static buffer (because it may be backed by the system getpass() implementation). So there is no leak there, and dropping the strdup would be bad (the call to ask for the password would overwrite the value we got for the username). By the way, you can see the breakage from your patch pretty easily by testing the terminal input. Disable any credential helper config you have, and then run: GIT_CURL_VERBOSE=1 \ git ls-remote https://github.com/peff/ask-for-auth 21 | perl -lne '/Authorization: Basic (.*)/ and print $1' | openssl base64 -d enter myuser and mypass respectively on the terminal. The result is that we send mypass:mypass to the server. And then double-free the result, which cases glibc to barf. I wondered why we did not see this breakage in test suite. My assumption was that it was simply because our test user has the same username and password. So I fixed that, but to my surprise we still did not detect the problem. The issue is that your patch does the right thing when GIT_ASKPASS is in use, and breaks only when the user types into the terminal. But the test suite, of course, always uses askpass because it cannot rely on accessing a terminal (we'd have to do some magic with lib-terminal, I think). So it doesn't detect the problem in your patch, but I wonder if it is worth applying the patch below anyway, as it makes the test suite slightly more robust. -- 8 -- Subject: use distinct username/password for http auth tests The httpd server we set up to test git's http client code knows about a single account, in which both the username and password are user@host (the unusual use of the @ here is to verify that we handle the character correctly when URL escaped). This means that we may miss a certain class of errors in which the username and password are mixed up internally by git. We can make our tests more robust by having distinct values for the username and password. In addition to tweaking the server passwd file and the client URL, we must teach the askpass harness to accept multiple values. As a bonus, this makes the setup of some tests more obvious; when we are expecting git to ask only about the password, we can seed the username askpass response with a bogus value. Signed-off-by: Jeff King p...@peff.net --- t/lib-httpd.sh| 15 --- t/lib-httpd/passwd| 2 +- t/t5540-http-push.sh | 4 ++-- t/t5541-http-push.sh | 6 +++--- t/t5550-http-fetch.sh | 10 +- t/t5551-http-fetch.sh | 6 +++--- 6 files changed, 26 insertions(+), 17 deletions(-) diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index c470784..bfdff2a 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -129,7 +129,7 @@ prepare_httpd() { HTTPD_DEST=127.0.0.1:$LIB_HTTPD_PORT HTTPD_URL=$HTTPD_PROTO://$HTTPD_DEST HTTPD_URL_USER=$HTTPD_PROTO://user%40host@$HTTPD_DEST - HTTPD_URL_USER_PASS=$HTTPD_PROTO://user%40host:user%40host@$HTTPD_DEST + HTTPD_URL_USER_PASS=$HTTPD_PROTO://user%40host:pass%40host@$HTTPD_DEST if test -n $LIB_HTTPD_DAV -o -n $LIB_HTTPD_SVN then @@ -217,7 +217,15 @@ setup_askpass_helper() { test_expect_success 'setup askpass helper' ' write_script $TRASH_DIRECTORY/askpass -\EOF echo $TRASH_DIRECTORY/askpass-query askpass: $* - cat $TRASH_DIRECTORY/askpass-response + case $* in + *Username*) + what=user + ;; + *Password*) + what=pass + ;; + esac + cat $TRASH_DIRECTORY/askpass-$what EOF GIT_ASKPASS=$TRASH_DIRECTORY/askpass export GIT_ASKPASS @@ -227,7 +235,8 @@ setup_askpass_helper() { set_askpass() { $TRASH_DIRECTORY/askpass-query - echo $* $TRASH_DIRECTORY/askpass-response + echo $1 $TRASH_DIRECTORY/askpass-user + echo $2 $TRASH_DIRECTORY/askpass-pass } expect_askpass() { diff --git a/t/lib-httpd/passwd b/t/lib-httpd/passwd index f2fbcad..99a34d6 100644 --- a/t/lib-httpd/passwd +++ b/t/lib-httpd/passwd @@ -1 +1 @@ -user@host:nKpa8pZUHx/ic +user@host:xb4E8pqD81KQs diff --git a/t/t5540-http-push.sh b/t/t5540-http-push.sh index 01d0d95..5b0198c 100755 --- a/t/t5540-http-push.sh +++ b/t/t5540-http-push.sh @@ -154,7 +154,7 @@ test_http_push_nonff $HTTPD_DOCUMENT_ROOT_PATH/test_repo.git \ test_expect_success 'push to password-protected repository (user
Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote: - builtin/merge.c is the same, but it is conceptually even worse. It has the end-user supplied string and wants to see if it is a valid strategy. If the user wants to use a custom strategy, a single stat() to make sure if it exists should suffice, and the error codepath should load the command list to present the names of available ones in the error message. Is it a single stat()? I think we would need to check each element of $PATH. Though in practice, the exec-dir would be the first thing we check, and where we would find the majority of hits. So it would still be a win, as we would avoid touching anything but the exec-dir in the common case. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Sun, Dec 22, 2013 at 09:55:23PM +, Ben Maurer wrote: One issue with this approach is that it seems git-pack-index doesn't perform as well with thin packs. git-index-pack uses a multi-threaded approach to resolving the deltas. However, the multithreading only works on deltas that are exclusively in the pack. After the multi-threaded phase, it incrementally brings in objects from external packs, but in single threaded manner. Many objects in the pack have some dependency on an external object, therefore, defeating the multithreading. Yes. It will also just plain perform worse, because it will have to copy over more external objects. This is somewhat made up for getting an actual smaller pack size, but I suspect the completed thin-pack ends up larger than what the server would otherwise send. Because the server is blindly reusing on-disk deltas (which is good, because it takes load off of the server), it misses good delta opportunities between objects in the sent pack (which are likely almost as small, but would not require fixing on the other end). Single-threading the extra work we have to do just exacerbates the problem, of course. Still, I think it will be a net win for end-to-end wall clock time of the operation. You are saving CPU time on the server end, and you're saving network bandwidth with a smaller pack. In my tests on torvalds/linux, doing a fetch across a local machine (so basically discounting network improvements), the times look like (this is end-to-end, counting both server and client CPU time): [vanilla] real0m3.850s user0m7.504s sys 0m0.380s [patched] real0m2.785s user0m2.472s sys 0m0.180s So it was a win both for wall-clock and CPU. What's the use case for a pack file with a SHA1 reference living inside the pack file (why not just use an offset?) Would it make sense to assume that all such files are external and bring them in in the first phase. Once upon a time, ref-delta was the only format supported by packfiles. Later, delta-base-offset was invented, and the client and server negotiate the use of the feature before the packfile is generated (and even when we reuse objects, pack-objects will rewrite the header on the fly to use ref-delta if necessary). These days, pretty much everybody supports delta-base-offset, so I don't think there is any reason index-pack should see ref-delta for a non-thin object. We could probably teach index-pack an --assume-refs-are-thin option to optimize for this case, and have fetch-pack/receive-pack pass it whenever they know that delta-base-offset was negotiated. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Sun, Dec 22, 2013 at 11:47:34AM -0800, Ben Maurer wrote: Jeff King's bitmap branch appears to give a very substantial speedup. After applying this branch, the counting objects phase is basically free. However, I found that the compression phase still takes a non-trivial amount of time. Sorry for the slow reply; I've been on vacation. First off, I'm excited that you're looking into using bitmaps. We've been using them for a while at GitHub, but more testing is definitely appreciated. :) When you build your bitmaps, do you set the pack.writeBitmapHashCache option? We found that it makes a significant difference during the compression phase, as otherwise git attempts deltas between random files based on size. Here are some numbers for a simulated fetch from torvalds/linux, representing about 7 weeks of history. Running: from=2d3c627502f2a9b0a7de06a5a2df2365542a72c9 to=f0a679afefc0b6288310f88606b4bb1f243f1aa9 run() { (echo $to echo ^$from) | git pack-objects --stdout --all-progress --revs /dev/null } echo == no hash cache git repack -adb 2/dev/null time run echo == with hash cache git -c pack.writebitmaphashcache=1 repack -adb 2/dev/null time run produces: == no hash cache Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (7700/7700), done. Writing objects: 100% (20661/20661), 23.23 MiB | 11.13 MiB/s, done. Total 20661 (delta 13884), reused 16638 (delta 12940) real0m3.626s user0m10.760s sys 0m0.060s == with hash cache Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (7700/7700), done. Writing objects: 100% (20661/20661), 22.64 MiB | 10.82 MiB/s, done. Total 20661 (delta 14038), reused 16638 (delta 12940) real0m3.072s user0m6.168s sys 0m0.100s So our resulting pack shrinks a little because we find better deltas, but note that we save a fair bit of CPU time (the wall clock time ends up not all that different, because the single-threaded writing phase represents a big chunk of that). It looks like most of the time spent compressing objects was in cases where the object was already compressed in the packfile, but the delta was based on an object that the client already had. For some reason, --thin wasn't enabling reuse of these deltas. I'm not too surprised. The long-time strategy for a fetch has been to walk down the haves and wants to their merge base. That boundary commit is marked as a preferred base, meaning we won't send it, but it's a good base for other objects, since we know the client has it. Technically _all_ of the history reachable from that merge base could be marked as a preferred base, but we don't do so for efficiency reasons: 1. It's expensive to walk the full object graph for a small fetch, and 2. You would clog the delta-search algorithm if you had a very large number of preferred-base objects. With bitmaps, though, the history walk is free (we just check each object against our have bitmap), so (1) is a non-issue. For (2), we probably don't want to stick each object into the preferred-base list, but we do want to reuse on-disk deltas we have, if we know the other side has the base. I don't know if you went through the same line of thinking, but that matches your proposed solution. :) This is a hacky, poorly styled attempt to figure how how much better performance could be. With the bitmap branch, git should know what objects the client has already and can easily test if an existing delta can be reused. I don't know the branch well enough to code this, so as a hack, I just assumed the client has any delta base that is in the pack file (for our repo, this is always true, because we have a linear history) Even without a linear history, it mostly works. If you are fetching all of the branches from the other side, then you will end up with all of the objects that the remote has. Which means that either you already have the base, or the remote is about to send it to you. It will break down, though, whenever the other side has something you're not fetching. For that you really need to do the have bitmap check. This greatly reduces the time: $ { echo HEAD echo ^$have; } | time ../git-bitmap/install/bin/git pack-objects --use-bitmap-index --revs --stdout --thin /dev/null Counting objects: 220909, done. Compressing objects: 100% (14203/14203), done. Total 220909 (delta 194050), reused 220909 (delta 199885) 3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 2007296maxresident)k 0inputs+0outputs (0major+416243minor)pagefaults 0swaps You might try with --all-progress (or pipe to wc -c), as this should be reducing the output size, too. Here's my same torvalds/linux test, run with the patch I'm including below: Counting objects: 20661, done. Delta compression using up to 8 threads. Compressing objects: 100% (3677/3677), done. Writing
Re: Bug report: stash in upstream caused remote fetch to fail
On Mon, Jan 06, 2014 at 08:16:31AM -0800, Junio C Hamano wrote: I was going to ask you to send your repository, but I can easily reproduce here. I guess people don't run into it because it's uncommon to fetch the whole refs/ namespace from a non-bare repo (and bare repos do not tend to have stashes). Here's a minimal reproduction recipe: git init repo cd repo echo content foo git add . git commit -m foo echo more foo git stash git init --bare sub cd sub git fetch .. 'refs/*:refs/*' It looks like we are not feeding refs/stash properly to pack-objects. I'll try to take a closer look later today. I looked at this in the past and I vaguely recall that we reject it in the for-each-ref loop with check-ref-format saying eh, that is a single-level name. At that point I stopped digging, thinking it was a feature ;-) based on your exact observation about stash vs bare/non-bare. I am fine with rejecting it with a warning, but we should not then complain that the other side did not send us the object, since we should not be asking for it at all. I also do not see us complaining about the funny ref anywhere. So there is definitely _a_ bug here. :) I think somebody else mentioned recently that we do not handle malformed refs consistently. I think it was: http://article.gmane.org/gmane.comp.version-control.git/239381 which might or might not be related. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 08:37:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: We could probably teach index-pack an --assume-refs-are-thin option to optimize for this case, and have fetch-pack/receive-pack pass it whenever they know that delta-base-offset was negotiated. I thought the existing negotiation merely means I understand offset encoded bases, so you are allowed to use that encoding, not I will not accept encoding other than the offset format, so you must use that encoding for everything. You are right about what it means. But this is an optimization, not a correctness thing. So if we assume that senders who are allowed to send offsets will generally do so, it might be a reasonable optimization to guess that ref-delta objects will need thin completion. If we are wrong, the worst case is that we add an extra local object to the end of the pack. So as long as we are right most of the time, it may still be a win. Of course, it may also be possible to simply multi-thread the thin-completion portion of index-pack. That would be even better, though I am not sure how it would work. The resolution of an object in one thread can always become the input for another thread. But maybe we could have each thread come up with a proposed set of objects to add to the pack, and then drop duplicates or something. I haven't looked closely. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Mon, Jan 06, 2014 at 12:06:51PM -0800, Junio C Hamano wrote: Unless you set @{u} to this new configuration, in which case the choice becomes dynamic depending on the current branch, but - if that is the only sane choice based on the current branch, why not use that as the default without having to set the configuration? - Or if that is still insufficient, don't we need branch.*.forkedFrom that is different from branch.*.merge, so that different branches you want to show format-patch output can have different reference points? Yeah, I had similar thoughts. I personally use branch.*.merge as forkedFrom, and it seems like we are going that way anyway with things like git rebase and git merge defaulting to upstream. But then there is git push -u and push.default = upstream, which treats the upstream config as something else entirely. So it seems like there is already some confusion, and either way we go, thisis making it worse to some degree (I do not blame Ram, but rather he has stumbled into a hidden sand pit that we have been building for the past few years... :). I wonder if it is too late to try to clarify this dual usage. It kind of seems like the push config is this is the place I publish to. Which, in many workflows, just so happens to be the exact same as the place you forked from. Could we introduce a new branch.*.pushupstream variable that falls back to branch.*.merge? Or is that just throwing more fuel on the fire (more sand in the pit in my analogy, I guess). I admit I haven't thought it through yet, though. And even if it does work, it may throw a slight monkey wrench in the proposed push.default transition. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 09:57:23AM -0500, Jeff King wrote: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c733379..0cff874 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1402,6 +1402,19 @@ static void check_object(struct object_entry *entry) base_entry-delta_child = entry; unuse_pack(w_curs); return; + } else if(base_ref bitmap_have(base_ref)) { + entry-type = entry-in_pack_type; + entry-delta_size = entry-size; + /* + * XXX we'll leak this, but it's probably OK + * since we'll exit immediately after the packing + * is done + */ + entry-delta = xcalloc(1, sizeof(*entry-delta)); + hashcpy(entry-delta-idx.sha1, base_ref); + entry-delta-preferred_base = 1; + unuse_pack(w_curs); + return; } Just reading over this again, the conditional here should obviously be checking thin (which needs to become a global, as in your patch). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Mon, Jan 06, 2014 at 03:29:57PM -0500, John Szakmeister wrote: Yeah, I had similar thoughts. I personally use branch.*.merge as forkedFrom, and it seems like we are going that way anyway with things like git rebase and git merge defaulting to upstream. But then there is git push -u and push.default = upstream, which treats the upstream config as something else entirely. Just for more reference, I rarely use branch.*.merge as forkedFrom. I typically want to use master as my target, but like Ram, I publish my changes elsewhere for safe keeping. I think in a typical, feature branch-based workflow @{u} would be nearly useless. In my feature-branch development for git.git, my upstream is almost always origin/master[1]. However, sometimes feature branches have dependencies[2] on each other. Representing that via the upstream field makes sense, since that is what you forked from, and what you would want git rebase to start from. -Peff [1] I do not even have a local master branch for git.git work, as it would just be a pain to keep up to date. I am either working directly on a topic branch, or I am integrating in my own personal branch. [2] You should try to minimize dependencies between feature branches, of course, but sometimes they simply form a logical progression of features. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Mon, Jan 06, 2014 at 12:38:30PM -0800, Junio C Hamano wrote: I wonder if it is too late to try to clarify this dual usage. It kind of seems like the push config is this is the place I publish to. Which, in many workflows, just so happens to be the exact same as the place you forked from. Could we introduce a new branch.*.pushupstream variable that falls back to branch.*.merge? Or is that just throwing more fuel on the fire (more sand in the pit in my analogy, I guess). I admit I haven't thought it through yet, though. And even if it does work, it may throw a slight monkey wrench in the proposed push.default transition. Yeah, when I say upstream, I never mean it as where I publish. Your upstream is where you get others' work from. That's my thinking, as well, but it means the upstream push.default is nonsensical. I've thought that all along, but it seems like other people find it useful. I guess because they are in a non-triangular, non-feature-branch setup (I suppose you could think of a central-repo feature-branch workflow as a special form of triangular setup, where the remote is bi-directional, but the branch names are triangular). If we want to declare push -u and push.default=upstream as tools for certain simple bi-directional workflows, that makes sense. But I suspect it may cause extra confusion when people make the jump to using a triangular workflow. For a push to somewhere for safekeeping or other people to look at triangular workflow, it does not make any sense to treat that I publish there place as an upstream (hence having branch.*.remote pointing at that publishing point). You _might_ treat it the same way we treat the upstream, in some special cases. For example, when you say git status, it is useful to see how your topic and the upstream have progressed (i.e., do I need to pull from upstream?). But you may _also_ want to know how your local branch differs from its pushed counterpart (i.e., do I have unsaved commits here that I want to push up?). So having two config options might help with that. Of course, your push upstream (or whatever you want to call it) does not logically have one value. You may push to several places, and would want to compare to each. Once you stop doing that, and instead using branch.*.remote = origin, and branch.*.merge = master, where 'origin' is not your publishing point, @{u} will again start making sense, I think. And I thought that is what setting remote.pushdefault to the publishing point repository was about. If that were sufficient, then we would just need push.default = current, and not upstream (nor simple). I lobbied for that during the discussion, but people seemed to think that upstream was better/more useful. Maybe it was just because remote.pushdefault did not exist then. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote: Let me know how this patch does for you. My testing has been fairly limited so far. This patch looks like a much cleaner version :-). Works well for me, but my test setup may not be great since I didn't get any errors from completely ignoring the haves bitmap :-). Great. Out of curiosity, can you show the before/after? The timings should be similar to what your patch produced, but I'm really curious to see how the pack size changes. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report: stash in upstream caused remote fetch to fail
On Mon, Jan 06, 2014 at 12:17:21PM -0800, Junio C Hamano wrote: I am fine with rejecting it with a warning, but we should not then complain that the other side did not send us the object, since we should not be asking for it at all. I also do not see us complaining about the funny ref anywhere. So there is definitely _a_ bug here. :) Oh, no question about that. I was just pointing somebody who already has volunteered to take a look in a direction I recall was where the issue was ;-) Oh, crud, did I volunteer? :) So I found the problem, but I'm really not sure what to make of it. We do check the refname format when evaluating the refspecs, in: do_fetch get_ref_map get_fetch_map check_refname_format Before calling it, we check that it starts with refs/, and then pass the _whole_ refname into check_refname_format. So the latter sees refs/stash. And that's acceptable, as it's not a single-level ref. Thus we do not get the funny ref message. The code looks like this: if (!starts_with((*rmp)-peer_ref-name, refs/) || check_refname_format((*rmp)-peer_ref-name, 0)) { /* print funny ref and ignore */ Then we ask fetch_refs_via_pack to get the actual objects for us. And it checks our refs again, with this call chain: do_fetch fetch_refs transport_fetch_refs fetch_refs_via_pack fetch_pack do_fetch_pack everything_local filter_refs check_refname_format Here, the code looks like this: if (!memcmp(ref-name, refs/, 5) check_refname_format(ref-name + 5, 0)) ; /* trash */ At first I thought we are doing the same check (is it syntactically valid, and is it in refs/), but we're not. We are actually checking the format _only_ of stuff in refs/, and ignoring the rest. Which really makes no sense to me. If it were memcmp(...) || check_refname_format(), then it would be roughly the same check. But it would still be wrong, because note that we pass only the bits under refs/ to check_refname_format. So it sees only stash, and then complains that it is single-level. So the symptom we are seeing is because we are filtering with two different rulesets in different places. But I'm really not sure how to resolve it. The one in filter_refs seems nonsensical to me. Checking _only_ things under refs/ doesn't make sense. And even if that was sensible, feeding half a ref to check_refname_format does not work. In addition to the single-level check, it has other rules that want to see the whole ref (e.g., the ref @ is not valid, but refs/@ is OK; it cannot distinguish them without seeing the prefix). So I can see two options: 1. Make the check in filter_refs look like the one in get_fetch_map. That at least makes them the same, which alleviates the symptom. But we still are running two checks, and if they ever get out of sync, it causes problems. 2. Abolish the check in filter_refs. I think this makes the most sense from the perspective of fetch, because we will already have cleaned up the ref list. But we might need to leave the filtering in place for people who call fetch-pack as a bare plumbing command. It's really not clear to me what the check in filter_refs was trying to do. It dates all the way back to 1baaae5 (Make maximal use of the remote refs, 2005-10-28), but there is not much explanation. I haven't dug into the list around that time to see if there's any discussion. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false
On Mon, Jan 06, 2014 at 07:35:04PM -0800, Brodie Rao wrote: On Mon, Jan 6, 2014 at 7:32 PM, Brodie Rao bro...@sf.io wrote: This change ensures get_sha1_basic() doesn't try to resolve full hashes as refs when ambiguous ref warnings are disabled. This provides a substantial performance improvement when passing many hashes to a command (like git rev-list --stdin) when core.warnambiguousrefs is false. The check incurs 6 stat()s for every hash supplied, which can be costly over NFS. Forgot to add: Signed-off-by: Brodie Rao bro...@sf.io Looks good to me. I wonder if I should have simply gone this route instead of adding warn_on_object_refname_ambiguity, and then people who want cat-file --batch to be fast could just turn off core.warnAmbiguousRefs. I wanted it to happen automatically, though. Alternatively, I guess cat-file --batch could just turn off warn_ambiguous_refs itself. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [RFC] Making use of bitmaps for thin objects
On Mon, Jan 06, 2014 at 10:14:30PM +, Ben Maurer wrote: It looks like for my repo the size win wasn't as big (~10%). Is it possible that with the kernel test you got extremely lucky and there was some huge binary blob that thin packing turned into a tiny delta? I don't think so. When I look at the reused-delta numbers, I see that we reused ~3000 extra deltas (and the compressing progress meter drops by the same amount. If I do a full clone (or just index-pack the result), it claims ~3000 thin objects completed with local objects (versus 0 in the normal case). So I think we really are getting a lot of little savings adding up, which makes sense. If there were thousands of changed files, a non-thin pack has to have at least _one_ full version of each file. With thin packs, we might literally have only deltas. It was a 7-week period, which might make more difference. I'm going to run some experiments with different time periods to see if that changes anything. It might also be the repo contents. I'm going to try my experiments on a few different repositories. It may be that either the kernel or your repo is unusual in some way. Or maybe I was just lucky. :) When you get a chance, it'd be handy if you could push an updated version of your change out to your public github repo. I'd like to see if folks here are interested in testing this more, and it'd be good to make sure we're testing the diff that is targeted for upstream. I've pushed it to: git://github.com/peff/git.git jk/bitmap-reuse-delta I'll continue to rebase it forward as time goes on (until a cleaned-up version gets merged upstream), but the tip of that branch should always be in a working state. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drop unnecessary copying in credential_ask_one
On Thu, Jan 02, 2014 at 11:08:51AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: ... But the test suite, of course, always uses askpass because it cannot rely on accessing a terminal (we'd have to do some magic with lib-terminal, I think). So it doesn't detect the problem in your patch, but I wonder if it is worth applying the patch below anyway, as it makes the test suite slightly more robust. Sounds like a good first step in the right direction. Thanks. I took a brief look at adding real terminal tests for the credential code using our test-terminal/lib-terminal.sh setup. Unfortunately, it falls short of what we need. test-terminal only handles stdout and stderr streams as fake terminals. We could pretty easily add stdin for input, as it uses fork() to work asynchronously. But the credential code does not actually read from stdin. It opens and reads from /dev/tty explicitly. So I think we'd have to actually fake setting up a controlling terminal. And that means magic with setsid() and ioctl(TIOCSCTTY), which in turn sounds like a portability headache. So it's definitely possible under Linux, and probably under most Unixes. But I'm not sure it's worth the effort, given that review already caught the potential bug here. Another option would be to instrument git_terminal_prompt with a mock-terminal interface (say, reading from a file specified in an environment variable). But I really hate polluting the code with test cruft, and it would not actually be testing an interesting segment of the code, anyway. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false
On Tue, Jan 07, 2014 at 09:51:07AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Alternatively, I guess cat-file --batch could just turn off warn_ambiguous_refs itself. Sounds like a sensible way to go, perhaps on top of this change? The downside is that we would not warn about ambiguous refs anymore, even if the user was expecting it to. I don't know if that matters much. I kind of feel in the --batch situation that it is somewhat useless (I wonder if rev-list --stdin should turn it off, too). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false
On Tue, Jan 07, 2014 at 11:38:15AM -0800, Junio C Hamano wrote: Alternatively, I guess cat-file --batch could just turn off warn_ambiguous_refs itself. Sounds like a sensible way to go, perhaps on top of this change? The downside is that we would not warn about ambiguous refs anymore, even if the user was expecting it to. I don't know if that matters much. That is true already with or without Brodie's change, isn't it? With warn_on_object_refname_ambiguity, cat-file --batch makes us ignore core.warnambigousrefs setting. If we redo 25fba78d (cat-file: disable object/refname ambiguity check for batch mode, 2013-07-12) to unconditionally disable warn_ambiguous_refs in cat-file --batch and get rid of warn_on_object_refname_ambiguity, the end result would be the same, no? No, I don't think the end effect is the same (or maybe we are not talking about the same thing. :) ). There are two ambiguity situations: 1. Ambiguous non-fully-qualified refs (e.g., same tag and head name). 2. 40-hex sha1 object names which might also be unqualified ref names. Prior to 25ffba78d, cat-file checked both (like all the rest of git). But checking (2) is very expensive, since otherwise a 40-hex sha1 does not need to do a ref lookup at all, and something like rev-list --objects | cat-file --batch-check processes a large number of these. Detecting (1) is not nearly as expensive. You must already be doing a ref lookup to trigger it (so the relative cost is much closer), and your query size is bounded by the number of refs, not the number of objects. Commit 25ffba78d traded off some safety for a lot of performance by disabling (2), but left (1) in place because the tradeoff is different. The two options I was musing over earlier today were (all on top of Brodie's patch): a. Revert 25ffba78d. With Brodie's patch, core.warnAmbiguousRefs disables _both_ warnings. So we default to safe-but-slow, but people who care about performance can turn off ambiguity warnings. The downside is that you have to know to turn it off manually (and it's a global config flag, so you end up turning it off _everywhere_, not just in big queries where it matters). b. Revert 25ffba78d, but then on top of it just turn off warn_ambiguous_refs unconditionally in cat-file --batch-check. The downside is that we drop the safety from (1). The upside is that the code is a little simpler, as we drop the extra flag. And obviously: c. Just leave it at Brodie's patch with nothing else on top. My thinking in favor of (b) was basically does anybody actually care about ambiguous refs in this situation anyway?. If they do, then I think (c) is my preferred choice. I kind of feel in the --batch situation that it is somewhat useless (I wonder if rev-list --stdin should turn it off, too). I think doing the same as cat-file --batch in rev-list --stdin makes sense. Both interfaces are designed to grok extended SHA-1s, and full 40-hex object names could be ambiguous and we are missing the warning for them. I'm not sure I understand what you are saying. We _do_ have the warning for rev-list --stdin currently. We do _not_ have the warning for cat-file --batch, since my 25ffba78d. I was wondering if rev-list should go the same way as 25ffba78d, for efficiency reasons (e.g., think piping to rev-list --no-walk --stdin). Or are you wondering if we should revert 25fba78d, apply Brodie's change to skip the ref resolution whose result is never used, and tell people who want to use cat-file --batch (or rev-list --stdin) to disable the ambiguity warning themselves? See above. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drop unnecessary copying in credential_ask_one
On Tue, Jan 07, 2014 at 11:44:00AM -0800, Junio C Hamano wrote: test-terminal only handles stdout and stderr streams as fake terminals. We could pretty easily add stdin for input, as it uses fork() to work asynchronously. But the credential code does not actually read from stdin. It opens and reads from /dev/tty explicitly. So I think we'd have to actually fake setting up a controlling terminal. And that means magic with setsid() and ioctl(TIOCSCTTY), which in turn sounds like a portability headache. I wonder if expect has already solved that for us. I would not be surprised if it did. Though it introduces its own portability issues, since we cannot depend on having it. But it is probably enough to just test_lazy_prereq EXPECT 'expect --version' or something. I dunno. I have never used expect, do not have it installed, and am not excited about introducing a new tool dependency. But if you want to explore it, be my guest. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] format-patch: introduce branch.*.forkedFrom
On Wed, Jan 08, 2014 at 02:00:44AM +0530, Ramkumar Ramachandra wrote: On Wed, Jan 8, 2014 at 1:59 AM, Ramkumar Ramachandra artag...@gmail.com wrote: A very common workflow for preparing patches involves working off a topic branch and generating patches against 'master' to send off to the maintainer. However, a plain $ git format-patch -o outgoing is a no-op on a topic branch, and the user has to remember to specify 'master' explicitly everytime. This problem is not unique to format-patch; even a $ git rebase -i is a no-op because the branch to rebase against isn't specified. To tackle this problem, introduce branch.*.forkedFrom which can specify the parent branch of a topic branch. Future patches will build functionality around this new configuration variable. Cc: Jeff King p...@peff.net Cc: Junio C Hamano gis...@pobox.com Signed-off-by: Ramkumar Ramachandra artag...@gmail.com I have not carefully read some of the later bits of the discussion from last night / this morning, so maybe I am missing something, but this seems backwards to me from what Junio and I were discussing earlier. The point was that the meaning of @{upstream} (and branch.*.merge) is _already_ forked-from, and push -u and push.default=upstream are the odd men out. If we are going to add an option to distinguish the two branch relationships: 1. Where you forked from 2. Where you push to we should leave @{upstream} as (1), and add a new option to represent (2). Not the other way around. Am I missing something? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Tue, Jan 07, 2014 at 03:40:56AM +0530, Ramkumar Ramachandra wrote: Jeff King wrote: Yeah, I had similar thoughts. I personally use branch.*.merge as forkedFrom, and it seems like we are going that way anyway with things like git rebase and git merge defaulting to upstream. My issue with that is that I no idea where my branch is with respect to my forked upstream; I find that extremely useful when doing re-spins. Right. I think there are two separate relationships, and they are both shoe-horned into upstream. The solution is to let them be configured separately (and fallback on each other as appropriate to make the burden less on the user). push.default = upstream is a bit of a disaster, in my opinion. I've advocated push.default = current on multiple occasions, and wrote the initial remote.pushDefault series with that configuration in mind. Yeah, I agree with all of that. I wonder if it is too late to try to clarify this dual usage. It kind of seems like the push config is this is the place I publish to. Which, in many workflows, just so happens to be the exact same as the place you forked from. Could we introduce a new branch.*.pushupstream variable that falls back to branch.*.merge? Or is that just throwing more fuel on the fire (more sand in the pit in my analogy, I guess). We already have a branch.*.pushremote, and I don't see the value of branch.*.pushbranch (what you're referring to as pushupstream, I assume) except for Gerrit users. Yes, pushbranch is probably a better name for what I am referring to. I agree that pushremote is probably enough for sane cases. I seem to recall that people advocating the upstream push-default thought that branch name mapping was a useful feature, but I might be mis-remembering. I will let those people speak up for the feature if they see fit; it seems somewhat crazy to me. Frankly, I don't use full triangular workflows myself mainly because my prompt is compromised: when I have a branch.*.remote different from branch.*.pushremote, I'd like to see where my branch is with respect to @{u} and @{publish} (not yet invented); Yes, as two separate relationships, you would theoretically want to be able to see them separately (or simultaneously side by side). Whether exposing that in the prompt is too clunky, I don't know (I don't even show ahead/behind in my prompt, but rather prefer to query it when I care; I have a separate script that queries the ahead/behind against my publishing point, but it would be nice if git handled this itself). I admit I haven't thought it through yet, though. And even if it does work, it may throw a slight monkey wrench in the proposed push.default transition. We're transitioning to push.default = simple which is even simpler than current. Simpler in the sense that it is less likely to do something unexpected. But the rules are actually more complicated. Two examples: 1. Imagine I make a feature branch foo forked off of origin/master, then git push with no arguments. The current scheme would go to foo on origin, but upstream would go to master. Since they don't agree, simple will punt and tell me to be more specific. 2. Imagine I have set my default push remote to publish, am on master (forked from origin/master) and I run git push without arguments. current would push to master on publish. But upstream will complain, because we are not pushing to our upstream remote. I believe simple will therefore reject this. In both cases, I think current does the sane thing, and simple makes things more complicated. The one saving grace it has is that it punts on these cases rather than potentially doing something destructive that the user did not intend. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Tue, Jan 07, 2014 at 04:17:00AM +0530, Ramkumar Ramachandra wrote: Junio C Hamano wrote:. As I said in the different subthread, I am not convinced that you would need the complexity of branch.*.forkedFrom. If you set your upstream to the true upstream (not your publishing point), and have remote.pushdefaultset to 'publish', you can expect git push to do the right thing, and then always say git show-branch publish/topic topic I think it's highly sub-optimal to have a local-branch @{u} for several reasons; the prompt is almost useless in this case, and it will always show your forked-branch ahead of 'master' (assuming that 'master' doesn't update itself in the duration of your development). I actually use local-branch @{u} all the time to represent inter-topic dependencies. For example, imagine I have topic bar which builds on topic foo, which is based on master. I would have: [branch foo] remote = origin merge = refs/heads/master [branch bar] remote = . merge = refs/heads/foo When I rebase foo, I want to rebase it against upstream's master. When I rebase bar, I want to rebase it against foo. And naturally, upstream does not necessarily have a foo, because it is my topic, not theirs (I _may_ have published my foo somewhere, but that is orthogonal, and anyway my local foo is the most up-to-date source, not the pushed version). As an aside, if you want to rebase both branches, you have to topo-sort them to make sure you do foo first, then rebase bar on the result. My daily procedure is something like: all_topics | while read topic; do echo $topic $(git rev-parse $topic@{u}); done | topo_sort | while read topic upstream; do git rebase $upstream $topic || exit 1 done While doing respins, the prompt doesn't aid you in any way. Besides, on several occasions, I found myself working on the same forked-branch from two different machines; then the publish-point isn't necessarily always a publish-point: it's just another upstream for the branch. Right, things get trickier then. But I don't think there is an automatic way around that. Sometimes the published one is more up to date, and sometimes the upstream thing is more up to date. You have to manually tell git which you are currently basing your work on. I find in such a situation that it tends to resolve itself quickly, though, as the first step is to pull in the changes you pushed up from the other machine anyway (either via git reset or git rebase). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] format-patch: introduce branch.*.forkedFrom
On Wed, Jan 08, 2014 at 02:32:10AM +0530, Ramkumar Ramachandra wrote: we should leave @{upstream} as (1), and add a new option to represent (2). Not the other way around. I have a local branch 'forkedfrom' that has two sources: 'master' and 'ram/forkedfrom'. 'ram/forkedfrom' isn't a dumb publish-point: the relationship information I get between 'forkedfrom' and 'ram/forkedfrom' is very useful; it's what helps me tell how my re-roll is doing with respect to the original series; I'd often want to cherry-pick commits/ messages from my original series to prepare the re-roll, so interaction with this source is quite high. On the other hand, the relationship information I get between 'forkedfrom' and 'master' is practically useless: 'forkedfrom' is always ahead of 'master', and a divergence indicates that I need to rebase; I'll never really need to interact with this source. Thanks for a concrete example. I definitely respect the desire to reuse the existing tooling we have for @{u}. At the same time, I think you are warping the meaning of @{u} somewhat. It is _not_ your upstream here, but rather another version of the branch that has useful changes in it. That might be splitting hairs a bit, but I think you will find that the differences leak through in inconvenient spots (like format-patch, where you really _do_ want to default to the true upstream). If we add @{publish} (and @{pu}), then it becomes very convenient to refer to the ram/ version of your branch. That seems like an obvious first step to me. We don't have to add new config, because branch.*.pushremote already handles this. Now you can do git rebase @{pu} which is nice, but not _quite_ as nice as git rebase, which defaults to @{u}. That first step might be enough, and I'd hold off there and try it out for a few days or weeks first. But if you find in your workflow that you are having to specify @{pu} a lot, then maybe it is worth adding an option to default rebase to @{pu} instead of @{u}. You end up in the same place (git rebase without options does what you want), but I think the underlying data more accurately represents what is going on (and there is no need to teach format-patch anything special). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Tue, Jan 07, 2014 at 01:07:08PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Yes, pushbranch is probably a better name for what I am referring to. I agree that pushremote is probably enough for sane cases. I seem to recall that people advocating the upstream push-default thought that branch name mapping was a useful feature, but I might be mis-remembering. I will let those people speak up for the feature if they see fit; it seems somewhat crazy to me. I think branch mapping you recall are for those who want to push their 'topic' to 'review/topic' or something like that. With Git post 7cdebd8a (Merge branch 'jc/push-refmap', 2013-12-27), I think remote.*.push can be used to implement that, by the way. Hmm. The top patch of that series still relies on upstream being a push destination, though. So if I have a triangular workflow where I fork topic from origin/master, my git push origin topic will go to refs/heads/master on origin under the upstream rule. So that seems broken as ever. :) But I guess what you are referring to is that in a triangular world, the second patch lets me do: git config push.default current git config remote.origin.push 'refs/heads/*:refs/review/*' And then git push, git push origin, or git push origin topic all put it in review/topic, which is what I want. I think that is sensible, and only heightens my sense of the upstream push.default as useless. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Wed, Jan 08, 2014 at 02:55:04AM +0530, Ramkumar Ramachandra wrote: Jeff King wrote: My daily procedure is something like: all_topics | while read topic; do echo $topic $(git rev-parse $topic@{u}); done | topo_sort | while read topic upstream; do git rebase $upstream $topic || exit 1 done Ah, I was perhaps over-specializing for my own usecase, where everything is based off 'master'. I never considered 'master' a true upstream because I throw away topic branches after the maintainer merges them. If you have long-running branches that you work on a daily basis, the issue is somewhat different. What I do is maybe somewhat gross, but I continually rebase my patches forward as master develops. So they diverge from where Junio has forked them upstream (which does not necessarily have any relationship with where I forked from, anyway). The nice thing about this is that eventually the topic becomes empty, as rebase drops patches that were merged upstream (or resolve conflicts to end up at an empty patch). It's a nice way of tracking the progress of the patch upstream, and it catches any differences between what's upstream and what's in the topic (in both directions: you see where the maintainer may have marked up your patch, and you may see a place where you added something to be squashed but the maintainer missed it). The downside is that sometimes the conflicts are annoying and complicated (e.g., several patches that touch the same spot are a pain to rebase on top of themselves; the early ones are confused that the later changes are already in place). My primary concern is that the proposed @{publish} should be a first-class citizen; if it has everything that @{u} has, then we're both good: you'd primarily use @{u}, while I'd primarily use @{publish}. Definitely. I think that's the world we want to work towards. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false
On Tue, Jan 07, 2014 at 12:31:57PM -0800, Junio C Hamano wrote: c. Just leave it at Brodie's patch with nothing else on top. My thinking in favor of (b) was basically does anybody actually care about ambiguous refs in this situation anyway?. If they do, then I think (c) is my preferred choice. OK. I agree with that line of thinking. Let's take it one step at a time, i.e. do c. and also use warn_on_object_refname_ambiguity in rev-list --stdin first and leave the simplification (i.e. b.) for later. Here's a series to do that. The first three are just cleanups I noticed while looking at the problem. While I was writing the commit messages, though, I had a thought. Maybe we could simply do the check faster for the common case that most refs do not look like object names? Right now we blindly call dwim_ref for each get_sha1 call, which is the expensive part. If we instead just loaded all of the refnames from the dwim_ref location (basically heads, tags and the top-level of refs/), we could build an index of all of the entries matching the 40-hex pattern. In 99% of cases, this would be zero entries, and the check would collapse to a simple integer comparison (and even if we did have one, it would be a simple binary search in memory). Our index is more racy than actually checking the filesystem, but I don't think it matters here. Anyway, here is the series I came up with, in the meantime. I can take a quick peek at just making it faster, too. [1/4]: cat-file: refactor error handling of batch_objects [2/4]: cat-file: fix a minor memory leak in batch_objects [3/4]: cat-file: restore ambiguity warning flag in batch_objects [4/4]: revision: turn off object/refname ambiguity check for --stdin -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] cat-file: fix a minor memory leak in batch_objects
We should always have been freeing our strbuf, but doing so consistently was annoying until the refactoring in the previous patch. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 971cdde..ce79103 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -314,6 +314,7 @@ static int batch_objects(struct batch_options *opt) break; } + strbuf_release(buf); return retval; } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] cat-file: restore ambiguity warning flag in batch_objects
Since commit 25fba78, we turn off the object/refname ambiguity warning using a global flag. However, we never restore it. This doesn't really matter in the current code, since the program generally exits immediately after the function is done, but it's good code hygeine to clean up after ourselves. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index ce79103..c64e287 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -264,6 +264,7 @@ static int batch_objects(struct batch_options *opt) struct strbuf buf = STRBUF_INIT; struct expand_data data; int retval = 0; + int save_warning = warn_on_object_refname_ambiguity; if (!opt-format) opt-format = %(objectname) %(objecttype) %(objectsize); @@ -314,6 +315,7 @@ static int batch_objects(struct batch_options *opt) break; } + warn_on_object_refname_ambiguity = save_warning; strbuf_release(buf); return retval; } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] cat-file: refactor error handling of batch_objects
This just pulls the return value for the function out of the inner loop, so we can break out of the loop rather than do an early return. This will make it easier to put any cleanup for the function in one place. Signed-off-by: Jeff King p...@peff.net --- Just making the subsequent diffs less noisy... builtin/cat-file.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index f8288c8..971cdde 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -263,6 +263,7 @@ static int batch_objects(struct batch_options *opt) { struct strbuf buf = STRBUF_INIT; struct expand_data data; + int retval = 0; if (!opt-format) opt-format = %(objectname) %(objecttype) %(objectsize); @@ -294,8 +295,6 @@ static int batch_objects(struct batch_options *opt) warn_on_object_refname_ambiguity = 0; while (strbuf_getline(buf, stdin, '\n') != EOF) { - int error; - if (data.split_on_whitespace) { /* * Split at first whitespace, tying off the beginning @@ -310,12 +309,12 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - error = batch_one_object(buf.buf, opt, data); - if (error) - return error; + retval = batch_one_object(buf.buf, opt, data); + if (retval) + break; } - return 0; + return retval; } static const char * const cat_file_usage[] = { -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin
We currently check that any 40-hex object name we receive is not also a refname, and output a warning if this is the case. When rev-list --stdin is used to receive object names, we may receive a large number of inputs, and the cost of checking each name as a ref is relatively high. Commit 25fba78d already dropped this warning for cat-file --batch-check. The same reasoning applies for rev-list --stdin. Let's disable the check in that instance. Here are before and after numbers: $ git rev-list --all commits [before] $ best-of-five -i commits ./git rev-list --stdin --no-walk --pretty=raw real0m0.675s user0m0.552s sys 0m0.120s [after] $ best-of-five -i commits ./git rev-list --stdin --no-walk --pretty=raw real0m0.415s user0m0.400s sys 0m0.012s Signed-off-by: Jeff King p...@peff.net --- Obviously we drop this one (and revert 25fba78d) if I can just make the check faster. revision.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/revision.c b/revision.c index a68fde6..87d04dd 100644 --- a/revision.c +++ b/revision.c @@ -1576,7 +1576,9 @@ static void read_revisions_from_stdin(struct rev_info *revs, { struct strbuf sb; int seen_dashdash = 0; + int save_warning = warn_on_object_refname_ambiguity; + warn_on_object_refname_ambiguity = 0; strbuf_init(sb, 1000); while (strbuf_getwholeline(sb, stdin, '\n') != EOF) { int len = sb.len; @@ -1595,6 +1597,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, REVARG_CANNOT_BE_FILENAME)) die(bad revision '%s', sb.buf); } + warn_on_object_refname_ambiguity = save_warning; if (seen_dashdash) read_pathspec_from_stdin(revs, sb, prune); strbuf_release(sb); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] format-patch: introduce format.defaultTo
On Tue, Jan 07, 2014 at 02:06:12PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: I think that is sensible, and only heightens my sense of the upstream push.default as useless. :) Yes, it only is good for centralized world (it was designed back in the centralized days after all, wasn't it?). I do not think there is any centralized days. From day one, Linus advocated a triangular workflow, and that is how git and kernel develop has always been done. And that is why the default of matching was there. There were people who came later, and who still exist today, who use git in an SVN-like centralized way. So if there were centralized days, we are in them now. :) I just do not see any real advantage even in a centralized world for upstream versus current. Before remote.pushdefault, I can potentially see some use (if you want to abuse @{upstream}), but now I do not see any point. And even in a centralized workflow, I see upstream creating problems. E.g., you fork a feature branch in the centralized repo; it should not get pushed straight back to master! And that is why we invented simple, to prevent such things. I dunno. I have not gone back and read all of the arguments around push.default from last year. It is entirely possible everything I just said was refuted back then, and I am needlessly rehashing old arguments. I remember that Matthieu was one of the main advocates of upstream. I am cc-ing him here to bring his attention (not just to this message, but to the whole thread). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] speeding up 40-hex ambiguity check
On Tue, Jan 07, 2014 at 05:08:56PM -0500, Jeff King wrote: OK. I agree with that line of thinking. Let's take it one step at a time, i.e. do c. and also use warn_on_object_refname_ambiguity in rev-list --stdin first and leave the simplification (i.e. b.) for later. Here's a series to do that. The first three are just cleanups I noticed while looking at the problem. While I was writing the commit messages, though, I had a thought. Maybe we could simply do the check faster for the common case that most refs do not look like object names? Right now we blindly call dwim_ref for each get_sha1 call, which is the expensive part. If we instead just loaded all of the refnames from the dwim_ref location (basically heads, tags and the top-level of refs/), we could build an index of all of the entries matching the 40-hex pattern. In 99% of cases, this would be zero entries, and the check would collapse to a simple integer comparison (and even if we did have one, it would be a simple binary search in memory). That turned out very nicely, and I think we can drop the extra flag entirely. Brodie's patch still makes sense, for people who do want to turn off ambiguity warnings entirely (and I built on his patch, which matters textually for 4 and 5, but it would be easy to rebase). I'm cc-ing Michael, since it is his ref-traversal code I am butchering in the 3rd patch. The first two are the unrelated cleanups from v1. They are not necessary, but I do not see any reason not to include them. [1/5]: cat-file: refactor error handling of batch_objects [2/5]: cat-file: fix a minor memory leak in batch_objects [3/5]: refs: teach for_each_ref a flag to avoid recursion [4/5]: get_sha1: speed up ambiguous 40-hex test [5/5]: get_sha1: drop object/refname ambiguity flag -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects
We should always have been freeing our strbuf, but doing so consistently was annoying until the refactoring in the previous patch. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 971cdde..ce79103 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -314,6 +314,7 @@ static int batch_objects(struct batch_options *opt) break; } + strbuf_release(buf); return retval; } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/5] cat-file: refactor error handling of batch_objects
This just pulls the return value for the function out of the inner loop, so we can break out of the loop rather than do an early return. This will make it easier to put any cleanup for the function in one place. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index f8288c8..971cdde 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -263,6 +263,7 @@ static int batch_objects(struct batch_options *opt) { struct strbuf buf = STRBUF_INIT; struct expand_data data; + int retval = 0; if (!opt-format) opt-format = %(objectname) %(objecttype) %(objectsize); @@ -294,8 +295,6 @@ static int batch_objects(struct batch_options *opt) warn_on_object_refname_ambiguity = 0; while (strbuf_getline(buf, stdin, '\n') != EOF) { - int error; - if (data.split_on_whitespace) { /* * Split at first whitespace, tying off the beginning @@ -310,12 +309,12 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - error = batch_one_object(buf.buf, opt, data); - if (error) - return error; + retval = batch_one_object(buf.buf, opt, data); + if (retval) + break; } - return 0; + return retval; } static const char * const cat_file_usage[] = { -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion
The normal for_each_ref traversal descends into subdirectories, returning each ref it finds. However, in some cases we may want to just iterate over the top-level of a certain part of the tree. The introduction of the flags option is a little mysterious. We already have a flags option that gets stuck in a callback struct and ends up interpreted in do_one_ref. But the traversal itself does not currently have any flags, and it needs to know about this new flag. We _could_ introduce this as a completely separate flag parameter. But instead, we simply put both flag types into a single namespace, and make it available at both sites. This is simple, and given that we do not have a proliferation of flags (we have had exactly one until now), it is probably sufficient. Signed-off-by: Jeff King p...@peff.net --- I think the flags thing is OK as explained above, but Michael may have a different suggestion for refactoring. refs.c | 61 ++--- 1 file changed, 38 insertions(+), 23 deletions(-) diff --git a/refs.c b/refs.c index 3926136..ca854d6 100644 --- a/refs.c +++ b/refs.c @@ -589,6 +589,8 @@ static void sort_ref_dir(struct ref_dir *dir) /* Include broken references in a do_for_each_ref*() iteration: */ #define DO_FOR_EACH_INCLUDE_BROKEN 0x01 +/* Do not recurse into subdirs, just iterate at a single level. */ +#define DO_FOR_EACH_NO_RECURSE 0x02 /* * Return true iff the reference described by entry can be resolved to @@ -661,7 +663,8 @@ static int do_one_ref(struct ref_entry *entry, void *cb_data) * called for all references, including broken ones. */ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, - each_ref_entry_fn fn, void *cb_data) + each_ref_entry_fn fn, void *cb_data, + int flags) { int i; assert(dir-sorted == dir-nr); @@ -669,9 +672,13 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, struct ref_entry *entry = dir-entries[i]; int retval; if (entry-flag REF_DIR) { - struct ref_dir *subdir = get_ref_dir(entry); - sort_ref_dir(subdir); - retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data); + if (flags DO_FOR_EACH_NO_RECURSE) { + struct ref_dir *subdir = get_ref_dir(entry); + sort_ref_dir(subdir); + retval = do_for_each_entry_in_dir(subdir, 0, + fn, cb_data, + flags); + } } else { retval = fn(entry, cb_data); } @@ -691,7 +698,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, */ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, struct ref_dir *dir2, -each_ref_entry_fn fn, void *cb_data) +each_ref_entry_fn fn, void *cb_data, +int flags) { int retval; int i1 = 0, i2 = 0; @@ -702,10 +710,12 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, struct ref_entry *e1, *e2; int cmp; if (i1 == dir1-nr) { - return do_for_each_entry_in_dir(dir2, i2, fn, cb_data); + return do_for_each_entry_in_dir(dir2, i2, fn, cb_data, + flags); } if (i2 == dir2-nr) { - return do_for_each_entry_in_dir(dir1, i1, fn, cb_data); + return do_for_each_entry_in_dir(dir1, i1, fn, cb_data, + flags); } e1 = dir1-entries[i1]; e2 = dir2-entries[i2]; @@ -713,12 +723,15 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, if (cmp == 0) { if ((e1-flag REF_DIR) (e2-flag REF_DIR)) { /* Both are directories; descend them in parallel. */ - struct ref_dir *subdir1 = get_ref_dir(e1); - struct ref_dir *subdir2 = get_ref_dir(e2); - sort_ref_dir(subdir1); - sort_ref_dir(subdir2); - retval = do_for_each_entry_in_dirs( - subdir1, subdir2, fn, cb_data); + if (!(flags DO_FOR_EACH_NO_RECURSE)) { + struct ref_dir *subdir1
[PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag
Now that our object/refname ambiguity test is much faster (thanks to the previous commit), there is no reason for code like cat-file --batch-check to turn it off. Here are before and after timings with this patch (on git.git): $ git rev-list --objects --all | cut -d' ' -f1 objects [with flag] $ best-of-five -i objects ./git cat-file --batch-check real0m0.392s user0m0.368s sys 0m0.024s [without flag, without speedup; i.e., pre-25fba78] $ best-of-five -i objects ./git cat-file --batch-check real0m1.652s user0m0.904s sys 0m0.748s [without flag, with speedup] $ best-of-five -i objects ./git cat-file --batch-check real0m0.388s user0m0.356s sys 0m0.028s So the new implementation does just as well as we did with the flag turning the whole thing off (better actually, but that is within the noise). Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 9 - cache.h| 1 - environment.c | 1 - sha1_name.c| 2 +- 4 files changed, 1 insertion(+), 12 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index ce79103..afba21f 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -285,15 +285,6 @@ static int batch_objects(struct batch_options *opt) if (opt-print_contents) data.info.typep = data.type; - /* -* We are going to call get_sha1 on a potentially very large number of -* objects. In most large cases, these will be actual object sha1s. The -* cost to double-check that each one is not also a ref (just so we can -* warn) ends up dwarfing the actual cost of the object lookups -* themselves. We can work around it by just turning off the warning. -*/ - warn_on_object_refname_ambiguity = 0; - while (strbuf_getline(buf, stdin, '\n') != EOF) { if (data.split_on_whitespace) { /* diff --git a/cache.h b/cache.h index ce377e1..73afc38 100644 --- a/cache.h +++ b/cache.h @@ -566,7 +566,6 @@ extern int assume_unchanged; extern int prefer_symlink_refs; extern int log_all_ref_updates; extern int warn_ambiguous_refs; -extern int warn_on_object_refname_ambiguity; extern int shared_repository; extern const char *apply_default_whitespace; extern const char *apply_default_ignorewhitespace; diff --git a/environment.c b/environment.c index 3c76905..c59f6d4 100644 --- a/environment.c +++ b/environment.c @@ -22,7 +22,6 @@ int prefer_symlink_refs; int is_bare_repository_cfg = -1; /* unspecified */ int log_all_ref_updates = -1; /* unspecified */ int warn_ambiguous_refs = 1; -int warn_on_object_refname_ambiguity = 1; int repository_format_version; const char *git_commit_encoding; const char *git_log_output_encoding; diff --git a/sha1_name.c b/sha1_name.c index f83ecb7..b9aaf74 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -451,7 +451,7 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1) int at, reflog_len, nth_prior = 0; if (len == 40 !get_sha1_hex(str, sha1)) { - if (warn_ambiguous_refs warn_on_object_refname_ambiguity) { + if (warn_ambiguous_refs) { if (sha1_is_ambiguous_with_ref(sha1)) { warning(warn_msg, len, str); if (advice_object_name_warning) -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
Since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), a 40-hex sha1 causes us to call dwim_ref on the result, on the off chance that we have a matching ref. This can cause a noticeable slow-down when there are a large number of objects. E.g., on linux.git: [baseline timing] $ best-of-five git rev-list --all --pretty=raw real0m3.996s user0m3.900s sys 0m0.100s [same thing, but calling get_sha1 on each commit from stdin] $ git rev-list --all commits $ best-of-five -i commits git rev-list --stdin --pretty=raw real0m7.862s user0m6.108s sys 0m1.760s The problem is that each call to dwim_ref individually stats the possible refs in refs/heads, refs/tags, etc. In the common case, there are no refs that look like sha1s at all. We can therefore do the same check much faster by loading all ambiguous-looking candidates once, and then checking our index for each object. This is technically more racy (somebody might create such a ref after we build our index), but that's OK, as it's just a warning (and we provide no guarantees about whether a simultaneous process ran before or after the ref was created anyway). Here is the time after this patch, which implements the strategy described above: $ best-of-five -i commits git rev-list --stdin --pretty=raw real0m4.966s user0m4.776s sys 0m0.192s We still pay some price to read the commits from stdin, but notice the system time is much lower, as we are avoiding hundreds of thousands of stat() calls. Signed-off-by: Jeff King p...@peff.net --- I wanted to make the ref traversal as cheap as possible, hence the NO_RECURSE flag I added. I thought INCLUDE_BROKEN used to not open up the refs at all, but it looks like it does these days. I wonder if that is worth changing or not. refs.c | 47 +++ refs.h | 2 ++ sha1_name.c | 4 +--- 3 files changed, 50 insertions(+), 3 deletions(-) diff --git a/refs.c b/refs.c index ca854d6..cddd871 100644 --- a/refs.c +++ b/refs.c @@ -4,6 +4,7 @@ #include tag.h #include dir.h #include string-list.h +#include sha1-array.h /* * Make sure ref is something reasonable to have under .git/refs/; @@ -2042,6 +2043,52 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log) return logs_found; } +static int check_ambiguous_sha1_ref(const char *refname, + const unsigned char *sha1, + int flags, + void *data) +{ + unsigned char tmp_sha1[20]; + if (strlen(refname) == 40 !get_sha1_hex(refname, tmp_sha1)) + sha1_array_append(data, tmp_sha1); + return 0; +} + +static void build_ambiguous_sha1_ref_index(struct sha1_array *idx) +{ + const char **rule; + + for (rule = ref_rev_parse_rules; *rule; rule++) { + const char *prefix = *rule; + const char *end = strstr(prefix, %.*s); + char *buf; + + if (!end) + continue; + + buf = xmemdupz(prefix, end - prefix); + do_for_each_ref(ref_cache, buf, check_ambiguous_sha1_ref, + end - prefix, + DO_FOR_EACH_INCLUDE_BROKEN | + DO_FOR_EACH_NO_RECURSE, + idx); + free(buf); + } +} + +int sha1_is_ambiguous_with_ref(const unsigned char *sha1) +{ + struct sha1_array idx = SHA1_ARRAY_INIT; + static int loaded; + + if (!loaded) { + build_ambiguous_sha1_ref_index(idx); + loaded = 1; + } + + return sha1_array_lookup(idx, sha1) = 0; +} + static struct ref_lock *lock_ref_sha1_basic(const char *refname, const unsigned char *old_sha1, int flags, int *type_p) diff --git a/refs.h b/refs.h index 87a1a79..c7d5f89 100644 --- a/refs.h +++ b/refs.h @@ -229,4 +229,6 @@ int update_refs(const char *action, const struct ref_update **updates, extern int parse_hide_refs_config(const char *var, const char *value, const char *); extern int ref_is_hidden(const char *); +int sha1_is_ambiguous_with_ref(const unsigned char *sha1); + #endif /* REFS_H */ diff --git a/sha1_name.c b/sha1_name.c index a5578f7..f83ecb7 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -452,13 +452,11 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1) if (len == 40 !get_sha1_hex(str, sha1)) { if (warn_ambiguous_refs warn_on_object_refname_ambiguity) { - refs_found = dwim_ref(str, len, tmp_sha1, real_ref); - if (refs_found 0) { + if (sha1_is_ambiguous_with_ref(sha1)) { warning(warn_msg, len, str
[PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote: + if (flags DO_FOR_EACH_NO_RECURSE) { + struct ref_dir *subdir = get_ref_dir(entry); + sort_ref_dir(subdir); + retval = do_for_each_entry_in_dir(subdir, 0, Obviously this is totally wrong and inverts the point of the flag. And causes something like half of the test suite to fail. Michael was nice enough to point it out to me off-list, but well, I have to face the brown paper bag at some point. :) In my defense, it was a last minute refactor before going to dinner. That is what I get for rushing out the series. Here's a fixed version of patch 3/5. -- 8 -- Subject: refs: teach for_each_ref a flag to avoid recursion The normal for_each_ref traversal descends into subdirectories, returning each ref it finds. However, in some cases we may want to just iterate over the top-level of a certain part of the tree. The introduction of the flags option is a little mysterious. We already have a flags option that gets stuck in a callback struct and ends up interpreted in do_one_ref. But the traversal itself does not currently have any flags, and it needs to know about this new flag. We _could_ introduce this as a completely separate flag parameter. But instead, we simply put both flag types into a single namespace, and make it available at both sites. This is simple, and given that we do not have a proliferation of flags (we have had exactly one until now), it is probably sufficient. Signed-off-by: Jeff King p...@peff.net --- refs.c | 61 ++--- 1 file changed, 38 insertions(+), 23 deletions(-) diff --git a/refs.c b/refs.c index 3926136..b70b018 100644 --- a/refs.c +++ b/refs.c @@ -589,6 +589,8 @@ static void sort_ref_dir(struct ref_dir *dir) /* Include broken references in a do_for_each_ref*() iteration: */ #define DO_FOR_EACH_INCLUDE_BROKEN 0x01 +/* Do not recurse into subdirs, just iterate at a single level. */ +#define DO_FOR_EACH_NO_RECURSE 0x02 /* * Return true iff the reference described by entry can be resolved to @@ -661,7 +663,8 @@ static int do_one_ref(struct ref_entry *entry, void *cb_data) * called for all references, including broken ones. */ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, - each_ref_entry_fn fn, void *cb_data) + each_ref_entry_fn fn, void *cb_data, + int flags) { int i; assert(dir-sorted == dir-nr); @@ -669,9 +672,13 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, struct ref_entry *entry = dir-entries[i]; int retval; if (entry-flag REF_DIR) { - struct ref_dir *subdir = get_ref_dir(entry); - sort_ref_dir(subdir); - retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data); + if (!(flags DO_FOR_EACH_NO_RECURSE)) { + struct ref_dir *subdir = get_ref_dir(entry); + sort_ref_dir(subdir); + retval = do_for_each_entry_in_dir(subdir, 0, + fn, cb_data, + flags); + } } else { retval = fn(entry, cb_data); } @@ -691,7 +698,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, */ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, struct ref_dir *dir2, -each_ref_entry_fn fn, void *cb_data) +each_ref_entry_fn fn, void *cb_data, +int flags) { int retval; int i1 = 0, i2 = 0; @@ -702,10 +710,12 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, struct ref_entry *e1, *e2; int cmp; if (i1 == dir1-nr) { - return do_for_each_entry_in_dir(dir2, i2, fn, cb_data); + return do_for_each_entry_in_dir(dir2, i2, fn, cb_data, + flags); } if (i2 == dir2-nr) { - return do_for_each_entry_in_dir(dir1, i1, fn, cb_data); + return do_for_each_entry_in_dir(dir1, i1, fn, cb_data, + flags); } e1 = dir1-entries[i1]; e2 = dir2-entries[i2]; @@ -713,12 +723,15 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, if (cmp == 0
Re: It seems there is a very tight character count limit in .gitconfig
On Wed, Jan 08, 2014 at 02:59:37PM +0800, Li Zhang wrote: I tried to add url xxx insteadof xxx in .gitconfig. If the length of url exceed 125, git will not work. I am using Ubuntu. The default version is 1.7.9.5. Maybe the latest version solve this already. Yes, this was fixed in 0971e99 (Remove the hard coded length limit on variable names in config files, 2012-09-30). Git v1.8.0.1 and higher contain that commit. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH 0/5] branch@{publish} shorthand
On Wed, Jan 08, 2014 at 03:05:48AM +0530, Ramkumar Ramachandra wrote: Agreed. I'll start working on @{publish}. It's going to take quite a bit of effort, because I won't actually start using it until my prompt is @{publish}-aware. There's a fair bit of refactoring involved. I took a stab at it and came up with the series below. No docs or tests, and some of the refactoring in remote.c feels a little weird. I can't help but feel more of the logic from git push should be shared here. But it at least works with my rudimentary examples. I'm hoping it will make a good starting point for you to build on. Otherwise, I may get to it eventually, but it's not a high priority for me right now. Actually, I'm not sure I'd use git rebase @{pu}; for me @{pu} is mainly a source of information for taking apart to build a new series. Ah, that's how I'd probably use it, too. :) [1/5]: sha1_name: refactor upstream_mark [2/5]: interpret_branch_name: factor out upstream handling [3/5]: branch_get: return early on error [4/5]: branch_get: provide per-branch pushremote pointers [5/5]: implement @{publish} shorthand -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] sha1_name: refactor upstream_mark
We will be adding new mark types in the future, so separate the suffix data from the logic. Signed-off-by: Jeff King p...@peff.net --- sha1_name.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index b1873d8..0c50801 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -415,12 +415,12 @@ static int ambiguous_path(const char *path, int len) return slash; } -static inline int upstream_mark(const char *string, int len) +static inline int at_mark(const char *string, int len, + const char **suffix, int nr) { - const char *suffix[] = { @{upstream}, @{u} }; int i; - for (i = 0; i ARRAY_SIZE(suffix); i++) { + for (i = 0; i nr; i++) { int suffix_len = strlen(suffix[i]); if (suffix_len = len !memcmp(string, suffix[i], suffix_len)) @@ -429,6 +429,12 @@ static inline int upstream_mark(const char *string, int len) return 0; } +static inline int upstream_mark(const char *string, int len) +{ + const char *suffix[] = { @{upstream}, @{u} }; + return at_mark(string, len, suffix, ARRAY_SIZE(suffix)); +} + static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned lookup_flags); static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] branch_get: return early on error
Right now we simply check if ret is valid before doing further processing. As we add more processing, this will become more and more cumbersome. Instead, let's just check whether ret is invalid and return early with the error. Signed-off-by: Jeff King p...@peff.net --- remote.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/remote.c b/remote.c index a89efab..a773004 100644 --- a/remote.c +++ b/remote.c @@ -1543,7 +1543,10 @@ struct branch *branch_get(const char *name) ret = current_branch; else ret = make_branch(name, 0); - if (ret ret-remote_name) { + if (!ret) + return NULL; + + if (ret-remote_name) { ret-remote = remote_get(ret-remote_name); if (ret-merge_nr) { int i; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] branch_get: provide per-branch pushremote pointers
When a caller uses branch_get to retrieve a struct branch, they get the per-branch remote name and a pointer to the remote struct. However, they have no way of knowing about the per-branch pushremote from this interface. Let's expose that information via fields similar to remote and remote_name. We have to do a little refactoring around the configuration reading here. Instead of pushremote_name being its own allocated string, it instead becomes a pointer to one of: 1. The pushremote_name of the current branch, if configured. 2. The globally configured remote.pushdefault, which we store separately as pushremote_config_default. We can then set the branch's pushremote field by doing the normal sequence of config fallback. Signed-off-by: Jeff King p...@peff.net --- remote.c | 23 +++ remote.h | 2 ++ 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/remote.c b/remote.c index a773004..53e40e0 100644 --- a/remote.c +++ b/remote.c @@ -50,6 +50,7 @@ static int branches_nr; static struct branch *current_branch; static const char *default_remote_name; static const char *pushremote_name; +static const char *pushremote_config_default; static int explicit_default_remote_name; static struct rewrites rewrites; @@ -351,9 +352,10 @@ static int handle_config(const char *key, const char *value, void *cb) explicit_default_remote_name = 1; } } else if (!strcmp(subkey, .pushremote)) { + if (git_config_string(branch-pushremote_name, key, value)) + return -1; if (branch == current_branch) - if (git_config_string(pushremote_name, key, value)) - return -1; + pushremote_name = branch-pushremote_name; } else if (!strcmp(subkey, .merge)) { if (!value) return config_error_nonbool(key); @@ -385,8 +387,11 @@ static int handle_config(const char *key, const char *value, void *cb) name = key + 7; /* Handle remote.* variables */ - if (!strcmp(name, pushdefault)) - return git_config_string(pushremote_name, key, value); + if (!strcmp(name, pushdefault)) { + if (git_config_string(pushremote_config_default, key, value) 0) + return -1; + pushremote_name = pushremote_config_default; + } /* Handle remote.name.* variables */ if (*name == '/') { @@ -1560,6 +1565,16 @@ struct branch *branch_get(const char *name) } } } + + if (ret-pushremote_name) + ret-pushremote = remote_get(ret-pushremote_name); + else if (pushremote_config_default) + ret-pushremote = remote_get(pushremote_config_default); + else if (ret-remote_name) + ret-pushremote = remote_get(ret-remote_name); + else + ret-pushremote = remote_get(origin); + return ret; } diff --git a/remote.h b/remote.h index 00c6a76..e5beb30 100644 --- a/remote.h +++ b/remote.h @@ -200,6 +200,8 @@ struct branch { const char *remote_name; struct remote *remote; + const char *pushremote_name; + struct remote *pushremote; const char **merge_name; struct refspec **merge; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] interpret_branch_name: factor out upstream handling
This function checks a few different @{}-constructs. The early part checks for and dispatches us to helpers for each construct, but the code for handling @{upstream} is inline. Let's factor this out into its own function. This makes interpret_branch_name more readable, and will make it much simpler to add more constructs in future patches. While we're at it, let's also break apart the refactored code into a few helper functions. These will be useful when we implement similar @{upstream}-like constructs. Signed-off-by: Jeff King p...@peff.net --- sha1_name.c | 83 ++--- 1 file changed, 52 insertions(+), 31 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index 0c50801..50df5d4 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1052,6 +1052,54 @@ static int reinterpret(const char *name, int namelen, int len, struct strbuf *bu return ret - used + len; } +static void set_shortened_ref(struct strbuf *buf, const char *ref) +{ + char *s = shorten_unambiguous_ref(ref, 0); + strbuf_reset(buf); + strbuf_addstr(buf, s); + free(s); +} + +static const char *get_upstream_branch(const char *branch_buf, int len) +{ + char *branch = xstrndup(branch_buf, len); + struct branch *upstream = branch_get(*branch ? branch : NULL); + + /* +* Upstream can be NULL only if branch refers to HEAD and HEAD +* points to something different than a branch. +*/ + if (!upstream) + die(_(HEAD does not point to a branch)); + if (!upstream-merge || !upstream-merge[0]-dst) { + if (!ref_exists(upstream-refname)) + die(_(No such branch: '%s'), branch); + if (!upstream-merge) { + die(_(No upstream configured for branch '%s'), + upstream-name); + } + die( + _(Upstream branch '%s' not stored as a remote-tracking branch), + upstream-merge[0]-src); + } + free(branch); + + return upstream-merge[0]-dst; +} + +static int interpret_upstream_mark(const char *name, int namelen, + int at, struct strbuf *buf) +{ + int len; + + len = upstream_mark(name + at, namelen - at); + if (!len) + return -1; + + set_shortened_ref(buf, get_upstream_branch(name, at)); + return len + at; +} + /* * This reads short-hand syntax that not only evaluates to a commit * object name, but also can act as if the end user spelled the name @@ -1076,9 +1124,7 @@ static int reinterpret(const char *name, int namelen, int len, struct strbuf *bu int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) { char *cp; - struct branch *upstream; int len = interpret_nth_prior_checkout(name, buf); - int tmp_len; if (!namelen) namelen = strlen(name); @@ -1100,36 +1146,11 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) if (len 0) return reinterpret(name, namelen, len, buf); - tmp_len = upstream_mark(cp, namelen - (cp - name)); - if (!tmp_len) - return -1; + len = interpret_upstream_mark(name, namelen, cp - name, buf); + if (len 0) + return len; - len = cp + tmp_len - name; - cp = xstrndup(name, cp - name); - upstream = branch_get(*cp ? cp : NULL); - /* -* Upstream can be NULL only if cp refers to HEAD and HEAD -* points to something different than a branch. -*/ - if (!upstream) - die(_(HEAD does not point to a branch)); - if (!upstream-merge || !upstream-merge[0]-dst) { - if (!ref_exists(upstream-refname)) - die(_(No such branch: '%s'), cp); - if (!upstream-merge) { - die(_(No upstream configured for branch '%s'), - upstream-name); - } - die( - _(Upstream branch '%s' not stored as a remote-tracking branch), - upstream-merge[0]-src); - } - free(cp); - cp = shorten_unambiguous_ref(upstream-merge[0]-dst, 0); - strbuf_reset(buf); - strbuf_addstr(buf, cp); - free(cp); - return len; + return -1; } int strbuf_branchname(struct strbuf *sb, const char *name) -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] implement @{publish} shorthand
In a triangular workflow, you may have a distinct @{upstream} that you pull changes from, but publish by default (if you typed git push) to a different remote (or a different branch on the remote). It may sometimes be useful to be able to quickly refer to that publishing point (e.g., to see which changes you have that have not yet been published). This patch introduces the branch@{publish} shorthand (or @{pu} to be even shorter). It refers to the tracking branch of the remote branch to which you would push if you were to push the named branch. That's a mouthful to explain, so here's an example: $ git checkout -b foo origin/master $ git config remote.pushdefault github $ git push Signed-off-by: Jeff King p...@peff.net --- The implementation feels weird, like the where do we push to code should be factored out from somewhere else. I think what we're doing here is not _wrong_, but I don't like repeating what git push is doing elsewhere. And I just punt on simple as a result. :) sha1_name.c | 76 - 1 file changed, 75 insertions(+), 1 deletion(-) diff --git a/sha1_name.c b/sha1_name.c index 50df5d4..59ffa93 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -435,6 +435,12 @@ static inline int upstream_mark(const char *string, int len) return at_mark(string, len, suffix, ARRAY_SIZE(suffix)); } +static inline int publish_mark(const char *string, int len) +{ + const char *suffix[] = { @{publish} }; + return at_mark(string, len, suffix, ARRAY_SIZE(suffix)); +} + static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned lookup_flags); static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf); @@ -481,7 +487,8 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1) nth_prior = 1; continue; } - if (!upstream_mark(str + at, len - at)) { + if (!upstream_mark(str + at, len - at) + !publish_mark(str + at, len - at)) { reflog_len = (len-1) - (at+2); len = at; } @@ -1100,6 +1107,69 @@ static int interpret_upstream_mark(const char *name, int namelen, return len + at; } +static const char *get_publish_branch(const char *name_buf, int len) +{ + char *name = xstrndup(name_buf, len); + struct branch *b = branch_get(*name ? name : NULL); + struct remote *remote = b-pushremote; + const char *dst; + const char *track; + + free(name); + + if (!remote) + die(_(branch '%s' has no remote for pushing), b-name); + + /* Figure out what we would call it on the remote side... */ + if (remote-push_refspec_nr) + dst = apply_refspecs(remote-push, remote-push_refspec_nr, +b-refname); + else + dst = b-refname; + if (!dst) + die(_(unable to figure out how '%s' would be pushed), + b-name); + + /* ...and then figure out what we would call that remote here */ + track = apply_refspecs(remote-fetch, remote-fetch_refspec_nr, dst); + if (!track) + die(_(%s@{publish} has no tracking branch for '%s'), + b-name, dst); + + return track; +} + +static int interpret_publish_mark(const char *name, int namelen, + int at, struct strbuf *buf) +{ + int len; + + len = publish_mark(name + at, namelen - at); + if (!len) + return -1; + + switch (push_default) { + case PUSH_DEFAULT_NOTHING: + die(_(cannot use @{publish} with push.default of 'nothing')); + + case PUSH_DEFAULT_UNSPECIFIED: + case PUSH_DEFAULT_MATCHING: + case PUSH_DEFAULT_CURRENT: + set_shortened_ref(buf, get_publish_branch(name, at)); + break; + + case PUSH_DEFAULT_UPSTREAM: + set_shortened_ref(buf, get_upstream_branch(name, at)); + break; + + case PUSH_DEFAULT_SIMPLE: + /* ??? */ + die(@{publish} with simple unimplemented); + } + + return at + len; +} + /* * This reads short-hand syntax that not only evaluates to a commit * object name, but also can act as if the end user spelled the name @@ -1150,6 +1220,10 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) if (len 0) return len; + len = interpret_publish_mark(name, namelen, cp - name, buf); + if (len 0) + return len; + return -1; } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body
Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
On Tue, Jan 07, 2014 at 10:47:33PM -0500, Jeff King wrote: On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote: + if (flags DO_FOR_EACH_NO_RECURSE) { + struct ref_dir *subdir = get_ref_dir(entry); + sort_ref_dir(subdir); + retval = do_for_each_entry_in_dir(subdir, 0, Obviously this is totally wrong and inverts the point of the flag. And causes something like half of the test suite to fail. And while we're on the subject of my mistakes... The patch needs the fixup below to ensure that retval is always set, even when we do not recurse. I'll hold off on sending a full re-roll of the patch, in the extremely unlikely event that there are other small errors to be fixed. :) diff --git a/refs.c b/refs.c index aafbae9..99c72d0 100644 --- a/refs.c +++ b/refs.c @@ -679,7 +679,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data, flags); - } + } else + retval = 0; } else { retval = fn(entry, cb_data); } @@ -732,7 +733,8 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1, retval = do_for_each_entry_in_dirs( subdir1, subdir2, fn, cb_data, flags); - } + } else + retval = 0; i1++; i2++; } else if (!(e1-flag REF_DIR) !(e2-flag REF_DIR)) { -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] branch_get: provide per-branch pushremote pointers
On Wed, Jan 08, 2014 at 04:35:31AM -0500, Jeff King wrote: @@ -385,8 +387,11 @@ static int handle_config(const char *key, const char *value, void *cb) name = key + 7; /* Handle remote.* variables */ - if (!strcmp(name, pushdefault)) - return git_config_string(pushremote_name, key, value); + if (!strcmp(name, pushdefault)) { + if (git_config_string(pushremote_config_default, key, value) 0) + return -1; + pushremote_name = pushremote_config_default; + } This needs return 0 squashed in at the end of the conditional, of course, to match the old behavior. This patch passes the test suite by itself (with or without that fixup). But oddly, it seems to fail t5531 when merged with 'next'. I can't figure out why, though. It shouldn't affect any code that doesn't look at branch-pushremote. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] t5531: further matching fixups
Commit 43eb920 switched one of the sub-repository in this test to matching to prepare for a world where the default becomes simple. However, the main repository needs a similar change. We did not notice any test failure when merged with b2ed944 (push: switch default from matching to simple, 2013-01-04) because t5531.6 is trying to provoke a failure of git push due to a submodule check. When combined with b2ed944 the push still fails, but for the wrong reason (because our upstream setup does not exist, not because of the submodule). Signed-off-by: Jeff King p...@peff.net --- On Wed, Jan 08, 2014 at 05:27:07AM -0500, Jeff King wrote: This patch passes the test suite by itself (with or without that fixup). But oddly, it seems to fail t5531 when merged with 'next'. I can't figure out why, though. It shouldn't affect any code that doesn't look at branch-pushremote. I still don't understand the full reason for this interaction, but the failing test is actually somewhat broken in 'next' already. This patch fixes it, and should be done regardless of the other series. t/t5531-deep-submodule-push.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh index 8c16e04..445bb5f 100755 --- a/t/t5531-deep-submodule-push.sh +++ b/t/t5531-deep-submodule-push.sh @@ -12,6 +12,7 @@ test_expect_success setup ' ( cd work git init + git config push.default matching mkdir -p gar/bage ( cd gar/bage -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] branch_get: provide per-branch pushremote pointers
On Wed, Jan 08, 2014 at 05:27:07AM -0500, Jeff King wrote: This patch passes the test suite by itself (with or without that fixup). But oddly, it seems to fail t5531 when merged with 'next'. I can't figure out why, though. It shouldn't affect any code that doesn't look at branch-pushremote. OK, I figured it out. My patch calls: remote_get(origin) which creates an origin remote, even if one does not exist (it assumes it to be a URL origin). Later, when we want to decide if the push is triangular or not, we ask for: remote_get(NULL); which will internally look for a remote called origin. Before my patch there was not such a remote, and so the push could not be triangular. After my patch, it finds the bogus remote and says this thing exists, and is not what we are pushing to; therefore the push is triangular. The solution is that I should not be passing the term origin to remote_get, but rather passing NULL and relying on it to figure out the default remote correctly. I.e.: diff --git a/remote.c b/remote.c index 8724388..d214fa2 100644 --- a/remote.c +++ b/remote.c @@ -1574,7 +1574,7 @@ struct branch *branch_get(const char *name) else if (ret-remote_name) ret-pushremote = remote_get(ret-remote_name); else - ret-pushremote = remote_get(origin); + ret-pushremote = remote_get(NULL); return ret; } -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 23/28] Support shallow fetch/clone over smart-http
On Thu, Dec 05, 2013 at 08:02:50PM +0700, Nguyễn Thái Ngọc Duy wrote: Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- Documentation/gitremote-helpers.txt | 7 +++ builtin/fetch-pack.c| 16 +--- remote-curl.c | 31 +-- t/t5536-fetch-shallow.sh| 27 +++ I'm getting test failures in 'next' with GIT_TEST_HTTPD set, and they are bisectable to this patch (actually, the moral equivalent of it, as it looks like the commit message was tweaked along with the test number when it was applied). The failures look like this: $ GIT_TEST_HTTPD=1 ./t5537-fetch-shallow.sh -v -i [...] ok 9 - fetch --update-shallow expecting success: git clone --bare --no-local shallow $HTTPD_DOCUMENT_ROOT_PATH/repo.git git clone $HTTPD_URL/smart/repo.git clone ( cd clone git fsck git log --format=%s origin/master actual cat EOF expect 6 5 4 3 EOF test_cmp expect actual ) Cloning into bare repository '/home/peff/compile/git/t/trash directory.t5537-fetch-shallow/httpd/www/repo.git'... remote: Counting objects: 19, done. remote: Compressing objects: 100% (7/7), done. remote: Total 19 (delta 0), reused 6 (delta 0) Receiving objects: 100% (19/19), done. Checking connectivity... done. Cloning into 'clone'... remote: Counting objects: 19, done. remote: Compressing objects: 100% (7/7), done. remote: Total 19 (delta 0), reused 19 (delta 0) Unpacking objects: 100% (19/19), done. Checking connectivity... done. Checking object directories: 100% (256/256), done. --- expect 2014-01-08 11:20:20.178546452 + +++ actual 2014-01-08 11:20:20.178546452 + @@ -1,3 +1,4 @@ +7 6 5 4 not ok 10 - clone http repository If you do end up tweaking the test, you may also want to fix: +LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5536'} +. $TEST_DIRECTORY/lib-httpd.sh Since the test number got switched, it would be nice to tweak the port number to match it, in case the real t5536 ever starts using lib-httpd. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] implement @{publish} shorthand
On Wed, Jan 08, 2014 at 03:42:09PM -0800, Junio C Hamano wrote: This patch introduces the branch@{publish} shorthand (or @{pu} to be even shorter). It refers to the tracking If @{u} can already be used for upstream, why not allow @{p} but require two letters @{pu}? Just being curious---I am not advocating strongly for a shorter short-hand. Or is @{p} already taken by something and my memory is not functioning well? It is my brain that was not functioning well. I somehow thought well, @{u} is already taken, so we must use @{pu}. Which of course makes no sense, unless you are middle-endian. :) We may want to be cautious about giving up a short-and-sweet single-letter, though, until the feature has proved itself. We could also teach upstream_mark and friends to match unambiguous prefixes (so @{u}, @{up}, @{upst}, etc). That means @{p} would work immediately, but scripts should use @{publish} for future-proofing. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
On Wed, Jan 08, 2014 at 12:29:51PM +0100, Michael Haggerty wrote: Here's a fixed version of patch 3/5. v2 4/5 doesn't apply cleanly on top of v3 3/5. So I'm basing my review on the branch you have at GitHub peff/git jk/cat-file-warn-ambiguous; I hope it is the same. Hrmph. I didn't have to do any conflict resolution during the rebase, so I would think it would apply at least with am -3. -- 8 -- Subject: refs: teach for_each_ref a flag to avoid recursion The normal for_each_ref traversal descends into You haven't changed any for_each_ref*() functions; you have only exposed the DO_FOR_EACH_NO_RECURSE option to the (static) functions for_each_entry*() and do_for_each_ref(). (This is part and parcel of your decision not to expose the new functionality in the refs API.) Please correct the line above. Will do, and I'll add a note on not exposing it (basically because there is not an existing flags parameter in the public API, and nobody needs it). static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset, - each_ref_entry_fn fn, void *cb_data) + each_ref_entry_fn fn, void *cb_data, + int flags) { int i; assert(dir-sorted == dir-nr); Please update the docstring for this function, which still says that it recurses without mentioning DO_FOR_EACH_NO_RECURSE. Will do (and for the _in_dirs variant). static int do_for_each_entry(struct ref_cache *refs, const char *base, -each_ref_entry_fn fn, void *cb_data) +each_ref_entry_fn fn, void *cb_data, +int flags) { struct packed_ref_cache *packed_ref_cache; struct ref_dir *loose_dir; A few lines after this, do_for_each_entry() calls prime_ref_dir(loose_dir) to ensure that all of the loose references that will be iterated over are read before the packed-refs file is checked. It seems to me that prime_ref_dir() should also get a flags parameter to prevent it reading more loose references than necessary, something like this: Hmm. I hadn't considered that, but yeah, it definitely nullifies part of the purpose of the optimization. However, is it safe to prime only part of the loose ref namespace? The point of that priming is to avoid the race fixed in 98eeb09, which depends on us caching the loose refs before the packed refs. But when we read packed-refs, we will be reading and storing _all_ of it, even if we do not touch it in this traversal. So it does not affect the race for this traversal, but have we setup a cache situation where a subsequent for_each_ref in the same process would be subject to the race? I'm starting to wonder if this optimization is worth it. [...] @@ -1718,7 +1732,7 @@ static int do_for_each_ref(struct ref_cache *refs, const char *base, data.fn = fn; data.cb_data = cb_data; - return do_for_each_entry(refs, base, do_one_ref, data); + return do_for_each_entry(refs, base, do_one_ref, data, flags); } This change makes the DO_FOR_EACH_NO_RECURSE option usable with do_for_each_ref() (even though it is never in fact used). It should either be mentioned in the docstring or (if there is a reason not to allow it) explicitly prohibited. Hrm, yeah. I guess there are no callers, and there is no plan for any. So we could just pass 0 here, and then flags passed to do_for_each_ref really is _just_ for the callback data that goes to do_one_ref. That clears up the weird combined namespace stuff I mentioned in the commit message, and is a bit cleaner. I'll take it in that direction. It would be possible to use your new flag to speed up is_refname_available(), but it would be a little bit of work and I doubt that is_refname_available() is ever a bottleneck. Yeah, agreed on both counts. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
On Thu, Jan 09, 2014 at 09:51:24AM -0800, Junio C Hamano wrote: On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote: + if (flags DO_FOR_EACH_NO_RECURSE) { + struct ref_dir *subdir = get_ref_dir(entry); + sort_ref_dir(subdir); + retval = do_for_each_entry_in_dir(subdir, 0, Obviously this is totally wrong and inverts the point of the flag. And causes something like half of the test suite to fail. Michael was nice enough to point it out to me off-list, but well, I have to face the brown paper bag at some point. :) In my defense, it was a last minute refactor before going to dinner. That is what I get for rushing out the series. And perhaps a bad naming that calls for double-negation in the normal cases, which might have been less likely to happen it the new flag were called onelevel only or something, perhaps? That may be a nicer name, but it was not the problem here. The problem here is that I wrote: if (flags DO_FOR_EACH_NO_RECURSE == 0) to avoid the extra layer of parentheses, but of course that doesn't work. And then when I switched it back, I screwed up the reversion. I think the nicest way to write it would be to avoid negation at all, as: if (flags DO_FOR_EACH_RECURSE) { ... do the recursion ... but that means flipping the default, requiring us to set the flag explicitly in the existing callers (though there really aren't that many). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t5537: fix incorrect expectation in test case 10
On Wed, Jan 08, 2014 at 07:13:19PM +0700, Nguyễn Thái Ngọc Duy wrote: Commit 48d25ca adds a new commit 7 to the repo that the next test case in commit 1609488 clones from. But the next test case does not expect this commit. For these tests, it's the bottom that's important, not the top. Fix the expected commit list. Given the test output, I had a feeling it was something like this but didn't dive in (and figured it would be obvious to you). Patch looks sane. Thanks for a quick turnaround. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] implement @{publish} shorthand
On Thu, Jan 09, 2014 at 08:39:44AM -, Philip Oakley wrote: From: Jeff King p...@peff.net Sent: Wednesday, January 08, 2014 9:37 AM In a triangular workflow, you may have a distinct @{upstream} that you pull changes from, but publish by default (if you typed git push) to a different remote (or a different branch on the remote). One of the broader issues is the lack of _documenation_ about what the 'normal' naming convention is for the uspstream remote. Especially the implicit convention used within our documentation (and workflow). This is especially true for github users who will normally fork a repo of interest and then clone it from their own copy/fork. This means that the 'origin' remote is _not_ the upstream. See https://help.github.com/articles/fork-a-repo In my case 'origin' is my publish repo (as suggested by Github) while 'junio' is the upstream (as do some others). There are similar results from the likes of Stackoverflow. Sure, and I have done the same thing (though I tend to clone from the other person as origin, and only fork my own repo when I am ready to push). But it shouldn't matter, should it? The whole point of the upstream config is that git checkout -b topic junio/master does the right thing, without caring about your naming convention. So I'm not sure what you think should be said (or where). Telling me in patch form is preferred. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
On Fri, Jan 10, 2014 at 09:59:25AM +0100, Michael Haggerty wrote: However, is it safe to prime only part of the loose ref namespace? [...] prime_ref_dir() is called by do_for_each_entry(), which all the iteration functions pass through. It is always called before the iteration starts, and it primes only the subtree of the refs hierarchy that is being iterated over. For example, if iterating over refs/heads then it only primes references with that prefix. This is OK, because if later somebody iterates over a broader part of the refs hierarchy (say, refs), then priming is done again, including re-checking the packed refs. Ah, right. This is the part I was forgetting: the next for_each_ref will re-prime with the expanded view. Thanks for a dose of sanity. I'll fix that in my re-roll. It would also be possible to swing in the other direction. I don't remember a particular reason why I left the DO_FOR_EACH_INCLUDE_BROKEN handling at the do_for_each_ref() level rather than handling it at the do_for_each_entry() level. But now that you are passing the flags parameter all the way down the call stack, it wouldn't cost anything to support both of the DO_FOR_EACH flags everywhere and just document it that way. I think it was simply that it was an option that the traversal did not need to know about (just like the trim option), so you kept it as encapsulated as possible. I think I'll introduce it as a separate flag namespace, as discussed in the previous email. It is the same amount of refactoring work to merge them later as it is now, if we so choose. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
On Wed, Jan 08, 2014 at 05:09:25PM +0100, Michael Haggerty wrote: It's not only racy WRT other processes. If the current git process would create a new reference, it wouldn't be reflected in the cache. It's true that the main ref_cache doesn't invalidate itself automatically either when a new reference is created, so it's not really a fair complaint. However, as we add places where the cache is invalidated, it is easy to overlook this cache that is stuck in static variables within a function definition and it is impossible to invalidate it. Might it not be better to attach the cache to the ref_cache structure instead, and couple its lifetime to that object? Yeah, I noticed that we don't ever invalidate the loose ref cache. I think that's mostly fine, as we rely on resolve_ref to cover the cases that need to be ordered properly. And in this particular case, it's only a warning (and a rather obscure one, at that), so I think the stakes are low. That being said, it should not be hard at all to attach the cache to the ref_cache. Since we are generated from that cache, the lifetimes should be the same. Alternatively, the cache could be created and managed on the caller side, since the caller would know when the cache would have to be invalidated. Also, different callers are likely to have different performance characteristics. It is unlikely that the time to initialize the cache will be amortized in most cases; in fact, rev-list --stdin might be the *only* plausible use case. The two I know of are rev-list --stdin and cat-file --batch-check. Regarding the overall strategy: you gather all refnames that could be confused with an SHA-1 into a sha1_array, then later look up SHA-1s in the array to see if they are ambiguous. This is a very special-case optimization for SHA-1s. Yes, it is very sha1-specific. Part of my goal was that in the common case, the check would collapse to O(# of ambiguous refs), which is typically 0. That may be premature optimization, though. As you note below, doing a few binary searches through the in-memory ref cache is _probably_ fine, too. And we can do that without a separate index. I wonder whether another approach would gain almost the same amount of performance but be more general. We could change dwim_ref() (or a version of it?) to read its data out of a ref_cache instead of going to disk every time. Then, at the cost of populating the relevant parts of the ref_cache once, we would have fast dwim_ref() calls for all strings. I'm very nervous about turning dwim_ref into a cache. As we noted above, we never invalidate the cache, so any write-then-read operations could get stale data. That is not as risky as caching, say, resolve_ref, but it still makes me nervous. Caching just the warning has much lower stakes. It's true that the lookups wouldn't be quite so fast--they would require a few bisects per refname lookup (one for each level in the refname hierarchy) and several refname lookups (one for each ref_rev_parse_rule) for every dwim_ref() call, vs. a single bisect in your current design. But this approach it would bring us most of the gain, it might nevertheless be preferable. I don't think this would be all that hard to measure. I'll see what I can do. I wanted to make the ref traversal as cheap as possible, hence the NO_RECURSE flag I added. I thought INCLUDE_BROKEN used to not open up the refs at all, but it looks like it does these days. I wonder if that is worth changing or not. What do you mean by open up the refs? The loose reference files are read when populating the cache. (Was that ever different?) I meant actually open the ref files and read the sha1. But that is me being dumb. We have always done that, as we must provide the sha1 via to the for_each_ref callback. That being said, we could further optimize this by not opening the files at all (and make that the responsibility of do_one_ref, which we are avoiding here). I am slightly worried about the open() cost of my solution. It's amortized away in a big call, but it is probably noticeable for something like `git rev-parse 40-hex`. This doesn't correctly handle the rule refs/remotes/%.*s/HEAD We might be willing to accept this limitation, but it should at least be mentioned somewhere. OTOH if we want to handle this pattern as well, we could do use a technique like that of shorten_unambiguous_ref(). Yes, you're right. I considered this, but for some reason convinced myself that it would be OK to just look for refs/remotes/40-hex in this case (which is what my patch does). But obviously that's wrong. The ref traversal won't find a _directory_ with that name. I'll see how painful it is to make it work. I have to say I was tempted to simply manually write the rules. It's a duplication that could go out of sync with the ref_rev_parse rules, and that's nasty. But that list does not change often, and the reverse-parsing of the rules is
Re: [PATCH] t5531: further matching fixups
On Fri, Jan 10, 2014 at 03:34:59PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: ... but the failing test is actually somewhat broken in 'next' already. Hmph, in what way? I haven't seen t5531 breakage on 'next', with or without your series... The test still passes, but it is not testing the right thing anymore. On 'next', run t5531. Test 6 is push fails when commit on multiple branches if one branch has no remote and ends with: test_must_fail git push --recurse-submodules=check ../pub.git But the output ends with: warning: push.default is unset; its implicit value has changed in Git 2.0 from 'matching' to 'simple'. To squelch this message [...] fatal: The current branch branch2 has no upstream branch. To push the current branch and set the remote as upstream, use git push --set-upstream ../pub.git branch2 When not merged with b2ed944 (push: switch default from matching to simple), or with my patch to set push.default=matching explicitly, the output is: The following submodule paths contain changes that can not be found on any remote: gar/bage Please try git push --recurse-submodules=on-demand or cd to the path and use git push to push them to a remote. fatal: Aborting. which is what the test is actually trying to check. So the push fails, as we expect, but not for the right reason. My other series for @{publish} had a bug that caused the push to succeed. So that series was buggy (and I posted the fix already), but we only noticed it because this test was not working (it should not care about upstream/triangular config at all, but it accidentally did). Does that clarify the situation? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] remote: introduce and fill branch-pushremote
On Sun, Jan 12, 2014 at 10:41:06PM +0530, Ramkumar Ramachandra wrote: When a caller uses branch_get() to retrieve a struct branch, they get the per-branch remote name and a pointer to the remote struct. However, they have no way of knowing about the per-branch pushremote from this interface. So, let's expose that information via fields similar to remote and remote_name; pushremote and pushremote_name. Makes sense. This is similar to what I posted before, but stops short of setting branch-pushremote based on default.pushremote. I think that's a good thing. Your patch matches branch-remote better, and the logic for doing that fallback should probably stay outside of the struct branch construct. All 3 patches look like sane building blocks to me. One comment on this hunk, though: } else if (!strcmp(subkey, .pushremote)) { + if (git_config_string(branch-pushremote_name, key, value)) + return -1; if (branch == current_branch) - if (git_config_string(pushremote_name, key, value)) - return -1; + pushremote_name = branch-pushremote_name; In this code (both before and after your patch), pushremote_name does double-duty for storing both remote.pushdefault, and the current branch's branch.*.pushremote. I introduced an extra variable in my version of the patch to store remote.pushdefault directly, and turned pushremote_name into an alias (either to the current branch config, or to the global config). I did that for two reasons, one minor and one that I think will come up further in the topic: 1. After your patch pushremote_name sometimes owns its memory (if allocated for remote.pushdefault), and sometimes not (if an alias to branch.*.pushremote). This isn't a problem in the current code, because we never actually free() the string, meaning that if you set push.default twice, we leak. But that probably does not matter too much, and we have many such minor leaks of global config. 2. If the current branch has a branch.*.pushremote set, but we want to know where a _different_ branch would be pushed, we have no way to access remote.pushdefault (it gets overwritten in the hunk above). @{upstream} does not have this problem, because it is _only_ defined if branch.*.remote is set. There is no such thing as defaulting to a remote.default (or origin) there, and we never need to look at default_remote_name. For @{publish}, though, I think we will want that default. The common config will be to simply set remote.pushdefault, rather than setting up branch.*.pushremote for each branch, and we would want @{publish} to handle that properly. So I think your patch is OK as-is, as the problem in (2) does not show up until later in the series. But I suspect you will need to do something to address it (and I think it is fine as a patch that comes later to do that refactoring). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] remote: introduce and fill branch-pushremote
On Mon, Jan 13, 2014 at 04:52:52PM +0530, Ramkumar Ramachandra wrote: Not sure I understand what the problem is. Let's say we have two branches: master, and side with remote.pushdefault = ram, branch.*.remote = origin, and branch.side.pushremote = peff. Now, when I query master's pushremote, I get ram and when I query side's pushremote, I get peff; all the logic for falling-back from branch.*.pushremote to remote.pushdefault to branch.*.remote is in branch_get(), so I need to do nothing extra on the caller-side. From the caller's perspective, why does it matter if the pushremote of a particular branch is due to branch.*.pushremote or remote.pushdefault? Imagine your HEAD is at side. What should master@{publish} produce? I would argue ram/master. Where does ram come from in your code? It does not matter for actually pushing, because to do a non-default push, you must always specify a remote. But @{publish} will ask the question even if I am on 'side' now, what would happen if I were to default-push on 'master'?. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] remote: introduce and fill branch-pushremote
On Mon, Jan 13, 2014 at 12:15:08PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: It does not matter for actually pushing, because to do a non-default push, you must always specify a remote. But @{publish} will ask the question even if I am on 'side' now, what would happen if I were to default-push on 'master'?. In a similar wording to yours, it can be said that B@{upstream} is what would happen if I were to default-pull on 'B'?. Right. I wondered at first if there was a similar bug in @{upstream}, but as I noted earlier, it is not defined if a per-branch remote is not set. The answer to your question above is nothing, so we do not have to worry about it. :) A related tangent is what should B@{publish} should yield when there is no triangular configuration variables like remote.pushdefault, branch.B.pushremote and a possible future extension branch.B.push are defined. The definition you gave, i.e. if I were to default-push, gives a good guideline, I think. Yes, that is what I tried for with my original patches. (e.g., push.default=upstream should just make @{publish} a synonym for @{upstream}, which is what my patch did). I punted on simple, but it would ideally do the same thing as push. Which is why I do not think my patches are appropriate as-is; they need to somehow share the logic with git push rather than try to reimplement it. I.e. git push origin master does tell us to push out 'master', but it does not explicitly say what ref to update. It may be set to update their remotes/satellite/master when we are emulating a fetch in reverse by pushing, via e.g. [remote origin] push = refs/heads/master:refs/remotes/satellite/master and it would be intuitive if we make master@{publish} resolve to refs/remotes/satellite/master in such a case. Right. And my patches did that (or at least I intended them to :) ) by applying the push refspec (if any), and then applying the fetch refspec on top of that. But again, that seems like policy that should be shared with git push. That being said, I do not think your example is the best one for @{publish}. You have not specified any remote at all. I think the closest push behavior for @{publish} would be something like: git checkout master git push I.e., where would _that_ push go? One thing that makes things a bit fuzzy is what should happen if you have more than one push destinations. For example: [remote home] pushurl = ... push = refs/heads/master:refs/remotes/satellite/master [remote github] pushurl = ... mirror [remote] pushdefault = ??? git push home updates their 'refs/remotes/satellite/master' with my 'master' with the above, while git push github will update their 'refs/heads/master' with 'master'. We can say master@{publish} is 'remotes/satellite/master' if remote.pushdefault (or 'branch.master.pushremote) is set to 'home', it is 'master' if it is 'github', and if git push while sitting on 'master' does not push it anywhere then master@{publish} is an error. There may be a better definition of what if I were to default-push really means, but I don't think of any offhand. Exactly. I do not think the multiple push destinations matter here, because it is always what would I do if I were on the branch. At most one of them can be the default in that case (based on the config as you noted). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
On Fri, Jan 10, 2014 at 04:41:20AM -0500, Jeff King wrote: That being said, we could further optimize this by not opening the files at all (and make that the responsibility of do_one_ref, which we are avoiding here). I am slightly worried about the open() cost of my solution. It's amortized away in a big call, but it is probably noticeable for something like `git rev-parse 40-hex`. I took a look at this. It gets a bit hairy. My strategy is to add a flag to ask read_loose_refs to create REF_INCOMPLETE values. We currently use this flag for loose REF_DIRs to mean we haven't opendir()'d the subdirectory yet. This would extend it to the non-REF_DIR case to mean we haven't opened the loose ref file yet. We'd check REF_INCOMPLETE before handing the ref_entry to a callback, and complete it if necessary. It gets ugly, though, because we need to pass that flag through quite a bit of callstack. get_ref_dir() needs to know it, which means all of find_containing_dir, etc need it, meaning it pollutes all of the packed-refs code paths too. I have a half-done patch in this direction if that doesn't sound too nasty. This doesn't correctly handle the rule refs/remotes/%.*s/HEAD [...] I'll see how painful it is to make it work. It's actually reasonably painful. I thought at first we could get away with more cleverly parsing the rule, find the prefix (up to the placeholder), and then look for the suffix (/HEAD) inside there. But it can never work with the current do_for_each_* code. That code only triggers a callback when we see a concrete ref. It _never_ lets the callbacks see an intermediate directory. So a NO_RECURSE flag is not sufficient to handle this case. I'd need to teach do_for_each_ref to recurse based on pathspecs, or a custom callback function. And that is getting quite complicated. I think it might be simpler to just do my own custom traversal. What I need is much simpler than what do_for_each_entry provides. I don't need recursion, and I don't actually need to look at the loose and packed refs together. It's OK for me to do them one at a time because I don't care about the actual value; I just want to know about which refs exist. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: check-ref-format and rev-parse can not handle branches with an @ in their name combined with @{u}
On Tue, Jan 14, 2014 at 03:45:27PM -0800, Junio C Hamano wrote: Yet 'git check-ref-format --branch @mybranch@{u}' claims @mybranch is an invalid branch name. I do not think it claims any such thing. $ git check-ref-format --branch @foo@{u}; echo $? fatal: '@foo@{u}' is not a valid branch name 128 $ git check-ref-format --branch @foo; echo $? @foo 0 The former asks Is the string '@foo@{u}' a suitable name to give a branch? and the answer is no. The latter asks the same question about the string '@foo', and the answer is yes. Is that what --branch does? I have never used it, but the manpage seems to suggest it is about _parsing_ (which, IMHO, means it probably should have been an option to rev-parse, but that is another issue altogether). So a more interesting output than the above is: $ git checkout -t -b @mybranch origin/master Branch @mybranch set up to track remote branch master from origin. Switched to a new branch '@mybranch' $ git check-ref-format --branch @mybranch@{u}; echo $? fatal: '@mybranch@{u}' is not a valid branch name 128 $ git check-ref-format --branch HEAD@{u}; echo $? origin/master 0 So check-ref-format --branch does understand @{u} syntax. But it doesn't like @mybranch@{u}. You can see the same problem with rev-parse: $ git rev-parse --symbolic-full-name HEAD@{u} refs/remotes/origin/master $ git rev-parse --symbolic-full-name @mybranch@{u} @mybranch@{u} fatal: ambiguous argument '@mybranch@{u}': unknown revision or path not in the working tree. So I do think there is a bug. The interpret_branch_name parser somehow gets confused by the @ in the name. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: check-ref-format and rev-parse can not handle branches with an @ in their name combined with @{u}
On Tue, Jan 14, 2014 at 11:46:58PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: Is that what --branch does? I have never used it, but the manpage seems to suggest it is about _parsing_ (which, IMHO, means it probably should have been an option to rev-parse, but that is another issue altogether). Ahh, of course you are right. I never use it, and somehow thought it was just prepending refs/heads/ to its arguments, but it seems to want to do a lot more than that. I am just about done with a patch series to address this, and a few other related bugs I found. So don't look too hard; I should have something out in a few minutes. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] interpret_branch_name bug potpourri
On Wed, Jan 15, 2014 at 12:00:03AM -0500, Jeff King wrote: $ git rev-parse --symbolic-full-name HEAD@{u} refs/remotes/origin/master $ git rev-parse --symbolic-full-name @mybranch@{u} @mybranch@{u} fatal: ambiguous argument '@mybranch@{u}': unknown revision or path not in the working tree. So I do think there is a bug. The interpret_branch_name parser somehow gets confused by the @ in the name. The somehow is because we only look for the first @, and never consider any possible marks after that. The series below fixes it, along with two other bugs I found while looking at this code. Ugh. Remind me never to look at our object name parser ever again. I feel pretty good that this is fixing real bugs and not regressing anything else. I would not be surprised if there are other weird things lurking, though. See the discussion in patch 4. [1/5]: interpret_branch_name: factor out upstream handling [2/5]: interpret_branch_name: rename cp variable to at [3/5]: interpret_branch_name: always respect namelen parameter [4/5]: interpret_branch_name: avoid @{upstream} past colon [5/5]: interpret_branch_name: find all possible @-marks -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] interpret_branch_name: factor out upstream handling
This function checks a few different @{}-constructs. The early part checks for and dispatches us to helpers for each construct, but the code for handling @{upstream} is inline. Let's factor this out into its own function. This makes interpret_branch_name more readable, and will make it much simpler to further refactor the function in future patches. While we're at it, let's also break apart the refactored code into a few helper functions. These will be useful if we eventually implement similar @{upstream}-like constructs. Signed-off-by: Jeff King p...@peff.net --- This is identical to the cleanup I posted recently for the @{publish} topic, though I did tweak the commit message a bit to make more sense in the context of this series. sha1_name.c | 83 ++--- 1 file changed, 52 insertions(+), 31 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index a5578f7..5db742b 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1046,6 +1046,54 @@ static int reinterpret(const char *name, int namelen, int len, struct strbuf *bu return ret - used + len; } +static void set_shortened_ref(struct strbuf *buf, const char *ref) +{ + char *s = shorten_unambiguous_ref(ref, 0); + strbuf_reset(buf); + strbuf_addstr(buf, s); + free(s); +} + +static const char *get_upstream_branch(const char *branch_buf, int len) +{ + char *branch = xstrndup(branch_buf, len); + struct branch *upstream = branch_get(*branch ? branch : NULL); + + /* +* Upstream can be NULL only if branch refers to HEAD and HEAD +* points to something different than a branch. +*/ + if (!upstream) + die(_(HEAD does not point to a branch)); + if (!upstream-merge || !upstream-merge[0]-dst) { + if (!ref_exists(upstream-refname)) + die(_(No such branch: '%s'), branch); + if (!upstream-merge) { + die(_(No upstream configured for branch '%s'), + upstream-name); + } + die( + _(Upstream branch '%s' not stored as a remote-tracking branch), + upstream-merge[0]-src); + } + free(branch); + + return upstream-merge[0]-dst; +} + +static int interpret_upstream_mark(const char *name, int namelen, + int at, struct strbuf *buf) +{ + int len; + + len = upstream_mark(name + at, namelen - at); + if (!len) + return -1; + + set_shortened_ref(buf, get_upstream_branch(name, at)); + return len + at; +} + /* * This reads short-hand syntax that not only evaluates to a commit * object name, but also can act as if the end user spelled the name @@ -1070,9 +1118,7 @@ static int reinterpret(const char *name, int namelen, int len, struct strbuf *bu int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) { char *cp; - struct branch *upstream; int len = interpret_nth_prior_checkout(name, buf); - int tmp_len; if (!namelen) namelen = strlen(name); @@ -1094,36 +1140,11 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) if (len 0) return reinterpret(name, namelen, len, buf); - tmp_len = upstream_mark(cp, namelen - (cp - name)); - if (!tmp_len) - return -1; + len = interpret_upstream_mark(name, namelen, cp - name, buf); + if (len 0) + return len; - len = cp + tmp_len - name; - cp = xstrndup(name, cp - name); - upstream = branch_get(*cp ? cp : NULL); - /* -* Upstream can be NULL only if cp refers to HEAD and HEAD -* points to something different than a branch. -*/ - if (!upstream) - die(_(HEAD does not point to a branch)); - if (!upstream-merge || !upstream-merge[0]-dst) { - if (!ref_exists(upstream-refname)) - die(_(No such branch: '%s'), cp); - if (!upstream-merge) { - die(_(No upstream configured for branch '%s'), - upstream-name); - } - die( - _(Upstream branch '%s' not stored as a remote-tracking branch), - upstream-merge[0]-src); - } - free(cp); - cp = shorten_unambiguous_ref(upstream-merge[0]-dst, 0); - strbuf_reset(buf); - strbuf_addstr(buf, cp); - free(cp); - return len; + return -1; } int strbuf_branchname(struct strbuf *sb, const char *name) -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] interpret_branch_name: rename cp variable to at
In the original version of this function, cp acted as a pointer to many different things. Since the refactoring in the last patch, it only marks the at-sign in the string. Let's use a more descriptive variable name. Signed-off-by: Jeff King p...@peff.net --- Obviously can be squashed with the prior refactoring, but I think splitting it makes the diffs easier to read. sha1_name.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index 5db742b..958aa2e 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1117,7 +1117,7 @@ static int interpret_upstream_mark(const char *name, int namelen, */ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) { - char *cp; + char *at; int len = interpret_nth_prior_checkout(name, buf); if (!namelen) @@ -1132,15 +1132,15 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) return reinterpret(name, namelen, len, buf); } - cp = strchr(name, '@'); - if (!cp) + at = strchr(name, '@'); + if (!at) return -1; - len = interpret_empty_at(name, namelen, cp - name, buf); + len = interpret_empty_at(name, namelen, at - name, buf); if (len 0) return reinterpret(name, namelen, len, buf); - len = interpret_upstream_mark(name, namelen, cp - name, buf); + len = interpret_upstream_mark(name, namelen, at - name, buf); if (len 0) return len; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] interpret_branch_name: always respect namelen parameter
interpret_branch_name gets passed a name buffer to parse, along with a namelen parameter representing its length. If namelen is zero, we fallback to the NUL-terminated string-length of name. However, it does not necessarily follow that if we have gotten a non-zero namelen, it is the NUL-terminated string-length of name. E.g., when get_sha1() is parsing foo:bar, we will be asked to operate only on the first three characters. Yet in interpret_branch_name and its helpers, we use string functions like strchr() to operate on name, looking past the length we were given. This can result in us mis-parsing object names. We should instead be limiting our search to namelen bytes. There are three distinct types of object names this patch addresses: - The intrepret_empty_at helper uses strchr to find the next @-expression after our potential empty-at. In an expression like @:foo@bar, it erroneously thinks that the second @ is relevant, even if we were asked only to look at the first character. This case is easy to trigger (and we test it in this patch). - When finding the initial @-mark for @{upstream}, we use strchr. This means we might treat foo:@{upstream} as the upstream for foo:, even though we were asked only to look at foo. We cannot test this one in practice, because it is masked by another bug (which is fixed in the next patch). - The interpret_nth_prior_checkout helper did not receive the name length at all. This turns out not to be a problem in practice, though, because its parsing is so limited: it always starts from the far-left of the string, and will not tolerate a colon (which is currently the only way to get a smaller-than-strlen namelen). However, it's still worth fixing to make the code more obviously correct, and to future-proof us against callers with more exotic buffers. Signed-off-by: Jeff King p...@peff.net --- sha1_name.c| 17 ++--- t/t1508-at-combinations.sh | 15 ++- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index 958aa2e..6d5038d 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -430,7 +430,7 @@ static inline int upstream_mark(const char *string, int len) } static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned lookup_flags); -static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf); +static int interpret_nth_prior_checkout(const char *name, int namelen, struct strbuf *buf); static int get_sha1_basic(const char *str, int len, unsigned char *sha1) { @@ -492,7 +492,7 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1) struct strbuf buf = STRBUF_INIT; int detached; - if (interpret_nth_prior_checkout(str, buf) 0) { + if (interpret_nth_prior_checkout(str, len, buf) 0) { detached = (buf.len == 40 !get_sha1_hex(buf.buf, sha1)); strbuf_release(buf); if (detached) @@ -929,7 +929,8 @@ static int grab_nth_branch_switch(unsigned char *osha1, unsigned char *nsha1, * Parse @{-N} syntax, return the number of characters parsed * if successful; otherwise signal an error with negative value. */ -static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf) +static int interpret_nth_prior_checkout(const char *name, int namelen, + struct strbuf *buf) { long nth; int retval; @@ -937,9 +938,11 @@ static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf) const char *brace; char *num_end; + if (namelen 4) + return -1; if (name[0] != '@' || name[1] != '{' || name[2] != '-') return -1; - brace = strchr(name, '}'); + brace = memchr(name, '}', namelen); if (!brace) return -1; nth = strtol(name + 3, num_end, 10); @@ -1012,7 +1015,7 @@ static int interpret_empty_at(const char *name, int namelen, int len, struct str return -1; /* make sure it's a single @, or @@{.*}, not @foo */ - next = strchr(name + len + 1, '@'); + next = memchr(name + len + 1, '@', namelen - len - 1); if (next next[1] != '{') return -1; if (!next) @@ -1118,7 +1121,7 @@ static int interpret_upstream_mark(const char *name, int namelen, int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) { char *at; - int len = interpret_nth_prior_checkout(name, buf); + int len = interpret_nth_prior_checkout(name, namelen, buf); if (!namelen) namelen = strlen(name); @@ -1132,7 +1135,7 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) return reinterpret(name, namelen, len, buf
[PATCH 4/5] interpret_branch_name: avoid @{upstream} past colon
get_sha1() cannot currently parse a valid object name like HEAD:@{upstream} (assuming that such an oddly named file exists in the HEAD commit). It takes two passes to parse the string: 1. It first considers the whole thing as a ref, which results in looking for the upstream of HEAD:. 2. It finds the colon, parses HEAD as a tree-ish, and then finds the path @{upstream} in the tree. For a path that looks like a normal reflog (e.g., HEAD:@{yesterday}), the first pass is a no-op. We try to dwim_ref(HEAD:), that returns zero refs, and we proceed with colon-parsing. For HEAD:@{upstream}, though, the first pass ends up in interpret_upstream_mark, which tries to find the branch HEAD:. When it sees that the branch does not exist, it actually dies rather than returning an error to the caller. As a result, we never make it to the second pass. One obvious way of fixing this would be to teach interpret_upstream_mark to simply report no, this isn't an upstream in such a case. However, that would make the error-reporting for legitimate upstream cases significantly worse. Something like bogus@{upstream} would simply report unknown revision: bogus@{upstream}, while the current code diagnoses a wide variety of possible misconfigurations (no such branch, branch exists but does not have upstream, etc). However, we can take advantage of the fact that a branch name cannot contain a colon. Therefore even if we find an upstream mark, any prefix with a colon must mean that the upstream mark we found is actually a pathname, and should be disregarded completely. This patch implements that logic. Signed-off-by: Jeff King p...@peff.net --- I think this would actually be cleaner if get_sha1() simply did the colon-parsing first, and omitted the first pass completely. Then sub-functions would not have to care about arbitrary junk that can come in paths; they would always be parsing just the revision-specifier. However, given the way this code has developed over the years and its general fragility, I would be entirely unsurprised if there is some case that relies on the current scheme. So I went with the simple fix here, which should be much less likely to have any fallout. And I could not come up with an example that is actually broken under the current code (we just do some suboptimal things, like looking for foo:bar as a ref in the filesystem, even though it is syntactically bogus). sha1_name.c | 3 +++ t/t1507-rev-parse-upstream.sh | 16 2 files changed, 19 insertions(+) diff --git a/sha1_name.c b/sha1_name.c index 6d5038d..b253a88 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1093,6 +1093,9 @@ static int interpret_upstream_mark(const char *name, int namelen, if (!len) return -1; + if (memchr(name, ':', at)) + return -1; + set_shortened_ref(buf, get_upstream_branch(name, at)); return len + at; } diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh index 2a19e79..cace1ca 100755 --- a/t/t1507-rev-parse-upstream.sh +++ b/t/t1507-rev-parse-upstream.sh @@ -210,4 +210,20 @@ test_expect_success 'log -g other@{u}@{now}' ' test_cmp expect actual ' +test_expect_success '@{reflog}-parsing does not look beyond colon' ' + echo content @{yesterday} + git add @{yesterday} + git commit -m funny reflog file + git hash-object @{yesterday} expect + git rev-parse HEAD:@{yesterday} actual +' + +test_expect_success '@{upstream}-parsing does not look beyond colon' ' + echo content @{upstream} + git add @{upstream} + git commit -m funny upstream file + git hash-object @{upstream} expect + git rev-parse HEAD:@{upstream} actual +' + test_done -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] interpret_branch_name: find all possible @-marks
When we parse a string like foo@{upstream}, we look for the first @-sign, and check to see if it is an upstream mark. However, since branch names can contain an @, we may also see @foo@{upstream}. In this case, we check only the first @, and ignore the second. As a result, we do not find the upstream. We can solve this by iterating through all @-marks in the string, and seeing if any is a legitimate upstream or empty-at mark. Another strategy would be to parse from the right-hand side of the string. However, that does not work for the empty_at case, which allows @@{upstream}. We need to find the left-most one in this case (and we then recurse as HEAD@{upstream}). Signed-off-by: Jeff King p...@peff.net --- And this one actually fixes Keith's bug. The diff is noisy due to indentation changes; try it with -b for increased reading pleasure. sha1_name.c | 20 +++- t/t1507-rev-parse-upstream.sh | 21 + 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index b253a88..6fca869 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1124,6 +1124,7 @@ static int interpret_upstream_mark(const char *name, int namelen, int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) { char *at; + const char *start; int len = interpret_nth_prior_checkout(name, namelen, buf); if (!namelen) @@ -1138,17 +1139,18 @@ int interpret_branch_name(const char *name, int namelen, struct strbuf *buf) return reinterpret(name, namelen, len, buf); } - at = memchr(name, '@', namelen); - if (!at) - return -1; + for (start = name; +(at = memchr(start, '@', namelen - (start - name))); +start = at + 1) { - len = interpret_empty_at(name, namelen, at - name, buf); - if (len 0) - return reinterpret(name, namelen, len, buf); + len = interpret_empty_at(name, namelen, at - name, buf); + if (len 0) + return reinterpret(name, namelen, len, buf); - len = interpret_upstream_mark(name, namelen, at - name, buf); - if (len 0) - return len; + len = interpret_upstream_mark(name, namelen, at - name, buf); + if (len 0) + return len; + } return -1; } diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh index cace1ca..178694e 100755 --- a/t/t1507-rev-parse-upstream.sh +++ b/t/t1507-rev-parse-upstream.sh @@ -17,6 +17,9 @@ test_expect_success 'setup' ' test_commit 4 git branch --track my-side origin/side git branch --track local-master master +git branch --track fun@ny origin/side +git branch --track @funny origin/side +git branch --track funny@ origin/side git remote add -t master master-only .. git fetch master-only git branch bad-upstream @@ -54,6 +57,24 @@ test_expect_success 'my-side@{upstream} resolves to correct full name' ' test refs/remotes/origin/side = $(full_name my-side@{u}) ' +test_expect_success 'upstream of branch with @ in middle' ' + full_name fun@ny@{u} actual + echo refs/remotes/origin/side expect + test_cmp expect actual +' + +test_expect_success 'upstream of branch with @ at start' ' + full_name @funny@{u} actual + echo refs/remotes/origin/side expect + test_cmp expect actual +' + +test_expect_success 'upstream of branch with @ at end' ' + full_name funny@@{u} actual + echo refs/remotes/origin/side expect + test_cmp expect actual +' + test_expect_success 'refs/heads/my-side@{upstream} does not resolve to my-side{upstream}' ' test_must_fail full_name refs/heads/my-side@{upstream} ' -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: After stash pop, refs/stash become 40 zeroes
On Wed, Jan 15, 2014 at 01:56:52PM +0800, 乙酸鋰 wrote: what are the possible causes of this? After stash pop, refs/stash becomes 40 zeroes. This is the only stash, so refs/stash should be deleted after pop. However, it becomes 40 zeroes. git 1.8.x I can't reproduce the problem here. Running this: git init -q repo cd repo echo content file git add file git commit -qm file echo more file echo == stashed git stash -q ls -l .git/refs/stash .git/logs/refs/stash git rev-parse --verify refs/stash echo == popped git stash pop -q ls -l .git/refs/stash .git/logs/refs/stash git rev-parse --verify refs/stash yields: == stashed -rw-r--r-- 1 peff peff 153 Jan 15 03:49 .git/logs/refs/stash -rw-r--r-- 1 peff peff 41 Jan 15 03:49 .git/refs/stash 7fb40812ac4aaf7fa31e584863fc52c4b4a67aa0 == popped ls: cannot access .git/refs/stash: No such file or directory ls: cannot access .git/logs/refs/stash: No such file or directory fatal: Needed a single revision Does running it reproduce the problem for you? If it does, can you be more specific about your git version? If it does not, can you show what you are doing differently to reproduce? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Diagnosing stray/stale .keep files -- explore what is in a pack?
On Tue, Jan 14, 2014 at 02:42:09PM -0500, Martin Langhoff wrote: On Tue, Jan 14, 2014 at 2:36 PM, Martin Fick mf...@codeaurora.org wrote: Perhaps the receiving process is dying hard and leaving stuff behind? Out-of-memory, out of disk space? Yes, that's my guess as well. This server had gc misconfigured, so it hit ENOSPC a few weeks ago. It is likely that the .lock files were left behind back then, and since then the clients pushing to these refs were transferring their whole history and still failing to update the ref, leading to rapid repo growth. We see these occasionally at GitHub, too. I haven't yet figured out a definite cause, though whatever it is, it's relatively rare. I think the .keep files and the .lock files are in two separate boats, though. pack-objects creates the .keep files as a lock between the time it moves them into place and when receive-pack updates the refs (so that a simultaneous prune does not think they should be removed). Receive-pack then updates the refs and removes the .keep file. However, in the interim code, we are just updating the refs, and are careful to return any errors rather than calling die() (so if ENOSPC prevented ref write, that would not cause this). So for us to leave a .keep there, it is probably one of: 1. A few generic library functions, like xmalloc, can cause us to die. This should be very rare, though. 2. We tried to unlink the keep-file, but couldn't (could ENOSPC prevent a deletion? I suspect it depends on the filesystem). 3. We were killed by signal (or system crash). Fetch-pack also will create .keep files, and it is much less careful during the time the file exists. However, busy servers tend to be receiving pushes, not initiating fetches. Actual .lock files are added to a signal/atexit handle that cleans them up automatically on program exit. So those really should be caused by system crash (or kill -9), and that has generally been our experience at GitHub. But again, if ENOSPC could prevent deletion on your filesystem, it could be related. But there is not much git can do to clean up if unlink() fails us. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hooks scripts and noexec partition
On Tue, Jan 14, 2014 at 04:41:03PM +0100, krz...@gmail.com wrote: git can't execute hooks no partitions mounted with noexec - even if those are just scripts with shebang line Right. Git does not know that they are shell (or other) scripts; they could be anything, and the advertised interface is that git will run exec on them (and it is explicitly OK for them to exist but not be executable, and git takes this as a sign that they are inactive). and they actualy work by hooks/./post-comit (because I use small patch on kernel that allows running scripts that way on noexec partition) If you are suggesting that git always execute them as hooks/./$hook, that might make sense if such behavior is widespread. But it sounds like you are running a custom kernel patch to get around the noexec setting. Here is the custom git patch to match it. :) diff --git a/run-command.c b/run-command.c index 3914d9c..ae84e87 100644 --- a/run-command.c +++ b/run-command.c @@ -753,7 +753,7 @@ int finish_async(struct async *async) char *find_hook(const char *name) { - char *path = git_path(hooks/%s, name); + char *path = git_path(hooks/./%s, name); if (access(path, X_OK) 0) path = NULL; -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-log --cherry-pick gives different results when using tag or tag^{}
[+cc Junio, as the bug blames to him] On Fri, Jan 10, 2014 at 02:15:40PM +0100, Francis Moreau wrote: In mykernel repository, I'm having 2 different behaviours with git-log but I don't understand why: Doing: $ git log --oneline --cherry-pick --left-right v3.4.71-1^{}...next and $ git log --oneline --cherry-pick --left-right v3.4.71-1...next give something different (where v3.4.71-1 is a tag). The command using ^{} looks the one that gives correct result I think. Yeah, this looks like a bug. Here's a simple reproduction recipe: commit() { echo content $1 git add $1 git commit -m $1 } git init repo cd repo commit one commit two sleep 1 git tag -m foo mytag git checkout -b side HEAD^ git cherry-pick mytag commit three The sleep seems to be necessary, to give the commit and its cherry-picked version different commit times (presumably because it impacts the order in which we visit them during the traversal). Running: git log --oneline --decorate --cherry-pick --left-right mytag^{}...HEAD produces the expected: e36cc32 (HEAD, side) three but running it with the tag, as: git log --oneline --decorate --cherry-pick --left-right mytag...HEAD yields: e36cc32 (HEAD, side) three 5e96f7d two db92fca (tag: mytag, master) two Not only do we get the cherry-pick wrong (we should omit both twos), but we seem to erroneously count the tagged two as being on the right-hand side, which it clearly is not (and which is probably why we don't find the match via --cherry-pick). This worked in v1.8.4, but is broken in v1.8.5. It bisects to Junio's 895c5ba (revision: do not peel tags used in range notation, 2013-09-19), which sounds promising. I think what is happening is that we used to apply the SYMMETRIC_LEFT flag directly to the commit. Now we apply it to the tag, and it does not seem to get propagated. The patch below fixes it for me, but I have no idea if we actually need to be setting the other flags, or just SYMMETRIC_LEFT. I also wonder if the non-symmetric two-dot case needs to access any pointed-to commit and propagate flags in a similar way. diff --git a/revision.c b/revision.c index 7010aff..1d99bfc 100644 --- a/revision.c +++ b/revision.c @@ -1197,6 +1197,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi free_commit_list(exclude); a_flags = flags | SYMMETRIC_LEFT; + a-object.flags |= a_flags; + b-object.flags |= flags; } a_obj-flags |= a_flags; -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report: stash in upstream caused remote fetch to fail
On Mon, Jan 06, 2014 at 07:19:43PM -0800, Junio C Hamano wrote: Actually, I think the above recollection of mine was completely bogus. The is there because we do allow things like HEAD (they are the funny ones) as well as things inside refs/, and the latter is the only thing we had a check-ref-format to dictate the format when the code was written. I do not mind tightening things a bit (e.g. outside refs/, only allow HEAD and nothing else). A good first step might be to enforce allow-onelevel outside refs/ (so that we can allow HEAD) and for those inside refs/ check without allow-onelevel but without skipping the prefix. It is a separate story if it makes much sense to allow fetching refs/stash or ignoring when running git clone. Operationally, I think it makes more sense to ignore refs/stash, not because it is a one-level name, but because what a stash is. This discussion stalled, but I finally got around to looking at it today. I agree that we should leave aside more complex policy for now, and start with bringing the fetch and fetch-pack filters into harmony. That turns off format-checking for things outside refs/ (so allows HEAD), and checks the whole string for things inside refs/ (so it does not fall afoul of the one-level check). I ended up with the patch below, which converts fetch-pack to match fetch. However, reading the original one-level check in 03feddd (git-check-ref-format: reject funny ref names., 2005-10-13), it does seem like it was meant to reject refs/foo. It contains: if (level 2) return -1; /* at least of form heads/blah */ It seems like the meaning of this check changed over the years. I am not sure if that was intentional or not. :) So we could go the other direction, and harmonize on disallowing refs/foo. I don't particularly care that much. The nasty thing now is the mismatch between the two spots, which means that fetch doesn't just ignore the ref, but bombs out with a missing object. -- 8 -- Subject: fetch-pack: do not filter out one-level refs Currently fetching a one-level ref like refs/foo does not work consistently. The outer git fetch program filters the list of refs, checking each against check_refname_format. Then it feeds the result to do_fetch_pack to actually negotiate the haves/wants and get the pack. The fetch-pack code does its own filter, and it behaves differently. The fetch-pack filter looks for refs in refs/, and then feeds everything _after_ the slash (i.e., just foo) into check_refname_format. But check_refname_format is not designed to look at a partial refname. It complains that the ref has only one component, thinking it is at the root (i.e., alongside HEAD), when in reality we just fed it a partial refname. As a result, we omit a ref like refs/foo from the pack request, even though git fetch then tries to store the resulting ref. If we happen to get the object anyway (e.g., because the ref is contained in another ref we are fetching), then the fetch succeeds. But if it is a unique object, we fail when trying to update refs/foo. We can fix this by just passing the whole refname into check_refname_format; we know the part we were omitting is refs/, which is acceptable in a refname. This at least makes the checks consistent with each other. This problem happens most commonly with refs/stash, which is the only one-level ref in wide use. However, our test does not use refs/stash, as we may later want to restrict it specifically (not because it is one-level, but because of the semantics of stashes). We may also want to do away with the multiple levels of filtering (which can cause problems when they are out of sync), or even forbid one-level refs entirely. However, those decisions can come later; this fixes the most immediate problem, which is the mismatch between the two. Signed-off-by: Jeff King p...@peff.net --- fetch-pack.c | 2 +- t/t5510-fetch.sh | 11 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/fetch-pack.c b/fetch-pack.c index 760ed16..b28ccd1 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -505,7 +505,7 @@ static void filter_refs(struct fetch_pack_args *args, next = ref-next; if (!memcmp(ref-name, refs/, 5) - check_refname_format(ref-name + 5, 0)) + check_refname_format(ref-name, 0)) ; /* trash */ else { while (i nr_sought) { diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index 12674ac..ab28594 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -640,4 +640,15 @@ test_expect_success 'branchname D/F conflict resolved by --prune' ' test_cmp expect actual ' +test_expect_success 'fetching a one-level ref works' ' + test_commit extra + git reset --hard HEAD^ + git update-ref refs/foo extra + git init one-level + ( + cd one-level + git fetch
Re: Bug report: stash in upstream caused remote fetch to fail
On Wed, Jan 15, 2014 at 05:46:13AM -0500, Jeff King wrote: This discussion stalled, but I finally got around to looking at it today. I agree that we should leave aside more complex policy for now, and start with bringing the fetch and fetch-pack filters into harmony. That turns off format-checking for things outside refs/ (so allows HEAD), and checks the whole string for things inside refs/ (so it does not fall afoul of the one-level check). By the way, an interesting implication of this is that I do not think there is any format check on things outside of refs/. If you were to do git fetch ... +*:* you would write whatever crap the other side gives you. I can't imagine any reason a client would _ever_ do that, though, so I don't think it's a big deal. We tend to fetch HEAD explicitly by name, and that's it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] read-cache: connect to file watcher
On Sun, Jan 12, 2014 at 06:03:39PM +0700, Nguyễn Thái Ngọc Duy wrote: This patch establishes a connection between a new file watcher daemon and git. Each index file may have at most one file watcher attached to it. The file watcher maintains a UNIX socket at $GIT_DIR/index.watcher. Any process that has write access to $GIT_DIR can talk to the file watcher. IIRC, this is not portable. Some systems (not Linux) will allow anyone to connect to the socket if it the file is accessible to them (so anybody with read access to $GIT_DIR can talk to the file watcher). The usual trick is to put it in a sub-directory that only the connectors can access (e.g., put it in $GIT_DIR/watcher/index, and create watcher mode 0700). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Consistency question
On Wed, Jan 15, 2014 at 11:37:08AM +0100, David Kastrup wrote: The question is what guarantees I have with regard to the commit date of a commit in relation to that of its parent commits: a) none b) commitdate(child) = commitdate(parent) c) commitdate(child) commitdate(parent) a) none Obviously, I can rely on c) being true almost always: Actually, b) is quite often the case in automated processes (e.g., git am or git rebase). The author dates are different, but the committer dates may be in the same second. And of course a) is the result of clock skew and software bugs. it's definitely good for a heuristic used for improving performance (meaning as an ordering criterion for a commit priority queue). The problem is how much I should cater for graceful behavior for the cases where it's not. Yes, this is exactly how git uses it. We generally walk breadth-first through the graph, relying on commit times for performance but not correctness. There are some parts of the code that will behave badly with clock skew. For example, --since will stop traversing when we hit a certain point. It requires a fixed number of too old commits before quitting, though, in an attempt to bypass small runs of skewed clocks. The git describe --contains algorithm will also produce wrong results in the face of skew. I believe it uses a slop of 24 hours in its skew. The current tag --contains algorithm is currently correct in the face of skew, but it can go much faster if you accept that skew will cause it to go wrong. I suspect there are other algorithms that could be sped up, too, if we had trustworthy generation numbers (I implemented and timed the --contains algorithm, but haven't done so for other algorithms). I've played with calculating and caching the generation numbers at repack time, but there aren't any patches currently under consideration. Does git do any actual checks before pushing? No. One problem with checking the commit relationships is that it may not be the new commit which is broken (by skewing backwards), but rather the commit it builds on (which has skewed forwards). If you are building on such a fast-forward commit, it is often to late to fix that commit. So you have to fast-forward yourself, propagating the bogus value. If the receiving machine checked the incoming commits against its own internal clock, that could work. You would still want to check the commit relationships to catch new commits that erroneously claim to be from the past, though (that is, before their parents). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] do not discard revindex when re-preparing packfiles
When an object lookup fails, we re-read the objects/pack directory to pick up any new packfiles that may have been created since our last read. We also discard any pack revindex structs we've allocated. The discarding is a problem for the pack-bitmap code, which keeps a pointer to the revindex for the bitmapped pack. After the discard, the pointer is invalid, and we may read free()d memory. Other revindex users do not keep a bare pointer to the revindex; instead, they always access it through revindex_for_pack(), which lazily builds the revindex. So one solution is to teach the pack-bitmap code a similar trick. It would be slightly less efficient, but probably not all that noticeable. However, it turns out this discarding is not actually necessary. When we call reprepare_packed_git, we do not throw away our old pack list. We keep the existing entries, and only add in new ones. So there is no safety problem; we will still have the pack struct that matches each revindex. The packfile itself may go away, of course, but we are already prepared to handle that, and it may happen outside of reprepare_packed_git anyway. Throwing away the revindex may save some RAM if the pack never gets reused (about 12 bytes per object). But it also wastes some CPU time (to regenerate the index) if the pack does get reused. It's hard to say which is more valuable, but in either case, it happens very rarely (only when we race with a simultaneous repack). Just leaving the revindex in place is simple and safe both for current and future code. Signed-off-by: Jeff King p...@peff.net --- On top of jk/pack-bitmap. pack-revindex.c | 11 --- pack-revindex.h | 1 - sha1_file.c | 1 - 3 files changed, 13 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index 0bb13b1..5bd7c61 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -245,14 +245,3 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) return pridx-revindex + pos; } - -void discard_revindex(void) -{ - if (pack_revindex_hashsz) { - int i; - for (i = 0; i pack_revindex_hashsz; i++) - free(pack_revindex[i].revindex); - free(pack_revindex); - pack_revindex_hashsz = 0; - } -} diff --git a/pack-revindex.h b/pack-revindex.h index 866ca9c..d737f98 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -15,6 +15,5 @@ struct pack_revindex *revindex_for_pack(struct packed_git *p); int find_revindex_position(struct pack_revindex *pridx, off_t ofs); struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); -void discard_revindex(void); #endif diff --git a/sha1_file.c b/sha1_file.c index df89b57..45f9bb4 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1279,7 +1279,6 @@ void prepare_packed_git(void) void reprepare_packed_git(void) { - discard_revindex(); prepare_packed_git_run_once = 0; prepare_packed_git(); } -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-log --cherry-pick gives different results when using tag or tag^{}
On Wed, Jan 15, 2014 at 11:57:39AM -0800, Junio C Hamano wrote: Where do we pass down other flags from tags to commits? For example, if we do this: $ git log ^v1.8.5 master we mark v1.8.5 tag as UNINTERESTING, and throw that tag (not commit v1.8.5^0) into revs-pending.objects[]. We do the same for 'master', which is a commit. Later, in prepare_revision_walk(), we call handle_commit() on them, and unwrap the tag v1.8.5 to get v1.8.5^0, and then handles that commit object with flags obtained from the tag object. This code only cares about UNINTERESTING and manually propagates it. Thanks for picking up this line of thought. I had some notion that the right solution would be in propagating the flags later from the pending tags to the commits, but I didn't quite know where to look. Knowing that we explicitly propagate UNINTERESTING but nothing else makes what I was seeing make a lot more sense. Perhaps that code needs to propagate at least SYMMETRIC_LEFT down to the commit object as well, no? With your patch, the topmost level of tag object and the eventual commit object are marked with the flag, but if we were dealing with a tag that points at another tag that in turn points at a commit, the intermediate tag will not be marked with SYMMETRIC_LEFT (nor UNINTERESTING for that matter), which may not affect the final outcome, but it somewhat feels wrong. Agreed. I think the lack of flags on intermediate tags has always been that way, even before 895c5ba, and I do not know of any case where it currently matters. But it seems like the obvious right thing to mark those intermediate tags. How about doing it this way instead (totally untested, though)? Makes sense. It also means we will propagate flags down to any pointed-to trees and blobs. I can't think of a case where that will matter either (and they cannot be SYMMETRIC_LEFT, as that only makes sense for commit objects). I do notice that when we have a tree, we explicitly propagate UNINTERESTING to the rest of the tree. Should we be propagating all flags instead? Again, I can't think of a reason to do so (and if it is not UNINTERESTING, it is a non-trivial amount of time to mark all paths in the tree). @@ -287,7 +288,6 @@ static struct commit *handle_commit(struct rev_info *revs, if (parse_commit(commit) 0) die(unable to parse commit %s, name); if (flags UNINTERESTING) { - commit-object.flags |= UNINTERESTING; mark_parents_uninteresting(commit); revs-limited = 1; } We don't need to propagate the UNINTERESTING flag here, because either: - object pointed to the commit, in which case flags comes from object-flags, and we already have it set or - object was a tag, and we propagated the flags as we peeled (from your earlier hunk) Makes sense. I think the mark_blob_uninteresting call later in the function is now irrelevant for the same reasons. The mark_tree_uninteresting call is not, though, because it recurses. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: revision: propagate flag bits from tags to pointees
On Wed, Jan 15, 2014 at 12:26:13PM -0800, Junio C Hamano wrote: With the previous fix 895c5ba3 (revision: do not peel tags used in range notation, 2013-09-19), handle_revision_arg() that processes command line arguments for the git log family of commands no longer directly places the object pointed by the tag in the pending object array when it sees a tag object. We used to place pointee there after copying the flag bits like UNINTERESTING and SYMMETRIC_LEFT. This change meant that any flag that is relevant to later history traversal must now be propagated to the pointed objects (most often these are commits) while starting the traversal, which is partly done by handle_commit() that is called from prepare_revision_walk(). We did propagate UNINTERESTING, but did not do so for others, most notably SYMMETRIC_LEFT. This caused git log --left-right v1.0... (where v1.0 is a tag) to start losing the leftness from the commit the tag points at. Signed-off-by: Junio C Hamano gits...@pobox.com --- Looks good to me. As per my previous mail, I _think_ you could squash in: diff --git a/revision.c b/revision.c index f786b51..2db906c 100644 --- a/revision.c +++ b/revision.c @@ -316,13 +316,10 @@ static struct commit *handle_commit(struct rev_info *revs, * Blob object? You know the drill by now.. */ if (object-type == OBJ_BLOB) { - struct blob *blob = (struct blob *)object; if (!revs-blob_objects) return NULL; - if (flags UNINTERESTING) { - mark_blob_uninteresting(blob); + if (flags UNINTERESTING) return NULL; - } add_pending_object(revs, object, ); return NULL; } but that is not very much code reduction (and mark_blob_uninteresting is very cheap). So it may not be worth the risk that my analysis is wrong. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git quietly fails on https:// URL, https errors are never reported to user
On Thu, Jan 16, 2014 at 04:27:03AM -0800, Yuri wrote: While debugging, I find that remove_junk() deletes all directories from under __cxa_finalize. Before this, exit(128) is called from recvline_fh (Debug: Remote helper quit.) And this function in turn is called from under refs = transport_get_remote_refs(transport); I think you need to make sure that any https errors (in this and other locations) are reported to the user, and git never quits on error without saying what the error is. We used to print Reading from helper 'git-remote-https' failed in this instance. But in the majority of cases, remote-https has printed a useful message already to stderr, and the extra line just confused people. The downside, as you noticed, is that when the helper dies without printing an error, the user is left with no message. Unfortunately, detecting whether the helper printed a useful message is non-trivial. It's possible we could do more detection based on the helper's death (e.g., if it died by signal, print a message) and at least say _something_. But even if we do so, the message isn't going to tell you what went wrong, only that something unexpected happened. It is up to the helper to print something useful, and if it didn't, it should be fixed. So the most important bug to fix here, IMHO, is figuring out why git-remote-https died without printing an error message. Is it possible to strace (or truss) the helper process? What it was doing when it died may be helpful. Rather than picking through strace -f output, you can use this hack to trace just the helper process: cat /in/your/$PATH/git-remote-strace \EOF #!/bin/sh protocol=$(echo $2 | cut -d: -f1) exec strace git-remote-$protocol $@ EOF git clone strace::https://github.com/your/repo.git -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] diff_filespec cleanups and optimizations
I recently came across a repository with a commit containing 100 million paths in its tree. Cleverly, the whole repo fits into a 1.5K packfile (can you guess how it was done?). Not so cleverly, running diff-tree --root on that commit uses a large amount of memory. :) I do not think it is worth optimizing for such a pathological repository. But I was curious how much it would want (it OOM'd on my 64-bit 16G machine). The answer is roughly: 100,000,000 * ( 8 bytes per pointer to diff_filepair in the diff_queue + 32 bytes per diff_filepair struct + 2 * ( 96 bytes per diff_filespec struct + 12 bytes per filename (in this case) ) ) which is about 25G. Plus malloc overhead. So obviously this example is unreasonable. A more reasonable large case is something like WebKit at ~150K files, doing a diff against the empty tree. That's only 37M. But while looking at it, I noticed a bunch of cleanups for diff_filespec. With the patches below, sizeof(struct diff_filespec) on my 64-bit machine goes from 96 bytes down to 80. Compiling with -m32 goes from 64 bytes down to 52. The first few patches have cleanup value aside from the struct size improvement. The last two are pure optimization. I doubt the optimization is noticeable for any real-life cases, so I don't mind if they get dropped. But they're quite trivial and obvious. [1/5]: diff_filespec: reorder dirty_submodule macro definitions [2/5]: diff_filespec: drop funcname_pattern_ident field [3/5]: diff_filespec: drop xfrm_flags field [4/5]: diff_filespec: reorder is_binary field [5/5]: diff_filespec: use only 2 bits for is_binary flag -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] diff_filespec: reorder dirty_submodule macro definitions
diff_filespec has a 2-bit dirty_submodule field and defines two flags as macros. Originally these were right next to each other, but a new field was accidentally added in between in commit 4682d85. This patch puts the field and its flags back together. Using an enum like: enum { DIRTY_SUBMODULE_UNTRACKED = 1, DIRTY_SUBMODULE_MODIFIED = 2 } dirty_submodule; would be more obvious, but it bloats the structure. Limiting the enum size like: } dirty_submodule : 2; might work, but it is not portable. Signed-off-by: Jeff King p...@peff.net --- diffcore.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diffcore.h b/diffcore.h index 1c16c85..f822f9e 100644 --- a/diffcore.h +++ b/diffcore.h @@ -43,9 +43,9 @@ struct diff_filespec { unsigned should_free : 1; /* data should be free()'ed */ unsigned should_munmap : 1; /* data should be munmap()'ed */ unsigned dirty_submodule : 2; /* For submodules: its work tree is dirty */ - unsigned is_stdin : 1; #define DIRTY_SUBMODULE_UNTRACKED 1 #define DIRTY_SUBMODULE_MODIFIED 2 + unsigned is_stdin : 1; unsigned has_more_entries : 1; /* only appear in combined diff */ struct userdiff_driver *driver; /* data should be considered binary; -1 means don't know yet */ -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] diff_filespec: drop xfrm_flags field
The only mention of this field in the code is by some debugging code which prints it out (and it will always be zero, since we never touch it otherwise). It was obsoleted very early on by 25d5ea4 ([PATCH] Redo rename/copy detection logic., 2005-05-24). Signed-off-by: Jeff King p...@peff.net --- No savings here on 64-bit, since this ended up going to padding, but it is what makes the next patch work. diff.c | 4 ++-- diffcore.h | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/diff.c b/diff.c index 6b4cd0e..8e4a6a9 100644 --- a/diff.c +++ b/diff.c @@ -4139,9 +4139,9 @@ void diff_debug_filespec(struct diff_filespec *s, int x, const char *one) DIFF_FILE_VALID(s) ? valid : invalid, s-mode, s-sha1_valid ? sha1_to_hex(s-sha1) : ); - fprintf(stderr, queue[%d] %s size %lu flags %d\n, + fprintf(stderr, queue[%d] %s size %lu\n, x, one ? one : , - s-size, s-xfrm_flags); + s-size); } void diff_debug_filepair(const struct diff_filepair *p, int i) diff --git a/diffcore.h b/diffcore.h index 92e4d48..22993e1 100644 --- a/diffcore.h +++ b/diffcore.h @@ -31,7 +31,6 @@ struct diff_filespec { void *cnt_data; unsigned long size; int count; /* Reference count */ - int xfrm_flags; /* for use by the xfrm */ int rename_used; /* Count of rename users */ unsigned short mode; /* file mode */ unsigned sha1_valid : 1; /* if true, use sha1 and trust mode; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] diff_filespec: reorder is_binary field
The middle of the diff_filespec struct contains a mixture of ints, shorts, and bit-fields, followed by a pointer. On an x86-64 system with an LP64 or LLP64 data model (i.e., most of them), the integers and flags end up being padded out by 41 bits to put the pointer at an 8-byte boundary. After the pointer, we have the int is_binary field, which is only 32 bits. We end up wasting another 32 bits to pad the struct size up to a multiple of 64 bits. We can move the is_binary field before the pointer, which lets the compiler store it where we used to have padding. This shrinks the top padding to only 9 bits (from the bit-fields), and eliminates the bottom padding entirely, dropping the struct size from 88 to 80 bytes. On a 32-bit system, there is no benefit, but nor should there be any harm (we only need 4-byte alignment there, so we were already using only 9 bits of padding). Signed-off-by: Jeff King p...@peff.net --- diffcore.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diffcore.h b/diffcore.h index 22993e1..d911bf0 100644 --- a/diffcore.h +++ b/diffcore.h @@ -45,9 +45,9 @@ struct diff_filespec { #define DIRTY_SUBMODULE_MODIFIED 2 unsigned is_stdin : 1; unsigned has_more_entries : 1; /* only appear in combined diff */ - struct userdiff_driver *driver; /* data should be considered binary; -1 means don't know yet */ int is_binary; + struct userdiff_driver *driver; }; extern struct diff_filespec *alloc_filespec(const char *); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] diff_filespec: use only 2 bits for is_binary flag
The is_binary flag needs only three values: -1, 0, and 1. However, we use a whole 32-bit int for it on most systems (both 32- and 64- bit). Instead, we can mark it to use only 2 bits. On 32-bit systems, this lets it end up as part of the bitfield above (saving 4 bytes). On 64-bit systems, we don't see any change (because the savings end up as padding), but it does leave room for another free 32-bit value to be added later. Signed-off-by: Jeff King p...@peff.net --- I don't typically use bit-sized integers like this for anything but unsigned integers to be used as flags. My understanding is that using signed integers is explicitly permitted by the standard. I don't know if we're guaranteed a 2's-complement representation, but I can't imagine an implementation providing any range besides -2..1, which is what we need. diffcore.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diffcore.h b/diffcore.h index d911bf0..79de8cf 100644 --- a/diffcore.h +++ b/diffcore.h @@ -46,7 +46,7 @@ struct diff_filespec { unsigned is_stdin : 1; unsigned has_more_entries : 1; /* only appear in combined diff */ /* data should be considered binary; -1 means don't know yet */ - int is_binary; + int is_binary : 2; struct userdiff_driver *driver; }; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html