[PATCH 1/2] sha1_object_info_extended: provide delta base sha1s

2013-12-21 Thread Jeff King
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check oi.whence == OI_PACKED  oi.u.packed.is_delta
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).

Signed-off-by: Jeff King p...@peff.net
---
 cache.h |  1 +
 sha1_file.c | 53 +
 2 files changed, 54 insertions(+)

diff --git a/cache.h b/cache.h
index ce377e1..67356db 100644
--- a/cache.h
+++ b/cache.h
@@ -1074,6 +1074,7 @@ struct object_info {
enum object_type *typep;
unsigned long *sizep;
unsigned long *disk_sizep;
+   unsigned char *delta_base_sha1;
 
/* Response */
enum {
diff --git a/sha1_file.c b/sha1_file.c
index daacc0c..4e8dd8b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1667,6 +1667,38 @@ static off_t get_delta_base(struct packed_git *p,
return base_offset;
 }
 
+/*
+ * Like get_delta_base above, but we return the sha1 instead of the pack
+ * offset. This means it is cheaper for REF deltas (we do not have to do
+ * the final object lookup), but more expensive for OFS deltas (we
+ * have to load the revidx to convert the offset back into a sha1).
+ */
+static const unsigned char *get_delta_base_sha1(struct packed_git *p,
+   struct pack_window **w_curs,
+   off_t curpos,
+   enum object_type type,
+   off_t delta_obj_offset)
+{
+   if (type == OBJ_REF_DELTA) {
+   unsigned char *base = use_pack(p, w_curs, curpos, NULL);
+   return base;
+   } else if (type == OBJ_OFS_DELTA) {
+   struct revindex_entry *revidx;
+   off_t base_offset = get_delta_base(p, w_curs, curpos,
+  type, delta_obj_offset);
+
+   if (!base_offset)
+   return NULL;
+
+   revidx = find_pack_revindex(p, base_offset);
+   if (!revidx)
+   return NULL;
+
+   return nth_packed_object_sha1(p, revidx-nr);
+   } else
+   return NULL;
+}
+
 int unpack_object_header(struct packed_git *p,
 struct pack_window **w_curs,
 off_t *curpos,
@@ -1824,6 +1856,22 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
}
}
 
+   if (oi-delta_base_sha1) {
+   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
+   const unsigned char *base;
+
+   base = get_delta_base_sha1(p, w_curs, curpos,
+  type, obj_offset);
+   if (!base) {
+   type = OBJ_BAD;
+   goto out;
+   }
+
+   hashcpy(oi-delta_base_sha1, base);
+   } else
+   hashclr(oi-delta_base_sha1);
+   }
+
 out:
unuse_pack(w_curs);
return type;
@@ -2407,6 +2455,9 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
git_zstream stream;
char hdr[32];
 
+   if (oi-delta_base_sha1)
+   hashclr(oi-delta_base_sha1);
+
/*
 * If we don't care about type or size, then we don't
 * need to look inside the object at all. Note that we
@@ -2457,6 +2508,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
*(oi-sizep) = co-size;
if (oi-disk_sizep)
*(oi-disk_sizep) = 0;
+   if (oi-delta_base_sha1)
+   hashclr(oi-delta_base_sha1);
oi-whence = OI_CACHED;
return 0;
}
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] cat-file --batch-check='%(deltabase)'

2013-12-21 Thread Jeff King
This series lets you query the delta base for a particular object (or
set of objects), like:

  $ git rev-list --objects --all |
git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'

The only other way I know of to get this information is using
verify-pack (or index-pack), which is much slower (and less convenient
to parse).

I needed this recently to write tests for another (not yet published)
series. But I think it stands on its own as a debugging / introspection
tool.

  [1/2]: sha1_object_info_extended: provide delta base sha1s
  [2/2]: cat-file: provide %(deltabase) batch format

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] cat-file: provide %(deltabase) batch format

2013-12-21 Thread Jeff King
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).

Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt | 12 +---
 builtin/cat-file.c |  6 ++
 t/t1006-cat-file.sh| 34 ++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 322f5ed..f6a16f4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -109,6 +109,11 @@ newline. The available atoms are:
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
 
+`deltabase`::
+   If the object is stored as a delta on-disk, this expands to the
+   40-hex sha1 of the delta base object. Otherwise, expands to the
+   null sha1 (40 zeroes). See `CAVEATS` below.
+
 `rest`::
If this atom is used in the output string, input lines are split
at the first whitespace boundary. All characters before that
@@ -152,10 +157,11 @@ should be taken in drawing conclusions about which refs 
or objects are
 responsible for disk usage. The size of a packed non-delta object may be
 much larger than the size of objects which delta against it, but the
 choice of which object is the base and which is the delta is arbitrary
-and is subject to change during a repack. Note also that multiple copies
-of an object may be present in the object database; in this case, it is
-undefined which copy's size will be reported.
+and is subject to change during a repack.
 
+Note also that multiple copies of an object may be present in the object
+database; in this case, it is undefined which copy's size or delta base
+will be reported.
 
 GIT
 ---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..2e0af2e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -118,6 +118,7 @@ struct expand_data {
unsigned long size;
unsigned long disk_size;
const char *rest;
+   unsigned char delta_base_sha1[20];
 
/*
 * If mark_query is true, we do not expand anything, but rather
@@ -174,6 +175,11 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
data-split_on_whitespace = 1;
else if (data-rest)
strbuf_addstr(sb, data-rest);
+   } else if (is_atom(deltabase, atom, len)) {
+   if (data-mark_query)
+   data-info.delta_base_sha1 = data-delta_base_sha1;
+   else
+   strbuf_addstr(sb, sha1_to_hex(data-delta_base_sha1));
} else
die(unknown format element: %.*s, len, atom);
 }
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8a1bc5c..633dc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,4 +240,38 @@ test_expect_success --batch-check with multiple sha1s 
gives correct format '
 $(echo_without_newline $batch_check_input | git cat-file --batch-check)
 '
 
+test_expect_success 'setup blobs which are likely to delta' '
+   test-genrandom foo 10240 foo 
+   { cat foo; echo plus; } foo-plus 
+   git add foo foo-plus 
+   git commit -m foo 
+   cat blobs -\EOF
+   HEAD:foo
+   HEAD:foo-plus
+   EOF
+'
+
+test_expect_success 'confirm that neither loose blob is a delta' '
+   cat expect -EOF
+   $_z40
+   $_z40
+   EOF
+   git cat-file --batch-check=%(deltabase) blobs actual 
+   test_cmp expect actual
+'
+
+# To avoid relying too much on the current delta heuristics,
+# we will check only that one of the two objects is a delta
+# against the other, but not the order. We can do so by just
+# asking for the base of both, and checking whether either
+# sha1 appears in the output.
+test_expect_success '%(deltabase) reports packed delta bases' '
+   git repack -ad 
+   git cat-file --batch-check=%(deltabase) blobs actual 
+   {
+   grep $(git rev-parse HEAD:foo) actual ||
+   grep $(git rev-parse HEAD:foo-plus) actual
+   }
+'
+
 test_done
-- 
1.8.5.1.399.g900e7cd
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] t0000 cleanups

2013-12-28 Thread Jeff King
When I want to debug a failing test, I often end up doing:

  cd t
  ./t4107-tab -v -i
  cd tratab

The test names are long, so tab-completing on the trash directory is
very helpful. Lately I've noticed that there are a bunch of crufty trash
directories in my t/ directory, which makes my tab-completion more
annoying.

It turns out they're leftovers from t running, due to a bad
interaction with some other fixes from last April. The first patch fixes
that.

The second patch is a follow-on cleanup enabled by the first.

The third is an unrelated cleanup that I noticed when I ran t 47
times in a row. :)

  [1/3]: t: set TEST_OUTPUT_DIRECTORY for sub-tests
  [2/3]: t: simplify HARNESS_ACTIVE hack
  [3/3]: t: drop known breakage test

 t/t-basic.sh | 19 +++
 t/test-lib.sh|  2 --
 2 files changed, 7 insertions(+), 14 deletions(-)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] t0000: simplify HARNESS_ACTIVE hack

2013-12-28 Thread Jeff King
Commit 517cd55 set HARNESS_ACTIVE unconditionally in
sub-tests, because that value affects the output of
--verbose. t needs stable output from its sub-tests,
and we may or may not be running under a TAP harness.

That commit made the decision to always set the variable,
since it has another useful side effect, which is
suppressing writes to t/test-results by the sub-tests (which
would just pollute the real results).

Since the last commit, though, the sub-tests have their own
test-results directories, so this is no longer an issue. We
can now update a few comments that are no longer accurate
nor necessary.

We can also revisit the choice of HARNESS_ACTIVE. Since we
must choose one value for stability, it's probably saner to
have it off. This means that future patches could test
things like the test-results writing, or the --quiet
option, which is currently ignored when run under a harness.

Signed-off-by: Jeff King p...@peff.net
---
I do not have any plans to write such tests, and I'd be OK if we wanted
to stop this just at fixing up the comments. But it took me a while to
figure out what is going on, and I believe unsetting HARNESS_ACTIVE in
the sub-tests is the choice that is least likely to cause somebody in
the future to have to re-figure it out. :)

 t/t-basic.sh | 14 +-
 t/test-lib.sh|  2 --
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index bc4e3e2..e6c5b63 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -50,11 +50,11 @@ run_sub_test_lib_test () {
shift 2
mkdir $name 
(
-   # Pretend we're a test harness.  This prevents
-   # test-lib from writing the counts to a file that will
-   # later be summarized, showing spurious failed tests
-   HARNESS_ACTIVE=t 
-   export HARNESS_ACTIVE 
+   # Pretend we're not running under a test harness, whether we
+   # are or not. The test-lib output depends on the setting of
+   # this variable, so we need a stable setting under which to run
+   # the sub-test.
+   sane_unset HARNESS_ACTIVE 
cd $name 
cat $name.sh -EOF 
#!$SHELL_PATH
@@ -235,16 +235,13 @@ test_expect_success 'test --verbose' '
grep -v ^Initialized empty test-verbose/out+ test-verbose/out 
check_sub_test_lib_test test-verbose -\EOF
 expecting success: true
-Z
 ok 1 - passing test
 Z
 expecting success: echo foo
 foo
-Z
 ok 2 - test with output
 Z
 expecting success: false
-Z
 not ok 3 - failing test
 # false
 Z
@@ -267,7 +264,6 @@ test_expect_success 'test --verbose-only' '
 Z
 expecting success: echo foo
 foo
-Z
 ok 2 - test with output
 Z
 not ok 3 - failing test
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1cf78d5..1531c24 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -481,8 +481,6 @@ test_at_end_hook_ () {
 test_done () {
GIT_EXIT_OK=t
 
-   # Note: t relies on $HARNESS_ACTIVE disabling the .counts
-   # output file
if test -z $HARNESS_ACTIVE
then
test_results_dir=$TEST_OUTPUT_DIRECTORY/test-results
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] t0000: drop known breakage test

2013-12-28 Thread Jeff King
Having a simulated known breakage test means that the test
suite will always tell us there is a bug to be fixed, even
though it is only simulated.

The right way to test this is in a sub-test, that can also
check that we provide the correct exit status and output.
Fortunately, we already have such a test (added much later
by 5ebf89e).

We could arguably get rid of the simulated success test
immediately above, as well, as it is also redundant with the
tests added in 5ebf89e. However, it does not have the
annoying behavior of the known breakage test. It may also
be easier to debug if the test suite is truly broken, since
it is not a test-within-a-test, as the later tests are.

Signed-off-by: Jeff King p...@peff.net
---
I am not _that_ bothered by the known breakage, but AFAICT there is
zero benefit to keeping this redundant test. But maybe I am missing
something.

 t/t-basic.sh | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index e6c5b63..a2bb63c 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -41,9 +41,6 @@ test_expect_success '.git/objects should have 3 
subdirectories' '
 test_expect_success 'success is reported like this' '
:
 '
-test_expect_failure 'pretend we have a known breakage' '
-   false
-'
 
 run_sub_test_lib_test () {
name=$1 descr=$2 # stdin is the body of the test code
-- 
1.8.5.1.399.g900e7cd
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git:// protocol over SSL/TLS

2013-12-28 Thread Jeff King
On Fri, Dec 27, 2013 at 08:47:54PM +0600, Sergey Sharybin wrote:

  The web server software has nothing to do with HTTP[S] used by Git being
  smart, I think, it just has to be set up properly.
 
 Misunderstood git doc then which says it has to be Apache, currently
 - other CGI servers don't work, last I checked.

Do you happen to remember where you saw that claim? If the manual in
git's Documentation/ directory says that, I'd like to fix it.

I added sample lighttpd config to git help http-backend a while back.
I tested it at the time, but I do not currently run a lighttpd git
server at all.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12] Convert starts_with() to skip_prefix() for option parsing

2013-12-28 Thread Jeff King
On Thu, Dec 26, 2013 at 11:27:10AM -0800, Junio C Hamano wrote:

  I still need consensus on the name here guys, parse_prefix.
  remove_prefix or strip_prefix? If no other opinions i'll go with
  strip_prefix (Jeff's comment before parse_prefix() also uses strip)
 
 Yup, that comment is where I took strip from.  When you name your
 thing as X, using too generic a word X, and then need to explain
 what X does using a bit more specific word Y, you are often
 better off naming it after Y.

FWIW, the reason I shied away from strip is that I did not want to
imply that the function mutates the string. But since nobody else seems
concerned with that, I think strip is fine.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] t0000: set TEST_OUTPUT_DIRECTORY for sub-tests

2013-12-28 Thread Jeff King
On Sat, Dec 28, 2013 at 02:13:13PM -0800, Jonathan Nieder wrote:

 So the idea if I am reading correctly is Instead of relying on the
 implicit output directory chosen with chdir, which doesn't even work
 any more, set TEST_OUTPUT_DIRECTORY to decide where output for the
 sub-tests used by t's sanity checks for the test harness go.

Right.

 I'm not sure I completely understand the regression caused by 38b074d.
 Is the idea that before that commit, TEST_OUTPUT_DIRECTORY was only
 used for the test-results/ directory so the only harm done was some
 mixing of test results?

$TEST_OUTPUT_DIRECTORY was actually used in $TRASH_DIRECTORY, but some
code paths properly used $TRASH_DIRECTORY, and some used another
variable that (sometimes) contained a relative form of $TRASH_DIRECTORY.
The creation of the repo was one such code-path.  So there were already
potentially problems before 38b074d (any sub-test looking at its
$TRASH_DIRECTORY variable would find the wrong path), but I do not know
offhand if it could trigger any bugs.

Post-38b074d, we consistently use $TRASH_DIRECTORY (and therefore
respect $TEST_OUTPUT_DIRECTORY) everywhere.

 What is the symptom this patch alleviates?
 
  As a result, t's sub-tests are now created in git's
  original test output directory rather than in our trash
  directory.
 
 This might be the source of my confusion.  Is sub-tests an
 abbreviation for sub-test trash directories here?

Yes, I should have said sub-test trash directories. And I think that
answers your what is the symptom question.

  We could fix this by passing a new --root=$TRASH_DIRECTORY
  option to the sub-test. However, we do not want the sub-tests
  to write anything at all to git's directory (e.g., they
  should not be writing to t/test-results, either, although
  this is already handled by separate code).
 
 Ah, HARNESS_ACTIVE prevents output of test-results.

Yes. My original notion was Oh, and this fixes broken test-results,
too!. But then I noticed that it is already handled in a different way.
:)

 Does the git test harness write something else to
 TEST_OUTPUT_DIRECTORY?  Is the idea that using --root would be
 functionally equivalent but (1) more confusing and (2) less
 futureproof?

Exactly. I do not think TEST_OUTPUT_DIRECTORY is used for anything else,
but if someone were to ever add a new use, the sub-tests would almost
certainly want that use to affect only the t trash directory.

 So, to sum up: if I understand correctly

You answered these yourself in your follow-up. :)

 So the patch itself looks right.  I think describing the symptoms up
 front would probably be enough to make the commit message less
 confusing to read.

Would adding the missing trash directories wording above be
sufficient?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] t0000: drop known breakage test

2013-12-28 Thread Jeff King
On Sat, Dec 28, 2013 at 12:51:04PM -0800, Jonathan Nieder wrote:

 Jeff King wrote:
 
  I am not _that_ bothered by the known breakage, but AFAICT there is
  zero benefit to keeping this redundant test.
 
 Devil's advocate: it ensures that anyone wrapping git's tests (like
 the old smoketest infrastructure experiment) is able to handle an
 expected failure.

Thanks. One of the things I love about open source is that as soon as I
say I can't see how..., the answer is crowd-sourced for me. :)

That being said, even if the test has a non-zero possible value...

 But in practice I don't mind the behavior before or after this patch.
 If the test harness is that broken, we'll know.  And people writing
 code that wraps git's tests can write their own custom sanity-checks.

...I think for these reasons that the value is smaller than the
disruption caused by the test, and the patch is a net win.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: Runaway git remote if group definition contains a remote by the same name

2013-12-28 Thread Jeff King
On Sat, Dec 28, 2013 at 03:56:55PM +0100, Alex Riesen wrote:

 it is also a way to create a fork bomb out of the innocent tool on platforms
 where pressing Ctrl-C does not terminate subprocesses of the foreground
 process (like, of course, Windows).
 
 To reproduce, run
 
git -c remotes.origin='origin other' remote update origin

Hmm. This is a pretty straightforward reference cycle. We expand origin
to contain itself, so it recurses forever on expansion. As with most
such problems, the cycle path may be longer than one:

  git -c remotes.foo='bar baz' -c remotes.bar='foo baz' fetch foo

Detecting the cycle can be done by keeping track of which names we've
seen, or just by putting in a depth limit that no sane setup would hit.
In either case, it's complicated slightly by the fact that we pass the
expanded list to a sub-process (which then recurses on the expansion).
So we'd have to communicate the depth (or the list of seen remotes) via
the command line or the environment.

One alternative would be to have the parent git fetch recursively
expand the list itself down to scalar entries, and then invoke
sub-processes on the result (and tell them not to expand at all). That
would also let us cull duplicates if a remote is found via multiple
groups.

Interestingly, the problem does not happen with this:

  git -c remotes.foo=foo fetch foo

Fetch sees that foo expands only to a single item and says oh, that
must not be a group. And then treats it like a regular remote, rather
than recursing. So it's not clear to me whether groups are meant to be
recursive or not. They are in some cases:

  # fetch remotes 1-4
  git -c remotes.parent='child1 child2' \
  -c remotes.child1='remote1 remote2' \
  -c remotes.child2='remote3 remote4' \
  fetch parent

but not in others:

  # foo should be an alias for bar, but it's not
  git -c remotes.foo=bar \
  -c remotes.bar='remote1 remote2' \
  fetch foo

If they are not allowed to recurse, the problem is much easier; the
parent fetch simply tells all of the sub-invocations not to expand the
arguments further. However, whether it was planned or not, it has been
this way for a long time. I would not be surprised if somebody is
relying on the recursion to help organize their groups.

So I think the sanest thing is probably:

  1. Teach fetch to expand recursively in a single process, and then
 tell sub-processes (via a new command-line option) not to expand
 any further.

  2. Teach fetch to detect cycles (probably just by a simple depth
 counter).

  3. Teach the group-reading code to detect groups more robustly, so
 that a single-item group like remotes.foo=bar correctly recurses
 to bar.

  4. (Optional) Teach the expansion code from step 1 to cull duplicates,
 so that we do not try to fetch from the same remote twice (e.g., if
 it is mentioned as part of two groups, and both are specified on
 the command line).

I do not plan to work on this myself in the immediate future, but
perhaps it is an interesting low-hanging fruit for somebody else.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: Runaway git remote if group definition contains a remote by the same name

2013-12-31 Thread Jeff King
On Mon, Dec 30, 2013 at 11:10:31AM -0800, Junio C Hamano wrote:

  So I think the sanest thing is probably:
 
1. Teach fetch to expand recursively in a single process, and then
   tell sub-processes (via a new command-line option) not to expand
   any further.
 
2. Teach fetch to detect cycles (probably just by a simple depth
   counter).
 
 I suspect that the expansion code will just accumulate remotes found
 into a string-list (as part of 4. below), so deduping would be
 fairly easily done without a depth counter.

I don't think that will work (at least not naively). The end-product of
step 1, and the string_list that is de-duped in step 4, is a list of the
concrete remotes. The cycles occur between groups, which are not
mentioned in the final list.

You can keep a separate list of the groups we visit, of course, but we
do not otherwise need it.

One thing that does make such a list easier is that we do not need to
care about order. E.g., in config like this:

  [remotes]
  a = c
  b = c
  c = d e

you can mark c as seen after visiting it via a. It is not
technically a cycle, but since we would want to suppress duplicates
anyway, we can be overly broad.

3. Teach the group-reading code to detect groups more robustly, so
   that a single-item group like remotes.foo=bar correctly recurses
   to bar.
 
 A single-item remote group is somewhat annoying, but expanding it
 only at some places while ignoring it at other places is even more
 annoying, so this sounds like a right thing to do.

The only configuration that I think would be negatively affected is
something like:

  [remote]
  foo = foo
  [remote foo]
  url = ...

that silently works now, but would become broken (because we would
complain about the cycle). I think that's OK; that config is clearly
stupid and broken. If it were remote.foo = foo bar, trying to expand
the concrete foo and bar, that might make some sense, but then it is
already broken in the current code (that is the example that started the
discussion).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] t0000 cleanups

2013-12-31 Thread Jeff King
On Mon, Dec 30, 2013 at 10:51:25AM -0800, Jonathan Nieder wrote:

 I think it can be better, since the commit message left me scratching
 my head while the patch itself seems pretty simple.  How about
 something like the following?

I am fine with that format, though...

 Analysis and fix:
 
   These scratch areas for sub-tests should be under the t
   trash directory, but because the TEST_OUTPUT_DIRECTORY
   setting from the toplevel test leaks into the environment
   they are created under the toplevel output directory (typically
   t/) instead.  Because some of the sub-tests simulate failures,
   their trash directories are kept around.

This is not exactly true. The TEST_OUTPUT_DIRECTORY setting does not
leak. t sets $TEST_DIRECTORY (which it must, so the sub-scripts can
find test-lib.sh and friends), and then TEST_OUTPUT_DIRECTORY uses that
as a default if it is not explicitly set.

The rest of your rewrite looks correct.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drop unnecessary copying in credential_ask_one

2014-01-01 Thread Jeff King
On Thu, Jan 02, 2014 at 09:06:33AM +0800, Tay Ray Chuan wrote:

 We were leaking memory in there, as after obtaining a string from
 git_getpass, we returned a copy of it, yet no one else held the original
 string, apart from credential_ask_one.

I don't think this change is correct by itself.

credential_ask_one calls git_prompt. That function in turn calls
git_terminal_prompt, which returns a pointer to a static buffer (because
it may be backed by the system getpass() implementation).

So there is no leak there, and dropping the strdup would be bad (the
call to ask for the password would overwrite the value we got for the
username).

However, git_prompt may also call do_askpass if GIT_ASKPASS is set, and
here there is a leak, as we duplicate the buffer.  To stop the leak, we
need to first harmonize the do_askpass and git_terminal_prompt code
paths to either both allocate, or both return a static buffer (and then
either strdup or not in the caller, depending on which way we go).

It looks like what I originally wrote was correct, as both code paths
matched.  But then I stupidly broke it with 31b49d9, which failed to
notice the static specifier on the strbuf in do_askpass, and started
using strbuf_detach.

I think this is the simplest fix:

-- 8 --
Subject: Revert prompt: clean up strbuf usage

This reverts commit 31b49d9b653803e7c7fd18b21c8bdd86e3421668.

That commit taught do_askpass to hand ownership of our
buffer back to the caller rather than simply return a
pointer into our internal strbuf.  What it failed to notice,
though, was that our internal strbuf is static, because we
are trying to emulate the getpass() interface.

By handing off ownership, we created a memory leak that
cannot be solved. Sometimes git_prompt returns a static
buffer from getpass() (or our smarter git_terminal_prompt
wrapper), and sometimes it returns an allocated string from
do_askpass.

Signed-off-by: Jeff King p...@peff.net
---
 prompt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/prompt.c b/prompt.c
index d851807..d7bb17c 100644
--- a/prompt.c
+++ b/prompt.c
@@ -22,6 +22,7 @@ static char *do_askpass(const char *cmd, const char *prompt)
if (start_command(pass))
return NULL;
 
+   strbuf_reset(buffer);
if (strbuf_read(buffer, pass.out, 20)  0)
err = 1;
 
@@ -38,7 +39,7 @@ static char *do_askpass(const char *cmd, const char *prompt)
 
strbuf_setlen(buffer, strcspn(buffer.buf, \r\n));
 
-   return strbuf_detach(buffer, NULL);
+   return buffer.buf;
 }
 
 char *git_prompt(const char *prompt, int flags)
-- 
1.8.5.2.434.g63b1477






 
 Signed-off-by: Tay Ray Chuan rcta...@gmail.com
 ---
  credential.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/credential.c b/credential.c
 index 86397f3..0d02ad8 100644
 --- a/credential.c
 +++ b/credential.c
 @@ -54,7 +54,7 @@ static char *credential_ask_one(const char *what, struct 
 credential *c)
  
   strbuf_release(desc);
   strbuf_release(prompt);
 - return xstrdup(r);
 + return r;
  }
  
  static void credential_getpass(struct credential *c)
 -- 
 1.8.5-rc2
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drop unnecessary copying in credential_ask_one

2014-01-01 Thread Jeff King
On Wed, Jan 01, 2014 at 10:03:30PM -0500, Jeff King wrote:

 On Thu, Jan 02, 2014 at 09:06:33AM +0800, Tay Ray Chuan wrote:
 
  We were leaking memory in there, as after obtaining a string from
  git_getpass, we returned a copy of it, yet no one else held the original
  string, apart from credential_ask_one.
 
 I don't think this change is correct by itself.
 
 credential_ask_one calls git_prompt. That function in turn calls
 git_terminal_prompt, which returns a pointer to a static buffer (because
 it may be backed by the system getpass() implementation).
 
 So there is no leak there, and dropping the strdup would be bad (the
 call to ask for the password would overwrite the value we got for the
 username).

By the way, you can see the breakage from your patch pretty easily by
testing the terminal input. Disable any credential helper config you
have, and then run:

  GIT_CURL_VERBOSE=1 \
  git ls-remote https://github.com/peff/ask-for-auth 21 |
  perl -lne '/Authorization: Basic (.*)/ and print $1' |
  openssl base64 -d

enter myuser and mypass respectively on the terminal. The result is
that we send mypass:mypass to the server. And then double-free the
result, which cases glibc to barf.

I wondered why we did not see this breakage in test suite. My assumption
was that it was simply because our test user has the same username and
password. So I fixed that, but to my surprise we still did not detect
the problem. The issue is that your patch does the right thing when
GIT_ASKPASS is in use, and breaks only when the user types into the
terminal. But the test suite, of course, always uses askpass because it
cannot rely on accessing a terminal (we'd have to do some magic with
lib-terminal, I think).

So it doesn't detect the problem in your patch, but I wonder if it is
worth applying the patch below anyway, as it makes the test suite
slightly more robust.

-- 8 --
Subject: use distinct username/password for http auth tests

The httpd server we set up to test git's http client code
knows about a single account, in which both the username and
password are user@host (the unusual use of the @ here is
to verify that we handle the character correctly when URL
escaped).

This means that we may miss a certain class of errors in
which the username and password are mixed up internally by
git. We can make our tests more robust by having distinct
values for the username and password.

In addition to tweaking the server passwd file and the
client URL, we must teach the askpass harness to accept
multiple values. As a bonus, this makes the setup of some
tests more obvious; when we are expecting git to ask
only about the password, we can seed the username askpass
response with a bogus value.

Signed-off-by: Jeff King p...@peff.net
---
 t/lib-httpd.sh| 15 ---
 t/lib-httpd/passwd|  2 +-
 t/t5540-http-push.sh  |  4 ++--
 t/t5541-http-push.sh  |  6 +++---
 t/t5550-http-fetch.sh | 10 +-
 t/t5551-http-fetch.sh |  6 +++---
 6 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index c470784..bfdff2a 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -129,7 +129,7 @@ prepare_httpd() {
HTTPD_DEST=127.0.0.1:$LIB_HTTPD_PORT
HTTPD_URL=$HTTPD_PROTO://$HTTPD_DEST
HTTPD_URL_USER=$HTTPD_PROTO://user%40host@$HTTPD_DEST
-   HTTPD_URL_USER_PASS=$HTTPD_PROTO://user%40host:user%40host@$HTTPD_DEST
+   HTTPD_URL_USER_PASS=$HTTPD_PROTO://user%40host:pass%40host@$HTTPD_DEST
 
if test -n $LIB_HTTPD_DAV -o -n $LIB_HTTPD_SVN
then
@@ -217,7 +217,15 @@ setup_askpass_helper() {
test_expect_success 'setup askpass helper' '
write_script $TRASH_DIRECTORY/askpass -\EOF 
echo $TRASH_DIRECTORY/askpass-query askpass: $* 
-   cat $TRASH_DIRECTORY/askpass-response
+   case $* in
+   *Username*)
+   what=user
+   ;;
+   *Password*)
+   what=pass
+   ;;
+   esac 
+   cat $TRASH_DIRECTORY/askpass-$what
EOF
GIT_ASKPASS=$TRASH_DIRECTORY/askpass 
export GIT_ASKPASS 
@@ -227,7 +235,8 @@ setup_askpass_helper() {
 
 set_askpass() {
$TRASH_DIRECTORY/askpass-query 
-   echo $* $TRASH_DIRECTORY/askpass-response
+   echo $1 $TRASH_DIRECTORY/askpass-user 
+   echo $2 $TRASH_DIRECTORY/askpass-pass
 }
 
 expect_askpass() {
diff --git a/t/lib-httpd/passwd b/t/lib-httpd/passwd
index f2fbcad..99a34d6 100644
--- a/t/lib-httpd/passwd
+++ b/t/lib-httpd/passwd
@@ -1 +1 @@
-user@host:nKpa8pZUHx/ic
+user@host:xb4E8pqD81KQs
diff --git a/t/t5540-http-push.sh b/t/t5540-http-push.sh
index 01d0d95..5b0198c 100755
--- a/t/t5540-http-push.sh
+++ b/t/t5540-http-push.sh
@@ -154,7 +154,7 @@ test_http_push_nonff 
$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git \
 
 test_expect_success 'push to password-protected repository (user

Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-03 Thread Jeff King
On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote:

  - builtin/merge.c is the same, but it is conceptually even worse.
It has the end-user supplied string and wants to see if it is a
valid strategy.  If the user wants to use a custom strategy, a
single stat() to make sure if it exists should suffice, and the
error codepath should load the command list to present the names
of available ones in the error message.

Is it a single stat()? I think we would need to check each element of
$PATH. Though in practice, the exec-dir would be the first thing we
check, and where we would find the majority of hits. So it would still
be a win, as we would avoid touching anything but the exec-dir in the
common case.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Sun, Dec 22, 2013 at 09:55:23PM +, Ben Maurer wrote:

 One issue with this approach is that it seems git-pack-index doesn't
 perform as well with thin packs. git-index-pack uses a multi-threaded
 approach to resolving the deltas. However, the multithreading only
 works on deltas that are exclusively in the pack. After the
 multi-threaded phase, it incrementally brings in objects from external
 packs, but in single threaded manner. Many objects in the pack have
 some dependency on an external object, therefore, defeating the
 multithreading.

Yes. It will also just plain perform worse, because it will have to
copy over more external objects. This is somewhat made up for getting an
actual smaller pack size, but I suspect the completed thin-pack ends up
larger than what the server would otherwise send. Because the server is
blindly reusing on-disk deltas (which is good, because it takes load off
of the server), it misses good delta opportunities between objects in
the sent pack (which are likely almost as small, but would not require
fixing on the other end).

Single-threading the extra work we have to do just exacerbates the
problem, of course.

Still, I think it will be a net win for end-to-end wall clock time of
the operation. You are saving CPU time on the server end, and you're
saving network bandwidth with a smaller pack.

In my tests on torvalds/linux, doing a fetch across a local machine (so
basically discounting network improvements), the times look like (this
is end-to-end, counting both server and client CPU time):

  [vanilla]
  real0m3.850s
  user0m7.504s
  sys 0m0.380s

  [patched]
  real0m2.785s
  user0m2.472s
  sys 0m0.180s

So it was a win both for wall-clock and CPU.

 What's the use case for a pack file with a SHA1 reference living
 inside the pack file (why not just use an offset?) Would it make sense
 to assume that all such files are external and bring them in in the
 first phase.

Once upon a time, ref-delta was the only format supported by packfiles. Later,
delta-base-offset was invented, and the client and server negotiate the
use of the feature before the packfile is generated (and even when we
reuse objects, pack-objects will rewrite the header on the fly to use
ref-delta if necessary).

These days, pretty much everybody supports delta-base-offset, so I don't
think there is any reason index-pack should see ref-delta for a non-thin
object. We could probably teach index-pack an --assume-refs-are-thin
option to optimize for this case, and have fetch-pack/receive-pack pass
it whenever they know that delta-base-offset was negotiated.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Sun, Dec 22, 2013 at 11:47:34AM -0800, Ben Maurer wrote:

 Jeff King's bitmap branch appears to give a very substantial speedup.
 After applying this branch, the counting objects phase is basically
 free. However, I found that the compression phase still takes a
 non-trivial amount of time.

Sorry for the slow reply; I've been on vacation.

First off, I'm excited that you're looking into using bitmaps. We've
been using them for a while at GitHub, but more testing is definitely
appreciated. :)

When you build your bitmaps, do you set the pack.writeBitmapHashCache
option? We found that it makes a significant difference during the
compression phase, as otherwise git attempts deltas between random files
based on size. Here are some numbers for a simulated fetch from
torvalds/linux, representing about 7 weeks of history. Running:

  from=2d3c627502f2a9b0a7de06a5a2df2365542a72c9
  to=f0a679afefc0b6288310f88606b4bb1f243f1aa9
  run() {
(echo $to  echo ^$from) |
git pack-objects --stdout --all-progress --revs /dev/null
  }

  echo == no hash cache
  git repack -adb 2/dev/null
  time run

  echo == with hash cache
  git -c pack.writebitmaphashcache=1 repack -adb 2/dev/null
  time run

produces:

  == no hash cache
  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (7700/7700), done.
  Writing objects: 100% (20661/20661), 23.23 MiB | 11.13 MiB/s, done.
  Total 20661 (delta 13884), reused 16638 (delta 12940)

  real0m3.626s
  user0m10.760s
  sys 0m0.060s

  == with hash cache
  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (7700/7700), done.
  Writing objects: 100% (20661/20661), 22.64 MiB | 10.82 MiB/s, done.
  Total 20661 (delta 14038), reused 16638 (delta 12940)

  real0m3.072s
  user0m6.168s
  sys 0m0.100s

So our resulting pack shrinks a little because we find better deltas,
but note that we save a fair bit of CPU time (the wall clock time ends
up not all that different, because the single-threaded writing phase
represents a big chunk of that).

 It looks like most of the time spent compressing objects was in cases
 where the object was already compressed in the packfile, but the delta
 was based on an object that the client already had. For some reason,
 --thin wasn't enabling reuse of these deltas.

I'm not too surprised. The long-time strategy for a fetch has been to
walk down the haves and wants to their merge base. That boundary
commit is marked as a preferred base, meaning we won't send it, but
it's a good base for other objects, since we know the client has it.

Technically _all_ of the history reachable from that merge base could be
marked as a preferred base, but we don't do so for efficiency reasons:

  1. It's expensive to walk the full object graph for a small fetch, and

  2. You would clog the delta-search algorithm if you had a very large
 number of preferred-base objects.

With bitmaps, though, the history walk is free (we just check each
object against our have bitmap), so (1) is a non-issue. For (2), we
probably don't want to stick each object into the preferred-base list,
but we do want to reuse on-disk deltas we have, if we know the other
side has the base.

I don't know if you went through the same line of thinking, but that
matches your proposed solution. :)

 This is a hacky, poorly styled attempt to figure how how much better
 performance could be. With the bitmap branch, git should know what
 objects the client has already and can easily test if an existing delta
 can be reused. I don't know the branch well enough to code this, so
 as a hack, I just assumed the client has any delta base that is in
 the pack file (for our repo, this is always true, because we have a
 linear history)

Even without a linear history, it mostly works. If you are fetching all
of the branches from the other side, then you will end up with all of
the objects that the remote has. Which means that either you already
have the base, or the remote is about to send it to you.

It will break down, though, whenever the other side has something you're
not fetching. For that you really need to do the have bitmap check.

 This greatly reduces the time:
 
 $ { echo HEAD  echo ^$have; } | time ../git-bitmap/install/bin/git 
 pack-objects --use-bitmap-index --revs --stdout --thin  /dev/null
 Counting objects: 220909, done.
 Compressing objects: 100% (14203/14203), done.
 Total 220909 (delta 194050), reused 220909 (delta 199885)
 3.57user 1.28system 0:04.59elapsed 105%CPU (0avgtext+0avgdata 
 2007296maxresident)k
 0inputs+0outputs (0major+416243minor)pagefaults 0swaps

You might try with --all-progress (or pipe to wc -c), as this should
be reducing the output size, too.

Here's my same torvalds/linux test, run with the patch I'm including
below:

  Counting objects: 20661, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (3677/3677), done.
  Writing 

Re: Bug report: stash in upstream caused remote fetch to fail

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 08:16:31AM -0800, Junio C Hamano wrote:

  I was going to ask you to send your repository, but I can easily
  reproduce here. I guess people don't run into it because it's uncommon
  to fetch the whole refs/ namespace from a non-bare repo (and bare repos
  do not tend to have stashes). Here's a minimal reproduction recipe:
 
git init repo 
cd repo 
echo content foo 
git add . 
git commit -m foo 
echo more foo 
git stash 
git init --bare sub 
cd sub 
git fetch .. 'refs/*:refs/*'
 
  It looks like we are not feeding refs/stash properly to pack-objects.
  I'll try to take a closer look later today.
 
 I looked at this in the past and I vaguely recall that we reject it
 in the for-each-ref loop with check-ref-format saying eh, that is a
 single-level name.
 
 At that point I stopped digging, thinking it was a feature ;-)
 based on your exact observation about stash vs bare/non-bare.

I am fine with rejecting it with a warning, but we should not then
complain that the other side did not send us the object, since we should
not be asking for it at all. I also do not see us complaining about the
funny ref anywhere.  So there is definitely _a_ bug here. :)

I think somebody else mentioned recently that we do not handle malformed
refs consistently. I think it was:

  http://article.gmane.org/gmane.comp.version-control.git/239381

which might or might not be related.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 08:37:49AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  We could probably teach index-pack an --assume-refs-are-thin
  option to optimize for this case, and have fetch-pack/receive-pack pass
  it whenever they know that delta-base-offset was negotiated.
 
 I thought the existing negotiation merely means I understand offset
 encoded bases, so you are allowed to use that encoding, not I will
 not accept encoding other than the offset format, so you must use
 that encoding for everything.

You are right about what it means. But this is an optimization, not a
correctness thing. So if we assume that senders who are allowed to send
offsets will generally do so, it might be a reasonable optimization to
guess that ref-delta objects will need thin completion. If we are wrong,
the worst case is that we add an extra local object to the end of the
pack. So as long as we are right most of the time, it may still be a
win.

Of course, it may also be possible to simply multi-thread the
thin-completion portion of index-pack. That would be even better, though
I am not sure how it would work. The resolution of an object in one
thread can always become the input for another thread. But maybe we
could have each thread come up with a proposed set of objects to add to
the pack, and then drop duplicates or something. I haven't looked
closely.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 12:06:51PM -0800, Junio C Hamano wrote:

 Unless you set @{u} to this new configuration, in which case the
 choice becomes dynamic depending on the current branch, but
 
  - if that is the only sane choice based on the current branch, why
not use that as the default without having to set the
configuration?
 
  - Or if that is still insufficient, don't we need branch.*.forkedFrom
that is different from branch.*.merge, so that different branches
you want to show format-patch output can have different
reference points?

Yeah, I had similar thoughts. I personally use branch.*.merge as
forkedFrom, and it seems like we are going that way anyway with things
like git rebase and git merge defaulting to upstream. But then there
is git push -u and push.default = upstream, which treats the
upstream config as something else entirely.

So it seems like there is already some confusion, and either way we go,
thisis making it worse to some degree (I do not blame Ram, but rather he
has stumbled into a hidden sand pit that we have been building for the
past few years... :).

I wonder if it is too late to try to clarify this dual usage. It kind of
seems like the push config is this is the place I publish to. Which,
in many workflows, just so happens to be the exact same as the place you
forked from. Could we introduce a new branch.*.pushupstream variable
that falls back to branch.*.merge? Or is that just throwing more fuel on
the fire (more sand in the pit in my analogy, I guess).

I admit I haven't thought it through yet, though. And even if it does
work, it may throw a slight monkey wrench in the proposed push.default
transition.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 09:57:23AM -0500, Jeff King wrote:

 diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
 index c733379..0cff874 100644
 --- a/builtin/pack-objects.c
 +++ b/builtin/pack-objects.c
 @@ -1402,6 +1402,19 @@ static void check_object(struct object_entry *entry)
   base_entry-delta_child = entry;
   unuse_pack(w_curs);
   return;
 + } else if(base_ref  bitmap_have(base_ref)) {
 + entry-type = entry-in_pack_type;
 + entry-delta_size = entry-size;
 + /*
 +  * XXX we'll leak this, but it's probably OK
 +  * since we'll exit immediately after the packing
 +  * is done
 +  */
 + entry-delta = xcalloc(1, sizeof(*entry-delta));
 + hashcpy(entry-delta-idx.sha1, base_ref);
 + entry-delta-preferred_base = 1;
 + unuse_pack(w_curs);
 + return;
   }

Just reading over this again, the conditional here should obviously be
checking thin (which needs to become a global, as in your patch).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 03:29:57PM -0500, John Szakmeister wrote:

  Yeah, I had similar thoughts. I personally use branch.*.merge as
  forkedFrom, and it seems like we are going that way anyway with things
  like git rebase and git merge defaulting to upstream. But then there
  is git push -u and push.default = upstream, which treats the
  upstream config as something else entirely.
 
 Just for more reference, I rarely use branch.*.merge as
 forkedFrom.  I typically want to use master as my target, but like
 Ram, I publish my changes elsewhere for safe keeping.  I think in a
 typical, feature branch-based workflow @{u} would be nearly useless.

In my feature-branch development for git.git, my upstream is almost
always origin/master[1]. However, sometimes feature branches have
dependencies[2] on each other. Representing that via the upstream
field makes sense, since that is what you forked from, and what you
would want git rebase to start from.

-Peff

[1] I do not even have a local master branch for git.git work, as it
would just be a pain to keep up to date. I am either working
directly on a topic branch, or I am integrating in my own personal
branch.

[2] You should try to minimize dependencies between feature branches, of
course, but sometimes they simply form a logical progression of
features.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 12:38:30PM -0800, Junio C Hamano wrote:

  I wonder if it is too late to try to clarify this dual usage. It kind of
  seems like the push config is this is the place I publish to. Which,
  in many workflows, just so happens to be the exact same as the place you
  forked from. Could we introduce a new branch.*.pushupstream variable
  that falls back to branch.*.merge? Or is that just throwing more fuel on
  the fire (more sand in the pit in my analogy, I guess).
 
  I admit I haven't thought it through yet, though. And even if it does
  work, it may throw a slight monkey wrench in the proposed push.default
  transition.
 
 Yeah, when I say upstream, I never mean it as where I publish.
 Your upstream is where you get others' work from.

That's my thinking, as well, but it means the upstream push.default is
nonsensical. I've thought that all along, but it seems like other people
find it useful. I guess because they are in a non-triangular,
non-feature-branch setup (I suppose you could think of a central-repo
feature-branch workflow as a special form of triangular setup, where
the remote is bi-directional, but the branch names are triangular).

If we want to declare push -u and push.default=upstream as tools for
certain simple bi-directional workflows, that makes sense. But I suspect
it may cause extra confusion when people make the jump to using a
triangular workflow.

 For a push to somewhere for safekeeping or other people to look at
 triangular workflow, it does not make any sense to treat that I
 publish there place as an upstream (hence having branch.*.remote
 pointing at that publishing point).

You _might_ treat it the same way we treat the upstream, in some special
cases. For example, when you say git status, it is useful to see how
your topic and the upstream have progressed (i.e., do I need to pull
from upstream?). But you may _also_ want to know how your local branch
differs from its pushed counterpart (i.e., do I have unsaved commits
here that I want to push up?).

So having two config options might help with that. Of course, your push
upstream (or whatever you want to call it) does not logically have one
value. You may push to several places, and would want to compare to
each.

 Once you stop doing that, and
 instead using branch.*.remote = origin, and branch.*.merge = master,
 where 'origin' is not your publishing point, @{u} will again start
 making sense, I think.
 
 And I thought that is what setting remote.pushdefault to the
 publishing point repository was about.

If that were sufficient, then we would just need push.default =
current, and not upstream (nor simple). I lobbied for that during
the discussion, but people seemed to think that upstream was
better/more useful. Maybe it was just because remote.pushdefault did not
exist then.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 09:15:04PM +, Ben Maurer wrote:

  Let me know how this patch does for you. My testing has been fairly
  limited so far.
 
 This patch looks like a much cleaner version :-). Works well for me,
 but my test setup may not be great since I didn't get any errors from
 completely ignoring the haves bitmap :-).

Great. Out of curiosity, can you show the before/after? The timings
should be similar to what your patch produced, but I'm really curious to
see how the pack size changes.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report: stash in upstream caused remote fetch to fail

2014-01-06 Thread Jeff King
On Mon, Jan 06, 2014 at 12:17:21PM -0800, Junio C Hamano wrote:

  I am fine with rejecting it with a warning, but we should not then
  complain that the other side did not send us the object, since we should
  not be asking for it at all. I also do not see us complaining about the
  funny ref anywhere.  So there is definitely _a_ bug here. :)
 
 Oh, no question about that.  I was just pointing somebody who
 already has volunteered to take a look in a direction I recall was
 where the issue was ;-)

Oh, crud, did I volunteer? :)

So I found the problem, but I'm really not sure what to make of it. We
do check the refname format when evaluating the refspecs, in:

do_fetch
  get_ref_map
get_fetch_map
  check_refname_format

Before calling it, we check that it starts with refs/, and then pass
the _whole_ refname into check_refname_format. So the latter sees
refs/stash. And that's acceptable, as it's not a single-level ref.
Thus we do not get the funny ref message.

The code looks like this:

  if (!starts_with((*rmp)-peer_ref-name, refs/) ||
  check_refname_format((*rmp)-peer_ref-name, 0)) {
/* print funny ref and ignore */

Then we ask fetch_refs_via_pack to get the actual objects for us. And
it checks our refs again, with this call chain:

  do_fetch
fetch_refs
  transport_fetch_refs
fetch_refs_via_pack
  fetch_pack
do_fetch_pack
  everything_local
filter_refs
  check_refname_format

Here, the code looks like this:

  if (!memcmp(ref-name, refs/, 5) 
  check_refname_format(ref-name + 5, 0))
; /* trash */

At first I thought we are doing the same check (is it syntactically
valid, and is it in refs/), but we're not. We are actually checking
the format _only_ of stuff in refs/, and ignoring the rest. Which
really makes no sense to me.

If it were memcmp(...) || check_refname_format(), then it would be
roughly the same check. But it would still be wrong, because note that
we pass only the bits under refs/ to check_refname_format. So it sees
only stash, and then complains that it is single-level.

So the symptom we are seeing is because we are filtering with two
different rulesets in different places. But I'm really not sure how to
resolve it. The one in filter_refs seems nonsensical to me.

Checking _only_ things under refs/ doesn't make sense. And even if that
was sensible, feeding half a ref to check_refname_format does not work.
In addition to the single-level check, it has other rules that want
to see the whole ref (e.g., the ref @ is not valid, but refs/@ is
OK; it cannot distinguish them without seeing the prefix).

So I can see two options:

  1. Make the check in filter_refs look like the one in get_fetch_map.
 That at least makes them the same, which alleviates the symptom.
 But we still are running two checks, and if they ever get out of
 sync, it causes problems.

  2. Abolish the check in filter_refs. I think this makes the most sense
 from the perspective of fetch, because we will already have cleaned up
 the ref list. But we might need to leave the filtering in place for
 people who call fetch-pack as a bare plumbing command.

It's really not clear to me what the check in filter_refs was trying to
do. It dates all the way back to 1baaae5 (Make maximal use of the remote
refs, 2005-10-28), but there is not much explanation. I haven't dug into
the list around that time to see if there's any discussion.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false

2014-01-07 Thread Jeff King
On Mon, Jan 06, 2014 at 07:35:04PM -0800, Brodie Rao wrote:

 On Mon, Jan 6, 2014 at 7:32 PM, Brodie Rao bro...@sf.io wrote:
  This change ensures get_sha1_basic() doesn't try to resolve full hashes
  as refs when ambiguous ref warnings are disabled.
 
  This provides a substantial performance improvement when passing many
  hashes to a command (like git rev-list --stdin) when
  core.warnambiguousrefs is false. The check incurs 6 stat()s for every
  hash supplied, which can be costly over NFS.
 
 Forgot to add:
 
 Signed-off-by: Brodie Rao bro...@sf.io

Looks good to me.

I wonder if I should have simply gone this route instead of adding
warn_on_object_refname_ambiguity, and then people who want cat-file
--batch to be fast could just turn off core.warnAmbiguousRefs. I wanted
it to happen automatically, though. Alternatively, I guess cat-file
--batch could just turn off warn_ambiguous_refs itself.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] Making use of bitmaps for thin objects

2014-01-07 Thread Jeff King
On Mon, Jan 06, 2014 at 10:14:30PM +, Ben Maurer wrote:

 It looks like for my repo the size win wasn't as big (~10%). Is it
 possible that with the kernel test you got extremely lucky and there
 was some huge binary blob that thin packing turned into a tiny delta?

I don't think so. When I look at the reused-delta numbers, I see that we
reused ~3000 extra deltas (and the compressing progress meter drops by
the same amount. If I do a full clone (or just index-pack the result),
it claims ~3000 thin objects completed with local objects (versus 0 in
the normal case).

So I think we really are getting a lot of little savings adding up,
which makes sense. If there were thousands of changed files, a non-thin
pack has to have at least _one_ full version of each file. With thin
packs, we might literally have only deltas.

It was a 7-week period, which might make more difference. I'm going to
run some experiments with different time periods to see if that changes
anything.

It might also be the repo contents. I'm going to try my experiments on a
few different repositories. It may be that either the kernel or your
repo is unusual in some way.

Or maybe I was just lucky. :)

 When you get a chance, it'd be handy if you could push an updated
 version of your change out to your public github repo. I'd like to see
 if folks here are interested in testing this more, and it'd be good to
 make sure we're testing the diff that is targeted for upstream.

I've pushed it to:

  git://github.com/peff/git.git jk/bitmap-reuse-delta

I'll continue to rebase it forward as time goes on (until a cleaned-up
version gets merged upstream), but the tip of that branch should always
be in a working state.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drop unnecessary copying in credential_ask_one

2014-01-07 Thread Jeff King
On Thu, Jan 02, 2014 at 11:08:51AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  ... But the test suite, of course, always uses askpass because it
  cannot rely on accessing a terminal (we'd have to do some magic with
  lib-terminal, I think).
 
  So it doesn't detect the problem in your patch, but I wonder if it is
  worth applying the patch below anyway, as it makes the test suite
  slightly more robust.
 
 Sounds like a good first step in the right direction.  Thanks.

I took a brief look at adding real terminal tests for the credential
code using our test-terminal/lib-terminal.sh setup. Unfortunately, it
falls short of what we need.

test-terminal only handles stdout and stderr streams as fake terminals.
We could pretty easily add stdin for input, as it uses fork() to work
asynchronously.  But the credential code does not actually read from
stdin. It opens and reads from /dev/tty explicitly. So I think we'd have
to actually fake setting up a controlling terminal. And that means magic
with setsid() and ioctl(TIOCSCTTY), which in turn sounds like a
portability headache.

So it's definitely possible under Linux, and probably under most Unixes.
But I'm not sure it's worth the effort, given that review already caught
the potential bug here.

Another option would be to instrument git_terminal_prompt with a
mock-terminal interface (say, reading from a file specified in an
environment variable). But I really hate polluting the code with test
cruft, and it would not actually be testing an interesting segment of
the code, anyway.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 09:51:07AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Alternatively, I guess cat-file
  --batch could just turn off warn_ambiguous_refs itself.
 
 Sounds like a sensible way to go, perhaps on top of this change?

The downside is that we would not warn about ambiguous refs anymore,
even if the user was expecting it to. I don't know if that matters much.
I kind of feel in the --batch situation that it is somewhat useless (I
wonder if rev-list --stdin should turn it off, too).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 11:38:15AM -0800, Junio C Hamano wrote:

   Alternatively, I guess cat-file
   --batch could just turn off warn_ambiguous_refs itself.
  
  Sounds like a sensible way to go, perhaps on top of this change?
 
  The downside is that we would not warn about ambiguous refs anymore,
  even if the user was expecting it to. I don't know if that matters much.
 
 That is true already with or without Brodie's change, isn't it?
 With warn_on_object_refname_ambiguity, cat-file --batch makes us
 ignore core.warnambigousrefs setting.  If we redo 25fba78d
 (cat-file: disable object/refname ambiguity check for batch mode,
 2013-07-12) to unconditionally disable warn_ambiguous_refs in
 cat-file --batch and get rid of warn_on_object_refname_ambiguity,
 the end result would be the same, no?

No, I don't think the end effect is the same (or maybe we are not
talking about the same thing. :) ).

There are two ambiguity situations:

  1. Ambiguous non-fully-qualified refs (e.g., same tag and head name).

  2. 40-hex sha1 object names which might also be unqualified ref names.

Prior to 25ffba78d, cat-file checked both (like all the rest of git).
But checking (2) is very expensive, since otherwise a 40-hex sha1 does
not need to do a ref lookup at all, and something like rev-list
--objects | cat-file --batch-check processes a large number of these.

Detecting (1) is not nearly as expensive. You must already be doing a
ref lookup to trigger it (so the relative cost is much closer), and your
query size is bounded by the number of refs, not the number of objects.

Commit 25ffba78d traded off some safety for a lot of performance by
disabling (2), but left (1) in place because the tradeoff is different.

The two options I was musing over earlier today were (all on top of
Brodie's patch):

  a. Revert 25ffba78d. With Brodie's patch, core.warnAmbiguousRefs
 disables _both_ warnings. So we default to safe-but-slow, but
 people who care about performance can turn off ambiguity warnings.
 The downside is that you have to know to turn it off manually (and
 it's a global config flag, so you end up turning it off
 _everywhere_, not just in big queries where it matters).

  b. Revert 25ffba78d, but then on top of it just turn off
 warn_ambiguous_refs unconditionally in cat-file --batch-check.
 The downside is that we drop the safety from (1). The upside is
 that the code is a little simpler, as we drop the extra flag.

And obviously:

  c. Just leave it at Brodie's patch with nothing else on top.

My thinking in favor of (b) was basically does anybody actually care
about ambiguous refs in this situation anyway?. If they do, then I
think (c) is my preferred choice.

  I kind of feel in the --batch situation that it is somewhat useless (I
  wonder if rev-list --stdin should turn it off, too).
 
 I think doing the same as cat-file --batch in rev-list --stdin
 makes sense.  Both interfaces are designed to grok extended SHA-1s,
 and full 40-hex object names could be ambiguous and we are missing
 the warning for them.

I'm not sure I understand what you are saying. We _do_ have the warning
for rev-list --stdin currently. We do _not_ have the warning for
cat-file --batch, since my 25ffba78d. I was wondering if rev-list
should go the same way as 25ffba78d, for efficiency reasons (e.g., think
piping to rev-list --no-walk --stdin).

 Or are you wondering if we should revert 25fba78d, apply Brodie's
 change to skip the ref resolution whose result is never used, and
 tell people who want to use cat-file --batch (or rev-list
 --stdin) to disable the ambiguity warning themselves?

See above. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drop unnecessary copying in credential_ask_one

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 11:44:00AM -0800, Junio C Hamano wrote:

  test-terminal only handles stdout and stderr streams as fake terminals.
  We could pretty easily add stdin for input, as it uses fork() to work
  asynchronously.  But the credential code does not actually read from
  stdin. It opens and reads from /dev/tty explicitly. So I think we'd have
  to actually fake setting up a controlling terminal. And that means magic
  with setsid() and ioctl(TIOCSCTTY), which in turn sounds like a
  portability headache.
 
 I wonder if expect has already solved that for us.

I would not be surprised if it did. Though it introduces its own
portability issues, since we cannot depend on having it. But it is
probably enough to just

  test_lazy_prereq EXPECT 'expect --version'

or something. I dunno. I have never used expect, do not have it
installed, and am not excited about introducing a new tool dependency.
But if you want to explore it, be my guest.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] format-patch: introduce branch.*.forkedFrom

2014-01-07 Thread Jeff King
On Wed, Jan 08, 2014 at 02:00:44AM +0530, Ramkumar Ramachandra wrote:

 On Wed, Jan 8, 2014 at 1:59 AM, Ramkumar Ramachandra artag...@gmail.com 
 wrote:
  A very common workflow for preparing patches involves working off a
  topic branch and generating patches against 'master' to send off to the
  maintainer. However, a plain
 
$ git format-patch -o outgoing
 
  is a no-op on a topic branch, and the user has to remember to specify
  'master' explicitly everytime. This problem is not unique to
  format-patch; even a
 
$ git rebase -i
 
  is a no-op because the branch to rebase against isn't specified.
 
  To tackle this problem, introduce branch.*.forkedFrom which can specify
  the parent branch of a topic branch. Future patches will build
  functionality around this new configuration variable.
 
  Cc: Jeff King p...@peff.net
  Cc: Junio C Hamano gis...@pobox.com
  Signed-off-by: Ramkumar Ramachandra artag...@gmail.com

I have not carefully read some of the later bits of the discussion from
last night / this morning, so maybe I am missing something, but this
seems backwards to me from what Junio and I were discussing earlier.

The point was that the meaning of @{upstream} (and branch.*.merge)
is _already_ forked-from, and push -u and push.default=upstream
are the odd men out. If we are going to add an option to distinguish the
two branch relationships:

  1. Where you forked from

  2. Where you push to

we should leave @{upstream} as (1), and add a new option to represent
(2). Not the other way around.

Am I missing something?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 03:40:56AM +0530, Ramkumar Ramachandra wrote:

 Jeff King wrote:
  Yeah, I had similar thoughts. I personally use branch.*.merge as
  forkedFrom, and it seems like we are going that way anyway with things
  like git rebase and git merge defaulting to upstream.
 
 My issue with that is that I no idea where my branch is with respect
 to my forked upstream; I find that extremely useful when doing
 re-spins.

Right. I think there are two separate relationships, and they are both
shoe-horned into upstream. The solution is to let them be configured
separately (and fallback on each other as appropriate to make the burden
less on the user).

 push.default = upstream is a bit of a disaster, in my opinion. I've
 advocated push.default = current on multiple occasions, and wrote the
 initial remote.pushDefault series with that configuration in mind.

Yeah, I agree with all of that.

  I wonder if it is too late to try to clarify this dual usage. It kind of
  seems like the push config is this is the place I publish to. Which,
  in many workflows, just so happens to be the exact same as the place you
  forked from. Could we introduce a new branch.*.pushupstream variable
  that falls back to branch.*.merge? Or is that just throwing more fuel on
  the fire (more sand in the pit in my analogy, I guess).
 
 We already have a branch.*.pushremote, and I don't see the value of
 branch.*.pushbranch (what you're referring to as pushupstream, I
 assume) except for Gerrit users.

Yes, pushbranch is probably a better name for what I am referring to.
I agree that pushremote is probably enough for sane cases. I seem to
recall that people advocating the upstream push-default thought that
branch name mapping was a useful feature, but I might be
mis-remembering. I will let those people speak up for the feature if
they see fit; it seems somewhat crazy to me.

 Frankly, I don't use full triangular workflows myself mainly because
 my prompt is compromised: when I have a branch.*.remote different from
 branch.*.pushremote, I'd like to see where my branch is with respect
 to @{u} and @{publish} (not yet invented);

Yes, as two separate relationships, you would theoretically want to be
able to see them separately (or simultaneously side by side). Whether
exposing that in the prompt is too clunky, I don't know (I don't even
show ahead/behind in my prompt, but rather prefer to query it when I
care; I have a separate script that queries the ahead/behind against my
publishing point, but it would be nice if git handled this itself).

  I admit I haven't thought it through yet, though. And even if it does
  work, it may throw a slight monkey wrench in the proposed push.default
  transition.
 
 We're transitioning to push.default = simple which is even simpler
 than current.

Simpler in the sense that it is less likely to do something unexpected.
But the rules are actually more complicated. Two examples:

  1. Imagine I make a feature branch foo forked off of origin/master, then
 git push with no arguments. The current scheme would go to
 foo on origin, but upstream would go to master. Since they
 don't agree, simple will punt and tell me to be more specific.

  2. Imagine I have set my default push remote to publish, am on
 master (forked from origin/master) and I run git push without
 arguments. current would push to master on publish. But
 upstream will complain, because we are not pushing to our
 upstream remote. I believe simple will therefore reject this.

In both cases, I think current does the sane thing, and simple makes
things more complicated. The one saving grace it has is that it punts on
these cases rather than potentially doing something destructive that the
user did not intend.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 04:17:00AM +0530, Ramkumar Ramachandra wrote:

 Junio C Hamano wrote:.
  As I said in the different subthread, I am not convinced that you
  would need the complexity of branch.*.forkedFrom.  If you set your
  upstream to the true upstream (not your publishing point), and
  have remote.pushdefaultset to 'publish', you can expect
 
  git push
 
  to do the right thing, and then always say
 
  git show-branch publish/topic topic
 
 I think it's highly sub-optimal to have a local-branch @{u} for
 several reasons; the prompt is almost useless in this case, and it
 will always show your forked-branch ahead of 'master' (assuming that
 'master' doesn't update itself in the duration of your development).

I actually use local-branch @{u} all the time to represent inter-topic
dependencies. For example, imagine I have topic bar which builds on
topic foo, which is based on master. I would have:

  [branch foo]
remote = origin
merge = refs/heads/master
  [branch bar]
remote = .
merge = refs/heads/foo

When I rebase foo, I want to rebase it against upstream's master. When
I rebase bar, I want to rebase it against foo. And naturally, upstream
does not necessarily have a foo, because it is my topic, not theirs (I
_may_ have published my foo somewhere, but that is orthogonal, and
anyway my local foo is the most up-to-date source, not the pushed
version).

As an aside, if you want to rebase both branches, you have to topo-sort
them to make sure you do foo first, then rebase bar on the result.
My daily procedure is something like:

  all_topics |
  while read topic; do echo $topic $(git rev-parse $topic@{u}); done |
  topo_sort |
  while read topic upstream; do
git rebase $upstream $topic || exit 1
  done

 While doing respins, the prompt doesn't aid you in any way. Besides,
 on several occasions, I found myself working on the same forked-branch
 from two different machines; then the publish-point isn't necessarily
 always a publish-point: it's just another upstream for the branch.

Right, things get trickier then. But I don't think there is an automatic
way around that. Sometimes the published one is more up to date, and
sometimes the upstream thing is more up to date.  You have to manually
tell git which you are currently basing your work on. I find in such a
situation that it tends to resolve itself quickly, though, as the first
step is to pull in the changes you pushed up from the other machine
anyway (either via git reset or git rebase).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] format-patch: introduce branch.*.forkedFrom

2014-01-07 Thread Jeff King
On Wed, Jan 08, 2014 at 02:32:10AM +0530, Ramkumar Ramachandra wrote:

  we should leave @{upstream} as (1), and add a new option to represent
  (2). Not the other way around.
 
 I have a local branch 'forkedfrom' that has two sources: 'master'
 and 'ram/forkedfrom'. 'ram/forkedfrom' isn't a dumb publish-point:
 the relationship information I get between 'forkedfrom' and
 'ram/forkedfrom' is very useful; it's what helps me tell how my
 re-roll is doing with respect to the original series; I'd often want
 to cherry-pick commits/ messages from my original series to prepare
 the re-roll, so interaction with this source is quite high. On the
 other hand, the relationship information I get between 'forkedfrom'
 and 'master' is practically useless: 'forkedfrom' is always ahead of
 'master', and a divergence indicates that I need to rebase; I'll never
 really need to interact with this source.

Thanks for a concrete example.

I definitely respect the desire to reuse the existing tooling we have
for @{u}. At the same time, I think you are warping the meaning of
@{u} somewhat. It is _not_ your upstream here, but rather another
version of the branch that has useful changes in it. That might be
splitting hairs a bit, but I think you will find that the differences
leak through in inconvenient spots (like format-patch, where you really
_do_ want to default to the true upstream).

If we add @{publish} (and @{pu}), then it becomes very convenient to
refer to the ram/ version of your branch. That seems like an obvious
first step to me. We don't have to add new config, because
branch.*.pushremote already handles this.

Now you can do git rebase @{pu} which is nice, but not _quite_ as nice
as git rebase, which defaults to @{u}. That first step might be
enough, and I'd hold off there and try it out for a few days or weeks
first. But if you find in your workflow that you are having to specify
@{pu} a lot, then maybe it is worth adding an option to default rebase
to @{pu} instead of @{u}.

You end up in the same place (git rebase without options does what you
want), but I think the underlying data more accurately represents what
is going on (and there is no need to teach format-patch anything
special).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 01:07:08PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Yes, pushbranch is probably a better name for what I am referring to.
  I agree that pushremote is probably enough for sane cases. I seem to
  recall that people advocating the upstream push-default thought that
  branch name mapping was a useful feature, but I might be
  mis-remembering. I will let those people speak up for the feature if
  they see fit; it seems somewhat crazy to me.
 
 I think branch mapping you recall are for those who want to push
 their 'topic' to 'review/topic' or something like that.  With Git
 post 7cdebd8a (Merge branch 'jc/push-refmap', 2013-12-27), I think
 remote.*.push can be used to implement that, by the way.

Hmm. The top patch of that series still relies on upstream being a
push destination, though. So if I have a triangular workflow where I
fork topic from origin/master, my git push origin topic will go to
refs/heads/master on origin under the upstream rule. So that seems
broken as ever. :)

But I guess what you are referring to is that in a triangular world, the
second patch lets me do:

  git config push.default current
  git config remote.origin.push 'refs/heads/*:refs/review/*'

And then git push, git push origin, or git push origin topic all
put it in review/topic, which is what I want.

I think that is sensible, and only heightens my sense of the upstream
push.default as useless. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-07 Thread Jeff King
On Wed, Jan 08, 2014 at 02:55:04AM +0530, Ramkumar Ramachandra wrote:

 Jeff King wrote:
  My daily procedure is something like:
 
all_topics |
while read topic; do echo $topic $(git rev-parse $topic@{u}); done |
topo_sort |
while read topic upstream; do
  git rebase $upstream $topic || exit 1
done
 
 Ah, I was perhaps over-specializing for my own usecase, where
 everything is based off 'master'. I never considered 'master' a true
 upstream because I throw away topic branches after the maintainer
 merges them. If you have long-running branches that you work on a
 daily basis, the issue is somewhat different.

What I do is maybe somewhat gross, but I continually rebase my patches
forward as master develops. So they diverge from where Junio has forked
them upstream (which does not necessarily have any relationship with
where I forked from, anyway). The nice thing about this is that
eventually the topic becomes empty, as rebase drops patches that were
merged upstream (or resolve conflicts to end up at an empty patch).

It's a nice way of tracking the progress of the patch upstream, and it
catches any differences between what's upstream and what's in the topic
(in both directions: you see where the maintainer may have marked up
your patch, and you may see a place where you added something to be
squashed but the maintainer missed it). The downside is that sometimes
the conflicts are annoying and complicated (e.g., several patches that
touch the same spot are a pain to rebase on top of themselves; the early
ones are confused that the later changes are already in place).

 My primary concern is that the proposed @{publish} should be a
 first-class citizen; if it has everything that @{u} has, then we're
 both good: you'd primarily use @{u}, while I'd primarily use
 @{publish}.

Definitely. I think that's the world we want to work towards.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 12:31:57PM -0800, Junio C Hamano wrote:

c. Just leave it at Brodie's patch with nothing else on top.
 
  My thinking in favor of (b) was basically does anybody actually care
  about ambiguous refs in this situation anyway?. If they do, then I
  think (c) is my preferred choice.
 
 OK.  I agree with that line of thinking.  Let's take it one step at
 a time, i.e. do c. and also use warn_on_object_refname_ambiguity in
 rev-list --stdin first and leave the simplification (i.e. b.) for
 later.

Here's a series to do that. The first three are just cleanups I noticed
while looking at the problem.

While I was writing the commit messages, though, I had a thought. Maybe
we could simply do the check faster for the common case that most refs
do not look like object names? Right now we blindly call dwim_ref for
each get_sha1 call, which is the expensive part. If we instead just
loaded all of the refnames from the dwim_ref location (basically heads,
tags and the top-level of refs/), we could build an index of all of
the entries matching the 40-hex pattern. In 99% of cases, this would be
zero entries, and the check would collapse to a simple integer
comparison (and even if we did have one, it would be a simple binary
search in memory).

Our index is more racy than actually checking the filesystem, but I
don't think it matters here.

Anyway, here is the series I came up with, in the meantime. I can take a
quick peek at just making it faster, too.

  [1/4]: cat-file: refactor error handling of batch_objects
  [2/4]: cat-file: fix a minor memory leak in batch_objects
  [3/4]: cat-file: restore ambiguity warning flag in batch_objects
  [4/4]: revision: turn off object/refname ambiguity check for --stdin

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] cat-file: fix a minor memory leak in batch_objects

2014-01-07 Thread Jeff King
We should always have been freeing our strbuf, but doing so
consistently was annoying until the refactoring in the
previous patch.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 971cdde..ce79103 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -314,6 +314,7 @@ static int batch_objects(struct batch_options *opt)
break;
}
 
+   strbuf_release(buf);
return retval;
 }
 
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] cat-file: restore ambiguity warning flag in batch_objects

2014-01-07 Thread Jeff King
Since commit 25fba78, we turn off the object/refname
ambiguity warning using a global flag. However, we never
restore it. This doesn't really matter in the current code,
since the program generally exits immediately after the
function is done, but it's good code hygeine to clean up
after ourselves.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index ce79103..c64e287 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -264,6 +264,7 @@ static int batch_objects(struct batch_options *opt)
struct strbuf buf = STRBUF_INIT;
struct expand_data data;
int retval = 0;
+   int save_warning = warn_on_object_refname_ambiguity;
 
if (!opt-format)
opt-format = %(objectname) %(objecttype) %(objectsize);
@@ -314,6 +315,7 @@ static int batch_objects(struct batch_options *opt)
break;
}
 
+   warn_on_object_refname_ambiguity = save_warning;
strbuf_release(buf);
return retval;
 }
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] cat-file: refactor error handling of batch_objects

2014-01-07 Thread Jeff King
This just pulls the return value for the function out of the
inner loop, so we can break out of the loop rather than do
an early return. This will make it easier to put any cleanup
for the function in one place.

Signed-off-by: Jeff King p...@peff.net
---
Just making the subsequent diffs less noisy...

 builtin/cat-file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f8288c8..971cdde 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -263,6 +263,7 @@ static int batch_objects(struct batch_options *opt)
 {
struct strbuf buf = STRBUF_INIT;
struct expand_data data;
+   int retval = 0;
 
if (!opt-format)
opt-format = %(objectname) %(objecttype) %(objectsize);
@@ -294,8 +295,6 @@ static int batch_objects(struct batch_options *opt)
warn_on_object_refname_ambiguity = 0;
 
while (strbuf_getline(buf, stdin, '\n') != EOF) {
-   int error;
-
if (data.split_on_whitespace) {
/*
 * Split at first whitespace, tying off the beginning
@@ -310,12 +309,12 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   error = batch_one_object(buf.buf, opt, data);
-   if (error)
-   return error;
+   retval = batch_one_object(buf.buf, opt, data);
+   if (retval)
+   break;
}
 
-   return 0;
+   return retval;
 }
 
 static const char * const cat_file_usage[] = {
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin

2014-01-07 Thread Jeff King
We currently check that any 40-hex object name we receive is
not also a refname, and output a warning if this is the
case.  When rev-list --stdin is used to receive object
names, we may receive a large number of inputs, and the cost
of checking each name as a ref is relatively high.

Commit 25fba78d already dropped this warning for cat-file
--batch-check. The same reasoning applies for rev-list
--stdin. Let's disable the check in that instance.

Here are before and after numbers:

  $ git rev-list --all commits

  [before]
  $ best-of-five -i commits ./git rev-list --stdin --no-walk --pretty=raw

  real0m0.675s
  user0m0.552s
  sys 0m0.120s

  [after]
  $ best-of-five -i commits ./git rev-list --stdin --no-walk --pretty=raw

  real0m0.415s
  user0m0.400s
  sys 0m0.012s

Signed-off-by: Jeff King p...@peff.net
---
Obviously we drop this one (and revert 25fba78d) if I can just make the
check faster.

 revision.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/revision.c b/revision.c
index a68fde6..87d04dd 100644
--- a/revision.c
+++ b/revision.c
@@ -1576,7 +1576,9 @@ static void read_revisions_from_stdin(struct rev_info 
*revs,
 {
struct strbuf sb;
int seen_dashdash = 0;
+   int save_warning = warn_on_object_refname_ambiguity;
 
+   warn_on_object_refname_ambiguity = 0;
strbuf_init(sb, 1000);
while (strbuf_getwholeline(sb, stdin, '\n') != EOF) {
int len = sb.len;
@@ -1595,6 +1597,7 @@ static void read_revisions_from_stdin(struct rev_info 
*revs,
REVARG_CANNOT_BE_FILENAME))
die(bad revision '%s', sb.buf);
}
+   warn_on_object_refname_ambiguity = save_warning;
if (seen_dashdash)
read_pathspec_from_stdin(revs, sb, prune);
strbuf_release(sb);
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] format-patch: introduce format.defaultTo

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 02:06:12PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  I think that is sensible, and only heightens my sense of the upstream
  push.default as useless. :)
 
 Yes, it only is good for centralized world (it was designed back in
 the centralized days after all, wasn't it?).

I do not think there is any centralized days. From day one, Linus
advocated a triangular workflow, and that is how git and kernel develop
has always been done. And that is why the default of matching was
there.

There were people who came later, and who still exist today, who use git
in an SVN-like centralized way. So if there were centralized days, we
are in them now. :)

I just do not see any real advantage even in a centralized world for
upstream versus current. Before remote.pushdefault, I can
potentially see some use (if you want to abuse @{upstream}), but now I
do not see any point.

And even in a centralized workflow, I see upstream creating problems.
E.g., you fork a feature branch in the centralized repo; it should not
get pushed straight back to master! And that is why we invented
simple, to prevent such things.

I dunno. I have not gone back and read all of the arguments around
push.default from last year. It is entirely possible everything I just
said was refuted back then, and I am needlessly rehashing old arguments.
I remember that Matthieu was one of the main advocates of upstream. I
am cc-ing him here to bring his attention (not just to this message, but
to the whole thread).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] speeding up 40-hex ambiguity check

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 05:08:56PM -0500, Jeff King wrote:

  OK.  I agree with that line of thinking.  Let's take it one step at
  a time, i.e. do c. and also use warn_on_object_refname_ambiguity in
  rev-list --stdin first and leave the simplification (i.e. b.) for
  later.
 
 Here's a series to do that. The first three are just cleanups I noticed
 while looking at the problem.
 
 While I was writing the commit messages, though, I had a thought. Maybe
 we could simply do the check faster for the common case that most refs
 do not look like object names? Right now we blindly call dwim_ref for
 each get_sha1 call, which is the expensive part. If we instead just
 loaded all of the refnames from the dwim_ref location (basically heads,
 tags and the top-level of refs/), we could build an index of all of
 the entries matching the 40-hex pattern. In 99% of cases, this would be
 zero entries, and the check would collapse to a simple integer
 comparison (and even if we did have one, it would be a simple binary
 search in memory).

That turned out very nicely, and I think we can drop the extra flag
entirely. Brodie's patch still makes sense, for people who do want to
turn off ambiguity warnings entirely (and I built on his patch, which
matters textually for 4 and 5, but it would be easy to rebase).

I'm cc-ing Michael, since it is his ref-traversal code I am butchering
in the 3rd patch. The first two are the unrelated cleanups from v1. They
are not necessary, but I do not see any reason not to include them.

  [1/5]: cat-file: refactor error handling of batch_objects
  [2/5]: cat-file: fix a minor memory leak in batch_objects
  [3/5]: refs: teach for_each_ref a flag to avoid recursion
  [4/5]: get_sha1: speed up ambiguous 40-hex test
  [5/5]: get_sha1: drop object/refname ambiguity flag

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects

2014-01-07 Thread Jeff King
We should always have been freeing our strbuf, but doing so
consistently was annoying until the refactoring in the
previous patch.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 971cdde..ce79103 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -314,6 +314,7 @@ static int batch_objects(struct batch_options *opt)
break;
}
 
+   strbuf_release(buf);
return retval;
 }
 
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/5] cat-file: refactor error handling of batch_objects

2014-01-07 Thread Jeff King
This just pulls the return value for the function out of the
inner loop, so we can break out of the loop rather than do
an early return. This will make it easier to put any cleanup
for the function in one place.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f8288c8..971cdde 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -263,6 +263,7 @@ static int batch_objects(struct batch_options *opt)
 {
struct strbuf buf = STRBUF_INIT;
struct expand_data data;
+   int retval = 0;
 
if (!opt-format)
opt-format = %(objectname) %(objecttype) %(objectsize);
@@ -294,8 +295,6 @@ static int batch_objects(struct batch_options *opt)
warn_on_object_refname_ambiguity = 0;
 
while (strbuf_getline(buf, stdin, '\n') != EOF) {
-   int error;
-
if (data.split_on_whitespace) {
/*
 * Split at first whitespace, tying off the beginning
@@ -310,12 +309,12 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   error = batch_one_object(buf.buf, opt, data);
-   if (error)
-   return error;
+   retval = batch_one_object(buf.buf, opt, data);
+   if (retval)
+   break;
}
 
-   return 0;
+   return retval;
 }
 
 static const char * const cat_file_usage[] = {
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-07 Thread Jeff King
The normal for_each_ref traversal descends into
subdirectories, returning each ref it finds. However, in
some cases we may want to just iterate over the top-level of
a certain part of the tree.

The introduction of the flags option is a little
mysterious. We already have a flags option that gets stuck
in a callback struct and ends up interpreted in do_one_ref.
But the traversal itself does not currently have any flags,
and it needs to know about this new flag.

We _could_ introduce this as a completely separate flag
parameter. But instead, we simply put both flag types into a
single namespace, and make it available at both sites. This
is simple, and given that we do not have a proliferation of
flags (we have had exactly one until now), it is probably
sufficient.

Signed-off-by: Jeff King p...@peff.net
---
I think the flags thing is OK as explained above, but Michael may have a
different suggestion for refactoring.

 refs.c | 61 ++---
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/refs.c b/refs.c
index 3926136..ca854d6 100644
--- a/refs.c
+++ b/refs.c
@@ -589,6 +589,8 @@ static void sort_ref_dir(struct ref_dir *dir)
 
 /* Include broken references in a do_for_each_ref*() iteration: */
 #define DO_FOR_EACH_INCLUDE_BROKEN 0x01
+/* Do not recurse into subdirs, just iterate at a single level. */
+#define DO_FOR_EACH_NO_RECURSE 0x02
 
 /*
  * Return true iff the reference described by entry can be resolved to
@@ -661,7 +663,8 @@ static int do_one_ref(struct ref_entry *entry, void 
*cb_data)
  * called for all references, including broken ones.
  */
 static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
-   each_ref_entry_fn fn, void *cb_data)
+   each_ref_entry_fn fn, void *cb_data,
+   int flags)
 {
int i;
assert(dir-sorted == dir-nr);
@@ -669,9 +672,13 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, 
int offset,
struct ref_entry *entry = dir-entries[i];
int retval;
if (entry-flag  REF_DIR) {
-   struct ref_dir *subdir = get_ref_dir(entry);
-   sort_ref_dir(subdir);
-   retval = do_for_each_entry_in_dir(subdir, 0, fn, 
cb_data);
+   if (flags  DO_FOR_EACH_NO_RECURSE) {
+   struct ref_dir *subdir = get_ref_dir(entry);
+   sort_ref_dir(subdir);
+   retval = do_for_each_entry_in_dir(subdir, 0,
+ fn, cb_data,
+ flags);
+   }
} else {
retval = fn(entry, cb_data);
}
@@ -691,7 +698,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, 
int offset,
  */
 static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
 struct ref_dir *dir2,
-each_ref_entry_fn fn, void *cb_data)
+each_ref_entry_fn fn, void *cb_data,
+int flags)
 {
int retval;
int i1 = 0, i2 = 0;
@@ -702,10 +710,12 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
struct ref_entry *e1, *e2;
int cmp;
if (i1 == dir1-nr) {
-   return do_for_each_entry_in_dir(dir2, i2, fn, cb_data);
+   return do_for_each_entry_in_dir(dir2, i2, fn, cb_data,
+   flags);
}
if (i2 == dir2-nr) {
-   return do_for_each_entry_in_dir(dir1, i1, fn, cb_data);
+   return do_for_each_entry_in_dir(dir1, i1, fn, cb_data,
+   flags);
}
e1 = dir1-entries[i1];
e2 = dir2-entries[i2];
@@ -713,12 +723,15 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
if (cmp == 0) {
if ((e1-flag  REF_DIR)  (e2-flag  REF_DIR)) {
/* Both are directories; descend them in 
parallel. */
-   struct ref_dir *subdir1 = get_ref_dir(e1);
-   struct ref_dir *subdir2 = get_ref_dir(e2);
-   sort_ref_dir(subdir1);
-   sort_ref_dir(subdir2);
-   retval = do_for_each_entry_in_dirs(
-   subdir1, subdir2, fn, cb_data);
+   if (!(flags  DO_FOR_EACH_NO_RECURSE)) {
+   struct ref_dir *subdir1

[PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag

2014-01-07 Thread Jeff King
Now that our object/refname ambiguity test is much faster
(thanks to the previous commit), there is no reason for code
like cat-file --batch-check to turn it off. Here are
before and after timings with this patch (on git.git):

  $ git rev-list --objects --all | cut -d' ' -f1 objects

  [with flag]
  $ best-of-five -i objects ./git cat-file --batch-check
  real0m0.392s
  user0m0.368s
  sys 0m0.024s

  [without flag, without speedup; i.e., pre-25fba78]
  $ best-of-five -i objects ./git cat-file --batch-check
  real0m1.652s
  user0m0.904s
  sys 0m0.748s

  [without flag, with speedup]
  $ best-of-five -i objects ./git cat-file --batch-check
  real0m0.388s
  user0m0.356s
  sys 0m0.028s

So the new implementation does just as well as we did with
the flag turning the whole thing off (better actually, but
that is within the noise).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 9 -
 cache.h| 1 -
 environment.c  | 1 -
 sha1_name.c| 2 +-
 4 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index ce79103..afba21f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -285,15 +285,6 @@ static int batch_objects(struct batch_options *opt)
if (opt-print_contents)
data.info.typep = data.type;
 
-   /*
-* We are going to call get_sha1 on a potentially very large number of
-* objects. In most large cases, these will be actual object sha1s. The
-* cost to double-check that each one is not also a ref (just so we can
-* warn) ends up dwarfing the actual cost of the object lookups
-* themselves. We can work around it by just turning off the warning.
-*/
-   warn_on_object_refname_ambiguity = 0;
-
while (strbuf_getline(buf, stdin, '\n') != EOF) {
if (data.split_on_whitespace) {
/*
diff --git a/cache.h b/cache.h
index ce377e1..73afc38 100644
--- a/cache.h
+++ b/cache.h
@@ -566,7 +566,6 @@ extern int assume_unchanged;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
-extern int warn_on_object_refname_ambiguity;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern const char *apply_default_ignorewhitespace;
diff --git a/environment.c b/environment.c
index 3c76905..c59f6d4 100644
--- a/environment.c
+++ b/environment.c
@@ -22,7 +22,6 @@ int prefer_symlink_refs;
 int is_bare_repository_cfg = -1; /* unspecified */
 int log_all_ref_updates = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
-int warn_on_object_refname_ambiguity = 1;
 int repository_format_version;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/sha1_name.c b/sha1_name.c
index f83ecb7..b9aaf74 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -451,7 +451,7 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
int at, reflog_len, nth_prior = 0;
 
if (len == 40  !get_sha1_hex(str, sha1)) {
-   if (warn_ambiguous_refs  warn_on_object_refname_ambiguity) {
+   if (warn_ambiguous_refs) {
if (sha1_is_ambiguous_with_ref(sha1)) {
warning(warn_msg, len, str);
if (advice_object_name_warning)
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test

2014-01-07 Thread Jeff King
Since 798c35f (get_sha1: warn about full or short object
names that look like refs, 2013-05-29), a 40-hex sha1 causes
us to call dwim_ref on the result, on the off chance that we
have a matching ref. This can cause a noticeable slow-down
when there are a large number of objects.  E.g., on
linux.git:

  [baseline timing]
  $ best-of-five git rev-list --all --pretty=raw
  real0m3.996s
  user0m3.900s
  sys 0m0.100s

  [same thing, but calling get_sha1 on each commit from stdin]
  $ git rev-list --all commits
  $ best-of-five -i commits git rev-list --stdin --pretty=raw
  real0m7.862s
  user0m6.108s
  sys 0m1.760s

The problem is that each call to dwim_ref individually stats
the possible refs in refs/heads, refs/tags, etc. In the
common case, there are no refs that look like sha1s at all.
We can therefore do the same check much faster by loading
all ambiguous-looking candidates once, and then checking our
index for each object.

This is technically more racy (somebody might create such a
ref after we build our index), but that's OK, as it's just a
warning (and we provide no guarantees about whether a
simultaneous process ran before or after the ref was created
anyway).

Here is the time after this patch, which implements the
strategy described above:

  $ best-of-five -i commits git rev-list --stdin --pretty=raw
  real0m4.966s
  user0m4.776s
  sys 0m0.192s

We still pay some price to read the commits from stdin, but
notice the system time is much lower, as we are avoiding
hundreds of thousands of stat() calls.

Signed-off-by: Jeff King p...@peff.net
---
I wanted to make the ref traversal as cheap as possible, hence the
NO_RECURSE flag I added. I thought INCLUDE_BROKEN used to not open up
the refs at all, but it looks like it does these days. I wonder if that
is worth changing or not.

 refs.c  | 47 +++
 refs.h  |  2 ++
 sha1_name.c |  4 +---
 3 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index ca854d6..cddd871 100644
--- a/refs.c
+++ b/refs.c
@@ -4,6 +4,7 @@
 #include tag.h
 #include dir.h
 #include string-list.h
+#include sha1-array.h
 
 /*
  * Make sure ref is something reasonable to have under .git/refs/;
@@ -2042,6 +2043,52 @@ int dwim_log(const char *str, int len, unsigned char 
*sha1, char **log)
return logs_found;
 }
 
+static int check_ambiguous_sha1_ref(const char *refname,
+   const unsigned char *sha1,
+   int flags,
+   void *data)
+{
+   unsigned char tmp_sha1[20];
+   if (strlen(refname) == 40  !get_sha1_hex(refname, tmp_sha1))
+   sha1_array_append(data, tmp_sha1);
+   return 0;
+}
+
+static void build_ambiguous_sha1_ref_index(struct sha1_array *idx)
+{
+   const char **rule;
+
+   for (rule = ref_rev_parse_rules; *rule; rule++) {
+   const char *prefix = *rule;
+   const char *end = strstr(prefix, %.*s);
+   char *buf;
+
+   if (!end)
+   continue;
+
+   buf = xmemdupz(prefix, end - prefix);
+   do_for_each_ref(ref_cache, buf, check_ambiguous_sha1_ref,
+   end - prefix,
+   DO_FOR_EACH_INCLUDE_BROKEN |
+   DO_FOR_EACH_NO_RECURSE,
+   idx);
+   free(buf);
+   }
+}
+
+int sha1_is_ambiguous_with_ref(const unsigned char *sha1)
+{
+   struct sha1_array idx = SHA1_ARRAY_INIT;
+   static int loaded;
+
+   if (!loaded) {
+   build_ambiguous_sha1_ref_index(idx);
+   loaded = 1;
+   }
+
+   return sha1_array_lookup(idx, sha1) = 0;
+}
+
 static struct ref_lock *lock_ref_sha1_basic(const char *refname,
const unsigned char *old_sha1,
int flags, int *type_p)
diff --git a/refs.h b/refs.h
index 87a1a79..c7d5f89 100644
--- a/refs.h
+++ b/refs.h
@@ -229,4 +229,6 @@ int update_refs(const char *action, const struct ref_update 
**updates,
 extern int parse_hide_refs_config(const char *var, const char *value, const 
char *);
 extern int ref_is_hidden(const char *);
 
+int sha1_is_ambiguous_with_ref(const unsigned char *sha1);
+
 #endif /* REFS_H */
diff --git a/sha1_name.c b/sha1_name.c
index a5578f7..f83ecb7 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -452,13 +452,11 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
 
if (len == 40  !get_sha1_hex(str, sha1)) {
if (warn_ambiguous_refs  warn_on_object_refname_ambiguity) {
-   refs_found = dwim_ref(str, len, tmp_sha1, real_ref);
-   if (refs_found  0) {
+   if (sha1_is_ambiguous_with_ref(sha1)) {
warning(warn_msg, len, str

[PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-07 Thread Jeff King
On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote:

 + if (flags  DO_FOR_EACH_NO_RECURSE) {
 + struct ref_dir *subdir = get_ref_dir(entry);
 + sort_ref_dir(subdir);
 + retval = do_for_each_entry_in_dir(subdir, 0,

Obviously this is totally wrong and inverts the point of the flag. And
causes something like half of the test suite to fail.

Michael was nice enough to point it out to me off-list, but well, I have
to face the brown paper bag at some point. :) In my defense, it was a
last minute refactor before going to dinner. That is what I get for
rushing out the series.

Here's a fixed version of patch 3/5.

-- 8 --
Subject: refs: teach for_each_ref a flag to avoid recursion

The normal for_each_ref traversal descends into
subdirectories, returning each ref it finds. However, in
some cases we may want to just iterate over the top-level of
a certain part of the tree.

The introduction of the flags option is a little
mysterious. We already have a flags option that gets stuck
in a callback struct and ends up interpreted in do_one_ref.
But the traversal itself does not currently have any flags,
and it needs to know about this new flag.

We _could_ introduce this as a completely separate flag
parameter. But instead, we simply put both flag types into a
single namespace, and make it available at both sites. This
is simple, and given that we do not have a proliferation of
flags (we have had exactly one until now), it is probably
sufficient.

Signed-off-by: Jeff King p...@peff.net
---
 refs.c | 61 ++---
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/refs.c b/refs.c
index 3926136..b70b018 100644
--- a/refs.c
+++ b/refs.c
@@ -589,6 +589,8 @@ static void sort_ref_dir(struct ref_dir *dir)
 
 /* Include broken references in a do_for_each_ref*() iteration: */
 #define DO_FOR_EACH_INCLUDE_BROKEN 0x01
+/* Do not recurse into subdirs, just iterate at a single level. */
+#define DO_FOR_EACH_NO_RECURSE 0x02
 
 /*
  * Return true iff the reference described by entry can be resolved to
@@ -661,7 +663,8 @@ static int do_one_ref(struct ref_entry *entry, void 
*cb_data)
  * called for all references, including broken ones.
  */
 static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
-   each_ref_entry_fn fn, void *cb_data)
+   each_ref_entry_fn fn, void *cb_data,
+   int flags)
 {
int i;
assert(dir-sorted == dir-nr);
@@ -669,9 +672,13 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, 
int offset,
struct ref_entry *entry = dir-entries[i];
int retval;
if (entry-flag  REF_DIR) {
-   struct ref_dir *subdir = get_ref_dir(entry);
-   sort_ref_dir(subdir);
-   retval = do_for_each_entry_in_dir(subdir, 0, fn, 
cb_data);
+   if (!(flags  DO_FOR_EACH_NO_RECURSE)) {
+   struct ref_dir *subdir = get_ref_dir(entry);
+   sort_ref_dir(subdir);
+   retval = do_for_each_entry_in_dir(subdir, 0,
+ fn, cb_data,
+ flags);
+   }
} else {
retval = fn(entry, cb_data);
}
@@ -691,7 +698,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, 
int offset,
  */
 static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
 struct ref_dir *dir2,
-each_ref_entry_fn fn, void *cb_data)
+each_ref_entry_fn fn, void *cb_data,
+int flags)
 {
int retval;
int i1 = 0, i2 = 0;
@@ -702,10 +710,12 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
struct ref_entry *e1, *e2;
int cmp;
if (i1 == dir1-nr) {
-   return do_for_each_entry_in_dir(dir2, i2, fn, cb_data);
+   return do_for_each_entry_in_dir(dir2, i2, fn, cb_data,
+   flags);
}
if (i2 == dir2-nr) {
-   return do_for_each_entry_in_dir(dir1, i1, fn, cb_data);
+   return do_for_each_entry_in_dir(dir1, i1, fn, cb_data,
+   flags);
}
e1 = dir1-entries[i1];
e2 = dir2-entries[i2];
@@ -713,12 +723,15 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
if (cmp == 0

Re: It seems there is a very tight character count limit in .gitconfig

2014-01-07 Thread Jeff King
On Wed, Jan 08, 2014 at 02:59:37PM +0800, Li Zhang wrote:

 I tried to add url xxx insteadof xxx in .gitconfig. If the length of
 url exceed 125, git will not work.
 I am using Ubuntu. The default version is 1.7.9.5. Maybe the latest
 version solve this already.

Yes, this was fixed in 0971e99 (Remove the hard coded length limit on
variable names in config files, 2012-09-30). Git v1.8.0.1 and higher
contain that commit.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH 0/5] branch@{publish} shorthand

2014-01-08 Thread Jeff King
On Wed, Jan 08, 2014 at 03:05:48AM +0530, Ramkumar Ramachandra wrote:

 Agreed. I'll start working on @{publish}. It's going to take quite a
 bit of effort, because I won't actually start using it until my prompt
 is @{publish}-aware.

There's a fair bit of refactoring involved. I took a stab at it and came
up with the series below. No docs or tests, and some of the refactoring
in remote.c feels a little weird. I can't help but feel more of the
logic from git push should be shared here.

But it at least works with my rudimentary examples. I'm hoping it will
make a good starting point for you to build on. Otherwise, I may get to
it eventually, but it's not a high priority for me right now.

 Actually, I'm not sure I'd use git rebase @{pu}; for me @{pu} is
 mainly a source of information for taking apart to build a new series.

Ah, that's how I'd probably use it, too. :)

  [1/5]: sha1_name: refactor upstream_mark
  [2/5]: interpret_branch_name: factor out upstream handling
  [3/5]: branch_get: return early on error
  [4/5]: branch_get: provide per-branch pushremote pointers
  [5/5]: implement @{publish} shorthand

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] sha1_name: refactor upstream_mark

2014-01-08 Thread Jeff King
We will be adding new mark types in the future, so separate
the suffix data from the logic.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_name.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index b1873d8..0c50801 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -415,12 +415,12 @@ static int ambiguous_path(const char *path, int len)
return slash;
 }
 
-static inline int upstream_mark(const char *string, int len)
+static inline int at_mark(const char *string, int len,
+ const char **suffix, int nr)
 {
-   const char *suffix[] = { @{upstream}, @{u} };
int i;
 
-   for (i = 0; i  ARRAY_SIZE(suffix); i++) {
+   for (i = 0; i  nr; i++) {
int suffix_len = strlen(suffix[i]);
if (suffix_len = len
 !memcmp(string, suffix[i], suffix_len))
@@ -429,6 +429,12 @@ static inline int upstream_mark(const char *string, int 
len)
return 0;
 }
 
+static inline int upstream_mark(const char *string, int len)
+{
+   const char *suffix[] = { @{upstream}, @{u} };
+   return at_mark(string, len, suffix, ARRAY_SIZE(suffix));
+}
+
 static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned 
lookup_flags);
 static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf);
 
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] branch_get: return early on error

2014-01-08 Thread Jeff King
Right now we simply check if ret is valid before doing
further processing. As we add more processing, this will
become more and more cumbersome. Instead, let's just check
whether ret is invalid and return early with the error.

Signed-off-by: Jeff King p...@peff.net
---
 remote.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index a89efab..a773004 100644
--- a/remote.c
+++ b/remote.c
@@ -1543,7 +1543,10 @@ struct branch *branch_get(const char *name)
ret = current_branch;
else
ret = make_branch(name, 0);
-   if (ret  ret-remote_name) {
+   if (!ret)
+   return NULL;
+
+   if (ret-remote_name) {
ret-remote = remote_get(ret-remote_name);
if (ret-merge_nr) {
int i;
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] branch_get: provide per-branch pushremote pointers

2014-01-08 Thread Jeff King
When a caller uses branch_get to retrieve a struct branch,
they get the per-branch remote name and a pointer to the
remote struct. However, they have no way of knowing about
the per-branch pushremote from this interface.

Let's expose that information via fields similar to
remote and remote_name.

We have to do a little refactoring around the configuration
reading here. Instead of pushremote_name being its own
allocated string, it instead becomes a pointer to one of:

  1. The pushremote_name of the current branch, if
 configured.

  2. The globally configured remote.pushdefault, which we
 store separately as pushremote_config_default.

We can then set the branch's pushremote field by doing the
normal sequence of config fallback.

Signed-off-by: Jeff King p...@peff.net
---
 remote.c | 23 +++
 remote.h |  2 ++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/remote.c b/remote.c
index a773004..53e40e0 100644
--- a/remote.c
+++ b/remote.c
@@ -50,6 +50,7 @@ static int branches_nr;
 static struct branch *current_branch;
 static const char *default_remote_name;
 static const char *pushremote_name;
+static const char *pushremote_config_default;
 static int explicit_default_remote_name;
 
 static struct rewrites rewrites;
@@ -351,9 +352,10 @@ static int handle_config(const char *key, const char 
*value, void *cb)
explicit_default_remote_name = 1;
}
} else if (!strcmp(subkey, .pushremote)) {
+   if (git_config_string(branch-pushremote_name, key, 
value))
+   return -1;
if (branch == current_branch)
-   if (git_config_string(pushremote_name, key, 
value))
-   return -1;
+   pushremote_name = branch-pushremote_name;
} else if (!strcmp(subkey, .merge)) {
if (!value)
return config_error_nonbool(key);
@@ -385,8 +387,11 @@ static int handle_config(const char *key, const char 
*value, void *cb)
name = key + 7;
 
/* Handle remote.* variables */
-   if (!strcmp(name, pushdefault))
-   return git_config_string(pushremote_name, key, value);
+   if (!strcmp(name, pushdefault)) {
+   if (git_config_string(pushremote_config_default, key, value)  
0)
+   return -1;
+   pushremote_name = pushremote_config_default;
+   }
 
/* Handle remote.name.* variables */
if (*name == '/') {
@@ -1560,6 +1565,16 @@ struct branch *branch_get(const char *name)
}
}
}
+
+   if (ret-pushremote_name)
+   ret-pushremote = remote_get(ret-pushremote_name);
+   else if (pushremote_config_default)
+   ret-pushremote = remote_get(pushremote_config_default);
+   else if (ret-remote_name)
+   ret-pushremote = remote_get(ret-remote_name);
+   else
+   ret-pushremote = remote_get(origin);
+
return ret;
 }
 
diff --git a/remote.h b/remote.h
index 00c6a76..e5beb30 100644
--- a/remote.h
+++ b/remote.h
@@ -200,6 +200,8 @@ struct branch {
 
const char *remote_name;
struct remote *remote;
+   const char *pushremote_name;
+   struct remote *pushremote;
 
const char **merge_name;
struct refspec **merge;
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] interpret_branch_name: factor out upstream handling

2014-01-08 Thread Jeff King
This function checks a few different @{}-constructs. The
early part checks for and dispatches us to helpers for each
construct, but the code for handling @{upstream} is inline.

Let's factor this out into its own function. This makes
interpret_branch_name more readable, and will make it much
simpler to add more constructs in future patches.

While we're at it, let's also break apart the refactored
code into a few helper functions. These will be useful when
we implement similar @{upstream}-like constructs.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_name.c | 83 ++---
 1 file changed, 52 insertions(+), 31 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 0c50801..50df5d4 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1052,6 +1052,54 @@ static int reinterpret(const char *name, int namelen, 
int len, struct strbuf *bu
return ret - used + len;
 }
 
+static void set_shortened_ref(struct strbuf *buf, const char *ref)
+{
+   char *s = shorten_unambiguous_ref(ref, 0);
+   strbuf_reset(buf);
+   strbuf_addstr(buf, s);
+   free(s);
+}
+
+static const char *get_upstream_branch(const char *branch_buf, int len)
+{
+   char *branch = xstrndup(branch_buf, len);
+   struct branch *upstream = branch_get(*branch ? branch : NULL);
+
+   /*
+* Upstream can be NULL only if branch refers to HEAD and HEAD
+* points to something different than a branch.
+*/
+   if (!upstream)
+   die(_(HEAD does not point to a branch));
+   if (!upstream-merge || !upstream-merge[0]-dst) {
+   if (!ref_exists(upstream-refname))
+   die(_(No such branch: '%s'), branch);
+   if (!upstream-merge) {
+   die(_(No upstream configured for branch '%s'),
+   upstream-name);
+   }
+   die(
+   _(Upstream branch '%s' not stored as a remote-tracking 
branch),
+   upstream-merge[0]-src);
+   }
+   free(branch);
+
+   return upstream-merge[0]-dst;
+}
+
+static int interpret_upstream_mark(const char *name, int namelen,
+  int at, struct strbuf *buf)
+{
+   int len;
+
+   len = upstream_mark(name + at, namelen - at);
+   if (!len)
+   return -1;
+
+   set_shortened_ref(buf, get_upstream_branch(name, at));
+   return len + at;
+}
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
@@ -1076,9 +1124,7 @@ static int reinterpret(const char *name, int namelen, int 
len, struct strbuf *bu
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *cp;
-   struct branch *upstream;
int len = interpret_nth_prior_checkout(name, buf);
-   int tmp_len;
 
if (!namelen)
namelen = strlen(name);
@@ -1100,36 +1146,11 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
if (len  0)
return reinterpret(name, namelen, len, buf);
 
-   tmp_len = upstream_mark(cp, namelen - (cp - name));
-   if (!tmp_len)
-   return -1;
+   len = interpret_upstream_mark(name, namelen, cp - name, buf);
+   if (len  0)
+   return len;
 
-   len = cp + tmp_len - name;
-   cp = xstrndup(name, cp - name);
-   upstream = branch_get(*cp ? cp : NULL);
-   /*
-* Upstream can be NULL only if cp refers to HEAD and HEAD
-* points to something different than a branch.
-*/
-   if (!upstream)
-   die(_(HEAD does not point to a branch));
-   if (!upstream-merge || !upstream-merge[0]-dst) {
-   if (!ref_exists(upstream-refname))
-   die(_(No such branch: '%s'), cp);
-   if (!upstream-merge) {
-   die(_(No upstream configured for branch '%s'),
-   upstream-name);
-   }
-   die(
-   _(Upstream branch '%s' not stored as a remote-tracking 
branch),
-   upstream-merge[0]-src);
-   }
-   free(cp);
-   cp = shorten_unambiguous_ref(upstream-merge[0]-dst, 0);
-   strbuf_reset(buf);
-   strbuf_addstr(buf, cp);
-   free(cp);
-   return len;
+   return -1;
 }
 
 int strbuf_branchname(struct strbuf *sb, const char *name)
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] implement @{publish} shorthand

2014-01-08 Thread Jeff King
In a triangular workflow, you may have a distinct
@{upstream} that you pull changes from, but publish by
default (if you typed git push) to a different remote (or
a different branch on the remote). It may sometimes be
useful to be able to quickly refer to that publishing point
(e.g., to see which changes you have that have not yet been
published).

This patch introduces the branch@{publish} shorthand (or
@{pu} to be even shorter). It refers to the tracking
branch of the remote branch to which you would push if you
were to push the named branch. That's a mouthful to explain,
so here's an example:

  $ git checkout -b foo origin/master
  $ git config remote.pushdefault github
  $ git push

Signed-off-by: Jeff King p...@peff.net
---
The implementation feels weird, like the where do we push to code
should be factored out from somewhere else. I think what we're doing
here is not _wrong_, but I don't like repeating what git push is doing
elsewhere. And I just punt on simple as a result. :)

 sha1_name.c | 76 -
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index 50df5d4..59ffa93 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -435,6 +435,12 @@ static inline int upstream_mark(const char *string, int 
len)
return at_mark(string, len, suffix, ARRAY_SIZE(suffix));
 }
 
+static inline int publish_mark(const char *string, int len)
+{
+   const char *suffix[] = { @{publish} };
+   return at_mark(string, len, suffix, ARRAY_SIZE(suffix));
+}
+
 static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned 
lookup_flags);
 static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf);
 
@@ -481,7 +487,8 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
nth_prior = 1;
continue;
}
-   if (!upstream_mark(str + at, len - at)) {
+   if (!upstream_mark(str + at, len - at) 
+   !publish_mark(str + at, len - at)) {
reflog_len = (len-1) - (at+2);
len = at;
}
@@ -1100,6 +1107,69 @@ static int interpret_upstream_mark(const char *name, int 
namelen,
return len + at;
 }
 
+static const char *get_publish_branch(const char *name_buf, int len)
+{
+   char *name = xstrndup(name_buf, len);
+   struct branch *b = branch_get(*name ? name : NULL);
+   struct remote *remote = b-pushremote;
+   const char *dst;
+   const char *track;
+
+   free(name);
+
+   if (!remote)
+   die(_(branch '%s' has no remote for pushing), b-name);
+
+   /* Figure out what we would call it on the remote side... */
+   if (remote-push_refspec_nr)
+   dst = apply_refspecs(remote-push, remote-push_refspec_nr,
+b-refname);
+   else
+   dst = b-refname;
+   if (!dst)
+   die(_(unable to figure out how '%s' would be pushed),
+   b-name);
+
+   /* ...and then figure out what we would call that remote here */
+   track = apply_refspecs(remote-fetch, remote-fetch_refspec_nr, dst);
+   if (!track)
+   die(_(%s@{publish} has no tracking branch for '%s'),
+   b-name, dst);
+
+   return track;
+}
+
+static int interpret_publish_mark(const char *name, int namelen,
+ int at, struct strbuf *buf)
+{
+   int len;
+
+   len = publish_mark(name + at, namelen - at);
+   if (!len)
+   return -1;
+
+   switch (push_default) {
+   case PUSH_DEFAULT_NOTHING:
+   die(_(cannot use @{publish} with push.default of 'nothing'));
+
+   case PUSH_DEFAULT_UNSPECIFIED:
+   case PUSH_DEFAULT_MATCHING:
+   case PUSH_DEFAULT_CURRENT:
+   set_shortened_ref(buf, get_publish_branch(name, at));
+   break;
+
+   case PUSH_DEFAULT_UPSTREAM:
+   set_shortened_ref(buf, get_upstream_branch(name, at));
+   break;
+
+   case PUSH_DEFAULT_SIMPLE:
+   /* ??? */
+   die(@{publish} with simple unimplemented);
+   }
+
+   return at + len;
+}
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
@@ -1150,6 +1220,10 @@ int interpret_branch_name(const char *name, int namelen, 
struct strbuf *buf)
if (len  0)
return len;
 
+   len = interpret_publish_mark(name, namelen, cp - name, buf);
+   if (len  0)
+   return len;
+
return -1;
 }
 
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body

Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-08 Thread Jeff King
On Tue, Jan 07, 2014 at 10:47:33PM -0500, Jeff King wrote:

 On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote:
 
  +   if (flags  DO_FOR_EACH_NO_RECURSE) {
  +   struct ref_dir *subdir = get_ref_dir(entry);
  +   sort_ref_dir(subdir);
  +   retval = do_for_each_entry_in_dir(subdir, 0,
 
 Obviously this is totally wrong and inverts the point of the flag. And
 causes something like half of the test suite to fail.

And while we're on the subject of my mistakes...

The patch needs the fixup below to ensure that retval is always set,
even when we do not recurse.

I'll hold off on sending a full re-roll of the patch, in the extremely
unlikely event that there are other small errors to be fixed. :)

diff --git a/refs.c b/refs.c
index aafbae9..99c72d0 100644
--- a/refs.c
+++ b/refs.c
@@ -679,7 +679,8 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, 
int offset,
retval = do_for_each_entry_in_dir(subdir, 0,
  fn, cb_data,
  flags);
-   }
+   } else
+   retval = 0;
} else {
retval = fn(entry, cb_data);
}
@@ -732,7 +733,8 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
retval = do_for_each_entry_in_dirs(
subdir1, subdir2,
fn, cb_data, flags);
-   }
+   } else
+   retval = 0;
i1++;
i2++;
} else if (!(e1-flag  REF_DIR)  !(e2-flag  
REF_DIR)) {
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] branch_get: provide per-branch pushremote pointers

2014-01-08 Thread Jeff King
On Wed, Jan 08, 2014 at 04:35:31AM -0500, Jeff King wrote:

 @@ -385,8 +387,11 @@ static int handle_config(const char *key, const char 
 *value, void *cb)
   name = key + 7;
  
   /* Handle remote.* variables */
 - if (!strcmp(name, pushdefault))
 - return git_config_string(pushremote_name, key, value);
 + if (!strcmp(name, pushdefault)) {
 + if (git_config_string(pushremote_config_default, key, value)  
 0)
 + return -1;
 + pushremote_name = pushremote_config_default;
 + }

This needs return 0 squashed in at the end of the conditional, of
course, to match the old behavior.

This patch passes the test suite by itself (with or without that fixup).
But oddly, it seems to fail t5531 when merged with 'next'. I can't
figure out why, though. It shouldn't affect any code that doesn't look
at branch-pushremote.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] t5531: further matching fixups

2014-01-08 Thread Jeff King
Commit 43eb920 switched one of the sub-repository in this
test to matching to prepare for a world where the default
becomes simple. However, the main repository needs a
similar change.

We did not notice any test failure when merged with b2ed944
(push: switch default from matching to simple, 2013-01-04)
because t5531.6 is trying to provoke a failure of git push
due to a submodule check. When combined with b2ed944 the
push still fails, but for the wrong reason (because our
upstream setup does not exist, not because of the submodule).

Signed-off-by: Jeff King p...@peff.net
---
On Wed, Jan 08, 2014 at 05:27:07AM -0500, Jeff King wrote:

 This patch passes the test suite by itself (with or without that fixup).
 But oddly, it seems to fail t5531 when merged with 'next'. I can't
 figure out why, though. It shouldn't affect any code that doesn't look
 at branch-pushremote.

I still don't understand the full reason for this interaction, but the
failing test is actually somewhat broken in 'next' already. This patch
fixes it, and should be done regardless of the other series.

 t/t5531-deep-submodule-push.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
index 8c16e04..445bb5f 100755
--- a/t/t5531-deep-submodule-push.sh
+++ b/t/t5531-deep-submodule-push.sh
@@ -12,6 +12,7 @@ test_expect_success setup '
(
cd work 
git init 
+   git config push.default matching 
mkdir -p gar/bage 
(
cd gar/bage 
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] branch_get: provide per-branch pushremote pointers

2014-01-08 Thread Jeff King
On Wed, Jan 08, 2014 at 05:27:07AM -0500, Jeff King wrote:

 This patch passes the test suite by itself (with or without that fixup).
 But oddly, it seems to fail t5531 when merged with 'next'. I can't
 figure out why, though. It shouldn't affect any code that doesn't look
 at branch-pushremote.

OK, I figured it out. My patch calls:

  remote_get(origin)

which creates an origin remote, even if one does not exist (it assumes
it to be a URL origin). Later, when we want to decide if the push is
triangular or not, we ask for:

  remote_get(NULL);

which will internally look for a remote called origin. Before my patch
there was not such a remote, and so the push could not be triangular.
After my patch, it finds the bogus remote and says this thing exists,
and is not what we are pushing to; therefore the push is triangular.

The solution is that I should not be passing the term origin to
remote_get, but rather passing NULL and relying on it to figure out the
default remote correctly. I.e.:

diff --git a/remote.c b/remote.c
index 8724388..d214fa2 100644
--- a/remote.c
+++ b/remote.c
@@ -1574,7 +1574,7 @@ struct branch *branch_get(const char *name)
else if (ret-remote_name)
ret-pushremote = remote_get(ret-remote_name);
else
-   ret-pushremote = remote_get(origin);
+   ret-pushremote = remote_get(NULL);
 
return ret;
 }

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 23/28] Support shallow fetch/clone over smart-http

2014-01-08 Thread Jeff King
On Thu, Dec 05, 2013 at 08:02:50PM +0700, Nguyễn Thái Ngọc Duy wrote:

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 ---
  Documentation/gitremote-helpers.txt |  7 +++
  builtin/fetch-pack.c| 16 +---
  remote-curl.c   | 31 +--
  t/t5536-fetch-shallow.sh| 27 +++

I'm getting test failures in 'next' with GIT_TEST_HTTPD set, and they
are bisectable to this patch (actually, the moral equivalent of it, as
it looks like the commit message was tweaked along with the test number
when it was applied). The failures look like this:

  $ GIT_TEST_HTTPD=1 ./t5537-fetch-shallow.sh -v -i
  [...]
  ok 9 - fetch --update-shallow
  
  expecting success: 
  git clone --bare --no-local shallow 
$HTTPD_DOCUMENT_ROOT_PATH/repo.git 
  git clone $HTTPD_URL/smart/repo.git clone 
  (
  cd clone 
  git fsck 
  git log --format=%s origin/master actual 
  cat EOF expect 
  6
  5
  4
  3
  EOF
  test_cmp expect actual
  )
  
  Cloning into bare repository '/home/peff/compile/git/t/trash 
directory.t5537-fetch-shallow/httpd/www/repo.git'...
  remote: Counting objects: 19, done.
  remote: Compressing objects: 100% (7/7), done.
  remote: Total 19 (delta 0), reused 6 (delta 0)
  Receiving objects: 100% (19/19), done.
  Checking connectivity... done.
  Cloning into 'clone'...
  remote: Counting objects: 19, done.
  remote: Compressing objects: 100% (7/7), done.
  remote: Total 19 (delta 0), reused 19 (delta 0)
  Unpacking objects: 100% (19/19), done.
  Checking connectivity... done.
  Checking object directories: 100% (256/256), done.
  --- expect  2014-01-08 11:20:20.178546452 +
  +++ actual  2014-01-08 11:20:20.178546452 +
  @@ -1,3 +1,4 @@
  +7
   6
   5
   4
  not ok 10 - clone http repository


If you do end up tweaking the test, you may also want to fix:

 +LIB_HTTPD_PORT=${LIB_HTTPD_PORT-'5536'}
 +. $TEST_DIRECTORY/lib-httpd.sh

Since the test number got switched, it would be nice to tweak the port
number to match it, in case the real t5536 ever starts using lib-httpd.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] implement @{publish} shorthand

2014-01-09 Thread Jeff King
On Wed, Jan 08, 2014 at 03:42:09PM -0800, Junio C Hamano wrote:

  This patch introduces the branch@{publish} shorthand (or
  @{pu} to be even shorter). It refers to the tracking
 
 If @{u} can already be used for upstream, why not allow @{p} but
 require two letters @{pu}?  Just being curious---I am not advocating
 strongly for a shorter short-hand.
 
 Or is @{p} already taken by something and my memory is not
 functioning well?

It is my brain that was not functioning well. I somehow thought well,
@{u} is already taken, so we must use @{pu}. Which of course makes no
sense, unless you are middle-endian. :)

We may want to be cautious about giving up a short-and-sweet
single-letter, though, until the feature has proved itself. We could
also teach upstream_mark and friends to match unambiguous prefixes (so
@{u}, @{up}, @{upst}, etc). That means @{p} would work
immediately, but scripts should use @{publish} for future-proofing.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-09 Thread Jeff King
On Wed, Jan 08, 2014 at 12:29:51PM +0100, Michael Haggerty wrote:

  Here's a fixed version of patch 3/5.
 
 v2 4/5 doesn't apply cleanly on top of v3 3/5.  So I'm basing my review
 on the branch you have at GitHub peff/git jk/cat-file-warn-ambiguous;
 I hope it is the same.

Hrmph. I didn't have to do any conflict resolution during the rebase, so
I would think it would apply at least with am -3.

  -- 8 --
  Subject: refs: teach for_each_ref a flag to avoid recursion
  
  The normal for_each_ref traversal descends into
 
 You haven't changed any for_each_ref*() functions; you have only exposed
 the DO_FOR_EACH_NO_RECURSE option to the (static) functions
 for_each_entry*() and do_for_each_ref().  (This is part and parcel of
 your decision not to expose the new functionality in the refs API.)
 Please correct the line above.

Will do, and I'll add a note on not exposing it (basically because there
is not an existing flags parameter in the public API, and nobody needs
it).

   static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
  -   each_ref_entry_fn fn, void *cb_data)
  +   each_ref_entry_fn fn, void *cb_data,
  +   int flags)
   {
  int i;
  assert(dir-sorted == dir-nr);
 
 Please update the docstring for this function, which still says that it
 recurses without mentioning DO_FOR_EACH_NO_RECURSE.

Will do (and for the _in_dirs variant).

   static int do_for_each_entry(struct ref_cache *refs, const char *base,
  -each_ref_entry_fn fn, void *cb_data)
  +each_ref_entry_fn fn, void *cb_data,
  +int flags)
   {
  struct packed_ref_cache *packed_ref_cache;
  struct ref_dir *loose_dir;
 
 A few lines after this, do_for_each_entry() calls
 prime_ref_dir(loose_dir) to ensure that all of the loose references that
 will be iterated over are read before the packed-refs file is checked.
 It seems to me that prime_ref_dir() should also get a flags parameter to
 prevent it reading more loose references than necessary, something like
 this:

Hmm. I hadn't considered that, but yeah, it definitely nullifies part of
the purpose of the optimization.

However, is it safe to prime only part of the loose ref namespace? The
point of that priming is to avoid the race fixed in 98eeb09, which
depends on us caching the loose refs before the packed refs. But when we
read packed-refs, we will be reading and storing _all_ of it, even if we
do not touch it in this traversal. So it does not affect the race for
this traversal, but have we setup a cache situation where a subsequent
for_each_ref in the same process would be subject to the race?

I'm starting to wonder if this optimization is worth it.

  [...]
  @@ -1718,7 +1732,7 @@ static int do_for_each_ref(struct ref_cache *refs, 
  const char *base,
  data.fn = fn;
  data.cb_data = cb_data;
   
  -   return do_for_each_entry(refs, base, do_one_ref, data);
  +   return do_for_each_entry(refs, base, do_one_ref, data, flags);
   }
 
 This change makes the DO_FOR_EACH_NO_RECURSE option usable with
 do_for_each_ref() (even though it is never in fact used).  It should
 either be mentioned in the docstring or (if there is a reason not to
 allow it) explicitly prohibited.

Hrm, yeah. I guess there are no callers, and there is no plan for any.
So we could just pass 0 here, and then flags passed to
do_for_each_ref really is _just_ for the callback data that goes to
do_one_ref. That clears up the weird combined namespace stuff I
mentioned in the commit message, and is a bit cleaner. I'll take it in
that direction.

 It would be possible to use your new flag to speed up
 is_refname_available(), but it would be a little bit of work and I doubt
 that is_refname_available() is ever a bottleneck.

Yeah, agreed on both counts.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-09 Thread Jeff King
On Thu, Jan 09, 2014 at 09:51:24AM -0800, Junio C Hamano wrote:

  On Tue, Jan 07, 2014 at 06:58:50PM -0500, Jeff King wrote:
 
  +  if (flags  DO_FOR_EACH_NO_RECURSE) {
  +  struct ref_dir *subdir = get_ref_dir(entry);
  +  sort_ref_dir(subdir);
  +  retval = do_for_each_entry_in_dir(subdir, 0,
 
  Obviously this is totally wrong and inverts the point of the flag. And
  causes something like half of the test suite to fail.
 
  Michael was nice enough to point it out to me off-list, but well, I have
  to face the brown paper bag at some point. :) In my defense, it was a
  last minute refactor before going to dinner. That is what I get for
  rushing out the series.
 
 And perhaps a bad naming that calls for double-negation in the
 normal cases, which might have been less likely to happen it the new
 flag were called onelevel only or something, perhaps?

That may be a nicer name, but it was not the problem here. The problem
here is that I wrote:

  if (flags  DO_FOR_EACH_NO_RECURSE == 0)

to avoid the extra layer of parentheses, but of course that doesn't
work. And then when I switched it back, I screwed up the reversion.

I think the nicest way to write it would be to avoid negation at all,
as:

  if (flags  DO_FOR_EACH_RECURSE) {
 ... do the recursion ...

but that means flipping the default, requiring us to set the flag
explicitly in the existing callers (though there really aren't that
many).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t5537: fix incorrect expectation in test case 10

2014-01-09 Thread Jeff King
On Wed, Jan 08, 2014 at 07:13:19PM +0700, Nguyễn Thái Ngọc Duy wrote:

 Commit 48d25ca adds a new commit 7 to the repo that the next test case
 in commit 1609488 clones from. But the next test case does not expect
 this commit. For these tests, it's the bottom that's important, not
 the top. Fix the expected commit list.

Given the test output, I had a feeling it was something like this but
didn't dive in (and figured it would be obvious to you).

Patch looks sane. Thanks for a quick turnaround.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] implement @{publish} shorthand

2014-01-09 Thread Jeff King
On Thu, Jan 09, 2014 at 08:39:44AM -, Philip Oakley wrote:

 From: Jeff King p...@peff.net
 Sent: Wednesday, January 08, 2014 9:37 AM
 In a triangular workflow, you may have a distinct
 @{upstream} that you pull changes from, but publish by
 default (if you typed git push) to a different remote (or
 a different branch on the remote).
 
 One of the broader issues is the lack of _documenation_ about what
 the 'normal' naming convention is for the uspstream remote.
 Especially the implicit convention used within our documentation (and
 workflow).
 
 This is especially true for github users who will normally fork a
 repo of interest and then clone it from their own copy/fork. This
 means that the 'origin' remote is _not_ the upstream. See
 https://help.github.com/articles/fork-a-repo In my case 'origin' is
 my publish repo (as suggested by Github) while 'junio' is the
 upstream (as do some others). There are similar results from the
 likes of Stackoverflow.

Sure, and I have done the same thing (though I tend to clone from the
other person as origin, and only fork my own repo when I am ready to
push). But it shouldn't matter, should it? The whole point of the
upstream config is that git checkout -b topic junio/master does the
right thing, without caring about your naming convention.

So I'm not sure what you think should be said (or where). Telling me in
patch form is preferred. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion

2014-01-10 Thread Jeff King
On Fri, Jan 10, 2014 at 09:59:25AM +0100, Michael Haggerty wrote:

  However, is it safe to prime only part of the loose ref namespace?
 [...]
 
 prime_ref_dir() is called by do_for_each_entry(), which all the
 iteration functions pass through.  It is always called before the
 iteration starts, and it primes only the subtree of the refs hierarchy
 that is being iterated over.  For example, if iterating over
 refs/heads then it only primes references with that prefix.
 
 This is OK, because if later somebody iterates over a broader part of
 the refs hierarchy (say, refs), then priming is done again, including
 re-checking the packed refs.

Ah, right. This is the part I was forgetting: the next for_each_ref will
re-prime with the expanded view. Thanks for a dose of sanity.

I'll fix that in my re-roll.

 It would also be possible to swing in the other direction.  I don't
 remember a particular reason why I left the DO_FOR_EACH_INCLUDE_BROKEN
 handling at the do_for_each_ref() level rather than handling it at the
 do_for_each_entry() level.  But now that you are passing the flags
 parameter all the way down the call stack, it wouldn't cost anything to
 support both of the DO_FOR_EACH flags everywhere and just document it
 that way.

I think it was simply that it was an option that the traversal did not
need to know about (just like the trim option), so you kept it as
encapsulated as possible. I think I'll introduce it as a separate flag
namespace, as discussed in the previous email. It is the same amount of
refactoring work to merge them later as it is now, if we so choose.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test

2014-01-10 Thread Jeff King
On Wed, Jan 08, 2014 at 05:09:25PM +0100, Michael Haggerty wrote:

 It's not only racy WRT other processes.  If the current git process
 would create a new reference, it wouldn't be reflected in the cache.
 
 It's true that the main ref_cache doesn't invalidate itself
 automatically either when a new reference is created, so it's not really
 a fair complaint.  However, as we add places where the cache is
 invalidated, it is easy to overlook this cache that is stuck in static
 variables within a function definition and it is impossible to
 invalidate it.  Might it not be better to attach the cache to the
 ref_cache structure instead, and couple its lifetime to that object?

Yeah, I noticed that we don't ever invalidate the loose ref cache. I
think that's mostly fine, as we rely on resolve_ref to cover the cases
that need to be ordered properly. And in this particular case, it's
only a warning (and a rather obscure one, at that), so I think the
stakes are low.

That being said, it should not be hard at all to attach the cache to the
ref_cache. Since we are generated from that cache, the lifetimes should
be the same.

 Alternatively, the cache could be created and managed on the caller
 side, since the caller would know when the cache would have to be
 invalidated.  Also, different callers are likely to have different
 performance characteristics.  It is unlikely that the time to initialize
 the cache will be amortized in most cases; in fact, rev-list --stdin
 might be the *only* plausible use case.

The two I know of are rev-list --stdin and cat-file --batch-check.

 Regarding the overall strategy: you gather all refnames that could be
 confused with an SHA-1 into a sha1_array, then later look up SHA-1s in
 the array to see if they are ambiguous.  This is a very special-case
 optimization for SHA-1s.

Yes, it is very sha1-specific. Part of my goal was that in the common
case, the check would collapse to O(# of ambiguous refs), which is
typically 0.

That may be premature optimization, though. As you note below, doing a
few binary searches through the in-memory ref cache is _probably_ fine,
too. And we can do that without a separate index.

 I wonder whether another approach would gain almost the same amount of
 performance but be more general.  We could change dwim_ref() (or a
 version of it?) to read its data out of a ref_cache instead of going to
 disk every time.  Then, at the cost of populating the relevant parts of
 the ref_cache once, we would have fast dwim_ref() calls for all strings.

I'm very nervous about turning dwim_ref into a cache. As we noted above,
we never invalidate the cache, so any write-then-read operations could
get stale data. That is not as risky as caching, say, resolve_ref, but
it still makes me nervous. Caching just the warning has much lower
stakes.

 It's true that the lookups wouldn't be quite so fast--they would require
 a few bisects per refname lookup (one for each level in the refname
 hierarchy) and several refname lookups (one for each ref_rev_parse_rule)
 for every dwim_ref() call, vs. a single bisect in your current design.
 But this approach it would bring us most of the gain, it might
 nevertheless be preferable.

I don't think this would be all that hard to measure. I'll see what I
can do.

  I wanted to make the ref traversal as cheap as possible, hence the
  NO_RECURSE flag I added. I thought INCLUDE_BROKEN used to not open up
  the refs at all, but it looks like it does these days. I wonder if that
  is worth changing or not.
 
 What do you mean by open up the refs?  The loose reference files are
 read when populating the cache.  (Was that ever different?)

I meant actually open the ref files and read the sha1. But that is me
being dumb. We have always done that, as we must provide the sha1 via
to the for_each_ref callback.

That being said, we could further optimize this by not opening the files
at all (and make that the responsibility of do_one_ref, which we are
avoiding here). I am slightly worried about the open() cost of my
solution. It's amortized away in a big call, but it is probably
noticeable for something like `git rev-parse 40-hex`.

 This doesn't correctly handle the rule
 
   refs/remotes/%.*s/HEAD

 We might be willing to accept this limitation, but it should at least be
 mentioned somewhere.  OTOH if we want to handle this pattern as well, we
 could do use a technique like that of shorten_unambiguous_ref().

Yes, you're right. I considered this, but for some reason convinced
myself that it would be OK to just look for refs/remotes/40-hex in
this case (which is what my patch does). But obviously that's wrong.
The ref traversal won't find a _directory_ with that name.

I'll see how painful it is to make it work. I have to say I was tempted
to simply manually write the rules. It's a duplication that could go out
of sync with the ref_rev_parse rules, and that's nasty. But that list does
not change often, and the reverse-parsing of the rules is 

Re: [PATCH] t5531: further matching fixups

2014-01-10 Thread Jeff King
On Fri, Jan 10, 2014 at 03:34:59PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  ... but the
  failing test is actually somewhat broken in 'next' already.
 
 Hmph, in what way?  I haven't seen t5531 breakage on 'next', with or
 without your series...

The test still passes, but it is not testing the right thing anymore.

On 'next', run t5531. Test 6 is push fails when commit on
multiple branches if one branch has no remote and ends with:

  test_must_fail git push --recurse-submodules=check ../pub.git

But the output ends with:

  warning: push.default is unset; its implicit value has changed in
  Git 2.0 from 'matching' to 'simple'. To squelch this message
  [...]

  fatal: The current branch branch2 has no upstream branch.
  To push the current branch and set the remote as upstream, use

  git push --set-upstream ../pub.git branch2

When not merged with b2ed944 (push: switch default from matching to
simple), or with my patch to set push.default=matching explicitly, the
output is:

  The following submodule paths contain changes that can
  not be found on any remote:
gar/bage

  Please try

  git push --recurse-submodules=on-demand

  or cd to the path and use

  git push

  to push them to a remote.

  fatal: Aborting.

which is what the test is actually trying to check. So the push fails,
as we expect, but not for the right reason.

My other series for @{publish} had a bug that caused the push to
succeed. So that series was buggy (and I posted the fix already), but we
only noticed it because this test was not working (it should not care
about upstream/triangular config at all, but it accidentally did).

Does that clarify the situation?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] remote: introduce and fill branch-pushremote

2014-01-13 Thread Jeff King
On Sun, Jan 12, 2014 at 10:41:06PM +0530, Ramkumar Ramachandra wrote:

 When a caller uses branch_get() to retrieve a struct branch, they get
 the per-branch remote name and a pointer to the remote struct. However,
 they have no way of knowing about the per-branch pushremote from this
 interface. So, let's expose that information via fields similar to
 remote and remote_name; pushremote and pushremote_name.

Makes sense. This is similar to what I posted before, but stops short of
setting branch-pushremote based on default.pushremote. I think that's
a good thing. Your patch matches branch-remote better, and the logic
for doing that fallback should probably stay outside of the struct
branch construct.

All 3 patches look like sane building blocks to me.

One comment on this hunk, though:

   } else if (!strcmp(subkey, .pushremote)) {
 + if (git_config_string(branch-pushremote_name, key, 
 value))
 + return -1;
   if (branch == current_branch)
 - if (git_config_string(pushremote_name, key, 
 value))
 - return -1;
 + pushremote_name = branch-pushremote_name;

In this code (both before and after your patch), pushremote_name does
double-duty for storing both remote.pushdefault, and the current
branch's branch.*.pushremote. I introduced an extra variable in my
version of the patch to store remote.pushdefault directly, and turned
pushremote_name into an alias (either to the current branch config, or
to the global config).

I did that for two reasons, one minor and one that I think will come up
further in the topic:

  1. After your patch pushremote_name sometimes owns its memory (if
 allocated for remote.pushdefault), and sometimes not (if an alias to
 branch.*.pushremote). This isn't a problem in the current code,
 because we never actually free() the string, meaning that if you
 set push.default twice, we leak. But that probably does not matter
 too much, and we have many such minor leaks of global config.

  2. If the current branch has a branch.*.pushremote set, but we want to
 know where a _different_ branch would be pushed, we have no way to
 access remote.pushdefault (it gets overwritten in the hunk above).

 @{upstream} does not have this problem, because it is _only_
 defined if branch.*.remote is set. There is no such thing as
 defaulting to a remote.default (or origin) there, and we never
 need to look at default_remote_name.

 For @{publish}, though, I think we will want that default. The
 common config will be to simply set remote.pushdefault, rather
 than setting up branch.*.pushremote for each branch, and we would
 want @{publish} to handle that properly.

So I think your patch is OK as-is, as the problem in (2) does not show
up until later in the series. But I suspect you will need to do
something to address it (and I think it is fine as a patch that comes
later to do that refactoring).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] remote: introduce and fill branch-pushremote

2014-01-13 Thread Jeff King
On Mon, Jan 13, 2014 at 04:52:52PM +0530, Ramkumar Ramachandra wrote:

 Not sure I understand what the problem is. Let's say we have two
 branches: master, and side with remote.pushdefault = ram,
 branch.*.remote = origin, and branch.side.pushremote = peff. Now, when
 I query master's pushremote, I get ram and when I query side's
 pushremote, I get peff; all the logic for falling-back from
 branch.*.pushremote to remote.pushdefault to branch.*.remote is in
 branch_get(), so I need to do nothing extra on the caller-side. From
 the caller's perspective, why does it matter if the pushremote of a
 particular branch is due to branch.*.pushremote or remote.pushdefault?

Imagine your HEAD is at side. What should master@{publish} produce?
I would argue ram/master. Where does ram come from in your code?

It does not matter for actually pushing, because to do a non-default
push, you must always specify a remote. But @{publish} will ask the
question even if I am on 'side' now, what would happen if I were to
default-push on 'master'?.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] remote: introduce and fill branch-pushremote

2014-01-13 Thread Jeff King
On Mon, Jan 13, 2014 at 12:15:08PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  It does not matter for actually pushing, because to do a non-default
  push, you must always specify a remote. But @{publish} will ask the
  question even if I am on 'side' now, what would happen if I were to
  default-push on 'master'?.
 
 In a similar wording to yours, it can be said that B@{upstream} is
 what would happen if I were to default-pull on 'B'?.

Right. I wondered at first if there was a similar bug in @{upstream},
but as I noted earlier, it is not defined if a per-branch remote is not
set. The answer to your question above is nothing, so we do not have
to worry about it. :)

 A related tangent is what should B@{publish} should yield when there
 is no triangular configuration variables like remote.pushdefault,
 branch.B.pushremote and a possible future extension branch.B.push
 are defined.  The definition you gave, i.e. if I were to
 default-push, gives a good guideline, I think.

Yes, that is what I tried for with my original patches. (e.g.,
push.default=upstream should just make @{publish} a synonym for
@{upstream}, which is what my patch did). I punted on simple, but it
would ideally do the same thing as push. Which is why I do not think
my patches are appropriate as-is; they need to somehow share the logic
with git push rather than try to reimplement it.

 I.e. git push origin master does tell us to push out 'master', but
 it does not explicitly say what ref to update.  It may be set to
 update their remotes/satellite/master when we are emulating a fetch
 in reverse by pushing, via e.g.
 
   [remote origin]
   push = refs/heads/master:refs/remotes/satellite/master
 
 and it would be intuitive if we make master@{publish} resolve to
 refs/remotes/satellite/master in such a case.

Right. And my patches did that (or at least I intended them to :) ) by
applying the push refspec (if any), and then applying the fetch refspec
on top of that. But again, that seems like policy that should be shared
with git push.

That being said, I do not think your example is the best one for
@{publish}. You have not specified any remote at all. I think the
closest push behavior for @{publish} would be something like:

  git checkout master  git push

I.e., where would _that_ push go?

 One thing that makes things a bit fuzzy is what should happen if
 you have more than one push destinations.  For example:
 
   [remote home]
   pushurl = ...
 push = refs/heads/master:refs/remotes/satellite/master
 
   [remote github]
   pushurl = ...
 mirror
 
   [remote]
   pushdefault = ???
 
 git push home updates their 'refs/remotes/satellite/master' with
 my 'master' with the above, while git push github will update
 their 'refs/heads/master' with 'master'.
 
 We can say master@{publish} is 'remotes/satellite/master' if
 remote.pushdefault (or 'branch.master.pushremote) is set to 'home',
 it is 'master' if it is 'github', and if git push while sitting on
 'master' does not push it anywhere then master@{publish} is an
 error.  There may be a better definition of what if I were to
 default-push really means, but I don't think of any offhand.

Exactly.  I do not think the multiple push destinations matter here,
because it is always what would I do if I were on the branch.  At most
one of them can be the default in that case (based on the config as you
noted).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test

2014-01-14 Thread Jeff King
On Fri, Jan 10, 2014 at 04:41:20AM -0500, Jeff King wrote:

 That being said, we could further optimize this by not opening the files
 at all (and make that the responsibility of do_one_ref, which we are
 avoiding here). I am slightly worried about the open() cost of my
 solution. It's amortized away in a big call, but it is probably
 noticeable for something like `git rev-parse 40-hex`.

I took a look at this. It gets a bit hairy. My strategy is to add a flag
to ask read_loose_refs to create REF_INCOMPLETE values. We currently use
this flag for loose REF_DIRs to mean we haven't opendir()'d the
subdirectory yet. This would extend it to the non-REF_DIR case to mean
we haven't opened the loose ref file yet. We'd check REF_INCOMPLETE
before handing the ref_entry to a callback, and complete it if
necessary.

It gets ugly, though, because we need to pass that flag through quite a
bit of callstack. get_ref_dir() needs to know it, which means all of
find_containing_dir, etc need it, meaning it pollutes all of the
packed-refs code paths too.

I have a half-done patch in this direction if that doesn't sound too
nasty.

  This doesn't correctly handle the rule
  
  refs/remotes/%.*s/HEAD
 [...]

 I'll see how painful it is to make it work.

It's actually reasonably painful. I thought at first we could get away
with more cleverly parsing the rule, find the prefix (up to the
placeholder), and then look for the suffix (/HEAD) inside there. But
it can never work with the current do_for_each_* code. That code only
triggers a callback when we see a concrete ref. It _never_ lets the
callbacks see an intermediate directory.

So a NO_RECURSE flag is not sufficient to handle this case. I'd need to
teach do_for_each_ref to recurse based on pathspecs, or a custom
callback function. And that is getting quite complicated.

I think it might be simpler to just do my own custom traversal. What I
need is much simpler than what do_for_each_entry provides. I don't need
recursion, and I don't actually need to look at the loose and packed
refs together. It's OK for me to do them one at a time because I don't
care about the actual value; I just want to know about which refs exist.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: check-ref-format and rev-parse can not handle branches with an @ in their name combined with @{u}

2014-01-14 Thread Jeff King
On Tue, Jan 14, 2014 at 03:45:27PM -0800, Junio C Hamano wrote:

  Yet 'git check-ref-format --branch @mybranch@{u}' 
  claims @mybranch is an invalid branch name.
 
 I do not think it claims any such thing.
 
 $ git check-ref-format --branch @foo@{u}; echo $?
 fatal: '@foo@{u}' is not a valid branch name
 128
 $ git check-ref-format --branch @foo; echo $?
 @foo
 0
 
 The former asks Is the string '@foo@{u}' a suitable name to give a
 branch? and the answer is no.  The latter asks the same question
 about the string '@foo', and the answer is yes.

Is that what --branch does? I have never used it, but the manpage
seems to suggest it is about _parsing_ (which, IMHO, means it probably
should have been an option to rev-parse, but that is another issue
altogether).

So a more interesting output than the above is:
 
  $ git checkout -t -b @mybranch origin/master
  Branch @mybranch set up to track remote branch master from origin.
  Switched to a new branch '@mybranch'

  $ git check-ref-format --branch @mybranch@{u}; echo $?
  fatal: '@mybranch@{u}' is not a valid branch name
  128

  $ git check-ref-format --branch HEAD@{u}; echo $?
  origin/master
  0

So check-ref-format --branch does understand @{u} syntax. But it
doesn't like @mybranch@{u}. You can see the same problem with rev-parse:

  $ git rev-parse --symbolic-full-name HEAD@{u}
  refs/remotes/origin/master
  $ git rev-parse --symbolic-full-name @mybranch@{u}
  @mybranch@{u}
  fatal: ambiguous argument '@mybranch@{u}': unknown revision or path
  not in the working tree.

So I do think there is a bug. The interpret_branch_name parser somehow
gets confused by the @ in the name.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: check-ref-format and rev-parse can not handle branches with an @ in their name combined with @{u}

2014-01-14 Thread Jeff King
On Tue, Jan 14, 2014 at 11:46:58PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Is that what --branch does? I have never used it, but the manpage
  seems to suggest it is about _parsing_ (which, IMHO, means it probably
  should have been an option to rev-parse, but that is another issue
  altogether).
 
 Ahh, of course you are right.  I never use it, and somehow thought
 it was just prepending refs/heads/ to its arguments, but it seems to
 want to do a lot more than that.

I am just about done with a patch series to address this, and a few
other related bugs I found. So don't look too hard; I should have
something out in a few minutes.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] interpret_branch_name bug potpourri

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 12:00:03AM -0500, Jeff King wrote:

   $ git rev-parse --symbolic-full-name HEAD@{u}
   refs/remotes/origin/master
   $ git rev-parse --symbolic-full-name @mybranch@{u}
   @mybranch@{u}
   fatal: ambiguous argument '@mybranch@{u}': unknown revision or path
   not in the working tree.
 
 So I do think there is a bug. The interpret_branch_name parser somehow
 gets confused by the @ in the name.

The somehow is because we only look for the first @, and never
consider any possible marks after that. The series below fixes it, along
with two other bugs I found while looking at this code. Ugh. Remind me
never to look at our object name parser ever again.

I feel pretty good that this is fixing real bugs and not regressing
anything else. I would not be surprised if there are other weird things
lurking, though. See the discussion in patch 4.

  [1/5]: interpret_branch_name: factor out upstream handling
  [2/5]: interpret_branch_name: rename cp variable to at
  [3/5]: interpret_branch_name: always respect namelen parameter
  [4/5]: interpret_branch_name: avoid @{upstream} past colon
  [5/5]: interpret_branch_name: find all possible @-marks

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] interpret_branch_name: factor out upstream handling

2014-01-15 Thread Jeff King
This function checks a few different @{}-constructs. The
early part checks for and dispatches us to helpers for each
construct, but the code for handling @{upstream} is inline.

Let's factor this out into its own function. This makes
interpret_branch_name more readable, and will make it much
simpler to further refactor the function in future patches.

While we're at it, let's also break apart the refactored
code into a few helper functions. These will be useful if we
eventually implement similar @{upstream}-like constructs.

Signed-off-by: Jeff King p...@peff.net
---
This is identical to the cleanup I posted recently for the @{publish}
topic, though I did tweak the commit message a bit to make more sense in
the context of this series.

 sha1_name.c | 83 ++---
 1 file changed, 52 insertions(+), 31 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index a5578f7..5db742b 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1046,6 +1046,54 @@ static int reinterpret(const char *name, int namelen, 
int len, struct strbuf *bu
return ret - used + len;
 }
 
+static void set_shortened_ref(struct strbuf *buf, const char *ref)
+{
+   char *s = shorten_unambiguous_ref(ref, 0);
+   strbuf_reset(buf);
+   strbuf_addstr(buf, s);
+   free(s);
+}
+
+static const char *get_upstream_branch(const char *branch_buf, int len)
+{
+   char *branch = xstrndup(branch_buf, len);
+   struct branch *upstream = branch_get(*branch ? branch : NULL);
+
+   /*
+* Upstream can be NULL only if branch refers to HEAD and HEAD
+* points to something different than a branch.
+*/
+   if (!upstream)
+   die(_(HEAD does not point to a branch));
+   if (!upstream-merge || !upstream-merge[0]-dst) {
+   if (!ref_exists(upstream-refname))
+   die(_(No such branch: '%s'), branch);
+   if (!upstream-merge) {
+   die(_(No upstream configured for branch '%s'),
+   upstream-name);
+   }
+   die(
+   _(Upstream branch '%s' not stored as a remote-tracking 
branch),
+   upstream-merge[0]-src);
+   }
+   free(branch);
+
+   return upstream-merge[0]-dst;
+}
+
+static int interpret_upstream_mark(const char *name, int namelen,
+  int at, struct strbuf *buf)
+{
+   int len;
+
+   len = upstream_mark(name + at, namelen - at);
+   if (!len)
+   return -1;
+
+   set_shortened_ref(buf, get_upstream_branch(name, at));
+   return len + at;
+}
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
@@ -1070,9 +1118,7 @@ static int reinterpret(const char *name, int namelen, int 
len, struct strbuf *bu
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *cp;
-   struct branch *upstream;
int len = interpret_nth_prior_checkout(name, buf);
-   int tmp_len;
 
if (!namelen)
namelen = strlen(name);
@@ -1094,36 +1140,11 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
if (len  0)
return reinterpret(name, namelen, len, buf);
 
-   tmp_len = upstream_mark(cp, namelen - (cp - name));
-   if (!tmp_len)
-   return -1;
+   len = interpret_upstream_mark(name, namelen, cp - name, buf);
+   if (len  0)
+   return len;
 
-   len = cp + tmp_len - name;
-   cp = xstrndup(name, cp - name);
-   upstream = branch_get(*cp ? cp : NULL);
-   /*
-* Upstream can be NULL only if cp refers to HEAD and HEAD
-* points to something different than a branch.
-*/
-   if (!upstream)
-   die(_(HEAD does not point to a branch));
-   if (!upstream-merge || !upstream-merge[0]-dst) {
-   if (!ref_exists(upstream-refname))
-   die(_(No such branch: '%s'), cp);
-   if (!upstream-merge) {
-   die(_(No upstream configured for branch '%s'),
-   upstream-name);
-   }
-   die(
-   _(Upstream branch '%s' not stored as a remote-tracking 
branch),
-   upstream-merge[0]-src);
-   }
-   free(cp);
-   cp = shorten_unambiguous_ref(upstream-merge[0]-dst, 0);
-   strbuf_reset(buf);
-   strbuf_addstr(buf, cp);
-   free(cp);
-   return len;
+   return -1;
 }
 
 int strbuf_branchname(struct strbuf *sb, const char *name)
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] interpret_branch_name: rename cp variable to at

2014-01-15 Thread Jeff King
In the original version of this function, cp acted as a
pointer to many different things. Since the refactoring in
the last patch, it only marks the at-sign in the string.
Let's use a more descriptive variable name.

Signed-off-by: Jeff King p...@peff.net
---
Obviously can be squashed with the prior refactoring, but I think
splitting it makes the diffs easier to read.

 sha1_name.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 5db742b..958aa2e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1117,7 +1117,7 @@ static int interpret_upstream_mark(const char *name, int 
namelen,
  */
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
-   char *cp;
+   char *at;
int len = interpret_nth_prior_checkout(name, buf);
 
if (!namelen)
@@ -1132,15 +1132,15 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
return reinterpret(name, namelen, len, buf);
}
 
-   cp = strchr(name, '@');
-   if (!cp)
+   at = strchr(name, '@');
+   if (!at)
return -1;
 
-   len = interpret_empty_at(name, namelen, cp - name, buf);
+   len = interpret_empty_at(name, namelen, at - name, buf);
if (len  0)
return reinterpret(name, namelen, len, buf);
 
-   len = interpret_upstream_mark(name, namelen, cp - name, buf);
+   len = interpret_upstream_mark(name, namelen, at - name, buf);
if (len  0)
return len;
 
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] interpret_branch_name: always respect namelen parameter

2014-01-15 Thread Jeff King
interpret_branch_name gets passed a name buffer to parse,
along with a namelen parameter representing its length. If
namelen is zero, we fallback to the NUL-terminated
string-length of name.

However, it does not necessarily follow that if we have
gotten a non-zero namelen, it is the NUL-terminated
string-length of name. E.g., when get_sha1() is parsing
foo:bar, we will be asked to operate only on the first
three characters.

Yet in interpret_branch_name and its helpers, we use string
functions like strchr() to operate on name, looking past
the length we were given.  This can result in us mis-parsing
object names.  We should instead be limiting our search to
namelen bytes.

There are three distinct types of object names this patch
addresses:

  - The intrepret_empty_at helper uses strchr to find the
next @-expression after our potential empty-at.  In an
expression like @:foo@bar, it erroneously thinks that
the second @ is relevant, even if we were asked only
to look at the first character. This case is easy to
trigger (and we test it in this patch).

  - When finding the initial @-mark for @{upstream}, we use
strchr.  This means we might treat foo:@{upstream} as
the upstream for foo:, even though we were asked only
to look at foo. We cannot test this one in practice,
because it is masked by another bug (which is fixed in
the next patch).

  - The interpret_nth_prior_checkout helper did not receive
the name length at all. This turns out not to be a
problem in practice, though, because its parsing is so
limited: it always starts from the far-left of the
string, and will not tolerate a colon (which is
currently the only way to get a smaller-than-strlen
namelen). However, it's still worth fixing to make the
code more obviously correct, and to future-proof us
against callers with more exotic buffers.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_name.c| 17 ++---
 t/t1508-at-combinations.sh | 15 ++-
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 958aa2e..6d5038d 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -430,7 +430,7 @@ static inline int upstream_mark(const char *string, int len)
 }
 
 static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned 
lookup_flags);
-static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf);
+static int interpret_nth_prior_checkout(const char *name, int namelen, struct 
strbuf *buf);
 
 static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 {
@@ -492,7 +492,7 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
struct strbuf buf = STRBUF_INIT;
int detached;
 
-   if (interpret_nth_prior_checkout(str, buf)  0) {
+   if (interpret_nth_prior_checkout(str, len, buf)  0) {
detached = (buf.len == 40  !get_sha1_hex(buf.buf, 
sha1));
strbuf_release(buf);
if (detached)
@@ -929,7 +929,8 @@ static int grab_nth_branch_switch(unsigned char *osha1, 
unsigned char *nsha1,
  * Parse @{-N} syntax, return the number of characters parsed
  * if successful; otherwise signal an error with negative value.
  */
-static int interpret_nth_prior_checkout(const char *name, struct strbuf *buf)
+static int interpret_nth_prior_checkout(const char *name, int namelen,
+   struct strbuf *buf)
 {
long nth;
int retval;
@@ -937,9 +938,11 @@ static int interpret_nth_prior_checkout(const char *name, 
struct strbuf *buf)
const char *brace;
char *num_end;
 
+   if (namelen  4)
+   return -1;
if (name[0] != '@' || name[1] != '{' || name[2] != '-')
return -1;
-   brace = strchr(name, '}');
+   brace = memchr(name, '}', namelen);
if (!brace)
return -1;
nth = strtol(name + 3, num_end, 10);
@@ -1012,7 +1015,7 @@ static int interpret_empty_at(const char *name, int 
namelen, int len, struct str
return -1;
 
/* make sure it's a single @, or @@{.*}, not @foo */
-   next = strchr(name + len + 1, '@');
+   next = memchr(name + len + 1, '@', namelen - len - 1);
if (next  next[1] != '{')
return -1;
if (!next)
@@ -1118,7 +1121,7 @@ static int interpret_upstream_mark(const char *name, int 
namelen,
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *at;
-   int len = interpret_nth_prior_checkout(name, buf);
+   int len = interpret_nth_prior_checkout(name, namelen, buf);
 
if (!namelen)
namelen = strlen(name);
@@ -1132,7 +1135,7 @@ int interpret_branch_name(const char *name, int namelen, 
struct strbuf *buf)
return reinterpret(name, namelen, len, buf

[PATCH 4/5] interpret_branch_name: avoid @{upstream} past colon

2014-01-15 Thread Jeff King
get_sha1() cannot currently parse a valid object name like
HEAD:@{upstream} (assuming that such an oddly named file
exists in the HEAD commit). It takes two passes to parse the
string:

  1. It first considers the whole thing as a ref, which
 results in looking for the upstream of HEAD:.

  2. It finds the colon, parses HEAD as a tree-ish, and then
 finds the path @{upstream} in the tree.

For a path that looks like a normal reflog (e.g.,
HEAD:@{yesterday}), the first pass is a no-op. We try to
dwim_ref(HEAD:), that returns zero refs, and we proceed
with colon-parsing.

For HEAD:@{upstream}, though, the first pass ends up in
interpret_upstream_mark, which tries to find the branch
HEAD:. When it sees that the branch does not exist, it
actually dies rather than returning an error to the caller.
As a result, we never make it to the second pass.

One obvious way of fixing this would be to teach
interpret_upstream_mark to simply report no, this isn't an
upstream in such a case. However, that would make the
error-reporting for legitimate upstream cases significantly
worse. Something like bogus@{upstream} would simply report
unknown revision: bogus@{upstream}, while the current code
diagnoses a wide variety of possible misconfigurations (no
such branch, branch exists but does not have upstream, etc).

However, we can take advantage of the fact that a branch
name cannot contain a colon. Therefore even if we find an
upstream mark, any prefix with a colon must mean that
the upstream mark we found is actually a pathname, and
should be disregarded completely. This patch implements that
logic.

Signed-off-by: Jeff King p...@peff.net
---
I think this would actually be cleaner if get_sha1() simply did the
colon-parsing first, and omitted the first pass completely. Then
sub-functions would not have to care about arbitrary junk that can come
in paths; they would always be parsing just the revision-specifier.

However, given the way this code has developed over the years and its
general fragility, I would be entirely unsurprised if there is some case
that relies on the current scheme. So I went with the simple fix here,
which should be much less likely to have any fallout. And I could not
come up with an example that is actually broken under the current code
(we just do some suboptimal things, like looking for foo:bar as a ref
in the filesystem, even though it is syntactically bogus).

 sha1_name.c   |  3 +++
 t/t1507-rev-parse-upstream.sh | 16 
 2 files changed, 19 insertions(+)

diff --git a/sha1_name.c b/sha1_name.c
index 6d5038d..b253a88 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1093,6 +1093,9 @@ static int interpret_upstream_mark(const char *name, int 
namelen,
if (!len)
return -1;
 
+   if (memchr(name, ':', at))
+   return -1;
+
set_shortened_ref(buf, get_upstream_branch(name, at));
return len + at;
 }
diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh
index 2a19e79..cace1ca 100755
--- a/t/t1507-rev-parse-upstream.sh
+++ b/t/t1507-rev-parse-upstream.sh
@@ -210,4 +210,20 @@ test_expect_success 'log -g other@{u}@{now}' '
test_cmp expect actual
 '
 
+test_expect_success '@{reflog}-parsing does not look beyond colon' '
+   echo content @{yesterday} 
+   git add @{yesterday} 
+   git commit -m funny reflog file 
+   git hash-object @{yesterday} expect 
+   git rev-parse HEAD:@{yesterday} actual
+'
+
+test_expect_success '@{upstream}-parsing does not look beyond colon' '
+   echo content @{upstream} 
+   git add @{upstream} 
+   git commit -m funny upstream file 
+   git hash-object @{upstream} expect 
+   git rev-parse HEAD:@{upstream} actual
+'
+
 test_done
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] interpret_branch_name: find all possible @-marks

2014-01-15 Thread Jeff King
When we parse a string like foo@{upstream}, we look for
the first @-sign, and check to see if it is an upstream
mark. However, since branch names can contain an @, we may
also see @foo@{upstream}. In this case, we check only the
first @, and ignore the second. As a result, we do not find
the upstream.

We can solve this by iterating through all @-marks in the
string, and seeing if any is a legitimate upstream or
empty-at mark.

Another strategy would be to parse from the right-hand side
of the string. However, that does not work for the
empty_at case, which allows @@{upstream}. We need to
find the left-most one in this case (and we then recurse as
HEAD@{upstream}).

Signed-off-by: Jeff King p...@peff.net
---
And this one actually fixes Keith's bug.

The diff is noisy due to indentation changes; try it with -b for
increased reading pleasure.

 sha1_name.c   | 20 +++-
 t/t1507-rev-parse-upstream.sh | 21 +
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index b253a88..6fca869 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1124,6 +1124,7 @@ static int interpret_upstream_mark(const char *name, int 
namelen,
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *at;
+   const char *start;
int len = interpret_nth_prior_checkout(name, namelen, buf);
 
if (!namelen)
@@ -1138,17 +1139,18 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
return reinterpret(name, namelen, len, buf);
}
 
-   at = memchr(name, '@', namelen);
-   if (!at)
-   return -1;
+   for (start = name;
+(at = memchr(start, '@', namelen - (start - name)));
+start = at + 1) {
 
-   len = interpret_empty_at(name, namelen, at - name, buf);
-   if (len  0)
-   return reinterpret(name, namelen, len, buf);
+   len = interpret_empty_at(name, namelen, at - name, buf);
+   if (len  0)
+   return reinterpret(name, namelen, len, buf);
 
-   len = interpret_upstream_mark(name, namelen, at - name, buf);
-   if (len  0)
-   return len;
+   len = interpret_upstream_mark(name, namelen, at - name, buf);
+   if (len  0)
+   return len;
+   }
 
return -1;
 }
diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh
index cace1ca..178694e 100755
--- a/t/t1507-rev-parse-upstream.sh
+++ b/t/t1507-rev-parse-upstream.sh
@@ -17,6 +17,9 @@ test_expect_success 'setup' '
 test_commit 4 
 git branch --track my-side origin/side 
 git branch --track local-master master 
+git branch --track fun@ny origin/side 
+git branch --track @funny origin/side 
+git branch --track funny@ origin/side 
 git remote add -t master master-only .. 
 git fetch master-only 
 git branch bad-upstream 
@@ -54,6 +57,24 @@ test_expect_success 'my-side@{upstream} resolves to correct 
full name' '
test refs/remotes/origin/side = $(full_name my-side@{u})
 '
 
+test_expect_success 'upstream of branch with @ in middle' '
+   full_name fun@ny@{u} actual 
+   echo refs/remotes/origin/side expect 
+   test_cmp expect actual
+'
+
+test_expect_success 'upstream of branch with @ at start' '
+   full_name @funny@{u} actual 
+   echo refs/remotes/origin/side expect 
+   test_cmp expect actual
+'
+
+test_expect_success 'upstream of branch with @ at end' '
+   full_name funny@@{u} actual 
+   echo refs/remotes/origin/side expect 
+   test_cmp expect actual
+'
+
 test_expect_success 'refs/heads/my-side@{upstream} does not resolve to 
my-side{upstream}' '
test_must_fail full_name refs/heads/my-side@{upstream}
 '
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After stash pop, refs/stash become 40 zeroes

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 01:56:52PM +0800, 乙酸鋰 wrote:

 what are the possible causes of this?
 After stash pop, refs/stash becomes 40 zeroes.
 This is the only stash, so refs/stash should be deleted after pop.
 However, it becomes 40 zeroes.
 
 git 1.8.x

I can't reproduce the problem here. Running this:

  git init -q repo  cd repo
  echo content file  git add file  git commit -qm file
  echo more file

  echo == stashed
  git stash -q
  ls -l .git/refs/stash .git/logs/refs/stash
  git rev-parse --verify refs/stash

  echo == popped
  git stash pop -q
  ls -l .git/refs/stash .git/logs/refs/stash
  git rev-parse --verify refs/stash

yields:

  == stashed
  -rw-r--r-- 1 peff peff 153 Jan 15 03:49 .git/logs/refs/stash
  -rw-r--r-- 1 peff peff  41 Jan 15 03:49 .git/refs/stash
  7fb40812ac4aaf7fa31e584863fc52c4b4a67aa0
  == popped
  ls: cannot access .git/refs/stash: No such file or directory
  ls: cannot access .git/logs/refs/stash: No such file or directory
  fatal: Needed a single revision

Does running it reproduce the problem for you? If it does, can you be
more specific about your git version?  If it does not, can you show what
you are doing differently to reproduce?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Diagnosing stray/stale .keep files -- explore what is in a pack?

2014-01-15 Thread Jeff King
On Tue, Jan 14, 2014 at 02:42:09PM -0500, Martin Langhoff wrote:

  On Tue, Jan 14, 2014 at 2:36 PM, Martin Fick mf...@codeaurora.org wrote:
  Perhaps the receiving process is dying hard and leaving
  stuff behind?  Out-of-memory, out of disk space?
 
 Yes, that's my guess as well. This server had gc misconfigured, so it
 hit ENOSPC a few weeks ago.
 
 It is likely that the .lock files were left behind back then, and
 since then the clients pushing to these refs were transferring their
 whole history and still failing to update the ref, leading to rapid
 repo growth.

We see these occasionally at GitHub, too. I haven't yet figured out a
definite cause, though whatever it is, it's relatively rare.

I think the .keep files and the .lock files are in two separate
boats, though.

pack-objects creates the .keep files as a lock between the time it
moves them into place and when receive-pack updates the refs (so that a
simultaneous prune does not think they should be removed). Receive-pack
then updates the refs and removes the .keep file. However, in the
interim code, we are just updating the refs, and are careful to return
any errors rather than calling die() (so if ENOSPC prevented ref write,
that would not cause this). So for us to leave a .keep there, it is
probably one of:

  1. A few generic library functions, like xmalloc, can cause us to die.
 This should be very rare, though.

  2. We tried to unlink the keep-file, but couldn't (could ENOSPC
 prevent a deletion? I suspect it depends on the filesystem).

  3. We were killed by signal (or system crash).

Fetch-pack also will create .keep files, and it is much less careful
during the time the file exists.  However, busy servers tend to be
receiving pushes, not initiating fetches.

Actual .lock files are added to a signal/atexit handle that cleans
them up automatically on program exit. So those really should be caused
by system crash (or kill -9), and that has generally been our
experience at GitHub. But again, if ENOSPC could prevent deletion on
your filesystem, it could be related. But there is not much git can do
to clean up if unlink() fails us.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hooks scripts and noexec partition

2014-01-15 Thread Jeff King
On Tue, Jan 14, 2014 at 04:41:03PM +0100, krz...@gmail.com  wrote:

 git can't execute hooks no partitions mounted with noexec - even if
 those are just scripts with shebang line

Right. Git does not know that they are shell (or other) scripts; they
could be anything, and the advertised interface is that git will run
exec on them (and it is explicitly OK for them to exist but not be
executable, and git takes this as a sign that they are inactive).

 and they actualy work by
 hooks/./post-comit (because I use small patch on kernel that allows
 running scripts that way on noexec partition)

If you are suggesting that git always execute them as hooks/./$hook,
that might make sense if such behavior is widespread. But it sounds like
you are running a custom kernel patch to get around the noexec setting.
Here is the custom git patch to match it. :)

diff --git a/run-command.c b/run-command.c
index 3914d9c..ae84e87 100644
--- a/run-command.c
+++ b/run-command.c
@@ -753,7 +753,7 @@ int finish_async(struct async *async)
 
 char *find_hook(const char *name)
 {
-   char *path = git_path(hooks/%s, name);
+   char *path = git_path(hooks/./%s, name);
if (access(path, X_OK)  0)
path = NULL;
 

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-log --cherry-pick gives different results when using tag or tag^{}

2014-01-15 Thread Jeff King
[+cc Junio, as the bug blames to him]

On Fri, Jan 10, 2014 at 02:15:40PM +0100, Francis Moreau wrote:

 In mykernel repository, I'm having 2 different behaviours with git-log
 but I don't understand why:
 
 Doing:
 
 $ git log --oneline --cherry-pick --left-right v3.4.71-1^{}...next
 
 and
 
 $ git log --oneline --cherry-pick --left-right v3.4.71-1...next
 
 give something different (where v3.4.71-1 is a tag).
 
 The command using ^{} looks the one that gives correct result I think.

Yeah, this looks like a bug. Here's a simple reproduction recipe:

  commit() {
echo content $1 
git add $1 
git commit -m $1
  }

  git init repo  cd repo 
  commit one 
  commit two 
  sleep 1 
  git tag -m foo mytag 
  git checkout -b side HEAD^ 
  git cherry-pick mytag 
  commit three

The sleep seems to be necessary, to give the commit and its
cherry-picked version different commit times (presumably because it
impacts the order in which we visit them during the traversal).

Running:

  git log --oneline --decorate --cherry-pick --left-right mytag^{}...HEAD

produces the expected:

   e36cc32 (HEAD, side) three

but running it with the tag, as:

  git log --oneline --decorate --cherry-pick --left-right mytag...HEAD

yields:

   e36cc32 (HEAD, side) three
   5e96f7d two
   db92fca (tag: mytag, master) two

Not only do we get the cherry-pick wrong (we should omit both twos),
but we seem to erroneously count the tagged two as being on the
right-hand side, which it clearly is not (and which is probably why we
don't find the match via --cherry-pick).

This worked in v1.8.4, but is broken in v1.8.5. It bisects to Junio's
895c5ba (revision: do not peel tags used in range notation, 2013-09-19),
which sounds promising.

I think what is happening is that we used to apply the SYMMETRIC_LEFT
flag directly to the commit. Now we apply it to the tag, and it does not
seem to get propagated. The patch below fixes it for me, but I have no
idea if we actually need to be setting the other flags, or just
SYMMETRIC_LEFT. I also wonder if the non-symmetric two-dot case needs to
access any pointed-to commit and propagate flags in a similar way.

diff --git a/revision.c b/revision.c
index 7010aff..1d99bfc 100644
--- a/revision.c
+++ b/revision.c
@@ -1197,6 +1197,8 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
free_commit_list(exclude);
 
a_flags = flags | SYMMETRIC_LEFT;
+   a-object.flags |= a_flags;
+   b-object.flags |= flags;
}
 
a_obj-flags |= a_flags;

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report: stash in upstream caused remote fetch to fail

2014-01-15 Thread Jeff King
On Mon, Jan 06, 2014 at 07:19:43PM -0800, Junio C Hamano wrote:

 Actually, I think the above recollection of mine was completely
 bogus.  The  is there because we do allow things like HEAD (they
 are the funny ones) as well as things inside refs/, and the latter
 is the only thing we had a check-ref-format to dictate the format
 when the code was written.
 
 I do not mind tightening things a bit (e.g. outside refs/, only
 allow HEAD and nothing else).  A good first step might be to enforce
 allow-onelevel outside refs/ (so that we can allow HEAD) and for
 those inside refs/ check without allow-onelevel but without skipping
 the prefix.
 
 It is a separate story if it makes much sense to allow fetching
 refs/stash or ignoring when running git clone.  Operationally, I
 think it makes more sense to ignore refs/stash, not because it is a
 one-level name, but because what a stash is.

This discussion stalled, but I finally got around to looking at it
today. I agree that we should leave aside more complex policy for now,
and start with bringing the fetch and fetch-pack filters into
harmony. That turns off format-checking for things outside refs/ (so
allows HEAD), and checks the whole string for things inside refs/
(so it does not fall afoul of the one-level check).

I ended up with the patch below, which converts fetch-pack to match
fetch. However, reading the original one-level check in 03feddd
(git-check-ref-format: reject funny ref names., 2005-10-13), it does
seem like it was meant to reject refs/foo. It contains:

if (level  2)
return -1; /* at least of form heads/blah */

It seems like the meaning of this check changed over the years. I am not
sure if that was intentional or not. :)

So we could go the other direction, and harmonize on disallowing
refs/foo. I don't particularly care that much. The nasty thing now is
the mismatch between the two spots, which means that fetch doesn't
just ignore the ref, but bombs out with a missing object.


-- 8 --
Subject: fetch-pack: do not filter out one-level refs

Currently fetching a one-level ref like refs/foo does not
work consistently. The outer git fetch program filters the
list of refs, checking each against check_refname_format.
Then it feeds the result to do_fetch_pack to actually
negotiate the haves/wants and get the pack. The fetch-pack
code does its own filter, and it behaves differently.

The fetch-pack filter looks for refs in refs/, and then
feeds everything _after_ the slash (i.e., just foo) into
check_refname_format.  But check_refname_format is not
designed to look at a partial refname. It complains that the
ref has only one component, thinking it is at the root
(i.e., alongside HEAD), when in reality we just fed it a
partial refname.

As a result, we omit a ref like refs/foo from the pack
request, even though git fetch then tries to store the
resulting ref.  If we happen to get the object anyway (e.g.,
because the ref is contained in another ref we are
fetching), then the fetch succeeds. But if it is a unique
object, we fail when trying to update refs/foo.

We can fix this by just passing the whole refname into
check_refname_format; we know the part we were omitting is
refs/, which is acceptable in a refname. This at least
makes the checks consistent with each other.

This problem happens most commonly with refs/stash, which
is the only one-level ref in wide use. However, our test
does not use refs/stash, as we may later want to restrict
it specifically (not because it is one-level, but because
of the semantics of stashes).

We may also want to do away with the multiple levels of
filtering (which can cause problems when they are out of
sync), or even forbid one-level refs entirely. However,
those decisions can come later; this fixes the most
immediate problem, which is the mismatch between the two.

Signed-off-by: Jeff King p...@peff.net
---
 fetch-pack.c |  2 +-
 t/t5510-fetch.sh | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 760ed16..b28ccd1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -505,7 +505,7 @@ static void filter_refs(struct fetch_pack_args *args,
next = ref-next;
 
if (!memcmp(ref-name, refs/, 5) 
-   check_refname_format(ref-name + 5, 0))
+   check_refname_format(ref-name, 0))
; /* trash */
else {
while (i  nr_sought) {
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 12674ac..ab28594 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -640,4 +640,15 @@ test_expect_success 'branchname D/F conflict resolved by 
--prune' '
test_cmp expect actual
 '
 
+test_expect_success 'fetching a one-level ref works' '
+   test_commit extra 
+   git reset --hard HEAD^ 
+   git update-ref refs/foo extra 
+   git init one-level 
+   (
+   cd one-level 
+   git fetch

Re: Bug report: stash in upstream caused remote fetch to fail

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 05:46:13AM -0500, Jeff King wrote:

 This discussion stalled, but I finally got around to looking at it
 today. I agree that we should leave aside more complex policy for now,
 and start with bringing the fetch and fetch-pack filters into
 harmony. That turns off format-checking for things outside refs/ (so
 allows HEAD), and checks the whole string for things inside refs/
 (so it does not fall afoul of the one-level check).

By the way, an interesting implication of this is that I do not think
there is any format check on things outside of refs/. If you were to do

  git fetch ... +*:*

you would write whatever crap the other side gives you. I can't imagine
any reason a client would _ever_ do that, though, so I don't think it's
a big deal. We tend to fetch HEAD explicitly by name, and that's it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] read-cache: connect to file watcher

2014-01-15 Thread Jeff King
On Sun, Jan 12, 2014 at 06:03:39PM +0700, Nguyễn Thái Ngọc Duy wrote:

 This patch establishes a connection between a new file watcher daemon
 and git. Each index file may have at most one file watcher attached to
 it. The file watcher maintains a UNIX socket at
 $GIT_DIR/index.watcher. Any process that has write access to $GIT_DIR
 can talk to the file watcher.

IIRC, this is not portable. Some systems (not Linux) will allow anyone
to connect to the socket if it the file is accessible to them (so
anybody with read access to $GIT_DIR can talk to the file watcher). The
usual trick is to put it in a sub-directory that only the connectors can
access (e.g., put it in $GIT_DIR/watcher/index, and create watcher
mode 0700).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Consistency question

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 11:37:08AM +0100, David Kastrup wrote:

 The question is what guarantees I have with regard to the commit date of
 a commit in relation to that of its parent commits:
 
 a) none
 b) commitdate(child) = commitdate(parent)
 c) commitdate(child)  commitdate(parent)

a) none

 Obviously, I can rely on c) being true almost always:

Actually, b) is quite often the case in automated processes (e.g., git
am or git rebase). The author dates are different, but the committer
dates may be in the same second.

And of course a) is the result of clock skew and software bugs.

 it's definitely
 good for a heuristic used for improving performance (meaning as an
 ordering criterion for a commit priority queue).  The problem is how
 much I should cater for graceful behavior for the cases where it's not.

Yes, this is exactly how git uses it. We generally walk breadth-first
through the graph, relying on commit times for performance but not
correctness.

There are some parts of the code that will behave badly with clock skew.
For example, --since will stop traversing when we hit a certain point.
It requires a fixed number of too old commits before quitting, though,
in an attempt to bypass small runs of skewed clocks.

The git describe --contains algorithm will also produce wrong results
in the face of skew. I believe it uses a slop of 24 hours in its skew.

The current tag --contains algorithm is currently correct in the face
of skew, but it can go much faster if you accept that skew will cause it
to go wrong. I suspect there are other algorithms that could be sped up,
too, if we had trustworthy generation numbers (I implemented and timed
the --contains algorithm, but haven't done so for other algorithms).

I've played with calculating and caching the generation numbers at
repack time, but there aren't any patches currently under consideration.

 Does git do any actual checks before pushing?

No. One problem with checking the commit relationships is that it may
not be the new commit which is broken (by skewing backwards), but rather
the commit it builds on (which has skewed forwards). If you are building
on such a fast-forward commit, it is often to late to fix that commit.
So you have to fast-forward yourself, propagating the bogus value.

If the receiving machine checked the incoming commits against its own
internal clock, that could work. You would still want to check the
commit relationships to catch new commits that erroneously claim to be
from the past, though (that is, before their parents).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] do not discard revindex when re-preparing packfiles

2014-01-15 Thread Jeff King
When an object lookup fails, we re-read the objects/pack
directory to pick up any new packfiles that may have been
created since our last read. We also discard any pack
revindex structs we've allocated.

The discarding is a problem for the pack-bitmap code, which keeps
a pointer to the revindex for the bitmapped pack. After the
discard, the pointer is invalid, and we may read free()d
memory.

Other revindex users do not keep a bare pointer to the
revindex; instead, they always access it through
revindex_for_pack(), which lazily builds the revindex. So
one solution is to teach the pack-bitmap code a similar
trick. It would be slightly less efficient, but probably not
all that noticeable.

However, it turns out this discarding is not actually
necessary. When we call reprepare_packed_git, we do not
throw away our old pack list. We keep the existing entries,
and only add in new ones. So there is no safety problem; we
will still have the pack struct that matches each revindex.
The packfile itself may go away, of course, but we are
already prepared to handle that, and it may happen outside
of reprepare_packed_git anyway.

Throwing away the revindex may save some RAM if the pack
never gets reused (about 12 bytes per object). But it also
wastes some CPU time (to regenerate the index) if the pack
does get reused. It's hard to say which is more valuable,
but in either case, it happens very rarely (only when we
race with a simultaneous repack). Just leaving the revindex
in place is simple and safe both for current and future
code.

Signed-off-by: Jeff King p...@peff.net
---
On top of jk/pack-bitmap.

 pack-revindex.c | 11 ---
 pack-revindex.h |  1 -
 sha1_file.c |  1 -
 3 files changed, 13 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 0bb13b1..5bd7c61 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -245,14 +245,3 @@ struct revindex_entry *find_pack_revindex(struct 
packed_git *p, off_t ofs)
 
return pridx-revindex + pos;
 }
-
-void discard_revindex(void)
-{
-   if (pack_revindex_hashsz) {
-   int i;
-   for (i = 0; i  pack_revindex_hashsz; i++)
-   free(pack_revindex[i].revindex);
-   free(pack_revindex);
-   pack_revindex_hashsz = 0;
-   }
-}
diff --git a/pack-revindex.h b/pack-revindex.h
index 866ca9c..d737f98 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -15,6 +15,5 @@ struct pack_revindex *revindex_for_pack(struct packed_git *p);
 int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
 
 struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
-void discard_revindex(void);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index df89b57..45f9bb4 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1279,7 +1279,6 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
-   discard_revindex();
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-log --cherry-pick gives different results when using tag or tag^{}

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 11:57:39AM -0800, Junio C Hamano wrote:

 Where do we pass down other flags from tags to commits?  For
 example, if we do this:
 
   $ git log ^v1.8.5 master
 
 we mark v1.8.5 tag as UNINTERESTING, and throw that tag (not commit
 v1.8.5^0) into revs-pending.objects[].  We do the same for 'master',
 which is a commit.
 
 Later, in prepare_revision_walk(), we call handle_commit() on them,
 and unwrap the tag v1.8.5 to get v1.8.5^0, and then handles that
 commit object with flags obtained from the tag object.  This code
 only cares about UNINTERESTING and manually propagates it.

Thanks for picking up this line of thought. I had some notion that the
right solution would be in propagating the flags later from the pending
tags to the commits, but I didn't quite know where to look. Knowing that
we explicitly propagate UNINTERESTING but nothing else makes what I was
seeing make a lot more sense.

 Perhaps that code needs to propagate at least SYMMETRIC_LEFT down to
 the commit object as well, no?  With your patch, the topmost level
 of tag object and the eventual commit object are marked with the
 flag, but if we were dealing with a tag that points at another tag
 that in turn points at a commit, the intermediate tag will not be
 marked with SYMMETRIC_LEFT (nor UNINTERESTING for that matter),
 which may not affect the final outcome, but it somewhat feels wrong.

Agreed. I think the lack of flags on intermediate tags has always been
that way, even before 895c5ba, and I do not know of any case where it
currently matters. But it seems like the obvious right thing to mark
those intermediate tags.

 How about doing it this way instead (totally untested, though)?

Makes sense. It also means we will propagate flags down to any
pointed-to trees and blobs. I can't think of a case where that will
matter either (and they cannot be SYMMETRIC_LEFT, as that only makes
sense for commit objects).

I do notice that when we have a tree, we explicitly propagate
UNINTERESTING to the rest of the tree. Should we be propagating all
flags instead? Again, I can't think of a reason to do so (and if it is
not UNINTERESTING, it is a non-trivial amount of time to mark all paths
in the tree).


 @@ -287,7 +288,6 @@ static struct commit *handle_commit(struct rev_info *revs,
   if (parse_commit(commit)  0)
   die(unable to parse commit %s, name);
   if (flags  UNINTERESTING) {
 - commit-object.flags |= UNINTERESTING;
   mark_parents_uninteresting(commit);
   revs-limited = 1;
   }

We don't need to propagate the UNINTERESTING flag here, because either:

  - object pointed to the commit, in which case flags comes from
object-flags, and we already have it set

or

  - object was a tag, and we propagated the flags as we peeled (from
your earlier hunk)

Makes sense. I think the mark_blob_uninteresting call later in the
function is now irrelevant for the same reasons. The
mark_tree_uninteresting call is not, though, because it recurses.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: revision: propagate flag bits from tags to pointees

2014-01-15 Thread Jeff King
On Wed, Jan 15, 2014 at 12:26:13PM -0800, Junio C Hamano wrote:

 With the previous fix 895c5ba3 (revision: do not peel tags used in
 range notation, 2013-09-19), handle_revision_arg() that processes
 command line arguments for the git log family of commands no
 longer directly places the object pointed by the tag in the pending
 object array when it sees a tag object.  We used to place pointee
 there after copying the flag bits like UNINTERESTING and
 SYMMETRIC_LEFT.
 
 This change meant that any flag that is relevant to later history
 traversal must now be propagated to the pointed objects (most often
 these are commits) while starting the traversal, which is partly
 done by handle_commit() that is called from prepare_revision_walk().
 We did propagate UNINTERESTING, but did not do so for others, most
 notably SYMMETRIC_LEFT.  This caused git log --left-right v1.0...
 (where v1.0 is a tag) to start losing the leftness from the
 commit the tag points at.
 
 Signed-off-by: Junio C Hamano gits...@pobox.com
 ---

Looks good to me. As per my previous mail, I _think_ you could squash
in:

diff --git a/revision.c b/revision.c
index f786b51..2db906c 100644
--- a/revision.c
+++ b/revision.c
@@ -316,13 +316,10 @@ static struct commit *handle_commit(struct rev_info *revs,
 * Blob object? You know the drill by now..
 */
if (object-type == OBJ_BLOB) {
-   struct blob *blob = (struct blob *)object;
if (!revs-blob_objects)
return NULL;
-   if (flags  UNINTERESTING) {
-   mark_blob_uninteresting(blob);
+   if (flags  UNINTERESTING)
return NULL;
-   }
add_pending_object(revs, object, );
return NULL;
}

but that is not very much code reduction (and mark_blob_uninteresting is
very cheap). So it may not be worth the risk that my analysis is wrong.
:)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git quietly fails on https:// URL, https errors are never reported to user

2014-01-16 Thread Jeff King
On Thu, Jan 16, 2014 at 04:27:03AM -0800, Yuri wrote:

 While debugging, I find that remove_junk() deletes all directories
 from under __cxa_finalize.
 Before this, exit(128) is called from recvline_fh (Debug: Remote
 helper quit.) And this function in turn is called from under
 refs = transport_get_remote_refs(transport);
 
 I think you need to make sure that any https errors (in this and
 other locations) are reported to the user, and git never quits on
 error without saying what the error is.

We used to print Reading from helper 'git-remote-https' failed in this
instance. But in the majority of cases, remote-https has printed a
useful message already to stderr, and the extra line just confused
people. The downside, as you noticed, is that when the helper dies
without printing an error, the user is left with no message.

Unfortunately, detecting whether the helper printed a useful message is
non-trivial. It's possible we could do more detection based on the
helper's death (e.g., if it died by signal, print a message) and at
least say _something_.

But even if we do so, the message isn't going to tell you what went
wrong, only that something unexpected happened.  It is up to the helper
to print something useful, and if it didn't, it should be fixed.  So the
most important bug to fix here, IMHO, is figuring out why
git-remote-https died without printing an error message.

Is it possible to strace (or truss) the helper process? What it was
doing when it died may be helpful. Rather than picking through strace
-f output, you can use this hack to trace just the helper process:

  cat /in/your/$PATH/git-remote-strace \EOF
  #!/bin/sh
  protocol=$(echo $2 | cut -d: -f1)
  exec strace git-remote-$protocol $@
  EOF

  git clone strace::https://github.com/your/repo.git

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] diff_filespec cleanups and optimizations

2014-01-16 Thread Jeff King
I recently came across a repository with a commit containing 100 million
paths in its tree. Cleverly, the whole repo fits into a 1.5K packfile
(can you guess how it was done?). Not so cleverly, running diff-tree
--root on that commit uses a large amount of memory. :)

I do not think it is worth optimizing for such a pathological
repository. But I was curious how much it would want (it OOM'd on my
64-bit 16G machine). The answer is roughly:

   100,000,000 * (
  8 bytes per pointer to diff_filepair in the diff_queue
+ 32 bytes per diff_filepair struct
+  2 * (
 96 bytes per diff_filespec struct
   + 12 bytes per filename (in this case)
 )
  )

which is about 25G. Plus malloc overhead. So obviously this example is
unreasonable. A more reasonable large case is something like WebKit at
~150K files, doing a diff against the empty tree. That's only 37M.

But while looking at it, I noticed a bunch of cleanups for
diff_filespec.  With the patches below, sizeof(struct diff_filespec) on
my 64-bit machine goes from 96 bytes down to 80. Compiling with -m32
goes from 64 bytes down to 52.

The first few patches have cleanup value aside from the struct size
improvement. The last two are pure optimization. I doubt the
optimization is noticeable for any real-life cases, so I don't mind if
they get dropped. But they're quite trivial and obvious.

  [1/5]: diff_filespec: reorder dirty_submodule macro definitions
  [2/5]: diff_filespec: drop funcname_pattern_ident field
  [3/5]: diff_filespec: drop xfrm_flags field
  [4/5]: diff_filespec: reorder is_binary field
  [5/5]: diff_filespec: use only 2 bits for is_binary flag

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] diff_filespec: reorder dirty_submodule macro definitions

2014-01-16 Thread Jeff King
diff_filespec has a 2-bit dirty_submodule field and
defines two flags as macros. Originally these were right
next to each other, but a new field was accidentally added
in between in commit 4682d85. This patch puts the field and
its flags back together.

Using an enum like:

  enum {
  DIRTY_SUBMODULE_UNTRACKED = 1,
  DIRTY_SUBMODULE_MODIFIED = 2
  } dirty_submodule;

would be more obvious, but it bloats the structure. Limiting
the enum size like:

  } dirty_submodule : 2;

might work, but it is not portable.

Signed-off-by: Jeff King p...@peff.net
---
 diffcore.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffcore.h b/diffcore.h
index 1c16c85..f822f9e 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -43,9 +43,9 @@ struct diff_filespec {
unsigned should_free : 1; /* data should be free()'ed */
unsigned should_munmap : 1; /* data should be munmap()'ed */
unsigned dirty_submodule : 2;  /* For submodules: its work tree is 
dirty */
-   unsigned is_stdin : 1;
 #define DIRTY_SUBMODULE_UNTRACKED 1
 #define DIRTY_SUBMODULE_MODIFIED  2
+   unsigned is_stdin : 1;
unsigned has_more_entries : 1; /* only appear in combined diff */
struct userdiff_driver *driver;
/* data should be considered binary; -1 means don't know yet */
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] diff_filespec: drop xfrm_flags field

2014-01-16 Thread Jeff King
The only mention of this field in the code is by some
debugging code which prints it out (and it will always be
zero, since we never touch it otherwise). It was obsoleted
very early on by 25d5ea4 ([PATCH] Redo rename/copy detection
logic., 2005-05-24).

Signed-off-by: Jeff King p...@peff.net
---
No savings here on 64-bit, since this ended up going to padding, but it
is what makes the next patch work.

 diff.c | 4 ++--
 diffcore.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 6b4cd0e..8e4a6a9 100644
--- a/diff.c
+++ b/diff.c
@@ -4139,9 +4139,9 @@ void diff_debug_filespec(struct diff_filespec *s, int x, 
const char *one)
DIFF_FILE_VALID(s) ? valid : invalid,
s-mode,
s-sha1_valid ? sha1_to_hex(s-sha1) : );
-   fprintf(stderr, queue[%d] %s size %lu flags %d\n,
+   fprintf(stderr, queue[%d] %s size %lu\n,
x, one ? one : ,
-   s-size, s-xfrm_flags);
+   s-size);
 }
 
 void diff_debug_filepair(const struct diff_filepair *p, int i)
diff --git a/diffcore.h b/diffcore.h
index 92e4d48..22993e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -31,7 +31,6 @@ struct diff_filespec {
void *cnt_data;
unsigned long size;
int count;   /* Reference count */
-   int xfrm_flags;  /* for use by the xfrm */
int rename_used; /* Count of rename users */
unsigned short mode; /* file mode */
unsigned sha1_valid : 1; /* if true, use sha1 and trust mode;
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] diff_filespec: reorder is_binary field

2014-01-16 Thread Jeff King
The middle of the diff_filespec struct contains a mixture of
ints, shorts, and bit-fields, followed by a pointer. On an
x86-64 system with an LP64 or LLP64 data model (i.e., most
of them), the integers and flags end up being padded out by
41 bits to put the pointer at an 8-byte boundary.

After the pointer, we have the int is_binary field, which
is only 32 bits. We end up wasting another 32 bits to pad
the struct size up to a multiple of 64 bits.

We can move the is_binary field before the pointer, which
lets the compiler store it where we used to have padding.
This shrinks the top padding to only 9 bits (from the
bit-fields), and eliminates the bottom padding entirely,
dropping the struct size from 88 to 80 bytes.

On a 32-bit system, there is no benefit, but nor should
there be any harm (we only need 4-byte alignment there, so
we were already using only 9 bits of padding).

Signed-off-by: Jeff King p...@peff.net
---

 diffcore.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffcore.h b/diffcore.h
index 22993e1..d911bf0 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -45,9 +45,9 @@ struct diff_filespec {
 #define DIRTY_SUBMODULE_MODIFIED  2
unsigned is_stdin : 1;
unsigned has_more_entries : 1; /* only appear in combined diff */
-   struct userdiff_driver *driver;
/* data should be considered binary; -1 means don't know yet */
int is_binary;
+   struct userdiff_driver *driver;
 };
 
 extern struct diff_filespec *alloc_filespec(const char *);
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] diff_filespec: use only 2 bits for is_binary flag

2014-01-16 Thread Jeff King
The is_binary flag needs only three values: -1, 0, and 1.
However, we use a whole 32-bit int for it on most systems
(both 32- and 64- bit).

Instead, we can mark it to use only 2 bits. On 32-bit
systems, this lets it end up as part of the bitfield above
(saving 4 bytes). On 64-bit systems, we don't see any change
(because the savings end up as padding), but it does leave
room for another free 32-bit value to be added later.

Signed-off-by: Jeff King p...@peff.net
---
I don't typically use bit-sized integers like this for anything but
unsigned integers to be used as flags. My understanding is that using
signed integers is explicitly permitted by the standard. I don't know if
we're guaranteed a 2's-complement representation, but I can't imagine an
implementation providing any range besides -2..1, which is what we need.

 diffcore.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffcore.h b/diffcore.h
index d911bf0..79de8cf 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -46,7 +46,7 @@ struct diff_filespec {
unsigned is_stdin : 1;
unsigned has_more_entries : 1; /* only appear in combined diff */
/* data should be considered binary; -1 means don't know yet */
-   int is_binary;
+   int is_binary : 2;
struct userdiff_driver *driver;
 };
 
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   8   9   10   >