date:20160415

Re: Parallel checkout (Was Re: 0 bot for Git)

2016-04-15 Thread Michael Haggerty

On 04/15/2016 06:52 PM, Jeff King wrote:
> On Fri, Apr 15, 2016 at 01:18:46PM +0200, Christian Couder wrote:
> 
>> On Fri, Apr 15, 2016 at 11:51 AM, Duy Nguyen  wrote:
>>> On Fri, Apr 15, 2016 at 12:04:49AM +0200, Christian Couder wrote:

 There is a draft of an article about the first part of the Contributor
 Summit in the draft of the next Git Rev News edition:

 https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-14.md
>>>
>>> Thanks. I read the sentence "This made people mention potential
>>> problems with parallelizing git checkout" and wondered what these
>>> problems were.
>>
>> It may have been Michael or Peff (CC'ed) saying that it could break
>> some builds as the timestamps on the files might not always be ordered
>> in the same way.
> 
> I don't think it was me. I'm also not sure how it would break a build.
> Git does not promise a particular timing or order for updating files as
> it is. So if we are checking out two files "a" and "b", and your build
> process depends on the timestamp between them, I think all bets are
> already off.

I'm hazy on this, but I think somebody at Git Merge pointed out that
parallel checkouts (within a single repository) could be tricky if
multiple Git filenames are mapped to the same file due to filesystem
case-insensitivity or encoding normalization.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +static int line_length(const char *recs)
>> +{
>> + char *s = strchr(recs, '\n');
>> + return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?
>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?
>
>

I think we're going to make use of xdl_blankline instead of this or
our own "is_emptyline"

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +static int line_length(const char *recs)
>> +{
>> + char *s = strchr(recs, '\n');
>> + return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?

We discarded this idea as it produces to many errors.
(We'd be back at the 50:50 case, "is it really worth it?")

We will go back to the "empty line" heuristic, which will be solved
via xdl_blankline(rec[i]->ptr, rec[i]->size, flags); which could be inlined.
That will solve the CRLF issue as a CR is covered as a whitespace
(with CRLF you'd have to specify diff to ignore white spaces).

For the safety I assumed
* there is always a \n even on the last line by convention.
* in case it is not, the string is null terminated, hence
  strchr and strlen for the rescue.

>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?

xdl_blankline will use ->size, so we'll be holding it right.

Thanks,
Stefan

>
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: possible to checkout same branch in different worktree

2016-04-15 Thread Duy Nguyen

On Fri, Apr 15, 2016 at 10:36 PM, Junio C Hamano  wrote:
> Reto Hablützel  writes:
>
>> the checkout command prevents me from checking out a branch in the
>> current worktree if it is already checked out in another worktree.
>>
>> However, if I rebase the branch in the current worktree onto the
>> branch in the other worktree, I end up in a situation where the same
>> branch is checked out twice in the two worktrees.
>
> I agree that any end-user facing subcommand like "git rebase", even
> if it is not "git checkout", should refuse to work on and update a
> branch that is checked out elsewhere.  Otherwise it will end up
> causing confusion.

I agree. I suppose we need same treatment for git-push? A push can be
rejected if the pushed ref is being checked out. Suppose HEAD is in
the middle of a rebase (or am), it fails to detect ref name (and thus
checkout state) the same way here and we definitely do not want "git
rebase" to simply overwrite whatever is updated by the push request.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] submodule: port resolve_relative_url from shell to C

2016-04-15 Thread Stefan Beller

Later on we want to automatically call `git submodule init` from
other commands, such that the users don't have to initialize the
submodule themselves.  As these other commands are written in C
already, we'd need the init functionality in C, too.  The
`resolve_relative_url` function is a large part of that init
functionality, so start by porting this function to C.

To create the tests in t0060, the function `resolve_relative_url`
was temporarily enhanced to write all inputs and output to disk
when running the test suite. The added tests in this patch are
a small selection thereof.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 builtin/submodule--helper.c | 209 +++-
 git-submodule.sh|  81 +
 t/t0060-path-utils.sh   |  46 ++
 3 files changed, 258 insertions(+), 78 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 864dd18..46946b0 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -9,6 +9,211 @@
 #include "submodule-config.h"
 #include "string-list.h"
 #include "run-command.h"
+#include "remote.h"
+#include "refs.h"
+#include "connect.h"
+
+static char *get_default_remote(void)
+{
+   char *dest = NULL, *ret;
+   unsigned char sha1[20];
+   struct strbuf sb = STRBUF_INIT;
+   const char *refname = resolve_ref_unsafe("HEAD", 0, sha1, NULL);
+
+   if (!refname)
+   die(_("No such ref: %s"), "HEAD");
+
+   /* detached HEAD */
+   if (!strcmp(refname, "HEAD"))
+   return xstrdup("origin");
+
+   if (!skip_prefix(refname, "refs/heads/", ))
+   die(_("Expecting a full ref name, got %s"), refname);
+
+   strbuf_addf(, "branch.%s.remote", refname);
+   if (git_config_get_string(sb.buf, ))
+   ret = xstrdup("origin");
+   else
+   ret = dest;
+
+   strbuf_release();
+   return ret;
+}
+
+static int starts_with_dot_slash(const char *str)
+{
+   return str[0] == '.' && is_dir_sep(str[1]);
+}
+
+static int starts_with_dot_dot_slash(const char *str)
+{
+   return str[0] == '.' && str[1] == '.' && is_dir_sep(str[2]);
+}
+
+/*
+ * Returns 1 if it was the last chop before ':'.
+ */
+static int chop_last_dir(char **remoteurl, int is_relative)
+{
+   char *rfind = find_last_dir_sep(*remoteurl);
+   if (rfind) {
+   *rfind = '\0';
+   return 0;
+   }
+
+   rfind = strrchr(*remoteurl, ':');
+   if (rfind) {
+   *rfind = '\0';
+   return 1;
+   }
+
+   if (is_relative || !strcmp(".", *remoteurl))
+   die(_("cannot strip one component off url '%s'"),
+   *remoteurl);
+
+   free(*remoteurl);
+   *remoteurl = xstrdup(".");
+   return 0;
+}
+
+/*
+ * The `url` argument is the URL that navigates to the submodule origin
+ * repo. When relative, this URL is relative to the superproject origin
+ * URL repo. The `up_path` argument, if specified, is the relative
+ * path that navigates from the submodule working tree to the superproject
+ * working tree. Returns the origin URL of the submodule.
+ *
+ * Return either an absolute URL or filesystem path (if the superproject
+ * origin URL is an absolute URL or filesystem path, respectively) or a
+ * relative file system path (if the superproject origin URL is a relative
+ * file system path).
+ *
+ * When the output is a relative file system path, the path is either
+ * relative to the submodule working tree, if up_path is specified, or to
+ * the superproject working tree otherwise.
+ *
+ * NEEDSWORK: This works incorrectly on the domain and protocol part.
+ * remote_url  url  outcome  expectation
+ * http://a.com/b  ../c http://a.com/c   as is
+ * http://a.com/b  ../../c  http://c error out
+ * http://a.com/b  ../../../c   http:/c  error out
+ * http://a.com/b  ../../../../chttp:c   error out
+ * http://a.com/b  ../../../../../c.:c   error out
+ * NEEDSWORK: Given how chop_last_dir() works, this function is broken
+ * when a local part has a colon in its path component, too.
+ */
+static char *relative_url(const char *remote_url,
+   const char *url,
+   const char *up_path)
+{
+   int is_relative = 0;
+   int colonsep = 0;
+   char *out;
+   char *remoteurl = xstrdup(remote_url);
+   struct strbuf sb = STRBUF_INIT;
+   size_t len = strlen(remoteurl);
+
+   if (is_dir_sep(remoteurl[len]))
+   remoteurl[len] = '\0';
+
+   if (!url_is_local_not_ssh(remoteurl) || is_absolute_path(remoteurl))
+   is_relative = 0;
+   else {
+   is_relative = 1;
+   /*
+* Prepend a './' to ensure all relative
+*

[PATCH 2/2] submodule: port init from shell to C

2016-04-15 Thread Stefan Beller

By having the `submodule init` functionality in C, we can reference it
easier from other parts in the code in later patches. The code is split
up to have one function to initialize one submodule and a calling function
that takes care of the rest, such as argument handling and translating the
arguments to the paths of the submodules.

This is the first submodule subcommand that is fully converted to C
except for the usage string, so this is actually removing a call to
the `submodule--helper list` function, which is supposed to be used in
this transition. Instead we'll make a direct call to `module_list_compute`.

An explanation why we need to edit the prefixes in cmd_update in
git-submodule.sh in this patch:

By having no processing in the shell part, we need to convey the notion
of wt_prefix and prefix to the C parts, which former patches punted on
and did the processing of displaying path in the shell.

`wt_prefix` used to hold the path from the repository root to the current
directory, e.g. wt_prefix would be t/ if the user invoked the
`git submodule` command in ~/repo/t and ~repo is the GIT_DIR.

`prefix` used to hold the relative path from the repository root to the
operation, e.g. if you have recursive submodules, the shell script would
modify the `prefix` in each recursive step by adding the submodule path.

We will pass `wt_prefix` into the C helper via `git -C ` as that
will setup git in the directory the user actually called git-submodule.sh
from. The `prefix` will be passed in via the `--prefix` option.

Having `prefix` and `wt_prefix` relative to the GIT_DIR of the
calling superproject is unfortunate with this patch as the C code doesn't
know about a possible recursion from a superproject via `submodule update
--init --recursive`.

To fix this, we change the meaning of `wt_prefix` to point to the current
project instead of the superproject and `prefix` to include any relative
paths issues in the superproject. That way `prefix` will become the leading
part for displaying paths and `wt_prefix` will be empty in recursive
calls for now.

The new notion of `wt_prefix` and `prefix` still allows us to reconstruct
the calling directory in the superproject by just traveling reverse of
`prefix`.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 115 
 git-submodule.sh|  48 ++
 submodule.c |  21 
 submodule.h |   1 +
 4 files changed, 140 insertions(+), 45 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 46946b0..b6d4f27 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -305,6 +305,120 @@ static int module_list(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+static void init_submodule(const char *path, const char *prefix, int quiet)
+{
+   const struct submodule *sub;
+   struct strbuf sb = STRBUF_INIT;
+   char *upd = NULL, *url = NULL, *displaypath;
+
+   /* Only loads from .gitmodules, no overlay with .git/config */
+   gitmodules_config();
+
+   sub = submodule_from_path(null_sha1, path);
+
+   if (prefix) {
+   strbuf_addf(, "%s%s", prefix, path);
+   displaypath = strbuf_detach(, NULL);
+   } else
+   displaypath = xstrdup(sub->path);
+
+   /*
+* Copy url setting when it is not set yet.
+* To look up the url in .git/config, we must not fall back to
+* .gitmodules, so look it up directly.
+*/
+   strbuf_reset();
+   strbuf_addf(, "submodule.%s.url", sub->name);
+   if (git_config_get_string(sb.buf, )) {
+   url = xstrdup(sub->url);
+
+   if (!url)
+   die(_("No url found for submodule path '%s' in 
.gitmodules"),
+   displaypath);
+
+   /* Possibly a url relative to parent */
+   if (starts_with_dot_dot_slash(url) ||
+   starts_with_dot_slash(url)) {
+   char *remoteurl, *relurl;
+   char *remote = get_default_remote();
+   struct strbuf remotesb = STRBUF_INIT;
+   strbuf_addf(, "remote.%s.url", remote);
+   free(remote);
+
+   if (git_config_get_string(remotesb.buf, ))
+   /*
+* The repository is its own
+* authoritative upstream
+*/
+   remoteurl = xgetcwd();
+   relurl = relative_url(remoteurl, url, NULL);
+   strbuf_release();
+   free(remoteurl);
+   free(url);
+   url = relurl;
+   }
+
+   if (git_config_set_gently(sb.buf, url))
+

[PATCH 0/2] Another reroll of sb/submodule-init

2016-04-15 Thread Stefan Beller

* squashed the fixes from Johannes Sixt to unbreak Windows tests.
  (I had encoding issues; so I manually integrated the changes)
* fixed memleaks
* the base to apply did not change (ee30f17805f51d37 Merge branch
'sb/submodule-path-misc-bugs' into sb/submodule-init)

Thanks,
Stefan

diff to current origin/sb/submodule-init:

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ad3cba6..b6d4f27 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -309,8 +309,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 {
const struct submodule *sub;
struct strbuf sb = STRBUF_INIT;
-   const char *upd = NULL;
-   char *url = NULL, *displaypath;
+   char *upd = NULL, *url = NULL, *displaypath;
 
/* Only loads from .gitmodules, no overlay with .git/config */
gitmodules_config();
@@ -340,7 +339,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
/* Possibly a url relative to parent */
if (starts_with_dot_dot_slash(url) ||
starts_with_dot_slash(url)) {
-   char *remoteurl;
+   char *remoteurl, *relurl;
char *remote = get_default_remote();
struct strbuf remotesb = STRBUF_INIT;
strbuf_addf(, "remote.%s.url", remote);
@@ -352,9 +351,11 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 * authoritative upstream
 */
remoteurl = xgetcwd();
-   url = relative_url(remoteurl, url, NULL);
+   relurl = relative_url(remoteurl, url, NULL);
strbuf_release();
free(remoteurl);
+   free(url);
+   url = relurl;
}
 
if (git_config_set_gently(sb.buf, url))
@@ -368,14 +369,14 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
/* Copy "update" setting when it is not set yet */
strbuf_reset();
strbuf_addf(, "submodule.%s.update", sub->name);
-   if (git_config_get_string_const(sb.buf, ) &&
+   if (git_config_get_string(sb.buf, ) &&
sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) {
if (sub->update_strategy.type == SM_UPDATE_COMMAND) {
fprintf(stderr, _("warning: command update mode 
suggested for submodule '%s'\n"),
sub->name);
-   upd = "none";
+   upd = xstrdup("none");
} else
-   upd = 
submodule_strategy_to_string(>update_strategy);
+   upd = 
xstrdup(submodule_strategy_to_string(>update_strategy));
 
if (git_config_set_gently(sb.buf, upd))
die(_("Failed to register update mode for submodule 
path '%s'"), displaypath);
@@ -383,6 +384,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
strbuf_release();
free(displaypath);
free(url);
+   free(upd);
 }
 
 static int module_init(int argc, const char **argv, const char *prefix)
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 2e62548..bf2deee 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -305,13 +305,16 @@ test_git_path GIT_COMMON_DIR=bar config   
bar/config
 test_git_path GIT_COMMON_DIR=bar packed-refs  bar/packed-refs
 test_git_path GIT_COMMON_DIR=bar shallow  bar/shallow
 
+# In the tests below, the distinction between $PWD and $(pwd) is important:
+# on Windows, $PWD is POSIX style (/c/foo), $(pwd) has drive letter (c:/foo).
+
 test_submodule_relative_url "../" "../foo" "../submodule" "../../submodule"
 test_submodule_relative_url "../" "../foo/bar" "../submodule" 
"../../foo/submodule"
 test_submodule_relative_url "../" "../foo/submodule" "../submodule" 
"../../foo/submodule"
 test_submodule_relative_url "../" "./foo" "../submodule" "../submodule"
 test_submodule_relative_url "../" "./foo/bar" "../submodule" "../foo/submodule"
 test_submodule_relative_url "../../../" "../foo/bar" "../sub/a/b/c" 
"../../../../foo/sub/a/b/c"
-test_submodule_relative_url "../" "$PWD/addtest" "../repo" "$PWD/repo"
+test_submodule_relative_url "../" "$PWD/addtest" "../repo" "$(pwd)/repo"
 test_submodule_relative_url "../" "foo/bar" "../submodule" "../foo/submodule"
 test_submodule_relative_url "../" "foo" "../submodule" "../submodule"
 
@@ -322,16 +325,16 @@ test_submodule_relative_url "(null)" "../foo" 
"../submodule" "../submodule"
 test_submodule_relative_url "(null)" "./foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "./foo" "../submodule" "submodule"

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Junio C Hamano

Stefan Beller  writes:

> +static int line_length(const char *recs)
> +{
> + char *s = strchr(recs, '\n');
> + return s ? s - recs : strlen(recs);
> +}

It seems that you guys are discarding this "number of bytes on a
line, no matter what these bytes are" idea, so this may be moot, but
is there a guarantee that reading through recs until you happen to
see a NUL is safe?

Shouldn't the code that accesses a "line" be using the same "from
here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
having to scan the underlying string in an unbounded way?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 16/21] bisect: make total number of commits global

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> The total number of commits in a bisect process is a property of
> the bisect process. Making this property global helps to make the code
> clearer.
>
> Signed-off-by: Stephan Beyer 
> ---

After wondring about count++ vs nr, I re-read this one.

This patch is mislabled.

Making it global is a lessor, supposed-to-be-no-op change, but the
bigger change is that the definition of "total" is silently changed.

The definition of mid-point was based on 'nr' in the original code,
which counted only the tree-changing commits, and with this patch,
it is based on 'total', which now only counts the tree-changing
commits, so things are internally consistent, and the loop I was
puzzled with, "while (counted < total)", would properly terminate.

Perhaps things become cleaner and easier to understand if this was
split into two steps.  One that changes the meaning of 'total' (and
removes 'nr'), and the other that makes 'total' a global.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git mascot

2016-04-15 Thread Duy Nguyen

On Fri, Apr 15, 2016 at 11:41 PM, Christian Howe  wrote:
> There has been talk of a git mascot a while back in 2005. Some people
> mentioned a fish or a turtle. Since all the great open source projects
> like Linux or RethinkDB have a cute mascot, git definitely needs one
> as well. A mascot gives people a recognizable persona towards which
> they can direct their unbounded love for the project. It'd even be
> good if a plush doll of this mascot could eventually be created for
> people to physically express their love of git through cuddling and
> hugging.

Given Git's horrible interface (in some cases still) and power, I'd
say an ugly witch (maybe doing something with trees). But I don't
think anyone can make a cute mascot out of that. Nor does anybody want
to cuddling :D
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 03/16] index-helper: new daemon for caching index and related stuff

2016-04-15 Thread Duy Nguyen

On Sat, Apr 16, 2016 at 3:21 AM, David Turner  wrote:
> On Fri, 2016-04-15 at 18:25 +0700, Duy Nguyen wrote:
>> On Thu, Apr 14, 2016 at 1:47 AM, David Turner <
>> dtur...@twopensource.com> wrote:
>> > > > +   fd = unix_stream_connect(socket_path);
>> > > > +   if (refresh_cache) {
>> > > > +   ret = write_in_full(fd, "refresh", 8) != 8;
>> > >
>> > > Since we've moved to unix socket and had bidirectional
>> > > communication,
>> > > it's probably a good idea to read an "ok" back, giving index
>> > > -helper
>> > > time to prepare the cache. As I recall the last discussion with
>> > > Johannes, missing a cache here when the index is around 300MB
>> > > could
>> > > hurt more than wait patiently once and have it ready next time.
>> >
>> > It is somewhat slower to wait for the daemon (which requires a disk
>> > load + a memcpy) than it is to just load it ourselves (which is
>> > just a
>> > disk load).
>>
>> You forgot the most costly part, SHA-1 verification. For very large
>> index, I assume the index-helper is already in the middle of hashing
>> the index content. If you ignore index-helper, you need to go hash
>> the
>> whole thing again. The index-helper can hand it to you if you wait
>> just a bit more. This wait time should be shorter because index
>> -helper
>> is already in the middle of hashing (and in optimistic case, very
>> close to finishing it).
>
> You're right -- I did forget that part.
>
> In "index-helper: use watchman to avoid refreshing index with lstat()",
> we switch from just poking to poking and waiting for a reply.  Then in
> "read-cache: config for waiting for index-helper", we make that waiting
> optional.  So what if I just remove that patch?  Does that solve it?
>

Yes. I think so.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Parallel checkout (Was Re: 0 bot for Git)

2016-04-15 Thread Duy Nguyen

On Fri, Apr 15, 2016 at 10:08 PM, Stefan Beller  wrote:
> On Fri, Apr 15, 2016 at 2:51 AM, Duy Nguyen  wrote:
>> On Fri, Apr 15, 2016 at 12:04:49AM +0200, Christian Couder wrote:
>> The idea is simple, you offload some work to process workers. In this
>> patch, only entry.c:write_entry() is moved to workers. We still do
>> directory creation and all sort of checks and stat refresh in the main
>> process. Some more work may be moved away, for example, the entire
>> builtin/checkout.c:checkout_merged().
>>
>> Multi process is less efficient than multi thread model. But I doubt
>> we could make object db access thread-safe soon. The last discussion
>> was 2 years ago [1] and nothing much has happened.
>>
>> Numbers are encouraging though. On linux-2.6 repo running on linux and
>> ext4 filesystem, checkout_paths() would dominate "git checkout :/".
>> Unmodified git takes about 31s.
>
> Please also benchmark "make build" or another read heavy operation
> with these 2 different checkouts. IIRC that was the problem. (checkout
> improved, but due to file ordering on the fs, the operation afterwards
> slowed down, such that it became a net negative)

That's way too close to fs internals. Don't filesystems these days
have b-tree and indexes to speed up pathname lookup (which makes file
creation order meaningless, I guess)? If it only happens to a fs or
two, I'm leaning to say "your problem, fix your file system". A
mitigation may be let worker handle whole directory (non-recursively)
so file creation order within a directory is almost the same.

> Would it make sense to use the parallel processing infrastructure from
> run-command.h
> instead of doing all setup and teardown yourself?
> (As you call it for-fun patch, I'd assume the answer is: Writing code
> is more fun than
> using other peoples code ;)

I did look at run-command.h. Your run_process_parallel() looked almost
fit, but I needed control over stdout for coordination, not to be
printed. At that point, yes writing new code was more fun than
tweaking run_process_parallel :-D
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 15/16] branch: use ref-filter printing APIs

2016-04-15 Thread Stefan Beller

>  static int calc_maxwidth(struct ref_array *refs, int remote_bonus)
>  {
> int i, max = 0;
> @@ -432,7 +281,10 @@ static int calc_maxwidth(struct ref_array *refs, int 
> remote_bonus)
>
> skip_prefix(it->refname, "refs/heads/", );
> skip_prefix(it->refname, "refs/remotes/", );
> -   w = utf8_strwidth(desc);
> +   if (it->kind == FILTER_REFS_DETACHED_HEAD)
> +   w = strlen(get_head_description());

get_head_description returns memory, which needs to be free'd.
(found by catching up on reading the coverity scan log. I see
you deleted get_head_description in another part of the patch;
nevertheless would you like to take care of this memleak?)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Move test-* to t/helper/ subdirectory

2016-04-15 Thread Duy Nguyen

On Sat, Apr 16, 2016 at 12:06 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Nguyễn Thái Ngọc Duy   writes:
>>
>>> This keeps top dir a bit less crowded. And because these programs are
>>> for testing purposes, it makes sense that they stay somewhere in t/
>>
>> But leaves many *.o files after "make clean".  Even "distclean" does
>> not clean them.
>
> Perhaps something like this as a preparatory patch, to protect us
> from future breakages similar to this change.

Yes. Much better than adding t/helper/*.o there. Thanks.

>
> -- >8 --
> Subject: Makefile: clean *.o files we create
>
> The part that removes object files in the 'clean' target predates
> various Makefile macros that list object files we create, and
> instead removes the objects with shell glob, perpetually requiring
> updates whenever a new location that builds object files is added.
>
> Simplify the target by removing $(OBJECTS), which is supposed to
> have all the objects we create during the build.
>
> Signed-off-by: Junio C Hamano 
> ---
>  Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index fe0bf7d..69d32bf 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2456,8 +2456,8 @@ profile-clean:
> $(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
>
>  clean: profile-clean coverage-clean
> -   $(RM) *.o *.res refs/*.o block-sha1/*.o ppc/*.o compat/*.o 
> compat/*/*.o
> -   $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
> +   $(RM) *.res
> +   $(RM) $(OBJECTS)
> $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
> $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
> $(RM) $(TEST_PROGRAMS) $(NO_INSTALL)



-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 09/16] index-helper: use watchman to avoid refreshing index with lstat()

2016-04-15 Thread Stefan Beller

> +static void refresh_by_watchman(struct index_state *istate)
> +{
> +   void *shm = NULL;
> +   int length;
> +   int i;
> +   struct stat st;
> +   int fd = -1;
> +   const char *path = index_helper_path("git-watchman-%s-%"PRIuMAX,
> +sha1_to_hex(istate->sha1),
> +(uintmax_t)getpid());
> +
> +   fd = open(path, O_RDONLY);
> +   if (fd < 0)
> +   return;
> +
> +   /*
> +* This watchman data is just for us -- no need to keep it
> +* around once we've got it open.
> +*/
> +   unlink(path);
> +
> +   if (fstat(fd, ) < 0)
> +   goto done;
> +
> +   length = st.st_size;
> +   shm = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
> +
> +   if (shm == MAP_FAILED)
> +   goto done;
> +
> +   close(fd);
> +   fd = -1;
> +
> +   if (length <= 20 ||
> +   hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
> +   /*
> +* No need to clear CE_WATCHMAN_DIRTY set by 'WAMA' on
> +* disk. Watchman can only set more, not clear any, so
> +* this is OR mask.
> +*/
> +   read_watchman_ext(istate, shm, length - 20))
> +   goto done;
> +
> +   /*
> +* Now that we've marked the invalid entries in the
> +* untracked-cache itself, we can erase them from the list of
> +* entries to be processed and mark the untracked cache for
> +* watchman usage.
> +*/
> +   if (istate->untracked) {
> +   string_list_clear(>untracked->invalid_untracked, 0);
> +   istate->untracked->use_watchman = 1;
> +   }
> +
> +   for (i = 0; i < istate->cache_nr; i++) {
> +   struct cache_entry *ce = istate->cache[i];
> +   if (ce_stage(ce) || (ce->ce_flags & CE_WATCHMAN_DIRTY))
> +   continue;
> +   ce_mark_uptodate(ce);
> +   }
> +done:
> +   if (shm)
> +   munmap(shm, length);
> +
> +   if (fd > 0)
> +   close(fd);

coverities opinion:
> off_by_one: Testing whether handle fd is strictly greater than zero is 
> suspicious.
> fd leaks when it is zero. Did you intend to include equality with zero?

We can assert that fd is never 0, because that is where stdin is?

> +}
> +
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 03/16] index-helper: new daemon for caching index and related stuff

2016-04-15 Thread Stefan Beller

> +static int try_shm(struct index_state *istate)
> +{
> +   void *new_mmap = NULL;
> +   size_t old_size = istate->mmap_size;
> +   ssize_t new_size;
> +   const unsigned char *sha1;
> +   struct stat st;
> +   int fd;
> +
> +   if (!is_main_index(istate) ||
> +   old_size <= 20 ||
> +   stat(git_path("index-helper.path"), ))
> +   return -1;
> +   if (poke_daemon(istate, , 0))
> +   return -1;
> +   sha1 = (unsigned char *)istate->mmap + old_size - 20;
> +
> +   fd = open(index_helper_path("git-index-%s", sha1_to_hex(sha1)),
> + O_RDONLY);
> +   if (fd < 0)
> +   goto fail;
> +
> +   if (fstat(fd, ))
> +   goto fail;
> +
> +   new_size = st.st_size;
> +   new_mmap = mmap(NULL, new_size, PROT_READ, MAP_SHARED, fd, 0);
> +   if (new_size <= 20 ||
> +   hashcmp((unsigned char *)istate->mmap + old_size - 20,
> +   (unsigned char *)new_mmap + new_size - 20)) {
> +   if (new_mmap)
> +   munmap(new_mmap, new_size);
> +   goto fail;

coming from here

> +   }
> +
> +   /* The memory barrier here matches index-helper.c:share_index. */
> +   __sync_synchronize();
> +
> +   munmap(istate->mmap, istate->mmap_size);
> +   istate->mmap = new_mmap;
> +   istate->mmap_size = new_size;
> +   istate->from_shm = 1;
> +   return 0;
> +
> +fail:

fd may be leaking here?

> +   poke_daemon(istate, , 1);
> +   return -1;
> +}
> +
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 4:32 PM, Jacob Keller  wrote:
> On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller  wrote:
>> There's a few places that will need cleaning up (comments and such)
>> that mention empty line still, but that's not surprising. I am going
>> to test this for a bit on my local repos, and see if it makes any
>> difference to the old heuristic as well.
>>
>> Thanks,
>> Jake
>>
>
> I ran this heuristic on git.git and it produces tons of false positive
> transforms which are much lease readable (to me at least), far more
> than those produced by the newline/blank link heuristic did.
>
> I think we should stick with the empty line heuristic instead of this
> version, even if it's easier to implement this version.

I agree. The heuristic is worse as we often have these 50:50 chances
of messing stuff up.

>
> We still would need to figure out how to handle CRLF properly but it's
> worth resolving that than this heuristic is.
>
> Thanks,
> Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller  wrote:
> There's a few places that will need cleaning up (comments and such)
> that mention empty line still, but that's not surprising. I am going
> to test this for a bit on my local repos, and see if it makes any
> difference to the old heuristic as well.
>
> Thanks,
> Jake
>

I ran this heuristic on git.git and it produces tons of false positive
transforms which are much lease readable (to me at least), far more
than those produced by the newline/blank link heuristic did.

I think we should stick with the empty line heuristic instead of this
version, even if it's easier to implement this version.

We still would need to figure out how to handle CRLF properly but it's
worth resolving that than this heuristic is.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 4:01 PM, Stefan Beller  wrote:
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Original discussion and testing found the following heuristic to be
> producing the desired output:
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. When
> implementing this heuristic the handling of empty lines was awkward as
> it is unclear what an empty line is. ('\n' or do we include "\r\n" as it
> is common on Windows?) Instead we implement a slightly different heuristic:
>
>   If there are diff chunks which can be shifted around, find the shortest
>   line in the overlapping parts. Use the line with the shortest length that
>   occurs last as the last line of the chunk with the rest
>   of the context above.
>
> However, as with any heuristic it is not really known whether this will
> always be more optimal. Thus, leave the heuristic disabled by default.
>
> Add an XDIFF flag to enable this heuristic only conditionally. Add
> a diff command line option and diff configuration option to allow users
> to enable this option when desired.
>
> TODO:
> * Add tests
> * Add better/more documentation explaining the heuristic, possibly with
>   examples(?)
> * better name(?)
>

There's a few places that will need cleaning up (comments and such)
that mention empty line still, but that's not surprising. I am going
to test this for a bit on my local repos, and see if it makes any
difference to the old heuristic as well.

Thanks,
Jake

> Signed-off-by: Stefan Beller 
> Signed-off-by: Jacob Keller 
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/diff-config.txt  |  6 ++
>  Documentation/diff-options.txt |  6 ++
>  diff.c | 11 +++
>  xdiff/xdiff.h  |  2 ++
>  xdiff/xdiffi.c | 29 +
>  5 files changed, 54 insertions(+)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba565..3d99a90 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.shortestLineHeuristic::
> +   Set this option to true to enable the shortest line chunk heuristic 
> when
> +   producing diff output. This heuristic will attempt to shift hunks such
> +   that the last shortest common line occurs below the hunk with the 
> rest of
> +   the context above it.
> +
>  diff.algorithm::
> Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 4b0318e..b1ca83d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
> Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> +--shortest-line-heuristic::
> +--no-shortest-line-heuristic::
> +   When possible, shift common shortest line in diff hunks below the hunk
> +   such that the last common shortest line for each hunk is below, with 
> the
> +   rest of the context above the hunk.
> +
>  --minimal::
> Spend extra time to make sure the smallest possible
> diff is produced.
> diff --git a/diff.c b/diff.c
> index 4dfe660..a02aff9 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #endif
>
>  static int diff_detect_rename_default;
> +static int diff_shortest_line_heuristic = 0;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char 
> *value, void *cb)
> diff_detect_rename_default = git_config_rename(var, value);
> return 0;
> }
> +   if (!strcmp(var, "diff.shortestlineheuristic")) {
> +   diff_shortest_line_heuristic = git_config_bool(var, value);
> +   return 0;
> +   }
> if (!strcmp(var, "diff.autorefreshindex")) {
> diff_auto_refresh_index = git_config_bool(var, value);
> return 0;
> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
> options->use_color =

[PATCH 1/2] xdiff: add recs_match helper function

2016-04-15 Thread Stefan Beller

From: Jacob Keller 

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller 
Signed-off-by: Stefan Beller 
---
 xdiff/xdiffi.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d..748eeb9 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+{
+   return (recs[ixs]->ha == recs[ix]->ha &&
+   xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+recs[ix]->ptr, recs[ix]->size,
+flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha 
&&
-  xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 
1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+   while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, 
flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -470,8 +477,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-  xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
recs[ix]->ptr, recs[ix]->size, flags)) {
+   while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
rchg[ixs++] = 0;
rchg[ix++] = 1;
 
-- 
2.8.1.189.gd13d43c

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Stefan Beller

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Original discussion and testing found the following heuristic to be
producing the desired output:

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. When
implementing this heuristic the handling of empty lines was awkward as
it is unclear what an empty line is. ('\n' or do we include "\r\n" as it
is common on Windows?) Instead we implement a slightly different heuristic:

  If there are diff chunks which can be shifted around, find the shortest
  line in the overlapping parts. Use the line with the shortest length that
  occurs last as the last line of the chunk with the rest
  of the context above.

However, as with any heuristic it is not really known whether this will
always be more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller 
Signed-off-by: Jacob Keller 
Signed-off-by: Stefan Beller 
---
 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 29 +
 5 files changed, 54 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba565..3d99a90 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.shortestLineHeuristic::
+   Set this option to true to enable the shortest line chunk heuristic when
+   producing diff output. This heuristic will attempt to shift hunks such
+   that the last shortest common line occurs below the hunk with the rest 
of
+   the context above it.
+
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e..b1ca83d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--shortest-line-heuristic::
+--no-shortest-line-heuristic::
+   When possible, shift common shortest line in diff hunks below the hunk
+   such that the last common shortest line for each hunk is below, with the
+   rest of the context above the hunk.
+
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe660..a02aff9 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_shortest_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
+   if (!strcmp(var, "diff.shortestlineheuristic")) {
+   diff_shortest_line_heuristic = git_config_bool(var, value);
+   return 0;
+   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
+   if (diff_shortest_line_heuristic)
+   DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg,

[RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff chunk heuristic

2016-04-15 Thread Stefan Beller

This is a version based on Jacobs v2, with the same fixes as in his v3 
(hopefully),
changing the heuristic, such that CRLF confusion might be gone.

TODO:
* add some tests
* think about whether we need a git attribute or not (I did some
  thinking, and if we do need to configure this at all, this is where I
  would put it)
  
Later on we want to have git attributes I'd think. For now let's just keep
the `git config diff.shortestlineheuristic true` config option for testing?
  
Changes since Jacobs v2:
 * s/empty line/shortest line/g
   That new heuristic is a superset of the empty line heuristic as empty lines
   are shortest lines.  This solves the "What is an empty line?" question
   (Think of CRLF vs LF)
 * fixed Jacobs rebase mistake (which is also fixed in Jacobs v3)

Changes since my v1:
* rename xdl_hash_and_recmatch to recs_match
* remove counting empty lines in the first section of the looping

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

Jacob Keller (1):
  xdiff: add recs_match helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 43 ++
 5 files changed, 64 insertions(+), 4 deletions(-)

-- 
2.8.1.189.gd13d43c

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH, WAS: "weird diff output?" v3 1/2] xdiff: add recs_match helper function

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller 
---
 xdiff/xdiffi.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d6326e..b56c1e24593c 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+{
+   return (recs[ixs]->ha == recs[ix]->ha &&
+   xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+recs[ix]->ptr, recs[ix]->size,
+flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha 
&&
-  xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 
1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+   while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, 
flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -470,8 +477,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-  xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
recs[ix]->ptr, recs[ix]->size, flags)) {
+   while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+
rchg[ixs++] = 0;
rchg[ix++] = 1;
 
-- 
2.8.1.369.geae769a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH, WAS: "weird diff output?" v3 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

From: Stefan Beller 

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller 
Signed-off-by: Jacob Keller 
---
 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 26 ++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..9265d60d9571 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+   Set this option to true to enable the empty line chunk heuristic when
+   producing diff output. This heuristic will attempt to shift hunks such
+   that the last common empty line occurs below the hunk with the rest of
+   the context above it.
+
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+   When possible, shift common empty lines in diff hunks below the hunk
+   such that the last common empty line for each hunk is below, with the
+   rest of the context above the hunk.
+
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
+   if (!strcmp(var, "diff.emptylineheuristic")) {
+   diff_empty_line_heuristic = git_config_bool(var, value);
+   return 0;
+   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
+   if (diff_empty_line_heuristic)
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--empty-line-heuristic"))
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+   else if (!strcmp(arg, "--no-empty-line-heuristic"))
+   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
else if (!strcmp(arg, "--patience"))
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@

[RFC PATCH, WAS: "weird diff output?" v3 0/2] implement empty line diff chunk heuristic

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

Third version of my series with a few more minor fixups. I left the
diff command line and configuration option alone for now, suspecting
that long term we either (a) remove it or (b) use a gitattribute, so
there is no reason to bikeshed the name or its contents right now.

TODO:
* add some tests
* think about whether we need a git attribute or not (I did some
  thinking, and if we do need to configure this at all, this is where I
  would put it)
* figure out how to make is_emptyline CRLF aware

Changes since my v2:
* fixed is_emptylines in the wrong patch

Changes since my v1:
* rename xdl_hash_and_recmatch to recs_match
* remove counting empty lines in the first section of the looping

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

The interdiff between v2 and v3 is very small:

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 9436ad735243..dd93e6781e8b 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -485,6 +485,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the group.
 */
while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+
emptylines += is_emptyline(recs[ix]->ptr);
 
rchg[ixs++] = 0;

Jacob Keller (1):
  xdiff: add recs_match helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 41 +
 5 files changed, 62 insertions(+), 4 deletions(-)

-- 
2.8.1.369.geae769a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 21/21] bisect: get back halfway shortcut

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> The documentation says that when the maximum possible distance
> is found, the algorithm stops immediately. That feature is
> reestablished by this commit.
>
> Signed-off-by: Stephan Beyer 
> ---
>
> Notes:
> I plugged a memory leak here.

... relative to patch series v1, I presume?

> @@ -391,7 +391,13 @@ static void traversal_up_to_merges(struct commit_list 
> *queue,
>   }
>  
>   update_best_bisection(top);
> + if (distance_direction(top) == 0) { // halfway

Say /* halfway */ without // double-slash comment.

The remainder of the patch makes sense to me.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> +static struct commit *extract_merge_to_queue(struct commit_list **merges)
> +{
> + assert(merges);
> +
> + struct commit_list *p, *q;
> + struct commit *found;
> +

"gcc -Werror -Wdecl-after-statement" will barf at this.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 3:46 PM, Jeff King  wrote:
> On Fri, Apr 15, 2016 at 02:56:21PM -0700, Jacob Keller wrote:
>
>> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
>> long flags) {
>>* the line next of the current change group, shift 
>> forward
>>* the group.
>>*/
>> - while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
>> -xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
>> recs[ix]->ptr, recs[ix]->size, flags)) {
>> + while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
>> + emptylines += is_emptyline(recs[ix]->ptr);
>> +
>
> I have not looked closely at your patches yet, but is this hunk right?
> The is_emptyline stuff doesn't come in until patch 2.
>
> -Peff

Oops, I suspect this is a rebase mistake, will fix it.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> The idea is to reverse the DAG and perform a traversal
> starting on all sources of the reversed DAG.

Please clarify what you mean by "sources" here.  Those who read log
message in Git context would know that you mean the commit graph by
"DAG", and "reversed DAG" means "having reverse linkage that lets
you find children given a parent", so "DAG" does not need such a
clarification.

> We walk from the bottom commits, incrementing the weight while
> walking on a part of the graph that is single strand of pearls,
> or doing the "count the reachable ones the hard way" using
> compute_weight() when we hit a merge commit.

Makes sense.  So instead of "all sources", you can say "perform a
traversal starting from the bottom commits, going from parent to its
children".

> A traversal ends when the computed weight is falling or halfway.
> This way, commits with too high weight to be relevant are never
> visited (and their weights are never computed).

Yup, beautiful.

> diff --git a/bisect.c b/bisect.c
> index c6bad43..9487ba9 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -30,6 +30,9 @@ static unsigned marker;
>  struct node_data {
>   int weight;
>   unsigned marked;
> + unsigned parents;
> + unsigned visited : 1;
> + struct commit_list *children;
>  };
>  
>  #define DEBUG_BISECT 0

> +static inline void commit_list_insert_unique(struct commit *item,
> +   struct commit_list **list)
> +{
> + if (!*list || item < (*list)->item) /* empty list or item will be first 
> */
> + commit_list_insert(item, list);
> + else if (item != (*list)->item) { /* item will not be first or not 
> inserted */
> + struct commit_list *p = *list;
> + for (; p->next && p->next->item < item; p = p->next);
> + if (!p->next || item != p->next->item) /* not already inserted 
> */
> + commit_list_insert(item, >next);
> + }
> +}

H.

When you have two commits, struct commit *one, and struct commit
*two, is it safe to do a pointer comparison for ordering?

I know it would work in practice, but I am worried about language
lawyers (and possibly static analysis tools) barking at this code.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function

2016-04-15 Thread Jeff King

On Fri, Apr 15, 2016 at 02:56:21PM -0700, Jacob Keller wrote:

> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
> long flags) {
>* the line next of the current change group, shift 
> forward
>* the group.
>*/
> - while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
> -xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
> recs[ix]->ptr, recs[ix]->size, flags)) {
> + while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
> + emptylines += is_emptyline(recs[ix]->ptr);
> +

I have not looked closely at your patches yet, but is this hunk right?
The is_emptyline stuff doesn't come in until patch 2.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Default authentication over https?

2016-04-15 Thread Jeff King

On Fri, Apr 15, 2016 at 06:21:20PM -0400, Jeff King wrote:

> I think we can take that down to _two_ requests pretty easily. We know
> in the very first request that the server told us something like:
> 
>   < WWW-Authenticate: Basic realm="GitHub"
> 
> but curl doesn't remember that. However, we should be able to pull it
> out of the old request and feed it into the new one. That would save the
> second request, which is just a probe.

Hmm. Looks like we already pull this out of the curl result for other
reasons, but we never feed it back in to the next request. So if I do
this:

diff --git a/http.c b/http.c
index 9bedad7..add9bf2 100644
--- a/http.c
+++ b/http.c
@@ -1132,6 +1132,8 @@ static int handle_curl_result(struct slot_results 
*results)
return HTTP_NOAUTH;
} else {
 #ifdef LIBCURL_CAN_HANDLE_AUTH_ANY
+   if (results->auth_avail)
+   http_auth_methods = results->auth_avail;
http_auth_methods &= ~CURLAUTH_GSSNEGOTIATE;
 #endif
return HTTP_REAUTH;

that drops my test case down to two requests: once to find out that we
need auth via the 401, and then we feed curl sufficient information to
do the followup in a single request (the GSSNEGOTIATE thing there is a
hack from 4dbe664, which we can ignore for now).

Interestingly, curl _does_ reuse the connection this time. I'm still not
sure why it didn't in the original case. But this means the whole thing
is happening over a single TCP session, which is good (and I didn't have
to change my config at all).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 18/21] bisect: prepare for different algorithms based on find_all

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> This is a preparation commit with copy-and-paste involved.
> The function do_find_bisection() is changed and copied to
> two almost similar functions compute_all_weights() and
> compute_relevant_weights().
>
> The function compute_relevant_weights() stops when a
> "halfway" commit is found.
>
> To keep the code clean, the halfway commit is not returned
> and has to be found by best_bisection() afterwards.
> This results in a singular additional O(#commits)-time
> overhead but this will be outweighed by the following
> changes to compute_relevant_weights().
>
> It is necessary to keep compute_all_weights() for the
> "git rev-list --bisect-all" command. All other bisect-related
> commands will use compute_relevant_weights().
>
> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 116 
> ---
>  1 file changed, 97 insertions(+), 19 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index a254f28..c6bad43 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -179,6 +179,7 @@ static struct commit_list *best_bisection(struct 
> commit_list *list)
>   }
>   }
>  
> + best->next = NULL;
>   return best;
>  }

At this point of this patch it is unclear how this change is
relevant.  I'll read on without complaining but with puzzlement.

> @@ -245,9 +246,8 @@ static struct commit_list *best_bisection_sorted(struct 
> commit_list *list)
>   * unknown.  After running compute_weight() first, they will get zero
>   * or positive distance.
>   */
> -static struct commit_list *do_find_bisection(struct commit_list *list,
> -  struct node_data *weights,
> -  int find_all)
> +static void compute_all_weights(struct commit_list *list,
> + struct node_data *weights)
>  {
>   int n, counted;
>   struct commit_list *p;
> @@ -301,10 +301,88 @@ static struct commit_list *do_find_bisection(struct 
> commit_list *list,
>   if (!(p->item->object.flags & UNINTERESTING)
>&& (node_data(p->item)->weight == -2)) {
>   compute_weight(p->item);
> + counted++;
> + }
> + }
> +
> + show_list("bisection 2 compute_weight", counted, list);
> +
> + while (counted < total) {
> + for (p = list; p; p = p->next) {
> + struct commit_list *q;
> + unsigned flags = p->item->object.flags;
> +
> + if (0 <= node_data(p->item)->weight)
> + continue;
> + for (q = p->item->parents; q; q = q->next) {
> + if (q->item->object.flags & UNINTERESTING)
> + continue;
> + if (0 <= node_data(q->item)->weight)
> + break;
> + }
> + if (!q)
> + continue;
> +
> + /*
> +  * weight for p is unknown but q is known.
> +  * add one for p itself if p is to be counted,
> +  * otherwise inherit it from q directly.
> +  */

This is not a new problem, but "q" in the comment should be
"q->item"; my bad.

> + node_data(p->item)->weight = node_data(q->item)->weight;
> + if (!(flags & TREESAME)) {
> + node_data(p->item)->weight++;
> + counted++;
> + show_list("bisection 2 count one",
> +   counted, list);
> + }

This is not a new problem, and I do admit that I wrote the original
at 1daa09d9 (make the previous optimization work also on
path-limited rev-list --bisect, 2007-03-23), but this makes me
wonder why counted++ is not done for p that is treesame.

 ... goes and looks ...

Ahh, the original while loop was counting upto "nr", which is
different from total.  When the traversal is pathspec limited, I
counted in the original (and probably in 'master' branch before your
series) the number of tree-changing commits in 'nr' which can be
smaller than 'total'.  That is why skipping counted++ on a treesame
commit was the correct thing to do.

Is it possible that you did not test --bisect-all with pathspec?  I
have a suspicion that the above loop would not terminate because of
this change.  If that is the case, either "make total global and get
rid of nr" needs to be fixed, or (probably better) move counted++
out of this if statement.  At this point, counted indicates "how
many commits out of 'total' we have computed weight for?", so the
latter would make more sense to me, even though either is OK.

The "priming the well" step for the "no interesting parent--set
weight to

Re: Default authentication over https?

2016-04-15 Thread Jeff King

On Thu, Apr 14, 2016 at 05:32:16PM -0400, Isaac Levy wrote:

> After the authenticated request, curl says it's keeping the connection
> open, but the next fetch seems to do two handshakes again.  The
> unauthenticated request closes the connection, so the 2nd handshake is
> forced, but I'm not sure why subsequent git fetches still do
> handshakes.  I did a bit of sleuthing w/ source, GIT_CURL_VERBOSE and
> wireshark.
> 
> We're using GitHub enterprise -- it'd just be nice if there was a
> better way to configure for super fast fetches.  ssh with cached
> connections does avoid the SSL this overhead -- though as I recall git
> protocols over ssh are less performant.

It's the opposite; generally git is more efficient over ssh, because we
have a true full-duplex connection, and we can do everything over a
single session (though setup for the ssh session may or may not be
faster than SSL).

Here's what I observed just for the initial contact. If I instrument git
like this:

diff --git a/http.c b/http.c
index 4304b80..9bedad7 100644
--- a/http.c
+++ b/http.c
@@ -1422,6 +1422,7 @@ static int http_request(const char *url,
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip");

+   warning("running one curl slot...");
ret = run_one_slot(slot, );

if (options && options->content_type) {

and run:

  GIT_CURL_VERBOSE=1 git ls-remote 2>&1 | egrep '< HTTP|> |^Auth|^warn'

I get three requests:

  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  < HTTP/1.1 401 Authorization Required
  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  < HTTP/1.1 401 Authorization Required
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  Authorization: Basic ...
  < HTTP/1.1 200 OK

The first request is us (git) asking curl to hit the URL, and curl
returns the 401 from the server to us. At that point we'll prompt the
user (or the credential helper) for the password, and then we'll give
that to curl and ask it to make the request again. Curl will do the
probe with the 401, because we haven't set an auth type, and then follow
up (without returning the 401 to git) with the right credentials.

I think we can take that down to _two_ requests pretty easily. We know
in the very first request that the server told us something like:

  < WWW-Authenticate: Basic realm="GitHub"

but curl doesn't remember that. However, we should be able to pull it
out of the old request and feed it into the new one. That would save the
second request, which is just a probe.

But ideally we'd have this down to one request. I think we could do
something like this:

diff --git a/http.c b/http.c
index 9bedad7..7e5009d 100644
--- a/http.c
+++ b/http.c
@@ -870,6 +870,11 @@ struct active_request_slot *get_active_slot(void)
 #ifdef LIBCURL_CAN_HANDLE_AUTH_ANY
curl_easy_setopt(slot->curl, CURLOPT_HTTPAUTH, http_auth_methods);
 #endif
+
+   /* XXX should be configurable via http.* or whatever */
+   curl_easy_setopt(slot->curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
+   credential_fill(_auth);
+
if (http_auth.password || curl_empty_auth)
init_curl_http_auth(slot->curl);

My request then looks like:

  warning: running one curl slot...
  > GET /my/repo/info/refs?service=git-upload-pack HTTP/1.1
  Authorization: Basic ...
  < HTTP/1.1 200 OK

with just a single round-trip.

Obviously the hard-coding there is wrong, but the mental model here is
that the user would do something like:

  git config http.https://github.com.auth basic

to say "I know I want to do Basic auth for this host", and that would
trigger git to tell that to curl, and to prompt for the password
preemptively.  The obvious downside is that it requires manual
configuration, and some clue about "Basic" versus other auth types.

I didn't trace the protocol further, but I suspect that curl ends up
doing that probe for _each_ request, because each time we hand it only
the credential without telling it that we know the server wants "Basic"
auth.

So that tries to keep the number of requests down in the first place. As
far as reusing connections for the connections we _do_ make, I'm not
sure. I do see curl claiming to leave the connection up:

  * Connection #0 to host github.com left intact

but then we seem to make a separate one for the next request:

  * Hostname github.com was found in DNS cache
  *   Trying 192.30.252.120...
  * Connected to github.com (192.30.252.120) port 443 (#1)

I'm not sure why that is. We do get SSL reuse:

  * SSL re-using session ID
  * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256

which is good. And then of the two requests curl makes later, the second
one has:

  * Re-using existing connection! (#1) with host github.com

which is good. So I'm not sure why we don't reuse connection #0 from the
very first request. Perhaps it's

Re: [PATCH v2 17/21] bisect: rename count_distance() to compute_weight()

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> Let us use the term "weight" for the number of ancestors
> of each commit, and "distance" for the number
> min{weight, #commits - weight}. (Bisect finds the commit
> with maximum distance.)
>
> In these terms, "count_distance()" is the wrong name of
> the function. So it is renamed to "compute_weight()",
> and it also directly sets the computed weight.
>
> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)

Makes sense.  We can think of the "distance" the distance from the
periphery of the graph we are looking at, and "bisection" is to find
a point close to the center of the graph.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 16/21] bisect: make total number of commits global

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> The total number of commits in a bisect process is a property of
> the bisect process. Making this property global helps to make the code
> clearer.

OK.

>
> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 74 
> ++--
>  1 file changed, 39 insertions(+), 35 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index f737ce7..2b415ad 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -23,6 +23,8 @@ static const char *argv_show_branch[] = {"show-branch", 
> NULL, NULL};
>  static const char *term_bad;
>  static const char *term_good;
>  
> +static int total;
> +
>  static unsigned marker;
>  
>  struct node_data {
> @@ -38,7 +40,7 @@ static inline struct node_data *node_data(struct commit 
> *elem)
>   return (struct node_data *)elem->util;
>  }
>  
> -static inline int get_distance(struct commit *commit, int total)
> +static inline int get_distance(struct commit *commit)
>  {
>   int distance = node_data(commit)->weight;
>   if (total - distance < distance)
> @@ -54,7 +56,7 @@ static inline int get_distance(struct commit *commit, int 
> total)
>   * Return 0 if the distance is halfway.
>   * Return 1 if the distance is rising.
>   */
> -static inline int distance_direction(struct commit *commit, int total)
> +static inline int distance_direction(struct commit *commit)
>  {
>   int doubled_diff = 2 * node_data(commit)->weight - total;
>   if (doubled_diff < -1)
> @@ -107,25 +109,25 @@ static int count_interesting_parents(struct commit 
> *commit)
>   return count;
>  }
>  
> -static inline int halfway(struct commit *commit, int nr)
> +static inline int halfway(struct commit *commit)
>  {
>   /*
>* Don't short-cut something we are not going to return!
>*/
>   if (commit->object.flags & TREESAME)
>   return 0;
> - return !distance_direction(commit, nr);
> + return !distance_direction(commit);
>  }
>  
>  #if !DEBUG_BISECT
> -#define show_list(a,b,c,d) do { ; } while (0)
> +#define show_list(a,b,c) do { ; } while (0)
>  #else
> -static void show_list(const char *debug, int counted, int nr,
> +static void show_list(const char *debug, int counted,
> struct commit_list *list)
>  {
>   struct commit_list *p;
>  
> - fprintf(stderr, "%s (%d/%d)\n", debug, counted, nr);
> + fprintf(stderr, "%s (%d/%d)\n", debug, counted, total);
>  
>   for (p = list; p; p = p->next) {
>   struct commit_list *pp;
> @@ -157,7 +159,7 @@ static void show_list(const char *debug, int counted, int 
> nr,
>  }
>  #endif /* DEBUG_BISECT */
>  
> -static struct commit_list *best_bisection(struct commit_list *list, int nr)
> +static struct commit_list *best_bisection(struct commit_list *list)
>  {
>   struct commit_list *p, *best;
>   int best_distance = -1;
> @@ -169,7 +171,7 @@ static struct commit_list *best_bisection(struct 
> commit_list *list, int nr)
>  
>   if (flags & TREESAME)
>   continue;
> - distance = get_distance(p->item, nr);
> + distance = get_distance(p->item);
>   if (distance > best_distance) {
>   best = p;
>   best_distance = distance;
> @@ -195,10 +197,10 @@ static int compare_commit_dist(const void *a_, const 
> void *b_)
>   return oidcmp(>commit->object.oid, >commit->object.oid);
>  }
>  
> -static struct commit_list *best_bisection_sorted(struct commit_list *list, 
> int nr)
> +static struct commit_list *best_bisection_sorted(struct commit_list *list)
>  {
>   struct commit_list *p;
> - struct commit_dist *array = xcalloc(nr, sizeof(*array));
> + struct commit_dist *array = xcalloc(total, sizeof(*array));
>   int cnt, i;
>  
>   for (p = list, cnt = 0; p; p = p->next) {
> @@ -207,7 +209,7 @@ static struct commit_list *best_bisection_sorted(struct 
> commit_list *list, int n
>  
>   if (flags & TREESAME)
>   continue;
> - distance = get_distance(p->item, nr);
> + distance = get_distance(p->item);
>   array[cnt].commit = p->item;
>   array[cnt].distance = distance;
>   cnt++;
> @@ -243,7 +245,7 @@ static struct commit_list *best_bisection_sorted(struct 
> commit_list *list, int n
>   * or positive distance.
>   */
>  static struct commit_list *do_find_bisection(struct commit_list *list,
> -  int nr, struct node_data *weights,
> +  struct node_data *weights,
>int find_all)
>  {
>   int n, counted;
> @@ -262,7 +264,7 @@ static struct commit_list *do_find_bisection(struct 
> commit_list *list,
>   node_data(commit)->weight = 1;
>   counted++;
>

Re: [PATCH] Adding list of p4 jobs to git commit message

2016-04-15 Thread Luke Diamand

On 15 April 2016 at 21:27, Junio C Hamano  wrote:
> Jan Durovec  writes:
>
>> ---
>
> A few issues.  Please:
>
>  (1) Sign-off your work.
>
>  (2) Try to find those who are familiar with the area and Cc them.
>
>  "git shortlog -s -n --since=18.months --no-merges git-p4.py"
>  may help.
>
>  (3) Follow the style of existing commits when giving a title to
>  your patch.
>
>  "git shortlog --since=18.months --no-merges git-p4.py" may
>  help you notice "git-p4: do this thing" is the common way to
>  title "git p4" patches.
>
>  (4) Justify why your change is a good thing in your log message.
>  What you did, i.e. "list p4 jobs when making a commit", can be
>  seen by the patch, but readers cannot guess why you thought it
>  is a good idea to extract "job%d" out of the P4 commit and to
>  record them in the resulting Git commit, unless you explain
>  things like:
>
>  - what goes wrong if you don't?
>  - when would "job%d" appear in P4 commit?
>  - is it sane to assume "job0", "job1",... appear consecutively?
>
>  (5) Describe what your change does clearly.  "Adding list" is not
>  quite clear.  Where in the "git commit message" are you adding
>  the list, and why is that location in the message the most
>  appropriate place to add it?

Additionally, it would be very useful to add a test case (see t/t98*).
There are quite a few test cases for git-p4, and they make maintenance
a lot easier.

Some documentation (Documentation/git-p4.txt) would also be useful.

Thanks
Luke
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 15/21] bisect: introduce distance_direction()

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> We introduce the concept of rising and falling distances
> (in addition to a halfway distance).
> This will be useful in subsequent commits.
>
> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 33 +++--
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index cfd406c..f737ce7 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -46,6 +46,28 @@ static inline int get_distance(struct commit *commit, int 
> total)
>   return distance;
>  }
>  
> +/*
> + * Return -1 if the distance is falling.
> + * (A falling distance means that the distance of the
> + *  given commit is larger than the distance of its
> + *  child commits.)
> + * Return 0 if the distance is halfway.
> + * Return 1 if the distance is rising.
> + */
> +static inline int distance_direction(struct commit *commit, int total)
> +{
> + int doubled_diff = 2 * node_data(commit)->weight - total;
> + if (doubled_diff < -1)
> + return 1;
> + if (doubled_diff > 1)
> + return -1;
> + /*
> +  * 2 and 3 are halfway of 5.
> +  * 3 is halfway of 6 but 2 and 4 are not.
> +  */
> + return 0;
> +}

Nice.  This makes it clear that to arrive at a half-way point, it is
pointless to dig a commit to its parents if the direction says it
will only take us further away from the half-way point.

>  static int count_distance(struct commit *elem)
>  {
>   int nr = 0;
> @@ -92,16 +114,7 @@ static inline int halfway(struct commit *commit, int nr)
>*/
>   if (commit->object.flags & TREESAME)
>   return 0;
> - /*
> -  * 2 and 3 are halfway of 5.
> -  * 3 is halfway of 6 but 2 and 4 are not.
> -  */
> - switch (2 * node_data(commit)->weight - nr) {
> - case -1: case 0: case 1:
> - return 1;
> - default:
> - return 0;
> - }
> + return !distance_direction(commit, nr);
>  }
>  
>  #if !DEBUG_BISECT
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 13/21] bisect: use commit instead of commit list as arguments when appropriate

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> It makes no sense that the argument for count_distance() and
> halfway() is a commit list when only its first commit is relevant.
>
> Signed-off-by: Stephan Beyer 
> ---

Makes sense (modulo perhaps s/elem/commit/).

>  bisect.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index 4209c75..2c1102f 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -38,11 +38,11 @@ static inline struct node_data *node_data(struct commit 
> *elem)
>   return (struct node_data *)elem->util;
>  }
>  
> -static int count_distance(struct commit_list *entry)
> +static int count_distance(struct commit *elem)
>  {
>   int nr = 0;
>   struct commit_list *todo = NULL;
> - commit_list_append(entry->item, );
> + commit_list_append(elem, );
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 14/21] bisect: extract get_distance() function from code duplication

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)

Nice.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 12/21] bisect: replace clear_distance() by unique markers

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> @@ -43,15 +43,17 @@ static int count_distance(struct commit_list *entry)
>   int nr = 0;
>   struct commit_list *todo = NULL;
>   commit_list_append(entry->item, );
> + marker++;
>  
>   while (todo) {
>   struct commit *commit = pop_commit();
>  
> - if (!(commit->object.flags & (UNINTERESTING | COUNTED))) {
> + if (!(commit->object.flags & UNINTERESTING)
> +  && node_data(commit)->marked != marker) {

Makes sense.

> @@ -123,10 +116,9 @@ static void show_list(const char *debug, int counted, 
> int nr,
>   const char *subject_start;
>   int subject_len;
>  
> - fprintf(stderr, "%c%c%c ",
> + fprintf(stderr, "%c%c ",
>   (flags & TREESAME) ? ' ' : 'T',
> - (flags & UNINTERESTING) ? 'U' : ' ',
> - (flags & COUNTED) ? 'C' : ' ');
> + (flags & UNINTERESTING) ? 'U' : ' ');

As this one is for debugging, could we keep the output of 'C'
intact?

It is equivalent to

commit->util && node_data(commit)->marked == marker ? 'C' : ' '

right?

This makes me wonder if node_data(commit) should return NULL instead
of asserting on commit->util in [11/21], by the way.  That would
make the above

node_data(commit) && node_data(commit)->marked == marker
? 'C' : ' '

which may be easier to read.

Another small thing I overlooked in [11/21] is that the parameter to
node_data() helper should not be called "elem", which is typically
the name used to point at an element on a linked list structure such
as commit_list.  Call it "commit" instead, as that is typically the
way we call a single parameter/variable that appears in a function
that is "struct commit".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 2:44 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
 What you have is a pure developer support; aim to come up with "good
 enough" way, giving developers an easier way to experiment with, and
 remove it before the feature is shipped to the end user.
>>
>> What are your thoughts on adding this do the gitattributes diff
>> section? Ie: modifications to the diff driver.
>
> I do try to foresee the future needs but I try to limit the forecast
> to "just enough so that we won't waste engineering effort on a wrong
> thing".  "It may need to be turned off conditionally" suggests we
> would want attributes added per path, and that is sufficient to make
> me say "don't waste time on bikeshedding configuration variable
> names, it will not be in the final product".
>
> We do not need yet to know what the final name of the attributes
> are, or how exactly the bit to affect the low level mechanism will
> be set by the attribute mechanism.  I do not think this topic is
> there yet, and it is a waste of engineering effort to prematurely
> trying to make things too flexible and customizable, when the thing
> that will eventually become flexible by conditionally enabled is not
> even there yet.
>
> As long as the low-level thing has a knob, set of internal bits, to
> enable and disable it, that is all that is necessary to know at this
> point.
>
> Having said all that, I'd expect we'd compute the right bit to use
> in the same place where we currently pick the custom textconv
> driver, diff backend, etc., by consulting the attribute system
> before running the diff.
>
> But again, I'd think it would be waste of time to think beyond that
> at this point, identifying exactly at which line of which source
> file the new code would go and what that new code would look like,
> until we are ready to start integrating it.

Ok, for now I'll leave this as is then.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH, WAS: "weird diff output?" v2 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

From: Stefan Beller 

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller 
Signed-off-by: Jacob Keller 
---
 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 24 
 5 files changed, 49 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..9265d60d9571 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+   Set this option to true to enable the empty line chunk heuristic when
+   producing diff output. This heuristic will attempt to shift hunks such
+   that the last common empty line occurs below the hunk with the rest of
+   the context above it.
+
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+   When possible, shift common empty lines in diff hunks below the hunk
+   such that the last common empty line for each hunk is below, with the
+   rest of the context above the hunk.
+
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
+   if (!strcmp(var, "diff.emptylineheuristic")) {
+   diff_empty_line_heuristic = git_config_bool(var, value);
+   return 0;
+   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
+   if (diff_empty_line_heuristic)
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--empty-line-heuristic"))
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+   else if (!strcmp(arg, "--no-empty-line-heuristic"))
+   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
else if (!strcmp(arg, "--patience"))
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6

[RFC PATCH, WAS: "weird diff output?" v2 0/2] implement empty line diff chunk heuristic

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

Second version of my series with a few more minor fixups. I left the
diff command line and configuration option alone for now, suspecting
that long term we either (a) remove it or (b) use a gitattribute, so
there is no reason to bikeshed the name or its contents right now.

TODO:
* add some tests
* think about whether we need a git attribute or not (I did some
  thinking, and if we do need to configure this at all, this is where I
  would put it)
* figure out how to make is_emptyline CRLF aware

Changes since my v1:
* rename xdl_hash_and_recmatch to recs_match
* remove counting empty lines in the first section of the looping

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

For reviewer convenience, the interdiff between v1 and v2:

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index cebf82702d2a..9265d60d9571 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -173,8 +173,8 @@ include::mergetools-diff.txt[]
 diff.emptyLineHeuristic::
Set this option to true to enable the empty line chunk heuristic when
producing diff output. This heuristic will attempt to shift hunks such
-   that a common empty line occurs below the hunk with the rest of the
-   context above it.
+   that the last common empty line occurs below the hunk with the rest of
+   the context above it.
 
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 83984b90f82f..9436ad735243 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -405,7 +405,7 @@ static int is_emptyline(const char *recs)
return recs[0] == '\n'; /* CRLF not covered here */
 }
 
-static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long 
flags)
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
return (recs[ixs]->ha == recs[ix]->ha &&
xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
@@ -457,9 +457,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, 
ix - 1, flags)) {
-   emptylines += is_emptyline(recs[ix - 1]->ptr);
-
+   while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, 
flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -486,7 +484,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, 
ix, flags)) {
+   while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
emptylines += is_emptyline(recs[ix]->ptr);
 
rchg[ixs++] = 0;
@@ -527,7 +525,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
while (ixs > 0 &&
   !is_emptyline(recs[ix - 1]->ptr) &&
-  xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, 
flags)) {
+  recs_match(recs, ixs - 1, ix - 1, flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
}

Jacob Keller (1):
  xdiff: add recs_match helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 40 
 5 files changed, 61 insertions(+), 4 deletions(-)

-- 
2.8.1.369.geae769a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller 
---
 xdiff/xdiffi.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d6326e..d14c47de5e36 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+{
+   return (recs[ixs]->ha == recs[ix]->ha &&
+   xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+recs[ix]->ptr, recs[ix]->size,
+flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha 
&&
-  xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 
1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+   while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, 
flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-  xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
recs[ix]->ptr, recs[ix]->size, flags)) {
+   while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+   emptylines += is_emptyline(recs[ix]->ptr);
+
rchg[ixs++] = 0;
rchg[ix++] = 1;
 
-- 
2.8.1.369.geae769a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 11/21] bisect: use struct node_data array instead of int array

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> This is a preparation for subsequent changes.
> During a bisection process, we want to augment commits with
> further information to improve speed.
>
> Signed-off-by: Stephan Beyer 
> ---

Makes sense.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Junio C Hamano

Jacob Keller  writes:

>>> What you have is a pure developer support; aim to come up with "good
>>> enough" way, giving developers an easier way to experiment with, and
>>> remove it before the feature is shipped to the end user.
>
> What are your thoughts on adding this do the gitattributes diff
> section? Ie: modifications to the diff driver.

I do try to foresee the future needs but I try to limit the forecast
to "just enough so that we won't waste engineering effort on a wrong
thing".  "It may need to be turned off conditionally" suggests we
would want attributes added per path, and that is sufficient to make
me say "don't waste time on bikeshedding configuration variable
names, it will not be in the final product".

We do not need yet to know what the final name of the attributes
are, or how exactly the bit to affect the low level mechanism will
be set by the attribute mechanism.  I do not think this topic is
there yet, and it is a waste of engineering effort to prematurely
trying to make things too flexible and customizable, when the thing
that will eventually become flexible by conditionally enabled is not
even there yet.

As long as the low-level thing has a knob, set of internal bits, to
enable and disable it, that is all that is necessary to know at this
point.

Having said all that, I'd expect we'd compute the right bit to use
in the same place where we currently pick the custom textconv
driver, diff backend, etc., by consulting the attribute system
before running the diff.

But again, I'd think it would be waste of time to think beyond that
at this point, identifying exactly at which line of which source
file the new code would go and what that new code would look like,
until we are ready to start integrating it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 10/21] bisect: get rid of recursion in count_distance()

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> Large repositories with a huge amount of merge commits in the
> bisection process could lead to stack overflows in git bisect.
> In order to prevent this, this commit uses an *iterative* version
> for counting the number of ancestors of a commit.

Yay!

> -/*
> - * This is a truly stupid algorithm, but it's only
> - * used for bisection, and we just don't care enough.
> - *
> - * We care just barely enough to avoid recursing for
> - * non-merge entries.
> - */
>  static int count_distance(struct commit_list *entry)
>  {
>   int nr = 0;
> + struct commit_list *todo = NULL;
> + commit_list_append(entry->item, );
>  
> - while (entry) {
> - struct commit *commit = entry->item;
> - struct commit_list *p;
> + while (todo) {
> + struct commit *commit = pop_commit();
>  
> - if (commit->object.flags & (UNINTERESTING | COUNTED))
> - break;
> - if (!(commit->object.flags & TREESAME))
> - nr++;
> - commit->object.flags |= COUNTED;
> - p = commit->parents;
> - entry = p;
> - if (p) {
> - p = p->next;
> - while (p) {
> - nr += count_distance(p);
> - p = p->next;
> + if (!(commit->object.flags & (UNINTERESTING | COUNTED))) {
> + struct commit_list *p;
> + if (!(commit->object.flags & TREESAME))
> + nr++;
> + commit->object.flags |= COUNTED;
> +
> + for (p = commit->parents; p; p = p->next) {
> + commit_list_insert(p->item, );
>   }
>   }
>   }
> @@ -287,7 +277,7 @@ static struct commit_list *do_find_bisection(struct 
> commit_list *list,
>* can reach.  So we do not have to run the expensive
>* count_distance() for single strand of pearls.
>*
> -  * However, if you have more than one parents, you cannot
> +  * However, if you have more than one parent, you cannot

Thanks.  This grammo is mine, back in 1c4fea3a (git-rev-list
--bisect: optimization, 2007-03-21)

> @@ -296,17 +286,16 @@ static struct commit_list *do_find_bisection(struct 
> commit_list *list,
>* way, and then fill the blanks using cheaper algorithm.
>*/
>   for (p = list; p; p = p->next) {
> - if (p->item->object.flags & UNINTERESTING)
> - continue;
> - if (weight(p) != -2)
> - continue;
> - weight_set(p, count_distance(p));
> - clear_distance(list);
> + if (!(p->item->object.flags & UNINTERESTING)
> +  && (weight(p) == -2)) {
> + weight_set(p, count_distance(p));
> + clear_distance(list);
>  
> - /* Does it happen to be at exactly half-way? */
> - if (!find_all && halfway(p, nr))
> - return p;
> - counted++;
> + /* Does it happen to be at exactly half-way? */
> + if (!find_all && halfway(p, nr))
> + return p;
> + counted++;
> + }
>   }

I can buy collapsing two if() statements into one, but I'd prefer to
see us keep the structure:

loop () {
if (... || ...)
continue;
quite a
many
operations
here
}

>  
>   show_list("bisection 2 count_distance", counted, nr, list);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 09/21] bisect: make algorithm behavior independent of DEBUG_BISECT

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> If DEBUG_BISECT is set to 1, bisect does not only show debug
> information but also changes the algorithm behavior: halfway()
> is always false.
>
> This commit makes the algorithm independent of DEBUG_BISECT.
>
> Signed-off-by: Stephan Beyer 
> ---

Another good candidate for preliminary clean-up.

I do not remember what the rationale was to do this short-cut when I
wrote it at 1daa09d9 (make the previous optimization work also on
path-limited rev-list --bisect, 2007-03-23).  Thanks for spotting it.

>  bisect.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index 2f54d96..1a13f35 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -101,8 +101,6 @@ static inline int halfway(struct commit_list *p, int nr)
>*/
>   if (p->item->object.flags & TREESAME)
>   return 0;
> - if (DEBUG_BISECT)
> - return 0;
>   /*
>* 2 and 3 are halfway of 5.
>* 3 is halfway of 6 but 2 and 4 are not.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 2:15 PM, Jacob Keller  wrote:
> On Fri, Apr 15, 2016 at 1:30 PM, Junio C Hamano  wrote:
>> Jacob Keller  writes:
>>
>>> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano  wrote:
>>>
 I actually do not think these knobs should exist when the code is
 mature enough to be shipped to the end users.

 Use "diff.compactionHeuristics = " as an opaque set of bits to
 help the developers while they compare notes and reach consensus on
 a single tweak that they can agree on being good enough, and then
 remove that variable before the code hits 'next'.

 Thanks.
>>>
>>> I was under the impression that we would want a longer lived
>>> configuration until we had enough data to say whether it was
>>> helpful to make it default. I guess i had thought it would need to
>>> be longer lived since there may be cases where it's not optimal
>>> and being able to turn it off would be good?
>>
>> Once you start worrying about "some cases this may misbehave", a
>> configuration variable is a wrong mechanism to do so anyway.  You
>> would need to tie this to attributes, so the users can say "use this
>> heuristics for my C code, but do not apply it for my AsciiDoc
>> input", etc.
>>
>
> I think this makes perfect sense to apply this as an attribute,
> however.. why isn't the current diff algorithm done this way?
>
> Thanks,
> Jake
>
>> What you have is a pure developer support; aim to come up with "good
>> enough" way, giving developers an easier way to experiment with, and
>> remove it before the feature is shipped to the end user.


What are your thoughts on adding this do the gitattributes diff
section? Ie: modifications to the diff driver.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/21] bisect: make bisect compile if DEBUG_BISECT is set

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> Setting the macro DEBUG_BISECT to 1 enables debugging information
> for the bisect algorithm. The code did not compile due to struct
> changes.
>
> Signed-off-by: Stephan Beyer 
> ---

Thanks.

This is something that we should do as a preparatory clean-up patch
before the series.  The real body of the series is more important
thing for us to spend review cycles on, and striving to slim it down
by having preparatory bits graduate early would help the process.


>  bisect.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/bisect.c b/bisect.c
> index 901e4d3..2f54d96 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -131,7 +131,7 @@ static void show_list(const char *debug, int counted, int 
> nr,
>   unsigned flags = commit->object.flags;
>   enum object_type type;
>   unsigned long size;
> - char *buf = read_sha1_file(commit->object.sha1, , );
> + char *buf = read_sha1_file(commit->object.oid.hash, , 
> );
>   const char *subject_start;
>   int subject_len;
>  
> @@ -143,10 +143,10 @@ static void show_list(const char *debug, int counted, 
> int nr,
>   fprintf(stderr, "%3d", weight(p));
>   else
>   fprintf(stderr, "---");
> - fprintf(stderr, " %.*s", 8, sha1_to_hex(commit->object.sha1));
> + fprintf(stderr, " %.*s", 8, 
> sha1_to_hex(commit->object.oid.hash));
>   for (pp = commit->parents; pp; pp = pp->next)
>   fprintf(stderr, " %.*s", 8,
> - sha1_to_hex(pp->item->object.sha1));
> + sha1_to_hex(pp->item->object.oid.hash));
>  
>   subject_len = find_commit_subject(buf, _start);
>   if (subject_len)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 07/21] bisect: plug the biggest memory leak

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> Signed-off-by: Stephan Beyer 
> ---
>  bisect.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/bisect.c b/bisect.c
> index 7996c29..901e4d3 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -984,6 +984,8 @@ int bisect_next_all(const char *prefix, int no_checkout)
>   exit(10);
>   }
>  
> + free_commit_list(revs.commits);
> +
>   nr = all - reaches - 1;
>   steps = estimate_bisect_steps(all);
>   printf("Bisecting: %d revision%s left to test after this "

While I do not think this is wrong per-se (i.e. it is clear that we
no longer need revs.commits), after this the function will return to
the top-level caller and exit immediately, and I do not see anything
that desperately wants to use as much memory as available (i.e. would
be helped by this piece of memory released early).  "the biggest"
may be an overstatement ;-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 1:30 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano  wrote:
>>
>>> I actually do not think these knobs should exist when the code is
>>> mature enough to be shipped to the end users.
>>>
>>> Use "diff.compactionHeuristics = " as an opaque set of bits to
>>> help the developers while they compare notes and reach consensus on
>>> a single tweak that they can agree on being good enough, and then
>>> remove that variable before the code hits 'next'.
>>>
>>> Thanks.
>>
>> I was under the impression that we would want a longer lived
>> configuration until we had enough data to say whether it was
>> helpful to make it default. I guess i had thought it would need to
>> be longer lived since there may be cases where it's not optimal
>> and being able to turn it off would be good?
>
> Once you start worrying about "some cases this may misbehave", a
> configuration variable is a wrong mechanism to do so anyway.  You
> would need to tie this to attributes, so the users can say "use this
> heuristics for my C code, but do not apply it for my AsciiDoc
> input", etc.
>

I think this makes perfect sense to apply this as an attribute,
however.. why isn't the current diff algorithm done this way?

Thanks,
Jake

> What you have is a pure developer support; aim to come up with "good
> enough" way, giving developers an easier way to experiment with, and
> remove it before the feature is shipped to the end user.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 06/21] bisect: add test for the bisect algorithm

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> +test_expect_success 'bisect algorithm works in linear history with an odd 
> number of commits' '
> + git bisect start A7 &&
> + git bisect next &&
> + test_cmp_rev HEAD A3 A4
> +'
> +
> +test_expect_success 'bisect algorithm works in linear history with an even 
> number of commits' '
> + git bisect reset &&
> + git bisect start A8 &&
> + git bisect next &&
> + test_cmp_rev HEAD A4
> +'
> +
> +test_expect_success 'bisect algorithm works with a merge' '
> + git bisect reset &&
> + git bisect start Bmerge &&
> + git bisect next &&
> + test_cmp_rev HEAD A5 &&
> + git bisect good &&
> + test_cmp_rev HEAD A8 &&
> + git bisect good &&
> + test_cmp_rev HEAD B1 B2
> +'
> +
> +#   | w  min | w  min | w  min | w  min |
> +# B---.BCDmerge | 18  0  | 90 | 50 | 30 |
> +# |\ \ \|||||
> +# | | | *  D2   | 5   5  | 22 | 22*| good   |
> +# | | | *  D1   | 4   4  | 11 | 11 | good   |
> +# | | * |  C3   | 10  8  | 11 | 11 | 11*|
> +# | | * |  C2   | 9   9 *| good   | good   | good   |
> +# | | * |  C1   | 8   8  | good   | good   | good   |
> +# | * | |  B3   | 8   8  | 33 | 11 | 11*|
> +# * | | |  Bmerge   | 11  7  | 44*| good   | good   |
> +# |\ \ \ \  |||||
> +# | |/ / /  |||||
> +# | * | |  B2   | 7   7  | 22 | good   | good   |
> +# | * | |  B1   | 6   6  | 11 | good   | good   |
> +# * | | |  A8   | 8   8  | 11 | good   | good   |
> +# | |/ /|||||
> +# |/| | |||||
> +# * | |   A7| 7   7  | good   | good   | good   |
> +# * | |   A6| 6   6  | good   | good   | good   |
> +# |/ /  |||||
> +# * | A5| 5   5  | good   | good   | good   |
> +# * | A4| 4   4  | good   | good   | good   |
> +# |/|||||
> +# *   A3| 3   3  | good   | good   | good   |
> +# *   A2| 2   2  | good   | good   | good   |
> +# *   A1| 1   1  | good   | good   | good   |

Nice drawing.  With this, it is easy to see how the first three
examples above are testing the right thing, too.  It is not
immediately clear what these asterisks in the table are trying to
say, though (the same comment applies to the other drawing below).

> +test_expect_success 'bisect algorithm works with octopus merge' '
> + git bisect reset &&
> + git bisect start BCDmerge &&
> + git bisect next &&
> + test_cmp_rev HEAD C2 &&
> + git bisect good &&
> + test_cmp_rev HEAD Bmerge &&
> + git bisect good &&
> + test_cmp_rev HEAD D2 &&
> + git bisect good &&
> + test_cmp_rev HEAD B3 C3 &&
> + git bisect good &&
> + test_cmp_rev HEAD C3 B3 &&
> + git bisect good > output &&
> + grep "$(git rev-parse BCDmerge) is the first bad commit" output
> +'
> +
> +# G 5a6bcdfD3   | w  min | w  min |
> +# | B 02f2eed  A9   | 14  0  | 7   0  |
> +# | *---. 6174c5c  BCDmerge | 13  1  | 6   1  |
> +# | |\ \ \  |||
> +# | |_|_|/  |||
> +# |/| | |   |||
> +# G | | | a6d6dab  D2   | good   | good   |
> +# * | | | 86414e4  D1   | good   | good   |
> +# | | | * c672402  C3   | 7   7 *| good   |
> +# | | | * 0555272  C2   | 6   6  | good   |
> +# | | | * 28c2b2a  C1   | 5   5  | good   |
> +# | | * | 4b5a7d9  B3   | 5   5  | 3   3 *|
> +# | * | | a419ab7  Bmerge   | 8   6  | 4   3 *|
> +# | |\ \ \  |||
> +# | | |/ /  |||
> +# | | * | 4fa1e39  B2   | 4   4  | 2   2  |
> +# | | * | 92a014d  B1   | 3   3  | 1   1  |
> +# | * | | 79158c7  A8   | 5   5  | 1   1  |
> +# | | |/|||
> +# | |/| |||
> +# | * | 237eb73A7   | 4   4  | good   |
> +# | * | 3b2f811A6   | 3   3  | good   |
> +# | |/  |||
> +# | * 0f2b6d2  A5   | 2   2  | good   |
> +# | * 1fcdaf0  A4   | 1   1  | good   |
> +# |/|||
> +# * 096648bA3   | good   | good   |
> +# * 1cf01b8A2   | good   | good   |
> +# * 6623165A1   | good   | good   |
> +
> +test_expect_success 'bisect algorithm works with good commit on unrelated 
> branch' '
> + git bisect reset &&
> + git bisect start A9 D3 &&
> + test_cmp_rev HEAD "$(git merge-base A9 D3)" &&
> + test_cmp_rev HEAD D2 &&
> + git bisect good &&
> + test_cmp_rev HEAD C3 &&
> + git bisect good &&
> +

Re: [PATCH 0/6] Miscellaneous merge fixes

2016-04-15 Thread Elijah Newren

Hi,

On Tue, Apr 12, 2016 at 6:23 PM, Junio C Hamano  wrote:
> Elijah Newren  writes:
>
>> Elijah Newren (6):
>>   Remove duplicate code
>>   Avoid checking working copy when creating a virtual merge base
>>   Add merge testcases for when index doesn't match HEAD
>>   merge-octopus: Abort if index does not match HEAD
>>   Add a testcase demonstrating a bug with trivial merges
>>   builtin/merge.c: Fix a bug with trivial merges
>
> Please be careful when giving titles to your patches.  They will be
> shown in a context that is much wider than the area your attention
> is currently concentrated on.  E.g. does "Remove duplicate code"
> tell readers of "git shortlog --no-merges v2.8.0..v2.9.0" what the
> change was about when it eventually appears in the upcoming release?

I will try to do better on that.  For the patch you mention, how about:
   merge-recursive: Remove a few lines of duplicate code
?

I can resubmit the series correcting both that subject and the subject
of the second patch using your suggestion elsewhere in this thread.
The other four already provide some context on their area; am I
correct to assume those provide enough?

Elijah
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 15/16] branch: use ref-filter printing APIs

2016-04-15 Thread Jeff King

On Sat, Apr 16, 2016 at 01:57:23AM +0530, Karthik Nayak wrote:

> I had a look at your patch and even tested it, seems solid, I like how you
> integrated all these atoms together under refname_atom_parser_internal().
> I'm squashing this in, for my re-roll. Thanks.

Great, thanks for picking it up.

> > So actually, we _do_ accept "%(upstream:short,track)" with your
> > patch, which is somewhat nonsensical. It just ignores "short" and
> > takes whatever option came last. Which is reasonable, though
> > flagging an error would also be reasonable (and I think is what
> > existing git does). I don't think it matters much either way.
> >
> 
> I think it was decided a while ago that it'd be best to ignore all and
> accept the last argument without barfing, I couldn't find the exact
> link. But I'm open to both.  But if you look at the %(align) atom,
> even that accepts mutually exclusive arguments and accepts the last
> argument without reporting an error.

Makes sense, and I'm fine with how you have it (and my patch tried to
retain that property). I just wasn't sure if it was intentional, as I
did a bad job of paying attention to earlier rounds of the series. Thank
you for keeping at it.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 05/21] t6030: generalize test to not rely on current implementation

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> The bisect algorithm allows different outcomes if, for example,
> the number of commits between a good and a bad commit is even.
> The current test relies on a specific behavior (for example,
> the behavior of the halfway() implementation). By disabling
> halfway(), some skip tests fail although the algorithm works.
>
> This commit generalizes the test t6030 such that it works
> even if the bisect algorithm uses its degree of freedom to
> choose another commit.
>
> While at it, fix some indentation issues: use tabs instead of
> 4 spaces.

While style fixes are very much welcome, it makes the patch
unnecessary noisy.  We typically do so as a preparatory clean-up.

And if you do style fixes, please fix other style issues, such as

 - use of "if [ ... ]; then", which should be spelled as

if test ...
then

 - unnecessasry space between redirection operator and the filename,
   and lack of double-quoting around such a filename in a variable
   to work around certain vintage of bash that gives unnecessary
   warnings, e.g. 'echo foo > $file' must be spelled as

echo foo >"$file"

etc.

> @@ -84,9 +82,8 @@ test_expect_success 'bisect fails if given any junk instead 
> of revs' '
>  
>  test_expect_success 'bisect reset: back in the master branch' '
>   git bisect reset &&
> - echo "* master" > branch.expect &&
>   git branch > branch.output &&
> - cmp branch.expect branch.output
> + grep "^* master" branch.output

This is not a style fix, and it is not a "possibly multiple valid
outcomes", either.

If the purpose of change is "to do the right thing", checking the
output from "git symbolic-ref HEAD" against "refs/heads/master" is
the kosher way to check what test is trying to do.

> @@ -180,14 +175,15 @@ test_expect_success 'bisect start: no 
> ".git/BISECT_START" if checkout error' '
>   git checkout HEAD hello
>  '
>  
> -# $HASH1 is good, $HASH4 is bad, we skip $HASH3
> +# $HASH1 is good, monday is bad, we skip $HASH3

I am not sure this s/$HASH4/monday/ is adding value.  Certainly it
breaks consistency, which you could keep by defining SIDE_HASH5 or
something when you added the "Ok Monday, let's do it" commit.  On
the other hand, you could choose to consistently use branch-relative
names by turning $HASH3 to master~1, etc.

>  # but $HASH2 is bad,
>  # so we should find $HASH2 as the first bad commit
> ...

> +test_expect_success '"git bisect run" simple case' '
> + echo "#"\!"/bin/sh" > test_script.sh &&
> + echo "grep Another hello > /dev/null" >> test_script.sh &&
> + echo "test \$? -ne 0" >> test_script.sh &&
> + chmod +x test_script.sh &&

Use write_script in the "style fix" preparatory clean-up patch?

> + git bisect start &&
> + git bisect good $HASH1 &&
> + git bisect bad $HASH4 &&
> + git bisect run ./test_script.sh > my_bisect_log.txt &&
> + grep "$HASH3 is the first bad commit" my_bisect_log.txt &&
> + git bisect reset
> +'
> ...
> +test_expect_success '"git bisect run" with more complex "git bisect start"' '
> + echo "#"\!"/bin/sh" > test_script.sh &&
> + echo "grep Ciao hello > /dev/null" >> test_script.sh &&
> + echo "test \$? -ne 0" >> test_script.sh &&
> + chmod +x test_script.sh &&

Likewise.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 04/21] t: use test_cmp_rev() where appropriate

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> test_cmp_rev() from t/test-lib-functions.sh is used to make many
> tests clearer.
>
> Signed-off-by: Stephan Beyer 
> ---
>
> Notes:
> This change is in some way independent of the bisect topic but
> the next patch is based on this (for t6030).

It seems that with this step test_cmp_rev is only used to compare
two revisions, not "I want to see this expected one among many"
introduced by 03/21.

And they all make the resulting code look easier to read.

As I said, these tests would become more cumbersome to debug when
they break with this change, though.

Also, with this mass rewrite, it is not sensible to assume that all
of these tests that start calling test_cmp_rev() do not mind having
expect.rev and actual.rev files in their working trees (and it is
also not sensible to assume that they do not mind having hash1 and
hash2 shell variables clobbered), so not using temporary files may
be good, but perhaps something like

test_cmp_rev () {
test "$(git rev-parse --verify "$1")" = \
 "$(git rev-parse --verify "$2")" && return
echo >&2 "Revs '$1' and '$2" are different"
return false
}

may be necessary.  I dunno.

> diff --git a/t/t2012-checkout-last.sh b/t/t2012-checkout-last.sh
> index e7ba8c5..64cb449 100755
> --- a/t/t2012-checkout-last.sh
> +++ b/t/t2012-checkout-last.sh
> @@ -43,7 +43,7 @@ test_expect_success '"checkout -" attaches again' '
>  
>  test_expect_success '"checkout -" detaches again' '
>   git checkout - &&
> - test "z$(git rev-parse HEAD)" = "z$(git rev-parse other)" &&
> + test_cmp_rev HEAD other &&
>   test_must_fail git symbolic-ref HEAD
>  '
>  
> @@ -101,19 +101,19 @@ test_expect_success 'merge base test setup' '
>  test_expect_success 'another...master' '
>   git checkout another &&
>   git checkout another...master &&
> - test "z$(git rev-parse --verify HEAD)" = "z$(git rev-parse --verify 
> master^)"
> + test_cmp_rev HEAD master^
>  '
>  
>  test_expect_success '...master' '
>   git checkout another &&
>   git checkout ...master &&
> - test "z$(git rev-parse --verify HEAD)" = "z$(git rev-parse --verify 
> master^)"
> + test_cmp_rev HEAD master^
>  '
>  
>  test_expect_success 'master...' '
>   git checkout another &&
>   git checkout master... &&
> - test "z$(git rev-parse --verify HEAD)" = "z$(git rev-parse --verify 
> master^)"
> + test_cmp_rev HEAD master^
>  '
>  
>  test_expect_success '"checkout -" works after a rebase A' '
> diff --git a/t/t3308-notes-merge.sh b/t/t3308-notes-merge.sh
> index 19aed7e..1f72e9e 100755
> --- a/t/t3308-notes-merge.sh
> +++ b/t/t3308-notes-merge.sh
> @@ -93,7 +93,7 @@ test_expect_success 'merge non-notes ref into empty notes 
> ref (remote-notes/orig
>   git notes merge refs/remote-notes/origin/x &&
>   verify_notes v &&
>   # refs/remote-notes/origin/x and v should point to the same notes commit
> - test "$(git rev-parse refs/remote-notes/origin/x)" = "$(git rev-parse 
> refs/notes/v)"
> + test_cmp_rev refs/remote-notes/origin/x refs/notes/v
>  '
>  
>  test_expect_success 'merge notes into empty notes ref (x => y)' '
> @@ -101,13 +101,13 @@ test_expect_success 'merge notes into empty notes ref 
> (x => y)' '
>   git notes merge x &&
>   verify_notes y &&
>   # x and y should point to the same notes commit
> - test "$(git rev-parse refs/notes/x)" = "$(git rev-parse refs/notes/y)"
> + test_cmp_rev refs/notes/x refs/notes/y
>  '
>  
>  test_expect_success 'merge empty notes ref (z => y)' '
>   git notes merge z &&
>   # y should not change (still == x)
> - test "$(git rev-parse refs/notes/x)" = "$(git rev-parse refs/notes/y)"
> + test_cmp_rev refs/notes/x refs/notes/y
>  '
>  
>  test_expect_success 'change notes on other notes ref (y)' '
> @@ -174,7 +174,7 @@ test_expect_success 'merge changed (y) into original (x) 
> => Fast-forward' '
>   verify_notes x &&
>   verify_notes y &&
>   # x and y should point to same the notes commit
> - test "$(git rev-parse refs/notes/x)" = "$(git rev-parse refs/notes/y)"
> + test_cmp_rev refs/notes/x refs/notes/y
>  '
>  
>  test_expect_success 'merge empty notes ref (z => y)' '
> diff --git a/t/t3310-notes-merge-manual-resolve.sh 
> b/t/t3310-notes-merge-manual-resolve.sh
> index d557212..5e46fcf 100755
> --- a/t/t3310-notes-merge-manual-resolve.sh
> +++ b/t/t3310-notes-merge-manual-resolve.sh
> @@ -516,7 +516,7 @@ cp expect_log_w expect_log_m
>  test_expect_success 'reset notes ref m to somewhere else (w)' '
>   git update-ref refs/notes/m refs/notes/w &&
>   verify_notes m &&
> - test "$(git rev-parse refs/notes/m)" = "$(git rev-parse refs/notes/w)"
> + test_cmp_rev refs/notes/m refs/notes/w
>  '
>  
>  test_expect_success 'fail to finalize conflicting merge if underlying ref 
> has moved in the meantime (m

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Junio C Hamano

Jacob Keller  writes:

> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano  wrote:
>
>> I actually do not think these knobs should exist when the code is
>> mature enough to be shipped to the end users.
>>
>> Use "diff.compactionHeuristics = " as an opaque set of bits to
>> help the developers while they compare notes and reach consensus on
>> a single tweak that they can agree on being good enough, and then
>> remove that variable before the code hits 'next'.
>>
>> Thanks.
>
> I was under the impression that we would want a longer lived
> configuration until we had enough data to say whether it was
> helpful to make it default. I guess i had thought it would need to
> be longer lived since there may be cases where it's not optimal
> and being able to turn it off would be good?

Once you start worrying about "some cases this may misbehave", a
configuration variable is a wrong mechanism to do so anyway.  You
would need to tie this to attributes, so the users can say "use this
heuristics for my C code, but do not apply it for my AsciiDoc
input", etc.

What you have is a pure developer support; aim to come up with "good
enough" way, giving developers an easier way to experiment with, and
remove it before the feature is shipped to the end user.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 15/16] branch: use ref-filter printing APIs

2016-04-15 Thread Karthik Nayak

On Fri, Apr 15, 2016 at 2:06 AM, Jeff King  wrote:
> On Thu, Apr 14, 2016 at 04:05:30PM -0400, Jeff King wrote:
>
>> It looks like that's a little tricky for %(upstream) and %(push), which
>> have extra tracking options, but it's pretty trivial for %(symref):
>> [...]
>> I suspect it could work for the remote_ref_atom_parser, too, if you did
>> something like:
>
> So here that is (handling both %(symref) and %(upstream), so replacing
> what I sent a few minutes ago). It's only lightly tested by me, so there
> may be more corner cases to think about. I kept things where they were
> to create a minimal diff, but if it gets squashed in, it might be worth
> re-ordering a little to avoid the need for forward declarations.
>

I had a look at your patch and even tested it, seems solid, I like how you
integrated all these atoms together under refname_atom_parser_internal().
I'm squashing this in, for my re-roll. Thanks.

>> I don't know if that would create weird corner cases with RR_SHORTEN
>> and RR_TRACK, though, which are currently in the same enum (and thus
>> cannot be used at the same time). But it's not like we parse
>> "%(upstream:short:track)" anyway, I don't think. For each "%()"
>> placeholder you have to choose one or the other: showing the tracking
>> info, or showing the refname (possibly with modifiers). So ":track"
>> isn't so much a modifier as "upstream:track" is this totally other
>> thing.
>
> So actually, we _do_ accept "%(upstream:short,track)" with your patch,
> which is somewhat nonsensical. It just ignores "short" and takes
> whatever option came last. Which is reasonable, though flagging an error
> would also be reasonable (and I think is what existing git does). I
> don't think it matters much either way.
>

I think it was decided a while ago that it'd be best to ignore all and
accept the
last argument without barfing, I couldn't find the exact link. But I'm
open to both.
But if you look at the %(align) atom, even that accepts mutually
exclusive arguments
and accepts the last argument without reporting an error.

> ---
> diff --git a/ref-filter.c b/ref-filter.c
> index 3435df1..951c57e 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -50,6 +50,11 @@ struct if_then_else {
> condition_satisfied : 1;
>  };
>
> +struct refname_atom {
> +   enum { R_BASE, R_DIR, R_NORMAL, R_SHORT, R_STRIP } option;
> +   unsigned int strip;
> +};
> +
>  /*
>   * An atom is a valid field atom listed below, possibly prefixed with
>   * a "*" to denote deref_tag().
> @@ -67,7 +72,8 @@ static struct used_atom {
> char color[COLOR_MAXLEN];
> struct align align;
> struct {
> -   enum { RR_NORMAL, RR_SHORTEN, RR_TRACK, RR_TRACKSHORT 
> } option;
> +   enum { RR_REF, RR_TRACK, RR_TRACKSHORT } option;
> +   struct refname_atom refname;
> unsigned int nobracket: 1;
> } remote_ref;
> struct {
> @@ -82,16 +88,14 @@ static struct used_atom {
> enum { O_FULL, O_LENGTH, O_SHORT } option;
> unsigned int length;
> } objectname;
> -   enum { S_FULL, S_SHORT } symref;
> -   struct {
> -   enum { R_BASE, R_DIR, R_NORMAL, R_SHORT, R_STRIP } 
> option;
> -   unsigned int strip;
> -   } refname;
> +   struct refname_atom refname;
> } u;
>  } *used_atom;
>  static int used_atom_cnt, need_tagged, need_symref;
>  static int need_color_reset_at_eol;
>
> +static const char *show_ref(struct refname_atom *atom, const char *refname);
> +
>  static void color_atom_parser(struct used_atom *atom, const char 
> *color_value)
>  {
> if (!color_value)
> @@ -100,13 +104,34 @@ static void color_atom_parser(struct used_atom *atom, 
> const char *color_value)
> die(_("unrecognized color: %%(color:%s)"), color_value);
>  }
>
> +static void refname_atom_parser_internal(struct refname_atom *atom,
> +const char *arg, const char *name)
> +{
> +   if (!arg)
> +   atom->option = R_NORMAL;
> +   else if (!strcmp(arg, "short"))
> +   atom->option = R_SHORT;
> +   else if (skip_prefix(arg, "strip=", )) {
> +   atom->option = R_STRIP;
> +   if (strtoul_ui(arg, 10, >strip) || atom->strip <= 0)
> +   die(_("positive value expected refname:strip=%s"), 
> arg);
> +   } else if (!strcmp(arg, "dir"))
> +   atom->option = R_DIR;
> +   else if (!strcmp(arg, "base"))
> +   atom->option = R_BASE;
> +   else
> +   die(_("unrecognized %%(%s) argument: %s"), name, arg);
> +}
> +
>  static void remote_ref_atom_parser(struct used_atom *atom, const char *arg)
>  {
> struct string_list params = STRING_LIST_INIT_DUP;
>

Re: [PATCH] Adding list of p4 jobs to git commit message

2016-04-15 Thread Junio C Hamano

Jan Durovec  writes:

> ---

A few issues.  Please:

 (1) Sign-off your work.

 (2) Try to find those who are familiar with the area and Cc them.

 "git shortlog -s -n --since=18.months --no-merges git-p4.py"
 may help.

 (3) Follow the style of existing commits when giving a title to
 your patch.

 "git shortlog --since=18.months --no-merges git-p4.py" may
 help you notice "git-p4: do this thing" is the common way to
 title "git p4" patches.

 (4) Justify why your change is a good thing in your log message.
 What you did, i.e. "list p4 jobs when making a commit", can be
 seen by the patch, but readers cannot guess why you thought it
 is a good idea to extract "job%d" out of the P4 commit and to
 record them in the resulting Git commit, unless you explain
 things like:

 - what goes wrong if you don't?
 - when would "job%d" appear in P4 commit?
 - is it sane to assume "job0", "job1",... appear consecutively?

 (5) Describe what your change does clearly.  "Adding list" is not
 quite clear.  Where in the "git commit message" are you adding
 the list, and why is that location in the message the most
 appropriate place to add it?

Thanks.

>  git-p4.py | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index 527d44b..a81795f 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2320,6 +2320,15 @@ def extractFilesFromCommit(self, commit):
>  fnum = fnum + 1
>  return files
>  
> +def extractJobsFromCommit(self, commit):
> +jobs = []
> +jnum = 0
> +while commit.has_key("job%s" % jnum):
> +job = commit["job%s" % jnum]
> +jobs.append(job)
> +jnum = jnum + 1
> +return jobs
> +
>  def stripRepoPath(self, path, prefixes):
>  """When streaming files, this is called to map a p4 depot path
> to where it should go in git.  The prefixes are either
> @@ -2665,6 +2674,7 @@ def hasBranchPrefix(self, path):
>  def commit(self, details, files, branch, parent = ""):
>  epoch = details["time"]
>  author = details["user"]
> +jobs = self.extractJobsFromCommit(details)
>  
>  if self.verbose:
>  print('commit into {0}'.format(branch))
> @@ -2696,6 +2706,8 @@ def commit(self, details, files, branch, parent = ""):
>   (','.join(self.branchPrefixes), 
> details["change"]))
>  if len(details['options']) > 0:
>  self.gitStream.write(": options = %s" % details['options'])
> +if len(jobs) > 0:
> +self.gitStream.write(": jobs = %s" % (','.join(jobs)))
>  self.gitStream.write("]\nEOT\n\n")
>  
>  if len(parent) > 0:
>
> --
> https://github.com/git/git/pull/225
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 03/16] index-helper: new daemon for caching index and related stuff

2016-04-15 Thread David Turner

On Fri, 2016-04-15 at 18:25 +0700, Duy Nguyen wrote:
> On Thu, Apr 14, 2016 at 1:47 AM, David Turner <
> dtur...@twopensource.com> wrote:
> > > > +   fd = unix_stream_connect(socket_path);
> > > > +   if (refresh_cache) {
> > > > +   ret = write_in_full(fd, "refresh", 8) != 8;
> > > 
> > > Since we've moved to unix socket and had bidirectional
> > > communication,
> > > it's probably a good idea to read an "ok" back, giving index
> > > -helper
> > > time to prepare the cache. As I recall the last discussion with
> > > Johannes, missing a cache here when the index is around 300MB
> > > could
> > > hurt more than wait patiently once and have it ready next time.
> > 
> > It is somewhat slower to wait for the daemon (which requires a disk
> > load + a memcpy) than it is to just load it ourselves (which is
> > just a
> > disk load).
> 
> You forgot the most costly part, SHA-1 verification. For very large
> index, I assume the index-helper is already in the middle of hashing
> the index content. If you ignore index-helper, you need to go hash
> the
> whole thing again. The index-helper can hand it to you if you wait
> just a bit more. This wait time should be shorter because index
> -helper
> is already in the middle of hashing (and in optimistic case, very
> close to finishing it).

You're right -- I did forget that part.

In "index-helper: use watchman to avoid refreshing index with lstat()",
we switch from just poking to poking and waiting for a reply.  Then in
"read-cache: config for waiting for index-helper", we make that waiting
optional.  So what if I just remove that patch?  Does that solve it?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 03/16] index-helper: new daemon for caching index and related stuff

2016-04-15 Thread David Turner

On Thu, 2016-04-14 at 17:04 +0700, Duy Nguyen wrote:
> On Thu, Apr 14, 2016 at 1:47 AM, David Turner <
> dtur...@twopensource.com> wrote:
> > On Wed, 2016-04-13 at 20:43 +0700, Duy Nguyen wrote:
> > > On Wed, Apr 13, 2016 at 7:32 AM, David Turner <
> > > dtur...@twopensource.com> wrote:
> > > > +NOTES
> > > > +-
> > > > +
> > > > +$GIT_DIR/index-helper.path is a symlink
> > > 
> > > In multiple worktree context, this file will be per-worktree. So
> > > we
> > > have one daemon per worktree. I think that's fine.
> > > 
> > > > to a directory in $TMPDIR
> > > > +containing a Unix domain socket called 's' that the daemon
> > > > reads
> > > > +commands from.
> > > 
> > > Oops. I stand corrected, now it's one daemon per repository...
> > > Probably good to hide the socket path in $GIT_DIR though, people
> > > may
> > > protect it with dir permission of one of ancestor directories.
> > 
> > I'm not sure I understand what you're saying here.  It should be
> > one
> > daemon per worktree, I think.  And as far as I know, it is.
> 
> No you're right, it's still per worktree. I assumed
> $GIT_DIR/index-helper.path points to the same $TMPDIR, but it's not.
> 
> > Socket paths must be short (less than 104 chars on Mac).  That's
> > why I
> > do the weird symlink-to-tmpdir thing.
> 
> Is relative path in sun_path portable? We could just chdir() there,
> open the socket and chdir() back. Though if the current solution's
> already good enough, I don't think we need to change this again.
> 
> Hmm.. googling a bit pointed me back to Jeff's patch that does
> exactly
> that. The commit is 1eb10f4 (unix-socket: handle long socket
> pathnames
> - 2012-01-09). It does not mention Mac though, neither does the
> related discussion on mailing list..

In that case, I guess we can put the socket in $GITDIR and save the
annoyance of the temp dir.  Seems legit to me.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> +diff.emptyLineHeuristic::
>>
>> I was looking at the TODO here and thought about the name:
>> It should not encode the `emptyLine` into the config option as
>> it is only one of many heuristics.
>>
>> It should be something like `diff.heuristic=lastEmptyLine`
>> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
>> breakAtShortestLineLength or other styles as well later on.
>>
>> I do not quite understand the difference between diff.algorithm
>> and the newly proposed heuristic as the heuristic is part of
>> the algorithm? So I guess we'd need to have some documentation
>> saying how these differ. (fundamental algorithm vs last minute
>> style fixup?)
>
> I actually do not think these knobs should exist when the code is
> mature enough to be shipped to the end users.
>
> Use "diff.compactionHeuristics = " as an opaque set of bits to
> help the developers while they compare notes and reach consensus on
> a single tweak that they can agree on being good enough, and then
> remove that variable before the code hits 'next'.
>
> Thanks.

I was under the impression that we would want a longer lived
configuration until we had enough data to say whether it was helpful
to make it default. I guess i had thought it would need to be longer
lived since there may be cases where it's not optimal and being able
to turn it off would be good?

I'd rather keep it semi-human readable vs a uint since it would help
keep me sane when looking at it in the interim.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 12:57 PM, Stefan Beller  wrote:
> I was looking at the TODO here and thought about the name:
> It should not encode the `emptyLine` into the config option as
> it is only one of many heuristics.
>
> It should be something like `diff.heuristic=lastEmptyLine`
> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
> breakAtShortestLineLength or other styles as well later on.
>

This sounds better, but how does this handle multiple heuristics?

> I do not quite understand the difference between diff.algorithm
> and the newly proposed heuristic as the heuristic is part of
> the algorithm? So I guess we'd need to have some documentation
> saying how these differ. (fundamental algorithm vs last minute
> style fixup?)

It is not part of the algorithm. It's applied after the algorithm.
xdl_change_compact is run after the algorithm and run for all
algorithms.

These are last minute style changes, and should probably not use the
term heuristic, but somehow capture "last minute style fixup"

Thanks,
Jake

> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Junio C Hamano

Stefan Beller  writes:

>> +diff.emptyLineHeuristic::
>
> I was looking at the TODO here and thought about the name:
> It should not encode the `emptyLine` into the config option as
> it is only one of many heuristics.
>
> It should be something like `diff.heuristic=lastEmptyLine`
> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
> breakAtShortestLineLength or other styles as well later on.
>
> I do not quite understand the difference between diff.algorithm
> and the newly proposed heuristic as the heuristic is part of
> the algorithm? So I guess we'd need to have some documentation
> saying how these differ. (fundamental algorithm vs last minute
> style fixup?)

I actually do not think these knobs should exist when the code is
mature enough to be shipped to the end users.

Use "diff.compactionHeuristics = " as an opaque set of bits to
help the developers while they compare notes and reach consensus on
a single tweak that they can agree on being good enough, and then
remove that variable before the code hits 'next'.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 03/21] t/test-lib-functions.sh: generalize test_cmp_rev

2016-04-15 Thread Junio C Hamano

Stephan Beyer  writes:

> test_cmp_rev() took exactly two parameters, the expected revision
> and the revision to test. This commit generalizes this function
> such that it takes any number of at least two revisions: the
> expected one and a list of actual ones. The function returns true
> if and only if at least one actual revision coincides with the
> expected revision.

There may be cases where you want to find the expected one among
various things you actually have (which is what the above talks
about; it is like "list-what-I-actually-got | grep what-i-want"),
but an equally useful use case would be "I would get only one
outcome from test, I anticipate one of these things, all of which is
OK, but I cannot dictate which one of them should come out" (it is
like "list-what-I-can-accept | grep what-I-actually-got").

I am not enthused by the new test that implements the "match one
against multi" check only in one way among these possible two to
squat on a very generic name, test_cmp_rev.

The above _may_ appear a non-issue until you realize one thing that
is there to help those who debug the tests, which is ...

> While at it, the side effect of generating two (temporary) files
> is removed.

That is not strictly a side effect.  test_cmp allows you to see what
was expected and what you actually had when the test failed (we
always compare expect with actual and not the other way around, so
that "diff -u expect actual" would show how the actual behaviour
diverted from our expectation in a natural way).

Something with the semantics of these two:

test_revs_have_expected () {
expect=$1
shift
git rev-parse "$@" | grep -e "$expect" >/dev/null && return
echo >&2 "The expected '$1' is not found in:"
printf >&2 " '%s'\n", "$@"
return 1
}

test_rev_among_expected () {
actual=$1
shift
git rev-parse "$@" | grep -e "$actual" >/dev/null && return
echo >&2 "'$1' is not among expected ones:"
printf >&2 " '%s'\n", "$@"
return 1
}

might be more appropriate.

>
> Signed-off-by: Stephan Beyer 
> ---
>  t/test-lib-functions.sh | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 8d99eb3..8caf59c 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -711,11 +711,17 @@ test_must_be_empty () {
>   fi
>  }
>  
> -# Tests that its two parameters refer to the same revision
> +# Tests that the first parameter refers to the same revision
> +# of at least one other parameter
>  test_cmp_rev () {
> - git rev-parse --verify "$1" >expect.rev &&
> - git rev-parse --verify "$2" >actual.rev &&
> - test_cmp expect.rev actual.rev
> + hash1="$(git rev-parse --verify "$1")" || return
> + shift
> + for rev
> + do
> + hash2="$(git rev-parse --verify "$rev")" || return
> + test "$hash1" = "$hash2" && return 0
> + done
> + return 1
>  }
>  
>  # Print a sequence of numbers or letters in increasing order.  This is
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'git mv' doesn't move submodule if it's in a subdirectory

2016-04-15 Thread Albin Otterhäll

On 2016-04-15 19:18, Stefan Beller wrote:
> On Fri, Apr 15, 2016 at 1:14 AM, Albin Otterhäll  wrote:
>> I've a submodule located in a subdirectory
>> ({git_rep}/home/{directory}/{submodule}), and I wanted to move the whole
>> directory up a level ({git_rep}/{directory}/{submodule}). But when I
>> used 'git mv {directory} ../' the '.gitmodule' file didn't get modified.
>>
>> Best regards,
>> Albin Otterhäll
> 
> Thanks for the bug report!
> Which version of Git do you use? (Did you try different versions?)
> 

I'm using 2.8.0 (on an Arch system). Haven't tested on any other version.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC 0/6] fetch with refspec

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 12:19 PM, David Turner  wrote:
> We've got a lot of refs, but pretty frequently we only want to fetch
> one.  It's silly for the server to send a bunch of refs that the client
> is just going to ignore.  Here are some patches that fix that.
>
> Let me know if this seems reasonable.

Thanks for working on this!

I had a similar goal back then for non-http traffic and that series
exploded in size[1]
The issue at my attempt was non http traffic would require a protocol
update such that
the client speaks first to transport the refspec to the server. To
make "client speaks first"
happen, we'd need to have a protocol v2. So that attempt of mine
stalled as it seemed like
a huge thing.

[1] WIP at https://github.com/stefanbeller/git/tree/protocol2-10

This series looks small and reasonable from a cursory read.

Thanks,
Stefan

>
> (and I'll start in on another round of index-helper as soon as this is sent!)
>
> David Turner (6):
>   http-backend: use argv_array functions
>   remote-curl.c: fix variable shadowing
>   http-backend: handle refspec argument
>   transport: add refspec list parameters to functions
>   fetch: pass refspec to http server
>   clone: send refspec for single-branch clones
>
>  Documentation/technical/protocol-capabilities.txt | 23 +++
>  builtin/clone.c   | 16 -
>  builtin/fetch.c   | 24 ++-
>  builtin/ls-remote.c   |  2 +-
>  builtin/remote.c  |  2 +-
>  http-backend.c| 23 +--
>  remote-curl.c | 25 ---
>  t/t5552-http-fetch-branch.sh  | 47 +
>  transport-helper.c| 44 
>  transport.c   | 14 ++--
>  transport.h   |  4 +-
>  upload-pack.c | 81 
> ++-
>  12 files changed, 261 insertions(+), 44 deletions(-)
>  create mode 100755 t/t5552-http-fetch-branch.sh
>
> --
> 2.4.2.767.g62658d5-twtrsrc
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC 3/6] http-backend: handle refspec argument

2016-04-15 Thread David Turner

Allow clients to pass a "refspec" parameter through to upload-pack;
upload-pack will only advertise refs which match that refspec.

Signed-off-by: David Turner 
---
 http-backend.c |  9 +++
 upload-pack.c  | 81 --
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index a4f0066..9731391 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -452,6 +452,7 @@ static void get_info_refs(char *arg)
if (service_name) {
struct argv_array argv = ARGV_ARRAY_INIT;
struct rpc_service *svc = select_service(service_name);
+   const char *refspec;
 
strbuf_addf(, "application/x-git-%s-advertisement",
svc->name);
@@ -465,6 +466,14 @@ static void get_info_refs(char *arg)
argv_array_push(, "--stateless-rpc");
argv_array_push(, "--advertise-refs");
 
+   refspec = get_parameter("refspec");
+   if (refspec) {
+   struct strbuf interesting_refs = STRBUF_INIT;
+   strbuf_addstr(_refs, "--interesting-refs=");
+   strbuf_addstr(_refs, refspec);
+   argv_array_push(, interesting_refs.buf);
+   strbuf_release(_refs);
+   }
argv_array_push(, ".");
run_service(argv.argv, 0);
argv_array_clear();
diff --git a/upload-pack.c b/upload-pack.c
index b3f6653..da140c2 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -52,6 +52,7 @@ static int keepalive = 5;
 static int use_sideband;
 static int advertise_refs;
 static int stateless_rpc;
+static struct string_list interesting_refspecs = STRING_LIST_INIT_DUP;
 
 static void reset_timeout(void)
 {
@@ -687,16 +688,61 @@ static void receive_needs(void)
free(shallows.objects);
 }
 
-/* return non-zero if the ref is hidden, otherwise 0 */
+struct refspec_data {
+   int has_star;
+   size_t prefixlen;
+   size_t suffixlen;
+};
+
+static int matches_refspec(const char *refspec, struct refspec_data *data,
+   const char *ref)
+{
+   size_t len;
+
+   if (!data->has_star)
+   return !strcmp(refspec, ref);
+
+   if (strncmp(refspec, ref, data->prefixlen))
+   return -1;
+
+   len = strlen(refspec);
+   if (len < data->prefixlen + data->suffixlen)
+   return -1;
+
+   return strcmp(ref + (len - data->suffixlen),
+ refspec + data->prefixlen + 1);
+}
+
+/*
+ * return non-zero if the ref is hidden or outside the provided
+ * refspecs, otherwise 0
+*/
 static int mark_our_ref(const char *refname, const char *refname_full,
const struct object_id *oid)
 {
struct object *o = lookup_unknown_object(oid->hash);
+   struct string_list_item *item;
 
if (ref_is_hidden(refname, refname_full)) {
o->flags |= HIDDEN_REF;
return 1;
}
+
+   if (interesting_refspecs.nr) {
+   int found = 0;
+   /*
+* TODO: this could be faster for large numbers of
+* refspecs by using tries or a DFA.
+*/
+   for_each_string_list_item(item, _refspecs)
+   if (matches_refspec(item->string, item->util, refname)) 
{
+   found = 1;
+   break;
+   }
+   if (!found)
+   return 1;
+
+   }
o->flags |= OUR_REF;
return 0;
 }
@@ -725,7 +771,7 @@ static int send_ref(const char *refname, const struct 
object_id *oid,
 {
static const char *capabilities = "multi_ack thin-pack side-band"
" side-band-64k ofs-delta shallow no-progress"
-   " include-tag multi_ack_detailed";
+   " include-tag multi_ack_detailed interesting-refs";
const char *refname_nons = strip_namespace(refname);
struct object_id peeled;
 
@@ -820,6 +866,24 @@ static int upload_pack_config(const char *var, const char 
*value, void *unused)
return parse_hide_refs_config(var, value, "uploadpack");
 }
 
+static struct refspec_data *make_refspec_data(const char *refspec)
+{
+   struct refspec_data *data;
+   const char *star;
+
+   data = xmalloc(sizeof(struct refspec_data));
+   star = strchr(refspec, '*');
+   if (star) {
+   data->has_star = 1;
+   data->prefixlen = star - refspec;
+   data->suffixlen = strlen(refspec) - (data->prefixlen + 1);
+   } else {
+   data->has_star = 0;
+   }
+
+   return data;
+}
+
 int main(int argc, char **argv)
 {
char *dir;
@@ -841,6 +905,19 @@ int main(int argc, char **argv)
advertise_refs = 1;

[PATCH/RFC 4/6] transport: add refspec list parameters to functions

2016-04-15 Thread David Turner

Add parameters for a list of refspecs to transport_get_remote_refs and
get_refs_list.  These parameters are presently unused -- soon, we will
use them to implement fetches which only learn about a subset of refs.

Signed-off-by: David Turner 
---
 builtin/clone.c |  2 +-
 builtin/fetch.c |  6 --
 builtin/ls-remote.c |  2 +-
 builtin/remote.c|  2 +-
 transport-helper.c  | 24 
 transport.c | 14 --
 transport.h |  4 ++--
 7 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 6616392..91f668c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1010,7 +1010,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !option_depth)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport);
+   refs = transport_get_remote_refs(transport, NULL, 0);
 
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index e4639d8..cafab37 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -221,7 +221,8 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, _refs);
-   for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
+   for (ref = transport_get_remote_refs(transport, NULL, 0);
+ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -301,8 +302,9 @@ static struct ref *get_ref_map(struct transport *transport,
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
+   const struct ref *remote_refs;
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport);
+   remote_refs = transport_get_remote_refs(transport, NULL, 0);
 
if (refspec_count) {
struct refspec *fetch_refspec;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 66cdd45..bce706e 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -94,7 +94,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport);
+   ref = transport_get_remote_refs(transport, NULL, 0);
if (transport_disconnect(transport))
return 1;
 
diff --git a/builtin/remote.c b/builtin/remote.c
index fda5c2e..5745e8b 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -862,7 +862,7 @@ static int get_remote_ref_states(const char *name,
if (query) {
transport = transport_get(states->remote, 
states->remote->url_nr > 0 ?
states->remote->url[0] : NULL);
-   remote_refs = transport_get_remote_refs(transport);
+   remote_refs = transport_get_remote_refs(transport, NULL, 0);
transport_disconnect(transport);
 
states->queried = 1;
diff --git a/transport-helper.c b/transport-helper.c
index b934183..b5c91d2 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -99,7 +99,7 @@ static void do_take_over(struct transport *transport)
 
 static void standard_options(struct transport *t);
 
-static struct child_process *get_helper(struct transport *transport)
+static struct child_process *get_helper(struct transport *transport, const 
struct refspec *req_refspecs, int req_refspec_nr)
 {
struct helper_data *data = transport->data;
struct strbuf buf = STRBUF_INIT;
@@ -267,7 +267,7 @@ static int set_helper_option(struct transport *transport,
struct strbuf buf = STRBUF_INIT;
int i, ret, is_bool = 0;
 
-   get_helper(transport);
+   get_helper(transport, NULL, 0);
 
if (!data->option)
return 1;
@@ -395,7 +395,7 @@ static int fetch_with_fetch(struct transport *transport,
 
 static int get_importer(struct transport *transport, struct child_process 
*fastimport)
 {
-   struct child_process *helper = get_helper(transport);
+   struct child_process *helper = get_helper(transport, NULL, 0);
struct helper_data *data = transport->data;
int cat_blob_fd, code;
child_process_init(fastimport);
@@ -418,7 +418,7 @@ static int get_exporter(struct transport *transport,
struct string_list *revlist_args)
 {
struct helper_data *data = transport->data;
-   struct child_process *helper = get_helper(transport);
+   struct child_process *helper = get_helper(transport, NULL, 0);
int i;
 
child_process_init(fastexport);
@@ -451,7 +451,7 @@ static int fetch_with_import(struct transport *transport,
struct ref *posn;

[PATCH/RFC 1/6] http-backend: use argv_array functions

2016-04-15 Thread David Turner

Signed-off-by: David Turner 
---
 http-backend.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 8870a26..a4f0066 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -450,9 +450,7 @@ static void get_info_refs(char *arg)
hdr_nocache();
 
if (service_name) {
-   const char *argv[] = {NULL /* service name */,
-   "--stateless-rpc", "--advertise-refs",
-   ".", NULL};
+   struct argv_array argv = ARGV_ARRAY_INIT;
struct rpc_service *svc = select_service(service_name);
 
strbuf_addf(, "application/x-git-%s-advertisement",
@@ -463,9 +461,13 @@ static void get_info_refs(char *arg)
packet_write(1, "# service=git-%s\n", svc->name);
packet_flush(1);
 
-   argv[0] = svc->name;
-   run_service(argv, 0);
+   argv_array_push(, svc->name);
+   argv_array_push(, "--stateless-rpc");
+   argv_array_push(, "--advertise-refs");
 
+   argv_array_push(, ".");
+   run_service(argv.argv, 0);
+   argv_array_clear();
} else {
select_getanyfile();
for_each_namespaced_ref(show_text_ref, );
-- 
2.4.2.767.g62658d5-twtrsrc

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC 6/6] clone: send refspec for single-branch clones

2016-04-15 Thread David Turner

For single-branch clones (when we know in advance what the remote
branch name will be), send a refspec so that the server doesn't
tell us about any other refs.

Signed-off-by: David Turner 
---
 builtin/clone.c  | 16 +++-
 t/t5552-http-fetch-branch.sh |  5 +
 transport-helper.c   | 12 +++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 91f668c..9a0291d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1010,7 +1010,21 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !option_depth)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport, NULL, 0);
+   if (option_single_branch && option_branch) {
+   struct refspec branch_refspec = {0};
+
+   if (starts_with(option_branch, "refs/")) {
+   branch_refspec.src = xstrdup(option_branch);
+   } else {
+   struct strbuf buf = STRBUF_INIT;
+   strbuf_addf(, "refs/heads/%s", option_branch);
+   branch_refspec.src = strbuf_detach(, NULL);
+   }
+   refs = transport_get_remote_refs(transport, _refspec, 1);
+   free(branch_refspec.src);
+   } else {
+   refs = transport_get_remote_refs(transport, NULL, 0);
+   }
 
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/t/t5552-http-fetch-branch.sh b/t/t5552-http-fetch-branch.sh
index 0e905d9..8a8e218 100755
--- a/t/t5552-http-fetch-branch.sh
+++ b/t/t5552-http-fetch-branch.sh
@@ -38,5 +38,10 @@ test_expect_success 'fetch with refspec only fetches 
requested branch' '
)
 '
 
+test_expect_success 'single-branch clone only fetches requested branch' '
+   GIT_TRACE_PACKET="$TRASH_DIRECTORY/trace" git clone --single-branch -b 
master $HTTPD_URL/smart/repo.git sbc &&
+   ! grep "refs/heads/another_branch" trace
+'
+
 stop_httpd
 test_done
diff --git a/transport-helper.c b/transport-helper.c
index 7d75d64..3775d63 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -28,6 +28,7 @@ struct helper_data {
signed_tags : 1,
check_connectivity : 1,
no_disconnect_req : 1,
+   refspec : 1,
no_private_update : 1;
char *export_marks;
char *import_marks;
@@ -114,8 +115,15 @@ static struct child_process *get_helper(struct transport 
*transport, const struc
int code;
int i;
 
-   if (data->helper)
+   if (data->helper) {
+   if (!data->refspec && req_refspec_nr) {
+   for (i = 0; i < req_refspec_nr; i++)
+   set_helper_option(transport, "refspec",
+ req_refspecs[i].src);
+   data->refspec = 1;
+   }
return data->helper;
+   }
 
helper = xmalloc(sizeof(*helper));
child_process_init(helper);
@@ -220,6 +228,8 @@ static struct child_process *get_helper(struct transport 
*transport, const struc
 
for (i = 0; i < req_refspec_nr; i++)
set_helper_option(transport, "refspec", req_refspecs[i].src);
+   if (req_refspec_nr)
+   data->refspec = 1;
standard_options(transport);
return data->helper;
 }
-- 
2.4.2.767.g62658d5-twtrsrc

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC 0/6] fetch with refspec

2016-04-15 Thread David Turner

We've got a lot of refs, but pretty frequently we only want to fetch
one.  It's silly for the server to send a bunch of refs that the client
is just going to ignore.  Here are some patches that fix that.

Let me know if this seems reasonable.

(and I'll start in on another round of index-helper as soon as this is sent!)

David Turner (6):
  http-backend: use argv_array functions
  remote-curl.c: fix variable shadowing
  http-backend: handle refspec argument
  transport: add refspec list parameters to functions
  fetch: pass refspec to http server
  clone: send refspec for single-branch clones

 Documentation/technical/protocol-capabilities.txt | 23 +++
 builtin/clone.c   | 16 -
 builtin/fetch.c   | 24 ++-
 builtin/ls-remote.c   |  2 +-
 builtin/remote.c  |  2 +-
 http-backend.c| 23 +--
 remote-curl.c | 25 ---
 t/t5552-http-fetch-branch.sh  | 47 +
 transport-helper.c| 44 
 transport.c   | 14 ++--
 transport.h   |  4 +-
 upload-pack.c | 81 ++-
 12 files changed, 261 insertions(+), 44 deletions(-)
 create mode 100755 t/t5552-http-fetch-branch.sh

-- 
2.4.2.767.g62658d5-twtrsrc

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC 2/6] remote-curl.c: fix variable shadowing

2016-04-15 Thread David Turner

The local variable 'options' was shadowing a global of the same name.

Signed-off-by: David Turner 
---
 remote-curl.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 15e48e2..b9b6a90 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -254,7 +254,7 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
struct strbuf effective_url = STRBUF_INIT;
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
-   struct http_get_options options;
+   struct http_get_options get_options;
 
if (last && !strcmp(service, last->service))
return last;
@@ -271,15 +271,15 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
strbuf_addf(_url, "service=%s", service);
}
 
-   memset(, 0, sizeof(options));
-   options.content_type = 
-   options.charset = 
-   options.effective_url = _url;
-   options.base_url = 
-   options.no_cache = 1;
-   options.keep_error = 1;
+   memset(_options, 0, sizeof(get_options));
+   get_options.content_type = 
+   get_options.charset = 
+   get_options.effective_url = _url;
+   get_options.base_url = 
+   get_options.no_cache = 1;
+   get_options.keep_error = 1;
 
-   http_ret = http_get_strbuf(refs_url.buf, , );
+   http_ret = http_get_strbuf(refs_url.buf, , _options);
switch (http_ret) {
case HTTP_OK:
break;
-- 
2.4.2.767.g62658d5-twtrsrc

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH/RFC 5/6] fetch: pass refspec to http server

2016-04-15 Thread David Turner

When fetching over http, send the requested refspec to the server.
The server will then only send refs matching that refspec.  It is
permitted for old servers to ignore that parameter, and the client
will automatically handle this.

When the server has many refs, and the client only wants a few, this
can save bandwidth and reduce fetch latency.

Signed-off-by: David Turner 
---
 Documentation/technical/protocol-capabilities.txt | 23 +
 builtin/fetch.c   | 20 ++-
 remote-curl.c |  7 
 t/t5552-http-fetch-branch.sh  | 42 +++
 transport-helper.c|  8 -
 5 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100755 t/t5552-http-fetch-branch.sh

diff --git a/Documentation/technical/protocol-capabilities.txt 
b/Documentation/technical/protocol-capabilities.txt
index eaab6b4..8c4a0b9 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -275,3 +275,26 @@ to accept a signed push certificate, and asks the  
to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+interesting-refs
+
+
+For HTTP(S) servers only:
+
+Whether or not the upload-pack server advertises this capability,
+fetch-pack may send a "refspec" parameter in the query string of a
+fetch request.  This capability indicates that such a parameter will
+be respected, in case the client cares.
+
+Whenever the receive-pack server gets that parameter, it will not
+advertise all refs and will instead only advertise refs that match
+those listed in the header. The parameter is a space-separated list of
+refs.  A ref may optionally contain up to one wildcard.
+Advertisements will still respect hideRefs.
+
+The presence or absence of the "refspec" parameter does not affect
+what refs a client is permitted to fetch; this is still controlled in
+the normal fashion.
+
+This saves time in the presence of a large number of refs, because the
+client need not wait for the server to send the complete list of refs.
diff --git a/builtin/fetch.c b/builtin/fetch.c
index cafab37..c22a92f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -302,9 +302,27 @@ static struct ref *get_ref_map(struct transport *transport,
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
+   struct refspec *qualified_refspecs;
const struct ref *remote_refs;
 
-   remote_refs = transport_get_remote_refs(transport, NULL, 0);
+   qualified_refspecs = xcalloc(refspec_count, 
sizeof(*qualified_refspecs));
+   for (i = 0; i < refspec_count; i++) {
+   if (starts_with(refspecs[i].src, "refs/")) {
+   qualified_refspecs[i].src = xstrdup(refspecs[i].src);
+   } else {
+   struct strbuf buf = STRBUF_INIT;
+   strbuf_addf(, "refs/heads/%s", refspecs[i].src);
+   qualified_refspecs[i].src = strbuf_detach(, NULL);
+   }
+   }
+
+   remote_refs = transport_get_remote_refs(transport, qualified_refspecs,
+   refspec_count);
+
+   for (i = 0; i < refspec_count; i++) {
+   free(qualified_refspecs[i].src);
+   }
+   free(qualified_refspecs);
 
if (refspec_count) {
struct refspec *fetch_refspec;
diff --git a/remote-curl.c b/remote-curl.c
index b9b6a90..e914d3f 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -20,6 +20,7 @@ static struct strbuf url = STRBUF_INIT;
 struct options {
int verbosity;
unsigned long depth;
+   char *refspec;
unsigned progress : 1,
check_self_contained_and_connected : 1,
cloning : 1,
@@ -43,6 +44,10 @@ static int set_option(const char *name, const char *value)
options.verbosity = v;
return 0;
}
+   else if (!strcmp(name, "refspec")) {
+   options.refspec = xstrdup(value);
+   return 0;
+   }
else if (!strcmp(name, "progress")) {
if (!strcmp(value, "true"))
options.progress = 1;
@@ -269,6 +274,8 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
else
strbuf_addch(_url, '&');
strbuf_addf(_url, "service=%s", service);
+   if (options.refspec)
+   strbuf_addf(_url, "=%s", options.refspec);
}
 
memset(_options, 0, sizeof(get_options));
diff --git a/t/t5552-http-fetch-branch.sh b/t/t5552-http-fetch-branch.sh
new file mode 100755
index 000..0e905d9
--- /dev/null
+++ b/t/t5552-http-fetch-branch.sh
@@ -0,0 +1,42 @@

[PATCH] mv: allow moving nested submodules

2016-04-15 Thread Stefan Beller

When directories are moved using `git mv` all files in the directory
have been just moved, but no further action was taken on them. This
was done by assigning the mode = WORKING_DIRECTORY to the files
inside a moved directory.

submodules however need to update their link to the git directory as
well as updates to the .gitmodules file. By removing the condition of
`mode != INDEX` (the remaining modes are BOTH and WORKING_DIRECTORY) for
the required submodule actions, we perform these for submodules in a
moved directory.

Signed-off-by: Stefan Beller 
---

Albin,
I think this would fix your problem.

Developed on origin/master (it may apply on older versions, though I did not 
test)

Thanks,
Stefan

 builtin/mv.c  |  7 +--
 t/t7001-mv.sh | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/builtin/mv.c b/builtin/mv.c
index aeae855..65fff11 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -252,9 +252,12 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
int pos;
if (show_only || verbose)
printf(_("Renaming %s to %s\n"), src, dst);
-   if (!show_only && mode != INDEX) {
-   if (rename(src, dst) < 0 && !ignore_errors)
+   if (!show_only) {
+   if (mode != INDEX &&
+   rename(src, dst) < 0 &&
+   !ignore_errors)
die_errno(_("renaming '%s' failed"), src);
+
if (submodule_gitfile[i]) {
if (submodule_gitfile[i] != 
SUBMODULE_WITH_GITDIR)
connect_work_tree_and_git_dir(dst, 
submodule_gitfile[i]);
diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index 4008fae..4a2570e 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -292,6 +292,9 @@ test_expect_success 'setup submodule' '
echo content >file &&
git add file &&
git commit -m "added sub and file" &&
+   mkdir -p deep/directory/hierachy &&
+   git submodule add ./. deep/directory/hierachy/sub &&
+   git commit -m "added another submodule" &&
git branch submodule
 '
 
@@ -475,4 +478,17 @@ test_expect_success 'mv -k does not accidentally destroy 
submodules' '
git checkout .
 '
 
+test_expect_success 'moving a submodule in nested directories' '
+   (
+   cd deep &&
+   git mv directory ../ &&
+   # git status would fail if the update of linking git dir to
+   # work dir of the submodule failed.
+   git status &&
+   git config -f ../.gitmodules 
submodule.deep/directory/hierachy/sub.path >../actual &&
+   echo "directory/hierachy/sub" >../expect
+   ) &&
+   test_cmp actual expect
+'
+
 test_done
-- 
2.8.1.211.g630c034

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 10:25 AM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> From: Jacob Keller 
>>
>> It is a common pattern in xdl_change_compact to check that hashes and
>> strings match. The resulting code to perform this change causes very
>> long lines and makes it hard to follow the intention. Introduce a helper
>> function xdl_hash_and_recmatch which performs both checks to increase
>> code readability.
>
> Think _why_ it is common to check hash and then do recmatch().  What
> is the combination of two trying to compute?
>
> How about calling it after "what" it computes, not after "how" it
> computes it?  E.g.
>
> static int recs_match(xrecord_t **recs, long x, long y, long flags)
>
> if we answer the above question "they try to see if two records match".
> We could also go s/recs/lines/.
>
> The xdl_recmatch() function appears in xutils.c, and over there the
> functions do not use arrays of (xrecord_t *), so I think we are
> better off without xdl_ prefix to avoid confusion.
>

That makes sense. I like the sound of recs_match, and it's shorter too
which is nice.

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 10:33 AM, Stefan Beller  wrote:
> On Fri, Apr 15, 2016 at 10:27 AM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> Actually we would only need to have the empty line count in the
>>> second loop as the first loop shifted it as much up(backwards) as
>>> possible, such that the hunk sits on one end of the movable
>>> range. The second loop would iterate over the whole range, so that
>>> would be the only place needed to count.
>>
>> The description makes me wonder if we can do without two loops ;-)
>
> I think the existing 2 loops are needed to move "as much as possible"
> to either boundary to see if there is overlap to another group and combine
> the groups if so. (This whole construction prior to these patches remind
> me of shaker sort)
>
>>
>> That is, can we teach the first loop not to be so aggressive to
>> shift "as much as possible" but notice there is an empty line we
>> would want to treat specially?
>
> I think we need to be aggressive to find adjacent groups and only after
> that is done we should think about optimizing look? That is why I
> even proposed to not touch the current 2 loops at all and have our own
> loop to find out if there is at least one empty line.

Agreed, we need the two loops in order to aggressively merge all the
changes together first.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 10:10 AM, Stefan Beller  wrote:
> On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller  wrote:
>>> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
>>> long flags) {
>>>  * the group.
>>>  */
>>> while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 
>>> 1, ix - 1, flags)) {
>>> +   emptylines += is_emptyline(recs[ix - 
>>> 1]->ptr);
>>> +
>>> rchg[--ixs] = 1;
>>> rchg[--ix] = 0;
>>>
>>> -   has_emptyline |=
>>> -   
>>> starts_with_emptyline(recs[ix]->ptr);
>>
>> How is this fixing the segfault bug?
>> From my understanding ix should have the same value here as ix-1 above.
>>
>>> /*
>>>  * This change might have joined two change 
>>> groups,
>>>  * so we try to take this scenario in 
>>> account by moving
>>> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
>>> long flags) {
>>> rchg[ixs++] = 0;
>>> rchg[ix++] = 1;
>>>
>>> -   has_emptyline |=
>>> -   
>>> starts_with_emptyline(recs[ix]->ptr);
>>> -
>>
>> Or would this fix the segfault bug?
>> I think we would need to have the check for empty lines
>> in both loops to be sure to cover the whole movable range.
>
> Actually we would only need to have the empty line count in the second loop as
> the first loop shifted it as much up(backwards) as possible, such that
> the hunk sits on one
> end of the movable range. The second loop would iterate over the whole
> range, so that
> would be the only place needed to count.

I agree that we can drop the first part and I am testing it now.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Jacob Keller

On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller  wrote:
> On Fri, Apr 15, 2016 at 9:51 AM, Jacob Keller  
> wrote:
>> From: Jacob Keller 
>>
>> I took up Stefan's patch, and modified it to resolve a couple issues. I
>> also tried to implement the suggestions from Junio's review, as well as
>> the suggestions I had. It appears to produce equivalent output as Jeff's
>> script. This version is still missing a few things, and it is possible
>> Stefan is working on a v2 of his own, but I thought I'd submit this.
>>
>> There's still several TODOs:
>> * Add some tests
>> * better name for the heuristic(?)
>> * better patch description plus documentation
>> * is_emptyline should be more generic and handle CRLF
>>
>> Changes since Stefan's v1:
>> * Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
>> * Fixed a segfault in Stefan's patch
>> * Added XDL flag to configure the behavior
>> * Used an int and counted empty lines via += instead of |=
>> * Renamed starts_with_emptyline to is_emptyline
>> * Added diff command line and config options
>>
>> For reviewer convenience, the interdiff between this and Stefan's version:
>>
>> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
>> index edba56522bce..cebf82702d2a 100644
>> --- a/Documentation/diff-config.txt
>> +++ b/Documentation/diff-config.txt
>> @@ -170,6 +170,12 @@ diff.tool::
>>
>>  include::mergetools-diff.txt[]
>>
>> +diff.emptyLineHeuristic::
>> +   Set this option to true to enable the empty line chunk heuristic when
>> +   producing diff output. This heuristic will attempt to shift hunks 
>> such
>> +   that a common empty line occurs below the hunk with the rest of the
>> +   context above it.
>> +
>
> This heuristic will attempt to shift hunks such that *the last* common
> empty line occurs below the hunk with the rest of the context above it.
>
> maybe?
>

That sounds reasonable.

>
>>  diff.algorithm::
>> Choose a diff algorithm.  The variants are as follows:
>>  +
>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index 4b0318e2ac15..6c1cd8b35584 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
>> Synonym for `-p --raw`.
>>  endif::git-format-patch[]
>>
>> +--empty-line-heuristic::
>> +--no-empty-line-heuristic::
>> +   When possible, shift common empty lines in diff hunks below the hunk
>> +   such that the last common empty line for each hunk is below, with the
>> +   rest of the context above the hunk.
>> +
>>  --minimal::
>> Spend extra time to make sure the smallest possible
>> diff is produced.
>> diff --git a/diff.c b/diff.c
>> index 4dfe6609d059..8ab9a492928d 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -26,6 +26,7 @@
>>  #endif
>>
>>  static int diff_detect_rename_default;
>> +static int diff_empty_line_heuristic = 0;
>>  static int diff_rename_limit_default = 400;
>>  static int diff_suppress_blank_empty;
>>  static int diff_use_color_default = -1;
>> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char 
>> *value, void *cb)
>> diff_detect_rename_default = git_config_rename(var, value);
>> return 0;
>> }
>> +   if (!strcmp(var, "diff.emptylineheuristic")) {
>> +   diff_empty_line_heuristic = git_config_bool(var, value);
>> +   return 0;
>> +   }
>> if (!strcmp(var, "diff.autorefreshindex")) {
>> diff_auto_refresh_index = git_config_bool(var, value);
>> return 0;
>> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
>> options->use_color = diff_use_color_default;
>> options->detect_rename = diff_detect_rename_default;
>> options->xdl_opts |= diff_algorithm;
>> +   if (diff_empty_line_heuristic)
>> +   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>>
>> options->orderfile = diff_order_file_cfg;
>>
>> @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
>> DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
>> else if (!strcmp(arg, "--ignore-blank-lines"))
>> DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
>> +   else if (!strcmp(arg, "--empty-line-heuristic"))
>> +   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>> +   else if (!strcmp(arg, "--no-empty-line-heuristic"))
>> +   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
>> else if (!strcmp(arg, "--patience"))
>> options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
>> else if (!strcmp(arg, "--histogram"))
>> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
>> index 4fb7e79410c2..9195a5c0e958 100644
>> --- a/xdiff/xdiff.h
>> +++ b/xdiff/xdiff.h
>> @@ -41,6 +41,8 @@ extern "C" {
>>
>>  #define

Re: [PATCH v4 11/16] ref-filter: introduce refname_atom_parser()

2016-04-15 Thread Karthik Nayak

On Fri, Apr 15, 2016 at 2:13 AM, Jeff King  wrote:
> On Sun, Apr 10, 2016 at 12:15:10AM +0530, Karthik Nayak wrote:
>
>> +static void refname_atom_parser(struct used_atom *atom, const char *arg)
>> +{
>> + if (!arg)
>> + atom->u.refname.option = R_NORMAL;
>> + else if (!strcmp(arg, "short"))
>> + atom->u.refname.option = R_SHORT;
>> + else if (skip_prefix(arg, "strip=", )) {
>> + atom->u.contents.option = R_STRIP;
>> + if (strtoul_ui(arg, 10, >u.refname.strip) ||
>> + atom->u.refname.strip <= 0)
>> + die(_("positive value expected refname:strip=%s"), 
>> arg);
>
> That R_STRIP line should be setting atom->u.refname.option, right?
>
> The same mistake happens in a later patch, too, when parsing for R_BASE
> and R_DIR is added. And I think the same problem is also present in
> objectname_atom_parser.
>
> The compiler doesn't notice because, hey, it's C, and why bother
> complaining when somebody assigns the value from one enum to another?
> And the tests don't notice because the enum is at the front of each
> union member, so the compiler happens to put it in the same place (I
> think it _might_ even be guaranteed by the standard's type-punning
> rules, but that's really not something we should rely on).
>
> -Peff

True, I'm surprised that went unnoticed, will fix it, thanks :)

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'git mv' doesn't move submodule if it's in a subdirectory

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 11:21 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> I think I can reproduce the problem. A regression test (which currently 
>> fails)
>> could look like
>
> Thanks.  I however do not think this is a regression.
>
> Changes around 0656781f (mv: update the path entry in .gitmodules
> for moved submodules, 2013-08-06) did introduce "git mv dir1 dir2"
> when 'dir1' is a submodule, but I do not think it went beyond that.
> I do not see any effort to treat a submodule that is discovered by
> scanning a directory that was given from the command line,
> i.e. prepare_move_submodule() is not called for them, and the
> entries in the submodule_gitfile[] array that correspond to them are
> explicitly set to NULL in the loop.

Also I just realize this is not exactly the same bug as reported.
Albin complains about the .gitmodules file not being adjusted, whereas
the test case I wrote breaks commands in your superproject, i.e. `git status`
or `git diff` just dies.

(Manually inspecting the .gitmodules file turns out it is not adjusted as well.)

>
>
>> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
>> index 4008fae..3b96a9a 100755
>> --- a/t/t7001-mv.sh
>> +++ b/t/t7001-mv.sh
>> @@ -292,6 +292,9 @@ test_expect_success 'setup submodule' '
>> echo content >file &&
>> git add file &&
>> git commit -m "added sub and file" &&
>> +   mkdir -p deep/directory/hierachy &&
>> +   git submodule add ./. deep/directory/hierachy/sub &&
>> +   git commit -m "added another submodule" &&
>> git branch submodule
>>  '
>>
>> @@ -475,4 +478,14 @@ test_expect_success 'mv -k does not accidentally
>> destroy submodules' '
>> git checkout .
>>  '
>>
>> +test_expect_failure 'moving a submodule in nested directories' '
>> +   (
>> +   cd deep &&
>> +   git mv directory ../ &&
>> +   git status
>> +   # currently git status exits with 128
>> +   # fatal: Not a git repository:
>> directory/hierachy/sub/../../../../.git/modules/deep/directory/hierachy/sub
>> +   )
>> +'
>> +
>>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'git mv' doesn't move submodule if it's in a subdirectory

2016-04-15 Thread Junio C Hamano

Stefan Beller  writes:

> I think I can reproduce the problem. A regression test (which currently fails)
> could look like

Thanks.  I however do not think this is a regression.

Changes around 0656781f (mv: update the path entry in .gitmodules
for moved submodules, 2013-08-06) did introduce "git mv dir1 dir2"
when 'dir1' is a submodule, but I do not think it went beyond that.
I do not see any effort to treat a submodule that is discovered by
scanning a directory that was given from the command line,
i.e. prepare_move_submodule() is not called for them, and the
entries in the submodule_gitfile[] array that correspond to them are
explicitly set to NULL in the loop.

> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> index 4008fae..3b96a9a 100755
> --- a/t/t7001-mv.sh
> +++ b/t/t7001-mv.sh
> @@ -292,6 +292,9 @@ test_expect_success 'setup submodule' '
> echo content >file &&
> git add file &&
> git commit -m "added sub and file" &&
> +   mkdir -p deep/directory/hierachy &&
> +   git submodule add ./. deep/directory/hierachy/sub &&
> +   git commit -m "added another submodule" &&
> git branch submodule
>  '
>
> @@ -475,4 +478,14 @@ test_expect_success 'mv -k does not accidentally
> destroy submodules' '
> git checkout .
>  '
>
> +test_expect_failure 'moving a submodule in nested directories' '
> +   (
> +   cd deep &&
> +   git mv directory ../ &&
> +   git status
> +   # currently git status exits with 128
> +   # fatal: Not a git repository:
> directory/hierachy/sub/../../../../.git/modules/deep/directory/hierachy/sub
> +   )
> +'
> +
>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'git mv' doesn't move submodule if it's in a subdirectory

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 10:18 AM, Stefan Beller  wrote:
> On Fri, Apr 15, 2016 at 1:14 AM, Albin Otterhäll  wrote:
>> I've a submodule located in a subdirectory
>> ({git_rep}/home/{directory}/{submodule}), and I wanted to move the whole
>> directory up a level ({git_rep}/{directory}/{submodule}). But when I
>> used 'git mv {directory} ../' the '.gitmodule' file didn't get modified.
>>
>> Best regards,
>> Albin Otterhäll
>
> Thanks for the bug report!
> Which version of Git do you use? (Did you try different versions?)

I think I can reproduce the problem. A regression test (which currently fails)
could look like
diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index 4008fae..3b96a9a 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -292,6 +292,9 @@ test_expect_success 'setup submodule' '
echo content >file &&
git add file &&
git commit -m "added sub and file" &&
+   mkdir -p deep/directory/hierachy &&
+   git submodule add ./. deep/directory/hierachy/sub &&
+   git commit -m "added another submodule" &&
git branch submodule
 '

@@ -475,4 +478,14 @@ test_expect_success 'mv -k does not accidentally
destroy submodules' '
git checkout .
 '

+test_expect_failure 'moving a submodule in nested directories' '
+   (
+   cd deep &&
+   git mv directory ../ &&
+   git status
+   # currently git status exits with 128
+   # fatal: Not a git repository:
directory/hierachy/sub/../../../../.git/modules/deep/directory/hierachy/sub
+   )
+'
+
 test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Junio C Hamano

Stefan Beller  writes:

> I think we need to be aggressive to find adjacent groups and only after
> that is done we should think about optimizing look?

OK.  I was just gauging to see if those involved in the codepath
thought things through, and apparently you did, so I'm happy ;-)

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Parallel checkout (Was Re: 0 bot for Git)

2016-04-15 Thread Jeff King

On Fri, Apr 15, 2016 at 10:31:39AM -0700, Junio C Hamano wrote:

> Last time I checked, I think the accesses to attributes from the
> convert.c thing was one of the things that are cumbersome to make
> safe in multi-threaded world.

Multi-threaded grep has the same problem. I think we started with a big
lock on the attribute access. That works OK, as long as you hold the
lock only for the lookup and not the actual filtering. We later moved to
pre-loading the attributes in 9dd5245c104, because looking up attributes
in order is much more efficient (because locality of paths lets us reuse
work from the previous request).

So I'm guessing the major work here will be to split the "look up smudge
attributes" step from "do the smudge".

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 10:27 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Actually we would only need to have the empty line count in the
>> second loop as the first loop shifted it as much up(backwards) as
>> possible, such that the hunk sits on one end of the movable
>> range. The second loop would iterate over the whole range, so that
>> would be the only place needed to count.
>
> The description makes me wonder if we can do without two loops ;-)

I think the existing 2 loops are needed to move "as much as possible"
to either boundary to see if there is overlap to another group and combine
the groups if so. (This whole construction prior to these patches remind
me of shaker sort)

>
> That is, can we teach the first loop not to be so aggressive to
> shift "as much as possible" but notice there is an empty line we
> would want to treat specially?

I think we need to be aggressive to find adjacent groups and only after
that is done we should think about optimizing look? That is why I
even proposed to not touch the current 2 loops at all and have our own
loop to find out if there is at least one empty line.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Parallel checkout (Was Re: 0 bot for Git)

2016-04-15 Thread Junio C Hamano

Jeff King  writes:

> On Fri, Apr 15, 2016 at 01:18:46PM +0200, Christian Couder wrote:
>
>> On Fri, Apr 15, 2016 at 11:51 AM, Duy Nguyen  wrote:
>> > On Fri, Apr 15, 2016 at 12:04:49AM +0200, Christian Couder wrote:
>> >>
>> >> There is a draft of an article about the first part of the Contributor
>> >> Summit in the draft of the next Git Rev News edition:
>> >>
>> >> https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-14.md
>> >
>> > Thanks. I read the sentence "This made people mention potential
>> > problems with parallelizing git checkout" and wondered what these
>> > problems were.
>> 
>> It may have been Michael or Peff (CC'ed) saying that it could break
>> some builds as the timestamps on the files might not always be ordered
>> in the same way.
>
> I don't think it was me. I'm also not sure how it would break a build.

Yup, "will break a build" is a crazy-talk that I'd be surprised if
you said something silly like that ;-)

Last time I checked, I think the accesses to attributes from the
convert.c thing was one of the things that are cumbersome to make
safe in multi-threaded world.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Junio C Hamano

Stefan Beller  writes:

> Actually we would only need to have the empty line count in the
> second loop as the first loop shifted it as much up(backwards) as
> possible, such that the hunk sits on one end of the movable
> range. The second loop would iterate over the whole range, so that
> would be the only place needed to count.

The description makes me wonder if we can do without two loops ;-)

That is, can we teach the first loop not to be so aggressive to
shift "as much as possible" but notice there is an empty line we
would want to treat specially?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function

2016-04-15 Thread Junio C Hamano

Jacob Keller  writes:

> From: Jacob Keller 
>
> It is a common pattern in xdl_change_compact to check that hashes and
> strings match. The resulting code to perform this change causes very
> long lines and makes it hard to follow the intention. Introduce a helper
> function xdl_hash_and_recmatch which performs both checks to increase
> code readability.

Think _why_ it is common to check hash and then do recmatch().  What
is the combination of two trying to compute?

How about calling it after "what" it computes, not after "how" it
computes it?  E.g.

static int recs_match(xrecord_t **recs, long x, long y, long flags)

if we answer the above question "they try to see if two records match".
We could also go s/recs/lines/.

The xdl_recmatch() function appears in xutils.c, and over there the
functions do not use arrays of (xrecord_t *), so I think we are
better off without xdl_ prefix to avoid confusion.

> Signed-off-by: Jacob Keller 
> ---
>  xdiff/xdiffi.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 2358a2d6326e..fff97ac3e3c7 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
> i1, long i2, long chg1,
>  }
>  
>  
> +static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long 
> flags)
> +{
> + return (recs[ixs]->ha == recs[ix]->ha &&
> + xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
> +  recs[ix]->ptr, recs[ix]->size,
> +  flags));
> +}
> +
>  int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>   long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
>   char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
> @@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
> long flags) {
>* the last line of the current change group, shift 
> backward
>* the group.
>*/
> - while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha 
> &&
> -xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 
> 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
> + while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, 
> ix - 1, flags)) {
>   rchg[--ixs] = 1;
>   rchg[--ix] = 0;
>  
> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
> long flags) {
>* the line next of the current change group, shift 
> forward
>* the group.
>*/
> - while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
> -xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
> recs[ix]->ptr, recs[ix]->size, flags)) {
> + while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, 
> ix, flags)) {
> + emptylines += is_emptyline(recs[ix]->ptr);
> +
>   rchg[ixs++] = 0;
>   rchg[ix++] = 1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller  wrote:
>> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
>> long flags) {
>>  * the group.
>>  */
>> while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 
>> 1, ix - 1, flags)) {
>> +   emptylines += is_emptyline(recs[ix - 
>> 1]->ptr);
>> +
>> rchg[--ixs] = 1;
>> rchg[--ix] = 0;
>>
>> -   has_emptyline |=
>> -   starts_with_emptyline(recs[ix]->ptr);
>
> How is this fixing the segfault bug?
> From my understanding ix should have the same value here as ix-1 above.
>
>> /*
>>  * This change might have joined two change 
>> groups,
>>  * so we try to take this scenario in 
>> account by moving
>> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
>> long flags) {
>> rchg[ixs++] = 0;
>> rchg[ix++] = 1;
>>
>> -   has_emptyline |=
>> -   starts_with_emptyline(recs[ix]->ptr);
>> -
>
> Or would this fix the segfault bug?
> I think we would need to have the check for empty lines
> in both loops to be sure to cover the whole movable range.

Actually we would only need to have the empty line count in the second loop as
the first loop shifted it as much up(backwards) as possible, such that
the hunk sits on one
end of the movable range. The second loop would iterate over the whole
range, so that
would be the only place needed to count.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 'git mv' doesn't move submodule if it's in a subdirectory

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 1:14 AM, Albin Otterhäll  wrote:
> I've a submodule located in a subdirectory
> ({git_rep}/home/{directory}/{submodule}), and I wanted to move the whole
> directory up a level ({git_rep}/{directory}/{submodule}). But when I
> used 'git mv {directory} ../' the '.gitmodule' file didn't get modified.
>
> Best regards,
> Albin Otterhäll

Thanks for the bug report!
Which version of Git do you use? (Did you try different versions?)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git mascot

2016-04-15 Thread Jean-Noël AVILA

Le vendredi 15 avril 2016, 11:41:52 Christian Howe a écrit :
> There has been talk of a git mascot a while back in 2005. Some people
> mentioned a fish or a turtle. Since all the great open source projects
> like Linux or RethinkDB have a cute mascot, git definitely needs one
> as well. A mascot gives people a recognizable persona towards which
> they can direct their unbounded love for the project. It'd even be
> good if a plush doll of this mascot could eventually be created for
> people to physically express their love of git through cuddling and
> hugging.

No graphical skills at all, but...

I would go for a granny who is knitting, making a great sock. Not sure, this 
would feel modern, but at least, she could tell us some stories from the 
trenches...

Cheers,

JN



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Move test-* to t/helper/ subdirectory

2016-04-15 Thread Junio C Hamano

Junio C Hamano  writes:

> Nguyễn Thái Ngọc Duy   writes:
>
>> This keeps top dir a bit less crowded. And because these programs are
>> for testing purposes, it makes sense that they stay somewhere in t/
>
> But leaves many *.o files after "make clean".  Even "distclean" does
> not clean them.

Perhaps something like this as a preparatory patch, to protect us
from future breakages similar to this change.

-- >8 --
Subject: Makefile: clean *.o files we create

The part that removes object files in the 'clean' target predates
various Makefile macros that list object files we create, and
instead removes the objects with shell glob, perpetually requiring
updates whenever a new location that builds object files is added.

Simplify the target by removing $(OBJECTS), which is supposed to
have all the objects we create during the build.

Signed-off-by: Junio C Hamano 
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index fe0bf7d..69d32bf 100644
--- a/Makefile
+++ b/Makefile
@@ -2456,8 +2456,8 @@ profile-clean:
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))

 clean: profile-clean coverage-clean
-   $(RM) *.o *.res refs/*.o block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o
-   $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
+   $(RM) *.res
+   $(RM) $(OBJECTS)
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS) $(NO_INSTALL)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Stefan Beller

On Fri, Apr 15, 2016 at 9:51 AM, Jacob Keller  wrote:
> From: Jacob Keller 
>
> I took up Stefan's patch, and modified it to resolve a couple issues. I
> also tried to implement the suggestions from Junio's review, as well as
> the suggestions I had. It appears to produce equivalent output as Jeff's
> script. This version is still missing a few things, and it is possible
> Stefan is working on a v2 of his own, but I thought I'd submit this.
>
> There's still several TODOs:
> * Add some tests
> * better name for the heuristic(?)
> * better patch description plus documentation
> * is_emptyline should be more generic and handle CRLF
>
> Changes since Stefan's v1:
> * Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
> * Fixed a segfault in Stefan's patch
> * Added XDL flag to configure the behavior
> * Used an int and counted empty lines via += instead of |=
> * Renamed starts_with_emptyline to is_emptyline
> * Added diff command line and config options
>
> For reviewer convenience, the interdiff between this and Stefan's version:
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba56522bce..cebf82702d2a 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.emptyLineHeuristic::
> +   Set this option to true to enable the empty line chunk heuristic when
> +   producing diff output. This heuristic will attempt to shift hunks such
> +   that a common empty line occurs below the hunk with the rest of the
> +   context above it.
> +

This heuristic will attempt to shift hunks such that *the last* common
empty line occurs below the hunk with the rest of the context above it.

maybe?


>  diff.algorithm::
> Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 4b0318e2ac15..6c1cd8b35584 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
> Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> +--empty-line-heuristic::
> +--no-empty-line-heuristic::
> +   When possible, shift common empty lines in diff hunks below the hunk
> +   such that the last common empty line for each hunk is below, with the
> +   rest of the context above the hunk.
> +
>  --minimal::
> Spend extra time to make sure the smallest possible
> diff is produced.
> diff --git a/diff.c b/diff.c
> index 4dfe6609d059..8ab9a492928d 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #endif
>
>  static int diff_detect_rename_default;
> +static int diff_empty_line_heuristic = 0;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char 
> *value, void *cb)
> diff_detect_rename_default = git_config_rename(var, value);
> return 0;
> }
> +   if (!strcmp(var, "diff.emptylineheuristic")) {
> +   diff_empty_line_heuristic = git_config_bool(var, value);
> +   return 0;
> +   }
> if (!strcmp(var, "diff.autorefreshindex")) {
> diff_auto_refresh_index = git_config_bool(var, value);
> return 0;
> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
> options->use_color = diff_use_color_default;
> options->detect_rename = diff_detect_rename_default;
> options->xdl_opts |= diff_algorithm;
> +   if (diff_empty_line_heuristic)
> +   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>
> options->orderfile = diff_order_file_cfg;
>
> @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
> DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
> else if (!strcmp(arg, "--ignore-blank-lines"))
> DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
> +   else if (!strcmp(arg, "--empty-line-heuristic"))
> +   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
> +   else if (!strcmp(arg, "--no-empty-line-heuristic"))
> +   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
> else if (!strcmp(arg, "--patience"))
> options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
> else if (!strcmp(arg, "--histogram"))
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 4fb7e79410c2..9195a5c0e958 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -41,6 +41,8 @@ extern "C" {
>
>  #define XDF_IGNORE_BLANK_LINES (1 << 7)
>
> +#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
> +
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index

[RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function xdl_hash_and_recmatch which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller 
---
 xdiff/xdiffi.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d6326e..fff97ac3e3c7 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
+static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long 
flags)
+{
+   return (recs[ixs]->ha == recs[ix]->ha &&
+   xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+recs[ix]->ptr, recs[ix]->size,
+flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha 
&&
-  xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 
1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+   while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, 
ix - 1, flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-  xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, 
recs[ix]->ptr, recs[ix]->size, flags)) {
+   while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, 
ix, flags)) {
+   emptylines += is_emptyline(recs[ix]->ptr);
+
rchg[ixs++] = 0;
rchg[ix++] = 1;
 
-- 
2.8.1.369.geae769a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic

2016-04-15 Thread Jacob Keller

From: Stefan Beller 

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller 
Signed-off-by: Jacob Keller 
---
 Documentation/diff-config.txt  |  6 ++
 Documentation/diff-options.txt |  6 ++
 diff.c | 11 +++
 xdiff/xdiff.h  |  2 ++
 xdiff/xdiffi.c | 26 ++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..cebf82702d2a 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+   Set this option to true to enable the empty line chunk heuristic when
+   producing diff output. This heuristic will attempt to shift hunks such
+   that a common empty line occurs below the hunk with the rest of the
+   context above it.
+
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+   When possible, shift common empty lines in diff hunks below the hunk
+   such that the last common empty line for each hunk is below, with the
+   rest of the context above the hunk.
+
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
+   if (!strcmp(var, "diff.emptylineheuristic")) {
+   diff_empty_line_heuristic = git_config_bool(var, value);
+   return 0;
+   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
+   if (diff_empty_line_heuristic)
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--empty-line-heuristic"))
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+   else if (!strcmp(arg, "--no-empty-line-heuristic"))
+   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
else if (!strcmp(arg, "--patience"))
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8

[RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics

2016-04-15 Thread Jacob Keller

From: Jacob Keller 

I took up Stefan's patch, and modified it to resolve a couple issues. I
also tried to implement the suggestions from Junio's review, as well as
the suggestions I had. It appears to produce equivalent output as Jeff's
script. This version is still missing a few things, and it is possible
Stefan is working on a v2 of his own, but I thought I'd submit this.

There's still several TODOs:
* Add some tests
* better name for the heuristic(?)
* better patch description plus documentation
* is_emptyline should be more generic and handle CRLF

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

For reviewer convenience, the interdiff between this and Stefan's version:

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..cebf82702d2a 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+   Set this option to true to enable the empty line chunk heuristic when
+   producing diff output. This heuristic will attempt to shift hunks such
+   that a common empty line occurs below the hunk with the rest of the
+   context above it.
+
 diff.algorithm::
Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+   When possible, shift common empty lines in diff hunks below the hunk
+   such that the last common empty line for each hunk is below, with the
+   rest of the context above the hunk.
+
 --minimal::
Spend extra time to make sure the smallest possible
diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
diff_detect_rename_default = git_config_rename(var, value);
return 0;
}
+   if (!strcmp(var, "diff.emptylineheuristic")) {
+   diff_empty_line_heuristic = git_config_bool(var, value);
+   return 0;
+   }
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
+   if (diff_empty_line_heuristic)
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--empty-line-heuristic"))
+   DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+   else if (!strcmp(arg, "--no-empty-line-heuristic"))
+   DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
else if (!strcmp(arg, "--patience"))
options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5f14beb98049..83984b90f82f 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,7 +400,7 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
-static int starts_with_emptyline(const char *recs)
+static int is_emptyline(const char *recs)
 {
return recs[0] == '\n'; /* CRLF not covered here */
 }
@@ -416,7 +416,7 @@ static int xdl_hash_and_recmatch(xrecord_t **recs, long 
ixs, long ix, long flags
 int

Re: [PATCH] Extend runtime prefix computation

2016-04-15 Thread Junio C Hamano

Michael Weiser  writes:

> Make git fully relocatable at runtime extending the runtime prefix
> calculation. Handle absolute and relative paths in argv0. Handle no path
> at all in argv0 in a system-specific manner.  Replace assertions with
> initialised variables and checks that lead to fallback to the static
> prefix.

That's a dense description of "what" without saying much about
"why".  Hint: start by describing what case(s) the current code
fails to find the correct runtime prefix.  That would give readers a
better understanding of what problem you are trying to solve.

Missing sign-off.

> diff --git a/exec_cmd.c b/exec_cmd.c
> index 9d5703a..f0db070 100644
> --- a/exec_cmd.c
> +++ b/exec_cmd.c
> @@ -3,30 +3,27 @@
>  #include "quote.h"
>  #include "argv-array.h"
>  #define MAX_ARGS 32
> +#if defined(__APPLE__)
> +#include 
> +#endif

We tend to avoid system specific includes in individual *.c files.

Perhaps implement platform specific bits in compat/?  E.g. each
platform specific file in compat/ may implement and export the same
git_extract_argv_path() function, perhaps, so that this file does
not even need to know what platforms it supports?

>  char *system_path(const char *path)
>  {
> -#ifdef RUNTIME_PREFIX
> - static const char *prefix;
> -#else
>   static const char *prefix = PREFIX;
> -#endif
>   struct strbuf d = STRBUF_INIT;
>  
>   if (is_absolute_path(path))
>   return xstrdup(path);
>  
>  #ifdef RUNTIME_PREFIX
> - assert(argv0_path);
> - assert(is_absolute_path(argv0_path));

Aren't these protecting against future and careless change that
forgets to call extract-argv0-path or make that function return
something that is not an absolute path?

Is it a good idea to drop these checks, and if so why?

> - if (!prefix &&
> - !(prefix = strip_path_suffix(argv0_path, GIT_EXEC_PATH)) &&
> - !(prefix = strip_path_suffix(argv0_path, BINDIR)) &&
> - !(prefix = strip_path_suffix(argv0_path, "git"))) {
> + if (!argv0_path ||
> + !is_absolute_path(argv0_path) ||
> + (!(prefix = strip_path_suffix(argv0_path, GIT_EXEC_PATH)) &&
> +  !(prefix = strip_path_suffix(argv0_path, BINDIR)) &&
> +  !(prefix = strip_path_suffix(argv0_path, "git" {
>   prefix = PREFIX;
>   trace_printf("RUNTIME_PREFIX requested, "
>   "but prefix computation failed.  "
> @@ -41,6 +38,8 @@ char *system_path(const char *path)
>  const char *git_extract_argv0_path(const char *argv0)
>  {
>   const char *slash;
> + char *to_resolve = NULL;
> + const char *resolved;
>  
>   if (!argv0 || !*argv0)
>   return NULL;
> @@ -48,11 +47,56 @@ const char *git_extract_argv0_path(const char *argv0)
>   slash = find_last_dir_sep(argv0);
>  
>   if (slash) {
> - argv0_path = xstrndup(argv0, slash - argv0);
> - return slash + 1;
> + /* ... it's either an absolute path */
> + if (is_absolute_path(argv0)) {
> + argv0_path = xstrndup(argv0, slash - argv0);
> + return slash + 1;
> + }
> +
> + /* ... or a relative path, in which case we have to make it
> +  * absolute first and do the whole thing again */
> + to_resolve = xstrdup(argv0);
> + } else {
> + /* argv0 is no path at all, just a name. Resolve it into a
> +  * path. Unfortunately, this gets system specific. */
> +#if defined(__linux__)
> + struct stat st;
> + if (!stat("/proc/self/exe", ))
> + to_resolve = xstrdup("/proc/self/exe");
> +#elif defined(__APPLE__)
> + /* call with len == 0 queries the necessary buffer size */
> + uint32_t len = 0;
> + if(_NSGetExecutablePath(NULL, ) != 0) {
> + to_resolve = malloc(len);
> + if (to_resolve) {
> + /* get the executable path now we have a buffer
> +  * of proper size */
> + if(_NSGetExecutablePath(to_resolve, ) != 0) 
> {
> + free(to_resolve);
> + return argv0;
> + }
> + }
> + }
> +#endif
> +
> + /* if to_resolve is still NULL here, something failed above or
> +  * we are on an unsupported system. system_path() will warn
> +  * and fall back to the static prefix */
> + if (!to_resolve)
> + return argv0;
>   }
>  
> - return argv0;
> + /* resolve path from absolute to canonical */
> + resolved = real_path(to_resolve);
> + free(to_resolve);
> +
> + slash = find_last_dir_sep(resolved);
> + if (!slash)
> + return argv0;
> +
> + argv0_path =

Re: Parallel checkout (Was Re: 0 bot for Git)

2016-04-15 Thread Jeff King

On Fri, Apr 15, 2016 at 01:18:46PM +0200, Christian Couder wrote:

> On Fri, Apr 15, 2016 at 11:51 AM, Duy Nguyen  wrote:
> > On Fri, Apr 15, 2016 at 12:04:49AM +0200, Christian Couder wrote:
> >>
> >> There is a draft of an article about the first part of the Contributor
> >> Summit in the draft of the next Git Rev News edition:
> >>
> >> https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-14.md
> >
> > Thanks. I read the sentence "This made people mention potential
> > problems with parallelizing git checkout" and wondered what these
> > problems were.
> 
> It may have been Michael or Peff (CC'ed) saying that it could break
> some builds as the timestamps on the files might not always be ordered
> in the same way.

I don't think it was me. I'm also not sure how it would break a build.
Git does not promise a particular timing or order for updating files as
it is. So if we are checking out two files "a" and "b", and your build
process depends on the timestamp between them, I think all bets are
already off.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 115 matches

Mail list logo