Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-22 Thread Sebastian Schuberth

On 05.01.2014 14:42, Sebastian Schuberth wrote:


Since 2dce956 is_git_command() is a bit slow as it does file I/O in the
call to list_commands_in_dir(). Avoid the file I/O by adding an early
check for internal commands.


Considering the purpose of the series is it better to say builtin instead of 
internal in the commit message?


True, I'll fix this in a re-rool.


Sorry for not coming up with the re-roll until now, but lucky Junio has 
fixed this himself in c6127fa which already is on master.


--
Sebastian Schuberth
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-03 Thread Jeff King
On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote:

  - builtin/merge.c is the same, but it is conceptually even worse.
It has the end-user supplied string and wants to see if it is a
valid strategy.  If the user wants to use a custom strategy, a
single stat() to make sure if it exists should suffice, and the
error codepath should load the command list to present the names
of available ones in the error message.

Is it a single stat()? I think we would need to check each element of
$PATH. Though in practice, the exec-dir would be the first thing we
check, and where we would find the majority of hits. So it would still
be a win, as we would avoid touching anything but the exec-dir in the
common case.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-03 Thread Kent R. Spillner

 Since 2dce956 is_git_command() is a bit slow as it does file I/O in the
 call to list_commands_in_dir(). Avoid the file I/O by adding an early
 check for internal commands.

Considering the purpose of the series is it better to say builtin instead of 
internal in the commit message?--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote:

  - builtin/merge.c is the same, but it is conceptually even worse.
It has the end-user supplied string and wants to see if it is a
valid strategy.  If the user wants to use a custom strategy, a
single stat() to make sure if it exists should suffice, and the
error codepath should load the command list to present the names
of available ones in the error message.

 Is it a single stat()? I think we would need to check each element of
 $PATH. 

Yeah, load_command_list() iterates over the members of env_path.

 Though in practice, the exec-dir would be the first thing we
 check, and where we would find the majority of hits. So it would still
 be a win, as we would avoid touching anything but the exec-dir in the
 common case.

Exactly.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-02 Thread Sebastian Schuberth
Since 2dce956 is_git_command() is a bit slow as it does file I/O in the
call to list_commands_in_dir(). Avoid the file I/O by adding an early
check for internal commands.

Signed-off-by: Sebastian Schuberth sschube...@gmail.com
---
 Documentation/technical/api-builtin.txt |   4 +-
 builtin.h   |   2 +
 builtin/help.c  |   3 +
 git.c   | 242 +---
 4 files changed, 134 insertions(+), 117 deletions(-)

diff --git a/Documentation/technical/api-builtin.txt 
b/Documentation/technical/api-builtin.txt
index 150a02a..e3d6e7a 100644
--- a/Documentation/technical/api-builtin.txt
+++ b/Documentation/technical/api-builtin.txt
@@ -14,8 +14,8 @@ Git:
 
 . Add the external declaration for the function to `builtin.h`.
 
-. Add the command to `commands[]` table in `handle_builtin()`,
-  defined in `git.c`.  The entry should look like:
+. Add the command to the `commands[]` table defined in `git.c`.
+  The entry should look like:
 
{ foo, cmd_foo, options },
 +
diff --git a/builtin.h b/builtin.h
index d4afbfe..c47c110 100644
--- a/builtin.h
+++ b/builtin.h
@@ -27,6 +27,8 @@ extern int fmt_merge_msg(struct strbuf *in, struct strbuf 
*out,
 
 extern int textconv_object(const char *path, unsigned mode, const unsigned 
char *sha1, int sha1_valid, char **buf, unsigned long *buf_size);
 
+extern int is_builtin(const char *s);
+
 extern int cmd_add(int argc, const char **argv, const char *prefix);
 extern int cmd_annotate(int argc, const char **argv, const char *prefix);
 extern int cmd_apply(int argc, const char **argv, const char *prefix);
diff --git a/builtin/help.c b/builtin/help.c
index b6fc15e..1fdefeb 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -288,6 +288,9 @@ static struct cmdnames main_cmds, other_cmds;
 
 static int is_git_command(const char *s)
 {
+   if (is_builtin(s))
+   return 1;
+
load_command_list(git-, main_cmds, other_cmds);
return is_in_cmdlist(main_cmds, s) ||
is_in_cmdlist(other_cmds, s);
diff --git a/git.c b/git.c
index 89ab5d7..bba4378 100644
--- a/git.c
+++ b/git.c
@@ -332,124 +332,136 @@ static int run_builtin(struct cmd_struct *p, int argc, 
const char **argv)
return 0;
 }
 
+static struct cmd_struct commands[] = {
+   { add, cmd_add, RUN_SETUP | NEED_WORK_TREE },
+   { annotate, cmd_annotate, RUN_SETUP },
+   { apply, cmd_apply, RUN_SETUP_GENTLY },
+   { archive, cmd_archive },
+   { bisect--helper, cmd_bisect__helper, RUN_SETUP },
+   { blame, cmd_blame, RUN_SETUP },
+   { branch, cmd_branch, RUN_SETUP },
+   { bundle, cmd_bundle, RUN_SETUP_GENTLY },
+   { cat-file, cmd_cat_file, RUN_SETUP },
+   { check-attr, cmd_check_attr, RUN_SETUP },
+   { check-ignore, cmd_check_ignore, RUN_SETUP | NEED_WORK_TREE },
+   { check-mailmap, cmd_check_mailmap, RUN_SETUP },
+   { check-ref-format, cmd_check_ref_format },
+   { checkout, cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
+   { checkout-index, cmd_checkout_index,
+   RUN_SETUP | NEED_WORK_TREE},
+   { cherry, cmd_cherry, RUN_SETUP },
+   { cherry-pick, cmd_cherry_pick, RUN_SETUP | NEED_WORK_TREE },
+   { clean, cmd_clean, RUN_SETUP | NEED_WORK_TREE },
+   { clone, cmd_clone },
+   { column, cmd_column, RUN_SETUP_GENTLY },
+   { commit, cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+   { commit-tree, cmd_commit_tree, RUN_SETUP },
+   { config, cmd_config, RUN_SETUP_GENTLY },
+   { count-objects, cmd_count_objects, RUN_SETUP },
+   { credential, cmd_credential, RUN_SETUP_GENTLY },
+   { describe, cmd_describe, RUN_SETUP },
+   { diff, cmd_diff },
+   { diff-files, cmd_diff_files, RUN_SETUP | NEED_WORK_TREE },
+   { diff-index, cmd_diff_index, RUN_SETUP },
+   { diff-tree, cmd_diff_tree, RUN_SETUP },
+   { fast-export, cmd_fast_export, RUN_SETUP },
+   { fetch, cmd_fetch, RUN_SETUP },
+   { fetch-pack, cmd_fetch_pack, RUN_SETUP },
+   { fmt-merge-msg, cmd_fmt_merge_msg, RUN_SETUP },
+   { for-each-ref, cmd_for_each_ref, RUN_SETUP },
+   { format-patch, cmd_format_patch, RUN_SETUP },
+   { fsck, cmd_fsck, RUN_SETUP },
+   { fsck-objects, cmd_fsck, RUN_SETUP },
+   { gc, cmd_gc, RUN_SETUP },
+   { get-tar-commit-id, cmd_get_tar_commit_id },
+   { grep, cmd_grep, RUN_SETUP_GENTLY },
+   { hash-object, cmd_hash_object },
+   { help, cmd_help },
+   { index-pack, cmd_index_pack, RUN_SETUP_GENTLY },
+   { init, cmd_init_db },
+   { init-db, cmd_init_db },
+   { log, cmd_log, RUN_SETUP },
+   { ls-files, cmd_ls_files, RUN_SETUP },
+   { ls-remote, cmd_ls_remote, RUN_SETUP_GENTLY },
+   { ls-tree, cmd_ls_tree, RUN_SETUP },
+   { mailinfo, cmd_mailinfo },
+   { mailsplit, cmd_mailsplit },
+   { merge, cmd_merge, RUN_SETUP | NEED_WORK_TREE },
+   { merge-base, 

Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands

2014-01-02 Thread Junio C Hamano
Sebastian Schuberth sschube...@gmail.com writes:

 Since 2dce956 is_git_command() is a bit slow as it does file I/O in the
 call to list_commands_in_dir(). Avoid the file I/O by adding an early
 check for internal commands.

I think it is a good thing to check with the list of built-in's
first, but in order to see if one name we already have at hand is a
git command, I have to wonder if it is the best we can do to collect
all the possible command names with load_command_list() and check
the membership.

There are two callers of is_in_cmdlist(), and the way they use the
function looks both very backwards.

 - builtin/help.c has a user supplied string that it wants to see if
   it is a git command (either on-disk in exec-path or a builtin),
   either to see if an alias in a config is really in effect, or to
   see if it is a command (i.e. invoke 'man' with git-string) or a
   concept (i.e. invoke 'man' with gitstring).  It feels that it
   should be a lot less work to just check with the builtin table
   and stat(exec-path + string) for either of these purposes (on
   Windows you may have to append .exe before checking and may also
   have to check with .com so you may have to do more than one
   stat(), but still).

   Even though help is not a performance critical codepath,
   optimizing this further so that we do not load the full set of
   commands unnecessarily may be in line with the theme of your
   series, I think.

 - builtin/merge.c is the same, but it is conceptually even worse.
   It has the end-user supplied string and wants to see if it is a
   valid strategy.  If the user wants to use a custom strategy, a
   single stat() to make sure if it exists should suffice, and the
   error codepath should load the command list to present the names
   of available ones in the error message.

Based on the above observation, I have a feeling that we will be
better off to aim for getting rid of is_in_cmdlist(), reimplement
is_git_command() to check with builtin and then do a stat (or two)
inside the exec-path, and update builtin/merge.c codepath to use
is_git_command() to see if the given name is indeed a strategy.

So in short, I very much like the part that moves the built-in
command list out of the main() function and makes is_git_command()
use it, but I think the primary implementation of is_git_command()
to check if the given string names an on-disk command is still not
very nice.

Thanks.

 Signed-off-by: Sebastian Schuberth sschube...@gmail.com
 ---
  Documentation/technical/api-builtin.txt |   4 +-
  builtin.h   |   2 +
  builtin/help.c  |   3 +
  git.c   | 242 
 +---
  4 files changed, 134 insertions(+), 117 deletions(-)

 diff --git a/Documentation/technical/api-builtin.txt 
 b/Documentation/technical/api-builtin.txt
 index 150a02a..e3d6e7a 100644
 --- a/Documentation/technical/api-builtin.txt
 +++ b/Documentation/technical/api-builtin.txt
 @@ -14,8 +14,8 @@ Git:
  
  . Add the external declaration for the function to `builtin.h`.
  
 -. Add the command to `commands[]` table in `handle_builtin()`,
 -  defined in `git.c`.  The entry should look like:
 +. Add the command to the `commands[]` table defined in `git.c`.
 +  The entry should look like:
  
   { foo, cmd_foo, options },
  +
 diff --git a/builtin.h b/builtin.h
 index d4afbfe..c47c110 100644
 --- a/builtin.h
 +++ b/builtin.h
 @@ -27,6 +27,8 @@ extern int fmt_merge_msg(struct strbuf *in, struct strbuf 
 *out,
  
  extern int textconv_object(const char *path, unsigned mode, const unsigned 
 char *sha1, int sha1_valid, char **buf, unsigned long *buf_size);
  
 +extern int is_builtin(const char *s);
 +
  extern int cmd_add(int argc, const char **argv, const char *prefix);
  extern int cmd_annotate(int argc, const char **argv, const char *prefix);
  extern int cmd_apply(int argc, const char **argv, const char *prefix);
 diff --git a/builtin/help.c b/builtin/help.c
 index b6fc15e..1fdefeb 100644
 --- a/builtin/help.c
 +++ b/builtin/help.c
 @@ -288,6 +288,9 @@ static struct cmdnames main_cmds, other_cmds;
  
  static int is_git_command(const char *s)
  {
 + if (is_builtin(s))
 + return 1;
 +
   load_command_list(git-, main_cmds, other_cmds);
   return is_in_cmdlist(main_cmds, s) ||
   is_in_cmdlist(other_cmds, s);
 diff --git a/git.c b/git.c
 index 89ab5d7..bba4378 100644
 --- a/git.c
 +++ b/git.c
 @@ -332,124 +332,136 @@ static int run_builtin(struct cmd_struct *p, int 
 argc, const char **argv)
   return 0;
  }
  
 +static struct cmd_struct commands[] = {
 + { add, cmd_add, RUN_SETUP | NEED_WORK_TREE },
 + { annotate, cmd_annotate, RUN_SETUP },
 + { apply, cmd_apply, RUN_SETUP_GENTLY },
 + { archive, cmd_archive },
 + { bisect--helper, cmd_bisect__helper, RUN_SETUP },
 + { blame, cmd_blame, RUN_SETUP },
 + { branch, cmd_branch, RUN_SETUP },
 + { bundle,