Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
On 05.01.2014 14:42, Sebastian Schuberth wrote: Since 2dce956 is_git_command() is a bit slow as it does file I/O in the call to list_commands_in_dir(). Avoid the file I/O by adding an early check for internal commands. Considering the purpose of the series is it better to say builtin instead of internal in the commit message? True, I'll fix this in a re-rool. Sorry for not coming up with the re-roll until now, but lucky Junio has fixed this himself in c6127fa which already is on master. -- Sebastian Schuberth -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote: - builtin/merge.c is the same, but it is conceptually even worse. It has the end-user supplied string and wants to see if it is a valid strategy. If the user wants to use a custom strategy, a single stat() to make sure if it exists should suffice, and the error codepath should load the command list to present the names of available ones in the error message. Is it a single stat()? I think we would need to check each element of $PATH. Though in practice, the exec-dir would be the first thing we check, and where we would find the majority of hits. So it would still be a win, as we would avoid touching anything but the exec-dir in the common case. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
Since 2dce956 is_git_command() is a bit slow as it does file I/O in the call to list_commands_in_dir(). Avoid the file I/O by adding an early check for internal commands. Considering the purpose of the series is it better to say builtin instead of internal in the commit message?-- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
Jeff King p...@peff.net writes: On Thu, Jan 02, 2014 at 11:41:05AM -0800, Junio C Hamano wrote: - builtin/merge.c is the same, but it is conceptually even worse. It has the end-user supplied string and wants to see if it is a valid strategy. If the user wants to use a custom strategy, a single stat() to make sure if it exists should suffice, and the error codepath should load the command list to present the names of available ones in the error message. Is it a single stat()? I think we would need to check each element of $PATH. Yeah, load_command_list() iterates over the members of env_path. Though in practice, the exec-dir would be the first thing we check, and where we would find the majority of hits. So it would still be a win, as we would avoid touching anything but the exec-dir in the common case. Exactly. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
Since 2dce956 is_git_command() is a bit slow as it does file I/O in the call to list_commands_in_dir(). Avoid the file I/O by adding an early check for internal commands. Signed-off-by: Sebastian Schuberth sschube...@gmail.com --- Documentation/technical/api-builtin.txt | 4 +- builtin.h | 2 + builtin/help.c | 3 + git.c | 242 +--- 4 files changed, 134 insertions(+), 117 deletions(-) diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt index 150a02a..e3d6e7a 100644 --- a/Documentation/technical/api-builtin.txt +++ b/Documentation/technical/api-builtin.txt @@ -14,8 +14,8 @@ Git: . Add the external declaration for the function to `builtin.h`. -. Add the command to `commands[]` table in `handle_builtin()`, - defined in `git.c`. The entry should look like: +. Add the command to the `commands[]` table defined in `git.c`. + The entry should look like: { foo, cmd_foo, options }, + diff --git a/builtin.h b/builtin.h index d4afbfe..c47c110 100644 --- a/builtin.h +++ b/builtin.h @@ -27,6 +27,8 @@ extern int fmt_merge_msg(struct strbuf *in, struct strbuf *out, extern int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, int sha1_valid, char **buf, unsigned long *buf_size); +extern int is_builtin(const char *s); + extern int cmd_add(int argc, const char **argv, const char *prefix); extern int cmd_annotate(int argc, const char **argv, const char *prefix); extern int cmd_apply(int argc, const char **argv, const char *prefix); diff --git a/builtin/help.c b/builtin/help.c index b6fc15e..1fdefeb 100644 --- a/builtin/help.c +++ b/builtin/help.c @@ -288,6 +288,9 @@ static struct cmdnames main_cmds, other_cmds; static int is_git_command(const char *s) { + if (is_builtin(s)) + return 1; + load_command_list(git-, main_cmds, other_cmds); return is_in_cmdlist(main_cmds, s) || is_in_cmdlist(other_cmds, s); diff --git a/git.c b/git.c index 89ab5d7..bba4378 100644 --- a/git.c +++ b/git.c @@ -332,124 +332,136 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) return 0; } +static struct cmd_struct commands[] = { + { add, cmd_add, RUN_SETUP | NEED_WORK_TREE }, + { annotate, cmd_annotate, RUN_SETUP }, + { apply, cmd_apply, RUN_SETUP_GENTLY }, + { archive, cmd_archive }, + { bisect--helper, cmd_bisect__helper, RUN_SETUP }, + { blame, cmd_blame, RUN_SETUP }, + { branch, cmd_branch, RUN_SETUP }, + { bundle, cmd_bundle, RUN_SETUP_GENTLY }, + { cat-file, cmd_cat_file, RUN_SETUP }, + { check-attr, cmd_check_attr, RUN_SETUP }, + { check-ignore, cmd_check_ignore, RUN_SETUP | NEED_WORK_TREE }, + { check-mailmap, cmd_check_mailmap, RUN_SETUP }, + { check-ref-format, cmd_check_ref_format }, + { checkout, cmd_checkout, RUN_SETUP | NEED_WORK_TREE }, + { checkout-index, cmd_checkout_index, + RUN_SETUP | NEED_WORK_TREE}, + { cherry, cmd_cherry, RUN_SETUP }, + { cherry-pick, cmd_cherry_pick, RUN_SETUP | NEED_WORK_TREE }, + { clean, cmd_clean, RUN_SETUP | NEED_WORK_TREE }, + { clone, cmd_clone }, + { column, cmd_column, RUN_SETUP_GENTLY }, + { commit, cmd_commit, RUN_SETUP | NEED_WORK_TREE }, + { commit-tree, cmd_commit_tree, RUN_SETUP }, + { config, cmd_config, RUN_SETUP_GENTLY }, + { count-objects, cmd_count_objects, RUN_SETUP }, + { credential, cmd_credential, RUN_SETUP_GENTLY }, + { describe, cmd_describe, RUN_SETUP }, + { diff, cmd_diff }, + { diff-files, cmd_diff_files, RUN_SETUP | NEED_WORK_TREE }, + { diff-index, cmd_diff_index, RUN_SETUP }, + { diff-tree, cmd_diff_tree, RUN_SETUP }, + { fast-export, cmd_fast_export, RUN_SETUP }, + { fetch, cmd_fetch, RUN_SETUP }, + { fetch-pack, cmd_fetch_pack, RUN_SETUP }, + { fmt-merge-msg, cmd_fmt_merge_msg, RUN_SETUP }, + { for-each-ref, cmd_for_each_ref, RUN_SETUP }, + { format-patch, cmd_format_patch, RUN_SETUP }, + { fsck, cmd_fsck, RUN_SETUP }, + { fsck-objects, cmd_fsck, RUN_SETUP }, + { gc, cmd_gc, RUN_SETUP }, + { get-tar-commit-id, cmd_get_tar_commit_id }, + { grep, cmd_grep, RUN_SETUP_GENTLY }, + { hash-object, cmd_hash_object }, + { help, cmd_help }, + { index-pack, cmd_index_pack, RUN_SETUP_GENTLY }, + { init, cmd_init_db }, + { init-db, cmd_init_db }, + { log, cmd_log, RUN_SETUP }, + { ls-files, cmd_ls_files, RUN_SETUP }, + { ls-remote, cmd_ls_remote, RUN_SETUP_GENTLY }, + { ls-tree, cmd_ls_tree, RUN_SETUP }, + { mailinfo, cmd_mailinfo }, + { mailsplit, cmd_mailsplit }, + { merge, cmd_merge, RUN_SETUP | NEED_WORK_TREE }, + { merge-base,
Re: [PATCH v2 3/4] Speed up is_git_command() by checking early for internal commands
Sebastian Schuberth sschube...@gmail.com writes: Since 2dce956 is_git_command() is a bit slow as it does file I/O in the call to list_commands_in_dir(). Avoid the file I/O by adding an early check for internal commands. I think it is a good thing to check with the list of built-in's first, but in order to see if one name we already have at hand is a git command, I have to wonder if it is the best we can do to collect all the possible command names with load_command_list() and check the membership. There are two callers of is_in_cmdlist(), and the way they use the function looks both very backwards. - builtin/help.c has a user supplied string that it wants to see if it is a git command (either on-disk in exec-path or a builtin), either to see if an alias in a config is really in effect, or to see if it is a command (i.e. invoke 'man' with git-string) or a concept (i.e. invoke 'man' with gitstring). It feels that it should be a lot less work to just check with the builtin table and stat(exec-path + string) for either of these purposes (on Windows you may have to append .exe before checking and may also have to check with .com so you may have to do more than one stat(), but still). Even though help is not a performance critical codepath, optimizing this further so that we do not load the full set of commands unnecessarily may be in line with the theme of your series, I think. - builtin/merge.c is the same, but it is conceptually even worse. It has the end-user supplied string and wants to see if it is a valid strategy. If the user wants to use a custom strategy, a single stat() to make sure if it exists should suffice, and the error codepath should load the command list to present the names of available ones in the error message. Based on the above observation, I have a feeling that we will be better off to aim for getting rid of is_in_cmdlist(), reimplement is_git_command() to check with builtin and then do a stat (or two) inside the exec-path, and update builtin/merge.c codepath to use is_git_command() to see if the given name is indeed a strategy. So in short, I very much like the part that moves the built-in command list out of the main() function and makes is_git_command() use it, but I think the primary implementation of is_git_command() to check if the given string names an on-disk command is still not very nice. Thanks. Signed-off-by: Sebastian Schuberth sschube...@gmail.com --- Documentation/technical/api-builtin.txt | 4 +- builtin.h | 2 + builtin/help.c | 3 + git.c | 242 +--- 4 files changed, 134 insertions(+), 117 deletions(-) diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt index 150a02a..e3d6e7a 100644 --- a/Documentation/technical/api-builtin.txt +++ b/Documentation/technical/api-builtin.txt @@ -14,8 +14,8 @@ Git: . Add the external declaration for the function to `builtin.h`. -. Add the command to `commands[]` table in `handle_builtin()`, - defined in `git.c`. The entry should look like: +. Add the command to the `commands[]` table defined in `git.c`. + The entry should look like: { foo, cmd_foo, options }, + diff --git a/builtin.h b/builtin.h index d4afbfe..c47c110 100644 --- a/builtin.h +++ b/builtin.h @@ -27,6 +27,8 @@ extern int fmt_merge_msg(struct strbuf *in, struct strbuf *out, extern int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, int sha1_valid, char **buf, unsigned long *buf_size); +extern int is_builtin(const char *s); + extern int cmd_add(int argc, const char **argv, const char *prefix); extern int cmd_annotate(int argc, const char **argv, const char *prefix); extern int cmd_apply(int argc, const char **argv, const char *prefix); diff --git a/builtin/help.c b/builtin/help.c index b6fc15e..1fdefeb 100644 --- a/builtin/help.c +++ b/builtin/help.c @@ -288,6 +288,9 @@ static struct cmdnames main_cmds, other_cmds; static int is_git_command(const char *s) { + if (is_builtin(s)) + return 1; + load_command_list(git-, main_cmds, other_cmds); return is_in_cmdlist(main_cmds, s) || is_in_cmdlist(other_cmds, s); diff --git a/git.c b/git.c index 89ab5d7..bba4378 100644 --- a/git.c +++ b/git.c @@ -332,124 +332,136 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) return 0; } +static struct cmd_struct commands[] = { + { add, cmd_add, RUN_SETUP | NEED_WORK_TREE }, + { annotate, cmd_annotate, RUN_SETUP }, + { apply, cmd_apply, RUN_SETUP_GENTLY }, + { archive, cmd_archive }, + { bisect--helper, cmd_bisect__helper, RUN_SETUP }, + { blame, cmd_blame, RUN_SETUP }, + { branch, cmd_branch, RUN_SETUP }, + { bundle,