Re: git-owner, was Re: [bug] first line truncated with `git log --oneline --decorate --graph`
Hi Dave, On 2015-04-16 19:06, Johannes Schindelin wrote: On 2015-04-16 18:31, David Miller wrote: From: Jeff King p...@peff.net Date: Thu, 16 Apr 2015 12:26:21 -0400 Weird. In a nearby thread with the same problem, the first email that mentions git-owner in a cc header is yours[1]. It's in reply to a message that does not mention git-owner at all[2], except in the Sender field. Your agent header looks like: User-Agent: Roundcube Webmail/1.1.0 Maybe their reply to all function is a little over-zealous? This is always caused by broken reply list handling in email clients. That must be it. Dave, my apologies! Will investigate *right now*. With the help of Peff, who identified the culprit (http://trac.roundcube.net/ticket/1489011 introduces this bug, but maintains to fix one), I was able to fix this on my side: https://github.com/dscho/roundcubemail/commit/baec39d Please accept my sincerest apologies, Johannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
Hi, I wonder why git log -Gregexp works with the regexp-ignore-case option but not with the other regexp-related options? Wouldn't it be useful to make the Gregex option support the following options? * basic-regexp * extended-regexp * fixed-strings * perl-regexp Similarly I think it is not very consistent that one cannot combine any of the above options with the Sstring but instead have yet another option called pickaxe-regex to toggle between fixed-string and extended-regexp semantics for the argument passed to option S. The description of the above options in the git-log(1) manpage of Git version 2.1 do not explicitly say that they do not support the Gregex and Sstring option. Wouldn't it be nice to have all of the above options collaborate with each other? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
We became aware of slow merge times with the following setup: The merge is created in a temporary location that uses alternates. The temporary repository is on a local disk, the alternate object database on an NFS mount. After some investigation we believe that #33d4221 (present in git 2.2.0, absent in 2.1.4) is causing this regression in merge time. The following are merge times (in seconds) with git@33d4221~ (2.1.2-393-gabcb865) (before the change) Elapsed SystemUser Min. :0.3700 Min. :0.04000 Min. :0.3000 1st Qu.:0.3800 1st Qu.:0.05000 1st Qu.:0.3100 Median :0.4000 Median :0.06000 Median :0.3300 Mean :0.4295 Mean :0.05905 Mean :0.3519 3rd Qu.:0.4600 3rd Qu.:0.07000 3rd Qu.:0.3600 Max. :0.5900 Max. :0.09000 Max. :0.4900 The following are merge times with git@33d4221 (2.1.2-394-g33d4221): Elapsed SystemUser Min. : 8.58 Min. :1.46 Min. :0.4400 1st Qu.: 9.63 1st Qu.:1.60 1st Qu.:0.4400 Median :10.64 Median :1.66 Median :0.4800 Mean :10.50 Mean :1.69 Mean :0.4986 3rd Qu.:11.13 3rd Qu.:1.81 3rd Qu.:0.5000 Max. :13.96 Max. :2.05 Max. :0.6700 As you can see the merge times are an order of magnitude slower after the change. The effect of #33d4221 can be seen when strace'ing the merge: Running strace on git@33d4221 yields % time seconds usecs/call callserrors syscall -- --- --- - - 70.790.714852 178 4018 utime 14.730.148789 3 50141 50123 lstat 13.880.140198 17 8074 8067 access 0.240.002455 614 4 rename 0.150.001493 3 577 write 0.060.000618 1065 close 0.040.000453 3 152 brk 0.040.000433 2716 mkdir 0.030.000310 841 fstat Running strace on git@33d4221~ yields % time seconds usecs/call callserrors syscall -- --- --- - - 98.370.138516 3 50141 50123 lstat 0.920.001292 2 577 write 0.370.000520 143831 access 0.180.000252 36 7 getcwd 0.170.000237 73620 stat 0.000.00 040 read 0.000.00 08730 open My current hypothesis is that the additional `access`, but more importantly the additional `utime` calls are responsible in the increased merge times that we see. NFS stats on the server for the tests seem to confirm this (see nfsstat-{after,before}-change.txt on https://bitbucket.org/snippets/ssaasen/oend). This is certainly due to the fact that this will all happen over NFS but in 2.1.4 this worked fine and starting with 2.2 this has become very slow. Looking at the detailed strace shows that utime will be called repeatedly in same cases (e.g. https://bitbucket.org/snippets/ssaasen/oend shows an example where the same packfile will be updated more than 4000 times in a single merge). http://www.spinics.net/lists/git/msg240106.html discusses a potential improvement for this case. Would that be an acceptable avenue to improve this situation? Best regards, Stefan Saasen -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-p4: Use -m when running p4 changes
Thanks, all. I will update the patch as requested and resend a [PATCH v3]. This time without the redundant headers. I will also make an extra effort to make sure that the raw tabs do not get converted to spaces this time. Oof, I am really out of practice at programming with raw tabs, much less getting them to make it through email software. Thank you for your patience. test_seq is a neat utility. Also, I don't know why I didn't think to update the document page. Certainly it needs to be updated. Lex Spoon -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reachable: only mark local objects as recent
On Fri, Mar 27, 2015 at 12:00:05PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: It is possible that we may drop an object that is depended upon by another object in the alternate. For example, imagine two repositories, A and B, with A pointing to B as an alternate. Now imagine a commit that is in B which references a tree that is only in A. Traversing from recent objects in B might prevent A from dropping that tree. But this case isn't worth covering. Repo B should take responsibility for its own objects. It would never have had the commit in the first place if it did not also have the tree, and assuming it is using the same keep recent chunks of history scheme, then it would itself keep the tree, as well. In other words, if you have a loop in dependency chain among alternate repositories, your set-up is broken by definition. Which makes sense to me. Thanks. I don't see this patch in pu or What's Cooking at all. Did it get dropped? It does fix a performance regression, but the problem is in v2.2, so I don't think it's urgent for v2.4-rc. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
On Thu, Apr 16, 2015 at 5:01 AM, Jeff King p...@peff.net wrote: We spend a lot of time in strbuf_getwholeline in a tight loop reading characters from a stdio handle into a buffer. The libc getdelim() function can do this for us with less overhead. It's in POSIX.1-2008, and was a GNU extension before that. Therefore we can't rely on it, but can fall back to the existing getc loop when it is not available. The HAVE_GETDELIM knob is turned on automatically for Linux, where we have glibc. We don't need to set any new feature-test macros, because we already define _GNU_SOURCE. Other systems that implement getdelim may need to other macros (probably _POSIX_C_SOURCE = 200809L), but we can address that along with setting the Makefile knob after testing the feature on those systems. [...] Based on a patch from Rasmus Villemoes r...@rasmusvillemoes.dk. Signed-off-by: Jeff King p...@peff.net --- If somebody has a FreeBSD or OS X system to test on, I'd love to see what is needed to compile with HAVE_GETDELIM there. Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from 2009. It does not have getdelim(). Unfortunately, I haven't been able to determine when getdelim() was introduced on the Mac OS X, thus have been unable to craft a simple rule for config.mak.uname. And to confirm that the performance is much better. Sharing my 1.6GB packed-refs file would be hard, but you should be able to generate something large and ridiculous. I'll leave that as an exercise to the reader. diff --git a/Makefile b/Makefile index 5f3987f..36655d5 100644 --- a/Makefile +++ b/Makefile @@ -359,6 +359,8 @@ all:: # compiler is detected to support it. # # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function. +# +# Define HAVE_GETDELIM if your system has the getdelim() function. GIT-VERSION-FILE: FORCE @$(SHELL_PATH) ./GIT-VERSION-GEN @@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL BASIC_CFLAGS += -DHAVE_BSD_SYSCTL endif +ifdef HAVE_GETDELIM + BASIC_CFLAGS += -DHAVE_GETDELIM +endif + ifeq ($(TCLTK_PATH),) NO_TCLTK = NoThanks endif diff --git a/config.mak.uname b/config.mak.uname index f4e77cb..d26665f 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux) HAVE_DEV_TTY = YesPlease HAVE_CLOCK_GETTIME = YesPlease HAVE_CLOCK_MONOTONIC = YesPlease + HAVE_GETDELIM = YesPlease endif ifeq ($(uname_S),GNU/kFreeBSD) HAVE_ALLOCA_H = YesPlease -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fast-import: add options to enable/disable case folding
Currently, fast-import does case folding depending on `core.ignorecase`. `core.ignorecase` depends on the file system where the working tree is. However, different kind of imports require different kinds of semantics, and they usually aren't tied with the file system, but with the data being imported. Add command line options to enable or disable case folding. Also expose them as features in the fast-import stream. Features instead of options, because a stream that needs case folding enabled or disabled won't work as expected if fast-import doesn't support the case folding options. --- Documentation/git-fast-import.txt | 11 ++ fast-import.c | 19 -- t/t9300-fast-import.sh| 79 +++ 3 files changed, 106 insertions(+), 3 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 690fed3..22eba87 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -50,6 +50,13 @@ OPTIONS memory used by fast-import during this run. Showing this output is currently the default, but can be disabled with \--quiet. +--[no-]fold-case:: + When files/directories with the same name but a different case + are detected, they are treated as the same (--fold-case) or as + being different (--no-fold-case). The default is --fold-case + when `core.ignorecase` is set to `true`, and --no-fold-case when + it is `false`. + Options for Frontends ~ @@ -1027,6 +1034,8 @@ date-format:: export-marks:: relative-marks:: no-relative-marks:: +fold-case:: +no-fold-case:: force:: Act as though the corresponding command-line option with a leading '--' was passed on the command line @@ -1091,6 +1100,8 @@ not be passed as option: * import-marks * export-marks * cat-blob-fd +* fold-case +* no-fold-case * force `done` diff --git a/fast-import.c b/fast-import.c index 6378726..958f3da 100644 --- a/fast-import.c +++ b/fast-import.c @@ -371,10 +371,18 @@ static volatile sig_atomic_t checkpoint_requested; /* Where to write output of cat-blob commands */ static int cat_blob_fd = STDOUT_FILENO; +/* Whether to enable case folding */ +static int fold_case; + static void parse_argv(void); static void parse_cat_blob(const char *p); static void parse_ls(const char *p, struct branch *b); +static int strncmp_foldcase(const char *a, const char *b, size_t count) +{ + return fold_case ? strncasecmp(a, b, count) : strncmp(a, b, count); +} + static void write_branch_report(FILE *rpt, struct branch *b) { fprintf(rpt, %s:\n, b-name); @@ -1507,7 +1515,7 @@ static int tree_content_set( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (!*slash1) { if (!S_ISDIR(mode) e-versions[1].mode == mode @@ -1597,7 +1605,7 @@ static int tree_content_remove( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (*slash1 !S_ISDIR(e-versions[1].mode)) /* * If p names a file in some subdirectory, and a @@ -1664,7 +1672,7 @@ static int tree_content_get( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (!*slash1) goto found_entry; if (!S_ISDIR(e-versions[1].mode)) @@ -3246,6 +3254,10 @@ static int parse_one_feature(const char *feature, int from_stream) relative_marks_paths = 1; } else if (!strcmp(feature, no-relative-marks)) { relative_marks_paths = 0; + } else if (!strcmp(feature, fold-case)) { + fold_case = 1; + } else if (!strcmp(feature, no-fold-case)) { + fold_case = 0; } else if (!strcmp(feature, done)) { require_explicit_termination = 1; } else if (!strcmp(feature, force)) { @@ -3372,6 +3384,7 @@ int main(int argc, char **argv) avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*)); marks = pool_calloc(1, sizeof(struct mark_set)); + fold_case = ignore_case; global_argc = argc; global_argv = argv; diff --git
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote: This _might_ have some performance impact in that strbuf_addch() involves strbuf_grow(*, 1), which does does it overflow to increment sb-len by one?; I would say it should be unmeasurable because the function is expected to be used only on loose objects and you shouldn't have very many of them without packing in your repository in the first place. I guess Peff's c1822d4f (strbuf: add an optimized 1-character strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his new strbuf_grow_ch(), and once that happens the performance worry would disappear without this code to be changed at all. I haven't re-rolled that series yet, but the discussion there showed that strbuf_grow_ch() is unnecessary; call-sites can just check: if (!strbuf_avail(sb)) strbuf_grow(sb, 1); to get the fast inline check. Since we go to the trouble to inline strbuf_addch, we should probably teach it the same trick. It would be nice to identify a place with a tight strbuf_addch() loop that could demonstrate the speed increase, though. Just for reference, I did teach strbuf_addch this trick. And the config code is the tight loop to test it with. Results are here: http://article.gmane.org/gmane.comp.version-control.git/267266 In this code path, we are typically only seeing short strings (the original code used a 10-byte static buffer). So I doubt it makes all that much difference. If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. I'm not sure I understand why we are copying it at all. The original code copied from the hdr into type[10] so that we could NUL-terminate it, which was required for type_from_string(). But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? -Peff [1] Not related to your patch, but it looks like type_from_string_gently is overly lax. It does a strncmp() with the length of the candidate name, but does not check that we consumed all of the matching name. So tr would match tree, comm would match commit, and so forth. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote: The merge is created in a temporary location that uses alternates. The temporary repository is on a local disk, the alternate object database on an NFS mount. Is the alternate writeable? If we can't freshen the object, we fall back to storing the object locally, which could have a performance impact. But it looks from your tables below like the utime() call is succeeding, so that is probably not what is happening here. My current hypothesis is that the additional `access`, but more importantly the additional `utime` calls are responsible in the increased merge times that we see. Yeah, that makes sense from your tables. The commit in question flips the order of the loose/packed check, and the packed check should be much faster on your NFS mount. Can you try: diff --git a/sha1_file.c b/sha1_file.c index 88f06ba..822aaef 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign write_sha1_file_prepare(buf, len, type, sha1, hdr, hdrlen); if (returnsha1) hashcpy(returnsha1, sha1); - if (freshen_loose_object(sha1) || freshen_packed_object(sha1)) + if (freshen_packed_object(sha1) || freshen_loose_object(sha1)) return 0; return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); } I think that should clear up the access() calls, but leave the utime() ones. Looking at the detailed strace shows that utime will be called repeatedly in same cases (e.g. https://bitbucket.org/snippets/ssaasen/oend shows an example where the same packfile will be updated more than 4000 times in a single merge). http://www.spinics.net/lists/git/msg240106.html discusses a potential improvement for this case. Would that be an acceptable avenue to improve this situation? I think so. Here's a tentative patch: diff --git a/cache.h b/cache.h index 3d3244b..72c6888 100644 --- a/cache.h +++ b/cache.h @@ -1174,6 +1174,7 @@ extern struct packed_git { int pack_fd; unsigned pack_local:1, pack_keep:1, +freshened:1, do_not_close:1; unsigned char sha1[20]; /* something like .git/objects/pack/x.pack */ diff --git a/sha1_file.c b/sha1_file.c index 822aaef..f27cbf1 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1) static int freshen_packed_object(const unsigned char *sha1) { struct pack_entry e; - return find_pack_entry(sha1, e) freshen_file(e.p-pack_name); + if (!find_pack_entry(sha1, e)) + return 0; + if (e.p-freshened) + return 1; + return freshen_file(e.p-pack_name); } int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) If it's not a problem, I'd love to see timings for your case with just the first patch, and then with both. You may also be interested in: http://thread.gmane.org/gmane.comp.version-control.git/266370 which addresses another performance problem related to the freshen/recent code in v2.2. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] rev-list-options.txt: complete sentence about notes matching
Signed-off-by: Michael J Gruber g...@drmicha.warpmail.net --- Documentation/rev-list-options.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index f620ee4..77ac439 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -59,8 +59,8 @@ endif::git-rev-list[] matches any of the given patterns are chosen (but see `--all-match`). + -When `--show-notes` is in effect, the message from the notes as -if it is part of the log message. +When `--show-notes` is in effect, the message from the notes is +matched as if it were part of the log message. --all-match:: Limit the commits output to ones that match all given `--grep`, -- 2.4.0.rc2.251.gab67463 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
Tim Friske venit, vidit, dixit 17.04.2015 12:00: Hi, I wonder why git log -Gregexp works with the regexp-ignore-case option but not with the other regexp-related options? Wouldn't it be useful to make the Gregex option support the following options? * basic-regexp * extended-regexp * fixed-strings * perl-regexp Similarly I think it is not very consistent that one cannot combine any of the above options with the Sstring but instead have yet another option called pickaxe-regex to toggle between fixed-string and extended-regexp semantics for the argument passed to option S. The defaults are different, and it is likely that users want to switch one without switching the other. E.g., with -S you often use strings that you'd rather not have to quote to guard them against the regexp engine. The description of the above options in the git-log(1) manpage of Git version 2.1 do not explicitly say that they do not support the Gregex and Sstring option. They are in different sections, since --grep etc. are log options pertaining to matching the commit header and log message (commit object), while S and G match in the diff and are described in the diff section (although they are commit limitting as well). Wouldn't it be nice to have all of the above options collaborate with each other? I'm afraid it's important to keep the different defaults. Personally, I found it surprising that --regexp-ignore-case applies to -G at all. It turns out that it was bolted on retroactively - it used to apply to commit object greps only, and was made to switch also diff grep behaviour later, as a convenience matter. The reason probaly is that -S originally was directed at script usage and turned out to be used by end users quite a bit. I'd say most of our inconsistencies are due to convenience... If you want to work on this, I suggest you introduce the missing long option names such as --grep-diff (-G) and maybe --grep-log (--grep) first and then find consistent and convenient names and defaults for the regexp options. Michael -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix settings in default_user_config template
On Fri, Apr 17, 2015 at 05:50:10PM +0300, Ossi Herrala wrote: The name (not user) and email setting should be in config section user and not in core as documented in Documentation/config.txt. facepalm Well, those probably weren't helping any new users, then, were they? :) diff --git a/builtin/config.c b/builtin/config.c index d32c532..bfd3016 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -455,9 +455,9 @@ static char *default_user_config(void) struct strbuf buf = STRBUF_INIT; strbuf_addf(buf, _(# This is Git's per-user configuration file.\n - [core]\n + [user]\n # Please adapt and uncomment the following lines:\n - #user = %s\n + #name = %s\n #email = %s\n), Looks obviously correct. Thanks for noticing. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] type_from_string_gently: make sure length matches
When commit fe8e3b7 refactored type_from_string to allow input that was not NUL-terminated, it switched to using strncmp instead of strcmp. But this means we check only the first len bytes of the strings, and ignore any remaining bytes in the object_type_string. We should make sure that it is also len bytes, or else we would accept comm as commit, and so forth. Signed-off-by: Jeff King p...@peff.net --- Since the strings we are matching are literals, we could also record their sizes in the object_type_strings array and check the length first before even calling strncmp. I doubt this is a performance hot-spot, though. You could also potentially just use strlen(object_type_strings[i]), but I'm not sure if compilers will optimize out the strlen in this case, since it is in a loop. object.c | 3 ++- t/t1007-hash-object.sh | 8 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/object.c b/object.c index 23d6c96..980ac5f 100644 --- a/object.c +++ b/object.c @@ -41,7 +41,8 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle) len = strlen(str); for (i = 1; i ARRAY_SIZE(object_type_strings); i++) - if (!strncmp(str, object_type_strings[i], len)) + if (!strncmp(str, object_type_strings[i], len) + object_type_strings[i][len] == '\0') return i; if (gentle) diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh index f83df8e..ebb3a69 100755 --- a/t/t1007-hash-object.sh +++ b/t/t1007-hash-object.sh @@ -201,4 +201,12 @@ test_expect_success 'corrupt tag' ' test_must_fail git hash-object -t tag --stdin /dev/null ' +test_expect_success 'hash-object complains about bogus type name' ' + test_must_fail git hash-object -t bogus --stdin /dev/null +' + +test_expect_success 'hash-object complains about truncated type name' ' + test_must_fail git hash-object -t bl --stdin /dev/null +' + test_done -- 2.4.0.rc2.384.g7297a4a -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix settings in default_user_config template
The name (not user) and email setting should be in config section user and not in core as documented in Documentation/config.txt. --- builtin/config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/config.c b/builtin/config.c index d32c532..bfd3016 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -455,9 +455,9 @@ static char *default_user_config(void) struct strbuf buf = STRBUF_INIT; strbuf_addf(buf, _(# This is Git's per-user configuration file.\n - [core]\n + [user]\n # Please adapt and uncomment the following lines:\n - #user = %s\n + #name = %s\n #email = %s\n), ident_default_name(), ident_default_email()); -- 2.3.5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories
On Wed, Apr 15, 2015 at 7:56 PM, Junio C Hamano gits...@pobox.com wrote: Erik Elfström erik.elfst...@gmail.com writes: Before this change, clean used resolve_gitlink_ref to check for the presence of nested git repositories. This had the drawback of creating a ref_cache entry for every directory that should potentially be cleaned. The linear search through the ref_cache list caused a massive performance hit for large number of directories. I'd prefer to see the current state described in the current tense, e.g. git clean uses resolve_gitlink_ref() to check for the presence of nested git repositories, but it has the drawback of creating a ref_cache entry for every directory that should potentially be cleaned. The linear search through the ref_cache list causes a massive performance hit for large number of directories. Yes, that reads better. Teach clean.c:remove_dirs to use setup.c:is_git_directory instead. is_git_directory will actually open HEAD and parse the HEAD ref but this implies a nested git repository and should be rare when cleaning. I am not sure what you wanted to say in this paragraph. What does it being rare have to do with it? Even if it is not rare (i.e. the top-level project you are working with has many submodules checked out without using the more recent a file .git pointing into .git/modules/ via 'gitdir: $overThere' mechanism), if we found a nested git repository, we treat it as special and exclude it from cleaning it out, which is a good thing, no? I was trying to motivate that the performance of is_git_directory is not a problem for us even though it opens a file and parses it. I see now when I read it again that this is not very clear. Doesn't this implementation get confused by modern submodule checkouts and descend into and clean their working tree, though? Module M with path P would have a directory P in the working tree of the top-level project, and P/.git is a regular file that will fail is_git_directory() test but records the location of the real submodule repository i.e. .git/modules/M via the gitdir: mechanism. Yes, there is a problem here. I've added the test below and it fails after my change by cleaning sub2 (sub1 is not cleaned). Are there more cases here that I should test for? +test_expect_success 'should not clean submodules' ' + rm -fr repo to_clean sub1 sub2 + mkdir -p repo to_clean + ( + cd repo + git init + hello.world + git add . + git commit -a -m nested + ) + git submodule add ./repo/.git sub1 + git commit -m sub1 + git branch before_sub2 + git submodule add ./repo/.git sub2 + git commit -m sub2 + git checkout before_sub2 + to_clean/should_clean.this + git clean -f -d + test_path_is_file repo/.git/index + test_path_is_file repo/hello.world + test_path_is_file sub1/.git + test_path_is_file sub1/hello.world + test_path_is_file sub2/.git + test_path_is_file sub2/hello.world + test_path_is_missing to_clean +' Base on the previous discussion of the patch topic I can see 3 options for how to fix this: Option 1: Plug the hole in my new is_git_repository function. A quick and dirty fix that passes the above test would be: diff --git a/builtin/clean.c b/builtin/clean.c index b679913..4f2fe95 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -153,14 +153,21 @@ static int is_git_repository(struct strbuf *path) if (is_git_directory(path-buf)) ret = 1; else { - size_t orig_path_len = path-len; - assert(orig_path_len != 0); - if (path-buf[orig_path_len - 1] != '/') - strbuf_addch(path, '/'); - strbuf_addstr(path, .git); - if (is_git_directory(path-buf)) - ret = 1; - strbuf_setlen(path, orig_path_len); + struct stat st; + const char *submodule_git_dir = git_path_submodule(path-buf, ); + lstat(submodule_git_dir, st); + if (S_ISDIR(st.st_mode)) + ret = 1; + else { + size_t orig_path_len = path-len; + assert(orig_path_len != 0); + if (path-buf[orig_path_len - 1] != '/') + strbuf_addch(path, '/'); + strbuf_addstr(path, .git); + if (is_git_directory(path-buf)) + ret = 1; + strbuf_setlen(path, orig_path_len); + } } return ret; There are probably more elegant solutions available here, suggestions welcome. Option 2: Go with the current solution of using resolve_gitlink_ref but either A) avoid placing non-submodule paths in the ref_cache
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/17/2015 07:53 PM, Jeff King wrote: On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote: This _might_ have some performance impact in that strbuf_addch() involves strbuf_grow(*, 1), which does does it overflow to increment sb-len by one?; I would say it should be unmeasurable because the function is expected to be used only on loose objects and you shouldn't have very many of them without packing in your repository in the first place. I guess Peff's c1822d4f (strbuf: add an optimized 1-character strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his new strbuf_grow_ch(), and once that happens the performance worry would disappear without this code to be changed at all. I haven't re-rolled that series yet, but the discussion there showed that strbuf_grow_ch() is unnecessary; call-sites can just check: if (!strbuf_avail(sb)) strbuf_grow(sb, 1); to get the fast inline check. Since we go to the trouble to inline strbuf_addch, we should probably teach it the same trick. It would be nice to identify a place with a tight strbuf_addch() loop that could demonstrate the speed increase, though. Just for reference, I did teach strbuf_addch this trick. And the config code is the tight loop to test it with. Results are here: http://article.gmane.org/gmane.comp.version-control.git/267266 In this code path, we are typically only seeing short strings (the original code used a 10-byte static buffer). So I doubt it makes all that much difference. If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. I'm not sure I understand why we are copying it at all. The original code copied from the hdr into type[10] so that we could NUL-terminate it, which was required for type_from_string(). But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? Yes, we could, that would eliminate struct strbuf typename = STRBUF_INIT. Something like this perhaps : @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s * too permissive for what we want to check. So do an anal * object header parse by hand. */ -int parse_sha1_header(const char *hdr, unsigned long *sizep) +static int parse_sha1_header_extended(const char *hdr, struct object_info *oi, + unsigned int flags) { - char type[10]; - int i; + const char *buf = hdr; unsigned long size; + int type, len = 0; /* -* The type can be at most ten bytes (including the -* terminating '\0' that we add), and is followed by +* The type can be of any size but is followed by * a space. */ - i = 0; for (;;) { char c = *hdr++; if (c == ' ') break; - type[i++] = c; - if (i = sizeof(type)) - return -1; + len++; } - type[i] = 0; + + type = type_from_string_gently(buf, len, 1); + if (oi-typename) { + strbuf_add(oi-typename, buf, len); + strbuf_addch(oi-typename, '\0'); + } + /* +* Set type to 0 if its an unknown object and +* we're obtaining the type using '--literally' +* option. +*/ + if ((flags LOOKUP_LITERALLY) (type == -1)) + type = 0; + else if (type == -1) + die(invalid object type); + if (oi-typep) + *oi-typep = type; /* * The length must follow immediately, and be in canonical -Peff [1] Not related to your patch, but it looks like type_from_string_gently is overly lax. It does a strncmp() with the length of the candidate name, but does not check that we consumed all of the matching name. So tr would match tree, comm would match commit, and so forth. Thanks for the patch re-roll on strbuf_addch() and the patch on type_from_string_gently(). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear friend,
Dear friend, Please do accept my apologies as I do not wish to invade into your privacy, I had written an earlier mail to you but without response. My name is Coffi Kelly, I am a lawyer by profession based in Lome-Togo. I need your urgent assistance for the claiming of the sum of US$9.2 Million belonging to my(late client) , Which he deposited in a bank in Togo. He was my client and a major contract supplier to oil firms here in Togo. I want you to stand as the next of kin for the fund because my late client died in a motor accident with his immediate family without leaving a will or anybody to be presented to the bank as the next of kin of the fund. Indicate your interest by Writing back directly to my Email: ( eaaqu...@gmail.com) for more details due to the mandate given to me by the bank management. As you write include your phone number and email address for more details clarification and the way forward to the claim. I wait your response soonest. Best regards, Coffi Kelly, GOD BLESS -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote: But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? Yes, we could, that would eliminate struct strbuf typename = STRBUF_INIT. Something like this perhaps : Yeah, this is exactly what I had in mind. { - char type[10]; - int i; + const char *buf = hdr; unsigned long size; + int type, len = 0; Maybe switch the names of buf and len to type_buf and type_len to make their purpose more clear? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
Michael J Gruber g...@drmicha.warpmail.net writes: Similarly I think it is not very consistent that one cannot combine any of the above options with the Sstring but instead have yet another option called pickaxe-regex to toggle between fixed-string and extended-regexp semantics for the argument passed to option S. The defaults are different, and it is likely that users want to switch one without switching the other. E.g., with -S you often use strings that you'd rather not have to quote to guard them against the regexp engine. But the hypothetical -G that would look for a fixed string would be vastly different from -S, wouldn't it? The -Sstring option was invented to find a commit where one side of the comparison has that string in the blob and the other side does not; it shows commits where string appears different number of times in the before- and the after- blobs, because doing so does not hurt its primary use case to find commits where one side has one instance of string and the other side has zero. But -Gregexp shows commits whose git show $that_commit output would have lines matching regexp as added or deleted. So you get different results from this history: (before)(after) a b b a c c As git show for such a commit looks like this: diff --git a/one b/one index de98044..0c81c28 100644 --- a/one +++ b/one @@ -1,3 +1,3 @@ -a b +a c git log -Ga would say it is a match. But from git log -Sa's point of view, it is not a match; both sides have the same number of 'a' [*1*]. I think it would make sense to teach --fixed-strings or whatever option to -G just like it pays attention to ignore-case, but -G --fixed-strings cannot be -S. They have different semantics. [Footnote] *1* This is because -S was envisioned as (and its behaviour has been maintained as such) a building block for Porcelain that does more than git blame. You feed a _unique_ block of lines taken from the current contents as the string to quickly find the last commit that touched that area, and iteratively dig deeper. The -S option was meant to be used for that single step of digging, as a part of much more grand vision in $gmane/217, which I would still consider one of the most important messages on the mailing list, posted 10 years ago ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reachable: only mark local objects as recent
Jeff King p...@peff.net writes: On Fri, Mar 27, 2015 at 12:00:05PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: It is possible that we may drop an object that is depended upon by another object in the alternate. For example, imagine two repositories, A and B, with A pointing to B as an alternate. Now imagine a commit that is in B which references a tree that is only in A. Traversing from recent objects in B might prevent A from dropping that tree. But this case isn't worth covering. Repo B should take responsibility for its own objects. It would never have had the commit in the first place if it did not also have the tree, and assuming it is using the same keep recent chunks of history scheme, then it would itself keep the tree, as well. In other words, if you have a loop in dependency chain among alternate repositories, your set-up is broken by definition. Which makes sense to me. Thanks. I don't see this patch in pu or What's Cooking at all. Did it get dropped? It appears that way (rather, never picked up). Thanks for reminding. It does fix a performance regression, but the problem is in v2.2, so I don't think it's urgent for v2.4-rc. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
Michael J Gruber g...@drmicha.warpmail.net writes: Similarly I think it is not very consistent that one cannot combine any of the above options with the Sstring but instead have yet another option called pickaxe-regex to toggle between fixed-string and extended-regexp semantics for the argument passed to option S. The defaults are different, and it is likely that users want to switch one without switching the other. E.g., with -S you often use strings that you'd rather not have to quote to guard them against the regexp engine. But the hypothetical -G that would look for a fixed string would be vastly different from -S, wouldn't it? The -Sstring option was invented to find a commit where one side of the comparison has that string in the blob and the other side does not; it shows commits where string appears different number of times in the before- and the after- blobs, because doing so does not hurt its primary use case to find commits where one side has one instance of string and the other side has zero. But -Gregexp shows commits whose git show $that_commit output would have lines matching regexp as added or deleted. So you get different results from this history: (before)(after) a b b a c c As git show for such a commit looks like this: diff --git a/one b/one index de98044..0c81c28 100644 --- a/one +++ b/one @@ -1,3 +1,3 @@ -a b +a c git log -Ga would say it is a match. But from git log -Sa's point of view, it is not a match; both sides have the same number of 'a' [*1*]. I think it would make sense to teach --fixed-strings or whatever option to -G just like it pays attention to ignore-case, but -G --fixed-strings cannot be -S. They have different semantics. [Footnote] *1* This is because -S was envisioned as (and its behaviour has been maintained as such) a building block for Porcelain that does more than git blame. You feed a _unique_ block of lines taken from the current contents as the string to quickly find the last commit that touched that area, and iteratively dig deeper. The -S option was meant to be used for that single step of digging, as a part of much more grand vision in $gmane/217, which I would still consider one of the most important messages on the mailing list, posted 10 years ago ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Another approach to large transactions
Stefan Beller sbel...@google.com writes: * We keep the speed on small transactions (no close and reopen of fds in small transactions) * No refactoring for refs included, only minimally invasive to the refs.c code * applies on top of origin/sb/remove-fd-from-ref-lock replacing the last commit there (I reworded the commit message of the last patch of that tip, being the first patch in this series) * another approach would be to move the fd counting into the lock file api, I think that's not worth it for now. I agree that it is a good direction to go to limit the number of open file descriptors. Overall it looked good to me. Stefan Beller (3): refs.c: remove lock_fd from struct ref_lock Move unsigned int get_max_fd_limit(void) to git_compat_util.h refs.c: enable large transactions git-compat-util.h | 1 + refs.c| 28 ++-- sha1_file.c | 41 - t/t1400-update-ref.sh | 4 ++-- wrapper.c | 41 + 5 files changed, 62 insertions(+), 53 deletions(-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?
On Fri, Apr 17, 2015 at 7:26 AM, Michael J Gruber g...@drmicha.warpmail.net wrote: Similarly I think it is not very consistent that one cannot combine any of the above options with the Sstring but instead have yet another option called pickaxe-regex to toggle between fixed-string and extended-regexp semantics for the argument passed to option S. The defaults are different, and it is likely that users want to switch one without switching the other. E.g., with -S you often use strings that you'd rather not have to quote to guard them against the regexp engine. But the hypothetical -G that would look for a fixed string would be vastly different from -S, wouldn't it? The -Sstring option was invented to find a commit where one side of the comparison has that string in the blob and the other side does not; it shows commits where string appears different number of times in the before- and the after- blobs, because doing so does not hurt its primary use case to find commits where one side has one instance of string and the other side has zero. But -Gregexp shows commits whose git show $that_commit output would have lines matching regexp as added or deleted. So you get different results from this history: (before)(after) a b b a c c As git show for such a commit looks like this: diff --git a/one b/one index de98044..0c81c28 100644 --- a/one +++ b/one @@ -1,3 +1,3 @@ -a b +a c git log -Ga would say it is a match. But from git log -Sa's point of view, it is not a match; both sides have the same number of 'a' [*1*]. I think it would make sense to teach --fixed-strings or whatever option to -G just like it pays attention to ignore-case, but -G --fixed-strings cannot be -S. They have different semantics. [Footnote] *1* This is because -S was envisioned as (and its behaviour has been maintained as such) a building block for Porcelain that does more than git blame. You feed a _unique_ block of lines taken from the current contents as the string to quickly find the last commit that touched that area, and iteratively dig deeper. The -S option was meant to be used for that single step of digging, as a part of much more grand vision in $gmane/217, which I would still consider one of the most important messages on the mailing list, posted 10 years ago ;-) [jc: My mail provider seem to be queuing but not sending out SMTP outgoing traffic, so I am trying to (re)send this in an alternate route. If you got a duplicate of this message, my apologies.] -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF
On 2015-04-17 17.44, Torsten Bögershausen wrote: Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de Originally I wanted to have Dscho as Author, is that OK with you ? (But the From: line didn't made it through my email program) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix settings in default_user_config template
The name (not user) and email setting should be in config section user and not in core as documented in Documentation/config.txt. Signed-Off-By: Ossi Herrala oherr...@gmail.com Reviewed-by: Jeff King p...@peff.net -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix settings in default_user_config template
The name (not user) and email setting should be in config section user and not in core as documented in Documentation/config.txt. --- builtin/config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/config.c b/builtin/config.c index d32c532..bfd3016 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -455,9 +455,9 @@ static char *default_user_config(void) struct strbuf buf = STRBUF_INIT; strbuf_addf(buf, _(# This is Git's per-user configuration file.\n - [core]\n + [user]\n # Please adapt and uncomment the following lines:\n - #user = %s\n + #name = %s\n #email = %s\n), ident_default_name(), ident_default_email()); -- 2.3.5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] t0027: Cleanup: rename functions; avoid non-leading TABs
Make more clear what the tests are doing: commit_check_warn(): Commit files and checks for conversion warnings. Old name: create_file_in_repo() checkout_files(): Checkout files from the repo and check if they have the appropriate line endings in the work space. Old name: check_files_in_ws() Replace non-leading TABS with spaces Signed-off-by: Torsten Bögershausen tbo...@web.de --- Changes since v1: - patch 1 is the same - patch 2 is taken from Dscho (but slightly modified) Dscho, please tell if this is not OK with you - patch 3 temporally removed to simplify life, it will be send later t/t0027-auto-crlf.sh | 184 +-- 1 file changed, 92 insertions(+), 92 deletions(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 452320d..5858397 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -68,7 +68,7 @@ check_warning () { esac } -create_file_in_repo () { +commit_check_warn () { crlf=$1 attr=$2 lfname=$3 @@ -109,7 +109,7 @@ check_files_in_repo () { } -check_files_in_ws () { +checkout_files () { eol=$1 crlf=$2 attr=$3 @@ -169,40 +169,40 @@ test_expect_success 'setup master' ' warn_LF_CRLF=LF will be replaced by CRLF warn_CRLF_LF=CRLF will be replaced by LF -test_expect_success 'add files empty attr' ' - create_file_in_repo false - create_file_in_repo true LF_CRLF LF_CRLF - create_file_in_repo input CRLF_LF CRLF_LF +test_expect_success 'commit files empty attr' ' + commit_check_warn false + commit_check_warn true LF_CRLF LF_CRLF + commit_check_warn input CRLF_LF CRLF_LF ' -test_expect_success 'add files attr=auto' ' - create_file_in_repo false auto CRLF_LF CRLF_LF - create_file_in_repo true auto LF_CRLF LF_CRLF - create_file_in_repo input auto CRLF_LF CRLF_LF +test_expect_success 'commit files attr=auto' ' + commit_check_warn false auto CRLF_LF CRLF_LF + commit_check_warn true auto LF_CRLF LF_CRLF + commit_check_warn input auto CRLF_LF CRLF_LF ' -test_expect_success 'add files attr=text' ' - create_file_in_repo false text CRLF_LF CRLF_LF CRLF_LF - create_file_in_repo true text LF_CRLF LF_CRLF LF_CRLF - create_file_in_repo input text CRLF_LF CRLF_LF CRLF_LF +test_expect_success 'commit files attr=text' ' + commit_check_warn false text CRLF_LF CRLF_LF CRLF_LF + commit_check_warn true text LF_CRLF LF_CRLF LF_CRLF + commit_check_warn input text CRLF_LF CRLF_LF CRLF_LF ' -test_expect_success 'add files attr=-text' ' - create_file_in_repo false -text - create_file_in_repo true -text - create_file_in_repo input -text +test_expect_success 'commit files attr=-text' ' + commit_check_warn false -text + commit_check_warn true -text + commit_check_warn input -text ' -test_expect_success 'add files attr=lf' ' - create_file_in_repo false lf CRLF_LF CRLF_LF CRLF_LF - create_file_in_repo true lf CRLF_LF CRLF_LF CRLF_LF - create_file_in_repo input lf CRLF_LF CRLF_LF CRLF_LF +test_expect_success 'commit files attr=lf' ' + commit_check_warn false lf CRLF_LF CRLF_LF CRLF_LF + commit_check_warn true lf CRLF_LF CRLF_LF CRLF_LF + commit_check_warn input lf CRLF_LF CRLF_LF CRLF_LF ' -test_expect_success 'add files attr=crlf' ' - create_file_in_repo false crlf LF_CRLF LF_CRLF LF_CRLF - create_file_in_repo true crlf LF_CRLF LF_CRLF LF_CRLF - create_file_in_repo input crlf LF_CRLF LF_CRLF LF_CRLF +test_expect_success 'commit files attr=crlf' ' + commit_check_warn false crlf LF_CRLF LF_CRLF LF_CRLF + commit_check_warn true crlf LF_CRLF LF_CRLF LF_CRLF + commit_check_warn input crlf LF_CRLF LF_CRLF LF_CRLF ' test_expect_success 'create files cleanup' ' @@ -237,7 +237,7 @@ test_expect_success 'commit -text' ' # Check how files in the repo are changed when they are checked out # How to read the table below: -# - check_files_in_ws will check multiple files with a combination of settings +#
[PATCH v2 2/2] t0027: Support NATIVE_CRLF
Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de --- t/t0027-auto-crlf.sh | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 5858397..8975b97 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -57,15 +57,13 @@ create_gitattributes () { check_warning () { case $1 in - LF_CRLF) grep LF will be replaced by CRLF $2;; - CRLF_LF) grep CRLF will be replaced by LF $2;; - '') - expect - grep will be replaced by $2 actual - test_cmp expect actual - ;; - *) false ;; + LF_CRLF) echo warning: LF will be replaced by CRLF $2.expect ;; + CRLF_LF) echo warning: CRLF will be replaced by LF $2.expect ;; + '') $2.expect ;; + *) echo 2 Illegal 1: $1 ; return false ;; esac + grep will be replaced by $2 | sed -e s/\(.*\) in [^ ]*$/\1/ $2.actual + test_cmp $2.expect $2.actual } commit_check_warn () { @@ -169,6 +167,18 @@ test_expect_success 'setup master' ' warn_LF_CRLF=LF will be replaced by CRLF warn_CRLF_LF=CRLF will be replaced by LF +#WILC means Warn if (this OS) converts LF into CRLF +if test_have_prereq NATIVE_CRLF +then +WILC=LF_CRLF +WICL= +WAMIX=LF_CRLF +else +WILC= +WICL=CRLF_LF +WAMIX=CRLF_LF +fi + test_expect_success 'commit files empty attr' ' commit_check_warn false commit_check_warn true LF_CRLF LF_CRLF @@ -176,13 +186,13 @@ test_expect_success 'commit files empty attr' ' ' test_expect_success 'commit files attr=auto' ' - commit_check_warn false auto CRLF_LF CRLF_LF + commit_check_warn false auto $WILC $WICL$WAMIX commit_check_warn true auto LF_CRLF LF_CRLF commit_check_warn input auto CRLF_LF CRLF_LF ' test_expect_success 'commit files attr=text' ' - commit_check_warn false text CRLF_LF CRLF_LF CRLF_LF + commit_check_warn false text $WILC $WICL$WAMIX $WILC $WICL commit_check_warn true text LF_CRLF LF_CRLF LF_CRLF commit_check_warn input text CRLF_LF CRLF_LF CRLF_LF ' -- 2.2.0.rc1.790.ge19fcd2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fast-import: add options to enable/disable case folding
On 04/17/2015 01:52 PM, Mike Hommey wrote: Currently, fast-import does case folding depending on `core.ignorecase`. `core.ignorecase` depends on the file system where the working tree is. However, different kind of imports require different kinds of semantics, and they usually aren't tied with the file system, but with the data being imported. Good that you take up this issue, thanks for the patch More comments inline. Add command line options to enable or disable case folding. Also expose them as features in the fast-import stream. Features instead of options, because a stream that needs case folding enabled or disabled won't work as expected if fast-import doesn't support the case folding options. --- Documentation/git-fast-import.txt | 11 ++ fast-import.c | 19 -- t/t9300-fast-import.sh| 79 +++ 3 files changed, 106 insertions(+), 3 deletions(-) diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index 690fed3..22eba87 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -50,6 +50,13 @@ OPTIONS memory used by fast-import during this run. Showing this output is currently the default, but can be disabled with \--quiet. +--[no-]fold-case:: + When files/directories with the same name but a different case + are detected, they are treated as the same (--fold-case) or as + being different (--no-fold-case). The default is --fold-case + when `core.ignorecase` is set to `true`, and --no-fold-case when + it is `false`. + Most often the we use the term ignore-case, could that be a better name ? Other opinions, pros/cons ? Options for Frontends ~ @@ -1027,6 +1034,8 @@ date-format:: export-marks:: relative-marks:: no-relative-marks:: +fold-case:: +no-fold-case:: force:: Act as though the corresponding command-line option with a leading '--' was passed on the command line @@ -1091,6 +1100,8 @@ not be passed as option: * import-marks * export-marks * cat-blob-fd +* fold-case +* no-fold-case * force `done` diff --git a/fast-import.c b/fast-import.c index 6378726..958f3da 100644 --- a/fast-import.c +++ b/fast-import.c @@ -371,10 +371,18 @@ static volatile sig_atomic_t checkpoint_requested; /* Where to write output of cat-blob commands */ static int cat_blob_fd = STDOUT_FILENO; +/* Whether to enable case folding */ +static int fold_case; + static void parse_argv(void); static void parse_cat_blob(const char *p); static void parse_ls(const char *p, struct branch *b); +static int strncmp_foldcase(const char *a, const char *b, size_t count) +{ + return fold_case ? strncasecmp(a, b, count) : strncmp(a, b, count); +} + static void write_branch_report(FILE *rpt, struct branch *b) { fprintf(rpt, %s:\n, b-name); @@ -1507,7 +1515,7 @@ static int tree_content_set( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (!*slash1) { if (!S_ISDIR(mode) e-versions[1].mode == mode @@ -1597,7 +1605,7 @@ static int tree_content_remove( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (*slash1 !S_ISDIR(e-versions[1].mode)) /* * If p names a file in some subdirectory, and a @@ -1664,7 +1672,7 @@ static int tree_content_get( t = root-tree; for (i = 0; i t-entry_count; i++) { e = t-entries[i]; - if (e-name-str_len == n !strncmp_icase(p, e-name-str_dat, n)) { + if (e-name-str_len == n !strncmp_foldcase(p, e-name-str_dat, n)) { if (!*slash1) goto found_entry; if (!S_ISDIR(e-versions[1].mode)) @@ -3246,6 +3254,10 @@ static int parse_one_feature(const char *feature, int from_stream) relative_marks_paths = 1; } else if (!strcmp(feature, no-relative-marks)) { relative_marks_paths = 0; + } else if (!strcmp(feature, fold-case)) { + fold_case = 1; + } else if (!strcmp(feature, no-fold-case)) { + fold_case = 0; } else if (!strcmp(feature, done)) { require_explicit_termination = 1; } else if (!strcmp(feature, force)) { @@ -3372,6 +3384,7 @@ int main(int argc, char
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Jeff King p...@peff.net writes: If it's not a problem, I'd love to see timings for your case with just the first patch, and then with both. Thanks for two quick progress patches. You may also be interested in: http://thread.gmane.org/gmane.comp.version-control.git/266370 which addresses another performance problem related to the freshen/recent code in v2.2. That, too. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix settings in default_user_config template
Ossi Herrala oherr...@gmail.com writes: The name (not user) and email setting should be in config section user and not in core as documented in Documentation/config.txt. Signed-Off-By: Ossi Herrala oherr...@gmail.com Reviewed-by: Jeff King p...@peff.net Thanks ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories
Jeff King p...@peff.net writes: Option 1: Plug the hole in my new is_git_repository function. A quick and dirty fix that passes the above test would be: I think that makes sense. It would be nice if you could just call read_gitfile, but that function is very anxious to die on error. So the prerequisite step would probably be to refactor that into a read_gitfile_gently that returns an error code. I agree. I was looking at the repository discovery loop to see if it makes sense to update is-git-directory() to take a gitfile, but I do not think it is a good idea (typically after is-git-directory() says yes, we would append paths e.g. refs/heads/master after it to pass the result to system calls like open()). I agree that adding read-gitfile-gently and call it before running is-git-directory would be a good solution for this change. PS Thank you for working on this. That too. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF
Hi Torsten, On 2015-04-17 17:44, Torsten Bögershausen wrote: Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de Thank you so much! Dscho -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
karthik nayak karthik@gmail.com writes: + type = type_from_string_gently(buf, len, 1); + if (oi-typename) { + strbuf_add(oi-typename, buf, len); + strbuf_addch(oi-typename, '\0'); add() has setlen() at the end so you do not have to NUL terminate it yourself. Doing addch() is actively wrong, as oi-typename-len now counts the terminating NUL as part of the string, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Jeff King p...@peff.net writes: If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. Thanks for reminding me; yes, that also worried me. I'm not sure I understand why we are copying it at all. The original code copied from the hdr into type[10] so that we could NUL-terminate it, which was required for type_from_string(). Sounds like a good plan. But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? -Peff [1] Not related to your patch, but it looks like type_from_string_gently is overly lax. It does a strncmp() with the length of the candidate name, but does not check that we consumed all of the matching name. So tr would match tree, comm would match commit, and so forth. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fast-import: add options to enable/disable case folding
Torsten Bögershausen tbo...@web.de writes: +--[no-]fold-case:: +When files/directories with the same name but a different case +are detected, they are treated as the same (--fold-case) or as +being different (--no-fold-case). The default is --fold-case +when `core.ignorecase` is set to `true`, and --no-fold-case when +it is `false`. + Most often the we use the term ignore-case, could that be a better name ? Other opinions, pros/cons ? Yeah, --[no-]ignore-case sounds more in line with how other commands' options are spelled. But I somehow thought this case-folding was deliberately done as an improvement against the original that did not have a way to do the ignore-case? http://thread.gmane.org/gmane.comp.version-control.git/200597/focus=200625 I am not sure why not until now I did not find the original justification dubious, but I think fast-export should never do case folding---Joshua talks about working trees on a file system that is incapable of expressing different cases, but export is about reading in-repository histories, whose trees are fully capable of expressing paths in different cases just fine, and spitting out a file that can be processed by fast-import. I do not see why it should collapse two different paths that differ in case at export time. If the original history is broken by Perforce or whatever and recording the history of the same path in different case combinations in different commits, perhaps the right thing to do is to fix the original history in Git repository before exporting in the first place. I do not see how such a corruption is related to the characteristics of the filesystem where export is run. Perhaps a case-insensitive filesystem may helped Perforce to corrupt the history when initial import of the history into Git was done, but core.ignorecase of the current repository does not help us decide if that was actually the case---the import may have been done on a completely different machine. So perhaps we should rip the case folding out altogether instead? The entry for the change in the Release Notes may say: * git fast-import incorrectly case-folded the paths recorded in the history when core.ignorease is set (i.e. the repository's working tree is incapable of expressing paths that differ only in their cases); this old bug was reported in 2012 and was finally corrected. or something like that? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF
Hi Torsten, On 2015-04-17 19:00, Torsten Bögershausen wrote: On 2015-04-17 17.44, Torsten Bögershausen wrote: Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de Originally I wanted to have Dscho as Author, is that OK with you ? (But the From: line didn't made it through my email program) No worries. Dscho -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories
On Fri, Apr 17, 2015 at 08:15:40PM +0200, erik elfström wrote: Doesn't this implementation get confused by modern submodule checkouts and descend into and clean their working tree, though? Module M with path P would have a directory P in the working tree of the top-level project, and P/.git is a regular file that will fail is_git_directory() test but records the location of the real submodule repository i.e. .git/modules/M via the gitdir: mechanism. Yes, there is a problem here. I've added the test below and it fails after my change by cleaning sub2 (sub1 is not cleaned). Are there more cases here that I should test for? I wonder about the opposite case, too (finding more repos than we used to). It looks like your patches will find bare repositories in the tree, whereas the current code does not (it only cares about .git). AFAIK, submodules will never exist as bare in the working tree. And I have seen repositories which embed bare repos as test cases. Admittedly this is because I work on projects that are related to git itself, but I don't see a reason to regress this case if the submodule code doesn't get any benefit. Base on the previous discussion of the patch topic I can see 3 options for how to fix this: Option 1: Plug the hole in my new is_git_repository function. A quick and dirty fix that passes the above test would be: I think that makes sense. It would be nice if you could just call read_gitfile, but that function is very anxious to die on error. So the prerequisite step would probably be to refactor that into a read_gitfile_gently that returns an error code. -Peff PS Thank you for working on this. I have been quiet because I haven't had a chance to look over your patches carefully yet, but overall they look very promising. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fast-import: add options to enable/disable case folding
On Fri, Apr 17, 2015 at 06:56:43PM +0200, Torsten Bögershausen wrote: On 04/17/2015 01:52 PM, Mike Hommey wrote: +test_expect_success 'V: default case folding with ignorecase=true' ' + git config core.ignorecase true + git fast-import input + git ls-tree refs/heads/V actual + git update-ref -d refs/heads/V + cat expected \EOF +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 a +EOF + test_cmp expected actual' + +test_expect_success 'V: default case folding with ignorecase=false' ' + git config core.ignorecase false + git fast-import input + git ls-tree refs/heads/V actual + git update-ref -d refs/heads/V + cat expected \EOF +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 A +EOF + test_cmp expected actual' + +test_expect_success 'V: forced case folding with ignorecase=true' ' + git config core.ignorecase true + git fast-import --fold-case input + git ls-tree refs/heads/V actual + git update-ref -d refs/heads/V + cat expected \EOF +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85 a +EOF + test_cmp expected actual' If you want to make it shorter (and try to avoid repetition): The forced true cases could be collected in a loop. (and the same for forced=false) I was also going to suggest squashing the repetition. Here's what I had in mind: --- 8 --- test_foldcase() { ignore=$1 case $2 in true) fold=--fold-case folded=true ;; false) fold=--no-fold-case folded=false ;; *) fold= folded=$ignore ;; esac case $folded in true) folded=a ;; false) folded=A ;; esac test_expect_success V: case folding: ignorecase=$ignore${fold:+ $fold} git -c core.ignorecase=$ignore fast-import $fold input git ls-tree refs/heads/V actual git update-ref -d refs/heads/V cat expect -EOF 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85$folded EOF test_cmp expect actual } for o in '' true false do for c in true false do test_foldcase $c $o done done --- 8 --- which outputs: --- 8 --- ok 176 - V: case folding: ignorecase=true ok 177 - V: case folding: ignorecase=false ok 178 - V: case folding: ignorecase=true --fold-case ok 179 - V: case folding: ignorecase=false --fold-case ok 180 - V: case folding: ignorecase=true --no-fold-case ok 181 - V: case folding: ignorecase=false --no-fold-case --- 8 --- -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] log -L: improve error message on malformed argument
Matthieu Moy matthieu@imag.fr writes: The old message did not mention the :regex:file form. To avoid overly long lines, split the message into two lines (in case item-string is long, it will be the only part truncated in a narrow terminal). Signed-off-by: Matthieu Moy matthieu@imag.fr --- line-log.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/line-log.c b/line-log.c index a490efe..e725248 100644 --- a/line-log.c +++ b/line-log.c @@ -575,7 +575,8 @@ parse_lines(struct commit *commit, const char *prefix, struct string_list *args) name_part = skip_range_arg(item-string); if (!name_part || *name_part != ':' || !name_part[1]) - die(-L argument '%s' not of the form start,end:file, + die(invalid -L argument '%s'.\n + It should be of the form start,end:file or :regex:file., item-string); range_part = xstrndup(item-string, name_part - item-string); name_part++; I actually think start,end:file is more correct than your updated text. By adding :regex:file as a possibility, you are hinting that 'start' and 'end' are *not* regular expressions but numbers, but $ git log -L'/^int main/,/^}/:git.c' is a perfectly fine way to specify start (i.e. the first line that matches '^int main') and end (i.e. the first line that matches '^}' after that). Perhaps rephrase it as :funcname:file to avoid giving false impression to the other one, and use Eric's suggestion on top? die(-L argument not 'start,end:file' or ':funcname:file': %s, item-string); With the matching update to tests, here is what I'll queue on top of this patch for now, but please send in objections and improvements. Thanks. line-log.c | 3 +-- t/t4211-line-log.sh | 8 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/line-log.c b/line-log.c index 5d4cb7c..51d9f7c 100644 --- a/line-log.c +++ b/line-log.c @@ -543,8 +543,7 @@ parse_lines(struct commit *commit, const char *prefix, struct string_list *args) name_part = skip_range_arg(item-string); if (!name_part || *name_part != ':' || !name_part[1]) - die(invalid -L argument '%s'.\n - It should be of the form start,end:file or :regex:file., + die(-L argument not 'start,end:file' or ':funcname:file': %s, item-string); range_part = xstrndup(item-string, name_part - item-string); name_part++; diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh index 426a828..edd5ed3 100755 --- a/t/t4211-line-log.sh +++ b/t/t4211-line-log.sh @@ -40,14 +40,14 @@ canned_test -L 24,+1:a.c simple vanishes-early canned_test -L '/long f/,/^}/:b.c' move-support move-support-f test_bad_opts -L switch.*requires a value -test_bad_opts -L b.c argument.*not of the form -test_bad_opts -L 1: argument.*not of the form +test_bad_opts -L b.c argument not .start,end:file +test_bad_opts -L 1: argument not .start,end:file test_bad_opts -L 1:nonexistent There is no path test_bad_opts -L 1:simple There is no path -test_bad_opts -L '/foo:b.c' argument.*not of the form +test_bad_opts -L '/foo:b.c' argument not .start,end:file test_bad_opts -L 1000:b.c has only.*lines test_bad_opts -L 1,1000:b.c has only.*lines -test_bad_opts -L :b.c argument.*not of the form +test_bad_opts -L :b.c argument not .start,end:file test_bad_opts -L :foo:b.c no match test_done -- 2.4.0-rc2-183-g70401ab -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Jeff King p...@peff.net writes: On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. Thanks for reminding me; yes, that also worried me. As an aside, I worried about the extra allocation for reading the header in the first place. But it looks like we only do this on the --literally code path (and otherwise use the normal unpack_sha1_header). Still, I wonder if we could make this work automagically. That is, speculatively unpack the first N bytes, assuming we hit the end-of-header. If not, then go to a strbuf as the slow path. Then it would be fine to cover all cases; the normal ones would be fast, and only ridiculous things would incur the extra allocation. Yes, that was what I was hoping to see eventually ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] type_from_string_gently: make sure length matches
Jeff King p...@peff.net writes: I'd be surprised if it appreciably speeds things up, but I guess it is not too complicated to do. +static struct { +const char *str; +int len; +} object_type_name[] = { +{ NULL, 0 }, /* OBJ_NONE = 0 */ +{ commit, 6 }, /* OBJ_COMMIT = 1 */ +{ tree, 4 },/* OBJ_TREE = 2 */ +{ blob, 4 },/* OBJ_BLOB = 3 */ +{ tag, 3 }, /* OBJ_TAG = 4 */ }; I had envisioned a macro like: #define SIZED_STRING(x) { (x), (sizeof(x) - 1) } though perhaps that is overkill for such a short list (that we don't even expect to change). Sounds good (either way ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. Thanks for reminding me; yes, that also worried me. As an aside, I worried about the extra allocation for reading the header in the first place. But it looks like we only do this on the --literally code path (and otherwise use the normal unpack_sha1_header). Still, I wonder if we could make this work automagically. That is, speculatively unpack the first N bytes, assuming we hit the end-of-header. If not, then go to a strbuf as the slow path. Then it would be fine to cover all cases; the normal ones would be fast, and only ridiculous things would incur the extra allocation. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] type_from_string_gently: make sure length matches
On Fri, Apr 17, 2015 at 01:54:27PM -0700, Junio C Hamano wrote: Since the strings we are matching are literals, we could also record their sizes in the object_type_strings array and check the length first before even calling strncmp. I doubt this is a performance hot-spot, though. You could also potentially just use strlen(object_type_strings[i]), but I'm not sure if compilers will optimize out the strlen in this case, since it is in a loop. That thought crossed my mind while reading your patch. It could even make it go faster if we made object_type_strings into an array of counted strings (i.e. struct { const char *str; int len; }) and then took advantage of the fact that we have lengths of both. Right, that was what I meant. I'd be surprised if it appreciably speeds things up, but I guess it is not too complicated to do. +static struct { + const char *str; + int len; +} object_type_name[] = { + { NULL, 0 }, /* OBJ_NONE = 0 */ + { commit, 6 }, /* OBJ_COMMIT = 1 */ + { tree, 4 },/* OBJ_TREE = 2 */ + { blob, 4 },/* OBJ_BLOB = 3 */ + { tag, 3 }, /* OBJ_TAG = 4 */ }; I had envisioned a macro like: #define SIZED_STRING(x) { (x), (sizeof(x) - 1) } though perhaps that is overkill for such a short list (that we don't even expect to change). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF
Johannes Schindelin johannes.schinde...@gmx.de writes: Hi Torsten, On 2015-04-17 17:44, Torsten Bögershausen wrote: Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de Thank you so much! Dscho Thanks, is that Acked-by: Dscho for both patches? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF
On Fri, Apr 17, 2015 at 11:44 AM, Torsten Bögershausen tbo...@web.de wrote: Without this patch, t0027 expects the native end-of-lines to be a single line feed character. On Windows, however, we set it to a carriage return character followed by a line feed character. Thus, we have to modify t0027 to expect different warnings depending on the end-of-line markers. Adjust the check of the warnings and use these macros: WILC: Warn if LF becomes CRLF WICL: Warn if CRLF becomes LF WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF Improve the information given by check_warning(): Use test_cmp to show which warning is missing (or should'n t be there) s/should'n t/shouldn't/ Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de Signed-off-by: Torsten Bögershausen tbo...@web.de -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] type_from_string_gently: make sure length matches
Jeff King p...@peff.net writes: When commit fe8e3b7 refactored type_from_string to allow input that was not NUL-terminated, it switched to using strncmp instead of strcmp. But this means we check only the first len bytes of the strings, and ignore any remaining bytes in the object_type_string. We should make sure that it is also len bytes, or else we would accept comm as commit, and so forth. Signed-off-by: Jeff King p...@peff.net --- Since the strings we are matching are literals, we could also record their sizes in the object_type_strings array and check the length first before even calling strncmp. I doubt this is a performance hot-spot, though. You could also potentially just use strlen(object_type_strings[i]), but I'm not sure if compilers will optimize out the strlen in this case, since it is in a loop. That thought crossed my mind while reading your patch. It could even make it go faster if we made object_type_strings into an array of counted strings (i.e. struct { const char *str; int len; }) and then took advantage of the fact that we have lengths of both. object.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/object.c b/object.c index aedac24..51584ea 100644 --- a/object.c +++ b/object.c @@ -18,19 +18,22 @@ struct object *get_indexed_object(unsigned int idx) return obj_hash[idx]; } -static const char *object_type_strings[] = { - NULL, /* OBJ_NONE = 0 */ - commit, /* OBJ_COMMIT = 1 */ - tree, /* OBJ_TREE = 2 */ - blob, /* OBJ_BLOB = 3 */ - tag, /* OBJ_TAG = 4 */ +static struct { + const char *str; + int len; +} object_type_name[] = { + { NULL, 0 }, /* OBJ_NONE = 0 */ + { commit, 6 }, /* OBJ_COMMIT = 1 */ + { tree, 4 },/* OBJ_TREE = 2 */ + { blob, 4 },/* OBJ_BLOB = 3 */ + { tag, 3 }, /* OBJ_TAG = 4 */ }; const char *typename(unsigned int type) { - if (type = ARRAY_SIZE(object_type_strings)) + if (type = ARRAY_SIZE(object_type_name)) return NULL; - return object_type_strings[type]; + return object_type_name[type].str; } int type_from_string_gently(const char *str, ssize_t len, int gentle) @@ -40,8 +43,9 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle) if (len 0) len = strlen(str); - for (i = 1; i ARRAY_SIZE(object_type_strings); i++) - if (!strncmp(str, object_type_strings[i], len)) + for (i = 1; i ARRAY_SIZE(object_type_name); i++) + if (object_type_name[i].len == len + !strncmp(str, object_type_name[i].str, len)) return i; if (gentle) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] limit_list: avoid quadratic behavior from still_interesting
When we are limiting a rev-list traversal due to UNINTERESTING refs, we have to walk down the tips (both interesting and uninteresting) to find where they intersect. We keep a queue of commits to examine, pop commits off the queue one by one, and potentially add their parents. The size of the queue will naturally fluctuate based on the width of the history graph; i.e., the number of simultaneous lines of development. But for the most part it will stay in the same ballpark as the initial number of tips we fed, shrinking over time (as we hit common ancestors of the tips). So roughly speaking, if we start with `N` tips, we'll spend much of the time with a queue around `N` items. For each UNINTERESTING commit we pop, we call still_interesting to check whether marking its parents as UNINTERESTING has made the whole queue uninteresting (in which case we can quit early). Because the queue is stored as a linked list, this is `O(N)`, where `N` is the number of items in the queue. So processing a queue with `N` commits marked UNINTERESTING (and one or more interesting commits) will take `O(N^2)`. If you feed a lot of positive tips, this isn't a problem. They aren't UNINTERESTING, so they don't incur the still_interesting check. It also isn't a problem if you traverse from an interesting tip to some UNINTERESTING bases. We order the queue by recency, so the interesting commits stay at the front of the queue as we walk down them. The linear check can exit early as soon as it sees one interesting commit left in the queue. But if you want to know whether an older commit is reachable from a set of newer tips, we end up processing in the opposite direction: from the UNINTERESTING ones down to the interesting one. This may happen when we call: git rev-list $commits --not --all in check_everything_connected after a fetch. If we fetched something much older than most of our refs, and if we have a large number of refs, the traversal cost is dominated by the quadratic behavior. These commands simulate the connectivity check of such a fetch, when you have `$n` distinct refs in the receiver: # positive ref is 100,000 commits deep git rev-list --all | head -10 | tail -1 input # huge number of more recent negative refs git rev-list --all | head -$n | sed s/^/^/ input time git rev-list --stdin input Here are timings for various `n` on the linux.git repository. The `n=1` case provides a baseline for just walking the commits, which lets us see the still_interesting overhead. The times marked with `+` subtract that baseline to show just the extra time growth due to the large number of refs. The `x` numbers show the slowdown of the adjusted time versus the prior trial. n | before | after 1 | 0.991s | 0.848s 1 | 1.120s (+0.129s) | 0.885s (+0.037s) 2 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x) 4 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x) 8 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x) Each trial doubles `n`, so you can see the quadratic (`4x`) behavior before this patch. Afterwards, we have a roughly linear relationship. The implementation is fairly straightforward. Whenever we do the linear search, we cache the interesting commit we find, and next time check it before doing another linear search. If that commit is removed from the list or becomes UNINTERESTING itself, then we fall back to the linear search. This is very similar to the trick used by fce87ae (Fix quadratic performance in rewrite_one., 2008-07-12). I considered and rejected several possible alternatives: 1. Keep a count of UNINTERESTING commits in the queue. This requires managing the count not only when removing an item from the queue, but also when marking an item as UNINTERESTING. That requires touching the other functions which mark commits, and would require knowing quickly which commits are in the queue (lookup in the queue is linear, so we would need an auxiliary structure or to also maintain an IN_QUEUE flag in each commit object). 2. Keep a separate list of interesting commits. Drop items from it when they are dropped from the queue, or if they become UNINTERESTING. This again suffers from extra complexity to maintain the list, not to mention CPU and memory. 3. Use a better data structure for the queue. This is something that could help the fix in fce87ae, because we order the queue by recency, and it is about inserting quickly in recency order. So a normal priority queue would help there. But here, we cannot disturb the order of the queue, which makes things harder. We really do need an auxiliary index to track the flag we care about, which is basically option (2) above. The cache trick is simple, and the numbers above show that it works well in
Re: [PATCH 0/3] Another approach to large transactions
On Fri, Apr 17, 2015 at 3:12 PM, Junio C Hamano gits...@pobox.com wrote: This is now pushed out and sitting at the tip of 'pu'. It seems to break one of the tests in 1400 when merged to 'next', but I didn't look it closely. Thanks. ok, I'll look more closely. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] limit_list: avoid quadratic behavior from still_interesting
Jeff King p...@peff.net writes: The implementation is fairly straightforward. Whenever we do the linear search, we cache the interesting commit we find, and next time check it before doing another linear search. If that commit is removed from the list or becomes UNINTERESTING itself, then we fall back to the linear search. Nicely done, clever, simple and effective. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] git-p4: Use -m when running p4 changes
Simply running p4 changes on a large branch can result in a too many rows scanned error from the Perforce server. It is better to use a sequence of smaller calls to p4 changes, using the -m option to limit the size of each call. Signed-off-by: Lex Spoon l...@lexspoon.org Reviewed-by: Junio C Hamano gits...@pobox.com Reviewed-by: Luke Diamand l...@diamand.org --- Updated as suggested: - documentation added - avoided touch(1) - used test_seq - used || exit for test commands inside for loops - more tabs - fewer line breaks - expanded commit message Documentation/git-p4.txt | 17 ++--- git-p4.py| 54 +++- t/t9818-git-p4-block.sh | 64 3 files changed, 120 insertions(+), 15 deletions(-) create mode 100755 t/t9818-git-p4-block.sh diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt index a1664b9..82aa5d6 100644 --- a/Documentation/git-p4.txt +++ b/Documentation/git-p4.txt @@ -225,9 +225,20 @@ Git repository: they can find the p4 branches in refs/heads. --max-changes n:: - Limit the number of imported changes to 'n'. Useful to - limit the amount of history when using the '@all' p4 revision - specifier. + Import at most 'n' changes, rather than the entire range of + changes included in the given revision specifier. A typical + usage would be use '@all' as the revision specifier, but then + to use '--max-changes 1000' to import only the last 1000 + revisions rather than the entire revision history. + +--changes-block-size n:: + The internal block size to use when converting a revision + specifier such as '@all' into a list of specific change + numbers. Instead of using a single call to 'p4 changes' to + find the full list of changes for the conversion, there are a + sequence of calls to 'p4 changes -m', each of which requests + one block of changes of the given size. The default block size + is 500, which should usually be suitable. --keep-path:: The mapping of file names from the p4 depot path to Git, by diff --git a/git-p4.py b/git-p4.py index 549022e..1fba3aa 100755 --- a/git-p4.py +++ b/git-p4.py @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = refs/remotes/p4/, silent def originP4BranchesExist(): return gitBranchExists(origin) or gitBranchExists(origin/p4) or gitBranchExists(origin/p4/master) -def p4ChangesForPaths(depotPaths, changeRange): +def p4ChangesForPaths(depotPaths, changeRange, block_size): assert depotPaths -cmd = ['changes'] -for p in depotPaths: -cmd += [%s...%s % (p, changeRange)] -output = p4_read_pipe_lines(cmd) +assert block_size + +# Parse the change range into start and end +if changeRange is None or changeRange == '': +changeStart = '@1' +changeEnd = '#head' +else: +parts = changeRange.split(',') +assert len(parts) == 2 +changeStart = parts[0] +changeEnd = parts[1] +# Accumulate change numbers in a dictionary to avoid duplicates changes = {} -for line in output: -changeNum = int(line.split( )[1]) -changes[changeNum] = True + +for p in depotPaths: +# Retrieve changes a block at a time, to prevent running +# into a MaxScanRows error from the server. +start = changeStart +end = changeEnd +get_another_block = True +while get_another_block: +new_changes = [] +cmd = ['changes'] +cmd += ['-m', str(block_size)] +cmd += [%s...%s,%s % (p, start, end)] +for line in p4_read_pipe_lines(cmd): +changeNum = int(line.split( )[1]) +new_changes.append(changeNum) +changes[changeNum] = True +if len(new_changes) == block_size: +get_another_block = True +end = '@' + str(min(new_changes)) +else: +get_another_block = False changelist = changes.keys() changelist.sort() @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap): optparse.make_option(--import-labels, dest=importLabels, action=store_true), optparse.make_option(--import-local, dest=importIntoRemotes, action=store_false, help=Import into refs/heads/ , not refs/remotes), -optparse.make_option(--max-changes, dest=maxChanges), +optparse.make_option(--max-changes, dest=maxChanges, + help=Maximum number of changes to import), +optparse.make_option(--changes-block-size, dest=changes_block_size, type=int, + help=Internal block size to use when iteratively calling p4 changes), optparse.make_option(--keep-path,
Re: [PATCH 0/3] Another approach to large transactions
On Fri, Apr 17, 2015 at 3:17 PM, Stefan Beller sbel...@google.com wrote: On Fri, Apr 17, 2015 at 3:12 PM, Junio C Hamano gits...@pobox.com wrote: This is now pushed out and sitting at the tip of 'pu'. It seems to break one of the tests in 1400 when merged to 'next', but I didn't look it closely. Thanks. ok, I'll look more closely. Apparently I screwed up even before sending the patches over the wire. not ok 144 - large transaction deleting branches does not burst open file limit fails in my local branch as well as origin/pu as well on origin/sb/remove-fd-from-ref-lock So there is a pretty strong argument, the code is only improving large transaction creating branches and not deleting branches. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Another approach to large transactions
Junio C Hamano gits...@pobox.com writes: Stefan Beller sbel...@google.com writes: * We keep the speed on small transactions (no close and reopen of fds in small transactions) * No refactoring for refs included, only minimally invasive to the refs.c code * applies on top of origin/sb/remove-fd-from-ref-lock replacing the last commit there (I reworded the commit message of the last patch of that tip, being the first patch in this series) * another approach would be to move the fd counting into the lock file api, I think that's not worth it for now. I agree that it is a good direction to go to limit the number of open file descriptors. Overall it looked good to me. This is now pushed out and sitting at the tip of 'pu'. It seems to break one of the tests in 1400 when merged to 'next', but I didn't look it closely. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/4] UTF8 BOM follow-up
Am 16.04.2015 um 20:39 schrieb Junio C Hamano: This is on top of the .gitignore can start with UTF8 BOM patch from Carlos. Second try; the first patch is new to clarify the logic in the codeflow after Carlos's patch, and the second one has been adjusted accordingly. Junio C Hamano (4): add_excludes_from_file: clarify the bom skipping logic utf8-bom: introduce skip_utf8_bom() helper config: use utf8_bom[] from utf.[ch] in git_parse_source() attr: skip UTF8 BOM at the beginning of the input file Wouldn't it be better to just strip the BOM on commit, e.g. via a clean filter or pre-commit hook (as suggested in [1])? Or is this patch series only meant to supplement such a solution (i.e. only strip the BOM when reading files from the working-copy rather than the committed tree)? According to rfc3629 chapter 6 [2], the use of a BOM as encoding signature should be forbidden if the encoding is *known* to be always UTF-8. And .gitignore, .gitattributes and .gitmodules contain path names, which are always UTF-8 as of Git for Windows v1.7.10. IOW, allowing a BOM would mean that files *without* BOM are *not* UTF-8 and need to be decoded from e.g. system encoding (which unfortunately cannot be set to UTF-8 on Windows). But this makes no sense as the repository would not be portable. E.g. a .gitattributes file created on a Greek Windows, containing greek path names in Cp1253, would not work on platforms with different encoding. On the other hand, just ignoring the BOM (as this patch series does) leaves us with two alternative binary representations of the same content file...i.e. we'll eventually end up with spurious 1st line changes as users add / remove BOMs from committed .git[ignore|attributes|modules] files, depending on their editor preference... For local files (.gitconfig, .git/info/exclude, .git/COMMIT_EDITMSG...), auto-detecting encoding based on the presence of a BOM makes somewhat more sense. However, this will most likely break editors that follow the recommendation of the Unicode specification (Use of a BOM is neither required nor recommended for UTF-8 [3]). So we'd probably need a core.editorEncoding or core.editorUseBom setting to tell git whether no BOM means UTF-8 or system encoding... Just as a reminder: we should update the Git for Windows Unicode document [4] if we improve support for BOM-adamant editors. Cheers, Karsten [1] http://stackoverflow.com/questions/27223985/git-ignore-bom-prevent-git-diff-from-showing-byte-order-mark-changes [2] https://tools.ietf.org/html/rfc3629 [3] http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf p.40 [4] https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#editor -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak karthik@gmail.com wrote: Update sha1_loose_object_info() to optionally allow it to read from a loose object file of unknown/bogus type; as the function usually returns the type of the object it read in the form of enum for known types, add an optional typename field to receive the name of the type in textual form and a flag to indicate the reading of a loose object file of unknown/bogus type. [...] --- diff --git a/sha1_file.c b/sha1_file.c index 980ce6b..267399d 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1, } static int sha1_loose_object_info(const unsigned char *sha1, - struct object_info *oi) + struct object_info *oi, + int flags) { - int status; - unsigned long mapsize, size; + int status = 0; + unsigned long mapsize; void *map; git_zstream stream; char hdr[32]; + struct strbuf hdrbuf = STRBUF_INIT; if (oi-delta_base_sha1) hashclr(oi-delta_base_sha1); @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char *sha1, return -1; if (oi-disk_sizep) *oi-disk_sizep = mapsize; - if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 0) - status = error(unable to unpack %s header, - sha1_to_hex(sha1)); - else if ((status = parse_sha1_header(hdr, size)) 0) - status = error(unable to parse %s header, sha1_to_hex(sha1)); - else if (oi-sizep) - *oi-sizep = size; + if ((flags LOOKUP_LITERALLY)) { + if (unpack_sha1_header_to_strbuf(stream, map, mapsize, hdrbuf) 0) + status = error(unable to unpack %s header with --literally, + sha1_to_hex(sha1)); + else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) 0) + status = error(unable to parse %s header with --literally, + sha1_to_hex(sha1)); + } else { + if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 0) + status = error(unable to unpack %s header, + sha1_to_hex(sha1)); + else if ((status = parse_sha1_header_extended(hdr, oi, flags)) 0) + status = error(unable to parse %s header, sha1_to_hex(sha1)); + } git_inflate_end(stream); munmap(map, mapsize); - if (oi-typep) + if (status oi-typep) *oi-typep = status; + if (hdrbuf.buf) + strbuf_release(hdrbuf); Why is strbuf_release() protected by a conditional rather than being called unconditionally? Am I missing something obvious? return 0; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] t1006: add tests for git cat-file --literally
On 04/18/2015 05:30 AM, Eric Sunshine wrote: On Wed, Apr 15, 2015 at 1:00 PM, Karthik Nayak karthik@gmail.com wrote: Signed-off-by: Karthik Nayak karthik@gmail.com --- diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index ab36b1e..61fab78 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -296,4 +308,21 @@ test_expect_success '%(deltabase) reports packed delta bases' ' } ' +bogus_type=bogus +bogus_content=bogus +bogus_size=$(strlen $bogus_content) If someone ever changes the value of 'bogus_content' so it contains whitespace, then the result of strlen() will be incorrect as you've invoked it. You should quote its argument (as other callers in this script do) to safeguard against such an issue. bogus_size=$(strlen $bogus_content) +bogus_sha1=$(printf $bogus_content | git hash-object -t $bogus_type --literally -w --stdin) Ditto regarding quoting of 'bogus_content'. Also, if someone ever modifies 'bogus_content' so that it contains a literal '%' (such as %s), then your printf() invocation will misbehave. Either call it like this: $(printf '%s' $bogus_content | ...) or, better yet, call echo_without_newline() as other similar code elsewhere in this script does, and as suggested earlier[1]. [1]: http://article.gmane.org/gmane.comp.version-control.git/266972/ +test_expect_success Type of broken object is correct ' + echo $bogus_type expect + git cat-file -t --literally $bogus_sha1 actual + test_cmp expect actual +' + +test_expect_success Size of broken object is correct ' +echo $bogus_size expect Bad indentation. Use tab rather than spaces. + git cat-file -s --literally $bogus_sha1 actual + test_cmp expect actual +' + test_done -- 2.4.0.rc1.249.gb598846 Thanks Eric for the changes. I did echo -n at the beginning but that gave me a warning and asked me to use printf instead. I'll use echo_without_newline, Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
If it's not a problem, I'd love to see timings for your case with just the first patch, and then with both. Thanks for the swift response, much appreciated Jeff! Here are the timings for the two patches: Patch 1 on top of 33d4221c79 Elapsed System User Min. :6.110 Min. :0.6700 Min. :0.3600 1st Qu.:6.580 1st Qu.:0.6900 1st Qu.:0.3900 Median :7.260 Median :0.7100 Median :0.4100 Mean :7.347 Mean :0.7248 Mean :0.4214 3rd Qu.:8.000 3rd Qu.:0.7400 3rd Qu.:0.4600 Max. :8.860 Max. :0.8700 Max. :0.5100 I've had to slightly tweak your second patch (`freshened` was never set) but applying the modified patch yielded even better results compared to patch 1: Elapsed System User Min. :0.38 Min. :0.03000 Min. :0.2900 1st Qu.:0.38 1st Qu.:0.04000 1st Qu.:0.3100 Median :0.39 Median :0.06000 Median :0.3200 Mean :0.43 Mean :0.05667 Mean :0.3519 3rd Qu.:0.42 3rd Qu.:0.07000 3rd Qu.:0.3600 Max. :0.68 Max. :0.08000 Max. :0.5700 This is pretty much back to the before state. The graph really tells the whole story: https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png (After is the change in #33d4221, Before the parent of #33d4221 and so on) The graph and the NFS stats can be found here: https://bitbucket.org/snippets/ssaasen/GeRE My tweaked version of your second patch is: diff --git a/cache.h b/cache.h index 51ee856..8982055 100644 --- a/cache.h +++ b/cache.h @@ -1168,6 +1168,7 @@ extern struct packed_git { int pack_fd; unsigned pack_local:1, pack_keep:1, + freshened:1, do_not_close:1; unsigned char sha1[20]; /* something like .git/objects/pack/x.pack */ diff --git a/sha1_file.c b/sha1_file.c index bc6322e..c0ccd4b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1) static int freshen_packed_object(const unsigned char *sha1) { struct pack_entry e; - return find_pack_entry(sha1, e) freshen_file(e.p-pack_name); + if (!find_pack_entry(sha1, e)) + return 0; + if (e.p-freshened) + return 1; + return e.p-freshened = freshen_file(e.p-pack_name); } int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) The only change is that I assign the result of `freshen_file` to the `freshened` flag. I've only ran this with the test case I was using before but it looks like this is pretty much fixing the merge time changes we observed. Thanks again for the swift response. I've got my test setup sitting here, happy to rerun the tests if that'd be useful. Is there a chance to backport those changes to the 2.2+ branches? You may also be interested in: http://thread.gmane.org/gmane.comp.version-control.git/266370 which addresses another performance problem related to the freshen/recent code in v2.2. Thanks for the pointer, I'll have a look at that as well. Cheers, Stefan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] pathspec: adjust prefixlen after striping trailing slash
A path(spec) from git perspective consists of two parts, the prefix, and the rest. The prefix covers the part of `pwd` after expanding .. components. The split is to support case-insensitive match in a sane way (see 93d9353, especially the big comment block added in dir.c). Normally the prefix is found after prefix_path_gently() and that will be it. Unfortunately the *STRIP_SUBMODULE* flags can strip the trailing slash (see 2ce53f9 for the reason) and cut into this prefix part. In the test, the prefix is submodule/, but the final path is just submodule. We need to readjust prefix info when this happens. The assert() that catches this is turned to die() to make sure it's always active. prefix_pathspec() is not in any critical path to be a performance concern because of this change. 93d9353 (parse_pathspec: accept :(icase)path syntax - 2013-07-14) 2ce53f9 (git add: do not add files from a submodule - 2009-01-02) Noticed-by: djanos_ (via irc) Helped-by: Dennis Kaarsemaker den...@kaarsemaker.net Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- On Fri, Apr 17, 2015 at 2:27 AM, Jens Lehmann jens.lehm...@web.de wrote: The problem seems to be that subrepo is still registered as a submodule in the index. But we should see a proper error message instead of an assert in that case ... CCed Duy who knows much more about pathspec.c than me. This deals with the bug in pathspec code. I leave it to you to decide how git-add should do when a submodule is registered in index, but it's no longer a submodule in worktree. Yeah maybe it should error out. pathspec.c| 18 +++--- t/t3703-add-magic-pathspec.sh | 8 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/pathspec.c b/pathspec.c index 9304ee3..aa5e2c7 100644 --- a/pathspec.c +++ b/pathspec.c @@ -262,7 +262,6 @@ static unsigned prefix_pathspec(struct pathspec_item *item, } else item-original = elt; item-len = strlen(item-match); - item-prefix = prefixlen; if ((flags PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP) (item-len = 1 item-match[item-len - 1] == '/') @@ -292,6 +291,15 @@ static unsigned prefix_pathspec(struct pathspec_item *item, elt, ce_len, ce-name); } + /* +* Adjust prefixlen if the above trailing slash stripping cuts +* into the prefix part +*/ + if ((flags (PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP | + PATHSPEC_STRIP_SUBMODULE_SLASH_EXPENSIVE)) + prefixlen item-len) + prefixlen = item-len; + if (magic PATHSPEC_LITERAL) item-nowildcard_len = item-len; else { @@ -299,6 +307,7 @@ static unsigned prefix_pathspec(struct pathspec_item *item, if (item-nowildcard_len prefixlen) item-nowildcard_len = prefixlen; } + item-prefix = prefixlen; item-flags = 0; if (magic PATHSPEC_GLOB) { /* @@ -313,8 +322,11 @@ static unsigned prefix_pathspec(struct pathspec_item *item, } /* sanity checks, pathspec matchers assume these are sane */ - assert(item-nowildcard_len = item-len - item-prefix = item-len); + if (!(item-nowildcard_len = item-len + item-prefix = item-len)) + die(BUG: item-nowildcard_len (%d) or item-prefix (%d) +is longer than item-len (%d), + item-nowildcard_len, item-prefix, item-len); return magic; } diff --git a/t/t3703-add-magic-pathspec.sh b/t/t3703-add-magic-pathspec.sh index 5115de7..cced8c4 100755 --- a/t/t3703-add-magic-pathspec.sh +++ b/t/t3703-add-magic-pathspec.sh @@ -55,4 +55,12 @@ test_expect_success COLON_DIR 'a file with the same (short) magic name exists' ' git add -n ./:/bar ' +test_expect_success 'prefix is updated after trailing slash is stripped' ' + git init submodule + ( cd submodule test_commit test ) + git add submodule + mv submodule/.git submodule/dotgit + ( cd submodule git add . ) +' + test_done -- 2.3.0.rc1.137.g477eb31 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] t1006: add tests for git cat-file --literally
On Wed, Apr 15, 2015 at 1:00 PM, Karthik Nayak karthik@gmail.com wrote: Signed-off-by: Karthik Nayak karthik@gmail.com --- diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index ab36b1e..61fab78 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -296,4 +308,21 @@ test_expect_success '%(deltabase) reports packed delta bases' ' } ' +bogus_type=bogus +bogus_content=bogus +bogus_size=$(strlen $bogus_content) If someone ever changes the value of 'bogus_content' so it contains whitespace, then the result of strlen() will be incorrect as you've invoked it. You should quote its argument (as other callers in this script do) to safeguard against such an issue. bogus_size=$(strlen $bogus_content) +bogus_sha1=$(printf $bogus_content | git hash-object -t $bogus_type --literally -w --stdin) Ditto regarding quoting of 'bogus_content'. Also, if someone ever modifies 'bogus_content' so that it contains a literal '%' (such as %s), then your printf() invocation will misbehave. Either call it like this: $(printf '%s' $bogus_content | ...) or, better yet, call echo_without_newline() as other similar code elsewhere in this script does, and as suggested earlier[1]. [1]: http://article.gmane.org/gmane.comp.version-control.git/266972/ +test_expect_success Type of broken object is correct ' + echo $bogus_type expect + git cat-file -t --literally $bogus_sha1 actual + test_cmp expect actual +' + +test_expect_success Size of broken object is correct ' +echo $bogus_size expect Bad indentation. Use tab rather than spaces. + git cat-file -s --literally $bogus_sha1 actual + test_cmp expect actual +' + test_done -- 2.4.0.rc1.249.gb598846 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html