date:20150417

On Fri, Mar 27, 2015 at 12:00:05PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  It is possible that we may drop an object that is depended
  upon by another object in the alternate. For example,
  imagine two repositories, A and B, with A pointing to B as
  an alternate. Now imagine a commit that is in B which
  references a tree that is only in A. Traversing from recent
  objects in B might prevent A from dropping that tree. But
  this case isn't worth covering. Repo B should take
  responsibility for its own objects. It would never have had
  the commit in the first place if it did not also have the
  tree, and assuming it is using the same keep recent chunks
  of history scheme, then it would itself keep the tree, as
  well.
 
 In other words, if you have a loop in dependency chain among
 alternate repositories, your set-up is broken by definition.
 
 Which makes sense to me.
 
 Thanks.

I don't see this patch in pu or What's Cooking at all. Did it get
dropped?

It does fix a performance regression, but the problem is in v2.2, so I
don't think it's urgent for v2.4-rc.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available

On Thu, Apr 16, 2015 at 5:01 AM, Jeff King p...@peff.net wrote:
 We spend a lot of time in strbuf_getwholeline in a tight
 loop reading characters from a stdio handle into a buffer.
 The libc getdelim() function can do this for us with less
 overhead. It's in POSIX.1-2008, and was a GNU extension
 before that. Therefore we can't rely on it, but can fall
 back to the existing getc loop when it is not available.

 The HAVE_GETDELIM knob is turned on automatically for Linux,
 where we have glibc. We don't need to set any new
 feature-test macros, because we already define _GNU_SOURCE.
 Other systems that implement getdelim may need to other
 macros (probably _POSIX_C_SOURCE = 200809L), but we can
 address that along with setting the Makefile knob after
 testing the feature on those systems.
 [...]

 Based on a patch from Rasmus Villemoes r...@rasmusvillemoes.dk.

 Signed-off-by: Jeff King p...@peff.net
 ---
 If somebody has a FreeBSD or OS X system to test on, I'd
 love to see what is needed to compile with HAVE_GETDELIM
 there.

Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine
with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from
2009. It does not have getdelim(). Unfortunately, I haven't been able
to determine when getdelim() was introduced on the Mac OS X, thus have
been unable to craft a simple rule for config.mak.uname.

 And to confirm that the performance is much better.
 Sharing my 1.6GB packed-refs file would be hard, but you
 should be able to generate something large and ridiculous.
 I'll leave that as an exercise to the reader.

 diff --git a/Makefile b/Makefile
 index 5f3987f..36655d5 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -359,6 +359,8 @@ all::
  # compiler is detected to support it.
  #
  # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl 
 function.
 +#
 +# Define HAVE_GETDELIM if your system has the getdelim() function.

  GIT-VERSION-FILE: FORCE
 @$(SHELL_PATH) ./GIT-VERSION-GEN
 @@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL
 BASIC_CFLAGS += -DHAVE_BSD_SYSCTL
  endif

 +ifdef HAVE_GETDELIM
 +   BASIC_CFLAGS += -DHAVE_GETDELIM
 +endif
 +
  ifeq ($(TCLTK_PATH),)
  NO_TCLTK = NoThanks
  endif
 diff --git a/config.mak.uname b/config.mak.uname
 index f4e77cb..d26665f 100644
 --- a/config.mak.uname
 +++ b/config.mak.uname
 @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux)
 HAVE_DEV_TTY = YesPlease
 HAVE_CLOCK_GETTIME = YesPlease
 HAVE_CLOCK_MONOTONIC = YesPlease
 +   HAVE_GETDELIM = YesPlease
  endif
  ifeq ($(uname_S),GNU/kFreeBSD)
 HAVE_ALLOCA_H = YesPlease
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] fast-import: add options to enable/disable case folding

2015-04-17 Thread Mike Hommey

Currently, fast-import does case folding depending on `core.ignorecase`.
`core.ignorecase` depends on the file system where the working tree is.
However, different kind of imports require different kinds of semantics,
and they usually aren't tied with the file system, but with the data being
imported.

Add command line options to enable or disable case folding. Also expose
them as features in the fast-import stream. Features instead of options,
because a stream that needs case folding enabled or disabled won't work
as expected if fast-import doesn't support the case folding options.
---
 Documentation/git-fast-import.txt | 11 ++
 fast-import.c | 19 --
 t/t9300-fast-import.sh| 79 +++
 3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index 690fed3..22eba87 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -50,6 +50,13 @@ OPTIONS
memory used by fast-import during this run.  Showing this output
is currently the default, but can be disabled with \--quiet.
 
+--[no-]fold-case::
+   When files/directories with the same name but a different case
+   are detected, they are treated as the same (--fold-case) or as
+   being different (--no-fold-case). The default is --fold-case
+   when `core.ignorecase` is set to `true`, and --no-fold-case when
+   it is `false`.
+
 Options for Frontends
 ~
 
@@ -1027,6 +1034,8 @@ date-format::
 export-marks::
 relative-marks::
 no-relative-marks::
+fold-case::
+no-fold-case::
 force::
Act as though the corresponding command-line option with
a leading '--' was passed on the command line
@@ -1091,6 +1100,8 @@ not be passed as option:
 * import-marks
 * export-marks
 * cat-blob-fd
+* fold-case
+* no-fold-case
 * force
 
 `done`
diff --git a/fast-import.c b/fast-import.c
index 6378726..958f3da 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -371,10 +371,18 @@ static volatile sig_atomic_t checkpoint_requested;
 /* Where to write output of cat-blob commands */
 static int cat_blob_fd = STDOUT_FILENO;
 
+/* Whether to enable case folding */
+static int fold_case;
+
 static void parse_argv(void);
 static void parse_cat_blob(const char *p);
 static void parse_ls(const char *p, struct branch *b);
 
+static int strncmp_foldcase(const char *a, const char *b, size_t count)
+{
+   return fold_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
+}
+
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
fprintf(rpt, %s:\n, b-name);
@@ -1507,7 +1515,7 @@ static int tree_content_set(
t = root-tree;
for (i = 0; i  t-entry_count; i++) {
e = t-entries[i];
-   if (e-name-str_len == n  !strncmp_icase(p, 
e-name-str_dat, n)) {
+   if (e-name-str_len == n  !strncmp_foldcase(p, 
e-name-str_dat, n)) {
if (!*slash1) {
if (!S_ISDIR(mode)
 e-versions[1].mode == mode
@@ -1597,7 +1605,7 @@ static int tree_content_remove(
t = root-tree;
for (i = 0; i  t-entry_count; i++) {
e = t-entries[i];
-   if (e-name-str_len == n  !strncmp_icase(p, 
e-name-str_dat, n)) {
+   if (e-name-str_len == n  !strncmp_foldcase(p, 
e-name-str_dat, n)) {
if (*slash1  !S_ISDIR(e-versions[1].mode))
/*
 * If p names a file in some subdirectory, and a
@@ -1664,7 +1672,7 @@ static int tree_content_get(
t = root-tree;
for (i = 0; i  t-entry_count; i++) {
e = t-entries[i];
-   if (e-name-str_len == n  !strncmp_icase(p, 
e-name-str_dat, n)) {
+   if (e-name-str_len == n  !strncmp_foldcase(p, 
e-name-str_dat, n)) {
if (!*slash1)
goto found_entry;
if (!S_ISDIR(e-versions[1].mode))
@@ -3246,6 +3254,10 @@ static int parse_one_feature(const char *feature, int 
from_stream)
relative_marks_paths = 1;
} else if (!strcmp(feature, no-relative-marks)) {
relative_marks_paths = 0;
+   } else if (!strcmp(feature, fold-case)) {
+   fold_case = 1;
+   } else if (!strcmp(feature, no-fold-case)) {
+   fold_case = 0;
} else if (!strcmp(feature, done)) {
require_explicit_termination = 1;
} else if (!strcmp(feature, force)) {
@@ -3372,6 +3384,7 @@ int main(int argc, char **argv)
avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct 
avail_tree_content*));
marks = pool_calloc(1, sizeof(struct mark_set));
 
+   fold_case = ignore_case;
global_argc = argc;
global_argv = argv;
 
diff --git

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

  This _might_ have some performance impact in that strbuf_addch()
  involves strbuf_grow(*, 1), which does does it overflow to
  increment sb-len by one?; I would say it should be unmeasurable
  because the function is expected to be used only on loose objects
  and you shouldn't have very many of them without packing in your
  repository in the first place.
  
  I guess Peff's c1822d4f (strbuf: add an optimized 1-character
  strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
  new strbuf_grow_ch(), and once that happens the performance worry
  would disappear without this code to be changed at all.
 
 I haven't re-rolled that series yet, but the discussion there showed
 that strbuf_grow_ch() is unnecessary; call-sites can just check:
 
   if (!strbuf_avail(sb))
   strbuf_grow(sb, 1);
 
 to get the fast inline check. Since we go to the trouble to inline
 strbuf_addch, we should probably teach it the same trick. It would be
 nice to identify a place with a tight strbuf_addch() loop that could
 demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

  http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?

-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
is overly lax. It does a strncmp() with the length of the candidate
name, but does not check that we consumed all of the matching name.
So tr would match tree, comm would match commit, and so
forth.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:

 The merge is created in a temporary location that uses alternates. The
 temporary repository is on a local disk, the alternate object database
 on an NFS mount.

Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.

 My current hypothesis is that the additional `access`, but more
 importantly the additional `utime` calls are responsible in the
 increased merge times that we see.

Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, 
const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
-   if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+   if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }

I think that should clear up the access() calls, but leave the utime()
ones.

 Looking at the detailed strace shows that utime will be called
 repeatedly in same cases (e.g.
 https://bitbucket.org/snippets/ssaasen/oend shows an example where the
 same packfile will be updated more than 4000 times in a single merge).
 
 http://www.spinics.net/lists/git/msg240106.html discusses a potential
 improvement for this case. Would that be an acceptable avenue to
 improve this situation?

I think so. Here's a tentative patch:

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
 pack_keep:1,
+freshened:1,
 do_not_close:1;
unsigned char sha1[20];
/* something like .git/objects/pack/x.pack */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char 
*sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
struct pack_entry e;
-   return find_pack_entry(sha1, e)  freshen_file(e.p-pack_name);
+   if (!find_pack_entry(sha1, e))
+   return 0;
+   if (e.p-freshened)
+   return 1;
+   return freshen_file(e.p-pack_name);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, 
unsigned char *returnsha1)


If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.

You may also be interested in:

  http://thread.gmane.org/gmane.comp.version-control.git/266370

which addresses another performance problem related to the
freshen/recent code in v2.2.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] rev-list-options.txt: complete sentence about notes matching

2015-04-17 Thread Michael J Gruber

Signed-off-by: Michael J Gruber g...@drmicha.warpmail.net
---
 Documentation/rev-list-options.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index f620ee4..77ac439 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -59,8 +59,8 @@ endif::git-rev-list[]
matches any of the given patterns are chosen (but see
`--all-match`).
 +
-When `--show-notes` is in effect, the message from the notes as
-if it is part of the log message.
+When `--show-notes` is in effect, the message from the notes is
+matched as if it were part of the log message.
 
 --all-match::
Limit the commits output to ones that match all given `--grep`,
-- 
2.4.0.rc2.251.gab67463

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?

2015-04-17 Thread Michael J Gruber

Tim Friske venit, vidit, dixit 17.04.2015 12:00:
 Hi,
 
 I wonder why  git log -Gregexp works with the regexp-ignore-case
 option but not with the other regexp-related options? Wouldn't it be
 useful to make the Gregex option support the following options?
 
   * basic-regexp
   * extended-regexp
   * fixed-strings
   * perl-regexp
 
 Similarly I think it is not very consistent that one cannot combine any of
 the above options with the Sstring but instead have yet another option
 called pickaxe-regex to toggle between fixed-string and
 extended-regexp semantics for the argument passed to option S.

The defaults are different, and it is likely that users want to switch
one without switching the other.

E.g., with -S you often use strings that you'd rather not have to quote
to guard them against the regexp engine.

 The description of the above options in the git-log(1) manpage of Git
 version 2.1 do not explicitly say that they do not support the Gregex
 and Sstring option.

They are in different sections, since --grep etc. are log options
pertaining to matching the commit header and log message (commit
object), while S and G match in the diff and are described in the diff
section (although they are commit limitting as well).

 Wouldn't it be nice to have all of the above options collaborate with each
 other?

I'm afraid it's important to keep the different defaults.

Personally, I found it surprising that --regexp-ignore-case applies to
-G at all. It turns out that it was bolted on retroactively - it used
to apply to commit object greps only, and was made to switch also diff
grep behaviour later, as a convenience matter. The reason probaly is
that -S originally was directed at script usage and turned out to be
used by end users quite a bit.

I'd say most of our inconsistencies are due to convenience...

If you want to work on this, I suggest you introduce the missing long
option names such as --grep-diff (-G) and maybe --grep-log (--grep)
first and then find consistent and convenient names and defaults for the
regexp options.

Michael
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Fix settings in default_user_config template

On Fri, Apr 17, 2015 at 05:50:10PM +0300, Ossi Herrala wrote:

 The name (not user) and email setting should be in config section
 user and not in core as documented in Documentation/config.txt.

facepalm Well, those probably weren't helping any new users, then,
were they? :)

 diff --git a/builtin/config.c b/builtin/config.c
 index d32c532..bfd3016 100644
 --- a/builtin/config.c
 +++ b/builtin/config.c
 @@ -455,9 +455,9 @@ static char *default_user_config(void)
   struct strbuf buf = STRBUF_INIT;
   strbuf_addf(buf,
   _(# This is Git's per-user configuration file.\n
 -   [core]\n
 +   [user]\n
 # Please adapt and uncomment the following lines:\n
 -   #user = %s\n
 +   #name = %s\n
 #email = %s\n),

Looks obviously correct. Thanks for noticing.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] type_from_string_gently: make sure length matches

When commit fe8e3b7 refactored type_from_string to allow
input that was not NUL-terminated, it switched to using
strncmp instead of strcmp. But this means we check only the
first len bytes of the strings, and ignore any remaining
bytes in the object_type_string. We should make sure that it
is also len bytes, or else we would accept comm as
commit, and so forth.

Signed-off-by: Jeff King p...@peff.net
---
Since the strings we are matching are literals, we could also record
their sizes in the object_type_strings array and check the length first
before even calling strncmp. I doubt this is a performance hot-spot,
though.

You could also potentially just use strlen(object_type_strings[i]), but
I'm not sure if compilers will optimize out the strlen in this case,
since it is in a loop.

 object.c   | 3 ++-
 t/t1007-hash-object.sh | 8 
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/object.c b/object.c
index 23d6c96..980ac5f 100644
--- a/object.c
+++ b/object.c
@@ -41,7 +41,8 @@ int type_from_string_gently(const char *str, ssize_t len, int 
gentle)
len = strlen(str);
 
for (i = 1; i  ARRAY_SIZE(object_type_strings); i++)
-   if (!strncmp(str, object_type_strings[i], len))
+   if (!strncmp(str, object_type_strings[i], len) 
+   object_type_strings[i][len] == '\0')
return i;
 
if (gentle)
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f83df8e..ebb3a69 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -201,4 +201,12 @@ test_expect_success 'corrupt tag' '
test_must_fail git hash-object -t tag --stdin /dev/null
 '
 
+test_expect_success 'hash-object complains about bogus type name' '
+   test_must_fail git hash-object -t bogus --stdin /dev/null
+'
+
+test_expect_success 'hash-object complains about truncated type name' '
+   test_must_fail git hash-object -t bl --stdin /dev/null
+'
+
 test_done
-- 
2.4.0.rc2.384.g7297a4a
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Fix settings in default_user_config template

2015-04-17 Thread Ossi Herrala

The name (not user) and email setting should be in config section
user and not in core as documented in Documentation/config.txt.
---
 builtin/config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index d32c532..bfd3016 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -455,9 +455,9 @@ static char *default_user_config(void)
struct strbuf buf = STRBUF_INIT;
strbuf_addf(buf,
_(# This is Git's per-user configuration file.\n
- [core]\n
+ [user]\n
  # Please adapt and uncomment the following lines:\n
- #user = %s\n
+ #name = %s\n
  #email = %s\n),
ident_default_name(),
ident_default_email());
-- 
2.3.5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories

2015-04-17 Thread erik elfström

On Wed, Apr 15, 2015 at 7:56 PM, Junio C Hamano gits...@pobox.com wrote:
 Erik Elfström erik.elfst...@gmail.com writes:

 Before this change, clean used resolve_gitlink_ref to check for the
 presence of nested git repositories. This had the drawback of creating
 a ref_cache entry for every directory that should potentially be
 cleaned. The linear search through the ref_cache list caused a massive
 performance hit for large number of directories.

 I'd prefer to see the current state described in the current
 tense, e.g.

 git clean uses resolve_gitlink_ref() to check for the presence of
 nested git repositories, but it has the drawback of creating a
 ref_cache entry for every directory that should potentially be
 cleaned. The linear search through the ref_cache list causes a
 massive performance hit for large number of directories.


Yes, that reads better.

 Teach clean.c:remove_dirs to use setup.c:is_git_directory
 instead. is_git_directory will actually open HEAD and parse the HEAD
 ref but this implies a nested git repository and should be rare when
 cleaning.

 I am not sure what you wanted to say in this paragraph.  What does
 it being rare have to do with it?  Even if it is not rare (i.e. the
 top-level project you are working with has many submodules checked
 out without using the more recent a file .git pointing into
 .git/modules/ via 'gitdir: $overThere' mechanism), if we found a
 nested git repository, we treat it as special and exclude it from
 cleaning it out, which is a good thing, no?


I was trying to motivate that the performance of is_git_directory is not a
problem for us even though it opens a file and parses it. I see now when I
read it again that this is not very clear.

 Doesn't this implementation get confused by modern submodule
 checkouts and descend into and clean their working tree, though?
 Module M with path P would have a directory P in the working tree of
 the top-level project, and P/.git is a regular file that will fail
 is_git_directory() test but records the location of the real
 submodule repository i.e. .git/modules/M via the gitdir:
 mechanism.


Yes, there is a problem here. I've added the test below and it fails after
my change by cleaning sub2 (sub1 is not cleaned). Are there more cases
here that I should test for?

+test_expect_success 'should not clean submodules' '
+   rm -fr repo to_clean sub1 sub2 
+   mkdir -p repo to_clean 
+   (
+   cd repo 
+   git init 
+   hello.world
+   git add . 
+   git commit -a -m nested
+   ) 
+   git submodule add ./repo/.git sub1 
+   git commit -m sub1 
+   git branch before_sub2 
+   git submodule add ./repo/.git sub2 
+   git commit -m sub2 
+   git checkout before_sub2 
+   to_clean/should_clean.this 
+   git clean -f -d 
+   test_path_is_file repo/.git/index 
+   test_path_is_file repo/hello.world 
+   test_path_is_file sub1/.git 
+   test_path_is_file sub1/hello.world 
+   test_path_is_file sub2/.git 
+   test_path_is_file sub2/hello.world 
+   test_path_is_missing to_clean
+'

Base on the previous discussion of the patch topic I can see 3 options
for how to fix this:

Option 1:
 Plug the hole in my new is_git_repository function. A quick and dirty
 fix that passes the above test would be:

diff --git a/builtin/clean.c b/builtin/clean.c
index b679913..4f2fe95 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -153,14 +153,21 @@ static int is_git_repository(struct strbuf *path)
if (is_git_directory(path-buf))
ret = 1;
else {
-   size_t orig_path_len = path-len;
-   assert(orig_path_len != 0);
-   if (path-buf[orig_path_len - 1] != '/')
-   strbuf_addch(path, '/');
-   strbuf_addstr(path, .git);
-   if (is_git_directory(path-buf))
-   ret = 1;
-   strbuf_setlen(path, orig_path_len);
+   struct stat st;
+   const char *submodule_git_dir =
git_path_submodule(path-buf, );
+   lstat(submodule_git_dir, st);
+   if (S_ISDIR(st.st_mode))
+   ret = 1;
+   else {
+   size_t orig_path_len = path-len;
+   assert(orig_path_len != 0);
+   if (path-buf[orig_path_len - 1] != '/')
+   strbuf_addch(path, '/');
+   strbuf_addstr(path, .git);
+   if (is_git_directory(path-buf))
+   ret = 1;
+   strbuf_setlen(path, orig_path_len);
+   }
}

return ret;

There are probably more elegant solutions available here, suggestions
welcome.

Option 2:
 Go with the current solution of using resolve_gitlink_ref but either
 A) avoid placing non-submodule paths in the ref_cache

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread karthik nayak




On 04/17/2015 07:53 PM, Jeff King wrote:

On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

 This _might_ have some performance impact in that strbuf_addch()
 involves strbuf_grow(*, 1), which does does it overflow to
 increment sb-len by one?; I would say it should be unmeasurable
 because the function is expected to be used only on loose objects
 and you shouldn't have very many of them without packing in your
 repository in the first place.

 I guess Peff's c1822d4f (strbuf: add an optimized 1-character
 strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
 new strbuf_grow_ch(), and once that happens the performance worry
 would disappear without this code to be changed at all.

 I haven't re-rolled that series yet, but the discussion there showed
 that strbuf_grow_ch() is unnecessary; call-sites can just check:

if (!strbuf_avail(sb))
 strbuf_grow(sb, 1);

 to get the fast inline check. Since we go to the trouble to inline
 strbuf_addch, we should probably teach it the same trick. It would be
 nice to identify a place with a tight strbuf_addch() loop that could
 demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

   http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?


Yes, we could, that would eliminate  struct strbuf typename =
STRBUF_INIT.

Something like this perhaps :

@@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream
*stream, void *buffer, unsigned long s
   * too permissive for what we want to check. So do an anal
   * object header parse by hand.
   */
-int parse_sha1_header(const char *hdr, unsigned long *sizep)
+static int parse_sha1_header_extended(const char *hdr, struct
object_info *oi,
+  unsigned int flags)
  {
-   char type[10];
-   int i;
+   const char *buf = hdr;
 unsigned long size;
+   int type, len = 0;

 /*
-* The type can be at most ten bytes (including the
-* terminating '\0' that we add), and is followed by
+* The type can be of any size but is followed by
  * a space.
  */
-   i = 0;
 for (;;) {
 char c = *hdr++;
 if (c == ' ')
 break;
-   type[i++] = c;
-   if (i = sizeof(type))
-   return -1;
+   len++;
 }
-   type[i] = 0;
+
+   type = type_from_string_gently(buf, len, 1);
+   if (oi-typename) {
+   strbuf_add(oi-typename, buf, len);
+   strbuf_addch(oi-typename, '\0');
+   }
+   /*
+* Set type to 0 if its an unknown object and
+* we're obtaining the type using '--literally'
+* option.
+*/
+   if ((flags  LOOKUP_LITERALLY)  (type == -1))
+   type = 0;
+   else if (type == -1)
+   die(invalid object type);
+   if (oi-typep)
+   *oi-typep = type;

 /*
  * The length must follow immediately, and be in canonical



-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
 is overly lax. It does a strncmp() with the length of the candidate
 name, but does not check that we consumed all of the matching name.
 So tr would match tree, comm would match commit, and so
 forth.


Thanks for the patch re-roll on strbuf_addch() and the patch on
type_from_string_gently().
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dear friend,

2015-04-17 Thread anthonypeters

Dear friend,

Please do accept my apologies as I do not wish to invade into your
privacy, I had written an earlier mail to you but without response. My
name is Coffi Kelly, I am a lawyer by profession based in Lome-Togo. I
need your urgent assistance for the claiming of the sum of US$9.2
Million belonging to my(late client) , Which he deposited in a bank in
Togo. He was my client and a major contract supplier to oil firms here
in Togo.

I want you to stand as the next of kin for the fund because my late
client died in a motor accident with his immediate family without
leaving a will or anybody to be presented to the bank as the next of
kin of the fund.

Indicate your interest by Writing back directly to my Email:
( eaaqu...@gmail.com) for more details due to the mandate given to
me by the bank management. As you write include your phone number and
email address for more details clarification and the way forward to
the claim.

I wait your response soonest.

Best regards,

Coffi Kelly,
GOD BLESS
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote:

 But now we use type_from_string_gently, which can accept a length[1]. So
 we could just count the bytes to the first space and pass the original
 buffer along with that length, no?
 
 Yes, we could, that would eliminate  struct strbuf typename =
 STRBUF_INIT.
 
 Something like this perhaps :

Yeah, this is exactly what I had in mind.

   {
 -   char type[10];
 -   int i;
 +   const char *buf = hdr;
  unsigned long size;
 +   int type, len = 0;

Maybe switch the names of buf and len to type_buf and type_len
to make their purpose more clear?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?

Michael J Gruber g...@drmicha.warpmail.net writes:

 Similarly I think it is not very consistent that one cannot combine any of
 the above options with the Sstring but instead have yet another option
 called pickaxe-regex to toggle between fixed-string and
 extended-regexp semantics for the argument passed to option S.

 The defaults are different, and it is likely that users want to switch
 one without switching the other.

 E.g., with -S you often use strings that you'd rather not have to quote
 to guard them against the regexp engine.

But the hypothetical -G that would look for a fixed string would be
vastly different from -S, wouldn't it?

The -Sstring option was invented to find a commit where one side
of the comparison has that string in the blob and the other side
does not; it shows commits where string appears different number
of times in the before- and the after- blobs, because doing so does
not hurt its primary use case to find commits where one side has one
instance of string and the other side has zero.

But -Gregexp shows commits whose git show $that_commit output
would have lines matching regexp as added or deleted.  So you get
different results from this history:

(before)(after)
a   b  
b   a  
c   c 

As git show for such a commit looks like this:

diff --git a/one b/one
index de98044..0c81c28 100644
--- a/one
+++ b/one
@@ -1,3 +1,3 @@
-a
 b
+a
 c

git log -Ga would say it is a match.  But from git log -Sa's
point of view, it is not a match; both sides have the same number of
'a' [*1*].

I think it would make sense to teach --fixed-strings or whatever
option to -G just like it pays attention to ignore-case, but -G
--fixed-strings cannot be -S.  They have different semantics.


[Footnote]

*1* This is because -S was envisioned as (and its behaviour has been
maintained as such) a building block for Porcelain that does
more than git blame.  You feed a _unique_ block of lines taken
from the current contents as the string to quickly find the
last commit that touched that area, and iteratively dig deeper.
The -S option was meant to be used for that single step of
digging, as a part of much more grand vision in $gmane/217,
which I would still consider one of the most important messages
on the mailing list, posted 10 years ago ;-)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] reachable: only mark local objects as recent

Jeff King p...@peff.net writes:

 On Fri, Mar 27, 2015 at 12:00:05PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  It is possible that we may drop an object that is depended
  upon by another object in the alternate. For example,
  imagine two repositories, A and B, with A pointing to B as
  an alternate. Now imagine a commit that is in B which
  references a tree that is only in A. Traversing from recent
  objects in B might prevent A from dropping that tree. But
  this case isn't worth covering. Repo B should take
  responsibility for its own objects. It would never have had
  the commit in the first place if it did not also have the
  tree, and assuming it is using the same keep recent chunks
  of history scheme, then it would itself keep the tree, as
  well.
 
 In other words, if you have a loop in dependency chain among
 alternate repositories, your set-up is broken by definition.
 
 Which makes sense to me.
 
 Thanks.

 I don't see this patch in pu or What's Cooking at all. Did it get
 dropped?

It appears that way (rather, never picked up).  Thanks for
reminding.





 It does fix a performance regression, but the problem is in v2.2, so I
 don't think it's urgent for v2.4-rc.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?

Michael J Gruber g...@drmicha.warpmail.net writes:

 Similarly I think it is not very consistent that one cannot combine any of
 the above options with the Sstring but instead have yet another option
 called pickaxe-regex to toggle between fixed-string and
 extended-regexp semantics for the argument passed to option S.

 The defaults are different, and it is likely that users want to switch
 one without switching the other.

 E.g., with -S you often use strings that you'd rather not have to quote
 to guard them against the regexp engine.

But the hypothetical -G that would look for a fixed string would be
vastly different from -S, wouldn't it?

The -Sstring option was invented to find a commit where one side
of the comparison has that string in the blob and the other side
does not; it shows commits where string appears different number
of times in the before- and the after- blobs, because doing so does
not hurt its primary use case to find commits where one side has one
instance of string and the other side has zero.

But -Gregexp shows commits whose git show $that_commit output
would have lines matching regexp as added or deleted.  So you get
different results from this history:

(before)(after)
a   b  
b   a  
c   c 

As git show for such a commit looks like this:

diff --git a/one b/one
index de98044..0c81c28 100644
--- a/one
+++ b/one
@@ -1,3 +1,3 @@
-a
 b
+a
 c

git log -Ga would say it is a match.  But from git log -Sa's
point of view, it is not a match; both sides have the same number of
'a' [*1*].

I think it would make sense to teach --fixed-strings or whatever
option to -G just like it pays attention to ignore-case, but -G
--fixed-strings cannot be -S.  They have different semantics.


[Footnote]

*1* This is because -S was envisioned as (and its behaviour has been
maintained as such) a building block for Porcelain that does
more than git blame.  You feed a _unique_ block of lines taken
from the current contents as the string to quickly find the
last commit that touched that area, and iteratively dig deeper.
The -S option was meant to be used for that single step of
digging, as a part of much more grand vision in $gmane/217,
which I would still consider one of the most important messages
on the mailing list, posted 10 years ago ;-)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Another approach to large transactions

Stefan Beller sbel...@google.com writes:

 * We keep the speed on small transactions 
   (no close and reopen of fds in small transactions)

 * No refactoring for refs included, only minimally invasive to the refs.c code

 * applies on top of origin/sb/remove-fd-from-ref-lock replacing the last
   commit there (I reworded the commit message of the last patch of that tip,
   being the first patch in this series)
   
 * another approach would be to move the fd counting into the lock file api,
   I think that's not worth it for now.

I agree that it is a good direction to go to limit the number of
open file descriptors.  Overall it looked good to me.




 Stefan Beller (3):
   refs.c: remove lock_fd from struct ref_lock
   Move unsigned int get_max_fd_limit(void) to git_compat_util.h
   refs.c: enable large transactions

  git-compat-util.h |  1 +
  refs.c| 28 ++--
  sha1_file.c   | 41 -
  t/t1400-update-ref.sh |  4 ++--
  wrapper.c | 41 +
  5 files changed, 62 insertions(+), 53 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Why does git log -Gregex works with regexp-ignore-case but not with other regexp-related options?

On Fri, Apr 17, 2015 at 7:26 AM, Michael J Gruber
g...@drmicha.warpmail.net wrote:

 Similarly I think it is not very consistent that one cannot combine any of
 the above options with the Sstring but instead have yet another option
 called pickaxe-regex to toggle between fixed-string and
 extended-regexp semantics for the argument passed to option S.

 The defaults are different, and it is likely that users want to switch
 one without switching the other.

 E.g., with -S you often use strings that you'd rather not have to quote
 to guard them against the regexp engine.

But the hypothetical -G that would look for a fixed string would be
vastly different from -S, wouldn't it?

The -Sstring option was invented to find a commit where one side
of the comparison has that string in the blob and the other side
does not; it shows commits where string appears different number
of times in the before- and the after- blobs, because doing so does
not hurt its primary use case to find commits where one side has one
instance of string and the other side has zero.

But -Gregexp shows commits whose git show $that_commit output
would have lines matching regexp as added or deleted.  So you get
different results from this history:

(before)(after)
a   b
b   a
c   c

As git show for such a commit looks like this:

diff --git a/one b/one
index de98044..0c81c28 100644
--- a/one
+++ b/one
@@ -1,3 +1,3 @@
-a
 b
+a
 c

git log -Ga would say it is a match.  But from git log -Sa's
point of view, it is not a match; both sides have the same number of
'a' [*1*].

I think it would make sense to teach --fixed-strings or whatever
option to -G just like it pays attention to ignore-case, but -G
--fixed-strings cannot be -S.  They have different semantics.


[Footnote]

*1* This is because -S was envisioned as (and its behaviour has been
maintained as such) a building block for Porcelain that does
more than git blame.  You feed a _unique_ block of lines taken
from the current contents as the string to quickly find the
last commit that touched that area, and iteratively dig deeper.
The -S option was meant to be used for that single step of
digging, as a part of much more grand vision in $gmane/217,
which I would still consider one of the most important messages
on the mailing list, posted 10 years ago ;-)



[jc: My mail provider seem to be queuing but not sending out SMTP
outgoing traffic, so I am trying to (re)send this in an alternate route.
If you got a duplicate of this message, my apologies.]
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF

On 2015-04-17 17.44, Torsten Bögershausen wrote:
 Without this patch, t0027 expects the native end-of-lines to be a single
 line feed character. On Windows, however, we set it to a carriage return
 character followed by a line feed character. Thus, we have to modify
 t0027 to expect different warnings depending on the end-of-line markers.

 Adjust the check of the warnings and use these macros:
   WILC:  Warn if LF becomes CRLF
   WICL:  Warn if CRLF becomes LF
   WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF

 Improve the information given by check_warning():
 Use test_cmp to show which warning is missing (or should'n t be there)

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 Signed-off-by: Torsten Bögershausen tbo...@web.de
Originally I wanted to have Dscho as Author, is that OK with you ?
(But the From: line didn't made it through my email program)
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Fix settings in default_user_config template

2015-04-17 Thread Ossi Herrala

The name (not user) and email setting should be in config section
user and not in core as documented in Documentation/config.txt.

Signed-Off-By: Ossi Herrala oherr...@gmail.com
Reviewed-by: Jeff King p...@peff.net
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Fix settings in default_user_config template

2015-04-17 Thread Ossi Herrala

The name (not user) and email setting should be in config section
user and not in core as documented in Documentation/config.txt.
---
 builtin/config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index d32c532..bfd3016 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -455,9 +455,9 @@ static char *default_user_config(void)
struct strbuf buf = STRBUF_INIT;
strbuf_addf(buf,
_(# This is Git's per-user configuration file.\n
- [core]\n
+ [user]\n
  # Please adapt and uncomment the following lines:\n
- #user = %s\n
+ #name = %s\n
  #email = %s\n),
ident_default_name(),
ident_default_email());
-- 
2.3.5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/2] t0027: Cleanup: rename functions; avoid non-leading TABs

Make more clear what the tests are doing:
  commit_check_warn():
Commit files and checks for conversion warnings.
Old name: create_file_in_repo()

  checkout_files():
Checkout files from the repo and check if they have
the appropriate line endings in the work space.
Old name: check_files_in_ws()

Replace non-leading TABS with spaces

Signed-off-by: Torsten Bögershausen tbo...@web.de
---
Changes since v1:
- patch 1 is the same
- patch 2 is taken from Dscho (but slightly modified)
  Dscho, please tell if this is not OK with you 
- patch 3 temporally removed to simplify life, it will be send later


 t/t0027-auto-crlf.sh | 184 +--
 1 file changed, 92 insertions(+), 92 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 452320d..5858397 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -68,7 +68,7 @@ check_warning () {
esac
 }
 
-create_file_in_repo () {
+commit_check_warn () {
crlf=$1
attr=$2
lfname=$3
@@ -109,7 +109,7 @@ check_files_in_repo () {
 }
 
 
-check_files_in_ws () {
+checkout_files () {
eol=$1
crlf=$2
attr=$3
@@ -169,40 +169,40 @@ test_expect_success 'setup master' '
 warn_LF_CRLF=LF will be replaced by CRLF
 warn_CRLF_LF=CRLF will be replaced by LF
 
-test_expect_success 'add files empty attr' '
-   create_file_in_repo false 
  
-   create_file_in_repo true   LF_CRLF LF_CRLF
  
-   create_file_in_repo input  CRLF_LF CRLF_LF
 
+test_expect_success 'commit files empty attr' '
+   commit_check_warn false  
 
+   commit_check_warn true   LF_CRLF LF_CRLF 
 
+   commit_check_warn input  CRLF_LF CRLF_LF 

 '
 
-test_expect_success 'add files attr=auto' '
-   create_file_in_repo false auto CRLF_LF CRLF_LF
  
-   create_file_in_repo true  auto LF_CRLF LF_CRLF
  
-   create_file_in_repo input auto CRLF_LF CRLF_LF
 
+test_expect_success 'commit files attr=auto' '
+   commit_check_warn false auto CRLF_LF CRLF_LF 
 
+   commit_check_warn true  auto LF_CRLF LF_CRLF 
 
+   commit_check_warn input auto CRLF_LF CRLF_LF 

 '
 
-test_expect_success 'add files attr=text' '
-   create_file_in_repo false text CRLF_LF CRLF_LF
 CRLF_LF 
-   create_file_in_repo true  text LF_CRLF LF_CRLF 
LF_CRLF 
-   create_file_in_repo input text CRLF_LF CRLF_LF
 CRLF_LF
+test_expect_success 'commit files attr=text' '
+   commit_check_warn false text CRLF_LF CRLF_LF 
CRLF_LF 
+   commit_check_warn true  text LF_CRLF LF_CRLF LF_CRLF 

+   commit_check_warn input text CRLF_LF CRLF_LF 
CRLF_LF
 '
 
-test_expect_success 'add files attr=-text' '
-   create_file_in_repo false -text   
  
-   create_file_in_repo true  -text   
  
-   create_file_in_repo input -text   
 
+test_expect_success 'commit files attr=-text' '
+   commit_check_warn false -text
 
+   commit_check_warn true  -text
 
+   commit_check_warn input -text

 '
 
-test_expect_success 'add files attr=lf' '
-   create_file_in_repo false lf   CRLF_LF CRLF_LF
 CRLF_LF 
-   create_file_in_repo true  lf   CRLF_LF CRLF_LF
 CRLF_LF 
-   create_file_in_repo input lf   CRLF_LF CRLF_LF
 CRLF_LF
+test_expect_success 'commit files attr=lf' '
+   commit_check_warn false lf   CRLF_LF CRLF_LF 
CRLF_LF 
+   commit_check_warn true  lf   CRLF_LF CRLF_LF 
CRLF_LF 
+   commit_check_warn input lf   CRLF_LF CRLF_LF 
CRLF_LF
 '
 
-test_expect_success 'add files attr=crlf' '
-   create_file_in_repo false crlf LF_CRLF LF_CRLF 
LF_CRLF  
-   create_file_in_repo true  crlf LF_CRLF LF_CRLF 
LF_CRLF  
-   create_file_in_repo input crlf LF_CRLF LF_CRLF 
LF_CRLF 
+test_expect_success 'commit files attr=crlf' '
+   commit_check_warn false crlf LF_CRLF LF_CRLF LF_CRLF 
 
+   commit_check_warn true  crlf LF_CRLF LF_CRLF LF_CRLF 
 
+   commit_check_warn input crlf LF_CRLF LF_CRLF LF_CRLF 

 '
 
 test_expect_success 'create files cleanup' '
@@ -237,7 +237,7 @@ test_expect_success 'commit -text' '
 

 # Check how files in the repo are changed when they are checked out
 # How to read the table below:
-# - check_files_in_ws will check multiple files with a combination of settings
+#

[PATCH v2 2/2] t0027: Support NATIVE_CRLF

Without this patch, t0027 expects the native end-of-lines to be a single
line feed character. On Windows, however, we set it to a carriage return
character followed by a line feed character. Thus, we have to modify
t0027 to expect different warnings depending on the end-of-line markers.

Adjust the check of the warnings and use these macros:
  WILC:  Warn if LF becomes CRLF
  WICL:  Warn if CRLF becomes LF
  WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF

Improve the information given by check_warning():
Use test_cmp to show which warning is missing (or should'n t be there)

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
Signed-off-by: Torsten Bögershausen tbo...@web.de
---
 t/t0027-auto-crlf.sh | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 5858397..8975b97 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -57,15 +57,13 @@ create_gitattributes () {
 
 check_warning () {
case $1 in
-   LF_CRLF) grep LF will be replaced by CRLF $2;;
-   CRLF_LF) grep CRLF will be replaced by LF $2;;
-   '')
-   expect
-   grep will be replaced by $2 actual
-   test_cmp expect actual
-   ;;
-   *) false ;;
+   LF_CRLF) echo warning: LF will be replaced by CRLF $2.expect ;;
+   CRLF_LF) echo warning: CRLF will be replaced by LF $2.expect ;;
+   '')  $2.expect ;;
+   *) echo 2 Illegal 1: $1 ; return false ;;
esac
+   grep will be replaced by $2 | sed -e s/\(.*\) in [^ ]*$/\1/ 
$2.actual
+   test_cmp $2.expect $2.actual
 }
 
 commit_check_warn () {
@@ -169,6 +167,18 @@ test_expect_success 'setup master' '
 warn_LF_CRLF=LF will be replaced by CRLF
 warn_CRLF_LF=CRLF will be replaced by LF
 
+#WILC means Warn if (this OS) converts LF into CRLF
+if test_have_prereq NATIVE_CRLF
+then
+WILC=LF_CRLF
+WICL=
+WAMIX=LF_CRLF
+else
+WILC=
+WICL=CRLF_LF
+WAMIX=CRLF_LF
+fi
+
 test_expect_success 'commit files empty attr' '
commit_check_warn false  
 
commit_check_warn true   LF_CRLF LF_CRLF 
 
@@ -176,13 +186,13 @@ test_expect_success 'commit files empty attr' '
 '
 
 test_expect_success 'commit files attr=auto' '
-   commit_check_warn false auto CRLF_LF CRLF_LF 
 
+   commit_check_warn false auto $WILC  $WICL$WAMIX  
 
commit_check_warn true  auto LF_CRLF LF_CRLF 
 
commit_check_warn input auto CRLF_LF CRLF_LF 

 '
 
 test_expect_success 'commit files attr=text' '
-   commit_check_warn false text CRLF_LF CRLF_LF 
CRLF_LF 
+   commit_check_warn false text $WILC  $WICL$WAMIX  $WILC  
$WICL 
commit_check_warn true  text LF_CRLF LF_CRLF LF_CRLF 

commit_check_warn input text CRLF_LF CRLF_LF 
CRLF_LF
 '
-- 
2.2.0.rc1.790.ge19fcd2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fast-import: add options to enable/disable case folding




On 04/17/2015 01:52 PM, Mike Hommey wrote:
 Currently, fast-import does case folding depending on `core.ignorecase`.
 `core.ignorecase` depends on the file system where the working tree is.
 However, different kind of imports require different kinds of semantics,
 and they usually aren't tied with the file system, but with the data being
 imported.
Good that you take up this issue, thanks for the patch
More comments inline.
 Add command line options to enable or disable case folding. Also expose
 them as features in the fast-import stream. Features instead of options,
 because a stream that needs case folding enabled or disabled won't work
 as expected if fast-import doesn't support the case folding options.
 ---
  Documentation/git-fast-import.txt | 11 ++
  fast-import.c | 19 --
  t/t9300-fast-import.sh| 79 
 +++
  3 files changed, 106 insertions(+), 3 deletions(-)

 diff --git a/Documentation/git-fast-import.txt 
 b/Documentation/git-fast-import.txt
 index 690fed3..22eba87 100644
 --- a/Documentation/git-fast-import.txt
 +++ b/Documentation/git-fast-import.txt
 @@ -50,6 +50,13 @@ OPTIONS
   memory used by fast-import during this run.  Showing this output
   is currently the default, but can be disabled with \--quiet.
  
 +--[no-]fold-case::
 + When files/directories with the same name but a different case
 + are detected, they are treated as the same (--fold-case) or as
 + being different (--no-fold-case). The default is --fold-case
 + when `core.ignorecase` is set to `true`, and --no-fold-case when
 + it is `false`.
 +
Most often the we use the term ignore-case, could that be a better name ?
Other opinions, pros/cons  ?

  Options for Frontends
  ~
  
 @@ -1027,6 +1034,8 @@ date-format::
  export-marks::
  relative-marks::
  no-relative-marks::
 +fold-case::
 +no-fold-case::
  force::
   Act as though the corresponding command-line option with
   a leading '--' was passed on the command line
 @@ -1091,6 +1100,8 @@ not be passed as option:
  * import-marks
  * export-marks
  * cat-blob-fd
 +* fold-case
 +* no-fold-case
  * force
  
  `done`
 diff --git a/fast-import.c b/fast-import.c
 index 6378726..958f3da 100644
 --- a/fast-import.c
 +++ b/fast-import.c
 @@ -371,10 +371,18 @@ static volatile sig_atomic_t checkpoint_requested;
  /* Where to write output of cat-blob commands */
  static int cat_blob_fd = STDOUT_FILENO;
  
 +/* Whether to enable case folding */
 +static int fold_case;
 +
  static void parse_argv(void);
  static void parse_cat_blob(const char *p);
  static void parse_ls(const char *p, struct branch *b);
  
 +static int strncmp_foldcase(const char *a, const char *b, size_t count)
 +{
 + return fold_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
 +}
 +
  static void write_branch_report(FILE *rpt, struct branch *b)
  {
   fprintf(rpt, %s:\n, b-name);
 @@ -1507,7 +1515,7 @@ static int tree_content_set(
   t = root-tree;
   for (i = 0; i  t-entry_count; i++) {
   e = t-entries[i];
 - if (e-name-str_len == n  !strncmp_icase(p, 
 e-name-str_dat, n)) {
 + if (e-name-str_len == n  !strncmp_foldcase(p, 
 e-name-str_dat, n)) {
   if (!*slash1) {
   if (!S_ISDIR(mode)
e-versions[1].mode == mode
 @@ -1597,7 +1605,7 @@ static int tree_content_remove(
   t = root-tree;
   for (i = 0; i  t-entry_count; i++) {
   e = t-entries[i];
 - if (e-name-str_len == n  !strncmp_icase(p, 
 e-name-str_dat, n)) {
 + if (e-name-str_len == n  !strncmp_foldcase(p, 
 e-name-str_dat, n)) {
   if (*slash1  !S_ISDIR(e-versions[1].mode))
   /*
* If p names a file in some subdirectory, and a
 @@ -1664,7 +1672,7 @@ static int tree_content_get(
   t = root-tree;
   for (i = 0; i  t-entry_count; i++) {
   e = t-entries[i];
 - if (e-name-str_len == n  !strncmp_icase(p, 
 e-name-str_dat, n)) {
 + if (e-name-str_len == n  !strncmp_foldcase(p, 
 e-name-str_dat, n)) {
   if (!*slash1)
   goto found_entry;
   if (!S_ISDIR(e-versions[1].mode))
 @@ -3246,6 +3254,10 @@ static int parse_one_feature(const char *feature, int 
 from_stream)
   relative_marks_paths = 1;
   } else if (!strcmp(feature, no-relative-marks)) {
   relative_marks_paths = 0;
 + } else if (!strcmp(feature, fold-case)) {
 + fold_case = 1;
 + } else if (!strcmp(feature, no-fold-case)) {
 + fold_case = 0;
   } else if (!strcmp(feature, done)) {
   require_explicit_termination = 1;
   } else if (!strcmp(feature, force)) {
 @@ -3372,6 +3384,7 @@ int main(int argc, char

Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

Jeff King p...@peff.net writes:

 If it's not a problem, I'd love to see timings for your case with just
 the first patch, and then with both.

Thanks for two quick progress patches.

 You may also be interested in:

   http://thread.gmane.org/gmane.comp.version-control.git/266370

 which addresses another performance problem related to the
 freshen/recent code in v2.2.

That, too.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Fix settings in default_user_config template

Ossi Herrala oherr...@gmail.com writes:

 The name (not user) and email setting should be in config section
 user and not in core as documented in Documentation/config.txt.

 Signed-Off-By: Ossi Herrala oherr...@gmail.com
 Reviewed-by: Jeff King p...@peff.net

Thanks ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories

Jeff King p...@peff.net writes:

 Option 1:
  Plug the hole in my new is_git_repository function. A quick and dirty
  fix that passes the above test would be:

 I think that makes sense. It would be nice if you could just call
 read_gitfile, but that function is very anxious to die on error. So the
 prerequisite step would probably be to refactor that into a
 read_gitfile_gently that returns an error code.

I agree.

I was looking at the repository discovery loop to see if it makes
sense to update is-git-directory() to take a gitfile, but I do not
think it is a good idea (typically after is-git-directory() says
yes, we would append paths e.g. refs/heads/master after it to
pass the result to system calls like open()).  I agree that adding
read-gitfile-gently and call it before running is-git-directory
would be a good solution for this change.

 PS Thank you for working on this.

That too.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF

2015-04-17 Thread Johannes Schindelin

Hi Torsten,

On 2015-04-17 17:44, Torsten Bögershausen wrote:
 Without this patch, t0027 expects the native end-of-lines to be a single
 line feed character. On Windows, however, we set it to a carriage return
 character followed by a line feed character. Thus, we have to modify
 t0027 to expect different warnings depending on the end-of-line markers.
 
 Adjust the check of the warnings and use these macros:
   WILC:  Warn if LF becomes CRLF
   WICL:  Warn if CRLF becomes LF
   WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF
 
 Improve the information given by check_warning():
 Use test_cmp to show which warning is missing (or should'n t be there)
 
 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 Signed-off-by: Torsten Bögershausen tbo...@web.de

Thank you so much!
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

karthik nayak karthik@gmail.com writes:

 +   type = type_from_string_gently(buf, len, 1);
 +   if (oi-typename) {
 +   strbuf_add(oi-typename, buf, len);
 +   strbuf_addch(oi-typename, '\0');

add() has setlen() at the end so you do not have to NUL terminate it
yourself.  Doing addch() is actively wrong, as oi-typename-len now
counts the terminating NUL as part of the string, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

Jeff King p...@peff.net writes:

 If there _is_ a performance implication to worry about here, I think it
 would be that we are doing an extra malloc/free.

Thanks for reminding me; yes, that also worried me.

 I'm not sure I
 understand why we are copying it at all. The original code copied from
 the hdr into type[10] so that we could NUL-terminate it, which was
 required for type_from_string().

Sounds like a good plan.


 But now we use type_from_string_gently, which can accept a length[1]. So
 we could just count the bytes to the first space and pass the original
 buffer along with that length, no?

 -Peff

 [1] Not related to your patch, but it looks like type_from_string_gently
 is overly lax. It does a strncmp() with the length of the candidate
 name, but does not check that we consumed all of the matching name.
 So tr would match tree, comm would match commit, and so
 forth.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fast-import: add options to enable/disable case folding

Torsten Bögershausen tbo...@web.de writes:

 +--[no-]fold-case::
 +When files/directories with the same name but a different case
 +are detected, they are treated as the same (--fold-case) or as
 +being different (--no-fold-case). The default is --fold-case
 +when `core.ignorecase` is set to `true`, and --no-fold-case when
 +it is `false`.
 +
 Most often the we use the term ignore-case, could that be a better name ?
 Other opinions, pros/cons  ?

Yeah, --[no-]ignore-case sounds more in line with how other
commands' options are spelled.

But I somehow thought this case-folding was deliberately done as
an improvement against the original that did not have a way to do
the ignore-case?

http://thread.gmane.org/gmane.comp.version-control.git/200597/focus=200625

I am not sure why not until now I did not find the original
justification dubious, but I think fast-export should never do case
folding---Joshua talks about working trees on a file system that is
incapable of expressing different cases, but export is about
reading in-repository histories, whose trees are fully capable of
expressing paths in different cases just fine, and spitting out a
file that can be processed by fast-import.  I do not see why it
should collapse two different paths that differ in case at export
time.

If the original history is broken by Perforce or whatever and
recording the history of the same path in different case
combinations in different commits, perhaps the right thing to do is
to fix the original history in Git repository before exporting in
the first place.

I do not see how such a corruption is related to the characteristics
of the filesystem where export is run.  Perhaps a case-insensitive
filesystem may helped Perforce to corrupt the history when initial
import of the history into Git was done, but core.ignorecase of the
current repository does not help us decide if that was actually the
case---the import may have been done on a completely different
machine.

So perhaps we should rip the case folding out altogether instead?
The entry for the change in the Release Notes may say:

 * git fast-import incorrectly case-folded the paths recorded in
   the history when core.ignorease is set (i.e. the repository's
   working tree is incapable of expressing paths that differ only in
   their cases); this old bug was reported in 2012 and was finally
   corrected.

or something like that?


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF

2015-04-17 Thread Johannes Schindelin

Hi Torsten,

On 2015-04-17 19:00, Torsten Bögershausen wrote:
 On 2015-04-17 17.44, Torsten Bögershausen wrote:
 Without this patch, t0027 expects the native end-of-lines to be a single
 line feed character. On Windows, however, we set it to a carriage return
 character followed by a line feed character. Thus, we have to modify
 t0027 to expect different warnings depending on the end-of-line markers.

 Adjust the check of the warnings and use these macros:
   WILC:  Warn if LF becomes CRLF
   WICL:  Warn if CRLF becomes LF
   WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF

 Improve the information given by check_warning():
 Use test_cmp to show which warning is missing (or should'n t be there)

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 Signed-off-by: Torsten Bögershausen tbo...@web.de
 Originally I wanted to have Dscho as Author, is that OK with you ?
 (But the From: line didn't made it through my email program)

No worries.
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 3/3] clean: improve performance when removing lots of directories

On Fri, Apr 17, 2015 at 08:15:40PM +0200, erik elfström wrote:

  Doesn't this implementation get confused by modern submodule
  checkouts and descend into and clean their working tree, though?
  Module M with path P would have a directory P in the working tree of
  the top-level project, and P/.git is a regular file that will fail
  is_git_directory() test but records the location of the real
  submodule repository i.e. .git/modules/M via the gitdir:
  mechanism.
 
 
 Yes, there is a problem here. I've added the test below and it fails after
 my change by cleaning sub2 (sub1 is not cleaned). Are there more cases
 here that I should test for?

I wonder about the opposite case, too (finding more repos than we used
to).

It looks like your patches will find bare repositories in the tree,
whereas the current code does not (it only cares about .git). AFAIK,
submodules will never exist as bare in the working tree. And I have seen
repositories which embed bare repos as test cases. Admittedly this is
because I work on projects that are related to git itself, but I don't
see a reason to regress this case if the submodule code doesn't get any
benefit.

 Base on the previous discussion of the patch topic I can see 3 options
 for how to fix this:
 
 Option 1:
  Plug the hole in my new is_git_repository function. A quick and dirty
  fix that passes the above test would be:

I think that makes sense. It would be nice if you could just call
read_gitfile, but that function is very anxious to die on error. So the
prerequisite step would probably be to refactor that into a
read_gitfile_gently that returns an error code.

-Peff

PS Thank you for working on this. I have been quiet because I haven't
   had a chance to look over your patches carefully yet, but overall
   they look very promising.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fast-import: add options to enable/disable case folding

On Fri, Apr 17, 2015 at 06:56:43PM +0200, Torsten Bögershausen wrote:
 On 04/17/2015 01:52 PM, Mike Hommey wrote:
  +test_expect_success 'V: default case folding with ignorecase=true' '
  +   git config core.ignorecase true 
  +   git fast-import input 
  +   git ls-tree refs/heads/V actual 
  +   git update-ref -d refs/heads/V 
  +   cat expected \EOF 
  +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85   a
  +EOF
  +   test_cmp expected actual'
  +
  +test_expect_success 'V: default case folding with ignorecase=false' '
  +   git config core.ignorecase false 
  +   git fast-import input 
  +   git ls-tree refs/heads/V actual 
  +   git update-ref -d refs/heads/V 
  +   cat expected \EOF 
  +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85   A
  +EOF
  +   test_cmp expected actual'
  +
  +test_expect_success 'V: forced case folding with ignorecase=true' '
  +   git config core.ignorecase true 
  +   git fast-import --fold-case input 
  +   git ls-tree refs/heads/V actual 
  +   git update-ref -d refs/heads/V 
  +   cat expected \EOF 
  +100644 blob 78981922613b2afb6025042ff6bd878ac1994e85   a
  +EOF
  +   test_cmp expected actual'

 If you want to make it shorter (and try to avoid repetition):
 The forced true cases could be collected in a loop.
 (and the same for forced=false)

I was also going to suggest squashing the repetition. Here's what I
had in mind:

--- 8 ---
test_foldcase() {
ignore=$1 
case $2 in
true) fold=--fold-case folded=true ;;
false) fold=--no-fold-case folded=false ;;
*) fold= folded=$ignore ;;
esac 
case $folded in true) folded=a ;; false) folded=A ;; esac 

test_expect_success V: case folding: ignorecase=$ignore${fold:+ 
$fold} 
git -c core.ignorecase=$ignore fast-import $fold input 
git ls-tree refs/heads/V actual 
git update-ref -d refs/heads/V 
cat expect -EOF 
100644 blob 78981922613b2afb6025042ff6bd878ac1994e85$folded
EOF
test_cmp expect actual

}

for o in '' true false
do
for c in true false
do
test_foldcase $c $o
done
done
--- 8 ---

which outputs:

--- 8 ---
ok 176 - V: case folding: ignorecase=true
ok 177 - V: case folding: ignorecase=false
ok 178 - V: case folding: ignorecase=true --fold-case
ok 179 - V: case folding: ignorecase=false --fold-case
ok 180 - V: case folding: ignorecase=true --no-fold-case
ok 181 - V: case folding: ignorecase=false --no-fold-case
--- 8 ---
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] log -L: improve error message on malformed argument

Matthieu Moy matthieu@imag.fr writes:

 The old message did not mention the :regex:file form.

 To avoid overly long lines, split the message into two lines (in case
 item-string is long, it will be the only part truncated in a narrow
 terminal).

 Signed-off-by: Matthieu Moy matthieu@imag.fr
 ---
  line-log.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/line-log.c b/line-log.c
 index a490efe..e725248 100644
 --- a/line-log.c
 +++ b/line-log.c
 @@ -575,7 +575,8 @@ parse_lines(struct commit *commit, const char *prefix, 
 struct string_list *args)
  
   name_part = skip_range_arg(item-string);
   if (!name_part || *name_part != ':' || !name_part[1])
 - die(-L argument '%s' not of the form start,end:file,
 + die(invalid -L argument '%s'.\n
 + It should be of the form start,end:file or 
 :regex:file.,
   item-string);
   range_part = xstrndup(item-string, name_part - item-string);
   name_part++;

I actually think start,end:file is more correct than your updated
text.

By adding :regex:file as a possibility, you are hinting that 'start'
and 'end' are *not* regular expressions but numbers, but

$ git log -L'/^int main/,/^}/:git.c'

is a perfectly fine way to specify start (i.e. the first line that
matches '^int main') and end (i.e. the first line that matches '^}'
after that).  Perhaps rephrase it as :funcname:file to avoid giving
false impression to the other one, and use Eric's suggestion on top?

die(-L argument not 'start,end:file' or ':funcname:file': %s,
item-string);

With the matching update to tests, here is what I'll queue on top of
this patch for now, but please send in objections and improvements.

Thanks.

 line-log.c  | 3 +--
 t/t4211-line-log.sh | 8 
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/line-log.c b/line-log.c
index 5d4cb7c..51d9f7c 100644
--- a/line-log.c
+++ b/line-log.c
@@ -543,8 +543,7 @@ parse_lines(struct commit *commit, const char *prefix, 
struct string_list *args)
 
name_part = skip_range_arg(item-string);
if (!name_part || *name_part != ':' || !name_part[1])
-   die(invalid -L argument '%s'.\n
-   It should be of the form start,end:file or 
:regex:file.,
+   die(-L argument not 'start,end:file' or 
':funcname:file': %s,
item-string);
range_part = xstrndup(item-string, name_part - item-string);
name_part++;
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 426a828..edd5ed3 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -40,14 +40,14 @@ canned_test -L 24,+1:a.c simple vanishes-early
 canned_test -L '/long f/,/^}/:b.c' move-support move-support-f
 
 test_bad_opts -L switch.*requires a value
-test_bad_opts -L b.c argument.*not of the form
-test_bad_opts -L 1: argument.*not of the form
+test_bad_opts -L b.c argument not .start,end:file
+test_bad_opts -L 1: argument not .start,end:file
 test_bad_opts -L 1:nonexistent There is no path
 test_bad_opts -L 1:simple There is no path
-test_bad_opts -L '/foo:b.c' argument.*not of the form
+test_bad_opts -L '/foo:b.c' argument not .start,end:file
 test_bad_opts -L 1000:b.c has only.*lines
 test_bad_opts -L 1,1000:b.c has only.*lines
-test_bad_opts -L :b.c argument.*not of the form
+test_bad_opts -L :b.c argument not .start,end:file
 test_bad_opts -L :foo:b.c no match
 
 test_done
-- 
2.4.0-rc2-183-g70401ab

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

Jeff King p...@peff.net writes:

 On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  If there _is_ a performance implication to worry about here, I think it
  would be that we are doing an extra malloc/free.
 
 Thanks for reminding me; yes, that also worried me.

 As an aside, I worried about the extra allocation for reading the header
 in the first place. But it looks like we only do this on the --literally
 code path (and otherwise use the normal unpack_sha1_header).  Still, I
 wonder if we could make this work automagically.  That is, speculatively
 unpack the first N bytes, assuming we hit the end-of-header. If not,
 then go to a strbuf as the slow path. Then it would be fine to cover all
 cases; the normal ones would be fast, and only ridiculous things would
 incur the extra allocation.

Yes, that was what I was hoping to see eventually ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] type_from_string_gently: make sure length matches

Jeff King p...@peff.net writes:

 I'd be surprised if it appreciably speeds things up, but I guess it is
 not too complicated to do.

 +static struct {
 +const char *str;
 +int len;
 +} object_type_name[] = {
 +{ NULL, 0 },  /* OBJ_NONE = 0 */
 +{ commit, 6 },  /* OBJ_COMMIT = 1 */
 +{ tree, 4 },/* OBJ_TREE = 2 */
 +{ blob, 4 },/* OBJ_BLOB = 3 */
 +{ tag, 3 }, /* OBJ_TAG = 4 */
  };

 I had envisioned a macro like:

   #define SIZED_STRING(x) { (x), (sizeof(x) - 1) }

 though perhaps that is overkill for such a short list (that we don't
 even expect to change).

Sounds good (either way ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  If there _is_ a performance implication to worry about here, I think it
  would be that we are doing an extra malloc/free.
 
 Thanks for reminding me; yes, that also worried me.

As an aside, I worried about the extra allocation for reading the header
in the first place. But it looks like we only do this on the --literally
code path (and otherwise use the normal unpack_sha1_header).  Still, I
wonder if we could make this work automagically.  That is, speculatively
unpack the first N bytes, assuming we hit the end-of-header. If not,
then go to a strbuf as the slow path. Then it would be fine to cover all
cases; the normal ones would be fast, and only ridiculous things would
incur the extra allocation.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] type_from_string_gently: make sure length matches

On Fri, Apr 17, 2015 at 01:54:27PM -0700, Junio C Hamano wrote:

  Since the strings we are matching are literals, we could also record
  their sizes in the object_type_strings array and check the length first
  before even calling strncmp. I doubt this is a performance hot-spot,
  though.
 
  You could also potentially just use strlen(object_type_strings[i]), but
  I'm not sure if compilers will optimize out the strlen in this case,
  since it is in a loop.
 
 That thought crossed my mind while reading your patch.  It could
 even make it go faster if we made object_type_strings into an array
 of counted strings (i.e. struct { const char *str; int len; })
 and then took advantage of the fact that we have lengths of both.

Right, that was what I meant.

I'd be surprised if it appreciably speeds things up, but I guess it is
not too complicated to do.

 +static struct {
 + const char *str;
 + int len;
 +} object_type_name[] = {
 + { NULL, 0 },  /* OBJ_NONE = 0 */
 + { commit, 6 },  /* OBJ_COMMIT = 1 */
 + { tree, 4 },/* OBJ_TREE = 2 */
 + { blob, 4 },/* OBJ_BLOB = 3 */
 + { tag, 3 }, /* OBJ_TAG = 4 */
  };

I had envisioned a macro like:

  #define SIZED_STRING(x) { (x), (sizeof(x) - 1) }

though perhaps that is overkill for such a short list (that we don't
even expect to change).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF

Johannes Schindelin johannes.schinde...@gmx.de writes:

 Hi Torsten,

 On 2015-04-17 17:44, Torsten Bögershausen wrote:
 Without this patch, t0027 expects the native end-of-lines to be a single
 line feed character. On Windows, however, we set it to a carriage return
 character followed by a line feed character. Thus, we have to modify
 t0027 to expect different warnings depending on the end-of-line markers.
 
 Adjust the check of the warnings and use these macros:
   WILC:  Warn if LF becomes CRLF
   WICL:  Warn if CRLF becomes LF
   WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF
 
 Improve the information given by check_warning():
 Use test_cmp to show which warning is missing (or should'n t be there)
 
 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 Signed-off-by: Torsten Bögershausen tbo...@web.de

 Thank you so much!
 Dscho

Thanks, is that Acked-by: Dscho for both patches?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] t0027: Support NATIVE_CRLF

On Fri, Apr 17, 2015 at 11:44 AM, Torsten Bögershausen tbo...@web.de wrote:
 Without this patch, t0027 expects the native end-of-lines to be a single
 line feed character. On Windows, however, we set it to a carriage return
 character followed by a line feed character. Thus, we have to modify
 t0027 to expect different warnings depending on the end-of-line markers.

 Adjust the check of the warnings and use these macros:
   WILC:  Warn if LF becomes CRLF
   WICL:  Warn if CRLF becomes LF
   WAMIX: Mixed line endings: either CRLF-LF or LF-CRLF

 Improve the information given by check_warning():
 Use test_cmp to show which warning is missing (or should'n t be there)

s/should'n t/shouldn't/

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 Signed-off-by: Torsten Bögershausen tbo...@web.de
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] type_from_string_gently: make sure length matches

Jeff King p...@peff.net writes:

 When commit fe8e3b7 refactored type_from_string to allow
 input that was not NUL-terminated, it switched to using
 strncmp instead of strcmp. But this means we check only the
 first len bytes of the strings, and ignore any remaining
 bytes in the object_type_string. We should make sure that it
 is also len bytes, or else we would accept comm as
 commit, and so forth.

 Signed-off-by: Jeff King p...@peff.net
 ---
 Since the strings we are matching are literals, we could also record
 their sizes in the object_type_strings array and check the length first
 before even calling strncmp. I doubt this is a performance hot-spot,
 though.

 You could also potentially just use strlen(object_type_strings[i]), but
 I'm not sure if compilers will optimize out the strlen in this case,
 since it is in a loop.

That thought crossed my mind while reading your patch.  It could
even make it go faster if we made object_type_strings into an array
of counted strings (i.e. struct { const char *str; int len; })
and then took advantage of the fact that we have lengths of both.


 object.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/object.c b/object.c
index aedac24..51584ea 100644
--- a/object.c
+++ b/object.c
@@ -18,19 +18,22 @@ struct object *get_indexed_object(unsigned int idx)
return obj_hash[idx];
 }
 
-static const char *object_type_strings[] = {
-   NULL,   /* OBJ_NONE = 0 */
-   commit,   /* OBJ_COMMIT = 1 */
-   tree, /* OBJ_TREE = 2 */
-   blob, /* OBJ_BLOB = 3 */
-   tag,  /* OBJ_TAG = 4 */
+static struct {
+   const char *str;
+   int len;
+} object_type_name[] = {
+   { NULL, 0 },  /* OBJ_NONE = 0 */
+   { commit, 6 },  /* OBJ_COMMIT = 1 */
+   { tree, 4 },/* OBJ_TREE = 2 */
+   { blob, 4 },/* OBJ_BLOB = 3 */
+   { tag, 3 }, /* OBJ_TAG = 4 */
 };
 
 const char *typename(unsigned int type)
 {
-   if (type = ARRAY_SIZE(object_type_strings))
+   if (type = ARRAY_SIZE(object_type_name))
return NULL;
-   return object_type_strings[type];
+   return object_type_name[type].str;
 }
 
 int type_from_string_gently(const char *str, ssize_t len, int gentle)
@@ -40,8 +43,9 @@ int type_from_string_gently(const char *str, ssize_t len, int 
gentle)
if (len  0)
len = strlen(str);
 
-   for (i = 1; i  ARRAY_SIZE(object_type_strings); i++)
-   if (!strncmp(str, object_type_strings[i], len))
+   for (i = 1; i  ARRAY_SIZE(object_type_name); i++)
+   if (object_type_name[i].len == len 
+   !strncmp(str, object_type_name[i].str, len))
return i;
 
if (gentle)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] limit_list: avoid quadratic behavior from still_interesting

When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents.  The
size of the queue will naturally fluctuate based on the
width of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.

For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early).  Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.

If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check.  It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.

But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:

  git rev-list $commits --not --all

in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.

These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:

# positive ref is 100,000 commits deep
git rev-list --all | head -10 | tail -1 input

# huge number of more recent negative refs
git rev-list --all | head -$n | sed s/^/^/ input

time git rev-list --stdin input

Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.

   n  | before | after

1 | 0.991s | 0.848s
1 | 1.120s (+0.129s)   | 0.885s (+0.037s)
2 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x)
4 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x)
8 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x)

Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.

The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).

I considered and rejected several possible alternatives:

  1. Keep a count of UNINTERESTING commits in the queue.
 This requires managing the count not only when removing
 an item from the queue, but also when marking an item
 as UNINTERESTING. That requires touching the other
 functions which mark commits, and would require knowing
 quickly which commits are in the queue (lookup in the
 queue is linear, so we would need an auxiliary
 structure or to also maintain an IN_QUEUE flag in each
 commit object).

  2. Keep a separate list of interesting commits. Drop items
 from it when they are dropped from the queue, or if
 they become UNINTERESTING. This again suffers from
 extra complexity to maintain the list, not to mention
 CPU and memory.

  3. Use a better data structure for the queue. This is
 something that could help the fix in fce87ae, because
 we order the queue by recency, and it is about
 inserting quickly in recency order. So a normal
 priority queue would help there. But here, we cannot
 disturb the order of the queue, which makes things
 harder. We really do need an auxiliary index to track
 the flag we care about, which is basically option (2)
 above.

The cache trick is simple, and the numbers above show that
it works well in

Re: [PATCH 0/3] Another approach to large transactions

2015-04-17 Thread Stefan Beller

On Fri, Apr 17, 2015 at 3:12 PM, Junio C Hamano gits...@pobox.com wrote:

 This is now pushed out and sitting at the tip of 'pu'.  It seems to
 break one of the tests in 1400 when merged to 'next', but I didn't
 look it closely.

 Thanks.

ok, I'll look more closely.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] limit_list: avoid quadratic behavior from still_interesting

Jeff King p...@peff.net writes:

 The implementation is fairly straightforward. Whenever we do
 the linear search, we cache the interesting commit we find,
 and next time check it before doing another linear search.
 If that commit is removed from the list or becomes
 UNINTERESTING itself, then we fall back to the linear
 search.

Nicely done, clever, simple and effective.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3] git-p4: Use -m when running p4 changes

2015-04-17 Thread Lex Spoon

Simply running p4 changes on a large branch can
result in a too many rows scanned error from the
Perforce server. It is better to use a sequence
of smaller calls to p4 changes, using the -m
option to limit the size of each call.

Signed-off-by: Lex Spoon l...@lexspoon.org
Reviewed-by: Junio C Hamano gits...@pobox.com
Reviewed-by: Luke Diamand l...@diamand.org
---
Updated as suggested:
- documentation added
- avoided touch(1)
- used test_seq
- used || exit for test commands inside for loops
- more tabs
- fewer line breaks
- expanded commit message

 Documentation/git-p4.txt | 17 ++---
 git-p4.py| 54 +++-
 t/t9818-git-p4-block.sh  | 64 
 3 files changed, 120 insertions(+), 15 deletions(-)
 create mode 100755 t/t9818-git-p4-block.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index a1664b9..82aa5d6 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -225,9 +225,20 @@ Git repository:
they can find the p4 branches in refs/heads.
 
 --max-changes n::
-   Limit the number of imported changes to 'n'.  Useful to
-   limit the amount of history when using the '@all' p4 revision
-   specifier.
+   Import at most 'n' changes, rather than the entire range of
+   changes included in the given revision specifier. A typical
+   usage would be use '@all' as the revision specifier, but then
+   to use '--max-changes 1000' to import only the last 1000
+   revisions rather than the entire revision history.
+
+--changes-block-size n::
+   The internal block size to use when converting a revision
+   specifier such as '@all' into a list of specific change
+   numbers. Instead of using a single call to 'p4 changes' to
+   find the full list of changes for the conversion, there are a
+   sequence of calls to 'p4 changes -m', each of which requests
+   one block of changes of the given size. The default block size
+   is 500, which should usually be suitable.
 
 --keep-path::
The mapping of file names from the p4 depot path to Git, by
diff --git a/git-p4.py b/git-p4.py
index 549022e..1fba3aa 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = 
refs/remotes/p4/, silent
 def originP4BranchesExist():
 return gitBranchExists(origin) or gitBranchExists(origin/p4) or 
gitBranchExists(origin/p4/master)
 
-def p4ChangesForPaths(depotPaths, changeRange):
+def p4ChangesForPaths(depotPaths, changeRange, block_size):
 assert depotPaths
-cmd = ['changes']
-for p in depotPaths:
-cmd += [%s...%s % (p, changeRange)]
-output = p4_read_pipe_lines(cmd)
+assert block_size
+
+# Parse the change range into start and end
+if changeRange is None or changeRange == '':
+changeStart = '@1'
+changeEnd = '#head'
+else:
+parts = changeRange.split(',')
+assert len(parts) == 2
+changeStart = parts[0]
+changeEnd = parts[1]
 
+# Accumulate change numbers in a dictionary to avoid duplicates
 changes = {}
-for line in output:
-changeNum = int(line.split( )[1])
-changes[changeNum] = True
+
+for p in depotPaths:
+# Retrieve changes a block at a time, to prevent running
+# into a MaxScanRows error from the server.
+start = changeStart
+end = changeEnd
+get_another_block = True
+while get_another_block:
+new_changes = []
+cmd = ['changes']
+cmd += ['-m', str(block_size)]
+cmd += [%s...%s,%s % (p, start, end)]
+for line in p4_read_pipe_lines(cmd):
+changeNum = int(line.split( )[1])
+new_changes.append(changeNum)
+changes[changeNum] = True
+if len(new_changes) == block_size:
+get_another_block = True
+end = '@' + str(min(new_changes))
+else:
+get_another_block = False
 
 changelist = changes.keys()
 changelist.sort()
@@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
 optparse.make_option(--import-labels, dest=importLabels, 
action=store_true),
 optparse.make_option(--import-local, 
dest=importIntoRemotes, action=store_false,
  help=Import into refs/heads/ , not 
refs/remotes),
-optparse.make_option(--max-changes, dest=maxChanges),
+optparse.make_option(--max-changes, dest=maxChanges,
+ help=Maximum number of changes to 
import),
+optparse.make_option(--changes-block-size, 
dest=changes_block_size, type=int,
+ help=Internal block size to use when 
iteratively calling p4 changes),
 optparse.make_option(--keep-path,

Re: [PATCH 0/3] Another approach to large transactions

2015-04-17 Thread Stefan Beller

On Fri, Apr 17, 2015 at 3:17 PM, Stefan Beller sbel...@google.com wrote:
 On Fri, Apr 17, 2015 at 3:12 PM, Junio C Hamano gits...@pobox.com wrote:

 This is now pushed out and sitting at the tip of 'pu'.  It seems to
 break one of the tests in 1400 when merged to 'next', but I didn't
 look it closely.

 Thanks.

 ok, I'll look more closely.

Apparently I screwed up even before sending the patches over the wire.

  not ok 144 - large transaction deleting branches does not burst open
file limit

fails in my local branch as well as origin/pu as well on
origin/sb/remove-fd-from-ref-lock

So there is a pretty strong argument, the code is only improving
large transaction creating branches and not deleting branches.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Another approach to large transactions

Junio C Hamano gits...@pobox.com writes:

 Stefan Beller sbel...@google.com writes:

 * We keep the speed on small transactions 
   (no close and reopen of fds in small transactions)

 * No refactoring for refs included, only minimally invasive to the refs.c 
 code

 * applies on top of origin/sb/remove-fd-from-ref-lock replacing the last
   commit there (I reworded the commit message of the last patch of that tip,
   being the first patch in this series)
   
 * another approach would be to move the fd counting into the lock file api,
   I think that's not worth it for now.

 I agree that it is a good direction to go to limit the number of
 open file descriptors.  Overall it looked good to me.

This is now pushed out and sitting at the tip of 'pu'.  It seems to
break one of the tests in 1400 when merged to 'next', but I didn't
look it closely.

Thanks.




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 0/4] UTF8 BOM follow-up

2015-04-17 Thread Karsten Blees

Am 16.04.2015 um 20:39 schrieb Junio C Hamano:
This is on top of the .gitignore can start with UTF8 BOM patch
from Carlos.

Second try; the first patch is new to clarify the logic in the
codeflow after Carlos's patch, and the second one has been adjusted
accordingly.

Junio C Hamano (4):
add_excludes_from_file: clarify the bom skipping logic
utf8-bom: introduce skip_utf8_bom() helper
config: use utf8_bom[] from utf.[ch] in git_parse_source()
attr: skip UTF8 BOM at the beginning of the input file

Wouldn't it be better to just strip the BOM on commit, e.g. via a clean filter
or pre-commit hook (as suggested in [1])? Or is this patch series only meant to
supplement such a solution (i.e. only strip the BOM when reading files from the
working-copy rather than the committed tree)?

According to rfc3629 chapter 6 [2], the use of a BOM as encoding signature
should be forbidden if the encoding is *known* to be always UTF-8. And
.gitignore, .gitattributes and .gitmodules contain path names, which are always
UTF-8 as of Git for Windows v1.7.10.

IOW, allowing a BOM would mean that files *without* BOM are *not* UTF-8 and
need to be decoded from e.g. system encoding (which unfortunately cannot be set
to UTF-8 on Windows). But this makes no sense as the repository would not be
portable. E.g. a .gitattributes file created on a Greek Windows, containing
greek path names in Cp1253, would not work on platforms with different encoding.

On the other hand, just ignoring the BOM (as this patch series does) leaves us
with two alternative binary representations of the same content file...i.e.
we'll eventually end up with spurious 1st line changes as users add / remove
BOMs from committed .git[ignore|attributes|modules] files, depending on their
editor preference...

For local files (.gitconfig, .git/info/exclude, .git/COMMIT_EDITMSG...),
auto-detecting encoding based on the presence of a BOM makes somewhat more
sense. However, this will most likely break editors that follow the
recommendation of the Unicode specification (Use of a BOM is neither required
nor recommended for UTF-8 [3]). So we'd probably need a core.editorEncoding or
core.editorUseBom setting to tell git whether no BOM means UTF-8 or system
encoding...

Just as a reminder: we should update the Git for Windows Unicode document [4]
if we improve support for BOM-adamant editors.

Cheers,
Karsten

[1]
http://stackoverflow.com/questions/27223985/git-ignore-bom-prevent-git-diff-from-showing-byte-order-mark-changes
[2] https://tools.ietf.org/html/rfc3629
[3] http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf p.40
[4]
https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#editor

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak karthik@gmail.com wrote:
 Update sha1_loose_object_info() to optionally allow it to read
 from a loose object file of unknown/bogus type; as the function
 usually returns the type of the object it read in the form of enum
 for known types, add an optional typename field to receive the
 name of the type in textual form and a flag to indicate the reading
 of a loose object file of unknown/bogus type.
 [...]
 ---
 diff --git a/sha1_file.c b/sha1_file.c
 index 980ce6b..267399d 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
 *sha1,
  }

  static int sha1_loose_object_info(const unsigned char *sha1,
 - struct object_info *oi)
 + struct object_info *oi,
 + int flags)
  {
 -   int status;
 -   unsigned long mapsize, size;
 +   int status = 0;
 +   unsigned long mapsize;
 void *map;
 git_zstream stream;
 char hdr[32];
 +   struct strbuf hdrbuf = STRBUF_INIT;

 if (oi-delta_base_sha1)
 hashclr(oi-delta_base_sha1);
 @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char 
 *sha1,
 return -1;
 if (oi-disk_sizep)
 *oi-disk_sizep = mapsize;
 -   if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr))  0)
 -   status = error(unable to unpack %s header,
 -  sha1_to_hex(sha1));
 -   else if ((status = parse_sha1_header(hdr, size))  0)
 -   status = error(unable to parse %s header, 
 sha1_to_hex(sha1));
 -   else if (oi-sizep)
 -   *oi-sizep = size;
 +   if ((flags  LOOKUP_LITERALLY)) {
 +   if (unpack_sha1_header_to_strbuf(stream, map, mapsize, 
 hdrbuf)  0)
 +   status = error(unable to unpack %s header with 
 --literally,
 +  sha1_to_hex(sha1));
 +   else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
 flags))  0)
 +   status = error(unable to parse %s header with 
 --literally,
 +  sha1_to_hex(sha1));
 +   } else {
 +   if (unpack_sha1_header(stream, map, mapsize, hdr, 
 sizeof(hdr))  0)
 +   status = error(unable to unpack %s header,
 +  sha1_to_hex(sha1));
 +   else if ((status = parse_sha1_header_extended(hdr, oi, 
 flags))  0)
 +   status = error(unable to parse %s header, 
 sha1_to_hex(sha1));
 +   }
 git_inflate_end(stream);
 munmap(map, mapsize);
 -   if (oi-typep)
 +   if (status  oi-typep)
 *oi-typep = status;
 +   if (hdrbuf.buf)
 +   strbuf_release(hdrbuf);

Why is strbuf_release() protected by a conditional rather than being
called unconditionally? Am I missing something obvious?

 return 0;
  }

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 4/4] t1006: add tests for git cat-file --literally

2015-04-17 Thread karthik nayak



On 04/18/2015 05:30 AM, Eric Sunshine wrote:

On Wed, Apr 15, 2015 at 1:00 PM, Karthik Nayak karthik@gmail.com wrote:
 Signed-off-by: Karthik Nayak karthik@gmail.com
 ---
 diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
 index ab36b1e..61fab78 100755
 --- a/t/t1006-cat-file.sh
 +++ b/t/t1006-cat-file.sh
 @@ -296,4 +308,21 @@ test_expect_success '%(deltabase) reports packed delta 
bases' '
  }
   '

 +bogus_type=bogus
 +bogus_content=bogus
 +bogus_size=$(strlen $bogus_content)

If someone ever changes the value of 'bogus_content' so it contains
whitespace, then the result of strlen() will be incorrect as you've
invoked it. You should quote its argument (as other callers in this
script do) to safeguard against such an issue.

 bogus_size=$(strlen $bogus_content)

 +bogus_sha1=$(printf $bogus_content | git hash-object -t $bogus_type 
--literally -w --stdin)

Ditto regarding quoting of 'bogus_content'.

Also, if someone ever modifies 'bogus_content' so that it contains a
literal '%' (such as %s), then your printf() invocation will
misbehave. Either call it like this:

 $(printf '%s' $bogus_content | ...)

or, better yet, call echo_without_newline() as other similar code
elsewhere in this script does, and as suggested earlier[1].

[1]: http://article.gmane.org/gmane.comp.version-control.git/266972/

 +test_expect_success Type of broken object is correct '
 +   echo $bogus_type expect 
 +   git cat-file -t --literally $bogus_sha1 actual 
 +   test_cmp expect actual
 +'
 +
 +test_expect_success Size of broken object is correct '
 +echo $bogus_size expect 

Bad indentation. Use tab rather than spaces.

 +   git cat-file -s --literally $bogus_sha1 actual 
 +   test_cmp expect actual
 +'
 +
   test_done
 --
 2.4.0.rc1.249.gb598846

Thanks Eric for the changes. I did echo -n at the beginning but that
gave me a warning and asked me to use printf instead. I'll use
echo_without_newline, Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-17 Thread Stefan Saasen

 If it's not a problem, I'd love to see timings for your case with just
 the first patch, and then with both.

Thanks for the swift response, much appreciated Jeff!

Here are the timings for the two patches:

Patch 1 on top of 33d4221c79

   Elapsed   System  User
 Min.   :6.110   Min.   :0.6700   Min.   :0.3600
 1st Qu.:6.580   1st Qu.:0.6900   1st Qu.:0.3900
 Median :7.260   Median :0.7100   Median :0.4100
 Mean   :7.347   Mean   :0.7248   Mean   :0.4214
 3rd Qu.:8.000   3rd Qu.:0.7400   3rd Qu.:0.4600
 Max.   :8.860   Max.   :0.8700   Max.   :0.5100

I've had to slightly tweak your second patch (`freshened` was never
set) but applying the modified patch yielded even better results
compared to patch 1:

   Elapsed   System  User
 Min.   :0.38   Min.   :0.03000   Min.   :0.2900
 1st Qu.:0.38   1st Qu.:0.04000   1st Qu.:0.3100
 Median :0.39   Median :0.06000   Median :0.3200
 Mean   :0.43   Mean   :0.05667   Mean   :0.3519
 3rd Qu.:0.42   3rd Qu.:0.07000   3rd Qu.:0.3600
 Max.   :0.68   Max.   :0.08000   Max.   :0.5700

This is pretty much back to the before state.
The graph really tells the whole story:
https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png
(After is the change in #33d4221, Before the parent of #33d4221 and so on)
The graph and the NFS stats can be found here:
https://bitbucket.org/snippets/ssaasen/GeRE

My tweaked version of your second patch is:

diff --git a/cache.h b/cache.h
index 51ee856..8982055 100644
--- a/cache.h
+++ b/cache.h
@@ -1168,6 +1168,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
 pack_keep:1,
+   freshened:1,
 do_not_close:1;
unsigned char sha1[20];
/* something like .git/objects/pack/x.pack */
diff --git a/sha1_file.c b/sha1_file.c
index bc6322e..c0ccd4b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned
char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
struct pack_entry e;
-   return find_pack_entry(sha1, e)  freshen_file(e.p-pack_name);
+   if (!find_pack_entry(sha1, e))
+  return 0;
+   if (e.p-freshened)
+  return 1;
+   return e.p-freshened = freshen_file(e.p-pack_name);
 }

 int write_sha1_file(const void *buf, unsigned long len, const char
*type, unsigned char *returnsha1)



The only change is that I assign the result of `freshen_file` to the
`freshened` flag.

I've only ran this with the test case I was using before but it looks
like this is pretty much fixing the merge time changes we observed.

Thanks again for the swift response. I've got my test setup sitting
here, happy to rerun the tests if that'd be useful.

Is there a chance to backport those changes to the 2.2+ branches?

 You may also be interested in:

   http://thread.gmane.org/gmane.comp.version-control.git/266370

 which addresses another performance problem related to the
 freshen/recent code in v2.2.

Thanks for the pointer, I'll have a look at that as well.

Cheers,
Stefan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] pathspec: adjust prefixlen after striping trailing slash

2015-04-17 Thread Nguyễn Thái Ngọc Duy

A path(spec) from git perspective consists of two parts, the prefix,
and the rest. The prefix covers the part of `pwd` after expanding ..
components. The split is to support case-insensitive match in a sane
way (see 93d9353, especially the big comment block added in dir.c).

Normally the prefix is found after prefix_path_gently() and that will
be it. Unfortunately the *STRIP_SUBMODULE* flags can strip the
trailing slash (see 2ce53f9 for the reason) and cut into this prefix
part. In the test, the prefix is submodule/, but the final path is
just submodule. We need to readjust prefix info when this happens.

The assert() that catches this is turned to die() to make sure it's
always active. prefix_pathspec() is not in any critical path to be a
performance concern because of this change.

93d9353 (parse_pathspec: accept :(icase)path syntax - 2013-07-14)
2ce53f9 (git add: do not add files from a submodule - 2009-01-02)

Noticed-by: djanos_ (via irc)
Helped-by: Dennis Kaarsemaker den...@kaarsemaker.net
Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 On Fri, Apr 17, 2015 at 2:27 AM, Jens Lehmann jens.lehm...@web.de wrote:
  The problem seems to be that subrepo is still registered as a
  submodule in the index. But we should see a proper error message
  instead of an assert in that case ... CCed Duy who knows much
  more about pathspec.c than me.

 This deals with the bug in pathspec code. I leave it to you to decide
 how git-add should do when a submodule is registered in index, but
 it's no longer a submodule in worktree. Yeah maybe it should error
 out.

 pathspec.c| 18 +++---
 t/t3703-add-magic-pathspec.sh |  8 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 9304ee3..aa5e2c7 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -262,7 +262,6 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
} else
item-original = elt;
item-len = strlen(item-match);
-   item-prefix = prefixlen;
 
if ((flags  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP) 
(item-len = 1  item-match[item-len - 1] == '/') 
@@ -292,6 +291,15 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
 elt, ce_len, ce-name);
}
 
+   /*
+* Adjust prefixlen if the above trailing slash stripping cuts
+* into the prefix part
+*/
+   if ((flags  (PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP |
+ PATHSPEC_STRIP_SUBMODULE_SLASH_EXPENSIVE)) 
+   prefixlen  item-len)
+   prefixlen = item-len;
+
if (magic  PATHSPEC_LITERAL)
item-nowildcard_len = item-len;
else {
@@ -299,6 +307,7 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
if (item-nowildcard_len  prefixlen)
item-nowildcard_len = prefixlen;
}
+   item-prefix = prefixlen;
item-flags = 0;
if (magic  PATHSPEC_GLOB) {
/*
@@ -313,8 +322,11 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
}
 
/* sanity checks, pathspec matchers assume these are sane */
-   assert(item-nowildcard_len = item-len 
-  item-prefix = item-len);
+   if (!(item-nowildcard_len = item-len 
+ item-prefix = item-len))
+   die(BUG: item-nowildcard_len (%d) or item-prefix (%d)
+is longer than item-len (%d),
+   item-nowildcard_len, item-prefix, item-len);
return magic;
 }
 
diff --git a/t/t3703-add-magic-pathspec.sh b/t/t3703-add-magic-pathspec.sh
index 5115de7..cced8c4 100755
--- a/t/t3703-add-magic-pathspec.sh
+++ b/t/t3703-add-magic-pathspec.sh
@@ -55,4 +55,12 @@ test_expect_success COLON_DIR 'a file with the same (short) 
magic name exists' '
git add -n ./:/bar
 '
 
+test_expect_success 'prefix is updated after trailing slash is stripped' '
+   git init submodule 
+   ( cd submodule  test_commit test ) 
+   git add submodule 
+   mv submodule/.git submodule/dotgit 
+   ( cd submodule  git add . )
+'
+
 test_done
-- 
2.3.0.rc1.137.g477eb31

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 4/4] t1006: add tests for git cat-file --literally