Re: [PATCH] mailmap: update brandon williams's email address

2018-12-08 Thread Ævar Arnfjörð Bjarmason


On Sat, Dec 08 2018, Junio C Hamano wrote:

> Stefan Beller  writes:
>
>> On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder  wrote:
>>>
>>> Brandon Williams wrote:
>>>
>>> > Signed-off-by: Brandon Williams 
>>> > ---
>>> >  .mailmap | 1 +
>>> >  1 file changed, 1 insertion(+)
>>>
>>> I can confirm that this is indeed the same person.
>>
>> What would be more of interest is why we'd be interested in this patch
>> as there is no commit/patch sent by Brandon with this email in gits history.
>
> Once I "git am" the message that began this thread, there will be a
> commit under this new ident, so that would be somewhat a moot point.

"Get to the top of 'git shortlog -sn' with this one easy trick" :)

(The patch makes sense, good to see you back on-list Brandon)


Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Ævar Arnfjörð Bjarmason


On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
>   /* Ignore commit comments */
>   if (!patchlen && !starts_with(line, "diff "))
>   continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
diff --git x/refs/files-backend.c y/refs/files-backend.c
index 9183875dad..dd8abe9185 100644
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store 
*refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
-   return files_reflog_path_other_worktrees(refs, sb, refname);
+   files_reflog_path_other_worktrees(refs, sb, refname);
+   break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;

Observe that the diff --git line matters, we hash it:

$ git diff-tree -p HEAD~.. | git patch-id
5870d115b7e2a9a936ab8fdc254932234413c710 

$ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id 
--stable
5870d115b7e2a9a936ab8fdc254932234413c710 

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id 
--stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 


The thing it doesn't care about is the "index" between the "diff" and
patch:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7 


We also care about the +++ and --- lines:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
56985c2c38cce6079de2690082e1770a8e81214c 


Then we normalize the @@ line, e.g.:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index 
| perl -pe 's/\d+/123/g' | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7 



There's other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:

diff --git x/refs/files-backend.c y/refs/files-backend.c
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -123,123 +123,123 @@ static void files_reflog_path(struct 
files_ref_store *refs,
break;
case REF_TYPE_OTHER_PSEUDOREF:
case REF_TYPE_MAIN_PSEUDOREF:
-   return files_reflog_path_other_worktrees(refs, sb, refname);
+   files_reflog_path_other_worktrees(refs, sb, refname);
+   break;
case REF_TYPE_NORMAL:
strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c

To be:

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if 

Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Ævar Arnfjörð Bjarmason


On Fri, Dec 07 2018, Konstantin Ryabitsev wrote:

> Hi, all:
>
> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Can we please teach git-patch-id to work without the leading diff a/foo
> b/foo, same as git-am?
>
> Best,
> -K

The state machine is sensitive there being a "diff" line, then "index"
etc.

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 970d0d30b4..b99e4455fd 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -97,7 +97,9 @@ static int get_one_patchid(struct object_id *next_oid, struct 
object_id *result,
}

/* Ignore commit comments */
-   if (!patchlen && !starts_with(line, "diff "))
+   if (!patchlen && starts_with(line, "--- a/"))
+   ;
+   else if (!patchlen && !starts_with(line, "diff "))
continue;

/* Parsing diff header?  */

This would make it produce a patch-id for that input, however note that
I've done "--- a/" there, with just "--- " (which is legit) we'd get
confused and start earlier before the diffstat.

So if you're interested in having this I leave it to you to run with
this & write tests for it, but more convincingly run it on the git &
LKML archives and see that the output is the same (or just extra in case
where we now find patches) with --stable etc.


A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Ævar Arnfjörð Bjarmason
Let's ignore how bad this patch is for git.git, and just focus on how
diff.colorMoved treats it:

diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..d1155322ef 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -6,5 +6,3 @@
 #include "cache.h"
-#include "config.h"
 #include "builtin.h"
-#include "lockfile.h"
 #include "dir.h"
diff --git a/builtin/am.c b/builtin/am.c
index 8f27f3375b..eded15aa8a 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -6,3 +6,2 @@
 #include "cache.h"
-#include "config.h"
 #include "builtin.h"
diff --git a/builtin/blame.c b/builtin/blame.c
index 06a7163ffe..44a754f190 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -8,3 +8,2 @@
 #include "cache.h"
-#include "config.h"
 #include "color.h"
diff --git a/cache.h b/cache.h
index ca36b44ee0..ea8d60b94a 100644
--- a/cache.h
+++ b/cache.h
@@ -4,2 +4,4 @@
 #include "git-compat-util.h"
+#include "config.h"
+#include "new.h"
 #include "strbuf.h"

This is a common thing that's useful to have highlighted, e.g. we move
includes of config.h to some common file, so I want to se all the
deleted config.h lines as moved into the cache.h line, and then the
"lockfile.h" I removed while I was at it plain remove, and the new
"new.h" plain added.

Exactly that is what you get with diff.colorMoved=plain, but the default
of diff.colorMoved=zebra gets confused by this and highlights no moves
at all, same or "blocks" and "dimmed-zebra".

So at first I thought this had something to do with the many->one
detection, but it seems to be simpler, we just don't detect a move of
1-line with anything but plain, e.g. this works as expected in all modes
and detects the many->one:

diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..f4fda75890 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -5,4 +5,2 @@
  */
-#include "cache.h"
-#include "config.h"
 #include "builtin.h"
diff --git a/builtin/branch.c b/builtin/branch.c
index 0c55f7f065..52e39924d3 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -7,4 +7,2 @@

-#include "cache.h"
-#include "config.h"
 #include "color.h"
diff --git a/cache.h b/cache.h
index ca36b44ee0..d4146dbf8a 100644
--- a/cache.h
+++ b/cache.h
@@ -3,2 +3,4 @@

+#include "cache.h"
+#include "config.h"
 #include "git-compat-util.h"

So is there some "must be at least two consecutive lines" condition for
not-plain, or is something else going on here?


Re: git, monorepos, and access control

2018-12-06 Thread Ævar Arnfjörð Bjarmason


On Thu, Dec 06 2018, Jeff King wrote:

> On Thu, Dec 06, 2018 at 10:08:57AM +0900, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>
>> > In my opinion this feature is so contrary to Git's general assumptions
>> > that it's likely to create a ton of information leaks of the supposedly
>> > protected data.
>> > ...
>>
>> Yup, with s/implemented/designed/, I agree all you said here
>> (snipped).
>
> Heh, yeah, I actually scratched my head over what word to use. I think
> Git _could_ be written in a way that is both compatible with existing
> repositories (i.e., is still recognizably Git) and is careful about
> object access control. But either way, what we have now is not close to
> that.
>
>> > Sorry I don't have a more positive response. What you want to do is
>> > perfectly reasonable, but I just think it's a mismatch with how Git
>> > works (and because of the security impact, one missed corner case
>> > renders the whole thing useless).
>>
>> Yup, again.
>>
>> Storing source files encrypted and decrypting with smudge filter
>> upon checkout (and those without the access won't get keys and will
>> likely to use sparse checkout to exclude these priviledged sources)
>> is probably the only workaround that does not involve submodules.
>> Viewing "diff" and "log -p" would still be a challenge, which
>> probably could use the same filter as smudge for textconv.
>
> I suspect there are going to be some funny corner cases there. I use:
>
>   [diff "gpg"]
>   textconv = gpg -qd --no-tty
>
> which works pretty well, but it's for files which are _never_ decrypted
> by Git. So they're encrypted in the working tree too, and I don't use
> clean/smudge filters.
>
> If the files are already decrypted in the working tree, then running
> them through gpg again would be the wrong thing. I guess for a diff
> against the working tree, we would always do a "clean" operation to
> produce the encrypted text, and then decrypt the result using textconv.
> Which would work, but is rather slow.
>
>> I wonder (and this is the primary reason why I am responding to you)
>> if it is common enough wish to use the same filter for smudge and
>> textconv?  So far, our stance (which can be judged from the way the
>> clean/smudge filters are named) has been that the in-repo
>> representation is the canonical, and the representation used in the
>> checkout is ephemeral, and that is why we run "diff", "grep",
>> etc. over the in-repo representation, but the "encrypted in repo,
>> decrypted in checkout" abuse would be helped by an option to do the
>> reverse---find changes and look substrings in the representation
>> used in the checkout.  I am not sure if there are other use cases
>> that is helped by such an option.
>
> Hmm. Yeah, I agree with your line of reasoning here. I'm not sure how
> common it is. This is the first I can recall it. And personally, I have
> never really used clean/smudge filters myself, beyond some toy
> experiments.
>
> The other major user of that feature I can think of is LFS. There Git
> ends up diffing the LFS pointers, not the big files. Which arguably is
> the wrong thing (you'd prefer to see the actual file contents diffed),
> but I think nobody cares in practice because large files generally don't
> have readable diffs anyway.

I don't use this either, but I can imagine people who use binary files
via clean/smudge would be well served by dumping out textual metadata of
the file for diffing instead of showing nothing.

E.g. for a video file I might imagine having lines like:

duration-seconds: 123
camera-model: Shiny Thingamabob

Then when you check in a new file your "git diff" will show (using
normal diff view) that:

   - duration-seconds: 123
   + duration-seconds: 321
camera-model: Shiny Thingamabob

etc.


Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Josh Steadmon wrote:

> Breaks load_commit_graph_one() into a new function,
> parse_commit_graph(). The latter function operates on arbitrary buffers,
> which makes it suitable as a fuzzing target.
>
> Adds fuzz-commit-graph.c, which provides a fuzzing entry point
> compatible with libFuzzer (and possibly other fuzzing engines).
>
> Signed-off-by: Josh Steadmon 
> ---
>  .gitignore  |  1 +
>  Makefile|  1 +
>  commit-graph.c  | 63 +
>  fuzz-commit-graph.c | 18 +
>  4 files changed, 66 insertions(+), 17 deletions(-)
>  create mode 100644 fuzz-commit-graph.c
>
> diff --git a/.gitignore b/.gitignore
> index 0d77ea5894..8bcf153ed9 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -1,3 +1,4 @@
> +/fuzz-commit-graph
>  /fuzz_corpora
>  /fuzz-pack-headers
>  /fuzz-pack-idx
> diff --git a/Makefile b/Makefile
> index 1a44c811aa..6b72f37c29 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
>
>  ETAGS_TARGET = TAGS
>
> +FUZZ_OBJS += fuzz-commit-graph.o
>  FUZZ_OBJS += fuzz-pack-headers.o
>  FUZZ_OBJS += fuzz-pack-idx.o
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..0755359b1a 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -46,6 +46,10 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
>
> +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> + size_t graph_size);
> +
> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>   return xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -84,16 +88,10 @@ static int commit_graph_compatible(struct repository *r)
>  struct commit_graph *load_commit_graph_one(const char *graph_file)
>  {
>   void *graph_map;
> - const unsigned char *data, *chunk_lookup;
>   size_t graph_size;
>   struct stat st;
> - uint32_t i;
> - struct commit_graph *graph;
> + struct commit_graph *ret;
>   int fd = git_open(graph_file);
> - uint64_t last_chunk_offset;
> - uint32_t last_chunk_id;
> - uint32_t graph_signature;
> - unsigned char graph_version, hash_version;
>
>   if (fd < 0)
>   return NULL;
> @@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char 
> *graph_file)
>   die(_("graph file %s is too small"), graph_file);
>   }
>   graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
> + ret = parse_commit_graph(graph_map, fd, graph_size);
> +
> + if (ret == NULL) {

Code in git usually uses just !ret.

> + munmap(graph_map, graph_size);
> + close(fd);
> + exit(1);

Ouch, exit(1) from load_commit_graph_one()? Can't we return NULL here
instead? Nasty to exit from a library routine, but then I see later...

> @@ -201,11 +235,6 @@ struct commit_graph *load_commit_graph_one(const char 
> *graph_file)
>   }
>
>   return graph;
> -
> -cleanup_fail:
> - munmap(graph_map, graph_size);
> - close(fd);
> - exit(1);
>  }

... ah, I see this is where you got the exit(1) from. So it was there
already.

>  static void prepare_commit_graph_one(struct repository *r, const char 
> *obj_dir)
> diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> new file mode 100644
> index 00..420851d0d2
> --- /dev/null
> +++ b/fuzz-commit-graph.c
> @@ -0,0 +1,18 @@
> +#include "object-store.h"
> +#include "commit-graph.h"
> +
> +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> + size_t graph_size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> + struct commit_graph *g;
> +
> + g = parse_commit_graph((void *) data, -1, size);
> + if (g)
> + free(g);
> +
> + return 0;
> +}

I hadn't looked at this before, but see your 5e47215080 ("fuzz: add
basic fuzz testing target.", 2018-10-12) for some prior art.

There's instructions there for a very long "make" invocation. Would be
nice if this were friendlier and we could just do "make test-fuzz" or
something...


Re: git, monorepos, and access control

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Coiner, John wrote:

I forgot to mention this in my initial reply in
<878t13zp8y@evledraar.gmail.com>, but on a re-reading I re-spotted
this:

>   - hashes are secret. If the hashes from a protected tree leak, the
> data also leaks. No check on the server prevents it from handing out
> contents for correctly-guessed hashes.

This is a thing I know *way* less about so maybe I'm completely wrong,
but even if we have all the rest of the things outlined in your post to
support this, isn't this part going to be susceptible to timing attacks?

We'll do more work if you send a SHA-1 during negotiation that shares a
prefix with an existing SHA-1, since we need to binary search & compare
further. SHA-1 is 160 bits which gives you a huge space of potential
hashes, but not if I can try one bit at a time working from the start of
the hash to work my way to a valid existing hash stored on the server.

Of course that assumes a way to do this over the network, it'll be on
AMD's internal network so much faster than average, but maybe this is
completely implausible.

NetSpectre was different and relied on executing code on the remote
computer in a sandbox, not waiting for network roundtrips for each try,
so maybe this would be a non-issue.


Re: git, monorepos, and access control

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Duy Nguyen wrote:

> On Wed, Dec 5, 2018 at 9:46 PM Derrick Stolee  wrote:
>> This directory-level security is not a goal for VFS for Git, and I don't
>> see itbecoming a priority as it breaks a number of design decisions we
>> made in our object storage and communication models.
>>
>> The best I can think about when considering Git as an approach would be
>> to use submodules for your security-related content, and then have server-
>> side security for access to those repos. Of course, submodules are not
>> supported in VFS for Git, either.
>
> Another option is builtin per-blob encryption (maybe with just
> clean/smudge filter), then access control will be about obtaining the
> decryption key (*) and we don't break object storage and
> communication. Of course pack delta compression becomes absolutely
> useless. But that is perhaps an acceptable trade off.

Right, this is another option, but from what John described wouldn't
work in this case. "Any hypothetical AMD monorepo should be able to
securely deny read access in certain subtrees to users without required
permissions".

I.e. in this case there will be a
secret-stuff-here/ryzen-microcode.code.encrypted or whatever,
unauthorized users can't see the content, but they can see from the
filename that it exists, and from "git log" who works on it.

It'll also baloon in size on the server-side since we can't delta any of
these objects, they'll all be X sized encrypted binaries.

> (*) Git will not cache the key in any shape or form. Whenever it needs
> to deflate an encrypted blob, it asks for the key from a separate
> daemon. This guy handles all the access control.
>
>> The Gerrit service has _branch_ level security, which is related to the
>> reachability questions that a directory security would need. However,
>> the problem is quite different. Gerrit does have a lot of experience in
>> dealing with submodules, though, so that's probably a good place to
>> start.
>>
>> Thanks,
>> -Stolee


Re: git, monorepos, and access control

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Coiner, John wrote:

> I'm an engineer with AMD. I'm looking at whether we could switch our
> internal version control to a monorepo, possibly one based on git and
> VFSForGit.
>
> One obstacle to moving AMD to git/VFSForGit is the lack of access
> control support in git. AMD has a lot of data whose distribution must be
> limited. Sometimes it's a legal requirement, eg. CPU core designs are
> covered by US export control laws and not all employees may see them.
> Sometimes it's a contractual obligation, as when a third party shares
> data with us and we agree only to share this data with certain
> employees. Any hypothetical AMD monorepo should be able to securely deny
> read access in certain subtrees to users without required permissions.
>
> Has anyone looked at adding access control to git, at a per-directory
> granularity? Is this a feature that the git community would possibly
> welcome?
>
> Here's my rough thinking about how it might work:
>   - an administrator can designate that a tree object requires zero or
> more named privileges to read
>   - when a mortal user attempts to retrieve the tree object, a hook
> allows the server to check if the user has a given privilege. The hook
> can query an arbitrary user/group data base, LDAP or whatever. The
> details of this check are mostly in the hook; git only knows about
> abstract named privileges.
>   - if the user has permission, everything goes as normal.
>   - if the user lacks permission, they get a DeniedTree object which
> might carry some metadata about what permissions would be needed to see
> more. The DeniedTree lacks the real tree's entries. (TBD, how do we
> render a denied tree in the workspace? An un-writable directory
> containing only a GITDENIED file with some user friendly error message?)
>   - hashes are secret. If the hashes from a protected tree leak, the
> data also leaks. No check on the server prevents it from handing out
> contents for correctly-guessed hashes.
>   - mortal users shouldn't be able to alter permissions. Of course,
> mortal users will often modify tree objects that carry permissions. So
> the server should enforce that a user isn't pushing updates that alter
> permissions on the same logical directory.
>
> I would welcome your feedback on whether this idea makes technical
> sense, and whether the feature could ever be a fit for git.
>
> You might ask what alternatives we are looking at. At our scale, we'd
> really want a version control system that implements a virtual
> filesystem. That already limits us to ClearCase, VFSForGit, and maybe
> Vesta among public ones.  Am I missing any? We would also want one that
> permits branching enormous numbers of files without creating enormous
> amounts of data in the repo -- git gets that right, and perforce (our
> status quo) does not. That's how I got onto the idea of adding read
> authorization to git.

All of what you've described is possible to implement in git, but as far
as I know there's no existing implementation of it.

Microsoft's GVFS probably comes closest, and they're actively
upstreaming bits of that, but as far as I know that doesn't in any way
try to solve this "users XYZ can't even list such-and-such a tree"
problem.

Google has much the same problem with Android, i.e. there's some parts
of the giant checkout of multiple repos the "repo" tool manages that
aren't available to some of Google's employees, or only to some partners
etc. They solve this by splitting those at repo boundaries, and are
working on moving that to submodules. Perhaps this would work in your
case? It's not as easy to work with, but could be made to work with
existing software.

In case you haven't, read "SECURITY" in the git-fetch(1) manpage,
although if that could be fixed to work for the case you're describing.

In a hypothetical implementation that does this, you are always going to
need to have something where you hand off the SHA-1 of the
"secret-stuff-here/" top-level tree object to clients that can't access
any of the sub-trees or blobs from that directory, since they're going
to need it to construct a commit object consisting of
"stuff-they-can-access/" plus the current state of "secret-stuff-here/"
to push upstream. But you can hide the blobs & trees under
"secret-stuff-here/" and their SHA-1s.

Note also that in the general case your "is this blob secret to user
xyz" or "is this tree secret to user xyz" will need to deal with there
happening to be some blob or tree in "secret-stuff-here/" that just so
happens to share the exact same tree/blob as something they need under
"stuff-they-can-access/". Think a directory that only contains a
".gitignore" file, or a BSD "LICENSE" file.


Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, SZEDER Gábor wrote:

> On Wed, Dec 05, 2018 at 03:01:41PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> decide to stress test in advance, since we'd either flock() the trash
>> >> dir, or just mktemp(1)-it.
>> >
>> > While 'mktemp' seems to be more portable than 'flock', it doesn't seem
>> > to be portable enough; at least it's not in POSIX.>
>>
>> We are already relying on stuff like mktemp() being reliable for
>> e.g. the split index.
>
> That's the mktemp() function from libc; I meant the 'mktemp' command
> that we would use this early in 'test-lib.sh', where PATH has not been
> set up for testing yet.

Ah, if that ever becomes an issue it's a one-line test-tool wrapper.


Re: gitweb: local configuration not found

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Jonathan Nieder wrote:

> Martin Mares wrote[1]:
>
>> After upgrade to Stretch, gitweb no longer finds the configuration file
>> "gitweb_config.perl" in the current directory. However, "man gitweb" still
>> mentions this as one of the possible locations of the config file (and
>> indeed a useful one when using multiple instances of gitweb).
>>
>> It was probably broken by Perl dropping "." from the default search path
>> for security reasons.
>
> Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc),
>
>   Core modules and tools no longer search "." for optional modules
>
> gitweb.perl contains
>
>  sub read_config_file {
>  my $filename = shift;
>  return unless defined $filename;
>  # die if there are errors parsing config file
>  if (-e $filename) {
>  do $filename;
>
> which implies an @INC search but it is silly --- as the "-e" test
> illustrates, this never intended to search @INC.
>
> Documentation says "If you are absolutely certain that you want your
> script to load and execute a file from the current directory, then use
> a ./ prefix".  We can do that, like so:
>
> diff --git i/gitweb/Makefile w/gitweb/Makefile
> index cd194d057f..3160b6cc5d 100644
> --- i/gitweb/Makefile
> +++ w/gitweb/Makefile
> @@ -18,7 +18,7 @@ RM ?= rm -f
>  INSTALL ?= install
>
>  # default configuration for gitweb
> -GITWEB_CONFIG = gitweb_config.perl
> +GITWEB_CONFIG = ./gitweb_config.perl
>  GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
>  GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf
>  GITWEB_HOME_LINK_STR = projects
>
> but that does not help if someone overrides GITWEB_CONFIG, and besides,
> it would be nicer to avoid the possibility of an @INC search altogether.
> Another alternative would be to use
>
>   local @INC = ('.');
>
> Would that be better?
>
> Advice from someone more versed than I am in perl would be very welcome
> (hence the cc to Ævar).

It seems most sensible to follow the ./FILE.pl advice per
https://metacpan.org/pod/distribution/perl/pod/perl5260delta.pod#Removal-of-the-current-directory-(%22.%22)-from-@INC

Just:

local @INC = '.';
do 'FILE.pl';

Would do the same thing, but seems like a more indirect way to do it if
all we want is ./ anyway. FWIW to be pedantically bug-compatible with
the old version (we should not do this) it's:

local @INC = (@INC, ".");
do 'FILE.pl';

I.e. before our behavior was implicitly to check whether we had a local
FILE.pl, then loop through all of @INC to see if we found it there, and
finally come back to the file we did the -e check for.


Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, SZEDER Gábor wrote:

> On Tue, Dec 04, 2018 at 07:11:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> It's a frequent annoyance of mine in the test suite that I'm
>> e.g. running t*.sh with some parallel "prove" in one screen, and then I
>> run tABCD*.sh manually, and get unlucky because they use the same trash
>> dir, and both tests go boom.
>>
>> You can fix that with --root, which is much of what this patch does. My
>> one-liner for doing --stress has been something like:
>>
>> perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error 
>> soon,fail=1 './t-basic.sh --root=trash-{} -v'
>>
>> But it would be great if I didn't have to worry about that and could
>> just run two concurrent:
>>
>> ./t-basic.sh
>>
>> So I think we could just set some env variable where instead of having
>> the predictable trash directory we have a $TRASHDIR.$N as this patch
>> does, except we pick $N by locking some test-runs/tABCD.Nth file with
>> flock() during the run.
>
> Is 'flock' portable?  It doesn't appear so.

Portable enough, and since it's just an alias for "grab lock atomically"
it can devolve into other more basic FS functions on systems that don't
have it.

>> Then a stress mode like this would just be:
>>
>> GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel 
>> --jobs=100% --halt-on-error soon,fail=1 './t-basic.sh'
>
> This doesn't address the issue of TCP ports for various daemons.

Once we have a test ABCD and slot offset N (for a fixed size of N) we
can pick a port from that.

>> And sure, we could ship a --stress option too, but it wouldn't be
>> magical in any way, just another "spawn N in a loop" implementation, but
>> you could also e.g. use GNU parallel to drive it, and without needing to
>
> GNU 'parallel' may or may not be available on the platform, that's why
> I stuck with the barebones shell-loops-in-the-background approach.

Yes, my point is not that you should drop that part of your patch for
using GNU parallel, but just to demonstrate that we can get pretty close
to this for most tests with an external tool, and that it would make
this sort of thing work for concurrent tests without needing to decide
to concurrently test in advance.

>> decide to stress test in advance, since we'd either flock() the trash
>> dir, or just mktemp(1)-it.
>
> While 'mktemp' seems to be more portable than 'flock', it doesn't seem
> to be portable enough; at least it's not in POSIX.>

We are already relying on stuff like mktemp() being reliable for
e.g. the split index.

> And in general I'd prefer deterministic trash directory names.  If I
> re-run a failed test after fixing the bug, then currently the trash
> directory will be overwritten, and that's good.  With 'mktemp's junk
> in the trash direcory's name it won't be overwritten, and if my bugfix
> doesn't work, then I'll start accumulating trash directories and won't
> even know which one is from the last run.

In general you wouldn't need to set GIT_TEST_FLOCKED_TRASH_DIRS=1 and
would get the same monolithic trash names as now, and for something like
the flock() version it could use a job number as a suffix like your
patch does.


Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-04 Thread Ævar Arnfjörð Bjarmason


On Tue, Dec 04 2018, SZEDER Gábor wrote:

> Unfortunately, we have a few flaky tests, whose failures tend to be
> hard to reproduce.  We've found that the best we can do to reproduce
> such a failure is to run the test repeatedly while the machine is
> under load, and wait in the hope that the load creates enough variance
> in the timing of the test's commands that a failure is evenually
> triggered.  I have a command to do that, and I noticed that two other
> contributors have rolled their own scripts to do the same, all
> choosing slightly different approaches.
>
> To help reproduce failures in flaky tests, introduce the '--stress'
> option to run a test script repeatedly in multiple parallel
> invocations until one of them fails, thereby using the test script
> itself to increase the load on the machine.
>
> The number of parallel invocations is determined by, in order of
> precedence: the number specified as '--stress=', or the value of
> the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
> available processors in '/proc/cpuinfo', or 8.
>
> To prevent the several parallel invocations of the same test from
> interfering with each other:
>
>   - Include the parallel job's number in the name of the trash
> directory and the various output files under 't/test-results/' as
> a '.stress-' suffix.
>
>   - Add the parallel job's number to the port number specified by the
> user or to the test number, so even tests involving daemons
> listening on a TCP socket can be stressed.
>
>   - Make '--stress' imply '--verbose-log' and discard the test's
> standard ouput and error; dumping the output of several parallel
> tests to the terminal would create a big ugly mess.
>
> 'wait' for all parallel jobs before exiting (either because a failure
> was found or because the user lost patience and aborted the stress
> test), allowing the still running tests to finish.  Otherwise the "OK
> X.Y" progress output from the last iteration would likely arrive after
> the user got back the shell prompt, interfering with typing in the
> next command.  OTOH, this waiting might induce a considerable delay
> between hitting ctrl-C and the test actually exiting; I'm not sure
> this is the right tradeoff.

I think it makes sense to generalize this and split it up into two
features.

It's a frequent annoyance of mine in the test suite that I'm
e.g. running t*.sh with some parallel "prove" in one screen, and then I
run tABCD*.sh manually, and get unlucky because they use the same trash
dir, and both tests go boom.

You can fix that with --root, which is much of what this patch does. My
one-liner for doing --stress has been something like:

perl -E 'say ++$_ while 1' | parallel --jobs=100% --halt-on-error 
soon,fail=1 './t-basic.sh --root=trash-{} -v'

But it would be great if I didn't have to worry about that and could
just run two concurrent:

./t-basic.sh

So I think we could just set some env variable where instead of having
the predictable trash directory we have a $TRASHDIR.$N as this patch
does, except we pick $N by locking some test-runs/tABCD.Nth file with
flock() during the run.

Then a stress mode like this would just be:

GIT_TEST_FLOCKED_TRASH_DIRS=1 perl -E 'say ++$_ while 1' | parallel 
--jobs=100% --halt-on-error soon,fail=1 './t-basic.sh'

And sure, we could ship a --stress option too, but it wouldn't be
magical in any way, just another "spawn N in a loop" implementation, but
you could also e.g. use GNU parallel to drive it, and without needing to
decide to stress test in advance, since we'd either flock() the trash
dir, or just mktemp(1)-it.


Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-04 Thread Ævar Arnfjörð Bjarmason


On Tue, Dec 04 2018, SZEDER Gábor wrote:

> The number of parallel invocations is determined by, in order of
> precedence: the number specified as '--stress=', or the value of
> the GIT_TEST_STRESS_LOAD environment variable, or twice the number of
> available processors in '/proc/cpuinfo', or 8.

With this series we have at least 3 ways to get this number. First
online_cpus() in the C code, then Peff's recent `getconf
_NPROCESSORS_ONLN` in doc-diff, and now this /proc/cpuinfo parsing.

Perhaps it makes sense to split online_cpus() into a helper to use from
the shellscripts instead?


[PATCH 2/3] sha1-file: emit error if an alternate looks like a repository

2018-12-04 Thread Ævar Arnfjörð Bjarmason
Since 26125f6b9b ("detect broken alternates.", 2006-02-22) we've
emitted an error if the alternates directory doesn't exist, but not
for the common misstep of adding a path to another git repository as
an alternate, as opposed to its "objects" directory.

Let's check for this, i.e. whether X/objects or X/.git/objects exists
if the user supplies X and print an error (which as a commit leading
up to this one shows doesn't change the exit code, just "warns").

This check is intentionally not implemented by e.g. requiring that any
of X/?? exists or X/info or X/pack exists. It's a legitimate use-case
to point to an existing alternate that hasn't been populated yet, but
pointing to one where an "X/objects" or "X/.git/objects" directory
exists is definitely a mistake we should warn the user about.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 sha1-file.c   | 10 +-
 t/t5613-info-alternate.sh | 14 ++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index 5bd11c85bc..f142f81658 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -376,12 +376,20 @@ static int alt_odb_usable(struct raw_object_store *o,
 {
struct alternate_object_database *alt;
 
-   /* Detect cases where alternate disappeared */
if (!is_directory(path->buf)) {
+   /* Detect cases where alternate disappeared */
error(_("object directory %s does not exist; "
"check .git/objects/info/alternates"),
  path->buf);
return 0;
+   } else if (is_directory(mkpath("%s/objects", path->buf)) ||
+  is_directory(mkpath("%s/.git/objects", path->buf))) {
+   /* Detect cases where alternate is a git repository */
+   error(_("object directory %s looks like a git repository; "
+   "alternates must point to the 'objects' directory. "
+   "check .git/objects/info/alternates"),
+ path->buf);
+   return 0;
}
 
/*
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index d2964c57b7..b959e21421 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -143,4 +143,18 @@ test_expect_success 'print "error" on non-existing 
alternate' '
test_i18ngrep "does not exist; check" stderr
 '
 
+test_expect_success 'print "error" on alternate that looks like a git 
repository' '
+   git init --bare J &&
+   git init --bare K &&
+
+   # H is bare, G is not
+   echo ../../H >J/objects/info/alternates &&
+   echo ../../G >K/objects/info/alternates &&
+
+   git -C J fsck 2>stderr &&
+   test_i18ngrep "looks like a git repository; alternates must" stderr &&
+   git -C K fsck 2>stderr &&
+   test_i18ngrep "looks like a git repository; alternates must" stderr
+'
+
 test_done
-- 
2.20.0.rc2.403.gdbc3b29805



[PATCH 1/3] sha1-file: test the error behavior of alt_odb_usable()

2018-12-04 Thread Ævar Arnfjörð Bjarmason
Add a test for the error() case in alt_odb_usable() where an alternate
directory doesn't exist. This behavior has been the same since
26125f6b9b ("detect broken alternates.", 2006-02-22), but if that
error() was turned into die() the entire test suite would still pass.

Perhaps we should die() in that case, but let's start by adding a test
here to assert the long-standing existing behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5613-info-alternate.sh | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 895f46bb91..d2964c57b7 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -136,4 +136,11 @@ test_expect_success CASE_INSENSITIVE_FS 'dup finding can 
be case-insensitive' '
test_cmp expect actual.alternates
 '
 
+test_expect_success 'print "error" on non-existing alternate' '
+   git init --bare I &&
+   echo DOES_NOT_EXIST >I/objects/info/alternates &&
+   git -C I fsck 2>stderr &&
+   test_i18ngrep "does not exist; check" stderr
+'
+
 test_done
-- 
2.20.0.rc2.403.gdbc3b29805



[PATCH 3/3] sha1-file: change alternate "error:" message to "warning:"

2018-12-04 Thread Ævar Arnfjörð Bjarmason
Change the "error" message emitted by alt_odb_usable() to be a
"warning" instead. As noted in commits leading up to this one this has
never impacted the exit code ever since the check was initially added
in 26125f6b9b ("detect broken alternates.", 2006-02-22).

It's confusing to emit an "error" when e.g. "git fsck" will exit with
0, so let's emit a "warning:" instead.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 sha1-file.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index f142f81658..4b9b63bdcb 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -378,17 +378,17 @@ static int alt_odb_usable(struct raw_object_store *o,
 
if (!is_directory(path->buf)) {
/* Detect cases where alternate disappeared */
-   error(_("object directory %s does not exist; "
-   "check .git/objects/info/alternates"),
- path->buf);
+   warning(_("object directory %s does not exist; "
+ "check .git/objects/info/alternates"),
+   path->buf);
return 0;
} else if (is_directory(mkpath("%s/objects", path->buf)) ||
   is_directory(mkpath("%s/.git/objects", path->buf))) {
/* Detect cases where alternate is a git repository */
-   error(_("object directory %s looks like a git repository; "
-   "alternates must point to the 'objects' directory. "
-   "check .git/objects/info/alternates"),
- path->buf);
+   warning(_("object directory %s looks like a git repository; "
+ "alternates must point to the 'objects' directory. "
+ "check .git/objects/info/alternates"),
+   path->buf);
return 0;
}
 
-- 
2.20.0.rc2.403.gdbc3b29805



[PATCH 0/3] sha1-file: warn if alternate is a git repo (not object dir)

2018-12-04 Thread Ævar Arnfjörð Bjarmason
This adds a warning for the issue discussed upthread. As noted in
these patches we've been emitting an "error" while not impacting the
exit code, should we die() instead? Maybe, but until there's consensus
on that let's change this to warning() while we're at it.

Ævar Arnfjörð Bjarmason (3):
  sha1-file: test the error behavior of alt_odb_usable()
  sha1-file: emit error if an alternate looks like a repository
  sha1-file: change alternate "error:" message to "warning:"

 sha1-file.c   | 16 
 t/t5613-info-alternate.sh | 21 +
 2 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805



Re: [PATCH v3 01/42] parse-options: support --git-completion-helper

2018-12-02 Thread Ævar Arnfjörð Bjarmason


On Fri, Feb 09 2018, Nguyễn Thái Ngọc Duy wrote:

> +static int show_gitcomp(struct parse_opt_ctx_t *ctx,
> + const struct option *opts)
> +{

Says it returns 'static int'...

> [...]
> + exit(0);

Then just exits...

> + /* lone --git-completion-helper is asked by git-completion.bash 
> */
> + if (ctx->total == 1 && !strcmp(arg + 1, 
> "-git-completion-helper"))
> + return show_gitcomp(ctx, options);

This return value is never used.

Whine from SunCC (not fatal, also this was in v2.17.0 so no need to fix
before 2.20, just saw this now):

"parse-options.c", line 520: warning: Function has no return
statement : show_gitcomp


Re: [RFC] git clean --local

2018-12-02 Thread Ævar Arnfjörð Bjarmason


On Sat, Dec 01 2018, Cameron Boehmer wrote:

> 1) add a new flag
> -l, --local
> Do not consult git config --global core.excludesFile in
> determining what files git ignores. This is useful in conjunction with
> -x/-X to preserve user files while removing build artifacts.

Or perhaps a general flag to ignore configuration would be useful for
such cases, see
https://public-inbox.org/git/87zhtqvm66@evledraar.gmail.com/


Re: easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-02 Thread Ævar Arnfjörð Bjarmason


On Sun, Dec 02 2018, Robert P. J. Day wrote:

>   as part of an upcoming git class i'm delivering, i thought it would
> be amusing to demonstrate the maximum length of colliding SHA-1
> prefixes in a repository (in my case, i use the linux kernel git repo
> for most of my examples).
>
>   is there a way to display the objects in the object database that
> clash in the longest object name SHA-1 prefix; i mean, short of
> manually listing all object names, running that through cut and sort
> and uniq and ... you get the idea.
>
>   is there a cute way to do that? thanks.

You'll always need to list them all. It's inherently an operation where
for each SHA-1 you need to search for other ones with that prefix up to
a given length.

Perhaps you've missed that you can use --abbrev=N for this, and just
grep for things that are loger than that N, e.g. for linux.git:

git log --oneline --abbrev=10 --pretty=format:%h |
grep -E -v '^.{10}$' |
perl -pe 's/^(.{10}).*/$1/'

This will list the 4 objects that need more than 10 characters to be
shown unambiguously. If you then "git cat-file -t" them you'll get the
disambiguation help.


Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files

2018-11-30 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 30 2018, Duy Nguyen wrote:

> On Fri, Nov 30, 2018 at 12:05 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> Assuming greenfield development (which we definitely don't have), I
>> don't like the "restore-files" name, but the alternative that makes
>> sense is "checkout". Then this "--from" argument could become "git
>> checkout-tree  -- ", and we'd have:
>>
>> git switch 
>> git checkout 
>> git checkout-tree  -- 
>>
>> Or maybe that sucks, anyway what I was going to suggest is *if* others
>> think that made sense as a "if we designed git today" endgame whether we
>> could have an opt-in setting where you set e.g. core.uiVersion=3 (in
>> anticipation of Git 3.0) and you'd get that behavior. There could be
>> some other setting where core.uiVersion would use the old behavior (or
>> with another setting, only warn) if we weren't connected to a terminal.
>
> core.uiVersion is a big no no to me. I don't want to go to someone's
> terminal, type something and have a total surprise because they set
> different ui version. If you want a total UI redesign, go with a new
> prefix, like "ng" (for new git) or something instead of "git".

I don't think that's a viable way forward. First, we're not talking
about a total change of the UI. Most the main porcelain will stay the
same. So we'd have a "ng" that's >98% the same UI, and then if we
discover warts in in 10 years we'd like to fix then what do wo do? Ship
"nng" and so forth?

We already have this UI problem with various config variables that
change things. I think we should just solve this general issue by a
combination of:

 a) Accepting that over the long term we will have Git's UI changing,
sometimes in breaking ways (with sensible phase-in), hopefully for
the better. E.g. we had this with "git-init" v.s. "git init". This
is similar, there'd be an error due to ambiguity with a "hint"
saying use the new thing if you e.g. feed "git checkout" a branch
name.

 b) For the general problem of "user has exotic config" we should learn
a "git -Q " option similar to Emacs, which is another highly
customizable piece of software that has a "don't read user config"
escape hatch.

That's a bit more complex than for Emacs since we need to actually
read some config (e.g. remote config from .git/config), and maybe
opt to keep some stuff like user.*. But there's no reason we can't
have such a black/whitelist of config & env variables that impact us
with a switch to get "make it as if nothing was configured" for
whatever sane version of that we'd come up with.


Re: [PATCH] format-patch: do not let its diff-options affect --range-diff (was Re: [PATCH 2/2] format-patch: allow for independent diff & range-diff options)

2018-11-30 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 30 2018, Junio C Hamano wrote:

> Junio C Hamano  writes:
>
>>> I had to delay -rc2 to see these last minute tweaks come to some
>>> reasonable place to stop at, and I do not think we want to delay the
>>> final any longer or destablizing it further by piling last minute
>>> undercooked changes on top.
>>
>> So how about doing this on top of 'master' instead?  As this leaks
>> *no* information wrt how range-diff machinery should behave from the
>> format-patch side by not passing any diffopt, as long as the new
>> code I added to show_range_diff() comes up with a reasonable default
>> diffopts (for which I really would appreciate extra sets of eyes to
>> make sure), this change by definition cannot be wrong (famous last
>> words).
>
> As listed in today's "What's cooking" report, I've merged this to
> 'next' in today's pushout and planning to have it in the -rc2.  I am
> not married to this exact implementation, and I'd welcome to have an
> even simpler and less disruptive solution if exists, but I am hoping
> that this is a good-enough interim measure for the upcoming release,
> until we decide what to do with the customizability of range-diff
> driven by format-patch.
>
> In addition to this, I am planning the "rebase --stat" and "reflog
> that does not say 'rebase -i' but 'rebase'" fixes merged to 'master'
> before cutting -rc2.

Thanks a lot, yeah having this wait looks good to me.


Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Nov 29 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> v3 sees switch-branch go back to switch-branch (in v2 it was
>> checkout-branch). checkout-files is also renamed restore-files (v1 was
>> restore-paths). Hopefully we won't see another rename.
>>
>> I'll try to summarize the differences between the new commands and
>> 'git checkout' down here, but you're welcome to just head to 07/14 and
>> read the new man pages.
>>
>> 'git switch-branch'
>>
>> - does not "do nothing", you have to either switch branch, create a
>>   new branch, or detach. "git switch-branch" with no arguments is
>>   rejected.
>>
>> - implicit detaching is rejected. If you need to detach, you need to
>>   give --detach. Or stick to 'git checkout'.
>>
>> - -b/-B is renamed to -c/-C with long option names
>>
>> - of course does not accept pathspec
>>
>> 'git restore-files'
>>
>> - takes a ref from --from argument, not as a free ref. As a result,
>>   '--' is no longer needed. All non-option arguments are pathspec
>>
>> - pathspec is mandatory, you can't do "git restore-files" without any
>>   pathspec.
>>
>> - I just remember -p which is allowed to take no pathspec :( I'll fix
>>   it later.
>>
>> - Two more fancy features (the "git checkout --index" being the
>>   default mode and the backup log for accidental overwrites) are of
>>   course still missing. But they are coming.
>>
>> I did not go replace "detached HEAD" with "unnamed branch" (or "no
>> branch") everywhere because I think a unique term is still good to
>> refer to this concept. Or maybe "no branch" is good enough. I dunno.
>
> I finally tracked down
> https://redfin.engineering/two-commits-that-wrecked-the-user-experience-of-git-f0075b77eab1
> which I'd remembered reading and couldn't find again in these
> discussions. Re-reading it while one may not 100% agree with the
> author's opinion, it's an interesting rabbit hole.
>
> I also didn't know about EasyGit, or that Elijah Newren had written
> it. I haven't seen him chime in on this series, and would be interested
> to see what he thinks about it.
>
> Re the naming question in
> https://public-inbox.org/git/87o9abzv46@evledraar.gmail.com/ &
> seeing that eg-switch exists, I wonder if just s/switch-branch/switch/
> makes more sense.
>
> Assuming greenfield development (which we definitely don't have), I
> don't like the "restore-files" name, but the alternative that makes
> sense is "checkout". Then this "--from" argument could become "git
> checkout-tree  -- ", and we'd have:
>
> git switch 
> git checkout 
> git checkout-tree  -- 
>
> Or maybe that sucks, anyway what I was going to suggest is *if* others
> think that made sense as a "if we designed git today" endgame whether we
> could have an opt-in setting where you set e.g. core.uiVersion=3 (in
> anticipation of Git 3.0) and you'd get that behavior. There could be
> some other setting where core.uiVersion would use the old behavior (or
> with another setting, only warn) if we weren't connected to a terminal.
>
> I.e. I'm thinking of this as step #2 in a #3 step series. Where this is
> the fully backwards compatible UI improvement, but someone who'd
> e.g. use EasyGit or didn't have backwards compatibility concerns could
> enable step #3 and opt-in to a mode where we'd fixed a bunch of UI warts
> in a backwards-incompatible way.
>
> What would that mode look like? I'd to work on piling that on top of
> this :)

(Digging some more)

There's also more interesting prior art at https://gitless.com/ (CC'd
authors) and 2x research papers linked at the bottom of that page which
were briefly discussed on-list before:
https://public-inbox.org/git/20160930191413.002049b94b3908b15881b...@domain007.com/

The "gitless" UI has a "gl checkout" which just takes paths as I was
musing about above (and that Redfin post also suggests).


Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Nguyễn Thái Ngọc Duy wrote:

> v3 sees switch-branch go back to switch-branch (in v2 it was
> checkout-branch). checkout-files is also renamed restore-files (v1 was
> restore-paths). Hopefully we won't see another rename.
>
> I'll try to summarize the differences between the new commands and
> 'git checkout' down here, but you're welcome to just head to 07/14 and
> read the new man pages.
>
> 'git switch-branch'
>
> - does not "do nothing", you have to either switch branch, create a
>   new branch, or detach. "git switch-branch" with no arguments is
>   rejected.
>
> - implicit detaching is rejected. If you need to detach, you need to
>   give --detach. Or stick to 'git checkout'.
>
> - -b/-B is renamed to -c/-C with long option names
>
> - of course does not accept pathspec
>
> 'git restore-files'
>
> - takes a ref from --from argument, not as a free ref. As a result,
>   '--' is no longer needed. All non-option arguments are pathspec
>
> - pathspec is mandatory, you can't do "git restore-files" without any
>   pathspec.
>
> - I just remember -p which is allowed to take no pathspec :( I'll fix
>   it later.
>
> - Two more fancy features (the "git checkout --index" being the
>   default mode and the backup log for accidental overwrites) are of
>   course still missing. But they are coming.
>
> I did not go replace "detached HEAD" with "unnamed branch" (or "no
> branch") everywhere because I think a unique term is still good to
> refer to this concept. Or maybe "no branch" is good enough. I dunno.

I finally tracked down
https://redfin.engineering/two-commits-that-wrecked-the-user-experience-of-git-f0075b77eab1
which I'd remembered reading and couldn't find again in these
discussions. Re-reading it while one may not 100% agree with the
author's opinion, it's an interesting rabbit hole.

I also didn't know about EasyGit, or that Elijah Newren had written
it. I haven't seen him chime in on this series, and would be interested
to see what he thinks about it.

Re the naming question in
https://public-inbox.org/git/87o9abzv46@evledraar.gmail.com/ &
seeing that eg-switch exists, I wonder if just s/switch-branch/switch/
makes more sense.

Assuming greenfield development (which we definitely don't have), I
don't like the "restore-files" name, but the alternative that makes
sense is "checkout". Then this "--from" argument could become "git
checkout-tree  -- ", and we'd have:

git switch 
git checkout 
git checkout-tree  -- 

Or maybe that sucks, anyway what I was going to suggest is *if* others
think that made sense as a "if we designed git today" endgame whether we
could have an opt-in setting where you set e.g. core.uiVersion=3 (in
anticipation of Git 3.0) and you'd get that behavior. There could be
some other setting where core.uiVersion would use the old behavior (or
with another setting, only warn) if we weren't connected to a terminal.

I.e. I'm thinking of this as step #2 in a #3 step series. Where this is
the fully backwards compatible UI improvement, but someone who'd
e.g. use EasyGit or didn't have backwards compatibility concerns could
enable step #3 and opt-in to a mode where we'd fixed a bunch of UI warts
in a backwards-incompatible way.

What would that mode look like? I'd to work on piling that on top of
this :)


Re: [PATCH 2/2] format-patch: allow for independent diff & range-diff options

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Thu, 29 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Nov 29 2018, Johannes Schindelin wrote:
>>
>> > On Thu, 29 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> On Thu, Nov 29 2018, Johannes Schindelin wrote:
>> >>
>> >> > On Wed, 28 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>> >> >
>> >> >> Change the semantics of the "--range-diff" option so that the regular
>> >> >> diff options can be provided separately for the range-diff and the
>> >> >> patch. This allows for supplying e.g. --range-diff-U0 and -U1 to
>> >> >> "format-patch" to provide different context for the range-diff and the
>> >> >> patch. This wasn't possible before.
>> >> >
>> >> > I really, really dislike the `--range-diff-`. We have
>> >> > precedent for passing optional arguments that are passed to some other
>> >> > command, so a much more logical and consistent convention would be to 
>> >> > use
>> >> > `--range-diff[=..]`, allowing all of the diff options that
>> >> > you might want to pass to the outer diff in one go rather than having a
>> >> > lengthy string of `--range-diff-this` and `--range-diff-that` options.
>> >>
>> >> Where do we pass those sorts of arguments?
>> >>
>> >> Reasons I did it this way:
>> >>
>> >>  a) Passing it as one option will require the user to double-quote those
>> >> options that take quoted arguments (e.g. --word-diff-regex), which I
>> >> thought sucked more than the prefix. On the implementation side we
>> >> couldn't leave the parsing of the command-line to the shell anymore.
>> >>
>> >>  b) I think people will want to tweak this very rarely, much more rarely
>> >> than e.g. -U10 in format-patch itself, so having something long-ish
>> >> doesn't sound bad.
>> >
>> > Hmm. I still don't like it. It sets a precedent, and we simply do not do
>> > it that way in other circumstances (most obvious would be the -X merge
>> > options). The more divergent user interfaces for the same sort of thing
>> > are, the more brain cycles you force users to spend on navigating said
>> > interfaces.
>>
>> Yeah it sucks, I just think it sucks less than the alternative :)
>> I.e. I'm not picky about --range-diff-* prefix the name, but I think
>> doing our own shell parsing would be nasty.
>
> What prevents you from using `sq_dequote_to_argv()`?

I mean not just nasty in terms of implementation, yeah we could do it,
but also a nasty UX for things like --word-diff-regex. I.e. instead of:

--range-diff-word-diff-regex='[0-9"]'

You need:

--range-diff-opts="--word-diff-regex='[0-9\"]'"

Now admittedly that in itself isn't very painful *in this case*, but in
terms of precedent I really dislike that option, i.e. git having some
mode where I need to work to escape input to pass to another command.

Not saying that this --range-diff-* thing is what we should go for, but
surely we can find some way to do deal with this that doesn't involve
the user needing to escape stuff like this.

It also has other downstream effects in the UI, e.g. it's presumably
easy to teach the bash completion that a --foo=XYZ option is also called
--some-prefix--foo=XYZ and to enable completion for that, less so for
making it smart enough to complete "--some-prefix-opts="--foo=".

>> >> > I only had time to skim the patch, and I have to wonder why you pass
>> >> > around full-blown `rev_info` structs for range diff (and with that 
>> >> > really
>> >> > awful name `rd_rev`) rather than just the `diff_options` that you
>> >> > *actually* care about?
>> >>
>> >> Because setup_revisions() which does all the command-line parsing needs
>> >> a rev_info, so even if we only need the diffopt in the end we need to
>> >> initiate the whole thing.
>> >>
>> >> Suggestions for a better varibale name most welcome.
>> >
>> > `range_diff_revs`
>> >
>> > And you do not need to pass around the whole thing. You can easily pass
>> > `_diff_revs.diffopt`.
>> >
>> > Don't pass around what you do not need to pass around.
>>
>> Ah, you mean internally in log.c, yes that makes sense. I thought you
>> meant just pass diffopt to setup_revisions() (which n

Re: Simple git push --tags deleted all branches

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Mateusz Loskot wrote:

> On Thu, 29 Nov 2018 at 16:03, Ævar Arnfjörð Bjarmason  
> wrote:
>> On Wed, Nov 28 2018, Mateusz Loskot wrote:
>> >
>> > (using git version 2.19.2.windows.1)
>> >
>> > I've just encountered one of those WTH moments.
>> >
>> > I have a bare repository
>> >
>> > core.git (BARE:master) $ git branch
>> >   1.0
>> >   2.0
>> > * master
>> >
>> > core.git (BARE:master) $ git tag
>> > 1.0.1651
>> > 1.0.766
>> > 2.0.1103
>> > 2.0.1200
>> >
>> > I published the repo using: git push --all --follow-tags
>> >
>> > This succeeded, but there seem to be no tags pushed, just branches.
>> > So, I followed with
>> >
>> > core.git (BARE:master) $ git push --tags
>> > To XXX
>> >  - [deleted]   1.0
>> >  - [deleted]   2.0
>> >  ! [remote rejected]   master (refusing to delete the current
>> > branch: refs/heads/master)
>> > error: failed to push some refs to 'XXX'
>> >
>> > And, I've found out that all branches and tags have been
>> > wiped in both, local repo and remote :)
>> >
>> > I restored the repo and tried out
>> >
>> > git push origin 1.0
>> > git push origin --tags
>> >
>> > and this time both succeeded, without wiping out any refs.
>> >
>> > Could anyone help me to understand why remote-less
>> >
>> > git push --tags
>> >
>> > is/was so dangerous and unforgiving?!
>>
>> Since nobody's replied yet, I can't see what's going on here from the
>> info you've provided. My guess is that you have something "mirror" set
>> on the remote.
>
> Thank you for responding.
>
> The git push --tags hugely surprised me, and here is genuine screenshot
> https://twitter.com/mloskot/status/1068072285846859776
>
>> It seems you can't share the repo or its URL, but could you share the
>> scrubbed output of 'git config -l --show-origin' when run inside this
>> repository?
>
> Here is complete output. I have not stripped the basics like aliases,
> just in case.

Right, it's because you used --mirror, the important bit:

> file:config remote.origin.url=https://xxx.com/core-external-metadata.git
> file:config remote.origin.fetch=+refs/*:refs/*
> file:config remote.origin.mirror=true
> file:config

I.e. you have cloned with the --mirror flag, this is what it's supposed
to do: https://git-scm.com/docs/git-clone#git-clone---mirror
https://git-scm.com/docs/git-fetch#git-fetch---prune

I.e. you push and git tries to mirror the refs you have locally to the
remote, including pruning stuff in the remote.

This is useful, but not what you wanted here. It's used for e.g. making
an up-to-date copy of a repo from server A to server B in HA setups
where you'd like to fail over to server B and get the same refs you had
on A.


Re: Git Tags

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Stefanie Leisestreichler wrote:

> Hi.
>
> I have done this (on box A):
>
> git commit -m "Message"
> git tag -a 0.9.0
> git push origin master
>
> In my local repository, when I run "git tag" it is showing me "0.9.0".
>
> Then I did (on box B)
> git clone ssh://user@host:/path/project.git
> cd project
> git tag
>
> Now git tag is showing nothing.
>
> Why is the tag only available in my local repository?
>
> Also when I try to
> git clone --branch 0.9.0 ssh://user@host:/path/project.git
> it tells me: fatal:remote branch not found in upstream repository origin

Because --branch  means get refs/heads/, tags are not
branches. However, because we're apparently quite loose about this in
the clone/fetch code this does give you the tag if it exists, but
probably not in the way you expect.

We interpret the argument as a branch, and will get not only this tag
but "follow" (see --no-tags in git-fetch(1)) the tag as though it were a
branch and give you all tags leading up to that one. This would give you
a single tag:

git clone --no-tags --branch v2.19.0 --single-branch 
https://github.com/git/git.git

But this is a more direct way to do it:

git init git; git -C git fetch --no-tags https://github.com/git/git.git tag 
v2.19.0

Which'll since you said it failed that's because you haven't pushed the
tag. Try 'git ls-remote ' to see if it's there (it's not).


Re: Simple git push --tags deleted all branches

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Mateusz Loskot wrote:

> Hi,
>
> (using git version 2.19.2.windows.1)
>
> I've just encountered one of those WTH moments.
>
> I have a bare repository
>
> core.git (BARE:master) $ git branch
>   1.0
>   2.0
> * master
>
> core.git (BARE:master) $ git tag
> 1.0.1651
> 1.0.766
> 2.0.1103
> 2.0.1200
>
> I published the repo using: git push --all --follow-tags
>
> This succeeded, but there seem to be no tags pushed, just branches.
> So, I followed with
>
> core.git (BARE:master) $ git push --tags
> To XXX
>  - [deleted]   1.0
>  - [deleted]   2.0
>  ! [remote rejected]   master (refusing to delete the current
> branch: refs/heads/master)
> error: failed to push some refs to 'XXX'
>
> And, I've found out that all branches and tags have been
> wiped in both, local repo and remote :)
>
> I restored the repo and tried out
>
> git push origin 1.0
> git push origin --tags
>
> and this time both succeeded, without wiping out any refs.
>
> Could anyone help me to understand why remote-less
>
> git push --tags
>
> is/was so dangerous and unforgiving?!

Since nobody's replied yet, I can't see what's going on here from the
info you've provided. My guess is that you have something "mirror" set
on the remote.

It seems you can't share the repo or its URL, but could you share the
scrubbed output of 'git config -l --show-origin' when run inside this
repository?


Re: [PATCH 2/2] format-patch: allow for independent diff & range-diff options

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Thu, 29 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Nov 29 2018, Johannes Schindelin wrote:
>>
>> > On Wed, 28 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> Change the semantics of the "--range-diff" option so that the regular
>> >> diff options can be provided separately for the range-diff and the
>> >> patch. This allows for supplying e.g. --range-diff-U0 and -U1 to
>> >> "format-patch" to provide different context for the range-diff and the
>> >> patch. This wasn't possible before.
>> >
>> > I really, really dislike the `--range-diff-`. We have
>> > precedent for passing optional arguments that are passed to some other
>> > command, so a much more logical and consistent convention would be to use
>> > `--range-diff[=..]`, allowing all of the diff options that
>> > you might want to pass to the outer diff in one go rather than having a
>> > lengthy string of `--range-diff-this` and `--range-diff-that` options.
>>
>> Where do we pass those sorts of arguments?
>>
>> Reasons I did it this way:
>>
>>  a) Passing it as one option will require the user to double-quote those
>> options that take quoted arguments (e.g. --word-diff-regex), which I
>> thought sucked more than the prefix. On the implementation side we
>> couldn't leave the parsing of the command-line to the shell anymore.
>>
>>  b) I think people will want to tweak this very rarely, much more rarely
>> than e.g. -U10 in format-patch itself, so having something long-ish
>> doesn't sound bad.
>
> Hmm. I still don't like it. It sets a precedent, and we simply do not do
> it that way in other circumstances (most obvious would be the -X merge
> options). The more divergent user interfaces for the same sort of thing
> are, the more brain cycles you force users to spend on navigating said
> interfaces.

Yeah it sucks, I just think it sucks less than the alternative :)
I.e. I'm not picky about --range-diff-* prefix the name, but I think
doing our own shell parsing would be nasty.

>> > I only had time to skim the patch, and I have to wonder why you pass
>> > around full-blown `rev_info` structs for range diff (and with that really
>> > awful name `rd_rev`) rather than just the `diff_options` that you
>> > *actually* care about?
>>
>> Because setup_revisions() which does all the command-line parsing needs
>> a rev_info, so even if we only need the diffopt in the end we need to
>> initiate the whole thing.
>>
>> Suggestions for a better varibale name most welcome.
>
> `range_diff_revs`
>
> And you do not need to pass around the whole thing. You can easily pass
> `_diff_revs.diffopt`.
>
> Don't pass around what you do not need to pass around.

Ah, you mean internally in log.c, yes that makes sense. I thought you
meant just pass diffopt to setup_revisions() (which needs the containing
struct). Willdo.

> Ciao,
> Dscho
>
>>
>> > Ciao,
>> > Dscho
>> >
>> >>
>> >> Ever since the "--range-diff" option was added in
>> >> 31e2617a5f ("format-patch: add --range-diff option to embed diff in
>> >> cover letter", 2018-07-22) the "rev->diffopt" we pass down to the diff
>> >> machinery has been the one we get from "format-patch"'s own
>> >> setup_revisions().
>> >>
>> >> This sort of thing is unique among the log-like commands in
>> >> builtin/log.c, no command than format-patch will embed the output of
>> >> another log-like command. Since the "rev->diffopt" is reused we need
>> >> to munge it before we pass it to show_range_diff(). See
>> >> 43dafc4172 ("format-patch: don't include --stat with --range-diff
>> >> output", 2018-11-22) for a related regression fix which is being
>> >> mostly reverted here.
>> >>
>> >> Implementation notes: 1) We're not bothering with the full teardown
>> >> around die() and will leak memory, but it's too much boilerplate to do
>> >> all the frees with/without the die() and not worth it. 2) We call
>> >> repo_init_revisions() for "rd_rev" even though we could get away with
>> >> a shallow copy like the code we're replacing (and which
>> >> show_range_diff() itself does). This is to make this code more easily
>> >> understood.
>> >>
>> >> Signed-off-by: Ævar Arnfjörð Bja

Re: [PATCH 2/2] format-patch: allow for independent diff & range-diff options

2018-11-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 29 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Wed, 28 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the semantics of the "--range-diff" option so that the regular
>> diff options can be provided separately for the range-diff and the
>> patch. This allows for supplying e.g. --range-diff-U0 and -U1 to
>> "format-patch" to provide different context for the range-diff and the
>> patch. This wasn't possible before.
>
> I really, really dislike the `--range-diff-`. We have
> precedent for passing optional arguments that are passed to some other
> command, so a much more logical and consistent convention would be to use
> `--range-diff[=..]`, allowing all of the diff options that
> you might want to pass to the outer diff in one go rather than having a
> lengthy string of `--range-diff-this` and `--range-diff-that` options.

Where do we pass those sorts of arguments?

Reasons I did it this way:

 a) Passing it as one option will require the user to double-quote those
options that take quoted arguments (e.g. --word-diff-regex), which I
thought sucked more than the prefix. On the implementation side we
couldn't leave the parsing of the command-line to the shell anymore.

 b) I think people will want to tweak this very rarely, much more rarely
than e.g. -U10 in format-patch itself, so having something long-ish
doesn't sound bad.

> I only had time to skim the patch, and I have to wonder why you pass
> around full-blown `rev_info` structs for range diff (and with that really
> awful name `rd_rev`) rather than just the `diff_options` that you
> *actually* care about?

Because setup_revisions() which does all the command-line parsing needs
a rev_info, so even if we only need the diffopt in the end we need to
initiate the whole thing.

Suggestions for a better varibale name most welcome.

> Ciao,
> Dscho
>
>>
>> Ever since the "--range-diff" option was added in
>> 31e2617a5f ("format-patch: add --range-diff option to embed diff in
>> cover letter", 2018-07-22) the "rev->diffopt" we pass down to the diff
>> machinery has been the one we get from "format-patch"'s own
>> setup_revisions().
>>
>> This sort of thing is unique among the log-like commands in
>> builtin/log.c, no command than format-patch will embed the output of
>> another log-like command. Since the "rev->diffopt" is reused we need
>> to munge it before we pass it to show_range_diff(). See
>> 43dafc4172 ("format-patch: don't include --stat with --range-diff
>> output", 2018-11-22) for a related regression fix which is being
>> mostly reverted here.
>>
>> Implementation notes: 1) We're not bothering with the full teardown
>> around die() and will leak memory, but it's too much boilerplate to do
>> all the frees with/without the die() and not worth it. 2) We call
>> repo_init_revisions() for "rd_rev" even though we could get away with
>> a shallow copy like the code we're replacing (and which
>> show_range_diff() itself does). This is to make this code more easily
>> understood.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  Documentation/git-format-patch.txt | 10 ++-
>>  builtin/log.c  | 42 +++---
>>  t/t3206-range-diff.sh  | 41 +
>>  3 files changed, 82 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/git-format-patch.txt 
>> b/Documentation/git-format-patch.txt
>> index aba4c5febe..6c048f415f 100644
>> --- a/Documentation/git-format-patch.txt
>> +++ b/Documentation/git-format-patch.txt
>> @@ -24,7 +24,8 @@ SYNOPSIS
>> [--to=] [--cc=]
>> [--[no-]cover-letter] [--quiet] [--notes[=]]
>> [--interdiff=]
>> -   [--range-diff= [--creation-factor=]]
>> +   [--range-diff= [--creation-factor=]
>> +  [--range-diff]]
>> [--progress]
>> []
>> [  |  ]
>> @@ -257,6 +258,13 @@ feeding the result to `git send-email`.
>>  creation/deletion cost fudge factor. See linkgit:git-range-diff[1])
>>  for details.
>>
>> +--range-diff::
>> +Other options prefixed with `--range-diff` are stripped of
>> +that prefix and passed as-is to the diff machinery used to
>> +generate the range-diff, e.g. `--range-diff-U0` and
>> +`--range-diff--no-color`. This allows for adjusting the format
>> +of the range-diff independently from the patch itself.
>> +
>> 

Re: [PATCH 0/5] Add a new "sparse" tree walk algorithm

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Derrick Stolee via GitGitGadget wrote:

> One of the biggest remaining pain points for users of very large
> repositories is the time it takes to run 'git push'. We inspected some slow
> pushes by our developers and found that the "Enumerating Objects" phase of a
> push was very slow. This is unsurprising, because this is why reachability
> bitmaps exist. However, reachability bitmaps are not available to us because
> of the single pack-file requirement. The bitmap approach is intended for
> servers anyway, and clients have a much different behavior pattern.
>
> Specifically, clients are normally pushing a very small number of objects
> compared to the entire working directory. A typical user changes only a
> small cone of the working directory, so let's use that to our benefit.
>
> Create a new "sparse" mode for 'git pack-objects' that uses the paths that
> introduce new objects to direct our search into the reachable trees. By
> collecting trees at each path, we can then recurse into a path only when
> there are uninteresting and interesting trees at that path. This gains a
> significant performance boost for small topics while presenting a
> possibility of packing extra objects.
>
> The main algorithm change is in patch 4, but is set up a little bit in
> patches 1 and 2.
>
> As demonstrated in the included test script, we see that the existing
> algorithm can send extra objects due to the way we specify the "frontier".
> But we can send even more objects if a user copies objects from one folder
> to another. I say "copy" because a rename would (usually) change the
> original folder and trigger a walk into that path, discovering the objects.
>
> In order to benefit from this approach, the user can opt-in using the
> pack.useSparse config setting. This setting can be overridden using the
> '--no-sparse' option.

This is really interesting. I tested this with:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 124b1bafc4..5c7615f06c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3143 +3143 @@ static void get_object_list(int ac, const char **av)
-   mark_edges_uninteresting(, show_edge, sparse);
+   mark_edges_uninteresting(, show_edge, 1);

To emulate having a GIT_TEST_* mode for this, which seems like a good
idea since it turned up a lot of segfaults in pack-objects. I wasn't
able to get a backtrace for that since it always happens indirectly, and
I didn't dig enough to see how to manually invoke it the right way.


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> What do you think about some patch like that which retains the plumbing
>> behavior for things like read-tree, doesn't introduce "precious" or
>> "trashable", and just makes you specify "[checkout|merge|...] --force"
>> in cases where we'd have clobbering?
>
> Whether you like it or not, don't people's automation use tons of
> invocations of "git merge", "git checkout", etc.?  You'd be breaking
> them by such a change.

I'm so sympathetic to this argument that I tried to convince you of
something like this around a year and a half ago:
https://public-inbox.org/git/cacbzzx59kxpoejiuktzln6zjo_xpiwve7xga6q-53j2lwvf...@mail.gmail.com/
:)

I was probing for what your current stance on this sort of thing is,
because discussions like this tend to get bogged down in the irrelevant
distraction of whether something is plumbing or porcelain, which almost
none of our users care about, and we've effectively stopped caring about
ourselves.

But we must have some viable way to repair warts in the tools, and
losing user data is a *big* wart.

I don't think something like the endgame you've described in
https://public-inbox.org/git/xmqqzhtwuhpc@gitster-ct.c.googlers.com/
is ever going to work. Novice git users (the vast majority) are not
going to diligently update both .gitignore and some .gitattribute
mechanism in lockstep. I'd bet most git users haven't read more than a
few paragraphs of our entire documentation at best.

So what's the way forward? I think ultimately we must move to something
where we effectively version the entire CLI UI similar to stable API
versions. I.e. for things like this that would break some things (or
Duy's new "split checkout") introduce them as flags first, then bundle
up all such flags and cut a major release "Git 3, 4, ...", and
eventually remove old functionality.

> Other than that, if we never had Git before and do not have to worry
> about existing users, I'd think it would be a lot closer to the ideal
> than today's system if "checkout  foo.o" rejected overwriting
> "foo.o" that is not tracked in the current index but matches an ignore
> pattern, and required a "--force" option to overwrite it.
>
> A user, during a conflict resolution, may say "I want this 'git
> checkout foo/' to ignore conflicted paths in that directory, so I
> would give "--force" option to it, but now "--force" also implies
> that I am willing to clobber ignored paths, which means I cannot use
> it".
>
> I would think that a proper automation needs per-path hint from the
> user and/or the project, not just a single-size-fits-all --force
> option, and "unlike all the *.o ignored files that are expendable,
> this vendor-supplied-object.o is not" is one way to give such a
> per-path hint.
>
>> This would give scripts which relied on our stable plumbing consistent
>> behavior, while helping users who're using our main porcelain not to
>> lose data. I could then add a --force option to the likes of read-tree
>> (on by default), so you could get porcelain-like behavior with
>> --no-force.
>
> At that low level, I suspect that a single size fits all "--force"
> would work even less well.

Yeah I don't think the one-size-fits-all way out of this is a single
--force flag.


[PATCH 1/2] format-patch: add test for --range-diff diff output

2018-11-28 Thread Ævar Arnfjörð Bjarmason
As noted in 43dafc4172 ("format-patch: don't include --stat with
--range-diff output", 2018-11-22) the diff options provided on the
command-line currently affect both the range-diff and the patch
output, but there was no test for checking this with output where we'd
show a patch diff. Let's add one.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3206-range-diff.sh | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 90def330bd..bc5facc1cd 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -267,4 +267,64 @@ test_expect_success 'format-patch --range-diff as 
commentary' '
test_i18ngrep "^Range-diff:$" actual
 '
 
+test_expect_success 'format-patch with ' '
+   # No diff options
+   git format-patch --cover-letter --stdout --range-diff=topic~..topic \
+   changed~..changed >actual.raw &&
+   sed -ne "/^1:/,/^--/p" actual.range-diff &&
+   sed -e "s|:$||" >expect <<-\EOF &&
+   1:  a63e992 ! 1:  d966c5c s/12/B/
+   @@ -8,7 +8,7 @@
+@@
+ 9
+ 10
+   - B
+   + BB
+-12
++B
+ 13
+   -- :
+   EOF
+   test_cmp expect actual.range-diff &&
+   sed -ne "/^--- /,/^--/p" actual.diff &&
+   sed -e "s|:$||" >expect <<-\EOF &&
+   --- a/file
+   +++ b/file
+   @@ -9,7 +9,7 @@ A
+9
+10
+BB
+   -12
+   +B
+13
+14
+15
+   -- :
+   EOF
+   test_cmp expect actual.diff &&
+
+   # -U0
+   git format-patch --cover-letter --stdout -U0 \
+   --range-diff=topic~..topic changed~..changed >actual.raw &&
+   sed -ne "/^1:/,/^--/p" actual.range-diff &&
+   sed -e "s|:$||" >expect <<-\EOF &&
+   1:  a63e992 ! 1:  d966c5c s/12/B/
+   @@ -11 +11 @@
+   - B
+   + BB
+   -- :
+   EOF
+   test_cmp expect actual.range-diff &&
+   sed -ne "/^--- /,/^--/p" actual.diff &&
+   sed -e "s|:$||" >expect <<-\EOF &&
+   --- a/file
+   +++ b/file
+   @@ -12 +12 @@ BB
+   -12
+   +B
+   -- :
+   EOF
+   test_cmp expect actual.diff
+'
+
 test_done
-- 
2.20.0.rc1.387.gf8505762e3



[PATCH 0/2] format-patch: fix root cause of recent regression

2018-11-28 Thread Ævar Arnfjörð Bjarmason
As noted in 2/2 this fixes the root cause of the bug I plastered over
in
https://public-inbox.org/git/20181122211248.24546-3-ava...@gmail.com/
(that patch is sitting in 'next').

1/2 is a test for existing behavior, to make it more easily understood
what's being changed.

Junio: I know it's late, but unless Eric has objections to this UI
change I'd really like to have this in 2.20 since this is a change to
a new command-line UI that's newly added in 2.20.

As noted in 2/2 the current implementation is inherently limited, you
can't tweak diff options for the range-diff in the cover-letter and
the patch independently, now you can, and the implementation is much
less nasty now that we're not having to share diffopts across two
different modes of operation.

Ævar Arnfjörð Bjarmason (2):
  format-patch: add test for --range-diff diff output
  format-patch: allow for independent diff & range-diff options

 Documentation/git-format-patch.txt |  10 ++-
 builtin/log.c  |  42 +---
 t/t3206-range-diff.sh  | 101 +
 3 files changed, 142 insertions(+), 11 deletions(-)

-- 
2.20.0.rc1.387.gf8505762e3



[PATCH 2/2] format-patch: allow for independent diff & range-diff options

2018-11-28 Thread Ævar Arnfjörð Bjarmason
Change the semantics of the "--range-diff" option so that the regular
diff options can be provided separately for the range-diff and the
patch. This allows for supplying e.g. --range-diff-U0 and -U1 to
"format-patch" to provide different context for the range-diff and the
patch. This wasn't possible before.

Ever since the "--range-diff" option was added in
31e2617a5f ("format-patch: add --range-diff option to embed diff in
cover letter", 2018-07-22) the "rev->diffopt" we pass down to the diff
machinery has been the one we get from "format-patch"'s own
setup_revisions().

This sort of thing is unique among the log-like commands in
builtin/log.c, no command than format-patch will embed the output of
another log-like command. Since the "rev->diffopt" is reused we need
to munge it before we pass it to show_range_diff(). See
43dafc4172 ("format-patch: don't include --stat with --range-diff
output", 2018-11-22) for a related regression fix which is being
mostly reverted here.

Implementation notes: 1) We're not bothering with the full teardown
around die() and will leak memory, but it's too much boilerplate to do
all the frees with/without the die() and not worth it. 2) We call
repo_init_revisions() for "rd_rev" even though we could get away with
a shallow copy like the code we're replacing (and which
show_range_diff() itself does). This is to make this code more easily
understood.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-format-patch.txt | 10 ++-
 builtin/log.c  | 42 +++---
 t/t3206-range-diff.sh  | 41 +
 3 files changed, 82 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index aba4c5febe..6c048f415f 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -24,7 +24,8 @@ SYNOPSIS
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
   [--interdiff=]
-  [--range-diff= [--creation-factor=]]
+  [--range-diff= [--creation-factor=]
+ [--range-diff]]
   [--progress]
   []
   [  |  ]
@@ -257,6 +258,13 @@ feeding the result to `git send-email`.
creation/deletion cost fudge factor. See linkgit:git-range-diff[1])
for details.
 
+--range-diff::
+   Other options prefixed with `--range-diff` are stripped of
+   that prefix and passed as-is to the diff machinery used to
+   generate the range-diff, e.g. `--range-diff-U0` and
+   `--range-diff--no-color`. This allows for adjusting the format
+   of the range-diff independently from the patch itself.
+
 --notes[=]::
Append the notes (see linkgit:git-notes[1]) for the commit
after the three-dash line.
diff --git a/builtin/log.c b/builtin/log.c
index 02d88fa233..7658e56ecc 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1023,7 +1023,8 @@ static void show_diffstat(struct rev_info *rev,
fprintf(rev->diffopt.file, "\n");
 }
 
-static void make_cover_letter(struct rev_info *rev, int use_stdout,
+static void make_cover_letter(struct rev_info *rev, struct rev_info *rd_rev,
+ int use_stdout,
  struct commit *origin,
  int nr, struct commit **list,
  const char *branch_name,
@@ -1095,13 +1096,9 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
}
 
if (rev->rdiff1) {
-   struct diff_options opts;
-   memcpy(, >diffopt, sizeof(opts));
-   opts.output_format &= ~(DIFF_FORMAT_DIFFSTAT | 
DIFF_FORMAT_SUMMARY);
-
fprintf_ln(rev->diffopt.file, "%s", rev->rdiff_title);
show_range_diff(rev->rdiff1, rev->rdiff2,
-   rev->creation_factor, 1, );
+   rev->creation_factor, 1, _rev->diffopt);
}
 }
 
@@ -1485,6 +1482,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
struct commit *commit;
struct commit **list = NULL;
struct rev_info rev;
+   struct rev_info rd_rev;
struct setup_revision_opt s_r_opt;
int nr = 0, total, i;
int use_stdout = 0;
@@ -1603,6 +1601,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
init_log_defaults();
git_config(git_format_config, NULL);
repo_init_revisions(the_repository, , prefix);
+   repo_init_revisions(the_repository, _rev, prefix);
rev.commit_format = CMIT_FMT_EMAIL;
rev.expand_tabs_in_log_default = 0;
rev.verbose_header = 1;
@@ -1689,8 +1688,32 @@ int cmd_f

Re: [PATCH 1/2] git-reset.txt: render tables correctly under Asciidoctor

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Martin Ågren wrote:

> Asciidoctor removes the indentation of each line in these tables, so the
> last lines of each table have a completely broken alignment.
>
> Similar to 379805051d ("Documentation: render revisions correctly under
> Asciidoctor", 2018-05-06), use an explicit literal block to indicate
> that we want to keep the leading whitespace in the tables.
>
> Because this gives us some extra indentation, we can remove the one that
> we have been carrying explicitly. That is, drop the first six spaces of
> indentation on each line. With Asciidoc (8.6.10), this results in
> identical rendering before and after this commit, both for git-reset.1
> and git-reset.html.
>
> Reported-by: Paweł Samoraj 
> Signed-off-by: Martin Ågren 

Earlier I was trying to get the Documentation/doc-diff script to diff
the asciidoc and asciidoctor docs without much success (hadn't used it
before, just hacking the Makefile to turn on asciidoctor yielded syntax
errors or something).

Is something like that a thing we could make into a regression test?


Re: [PATCH] t5562: skip if NO_CURL is enabled

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 22 2018, Jeff King wrote:

> On Thu, Nov 22, 2018 at 02:17:01AM -0800, Carlo Arenas wrote:
>> PS. upstreaming the PERL_PATH fix is likely to be good to do soonish
>> as I presume at least all BSD might be affected, let me know if you
>> would rather me do that instead as I suspect we might be deadlocked
>> otherwise ;)
>
> Yeah, the $PERL_PATH thing is totally orthogonal, and should graduate
> separately.

On the subject of orthagonal things: This test fails on AIX with /bin/sh
(but not /bin/bash) due to some interaction of ssize_b100dots and the
build_option function. On that system:

$ ./git version --build-options
git version 2.20.0-rc1
cpu: 00FA74164C00
no commit associated with this build
sizeof-long: 4
sizeof-size_t: 4

But it somehow ends up in the 'die' condition in that case statement. I
dug around briefly but couldn't find the cause, probably some limitation
in the shell constructs it supports. Just leaving a note about this...


Re: [PATCH v2] log -G: Ignore binary files

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Thomas Braun wrote:

Looks much better this time around.

> The -G option of log looks for the differences whose patch text
> contains added/removed lines that match regex.
>
> As the concept of patch text only makes sense for text files, we need to
> ignore binary files when searching with -G  as well.
>
> The -S option of log looks for differences that changes
> the number of occurrences of the specified block of text (i.e.
> addition/deletion) in a file. As we want to keep the current behaviour,
> add a test to ensure it.
> [...]
> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index c0a60f3158..059ddd3431 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -242,7 +242,7 @@ textual diff has an added or a deleted line that matches 
> the given
>  regular expression.  This means that it will detect in-file (or what
>  rename-detection considers the same file) moves, which is noise.  The
>  implementation runs diff twice and greps, and this can be quite
> -expensive.
> +expensive.  Binary files without textconv filter are ignored.

Now that we support --text that should be documented. I tried to come up
with something on top:

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..42ae65fb57 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -524,6 +524,10 @@ struct), and want to know the history of that block 
since it first
 came into being: use the feature iteratively to feed the interesting
 block in the preimage back into `-S`, and keep going until you get the
 very first version of the block.
++
+Unlike `-G` the `-S` option will always search through binary files
+without a textconv filter. [[TODO: Don't we want to support --no-text
+then as an optimization?]].

 -G::
Look for differences whose patch text contains added/removed
@@ -545,6 +549,15 @@ occurrences of that string did not change).
 +
 See the 'pickaxe' entry in linkgit:gitdiffcore[7] for more
 information.
++
+Unless `--text` is supplied binary files without a textconv filter
+will be ignored.  This was not the case before Git version 2.21..
++
+With `--text`, instead of patch lines we ::
Look for differences that change the number of occurrences of
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index c0a60f3158..26880b4149 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -251,6 +251,10 @@ criterion in a changeset, the entire changeset is 
kept.  This behavior
 is designed to make reviewing changes in the context of the whole
 changeset easier.

+Both `-S' and `-G' will ignore binary files without a textconv filter
+by default, this can be overriden with `--text`. With `--text` the
+binary patch we look through is generated as [[TODO: ???]].
+
 diffcore-order: For Sorting the Output Based on Filenames
 -

But as you can see given the TODO comments I don't know how this works
exactly. I *could* dig, but that's my main outstanding problem with this
patch, the commit message / docs aren't being updated to reflect the new
behavior.

I.e. let's leave the docs in some state where the reader can as
unambiguously know what to expect with -G and these binary diffs we've
been implicitly supporting as with the textual diffs. Ideally with some
examples of how to generate them (re my question about the base85 output
in v1).

Part of that's obviously behavior we've had all along, but it's much
more convincing to say:

We are changing X which we've done for ages, it works exactly like
this, and here's a switch to get it back.

Instead of:

X doesn't make sense, let's turn it off.

Also the diffcore docs already say stuff about how slow/fast things are,
and in a side-thread you said:

My main motiviation is to speed up "log -G" as that takes a
considerable amount of time when it wades through MBs of binary
files which change often.

Makes sense, but then let's say something about that in that section of
the docs.

>  When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
>  that match their respective criterion are kept in the output.  When
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 69fc55ea1e..4cea086f80 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -154,6 +154,12 @@ static int pickaxe_match(struct diff_filepair *p, struct 
> diff_options *o,
>   if (textconv_one == textconv_two && diff_unmodified_pair(p))
>   return 0;
>
> + if ((o->pickaxe_opts & DIFF_PICKAXE_KIND_G) &&
> + !o->flags.text &&
> + ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
> +  (!textconv_two && 

Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)

2018-11-28 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Johannes Schindelin wrote:

> Hi Jonathan,
>
> On Tue, 27 Nov 2018, Jonathan Nieder wrote:
>
>> At https://bugs.debian.org/914695 is a report of a test regression in
>> an outside project that is very likely to have been triggered by the
>> new faster rebase code.
>
> From looking through that log.gz (without having a clue where the test
> code lives, so I cannot say what it is supposed to do, and also: this is
> the first time I hear about dgit...), it would appear that this must be a
> regression in the reflog messages produced by `git rebase`.
>
>> The issue has not been triaged, so I don't know yet whether it's a
>> problem in rebase-in-c or a manifestation of a bug in the test.
>
> It ends thusly:
>
> -- snip --
> [...]
> + git reflog
> + egrep 'debrebase new-upstream.*checkout'
> + test 1 = 0
> + t-report-failure
> + set +x
> TEST FAILED
> -- snap --
>
> Which makes me think that the reflog we produce in *some* code path that
> originally called `git checkout` differs from the scripted rebase's
> generated reflog.
>
>> That said, Google has been running with the new rebase since ~1 month
>> ago when it became the default, with no issues reported by users.  As a
>> result, I am confident that it can cope with what most users of "next"
>> throw at it, which means that if we are to find more issues to polish it
>> better, it will need all the exposure it can get.
>
> Right. And there are a few weeks before the holidays, which should give me
> time to fix whatever bugs are discovered (I only half mind being the only
> one who fixes these bugs).
>
>> In the Google deployment, we will keep using rebase-in-c even if it
>> gets disabled by default, in order to help with that.
>>
>> From the Debian point of view, it's only a matter of time before
>> rebase-in-c becomes the default: even if it's not the default in 2.20,
>> it would presumably be so in 2.21 or 2.22.  That means the community's
>> attention when resolving security and reliability bugs would be on the
>> rebase-in-c implementation.  As a result, the Debian package will most
>> likely enable rebase-in-c by default even if upstream disables it, in
>> order to increase the package's shelf life (i.e. to ease the
>> maintenance burden of supporting whichever version of the package ends
>> up in the next Debian stable).
>>
>> So with either hat on, it doesn't matter whether you apply this patch
>> upstream.
>>
>> Having two pretty different deployments end up with the same
>> conclusion leads me to suspect that it's best for upstream not to
>> apply the revert patch, unless either
>>
>>   (a) we have a concrete regression to address and then try again, or
>>   (b) we have a test or other plan to follow before trying again.
>
> In this instance, I am more a fan of the "let's move fast and break
> things, then move even faster fixing them" approach.
>
> Besides, the bug that Ævar discovered was a bug already in the scripted
> rebase, but hidden by yet another bug (the missing error checking).
>
> I get the pretty firm impression that the common code paths are now pretty
> robust, and only lesser-exercised features may expose a bug (or
> regression, as in the case of the reflogs, where one could argue that the
> exact reflog message is not something we promise not to fiddle with).

Since I raised this 'should we hold off?' I thought I'd chime in and say
that I'm fine with going along with what you suggest and having the
builtin as the default in the final. IOW not merge
jc/postpone-rebase-in-c down.


Re: Forcing GC to always fail

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 28 2018, Bryan Turner wrote:

> On Tue, Nov 27, 2018 at 3:47 PM Ævar Arnfjörð Bjarmason
>  wrote:
>>
>>
>> On Tue, Nov 27 2018, Bryan Turner wrote:
>>
>> >
>> > Is there anything I can set, perhaps some invalid configuration
>> > option/value, that will make "git gc" (most important) and "git
>> > reflog" (ideal, but less important) fail when they're run in our
>> > repositories? Hopefully at that point customers will reach out to us
>> > for help _before_ they corrupt their repositories.
>>
>> You could fix this and so many other issues by just hanging up a "Is
>> This Good For The Company?" banner up in Atlassian offices .
>
> Not sure I understand what this means, or what your goal was in saying
> it. No one inside Atlassian is running these commands. I'm trying to
> help save administrators from themselves, which reduces real world
> end-user pain that comes from decisions made without fully
> understanding the consequences. It feels like this comment is mocking
> my earnest desire to help, and my genuine question looking for any
> insight people more familiar with the code might be able to offer.
> Perhaps I'm just missing the joke, but if it's an Office Space
> reference it feels like it's in pretty poor taste.

I (mis)read 'administrators' as being other people at Atlassian. Yeah
it's a reference to Office Space. I meaning to poke some fun at the
situation of having to defensively configure tools least co-workers run
them the wrong way, which I'm sure we've all had to do at some point. I
didn't mean any offense by it.

>>
>> But more seriously:
>>
>> $ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' && git 
>> -c gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
>> error: Invalid gc.pruneexpire: 
>> 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
>> fatal: unable to parse 'gc.pruneexpire' from command-line config
>> error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 
>> 'gc.reflogexpire' is not a valid timestamp
>> fatal: unable to parse 'gc.reflogexpire' from command-line config
>
> Thanks for that! It looks like that does block both "git gc" and "git
> reflog expire" without blocking "git pack-refs", "git repack" or "git
> prune". Fantastic! The fact that it shows the invalid value means it
> might also be possible to at least provide a useful hint that manual
> GC is not safe.
>
> I appreciate your help, Ævar.

No problem. I was going to add that you can set
e.g. pack.windowMemory='some.message' to make this go for git-repack
too, but it sounds like you don't want that.

Is there a reason for why BitBucket isn't using 'git-gc'? Some other
hosting providers use it, and if you don't run it with "expire now" or
similarly aggressive non-default values on an active repository it won't
corrupt anything.


Re: Forcing GC to always fail

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 27 2018, Bryan Turner wrote:

> Something of an odd question, but is there something I can do in the
> configuration for a repository that forces any "git gc" run in that
> repository to always fail without doing anything? (Ideally I'd like to
> make "git reflog expire" _also_ fail.)
>
> Background: For Bitbucket Server, we have a fairly recurrent issue
> where administrators decide they know how to manage garbage collection
> for our repositories better than we do, so they jump on the server and
> start running things like this:
>
> git reflog expire --expire=now –all
> git gc --prune=now
> git repack -adf --window=200 --depth=200
>
> They then come running to us with their corrupted repository expecting
> and/or hoping that we can fix it (often without proper backups).
>
> Bitbucket Server itself never runs "git gc" (or "git reflog expire").
> We've configured how reflog expiry should be handled, but of course
> that's overridden by explicit command line options like
> "--expire=now". We _do_ run "git pack-refs", "git repack" and "git
> prune" (with various options), so those commands need to continue to
> work.
>
> Is there anything I can set, perhaps some invalid configuration
> option/value, that will make "git gc" (most important) and "git
> reflog" (ideal, but less important) fail when they're run in our
> repositories? Hopefully at that point customers will reach out to us
> for help _before_ they corrupt their repositories.

You could fix this and so many other issues by just hanging up a "Is
This Good For The Company?" banner up in Atlassian offices .

But more seriously:

$ stahp='Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' && git -c 
gc.pruneExpire=$stahp gc; git -c gc.reflogExpire=$stahp reflog expire
error: Invalid gc.pruneexpire: 
'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc'
fatal: unable to parse 'gc.pruneexpire' from command-line config
error: 'Bryan.Turner.will.hunt.you.down.if.you.manually.run.gc' for 
'gc.reflogexpire' is not a valid timestamp
fatal: unable to parse 'gc.reflogexpire' from command-line config


Re: [PATCH v11 20/22] stash: convert `stash--helper.c` into `stash.c`

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 27 2018, Johannes Schindelin wrote:

> Hi Junio & Paul,
>
> On Mon, 26 Nov 2018, Junio C Hamano wrote:
>
>> Paul-Sebastian Ungureanu  writes:
>>
>> > The old shell script `git-stash.sh`  was removed and replaced
>> > entirely by `builtin/stash.c`. In order to do that, `create` and
>> > `push` were adapted to work without `stash.sh`. For example, before
>> > this commit, `git stash create` called `git stash--helper create
>> > --message "$*"`. If it called `git stash--helper create "$@"`, then
>> > some of these changes wouldn't have been necessary.
>> >
>> > This commit also removes the word `helper` since now stash is
>> > called directly and not by a shell script.
>> >
>> > Signed-off-by: Paul-Sebastian Ungureanu 
>> > ---
>> >  .gitignore   |   1 -
>> >  Makefile |   3 +-
>> >  builtin.h|   2 +-
>> >  builtin/{stash--helper.c => stash.c} | 157 +++
>> >  git-stash.sh | 153 --
>> >  git.c|   2 +-
>> >  6 files changed, 92 insertions(+), 226 deletions(-)
>> >  rename builtin/{stash--helper.c => stash.c} (91%)
>> >  delete mode 100755 git-stash.sh
>>
>> Seeing the recent trouble in "rebase in C" and how keeping the
>> scripted version as "git legacy-rebase" helped us postpone the
>> rewritten version without ripping the whole thing out, I wonder if
>> we can do the same here.
>
> Feel very free to cherry-pick
> https://github.com/git-for-windows/git/commit/004da7e7faa36c872868ae938e06594ea1c2f01c
> and
> https://github.com/git-for-windows/git/commit/cedfcd39f5a4e4beb33e16fa67c4659fd4bdabf6
> which is what we carry in Git for Windows.

...and then something similar to 62c23938fa ("tests: add a special setup
where rebase.useBuiltin is off", 2018-11-14) so those of us who're
smoking next for bugs can test both and report if some of the test
setups (odd OS's etc) show a difference in behavior.

I did some of this the last time around, but then I had to e.g. smoke
next against pu, and look at the general fallout there and see what was
due to stash-in-C, it would be much better to have a
GIT_TEST_STASH_USE_BUILTIN.


>> Also, the remaining two patches should be done _before_ this step, I
>> would think.  I can understand it if the reason you have those two
>> after this step is because you found the opportunity for these
>> improvements after you wrote this step, but in the larger picture
>> seen by the end users of the "stash in C" and those developers who
>> follow the evolution of the code, the logical place for this "now we
>> have everything in C, we retire the scripted version" step to happen
>> is at the very end.
>>
>> Thanks.
>>


Re: Git pull confusing output

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 27 2018, Will wrote:

> On 27 Nov 2018, at 19:24, Stefan Beller wrote:
>
>> The different phases taking each one line takes up precious
>> screen real estate, so another approach would be delete the line
>> after one phase is finished, such that you'd only see the currently
>> active phase (that can be useful for debugging as in "The phase of
>> 'Writing objects' takes very long" -> slow network connection).
>
> I like this idea
>
 Pushing to github.com:williamdclt/some-repo.git… done
 1ca9aaa..4320d30  master -> master
>>>
>>>
>>> I’d be more than happy to work on this (`git push` is an example
>>> amongst so many other), but want the mailing list’s opinion on it. Am
>>> I wrong in thinking that this output is not something users want, am I
>>> fighting windmills or maybe just being ignorant?
>>
>> I think this would be a useful patch, but it could get complicated
>> quickly: push uses other low level git commands to prepare the
>> packfile to be sent to the server, currently it only needs to pipe
>> through the output of the low level command (or even have the
>> low level command directly write to the terminal).
>>
>> The output of those low level commands should not be changed
>> when run on their own, I would assume.
>
> Agreed. I didn’t expect it to be so subtle, but I’ll look into it
> and see if that’s something within my reach.

It's not *quite* the same topic, but a related WIP patch that got
dropped (and it would be great if someone looking at this area could
pick it up) is
https://public-inbox.org/git/20180902085503.ga25...@sigill.intra.peff.net/

I think it's also worth looking into making the progress code able to
emit stuff like:

Counting / Compressing / Writing objects A% (X_1/Y_2) B% (X_2/Y_2) C% 
(X_3/Y_3)

That would allow for splitting up some of these cases where our progress
bars are overly verbose across multiple lines without losing info by
completely erasing the line, and would also support other cases where
sometimes we do this stuff concurrently. See e.g. my recent commit-graph
progress patches for something that would benefit from this type of
thing.

It would need a patch to the progress code to make it able to juggle N
number of progress bars with some format to stitch them all together.


[PATCH 3/5] t/README: re-flow a paragraph

2018-11-27 Thread Ævar Arnfjörð Bjarmason
An earlier change changed this paragraph to make the first line quite
short as to produce a more minimal diff. Let's re-flow it. There's no
changes here if diffed with --word-diff.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/t/README b/t/README
index 3139f4330a..c03b268813 100644
--- a/t/README
+++ b/t/README
@@ -216,11 +216,10 @@ Or tests matching a glob:
 
 $ GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make
 
-The value of the environment variable is a
-SP separated list of patterns that tells which tests to skip,
-and either can match the "t[0-9]{4}" part to skip the whole
-test, or t[0-9]{4} followed by ".$number" to say which
-particular test to skip.
+The value of the environment variable is a SP separated list of
+patterns that tells which tests to skip, and either can match the
+"t[0-9]{4}" part to skip the whole test, or t[0-9]{4} followed by
+".$number" to say which particular test to skip.
 
 For an individual test suite --run could be used to specify that
 only some tests should be run or that some tests should be
-- 
2.20.0.rc1.379.g1dd7ef354c



[PATCH 2/5] t/README: modernize description of GIT_SKIP_TESTS

2018-11-27 Thread Ævar Arnfjörð Bjarmason
When the GIT_SKIP_TESTS documentation was added in
fbd458a3f6 ("t/README: Add 'Skipping Tests' section below 'Running
Tests'", 2008-06-20) there was no way to declare test prerequisites,
that came later in a7bb394037 ("test-lib: Infrastructure to test and
check for prerequisites", 2009-03-01).

The docs were newer updated, and have been saying that you might want
to use GIT_SKIP_TESTS for a use-case which we'd never use them for,
skipping tests because 'unzip' isn't there. For that we'd use the
UNZIP prerequisite added in 552a26c8c0 ("Use prerequisites to skip
tests that need unzip", 2009-03-16). Fix the docs accordingly.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/t/README b/t/README
index b6ec28f634..3139f4330a 100644
--- a/t/README
+++ b/t/README
@@ -202,20 +202,21 @@ GIT_TEST_EXEC_PATH defaults to `$GIT_TEST_INSTALLED/git 
--exec-path`.
 Skipping Tests
 --
 
-In some environments, certain tests have no way of succeeding
-due to platform limitation, such as lack of 'unzip' program, or
-filesystem that do not allow arbitrary sequence of non-NUL bytes
-as pathnames.
+Certain tests may fail intermittently or entirely. These should
+ideally be reported as bugs and fixed, or guarded by a prerequisite
+(see "Using test prerequisites" below). But until then they can be
+skipped.
 
-You should be able to say something like
+To skip tests, set the GIT_SKIP_TESTS variable. Individual tests can
+be skipped:
 
 $ GIT_SKIP_TESTS=t9200.8 sh ./t9200-git-cvsexport-commit.sh
 
-and even:
+Or tests matching a glob:
 
 $ GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make
 
-to omit such tests.  The value of the environment variable is a
+The value of the environment variable is a
 SP separated list of patterns that tells which tests to skip,
 and either can match the "t[0-9]{4}" part to skip the whole
 test, or t[0-9]{4} followed by ".$number" to say which
-- 
2.20.0.rc1.379.g1dd7ef354c



[PATCH 5/5] test-lib: add support for GIT_TODO_TESTS

2018-11-27 Thread Ævar Arnfjörð Bjarmason
As noted in the updated t/README documentation being added here
setting this new GIT_TODO_TESTS variable is usually better than
GIT_SKIP_TESTS.

My use-case for this is to get feedback from the CI infrastructure[1]
about which tests are passing due to fixes that have trickled into
git.git.

With the GIT_SKIP_TESTS variable this use-case is painful, you need to
do an occasional manual run without GIT_SKIP_TESTS set. It's much
better to use GIT_TODO_TESTS and get a report of passing TODO tests
from prove(1) at the bottom of the test output. Once those passing
TODO tests have trickled down to 'master' the relevant glob (set for
all of master/next/pu) can be removed.

As seen in the "GIT_TODO_TESTS mixed failure" test the lack of
interaction with existing tests marked as TODO by the test suite
itself is intentional. There's no need to print out something saying
they matched GIT_TODO_TESTS if they're already TODO tests.

1. https://public-inbox.org/git/875zwm15k2@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 ci/lib-travisci.sh |  2 +-
 t/README   | 18 +--
 t/t-basic.sh   | 81 +++---
 t/test-lib.sh  | 31 +++---
 4 files changed, 112 insertions(+), 20 deletions(-)

diff --git a/ci/lib-travisci.sh b/ci/lib-travisci.sh
index 69dff4d1ec..ad8290bfdb 100755
--- a/ci/lib-travisci.sh
+++ b/ci/lib-travisci.sh
@@ -121,7 +121,7 @@ osx-clang|osx-gcc)
# t9810 occasionally fails on Travis CI OS X
# t9816 occasionally fails with "TAP out of sequence errors" on
# Travis CI OS X
-   export GIT_SKIP_TESTS="t9810 t9816"
+   export GIT_TODO_TESTS="t9810 t9816"
;;
 GIT_TEST_GETTEXT_POISON)
export GIT_TEST_GETTEXT_POISON=YesPlease
diff --git a/t/README b/t/README
index c03b268813..922b4fb3bf 100644
--- a/t/README
+++ b/t/README
@@ -207,8 +207,19 @@ ideally be reported as bugs and fixed, or guarded by a 
prerequisite
 (see "Using test prerequisites" below). But until then they can be
 skipped.
 
-To skip tests, set the GIT_SKIP_TESTS variable. Individual tests can
-be skipped:
+To skip tests, set either the GIT_SKIP_TESTS or GIT_TODO_TESTS
+variables. The difference is that with SKIP the tests won't be run at
+all, whereas they will be run with TODO, but in success or failure
+annotated as a TODO test.
+
+It's usually preferrable to use TODO, since the test suite will report
+those tests that unexpectedly succeed, which may indicate that
+whatever bug broke the test in the past has been fixed, and the test
+should be un-TODO'd. There's no such feedback loop with
+GIT_SKIP_TESTS.
+
+The GIT_SKIP_TESTS and GIT_TODO_TESTS take the same values. Individual
+tests can be skipped:
 
 $ GIT_SKIP_TESTS=t9200.8 sh ./t9200-git-cvsexport-commit.sh
 
@@ -223,7 +234,8 @@ patterns that tells which tests to skip, and either can 
match the
 
 For an individual test suite --run could be used to specify that
 only some tests should be run or that some tests should be
-excluded from a run.
+excluded from a run. The --run option is a shorthand for setting
+a GIT_SKIP_TESTS pattern.
 
 The argument for --run is a list of individual test numbers or
 ranges with an optional negation prefix that define what tests in
diff --git a/t/t-basic.sh b/t/t-basic.sh
index b87a8f18c2..34c9c5c2a3 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -324,9 +324,10 @@ test_expect_success 'test --verbose-only' '
EOF
 '
 
-test_expect_success 'GIT_SKIP_TESTS' "
+test_expect_success 'GIT_{SKIP,TODO}_TESTS' "
(
GIT_SKIP_TESTS='git.2' && export GIT_SKIP_TESTS &&
+   GIT_TODO_TESTS='git.3' && export GIT_TODO_TESTS &&
run_sub_test_lib_test git-skip-tests-basic \
'GIT_SKIP_TESTS' <<-\\EOF &&
for i in 1 2 3
@@ -338,16 +339,17 @@ test_expect_success 'GIT_SKIP_TESTS' "
check_sub_test_lib_test git-skip-tests-basic <<-\\EOF
> ok 1 - passing test #1
> ok 2 # skip passing test #2 (GIT_SKIP_TESTS)
-   > ok 3 - passing test #3
-   > # passed all 3 test(s)
+   > ok 3 - passing test #3 # TODO (GIT_TODO_TESTS)
+   > # passed all 3 test(s) (including 1/0 ok/failing TODO test(s))
> 1..3
EOF
)
 "
 
-test_expect_success 'GIT_SKIP_TESTS several tests' "
+test_expect_success 'GIT_{SKIP,TODO}_TESTS several tests' "
(
GIT_SKIP_TESTS='git.2 git.5' && export GIT_SKIP_TESTS &&
+   GIT_TODO_TESTS='git.3 git.6' && export GIT_TODO_TESTS &&
run_sub_test_lib_test git-skip-tests-several \
'GIT_SKIP_TESTS several tests' <<-\\EOF &&
   

[PATCH 1/5] t/README: make the 'Skipping tests' section less confusing

2018-11-27 Thread Ævar Arnfjörð Bjarmason
I added this section in b5500d16cd ("t/README: Add a section about
skipping tests", 2010-07-02), but apparently didn't notice that there
was an existing "Skipping Tests" section added in
fbd458a3f6 ("t/README: Add 'Skipping Tests' section below 'Running
Tests'", 2008-06-20).

Then since 20873f45e7 ("t/README: Document the do's and don'ts of
tests", 2010-07-02) and 0445e6f0a1 ("test-lib: '--run' to run only
specific tests", 2014-04-30) we've started to refer to "Skipping
tests" or "Skipping Tests" sections in prose elsewhere.

Let's clean up this confusion by renaming the section, and while we're
at it improve the example. Usually we don't use the PERL prerequisite,
and we should accurately describe why you'd want to use prerequisites,
as opposed to GIT_SKIP_TESTS. So let's say "Tests that depend[...]"
instead of "If you need to skip tests[...]" here.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/t/README b/t/README
index 28711cc508..b6ec28f634 100644
--- a/t/README
+++ b/t/README
@@ -494,7 +494,7 @@ And here are the "don'ts:"
 
The harness will catch this as a programming error of the test.
Use test_done instead if you need to stop the tests early (see
-   "Skipping tests" below).
+   "Using test prerequisites" below).
 
  - Don't use '! git cmd' when you want to make sure the git command
exits with failure in a controlled way by calling "die()".  Instead,
@@ -587,28 +587,28 @@ And here are the "don'ts:"
it'll complain if anything is amiss.
 
 
-Skipping tests
---
+Using test prerequisites
+
 
-If you need to skip tests you should do so by using the three-arg form
-of the test_* functions (see the "Test harness library" section
-below), e.g.:
+Tests that depend on something which may not be present on the system
+should use the three-arg form of the test_* functions (see the "Test
+harness library" section below), e.g.:
 
-test_expect_success PERL 'I need Perl' '
-perl -e "hlagh() if unf_unf()"
+test_expect_success SYMLINKS 'setup' '
+ln -s target link
 '
 
 The advantage of skipping tests like this is that platforms that don't
-have the PERL and other optional dependencies get an indication of how
+have the SYMLINKS and other optional dependencies get an indication of how
 many tests they're missing.
 
 If the test code is too hairy for that (i.e. does a lot of setup work
 outside test assertions) you can also skip all remaining tests by
 setting skip_all and immediately call test_done:
 
-   if ! test_have_prereq PERL
+   if ! test_have_prereq SYMLINKS
then
-   skip_all='skipping perl interface tests, perl not available'
+   skip_all="skipping symlink tests, the filesystem doesn't support it"
test_done
fi
 
-- 
2.20.0.rc1.379.g1dd7ef354c



[PATCH 4/5] test-lib: add more exhaustive GIT_SKIP_TESTS testing

2018-11-27 Thread Ævar Arnfjörð Bjarmason
Add a test for the when GIT_SKIP_TESTS is used to skip entire test
files. Support for this was added back in 04ece59399 ("GIT_SKIP_TESTS:
allow users to omit tests that are known to break", 2006-12-28), but
never tested for.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t-basic.sh | 17 +
 1 file changed, 17 insertions(+)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index b6566003dd..b87a8f18c2 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -393,6 +393,23 @@ test_expect_success 'GIT_SKIP_TESTS sh pattern' "
)
 "
 
+test_expect_success 'GIT_SKIP_TESTS entire file' "
+   (
+   GIT_SKIP_TESTS='git' && export GIT_SKIP_TESTS &&
+   run_sub_test_lib_test git-skip-tests-entire-file \
+   'GIT_SKIP_TESTS' <<-\\EOF &&
+   for i in 1 2 3
+   do
+   test_expect_success \"passing test #\$i\" 'true'
+   done
+   test_done
+   EOF
+   check_sub_test_lib_test git-skip-tests-entire-file <<-\\EOF
+   1..0 # SKIP skip all tests in git
+   EOF
+   )
+"
+
 test_expect_success '--run basic' "
run_sub_test_lib_test run-basic \
'--run basic' --run='1 3 5' <<-\\EOF &&
-- 
2.20.0.rc1.379.g1dd7ef354c



[PATCH] advice: don't pointlessly suggest --convert-graft-file

2018-11-27 Thread Ævar Arnfjörð Bjarmason
The advice to run 'git replace --convert-graft-file' added in
f9f99b3f7d ("Deprecate support for .git/info/grafts", 2018-04-29)
didn't add an exception for the 'git replace --convert-graft-file'
codepath itself.

As a result we'd suggest running --convert-graft-file while the user
was running --convert-graft-file, which makes no sense. Before:

$ git replace --convert-graft-file
hint: Support for /info/grafts is deprecated
hint: and will be removed in a future Git version.
hint:
hint: Please use "git replace --convert-graft-file"
hint: to convert the grafts into replace refs.
hint:
hint: Turn this message off by running
hint: "git config advice.graftFileDeprecated false"

Add a check for that case and skip printing the advice while the user
is busy following our advice.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/replace.c  | 1 +
 t/t6050-replace.sh | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index a58b9c6d13..affcdfb416 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -495,6 +495,7 @@ static int convert_graft_file(int force)
if (!fp)
return -1;
 
+   advice_graft_file_deprecated = 0;
while (strbuf_getline(, fp) != EOF) {
if (*buf.buf == '#')
continue;
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index 86374a9c52..5d6d3184ac 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -461,7 +461,10 @@ test_expect_success '--convert-graft-file' '
printf "%s\n%s %s\n\n# comment\n%s\n" \
$(git rev-parse HEAD^^ HEAD^ HEAD^^ HEAD^2) \
>.git/info/grafts &&
-   git replace --convert-graft-file &&
+   git status 2>stderr &&
+   test_i18ngrep "hint:.*grafts is deprecated" stderr &&
+   git replace --convert-graft-file 2>stderr &&
+   test_i18ngrep ! "hint:.*grafts is deprecated" stderr &&
test_path_is_missing .git/info/grafts &&
 
: verify that the history is now "grafted" &&
-- 
2.20.0.rc1.379.g1dd7ef354c



Re: [PATCH] tests: avoid syntax triggering old dash bug

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 27 2018, Eric Sunshine wrote:

> On Tue, Nov 27, 2018 at 11:43 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> Avoid a bug in dash that's been fixed ever since its
>> ec2c84d ("[PARSER] Fix clobbering of checkkwd", 2011-03-15)[1] first
>> released with dash v0.5.7 in July 2011.
>
> Perhaps enhance the commit message to explain the nature of the bug
> itself. It is not at all obvious from reading the above or from
> looking at the diff itself what the actual problem is that the patch
> is fixing. (And it wasn't even immediately obvious by looking at the
> commit message of ec2c84d in the dash repository.) To help readers of
> this patch avoid re-introducing this problem or diagnose such a
> failure, it might be a good idea to give an example of the syntax
> which trips up old dash (i.e. a here-doc followed immediately by a
> {...} expression) and the actual error message 'Syntax error: "}"
> unexpected'.

I haven't taken the time to understand the bug either. Our entire test
suite had one instance of this, so I think it's obscure enough that it's
fine to just fix it as a one-off and not spend any more time on making
sure it doesn't happen again or add some lint for detecting it.

>> This fixes 1/2 tests failing on Debian Lenny & Squeeze. The other
>> failure is due to 1b42f45255 ("git-svn: apply "svn.pathnameencoding"
>> before URL encoding", 2016-02-09).
>>
>> This particular test has been failing since 5f9674243d ("config: add
>> --expiry-date", 2017-11-18).
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 


[PATCH] tests: avoid syntax triggering old dash bug

2018-11-27 Thread Ævar Arnfjörð Bjarmason
Avoid a bug in dash that's been fixed ever since its
ec2c84d ("[PARSER] Fix clobbering of checkkwd", 2011-03-15)[1] first
released with dash v0.5.7 in July 2011.

This fixes 1/2 tests failing on Debian Lenny & Squeeze. The other
failure is due to 1b42f45255 ("git-svn: apply "svn.pathnameencoding"
before URL encoding", 2016-02-09).

This particular test has been failing since 5f9674243d ("config: add
--expiry-date", 2017-11-18).

1. https://git.kernel.org/pub/scm/utils/dash/dash.git/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t1300-config.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9652b241c7..7690b518b8 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -892,8 +892,9 @@ test_expect_success 'get --expiry-date' '
1510348087
0
EOF
+   date_valid1=$(git config --expiry-date date.valid1) &&
{
-   echo "$rel_out $(git config --expiry-date date.valid1)"
+   echo "$rel_out $date_valid1"
git config --expiry-date date.valid2 &&
git config --expiry-date date.valid3 &&
git config --expiry-date date.valid4 &&
-- 
2.20.0.rc1.379.g1dd7ef354c



Re: [RFC PATCH] Introduce "precious" file concept

2018-11-27 Thread Ævar Arnfjörð Bjarmason


On Mon, Nov 26 2018, Junio C Hamano wrote:

> Per Lundberg  writes:
>
>> How about something like this:
>> ...
>> Would this be a reasonable compromise for everybody?
>
> I do not think you'd need to introduce such a deliberately breaking
> change at all.  Just introduce a new "precious" class, perhaps mark
> them with the atttribute mechanism, and that would be the endgame.
> Early adopters would start marking ignored but not expendable paths
> with the "precious" attribute and they won't be clobbered.  As the
> mechanism becomes widely known and mature, more and more people use
> it.  And even after that happens, early adopters do not have to change
> any attribute setting, and late adopters would have plenty of examples
> to imitate.  Those who do not need any "precious" class do not have
> to do anything and you won't break any existing automation that way.

The patch I submitted in <87zhuf3gs0@evledraar.gmail.com>[1] changed
the behavior for read-tree & checkout & merge etc.

It was an RFC more in the spirit of showing what in our current tests
had to change to spur some discussion.

But I'm very sympathetic to this line of argument. I.e. in my patch I'm
changing the semantics of read-tree, which is plumbing.

What do you think about some patch like that which retains the plumbing
behavior for things like read-tree, doesn't introduce "precious" or
"trashable", and just makes you specify "[checkout|merge|...] --force"
in cases where we'd have clobbering?

This would give scripts which relied on our stable plumbing consistent
behavior, while helping users who're using our main porcelain not to
lose data. I could then add a --force option to the likes of read-tree
(on by default), so you could get porcelain-like behavior with
--no-force.

1. https://public-inbox.org/git/87zhuf3gs0@evledraar.gmail.com/


Re: [RFC] Introduce two new commands, switch-branch and restore-paths

2018-11-26 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 20 2018, Duy Nguyen wrote:

> On Mon, Nov 19, 2018 at 04:19:53PM +0100, Duy Nguyen wrote:
>> I promise to come back with something better (at least it still
>> sounds better in my mind). If that idea does not work out, we can
>> come back and see if we can improve this.
>
> So this is it. The patch isn't pretty, mostly as a proof of
> concept. Just look at the three functions at the bottom of checkout.c,
> which is the main thing.
>
> This patch tries to split "git checkout" command in two new ones:
>
> - git switch-branch is all about switching branches

Isn't this going to also take the other ref arguments 'git checkout'
takes now? I.e. tags, detached HEADs etc? I'm reminded of the discussion
about what "range-diff" should be called :)

> - git restore-paths (maybe restore-file is better) for checking out
>   paths

If it takes globs/dirs then a plural is probably better.

> The main idea is these two commands will co-exist with the good old
> 'git checkout', which will NOT be deprecated. Old timers will still
> use "git checkout". But new people should be introduced to the new two
> instead. And the new ones are just as capable as "git checkout".
>
> Since the three commands will co-exist (with duplicate functionality),
> maintenance cost must be kept to minimum. The way I did this is simply
> split the command line options into three pieces: common,
> switch-branch and checkout-paths. "git checkout" has all three, the
> other two have common and another piece.
>
> With this, a new option added to git checkout will be automatically
> available in either switch-branch or checkout-paths. Bug fixes apply
> to all relevant commands.
>
> Later on, we could start to add a bit more stuff in, e.g. some form of
> disambiguation is no longer needed when running as switch-branch, or
> restore-paths.
>
> So, what do you think?

That "git checkout" does too many things is something that keeps coming
up in online discussions about Git's UI. Two things:

a) It would really help to have some comparison of cases where these
   split commands are much clearer or less ambiguous than
   git-checkout. I can think of some (e.g. branch with the same name as
   a file) but having some overall picture of what the new UI looks like
   with solved / not solved cases would be nice. Also a comparison with
   other SCMs people find less confusing (svn, hg, bzr, ...)

b) I think we really need to have some end-game where we'd actually
   switch away from "checkout" (which we could still auto-route to new
   commands in perpetuity, but print a warning or error). Otherwise
   we'll just end up with https://xkcd.com/927/ and more UI confusion
   for all.


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-26 Thread Ævar Arnfjörð Bjarmason


On Mon, Nov 26 2018, Duy Nguyen wrote:

> On Mon, Nov 26, 2018 at 4:34 PM Ævar Arnfjörð Bjarmason
>  wrote:
>>
>>
>> On Mon, Nov 26 2018, Duy Nguyen wrote:
>>
>> > On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg  
>> > wrote:
>> >>
>> >> On 11/13/18 1:22 AM, brian m. carlson wrote:
>> >> > This is going to totally hose automation.  My last job had files which
>> >> > might move from tracked to untracked (a file that had become generated),
>> >> > and long-running CI and build systems would need to be able to check out
>> >> > one status and switch to the other.  Your proposed change will prevent
>> >> > those systems from working, whereas they previously did.
>> >> >
>> >> > I agree that your proposal would have been a better design originally,
>> >> > but breaking the way automated systems currently work is probably going
>> >> > to be a dealbreaker.
>> >>
>> >> How about something like this:
>> >>
>> >> 1. Introduce a concept with "garbage" files, which git is "permitted to
>> >> delete" without prompting.
>> >>
>> >> 2. Retain the current default, i.e. "ignored files are garbage" for now,
>> >> making the new behavior _opt in_ to avoid breaking automated
>> >> systems/existing scripts for anyone. Put the setting for this behind a
>> >> new core.* config flag.
>> >>
>> >> 3. In the plan for version 3.0 (a new major version where some breakage
>> >> can be tolerable, according to Semantic Versioning), change the default
>> >> so that "only explicit garbage is garbage". Include very clear notices
>> >> of this in the release notes. The config flag is retained, but its
>> >> default changes from true->false or vice versa. People who dislike the
>> >> new behavior can easily change back to the 2.x semantics.
>> >
>> > How does this garbage thing interact with "git clean -x"? My
>> > interpretation of this flag/attribute is that at version 3.0 by
>> > default all ignored files are _not_ garbage, so "git clean -x" should
>> > not remove any of them. Which is weird because most of ignored files
>> > are like *.o that should be removed.
>> >
>> > I also need to mark "precious" on untracked or even tracked files (*).
>> > Not sure how this "garbage" attribute interacts with that.
>> >
>> > (*) I was hoping I could get the idea [1] implemented in somewhat good
>> > shape before presenting here. But I'm a bit slow on that front. So
>> > yeah this "precious" on untracked/tracked thingy may be even
>> > irrelevant if the patch series will be rejected.
>>
>> I think a garbage (or trashable) flag, if implemented, wouldn't need any
>> special case in git-clean, i.e. -x would remove all untracked files,
>> whether ignored or garbage/trashable. That's what my patch to implement
>> it does:
>> https://public-inbox.org/git/87zhuf3gs0@evledraar.gmail.com/
>>
>> I think that makes sense. Users running "git clean" have "--dry-run" and
>> unlike "checkout a branch" or "merge this commit" where we'll now shred
>> data implicitly it's obvious that git-clean is going to shred your data.
>
> Then that't not what I want. If I'm going to mark to keep "config.mak"
> around, I'm not going to carefully move it away before doing "git
> clean -fdx" then move it back. No "git clean --dry-run" telling me to
> make a backup of config.mak is no good.

Understood. I mean this in the context of solving the problem users have
with running otherwise non-data-destroying commands like "checkout" and
"merge" and getting their data destroyed, which is overwhelmingly why
this topic gets resurrected.

Some of the solutions overlap with this thing you want, but I think it's
worth keeping the distinction between the two in mind. I.e. I can
imagine us finding some acceptable solution to the data shredding
problem that doesn't implement this mode for "git-clean", or the other
way around.


Re: [RFC PATCH 0/7] test: NetBSD support

2018-11-26 Thread Ævar Arnfjörð Bjarmason


On Sun, Nov 25 2018, Carlo Marcelo Arenas Belón wrote:

> Likely still missing changes as it only completes a run with a minimal
> number of dependencies but open for feedback
>
> Requires pkgsrc packages for gmake, perl, bash and curl and completes a run
>
>   $ gmake SHELL_PATH=/usr/pkg/bin/bash NO_PYTHON=1 CURL_DIR=/usr/pkg test

This looks good to me, and fixes most of the issues I've encountered on
NetBSD recently. See
https://gitlab.com/git-vcs/git-gitlab-ci/blob/9337ed6f99e3780d1d8dc087dff688f5a68805a4/ci/gitlab/run-on-gcc-farm.sh#L228-239
and https://gitlab.com/git-vcs/git-ci/-/jobs/125476939


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-26 Thread Ævar Arnfjörð Bjarmason


On Mon, Nov 26 2018, Duy Nguyen wrote:

> On Mon, Nov 26, 2018 at 10:30 AM Per Lundberg  wrote:
>>
>> On 11/13/18 1:22 AM, brian m. carlson wrote:
>> > This is going to totally hose automation.  My last job had files which
>> > might move from tracked to untracked (a file that had become generated),
>> > and long-running CI and build systems would need to be able to check out
>> > one status and switch to the other.  Your proposed change will prevent
>> > those systems from working, whereas they previously did.
>> >
>> > I agree that your proposal would have been a better design originally,
>> > but breaking the way automated systems currently work is probably going
>> > to be a dealbreaker.
>>
>> How about something like this:
>>
>> 1. Introduce a concept with "garbage" files, which git is "permitted to
>> delete" without prompting.
>>
>> 2. Retain the current default, i.e. "ignored files are garbage" for now,
>> making the new behavior _opt in_ to avoid breaking automated
>> systems/existing scripts for anyone. Put the setting for this behind a
>> new core.* config flag.
>>
>> 3. In the plan for version 3.0 (a new major version where some breakage
>> can be tolerable, according to Semantic Versioning), change the default
>> so that "only explicit garbage is garbage". Include very clear notices
>> of this in the release notes. The config flag is retained, but its
>> default changes from true->false or vice versa. People who dislike the
>> new behavior can easily change back to the 2.x semantics.
>
> How does this garbage thing interact with "git clean -x"? My
> interpretation of this flag/attribute is that at version 3.0 by
> default all ignored files are _not_ garbage, so "git clean -x" should
> not remove any of them. Which is weird because most of ignored files
> are like *.o that should be removed.
>
> I also need to mark "precious" on untracked or even tracked files (*).
> Not sure how this "garbage" attribute interacts with that.
>
> (*) I was hoping I could get the idea [1] implemented in somewhat good
> shape before presenting here. But I'm a bit slow on that front. So
> yeah this "precious" on untracked/tracked thingy may be even
> irrelevant if the patch series will be rejected.

I think a garbage (or trashable) flag, if implemented, wouldn't need any
special case in git-clean, i.e. -x would remove all untracked files,
whether ignored or garbage/trashable. That's what my patch to implement
it does:
https://public-inbox.org/git/87zhuf3gs0@evledraar.gmail.com/

I think that makes sense. Users running "git clean" have "--dry-run" and
unlike "checkout a branch" or "merge this commit" where we'll now shred
data implicitly it's obvious that git-clean is going to shred your data.


Re: [RFC PATCH] Introduce "precious" file concept

2018-11-26 Thread Ævar Arnfjörð Bjarmason


On Mon, Nov 26 2018, Per Lundberg wrote:

> On 11/13/18 1:22 AM, brian m. carlson wrote:
>> This is going to totally hose automation.  My last job had files which
>> might move from tracked to untracked (a file that had become generated),
>> and long-running CI and build systems would need to be able to check out
>> one status and switch to the other.  Your proposed change will prevent
>> those systems from working, whereas they previously did.
>>
>> I agree that your proposal would have been a better design originally,
>> but breaking the way automated systems currently work is probably going
>> to be a dealbreaker.
>
> How about something like this:
>
> 1. Introduce a concept with "garbage" files, which git is "permitted to
> delete" without prompting.
>
> 2. Retain the current default, i.e. "ignored files are garbage" for now,
> making the new behavior _opt in_ to avoid breaking automated
> systems/existing scripts for anyone. Put the setting for this behind a
> new core.* config flag.
>
> 3. In the plan for version 3.0 (a new major version where some breakage
> can be tolerable, according to Semantic Versioning), change the default
> so that "only explicit garbage is garbage". Include very clear notices
> of this in the release notes. The config flag is retained, but its
> default changes from true->false or vice versa. People who dislike the
> new behavior can easily change back to the 2.x semantics.
>
> Would this be a reasonable compromise for everybody?

Possibly, but I think there's an earlier step zero there for anyone
interested in pursuing this (and currently I can't make time for it),
which is to submit a patch with tests and documentation showing exactly
the sort of scenarios where we clobber or don't clobber existing files.

As my https://public-inbox.org/git/87zhuf3gs0@evledraar.gmail.com/
shows we have tests for this, but they're not explicit, and some want to
test some unrelated thing.

I.e. to test the cases where we clobber foo.c because foo.c now
explicitly exists, or cases where dir/foo.c is clobbered because "dir"
is now a tracked text file etc., are those the only two cases? I vaguely
suspect that there were other interesting cases, but at this point the
information has been paged out of the working set of my wetware. The
thread at
https://public-inbox.org/git/87o9au39s7@evledraar.gmail.com/ has
some notes about this.

Then as noted in
https://public-inbox.org/git/87wopj3661@evledraar.gmail.com/ the
reason we have this behavior seems to be something that grew organically
from a semi-related bugfix.

So I don't think we're at a point where we're all dug into our trenches
and some people want X and others want Y and in the name of backwards
compatibility we're going to stay with X. It may turn out that we just
want to retain 10% of X, and can get 99% of the safety of Y by doing
that.


Re: What's cooking in git.git (Sep 2018, #01; Tue, 4)

2018-11-25 Thread Ævar Arnfjörð Bjarmason


On Sun, Nov 25 2018, Torsten Bögershausen wrote:

> On Sun, Nov 25, 2018 at 05:28:35AM +0100, Torsten Bögershausen wrote:
>> On Sat, Nov 24, 2018 at 08:33:37PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >
>> > On Wed, Sep 05 2018, Ævar Arnfjörð Bjarmason wrote:
>> >
>> > > On Wed, Sep 05 2018, Eric Sunshine wrote:
>>
>> []
>>
>> > > SunCC used to be ahead of GCC & Clang when it came to certain classes of
>> > > warnings, but e.g. now everything it complains about is because it
>> > > doesn't understand C as well, e.g. we have quite a few compile warnings
>> > > due to code like this, which it claims is unreachable (but isn't):
>> > > https://github.com/git/git/blob/v2.19.0-rc2/read-cache.c#L950-L955
>> >
>>
>> Wait a second - even if the compiler claims something (wrong)...
>> there a still 1+1/2 questions from my side:
>>
>>
>> int verify_path(const char *path, unsigned mode)
>> {
>>  char c;
>>   ^
>>  /* Q1: should  "c" be initialized like this: */
>>  char c = *path;
>>
>>  if (has_dos_drive_prefix(path))
>>  return 0;
>>
>>  goto inside;
>>   /* Q2: and why do we need the "goto" here ? */
>>  for (;;) {
>>  if (!c)
>>  return 1;
>>  if (is_dir_sep(c)) {
>> inside:
>
> After some re-reading,
> I think that the "goto inside" was just hard to read
>
> Out of interest:
> would the following make the compiler happy ?
>
>
> diff --git a/read-cache.c b/read-cache.c
> index 49add63fe1..d574d58b9d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -951,17 +951,15 @@ static int verify_dotfile(const char *rest, unsigned 
> mode)
>
>  int verify_path(const char *path, unsigned mode)
>  {
> - char c;
> + char c = *path ? '/' : '\0';
>
>   if (has_dos_drive_prefix(path))
>   return 0;
>
> - goto inside;
>   for (;;) {
>   if (!c)
>   return 1;
>   if (is_dir_sep(c)) {
> -inside:
>   if (protect_hfs) {
>   if (is_hfs_dotgit(path))
>   return 0;

I haven't tested (it's tedious) but yes, I can tell you SunCC won't
whine about this. I've only seen its unreachability detector get
confused about goto inside a loop, not the the sort of code you've
replaced this with.

We should not be appeasing these old compiler warnings in cases where
they're wrong. You can check out the CI output I linked to to see 10-20
cases in the codebase where SunCC is wrong about unreliability.

Whether we should just fix this for its own sake is another question.


Re: How to efficiently backup a bare repository?

2018-11-24 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 23 2018, Guilhem Bonnefille wrote:

> I'm managing many bare repositories for development teams.
>
> One service we want to offer is to let developers retrieve old state
> of the repository up to 30 days. For example, one developer
> (accidently) removed (push -f) a branch/tag and realize few days later
> (after vacations) that it was an error.
>
> What is the best approach to do this?
>
> Currently, we use a classical approach, backuping all the repo every
> day. But this is far from efficient as:
> - we accumulate 30th copies of the repository
> - due to packing logic of Git, even if the content is mostly similar,
> from one backup to another, there is no way to deduplicate.
>
> Is there any tricks based on reflog? Even for deleted refs (branch/tags)?
> Is there any tooling playing with the internal of git to offer such
> feature, like copying all refs in a timestamped refs directory to
> retain objects?
>
> Thanks in advance for any tips letting improve the backup.

There's no easy out of the box way to do exactly what you've
described. A few things come to mind:

a) If you can simply deny non-fast-forwards that's ideal. E.g. for some
   branches you care about, or tags. This is how most of us deal with
   this issue in practice. I.e. have some "blessed" refs that matter,
   and if someone rewinds their own topic branch that's their own
   problem.

b) You could as you touched upon have a post-receive hook that detects
   non-fast-forwards, and e.g. pushes a clobberd "master" or "v1.0" to
   some backup repo's 2018-11-24-23-39-04-master or whatever. Then users
   could grab old versions of refs from that repo. I do a similar thing
   at work to archive certain refs (old tags), but without renaming
   them.

   The advantage is that you get all refs ever, the disadvantage is that
   you're not going to get a copy of the repo as it was N days ago,
   it'll need to be manually pieced together.

c) Git could be made block-level de-duplication friendly. I was planning
   to work on it, but it's a small enough itch that I didn't care, but
   initial results look promising:
   https://public-inbox.org/git/20180125002942.ga21...@sigill.intra.peff.net/

d) Note that if you're e.g. rsyncing repos that are actively being
   pushed into you're likely to sometimes end up with corrupt repos
   unless you're very careful about what you grab and in what
   order. Best to backup repos with "git fetch".

e) If you're burned by one-off cases like this dev going away for 30
   days you could bump the default expiry that comes with git from 2
   weeks to e.g. 6 weeks. It's still a manual process to recover data
   (with fsck etc), but at least it's there.


GCC Compile farm (Linux, Solaris, AIX etc.) testing of git.git

2018-11-24 Thread Ævar Arnfjörð Bjarmason
I've had access to the GCC Compile Farm for testing on various
architectures for a while. Around the 2.19.0 release I submitted some
patches / bug reports found there.

I've now improved this to run it via GitLab CI & made the output
accessible. Outline of how this works at
https://gitlab.com/git-vcs/git-gitlab-ci/commit/497805b18f

https://gitlab.com/git-vcs/git-ci/branches is a repo that houses a merge
of {master,next,pu} from git.git and that git-gitlab-ci.git repo.

There's still plenty of rough edges to it, but
https://gitlab.com/git-vcs/git-ci/-/jobs already has some useful
results.

Right now this is all pushed out manually. But I'm planning to get it to
a point where soon after Junio pushes changes out a run will kick off on
all these platforms.

The machines it's running on & their build / test configuration is also
something I set up as a one-off.  The full list of available machines is
at https://cfarm.tetaneutral.net/machines/list/ and I picked some subset
that looked interesting (outlier platforms) at
https://gitlab.com/git-vcs/git-gitlab-ci/blob/master/.gitlab-ci.yml


Re: [ANNOUNCE] Git v2.20.0-rc1

2018-11-24 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 21 2018, Junio C Hamano wrote:

>  * "git rebase" and "git rebase -i" have been reimplemented in C.

Here's another regression in the C version (and rc1), note: the
sha1collisiondetection is just a stand in for "some repo":

(
rm -rf /tmp/repo &&
git init /tmp/repo &&
cd /tmp/repo &&
for c in 1 2
do
touch $c &&
git add $c &&
git commit -m"add $c"
done &&
git remote add origin 
https://github.com/cr-marcstevens/sha1collisiondetection.git &&
git fetch &&
git branch --set-upstream-to origin/master &&
git rebase -i
)

The C version will die with "fatal: unable to read tree
". Running this with
rebase.useBuiltin=false does the right thing and rebases as of the merge
base of the two (which here is the root of the history).

I wasn't trying to stress test rebase. I was just wanting to rebase a
history I was about to force-push after cleaning it up, hardly an
obscure use-case. So [repeat last transmission in
https://public-inbox.org/git/87y39w1wc2@evledraar.gmail.com/ ]


Re: What's cooking in git.git (Sep 2018, #01; Tue, 4)

2018-11-24 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 05 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 05 2018, Eric Sunshine wrote:
>
>> On Wed, Sep 5, 2018 at 4:29 AM Ævar Arnfjörð Bjarmason  
>> wrote:
>>> I recently gained access to a Solaris 10 SPARC (5.10) box and discovered
>>> that the chainlint.sed implementation in 2.19.0 has more sed portability
>>> issues.
>>>
>>> First, whoever implemented the /bin/sed on Solaris apparently read the
>>> POSIX requirements for a max length label of 8 to mean that 8 characters
>>> should include the colon, so a bunch of things fail because of that, but
>>> are fixed with a shorter 7 character label.
>>
>> I'm pretty sure that Solaris 'sed' predates POSIX by a good bit, but
>> that's neither here nor there.
>>
>>> Then GIT_TEST_CHAIN_LINT=1 still fails because 878f988350 ("t/test-lib:
>>> teach --chain-lint to detect broken &&-chains in subshells", 2018-07-11)
>>> added a "grep -q" invocation. The /bin/grep on that version of Solaris
>>> doesn't have -q.
>>
>> I knew that '-q' was potentially problematic on some platforms, so I
>> checked and saw that existing tests were already using it, thus went
>> ahead and used it. Dropping '-q' here and redirecting stderr to
>> /dev/null is a perfectly fine alternative.
>>
>>> We fixed a similar issue way back in 80700fde91
>>> ("t/t1304: make a second colon optional in the mask ACL check",
>>> 2010-03-15) by redirecting to /dev/null instead.
>>>
>>> A bunch of other tests in the test suite rely on "grep -q", but nothing
>>> as central as chainlint, so it makes everything break. Do we want to
>>> away with "grep -q" entirely because of old Solaris /bin/grep?
>>
>> I count 132 instances in existing tests (though, I may have missed some).
>>
>>> At this point those familiar with Solaris are screaming ("why are you
>>> using anything in /bin!"). Okey fine, but it mostly worked before, so
>>> are we OK with breaking it? "Mostly" here is "test suite would fail
>>> 20-30 tests for various reasons, but at least no failures in test-lib.sh
>>> and the like".
>>>
>>> However, if as config.mak.uname does we run the tests with
>>> PATH=/usr/xpg6/bin:/usr/xpg4/bin:$PATH, at this point sed is fine with 8
>>> character labels [...]
>>
>> So, if you run the tests via 'make test' (or whatever), then you get
>> /usr/xpg?/bin in PATH, but if you run an individual test script (and
>> haven't set your PATH appropriately), then you encounter the problems
>> you report above?
>
> You need to manually set the PATH before running the tests, the
> SANE_TOOL_PATH just affects git-sh-setup. We should probably fix that,
> i.e. now if you compile git our shellscripts will use the fixed paths,
> but not the tests.
>
>>> [...] but starts complaining about this (also in
>>> chainlint.sed):
>>>
>>> sed: Too many commands, last: s/\n//
>>
>> Erm. Any additional context to help narrow this down?
>
> I tried playing around with it a bit, but honestly don't understand
> enough of this advanced sed syntax to make any headway, it's complaining
> about e.g. the "check for multi-line double-quoted string" rule, but
> then if you remove the s/\n// from there it complains about the next
> rule in that format.
>
> If you want to dig into this yourself I think the best way forward is
> the last two paragraphs of
> https://public-inbox.org/git/20180824152016.20286-1-ava...@gmail.com/
>
>>> diff --git a/config.mak.uname b/config.mak.uname
>>> @@ -163,6 +163,10 @@ ifeq ($(uname_S),SunOS)
>>> BASIC_CFLAGS += -D__EXTENSIONS__ -D__sun__
>>> +   # t/chainlint.sed is hopelessly broken all known (tested
>>> +   # Solaris 10 & 11) versions of Solaris, both /bin/sed and
>>> +   # /usr/xpg4/bin/sed
>>> +   GIT_TEST_CHAIN_LINT = 0
>>>  endif
>>>
>>> It slightly sucks to not have chainlint on
>>> Solaris, but it would also suck to revert chainlint.sed back to 2.18.0
>>> (there were some big improvements). So I think the patch above is the
>>> best way forward, especially since we're on rc2. What do you think?
>>
>> Keeping in mind that the main goal of 'chainlint' is to prevent _new_
>> breakage from entering the test suite, disabling 'chainlint' on
>> Solaris is an entirely reasonable way forward. In present day, it
>> seems quite u

Re: [PATCH v3 3/8] refs: new ref types to make per-worktree refs visible to all worktrees

2018-11-24 Thread Ævar Arnfjörð Bjarmason


On Sun, Oct 21 2018, Nguyễn Thái Ngọc Duy wrote:

This change has a regression in 2.20:

> [...]
>  static void files_reflog_path(struct files_ref_store *refs,
> struct strbuf *sb,
> const char *refname)
> @@ -158,6 +178,9 @@ static void files_reflog_path(struct files_ref_store 
> *refs,
>   case REF_TYPE_PSEUDOREF:
>   strbuf_addf(sb, "%s/logs/%s", refs->gitdir, refname);
>   break;
> + case REF_TYPE_OTHER_PSEUDOREF:
> + case REF_TYPE_MAIN_PSEUDOREF:
> + return files_reflog_path_other_worktrees(refs, sb, refname);
>   case REF_TYPE_NORMAL:
>   strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
>   break;

SunCC on Solaris hard errors on this:

"refs/files-backend.c", line 183: void function cannot return value

Needs to be files...(); break; instead.


Re: [PATCH 1/2] format-patch: add a more exhaustive --range-diff test

2018-11-24 Thread Ævar Arnfjörð Bjarmason


On Sat, Nov 24 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Change the narrow test added in 31e2617a5f ("format-patch: add
>> --range-diff option to embed diff in cover letter", 2018-07-22) to
>> test the full output. This test would have spotted a regression in the
>> output if it wasn't beating around the bush and tested the full
>> output, let's do that.
>
> This looks wrong, given that we are declaring that showing the
> broken --stat in the output is wrong.  It makes us to expect a
> known-wrong output from the command.
>
> If the theme of the topic were "what we are giving by default is a
> sensible output when --stat is asked for, but it is rather too noisy
> and our default should be quieter---let's tone it down", then this
> patch in 1/2 and then a behaviour-change patch with updated
> expectation would make sense, but I do not think that is what you
> are aiming for with this series.
>
> Perhaps squash this into the real "fix" to show what the desired
> output should look like with the same patch?

I see you did that in 'pu'. I don't mind, looks good to me.

FWIW I split this up for ease of review, i.e. showing what the output
was before and what effect the code change had, but maybe that was
overdoing it and it's simpler just to have this all in one go.

>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  t/t3206-range-diff.sh | 27 ++-
>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
>> index e497c1358f..0235c038be 100755
>> --- a/t/t3206-range-diff.sh
>> +++ b/t/t3206-range-diff.sh
>> @@ -249,11 +249,28 @@ for prev in topic master..topic
>>  do
>>  test_expect_success "format-patch --range-diff=$prev" '
>>  git format-patch --stdout --cover-letter --range-diff=$prev \
>> -master..unmodified >actual &&
>> -grep "= 1: .* s/5/A" actual &&
>> -grep "= 2: .* s/4/A" actual &&
>> -grep "= 3: .* s/11/B" actual &&
>> -grep "= 4: .* s/12/B" actual
>> +master..unmodified >actual.raw &&
>> +sed -e "s|^:||" -e "s|:$||" >expect <<-\EOF &&
>> +:1:  4de457d = 1:  35b9b25 s/5/A/
>> +: a => b | 0
>> +: 1 file changed, 0 insertions(+), 0 deletions(-)
>> +::
>> +:2:  fccce22 = 2:  de345ab s/4/A/
>> +: a => b | 0
>> +: 1 file changed, 0 insertions(+), 0 deletions(-)
>> +::
>> +:3:  147e64e = 3:  9af6654 s/11/B/
>> +: a => b | 0
>> +: 1 file changed, 0 insertions(+), 0 deletions(-)
>> +::
>> +:4:  a63e992 = 4:  2901f77 s/12/B/
>> +: a => b | 0
>> +: 1 file changed, 0 insertions(+), 0 deletions(-)
>> +::
>> +:-- :
>> +EOF
>> +sed -ne "/^1:/,/^--/p" actual &&
>> +test_cmp expect actual
>>  '
>>  done


[PATCH 1/2] format-patch: add a more exhaustive --range-diff test

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Change the narrow test added in 31e2617a5f ("format-patch: add
--range-diff option to embed diff in cover letter", 2018-07-22) to
test the full output. This test would have spotted a regression in the
output if it wasn't beating around the bush and tested the full
output, let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3206-range-diff.sh | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e497c1358f..0235c038be 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -249,11 +249,28 @@ for prev in topic master..topic
 do
test_expect_success "format-patch --range-diff=$prev" '
git format-patch --stdout --cover-letter --range-diff=$prev \
-   master..unmodified >actual &&
-   grep "= 1: .* s/5/A" actual &&
-   grep "= 2: .* s/4/A" actual &&
-   grep "= 3: .* s/11/B" actual &&
-   grep "= 4: .* s/12/B" actual
+   master..unmodified >actual.raw &&
+   sed -e "s|^:||" -e "s|:$||" >expect <<-\EOF &&
+   :1:  4de457d = 1:  35b9b25 s/5/A/
+   : a => b | 0
+   : 1 file changed, 0 insertions(+), 0 deletions(-)
+   ::
+   :2:  fccce22 = 2:  de345ab s/4/A/
+   : a => b | 0
+   : 1 file changed, 0 insertions(+), 0 deletions(-)
+   ::
+   :3:  147e64e = 3:  9af6654 s/11/B/
+   : a => b | 0
+   : 1 file changed, 0 insertions(+), 0 deletions(-)
+   ::
+   :4:  a63e992 = 4:  2901f77 s/12/B/
+   : a => b | 0
+   : 1 file changed, 0 insertions(+), 0 deletions(-)
+   ::
+   :-- :
+   EOF
+   sed -ne "/^1:/,/^--/p" actual &&
+   test_cmp expect actual
'
 done
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 2/2] format-patch: don't include --stat with --range-diff output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Fix a regression introduced in my a48e12ef7a ("range-diff: make diff
option behavior (e.g. --stat) consistent", 2018-11-13). Since the
format-patch setup code implicitly sets --stat --summary by default,
we started emitting the --stat output in the cover letter's
range-diff.

As noted in df569c3f31 ("range-diff doc: add a section about output
stability", 2018-11-09) the --stat output is currently rather useless,
and just adds noise.

Perhaps we should detect if --stat or --summary were implicitly passed
to format-patch, and then pass them along, but I think fixing it this
way is fine. If our --stat output ever becomes useful in range-diff we
can revisit this.

There's still cases like e.g. --numstat triggering rather useless
range-diff output, but I think it's OK to just handle the default
case. Users are unlikely to produce a formatted patch with the likes
of --numstat, or indeed any other custom diff option except -U or
maybe -W. If they need weirder combinations of options they can always
manually produce the range-diff.

This whole situation comes about because we're assuming that when the
user passes along e.g. -U10 that they want that some 10-line context
for the range-diff as for the patches themselves. As noted in [1] I
think it's worth re-visiting this and making -U10 just apply to the
patches, and e.g. --range-diff-U10 to the range-diff. But that's left
as a topic for another series less close to a rc2.

1. https://public-inbox.org/git/87d0ri7gbs@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/log.c |  7 ++-
 t/t3206-range-diff.sh | 12 
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 0fe6f9ba1e..7cd2db0be9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1094,9 +1094,13 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
}
 
if (rev->rdiff1) {
+   struct diff_options opts;
+   memcpy(, >diffopt, sizeof(opts));
+   opts.output_format &= ~(DIFF_FORMAT_DIFFSTAT | 
DIFF_FORMAT_SUMMARY);
+
fprintf_ln(rev->diffopt.file, "%s", rev->rdiff_title);
show_range_diff(rev->rdiff1, rev->rdiff2,
-   rev->creation_factor, 1, >diffopt);
+   rev->creation_factor, 1, );
}
 }
 
@@ -1697,6 +1701,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (!use_patch_format &&
(!rev.diffopt.output_format ||
 rev.diffopt.output_format == DIFF_FORMAT_PATCH))
+   /* Needs to be mirrored in show_range_diff() invocation */
rev.diffopt.output_format = DIFF_FORMAT_DIFFSTAT | 
DIFF_FORMAT_SUMMARY;
if (!rev.diffopt.stat_width)
rev.diffopt.stat_width = MAIL_DEFAULT_WRAP;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 0235c038be..90def330bd 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -252,21 +252,9 @@ do
master..unmodified >actual.raw &&
sed -e "s|^:||" -e "s|:$||" >expect <<-\EOF &&
:1:  4de457d = 1:  35b9b25 s/5/A/
-   : a => b | 0
-   : 1 file changed, 0 insertions(+), 0 deletions(-)
-   ::
:2:  fccce22 = 2:  de345ab s/4/A/
-   : a => b | 0
-   : 1 file changed, 0 insertions(+), 0 deletions(-)
-   ::
:3:  147e64e = 3:  9af6654 s/11/B/
-   : a => b | 0
-   : 1 file changed, 0 insertions(+), 0 deletions(-)
-   ::
:4:  a63e992 = 4:  2901f77 s/12/B/
-   : a => b | 0
-   : 1 file changed, 0 insertions(+), 0 deletions(-)
-   ::
:-- :
EOF
sed -ne "/^1:/,/^--/p" actual &&
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 0/2] format-patch: pre-2.20 range-diff regression fix

2018-11-22 Thread Ævar Arnfjörð Bjarmason
[Dropping LKML & git-packagers from CC]

On Thu, Nov 22 2018, Eric Sunshine wrote:

> On Thu, Nov 22, 2018 at 10:58 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> There's a regression related to this that I wanted to send a headsup
>> for, but don't have time to fix today. Now range-diff in format-patch
>> includes --stat output. See e.g. my
>> https://public-inbox.org/git/20181122132823.9883-1-ava...@gmail.com/
>
> Umf. Unfortunate fallout from [1].
>
>> diff --git a/builtin/log.c b/builtin/log.c
>> @@ -1094,9 +1094,12 @@ static void make_cover_letter(struct rev_info *rev, 
>> int use_stdout,
>> if (rev->rdiff1) {
>> +   const int oldfmt = rev->diffopt.output_format;
>> fprintf_ln(rev->diffopt.file, "%s", rev->rdiff_title);
>> +   rev->diffopt.output_format &= ~(DIFF_FORMAT_DIFFSTAT | 
>> DIFF_FORMAT_SUMMARY);
>> show_range_diff(rev->rdiff1, rev->rdiff2,
>> rev->creation_factor, 1, >diffopt);
>> +   rev->diffopt.output_format = oldfmt;
>> }
>>  }
>
> A few questions/observations:
>
> Does this same "fix" need to be applied to the --interdiff case just
> above this --range-diff block?
>
> Does the same "fix" need to be applied to the --interdiff and
> --range-diff cases in log-tree.c:show_log()?

No, that seems to do the right thing, but perhaps tests are lacking
there too. I haven't looked.

> Aside from fixing the broken --no-patches option[2], a goal of the
> series was to some day make --stat do something useful. Doesn't this
> "fix" go against that goal?

The goal was to fix the regression introduced in 73a834e9e2
("range-diff: relieve callers of low-level configuration burden",
2018-07-22). One aspect of having fixed that is we might in the future
do stuff with --stat.

> The way this change needs to be spread around at various locations is
> making it feel like a BandAid "fix" rather than a proper solution. It
> seems like it should be fixed at a different level, though I'm not
> sure yet if that level is higher (where the options get set) or lower
> (where they actually affect the operation).
>
> Until we figure out the answers to these questions, I wonder if a more
> sensible short-term solution would be to back out [1] and just keep
> [2], which fixed the --no-patches regression.

I think that patch leaves range-diff itself in a good state without
any bugs, and it would be a mistake to revert it.

But this interaction with format-patch --range-diff is another
matter. As noted in 2/2 I think in practice this series sweeps the
current bugs under the rug.

But as also noted there I think re-using what we get from
setup_revisions() in format-patch for the range-diff was a mistake,
and is always going to lead to some confusion. It should be split up
so we can supply those diff options independently.

> [...]
> [1]: https://public-inbox.org/git/20181113185558.23438-4-ava...@gmail.com/
> [2]: https://public-inbox.org/git/20181113185558.23438-3-ava...@gmail.com/

Ævar Arnfjörð Bjarmason (2):
  format-patch: add a more exhaustive --range-diff test
  format-patch: don't include --stat with --range-diff output

 builtin/log.c |  7 ++-
 t/t3206-range-diff.sh | 15 ++-
 2 files changed, 16 insertions(+), 6 deletions(-)

-- 
2.20.0.rc0.387.gc7a69e6b6c



Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)

2018-11-22 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 22 2018, Jeff King wrote:

> On Wed, Nov 21, 2018 at 11:48:14AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>>
>> On Wed, Nov 21 2018, Junio C Hamano wrote:
>>
>> > * jk/loose-object-cache (2018-11-13) 9 commits
>> >   (merged to 'next' on 2018-11-18 at 276691a21b)
>> >  + fetch-pack: drop custom loose object cache
>> >  + sha1-file: use loose object cache for quick existence check
>> >  + object-store: provide helpers for loose_objects_cache
>> >  + sha1-file: use an object_directory for the main object dir
>> >  + handle alternates paths the same as the main object dir
>> >  + sha1_file_name(): overwrite buffer instead of appending
>> >  + rename "alternate_object_database" to "object_directory"
>> >  + submodule--helper: prefer strip_suffix() to ends_with()
>> >  + fsck: do not reuse child_process structs
>> >
>> >  Code clean-up with optimization for the codepath that checks
>> >  (non-)existence of loose objects.
>> >
>> >  Will cook in 'next'.
>>
>> I think as noted in
>> https://public-inbox.org/git/e5148b8c-9a3a-5d2e-ac8c-3e536c0f2...@web.de/
>> that we should hold off the [89]/9 of this series due to the performance
>> regressions this introduces in some cases (while fixing other cases).
>>
>> I hadn't had time to follow up on that, and figured it could wait until
>> post-2.20 for a re-roll.
>
> Yeah, my intent had been to circle back around to this, but I just
> hadn't gotten to it. I'm still pondering a config option or similar,
> though I remain unconvinced that the cases in which you've showed it
> being slow are actually realistic or worth worrying about

FWIW those "used to be 2ms are now 20-40ms" pushes on ext4 are
representative of the actual prod setup I'm mainly targeting. Now, I
don't run on ext4 this patch helps there, but it seems plausible that it
matters to someone who's counting on that performance.

Buh yeah, it's certainly obscure. I don't blame you if you don't want to
hack on it, and not ejecting this out before 2.20 isn't going to break
anything for me. But do you mind if I make it configurable as part of my
post-2.20 "disable collisions?"

>  (and certainly having an obscure config option is not enough to help
> most people). If we could have it kick in heuristically, that would be
> better.

Aside from this specific scenario. I'd really prefer if we avoid having
heuristic performance optimizations at all costs.

Database servers tend to do that sort of thing with their query planner,
and it results in cases where your entire I/O profile changes overnight
because you're now on the wrong side of some if/else heuristic about
whather to use some index or not.

> However, note that the cache-load for finding abbreviations _must_ have
> the complete list. And has been loading it for some time. So if you run
> "git-fetch", for example, you've already been running this code for
> months (and using the cache in more places is now a free speedup).

This is reminding me that I need to get around to re-submitting my
core.validateAbbrev series, which addresses this part of the problem:
https://public-inbox.org/git/20180608224136.20220-21-ava...@gmail.com/

> At the very least, we'd want this patch on top, too. I also think René's
> suggestion use access() is worth pursuing (though to some degree is
> orthogonal to the cache).

I haven't had time to test that, and wasn't prioritizing it since I
figured this was post-2.20. My hunch is it doesn't matter much if at all
on NFS. The roundtrip time is what matters, whether that roundtrip is
fstat() or access() probably not.

> -- >8 --
> Subject: [PATCH] odb_load_loose_cache: fix strbuf leak
>
> Commit 66f04152be (object-store: provide helpers for
> loose_objects_cache, 2018-11-12) moved the cache-loading code from
> find_short_object_filename(), but forgot the line that releases the path
> strbuf.
>
> Reported-by: René Scharfe 
> Signed-off-by: Jeff King 
> ---
>  sha1-file.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sha1-file.c b/sha1-file.c
> index 5894e48ea4..5a272f70de 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2169,6 +2169,7 @@ void odb_load_loose_cache(struct object_directory *odb, 
> int subdir_nr)
>   NULL, NULL,
>   >loose_objects_cache);
>   odb->loose_objects_subdir_seen[subdir_nr] = 1;
> + strbuf_release();
>  }
>
>  static int check_stream_sha1(git_zstream *stream,


[PATCH v4 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'

2018-11-22 Thread Ævar Arnfjörð Bjarmason
From: SZEDER Gábor 

The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing on first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
-   uint32_t num_extra_edges = 0;
+   uint32_t num_large_edges = 0;
 
while (list < last) {
struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
if (!parent)
edge_value = GRAPH_PARENT_NONE;
else if (parent->next)
-   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_extra_edges;
+   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_large_edges;
else {
edge_value = sha1_pos(parent->item->object.oid.hash,
  commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
do {
-   num_extra_edges++;
+   num_large_edges++;
parent = parent->next;
} while (parent);
}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
uint32_t chunk_ids[5];
uint64_t chunk_offsets[5];
int num_chunks;
-   int num_extra_edges;
+   int num_large_edges;
struct commit_list *parent;
struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
commits.alloc = count_distinct;
ALLOC_ARRAY(commits.list, commits.alloc);
 
-   num_extra_edges = 0;
+   num_large_edges = 0;
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
if (i > 0 && oideq([i - 1], [i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
num_parents++;
 
if (num_parents > 2)
-   num_extra_edges += num_parents - 1;
+   num_large_edges += num_parents - 1;
 
commits.nr++;
}
-   num_chunks = num_extra_edges ? 4 : 3;
+   num_chunks = num_large_edges ? 4 : 3;
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
chunk_ids[2] = GRAPH_CHUNKID_DATA;
-   if (num_extra_edges)
+   if (num_large_edges)
chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
else
chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-   chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+   chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
for (i = 0; i <= num_chunks; i++) {
uint32_t chunk_write[3];
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 03/10] commit-graph write: rephrase confusing progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just represents the work
we're doing in several for-loops to do various work before the graph
is written out. So let's just say "Annotating commit graph", that
title makes no such promises, and we can add other loops here in the
future and still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaching to solving this.

1. https://public-inbox.org/git/20181015165447.gh19...@szeder.dev/

Reported-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 965eb23a7b..d11370a2b3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commits in commit graph"), 0);
+   _("Annotating commit graph"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

2018-11-22 Thread Ævar Arnfjörð Bjarmason
From: SZEDER Gábor 

The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search for such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.

Signed-off-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_fanout(f, commits.list, commits.nr);
write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_large_edges(f, commits.list, commits.nr);
+   if (num_large_edges)
+   write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 04/10] commit-graph write: add "Writing out" progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Add progress output to be shown when we're writing out the
commit-graph, this adds to the output already added in 7b0f229222
("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6365442, done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 100% (318/318), done.

This "Writing out" number is 3x or 4x the number of commits, depending
on the graph we're processing. A later change will make this explicit
to the user.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 48 +++-
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d11370a2b3..dc57b8fedc 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, 
const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 struct commit **commits,
-int nr_commits)
+int nr_commits,
+struct progress *progress,
+uint64_t *progress_cnt)
 {
int i, count = 0;
struct commit **list = commits;
@@ -447,6 +449,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
while (count < nr_commits) {
if ((*list)->object.oid.hash[0] != i)
break;
+   display_progress(progress, ++*progress_cnt);
count++;
list++;
}
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
int count;
-   for (count = 0; count < nr_commits; count++, list++)
+   for (count = 0; count < nr_commits; count++, list++) {
+   display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+   }
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
struct commit_list *parent;
int edge_value;
uint32_t packedDate[2];
+   display_progress(progress, ++*progress_cnt);
 
parse_commit(*list);
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
  struct commit **commits,
- int nr_commits)
+ int nr_commits,
+ struct progress *progress,
+ uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -550,6 +562,9 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
 
while (list < last) {
int num_parents = 0;
+
+   display_progress(progress, ++*progress_cnt);
+
for (parent = (*list)->parents; num_parents < 3 && parent;
 parent = parent->next)
num_parents++;
@@ -764,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
int num_large_edges;
 

[PATCH v4 09/10] commit-graph write: add itermediate progress

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.

On a test repository with I have with ~7 million commits and ~50
million objects we'll now emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(48333911/48333911), done.
Annotating commit graph: 21435984, done.
Counting distinct commits in commit graph: 100% (7145328/7145328), done.
Finding extra edges in commit graph: 100% (7145328/7145328), done.
Computing commit graph generation numbers: 100% (7145328/7145328), done.
Writing out commit graph in 4 passes: 100% (28581312/28581312), done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and as before and we'll still
emit output like:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365442/6365442), done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph in 4 passes: 100% (318/318), done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 199155bd68..80f201adf4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,12 +889,19 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable(, report_progress);
 
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Counting distinct commits in commit graph"),
+   oids.nr);
+   display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(oids.list, oids.nr, commit_compare);
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
+   display_progress(progress, i + 1);
if (!oideq([i - 1], [i]))
count_distinct++;
}
+   stop_progress();
 
if (count_distinct >= GRAPH_PARENT_MISSING)
die(_("the commit graph format cannot write %d commits"), 
count_distinct);
@@ -904,8 +911,13 @@ void write_commit_graph(const char *obj_dir,
ALLOC_ARRAY(commits.list, commits.alloc);
 
num_large_edges = 0;
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Finding extra edges in commit graph"),
+   oids.nr);
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
+   display_progress(progress, i + 1);
if (i > 0 && oideq([i - 1], [i]))
continue;
 
@@ -922,6 +934,7 @@ void write_commit_graph(const char *obj_dir,
commits.nr++;
}
num_chunks = num_large_edges ? 4 : 3;
+   stop_progress();
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 00/10] commit-graph write: progress output improvements

2018-11-22 Thread Ævar Arnfjörð Bjarmason
The "Writing out" progress output was off-by-one because I'd screwed
up a merge conflict. Fix that, and update the various progress output.

On my test setup the "Annotating commit graph" progress sometimes
shows up on linux.git, sometimes not, it's right on that edge of
taking 1 second. So always show it in the commit message examples,
that's less confusing for the reader.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

Ævar Arnfjörð Bjarmason (8):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add "Writing out" progress output
  commit-graph write: more descriptive "writing out" output
  commit-graph write: show progress for object search
  commit-graph write: add more descriptive progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add itermediate progress
  commit-graph write: emit a percentage for all progress

 commit-graph.c | 130 ++---
 1 file changed, 102 insertions(+), 28 deletions(-)

Range-diff:
1:  2b52ad2284 ! 1:  9c17f56ed3 commit-graph write: add "Writing out" progress 
output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

@@ -15,10 +15,11 @@
 
 $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
 Finding commits for commit graph: 6365442, done.
+Annotating commit graph: 2391666, done.
 Computing commit graph generation numbers: 100% (797222/797222), 
done.
-Writing out commit graph: 100% (3986110/3986110), done.
+Writing out commit graph: 100% (318/318), done.
 
-This "Writing out" number is 4x or 5x the number of commits, depending
+This "Writing out" number is 3x or 4x the number of commits, depending
 on the graph we're processing. A later change will make this explicit
 to the user.
 
@@ -126,7 +127,7 @@
 +   * below loops over our N commits. This number must be
 +   * kept in sync with the number of passes we're doing.
 +   */
-+  int graph_passes = 4;
++  int graph_passes = 3;
 +  if (num_large_edges)
 +  graph_passes++;
 +  progress = start_delayed_progress(
2:  b1773677b1 ! 2:  79b0a467d9 commit-graph write: more descriptive "writing 
out" output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

@@ -3,7 +3,7 @@
 commit-graph write: more descriptive "writing out" output
 
 Make the "Writing out" part of the progress output more
-descriptive. Depending on the shape of the graph we either make 4 or 5
+descriptive. Depending on the shape of the graph we either make 3 or 4
 passes over it.
 
 Let's present this information to the user in case they're wondering
@@ -13,8 +13,9 @@
 
 $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
 Finding commits for commit graph: 6365442, done.
+Annotating commit graph: 2391666, done.
 Computing commit graph generation numbers: 100% (797222/797222), 
done.
-Writing out commit graph in 5 passes: 100% (3986110/3986110), done.
+Writing out commit graph in 4 passes: 100% (318/318), done.
 
 A note on i18n: Why are we using the Q_() function and passing a
 number & English text for a singular which'll never be used? Because
@@ -35,7 +36,7 @@
if (!commit_graph_compatible(the_repository))
return;
 @@
-   int graph_passes = 4;
+   int graph_passes = 3;
if (num_large_edges)
graph_passes++;
 +  strbuf_addf(_title,
3:  3138b00a2c ! 3:  b32be83b38 commit-graph write: show progress for object 
search
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

@@ -8,12 +8,12 @@
 
 Before we'd emit on e.g. linux.git with "commit-graph write":
 
-Finding commits for commit graph: 6365492, done.
+Finding commits for commit graph: 6365442, done.
 [...]
 
 And now:
 
-Finding commits for commit graph: 100% (6365492/6365492), done.
+Finding commits for commit graph: 100% (6365442/6365442), done.
 [...]
 
 Since the commit graph only includes those commits that are packed
4:  f41e3b3eb3 ! 4:  54276723c0 commit-graph write: add more descriptive 
progress output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

@@ -10,7 +10,7 @@
 we support:
 
 $ git commit-graph write
 

[PATCH v4 10/10] commit-graph write: emit a percentage for all progress

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Change the "Annotating commit graph" progress output to show a
completion percentage. I added this in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) and evidently didn't notice
how easy it was to add a completion percentage.

Now for e.g. linux.git we'll emit:

~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365442/6365442), done.
Annotating commit graph: 100% (2391666/2391666), done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph in 4 passes: 100% (318/318), done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 80f201adf4..6c6edc679b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -660,10 +660,17 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
struct commit *commit;
struct progress *progress = NULL;
int j = 0;
+   /*
+* We loop over the OIDs N times to close the graph
+* below. This number must be kept in sync with the number of
+* passes.
+*/
+   const int oid_passes = 3;
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commit graph"), 0);
+   _("Annotating commit graph"),
+   oid_passes * oids->nr);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 05/10] commit-graph write: more descriptive "writing out" output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Make the "Writing out" part of the progress output more
descriptive. Depending on the shape of the graph we either make 3 or 4
passes over it.

Let's present this information to the user in case they're wondering
what this number, which is much larger than their number of commits,
has to do with writing out the commit graph. Now e.g. on linux.git we
emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6365442, done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph in 4 passes: 100% (318/318), done.

A note on i18n: Why are we using the Q_() function and passing a
number & English text for a singular which'll never be used? Because
the plural rules of translated languages may not match those of
English, and to use the plural function we need to use this format.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index dc57b8fedc..3de65bc2e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -780,6 +780,7 @@ void write_commit_graph(const char *obj_dir,
struct commit_list *parent;
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
+   struct strbuf progress_title = STRBUF_INIT;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -962,8 +963,13 @@ void write_commit_graph(const char *obj_dir,
int graph_passes = 3;
if (num_large_edges)
graph_passes++;
+   strbuf_addf(_title,
+   Q_("Writing out commit graph in %d pass",
+  "Writing out commit graph in %d passes",
+  graph_passes),
+   graph_passes);
progress = start_delayed_progress(
-   _("Writing out commit graph"),
+   progress_title.buf,
graph_passes * commits.nr);
}
write_graph_chunk_fanout(f, commits.list, commits.nr, progress, 
_cnt);
@@ -973,6 +979,8 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_large_edges(f, commits.list, commits.nr, 
progress, _cnt);
stop_progress();
 
+   strbuf_release(_title);
+
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 08/10] commit-graph write: remove empty line for readability

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 43b15785f6..199155bd68 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -890,7 +890,6 @@ void write_commit_graph(const char *obj_dir,
close_reachable(, report_progress);
 
QSORT(oids.list, oids.nr, commit_compare);
-
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
if (!oideq([i - 1], [i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 06/10] commit-graph write: show progress for object search

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

Finding commits for commit graph: 6365442, done.
[...]

And now:

Finding commits for commit graph: 100% (6365442/6365442), done.
[...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 3de65bc2e9..42d8365f0d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -781,12 +781,14 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
struct strbuf progress_title = STRBUF_INIT;
+   unsigned long approx_nr_objects;
 
if (!commit_graph_compatible(the_repository))
return;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 32;
+   approx_nr_objects = approximate_object_count();
+   oids.alloc = approx_nr_objects / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
@@ -866,8 +868,11 @@ void write_commit_graph(const char *obj_dir,
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   _("Finding commits for commit graph"),
+   approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
+   if (oids.progress_done < approx_nr_objects)
+   display_progress(oids.progress, approx_nr_objects);
stop_progress();
}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v4 07/10] commit-graph write: add more descriptive progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

$ git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365442/6365442), done.
[...]

# Actually we don't emit this since this takes almost no time at
# all. But if we did (s/_delayed//) we'd show:
$ git for-each-ref --format='%(objectname)' | git commit-graph write 
--stdin-commits
Finding commits for commit graph from 584 refs: 100% (584/584), done.
[...]

$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 2 packs: 6365442, done.
[...]

The middle on of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

Finding commits for commit graph from 165203 refs: 100% (165203/165203), 
done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 42d8365f0d..43b15785f6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -818,8 +818,12 @@ void write_commit_graph(const char *obj_dir,
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
if (report_progress) {
-   oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph in %d 
pack",
+  "Finding commits for commit graph in %d 
packs",
+  pack_indexes->nr),
+   pack_indexes->nr);
+   oids.progress = 
start_delayed_progress(progress_title.buf, 0);
oids.progress_done = 0;
}
for (i = 0; i < pack_indexes->nr; i++) {
@@ -836,14 +840,20 @@ void write_commit_graph(const char *obj_dir,
free(p);
}
stop_progress();
+   strbuf_reset(_title);
strbuf_release();
}
 
if (commit_hex) {
-   if (report_progress)
-   progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
-   commit_hex->nr);
+   if (report_progress) {
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph from 
%d ref",
+  "Finding commits for commit graph from 
%d refs",
+  commit_hex->nr),
+   commit_hex->nr);
+   progress = start_delayed_progress(progress_title.buf,
+ commit_hex->nr);
+   }
for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
@@ -863,12 +873,13 @@ void write_commit_graph(const char *obj_dir,
}
}
stop_progress();
+   strbuf_reset(_title);
}
 
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
+   _("Finding commits for commit graph among 
packed objects"),
approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
if (oids.progress_done < approx_nr_objects)
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 08/10] commit-graph write: remove empty line for readability

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index cb1aebeb79..21751231e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -890,7 +890,6 @@ void write_commit_graph(const char *obj_dir,
close_reachable(, report_progress);
 
QSORT(oids.list, oids.nr, commit_compare);
-
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
if (!oideq([i - 1], [i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 09/10] commit-graph write: add itermediate progress

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.

On a test repository with I have with ~7 million commits and ~50
million objects we'll now emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(50026015/50026015), done.
Annotating commit graph: 21567407, done.
Counting distinct commits in commit graph: 100% (7189147/7189147), done.
Finding extra edges in commit graph: 100% (7189147/7189147), done.
Computing commit graph generation numbers: 100% (7144680/7144680), done.
Writing out commit graph: 21434417, done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and as before and we'll still
emit output like:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 2399912, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 21751231e0..a6e6eeb56b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,12 +889,19 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable(, report_progress);
 
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Counting distinct commits in commit graph"),
+   oids.nr);
+   display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(oids.list, oids.nr, commit_compare);
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
+   display_progress(progress, i + 1);
if (!oideq([i - 1], [i]))
count_distinct++;
}
+   stop_progress();
 
if (count_distinct >= GRAPH_PARENT_MISSING)
die(_("the commit graph format cannot write %d commits"), 
count_distinct);
@@ -904,8 +911,13 @@ void write_commit_graph(const char *obj_dir,
ALLOC_ARRAY(commits.list, commits.alloc);
 
num_large_edges = 0;
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Finding extra edges in commit graph"),
+   oids.nr);
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
+   display_progress(progress, i + 1);
if (i > 0 && oideq([i - 1], [i]))
continue;
 
@@ -922,6 +934,7 @@ void write_commit_graph(const char *obj_dir,
commits.nr++;
}
num_chunks = num_large_edges ? 4 : 3;
+   stop_progress();
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 10/10] commit-graph write: emit a percentage for all progress

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Change the "Annotating commit graph" progress output to show a
completion percentage. I added this in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) and evidently didn't notice
how easy it was to add a completion percentage.

Now for the very large test repository mentioned in previous commits
we'll emit (shows all progress output):

Finding commits for commit graph among packed objects: 100% 
(48333911/48333911), done.
Annotating commit graph: 100% (21435984/21435984), done.
Counting distinct commits in commit graph: 100% (7145328/7145328), done.
Finding extra edges in commit graph: 100% (7145328/7145328), done.
Computing commit graph generation numbers: 100% (7145328/7145328), done.
Writing out commit graph in 5 passes: 100% (35726640/35726640), done.

And for linux.git:

Finding commits for commit graph among packed objects: 100% 
(6365442/6365442), done.
Annotating commit graph: 100% (2391666/2391666), done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph in 5 passes: 100% (3986110/3986110), done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index a6e6eeb56b..c893466042 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -660,10 +660,17 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
struct commit *commit;
struct progress *progress = NULL;
int j = 0;
+   /*
+* We loop over the OIDs N times to close the graph
+* below. This number must be kept in sync with the number of
+* passes.
+*/
+   const int oid_passes = 3;
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commit graph"), 0);
+   _("Annotating commit graph"),
+   oid_passes * oids->nr);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 04/10] commit-graph write: add "Writing out" progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Add progress output to be shown when we're writing out the
commit-graph, this adds to the output already added in 7b0f229222
("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6365442, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 100% (3986110/3986110), done.

This "Writing out" number is 4x or 5x the number of commits, depending
on the graph we're processing. A later change will make this explicit
to the user.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 48 +++-
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d11370a2b3..e32a5cc1bc 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, 
const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 struct commit **commits,
-int nr_commits)
+int nr_commits,
+struct progress *progress,
+uint64_t *progress_cnt)
 {
int i, count = 0;
struct commit **list = commits;
@@ -447,6 +449,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
while (count < nr_commits) {
if ((*list)->object.oid.hash[0] != i)
break;
+   display_progress(progress, ++*progress_cnt);
count++;
list++;
}
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
int count;
-   for (count = 0; count < nr_commits; count++, list++)
+   for (count = 0; count < nr_commits; count++, list++) {
+   display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+   }
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
struct commit_list *parent;
int edge_value;
uint32_t packedDate[2];
+   display_progress(progress, ++*progress_cnt);
 
parse_commit(*list);
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
  struct commit **commits,
- int nr_commits)
+ int nr_commits,
+ struct progress *progress,
+ uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -550,6 +562,9 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
 
while (list < last) {
int num_parents = 0;
+
+   display_progress(progress, ++*progress_cnt);
+
for (parent = (*list)->parents; num_parents < 3 && parent;
 parent = parent->next)
num_parents++;
@@ -764,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
int num_large_edges;
struct commit_list *parent;

[PATCH v3 00/10] commit-graph write: progress output improvements

2018-11-22 Thread Ævar Arnfjörð Bjarmason
This incorporates SZEDER's recent two-part series, rebases mine on
top, and fixes a few things while I'm at it. Now there's no progress
output where we don't show a completion percentage.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

Ævar Arnfjörð Bjarmason (8):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add "Writing out" progress output
  commit-graph write: more descriptive "writing out" output
  commit-graph write: show progress for object search
  commit-graph write: add more descriptive progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add itermediate progress
  commit-graph write: emit a percentage for all progress

 commit-graph.c | 130 ++---
 1 file changed, 102 insertions(+), 28 deletions(-)

Range-diff:

By the way, is there any way to

 [.. snipped lots of irrelevant commits...]
 -:  -- > 14:  07d06c50c0 commit-graph: rename 'num_extra_edges' 
variable to 'num_large_edges'
 -:  -- > 15:  904dda1e7a commit-graph: don't call 
write_graph_chunk_large_edges() unnecessarily

Pass the equivalent of "git range-diff origin/master topic-2 topic-3"
to git-format-patch?

 1:  9f7fb459bd = 16:  1126c7e29d commit-graph write: rephrase confusing 
progress output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

 2:  093c63e99f ! 17:  2b52ad2284 commit-graph write: add more progress output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
    
@@ -1,9 +1,10 @@
 Author: Ævar Arnfjörð Bjarmason 
 
-commit-graph write: add more progress output
+commit-graph write: add "Writing out" progress output
 
-Add more progress output to the output already added in
-7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
+Add progress output to be shown when we're writing out the
+commit-graph, this adds to the output already added in 7b0f229222
+("commit-graph write: add progress output", 2018-09-17).
 
 As noted in that commit most of the progress output isn't displayed on
 small repositories, but before this change we'd noticeably hang for
@@ -13,30 +14,13 @@
 point at which we're not producing progress output:
 
 $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
-Finding commits for commit graph: 6365492, done.
+Finding commits for commit graph: 6365442, done.
 Computing commit graph generation numbers: 100% (797222/797222), 
done.
-Writing out commit graph: 2399912, done.
+Writing out commit graph: 100% (3986110/3986110), done.
 
-This "writing out" number is not meant to be meaningful to the user,
-but just to show that we're doing work and the command isn't
-hanging.
-
-In the current implementation it's approximately 4x the number of
-commits. As noted in on-list discussion[1] we could add the loops up
-and show percentage progress here, but I don't think it's worth it. It
-would make the implementation more complex and harder to maintain for
-very little gain.
-
-On a much larger in-house repository I have we'll show (note how we
-also say "Annotating[...]"):
-
-$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
-Finding commits for commit graph: 50026015, done.
-Annotating commit graph: 21567407, done.
-Computing commit graph generation numbers: 100% (7144680/7144680), 
done.
-Writing out commit graph: 21434417, done.
-
-1. https://public-inbox.org/git/20181120165800.gb30...@szeder.dev/
+This "Writing out" number is 4x or 5x the number of commits, depending
+on the graph we're processing. A later change will make this explicit
+to the user.
 
 Signed-off-by: Ævar Arnfjörð Bjarmason 
 
@@ -55,13 +39,13 @@
int i, count = 0;
struct commit **list = commits;
 @@
-*/
-   for (i = 0; i < 256; i++) {
while (count < nr_commits) {
-+  display_progress(progress, ++*progress_cnt);
if ((*list)->object.oid.hash[0] != i)
break;
++  display_progress(progress, ++*progress_cnt);
count++;
+   list++;
+   }
 @@
  }
  
@@ -112,15 +96,17 @@
struct commit **list = commits;
struct commit **last = commits + nr_commits;
 @@
-   

[PATCH v3 07/10] commit-graph write: add more descriptive progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

$ git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
[...]

# Actually we don't emit this since this takes almost no time at
# all. But if we did (s/_delayed//) we'd show:
$ git for-each-ref --format='%(objectname)' | git commit-graph write 
--stdin-commits
Finding commits for commit graph from 584 refs: 100% (584/584), done.
[...]

$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 3 packs: 6365492, done.
[...]

The middle on of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

Finding commits for commit graph from 165203 refs: 100% (165203/165203), 
done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d6166beb19..cb1aebeb79 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -818,8 +818,12 @@ void write_commit_graph(const char *obj_dir,
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
if (report_progress) {
-   oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph in %d 
pack",
+  "Finding commits for commit graph in %d 
packs",
+  pack_indexes->nr),
+   pack_indexes->nr);
+   oids.progress = 
start_delayed_progress(progress_title.buf, 0);
oids.progress_done = 0;
}
for (i = 0; i < pack_indexes->nr; i++) {
@@ -836,14 +840,20 @@ void write_commit_graph(const char *obj_dir,
free(p);
}
stop_progress();
+   strbuf_reset(_title);
strbuf_release();
}
 
if (commit_hex) {
-   if (report_progress)
-   progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
-   commit_hex->nr);
+   if (report_progress) {
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph from 
%d ref",
+  "Finding commits for commit graph from 
%d refs",
+  commit_hex->nr),
+   commit_hex->nr);
+   progress = start_delayed_progress(progress_title.buf,
+ commit_hex->nr);
+   }
for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
@@ -863,12 +873,13 @@ void write_commit_graph(const char *obj_dir,
}
}
stop_progress();
+   strbuf_reset(_title);
}
 
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
+   _("Finding commits for commit graph among 
packed objects"),
approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
if (oids.progress_done < approx_nr_objects)
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 06/10] commit-graph write: show progress for object search

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

Finding commits for commit graph: 6365492, done.
[...]

And now:

Finding commits for commit graph: 100% (6365492/6365492), done.
[...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8e5970f0b9..d6166beb19 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -781,12 +781,14 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
struct strbuf progress_title = STRBUF_INIT;
+   unsigned long approx_nr_objects;
 
if (!commit_graph_compatible(the_repository))
return;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 32;
+   approx_nr_objects = approximate_object_count();
+   oids.alloc = approx_nr_objects / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
@@ -866,8 +868,11 @@ void write_commit_graph(const char *obj_dir,
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   _("Finding commits for commit graph"),
+   approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
+   if (oids.progress_done < approx_nr_objects)
+   display_progress(oids.progress, approx_nr_objects);
stop_progress();
}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

2018-11-22 Thread Ævar Arnfjörð Bjarmason
From: SZEDER Gábor 

The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search for such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.

Signed-off-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_fanout(f, commits.list, commits.nr);
write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_large_edges(f, commits.list, commits.nr);
+   if (num_large_edges)
+   write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 05/10] commit-graph write: more descriptive "writing out" output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Make the "Writing out" part of the progress output more
descriptive. Depending on the shape of the graph we either make 4 or 5
passes over it.

Let's present this information to the user in case they're wondering
what this number, which is much larger than their number of commits,
has to do with writing out the commit graph. Now e.g. on linux.git we
emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6365442, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph in 5 passes: 100% (3986110/3986110), done.

A note on i18n: Why are we using the Q_() function and passing a
number & English text for a singular which'll never be used? Because
the plural rules of translated languages may not match those of
English, and to use the plural function we need to use this format.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index e32a5cc1bc..8e5970f0b9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -780,6 +780,7 @@ void write_commit_graph(const char *obj_dir,
struct commit_list *parent;
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
+   struct strbuf progress_title = STRBUF_INIT;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -962,8 +963,13 @@ void write_commit_graph(const char *obj_dir,
int graph_passes = 4;
if (num_large_edges)
graph_passes++;
+   strbuf_addf(_title,
+   Q_("Writing out commit graph in %d pass",
+  "Writing out commit graph in %d passes",
+  graph_passes),
+   graph_passes);
progress = start_delayed_progress(
-   _("Writing out commit graph"),
+   progress_title.buf,
graph_passes * commits.nr);
}
write_graph_chunk_fanout(f, commits.list, commits.nr, progress, 
_cnt);
@@ -973,6 +979,8 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_large_edges(f, commits.list, commits.nr, 
progress, _cnt);
stop_progress();
 
+   strbuf_release(_title);
+
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 03/10] commit-graph write: rephrase confusing progress output

2018-11-22 Thread Ævar Arnfjörð Bjarmason
Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just represents the work
we're doing in several for-loops to do various work before the graph
is written out. So let's just say "Annotating commit graph", that
title makes no such promises, and we can add other loops here in the
future and still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaching to solving this.

1. https://public-inbox.org/git/20181015165447.gh19...@szeder.dev/

Reported-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 965eb23a7b..d11370a2b3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commits in commit graph"), 0);
+   _("Annotating commit graph"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v3 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'

2018-11-22 Thread Ævar Arnfjörð Bjarmason
From: SZEDER Gábor 

The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing on first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
-   uint32_t num_extra_edges = 0;
+   uint32_t num_large_edges = 0;
 
while (list < last) {
struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
if (!parent)
edge_value = GRAPH_PARENT_NONE;
else if (parent->next)
-   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_extra_edges;
+   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_large_edges;
else {
edge_value = sha1_pos(parent->item->object.oid.hash,
  commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
do {
-   num_extra_edges++;
+   num_large_edges++;
parent = parent->next;
} while (parent);
}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
uint32_t chunk_ids[5];
uint64_t chunk_offsets[5];
int num_chunks;
-   int num_extra_edges;
+   int num_large_edges;
struct commit_list *parent;
struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
commits.alloc = count_distinct;
ALLOC_ARRAY(commits.list, commits.alloc);
 
-   num_extra_edges = 0;
+   num_large_edges = 0;
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
if (i > 0 && oideq([i - 1], [i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
num_parents++;
 
if (num_parents > 2)
-   num_extra_edges += num_parents - 1;
+   num_large_edges += num_parents - 1;
 
commits.nr++;
}
-   num_chunks = num_extra_edges ? 4 : 3;
+   num_chunks = num_large_edges ? 4 : 3;
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
chunk_ids[2] = GRAPH_CHUNKID_DATA;
-   if (num_extra_edges)
+   if (num_large_edges)
chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
else
chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-   chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+   chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
for (i = 0; i <= num_chunks; i++) {
uint32_t chunk_write[3];
-- 
2.20.0.rc0.387.gc7a69e6b6c



Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)

2018-11-22 Thread Ævar Arnfjörð Bjarmason
On Thu, Nov 22, 2018 at 11:31 AM Stephen P. Smith  wrote:
>
> On Wednesday, November 21, 2018 2:00:16 AM MST Junio C Hamano wrote:
> > [Stalled]
> >
> > * lt/date-human (2018-07-09) 1 commit
> >  - Add 'human' date format
> >
> >  A new date format "--date=human" that morphs its output depending
> >  on how far the time is from the current time has been introduced.
> >  "--date=auto" can be used to use this new format when the output is
> >  goint to the pager or to the terminal and otherwise the default
> >  format.
>
> What needs to be done with this patch to move it along?

In e.g. "Git Test Coverage Report (Wednesday Nov 21)" by Stolee you
can see that the new code in date.c is largely uncovered. Adding tests
for the behavior would be a good start.

> I see that both Linus and Junio have signed the patch.

That just means Linus wrote it and Junio ran "git am -s" on it.


Re: [PATCH v1 1/2] log -G: Ignore binary files

2018-11-22 Thread Ævar Arnfjörð Bjarmason
>
On Wed, Nov 21 2018, Thomas Braun wrote:

> The -G  option of log looks for the differences whose patch text
> contains added/removed lines that match regex.
>
> The concept of differences only makes sense for text files, therefore
> we need to ignore binary files when searching with -G  as well.
>
> Signed-off-by: Thomas Braun 
> ---
>  Documentation/gitdiffcore.txt |  2 +-
>  diffcore-pickaxe.c|  5 +
>  t/t4209-log-pickaxe.sh| 22 ++
>  3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index c0a60f3158..059ddd3431 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -242,7 +242,7 @@ textual diff has an added or a deleted line that matches 
> the given
>  regular expression.  This means that it will detect in-file (or what
>  rename-detection considers the same file) moves, which is noise.  The
>  implementation runs diff twice and greps, and this can be quite
> -expensive.
> +expensive.  Binary files without textconv filter are ignored.
>
>  When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
>  that match their respective criterion are kept in the output.  When
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 69fc55ea1e..8c2558b07d 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -144,6 +144,11 @@ static int pickaxe_match(struct diff_filepair *p, struct 
> diff_options *o,
>   textconv_two = get_textconv(o->repo->index, p->two);
>   }
>
> + if ((o->pickaxe_opts & DIFF_PICKAXE_KIND_G) &&
> + ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
> +  (!textconv_two && diff_filespec_is_binary(o->repo, p->two
> + return 0;
> +
>   /*
>* If we have an unmodified pair, we know that the count will be the
>* same and don't even have to load the blobs. Unless textconv is in
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index 844df760f7..42cc8afd8b 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -106,4 +106,26 @@ test_expect_success 'log -S --no-textconv (missing 
> textconv tool)' '
>   rm .gitattributes
>  '
>
> +test_expect_success 'log -G ignores binary files' '
> + rm -rf .git &&
> + git init &&
> + printf "a\0b" >data.bin &&
> + git add data.bin &&
> + git commit -m "message" &&
> + git log -G a >result &&

Would be less confusing as "-Ga" since that's the invocation we
document, even though I see (but wasn't aware that...) "-G a" works too.

> + test_must_be_empty result
> +'
> +
> +test_expect_success 'log -G looks into binary files with textconv filter' '
> + rm -rf .git &&
> + git init &&
> + echo "* diff=bin" > .gitattributes &&
> + printf "a\0b" >data.bin &&
> + git add data.bin &&
> + git commit -m "message" &&
> + git -c diff.bin.textconv=cat log -G a >actual &&
> + git log >expected &&
> + test_cmp actual expected
> +'
> +
>  test_done

This patch seems like the wrong direction to me. In particular the
assertion that "the concept of differences only makes sense for text
files". That's just not true. This patch breaks this:

(
rm -rf /tmp/g-test &&
git init /tmp/g-test &&
cd /tmp/g-test &&
for i in {1..10}; do
echo "Always matching thensome 5" >file &&
printf "a thensome %d binary \0" $i >>file &&
git add file &&
git commit -m"Bump $i"
done &&
git log -Gthensome.*5
)

Right now this will emit 3/10 patches, and the right ones! I.e. "Bump
[156]". The 1st one because it introduces the "Always matching thensome
5". Then 5/6 because the add/remove the string "a thensome 5 binary",
respectively. Which matches /thensome.*5/.

I.e. in the first one we do a regex match against the content here
because we don't have both sides:
https://github.com/git/git/blob/v2.19.2/diffcore-pickaxe.c#L48-L53

And then for the later ones where we have both sides we end up in
diffgrep_consume():
https://github.com/git/git/blob/v2.19.2/diffcore-pickaxe.c#L27-L36

I think there may be a real issue here to address, which might be some
combination of:

 a) Even though the diffcore can do a binary diff internally, this is
not what it exposes with "-p", we just say "Binary files differ".

I don't know how to emit the raw version we'll end up passing to
diffgrep_consume() in this case. Is it just --binary without the
encoding? I don't know...

 b) Your test case shows that you're matching a string at a \0
boundary. Is this perhaps something you ran into? I.e. that we don't
have some -F version of -G so we can't supply regexes that match
past a \0? I had some related work on grep for this that hasn't been
carried over to the diffcore:

git log --grep='grep:.*\\0' --author=Ævar

 c) Is this binary 

Re: [PATCH v1 2/2] log -S: Add test which searches in binary files

2018-11-22 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 21 2018, Thomas Braun wrote:

> The -S  option of log looks for differences that changes the
> number of occurrences of the specified string (i.e. addition/deletion)
> in a file.
>
> Add a test to ensure that we keep looking into binary files with -S
> as changing that would break backwards compatibility in unexpected ways.
>
> Signed-off-by: Thomas Braun 
> ---
>  t/t4209-log-pickaxe.sh | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index 42cc8afd8b..d430f6f2f9 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -128,4 +128,15 @@ test_expect_success 'log -G looks into binary files with 
> textconv filter' '
>   test_cmp actual expected
>  '
>
> +test_expect_success 'log -S looks into binary files' '
> + rm -rf .git &&
> + git init &&
> + printf "a\0b" >data.bin &&
> + git add data.bin &&
> + git commit -m "message" &&
> + git log -S a >actual &&
> + git log >expected &&
> + test_cmp actual expected
> +'
> +
>  test_done

This should just be part of 1/2 since the behavior is changed there &
the commit message should describe both cases.


Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)

2018-11-21 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 21 2018, Junio C Hamano wrote:

> * jk/loose-object-cache (2018-11-13) 9 commits
>   (merged to 'next' on 2018-11-18 at 276691a21b)
>  + fetch-pack: drop custom loose object cache
>  + sha1-file: use loose object cache for quick existence check
>  + object-store: provide helpers for loose_objects_cache
>  + sha1-file: use an object_directory for the main object dir
>  + handle alternates paths the same as the main object dir
>  + sha1_file_name(): overwrite buffer instead of appending
>  + rename "alternate_object_database" to "object_directory"
>  + submodule--helper: prefer strip_suffix() to ends_with()
>  + fsck: do not reuse child_process structs
>
>  Code clean-up with optimization for the codepath that checks
>  (non-)existence of loose objects.
>
>  Will cook in 'next'.

I think as noted in
https://public-inbox.org/git/e5148b8c-9a3a-5d2e-ac8c-3e536c0f2...@web.de/
that we should hold off the [89]/9 of this series due to the performance
regressions this introduces in some cases (while fixing other cases).

I hadn't had time to follow up on that, and figured it could wait until
post-2.20 for a re-roll.


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-21 Thread Ævar Arnfjörð Bjarmason


On Wed, Nov 21 2018, Jonathan Nieder wrote:

> Junio C Hamano wrote:
>
>>  This series has a strong smell of pushing back by the
>> toolsmiths who refuse to promptly upgrade to help their users, and
>> that is why I do not feel entirely happy with this series.
>
> Last reply, I promise. :)
>
> This sentence might have the key to the misunderstanding.  Let me say
> a little more about where this showed up in the internal deployment
> here, to clarify things a little.
>
> At Google we deploy snapshots of the "next" branch approximately
> weekly so that we can find problems early before they affect a
> published release.  We rely on the ability to roll back quickly when a
> problem is discovered, and we might care more about compatibility than
> some others because of that.
>
> A popular tool within Google has a bundled copy of Git (also a
> snapshot of the "next" branch, but from a few weeks prior) and when we
> deployed Git with the EOIE and IEOT extensions, users of that tool
> very quickly reported the mysterious message.
>
> That said, the maintainers of that tool did not complain at all, so
> hopefully I can allay your worries about toolsmiths pushing back.
> Once the problem reached my attention (a few days later than I would
> have liked it to), the Git team at Google knew that we could not roll
> back and were certainly alarmed about what that means about our
> ability to cope with other problems should we need to.  But we were
> able to quickly update that popular tool --- no issue.
>
> Instead, we ran into a number of other users running into the same
> problem, when sharing repositories between machines using sshfs, etc.
> That, plus the aforementioned inability to roll back Git if we need
> to, meant that this was a serious issue so we quickly addressed it in
> the internal installation.
>
> In general, we haven't had much trouble getting people to use Git
> 2.19.1 or newer.  So the problem here does not have to do with users
> being slow to upgrade.
>
> Instead, it's simply that upgrading Git should not cause the older,
> widely deployed version of Git to complain about the repositories it
> acts on.  That's a recipe for difficult debugging situations, it can
> lead to people upgrading less quickly and reporting bugs later, and
> all in all it's a bad situation to be in.  I've used tools like
> Subversion that would upgrade repositories so they are unusable by the
> previous version and experienced all of these problems.
>
> So I consider it important *to Git upstream* to handle this well in
> the Git 2.20 release.  We can flip the default soon after, even as
> soon as 2.21.
>
> Moreover, I am not the only one who ran into this --- e.g. from [1],
> 2018-10-19:
>
>   17:10  jrnieder: Yes, I noticed that annoyance myself. ;)
>   17:11  Yeah, I saw that message a few times and was slightly
>  annoyed as well.
>
> Now, a meta point.  Throughout this discussion, I have been hoping for
> some acknowledgement of the problem --- e.g. an "I am sympathetic to
> what you are trying to do, but ".  I wasn't able to find that, and
> that is part of what contributed to the feeling of not being heard.
>
> Thanks for your patient explanations, and hope that helps,
> Jonathan

I think it makes total sense to fix this. I had not spotted this myself
since I tend to just roll forward and only use one version of git on one
system, but fixing this makes sense.


[PATCH v2 6/6] commit-graph write: add even more progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Add more progress output to sections of code that can collectively
take 5-10 seconds on a large enough repository. On a test repository
with I have with ~7 million commits and ~50 million objects we'll now
emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(50026015/50026015), done.
Annotating commit graph: 21567407, done.
Counting distinct commits in commit graph: 100% (7189147/7189147), done.
Finding extra edges in commit graph: 100% (7189147/7189147), done.
Computing commit graph generation numbers: 100% (7144680/7144680), done.
Writing out commit graph: 21434417, done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and as before and we'll still
emit output like:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 2399912, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 0e98679bce..1ad960 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -887,12 +887,19 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable(, report_progress);
 
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Counting distinct commits in commit graph"),
+   oids.nr);
+   display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(oids.list, oids.nr, commit_compare);
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
+   display_progress(progress, i + 1);
if (!oideq([i - 1], [i]))
count_distinct++;
}
+   stop_progress();
 
if (count_distinct >= GRAPH_PARENT_MISSING)
die(_("the commit graph format cannot write %d commits"), 
count_distinct);
@@ -902,8 +909,13 @@ void write_commit_graph(const char *obj_dir,
ALLOC_ARRAY(commits.list, commits.alloc);
 
num_extra_edges = 0;
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Finding extra edges in commit graph"),
+   oids.nr);
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
+   display_progress(progress, i + 1);
if (i > 0 && oideq([i - 1], [i]))
continue;
 
@@ -920,6 +932,7 @@ void write_commit_graph(const char *obj_dir,
commits.nr++;
}
num_chunks = num_extra_edges ? 4 : 3;
+   stop_progress();
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 4/6] commit-graph write: add more describing progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

$ git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
[...]

# Actually we don't emit this since this takes almost no time at
# all. But if we did (s/_delayed//) we'd show:
$ git for-each-ref --format='%(objectname)' | git commit-graph write 
--stdin-commits
Finding commits for commit graph from 584 refs: 100% (584/584), done.
[...]

$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 3 packs: 6365492, done.
[...]

The middle on of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

Finding commits for commit graph from 165203 refs: 100% (165203/165203), 
done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7c1afa4704..fd1fd61750 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -779,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
unsigned long approx_nr_objects;
+   struct strbuf progress_title = STRBUF_INIT;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -815,8 +816,12 @@ void write_commit_graph(const char *obj_dir,
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
if (report_progress) {
-   oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph in %d 
pack",
+  "Finding commits for commit graph in %d 
packs",
+  pack_indexes->nr),
+   pack_indexes->nr);
+   oids.progress = 
start_delayed_progress(progress_title.buf, 0);
oids.progress_done = 0;
}
for (i = 0; i < pack_indexes->nr; i++) {
@@ -833,14 +838,20 @@ void write_commit_graph(const char *obj_dir,
free(p);
}
stop_progress();
+   strbuf_reset(_title);
strbuf_release();
}
 
if (commit_hex) {
-   if (report_progress)
-   progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
-   commit_hex->nr);
+   if (report_progress) {
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph from 
%d ref",
+  "Finding commits for commit graph from 
%d refs",
+  commit_hex->nr),
+   commit_hex->nr);
+   progress = start_delayed_progress(progress_title.buf,
+ commit_hex->nr);
+   }
for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
@@ -860,12 +871,13 @@ void write_commit_graph(const char *obj_dir,
}
}
stop_progress();
+   strbuf_reset(_title);
}
 
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
+   _("Finding commits for commit graph among 
packed objects"),
approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
if (oids.progress_done < approx_nr_objects)
@@ -970,6 +982,8 @@ void write_commit_graph(const char *obj_dir,
  _cnt);
stop_progress();
 
+   strbuf_release(_title);
+
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 3/6] commit-graph write: show progress for object search

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

Finding commits for commit graph: 6365492, done.
[...]

And now:

Finding commits for commit graph: 100% (6365492/6365492), done.
[...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6f6409b292..7c1afa4704 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -778,12 +778,14 @@ void write_commit_graph(const char *obj_dir,
struct commit_list *parent;
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
+   unsigned long approx_nr_objects;
 
if (!commit_graph_compatible(the_repository))
return;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 32;
+   approx_nr_objects = approximate_object_count();
+   oids.alloc = approx_nr_objects / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
@@ -863,8 +865,11 @@ void write_commit_graph(const char *obj_dir,
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   _("Finding commits for commit graph"),
+   approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
+   if (oids.progress_done < approx_nr_objects)
+   display_progress(oids.progress, approx_nr_objects);
stop_progress();
}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 5/6] commit-graph write: remove empty line for readability

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index fd1fd61750..0e98679bce 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -888,7 +888,6 @@ void write_commit_graph(const char *obj_dir,
close_reachable(, report_progress);
 
QSORT(oids.list, oids.nr, commit_compare);
-
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
if (!oideq([i - 1], [i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



  1   2   3   4   5   6   7   8   9   10   >