date:20180226

Re: [PATCH v3 18/35] fetch: pass ref patterns when fetching

2018-02-26 Thread Jonathan Nieder

Brandon Williams wrote:

> Construct a list of ref patterns to be passed to
> 'transport_get_remote_refs()' from the refspec to be used during the
> fetch.  This list of ref patterns will be used to allow the server to
> filter the ref advertisement when communicating using protocol v2.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/fetch.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 
Nice.

I take it that tests covering this come later in the series?

Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder

Brandon Williams wrote:

> Teach the client to be able to request a remote's refs using protocol
> v2.  This is done by having a client issue a 'ls-refs' request to a v2
> server.

Yay, ls-remote support!

[...]
> --- a/builtin/upload-pack.c
> +++ b/builtin/upload-pack.c
> @@ -5,6 +5,7 @@
>  #include "parse-options.h"
>  #include "protocol.h"
>  #include "upload-pack.h"
> +#include "serve.h"

nit, no change needed in this patch: What is a good logical order for
the #includes here?  Bonus points if there's a tool to make it happen
automatically.

Asking since adding #includes like this at the end tends to result in
a harder-to-read list of #includes, sometimes with duplicates, and
often producing conflicts when multiple patches in flight add a
#include to the same file.

[...]
> @@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const 
> char *prefix)
>  
>   switch (determine_protocol_version_server()) {
>   case protocol_v2:
> - /*
> -  * fetch support for protocol v2 has not been implemented yet,
> -  * so ignore the request to use v2 and fallback to using v0.
> -  */
> - upload_pack();
> + serve_opts.advertise_capabilities = opts.advertise_refs;
> + serve_opts.stateless_rpc = opts.stateless_rpc;
> + serve(_opts);

Interesting!  daemon.c has its own (file-local) serve() function;
can one of the two be renamed?

I actually like both names: serve() is a traditional name for the
function a server calls when it's done setting up and is ready to
serve.  But the clash is confusing.

[...]
> +++ b/connect.c
> @@ -12,9 +12,11 @@
>  #include "sha1-array.h"
>  #include "transport.h"
>  #include "strbuf.h"
> +#include "version.h"
>  #include "protocol.h"
>  
>  static char *server_capabilities;
> +static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;

Can a quick doc comment describe these and how they relate?

Is only one of them set, based on which protocol version is in use?
Should server_capabilities be renamed to server_capabilities_v1?

>  static const char *parse_feature_value(const char *, const char *, int *);
>  
>  static int check_ref(const char *name, unsigned int flags)
> @@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
> "and the repository exists."));
>  }
>  
> +/* Checks if the server supports the capability 'c' */
> +int server_supports_v2(const char *c, int die_on_error)
> +{
> + int i;
> +
> + for (i = 0; i < server_capabilities_v2.argc; i++) {
> + const char *out;
> + if (skip_prefix(server_capabilities_v2.argv[i], c, ) &&
> + (!*out || *out == '='))
> + return 1;
> + }
> +
> + if (die_on_error)
> + die("server doesn't support '%s'", c);
> +
> + return 0;
> +}

Nice.
> +
> +static void process_capabilities_v2(struct packet_reader *reader)
> +{
> + while (packet_reader_read(reader) == PACKET_READ_NORMAL)
> + argv_array_push(_capabilities_v2, reader->line);
> +
> + if (reader->status != PACKET_READ_FLUSH)
> + die("protocol error");

Can this say more?  E.g. "expected flush after capabilities, got "?

[...]
> @@ -85,7 +114,7 @@ enum protocol_version discover_version(struct 
> packet_reader *reader)
>   /* Maybe process capabilities here, at least for v2 */

Is this comment out of date now?

>   switch (version) {
>   case protocol_v2:
> - die("support for protocol v2 not implemented yet");
> + process_capabilities_v2(reader);
>   break;
>   case protocol_v1:
>   /* Read the peeked version line */
> @@ -293,6 +322,98 @@ struct ref **get_remote_heads(struct packet_reader 
> *reader,
>   return list;
>  }
>  
> +static int process_ref_v2(const char *line, struct ref ***list)

What does the return value represent?

Could it return the more typical 0 on success, -1 on error?

> +{
> + int ret = 1;
> + int i = 0;
> + struct object_id old_oid;
> + struct ref *ref;
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> +
> + if (string_list_split(_sections, line, ' ', -1) < 2) {

Can there be a comment describing the expected format?

> + ret = 0;
> + goto out;
> + }
> +
> + if (get_oid_hex(line_sections.items[i++].string, _oid)) {
> + ret = 0;
> + goto out;
> + }
> +
> + ref = alloc_ref(line_sections.items[i++].string);

Ref names cannot contains a space, so this is safe.  Good.

> +
> + oidcpy(>old_oid, _oid);
> + **list = ref;
> + *list = >next;
> +
> + for (; i < line_sections.nr; i++) {
> + const char *arg = line_sections.items[i].string;
> + if (skip_prefix(arg, "symref-target:", ))
> + ref->symref = xstrdup(arg);

Using space-delimited fields in a single pkt-line means that
- values

Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder

Jonathan Tan wrote:
> On Thu, 22 Feb 2018 13:26:58 -0500
> Jeff King  wrote:

>> I agree that it shouldn't matter much here. But if the name argv_array
>> is standing in the way of using it, I think we should consider giving it
>> a more general name. I picked that not to evoke "this must be arguments"
>> but "this is terminated by a single NULL".
[...]
> This sounds reasonable - I withdraw my comment about using struct
> string_list.

Marking with #leftoverbits as a reminder to think about what such a
more general name would be (or what kind of docs to put in
argv-array.h) and make it so the next time I do a search for that
keyword.

Thanks,
Jonathan

Re: [PATCH v3 10/35] protocol: introduce enum protocol_version value protocol_v2

2018-02-26 Thread Jonathan Nieder

Hi,

Brandon Williams wrote:

> Introduce protocol_v2, a new value for 'enum protocol_version'.
> Subsequent patches will fill in the implementation of protocol_v2.
>
> Signed-off-by: Brandon Williams 
> ---

Yay!

[...]
> +++ b/builtin/fetch-pack.c
> @@ -201,6 +201,9 @@ int cmd_fetch_pack(int argc, const char **argv, const 
> char *prefix)
>  PACKET_READ_GENTLE_ON_EOF);
>  
>   switch (discover_version()) {
> + case protocol_v2:
> + die("support for protocol v2 not implemented yet");
> + break;

This code goes away in a later patch, so no need to do anything about
this, but the 'break' is redundant after the 'die'.

[...]
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, 
> const char *prefix)
>   unpack_limit = receive_unpack_limit;
>  
>   switch (determine_protocol_version_server()) {
> + case protocol_v2:
> + /*
> +  * push support for protocol v2 has not been implemented yet,
> +  * so ignore the request to use v2 and fallback to using v0.
> +  */
> + break;

As you mentioned in the cover letter, it's probably worth doing the
same fallback on the client side (send-pack), too.

Otherwise when this client talks to a new-enough server, it would
request protocol v2 and then get confused when the server responds
with the protocol v2 it requested.

Thanks,
Jonathan

Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder

Jonathan Nieder wrote:
> Brandon Williams wrote:

>> Sometimes it is advantageous to be able to peek the next packet line
>> without consuming it (e.g. to be able to determine the protocol version
>> a server is speaking).  In order to do that introduce 'struct
>> packet_reader' which is an abstraction around the normal packet reading
>> logic.  This enables a caller to be able to peek a single line at a time
>> using 'packet_reader_peek()' and having a caller consume a line by
>> calling 'packet_reader_read()'.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>  pkt-line.c | 59 +++
>>  pkt-line.h | 58 ++
>>  2 files changed, 117 insertions(+)
>
> I like it!
>
> The questions and nits from
> https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
> still apply.  In particular, the ownership of the buffers inside the
> 'struct packet_reader' is still unclear; could the packet_reader create
> its own (strbuf) buffers so that the contract around them (who is allowed
> to write to them; who is responsible for freeing them) is more obvious?

Just to be clear: I sent that review after you sent this patch, so
there should not have been any reason for me to expect the q's and
nits to magically not apply. ;-)

Re: [PATCH v3 06/35] transport: use get_refs_via_connect to get refs

2018-02-26 Thread Jonathan Nieder

Brandon Williams wrote:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
> 
> Signed-off-by: Brandon Williams 
> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

I like the diffstat.

[...]
> +++ b/transport.c
> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
> *transport,
>   args.cloning = transport->cloning;
>   args.update_shallow = data->options.update_shallow;
>  
> - if (!data->got_remote_heads) {
> - connect_setup(transport, 0);
> - get_remote_heads(data->fd[0], NULL, 0, _tmp, 0,
> -  NULL, >shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + refs_tmp = get_refs_via_connect(transport, 0);

The only difference between the old and new code is that the old code
passes NULL as 'extra_have' and the new code passes >extra_have.

That means this populates the data->extra_have oid_array.  Does it
matter?

> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
> *transport, struct ref *remote_re
>   struct send_pack_args args;
>   int ret;
>  
> - if (!data->got_remote_heads) {
> - struct ref *tmp_refs;
> - connect_setup(transport, 1);
> -
> - get_remote_heads(data->fd[0], NULL, 0, _refs, REF_NORMAL,
> -  NULL, >shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + get_refs_via_connect(transport, 1);

not a new problem, just curious: Does this leak tmp_refs?

Same question as the other caller about whether we mind getting
extra_have populated.

Thanks,
Jonathan

Prize

2018-02-26 Thread ''Internet''

The USA Internet Command Protocol promotions has selected your e-mail ID as 
lucky winner of US$2,500.000  out of US$350,000.000.00 shared worldwide for 
150.000 internet e-mail users.
Your Winning Code is (youripnews763-alertenews-abc-2.5m$$$@)

For your payment, quickly forward your email winning code to the local payment 
agency through  this 
E-mail:youripnews763-alertenews-abc-01016-wn-hlo.worldwide-$$$@yahoo.com

FYI: Contact Authorized Agency Through Below E-mail For Payment:
E-mail:youripnews763-alertenews-abc-01016-wn-hlo.worldwide-$$$@yahoo.com

Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder

Hi,

Brandon Williams wrote:

> Sometimes it is advantageous to be able to peek the next packet line
> without consuming it (e.g. to be able to determine the protocol version
> a server is speaking).  In order to do that introduce 'struct
> packet_reader' which is an abstraction around the normal packet reading
> logic.  This enables a caller to be able to peek a single line at a time
> using 'packet_reader_peek()' and having a caller consume a line by
> calling 'packet_reader_read()'.
>
> Signed-off-by: Brandon Williams 
> ---
>  pkt-line.c | 59 +++
>  pkt-line.h | 58 ++
>  2 files changed, 117 insertions(+)

I like it!

The questions and nits from
https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
still apply.  In particular, the ownership of the buffers inside the
'struct packet_reader' is still unclear; could the packet_reader create
its own (strbuf) buffers so that the contract around them (who is allowed
to write to them; who is responsible for freeing them) is more obvious?

Thanks,
Jonathan

Re: [PATCH v2 12/27] serve: introduce git-serve

2018-02-26 Thread Jonathan Nieder

Hi Duy,

Duy Nguyen wrote:
> On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams  wrote:

>> + stateless-rpc
>> +---
>> +
>> +If advertised, the `stateless-rpc` capability indicates that the server
>> +supports running commands in a stateless-rpc mode, which means that a
>> +command lasts for only a single request-response round.
>> +
>> +Normally a command can last for as many rounds as are required to
>> +complete it (multiple for negotiation during fetch or no additional
>> +trips in the case of ls-refs).  If the client sends the `stateless-rpc`
>> +capability with a value of `true` (in the form `stateless-rpc=true`)
>> +then the invoked command must only last a single round.
>
> Speaking of stateless-rpc, I remember last time this topic was brought
> up, there was some discussion to kind of optimize it for http as well,
> to fit the "client sends request, server responds data" model and
> avoid too many round trips (ideally everything happens in one round
> trip). Does it evolve to anything real? All the cool stuff happened
> while I was away, sorry if this was discussed and settled.

We have a few different ideas for improving negotiation.  They were
speculative enough that we didn't want to make them part of the
baseline protocol v2.  Feel free to poke me in a new thread. :)

Some teasers:

- allow both client and server to suggest commits in negotiation,
  instead of just the client?

- send a bloom filter for the peer to filter their suggestions
  against?

- send other basic information like maximum generation number or
  maximum commit date?

- exponential backoff in negotiation instead of linear walking?
  prioritizing ref tips?  Imitating the bitmap selection algorithm?

- at the "end" of negotiation, sending a graph data structure instead
  of a pack, to allow an extra round trip to produce a truly minimal
  pack?

Those are some initial ideas, but it's also likely that someone can
come up with some other experiments to try, too.  (E.g. we've looked
at various papers on set reconciliation, but they don't make enough
use of the graph structure to help much.)

Thanks,
Jonathan

Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-02-26 Thread Jacob Keller

On Mon, Feb 26, 2018 at 4:07 PM, Johannes Schindelin
 wrote:
> Hi Buga,
>
> On Tue, 20 Feb 2018, Igor Djordjevic wrote:
>
>> I`m really interested in this topic, which seems to (try to) address the
>> only "bad feeling" I had with rebasing merges - being afraid of silently
>> losing amendments by actually trying to "replay" the merge (where
>> additional and possibly important context is missing), instead of really
>> "rebasing" it (somehow).
>
> If those amendments are what you are worried about, why not address them
> specifically?
>
> In other words, rather than performing the equivalent of
>
> git show ^! | git apply
>
> (which would of course completely ignore if the rewritten ^2
> dropped a patch, amended a patch, or even added a new patch), what you
> really want is to figure out what changes the user made when merging, and
> what merge strategy was used to begin with.
>
> To see what I mean, look at the output of `git show 0fd90daba8`: it shows
> how conflicts were resolved. By necessity, this is more complicated than a
> simple diff: it is *not* as simple as taking a diff between two revisions
> and applying that diff to a third revision. There were (at least) three
> revisions involved in the original merge commit, and recreating that merge
> commit faithfully means to represent the essence of the merge commit
> faithfully enough to be able to replay it on a new set of at least three
> revisions.  That can be simplified to two-way diffs only in very, very
> special circumstances, and in all other cases this simplification will
> simply fall on its nose.
>
> If the proposed solution was to extend `git apply` to process combined
> diffs, I would agree that we're on to something. That is not the proposed
> solution, though.
>
> In short: while I am sympathetic to the desire to keep things simple,
> the idea to somehow side-step replaying the original merge seems to be
> *prone* to be flawed. Any system that cannot accommodate
> dropped/changed/added commits on either side of a merge is likely to be
> too limited to be useful.
>


The reason Sergey's solution works is because he cherry picks the
merge using each parent first, and then merges the result of those. So
each branch of the merge gets one, and then you merge the result of
those cherry-picks. This preservers amendments and changes properly,
and should result in a good solution.

I agree that making "git apply" work with combined diffs could also be
another solution, but it may be trickier.

If this *doesn't* work, a test case showing that it doesn't work would
be appreciated. I'm hoping to be able to put together something soon,
but I haven't had time due to $dayjob.

> Ciao,
> Johannes

Re: [PATCH v8 3/7] utf8: add function to detect prohibited UTF-16/32 BOM

2018-02-26 Thread Eric Sunshine

On Sun, Feb 25, 2018 at 6:35 AM, Lars Schneider
 wrote:
>> On 25 Feb 2018, at 04:41, Eric Sunshine  wrote:
>> Is this interpretation correct? When I read [1], I interpret it as
>> saying that no BOM _of any sort_ should be present when the encoding
>> is declared as one of UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE.
>
> Correct!
>
>> This code, on the other hand, only checks for BOMs corresponding
>> to the declared size (16 or 32 bits).
>
> Hmm. Interesting thought. You are saying that my code won't complain if
> a document declared as UTF-16LE has a UTF32-LE BOM, correct?

Well, not specifically that case since UTF-16LE BOM is a subset of UTF32-LE BOM.

My observation was more general in that [1] seems to say that there
should be _no_ BOM whatsoever if one of UTF-16BE, UTF-16LE, UTF-32BE,
or UTF-32LE is declared.

> I would say
> this is correct behavior in context of this function. This function assumes
> that the document is proper UTF-16/UTF-16BE/UTF-16LE but it is wrongly
> declared with respect to its BOM in the .gitattributes. Would this
> comment make it more clear to you?
> /*
>  * If a data stream is declared as UTF-16BE or UTF-16LE, then a UTF-16
>  * BOM must not be used [1]. The same applies for the UTF-32 
> equivalents.
>  * The function returns true if this rule is violated.
>  *
>  * [1] http://unicode.org/faq/utf_bom.html#bom10
>  */
> I think what you are referring to is a different class of error and
> would therefore warrant its own checker function. Would you agree?

I don't understand to what different class of error you refer. The
FAQ[1] seems pretty clear to me in that if one of those declarations
is used explicitly, then there should be _no_ BOM, period. It doesn't
say anything about allowing a BOM for a differently-sized encoding (16
vs 32).

If I squint very hard, I _guess_ I can see how you interpret [1] with
the more narrow meaning of the restriction applying only to a BOM of
the same size as the declared encoding, though reading it that way
doesn't come easily to me.

>> I suppose the intention of [1] is to detect a mismatch between the
>> declared encoding and how the stream is actually encoded. The check
>> implemented here will fail to detect a mismatch between, say, declared
>> encoding UTF-16BE and actual encoding UTF-32BE.
>
> As stated above the intention is to detect wrong BOMs! I think we cannot
> detect the "declared as UTF-16BE but actually UTF-32BE" error.
>
> Consider this:
>
> printf "test" | iconv -f UTF-8 -t UTF-32BE | iconv -f UTF-16BE -t UTF-8 | od 
> -c
> 000   \0   t  \0   e  \0   s  \0   t
> 010
>
> In the first step we "encode" the string to UTF-32BE and then we "decode" it 
> as
> UTF-16BE. The result is valid although not correct. Does this make sense?

I'm probably being dense, but I don't understand what this is trying
to illustrate in relation to has_prohibited_utf_bom().

Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command

2018-02-26 Thread Jonathan Nieder

Ævar Arnfjörð Bjarmason wrote:
> On Sat, Feb 24 2018, Jeff King jotted:

>> I actually wonder if we should just specify that the patterns must
>> _always_ be fully-qualified, but may end with a single "/*" to iterate
>> over wildcards. Or even simpler, that "refs/heads/foo" would find that
>> ref itself, and anything under it.
>
> I agree that this is a very good trade-off for now, but I think having
> an escape hatch makes sense. It looks like the protocol is implicitly
> extendible since another parameter could be added, but maybe having such
> a parameter from the get-go would make sense:

I prefer to rely on the implicit extensibility (following the general
YAGNI principle).

In other words, we can introduce a pattern-type later and make the
current pattern-type the default.

Thanks for looking to the future.

[...]
> E.g. if the refs were stored indexed using the method described at
> https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less
> efficient than prefix matching, but a function of how many trigrams in
> your index match the pattern given.

I think the nearest planned change to ref storage is [1], which is
still optimized for prefix matching.  Longer term, maybe some day
we'll want a secondary index that supports infix matching, or maybe
we'll never need it. :)

Sincerely,
Jonathan

[1] 
https://public-inbox.org/git/CAJo=hjszcam9sipdvr7tmd-fd2v2w6_pvmq791egcdsdkq0...@mail.gmail.com/#t

Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-02-26 Thread Sergey Organov

Hi Johannes,

Johannes Schindelin  writes:
> Hi Buga,
>
> On Tue, 20 Feb 2018, Igor Djordjevic wrote:
>
>> I`m really interested in this topic, which seems to (try to) address the
>> only "bad feeling" I had with rebasing merges - being afraid of silently
>> losing amendments by actually trying to "replay" the merge (where
>> additional and possibly important context is missing), instead of really
>> "rebasing" it (somehow).

[...]

> In short: while I am sympathetic to the desire to keep things simple,
> the idea to somehow side-step replaying the original merge seems to be
> *prone* to be flawed.

The proposed (TM) solution does replay the original merge.

> Any system that cannot accommodate dropped/changed/added commits on
> either side of a merge is likely to be too limited to be useful.

I believe the proposed (TM) solution handles all that nicely. It does
accommodate dropped/changed/added commits on either side of a merge,
symmetrically, and never silently drops user modifications.

If you think (TM) is flawed, please give us a test-case.

-- Sergey

[PATCH] git-p4: Fix depot path stripping

2018-02-26 Thread Nuno Subtil

When useClientSpec=true, stripping of P4 depot paths doesn't happen
correctly on sync. Modifies stripRepoPath to handle this case better.

Signed-off-by: Nuno Subtil 
---
 git-p4.py | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 7bb9cadc69738..3df95d0fd1d83 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2480,7 +2480,7 @@ def stripRepoPath(self, path, prefixes):
 if path.startswith(b + "/"):
 path = path[len(b)+1:]
 
-elif self.keepRepoPath:
+if self.keepRepoPath:
 # Preserve everything in relative path name except leading
 # //depot/; just look at first prefix as they all should
 # be in the same depot.
@@ -2490,6 +2490,12 @@ def stripRepoPath(self, path, prefixes):
 
 else:
 for p in prefixes:
+   if self.useClientSpec and not self.keepRepoPath:
+# when useClientSpec is false, the prefix will contain the 
depot name but the path will not
+# extract the depot name and add it to the path so the 
match below will do the right thing
+depot = re.sub("^(//[^/]+/).*", r'\1', p)
+path = depot + path
+
 if p4PathStartsWith(path, p):
 path = path[len(p):]
 break
@@ -2526,8 +2532,8 @@ def splitFilesIntoBranches(self, commit):
 # go in a p4 client
 if self.useClientSpec:
 relPath = self.clientSpecDirs.map_in_client(path)
-else:
-relPath = self.stripRepoPath(path, self.depotPaths)
+
+relPath = self.stripRepoPath(path, self.depotPaths)
 
 for branch in self.knownBranches.keys():
 # add a trailing slash so that a commit into qt/4.2foo

--
https://github.com/git/git/pull/463

[PATCH v5 09/13] commit-graph: close under reachability

2018-02-26 Thread Derrick Stolee

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps given by loose objects or
previously-missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 7b0cfb4..01aa23d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -369,6 +369,28 @@ static int add_packed_commits(const struct object_id *oid,
return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+   int i;
+   struct rev_info revs;
+   struct commit *commit;
+   init_revisions(, NULL);
+   for (i = 0; i < oids->nr; i++) {
+   commit = lookup_commit(>list[i]);
+   if (commit && !parse_commit(commit))
+   revs.commits = commit_list_insert(commit, 
);
+   }
+
+   if (prepare_revision_walk())
+   die(_("revision walk setup failed"));
+
+   while ((commit = get_revision()) != NULL) {
+   ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+   oidcpy(>list[oids->nr], &(commit->object.oid));
+   (oids->nr)++;
+   }
+}
+
 void write_commit_graph(const char *obj_dir)
 {
struct packed_oid_list oids;
@@ -393,6 +415,7 @@ void write_commit_graph(const char *obj_dir)
ALLOC_ARRAY(oids.list, oids.alloc);
 
for_each_packed_object(add_packed_commits, , 0);
+   close_reachable();
 
QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.7.4

[PATCH v5 08/13] commit-graph: add core.commitGraph setting

2018-02-26 Thread Derrick Stolee

The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.commitGraph is that a user can always stop checking
for or parsing commit graph files if core.commitGraph=0.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 3 +++
 cache.h  | 1 +
 config.c | 5 +
 environment.c| 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf..77fcd53 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+   Enable git commit graph feature. Allows reading from .graph files.
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index 21fbcc2..3a18a02 100644
--- a/cache.h
+++ b/cache.h
@@ -801,6 +801,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6..25ee4a6 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.commitgraph")) {
+   core_commit_graph = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index de8431e..0e96be3 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = 
PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.7.4

[PATCH v5 07/13] commit-graph: implement git commit-graph read

2018-02-26 Thread Derrick Stolee

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  12 
 builtin/commit-graph.c |  56 +++
 commit-graph.c | 140 -
 commit-graph.h |  23 ++
 t/t5318-commit-graph.sh|  32 +++--
 5 files changed, 257 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index e688843..51cb038 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 
 [verse]
+'git commit-graph read'  [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -33,6 +34,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read a graph file given by the commit-graph file and output basic
+details about the graph file. Used for debugging purposes.
+
 
 EXAMPLES
 
@@ -43,6 +49,12 @@ EXAMPLES
 $ git commit-graph write
 
 
+* Read basic information from the commit-graph file.
++
+
+$ git commit-graph read
+
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index a9d61f6..0e164be 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
+   N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph write [--object-dir ]"),
NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+   N_("git commit-graph read [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
N_("git commit-graph write [--object-dir ]"),
NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+   struct commit_graph *graph = 0;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_read_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_read_options,
+builtin_commit_graph_read_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+
+   if (!graph)
+   die("graph file %s does not exist", graph_name);
+   FREE_AND_NULL(graph_name);
+
+   printf("header: %08x %d %d %d %d\n",
+   ntohl(*(uint32_t*)graph->data),
+   *(unsigned char*)(graph->data + 4),
+   *(unsigned char*)(graph->data + 5),
+   *(unsigned char*)(graph->data + 6),
+   *(unsigned char*)(graph->data + 7));
+   printf("num_commits: %u\n", graph->num_commits);
+   printf("chunks:");
+
+   if (graph->chunk_oid_fanout)
+   printf(" oid_fanout");
+   if (graph->chunk_oid_lookup)
+   printf(" oid_lookup");
+   if (graph->chunk_commit_data)
+   printf(" commit_metadata");
+   if (graph->chunk_large_edges)
+   printf(" large_edges");
+   printf("\n");
+
+   return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
if (argc > 0) {
+   if (!strcmp(argv[0], "read"))
+   return graph_read(argc, argv);
if (!strcmp(argv[0], "write"))
return graph_write(argc, argv);
}
diff --git a/commit-graph.c b/commit-graph.c
index 2251397..7b0cfb4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,149 @@
GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+   struct commit_graph *g = xmalloc(sizeof(*g));
+   memset(g, 0,

[PATCH v5 02/13] graph: add commit graph design document

2018-02-26 Thread Derrick Stolee

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 164 +++
 1 file changed, 164 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
new file mode 100644
index 000..d11753a
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,164 @@
+Git Commit Graph Design Notes
+=
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
+
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+---
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+

[PATCH v5 10/13] commit: integrate commit graph with commit parsing

2018-02-26 Thread Derrick Stolee

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 664,185
reachable commits and is behind 'origin/master' by 60,191 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  6.56s |  0.66s | -89%  |
| branch -vv   |  1.35s |  0.32s | -76%  |
| rev-list --all   |  6.7s  |  0.83s | -87%  |
| rev-list --all --objects | 33.0s  | 27.5s  | -16%  |

Signed-off-by: Derrick Stolee 
---
 alloc.c |   1 +
 commit-graph.c  | 141 +++-
 commit-graph.h  |  12 +
 commit.c|   3 ++
 commit.h|   3 ++
 t/t5318-commit-graph.sh |  47 +++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadf..cf4f8b6 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+   c->graph_pos = COMMIT_NOT_FROM_GRAPH;
return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 01aa23d..184b8da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -182,6 +181,145 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
exit(1);
 }
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+   char *graph_name;
+
+   if (commit_graph)
+   return;
+
+   graph_name = get_commit_graph_filename(obj_dir);
+   commit_graph = load_commit_graph_one(graph_name);
+
+   FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+   struct alternate_object_database *alt;
+   char *obj_dir;
+
+   if (prepare_commit_graph_run_once)
+   return;
+   prepare_commit_graph_run_once = 1;
+
+   obj_dir = get_object_directory();
+   prepare_commit_graph_one(obj_dir);
+   prepare_alt_odb();
+   for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+   prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+   if (!commit_graph)
+   return;
+
+   if (commit_graph->graph_fd >= 0) {
+   munmap((void *)commit_graph->data, commit_graph->data_len);
+   commit_graph->data = NULL;
+   close(commit_graph->graph_fd);
+   }
+
+   FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, 
uint32_t *pos)
+{
+   return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+   g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+uint64_t pos,
+struct commit_list **pptr)
+{
+   struct commit *c;
+   struct object_id oid;
+   hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+   c = lookup_commit();
+   if (!c)
+   die("could not find commit %s", oid_to_hex());
+   c->graph_pos = pos;
+   return _list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
+{
+   struct object_id oid;
+   uint32_t edge_value;
+   uint32_t *parent_data_ptr;
+   uint64_t date_low, date_high;
+   struct commit_list **pptr;
+   const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len 
+ 16) * pos;
+
+   item->object.parsed = 1;
+   item->graph_pos = pos;
+
+   hashcpy(oid.hash, commit_data);
+   item->tree = lookup_tree();
+
+   date_high = ntohl(*(uint32_t*)(commit_data + g->hash_len + 8)) & 0x3;
+   date_low = ntohl(*(uint32_t*)(commit_data + g->hash_len + 12));
+   item->date = (timestamp_t)((date_high << 32) | date_low);
+

[PATCH v5 00/13] Serialized Git Commit Graph

2018-02-26 Thread Derrick Stolee

This patch series is another big difference from version 4, but I do 
think we are converging on a stable design.

This series depends on a few things in flight:

* jt/binsearch-with-fanout for bsearch_graph()

* 'master' includes the sha1file -> hashfile rename in (98a3beab).

* [PATCH] commit: drop uses of get_cached_commit_buffer(). [1] I
  couldn't find a ds/* branch for this one, but it is necessary or
  else the commit graph test script should fail.

Here are some of the inter-patch changes:

* The single commit graph file is stored in the fixed filename
  .git/objects/info/commit-graph

* Because of this change, I struggled with the right way to pair the
  lockfile API with the hashfile API. Perhaps they were not meant to
  interact like this. I include a new patch step that adds a flag for
  hashclose() to keep the file descriptor open so commit_lock_file()
  can succeed. Please let me know if this is the wrong approach.

* A side-benefit of this change is that the "--set-latest" and
  "--delete-expired" arguments are no longer useful.

* I re-ran the performance tests since I rebased onto master. I had
  moved my "master" branch on my copy of Linux from another perf test,
  which changed the data shape a bit.

* There was some confusion between v3 and v4 about whether commits in
  an existing commit-graph file are automatically added to the new
  file during a write. I think I cleared up all of the documentation
  that referenced this to the new behavior: we only include commits
  reachable from the starting commits (depending on --stdin-commits,
  --stdin-packs, or neither) unless the new "--additive" argument
  is specified.

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/1519240631-221761-1-git-send-email-dsto...@microsoft.com/

-- >8 --

This patch contains a way to serialize the commit graph.

The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base  ' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 664,185 reachable commits and is behind 'origin/master'
by 60,191 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  6.56s |  0.66s | -89%  |
| branch -vv   |  1.35s |  0.32s | -76%  |
| rev-list --all   |  6.7s  |  0.83s | -87%  |
| rev-list --all --objects | 33.0s  | 27.5s  | -16%  |

To test this yourself, run the following on your repo:

  git config core.commitGraph true
  git show-ref -s | git commit-graph write --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitGraph' setting.

[1] https://github.com/derrickstolee/git/pull/2
A GitHub pull request containing the latest version of this patch.

Derrick Stolee (13):
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  csum-file: add CSUM_KEEP_OPEN flag
  commit-graph: implement write_commit_graph()
  commit-graph: implement 'git-commit-graph write'
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

[PATCH v5 05/13] commit-graph: implement write_commit_graph()

2018-02-26 Thread Derrick Stolee

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-graph.c | 360 +
 commit-graph.h |   7 ++
 3 files changed, 368 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index 2e4956f..bf91b2d 100644
--- a/Makefile
+++ b/Makefile
@@ -771,6 +771,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 000..2251397
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,360 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
+#define GRAPH_PARENT_MISSING 0x7fff
+#define GRAPH_EDGE_LAST_MASK 0x7fff
+#define GRAPH_PARENT_NONE 0x7000
+
+#define GRAPH_LAST_EDGE 0x8000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+   GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+   return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+struct commit **commits,
+int nr_commits)
+{
+   int i, count = 0;
+   struct commit **list = commits;
+
+   /*
+* Write the first-level table (the list is sorted,
+* but we use a 256-entry lookup to be able to avoid
+* having to do eight extra binary search iterations).
+*/
+   for (i = 0; i < 256; i++) {
+   while (count < nr_commits) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }
+
+   hashwrite_be32(f, count);
+   }
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   int count;
+   for (count = 0; count < nr_commits; count++, list++)
+   hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+   struct commit **commits = table;
+   return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_extra_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   int edge_value;
+   uint32_t packedDate[2];
+
+   parse_commit(*list);
+   hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+   parent = (*list)->parents;
+
+   if (!parent)
+   edge_value = GRAPH_PARENT_NONE;
+   else {
+   edge_value = sha1_pos(parent->item->object.oid.hash,
+ commits,
+ nr_commits,
+ commit_to_sha1);
+
+   if (edge_value < 0)
+   edge_value = GRAPH_PARENT_MISSING;
+   }
+
+   hashwrite_be32(f, edge_value);
+
+   if (parent)
+   parent = parent->next;
+
+   if (!parent)
+   edge_value = GRAPH_PARENT_NONE;
+   else if (parent->next)
+   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_extra_edges;
+   else {
+   edge_value =

[PATCH v5 11/13] commit-graph: read only from specific pack-indexes

2018-02-26 Thread Derrick Stolee

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 11 ++-
 builtin/commit-graph.c | 33 ++---
 commit-graph.c | 26 --
 commit-graph.h |  4 +++-
 packfile.c |  4 ++--
 packfile.h |  2 ++
 t/t5318-commit-graph.sh| 10 ++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 51cb038..b945510 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -32,7 +32,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified packfiles.
 
 'read'::
 
@@ -49,6 +51,13 @@ EXAMPLES
 $ git commit-graph write
 
 
+* Write a graph file, extending the current graph file using commits
+* in .
++
+
+$ echo  | git commit-graph write --stdin-packs
+
+
 * Read basic information from the commit-graph file.
 +
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0e164be..eebca57 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+   const char **pack_indexes = NULL;
+   int packs_nr = 0;
+   const char **lines = NULL;
+   int lines_nr = 0;
+   int lines_alloc = 0;
+
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", _dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "stdin-packs", _packs,
+   N_("scan packfiles listed by stdin for commits")),
OPT_END(),
};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
-   write_commit_graph(opts.obj_dir);
+   if (opts.stdin_packs) {
+   struct strbuf buf = STRBUF_INIT;
+   lines_nr = 0;
+   lines_alloc = 128;
+   ALLOC_ARRAY(lines, lines_alloc);
+
+   while (strbuf_getline(, stdin) != EOF) {
+   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+   lines[lines_nr++] = strbuf_detach(, NULL);
+   }
+
+   pack_indexes = lines;
+   packs_nr = lines_nr;
+   }
+
+   write_commit_graph(opts.obj_dir,
+  pack_indexes,
+  packs_nr);
+
return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 184b8da..4e9f1d5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -529,7 +529,9 @@ static void close_reachable(struct packed_oid_list *oids)
}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+   const char **pack_indexes,
+   int nr_packs)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -552,7 +554,27 @@ void write_commit_graph(const char *obj_dir)
oids.alloc = 1024;
ALLOC_ARRAY(oids.list, oids.alloc);
 
-   for_each_packed_object(add_packed_commits, , 0);
+   if (pack_indexes) {
+   struct strbuf packname = STRBUF_INIT;
+   int dirlen;
+   strbuf_addf(, "%s/pack/", obj_dir);
+   dirlen = packname.len;
+   for (i = 0; i < nr_packs; i++) {
+

[PATCH v5 13/13] commit-graph: implement "--additive" option

2018-02-26 Thread Derrick Stolee

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 10 ++
 builtin/commit-graph.c | 10 +++---
 commit-graph.c | 17 -
 commit-graph.h |  3 ++-
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 0710a68..ccf5e20 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -41,6 +41,9 @@ With the `--stdin-commits` option, generate the new commit 
graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--additive` option, include all commits that are present
+in the existing commit-graph file.
 
 'read'::
 
@@ -70,6 +73,13 @@ $ echo  | git commit-graph write --stdin-packs
 $ git show-ref -s | git commit-graph write --stdin-commits
 
 
+* Write a graph file containing all commits in the current
+* commit-graph file along with those reachable from HEAD.
++
+
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --additive
+
+
 * Read basic information from the commit-graph file.
 +
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 1c7b7e7..d26a6d6 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--additive] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--additive] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
const char *obj_dir;
int stdin_packs;
int stdin_commits;
+   int additive;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
N_("scan packfiles listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", _commits,
N_("start walk at commits listed by stdin")),
+   OPT_BOOL(0, "additive", ,
+   N_("include all commits already in the commit-graph 
file")),
OPT_END(),
};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
   pack_indexes,
   packs_nr,
   commit_hex,
-  commits_nr);
+  commits_nr,
+  opts.additive);
 
return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index dbb9801..c111717 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -533,7 +533,8 @@ void write_commit_graph(const char *obj_dir,
const char **pack_indexes,
int nr_packs,
const char **commit_hex,
-   int nr_commits)
+   int nr_commits,
+   int additive)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -552,10 +553,24 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
oids.alloc = approximate_object_count() / 4;
 
+   if (additive) {
+   prepare_commit_graph_one(obj_dir);
+   if (commit_graph)
+   oids.alloc += commit_graph->num_commits;
+   }
+
if (oids.alloc < 1024)
oids.alloc = 1024;
ALLOC_ARRAY(oids.list, oids.alloc);
 
+   if (additive && commit_graph) {
+   for (i = 0; i < commit_graph->num_commits; i++) {
+   const unsigned char *hash = 
commit_graph->chunk_oid_lookup +
+   commit_graph->hash_len * i;
+   hashcpy(oids.list[oids.nr++].hash, hash);
+   }
+   }
+
if (pack_indexes) {
struct strbuf packname = STRBUF_INIT;
int

[PATCH v5 12/13] commit-graph: build graph from starting commits

2018-02-26 Thread Derrick Stolee

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 14 +-
 builtin/commit-graph.c | 27 +--
 commit-graph.c | 27 +--
 commit-graph.h |  4 +++-
 t/t5318-commit-graph.sh| 13 +
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index b945510..0710a68 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified packfiles.
+walking objects only in the specified packfiles. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -58,6 +64,12 @@ $ git commit-graph write
 $ echo  | git commit-graph write --stdin-packs
 
 
+* Write a graph file containing all reachable commits.
++
+
+$ git show-ref -s | git commit-graph write --stdin-commits
+
+
 * Read basic information from the commit-graph file.
 +
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index eebca57..1c7b7e7 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
+   N_("git commit-graph write [--object-dir ] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
+   N_("git commit-graph write [--object-dir ] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
int stdin_packs;
+   int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
const char **pack_indexes = NULL;
int packs_nr = 0;
+   const char **commit_hex = NULL;
+   int commits_nr = 0;
const char **lines = NULL;
int lines_nr = 0;
int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
N_("The object directory to store the graph")),
OPT_BOOL(0, "stdin-packs", _packs,
N_("scan packfiles listed by stdin for commits")),
+   OPT_BOOL(0, "stdin-commits", _commits,
+   N_("start walk at commits listed by stdin")),
OPT_END(),
};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
+   if (opts.stdin_packs && opts.stdin_commits)
+   die(_("cannot use both --stdin-commits and --stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
-   if (opts.stdin_packs) {
+   if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
lines_nr = 0;
lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
lines[lines_nr++] = strbuf_detach(, NULL);
}
 
-   pack_indexes = lines;
-   packs_nr = lines_nr;
+   if (opts.stdin_packs) {
+   pack_indexes = lines;
+   packs_nr = lines_nr;
+   }
+   if (opts.stdin_commits) {
+   commit_hex = lines;
+   commits_nr = lines_nr;
+   }
}

[PATCH v5 03/13] commit-graph: create git-commit-graph builtin

2018-02-26 Thread Derrick Stolee

Thanks for the help in getting all the details right in setting up a
builtin.

-- >8 --

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee 
---
 .gitignore |  1 +
 Documentation/git-commit-graph.txt | 11 ++
 Makefile   |  1 +
 builtin.h  |  1 +
 builtin/commit-graph.c | 37 ++
 command-list.txt   |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c  |  1 +
 8 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b..e82f901 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
new file mode 100644
index 000..5913340
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,11 @@
+git-commit-graph(1)
+===
+
+NAME
+
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
+
diff --git a/Makefile b/Makefile
index c56fdc1..2e4956f 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3..079855b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const 
char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 000..8ff7336
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,37 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+   N_("git commit-graph [--object-dir ]"),
+   NULL
+};
+
+static struct opts_commit_graph {
+   const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+   static struct option builtin_commit_graph_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+
+   git_config(git_default_config, NULL);
+   argc = parse_options(argc, argv, prefix,
+builtin_commit_graph_options,
+builtin_commit_graph_usage,
+PARSE_OPT_STOP_AT_NON_OPTION);
+
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28..835c589 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean   mainporcelain
 git-clone   mainporcelain   init
 git-column  purehelpers
 git-commit  mainporcelain   history
+git-commit-graphplumbingmanipulators
 git-commit-tree plumbingmanipulators
 git-config  ancillarymanipulators
 git-count-objects   ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 88813e9..b060b6f 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
check-ref-format) : plumbing;;
checkout-index)   : plumbing;;
column)   : internal

[PATCH v5 06/13] commit-graph: implement 'git-commit-graph write'

2018-02-26 Thread Derrick Stolee

Teach git-commit-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  39 
 builtin/commit-graph.c |  33 ++
 t/t5318-commit-graph.sh| 125 +
 3 files changed, 197 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 5913340..e688843 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,45 @@ NAME
 
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+
+[verse]
+'git commit-graph write'  [--object-dir ]
+
+
+DESCRIPTION
+---
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+---
+--object-dir::
+   Use given directory for the location of packfiles and commit graph
+   file. The commit graph file is expected to be at /info/commit-graph
+   and the packfiles are expected to be in /pack.
+
+
+COMMANDS
+
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+
+$ git commit-graph write
+
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 8ff7336..a9d61f6 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ]"),
+   NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+   N_("git commit-graph write [--object-dir ]"),
NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+   static struct option builtin_commit_graph_write_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_write_options,
+builtin_commit_graph_write_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   write_commit_graph(opts.obj_dir);
+   return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
 builtin_commit_graph_usage,
 PARSE_OPT_STOP_AT_NON_OPTION);
 
+   if (argc > 0) {
+   if (!strcmp(argv[0], "write"))
+   return graph_write(argc, argv);
+   }
+
usage_with_options(builtin_commit_graph_usage,
   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 000..7524d2a
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,125 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+   mkdir full &&
+   cd "$TRASH_DIRECTORY/full" &&
+   git init &&
+   objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit-graph write --object-dir . &&
+   test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+   cd "$TRASH_DIRECTORY/full" &&
+   for i in $(test_seq 3)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git repack
+'
+
+test_expect_success 'write graph' '
+   cd "$TRASH_DIRECTORY/full" &&
+   graph1=$(git commit-graph write) &&
+   test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git reset --hard commits/1 &&
+   for i in $(test_seq 4 5)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git reset --hard commits/2 &&
+   for i in $(test_seq 6 7)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git reset --hard commits/2 &&
+   git merge commits/4 &&
+   git branch merge/1 &&
+

[PATCH v5 01/13] commit-graph: add format document

2018-02-26 Thread Derrick Stolee

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph-format.txt | 98 +
 1 file changed, 98 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 000..4402baa
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,98 @@
+Git commit graph format
+===
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special, and can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as 32-bit integers corresponding to
+the array position withing the list of commit OIDs. We use the most-significant
+bit for special purposes, so we can store at most (1 << 31) - 1 (around 2
+billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+  The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+  Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+  We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+ Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+  First 4 bytes describe the chunk id. Value 0 is a terminating label.
+  Other 8 bytes provide the byte-offset in current file for chunk to
+  start. (Chunks are ordered contiguously in the file, so you can infer
+  the length using the next chunk position if necessary.) Each chunk
+  type appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+  The ith entry, F[i], stores the number of OIDs with first
+  byte at most i. Thus F[255] stores the total
+  number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+  The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+* The first H bytes are for the OID of the root tree.
+* The next 8 bytes are for the positions of the first two parents
+  of the ith commit. Stores value 0x if no parent in that
+  position. If there are more than two parents, the second value
+  has its most-significant bit on and the other bits store an array
+  position into the Large Edge List chunk.
+* The next 8 bytes store the generation number of the commit and
+  the commit time in seconds since EPOCH. The generation number
+  uses the higher 30 bits of the first 4 bytes, while the commit
+  time uses the 32 bits of the second 4 bytes, along with the lowest
+  2 bits of the lowest byte, storing the 33rd and 34th bit of the
+  commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+  This list of 4-byte values store the second through nth parents for
+  all octopus merges. The second parent value in the commit data stores
+  an array position within this list along with the most-significant bit
+  on. Starting at that array position, iterate through this list of commit
+  positions for the parents until reaching a value with the 
most-significant
+  bit on. The

[PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-02-26 Thread Derrick Stolee

This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any susggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --

If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.

Signed-off-by: Derrick Stolee 
---
 csum-file.c | 10 +++---
 csum-file.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 5eda7fb..302e6ae 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, 
unsigned int flags)
flush(f, f->buffer, the_hash_algo->rawsz);
if (flags & CSUM_FSYNC)
fsync_or_die(f->fd, f->name);
-   if (close(f->fd))
-   die_errno("%s: sha1 file error on close", f->name);
-   fd = 0;
+   if (flags & CSUM_KEEP_OPEN)
+   fd = f->fd;
+   else {
+   if (close(f->fd))
+   die_errno("%s: sha1 file error on close", 
f->name);
+   fd = 0;
+   }
} else
fd = f->fd;
if (0 <= f->check_fd) {
diff --git a/csum-file.h b/csum-file.h
index 992e5c0..b7c0e48 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct 
hashfile_checkpoint *);
 /* hashclose flags */
 #define CSUM_CLOSE 1
 #define CSUM_FSYNC 2
+#define CSUM_KEEP_OPEN 4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
-- 
2.7.4

Re: [PATCH v2 05/36] resolve-undo: convert struct resolve_undo_info to object_id

2018-02-26 Thread brian m. carlson

On Mon, Feb 26, 2018 at 06:25:24PM +0700, Duy Nguyen wrote:
> On Mon, Feb 26, 2018 at 4:11 AM, brian m. carlson
>  wrote:
> > @@ -44,7 +44,7 @@ void resolve_undo_write(struct strbuf *sb, struct 
> > string_list *resolve_undo)
> > for (i = 0; i < 3; i++) {
> > if (!ui->mode[i])
> > continue;
> > -   strbuf_add(sb, ui->sha1[i], 20);
> > +   strbuf_add(sb, ui->oid[i].hash, 
> > the_hash_algo->rawsz);
> > }
> > }
> >  }
> > @@ -89,7 +89,7 @@ struct string_list *resolve_undo_read(const char *data, 
> > unsigned long size)
> > continue;
> > if (size < 20)
> > goto error;
> > -   hashcpy(ui->sha1[i], (const unsigned char *)data);
> > +   hashcpy(ui->oid[i].hash, (const unsigned char 
> > *)data);
> > size -= 20;
> > data += 20;
> > }

It looks like I may have missed a conversion there.  I'll fix that in a
reroll.

> Here we see the same pattern again, but this time the @@ lines give
> better context: these are actually hash I/O. Maybe it's about time we
> add
> 
> int oidwrite(char *, size_t , const struct object_id *);
> // optionally, void strbuf_addoid(struct strbuf *, const struct object_id *);
> int oidread(struct object_id *, const char *, size_t);
> 
> for conversion from between an object_id in memory and on disk? It
> would probably be a straight memcpy for all hash algorithms so we
> don't really need new function pointers in git_hash_algo for this.

I don't have a strong opinion about adding those or not adding them; if
people think it makes the code cleaner to read, I'm happy to add them.
It would probably makes sense to make them inline if we do, so that the
compiler can optimize them best.

I think we do need to preserve hashcpy anyway for a handful of cases
(such as pack checksums and rerere) that aren't technically object IDs
and won't use those functions.  I encountered a similar experience with
get_sha1_hex recently: there are a handful of cases that want to read
the name of a pack or a rerere cache, which are not object IDs.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

[no subject]

2018-02-26 Thread Alan Gage

Hello, I recently noticed a bug involving GitBash and Python. I was
running a function that would post the system time once every second
using a while loop but the text was only sent after the while loop
ended due to a timer I had set. Essesntially, instead of it being
entered every second into the terminal, it was entered all at once,
when the loop ended. I tried this with the Command Line, among other
things, and it worked as intended, with the text being entered every
second. This is on Windows 10 Pro with the Fall Creators Update and
the most recent version of GitBash.

Re: [PATCH 1/4] repository.c: move env-related setup code back to environment.c

2018-02-26 Thread Duy Nguyen

On Tue, Feb 27, 2018 at 3:30 AM, Stefan Beller  wrote:
>> diff --git a/setup.c b/setup.c
>> index c5d55dcee4..6fac1bb58a 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -1116,8 +1116,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>> const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
>> if (!gitdir)
>> gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
>> -   repo_set_gitdir(the_repository, gitdir);
>> -   setup_git_env();
>> +   setup_git_env(gitdir);
>> }
>
> What makes git_dir special, such that we have to pass it in here?
> Could the check for getenv(GIT_DIR_ENVIRONMENT) and fallback to
> DEFAULT_... be part of setup_git_env() ?
> Oh I guess that cannot be done easily as the submodule code
> spcifically doesn't want to discover the git dir itself.

No, submodule code already does not care about setup_git_env().

Back to your first question, this is related to the "NEEDSWORK"
comment in this code. We _should_ always set_git_dir() before leaving
setup_git_directory_gently() then everybody behaves the same way and
we don't need this special setup_git_env() here at all. BUT we need to
be careful about setting environment variable $GIT_DIR, done inside
set_git_dir(), because sub-processes will inherit it and if we're not
careful, we'll break .git discovery in those. There's another thing I
don't like about this is setup_work_tree() using set_git_dir() the
second time to adjust relative $GIT_DIR and friends when it moves
$CWD. I'll need to fix this, soon.

So in short, it's special because the current situation is messy and
(hopefully) temporary. But it should eventually be gone.
-- 
Duy

Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Duy Nguyen

On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller  wrote:
> What is the plan from here on?
> Should I build further series on top of yours? The next series will
> focus on the pack side of things (pack.h, packfile.{c,h})
>
> So maybe we'll have Junio merge down my series (and yours as it
> needs one reroll?) and then build on top of that?

I think that's the less painful way for everybody.
-- 
Duy

Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-02-26 Thread Johannes Schindelin

Hi Buga,

On Tue, 20 Feb 2018, Igor Djordjevic wrote:

> I`m really interested in this topic, which seems to (try to) address the
> only "bad feeling" I had with rebasing merges - being afraid of silently
> losing amendments by actually trying to "replay" the merge (where
> additional and possibly important context is missing), instead of really
> "rebasing" it (somehow).

If those amendments are what you are worried about, why not address them
specifically?

In other words, rather than performing the equivalent of

git show ^! | git apply

(which would of course completely ignore if the rewritten ^2
dropped a patch, amended a patch, or even added a new patch), what you
really want is to figure out what changes the user made when merging, and
what merge strategy was used to begin with.

To see what I mean, look at the output of `git show 0fd90daba8`: it shows
how conflicts were resolved. By necessity, this is more complicated than a
simple diff: it is *not* as simple as taking a diff between two revisions
and applying that diff to a third revision. There were (at least) three
revisions involved in the original merge commit, and recreating that merge
commit faithfully means to represent the essence of the merge commit
faithfully enough to be able to replay it on a new set of at least three
revisions.  That can be simplified to two-way diffs only in very, very
special circumstances, and in all other cases this simplification will
simply fall on its nose.

If the proposed solution was to extend `git apply` to process combined
diffs, I would agree that we're on to something. That is not the proposed
solution, though.

In short: while I am sympathetic to the desire to keep things simple,
the idea to somehow side-step replaying the original merge seems to be
*prone* to be flawed. Any system that cannot accommodate
dropped/changed/added commits on either side of a merge is likely to be
too limited to be useful.

Ciao,
Johannes

Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)

2018-02-26 Thread Duy Nguyen

On Tue, Feb 27, 2018 at 3:50 AM, Stefan Beller  wrote:
>> The natural thing to do is move these to raw_object_store too (and
>> repo_submodule_init needs to check submodule's config file). But one
>> may argue these should be per-process instead of per-repo though. I
>> don't know. But I thought I should mention this.
>
> For now a process and a repository is the same as git-gc or git-repack

I think you're thinking about the pack writing part, but these are
configuration for pack reading (aka "object store"). If you read a
blob from a submodule (e.g. git-grep), you'll use these configurations
at some point.

There are of course more configuration for pack writing (like
zlib_compression_level) which I deliberately avoided to mention
because I don't even know where they belong.

> doesn't know about the --recurse-submodules flag, yet.
> I wonder if we ever want to teach those commands the submodule
> recursion, because of the issue you bring up, which settings do we apply
> for a submodule? Traditionally we'd just have the command line override
> the configuration, which I don't know if it is a good idea for these settings.
-- 
Duy

Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command

2018-02-26 Thread Ævar Arnfjörð Bjarmason

On Sat, Feb 24 2018, Jeff King jotted:

> On Thu, Feb 22, 2018 at 04:45:14PM -0800, Brandon Williams wrote:
>> > Does the client have to be aware that we're using wildmatch? I think
>> > they'd need "refs/heads/**" to actually implement what we usually
>> > specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME
>> > make this work with just "*"?
>> >
>> > Do we anticipate that the client would left-anchor the refspec like
>> > "/refs/heads/*" so that in theory the server could avoid looking outside
>> > of /refs/heads/?
>>
>> Yeah we may want to anchor it by providing the leading '/' instead of
>> just "refs/".
>
> I actually wonder if we should just specify that the patterns must
> _always_ be fully-qualified, but may end with a single "/*" to iterate
> over wildcards. Or even simpler, that "refs/heads/foo" would find that
> ref itself, and anything under it.

I agree that this is a very good trade-off for now, but I think having
an escape hatch makes sense. It looks like the protocol is implicitly
extendible since another parameter could be added, but maybe having such
a parameter from the get-go would make sense:

pattern-type [simple|wildmatch|pcre|...]
ref-pattern 

E.g.:

pattern-type simple
ref-pattern refs/tags/*
ref-pattern refs/pull/*
pattern-type wildmatch
ref-pattern refs/**/2018
pattern-type pcre
ref-pattern ^refs/release/v-201[56789]-\d+$

I.e. each ref-pattern is typed by the pattern-type in play, with just
"simple" (with the behavior being discussed here) for now, anything else
(wildmatch, pcre etc.) would be an error.

But it allows for adding more patterns down the line, and in
e.g. in-house setups of git where you control both the server & clients
to make the trade-off that we'd like a bit more work on the server
(e.g. to match dated tags created in the last 3 months) by setting some
config option.

The discussion upthread about:

> The other problem with tail-matching is that it's inefficient on the
> server[...]

Is also something that's only true in the current implementation, but
doesn't need to be, so it would be unfortunate to not work in an escape
hatch for that limtiation.

E.g. if the refs were stored indexed using the method described at
https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less
efficient than prefix matching, but a function of how many trigrams in
your index match the pattern given.

Re: [PATCHv4 01/27] repository: introduce raw object store field

2018-02-26 Thread Duy Nguyen

On Tue, Feb 27, 2018 at 2:28 AM, Stefan Beller  wrote:
> On Mon, Feb 26, 2018 at 1:30 AM, Duy Nguyen  wrote:
>> On Fri, Feb 23, 2018 at 04:47:28PM -0800, Stefan Beller wrote:
>>>  /* The main repository */
>>>  static struct repository the_repo = {
>>> - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
>>> _algos[GIT_HASH_SHA1], 0, 0
>>> + NULL, NULL,
>>> + RAW_OBJECT_STORE_INIT,
>>> + NULL, NULL, NULL,
>>> + NULL, NULL, NULL,
>>> + _index,
>>> + _algos[GIT_HASH_SHA1],
>>> + 0, 0
>>>  };
>>>  struct repository *the_repository = _repo;
>>
>> I wonder if we should do something like this. It makes the_repo
>> initialization easier to read and it helps unify the init code with
>> submodule.
>>
>> No don't reroll. If you think it's a good idea, you can do something
>> like this in the next series instead.
>>
>> -- 8< --
>> diff --git a/check-racy.c b/check-racy.c
>> index 24b6542352..47cbb4eb6d 100644
>> --- a/check-racy.c
>> +++ b/check-racy.c
>
> totally offtopic: Do we want to move this file into t/helper?

No wonder both Jeff and I missed this program (he didn't convert it to
use cmd_main, and I didn't move it to t/helper). This git-check-racy
is added in 42f774063d (Add check program "git-check-racy" -
2006-08-15) and is not part of the default build. You need to manually
update Makefile first to build it.

Right now it's broken (multiple definition of 'main'). If you add
init_the_repository() or something similar, just leave this file
untouched. Maybe I'll fix it later, separately. Or perhaps I'll move
this functionality to git-update-index if this is still worth keeping.
-- 
Duy

Re: [PATCH 1/1] git-p4: add format-patch subcommand

2018-02-26 Thread Eric Sunshine

On Mon, Feb 26, 2018 at 6:48 AM, Luke Diamand  wrote:
> It takes a list of P4 changelists and generates a patch for
> each one, using "p4 describe".
> [...]
> Signed-off-by: Luke Diamand 
> ---
> diff --git a/git-p4.py b/git-p4.py
> @@ -3749,6 +3761,277 @@ class P4Branches(Command):
> +class P4MakePatch(Command,P4UserMap):
> +def run(self, args):
> +if self.output and not os.path.isdir(self.output):
> +sys.exit("output directory %s does not exist" % self.output)

For consistency with "git format-patch", this could create the output
directory automatically rather than erroring out.

> +if self.strip_depot_prefix:
> +self.clientSpec = getClientSpec()
> +else:
> +self.clientSpec = None
> +
> +self.loadUserMapFromCache()
> +if len(args) < 1:
> +return False

Would it make sense to handle this no-arguments case earlier before
doing work, such as loading the user map from cache?

> +for change in args:
> +self.make_patch(int(change))
> +
> +return True
> +
> +def p4_fetch_delta(self, change, files, shelved=False):
> +cmd = ["describe"]
> +if shelved:
> +cmd += ["-S"]
> +cmd += ["-du"]
> +cmd += ["%s" % change]
> +cmd = p4_build_cmd(cmd)
> +
> +p4 = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
> +try:
> +result = p4.stdout.readlines()
> +except EOFError:
> +pass
> +in_diff = False
> +matcher = re.compile('\s+(.*)#(\d+)\s+\(text\)\s+')
> +diffmatcher = re.compile("Differences ...")

I'm not familiar with the output of "p4 describe", but does it include
user-supplied text? If so, would it be safer to anchor 'diffmatcher',
for instance, "^Diferences...$"?

> +delta = ""
> +skip_next_blank_line = False
> +
> +for line in result:
> +if diffmatcher.match(line):

Stepping back, does "Differences..." appear on a line of its own? If
so, why does this need to be a regex at all? Can't it just do a direct
string comparison?

> +in_diff = True
> +continue
> +
> +if in_diff:
> +
> +if skip_next_blank_line and \
> +line.rstrip() == "":
> +skip_next_blank_line = False
> +continue
> +
> +m = matcher.match(line)
> +if m:
> +file = self.map_path(m.group(1))
> +ver = m.group(2)
> +delta += "diff --git a%s b%s\n" % (file, file)
> +delta += "--- a%s\n" % file
> +delta += "+++ b%s\n" % file
> +skip_next_blank_line = True
> +else:
> +delta += line
> +
> +delta += "\n"
> +
> +exitCode = p4.wait()
> +if exitCode != 0:
> +raise IOError("p4 '%s' command failed" % str(cmd))

Since p4.stdout.readlines() gathered all output from the command
before massaging it into Git format, can't the p4.wait() be done
earlier, just after the output has been read?

> +return delta
> +
> +def make_patch(self, change):
> +[...]
> +# add new or deleted files
> +for file in files:
> +[...]
> +if add or delete:
> +if add:
> +[...]
> +else:
> +[...]
> +
> +[...]
> +
> +if add:
> +prefix = "+"
> +else:
> +prefix = "-"
> +
> +if delete:
> +[...]
> +else:
> +# added
> +[...]
> +
> +(lines, delta_content) = self.read_file_contents(prefix, 
> path_rev)
> +
> +if add:
> +if lines > 0:
> +delta += "@@ -0,0 +1,%d @@\n" % lines
> +else:
> +delta += "@@ -1,%d +0,0 @@\n" % lines

It's not clear why you sometimes check 'add' but other times 'delete'.
Perhaps always used one or the other? For instance:

action = file["action"]
if action == "add" or action == "delete":
if action == "add":
before = "/dev/null"
...
else
...

delta += ...

if action == "add":
...

or something.

> +delta += delta_content
> +
> +if self.output:
> +with open("%s/%s.patch" % (self.output, change), "w") as f:
> +f.write(delta)
> +else:
> +print(delta)
> diff --git a/t/t9832-make-patch.sh b/t/t9832-make-patch.sh
> @@ -0,0 +1,135 @@
> +# Converting P4 changes into patches
> +#
> +# - added, deleted, modified files
> +# - regular commits, shelved commits
>

Re: [BUG] [git 2.16.1] yeeek ... my files are gone .. by git pull

2018-02-26 Thread Johannes Schindelin

Hi Peff,

On Fri, 23 Feb 2018, Jeff King wrote:

> On Fri, Feb 23, 2018 at 06:29:55AM +0100, "Marcel 'childNo͡.de' Trautwein" 
> wrote:
> 
> > shows me a quite different behavior, so solely rebase not seems the
> > full problem BUT `--rebase=preserve` will .. o’man , really, is this
> > intended?
> 
> Yeah, the bug seems to be in --preserve-merges. Here's an easier
> reproduction:
> 
>   git init
>   >one && git add one && git commit -m one
>   git checkout --orphan other
>   git mv one two && git commit -m two
> 
>   git rebase --preserve-merges master
> 
> at which point we've dropped commit "two" and its files entirely.
> 
> I don't know much about how preserve-merges works. It looks like in the
> loop around git-rebase--interactive.sh:931 that we mark commit "two"
> with preserve=t, and so we _don't_ add it to the list of commits to
> pick.
> 
> I think that stems from the fact that it has no parent marked to be
> rewritten.
> 
> So something like this helps:
> 
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 81c5b42875..71e6cbb388 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -921,15 +921,20 @@ else
>  
>   if test -z "$rebase_root"
>   then
>   preserve=t
> + p=
>   for p in $(git rev-list --parents -1 $sha1 | cut -d' ' 
> -s -f2-)
>   do
>   if test -f "$rewritten"/$p
>   then
>   preserve=f
>   fi
>   done
> + if test -z "$p"
> + then
> + preserve=f
> + fi
>   else
>   preserve=f
>   fi
>   if test f = "$preserve"
> 
> Because it at least adds "two" to the list of commits to pick. But
> oddly, it picks it directly as a root commit again. Whereas a rebase
> without --preserve-merges (and even "-i") picks it on top of commit
> "one" (which is what I'd expect).
> 
> +cc Dscho, as the --preserve-merges guru.

Your analysis makes sense to me. Please note, though, that I would not
consider myself a guru on preserve-merges. I think this mode is broken by
design (you can blame me if you want).

Ciao,
Dscho

Re: [PATCH 1/1] git-p4: add format-patch subcommand

2018-02-26 Thread Miguel Torroja

very nice subcommand, I tested it with a shelved CL and a submitted CL.
Find my comments inline

On Mon, Feb 26, 2018 at 12:48 PM, Luke Diamand  wrote:
> It takes a list of P4 changelists and generates a patch for
> each one, using "p4 describe".
>
> This is especially useful for applying shelved changelists
> to your git-p4 tree; the existing "git p4" subcommands do
> not handle these.
>
> That's because they "p4 print" the contents of each file at
> a given revision, and then use git-fastimport to generate the
> deltas. But in the case of a shelved changelist, there is no
> easy way to find out what the previous file state was - Perforce
> does not have the concept of a single repo-wide revision.
>
> Unfortunately, using "p4 describe" comes with a price: it cannot
> be used to reliably generate diffs for binary files (it tries to
> linebreak on LF characters) and it is also _much_ slower.
>
> Unicode character correctness is untested - in theory if
> "p4 describe" knows about the character encoding it shouldn't
> break unicode characters if they happen to contain LF, but I
> haven't tested this.
>
> Signed-off-by: Luke Diamand 
> ---
>  Documentation/git-p4.txt |  33 +
>  git-p4.py| 304 
> +--
>  t/t9832-make-patch.sh| 135 +
>  3 files changed, 462 insertions(+), 10 deletions(-)
>  create mode 100755 t/t9832-make-patch.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index d8c8f11c9f..1908b00de2 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -164,6 +164,28 @@ $ git p4 submit --shelve
>  $ git p4 submit --update-shelve 1234 --update-shelve 2345
>  
>
> +
> +format-patch
> +~~
> +This will attempt to create a unified diff (using the git patch variant) 
> which
> +can be passed to patch. This is generated using the output from "p4 
> describe".
> +
> +It includes the contents of added files (which "p4 describe" does not).
> +
> +Binary files cannot be handled correctly due to limitations in "p4 describe".
> +
> +It will handle both normal and shelved (pending) changelists.
> +
> +Because of the way this works, it will be much slower than the normal git-p4 
> clone
> +path.
> +
> +
> +$ git p4 format-patch 12345
> +$ git p4 format-patch --output patchdir 12345 12346 12347
> +$ git p4 format-patch --strip-depot-prefix 12348 > out.patch
> +$ git am out.patch
> +
> +
>  OPTIONS
>  ---
>
> @@ -337,6 +359,17 @@ These options can be used to modify 'git p4 rebase' 
> behavior.
>  --import-labels::
> Import p4 labels.
>
> +Format-patch options
> +
> +
> +--output::
> +Write patches to this directory (which must exist) instead of to
> +standard output.
> +
> +--strip-depot-prefix::
> +Strip the depot prefix from filenames in the patch.  This makes
> +it suitable for passing to tools such as "git am".
> +
>  DEPOT PATH SYNTAX
>  -
>  The p4 depot path argument to 'git p4 sync' and 'git p4 clone' can
> diff --git a/git-p4.py b/git-p4.py
> index 7bb9cadc69..a1e998e6f5 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -26,6 +26,7 @@ import zipfile
>  import zlib
>  import ctypes
>  import errno
> +import time
>
>  try:
>  from subprocess import CalledProcessError
> @@ -316,12 +317,17 @@ def p4_last_change():
>  results = p4CmdList(["changes", "-m", "1"], skip_info=True)
>  return int(results[0]['change'])
>
> -def p4_describe(change):
> +def p4_describe(change, shelved=False):
>  """Make sure it returns a valid result by checking for
> the presence of field "time".  Return a dict of the
> results."""
>
> -ds = p4CmdList(["describe", "-s", str(change)], skip_info=True)
> +cmd = ["describe", "-s"]
> +if shelved:
> +cmd += ["-S"]
> +cmd += [str(change)]
> +
> +ds = p4CmdList(cmd, skip_info=True)
>  if len(ds) != 1:
>  die("p4 describe -s %d did not return 1 result: %s" % (change, 
> str(ds)))
>
> @@ -372,7 +378,14 @@ def split_p4_type(p4type):
>  mods = ""
>  if len(s) > 1:
>  mods = s[1]
> -return (base, mods)
> +
> +git_mode = "100644"
> +if "x" in mods:
> +git_mode = "100755"
> +if base == "symlink":
> +git_mode = "12"
> +
> +return (base, mods, git_mode)
>
>  #
>  # return the raw p4 type of a file (text, text+ko, etc)
> @@ -413,7 +426,7 @@ def p4_keywords_regexp_for_file(file):
>  if not os.path.exists(file):
>  return None
>  else:
> -(type_base, type_mods) = split_p4_type(p4_type(file))
> +(type_base, type_mods, _) = split_p4_type(p4_type(file))
>  return p4_keywords_regexp_for_type(type_base, type_mods)
>
>  def setP4ExecBit(file, mode):
> @@ -1208,6 +1221,9 @@ class P4UserMap:
>  else:
>  return True
>
> +def getP4UsernameEmail(self, userid):
> +return

Re: What's cooking in git.git (Feb 2018, #03; Wed, 21)

2018-02-26 Thread Johannes Schindelin

Hi Junio,

On Fri, 23 Feb 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Wed, 21 Feb 2018, Junio C Hamano wrote:
> >
> >> * js/rebase-recreate-merge (2018-02-12) 12 commits
> >>  ...
> >> 
> >>  "git rebase" learned "--recreate-merges" to transplant the whole
> >>  topology of commit graph elsewhere.
> >> 
> >>  Is this ready for 'next'?
> >
> > Not quite. I have two more changes lined up. Let me send v4.
> 
> Picked up and started looking at it.  Thanks.

FWIW v4's patches are identical to v5, but the interdiff as well as the
pushed tags are completely broken in v4 (culprit: an error in my script
that I recently converted from Bash to node.js, where it would not stop on
`git rebase` returning with a non-zero exit status, fixed since).

Ciao,
Dscho

Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-26 Thread Junio C Hamano

Duy Nguyen  writes:

> Yes that's the intention. But after writing cover letter for v2 and
> sending it out, it looks to me that this thing must stay until all our
> code is converted to using the_hash_algo (I don't know if there are
> more to convert or it's finished already). So an alternative is we do
> the opposite: default to GIT_HASH_SHA1, but when an env variable is
> set, reset it back to NULL. This env variable will _always_ be set by
> the test suite to help us catch problems.

If we were to do the "do not blindly initialize and always
initialize explicitly for debugging" (which I doubt is a good
direction to go in), then what you outlined above certainly is a
less evil way to do so than what the patch under discussion did, I
would think.

Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command

2018-02-26 Thread Junio C Hamano

Jeff King  writes:

> I actually wonder if we should just specify that the patterns must
> _always_ be fully-qualified, but may end with a single "/*" to iterate
> over wildcards. Or even simpler, that "refs/heads/foo" would find that
> ref itself, and anything under it.
>
> That drops any question about how wildcards work (e.g., does "refs/foo*"
> work to find "refs/foobar"?).

Sounds quite sensible to me.

Re: [PATCH v3 21/35] fetch-pack: perform a fetch using v2

2018-02-26 Thread Brandon Williams

On 02/23, Jonathan Tan wrote:
> On Tue,  6 Feb 2018 17:12:58 -0800
> Brandon Williams  wrote:
> 
> > +   while ((oid = get_rev())) {
> > +   packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
> > +   if (++haves_added >= INITIAL_FLUSH)
> > +   break;
> > +   };
> 
> Unnecessary semicolon after closing brace.

Thanks, I'll remove that.

> 
> > +   /* Filter 'ref' by 'sought' and those that aren't local 
> > */
> > +   if (everything_local(args, , sought, nr_sought))
> > +   state = FETCH_DONE;
> > +   else
> > +   state = FETCH_SEND_REQUEST;
> > +   break;
> 
> I haven't looked at this patch in detail, but I found a bug that can be
> reproduced if you patch the following onto this patch:
> 
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -124,6 +124,7 @@ test_expect_success 'clone with file:// using 
> protocol v2' '
>  
>  test_expect_success 'fetch with file:// using protocol v2' '
> test_commit -C file_parent two &&
> +   git -C file_parent tag -d one &&
>  
> GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=2 \
> fetch origin 2>log &&
> @@ -133,7 +134,8 @@ test_expect_success 'fetch with file:// using 
> protocol v2' '
> test_cmp expect actual &&
>  
> # Server responded using protocol v2
> -   grep "fetch< version 2" log
> +   grep "fetch< version 2" log &&
> +   grep "have " log
>  '
> 
> Merely including the second hunk (the one with 'grep "have "') does not
> make the test fail, but including both the first and second hunks does.
> That is, fetch v2 emits "have" only for remote refs that point to
> objects we already have, not for local refs.
> 
> Everything still appears to work, except that packfiles are usually much
> larger than they need to be.
> 
> I did some digging in the code and found out that the equivalent of
> find_common() (which calls `for_each_ref(rev_list_insert_ref_oid,
> NULL)`) was not called in v2. In v1, find_common() is called immediately
> after everything_local(), but there is no equivalent in v2. (I quoted
> the invocation of everything_local() in v2 above.)

I actually caught this Friday morning when I realized that fetching from
a referenced repository would replicated objects instead of using them
from the referenced repository.  Thanks for pointing this out :)

-- 
Brandon Williams

[PATCH v5 12/12] rebase -i: introduce --recreate-merges=[no-]rebase-cousins

2018-02-26 Thread Johannes Schindelin

This one is a bit tricky to explain, so let's try with a diagram:

C
  /   \
A - B - E - F
  \   /
D

To illustrate what this new mode is all about, let's consider what
happens upon `git rebase -i --recreate-merges B`, in particular to
the commit `D`. So far, the new branch structure would be:

   --- C' --
  / \
A - B -- E' - F'
  \/
D'

This is not really preserving the branch topology from before! The
reason is that the commit `D` does not have `B` as ancestor, and
therefore it gets rebased onto `B`.

This is unintuitive behavior. Even worse, when recreating branch
structure, most use cases would appear to want cousins *not* to be
rebased onto the new base commit. For example, Git for Windows (the
heaviest user of the Git garden shears, which served as the blueprint
for --recreate-merges) frequently merges branches from `next` early, and
these branches certainly do *not* want to be rebased. In the example
above, the desired outcome would look like this:

   --- C' --
  / \
A - B -- E' - F'
  \/
   -- D' --

Let's introduce the term "cousins" for such commits ("D" in the
example), and let's not rebase them by default, introducing the new
"rebase-cousins" mode for use cases where they should be rebased.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-rebase.txt  |  7 ++-
 builtin/rebase--helper.c  |  9 -
 git-rebase--interactive.sh|  1 +
 git-rebase.sh | 12 +++-
 sequencer.c   |  4 
 sequencer.h   |  6 ++
 t/t3430-rebase-recreate-merges.sh | 23 +++
 7 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 5e056c8ab6b..c5a77599c47 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -373,11 +373,16 @@ The commit list format can be changed by setting the 
configuration option
 rebase.instructionFormat.  A customized instruction format will automatically
 have the long commit hash prepended to the format.
 
---recreate-merges::
+--recreate-merges[=(rebase-cousins|no-rebase-cousins)]::
Recreate merge commits instead of flattening the history by replaying
merges. Merge conflict resolutions or manual amendments to merge
commits are not recreated automatically, but have to be recreated
manually.
++
+By default, or when `no-rebase-cousins` was specified, commits which do not
+have `` as direct ancestor keep their original branch point.
+If the `rebase-cousins` mode is turned on, such commits are rebased onto
+`` (or ``, if specified).
 
 -p::
 --preserve-merges::
diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index a5b07c43c96..5d1f12de57b 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -13,7 +13,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
unsigned flags = 0, keep_empty = 0, recreate_merges = 0;
-   int abbreviate_commands = 0;
+   int abbreviate_commands = 0, rebase_cousins = -1;
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS,
CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH,
@@ -25,6 +25,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "allow-empty-message", _empty_message,
N_("allow commits with empty messages")),
OPT_BOOL(0, "recreate-merges", _merges, N_("recreate 
merge commits")),
+   OPT_BOOL(0, "rebase-cousins", _cousins,
+N_("keep original branch points of cousins")),
OPT_CMDMODE(0, "continue", , N_("continue rebase"),
CONTINUE),
OPT_CMDMODE(0, "abort", , N_("abort rebase"),
@@ -59,8 +61,13 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
flags |= keep_empty ? TODO_LIST_KEEP_EMPTY : 0;
flags |= abbreviate_commands ? TODO_LIST_ABBREVIATE_CMDS : 0;
flags |= recreate_merges ? TODO_LIST_RECREATE_MERGES : 0;
+   flags |= rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
flags |= command == SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
 
+   if (rebase_cousins >= 0 && !recreate_merges)
+   warning(_("--[no-]rebase-cousins has no effect without "
+ "--recreate-merges"));
+
if (command == CONTINUE && argc == 1)
return !!sequencer_continue();
if (command == ABORT && argc == 1)
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index f5c8db2fdf8..679d79e0d17 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -907,6 +907,7 @@ if test t != "$preserve_merges"
 then

[PATCH v5 02/12] sequencer: make rearrange_squash() a bit more obvious

2018-02-26 Thread Johannes Schindelin

There are some commands that have to be skipped from rearranging by virtue
of not handling any commits.

However, the logic was not quite obvious: it skipped commands based on
their position in the enum todo_command.

Instead, let's make it explicit that we skip all commands that do not
handle any commit. With one exception: the `drop` command, because it,
well, drops the commit and is therefore not eligible to rearranging.

Note: this is a bit academic at the moment because the only time we call
`rearrange_squash()` is directly after generating the todo list, when we
have nothing but `pick` commands anyway.

However, the upcoming `merge` command *will* want to be handled by that
function, and it *can* handle commits.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 5aa3dc3c95c..cfa01d3bdd2 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3412,7 +3412,7 @@ int rearrange_squash(void)
struct subject2item_entry *entry;
 
next[i] = tail[i] = -1;
-   if (item->command >= TODO_EXEC) {
+   if (!item->commit || item->command == TODO_DROP) {
subjects[i] = NULL;
continue;
}
-- 
2.16.1.windows.4

[PATCH v5 11/12] pull: accept --rebase=recreate to recreate the branch topology

2018-02-26 Thread Johannes Schindelin

Similar to the `preserve` mode simply passing the `--preserve-merges`
option to the `rebase` command, the `recreate` mode simply passes the
`--recreate-merges` option.

This will allow users to conveniently rebase non-trivial commit
topologies when pulling new commits, without flattening them.

Signed-off-by: Johannes Schindelin 
---
 Documentation/config.txt   |  8 
 Documentation/git-pull.txt |  5 -
 builtin/pull.c | 14 ++
 builtin/remote.c   |  2 ++
 contrib/completion/git-completion.bash |  2 +-
 5 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10ca..8c9adea0d0c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1058,6 +1058,10 @@ branch..rebase::
"git pull" is run. See "pull.rebase" for doing this in a non
branch-specific manner.
 +
+When recreate, also pass `--recreate-merges` along to 'git rebase'
+so that locally committed merge commits will not be flattened
+by running 'git pull'.
++
 When preserve, also pass `--preserve-merges` along to 'git rebase'
 so that locally committed merge commits will not be flattened
 by running 'git pull'.
@@ -2607,6 +2611,10 @@ pull.rebase::
pull" is run. See "branch..rebase" for setting this on a
per-branch basis.
 +
+When recreate, also pass `--recreate-merges` along to 'git rebase'
+so that locally committed merge commits will not be flattened
+by running 'git pull'.
++
 When preserve, also pass `--preserve-merges` along to 'git rebase'
 so that locally committed merge commits will not be flattened
 by running 'git pull'.
diff --git a/Documentation/git-pull.txt b/Documentation/git-pull.txt
index ce05b7a5b13..b4f9f057ea9 100644
--- a/Documentation/git-pull.txt
+++ b/Documentation/git-pull.txt
@@ -101,13 +101,16 @@ Options related to merging
 include::merge-options.txt[]
 
 -r::
---rebase[=false|true|preserve|interactive]::
+--rebase[=false|true|recreate|preserve|interactive]::
When true, rebase the current branch on top of the upstream
branch after fetching. If there is a remote-tracking branch
corresponding to the upstream branch and the upstream branch
was rebased since last fetched, the rebase uses that information
to avoid rebasing non-local changes.
 +
+When set to recreate, rebase with the `--recreate-merges` option passed
+to `git rebase` so that locally created merge commits will not be flattened.
++
 When set to preserve, rebase with the `--preserve-merges` option passed
 to `git rebase` so that locally created merge commits will not be flattened.
 +
diff --git a/builtin/pull.c b/builtin/pull.c
index 1876271af94..9da2cfa0bd3 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -27,14 +27,16 @@ enum rebase_type {
REBASE_FALSE = 0,
REBASE_TRUE,
REBASE_PRESERVE,
+   REBASE_RECREATE,
REBASE_INTERACTIVE
 };
 
 /**
  * Parses the value of --rebase. If value is a false value, returns
  * REBASE_FALSE. If value is a true value, returns REBASE_TRUE. If value is
- * "preserve", returns REBASE_PRESERVE. If value is a invalid value, dies with
- * a fatal error if fatal is true, otherwise returns REBASE_INVALID.
+ * "recreate", returns REBASE_RECREATE. If value is "preserve", returns
+ * REBASE_PRESERVE. If value is a invalid value, dies with a fatal error if
+ * fatal is true, otherwise returns REBASE_INVALID.
  */
 static enum rebase_type parse_config_rebase(const char *key, const char *value,
int fatal)
@@ -47,6 +49,8 @@ static enum rebase_type parse_config_rebase(const char *key, 
const char *value,
return REBASE_TRUE;
else if (!strcmp(value, "preserve"))
return REBASE_PRESERVE;
+   else if (!strcmp(value, "recreate"))
+   return REBASE_RECREATE;
else if (!strcmp(value, "interactive"))
return REBASE_INTERACTIVE;
 
@@ -130,7 +134,7 @@ static struct option pull_options[] = {
/* Options passed to git-merge or git-rebase */
OPT_GROUP(N_("Options related to merging")),
{ OPTION_CALLBACK, 'r', "rebase", _rebase,
- "false|true|preserve|interactive",
+ "false|true|recreate|preserve|interactive",
  N_("incorporate changes by rebasing rather than merging"),
  PARSE_OPT_OPTARG, parse_opt_rebase },
OPT_PASSTHRU('n', NULL, _diffstat, NULL,
@@ -800,7 +804,9 @@ static int run_rebase(const struct object_id *curr_head,
argv_push_verbosity();
 
/* Options passed to git-rebase */
-   if (opt_rebase == REBASE_PRESERVE)
+   if (opt_rebase == REBASE_RECREATE)
+   argv_array_push(, "--recreate-merges");
+   else if (opt_rebase == REBASE_PRESERVE)
argv_array_push(, "--preserve-merges");
else if (opt_rebase == REBASE_INTERACTIVE)

[PATCH v5 10/12] sequencer: handle post-rewrite for merge commands

2018-02-26 Thread Johannes Schindelin

In the previous patches, we implemented the basic functionality of the
`git rebase -i --recreate-merges` command, in particular the `merge`
command to create merge commits in the sequencer.

The interactive rebase is a lot more these days, though, than a simple
cherry-pick in a loop. For example, it calls the post-rewrite hook (if
any) after rebasing with a mapping of the old->new commits.

This patch implements the post-rewrite handling for the `merge` command
we just introduced. The other commands that were added recently (`label`
and `reset`) do not create new commits, therefore post-rewrite do not
need to handle them.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c   |  7 +--
 t/t3430-rebase-recreate-merges.sh | 25 +
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 01bafe2fe47..85ce37cb99f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2980,11 +2980,14 @@ static int pick_commits(struct todo_list *todo_list, 
struct replay_opts *opts)
else if (item->command == TODO_RESET)
res = do_reset(item->arg, item->arg_len, opts);
else if (item->command == TODO_MERGE ||
-item->command == TODO_MERGE_AND_EDIT)
+item->command == TODO_MERGE_AND_EDIT) {
res = do_merge(item->commit, item->arg, item->arg_len,
   item->command == TODO_MERGE_AND_EDIT ?
   EDIT_MSG | VERIFY_MSG : 0, opts);
-   else if (!is_noop(item->command))
+   if (item->commit)
+   record_in_rewritten(>commit->object.oid,
+   peek_command(todo_list, 1));
+   } else if (!is_noop(item->command))
return error(_("unknown command %d"), item->command);
 
todo_list->current++;
diff --git a/t/t3430-rebase-recreate-merges.sh 
b/t/t3430-rebase-recreate-merges.sh
index 1a3e43d66ff..35a61ce90bb 100755
--- a/t/t3430-rebase-recreate-merges.sh
+++ b/t/t3430-rebase-recreate-merges.sh
@@ -157,4 +157,29 @@ test_expect_success 'refs/rewritten/* is worktree-local' '
test_cmp_rev HEAD "$(cat wt/b)"
 '
 
+test_expect_success 'post-rewrite hook and fixups work for merges' '
+   git checkout -b post-rewrite &&
+   test_commit same1 &&
+   git reset --hard HEAD^ &&
+   test_commit same2 &&
+   git merge -m "to fix up" same1 &&
+   echo same old same old >same2.t &&
+   test_tick &&
+   git commit --fixup HEAD same2.t &&
+   fixup="$(git rev-parse HEAD)" &&
+
+   mkdir -p .git/hooks &&
+   test_when_finished "rm .git/hooks/post-rewrite" &&
+   echo "cat >actual" | write_script .git/hooks/post-rewrite &&
+
+   test_tick &&
+   git rebase -i --autosquash --recreate-merges HEAD^^^ &&
+   printf "%s %s\n%s %s\n%s %s\n%s %s\n" >expect $(git rev-parse \
+   $fixup^^2 HEAD^2 \
+   $fixup^^ HEAD^ \
+   $fixup^ HEAD \
+   $fixup HEAD) &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.16.1.windows.4

[PATCH v5 08/12] rebase: introduce the --recreate-merges option

2018-02-26 Thread Johannes Schindelin

Once upon a time, this here developer thought: wouldn't it be nice if,
say, Git for Windows' patches on top of core Git could be represented as
a thicket of branches, and be rebased on top of core Git in order to
maintain a cherry-pick'able set of patch series?

The original attempt to answer this was: git rebase --preserve-merges.

However, that experiment was never intended as an interactive option,
and it only piggy-backed on git rebase --interactive because that
command's implementation looked already very, very familiar: it was
designed by the same person who designed --preserve-merges: yours truly.

Some time later, some other developer (I am looking at you, Andreas!
;-)) decided that it would be a good idea to allow --preserve-merges to
be combined with --interactive (with caveats!) and the Git maintainer
(well, the interim Git maintainer during Junio's absence, that is)
agreed, and that is when the glamor of the --preserve-merges design
started to fall apart rather quickly and unglamorously.

The reason? In --preserve-merges mode, the parents of a merge commit (or
for that matter, of *any* commit) were not stated explicitly, but were
*implied* by the commit name passed to the `pick` command.

This made it impossible, for example, to reorder commits. Not to mention
to flatten the branch topology or, deity forbid, to split topic branches
into two.

Alas, these shortcomings also prevented that mode (whose original
purpose was to serve Git for Windows' needs, with the additional hope
that it may be useful to others, too) from serving Git for Windows'
needs.

Five years later, when it became really untenable to have one unwieldy,
big hodge-podge patch series of partly related, partly unrelated patches
in Git for Windows that was rebased onto core Git's tags from time to
time (earning the undeserved wrath of the developer of the ill-fated
git-remote-hg series that first obsoleted Git for Windows' competing
approach, only to be abandoned without maintainer later) was really
untenable, the "Git garden shears" were born [*1*/*2*]: a script,
piggy-backing on top of the interactive rebase, that would first
determine the branch topology of the patches to be rebased, create a
pseudo todo list for further editing, transform the result into a real
todo list (making heavy use of the `exec` command to "implement" the
missing todo list commands) and finally recreate the patch series on
top of the new base commit.

That was in 2013. And it took about three weeks to come up with the
design and implement it as an out-of-tree script. Needless to say, the
implementation needed quite a few years to stabilize, all the while the
design itself proved itself sound.

With this patch, the goodness of the Git garden shears comes to `git
rebase -i` itself. Passing the `--recreate-merges` option will generate
a todo list that can be understood readily, and where it is obvious
how to reorder commits. New branches can be introduced by inserting
`label` commands and calling `merge `. And once this mode will
have become stable and universally accepted, we can deprecate the design
mistake that was `--preserve-merges`.

Link *1*:
https://github.com/msysgit/msysgit/blob/master/share/msysGit/shears.sh
Link *2*:
https://github.com/git-for-windows/build-extra/blob/master/shears.sh

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-rebase.txt   |   9 +-
 contrib/completion/git-completion.bash |   2 +-
 git-rebase--interactive.sh |   1 +
 git-rebase.sh  |   6 ++
 t/t3430-rebase-recreate-merges.sh  | 146 +
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100755 t/t3430-rebase-recreate-merges.sh

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index d713951b86a..5e056c8ab6b 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -373,6 +373,12 @@ The commit list format can be changed by setting the 
configuration option
 rebase.instructionFormat.  A customized instruction format will automatically
 have the long commit hash prepended to the format.
 
+--recreate-merges::
+   Recreate merge commits instead of flattening the history by replaying
+   merges. Merge conflict resolutions or manual amendments to merge
+   commits are not recreated automatically, but have to be recreated
+   manually.
+
 -p::
 --preserve-merges::
Recreate merge commits instead of flattening the history by replaying
@@ -775,7 +781,8 @@ BUGS
 The todo list presented by `--preserve-merges --interactive` does not
 represent the topology of the revision graph.  Editing commits and
 rewording their commit messages should work fine, but attempts to
-reorder commits tend to produce counterintuitive results.
+reorder commits tend to produce counterintuitive results. Use
+--recreate-merges for a more faithful representation.
 
 For example, an attempt to rearrange

[PATCH v5 09/12] sequencer: make refs generated by the `label` command worktree-local

2018-02-26 Thread Johannes Schindelin

This allows for rebases to be run in parallel in separate worktrees
(think: interrupted in the middle of one rebase, being asked to perform
a different rebase, adding a separate worktree just for that job).

Signed-off-by: Johannes Schindelin 
---
 refs.c|  3 ++-
 t/t3430-rebase-recreate-merges.sh | 14 ++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 20ba82b4343..e8b84c189ff 100644
--- a/refs.c
+++ b/refs.c
@@ -600,7 +600,8 @@ int dwim_log(const char *str, int len, struct object_id 
*oid, char **log)
 static int is_per_worktree_ref(const char *refname)
 {
return !strcmp(refname, "HEAD") ||
-   starts_with(refname, "refs/bisect/");
+   starts_with(refname, "refs/bisect/") ||
+   starts_with(refname, "refs/rewritten/");
 }
 
 static int is_pseudoref_syntax(const char *refname)
diff --git a/t/t3430-rebase-recreate-merges.sh 
b/t/t3430-rebase-recreate-merges.sh
index 0073601a206..1a3e43d66ff 100755
--- a/t/t3430-rebase-recreate-merges.sh
+++ b/t/t3430-rebase-recreate-merges.sh
@@ -143,4 +143,18 @@ test_expect_success 'with a branch tip that was 
cherry-picked already' '
EOF
 '
 
+test_expect_success 'refs/rewritten/* is worktree-local' '
+   git worktree add wt &&
+   cat >wt/script-from-scratch <<-\EOF &&
+   label xyz
+   exec GIT_DIR=../.git git rev-parse --verify refs/rewritten/xyz >a || :
+   exec git rev-parse --verify refs/rewritten/xyz >b
+   EOF
+
+   test_config -C wt sequence.editor \""$PWD"/replace-editor.sh\" &&
+   git -C wt rebase -i HEAD &&
+   test_must_be_empty wt/a &&
+   test_cmp_rev HEAD "$(cat wt/b)"
+'
+
 test_done
-- 
2.16.1.windows.4

[PATCH v5 05/12] sequencer: introduce the `merge` command

2018-02-26 Thread Johannes Schindelin

This patch is part of the effort to reimplement `--preserve-merges` with
a substantially improved design, a design that has been developed in the
Git for Windows project to maintain the dozens of Windows-specific patch
series on top of upstream Git.

The previous patch implemented the `label` and `reset` commands to label
commits and to reset to labeled commits. This patch adds the `merge`
command, with the following syntax:

merge [-C ]  # 

The  parameter in this instance is the *original* merge commit,
whose author and message will be used for the merge commit that is about
to be created.

The  parameter refers to the (possibly rewritten) revision to
merge. Let's see an example of a todo list:

label onto

# Branch abc
reset onto
pick deadbeef Hello, world!
label abc

reset onto
pick cafecafe And now for something completely different
merge -C baaabaaa abc # Merge the branch 'abc' into master

To edit the merge commit's message (a "reword" for merges, if you will),
use `-c` (lower-case) instead of `-C`; this convention was borrowed from
`git commit` that also supports `-c` and `-C` with similar meanings.

To create *new* merges, i.e. without copying the commit message from an
existing commit, simply omit the `-C ` parameter (which will
open an editor for the merge message):

merge abc

This comes in handy when splitting a branch into two or more branches.

Note: this patch only adds support for recursive merges, to keep things
simple. Support for octopus merges will be added later in a separate
patch series, support for merges using strategies other than the
recursive merge is left for the future.

Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh |   4 ++
 sequencer.c| 158 +
 2 files changed, 162 insertions(+)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 501f09b28c4..2d8bbe20b74 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -164,6 +164,10 @@ x, exec  = run command (the rest of the line) 
using shell
 d, drop  = remove commit
 l, label  = label current HEAD with a name
 t, reset  = reset HEAD to a label
+m, merge [-C  | -c ]  [# ]
+.   create a merge commit using the original merge commit's
+.   message (or the oneline, if no original merge commit was
+.   specified). Use -c  to reword the commit message.
 
 These lines can be re-ordered; they are executed from top to bottom.
 " | git stripspace --comment-lines >>"$todo"
diff --git a/sequencer.c b/sequencer.c
index e25522ecdf1..64dbd1d3e2e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1306,6 +1306,8 @@ enum todo_command {
TODO_EXEC,
TODO_LABEL,
TODO_RESET,
+   TODO_MERGE,
+   TODO_MERGE_AND_EDIT,
/* commands that do nothing but are counted for reporting progress */
TODO_NOOP,
TODO_DROP,
@@ -1326,6 +1328,8 @@ static struct {
{ 'x', "exec" },
{ 'l', "label" },
{ 't', "reset" },
+   { 'm', "merge" },
+   { 0, "merge" }, /* MERGE_AND_EDIT */
{ 0,   "noop" },
{ 'd', "drop" },
{ 0,   NULL }
@@ -1839,6 +1843,21 @@ static int parse_insn_line(struct todo_item *item, const 
char *bol, char *eol)
return 0;
}
 
+   if (item->command == TODO_MERGE) {
+   if (skip_prefix(bol, "-C", ))
+   bol += strspn(bol, " \t");
+   else if (skip_prefix(bol, "-c", )) {
+   bol += strspn(bol, " \t");
+   item->command = TODO_MERGE_AND_EDIT;
+   } else {
+   item->command = TODO_MERGE_AND_EDIT;
+   item->commit = NULL;
+   item->arg = bol;
+   item->arg_len = (int)(eol - bol);
+   return 0;
+   }
+   }
+
end_of_object_name = (char *) bol + strcspn(bol, " \t\n");
saved = *end_of_object_name;
*end_of_object_name = '\0';
@@ -2624,6 +2643,134 @@ static int do_reset(const char *name, int len, struct 
replay_opts *opts)
return ret;
 }
 
+static int do_merge(struct commit *commit, const char *arg, int arg_len,
+   int run_commit_flags, struct replay_opts *opts)
+{
+   int merge_arg_len;
+   struct strbuf ref_name = STRBUF_INIT;
+   struct commit *head_commit, *merge_commit, *i;
+   struct commit_list *common, *j, *reversed = NULL;
+   struct merge_options o;
+   int ret;
+   static struct lock_file lock;
+
+   for (merge_arg_len = 0; merge_arg_len < arg_len; merge_arg_len++)
+   if (isspace(arg[merge_arg_len]))
+   break;
+
+   if (hold_locked_index(, LOCK_REPORT_ON_ERROR) < 0)
+   return -1;
+
+   head_commit = lookup_commit_reference_by_name("HEAD");
+

[PATCH v5 03/12] git-rebase--interactive: clarify arguments

2018-02-26 Thread Johannes Schindelin

From: Stefan Beller 

Up to now each command took a commit as its first argument and ignored
the rest of the line (usually the subject of the commit)

Now that we are about to introduce commands that take different
arguments, clarify each command by giving the argument list.

Signed-off-by: Stefan Beller 
Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 81c5b428757..a2659fea982 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -155,13 +155,13 @@ reschedule_last_action () {
 append_todo_help () {
gettext "
 Commands:
-p, pick = use commit
-r, reword = use commit, but edit the commit message
-e, edit = use commit, but stop for amending
-s, squash = use commit, but meld into previous commit
-f, fixup = like \"squash\", but discard this commit's log message
-x, exec = run command (the rest of the line) using shell
-d, drop = remove commit
+p, pick  = use commit
+r, reword  = use commit, but edit the commit message
+e, edit  = use commit, but stop for amending
+s, squash  = use commit, but meld into previous commit
+f, fixup  = like \"squash\", but discard this commit's log message
+x, exec  = run command (the rest of the line) using shell
+d, drop  = remove commit
 
 These lines can be re-ordered; they are executed from top to bottom.
 " | git stripspace --comment-lines >>"$todo"
-- 
2.16.1.windows.4

[PATCH v5 04/12] sequencer: introduce new commands to reset the revision

2018-02-26 Thread Johannes Schindelin

In the upcoming commits, we will teach the sequencer to recreate merges.
This will be done in a very different way from the unfortunate design of
`git rebase --preserve-merges` (which does not allow for reordering
commits, or changing the branch topology).

The main idea is to introduce new todo list commands, to support
labeling the current revision with a given name, resetting the current
revision to a previous state, and  merging labeled revisions.

This idea was developed in Git for Windows' Git garden shears (that are
used to maintain the "thicket of branches" on top of upstream Git), and
this patch is part of the effort to make it available to a wider
audience, as well as to make the entire process more robust (by
implementing it in a safe and portable language rather than a Unix shell
script).

This commit implements the commands to label, and to reset to, given
revisions. The syntax is:

label 
reset 

Internally, the `label ` command creates the ref
`refs/rewritten/`. This makes it possible to work with the labeled
revisions interactively, or in a scripted fashion (e.g. via the todo
list command `exec`).

These temporary refs are removed upon sequencer_remove_state(), so that
even a `git rebase --abort` cleans them up.

We disallow '#' as label because that character will be used as separator
in the upcoming `merge` command.

Later in this patch series, we will mark the `refs/rewritten/` refs as
worktree-local, to allow for interactive rebases to be run in parallel in
worktrees linked to the same repository.

Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh |   2 +
 sequencer.c| 196 +++--
 2 files changed, 192 insertions(+), 6 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index a2659fea982..501f09b28c4 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -162,6 +162,8 @@ s, squash  = use commit, but meld into previous 
commit
 f, fixup  = like \"squash\", but discard this commit's log message
 x, exec  = run command (the rest of the line) using shell
 d, drop  = remove commit
+l, label  = label current HEAD with a name
+t, reset  = reset HEAD to a label
 
 These lines can be re-ordered; they are executed from top to bottom.
 " | git stripspace --comment-lines >>"$todo"
diff --git a/sequencer.c b/sequencer.c
index cfa01d3bdd2..e25522ecdf1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -23,6 +23,8 @@
 #include "hashmap.h"
 #include "notes-utils.h"
 #include "sigchain.h"
+#include "unpack-trees.h"
+#include "worktree.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -120,6 +122,13 @@ static GIT_PATH_FUNC(rebase_path_stopped_sha, 
"rebase-merge/stopped-sha")
 static GIT_PATH_FUNC(rebase_path_rewritten_list, "rebase-merge/rewritten-list")
 static GIT_PATH_FUNC(rebase_path_rewritten_pending,
"rebase-merge/rewritten-pending")
+
+/*
+ * The path of the file listing refs that need to be deleted after the rebase
+ * finishes. This is used by the `label` command to record the need for 
cleanup.
+ */
+static GIT_PATH_FUNC(rebase_path_refs_to_delete, "rebase-merge/refs-to-delete")
+
 /*
  * The following files are written by git-rebase just after parsing the
  * command-line (and are only consumed, not modified, by the sequencer).
@@ -244,18 +253,33 @@ static const char *gpg_sign_opt_quoted(struct replay_opts 
*opts)
 
 int sequencer_remove_state(struct replay_opts *opts)
 {
-   struct strbuf dir = STRBUF_INIT;
+   struct strbuf buf = STRBUF_INIT;
int i;
 
+   if (strbuf_read_file(, rebase_path_refs_to_delete(), 0) > 0) {
+   char *p = buf.buf;
+   while (*p) {
+   char *eol = strchr(p, '\n');
+   if (eol)
+   *eol = '\0';
+   if (delete_ref("(rebase -i) cleanup", p, NULL, 0) < 0)
+   warning(_("could not delete '%s'"), p);
+   if (!eol)
+   break;
+   p = eol + 1;
+   }
+   }
+
free(opts->gpg_sign);
free(opts->strategy);
for (i = 0; i < opts->xopts_nr; i++)
free(opts->xopts[i]);
free(opts->xopts);
 
-   strbuf_addstr(, get_dir(opts));
-   remove_dir_recursively(, 0);
-   strbuf_release();
+   strbuf_reset();
+   strbuf_addstr(, get_dir(opts));
+   remove_dir_recursively(, 0);
+   strbuf_release();
 
return 0;
 }
@@ -1280,6 +1304,8 @@ enum todo_command {
TODO_SQUASH,
/* commands that do something else than handling a single commit */
TODO_EXEC,
+   TODO_LABEL,
+   TODO_RESET,
/* commands that do nothing but are counted for reporting progress */
TODO_NOOP,
TODO_DROP,
@@ -1298,6 +1324,8 @@ static struct {
{ 'f', "fixup" },
{ 's',

[PATCH v5 07/12] rebase-helper --make-script: introduce a flag to recreate merges

2018-02-26 Thread Johannes Schindelin

The sequencer just learned new commands intended to recreate branch
structure (similar in spirit to --preserve-merges, but with a
substantially less-broken design).

Let's allow the rebase--helper to generate todo lists making use of
these commands, triggered by the new --recreate-merges option. For a
commit topology like this (where the HEAD points to C):

- A - B - C
\   /
  D

the generated todo list would look like this:

# branch D
pick 0123 A
label branch-point
pick 1234 D
label D

reset branch-point
pick 2345 B
merge -C 3456 D # C

To keep things simple, we first only implement support for merge commits
with exactly two parents, leaving support for octopus merges to a later
patch in this patch series.

As a special, hard-coded label, all merge-recreating todo lists start with
the command `label onto` so that we can later always refer to the revision
onto which everything is rebased.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase--helper.c |   4 +-
 sequencer.c  | 349 ++-
 sequencer.h  |   1 +
 3 files changed, 351 insertions(+), 3 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index ad074705bb5..a5b07c43c96 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -12,7 +12,7 @@ static const char * const builtin_rebase_helper_usage[] = {
 int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
-   unsigned flags = 0, keep_empty = 0;
+   unsigned flags = 0, keep_empty = 0, recreate_merges = 0;
int abbreviate_commands = 0;
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS,
@@ -24,6 +24,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "keep-empty", _empty, N_("keep empty 
commits")),
OPT_BOOL(0, "allow-empty-message", _empty_message,
N_("allow commits with empty messages")),
+   OPT_BOOL(0, "recreate-merges", _merges, N_("recreate 
merge commits")),
OPT_CMDMODE(0, "continue", , N_("continue rebase"),
CONTINUE),
OPT_CMDMODE(0, "abort", , N_("abort rebase"),
@@ -57,6 +58,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
 
flags |= keep_empty ? TODO_LIST_KEEP_EMPTY : 0;
flags |= abbreviate_commands ? TODO_LIST_ABBREVIATE_CMDS : 0;
+   flags |= recreate_merges ? TODO_LIST_RECREATE_MERGES : 0;
flags |= command == SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
 
if (command == CONTINUE && argc == 1)
diff --git a/sequencer.c b/sequencer.c
index 361ec98f764..01bafe2fe47 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -25,6 +25,8 @@
 #include "sigchain.h"
 #include "unpack-trees.h"
 #include "worktree.h"
+#include "oidmap.h"
+#include "oidset.h"
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
@@ -3336,6 +3338,341 @@ void append_signoff(struct strbuf *msgbuf, int 
ignore_footer, unsigned flag)
strbuf_release();
 }
 
+struct labels_entry {
+   struct hashmap_entry entry;
+   char label[FLEX_ARRAY];
+};
+
+static int labels_cmp(const void *fndata, const struct labels_entry *a,
+ const struct labels_entry *b, const void *key)
+{
+   return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
+}
+
+struct string_entry {
+   struct oidmap_entry entry;
+   char string[FLEX_ARRAY];
+};
+
+struct label_state {
+   struct oidmap commit2label;
+   struct hashmap labels;
+   struct strbuf buf;
+};
+
+static const char *label_oid(struct object_id *oid, const char *label,
+struct label_state *state)
+{
+   struct labels_entry *labels_entry;
+   struct string_entry *string_entry;
+   struct object_id dummy;
+   size_t len;
+   int i;
+
+   string_entry = oidmap_get(>commit2label, oid);
+   if (string_entry)
+   return string_entry->string;
+
+   /*
+* For "uninteresting" commits, i.e. commits that are not to be
+* rebased, and which can therefore not be labeled, we use a unique
+* abbreviation of the commit name. This is slightly more complicated
+* than calling find_unique_abbrev() because we also need to make
+* sure that the abbreviation does not conflict with any other
+* label.
+*
+* We disallow "interesting" commits to be labeled by a string that
+* is a valid full-length hash, to ensure that we always can find an
+* abbreviation for any uninteresting commit's names that does not
+* clash with any other label.
+*/
+   if (!label) {
+   char *p;
+
+

[PATCH v5 06/12] sequencer: fast-forward merge commits, if possible

2018-02-26 Thread Johannes Schindelin

Just like with regular `pick` commands, if we are trying to recreate a
merge commit, we now test whether the parents of said commit match HEAD
and the commits to be merged, and fast-forward if possible.

This is not only faster, but also avoids unnecessary proliferation of
new objects.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 64dbd1d3e2e..361ec98f764 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2651,7 +2651,7 @@ static int do_merge(struct commit *commit, const char 
*arg, int arg_len,
struct commit *head_commit, *merge_commit, *i;
struct commit_list *common, *j, *reversed = NULL;
struct merge_options o;
-   int ret;
+   int can_fast_forward, ret;
static struct lock_file lock;
 
for (merge_arg_len = 0; merge_arg_len < arg_len; merge_arg_len++)
@@ -2719,6 +2719,14 @@ static int do_merge(struct commit *commit, const char 
*arg, int arg_len,
strbuf_release();
}
 
+   /*
+* If HEAD is not identical to the parent of the original merge commit,
+* we cannot fast-forward.
+*/
+   can_fast_forward = opts->allow_ff && commit && commit->parents &&
+   !oidcmp(>parents->item->object.oid,
+   _commit->object.oid);
+
strbuf_addf(_name, "refs/rewritten/%.*s", merge_arg_len, arg);
merge_commit = lookup_commit_reference_by_name(ref_name.buf);
if (!merge_commit) {
@@ -2732,6 +2740,17 @@ static int do_merge(struct commit *commit, const char 
*arg, int arg_len,
rollback_lock_file();
return -1;
}
+
+   if (can_fast_forward && commit->parents->next &&
+   !commit->parents->next->next &&
+   !oidcmp(>parents->next->item->object.oid,
+   _commit->object.oid)) {
+   strbuf_release(_name);
+   rollback_lock_file();
+   return fast_forward_to(>object.oid,
+  _commit->object.oid, 0, opts);
+   }
+
write_message(oid_to_hex(_commit->object.oid), GIT_SHA1_HEXSZ,
  git_path_merge_head(), 0);
write_message("no-ff", 5, git_path_merge_mode(), 0);
-- 
2.16.1.windows.4

[PATCH v5 00/12] rebase -i: offer to recreate merge commits

2018-02-26 Thread Johannes Schindelin

Once upon a time, I dreamt of an interactive rebase that would not
flatten branch structure, but instead recreate the commit topology
faithfully.

My original attempt was --preserve-merges, but that design was so
limited that I did not even enable it in interactive mode.

Subsequently, it *was* enabled in interactive mode, with the predictable
consequences: as the --preserve-merges design does not allow for
specifying the parents of merge commits explicitly, all the new commits'
parents are defined *implicitly* by the previous commit history, and
hence it is *not possible to even reorder commits*.

This design flaw cannot be fixed. Not without a complete re-design, at
least. This patch series offers such a re-design.

Think of --recreate-merges as "--preserve-merges done right". It
introduces new verbs for the todo list, `label`, `reset` and `merge`.
For a commit topology like this:

A - B - C
  \   /
D

the generated todo list would look like this:

# branch D
pick 0123 A
label branch-point
pick 1234 D
label D

reset branch-point
pick 2345 B
merge -C 3456 D # C

There are more patches in the pipeline, based on this patch series, but
left for later in the interest of reviewable patch series: one mini
series to use the sequencer even for `git rebase -i --root`, and another
one to add support for octopus merges to --recreate-merges.

Changes since v3:

- (sorry for the broken iteration v4)

- fixed a grammar error in "introduce the `merge` command"'s commit message.

- fixed a couple of resource leaks in safe_append() and do_reset(), pointed
  out by Eric Sunshine.


Johannes Schindelin (11):
  sequencer: avoid using errno clobbered by rollback_lock_file()
  sequencer: make rearrange_squash() a bit more obvious
  sequencer: introduce new commands to reset the revision
  sequencer: introduce the `merge` command
  sequencer: fast-forward merge commits, if possible
  rebase-helper --make-script: introduce a flag to recreate merges
  rebase: introduce the --recreate-merges option
  sequencer: make refs generated by the `label` command worktree-local
  sequencer: handle post-rewrite for merge commands
  pull: accept --rebase=recreate to recreate the branch topology
  rebase -i: introduce --recreate-merges=[no-]rebase-cousins

Stefan Beller (1):
  git-rebase--interactive: clarify arguments

 Documentation/config.txt   |   8 +
 Documentation/git-pull.txt |   5 +-
 Documentation/git-rebase.txt   |  14 +-
 builtin/pull.c |  14 +-
 builtin/rebase--helper.c   |  13 +-
 builtin/remote.c   |   2 +
 contrib/completion/git-completion.bash |   4 +-
 git-rebase--interactive.sh |  22 +-
 git-rebase.sh  |  16 +
 refs.c |   3 +-
 sequencer.c| 742 -
 sequencer.h|   7 +
 t/t3430-rebase-recreate-merges.sh  | 208 +
 13 files changed, 1027 insertions(+), 31 deletions(-)
 create mode 100755 t/t3430-rebase-recreate-merges.sh


base-commit: e3a80781f5932f5fea12a49eb06f3ade4ed8945c
Published-As: https://github.com/dscho/git/releases/tag/recreate-merges-v5
Fetch-It-Via: git fetch https://github.com/dscho/git recreate-merges-v5

Interdiff vs v4:
 diff --git a/sequencer.c b/sequencer.c
 index 63ae71a7512..b2bf63029d4 100644
 --- a/sequencer.c
 +++ b/sequencer.c
 @@ -2514,14 +2514,17 @@ static int safe_append(const char *filename, const 
char *fmt, ...)
  
if (write_in_full(fd, buf.buf, buf.len) < 0) {
error_errno(_("could not write to '%s'"), filename);
 +  strbuf_release();
rollback_lock_file();
return -1;
}
if (commit_lock_file() < 0) {
 +  strbuf_release();
rollback_lock_file();
return error(_("failed to finalize '%s'"), filename);
}
  
 +  strbuf_release();
return 0;
  }
  
 @@ -2601,8 +2604,11 @@ static int do_reset(const char *name, int len, struct 
replay_opts *opts)
unpack_tree_opts.update = 1;
unpack_tree_opts.reset = 1;
  
 -  if (read_cache_unmerged())
 +  if (read_cache_unmerged()) {
 +  rollback_lock_file();
 +  strbuf_release(_name);
return error_resolve_conflict(_(action_name(opts)));
 +  }
  
if (!fill_tree_descriptor(, )) {
error(_("failed to find tree of %s"), oid_to_hex());
-- 
2.16.1.windows.4

[PATCH v5 01/12] sequencer: avoid using errno clobbered by rollback_lock_file()

2018-02-26 Thread Johannes Schindelin

As pointed out in a review of the `--recreate-merges` patch series,
`rollback_lock_file()` clobbers errno. Therefore, we have to report the
error message that uses errno before calling said function.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index e9baaf59bd9..5aa3dc3c95c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -345,12 +345,14 @@ static int write_message(const void *buf, size_t len, 
const char *filename,
if (msg_fd < 0)
return error_errno(_("could not lock '%s'"), filename);
if (write_in_full(msg_fd, buf, len) < 0) {
+   error_errno(_("could not write to '%s'"), filename);
rollback_lock_file(_file);
-   return error_errno(_("could not write to '%s'"), filename);
+   return -1;
}
if (append_eol && write(msg_fd, "\n", 1) < 0) {
+   error_errno(_("could not write eol to '%s'"), filename);
rollback_lock_file(_file);
-   return error_errno(_("could not write eol to '%s'"), filename);
+   return -1;
}
if (commit_lock_file(_file) < 0) {
rollback_lock_file(_file);
@@ -2106,16 +2108,17 @@ static int save_head(const char *head)
 
fd = hold_lock_file_for_update(_lock, git_path_head_file(), 0);
if (fd < 0) {
+   error_errno(_("could not lock HEAD"));
rollback_lock_file(_lock);
-   return error_errno(_("could not lock HEAD"));
+   return -1;
}
strbuf_addf(, "%s\n", head);
written = write_in_full(fd, buf.buf, buf.len);
strbuf_release();
if (written < 0) {
+   error_errno(_("could not write to '%s'"), git_path_head_file());
rollback_lock_file(_lock);
-   return error_errno(_("could not write to '%s'"),
-  git_path_head_file());
+   return -1;
}
if (commit_lock_file(_lock) < 0) {
rollback_lock_file(_lock);
-- 
2.16.1.windows.4

Re: [PATCH] t/known-leaky: add list of known-leaky test scripts

2018-02-26 Thread Martin Ågren

On 25 February 2018 at 04:48, Kaartic Sivaraam
 wrote:
> On Wednesday 21 February 2018 02:14 AM, Martin Ågren wrote:
>> To sum up: I probably won't be looking into Travis-ing such a blacklist
>> in the near future.
>>
>
> Just thinking out loud, how about having a white-list instead of a
> black-list and using it to run only those tests in the white list.
> Something like,
>
> t/white_list
> 
> t
> t0001
>
> To run
> --
>
> cd t/
> for test in $(cat white_list)
> do
> white_list_tests="$white_list_tests $test*"
> done
> make SANITIZE=leak $white_list_tests

Yeah, that would also work. An incomplete whitelist can't cause errors
for those running other tests, as an incomplete blacklist could. So
that's good. At some point, the whitelist would need to be turned into a
blacklist. At the very latest when the whitelist is the full set of
tests, in order to flip the default of new tests. ;-) Right now, I think
whitelists and blacklists are about equally useful.

Let's hope we're heading for a future where a blacklist gets more and
more feasible, whereas a whitelist would get longer and longer. ;-)

Martin

Re: [PATCH 08/27] pack: move approximate object count to object store

2018-02-26 Thread Stefan Beller

On Mon, Feb 26, 2018 at 12:55 AM, Jeff King  wrote:
> On Fri, Feb 23, 2018 at 02:22:14PM -0800, Stefan Beller wrote:
>
>> >> + /*
>> >> +  * A fast, rough count of the number of objects in the repository.
>> >> +  * These two fields are not meant for direct access. Use
>> >> +  * approximate_object_count() instead.
>> >> +  */
>> >> + unsigned long approximate_object_count;
>> >> + unsigned approximate_object_count_valid : 1;
>> >
>> > Patch looks fine and is effectively a no-op, though what is the need for
>> > both of these variables?  Maybe it can be simplified down to just use
>> > one?  Just musing as its out of the scope of this patch and we probably
>> > shouldn't try to change that in this series.
>>
>> I agree we should. It was introduced in e323de4ad7f (sha1_file:
>> allow sha1_loose_object_info to handle arbitrary repositories, 2017-08-30)
>> and I think it was seen as a clever optimization trick back then?
>
> I think you meant 8e3f52d778 (find_unique_abbrev: move logic out of
> get_short_sha1(), 2016-10-03)?

Yes, copy paste error.

>
> Yes, it was just to avoid the dual-meaning of "0" for "not set" and a
> repository with no packfiles.  It would probably be fine to get rid of
> it. If you have no packfiles then you probably don't have enough objects
> to worry about micro-optimizing. And anyway, the "wasted" case wouldn't
> even make any syscalls (it would do a noop prepare_packed_git() and then
> realize the packed_git list is empty).

Good point!

>
> -Peff

Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)

2018-02-26 Thread Stefan Beller

On Sat, Feb 24, 2018 at 7:00 AM, Duy Nguyen  wrote:
> I notice that there are a few global state (or configuration rather)
> left after this: packed_git_window_size, packed_git_limit and
> delta_base_cache_limit. These can be changed by $GIT_DIR/config, but
> since it's still global, a submodule repository will use the same
> settings of its super repository. If $SUBMODULE/config changes any of
> these, they are ignored.

That sounds all packing related, which I plan on working on next.

> The natural thing to do is move these to raw_object_store too (and
> repo_submodule_init needs to check submodule's config file). But one
> may argue these should be per-process instead of per-repo though. I
> don't know. But I thought I should mention this.

For now a process and a repository is the same as git-gc or git-repack
doesn't know about the --recurse-submodules flag, yet.
I wonder if we ever want to teach those commands the submodule
recursion, because of the issue you bring up, which settings do we apply
for a submodule? Traditionally we'd just have the command line override
the configuration, which I don't know if it is a good idea for these settings.

Thanks,
Stefan

Re: [PATCH v4 00/12] rebase -i: offer to recreate merge commits

2018-02-26 Thread Johannes Schindelin

Hi Jake,

On Sun, 25 Feb 2018, Jacob Keller wrote:

> On Fri, Feb 23, 2018 at 4:35 AM, Johannes Schindelin
>  wrote:
> > Changes since v3:
> >
> > - fixed a grammar error in "introduce the `merge` command"'s commit message.
> >
> > - fixed a couple of resource leaks in safe_append() and do_reset(), pointed
> >   out by Eric Sunshine.
> >
> 
> The interdiff seems incorrect for such a small list of changes, it
> appears like large sections of code added by this series appear in the
> interdiff without subtractions from the previous versions? Is all that
> code new to v3? If not, I'd suspect you accidentally diffed between
> the wrong points.

Indeed, it seems that I messed this iteration up rather well. Will redo.

Ciao,
Dscho

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-26 Thread Jeff King

On Mon, Feb 26, 2018 at 06:35:33PM +0100, Torsten Bögershausen wrote:

> > diff --git a/userdiff.c b/userdiff.c
> > index dbfb4e13cd..48fa7e8bdd 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -161,6 +161,7 @@ IPATTERN("css",
> >  "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> >  "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> >  ),
> > +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
> >  { "default", NULL, -1, { NULL, 0 } },
> >  };
> >  #undef PATTERNS
> 
> The patch looks like a possible step into the right direction -
> some minor notes: "utf8" is better written as "UTF-8", when talking
> to iconv.h, same for utf16.
> 
> But, how do I activate the diff ?
> I have in .gitattributes
> XXXenglish.txt diff=UTF-16
> 
> and in .git/config
> [diff "UTF-16"]
>   command = iconv:UTF-16
> 
> 
> What am I doing wrong ?

After applying the patch, if I do:

  git init
  echo hello | iconv -f utf8 -t utf16 >file
  git add file
  git commit -m one
  echo goodbye | iconv -f utf8 -t utf16 >file
  git add file
  git commit -m two

then:

  git log -p

shows "binary files differ" but:

  echo "file diff=utf16" >.gitattributes
  git log -p

shows text diffs. I assume you tweaked the patch before switching to
the UTF-16 spelling in your example. Did you use a plumbing command to
show the diff? textconv isn't enabled for plumbing, because the
resulting patches cannot actually be applied (in that sense an encoding
switch is potentially special, since in theory one could convert to the
canonical text format, apply the patch, and then convert back).

-Peff

Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Stefan Beller

On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy  wrote:
> It turns out I don't need my other series [1] in order to delete this
> field. This series moves getenv() calls from
> repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
> environment.c where they belong in my opinion.
>
> The repo_set_gitdir() now takes $GIT_DIR and optionally all other
> configurable paths. If those paths are NULL, default repo layout will
> be used. With getenv() no longer called inside repo_set_gitdir(),
> ignore_env has no reason to stay. This is in 1/4.
>
> The getenv() in prepare_alt_odb() is also moved back to
> setup_git_env() in 3/4. It demonstrates how we could move other
> getenv() back to if we want.
>
> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.
>

Thanks for working on this,
I found this series a pleasant read, the only issue I saw was Erics upbringing
of multiple getenv calls without strdup()ing the content.

What is the plan from here on?
Should I build further series on top of yours? The next series will
focus on the pack side of things (pack.h, packfile.{c,h})

So maybe we'll have Junio merge down my series (and yours as it
needs one reroll?) and then build on top of that?

Thanks,
Stefan

Re: [PATCH] sha1_name: fix uninitialized memory errors

2018-02-26 Thread Jeff King

On Mon, Feb 26, 2018 at 09:56:47AM -0500, Derrick Stolee wrote:

> diff --git a/sha1_name.c b/sha1_name.c
> index 611c7d2..44dd595 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -546,17 +546,12 @@ static void find_abbrev_len_for_pack(struct packed_git 
> *p,
>* nearby for the abbreviation length.
>*/
>   mad->init_len = 0;
> - if (!match) {
> - nth_packed_object_oid(, p, first);
> + if (!match && nth_packed_object_oid(, p, first))
>   extend_abbrev_len(, mad);
> - } else if (first < num - 1) {
> - nth_packed_object_oid(, p, first + 1);
> + else if (first < num - 1 && nth_packed_object_oid(, p, first + 1))
>   extend_abbrev_len(, mad);
> - }

I think including the nth_packed_object_oid() in the main if-else chain
works out, but it's kind of tricky.

In the code before, we'd hit the "first < num - 1" conditional only when
we didn't match something. But now we also hit it if we _did_ match
something, but nth_packed_object_oid() didn't work.

But this works out the same if we assume any match must also succeed at
nth_packed_object_oid(). Which in turn implies that checking the result
of nth_packed_object_oid() in the "else if" is redundant (though we
already clamp it to "num - 1", so we'd expect it to always succeed
anyway).

So I think this behaves well, but I wonder if the two-level conditionals
like:

  if (!match) {
if (nth_packed_object_oid(, p, first))
extend_abbrev_len(, mad);
  } else if ...

are easier to reason about.

-Peff

Re: [PATCH 1/4] repository.c: move env-related setup code back to environment.c

2018-02-26 Thread Stefan Beller

> diff --git a/setup.c b/setup.c
> index c5d55dcee4..6fac1bb58a 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1116,8 +1116,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
> const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
> if (!gitdir)
> gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
> -   repo_set_gitdir(the_repository, gitdir);
> -   setup_git_env();
> +   setup_git_env(gitdir);
> }

What makes git_dir special, such that we have to pass it in here?
Could the check for getenv(GIT_DIR_ENVIRONMENT) and fallback to
DEFAULT_... be part of setup_git_env() ?
Oh I guess that cannot be done easily as the submodule code
spcifically doesn't want to discover the git dir itself.


> if (startup_info->have_repository)
> repo_set_hash_algo(the_repository, 
> repo_fmt.hash_algo);
> --
> 2.16.1.435.g8f24da2e1a
>

Re: [PATCHv4 01/27] repository: introduce raw object store field

2018-02-26 Thread Stefan Beller

On Mon, Feb 26, 2018 at 1:30 AM, Duy Nguyen  wrote:
> On Fri, Feb 23, 2018 at 04:47:28PM -0800, Stefan Beller wrote:
>>  /* The main repository */
>>  static struct repository the_repo = {
>> - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
>> _algos[GIT_HASH_SHA1], 0, 0
>> + NULL, NULL,
>> + RAW_OBJECT_STORE_INIT,
>> + NULL, NULL, NULL,
>> + NULL, NULL, NULL,
>> + _index,
>> + _algos[GIT_HASH_SHA1],
>> + 0, 0
>>  };
>>  struct repository *the_repository = _repo;
>
> I wonder if we should do something like this. It makes the_repo
> initialization easier to read and it helps unify the init code with
> submodule.
>
> No don't reroll. If you think it's a good idea, you can do something
> like this in the next series instead.
>
> -- 8< --
> diff --git a/check-racy.c b/check-racy.c
> index 24b6542352..47cbb4eb6d 100644
> --- a/check-racy.c
> +++ b/check-racy.c

totally offtopic: Do we want to move this file into t/helper?

> --- a/common-main.c
> +++ b/common-main.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "exec_cmd.h"
>  #include "attr.h"
> +#include "repository.h"
>
>  /*
>   * Many parts of Git have subprograms communicate via pipe, expect the
> @@ -32,6 +33,8 @@ int main(int argc, const char **argv)
>  */
> sanitize_stdfds();
>
> +   init_the_repository();
> +

So this is the real deal.

> -#define RAW_OBJECT_STORE_INIT { NULL }
>
> +void raw_object_store_init(struct raw_object_store *o);
>  void raw_object_store_clear(struct raw_object_store *o);
>
>  #endif /* OBJECT_STORE_H */
> diff --git a/object.c b/object.c
> index 11d904c033..8a4d01dd5f 100644
> --- a/object.c
> +++ b/object.c
> @@ -446,6 +446,11 @@ void clear_commit_marks_all(unsigned int flags)
> }
>  }
>
> +void raw_object_store_init(struct raw_object_store *o)
> +{
> +   memset(o, 0, sizeof(*o));
> +}

We'll have to adapt this for the list that we use for packed_git_mru,
but that should be no problem.

> +
> +static void repo_pre_init(struct repository *repo)
> +{
> +   memset(repo, 0, sizeof(*repo));
> +   raw_object_store_init(>objects);
> +}
> +
> +void init_the_repository(void)
> +{
> +   the_repository = _repo;
> +   repo_pre_init(the_repository);
> +   the_repository->index = _index;
> +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +}
>
>  static char *git_path_from_env(const char *envvar, const char *git_dir,
>const char *path, int fromenv)
> @@ -138,7 +144,8 @@ static int read_and_verify_repository_format(struct 
> repository_format *format,
>  int repo_init(struct repository *repo, const char *gitdir, const char 
> *worktree)
>  {
> struct repository_format format;
> -   memset(repo, 0, sizeof(*repo));
> +
> +   repo_pre_init(repo);
>
> repo->ignore_env = 1;
>

I like the approach. Though as you say, let's have it in the next series.

Thanks for the thoughts and guidance,
Stefan

Re: [PATCH v3 0/6] Fix initializing the_hash_algo

2018-02-26 Thread Stefan Beller

On Sun, Feb 25, 2018 at 12:34 PM, brian m. carlson
 wrote:
> On Sun, Feb 25, 2018 at 06:18:34PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> v3 refines v2 a bit more:
>>
>> - fix configure typo (and stray words in commit message)
>> - use repo_set_hash_algo() instead of reassigning the_hash_algo
>> - compare hash algos by format_id
>> - catch NULL hash algo, report nicely and suggest GIT_HASH_FIXUP
>>
>> The last point makes me much happier about keeping this workaround
>> around until we are confident we can live without it. Interdiff
>
> This looks sane to me.
>
> Reviewed-by: brian m. carlson 

I agree with this version as well.

Thanks,
Stefan

Re: [PATCH 1/4] repository.c: move env-related setup code back to environment.c

2018-02-26 Thread Eric Sunshine

On Mon, Feb 26, 2018 at 5:30 AM, Nguyễn Thái Ngọc Duy  wrote:
> It does not make sense that generic repository code contains handling
> of environment variables, which are specific for the main repository
> only. Refactor repo_set_gitdir() function to take $GIT_DIR and
> optionally _all_ other customizable paths. These optional paths can be
> NULL and will be calculated according to the default directory layout.
>
> Note that some dead functions are left behind to reduce diff
> noise. They will be deleted in the next patch.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/environment.c b/environment.c
> @@ -148,10 +148,17 @@ static char *expand_namespace(const char *raw_namespace)
> -void setup_git_env(void)
> +void setup_git_env(const char *git_dir)
>  {
> const char *shallow_file;
> const char *replace_ref_base;
> +   struct set_gitdir_args args = { NULL };
> +
> +   args.shared_root = getenv(GIT_COMMON_DIR_ENVIRONMENT);
> +   args.object_dir = getenv(DB_ENVIRONMENT);
> +   args.graft_file = getenv(GRAFT_ENVIRONMENT);
> +   args.index_file = getenv(INDEX_ENVIRONMENT);

According to POSIX[1], the result of getenv() may be invalidated by
another call to getenv() (or setenv() or unsetenv() or putenv()).

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

> +   repo_set_gitdir(the_repository, git_dir, );
> diff --git a/repository.c b/repository.c
> @@ -61,15 +61,50 @@ static void repo_setup_env(struct repository *repo)
> +void repo_set_gitdir(struct repository *repo,
> +const char *root,
> +const struct set_gitdir_args *o)
> +{
> +   const char *gitfile = read_gitfile(root);
> +   char *old_gitdir = repo->gitdir;
> +
> +   repo->gitdir = xstrdup(gitfile ? gitfile : root);
> free(old_gitdir);

Can:

char *old_gitdir = repo->gitdir;
repo->gitdir = xstrdup(...);
free(old_gitdir);

be simplified to:

free(repo->gitdir);
repo->gitdir = xstrdup(...);

?

> +   repo_set_commondir(repo, o->shared_root);

The repo_set_gitdir() prototype (below) makes it seem as if the last
argument ('o', in this case) is optional, presumably by passing in
NULL, however, this code does not appear to be prepared to deal with
NULL.

> +   expand_base_dir(>objects.objectdir, o->object_dir,
> +   repo->commondir, "objects");
> +   expand_base_dir(>graft_file, o->graft_file,
> +   repo->commondir, "info/grafts");
> +   expand_base_dir(>index_file, o->index_file,
> +   repo->gitdir, "index");
>  }
> diff --git a/repository.h b/repository.h
> @@ -89,7 +89,16 @@ struct repository {
> +struct set_gitdir_args {
> +   [...]
> +};
> +
> +extern void repo_set_gitdir(struct repository *repo,
> +   const char *root,
> +   const struct set_gitdir_args *optional);

Re: [PATCH/RFC 1/1] Auto diff of UTF-16 files in UTF-8

2018-02-26 Thread Peter Krefting


tbo...@web.de:


+static inline int buffer_has_utf16_bom(const void *buf, size_t len) {
+  const unsigned char *text = (unsigned char *)buf;
+  if (!text ||  len < 2)
+return 0;
+  if (text[0] == 0xff && text[1] == 0xfe)
+return 1;
+  if (text[0] == 0xfe && text[1] == 0xff)
+return 1;
+  return 0;
+}


This would also match a UTF-32LE BOM, not that I think anyone would 
actually use that for text files, but it might be worth adding a test 
for that, just in case?


  if (text[0] == 0xff && text[1] == 0xfe)
return len < 4 || (text[2] != 0 && text[3] != 0);

--
\\// Peter - http://www.softwolves.pp.se/

Re: [PATCHv4 01/27] repository: introduce raw object store field

2018-02-26 Thread Brandon Williams

On 02/26, Junio C Hamano wrote:
> Duy Nguyen  writes:
> 
> > diff --git a/common-main.c b/common-main.c
> > index 6a689007e7..a13ab981aa 100644
> > --- a/common-main.c
> > +++ b/common-main.c
> > @@ -1,6 +1,7 @@
> >  #include "cache.h"
> >  #include "exec_cmd.h"
> >  #include "attr.h"
> > +#include "repository.h"
> >  
> >  /*
> >   * Many parts of Git have subprograms communicate via pipe, expect the
> > @@ -32,6 +33,8 @@ int main(int argc, const char **argv)
> >  */
> > sanitize_stdfds();
> >  
> > +   init_the_repository();
> > +
> > git_setup_gettext();
> > ...
> > +void init_the_repository(void)
> > +{
> > +   the_repository = _repo;
> > +   repo_pre_init(the_repository);
> > +   the_repository->index = _index;
> > +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> > +}
> 
> I see what you did here, and I like it.

I thought this would be a good idea to do eventually but back when I
first introduced struct repository there wasn't enough to justify it
till now.  This definitely makes it much easier to read the
initialization and I prefer this over the initializer.  Thanks for
working on this :)

-- 
Brandon Williams

Re: [PATCHv4 01/27] repository: introduce raw object store field

2018-02-26 Thread Junio C Hamano

Duy Nguyen  writes:

> diff --git a/common-main.c b/common-main.c
> index 6a689007e7..a13ab981aa 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "exec_cmd.h"
>  #include "attr.h"
> +#include "repository.h"
>  
>  /*
>   * Many parts of Git have subprograms communicate via pipe, expect the
> @@ -32,6 +33,8 @@ int main(int argc, const char **argv)
>*/
>   sanitize_stdfds();
>  
> + init_the_repository();
> +
>   git_setup_gettext();
> ...
> +void init_the_repository(void)
> +{
> + the_repository = _repo;
> + repo_pre_init(the_repository);
> + the_repository->index = _index;
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +}

I see what you did here, and I like it.

Re: [PATCHv4 00/27] Moving global state into the repository object (part 1)

2018-02-26 Thread Jonathan Tan

On Fri, 23 Feb 2018 16:47:27 -0800
Stefan Beller  wrote:

> v4:
> * addressed feedback from the single patches (mostly nits)
> * rebased on latest master

The patches I looked at previously (patches 7, 15, 19, 22, and 24) look
good to me.

Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Junio C Hamano

Nguyễn Thái Ngọc Duy   writes:

> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.

As a follow-up fix for combined "move things to the_repo as much as
possible" efforts I think what you did makes most sense.  I would
say that it makes even more sense to make these part of the
object-store series if/when the series needs rerolling--after all
the topic is already a multi-author effort.

I'll have to read them carefully before commenting on the actual
patches ;-)  Thanks.

>
> [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/
>
> Nguyễn Thái Ngọc Duy (4):
>   repository.c: move env-related setup code back to environment.c
>   repository.c: delete dead functions
>   sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
>   repository: delete ignore_env member
>
>  cache.h|  2 +-
>  environment.c  | 13 +++--
>  object-store.h |  5 +++-
>  object.c   |  1 +
>  repository.c   | 79 ++
>  repository.h   | 21 +++---
>  setup.c|  3 +-
>  sha1_file.c|  6 +---
>  8 files changed, 64 insertions(+), 66 deletions(-)

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-26 Thread Torsten Bögershausen

On Sun, Feb 25, 2018 at 08:44:46PM -0500, Jeff King wrote:
> On Sat, Feb 24, 2018 at 04:18:36PM +0100, Lars Schneider wrote:
> 
> > > We always use the in-repo contents when generating 'diff'.  I think
> > > by "attribute to be used in diff", what you are reallying after is
> > > to convert the in-repo contents to that encoding _BEFORE_ running
> > > 'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
> > > the place will not diff well with the xdiff machinery, but if you
> > > first convert it to UTF-8 and have xdiff work on it, you can get
> > > reasonable result out of it.  It is unclear what encoding you want
> > > your final diff output in (it is equally valid in such a set-up to
> > > desire your patch output in UTF-16 or UTF-8), but assuming that you
> > > want UTF-8 in your patch output, perhaps we do not have to break
> > > gitk users by hijacking the 'encoding' attribute.  Instead what you
> > > want is a single bit that says between in-repo or working tree which
> > > representation should be given to the xdiff machinery.
> > 
> > I fear that we could confuse users with an additional knob/bit that
> > defines what we diff against. Git always diff'ed against in-repo 
> > content and I feel it should stay that way.
> 
> Well, except for textconv. You can already do this:
> 
>   echo "foo diff=utf16" >.gitattributes
>   git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
> 
> We could make that easier to use and much more efficient by:
> 
>   1. Allowing a special syntax for textconv filters that kicks off an
>  internal iconv.
> 
>   2. Providing baked-in config for utf16.
> 
> The patch below provides a sketch. But I think Torsten raised a good
> point that you might want the encoding conversion to be independent of
> other diff characteristics (so, e.g., you might say "this is utf16 but
> once converted treat it like C code for finding funcnames, etc").
> 
> ---
> diff --git a/diff.c b/diff.c
> index 21c3838b25..04032e059c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5968,6 +5968,21 @@ struct diff_filepair *diff_unmerge(struct diff_options 
> *options, const char *pat
>   return pair;
>  }
>  
> +static char *iconv_textconv(const char *encoding, struct diff_filespec *spec,
> + size_t *outsize)
> +{
> + char *ret;
> + int outsize_int; /* this really should be a size_t */
> +
> + if (diff_populate_filespec(spec, 0))
> + die("unable to load content for %s", spec->path);
> + ret = reencode_string_len(spec->data, spec->size,
> +   "utf-8", /* should be log_output_encoding? */
> +   encoding, _int);
> + *outsize = outsize_int;
> + return ret;
> +}
> +
>  static char *run_textconv(const char *pgm, struct diff_filespec *spec,
>   size_t *outsize)
>  {
> @@ -5978,6 +5993,9 @@ static char *run_textconv(const char *pgm, struct 
> diff_filespec *spec,
>   struct strbuf buf = STRBUF_INIT;
>   int err = 0;
>  
> + if (skip_prefix(pgm, "iconv:", ))
> + return iconv_textconv(pgm, spec, outsize);
> +
>   temp = prepare_temp_file(spec->path, spec);
>   *arg++ = pgm;
>   *arg++ = temp->name;
> diff --git a/userdiff.c b/userdiff.c
> index dbfb4e13cd..48fa7e8bdd 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -161,6 +161,7 @@ IPATTERN("css",
>"-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
>"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
>  ),
> +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS

The patch looks like a possible step into the right direction -
some minor notes: "utf8" is better written as "UTF-8", when talking
to iconv.h, same for utf16.

But, how do I activate the diff ?
I have in .gitattributes
XXXenglish.txt diff=UTF-16

and in .git/config
[diff "UTF-16"]
  command = iconv:UTF-16


What am I doing wrong ?

Re: [PATCH v2] subtree: fix add and pull for GPG-signed commits

2018-02-26 Thread Junio C Hamano

Stephen R Guglielmo  writes:

> On Fri, Feb 23, 2018 at 5:45 PM, Junio C Hamano  wrote:
>...
>> I am however starting to feel that
>> ...
>> may be a better approach.
> ...
> I'm happy to develop a new patch based on your recommendations. Should
> it be on top of the previous patch I sent or should it replace the
> previous patch?

Even though I said "may be a better approach", to be bluntly honest,
I do not think it is _that_ _much_ better than what you sent ;-)  So
please do the "on top of the previous patch" thing as an independent
effort (not the "git log" workaround bugfix) only if you deeply care
about improving "subtree" script (as opposed to just want to see the
immediate glitch corrected to get on with your life---I happen to be
in the latter category with respect to this issue).  The independent
effort's focus would instead be to improve the script not to make so
heavy use of "git log" (and other Porcelain commands) in it, so that
we do not have to tweak the script to undo improvements we will make
to the Porcelain commands for better human experience in the future.

Notice that one hunk in this patch is a small step in the direction;
it stops using "git log -1" and uses "rev-parse" instead.

For the remainder of the script, we need to identify how "git log"
is used in the script and what they are used for, and then rewrite
them with the more stable lower level interface.  It is a very first
step to mark invocation sites by replacing "git log" with "$git_log"
;-)  

The same may apply to uses of other Porcelain commands.  A general
rule is that scripts should avoid using Porcelain commands unless
they are interested in giving output for human-consumption directly
out of these commands (as opposed to running the Git commands and
then reading their output and reacting to it).

Thanks.

[no subject]

2018-02-26 Thread tboegi

>From 9f7d43f29eaf6017b7b16261ce91d8ef182cf415 Mon Sep 17 00:00:00 2001
In-Reply-To: <20171218131249.gb4...@sigill.intra.peff.net>
References: <20171218131249.gb4...@sigill.intra.peff.net>
From: =?UTF-8?q?Torsten=20B=C3=B6gershausen?= 
Date: Fri, 23 Feb 2018 20:53:34 +0100
Subject: [PATCH 0/1] Auto diff of UTF-16 files in UTF-8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Make it possible to show a user-readable diff for UTF-16 encoded files.
This would replace the "binary files differ" with something useful,
without breaking anything for existing users (?).
For future repos the w-t-e encoding can be used, which allows e.g. easier
merging.
People which stick to native UTF-16 because they need the compatiblity
with e.g. libgit2 can still get a readable diff.
Opinions ?

Torsten Bögershausen (1):
  Auto diff of UTF-16 files in UTF-8

 diff.c   | 43 -
 diffcore.h   |  3 ++
 t/t4066-diff-encoding.sh | 98 
 utf8.h   | 11 ++
 4 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

-- 
2.16.1.194.gb2e45c695d

[PATCH/RFC 1/1] Auto diff of UTF-16 files in UTF-8

2018-02-26 Thread tboegi

From: Torsten Bögershausen 

When an UTF-16 file is commited and later changed, `git diff` shows
"Binary files XX and YY differ".

When the user wants a diff in UTF-8, a textconv needs to be specified
in .gitattributes and the textconv must be configured.

A more user-friendly diff can be produced for UTF-16 if
- the user did not use `git diff --binary`
- the blob is identified as binary
- the blob has an UTF-16 BOM
- the blob can be converted into UTF-8

Enhance the diff machinery to auto-detect UTF-16 blobs and show them
as UTF-8, unless the user specifies `git diff --binary` which creates
a binary diff.

Signed-off-by: Torsten Bögershausen 
---
 diff.c   | 43 -
 diffcore.h   |  3 ++
 t/t4066-diff-encoding.sh | 98 
 utf8.h   | 11 ++
 4 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

diff --git a/diff.c b/diff.c
index fb22b19f09..51831ee94d 100644
--- a/diff.c
+++ b/diff.c
@@ -3192,6 +3192,10 @@ static void builtin_diff(const char *name_a,
strbuf_reset();
}
 
+   if (one && one->reencoded_from_utf16)
+   strbuf_addf(, "a is converted to UTF-8 from 
UTF-16\n");
+   if (two && two->reencoded_from_utf16)
+   strbuf_addf(, "b is converted to UTF-8 from 
UTF-16\n");
mf1.size = fill_textconv(textconv_one, one, );
mf2.size = fill_textconv(textconv_two, two, );
 
@@ -3611,8 +3615,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->size = size;
s->should_free = 1;
}
-   }
-   else {
+   if (!s->binary && buffer_is_binary(s->data, s->size) &&
+   buffer_has_utf16_bom(s->data, s->size)) {
+   int outsz = 0;
+   char *outbuf;
+   outbuf = reencode_string_len(s->data, (int)s->size,
+"UTF-8", "UTF-16", );
+   if (outbuf) {
+   if (s->should_free)
+   free(s->data);
+   if (s->should_munmap)
+   munmap(s->data, s->size);
+   s->should_munmap = 0;
+   s->data = outbuf;
+   s->size = outsz;
+   s->reencoded_from_utf16 = 1;
+   s->should_free = 1;
+   }
+   }
+   } else {
enum object_type type;
if (size_only || (flags & CHECK_BINARY)) {
type = sha1_object_info(s->oid.hash, >size);
@@ -3629,6 +3650,19 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->data = read_sha1_file(s->oid.hash, , >size);
if (!s->data)
die("unable to read %s", oid_to_hex(>oid));
+   if (!s->binary && buffer_is_binary(s->data, s->size) &&
+   buffer_has_utf16_bom(s->data, s->size)) {
+   int outsz = 0;
+   char *buf;
+   buf = reencode_string_len(s->data, (int)s->size,
+ "UTF-8", "UTF-16", );
+   if (buf) {
+   free(s->data);
+   s->data = buf;
+   s->size = outsz;
+   s->reencoded_from_utf16 = 1;
+   }
+   }
s->should_free = 1;
}
return 0;
@@ -5695,6 +5729,10 @@ static int diff_filespec_is_identical(struct 
diff_filespec *one,
 
 static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 {
+   if (p->binary) {
+   p->one->binary = 1;
+   p->two->binary = 1;
+   }
if (p->done_skip_stat_unmatch)
return p->skip_stat_unmatch_result;
 
@@ -5735,6 +5773,7 @@ static void diffcore_skip_stat_unmatch(struct 
diff_options *diffopt)
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
 
+   p->binary = diffopt->flags.binary;
if (diff_filespec_check_stat_unmatch(p))
diff_q(, p);
else {
diff --git a/diffcore.h b/diffcore.h
index a30da161da..3cd97bb93b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -47,6 +47,8 @@ struct diff_filespec {
unsigned has_more_entries : 1; /* only appear in combined diff */
/* data should be considered "binary"; -1 means "don't know yet" */
signed int is_binary : 2;
+   unsigned binary : 1;
+

Re: [PATCH v4 03/13] commit-graph: create git-commit-graph builtin

2018-02-26 Thread Derrick Stolee


On 2/26/2018 11:25 AM, SZEDER Gábor wrote:

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Since 'git commit-graph' is a builtin command, it shouldn't show up in
completion when doing 'git co'.
Please squash in the patch below to make it so.

Furthermore, please have a look at
   
   https://public-inbox.org/git/20180202160132.31550-1-szeder@gmail.com/


for an other oneliner change.


diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 17929b0809..fafed13c06 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
check-ref-format) : plumbing;;
checkout-index)   : plumbing;;
column)   : internal helper;;
+   commit-graph) : plumbing;;
commit-tree)  : plumbing;;
count-objects): infrequent;;
credential)   : credentials;;


Thanks for this, and the reminder. I made these changes locally, so they 
will be in v5.


-Stolee

Re: [PATCH v4 03/13] commit-graph: create git-commit-graph builtin

2018-02-26 Thread SZEDER Gábor

> Teach git the 'commit-graph' builtin that will be used for writing and
> reading packed graph files. The current implementation is mostly
> empty, except for an '--object-dir' option.

Since 'git commit-graph' is a builtin command, it shouldn't show up in
completion when doing 'git co'.
Please squash in the patch below to make it so.

Furthermore, please have a look at
  
  https://public-inbox.org/git/20180202160132.31550-1-szeder@gmail.com/

for an other oneliner change.


diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 17929b0809..fafed13c06 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
check-ref-format) : plumbing;;
checkout-index)   : plumbing;;
column)   : internal helper;;
+   commit-graph) : plumbing;;
commit-tree)  : plumbing;;
count-objects): infrequent;;
credential)   : credentials;;

Re: [PATCH v4 04/13] commit-graph: implement write_commit_graph()

2018-02-26 Thread SZEDER Gábor

On Mon, Feb 19, 2018 at 7:53 PM, Derrick Stolee  wrote:

> +static int if_packed_commit_add_to_list(const struct object_id *oid,
> +   struct packed_git *pack,
> +   uint32_t pos,
> +   void *data)
> +{
> +   struct packed_oid_list *list = (struct packed_oid_list*)data;
> +   enum object_type type;
> +   unsigned long size;
> +   void *inner_data;
> +   off_t offset = nth_packed_object_offset(pack, pos);
> +   inner_data = unpack_entry(pack, offset, , );
> +
> +   if (inner_data)
> +   free(inner_data);

The condition is unnecessary, free() can handle a NULL argument just fine.

(Suggested by Coccinelle and 'contrib/coccinelle/free.cocci.patch'.)

[PATCH] sha1_name: fix uninitialized memory errors

2018-02-26 Thread Derrick Stolee

During abbreviation checks, we navigate to the position within a
pack-index that an OID would be inserted and check surrounding OIDs
for the maximum matching prefix. This position may be beyond the
last position, because the given OID is lexicographically larger
than every OID in the pack. Then nth_packed_object_oid() does not
initialize "oid".

Use the return value of nth_packed_object_oid() to prevent these
errors.

Reported-by: Christian Couder 
Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d2..44dd595 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -546,17 +546,12 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 * nearby for the abbreviation length.
 */
mad->init_len = 0;
-   if (!match) {
-   nth_packed_object_oid(, p, first);
+   if (!match && nth_packed_object_oid(, p, first))
extend_abbrev_len(, mad);
-   } else if (first < num - 1) {
-   nth_packed_object_oid(, p, first + 1);
+   else if (first < num - 1 && nth_packed_object_oid(, p, first + 1))
extend_abbrev_len(, mad);
-   }
-   if (first > 0) {
-   nth_packed_object_oid(, p, first - 1);
+   if (first > 0 && nth_packed_object_oid(, p, first - 1))
extend_abbrev_len(, mad);
-   }
mad->init_len = mad->cur_len;
 }
 
-- 
2.7.4

Re: [PATCH v2] subtree: fix add and pull for GPG-signed commits

2018-02-26 Thread Stephen R Guglielmo

On Fri, Feb 23, 2018 at 5:45 PM, Junio C Hamano  wrote:
> Stephen R Guglielmo  writes:
>
>> If log.showsignature is true (or --show-signature is passed) while
>> performing a `subtree add` or `subtree pull`, the command fails.
>>
>> toptree_for_commit() calls `log` and passes the output to `commit-tree`.
>> If this output shows the GPG signature data, `commit-tree` throws a
>> fatal error.
>>
>> This commit fixes the issue by adding --no-show-signature to `log` calls
>> in a few places, as well as using the more appropriate `rev-parse`
>> instead where possible.
>>
>> Signed-off-by: Stephen R Guglielmo 
>> ---
>>  contrib/subtree/git-subtree.sh | 12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> This was too heavily whitespace damaged so I recreated your patch
> manually from scratch and queued, during which time I may have made
> silly and simple mistakes.  Please double check what appears on the
> 'pu' branch in a few hours.
>
> Thanks.
>
> I am however starting to feel that
>
>  (1) add gitlog="git log" and then do s/git log/$gitlog/; to the
>  remainder of the whole script in patch 1/2; and
>
>  (2) turn the variable definition to gitlog="git log --no-show-signature"
>  in patch 2/2
>
> may be a better approach.  After all, this script is not prepared to
> be used by any group of people who use signed commits, and showing
> commit signature in any of its use of 'git log', either present or
> in the future, will not be useful to it, I suspect.


Hi Junio,

I can confirm the changes to the pu branch looks good. I apologize for
the whitespace issue; Gmail must've mangled it.

I'm happy to develop a new patch based on your recommendations. Should
it be on top of the previous patch I sent or should it replace the
previous patch?

Thanks,
Steve

Re: Use of uninitialised value of size 8 in sha1_name.c

2018-02-26 Thread Christian Couder

On Mon, Feb 26, 2018 at 3:06 PM, Derrick Stolee  wrote:
>
> Christian: do you want to submit the patch, or should I put one together?

I'd rather have you put one together.

Thanks,
Christian.

Re: [PATCH] commit-graph: fix some "plain integer as NULL pointer" warnings

2018-02-26 Thread Derrick Stolee


On 2/24/2018 12:42 AM, René Scharfe wrote:

Am 24.02.2018 um 03:24 schrieb Ramsay Jones:

diff --git a/commit-graph.c b/commit-graph.c
index fc5ee7e99..c2f443436 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,7 +45,7 @@ char *get_graph_latest_filename(const char *obj_dir)
   {
struct strbuf fname = STRBUF_INIT;
strbuf_addf(, "%s/info/graph-latest", obj_dir);
-   return strbuf_detach(, 0);
+   return strbuf_detach(, NULL);
   }

You could also replace that function body with:

return xstrfmt("%s/info/graph-latest", obj_dir);

René


Thanks for the feedback! I will apply these locally as I am re-rolling 
today.


-Stolee

Re: Use of uninitialised value of size 8 in sha1_name.c

2018-02-26 Thread Derrick Stolee


On 2/26/2018 5:23 AM, Christian Couder wrote:

On Mon, Feb 26, 2018 at 10:53 AM, Jeff King  wrote:

On Mon, Feb 26, 2018 at 10:04:22AM +0100, Christian Couder wrote:


==21455== Use of uninitialised value of size 8
==21455==at 0x2D2A73: get_hex_char_from_oid (sha1_name.c:492)
==21455==by 0x2D2AFE: extend_abbrev_len (sha1_name.c:502)
==21455==by 0x2D2C3D: find_abbrev_len_for_pack (sha1_name.c:551)
==21455==by 0x2D2CFF: find_abbrev_len_packed (sha1_name.c:569)
==21455==by 0x2D2E12: find_unique_abbrev_r (sha1_name.c:608)
==21455==by 0x2DCB66: strbuf_add_unique_abbrev (strbuf.c:877)
==21455==by 0x14F7CE: update_local_ref (fetch.c:700)
==21455==by 0x1500CF: store_updated_refs (fetch.c:871)
==21455==by 0x15035B: fetch_refs (fetch.c:932)
==21455==by 0x150CF8: do_fetch (fetch.c:1146)
==21455==by 0x1515AB: fetch_one (fetch.c:1370)
==21455==by 0x151A1D: cmd_fetch (fetch.c:1457)
==21455==  Uninitialised value was created by a stack allocation
==21455==at 0x2D2B2E: find_abbrev_len_for_pack (sha1_name.c:513)
==21455==

A quick git blame seems to point to 0e87b85683 (sha1_name: minimize
OID comparisons during disambiguation, 2017-10-12).

It is difficult to tell for sure though as t5616-partial-clone.sh was
added after that commit.

I think that commit is to blame, though the error isn't exactly where
that stack trace puts it. Try this:

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d24dd..6f7f36436f 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -547,7 +547,10 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
  */
 mad->init_len = 0;
 if (!match) {
-   nth_packed_object_oid(, p, first);
+   warning("p->num_objects = %u, first = %u",
+   p->num_objects, first);
+   if (!nth_packed_object_oid(, p, first))
+   die("oops!");
 extend_abbrev_len(, mad);
 } else if (first < num - 1) {
 nth_packed_object_oid(, p, first + 1);

I get failures all over the test suite, like this:

   warning: p->num_objects = 4, first = 3
   warning: p->num_objects = 8, first = 6
   warning: p->num_objects = 10, first = 0
   warning: p->num_objects = 4, first = 0
   warning: p->num_objects = 8, first = 0
   warning: p->num_objects = 10, first = 10
   fatal: oops!

Yeah, I tried to t5601-clone.sh with --valgrind and I also get one
error (the same "Uninitialised value" error actually).


Thanks for finding this. Sorry for the bug.


Any time the abbreviated hex would go after the last object in the pack,
then first==p->num_objects, and nth_packed_object_oid() will fail. That
leaves uninitialized data in "oid", which is what valgrind complains
about when we examine it in extend_abbrev_len().

Most of the time the code behaves correctly anyway, because the
uninitialized bytes are unlikely to match up with our hex and extend the
length. Probably we just need to detect that case and skip the call to
extend_abbrev_len() altogether.

Yeah, something like:

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d24dd..a041d8d24f 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -547,15 +547,15 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
  */
 mad->init_len = 0;
 if (!match) {
-   nth_packed_object_oid(, p, first);
-   extend_abbrev_len(, mad);
+   if (nth_packed_object_oid(, p, first))
+   extend_abbrev_len(, mad);
 } else if (first < num - 1) {
-   nth_packed_object_oid(, p, first + 1);
-   extend_abbrev_len(, mad);
+   if (nth_packed_object_oid(, p, first + 1))
+   extend_abbrev_len(, mad);
 }
 if (first > 0) {
-   nth_packed_object_oid(, p, first - 1);
-   extend_abbrev_len(, mad);
+   if (nth_packed_object_oid(, p, first - 1))
+   extend_abbrev_len(, mad);
 }
 mad->init_len = mad->cur_len;
  }

seems to prevent valgrind from complaining when running t5616-partial-clone.sh.


This seems like the safest fix, but also we could use our values of 
"match", "first" and "num" to safely call nth_packed_object_oid(). 
However, since nth_packed_object_oid() checks the bounds, don't 
duplicate work and just use the return value.


I would reformat your patch slightly, but that's just preference:

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d2..97b632c 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -546,17 +546,14 @@ static void find_abbrev_len_for_pack(struct 
packed_git *p,

 * nearby for the abbreviation length.
 */
    mad->init_len = 0;
-   if (!match) {
-   nth_packed_object_oid(, p, first);
+   if (!match && nth_packed_object_oid(, p, first))
    extend_abbrev_len(, mad);
-   } else if (first < num - 1) {
-

Re: [PATCH 0/2] wildmatch precompilation interface

2018-02-26 Thread Ævar Arnfjörð Bjarmason

On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> My recently landed wildmatch test series was in preperation for some
>> more major wildmatch changes.
>>
>> Here's another series that prepares for bigger changes in wildmatch,
>> it adds an interface to pre-compile the patterns. Right now there's no
>> point in doing this, and it's harmless since none of the codepaths are
>> that performance sensitive, but down the line this'll save us time as
>> we'll be able to skip re-parsing the pattern each time with a better
>> wildmatch backend.
>
> I don't see any big problem with this, but should this be a standalone
> series? Some changes look harmless now, but I'd rather see the real
> precompile implementation to see how it impacts (or benefits) the
> converted call sites.

Let's drop this for now sinceyou're on the fence about it, I wasn't all
that sure myself and thought I'd send it in for comments.

I don't have anything ready for submission from the rest of this series,
but figured (if you/others didn't mind) that it might be easier to
review the addition of the interface at first, but indeed, on second
thought that doesn't make sense.

The state of what I have is:

 1. 
 2. 

The dir.c use (and some tree-related stuff) are the real hot
codepaths using wildmatch.
 3. 

This is somewhat of a mess right now. Preliminary tests reveal that
the pathological case tested for by t/perf/p0100-globbing.sh is
wildly faster, but that most regular matching is a bit slower,
although that might be me being stupid with the interface.

Re: [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns

2018-02-26 Thread Ævar Arnfjörð Bjarmason


On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> Add the scaffolding necessary for precompiling wildmatch()
>> patterns.
>>
>> There is currently no point in doing this with the wildmatch()
>> function we have now, since it can't make any use of precompiling the
>> pattern.
>>
>> But adding this interface and making use of it will make it easy to
>> refactor the wildmatch() function to parse the pattern into opcodes as
>> some glob() implementations do, or to drop an alternate wildmatch()
>> backend in which trades parsing slowness for faster matching, such as
>> the PCRE v2 conversion function that understands the wildmatch()
>> syntax.
>>
>> It's very unlikely that we'll remove the wildmatch() function as a
>> convenience wrapper even if we end up requiring a separate compilation
>> step in some future implementation. There are a lot of one-shot
>> wildmatches in the codebase, in that case most likely wildmatch() will
>> be kept around as a shorthand for wildmatch_{compile,match,free}().
>>
>> I modeled this interface on the PCRE v2 interface. I didn't go with a
>> glob(3) & globfree(3)-like interface because that would require every
>> wildmatch() user to pass a dummy parameter, which I got rid of in
>> 55d3426929 ("wildmatch: remove unused wildopts parameter",
>> 2017-06-22).
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  wildmatch.c | 25 +
>>  wildmatch.h | 11 +++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/wildmatch.c b/wildmatch.c
>> index d074c1be10..032f339391 100644
>> --- a/wildmatch.c
>> +++ b/wildmatch.c
>> @@ -276,3 +276,28 @@ int wildmatch(const char *pattern, const char *text, 
>> unsigned int flags)
>>  {
>> return dowild((const uchar*)pattern, (const uchar*)text, flags);
>>  }
>> +
>> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
>> +unsigned int flags)
>> +{
>> +   struct wildmatch_compiled *wildmatch_compiled = xmalloc(
>> +   sizeof(struct wildmatch_compiled));
>
> struct wildmatch_compiled *data = xmalloc(sizeof(*data));
>
> ?
>
> It shortens the line a bit. We already use WM_ prefix for wildmatch
> flags, perhaps we can use it for wildmatch structs too (e.g.
> wm_compiled instead)

Thanks, that's better.

>> +   wildmatch_compiled->pattern = xstrdup(pattern);
>> +   wildmatch_compiled->flags = flags;
>> +
>> +   return wildmatch_compiled;
>> +}
>> +
>> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
>> +   const char *text)
>> +{
>> +   return wildmatch(wildmatch_compiled->pattern, text,
>> +wildmatch_compiled->flags);
>> +}
>> +
>> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled)
>> +{
>> +   if (wildmatch_compiled)
>> +   free((void *)wildmatch_compiled->pattern);
>
> Why do make pattern type "const char *" then remove "const" with
> typecast here? Why not just "char *" in wildmatch_compiled?
>
> If the reason is to avoid other users from peeking in and modifying
> it, then perhaps you can move struct wildmatch_compiled to wildmatch.c
> and keep it an opaque struct pointer.

Yes, it's to indicate that "pattern" won't ever be modified. I think
it's more readable / self documenting to have the compiler enforce that
via const, even though it requires the cast to free it.

Re: [PATCH 2/2] wildmatch: use the new precompiling wildmatch()

2018-02-26 Thread Ævar Arnfjörð Bjarmason


On Mon, Feb 26 2018, Duy Nguyen jotted:

> On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> diff --git a/config.c b/config.c
>> index b0c20e6cb8..0f595de971 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -210,6 +210,7 @@ static int include_by_gitdir(const struct config_options 
>> *opts,
>> int ret = 0, prefix;
>> const char *git_dir;
>> int already_tried_absolute = 0;
>> +   struct wildmatch_compiled *wildmatch_compiled = NULL;
>>
>> if (opts->git_dir)
>> git_dir = opts->git_dir;
>> @@ -237,8 +238,10 @@ static int include_by_gitdir(const struct 
>> config_options *opts,
>> goto done;
>> }
>>
>> -   ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
>> -icase ? WM_CASEFOLD : 0);
>> +   if (!wildmatch_compiled)
>> +   wildmatch_compiled = wildmatch_compile(pattern.buf + prefix,
>> +  icase ? WM_CASEFOLD : 
>> 0);
>> +   ret = !wildmatch_match(wildmatch_compiled, text.buf + prefix);
>
> This is a one-shot matching.

It will match once *or* twice depending on how the goto in that codepath
does.

> Is it worth converting to the new interface?

I moved it more as a showcase, to show how the interface looks like in
common scenarios. It obviously won't be a performance bottleneck or
anything like that for this specific codepath.

>>
>> if (!ret && !already_tried_absolute) {
>> /*
>> @@ -257,6 +260,7 @@ static int include_by_gitdir(const struct config_options 
>> *opts,
>>  done:
>> strbuf_release();
>> strbuf_release();
>> +   wildmatch_free(wildmatch_compiled);
>> return ret;
>>  }
>>
>> diff --git a/refs.c b/refs.c
>> index 20ba82b434..c631793d1e 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -213,7 +213,7 @@ char *resolve_refdup(const char *refname, int 
>> resolve_flags,
>>
>>  /* The argument to filter_refs */
>>  struct ref_filter {
>> -   const char *pattern;
>> +   struct wildmatch_compiled *code;
>
> This actually makes me think if we should name this struct wildmatch_pattern.

Yeah, that's a better name.

>> each_ref_fn *fn;
>> void *cb_data;
>>  };
>> @@ -291,7 +291,7 @@ static int filter_refs(const char *refname, const struct 
>> object_id *oid,
>>  {
>> struct ref_filter *filter = (struct ref_filter *)data;
>>
>> -   if (wildmatch(filter->pattern, refname, 0))
>> +   if (wildmatch_match(filter->code, refname))
>> return 0;
>> return filter->fn(refname, oid, flags, filter->cb_data);
>>  }
>> @@ -454,12 +454,13 @@ int for_each_glob_ref_in(each_ref_fn fn, const char 
>> *pattern,
>> strbuf_addch(_pattern, '*');
>> }
>>
>> -   filter.pattern = real_pattern.buf;
>> +   filter.code = wildmatch_compile(real_pattern.buf, 0);
>> filter.fn = fn;
>> filter.cb_data = cb_data;
>> ret = for_each_ref(filter_refs, );
>>
>> strbuf_release(_pattern);
>> +   wildmatch_free(filter.code);
>> return ret;
>>  }
>>
>> --
>> 2.15.1.424.g9478a66081
>>

[PATCH 0/1] git-p4: add format-patch subcommand

2018-02-26 Thread Luke Diamand

This is an initial attempt to add a "format-patch" command
to git-p4, following on from the earlier discussion about
shelving.

It uses the "p4 describe" command to generate the diff content and
post-processes it enough to generate git-style patches. These
can be fed to tools such as patch, or "git am".

This is useful for "unshelving" a P4 changelist into your git tree,
since the usual git subcommands (sync, clone) cannot easily read
a shelved changelist: there is no good way to get from Perforce
a consistent single revision against which to generate a diff
using git fast-import, since Perforce doesn't have the concept of
a repo revision.

By default, it leaves the depot prefix in the patch, but using
the option "--strip-depot-prefix" makes it suitable for "git am".

Use it like this:

 $ git p4 format-patch 12345 >out.patch

or
 $ mkdir patches
 $ git p4 format-patch --output patches 12345 12346

or
 $ git p4 format-patch --strip-depot-prefix 12347 >out.patch
 $ git am out.patch

Limitations of "p4 describe" mean that this will not work reliably
with binary files. There's no easy way around this. The change makes
a small attempt to at least stop on binary files, but in the case
of a file marked in P4 as "text", which contains binary deltas, the
file will unavoidably come out corrupted.

Luke Diamand (1):
  git-p4: add format-patch subcommand

 Documentation/git-p4.txt |  33 +
 git-p4.py| 304 +--
 t/t9832-make-patch.sh| 135 +
 3 files changed, 462 insertions(+), 10 deletions(-)
 create mode 100755 t/t9832-make-patch.sh

-- 
2.15.1.272.gc310869385

[PATCH 1/1] git-p4: add format-patch subcommand

2018-02-26 Thread Luke Diamand

It takes a list of P4 changelists and generates a patch for
each one, using "p4 describe".

This is especially useful for applying shelved changelists
to your git-p4 tree; the existing "git p4" subcommands do
not handle these.

That's because they "p4 print" the contents of each file at
a given revision, and then use git-fastimport to generate the
deltas. But in the case of a shelved changelist, there is no
easy way to find out what the previous file state was - Perforce
does not have the concept of a single repo-wide revision.

Unfortunately, using "p4 describe" comes with a price: it cannot
be used to reliably generate diffs for binary files (it tries to
linebreak on LF characters) and it is also _much_ slower.

Unicode character correctness is untested - in theory if
"p4 describe" knows about the character encoding it shouldn't
break unicode characters if they happen to contain LF, but I
haven't tested this.

Signed-off-by: Luke Diamand 
---
 Documentation/git-p4.txt |  33 +
 git-p4.py| 304 +--
 t/t9832-make-patch.sh| 135 +
 3 files changed, 462 insertions(+), 10 deletions(-)
 create mode 100755 t/t9832-make-patch.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index d8c8f11c9f..1908b00de2 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -164,6 +164,28 @@ $ git p4 submit --shelve
 $ git p4 submit --update-shelve 1234 --update-shelve 2345
 
 
+
+format-patch
+~~
+This will attempt to create a unified diff (using the git patch variant) which
+can be passed to patch. This is generated using the output from "p4 describe".
+
+It includes the contents of added files (which "p4 describe" does not).
+
+Binary files cannot be handled correctly due to limitations in "p4 describe".
+
+It will handle both normal and shelved (pending) changelists.
+
+Because of the way this works, it will be much slower than the normal git-p4 
clone
+path.
+
+
+$ git p4 format-patch 12345
+$ git p4 format-patch --output patchdir 12345 12346 12347
+$ git p4 format-patch --strip-depot-prefix 12348 > out.patch
+$ git am out.patch
+
+
 OPTIONS
 ---
 
@@ -337,6 +359,17 @@ These options can be used to modify 'git p4 rebase' 
behavior.
 --import-labels::
Import p4 labels.
 
+Format-patch options
+
+
+--output::
+Write patches to this directory (which must exist) instead of to
+standard output.
+
+--strip-depot-prefix::
+Strip the depot prefix from filenames in the patch.  This makes
+it suitable for passing to tools such as "git am".
+
 DEPOT PATH SYNTAX
 -
 The p4 depot path argument to 'git p4 sync' and 'git p4 clone' can
diff --git a/git-p4.py b/git-p4.py
index 7bb9cadc69..a1e998e6f5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,6 +26,7 @@ import zipfile
 import zlib
 import ctypes
 import errno
+import time
 
 try:
 from subprocess import CalledProcessError
@@ -316,12 +317,17 @@ def p4_last_change():
 results = p4CmdList(["changes", "-m", "1"], skip_info=True)
 return int(results[0]['change'])
 
-def p4_describe(change):
+def p4_describe(change, shelved=False):
 """Make sure it returns a valid result by checking for
the presence of field "time".  Return a dict of the
results."""
 
-ds = p4CmdList(["describe", "-s", str(change)], skip_info=True)
+cmd = ["describe", "-s"]
+if shelved:
+cmd += ["-S"]
+cmd += [str(change)]
+
+ds = p4CmdList(cmd, skip_info=True)
 if len(ds) != 1:
 die("p4 describe -s %d did not return 1 result: %s" % (change, 
str(ds)))
 
@@ -372,7 +378,14 @@ def split_p4_type(p4type):
 mods = ""
 if len(s) > 1:
 mods = s[1]
-return (base, mods)
+
+git_mode = "100644"
+if "x" in mods:
+git_mode = "100755"
+if base == "symlink":
+git_mode = "12"
+
+return (base, mods, git_mode)
 
 #
 # return the raw p4 type of a file (text, text+ko, etc)
@@ -413,7 +426,7 @@ def p4_keywords_regexp_for_file(file):
 if not os.path.exists(file):
 return None
 else:
-(type_base, type_mods) = split_p4_type(p4_type(file))
+(type_base, type_mods, _) = split_p4_type(p4_type(file))
 return p4_keywords_regexp_for_type(type_base, type_mods)
 
 def setP4ExecBit(file, mode):
@@ -1208,6 +1221,9 @@ class P4UserMap:
 else:
 return True
 
+def getP4UsernameEmail(self, userid):
+return self.users[userid]
+
 def getUserCacheFilename(self):
 home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
 return home + "/.gitp4-usercache.txt"
@@ -2570,13 +2586,9 @@ class P4Sync(Command, P4UserMap):
 sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], 
relPath, size/1024/1024))
 sys.stdout.flush()
 
-(type_base, type_mods) = split_p4_type(file["type"])
+(type_base,

Re: [PATCH v2 34/36] Convert lookup_replace_object to struct object_id

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 4:12 AM, brian m. carlson
 wrote:
> diff --git a/sha1_file.c b/sha1_file.c
> index 7493bc7f11..c41fbe2f01 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1227,22 +1227,18 @@ int oid_object_info_extended(const struct object_id 
> *oid, struct object_info *oi
> static struct object_info blank_oi = OBJECT_INFO_INIT;
> struct pack_entry e;
> int rtype;
> -   const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
> -   lookup_replace_object(oid->hash) :
> -   oid->hash;
> +   const struct object_id *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
> +  lookup_replace_object(oid) :
> +  oid;

Micro nit. Perhaps we should replace "? :" with a real "if" construct
-- 
Duy

Re: Git should preserve modification times at least on request

2018-02-26 Thread Andreas Krey

On Thu, 22 Feb 2018 03:05:35 +, 'Peter Backes' wrote:
...
> The bigger issue is usually to copy with those pesky leap seconds. It 
> makes a difference whether one uses solar seconds ("posix" style; those 
> are more commonly seen) or atomic seconds ("right" style) for the UNIX 
> timestamp.

Is there any system, unix or otherwise, that uses 'right'-style seconds,
i.e. TAI, as its base?

(I.e. one where (time(0)%60) does not indicate the current position
of the second hand of an accurate clock?)

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800

Re: [PATCH v2 05/36] resolve-undo: convert struct resolve_undo_info to object_id

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 4:11 AM, brian m. carlson
 wrote:
> @@ -44,7 +44,7 @@ void resolve_undo_write(struct strbuf *sb, struct 
> string_list *resolve_undo)
> for (i = 0; i < 3; i++) {
> if (!ui->mode[i])
> continue;
> -   strbuf_add(sb, ui->sha1[i], 20);
> +   strbuf_add(sb, ui->oid[i].hash, the_hash_algo->rawsz);
> }
> }
>  }
> @@ -89,7 +89,7 @@ struct string_list *resolve_undo_read(const char *data, 
> unsigned long size)
> continue;
> if (size < 20)
> goto error;
> -   hashcpy(ui->sha1[i], (const unsigned char *)data);
> +   hashcpy(ui->oid[i].hash, (const unsigned char *)data);
> size -= 20;
> data += 20;
> }

Here we see the same pattern again, but this time the @@ lines give
better context: these are actually hash I/O. Maybe it's about time we
add

int oidwrite(char *, size_t , const struct object_id *);
// optionally, void strbuf_addoid(struct strbuf *, const struct object_id *);
int oidread(struct object_id *, const char *, size_t);

for conversion from between an object_id in memory and on disk? It
would probably be a straight memcpy for all hash algorithms so we
don't really need new function pointers in git_hash_algo for this.
-- 
Duy

Re: What's cooking in git.git (Feb 2018, #03; Wed, 21)

2018-02-26 Thread Eric Sunshine

On Mon, Feb 26, 2018 at 5:17 AM, Duy Nguyen  wrote:
> On Thu, Feb 22, 2018 at 7:31 AM, Junio C Hamano  wrote:
>> * nd/worktree-move (2018-02-12) 7 commits
>>
>>  "git worktree" learned move and remove subcommands.
>>
>>  Expecting a reroll.
>>  cf. <20180124095357.19645-1-pclo...@gmail.com>
>
> I think 'pu' already has the reroll (v2). So far no new comments.

I've been meaning to take a look at v2 but haven't had a chance yet.
Based upon the interdiff, though, it looks like all my review comments
from v1 have been addressed.

Re: [PATCH v2 04/36] cache-tree: convert remnants to struct object_id

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 4:11 AM, brian m. carlson
 wrote:
> @@ -465,7 +465,7 @@ static void write_one(struct strbuf *buffer, struct 
> cache_tree *it,
>  #endif
>
> if (0 <= it->entry_count) {
> -   strbuf_add(buffer, it->oid.hash, 20);
> +   strbuf_add(buffer, it->oid.hash, the_hash_algo->rawsz);
> }
> for (i = 0; i < it->subtree_nr; i++) {
> struct cache_tree_sub *down = it->down[i];
> @@ -520,11 +520,11 @@ static struct cache_tree *read_one(const char **buffer, 
> unsigned long *size_p)
> goto free_return;
> buf++; size--;
> if (0 <= it->entry_count) {
> -   if (size < 20)
> +   if (size < the_hash_algo->rawsz)
> goto free_return;
> hashcpy(it->oid.hash, (const unsigned char*)buf);
> -   buf += 20;
> -   size -= 20;
> +   buf += the_hash_algo->rawsz;
> +   size -= the_hash_algo->rawsz;
> }

This looks a bit strange that everything else is now with hash
abstraction, except that it->oid is still copied with hashcpy(). But I
guess we don't have an equivalent for that function yet..

But since it->oid.hash is copied to buffer with strbuf_add() in the
top chunk, maybe we can do memcpy() here with the_hash_algo ->rawsz
too?
-- 
Duy

Re: Git should preserve modification times at least on request

2018-02-26 Thread 'Peter Backes'

On Mon, Feb 26, 2018 at 11:56:42AM +0100, Andreas Krey wrote:
> > The bigger issue is usually to copy with those pesky leap seconds. It 
> > makes a difference whether one uses solar seconds ("posix" style; those 
> > are more commonly seen) or atomic seconds ("right" style) for the UNIX 
> > timestamp.
> 
> Is there any system, unix or otherwise, that uses 'right'-style seconds,
> i.e. TAI, as its base?

Most certainly there is. This depends on the individual configuration 
of the system. On my Fedora system, the commonly used tzdata package 
off the shelf contains support for 'right' style versions of all 
timezones in /usr/share/zoneinfo/right If the user links one of those 
timezones to /etc/localtime or manually specifies them (like 
TZ=right/Europe/Berlin ls -l) they will be used.

You don't find a lot of those systems today, but those who used to use 
the 'right' timestamps might for legacy reasons explicitly configure 
their system to use those timezone variants. I personally did this for 
a number of years, but then converted the filesystems timestamps to 
'posix' and I am now exclusively using 'posix' ones.

Best wishes
Peter
-- 
Peter Backes, r...@helen.plasma.xg8.de

OK

2018-02-26 Thread Ahmed Zama

Greetings,

I am desperately in need of a foreigner with a foreign account. I have
a profitable business that wil be of a benefit to both of us. Permit
me to disclose the details of the business to you

Ahmed Zama

Re: [PATCH 0/2] wildmatch precompilation interface

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
 wrote:
> My recently landed wildmatch test series was in preperation for some
> more major wildmatch changes.
>
> Here's another series that prepares for bigger changes in wildmatch,
> it adds an interface to pre-compile the patterns. Right now there's no
> point in doing this, and it's harmless since none of the codepaths are
> that performance sensitive, but down the line this'll save us time as
> we'll be able to skip re-parsing the pattern each time with a better
> wildmatch backend.

I don't see any big problem with this, but should this be a standalone
series? Some changes look harmless now, but I'd rather see the real
precompile implementation to see how it impacts (or benefits) the
converted call sites.
-- 
Duy

Re: [PATCH 2/2] wildmatch: use the new precompiling wildmatch()

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
 wrote:
> diff --git a/config.c b/config.c
> index b0c20e6cb8..0f595de971 100644
> --- a/config.c
> +++ b/config.c
> @@ -210,6 +210,7 @@ static int include_by_gitdir(const struct config_options 
> *opts,
> int ret = 0, prefix;
> const char *git_dir;
> int already_tried_absolute = 0;
> +   struct wildmatch_compiled *wildmatch_compiled = NULL;
>
> if (opts->git_dir)
> git_dir = opts->git_dir;
> @@ -237,8 +238,10 @@ static int include_by_gitdir(const struct config_options 
> *opts,
> goto done;
> }
>
> -   ret = !wildmatch(pattern.buf + prefix, text.buf + prefix,
> -icase ? WM_CASEFOLD : 0);
> +   if (!wildmatch_compiled)
> +   wildmatch_compiled = wildmatch_compile(pattern.buf + prefix,
> +  icase ? WM_CASEFOLD : 
> 0);
> +   ret = !wildmatch_match(wildmatch_compiled, text.buf + prefix);

This is a one-shot matching. Is it worth converting to the new interface?

>
> if (!ret && !already_tried_absolute) {
> /*
> @@ -257,6 +260,7 @@ static int include_by_gitdir(const struct config_options 
> *opts,
>  done:
> strbuf_release();
> strbuf_release();
> +   wildmatch_free(wildmatch_compiled);
> return ret;
>  }
>
> diff --git a/refs.c b/refs.c
> index 20ba82b434..c631793d1e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -213,7 +213,7 @@ char *resolve_refdup(const char *refname, int 
> resolve_flags,
>
>  /* The argument to filter_refs */
>  struct ref_filter {
> -   const char *pattern;
> +   struct wildmatch_compiled *code;

This actually makes me think if we should name this struct wildmatch_pattern.

> each_ref_fn *fn;
> void *cb_data;
>  };
> @@ -291,7 +291,7 @@ static int filter_refs(const char *refname, const struct 
> object_id *oid,
>  {
> struct ref_filter *filter = (struct ref_filter *)data;
>
> -   if (wildmatch(filter->pattern, refname, 0))
> +   if (wildmatch_match(filter->code, refname))
> return 0;
> return filter->fn(refname, oid, flags, filter->cb_data);
>  }
> @@ -454,12 +454,13 @@ int for_each_glob_ref_in(each_ref_fn fn, const char 
> *pattern,
> strbuf_addch(_pattern, '*');
> }
>
> -   filter.pattern = real_pattern.buf;
> +   filter.code = wildmatch_compile(real_pattern.buf, 0);
> filter.fn = fn;
> filter.cb_data = cb_data;
> ret = for_each_ref(filter_refs, );
>
> strbuf_release(_pattern);
> +   wildmatch_free(filter.code);
> return ret;
>  }
>
> --
> 2.15.1.424.g9478a66081
>
-- 
Duy

Re: [PATCH] Call timegm and timelocal with 4-digit year

2018-02-26 Thread Bernhard M. Wiedemann

On 2018-02-24 23:28, Ævar Arnfjörð Bjarmason wrote:
> My Time::Local 1.2300 on perl 5.024001 currently interprets "69" here as
> 1969, but after this it'll be 2069.

on one hand, there is already
perl -e 'use Time::Local;
  print scalar gmtime(Time::Local::timegm(0,0,0,1,0,68))'
Sun Jan  1 00:00:00 2068

The problem here is the 'currently', because in 11 months from now, 69
will be interpreted as 2069 by Time::Local, too.

faketime 2019-01-01 perl -e 'use Time::Local;
   print scalar gmtime(Time::Local::timegm(0,0,0,1,0,69))'
Tue Jan  1 00:00:00 2069


We could compromize for
$y+=100 if $y<69;

to freeze the behaviour to what Time::Local does in 2018, but then there
might not be actual repos with such 2-digit year timestamps.

There are definitely CVS repos from before 1999 and without this change,
those might be misinterpreted in 2049 (or before, depending on age)

Re: [PATCH 1/2] wildmatch: add interface for precompiling wildmatch() patterns

2018-02-26 Thread Duy Nguyen

On Mon, Feb 26, 2018 at 3:35 AM, Ævar Arnfjörð Bjarmason
 wrote:
> Add the scaffolding necessary for precompiling wildmatch()
> patterns.
>
> There is currently no point in doing this with the wildmatch()
> function we have now, since it can't make any use of precompiling the
> pattern.
>
> But adding this interface and making use of it will make it easy to
> refactor the wildmatch() function to parse the pattern into opcodes as
> some glob() implementations do, or to drop an alternate wildmatch()
> backend in which trades parsing slowness for faster matching, such as
> the PCRE v2 conversion function that understands the wildmatch()
> syntax.
>
> It's very unlikely that we'll remove the wildmatch() function as a
> convenience wrapper even if we end up requiring a separate compilation
> step in some future implementation. There are a lot of one-shot
> wildmatches in the codebase, in that case most likely wildmatch() will
> be kept around as a shorthand for wildmatch_{compile,match,free}().
>
> I modeled this interface on the PCRE v2 interface. I didn't go with a
> glob(3) & globfree(3)-like interface because that would require every
> wildmatch() user to pass a dummy parameter, which I got rid of in
> 55d3426929 ("wildmatch: remove unused wildopts parameter",
> 2017-06-22).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  wildmatch.c | 25 +
>  wildmatch.h | 11 +++
>  2 files changed, 36 insertions(+)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..032f339391 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -276,3 +276,28 @@ int wildmatch(const char *pattern, const char *text, 
> unsigned int flags)
>  {
> return dowild((const uchar*)pattern, (const uchar*)text, flags);
>  }
> +
> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
> +unsigned int flags)
> +{
> +   struct wildmatch_compiled *wildmatch_compiled = xmalloc(
> +   sizeof(struct wildmatch_compiled));

struct wildmatch_compiled *data = xmalloc(sizeof(*data));

?

It shortens the line a bit. We already use WM_ prefix for wildmatch
flags, perhaps we can use it for wildmatch structs too (e.g.
wm_compiled instead)

> +   wildmatch_compiled->pattern = xstrdup(pattern);
> +   wildmatch_compiled->flags = flags;
> +
> +   return wildmatch_compiled;
> +}
> +
> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
> +   const char *text)
> +{
> +   return wildmatch(wildmatch_compiled->pattern, text,
> +wildmatch_compiled->flags);
> +}
> +
> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled)
> +{
> +   if (wildmatch_compiled)
> +   free((void *)wildmatch_compiled->pattern);

Why do make pattern type "const char *" then remove "const" with
typecast here? Why not just "char *" in wildmatch_compiled?

If the reason is to avoid other users from peeking in and modifying
it, then perhaps you can move struct wildmatch_compiled to wildmatch.c
and keep it an opaque struct pointer.

> +   free(wildmatch_compiled);
> +}
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..2fc00e0ca0 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -10,5 +10,16 @@
>  #define WM_ABORT_ALL -1
>  #define WM_ABORT_TO_STARSTAR -2
>
> +struct wildmatch_compiled {
> +   const char *pattern;
> +   unsigned int flags;
> +};
> +
>  int wildmatch(const char *pattern, const char *text, unsigned int flags);
> +struct wildmatch_compiled *wildmatch_compile(const char *pattern,
> +unsigned int flags);
> +int wildmatch_match(struct wildmatch_compiled *wildmatch_compiled,
> +   const char *text);
> +void wildmatch_free(struct wildmatch_compiled *wildmatch_compiled);
> +
>  #endif
> --
> 2.15.1.424.g9478a66081
>



-- 
Duy

1 2 >

1 - 100 of 112 matches

Mail list logo