Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Phillip Wood

Hi Ævar

On 06/12/2018 13:54, Ævar Arnfjörð Bjarmason wrote:

Let's ignore how bad this patch is for git.git, and just focus on how
diff.colorMoved treats it:

 diff --git a/builtin/add.c b/builtin/add.c
 index f65c172299..d1155322ef 100644
 --- a/builtin/add.c
 +++ b/builtin/add.c
 @@ -6,5 +6,3 @@
  #include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 -#include "lockfile.h"
  #include "dir.h"
 diff --git a/builtin/am.c b/builtin/am.c
 index 8f27f3375b..eded15aa8a 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -6,3 +6,2 @@
  #include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 diff --git a/builtin/blame.c b/builtin/blame.c
 index 06a7163ffe..44a754f190 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -8,3 +8,2 @@
  #include "cache.h"
 -#include "config.h"
  #include "color.h"
 diff --git a/cache.h b/cache.h
 index ca36b44ee0..ea8d60b94a 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -4,2 +4,4 @@
  #include "git-compat-util.h"
 +#include "config.h"
 +#include "new.h"
  #include "strbuf.h"

This is a common thing that's useful to have highlighted, e.g. we move
includes of config.h to some common file, so I want to se all the
deleted config.h lines as moved into the cache.h line, and then the
"lockfile.h" I removed while I was at it plain remove, and the new
"new.h" plain added.

Exactly that is what you get with diff.colorMoved=plain, but the default
of diff.colorMoved=zebra gets confused by this and highlights no moves
at all, same or "blocks" and "dimmed-zebra".

So at first I thought this had something to do with the many->one
detection, but it seems to be simpler, we just don't detect a move of
1-line with anything but plain, e.g. this works as expected in all modes
and detects the many->one:

 diff --git a/builtin/add.c b/builtin/add.c
 index f65c172299..f4fda75890 100644
 --- a/builtin/add.c
 +++ b/builtin/add.c
 @@ -5,4 +5,2 @@
   */
 -#include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 diff --git a/builtin/branch.c b/builtin/branch.c
 index 0c55f7f065..52e39924d3 100644
 --- a/builtin/branch.c
 +++ b/builtin/branch.c
 @@ -7,4 +7,2 @@

 -#include "cache.h"
 -#include "config.h"
  #include "color.h"
 diff --git a/cache.h b/cache.h
 index ca36b44ee0..d4146dbf8a 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -3,2 +3,4 @@

 +#include "cache.h"
 +#include "config.h"
  #include "git-compat-util.h"

So is there some "must be at least two consecutive lines" condition for
not-plain, or is something else going on here?


To be considered a block has to have 20 alphanumeric characters - see 
commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", 
2017-08-15). This stops things like random '}' lines being marked as 
moved on their own. It might be better to use some kind of frequency 
information (a bit like python's difflib junk parameter) instead so that 
(fairly) unique short lines also get marked properly.


Best Wishes

Phillip


Re: [PATCH] rebase -i: introduce the 'test' command

2018-12-03 Thread Phillip Wood

On 01/12/2018 20:02, Jeff King wrote:

On Thu, Nov 29, 2018 at 09:32:48AM +0100, Johannes Schindelin wrote:


Would it not make more sense to add a command-line option (and a config
setting) to re-schedule failed `exec` commands? Like so:


Your proposition would do in most cases, however it is not possible to
make a distinction between reschedulable and non-reschedulable commands.


True. But I don't think that's so terrible.

What I think is something to avoid is two commands that do something very,
very similar, but with two very, very different names.

In reality, I think that it would even make sense to change the default to
reschedule failed `exec` commands. Which is why I suggested to also add a
config option.


I sometimes add "x false" to the top of the todo list to stop and create
new commits before the first one. That would be awkward if I could never
get past that line. However, I think elsewhere a "pause" line has been
discussed, which would serve the same purpose.

I wonder how often this kind of "yes, I know it fails, but keep going
anyway" situation would come up. And what the interface is like for
getting past it. E.g., what if you fixed a bunch of stuff but your tests
still fail? You may not want to abandon the changes you've made, but you
need to "rebase --continue" to move forward. I encounter this often when
the correct fix is actually in an earlier commit than the one that
yields the test failure. You can't rewind an interactive rebase, so I
complete and restart it, adding an "e"dit at the earlier commit.

How would I move past the test that fails to continue? I guess "git
rebase --edit-todo" and then manually remove it (and any other remaining
test lines)?


Perhaps we could teach git rebase --skip to skip a rescheduled command, 
it could be useful if people want to skip rescheduled picks as well 
(though I don't think I've ever had that happen in the wild). I can see 
myself turning on the rescheduling config setting but occasionally 
wanting to be able to skip over the rescheduled exec command.



Best Wishes

Phillip


That's not too bad, but I wonder if people would find it more awkward
than the current way (which is to just "rebase --continue" until you get
to the end).

I dunno. I am not sure if I am for or against changing the default, so
these are just some musings. :)

-Peff





Re: [PATCH v3 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-30 Thread Phillip Wood

Hi Alban

Sorry it has taken me a while to look at the latest iteration. I like
the changes to pass a list of strings for the exec commands. I've only
had a chance to take a quick look, but I've got a couple of comments below

On 09/11/2018 08:07, Alban Gruin wrote:

This refactors sequencer_add_exec_commands() to work on a todo_list to
avoid redundant reads and writes to the disk.

Instead of just inserting the `exec' command between the other commands,
and re-parsing the buffer at the end the exec command is appended to the
buffer once, and a new list of items is created.  Items from the old
list are copied across and new `exec' items are appended when
necessary.  This eliminates the need to reparse the buffer, but this
also means we have to use todo_list_write_to_disk() to write the file().

todo_list_add_exec_commands() and sequencer_add_exec_commands() are
modified to take a string list instead of a string -- one item for each
command.  This makes it easier to insert a new command to the todo list
for each command to execute.

sequencer_add_exec_commands() still reads the todo list from the disk,
as it is needed by rebase -p.

complete_action() still uses sequencer_add_exec_commands() for now.
This will be changed in a future commit.

Signed-off-by: Alban Gruin 
---
 builtin/rebase--interactive.c |  15 +++--
 sequencer.c   | 111 +-
 sequencer.h   |   4 +-
 3 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/builtin/rebase--interactive.c b/builtin/rebase--interactive.c
index c1d87c0fe6..1fb5a43c0d 100644
--- a/builtin/rebase--interactive.c
+++ b/builtin/rebase--interactive.c
@@ -65,7 +65,7 @@ static int do_interactive_rebase(struct replay_opts *opts, 
unsigned flags,
 const char *onto, const char *onto_name,
 const char *squash_onto, const char *head_name,
 const char *restrict_revision, char 
*raw_strategies,
-const char *cmd, unsigned autosquash)
+struct string_list *commands, unsigned 
autosquash)
 {
int ret;
const char *head_hash = NULL;
@@ -115,7 +115,7 @@ static int do_interactive_rebase(struct replay_opts *opts, 
unsigned flags,
else {
discard_cache();
ret = complete_action(opts, flags, shortrevisions, onto_name, 
onto,
- head_hash, cmd, autosquash);
+ head_hash, commands, autosquash);
}
 
 	free(revisions);

@@ -138,6 +138,7 @@ int cmd_rebase__interactive(int argc, const char **argv, 
const char *prefix)
const char *onto = NULL, *onto_name = NULL, *restrict_revision = NULL,
*squash_onto = NULL, *upstream = NULL, *head_name = NULL,
*switch_to = NULL, *cmd = NULL;
+   struct string_list commands = STRING_LIST_INIT_DUP;
char *raw_strategies = NULL;
enum {
NONE = 0, CONTINUE, SKIP, EDIT_TODO, SHOW_CURRENT_PATCH,
@@ -220,6 +221,12 @@ int cmd_rebase__interactive(int argc, const char **argv, 
const char *prefix)
warning(_("--[no-]rebase-cousins has no effect without "
  "--rebase-merges"));
 
+	if (cmd && *cmd) {

+   string_list_split(, cmd, '\n', -1);
+   if (strlen(commands.items[commands.nr - 1].string) == 0)
+   --commands.nr;
+   }
+
switch (command) {
case NONE:
if (!onto && !upstream)
@@ -227,7 +234,7 @@ int cmd_rebase__interactive(int argc, const char **argv, 
const char *prefix)
 
 		ret = do_interactive_rebase(, flags, switch_to, upstream, onto,

onto_name, squash_onto, head_name, 
restrict_revision,
-   raw_strategies, cmd, autosquash);
+   raw_strategies, , 
autosquash);
break;
case SKIP: {
struct string_list merge_rr = STRING_LIST_INIT_DUP;
@@ -261,7 +268,7 @@ int cmd_rebase__interactive(int argc, const char **argv, 
const char *prefix)
ret = rearrange_squash();
break;
case ADD_EXEC:
-   ret = sequencer_add_exec_commands(cmd);
+   ret = sequencer_add_exec_commands();
break;
default:
BUG("invalid command '%d'", command);
diff --git a/sequencer.c b/sequencer.c
index 900899ef20..11692d0b98 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4394,24 +4394,29 @@ int sequencer_make_script(FILE *out, int argc, const 
char **argv,
return 0;
 }
 
-/*

- * Add commands after pick and (series of) squash/fixup commands
- * in the todo list.
- */
-int sequencer_add_exec_commands(const char *commands)
+static void todo_list_add_exec_commands(struct todo_list *todo_list,
+ 

Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment

2018-11-27 Thread Phillip Wood

Hi Stefan

On 26/11/2018 21:20, Stefan Beller wrote:

On Fri, Nov 23, 2018 at 3:17 AM Phillip Wood  wrote:


From: Phillip Wood 

Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
response to those comments - see the range-diff below for details (the
patch numbers are off by one in the range diff, I think because the
first patch is unchanged and so it was used as the merge base by
--range-diff=.


`git range-diff` accepts a three dotted "range" OLD...NEW
as an easy abbreviation for the arguments
"COMMON..OLD COMMON..NEW" and the common element is
computed as the last common element. It doesn't have knowledge
about where you started your topic branch.


I was using the new --range-diff option to format-patch, I think I 
should have given --range-diff=@{u}...



For some reason the range-diff also includes
the notes even though I did not give --notes to format-patch)


This is interesting.
The existence of notes.rewrite. seems to work well
with the range-diff then, as the config would trigger the copy-over
of notes and then range-diff would diff the original notes to the new
notes.


Yes, but I think with format-patch it should only diff the notes when 
--notes is given.



When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.



The range-diff looks good to me.


That's good, thanks for your comments on the previous iterations.

Best Wishes

Phillip

Thanks,
Stefan





[PATCH v2 9/9] diff --color-moved-ws: handle blank lines

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

  git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood 
---
 diff.c | 34 ---
 t/t4015-diff-whitespace.sh | 41 ++
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 148503e49c..ef5b8c78d7 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
memset(b, 0, sizeof(*b));
 }
 
+#define INDENT_BLANKLINE INT_MIN
+
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
 {
-   unsigned int off = 0;
+   unsigned int off = 0, i;
int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
const char *s = es->line;
const int len = es->len;
@@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol 
*es)
}
}
 
-   es->indent_off = off;
-   es->indent_width = width;
+   /* check if this line is blank */
+   for (i = off; i < len; i++)
+   if (!isspace(s[i]))
+   break;
+
+   if (i == len) {
+   es->indent_width = INDENT_BLANKLINE;
+   es->indent_off = len;
+   } else {
+   es->indent_off = off;
+   es->indent_width = width;
+   }
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
@@ -834,6 +846,11 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
b_width = b->indent_width;
int delta;
 
+   if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
+   *out = INDENT_BLANKLINE;
+   return 1;
+   }
+
if (a->s == DIFF_SYMBOL_PLUS)
delta = a_width - b_width;
else
@@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
if (al != bl)
return 1;
 
+   /* If 'l' and 'cur' are both blank then they match. */
+   if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+   return 0;
+
/*
 * The indent changes of the block are known and stored in pmb->wsd;
 * however we need to check if the indent changes of the current line
@@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
else
delta = c_width - a_width;
 
+   /*
+* If the previous lines of this block were all blank then set its
+* whitespace delta.
+*/
+   if (pmb->wsd == INDENT_BLANKLINE)
+   pmb->wsd = delta;
+
return !(delta == pmb->wsd && al - a_off == cl - c_off &&
 !memcmp(a, b, al) && !
 memcmp(a + a_off, c + c_off, al - a_off));
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e023839ba6..9d6f88b07f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta 
incompatible with other space opti
test_i18ngrep allow-indentation-change err
 '
 
+EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
git reset --hard &&
tr Q_ "\t " <<-EOF >text.txt &&
+   ${EMPTY}
+   too short without
+   ${EMPTY}
+   ___being grouped across blank line
+   ${EMPTY}
+   context
+   lines
+   to
+   anchor
Indented text to
_Qbe further indented by four spaces across
Qseveral lines
@@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta 
across moved blocks' '
git commit -m "add text.txt" &&
 
tr Q_ "\t " <<-EOF >text.txt &&
+   context
+   lines
+   to
+   anchor
QIndented text to
QQbe further indented by four spaces across
Qseveral lines
+   ${EMPTY}
+   QQtoo short without
+   ${EMPTY}
+   Q___being grouped across blank line
+   ${EMPTY}
Q_QThese two lines have had their
indentation reduced by four spaces
QQdifferent indentation change
@@ -1937,7 +1956,16 @@ test_expect_success 'compare mixed whitespace delta 
across moved blocks' '
diff --git a/text.txt b/t

[PATCH v2 1/9] diff: document --no-color-moved

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Add documentation for --no-color-moved.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..151690f814 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,6 +293,10 @@ dimmed-zebra::
`dimmed_zebra` is a deprecated synonym.
 --
 
+--no-color-moved::
+   Turn off move detection. This can be used to override configuration
+   settings. It is the same as `--color-moved=no`.
+
 --color-moved-ws=::
This configures how white spaces are ignored when performing the
move detection for `--color-moved`.
-- 
2.19.1.1690.g258b440b18



[PATCH v2 2/9] Use "whitespace" consistently

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Most of the messages and documentation use 'whitespace' rather than
'white space' or 'white spaces' convert to latter two to the former for
consistency.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 ++--
 Documentation/git-cat-file.txt | 8 
 diff.c | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 151690f814..57a2f4cb7a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -298,7 +298,7 @@ dimmed-zebra::
settings. It is the same as `--color-moved=no`.
 
 --color-moved-ws=::
-   This configures how white spaces are ignored when performing the
+   This configures how whitespace is ignored when performing the
move detection for `--color-moved`.
 ifdef::git-diff[]
It can be set by the `diff.colorMovedWS` configuration setting.
@@ -316,7 +316,7 @@ ignore-all-space::
Ignore whitespace when comparing lines. This ignores differences
even if one line has whitespace where the other line has none.
 allow-indentation-change::
-   Initially ignore any white spaces in the move detection, then
+   Initially ignore any whitespace in the move detection, then
group the moved code blocks only into a block if the change in
whitespace is the same per line. This is incompatible with the
other modes.
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 74013335a1..9a2e9cdafb 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -23,8 +23,8 @@ In the second form, a list of objects (separated by 
linefeeds) is provided on
 stdin, and the SHA-1, type, and size of each object is printed on stdout. The
 output format can be overridden using the optional `` argument. If
 either `--textconv` or `--filters` was specified, the input is expected to
-list the object names followed by the path name, separated by a single white
-space, so that the appropriate drivers can be determined.
+list the object names followed by the path name, separated by a single
+whitespace, so that the appropriate drivers can be determined.
 
 OPTIONS
 ---
@@ -79,15 +79,15 @@ OPTIONS
Print object information and contents for each object provided
on stdin.  May not be combined with any other options or arguments
except `--textconv` or `--filters`, in which case the input lines
-   also need to specify the path, separated by white space.  See the
+   also need to specify the path, separated by whitespace.  See the
section `BATCH OUTPUT` below for details.
 
 --batch-check::
 --batch-check=::
Print object information for each object provided on stdin.  May
not be combined with any other options or arguments except
`--textconv` or `--filters`, in which case the input lines also
-   need to specify the path, separated by white space.  See the
+   need to specify the path, separated by whitespace.  See the
section `BATCH OUTPUT` below for details.
 
 --batch-all-objects::
diff --git a/diff.c b/diff.c
index c29b1cce14..78cd3958f4 100644
--- a/diff.c
+++ b/diff.c
@@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg)
 
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
(ret & XDF_WHITESPACE_FLAGS))
-   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other whitespace modes"));
 
string_list_clear(, 0);
 
-- 
2.19.1.1690.g258b440b18



[PATCH v2 3/9] diff: allow --no-color-moved-ws

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 7 +++
 diff.c | 6 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 57a2f4cb7a..e1744fa80d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -306,6 +306,8 @@ endif::git-diff[]
These modes can be given as a comma separated list:
 +
 --
+no::
+   Do not ignore whitespace when performing move detection.
 ignore-space-at-eol::
Ignore changes in whitespace at EOL.
 ignore-space-change::
@@ -322,6 +324,11 @@ allow-indentation-change::
other modes.
 --
 
+--no-color-moved-ws::
+   Do not ignore whitespace when performing move detection. This can be
+   used to override configuration settings. It is the same as
+   `--color-moved-ws=no`.
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 78cd3958f4..9b9811988b 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg)
strbuf_addstr(, i->string);
strbuf_trim();
 
-   if (!strcmp(sb.buf, "ignore-space-change"))
+   if (!strcmp(sb.buf, "no"))
+   ret = 0;
+   else if (!strcmp(sb.buf, "ignore-space-change"))
ret |= XDF_IGNORE_WHITESPACE_CHANGE;
else if (!strcmp(sb.buf, "ignore-space-at-eol"))
ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
@@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options,
if (cm < 0)
die("bad --color-moved argument: %s", arg);
options->color_moved = cm;
+   } else if (!strcmp(arg, "--no-color-moved-ws")) {
+   options->color_moved_ws_handling = 0;
} else if (skip_prefix(arg, "--color-moved-ws=", )) {
options->color_moved_ws_handling = parse_color_moved_ws(arg);
} else if (skip_to_optional_arg_default(arg, "--color-words", 
>word_regex, NULL)) {
-- 
2.19.1.1690.g258b440b18



[PATCH v2 4/9] diff --color-moved-ws: demonstrate false positives

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can highlight lines
that have internal whitespace changes rather than indentation
changes. For example in commit 1a07e59c3e ("Update messages in
preparation for i18n", 2018-07-21) the lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are highlighted as moved when they should not be. Modify an existing
test to show the problem that will be fixed in the next commit.

Signed-off-by: Phillip Wood 
---
 t/t4015-diff-whitespace.sh | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index a9fb226c5a..eee81a1987 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white 
spaces' '
test_cmp expected actual
 '
 
-test_expect_success 'compare whitespace delta across moved blocks' '
+test_expect_failure 'compare whitespace delta across moved blocks' '
 
git reset --hard &&
q_to_tab <<-\EOF >text.txt &&
@@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQQthat has similar lines
QQQto previous blocks, but with different indent
QQQYetQAnotherQoutlierQ
+   QLine with internal w h i t e s p a c e change
EOF
 
git add text.txt &&
@@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQthat has similar lines
QQto previous blocks, but with different indent
QQYetQAnotherQoutlier
+   QLine with internal whitespace change
EOF
 
git diff --color --color-moved 
--color-moved-ws=allow-indentation-change >actual.raw &&
@@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
diff --git a/text.txt b/text.txt
--- a/text.txt
+++ b/text.txt
-   @@ -1,14 +1,14 @@
+   @@ -1,15 +1,15 @@
-QIndented
-QText across
-Qsome lines
@@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
-QQQthat has similar lines
-QQQto previous blocks, but with different 
indent
-QQQYetQAnotherQoutlierQ
+   -QLine with internal w h i t e s p a c e change
+QQIndented
+QQText across
+QQsome lines
@@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
+QQthat has similar lines
+QQto previous blocks, but with 
different indent
+QQYetQAnotherQoutlier
+   +QLine with internal whitespace 
change
EOF
 
test_cmp expected actual
-- 
2.19.1.1690.g258b440b18



[PATCH v2 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
-   int al = cur->es->len, cl = l->len;
+   int al = cur->es->len, bl = match->es->len, cl = l->len;
const char *a = cur->es->line,
   *b = match->es->line,
   *c = l->line;
-
+   const char *orig_a = a;
int wslen;
 
/*
-* We need to check if 'cur' is equal to 'match'.
-* As those are from the same (+/-) side, we do not need to adjust for
-* indent changes. However these were found using fuzzy matching
-* so we do have to check if they are equal.
+* We need to check if 'cur' is equal to 'match'.  As those
+* are from the same (+/-) side, we do not need to adjust for
+* indent changes. However these were found using fuzzy
+* matching so we do have to check if they are equal. Here we
+* just check the lengths. We delay calling memcmp() to check
+* the contents until later as if the length comparison for a
+* and c fails we can avoid the call all together.
 */
-   if (strcmp(a, b))
+   if (al != bl)
return 1;
 
if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (al != cl || memcmp(a, c, al))
+   if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.1.1690.g258b440b18



[PATCH v2 6/9] diff --color-moved=zebra: be stricter with color alternation

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.

Fix this by only using the alternate colors for adjacent moved blocks.

Signed-off-by: Phillip Wood 
---
 diff.c | 27 +++
 t/t4015-diff-whitespace.sh |  6 +++---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 53a7ab5aca..8c08dd68df 100644
--- a/diff.c
+++ b/diff.c
@@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct 
moved_block *pmb,
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  *
+ * Returns 0 if the last block is empty or is unset by this function, non zero
+ * otherwise.
+ *
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
-static void adjust_last_block(struct diff_options *o, int n, int block_length)
+static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
int i, alnum_count = 0;
if (o->color_moved == COLOR_MOVED_PLAIN)
-   return;
+   return block_length;
for (i = 1; i < block_length + 1; i++) {
const char *c = o->emitted_symbols->buf[n - i].line;
for (; *c; c++) {
if (!isalnum(*c))
continue;
alnum_count++;
if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
-   return;
+   return 1;
}
}
for (i = 1; i < block_length + 1; i++)
o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+   return 0;
 }
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
@@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o,
 {
struct moved_block *pmb = NULL; /* potentially moved blocks */
int pmb_nr = 0, pmb_alloc = 0;
-   int n, flipped_block = 1, block_length = 0;
+   int n, flipped_block = 0, block_length = 0;
 
 
for (n = 0; n < o->emitted_symbols->nr; n++) {
struct hashmap *hm = NULL;
struct moved_entry *key;
struct moved_entry *match = NULL;
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
+   enum diff_symbol last_symbol = 0;
 
switch (l->s) {
case DIFF_SYMBOL_PLUS:
@@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o,
free(key);
break;
default:
-   flipped_block = 1;
+   flipped_block = 0;
}
 
if (!match) {
@@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o,
moved_block_clear([i]);
pmb_nr = 0;
block_length = 0;
+   flipped_block = 0;
+   last_symbol = l->s;
continue;
}
 
if (o->color_moved == COLOR_MOVED_PLAIN) {
+   last_symbol = l->s;
l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
}
@@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o,
}
}
 
-   flipped_block = (flipped_block + 1) % 2;
+   if (adjust_last_block(o, n, block_length) &&
+   pmb_nr && last_symbol != l->s)
+   flipped_block = (flipped_block + 1) % 2;
+   else
+   flipped_block = 0;
 
-   adjust_last_block(o, n, block_length);
block_length = 0;
}
 
if (pmb_nr) {
block_length++;
-
l->flags |= DIFF_SYMBOL_MOVED_LINE;
if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
}
+   last_symbol = l->s;
}
adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index eee81a1987..fe8a2ab06e 100755
--- a/t/t401

[PATCH v2 0/9] diff --color-moved-ws fixes and enhancment

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
response to those comments - see the range-diff below for details (the
patch numbers are off by one in the range diff, I think because the
first patch is unchanged and so it was used as the merge base by
--range-diff=. For some reason the range-diff also includes
the notes even though I did not give --notes to format-patch)

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.


Phillip Wood (9):
  diff: document --no-color-moved
  Use "whitespace" consistently
  diff: allow --no-color-moved-ws
  diff --color-moved-ws: demonstrate false positives
  diff --color-moved-ws: fix false positives
  diff --color-moved=zebra: be stricter with color alternation
  diff --color-moved-ws: optimize allow-indentation-change
  diff --color-moved-ws: modify allow-indentation-change
  diff --color-moved-ws: handle blank lines

 Documentation/diff-options.txt |  15 ++-
 Documentation/git-cat-file.txt |   8 +-
 diff.c | 219 +
 t/t4015-diff-whitespace.sh |  99 ++-
 4 files changed, 255 insertions(+), 86 deletions(-)

Range-diff against v1:
1:  ae58ae4f29 ! 1:  4939ee371d diff: use whitespace consistently
@@ -1,9 +1,10 @@
 Author: Phillip Wood 
 
-diff: use whitespace consistently
+Use "whitespace" consistently
 
-Most of the documentation uses 'whitespace' rather than 'white space'
-or 'white spaces' convert to latter two to the former for consistency.
+Most of the messages and documentation use 'whitespace' rather than
+'white space' or 'white spaces' convert to latter two to the former for
+consistency.
 
     Signed-off-by: Phillip Wood 
 
@@ -29,6 +30,39 @@
whitespace is the same per line. This is incompatible with the
other modes.
 
+ diff --git a/Documentation/git-cat-file.txt 
b/Documentation/git-cat-file.txt
+ --- a/Documentation/git-cat-file.txt
+ +++ b/Documentation/git-cat-file.txt
+@@
+ stdin, and the SHA-1, type, and size of each object is printed on stdout. 
The
+ output format can be overridden using the optional `` argument. If
+ either `--textconv` or `--filters` was specified, the input is expected to
+-list the object names followed by the path name, separated by a single 
white
+-space, so that the appropriate drivers can be determined.
++list the object names followed by the path name, separated by a single
++whitespace, so that the appropriate drivers can be determined.
+ 
+ OPTIONS
+ ---
+@@
+   Print object information and contents for each object provided
+   on stdin.  May not be combined with any other options or arguments
+   except `--textconv` or `--filters`, in which case the input lines
+-  also need to specify the path, separated by white space.  See the
++  also need to specify the path, separated by whitespace.  See the
+   section `BATCH OUTPUT` below for details.
+ 
+ --batch-check::
+ --batch-check=::
+   Print object information for each object provided on stdin.  May
+   not be combined with any other options or arguments except
+   `--textconv` or `--filters`, in which case the input lines also
+-  need to specify the path, separated by white space.  See the
++  need to specify the path, separated by whitespace.  See the
+   section `BATCH OUTPUT` below for details.
+ 
+ --batch-all-objects::
+
  diff --git a/diff.c b/diff.c
  --- a/diff.c
  +++ b/diff.c
2:  7072bc6211 = 2:  204c7fea9d diff: allow --no-color-moved-ws
3:  ce3ad19eea = 3:  542b79b215 diff --color-moved-ws: demonstrate false 
positives
4:  700e0b61e7 = 4:  4ffb5c4122 diff --color-moved-ws: fix false positives
5:  9ecd8159a7 = 5:  a3a84f90c5 diff --color-moved=zebra: be stricter with 
color alternation
6:  1b1158b1ca = 6:  f94f2e0bae diff --color-moved-ws: optimize 
allow-indentation-change
7:  d8a362be6a ! 7:  fe8eb9cdbc diff --color-moved-ws: modify 
allow-indentation-change
@@ -17,7 +17,7 @@
 
 This commit changes the way the indentation is handled to track the
 visual size of the indentation rather than the characters in the
-indentation. This has they benefit that any whitespace errors do not
+indentation. This has the benefit that any whitespace errors do not
 interfer with the move detection (the whitespace errors will still be
 highlighted according to --ws-error-highlight). During the discussion
 of this feature there were concerns about the correct detection of
@@ -30,7 +30,7 @@
 they are uncolored.
 
     Signed-off-by: Phillip Wood 
  

[PATCH v2 5/9] diff --color-moved-ws: fix false positives

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are colored as moved even though they are different.

This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.

Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.

Signed-off-by: Phillip Wood 
---
 diff.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9b9811988b..53a7ab5aca 100644
--- a/diff.c
+++ b/diff.c
@@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o,
continue;
}
 
-   l->flags |= DIFF_SYMBOL_MOVED_LINE;
-
-   if (o->color_moved == COLOR_MOVED_PLAIN)
+   if (o->color_moved == COLOR_MOVED_PLAIN) {
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
+   }
 
if (o->color_moved_ws_handling &
COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o,
block_length = 0;
}
 
-   block_length++;
+   if (pmb_nr) {
+   block_length++;
 
-   if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
-   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
+   if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
+   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   }
}
adjust_last_block(o, n, block_length);
 
-- 
2.19.1.1690.g258b440b18



[PATCH v2 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-23 Thread Phillip Wood
From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has the benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.

Signed-off-by: Phillip Wood 
---
 diff.c | 130 +
 t/t4015-diff-whitespace.sh |  56 
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..148503e49c 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
const char *line;
int len;
int flags;
+   int indent_off;   /* Offset to first non-whitespace character */
+   int indent_width; /* The visual width of the indentation */
enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-   char *string;
-   unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
struct moved_entry *match;
-   struct ws_delta wsd;
+   int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-   FREE_AND_NULL(b->wsd.string);
-   b->match = NULL;
+   memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+   unsigned int off = 0;
+   int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+   const char *s = es->line;
+   const int len = es->len;
+
+   /* skip any \v \f \r at start of indentation */
+   while (s[off] == '\f' || s[off] == '\v' ||
+  (s[off] == '\r' && off < len - 1))
+   off++;
+
+   /* calculate the visual width of indentation */
+   while(1) {
+   if (s[off] == ' ') {
+   width++;
+   off++;
+   } else if (s[off] == '\t') {
+   width += tab_width - (width % tab_width);
+   while (s[++off] == '\t')
+   width += tab_width;
+   } else {
+   break;
+   }
+   }
+
+   es->indent_off = off;
+   es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-const struct emitted_diff_symbol *b,
-struct ws_delta *out)
+   const struct emitted_diff_symbol *b,
+   int *out)
 {
-   const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-   const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-   int d = longer->len - shorter->len;
+   int a_len = a->len,
+   b_len = b->len,
+   a_off = a->indent_off,
+   a_width = a->indent_width,
+   b_off = b->indent_off,
+   b_width = b->indent_width;
+   int delta;
 
-   if (strncmp(longer->line + d, shorter->line, shorter->len))
+   if (a->s == DIFF_SYMBOL_PLUS)
+   delta = a_width - b_width;
+   else
+   delta = b_width - a_width;
+
+   if (a_len - a_off != b_len - b_off ||
+   memcmp(a->li

Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines

2018-11-21 Thread Phillip Wood

On 20/11/2018 18:05, Stefan Beller wrote:

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:


From: Phillip Wood 

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

   git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood 
---

Notes:
 Changes since rfc:
  - Split these changes into a separate commit.
  - Detect blank lines when processing the indentation rather than
parsing each line twice.
  - Tweaked the test to make it harder as suggested by Stefan.
  - Added timing data to the commit message.

  diff.c | 34 ---
  t/t4015-diff-whitespace.sh | 41 ++
  2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 89559293e7..072b5bced6 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
 memset(b, 0, sizeof(*b));
  }

+#define INDENT_BLANKLINE INT_MIN


Answering my question from the previous patch:
This is why we need to keep the indents signed.

This patch looks quite nice to read along.

The whole series looks good to me.


Thanks


Do we need to update the docs in any way?


I'm not sure, at the moment it does not make any promises about the 
exact behavior of --color-moved-ws=allow-indentation-change, we could 
change it to be more explicit but I'm not sure it's worth it.


Thanks for looking over these patches, I'll post a reroll soon based on 
your comments.


Phillip


Thanks,
Stefan





Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-21 Thread Phillip Wood
Hi Stefan

On 20/11/2018 20:24, Stefan Xenos wrote:
>> If a merge has been cherry-picked we cannot update it as we don't record
>> which parent was used for the pick, however it is probably not a problem
>> in practice - I think it is unusual to amend merges.
> 
> I've read and reread that sentence several times and don't fully
> understand it. Could you elaborate?

Sorry if I wasn't very clear. To cherry-pick (or revert) a merge commit
one has to specify a parent of the commit being picked with -m for
cherry-pick to use as the merge base for the three way merge that
creates the new commit. If the original merge is updated then evolve
wont know which parent to use as the merge base when evolving the
cherry-picked version of the merge as the parent is not recorded in the
meta commit.

> It sounds scary, though. With the evolve command, amending merges will
> need to be supported.

Evolving a merge should be fine, I was just referring to merges that
have been cherry-picked.


Best Wishes

Phillip

(Thanks for your reply to my other message, I'm still digesting it at
the moment, once I've done that and found the references to mercurial
using commit obsolescence information in rebase and pull I'll reply.)

> If you create a merge and then amend one of its
> parent commits, the evolve command will need to rebase the merge and
> point one or both parents to the replacement instead.
> 
>   - Stefan
> On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood  
> wrote:
>>
>> On 15/11/2018 00:55, sxe...@google.com wrote:
>>> From: Stefan Xenos 
>>>
>>> +Obsolescence across cherry-picks
>>> +
>>> +By default the evolve command will treat cherry-picks and squash merges as 
>>> being
>>> +completely separate from the original. Further amendments to the original 
>>> commit
>>> +will have no effect on the cherry-picked copy. However, this behavior may 
>>> not be
>>> +desirable in all circumstances.
>>> +
>>> +The evolve command may at some point support an option to look for cases 
>>> where
>>> +the source of a cherry-pick or squash merge has itself been amended, and
>>> +automatically apply that same change to the cherry-picked copy. In such 
>>> cases,
>>> +it would traverse origin edges rather than ignoring them, and would treat a
>>> +commit with origin edges as being obsolete if any of its origins were 
>>> obsolete.
>>
>> If a merge has been cherry-picked we cannot update it as we don't record
>> which parent was used for the pick, however it is probably not a problem
>> in practice - I think it is unusual to amend merges.
>>
>> Best Wishes
>>
>> Phillip



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 

+Obsolescence across cherry-picks
+
+By default the evolve command will treat cherry-picks and squash merges as 
being
+completely separate from the original. Further amendments to the original 
commit
+will have no effect on the cherry-picked copy. However, this behavior may not 
be
+desirable in all circumstances.
+
+The evolve command may at some point support an option to look for cases where
+the source of a cherry-pick or squash merge has itself been amended, and
+automatically apply that same change to the cherry-picked copy. In such cases,
+it would traverse origin edges rather than ignoring them, and would treat a
+commit with origin edges as being obsolete if any of its origins were obsolete.


If a merge has been cherry-picked we cannot update it as we don't record 
which parent was used for the pick, however it is probably not a problem 
in practice - I think it is unusual to amend merges.


Best Wishes

Phillip


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 20/11/2018 12:18, Phillip Wood wrote:

On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 
+Divergence
+--
+From the user’s perspective, two changes are divergent if they both 
ask for
+different replacements to the same commit. More precisely, a target 
commit is
+considered divergent if there is more than one commit at the head of 
a change in
+refs/metas that leads to the target commit via an unbroken chain of 
“obsolete”

+edges.
+
+Much like a merge conflict, divergence is a situation that requires user
+intervention to resolve. The evolve command will stop when it encounters
+divergence and prompt the user to resolve the problem. Users can 
solve the

+problem in several ways:
+
+- Discard one of the changes (by deleting its change branch).
+- Merge the two changes (producing a single change branch).


I assume this wont create merge commits for the actual commits though, 
just merge the meta branches and create some new commits that are each 
the result of something like 'merge-recursive original-commit 
our-new-version their-new-version'


That should have been

merge-recursive original-commit^ -- our-new-version their-new-version

Best Wishes

Phillip



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

Hi Stefan

Thanks for working on this, I think it could be a really useful addition 
to git. I'd echo Gábor's comments about making commands descriptive and 
easy for the user to find, git has aliases, accepts abbreviated option 
names and has shell completion so I don't think typing is really such a 
problem. From your reply it looks like you've taken those concerns on 
board. I've got some more comments below.


On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.

Signed-off-by: Stefan Xenos 
---
  Documentation/technical/evolve.txt | 885 +
  1 file changed, 885 insertions(+)
  create mode 100644 Documentation/technical/evolve.txt

diff --git a/Documentation/technical/evolve.txt 
b/Documentation/technical/evolve.txt
new file mode 100644
index 00..88470eada3
--- /dev/null
+++ b/Documentation/technical/evolve.txt
@@ -0,0 +1,885 @@
+Git Obsolescence Graph
+==
+
+Objective
+-
+Track the edits to a commit over time in an obsolescence graph.
+
+Background
+--
+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more 
feedback
+arrives on one of the others. What do you do?
+
+The evolve command is a convenient way to work with chains of commits that are
+under review. Whenever you rebase or amend a commit, the repository remembers
+that the old commit is obsolete and has been replaced by the new one. Then, at
+some point in the future, you can run "git evolve" and the correct sequence of
+rebases will occur in the correct order such that no commit has an obsolete
+parent.
+
+Part of making the "evolve" command work involves tracking the edits to a 
commit
+over time, which is why we need an obsolescence graph. However, the 
obsolescence
+graph will also bring other benefits:
+
+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).
+- It will be possible to quickly locate and list all the changes the user
+  currently has in progress.
+- It can be used as part of other high-level commands that combine or split
+  changes.
+- It can be used to decorate commits (in git log, gitk, etc) that are either
+  obsolete or are the tip of a work in progress.
+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.
+- It could be used to correctly rebase local changes and other local branches
+  after running git-filter-branch.
+- It can replace the change-id footer used by gerrit.
+
+Similar technologies
+
+There are some other technologies that address the same end-user problem.
+
+Rebase -i can be used to solve the same problem, but users can't easily switch
+tasks midway through an interactive rebase or have more than one interactive
+rebase going on at the same time. It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased
+and won't let you collaborate with others on resolving a complicated 
interactive
+rebase. You can think of rebase -i as a top-down approach and the evolve 
command
+as the bottom-up approach to the same problem.
+
+Several patch queue managers have been built on top of git (such as topgit,
+stgit, and quilt). They address the same user need. However they also rely on
+state managed outside git that needs to be kept in sync. Such state can be
+easily damaged when running a git native command that is unaware of the patch
+queue. They also typically require an explicit initialization step to be done 
by
+the user which creates workflow problems.
+
+Replacements (refs/replace) are superficially similar to obsolescences in that
+they describe that one commit should be replaced by another. However, they
+differ in both how they are created and how they are intended to be used.
+Obsolescences are created automatically by the commands a user runs, and they
+describe the user’s intent to perform a future rebase. Obsolete commits still
+appear in branches, logs, etc like normal commits (possibly with an extra
+decoration that marks them as obsolete). Replacements are typically created
+explicitly by the user, they are meant to be kept around for a long time, and
+they describe a replacement to be applied at read-time rather than as the input
+to a future operation. When a replaced commit is queried, it is typically 
hidden
+and swapped out with its replacement as though the replacement has already
+occurred.
+
+Goals

Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-17 Thread Phillip Wood

Hi Stefan

On 16/11/2018 21:47, Stefan Beller wrote:

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:


From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not


s/they/the/


Thanks, well spotted




interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
 lines each line was colored as a different block, since that commit
 they are uncolored.

Signed-off-by: Phillip Wood 
---

Notes:
 Changes since rfc:
  - It now replaces the existing implementation rather than adding a new
mode.
  - The indentation deltas are now calculated once for each line and
cached.
  - Optimized the whitespace delta comparison to compare string lengths
before comparing the actual strings.
  - Modified the calculation of tabs as suggested by Stefan.
  - Split out the blank line handling into a separate commit as suggest
by Stefan.
  - Fixed some comments pointed out by Stefan.

  diff.c | 130 +
  t/t4015-diff-whitespace.sh |  56 
  2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
 const char *line;
 int len;
 int flags;
+   int indent_off;
+   int indent_width;


So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.


I did wonder about that, but decided the increase was small compared to 
all the strings that are copied when creating the emitted_diff_symbols. 
If we want to save memory then we should stop struct 
emitted_diff_symbol() from carrying a copy of all the strings.



Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)


Yes a comment would make sense. I don't think I have a strong preference 
for signed/unsigned, I can change it if you want.


Thanks for looking at these so promptly

Best Wishes

Phillip

+static void fill_es_indent_data(struct emitted_diff_symbol *es)
[...]



+   if (o->color_moved_ws_handling &
+   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+   fill_es_indent_data(>emitted_symbols->buf[n]);


Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.


+++ b/t/t4015-diff-whitespace.sh
@@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta 
incompatible with other space opti
 test_i18ngrep allow-indentation-change err
  '

+test_expect_success 'compare mixed whitespace delta across moved blocks' '


Looks good,

Thanks!
Stefan





Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-17 Thread Phillip Wood

On 16/11/2018 20:40, Stefan Beller wrote:

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:


From: Phillip Wood 

When running

   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.


Up to here the series looks good and I think we could take it
as a preparatory self-standing series.


Thanks for looking at these, I think it makes sense to split the series 
here, the commit message for this patch may want tweaking slightly if we 
do. (I did wonder about splitting it in two when I submitted it but took 
the easy way out.)


Best Wishes

Phillip


I'll read on.
Thanks,
Stefan





[PATCH v1 1/9] diff: document --no-color-moved

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Add documentation for --no-color-moved.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..151690f814 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,6 +293,10 @@ dimmed-zebra::
`dimmed_zebra` is a deprecated synonym.
 --
 
+--no-color-moved::
+   Turn off move detection. This can be used to override configuration
+   settings. It is the same as `--color-moved=no`.
+
 --color-moved-ws=::
This configures how white spaces are ignored when performing the
move detection for `--color-moved`.
-- 
2.19.1



[PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can highlight lines
that have internal whitespace changes rather than indentation
changes. For example in commit 1a07e59c3e ("Update messages in
preparation for i18n", 2018-07-21) the lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are highlighted as moved when they should not be. Modify an existing
test to show the problem that will be fixed in the next commit.

Signed-off-by: Phillip Wood 
---
 t/t4015-diff-whitespace.sh | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index a9fb226c5a..eee81a1987 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white 
spaces' '
test_cmp expected actual
 '
 
-test_expect_success 'compare whitespace delta across moved blocks' '
+test_expect_failure 'compare whitespace delta across moved blocks' '
 
git reset --hard &&
q_to_tab <<-\EOF >text.txt &&
@@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQQthat has similar lines
QQQto previous blocks, but with different indent
QQQYetQAnotherQoutlierQ
+   QLine with internal w h i t e s p a c e change
EOF
 
git add text.txt &&
@@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQthat has similar lines
QQto previous blocks, but with different indent
QQYetQAnotherQoutlier
+   QLine with internal whitespace change
EOF
 
git diff --color --color-moved 
--color-moved-ws=allow-indentation-change >actual.raw &&
@@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
diff --git a/text.txt b/text.txt
--- a/text.txt
+++ b/text.txt
-   @@ -1,14 +1,14 @@
+   @@ -1,15 +1,15 @@
-QIndented
-QText across
-Qsome lines
@@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
-QQQthat has similar lines
-QQQto previous blocks, but with different 
indent
-QQQYetQAnotherQoutlierQ
+   -QLine with internal w h i t e s p a c e change
+QQIndented
+QQText across
+QQsome lines
@@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
+QQthat has similar lines
+QQto previous blocks, but with 
different indent
+QQYetQAnotherQoutlier
+   +QLine with internal whitespace 
change
EOF
 
test_cmp expected actual
-- 
2.19.1



[PATCH v1 3/9] diff: allow --no-color-moved-ws

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 7 +++
 diff.c | 6 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 57a2f4cb7a..e1744fa80d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -306,6 +306,8 @@ endif::git-diff[]
These modes can be given as a comma separated list:
 +
 --
+no::
+   Do not ignore whitespace when performing move detection.
 ignore-space-at-eol::
Ignore changes in whitespace at EOL.
 ignore-space-change::
@@ -322,6 +324,11 @@ allow-indentation-change::
other modes.
 --
 
+--no-color-moved-ws::
+   Do not ignore whitespace when performing move detection. This can be
+   used to override configuration settings. It is the same as
+   `--color-moved-ws=no`.
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 78cd3958f4..9b9811988b 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg)
strbuf_addstr(, i->string);
strbuf_trim();
 
-   if (!strcmp(sb.buf, "ignore-space-change"))
+   if (!strcmp(sb.buf, "no"))
+   ret = 0;
+   else if (!strcmp(sb.buf, "ignore-space-change"))
ret |= XDF_IGNORE_WHITESPACE_CHANGE;
else if (!strcmp(sb.buf, "ignore-space-at-eol"))
ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
@@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options,
if (cm < 0)
die("bad --color-moved argument: %s", arg);
options->color_moved = cm;
+   } else if (!strcmp(arg, "--no-color-moved-ws")) {
+   options->color_moved_ws_handling = 0;
} else if (skip_prefix(arg, "--color-moved-ws=", )) {
options->color_moved_ws_handling = parse_color_moved_ws(arg);
} else if (skip_to_optional_arg_default(arg, "--color-words", 
>word_regex, NULL)) {
-- 
2.19.1



[PATCH v1 2/9] diff: use whitespace consistently

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Most of the documentation uses 'whitespace' rather than 'white space'
or 'white spaces' convert to latter two to the former for consistency.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 ++--
 diff.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 151690f814..57a2f4cb7a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -298,7 +298,7 @@ dimmed-zebra::
settings. It is the same as `--color-moved=no`.
 
 --color-moved-ws=::
-   This configures how white spaces are ignored when performing the
+   This configures how whitespace is ignored when performing the
move detection for `--color-moved`.
 ifdef::git-diff[]
It can be set by the `diff.colorMovedWS` configuration setting.
@@ -316,7 +316,7 @@ ignore-all-space::
Ignore whitespace when comparing lines. This ignores differences
even if one line has whitespace where the other line has none.
 allow-indentation-change::
-   Initially ignore any white spaces in the move detection, then
+   Initially ignore any whitespace in the move detection, then
group the moved code blocks only into a block if the change in
whitespace is the same per line. This is incompatible with the
other modes.
diff --git a/diff.c b/diff.c
index c29b1cce14..78cd3958f4 100644
--- a/diff.c
+++ b/diff.c
@@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg)
 
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
(ret & XDF_WHITESPACE_FLAGS))
-   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other whitespace modes"));
 
string_list_clear(, 0);
 
-- 
2.19.1



[PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since rfc:
 - It now replaces the existing implementation rather than adding a new
   mode.
 - The indentation deltas are now calculated once for each line and
   cached.
 - Optimized the whitespace delta comparison to compare string lengths
   before comparing the actual strings.
 - Modified the calculation of tabs as suggested by Stefan.
 - Split out the blank line handling into a separate commit as suggest
   by Stefan.
 - Fixed some comments pointed out by Stefan.

 diff.c | 130 +
 t/t4015-diff-whitespace.sh |  56 
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
const char *line;
int len;
int flags;
+   int indent_off;
+   int indent_width;
enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-   char *string;
-   unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
struct moved_entry *match;
-   struct ws_delta wsd;
+   int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-   FREE_AND_NULL(b->wsd.string);
-   b->match = NULL;
+   memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+   unsigned int off = 0;
+   int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+   const char *s = es->line;
+   const int len = es->len;
+
+   /* skip any \v \f \r at start of indentation */
+   while (s[off] == '\f' || s[off] == '\v' ||
+  (s[off] == '\r' && off < len - 1))
+   off++;
+
+   /* calculate the visual width of indentation */
+   while(1) {
+   if (s[off] == ' ') {
+   width++;
+   off++;
+   } else if (s[off] == '\t') {
+   width += tab_width - (width % tab_width);
+   while (s[++off] == '\t')
+   width += tab_width;
+   } else {
+   break;
+   }
+   }
+
+   es->indent_off = off;
+   es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-const struct emitted_diff_symbol *b,
-struct ws_delta *out)
+   const struct emitted_diff_symbol *b,
+   int *out)
 {
-   const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-   const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-   int d = longer->len - shorter->len;
+   int a_len = a->len,
+   b_len = b->len,
+   a_off = a->inden

[PATCH v1 0/9] diff --color-moved-ws fixes and enhancment

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.

Since the rfc this series has grown a few fixes at the beginning. The
implementation has been reworked, the last two patches correspond to a
heavily reworked version the last patch of the rfc version, all the
other patches are new.

Phillip Wood (9):
  diff: document --no-color-moved
  diff: use whitespace consistently
  diff: allow --no-color-moved-ws
  diff --color-moved-ws: demonstrate false positives
  diff --color-moved-ws: fix false positives
  diff --color-moved=zebra: be stricter with color alternation
  diff --color-moved-ws: optimize allow-indentation-change
  diff --color-moved-ws: modify allow-indentation-change
  diff --color-moved-ws: handle blank lines

 Documentation/diff-options.txt |  15 ++-
 diff.c | 219 +
 t/t4015-diff-whitespace.sh |  99 ++-
 3 files changed, 251 insertions(+), 82 deletions(-)

-- 
2.19.1



[PATCH v1 5/9] diff --color-moved-ws: fix false positives

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are colored as moved even though they are different.

This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.

Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.

Signed-off-by: Phillip Wood 
---
 diff.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9b9811988b..53a7ab5aca 100644
--- a/diff.c
+++ b/diff.c
@@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o,
continue;
}
 
-   l->flags |= DIFF_SYMBOL_MOVED_LINE;
-
-   if (o->color_moved == COLOR_MOVED_PLAIN)
+   if (o->color_moved == COLOR_MOVED_PLAIN) {
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
+   }
 
if (o->color_moved_ws_handling &
COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o,
block_length = 0;
}
 
-   block_length++;
+   if (pmb_nr) {
+   block_length++;
 
-   if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
-   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
+   if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
+   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   }
}
adjust_last_block(o, n, block_length);
 
-- 
2.19.1



[PATCH v1 9/9] diff --color-moved-ws: handle blank lines

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

  git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since rfc:
 - Split these changes into a separate commit.
 - Detect blank lines when processing the indentation rather than
   parsing each line twice.
 - Tweaked the test to make it harder as suggested by Stefan.
 - Added timing data to the commit message.

 diff.c | 34 ---
 t/t4015-diff-whitespace.sh | 41 ++
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 89559293e7..072b5bced6 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
memset(b, 0, sizeof(*b));
 }
 
+#define INDENT_BLANKLINE INT_MIN
+
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
 {
-   unsigned int off = 0;
+   unsigned int off = 0, i;
int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
const char *s = es->line;
const int len = es->len;
@@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol 
*es)
}
}
 
-   es->indent_off = off;
-   es->indent_width = width;
+   /* check if this line is blank */
+   for (i = off; i < len; i++)
+   if (!isspace(s[i]))
+   break;
+
+   if (i == len) {
+   es->indent_width = INDENT_BLANKLINE;
+   es->indent_off = len;
+   } else {
+   es->indent_off = off;
+   es->indent_width = width;
+   }
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
@@ -834,6 +846,11 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
b_width = b->indent_width;
int delta;
 
+   if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
+   *out = INDENT_BLANKLINE;
+   return 1;
+   }
+
if (a->s == DIFF_SYMBOL_PLUS)
delta = a_width - b_width;
else
@@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
if (al != bl)
return 1;
 
+   /* If 'l' and 'cur' are both blank then they match. */
+   if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+   return 0;
+
/*
 * The indent changes of the block are known and stored in pmb->wsd;
 * however we need to check if the indent changes of the current line
@@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
else
delta = c_width - a_width;
 
+   /*
+* If the previous lines of this block were all blank then set its
+* whitespace delta.
+*/
+   if (pmb->wsd == INDENT_BLANKLINE)
+   pmb->wsd = delta;
+
return !(delta == pmb->wsd && al - a_off == cl - c_off &&
 !memcmp(a, b, al) && !
 memcmp(a + a_off, c + c_off, al - a_off));
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e023839ba6..9d6f88b07f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta 
incompatible with other space opti
test_i18ngrep allow-indentation-change err
 '
 
+EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
git reset --hard &&
tr Q_ "\t " <<-EOF >text.txt &&
+   ${EMPTY}
+   too short without
+   ${EMPTY}
+   ___being grouped across blank line
+   ${EMPTY}
+   context
+   lines
+   to
+   anchor
Indented text to
_Qbe further indented by four spaces across
Qseveral lines
@@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta 
across moved blocks' '
git commit -m "add text.txt" &&
 
tr Q_ "\t " <<-EOF >text.txt &&
+   context
+   lines
+   to
+   anchor
QIndented text to
QQbe further indented by four spaces across
Qseveral lines
+   ${EMPTY}
+   QQtoo short without
+   ${EMPTY}
+   Q___

[PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.

Fix this by only using the alternate colors for adjacent moved blocks.

Signed-off-by: Phillip Wood 
---

Notes:
An alternative would be to always alternate the color of blocks whether
are not they are adjacent to each other.

 diff.c | 27 +++
 t/t4015-diff-whitespace.sh |  6 +++---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 53a7ab5aca..8c08dd68df 100644
--- a/diff.c
+++ b/diff.c
@@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct 
moved_block *pmb,
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  *
+ * Returns 0 if the last block is empty or is unset by this function, non zero
+ * otherwise.
+ *
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
-static void adjust_last_block(struct diff_options *o, int n, int block_length)
+static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
int i, alnum_count = 0;
if (o->color_moved == COLOR_MOVED_PLAIN)
-   return;
+   return block_length;
for (i = 1; i < block_length + 1; i++) {
const char *c = o->emitted_symbols->buf[n - i].line;
for (; *c; c++) {
if (!isalnum(*c))
continue;
alnum_count++;
if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
-   return;
+   return 1;
}
}
for (i = 1; i < block_length + 1; i++)
o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+   return 0;
 }
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
@@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o,
 {
struct moved_block *pmb = NULL; /* potentially moved blocks */
int pmb_nr = 0, pmb_alloc = 0;
-   int n, flipped_block = 1, block_length = 0;
+   int n, flipped_block = 0, block_length = 0;
 
 
for (n = 0; n < o->emitted_symbols->nr; n++) {
struct hashmap *hm = NULL;
struct moved_entry *key;
struct moved_entry *match = NULL;
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
+   enum diff_symbol last_symbol = 0;
 
switch (l->s) {
case DIFF_SYMBOL_PLUS:
@@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o,
free(key);
break;
default:
-   flipped_block = 1;
+   flipped_block = 0;
}
 
if (!match) {
@@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o,
moved_block_clear([i]);
pmb_nr = 0;
block_length = 0;
+   flipped_block = 0;
+   last_symbol = l->s;
continue;
}
 
if (o->color_moved == COLOR_MOVED_PLAIN) {
+   last_symbol = l->s;
l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
}
@@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o,
}
}
 
-   flipped_block = (flipped_block + 1) % 2;
+   if (adjust_last_block(o, n, block_length) &&
+   pmb_nr && last_symbol != l->s)
+   flipped_block = (flipped_block + 1) % 2;
+   else
+   flipped_block = 0;
 
-   adjust_last_block(o, n, block_length);
block_length = 0;
}
 
if (pmb_nr) {
block_length++;
-
l->flags |= DIFF_SYMBOL_MOVED_LINE;
if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
}
+   last_symbol = l->s;
}
adjust_last_block(o, n, block

[PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
-   int al = cur->es->len, cl = l->len;
+   int al = cur->es->len, bl = match->es->len, cl = l->len;
const char *a = cur->es->line,
   *b = match->es->line,
   *c = l->line;
-
+   const char *orig_a = a;
int wslen;
 
/*
-* We need to check if 'cur' is equal to 'match'.
-* As those are from the same (+/-) side, we do not need to adjust for
-* indent changes. However these were found using fuzzy matching
-* so we do have to check if they are equal.
+* We need to check if 'cur' is equal to 'match'.  As those
+* are from the same (+/-) side, we do not need to adjust for
+* indent changes. However these were found using fuzzy
+* matching so we do have to check if they are equal. Here we
+* just check the lengths. We delay calling memcmp() to check
+* the contents until later as if the length comparison for a
+* and c fails we can avoid the call all together.
 */
-   if (strcmp(a, b))
+   if (al != bl)
return 1;
 
if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (al != cl || memcmp(a, c, al))
+   if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.1



Re: [PATCH v2 2/2] rebase: validate -C and --whitespace= parameters early

2018-11-14 Thread Phillip Wood

Hi Johannes

Thanks for doing this, I think this patch is good. I've not checked the 
first patch as I think it is the same as before judging from the 
covering letter.


Best Wishes

Phillip

On 14/11/2018 16:25, Johannes Schindelin via GitGitGadget wrote:

From: Johannes Schindelin 

It is a good idea to error out early upon seeing, say, `-Cbad`, rather
than starting the rebase only to have the `--am` backend complain later.

Let's do this.

The only options accepting parameters which we pass through to `git am`
(which may, or may not, forward them to `git apply`) are `-C` and
`--whitespace`. The other options we pass through do not accept
parameters, so we do not have to validate them here.

Signed-off-by: Johannes Schindelin 
---
  builtin/rebase.c  | 12 +++-
  t/t3406-rebase-message.sh |  7 +++
  2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 96ffa80b71..571cf899d5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1064,12 +1064,22 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
}
  
  	for (i = 0; i < options.git_am_opts.argc; i++) {

-   const char *option = options.git_am_opts.argv[i];
+   const char *option = options.git_am_opts.argv[i], *p;
if (!strcmp(option, "--committer-date-is-author-date") ||
!strcmp(option, "--ignore-date") ||
!strcmp(option, "--whitespace=fix") ||
!strcmp(option, "--whitespace=strip"))
options.flags |= REBASE_FORCE;
+   else if (skip_prefix(option, "-C", )) {
+   while (*p)
+   if (!isdigit(*(p++)))
+   die(_("switch `C' expects a "
+ "numerical value"));
+   } else if (skip_prefix(option, "--whitespace=", )) {
+   if (*p && strcmp(p, "warn") && strcmp(p, "nowarn") &&
+   strcmp(p, "error") && strcmp(p, "error-all"))
+   die("Invalid whitespace option: '%s'", p);
+   }
}
  
  	if (!(options.flags & REBASE_NO_QUIET))

diff --git a/t/t3406-rebase-message.sh b/t/t3406-rebase-message.sh
index 0392e36d23..2c79eed4fe 100755
--- a/t/t3406-rebase-message.sh
+++ b/t/t3406-rebase-message.sh
@@ -84,4 +84,11 @@ test_expect_success 'rebase --onto outputs the invalid ref' '
test_i18ngrep "invalid-ref" err
  '
  
+test_expect_success 'error out early upon -C or --whitespace=' '

+   test_must_fail git rebase -Cnot-a-number HEAD 2>err &&
+   test_i18ngrep "numerical value" err &&
+   test_must_fail git rebase --whitespace=bad HEAD 2>err &&
+   test_i18ngrep "Invalid whitespace option" err
+'
+
  test_done





Re: [PATCH 1/1] rebase: really just passthru the `git am` options

2018-11-13 Thread Phillip Wood

Hi Johannes

On 13/11/2018 19:21, Johannes Schindelin wrote:

Hi Phillip,

On Tue, 13 Nov 2018, Phillip Wood wrote:


Thanks for looking at this. Unfortunately using OPT_PASSTHRU_ARGV seems to
break the error reporting

Running
   bin/wrappers/git rebase --onto @ @^^ -Cbad

Gives
   git encountered an error while preparing the patches to replay
   these revisions:


67f673aa4a580b9e407b1ca505abf1f50510ec47...7c3e01a708856885e60bf4051586970e65dd326c

   As a result, git cannot rebase them.



git 2.19.1 gives

First, rewinding head to replay your work on top of it...
Applying: Ninth batch for 2.20
error: switch `C' expects a numerical value

So it has a clear message as to what the error is, this patch regresses 
that. It would be better if rebase detected the error before starting 
though.



If I do

   bin/wrappers/git rebase @^^ -Cbad

I get no error, it just tells me that it does not need to rebase (which is
true)


Hmm. Isn't this the same behavior as with the scripted version?


Ah you're right the script does not check if the option argument is valid.


Also: are
we sure that we want to allow options to come *after* the ``
argument?


Maybe not but the scripted version does. I'm not sure if that is a good 
idea or not.


Best Wishes

Phillip


Ciao,
Dscho


Best Wishes

Phillip


On 13/11/2018 12:38, Johannes Schindelin via GitGitGadget wrote:

From: Johannes Schindelin 

Currently, we parse the options intended for `git am` as if we wanted to
handle them in `git rebase`, and then reconstruct them painstakingly to
define the `git_am_opt` variable.

However, there is a much better way (that I was unaware of, at the time
when I mentored Pratik to implement these options): OPT_PASSTHRU_ARGV.
It is intended for exactly this use case, where command-line options
want to be parsed into a separate `argv_array`.

Let's use this feature.

Incidentally, this also allows us to address a bug discovered by Phillip
Wood, where the built-in rebase failed to understand that the `-C`
option takes an optional argument.

Signed-off-by: Johannes Schindelin 
---
   builtin/rebase.c | 98 +---
   1 file changed, 35 insertions(+), 63 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 0ee06aa363..96ffa80b71 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -87,7 +87,7 @@ struct rebase_options {
 REBASE_FORCE = 1<<3,
 REBASE_INTERACTIVE_EXPLICIT = 1<<4,
} flags;
-   struct strbuf git_am_opt;
+   struct argv_array git_am_opts;
const char *action;
int signoff;
int allow_rerere_autoupdate;
@@ -339,7 +339,7 @@ N_("Resolve all conflicts manually, mark them as
resolved with\n"
   static int run_specific_rebase(struct rebase_options *opts)
   {
const char *argv[] = { NULL, NULL };
-   struct strbuf script_snippet = STRBUF_INIT;
+   struct strbuf script_snippet = STRBUF_INIT, buf = STRBUF_INIT;
int status;
const char *backend, *backend_func;
   @@ -433,7 +433,9 @@ static int run_specific_rebase(struct rebase_options
*opts)
oid_to_hex(>restrict_revision->object.oid) : NULL);
add_var(_snippet, "GIT_QUIET",
opts->flags & REBASE_NO_QUIET ? "" : "t");
-   add_var(_snippet, "git_am_opt", opts->git_am_opt.buf);
+   sq_quote_argv_pretty(, opts->git_am_opts.argv);
+   add_var(_snippet, "git_am_opt", buf.buf);
+   strbuf_release();
add_var(_snippet, "verbose",
opts->flags & REBASE_VERBOSE ? "t" : "");
add_var(_snippet, "diffstat",
@@ -756,7 +758,7 @@ int cmd_rebase(int argc, const char **argv, const char
*prefix)
struct rebase_options options = {
 .type = REBASE_UNSPECIFIED,
 .flags = REBASE_NO_QUIET,
-   .git_am_opt = STRBUF_INIT,
+   .git_am_opts = ARGV_ARRAY_INIT,
 .allow_rerere_autoupdate  = -1,
 .allow_empty_message = 1,
 .git_format_patch_opt = STRBUF_INIT,
@@ -777,12 +779,7 @@ int cmd_rebase(int argc, const char **argv, const
char *prefix)
 ACTION_EDIT_TODO,
 ACTION_SHOW_CURRENT_PATCH,
} action = NO_ACTION;
-   int committer_date_is_author_date = 0;
-   int ignore_date = 0;
-   int ignore_whitespace = 0;
const char *gpg_sign = NULL;
-   int opt_c = -1;
-   struct string_list whitespace = STRING_LIST_INIT_NODUP;
struct string_list exec = STRING_LIST_INIT_NODUP;
const char *rebase_merges = NULL;
int fork_point = -1;
@@ -804,15 +801,20 @@ int cmd_rebase(int argc, const char **argv, const
char *prefix)
 {OPTION_NEGBIT, 'n', "no-stat", , NULL,
  N_("do not show diffstat of what changed upstream"),
  PARSE_OPT_NOARG, NULL, REBASE_DIFFSTAT },
-   OPT_BOOL(0, "ignore-whitespace", _whitespace,
-N_("passed to 'git 

Re: [PATCH 1/1] rebase: really just passthru the `git am` options

2018-11-13 Thread Phillip Wood

Hi Johannes

Thanks for looking at this. Unfortunately using OPT_PASSTHRU_ARGV seems 
to break the error reporting


Running
  bin/wrappers/git rebase --onto @ @^^ -Cbad

Gives
  git encountered an error while preparing the patches to replay
  these revisions:


67f673aa4a580b9e407b1ca505abf1f50510ec47...7c3e01a708856885e60bf4051586970e65dd326c

  As a result, git cannot rebase them.

If I do

  bin/wrappers/git rebase @^^ -Cbad

I get no error, it just tells me that it does not need to rebase (which 
is true)


Best Wishes

Phillip


On 13/11/2018 12:38, Johannes Schindelin via GitGitGadget wrote:

From: Johannes Schindelin 

Currently, we parse the options intended for `git am` as if we wanted to
handle them in `git rebase`, and then reconstruct them painstakingly to
define the `git_am_opt` variable.

However, there is a much better way (that I was unaware of, at the time
when I mentored Pratik to implement these options): OPT_PASSTHRU_ARGV.
It is intended for exactly this use case, where command-line options
want to be parsed into a separate `argv_array`.

Let's use this feature.

Incidentally, this also allows us to address a bug discovered by Phillip
Wood, where the built-in rebase failed to understand that the `-C`
option takes an optional argument.

Signed-off-by: Johannes Schindelin 
---
  builtin/rebase.c | 98 +---
  1 file changed, 35 insertions(+), 63 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 0ee06aa363..96ffa80b71 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -87,7 +87,7 @@ struct rebase_options {
REBASE_FORCE = 1<<3,
REBASE_INTERACTIVE_EXPLICIT = 1<<4,
} flags;
-   struct strbuf git_am_opt;
+   struct argv_array git_am_opts;
const char *action;
int signoff;
int allow_rerere_autoupdate;
@@ -339,7 +339,7 @@ N_("Resolve all conflicts manually, mark them as resolved 
with\n"
  static int run_specific_rebase(struct rebase_options *opts)
  {
const char *argv[] = { NULL, NULL };
-   struct strbuf script_snippet = STRBUF_INIT;
+   struct strbuf script_snippet = STRBUF_INIT, buf = STRBUF_INIT;
int status;
const char *backend, *backend_func;
  
@@ -433,7 +433,9 @@ static int run_specific_rebase(struct rebase_options *opts)

oid_to_hex(>restrict_revision->object.oid) : NULL);
add_var(_snippet, "GIT_QUIET",
opts->flags & REBASE_NO_QUIET ? "" : "t");
-   add_var(_snippet, "git_am_opt", opts->git_am_opt.buf);
+   sq_quote_argv_pretty(, opts->git_am_opts.argv);
+   add_var(_snippet, "git_am_opt", buf.buf);
+   strbuf_release();
add_var(_snippet, "verbose",
opts->flags & REBASE_VERBOSE ? "t" : "");
add_var(_snippet, "diffstat",
@@ -756,7 +758,7 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
struct rebase_options options = {
.type = REBASE_UNSPECIFIED,
.flags = REBASE_NO_QUIET,
-   .git_am_opt = STRBUF_INIT,
+   .git_am_opts = ARGV_ARRAY_INIT,
.allow_rerere_autoupdate  = -1,
.allow_empty_message = 1,
.git_format_patch_opt = STRBUF_INIT,
@@ -777,12 +779,7 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
ACTION_EDIT_TODO,
ACTION_SHOW_CURRENT_PATCH,
} action = NO_ACTION;
-   int committer_date_is_author_date = 0;
-   int ignore_date = 0;
-   int ignore_whitespace = 0;
const char *gpg_sign = NULL;
-   int opt_c = -1;
-   struct string_list whitespace = STRING_LIST_INIT_NODUP;
struct string_list exec = STRING_LIST_INIT_NODUP;
const char *rebase_merges = NULL;
int fork_point = -1;
@@ -804,15 +801,20 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
{OPTION_NEGBIT, 'n', "no-stat", , NULL,
N_("do not show diffstat of what changed upstream"),
PARSE_OPT_NOARG, NULL, REBASE_DIFFSTAT },
-   OPT_BOOL(0, "ignore-whitespace", _whitespace,
-N_("passed to 'git apply'")),
OPT_BOOL(0, "signoff", ,
 N_("add a Signed-off-by: line to each commit")),
-   OPT_BOOL(0, "committer-date-is-author-date",
-_date_is_author_date,
-N_("passed to 'git am'")),
-   OPT_BOOL(0, "ignore-date", _date,
-N_("passed to 'git am'")),
+   OPT_PASSTHRU_ARGV(0, "ignore-whitespace", _am_opts,
+ NULL, N_(&q

Regression: rebase -C1 fails in master

2018-11-12 Thread Phillip Wood
I've just tried running

bin-wrappers/git rebase -C1 @^

and I get

error: unknown switch `1'
usage: git rebase [-i] [options] [--exec ] [--onto ]
[] []
   or: git rebase [-i] [options] [--exec ] [--onto ]
--root []
   or: git rebase --continue | --abort | --skip | --edit-todo
...

bin-wrappers/git --version gives
git version 2.19.1.856.g8858448bb4

Unfortunately I've not got time it investigate properly at the moment.

Best Wishes

Phillip


Re: [PATCH v2 2/2] rebase: Implement --merge via git-rebase--interactive

2018-11-12 Thread Phillip Wood
Hi Elijah

It's good to see these patches progressing, I've just got a couple of
comments related to Johannes' points below

On 12/11/2018 16:21, Johannes Schindelin wrote:
> Hi Elijah,
> 
> On Wed, 7 Nov 2018, Elijah Newren wrote:
> 
>> Interactive rebases are implemented in terms of cherry-pick rather than
>> the merge-recursive builtin, but cherry-pick also calls into the recursive
>> merge machinery by default and can accept special merge strategies and/or
>> special strategy options.  As such, there really is not any need for
>> having both git-rebase--merge and git-rebase--interactive anymore.
>>
>> Delete git-rebase--merge.sh and have the --merge option be implemented
>> by the now built-in interactive machinery.
> 
> Okay.
> 
>> Note that this change fixes a few known test failures (see t3421).
> 
> Nice!
> 
>> testcase modification notes:
>>   t3406: --interactive and --merge had slightly different progress output
>>  while running; adjust a test to match
>>   t3420: these test precise output while running, but rebase--am,
>>  rebase--merge, and rebase--interactive all were built on very
>>  different commands (am, merge-recursive, cherry-pick), so the
>>  tests expected different output for each type.  Now we expect
>>  --merge and --interactive to have the same output.
>>   t3421: --interactive fixes some bugs in --merge!  Wahoo!
>>   t3425: topology linearization was inconsistent across flavors of rebase,
>>  as already noted in a TODO comment in the testcase.  This was not
>>  considered a bug before, so getting a different linearization due
>>  to switching out backends should not be considered a bug now.
> 
> Ideally, the test would be fixed, then. If it fails for other reasons than
> a real regression, it is not a very good regression test, is it.
> 
>>   t5407: different rebase types varied slightly in how many times checkout
>>  or commit or equivalents were called based on a quick comparison
>>  of this tests and previous ones which covered different rebase
>>  flavors.  I think this is just attributable to this difference.
> 
> This concerns me.
> 
> In bigger repositories (no, I am not talking about Linux kernel sized
> ones, I consider those small-ish) there are a ton of files, and checkout
> and commit (and even more so reset) sadly do not have a runtime complexity
> growing with the number of modified files, but with the number of tracked
> files (and some commands even with the number of worktree files).
> 
> So a larger number of commit/checkout operations makes me expect
> performance regressions.
> 
> In this light, could you elaborate a bit on the differences you see
> between rebase -i and rebase -m?
> 
>>   t9903: --merge uses the interactive backend so the prompt expected is
>>  now REBASE-i.
> 
> We should be able to make that test pass, still, by writing out a special
> file (e.g. $state_dir/opt_m) and testing for that. Users are oddly upset
> when their expectations are broken... (and I actually agree with them.)
> 
>> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
>> index 3407d835bd..35084f5681 100644
>> --- a/Documentation/git-rebase.txt
>> +++ b/Documentation/git-rebase.txt
>> @@ -504,15 +504,7 @@ See also INCOMPATIBLE OPTIONS below.
>>  INCOMPATIBLE OPTIONS
>>  
>>  
>> -git-rebase has many flags that are incompatible with each other,
>> -predominantly due to the fact that it has three different underlying
>> -implementations:
>> -
>> - * one based on linkgit:git-am[1] (the default)
>> - * one based on git-merge-recursive (merge backend)
>> - * one based on linkgit:git-cherry-pick[1] (interactive backend)
>> -
> 
> Could we retain this part, with `s/three/two/` and `/git-merge/d`? *Maybe*
> `s/interactive backend/interactive\/merge backend/`

I was hoping we could get rid of that, I'm not sure how useful it is to
users.

> 
>> -Flags only understood by the am backend:
>> +The following options:
>>  
>>   * --committer-date-is-author-date
>>   * --ignore-date
>> @@ -520,15 +512,12 @@ Flags only understood by the am backend:
>>   * --ignore-whitespace
>>   * -C
>>  
>> -Flags understood by both merge and interactive backends:
>> +are incompatible with the following options:
>>  
>>   * --merge
>>   * --strategy
>>   * --strategy-option
>>   * --allow-empty-message
>> -
>> -Flags only understood by the interactive backend:
>> -

It's nice to see this being simplified

>>   * --[no-]autosquash
>>   * --rebase-merges
>>   * --preserve-merges
>> @@ -539,7 +528,7 @@ Flags only understood by the interactive backend:
>>   * --edit-todo
>>   * --root when used in combination with --onto
>>  
>> -Other incompatible flag pairs:
>> +In addition, the following pairs of options are incompatible:
>>  
>>   * --preserve-merges and --interactive
>>   * --preserve-merges and --signoff
> 
> The rest of the diff is funny to read, but the post 

Re: What's cooking in git.git (Nov 2018, #03; Wed, 7)

2018-11-08 Thread Phillip Wood

On 07/11/2018 09:41, Junio C Hamano wrote:

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

 http://git-blame.blogspot.com/p/git-public-repositories.html

--

* pw/add-p-select (2018-07-26) 4 commits
  - add -p: optimize line selection for short hunks
  - add -p: allow line selection to be inverted
  - add -p: select modified lines correctly
  - add -p: select individual hunk lines

  "git add -p" interactive interface learned to let users choose
  individual added/removed lines to be used in the operation, instead
  of accepting or rejecting a whole hunk.

  Will discard.
  No further feedbacks on the topic for quite some time.


Unfortunately I've not found time to work on this recently and don't see 
myself doing so before the new year so I think it makes sense to drop 
it. Hopefully I can come up with something next year, it would be nice 
for users to avoid having to edit patches where possible.


Best Wishes

Phillip


Re: [PATCH/RFC] sequencer.c: record revert/cherry-pick commit with trailer lines

2018-11-04 Thread Phillip Wood

Hi Duy

On 04/11/2018 17:41, Duy Nguyen wrote:

On Sun, Nov 4, 2018 at 5:45 PM Phillip Wood  wrote:


On 04/11/2018 07:22, Nguyễn Thái Ngọc Duy wrote:

When a commit is reverted (or cherry-picked with -x) we add an English
sentence recording that commit id in the new commit message. Make
these real trailer lines instead so that they are more friendly to
parsers (especially "git interpret-trailers").

A reverted commit will have a new trailer

  Revert: 

Similarly a cherry-picked commit with -x will have

  Cherry-Pick: 


I think this is a good idea though I wonder if it will break someones
script that is looking for the messages generated by -x at the moment.


It will [1] but I still think it's worth the trouble. The script will
be less likely to break after, and you can use git-interpret-trailers
instead of plain grep.

[1] 
https://public-inbox.org/git/20181017143921.gr270...@devbig004.ftw2.facebook.com/


@@ -1758,16 +1757,10 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
   base_label = msg.label;
   next = parent;
   next_label = msg.parent_label;
- strbuf_addstr(, "Revert \"");
- strbuf_addstr(, msg.subject);
- strbuf_addstr(, "\"\n\nThis reverts commit ");
- strbuf_addstr(, oid_to_hex(>object.oid));
-
- if (commit->parents && commit->parents->next) {
- strbuf_addstr(, ", reversing\nchanges made to ");
- strbuf_addstr(, oid_to_hex(>object.oid));
- }


As revert currently records the parent given on the command line when
reverting a merge commit it would probably be a good idea to add that
either as a separate trailer or to the Revert: trailer and possibly also
generate it for cherry picks.


My mistake. I didn't read carefully and thought it was logging
commit->parents, which is pointless.

So what should be the trailer for this (I don't think putting it in
Revert: is a good idea, too much to parse)? Revert-parent: ?
Revert-merge: ?


I think Revert-parent: is good, though you seem to have gone for 
including it the Revert: trailer in v2.

- strbuf_addstr(, ".\n");
+ strbuf_addf(, "Revert \"%s\"\n\n", msg.subject);


If the message already contains trailers then should we just append the
Revert trailer those rather than inserting "\n\n"?


Umm.. but this \n\n is for separating the subject and the body. I
think we need it anyway, trailer or not.


Ah you're right, I had forgotten that the revert message body is empty, 
unlike cherry-pick where the message is copied (and that does the right 
thing already when there are existing trailers).


Best wishes

Phillip



@@ -1784,9 +1777,8 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
   strbuf_complete_line();
   if (!has_conforming_footer(, NULL, 0))
   strbuf_addch(, '\n');
- strbuf_addstr(, cherry_picked_prefix);
- strbuf_addstr(, oid_to_hex(>object.oid));
- strbuf_addstr(, ")\n");
+ strbuf_addf(, "Cherry-Pick: %s\n",
+ oid_to_hex(>object.oid));


I will probably make this "Cherry-picked-from:" to match our S-o-b style.





Re: [PATCH/RFC] sequencer.c: record revert/cherry-pick commit with trailer lines

2018-11-04 Thread Phillip Wood

On 04/11/2018 07:22, Nguyễn Thái Ngọc Duy wrote:

When a commit is reverted (or cherry-picked with -x) we add an English
sentence recording that commit id in the new commit message. Make
these real trailer lines instead so that they are more friendly to
parsers (especially "git interpret-trailers").

A reverted commit will have a new trailer

 Revert: 

Similarly a cherry-picked commit with -x will have

 Cherry-Pick: 


I think this is a good idea though I wonder if it will break someones 
script that is looking for the messages generated by -x at the moment. 
I've got a couple of comments below.



Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  I think standardizing how we record commit ids in the commit message
  is a good idea. Though to be honest this started because of my irk of
  an English string "cherry picked from..." that cannot be translated.
  It might as well be a computer language that happens to look like
  English.

  Documentation/git-cherry-pick.txt |  5 ++---
  sequencer.c   | 20 ++--
  t/t3510-cherry-pick-sequence.sh   | 12 ++--
  t/t3511-cherry-pick-x.sh  | 26 +-
  4 files changed, 27 insertions(+), 36 deletions(-)

diff --git a/Documentation/git-cherry-pick.txt 
b/Documentation/git-cherry-pick.txt
index d35d771fc8..8ffbaed1a0 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -58,9 +58,8 @@ OPTIONS
message prior to committing.
  
  -x::

-   When recording the commit, append a line that says
-   "(cherry picked from commit ...)" to the original commit
-   message in order to indicate which commit this change was
+   When recording the commit, append "Cherry-Pick:" trailer line
+   recording the commit name which commit this change was
cherry-picked from.  This is done only for cherry
picks without conflicts.  Do not use this option if
you are cherry-picking from your private branch because
diff --git a/sequencer.c b/sequencer.c
index 9e1ab3a2a7..f7318f2862 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -36,7 +36,6 @@
  #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
  
  const char sign_off_header[] = "Signed-off-by: ";

-static const char cherry_picked_prefix[] = "(cherry picked from commit ";
  
  GIT_PATH_FUNC(git_path_commit_editmsg, "COMMIT_EDITMSG")
  
@@ -1758,16 +1757,10 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,

base_label = msg.label;
next = parent;
next_label = msg.parent_label;
-   strbuf_addstr(, "Revert \"");
-   strbuf_addstr(, msg.subject);
-   strbuf_addstr(, "\"\n\nThis reverts commit ");
-   strbuf_addstr(, oid_to_hex(>object.oid));
-
-   if (commit->parents && commit->parents->next) {
-   strbuf_addstr(, ", reversing\nchanges made to ");
-   strbuf_addstr(, oid_to_hex(>object.oid));
-   }


As revert currently records the parent given on the command line when 
reverting a merge commit it would probably be a good idea to add that 
either as a separate trailer or to the Revert: trailer and possibly also 
generate it for cherry picks.



-   strbuf_addstr(, ".\n");
+   strbuf_addf(, "Revert \"%s\"\n\n", msg.subject);


If the message already contains trailers then should we just append the 
Revert trailer those rather than inserting "\n\n"?


Best Wishes

Phillip


+
+   strbuf_addf(, "Revert: %s\n",
+   oid_to_hex(>object.oid));



} else {
const char *p;
  
@@ -1784,9 +1777,8 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,

strbuf_complete_line();
if (!has_conforming_footer(, NULL, 0))
strbuf_addch(, '\n');
-   strbuf_addstr(, cherry_picked_prefix);
-   strbuf_addstr(, oid_to_hex(>object.oid));
-   strbuf_addstr(, ")\n");
+   strbuf_addf(, "Cherry-Pick: %s\n",
+   oid_to_hex(>object.oid));
}
if (!is_fixup(command))
author = get_author(msg.message);
diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
index c84eeefdc9..89b6e7fc1e 100755
--- a/t/t3510-cherry-pick-sequence.sh
+++ b/t/t3510-cherry-pick-sequence.sh
@@ -390,10 +390,10 @@ test_expect_success '--continue respects opts' '
git cat-file commit HEAD~1 >picked_msg &&
git cat-file commit HEAD~2 >unrelatedpick_msg &&
git cat-file commit HEAD~3 >initial_msg &&
-   ! grep "cherry picked from" initial_msg &&
-   grep "cherry picked from" unrelatedpick_msg &&
-   grep "cherry picked from" picked_msg &&
-   grep "cherry picked from" 

Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-02 Thread Phillip Wood

Hi Alban

On 02/11/2018 16:26, Alban Gruin wrote:

Hi Phillip,

Le 02/11/2018 à 11:09, Phillip Wood a écrit :

+    struct todo_item *items = NULL,
+    base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
+
+    strbuf_addstr(buf, commands);
+    base_item.offset_in_buf = buf->len - commands_len - 1;
+    base_item.arg = buf->buf + base_item.offset_in_buf;


I think if the user gives --exec more than once on the command line then
commands will contain more than one exec command so this needs to parse
commands and create one todo_item for each command.



Ouch, you’re right.  Thanks for the heads up.


I haven't looked how difficult it would be but it might be best to
change the option parsing to pass an array of strings containing the
exec commands rather than one long string so we can just loop over the
array here.



It would be the best way to do so.  This string comes from git-rebase.sh
(or builtin/rebase.c) -- they format it this way before invoking
git-rebase--interactive.  So either I modify both of them (for this, I
would need to rebase my branch on master), or I can split this string in
builtin/rebase--interactive.c.  I prefer the first option, but maybe
changing the base of this series will not please Junio.


I think in the last 'what's cooking' email Junio said he was going to 
merge all the builtin/rebase.c stuff to master so there may not be a 
problem if you wait a couple of days.


Best Wishes

Phillip


Cheers,
Alban





Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-02 Thread Phillip Wood
Hi Alban

On 01/11/2018 23:31, Alban Gruin wrote:
> Le 30/10/2018 à 17:47, Phillip Wood a écrit :
>> On 27/10/2018 22:29, Alban Gruin wrote:
>>> This refactors sequencer_add_exec_commands() to work on a todo_list to
>>> avoid redundant reads and writes to the disk.
>>>
>>> An obvious way to do this would be to insert the `exec' command between
>>> the other commands, and reparse it once this is done.  This is not what
>>> is done here.  Instead, the command is appended to the buffer once, and
>>> a new list of items is created.  Items from the old list are copied to
>>> it, and new `exec' items are appended when necessary.  This eliminates
>>> the need to reparse the todo list, but this also means its buffer cannot
>>> be directly written to the disk, hence todo_list_write_to_disk().
>>
>> I'd reword this slightly, maybe
>>
>> Instead of just inserting the `exec' command between the other commands,
>> and re-parsing the buffer at the end the exec command is appended to the
>> buffer once, and a new list of items is created.  Items from the old
>> list are copied across and new `exec' items are appended when necessary.
>>  This eliminates the need to reparse the buffer, but this also means we
>> have to use todo_list_write_to_disk() to write the file.
>>
>>> sequencer_add_exec_commands() still reads the todo list from the disk,
>>> as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
>>> todo_list structure, and reparses it at the end.
>>
>> I think the saying 'reparses' is confusing as that is what we're trying
>> to avoid.
>>
>>> complete_action() still uses sequencer_add_exec_commands() for now.
>>> This will be changed in a future commit.
>>>
>>> Signed-off-by: Alban Gruin 
>>> ---
>>>   sequencer.c | 69 +
>>>   1 file changed, 49 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/sequencer.c b/sequencer.c
>>> index e12860c047..12a3efeca8 100644
>>> --- a/sequencer.c
>>> +++ b/sequencer.c
>>> @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc,
>>> const char **argv,
>>>   return 0;
>>>   }
>>>   +static void todo_list_add_exec_commands(struct todo_list *todo_list,
>>> +    const char *commands)
>>> +{
>>> +    struct strbuf *buf = _list->buf;
>>> +    const char *old_buf = buf->buf;
>>> +    size_t commands_len = strlen(commands + strlen("exec ")) - 1;
>>> +    int i, first = 1, nr = 0, alloc = 0;
>>
>> Minor nit pick, I think it is clearer if first is initialized just
>> before the loop as it is in the deleted code below.
>>
>>> +    struct todo_item *items = NULL,
>>> +    base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>> +
>>> +    strbuf_addstr(buf, commands);
>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>
>> I think if the user gives --exec more than once on the command line then
>> commands will contain more than one exec command so this needs to parse
>> commands and create one todo_item for each command.
>>
> 
> Ouch, you’re right.  Thanks for the heads up.

I haven't looked how difficult it would be but it might be best to
change the option parsing to pass an array of strings containing the
exec commands rather than one long string so we can just loop over the
array here.

> 
>>> +
>>> +    /*
>>> + * Insert  after every pick. Here, fixup/squash chains
>>> + * are considered part of the pick, so we insert the commands
>>> *after*
>>> + * those chains if there are any.
>>> + */
>>> +    for (i = 0; i < todo_list->nr; i++) {
>>> +    enum todo_command command = todo_list->items[i].command;
>>> +    if (todo_list->items[i].arg)
>>> +    todo_list->items[i].arg = todo_list->items[i].arg -
>>> old_buf + buf->buf;
>>> +
>>> +    if (command == TODO_PICK && !first) {
>>> +    ALLOC_GROW(items, nr + 1, alloc);
>>> +    memcpy(items + nr++, _item, sizeof(struct todo_item));
>>
>> I think it would be clearer to say
>> items[nr++] = base_item;
>> rather than using memcpy. This applies below and to some of the other
>> patches as well. Also this needs to loop over all the base_items if the
>> user gave --exec more than once on the command line.
>>
> 
> I agree with you, it’s way more readable, IMO.  But for some reason, I
> thought it was not possible to assign a struct to another in C.

In general one needs to be careful as it is only does a shallow copy but
that is exactly what we want here. Having said that if we have an array
of exec commands to insert then it might be worth sticking with memcpy()
here so the whole array can be copied in one go.

Best Wishes

Phillip

>> Best Wishes
>>
>> Phillip
>>
> 
> Cheers,
> Alban
> 



Re: [PATCH v4 0/5] am/rebase: share read_author_script()

2018-11-01 Thread Phillip Wood

Hi Junio

On 01/11/2018 03:01, Junio C Hamano wrote:

Phillip Wood  writes:


From: Phillip Wood 

Sorry for the confusion with v3, here are the updated patches.

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.


The mistake of overwriting -1 (i.e. earlier we detected dup) with
the third instance of the same originates at [2/5], so updating
[4/5] without fixing it at its source would mean [4/5] is not a pure
code movement to make it available to libgit users---it instead hides
a (not so important) bugfix in it.


Facepalm. Thanks for clearing up my mess, what you've got in pu looks 
fine to me.


Thanks

Phillip





v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.


Phillip Wood (5):
   am: don't die in read_author_script()
   am: improve author-script error reporting
   am: rename read_author_script()
   add read_author_script() to libgit
   sequencer: use read_author_script()

  builtin/am.c |  60 ++--
  sequencer.c  | 192 ---
  sequencer.h  |   3 +
  3 files changed, 128 insertions(+), 127 deletions(-)




[PATCH v4 2/5] am: improve author-script error reporting

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
if (parse_key_value_squoted(buf.buf, ))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i >= 0)
+   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i >= 0)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i >= 0)
+   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
string_list_clear(, !!retval);
-- 
2.19.1



[PATCH v4 0/5] am/rebase: share read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Sorry for the confusion with v3, here are the updated patches.

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.


Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--
 sequencer.c  | 192 ---
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1



[PATCH v4 4/5] add read_author_script() to libgit

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v2:
 - tweaked an error message as suggested by Junio
 - fixed corner case where a key is given three times (Thanks to Junio
   for pointing this out)

changes since v1:
 - added comment above read_author_script()
 - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append  at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, ))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i >= 0)
-   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i >= 0)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i >= 0)
-   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-

[PATCH v4 1/5] am: don't die in read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine 
Signed-off-by: Phillip Wood 
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
if (fd < 0) {
if (errno == ENOENT)
return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
+   return error_errno(_("could not open '%s' for reading"),
+  filename);
}
strbuf_read(, fd, 0);
close(fd);
-- 
2.19.1



[PATCH v4 5/5] sequencer: use read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1
 - use argv_array_pushf() as suggested by Eric
 - fixed strbuf handling as suggested by Eric
 - fix comments and commit message to reflect changed behavior of
   read_env_script()

 sequencer.c | 97 -
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index af9987c807..09dc200b4f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-   struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  , , , 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+   argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+   argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author  timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebas

[PATCH v4 3/5] am: rename read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.19.1



Re: [PATCH v3 0/5] am/rebase: share read_author_script()

2018-10-31 Thread Phillip Wood
On 31/10/2018 02:50, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
>> those comments, the rest are unchanged.
> 
> Hmph, all these five patches seem to be identical to what I have in
> 'pu'.  Did you send the right version?

Oh dear that's embarrassing. I updated the patches on my laptop and
forget to sync before sending them from my desktop. I'll send v4 now.

Sorry for the confusion

Phillip

> 
>> v1 cover letter:
>>
>> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
>> duplication and improves rebase's parsing of the author script. After
>> this I'll do another series to share the code to write the author
>> script.
>>
>> Phillip Wood (5):
>>   am: don't die in read_author_script()
>>   am: improve author-script error reporting
>>   am: rename read_author_script()
>>   add read_author_script() to libgit
>>   sequencer: use read_author_script()
>>
>>  builtin/am.c |  60 ++--
>>  sequencer.c  | 192 ---
>>  sequencer.h  |   3 +
>>  3 files changed, 128 insertions(+), 127 deletions(-)



Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-10-30 Thread Phillip Wood

On 27/10/2018 22:29, Alban Gruin wrote:

This refactors sequencer_add_exec_commands() to work on a todo_list to
avoid redundant reads and writes to the disk.

An obvious way to do this would be to insert the `exec' command between
the other commands, and reparse it once this is done.  This is not what
is done here.  Instead, the command is appended to the buffer once, and
a new list of items is created.  Items from the old list are copied to
it, and new `exec' items are appended when necessary.  This eliminates
the need to reparse the todo list, but this also means its buffer cannot
be directly written to the disk, hence todo_list_write_to_disk().


I'd reword this slightly, maybe

Instead of just inserting the `exec' command between the other commands, 
and re-parsing the buffer at the end the exec command is appended to the 
buffer once, and a new list of items is created.  Items from the old 
list are copied across and new `exec' items are appended when necessary. 
 This eliminates the need to reparse the buffer, but this also means we 
have to use todo_list_write_to_disk() to write the file.



sequencer_add_exec_commands() still reads the todo list from the disk,
as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
todo_list structure, and reparses it at the end.


I think the saying 'reparses' is confusing as that is what we're trying 
to avoid.



complete_action() still uses sequencer_add_exec_commands() for now.
This will be changed in a future commit.

Signed-off-by: Alban Gruin 
---
  sequencer.c | 69 +
  1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index e12860c047..12a3efeca8 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc, const 
char **argv,
return 0;
  }
  
+static void todo_list_add_exec_commands(struct todo_list *todo_list,

+   const char *commands)
+{
+   struct strbuf *buf = _list->buf;
+   const char *old_buf = buf->buf;
+   size_t commands_len = strlen(commands + strlen("exec ")) - 1;
+   int i, first = 1, nr = 0, alloc = 0;


Minor nit pick, I think it is clearer if first is initialized just 
before the loop as it is in the deleted code below.



+   struct todo_item *items = NULL,
+   base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
+
+   strbuf_addstr(buf, commands);
+   base_item.offset_in_buf = buf->len - commands_len - 1;
+   base_item.arg = buf->buf + base_item.offset_in_buf;


I think if the user gives --exec more than once on the command line then 
commands will contain more than one exec command so this needs to parse 
commands and create one todo_item for each command.



+
+   /*
+* Insert  after every pick. Here, fixup/squash chains
+* are considered part of the pick, so we insert the commands *after*
+* those chains if there are any.
+*/
+   for (i = 0; i < todo_list->nr; i++) {
+   enum todo_command command = todo_list->items[i].command;
+   if (todo_list->items[i].arg)
+   todo_list->items[i].arg = todo_list->items[i].arg - 
old_buf + buf->buf;
+
+   if (command == TODO_PICK && !first) {
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, _item, sizeof(struct 
todo_item));


I think it would be clearer to say
items[nr++] = base_item;
rather than using memcpy. This applies below and to some of the other 
patches as well. Also this needs to loop over all the base_items if the 
user gave --exec more than once on the command line.


Best Wishes

Phillip


+   }
+
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, todo_list->items + i, sizeof(struct 
todo_item));
+   first = 0;
+   }
+
+   /* insert or append final  */
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, _item, sizeof(struct todo_item));
+
+   FREE_AND_NULL(todo_list->items);
+   todo_list->items = items;
+   todo_list->nr = nr;
+   todo_list->alloc = alloc;
+}
+
  /*
   * Add commands after pick and (series of) squash/fixup commands
   * in the todo list.
@@ -4224,10 +4268,7 @@ int sequencer_add_exec_commands(const char *commands)
  {
const char *todo_file = rebase_path_todo();
struct todo_list todo_list = TODO_LIST_INIT;
-   struct todo_item *item;
-   struct strbuf *buf = _list.buf;
-   size_t offset = 0, commands_len = strlen(commands);
-   int i, first;
+   int res;
  
  	if (strbuf_read_file(_list.buf, todo_file, 0) < 0)

return error(_("could not read '%s'."), todo_file);
@@ -4237,23 +4278,11 @@ int sequencer_add_exec_commands(const char *commands)
return error(_("unusable todo list: '%s'"), todo_file);

Re: [PATCH v2 04/16] sequencer: introduce todo_list_write_to_file()

2018-10-30 Thread Phillip Wood

Hi Alban

I like the direction this is going, it is an improvement on re-scanning 
the list at the end of each function.


On 27/10/2018 22:29, Alban Gruin wrote:

This introduce a new function to recreate the text of a todo list from
its commands, and then to write it to the disk.  This will be useful in
the future, the buffer of a todo list won’t be treated as a strict
mirror of the todo file by some of its functions once they will be
refactored.


I'd suggest rewording this slightly, maybe something like

This introduces a new function to recreate the text of a todo list from
its commands and write it to a file. This will be useful as the next few 
commits will change the use of the buffer in struct todo_list so it will 
no-longer be a mirror of the file on disk.



This functionnality can already be found in todo_list_transform(), but


s/functionnality/functionality/


it is specifically made to replace the buffer of a todo list, which is
not the desired behaviour.  Thus, the part of todo_list_transform() that
actually creates the buffer is moved to a new function,
todo_list_to_strbuf().  The rest is unused, and so is dropped.

todo_list_write_to_file() can also take care to append the help text to


s/care to append/care of appending/


the buffer before writing it to the disk, or to write only the first n
items of the list.


Why/when do we only want to write a subset of the items?


Signed-off-by: Alban Gruin 
---
  sequencer.c | 59 -
  sequencer.h |  4 +++-
  2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 07296f1f57..0dd45677b1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4256,24 +4256,27 @@ int sequencer_add_exec_commands(const char *commands)
return i;
  }
  
-void todo_list_transform(struct todo_list *todo_list, unsigned flags)

+static void todo_list_to_strbuf(struct todo_list *todo_list, struct strbuf 
*buf,
+   int num, unsigned flags)
  {
-   struct strbuf buf = STRBUF_INIT;
struct todo_item *item;
-   int i;
+   int i, max = todo_list->nr;
  
-	for (item = todo_list->items, i = 0; i < todo_list->nr; i++, item++) {

+   if (num > 0 && num < max)
+   max = num;
+
+   for (item = todo_list->items, i = 0; i < max; i++, item++) {
/* if the item is not a command write it and continue */
if (item->command >= TODO_COMMENT) {
-   strbuf_addf(, "%.*s\n", item->arg_len, item->arg);
+   strbuf_addf(buf, "%.*s\n", item->arg_len, item->arg);
continue;
}
  
  		/* add command to the buffer */

if (flags & TODO_LIST_ABBREVIATE_CMDS)
-   strbuf_addch(, command_to_char(item->command));
+   strbuf_addch(buf, command_to_char(item->command));
else
-   strbuf_addstr(, command_to_string(item->command));
+   strbuf_addstr(buf, command_to_string(item->command));
  
  		/* add commit id */

if (item->commit) {
@@ -4283,27 +4286,46 @@ void todo_list_transform(struct todo_list *todo_list, 
unsigned flags)
  
  			if (item->command == TODO_MERGE) {

if (item->flags & TODO_EDIT_MERGE_MSG)
-   strbuf_addstr(, " -c");
+   strbuf_addstr(buf, " -c");
else
-   strbuf_addstr(, " -C");
+   strbuf_addstr(buf, " -C");
}
  
-			strbuf_addf(, " %s", oid);

+   strbuf_addf(buf, " %s", oid);
}
  
  		/* add all the rest */

if (!item->arg_len)
-   strbuf_addch(, '\n');
+   strbuf_addch(buf, '\n');
else
-   strbuf_addf(, " %.*s\n", item->arg_len, item->arg);
+   strbuf_addf(buf, " %.*s\n", item->arg_len, item->arg);
}
+}
  
-	strbuf_reset(_list->buf);

-   strbuf_add(_list->buf, buf.buf, buf.len);
+int todo_list_write_to_file(struct todo_list *todo_list, const char *file,
+   const char *shortrevisions, const char *shortonto,
+   int command_count, int append_help, int num, 
unsigned flags)


This is a really long argument list which makes it easy for callers to 
get the parameters in the wrong order. I think append_help could 
probably be folded into the flags, I'm not sure what the command_count 
is used for but I've only read the first few patches. Maybe it would be 
better to pass a struct so we have named fields.


Best Wishes

Phillip


+{
+   int edit_todo = !(shortrevisions && shortonto), res;
+   struct strbuf buf = STRBUF_INIT;
+
+   todo_list_to_strbuf(todo_list, , 

[PATCH v3 3/5] am: rename read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.19.1



[PATCH v3 4/5] add read_author_script() to libgit

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1:
 - added comment above read_author_script()
 - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append  at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, ))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i >= 0)
-   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i >= 0)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i >= 0)
-   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-   retval = 0;
-finish:
-   string_list_clear(, !!retval);
-   strbuf_release();
-   return retval;
+   return read_author_script(filename, >author_name,
+   

[PATCH v3 1/5] am: don't die in read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine 
Signed-off-by: Phillip Wood 
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
if (fd < 0) {
if (errno == ENOENT)
return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
+   return error_errno(_("could not open '%s' for reading"),
+  filename);
}
strbuf_read(, fd, 0);
close(fd);
-- 
2.19.1



[PATCH v3 5/5] sequencer: use read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1
 - use argv_array_pushf() as suggested by Eric
 - fixed strbuf handling as suggested by Eric
 - fix comments and commit message to reflect changed behavior of
   read_env_script()

 sequencer.c | 97 -
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3530dbeb6c..987542f67c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-   struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  , , , 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+   argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+   argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author  timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebas

[PATCH v3 2/5] am: improve author-script error reporting

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
if (parse_key_value_squoted(buf.buf, ))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i >= 0)
+   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i >= 0)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i >= 0)
+   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
string_list_clear(, !!retval);
-- 
2.19.1



[PATCH v3 0/5] am/rebase: share read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--
 sequencer.c  | 192 ---
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1



Re: [PATCH v2 0/5] am/rebase: share read_author_script()

2018-10-26 Thread Phillip Wood
Hi Junio

On 25/10/2018 09:59, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> Thanks to Eric for his feedback on v1. I've rerolled based on
>> that. Patches 1 & 2 are new and try to address some of the concerns
>> Eric raised, particularly the error handling for a badly edited author
>> script. See the notes on patches 4 & 5 for the changes to those (they
>> were previously 2 & 3).
> 
> I spotted a weird corner case buglet, but it seems that this one is
> ready for 'next' even without fixing that "give it three times and
> we will happily continue" thing.

Well spotted on the corner case. If you're happy to hold off on moving
it to next I can send a re-roll with fixes for that next week or I can
do it as a fixup if you want to move this forward now.

> Do we know of any other issues?  Can we now move it forward?

I don't know of any other issues but I don't think anyone else has
looked at this iteration. Thanks for taking a look at these.

Best Wishes

Phillip



> Thanks.
> 



[PATCH v2 4/5] add read_author_script() to libgit

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1:
 - added comment above read_author_script()
 - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append  at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, ))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i >= 0)
-   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i >= 0)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i >= 0)
-   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-   retval = 0;
-finish:
-   string_list_clear(, !!retval);
-   strbuf_release();
-   return retval;
+   return read_author_script(filename, >author_name,
+   

[PATCH v2 5/5] sequencer: use read_author_script()

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1
 - use argv_array_pushf() as suggested by Eric
 - fixed strbuf handling as suggested by Eric
 - fix comments and commit message to reflect changed behavior of
   read_env_script()

 sequencer.c | 97 -
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3530dbeb6c..987542f67c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-   struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  , , , 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+   argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+   argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author  timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebas

[PATCH v2 1/5] am: don't die in read_author_script()

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine 
Signed-off-by: Phillip Wood 
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
if (fd < 0) {
if (errno == ENOENT)
return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
+   return error_errno(_("could not open '%s' for reading"),
+  filename);
}
strbuf_read(, fd, 0);
close(fd);
-- 
2.19.0



[PATCH v2 0/5] am/rebase: share read_author_script()

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

Thanks to Eric for his feedback on v1. I've rerolled based on
that. Patches 1 & 2 are new and try to address some of the concerns
Eric raised, particularly the error handling for a badly edited author
script. See the notes on patches 4 & 5 for the changes to those (they
were previously 2 & 3).

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--
 sequencer.c  | 192 ---
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.0



[PATCH v2 3/5] am: rename read_author_script()

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.19.0



[PATCH v2 2/5] am: improve author-script error reporting

2018-10-18 Thread Phillip Wood
From: Phillip Wood 

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
if (parse_key_value_squoted(buf.buf, ))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i >= 0)
+   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i >= 0)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i >= 0)
+   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
string_list_clear(, !!retval);
-- 
2.19.0



Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-16 Thread Phillip Wood

On 12/10/2018 14:36, Junio C Hamano wrote:

Phillip Wood  writes:


It would be nice if the parsing used starts_with(option_name, user_text)
rather than strcmp() as well. Also I think --color-moved=no is valid as
a synonym of --no-color-moved but --color-moved-ws=no is not supported.


I am not sure about starts_with().  Do you mean we should accept
"--color-mo", as that is a prefix of "--color-moved" that is not
shared with any existing option, until we gain a different option
"--color-more"?


I was thinking of the option arguments rather than the option names 
although being able to abbreviate the names in the same way as the 
commands that parse_options() would be good too (I seem to remember 
someone saying they had some rough patches to use parse_options() for 
diff and log in a discussion of adding completion support to 
parse_options())



If you mean "--color-moved-ws=no" (or "--no-color-moved-ws") as a
way to countermand an earlier --color-moved-ws= on the
command line, I fully agree that it is a good idea.


Oh I assumed --no-color-moved-ws was allowed but it isn't it. Allowing 
--color-moved-ws=no as well would match what is allowed for 
--color-moved. I'll try and look at that.


Best Wishes

Phillip



Re: [PATCH v2 2/2] rebase -i: introduce the 'break' command

2018-10-12 Thread Phillip Wood
Hi Johannes
On 12/10/2018 09:35, Johannes Schindelin wrote:
> Hi Phillip,
> 
> On Thu, 11 Oct 2018, Phillip Wood wrote:
> 
>> I think this would be a useful addition to rebase, there's one small
>> comment below.
>>
>> On 10/10/2018 09:53, Johannes Schindelin via GitGitGadget wrote:
>>> From: Johannes Schindelin 
>>>
>>> The 'edit' command can be used to cherry-pick a commit and then
>>> immediately drop out of the interactive rebase, with exit code 0, to let
>>> the user amend the commit, or test it, or look around.
>>>
>>> Sometimes this functionality would come in handy *without*
>>> cherry-picking a commit, e.g. to interrupt the interactive rebase even
>>> before cherry-picking a commit, or immediately after an 'exec' or a
>>> 'merge'.
>>>
>>> This commit introduces that functionality, as the spanking new 'break'
>>> command.
>>>
>>> Suggested-by: Stefan Beller 
>>> Signed-off-by: Johannes Schindelin 
>>> ---

>>> diff --git a/sequencer.c b/sequencer.c
>>> index 8dd6db5a01..b209f8af46 100644
>>> --- a/sequencer.c
>>> +++ b/sequencer.c

>>> @@ -3293,6 +3295,9 @@ static int pick_commits(struct todo_list *todo_list, 
>>> struct replay_opts *opts)
>>> unlink(rebase_path_stopped_sha());
>>> unlink(rebase_path_amend());
>>> delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
>>> +
>>> +   if (item->command == TODO_BREAK)
>>> +   break;
>>
>> Normally when rebase stops it prints a message to say where it has got
>> to and how to continue, it might be useful to do the same here
> 
> That's a very valid point. I'll think of something.

I'm not sure what gets left on the screen from the previous picks but
something to say what the last pick/exec was and maybe what the current
HEAD is would be useful I think. One thing has just occurred to me -
what does git status say (and does it still work - I'm not sure how much
parsing it does on the todo and done files) if it is run while rebase is
stopped on a break command?

Best Wishes

Phillip


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-12 Thread Phillip Wood
On 11/10/2018 23:40, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> On 10/10/2018 06:43, Junio C Hamano wrote:
>>> Here are the topics that have been cooking.  Commits prefixed with
>>> '-' are only in 'pu' (proposed updates) while commits prefixed with
>>> '+' are in 'next'.  The ones marked with '.' do not appear in any of
>>> the integration branches, but I am still holding onto them.
>>>
>>> * pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
>>>   - diff --color-moved: fix a memory leak
>>>   - diff --color-moved-ws: fix another memory leak
>>>   - diff --color-moved-ws: fix a memory leak
>>>   - diff --color-moved-ws: fix out of bounds string access
>>>   - diff --color-moved-ws: fix double free crash
>>>
>>>   Various fixes to "diff --color-moved-ws".
>>>
>>>   What's the status of this topic?
>>
>> I think it is ready for next - Stefan was happy with the last iteration.
> 
> This is not about your fixes, but I was skimming the color-moved
> support in general as a final sanity check to move this forward and
> noticed that
> 
>   $ git diff --color-moved-ws=ignore-any master...
> 
> does not do anything interesting, which is broken at at least two
> points.
> 
>  * There is no "ignore-any" supported by the feature---I think that
>the parser for the option should have noticed and barfed, but it
>did not.  It merely emitted a message to the standard output and
>let it scroll away with the huge diff before the reader noticed
>it.

It would be nice if the parsing used starts_with(option_name, user_text)
rather than strcmp() as well. Also I think --color-moved=no is valid as
a synonym of --no-color-moved but --color-moved-ws=no is not supported.

>  * After fixing ignore-any to one of the supported option
>(e.g. "ignore-all-spaces"), the color-moved feature still did not
>trigger.  I think the presence of --color-moved-ws by itself is a
>hint that the user wants --color-moved to be used.  If it turns
>out that there are some valid use cases where --color-moved-ws
>may have to be set but the color-moved feature should not be
>enabled, then
> 
>   diff --color-moved-ws=ignore-all-space --no-color-moved
> 
>can be used to countermand this, of course.
> 
> Am I missing something or are these mere small sloppiness in the
> current code?
> 
> 
> 



Re: [PATCH 04/15] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-10-12 Thread Phillip Wood
On 11/10/2018 17:57, Alban Gruin wrote:
> Hi Phillip,
> 
> thanks for taking the time to review my patches.
> 
> Le 11/10/2018 à 13:25, Phillip Wood a écrit :
>> On 07/10/2018 20:54, Alban Gruin wrote:
>>> @@ -4419,15 +4406,38 @@ int sequencer_add_exec_commands(const char
>>> *commands)
>>>   }
>>>     /* insert or append final  */
>>> -    if (insert >= 0 && insert < todo_list.nr)
>>> -    strbuf_insert(buf, todo_list.items[insert].offset_in_buf +
>>> +    if (insert >= 0 && insert < todo_list->nr)
>>> +    strbuf_insert(buf, todo_list->items[insert].offset_in_buf +
>>>     offset, commands, commands_len);
>>>   else if (insert >= 0 || !offset)
>>>   strbuf_add(buf, commands, commands_len);
>>>   -    i = write_message(buf->buf, buf->len, todo_file, 0);
>>> +    if (todo_list_parse_insn_buffer(buf->buf, todo_list))
>>> +    BUG("unusable todo list");}
>>
>> It is a shame to have to re-parse the todo list, I wonder how difficult
>> it would be to adjust the todo_list item array as the exec commands are
>> inserted. The same applies to the next couple of patches
>>
> 
> Good question.
> 
> This function inserts an `exec' command after every `pick' command.
> These commands are stored in a dynamically allocated list, grew with
> ALLOW_GROW().
> 
> If we want to keep the current structure, we would have to grow the size
> of the list by 1 and move several element to the end every time we want
> to add an `exec' command.  It would not be very effective.  Perhaps I
> should use a linked list here, instead.  It may also work well with
> rearrange_squash() and skip_unnecessary_picks().
> 
> Maybe we could even get rid of the strbuf at some point.

Another way would be to use the strbuf as a string pool rather than a
copy of the text of the file. There could be a write_todo_list()
function that takes a todo list and some flags, iterates over the items
in the todo list and writes the file. The flags would specify whether to
append the todo help and whether to abbreviate the object ids (so there
is no need for a separate call to transform_todos()). Then
add_exec_commands() could allocate a new array of todo items which it
builds up as it works through the original list and replaces the
original list with the new one at the end. The text of the exec items
can be added to the end of the strbuf (we only need one copy of the exec
text with this scheme). rearrange_squash() can use a temporary array to
build a new list as well or just memmove() things but that might be
slower if there is a lot of rearrangement to do. skip_unecessary_picks()
could just set the current item to the one we want to start with.

Best Wishes

Phillip

> 
>> Best Wishes
>>
>> Phillip
>>
> 
> Cheers,
> Alban
> 



Re: [PATCH 10/15] rebase-interactive: use todo_list_transform() in edit_todo_list()

2018-10-11 Thread Phillip Wood

On 07/10/2018 20:54, Alban Gruin wrote:

Just like complete_action(), edit_todo_list() used a
function (transform_todo_file()) that read the todo-list from the disk
and wrote it back, resulting in useless disk accesses.

This changes edit_todo_list() to call directly todo_list_transform()
instead.

Signed-off-by: Alban Gruin 
---
  rebase-interactive.c | 40 +++-
  1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/rebase-interactive.c b/rebase-interactive.c
index 7c7f720a3d..f42d48e192 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -78,39 +78,37 @@ void append_todo_help(unsigned edit_todo, unsigned 
keep_empty,
  
  int edit_todo_list(unsigned flags)

  {
-   struct strbuf buf = STRBUF_INIT;
const char *todo_file = rebase_path_todo();
+   struct todo_list todo_list = TODO_LIST_INIT;
+   int res = 0;
  
-	if (strbuf_read_file(, todo_file, 0) < 0)

+   if (strbuf_read_file(_list.buf, todo_file, 0) < 0)
return error_errno(_("could not read '%s'."), todo_file);
  
-	strbuf_stripspace(, 1);

-   if (write_message(buf.buf, buf.len, todo_file, 0)) {
-   strbuf_release();
-   return -1;
-   }
-
-   strbuf_release();
+   strbuf_stripspace(_list.buf, 1);
+   if (!todo_list_parse_insn_buffer(todo_list.buf.buf, _list))
+   todo_list_transform(_list, flags | TODO_LIST_SHORTEN_IDS);
  
-	transform_todo_file(flags | TODO_LIST_SHORTEN_IDS);

-
-   if (strbuf_read_file(, todo_file, 0) < 0)
-   return error_errno(_("could not read '%s'."), todo_file);
+   append_todo_help(1, 0, _list.buf);
  
-	append_todo_help(1, 0, );


I think this patch is fine, I was just wondering if you meant to move 
the call to append_todo_help() above the blank line?


Best Wishes

Phillip


-   if (write_message(buf.buf, buf.len, todo_file, 0)) {
-   strbuf_release();
+   if (write_message(todo_list.buf.buf, todo_list.buf.len, todo_file, 0)) {
+   todo_list_release(_list);
return -1;
}
  
-	strbuf_release();

-
-   if (launch_sequence_editor(todo_file, NULL, NULL))
+   strbuf_reset(_list.buf);
+   if (launch_sequence_editor(todo_file, _list.buf, NULL)) {
+   todo_list_release(_list);
return -1;
+   }
  
-	transform_todo_file(flags & ~(TODO_LIST_SHORTEN_IDS));

+   if (!todo_list_parse_insn_buffer(todo_list.buf.buf, _list)) {
+   todo_list_transform(_list, flags & 
~(TODO_LIST_SHORTEN_IDS));
+   res = write_message(todo_list.buf.buf, todo_list.buf.len, 
todo_file, 0);
+   }
  
-	return 0;

+   todo_list_release(_list);
+   return res;
  }
  
  define_commit_slab(commit_seen, unsigned char);






Re: [PATCH 08/15] sequencer: change complete_action() to use the refactored functions

2018-10-11 Thread Phillip Wood

On 07/10/2018 20:54, Alban Gruin wrote:

complete_action() used functions that read the todo-list file, made some
changes to it, and wrote it back to the disk.

The previous commits were dedicated to separate the part that deals with
the file from the actual logic of these functions.  Now that this is
done, we can call directly the "logic" functions to avoid useless file
access.

Signed-off-by: Alban Gruin 
---
  builtin/rebase--interactive.c | 13 +-
  sequencer.c   | 76 +--
  sequencer.h   |  2 +-
  3 files changed, 38 insertions(+), 53 deletions(-)

diff --git a/builtin/rebase--interactive.c b/builtin/rebase--interactive.c
index eef1ff2e83..0700339f90 100644
--- a/builtin/rebase--interactive.c
+++ b/builtin/rebase--interactive.c
@@ -71,7 +71,6 @@ static int do_interactive_rebase(struct replay_opts *opts, 
unsigned flags,
const char *head_hash = NULL;
char *revisions = NULL, *shortrevisions = NULL;
struct argv_array make_script_args = ARGV_ARRAY_INIT;
-   FILE *todo_list_file;
struct todo_list todo_list = TODO_LIST_INIT;
  
  	if (prepare_branch_to_be_rebased(opts, switch_to))

@@ -94,14 +93,6 @@ static int do_interactive_rebase(struct replay_opts *opts, 
unsigned flags,
if (!upstream && squash_onto)
write_file(path_squash_onto(), "%s\n", squash_onto);
  
-	todo_list_file = fopen(rebase_path_todo(), "w");

-   if (!todo_list_file) {
-   free(revisions);
-   free(shortrevisions);
-
-   return error_errno(_("could not open %s"), rebase_path_todo());
-   }
-
argv_array_pushl(_script_args, "", revisions, NULL);
if (restrict_revision)
argv_array_push(_script_args, restrict_revision);
@@ -109,15 +100,13 @@ static int do_interactive_rebase(struct replay_opts 
*opts, unsigned flags,
ret = sequencer_make_script(_list.buf,
make_script_args.argc, 
make_script_args.argv,
flags);


I think it would be clearer to parse the todo list here explicitly 
rather than doing it implicitly in complete_action()



-   fputs(todo_list.buf.buf, todo_list_file);
-   fclose(todo_list_file);
  
  	if (ret)

error(_("could not generate todo list"));
else {
discard_cache() >ret = complete_action(opts, flags, 
shortrevisions, onto_name, onto,
- head_hash, cmd, autosquash);
+ head_hash, cmd, autosquash, _list);
}
  
  	free(revisions);

diff --git a/sequencer.c b/sequencer.c
index dfb8d1c974..b37935e5ab 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4624,93 +4624,89 @@ static int skip_unnecessary_picks(struct object_id 
*output_oid)
return 0;
  }
  
+static int todo_list_rearrange_squash(struct todo_list *todo_list);

+
  int complete_action(struct replay_opts *opts, unsigned flags,
const char *shortrevisions, const char *onto_name,
const char *onto, const char *orig_head, const char *cmd,
-   unsigned autosquash)
+   unsigned autosquash, struct todo_list *todo_list)
  {
const char *shortonto, *todo_file = rebase_path_todo();
-   struct todo_list todo_list = TODO_LIST_INIT;
-   struct strbuf *buf = &(todo_list.buf);
+   struct todo_list new_todo = TODO_LIST_INIT;
+   struct strbuf *buf = _list->buf;
struct object_id oid;
-   struct stat st;
+   int command_count;
  
  	get_oid(onto, );

shortonto = find_unique_abbrev(, DEFAULT_ABBREV);
  
-	if (!lstat(todo_file, ) && st.st_size == 0 &&

-   write_message("noop\n", 5, todo_file, 0))
-   return -1;
+   if (buf->len == 0)
+   strbuf_add(buf, "noop\n", 5);
+
+   if (todo_list_parse_insn_buffer(buf->buf, todo_list))
+   BUG("unusable todo list");
  
-	if (autosquash && rearrange_squash_in_todo_file())

+   if (autosquash && todo_list_rearrange_squash(todo_list))
return -1;
  
  	if (cmd && *cmd)

-   sequencer_add_exec_commands(cmd);
+   todo_list_add_exec_commands(todo_list, cmd);
  
-	if (strbuf_read_file(buf, todo_file, 0) < 0)

-   return error_errno(_("could not read '%s'."), todo_file);
-
-   if (todo_list_parse_insn_buffer(buf->buf, _list)) {
-   todo_list_release(_list);
-   return error(_("unusable todo list: '%s'"), todo_file);
-   }
-
-   if (count_commands(_list) == 0) {
+   command_count = count_commands(todo_list);
+   if (command_count == 0) {
apply_autostash(opts);
sequencer_remove_state(opts);
-   todo_list_release(_list);
  
  		return error(_("nothing to do"));

}
  
+	todo_list_transform(todo_list, flags | TODO_LIST_SHORTEN_IDS);

Re: [PATCH 04/15] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-10-11 Thread Phillip Wood

On 07/10/2018 20:54, Alban Gruin wrote:

This refactors sequencer_add_exec_commands() to work on a todo_list to
avoid redundant reads and writes to the disk.

sequencer_add_exec_commands() still reads the todo list from the disk,
as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
todo_list structure, and reparses it at the end.

Signed-off-by: Alban Gruin 
---
  sequencer.c | 56 +++--
  1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 8dda61927c..6d998f21a4 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4370,34 +4370,21 @@ int sequencer_make_script(FILE *out, int argc, const 
char **argv,
return 0;
  }
  
-/*

- * Add commands after pick and (series of) squash/fixup commands
- * in the todo list.
- */
-int sequencer_add_exec_commands(const char *commands)
+static void todo_list_add_exec_commands(struct todo_list *todo_list,
+   const char *commands)
  {
-   const char *todo_file = rebase_path_todo();
-   struct todo_list todo_list = TODO_LIST_INIT;
-   struct strbuf *buf = _list.buf;
+   struct strbuf *buf = _list->buf;
size_t offset = 0, commands_len = strlen(commands);
int i, insert;
  
-	if (strbuf_read_file(_list.buf, todo_file, 0) < 0)

-   return error(_("could not read '%s'."), todo_file);
-
-   if (todo_list_parse_insn_buffer(todo_list.buf.buf, _list)) {
-   todo_list_release(_list);
-   return error(_("unusable todo list: '%s'"), todo_file);
-   }
-
/*
 * Insert  after every pick. Here, fixup/squash chains
 * are considered part of the pick, so we insert the commands *after*
 * those chains if there are any.
 */
insert = -1;
-   for (i = 0; i < todo_list.nr; i++) {
-   enum todo_command command = todo_list.items[i].command;
+   for (i = 0; i < todo_list->nr; i++) {
+   enum todo_command command = todo_list->items[i].command;
  
  		if (insert >= 0) {

/* skip fixup/squash chains */
@@ -4408,7 +4395,7 @@ int sequencer_add_exec_commands(const char *commands)
continue;
}
strbuf_insert(buf,
- todo_list.items[insert].offset_in_buf +
+ todo_list->items[insert].offset_in_buf +
  offset, commands, commands_len);
offset += commands_len;
insert = -1;
@@ -4419,15 +4406,38 @@ int sequencer_add_exec_commands(const char *commands)
}
  
  	/* insert or append final  */

-   if (insert >= 0 && insert < todo_list.nr)
-   strbuf_insert(buf, todo_list.items[insert].offset_in_buf +
+   if (insert >= 0 && insert < todo_list->nr)
+   strbuf_insert(buf, todo_list->items[insert].offset_in_buf +
  offset, commands, commands_len);
else if (insert >= 0 || !offset)
strbuf_add(buf, commands, commands_len);
  
-	i = write_message(buf->buf, buf->len, todo_file, 0);

+   if (todo_list_parse_insn_buffer(buf->buf, todo_list))
+   BUG("unusable todo list");}


It is a shame to have to re-parse the todo list, I wonder how difficult 
it would be to adjust the todo_list item array as the exec commands are 
inserted. The same applies to the next couple of patches


Best Wishes

Phillip


+
+/*
+ * Add commands after pick and (series of) squash/fixup commands
+ * in the todo list.
+ */
+int sequencer_add_exec_commands(const char *commands)
+{
+   const char *todo_file = rebase_path_todo();
+   struct todo_list todo_list = TODO_LIST_INIT;
+   int res;
+
+   if (strbuf_read_file(_list.buf, todo_file, 0) < 0)
+   return error(_("could not read '%s'."), todo_file);
+
+   if (todo_list_parse_insn_buffer(todo_list.buf.buf, _list)) {
+   todo_list_release(_list);
+   return error(_("unusable todo list: '%s'"), todo_file);
+   }
+
+   todo_list_add_exec_commands(_list, commands);
+   res = write_message(todo_list.buf.buf, todo_list.buf.len, todo_file, 0);
todo_list_release(_list);
-   return i;
+
+   return res;
  }
  
  int transform_todos(unsigned flags)






Re: [PATCH 03/15] sequencer: refactor check_todo_list() to work on a todo_list

2018-10-11 Thread Phillip Wood

Hi Alban

I like the direction that this series is going

On 07/10/2018 20:54, Alban Gruin wrote:

This refactors check_todo_list() to work on a todo_list to avoid
redundant reads and writes to the disk.  The function is renamed
todo_list_check().

As rebase -p still need to check the todo list from the disk, a new
function is introduced, check_todo_list_from_file().  It reads the file
from the disk, parses it, pass the todo_list to todo_list_check(), and
writes it back to the disk.


After this commit we still use check_todo_list_from_file() even without 
'-p', it would be clearer if this (and the following commits) mentioned 
that the new function will be wired up later.



As get_missing_commit_check_level() and the enum
missing_commit_check_level are no longer needed inside of sequencer.c,
they are moved to rebase-interactive.c, and made static again.

Signed-off-by: Alban Gruin 
---
  builtin/rebase--interactive.c |   2 +-
  rebase-interactive.c  | 106 -
  rebase-interactive.h  |   1 +
  sequencer.c   | 109 --
  sequencer.h   |   9 +--
  5 files changed, 120 insertions(+), 107 deletions(-)

diff --git a/builtin/rebase--interactive.c b/builtin/rebase--interactive.c
index a2ab68ed06..ea1f93ccb6 100644
--- a/builtin/rebase--interactive.c
+++ b/builtin/rebase--interactive.c
@@ -255,7 +255,7 @@ int cmd_rebase__interactive(int argc, const char **argv, 
const char *prefix)
ret = transform_todos(flags);
break;
case CHECK_TODO_LIST:
-   ret = check_todo_list();
+   ret = check_todo_list_from_file();
break;
case REARRANGE_SQUASH:
ret = rearrange_squash();
diff --git a/rebase-interactive.c b/rebase-interactive.c
index 0f4119cbae..ef8540245d 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -1,8 +1,32 @@
  #include "cache.h"
  #include "commit.h"
-#include "rebase-interactive.h"
  #include "sequencer.h"
+#include "rebase-interactive.h"
  #include "strbuf.h"
+#include "commit-slab.h"
+#include "config.h"
+
+enum missing_commit_check_level {
+   MISSING_COMMIT_CHECK_IGNORE = 0,
+   MISSING_COMMIT_CHECK_WARN,
+   MISSING_COMMIT_CHECK_ERROR
+};
+
+static enum missing_commit_check_level get_missing_commit_check_level(void)
+{
+   const char *value;
+
+   if (git_config_get_value("rebase.missingcommitscheck", ) ||
+   !strcasecmp("ignore", value))
+   return MISSING_COMMIT_CHECK_IGNORE;
+   if (!strcasecmp("warn", value))
+   return MISSING_COMMIT_CHECK_WARN;
+   if (!strcasecmp("error", value))
+   return MISSING_COMMIT_CHECK_ERROR;
+   warning(_("unrecognized setting %s for option "
+ "rebase.missingCommitsCheck. Ignoring."), value);
+   return MISSING_COMMIT_CHECK_IGNORE;
+}
  
  void append_todo_help(unsigned edit_todo, unsigned keep_empty,

  struct strbuf *buf)
@@ -88,3 +112,83 @@ int edit_todo_list(unsigned flags)
  
  	return 0;

  }
+
+define_commit_slab(commit_seen, unsigned char);
+/*
+ * Check if the user dropped some commits by mistake
+ * Behaviour determined by rebase.missingCommitsCheck.
+ * Check if there is an unrecognized command or a
+ * bad SHA-1 in a command.
+ */
+int todo_list_check(struct todo_list *old_todo, struct todo_list *new_todo)
+{
+   enum missing_commit_check_level check_level = 
get_missing_commit_check_level();
+   struct strbuf missing = STRBUF_INIT;
+   int advise_to_edit_todo = 0, res = 0, i;
+   struct commit_seen commit_seen;
+
+   init_commit_seen(_seen);
+
+   res = todo_list_parse_insn_buffer(old_todo->buf.buf, old_todo);


While it made sense to parse the old and new todo lists here when they 
were being loaded from a file, I think it is confusing to keep that 
behavior now it is being passed struct todo_lists. When we get to patch 
8 the newly read commit list is passed to this function without being 
parsed which confused me until I came back and check what was going on here.


Best Wishes

Phillip


+   if (!res)
+   res = todo_list_parse_insn_buffer(new_todo->buf.buf, new_todo);
+   if (res) {
+   advise_to_edit_todo = res;
+   goto leave_check;
+   }
+
+   if (check_level == MISSING_COMMIT_CHECK_IGNORE)
+   goto leave_check;
+
+   /* Mark the commits in git-rebase-todo as seen */
+   for (i = 0; i < new_todo->nr; i++) {
+   struct commit *commit = new_todo->items[i].commit;
+   if (commit)
+   *commit_seen_at(_seen, commit) = 1;
+   }
+
+   /* Find commits in git-rebase-todo.backup yet unseen */
+   for (i = old_todo->nr - 1; i >= 0; i--) {
+   struct todo_item *item = old_todo->items + i;
+   struct commit *commit = item->commit;
+

Re: [PATCH] revert & cherry-pick: run git gc --auto

2018-10-11 Thread Phillip Wood

Hi Ævar
On 11/10/2018 11:08, Ævar Arnfjörð Bjarmason wrote:


On Thu, Oct 11 2018, Phillip Wood wrote:


Hi Ævar

On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:

Expand on the work started in 095c741edd ("commit: run git gc --auto
just before the post-commit hook", 2018-02-28) to run "gc --auto" in
more commands where new objects can be created.

The notably missing commands are now "rebase" and "stash". Both are
being rewritten in C, so any use of "gc --auto" there can wait for
that.


If cherry-pick, revert or 'rebase -i' edit the commit message then they
fork 'git commit' so gc --auto will be run there anyway.


Yeah it seems I totally screwed up the testing for this patch, first it
doesn't even compile because I'm not including run-command.h, I *did*
fix that, but while wrangling a few things didn't commit that *sigh*.

And yeah, there's some invocations where we now run gc --auto twice,
i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
but on cherry-pick --edit.

So yeah, this really needs to be re-thought.


I wonder if it would be better to call 'gc --auto' from sequencer.c at
the end of a string of successful picks, that would cover cherry-pick,
'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
calling 'gc --auto' until the very end, rather than every time we stop
for an edit but that is probably more trouble than it is worth.


That seems a lot better indeed. I.e. running it from the sequencer. I do
wonder if there should be some smarts about running it in the middle of
a sequence, i.e. think of a case where we're rebasing 10k commits, which
is a gc need similar to what happens in the middle of "git svn
clone". So maybe something where we gc --auto in the sequencer for every
Nth commit, and at the end.


That sounds like a good idea. It would be nice if need_to_gc() was in 
libgit, then we could avoid the cost of forking unless we actually need 
to gc. Looking at builtin/gc.c there seem to be quite a few global 
variables so transforming it to library code may not be that straight 
forward.


Best Wishes

Phillip





Signed-off-by: Ævar Arnfjörð Bjarmason 
---

After reading the "Users are encouraged to run this task..." paragraph
in the git-gc manpage I was wondering if due to gc --auto all over the
place now (including recently in git-commit with a patch of mine) if
we shouldn't change that advice.

I'm meaning to send some doc changes to git-gc.txt, but in the
meantime let's address this low-hanging fruit of running gc --auto
when we revert or cherry-pick commits, which can like git-commit
create a significant amount of loose objects.

  builtin/revert.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/builtin/revert.c b/builtin/revert.c
index 9a66720cfc..1b20902910 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char 
*prefix)
  {
struct replay_opts opts = REPLAY_OPTS_INIT;
int res;
+   const char *argv_gc_auto[] = {"gc", "--auto", NULL};

if (isatty(0))
opts.edit = 1;
@@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char 
*prefix)
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("revert failed"));
+   run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
return res;
  }

@@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const 
char *prefix)
  {
struct replay_opts opts = REPLAY_OPTS_INIT;
int res;
+   const char *argv_gc_auto[] = {"gc", "--auto", NULL};

opts.action = REPLAY_PICK;
sequencer_init_config();
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("cherry-pick failed"));
+   run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
return res;
  }





Re: [PATCH] revert & cherry-pick: run git gc --auto

2018-10-11 Thread Phillip Wood
Hi Ævar

On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
> Expand on the work started in 095c741edd ("commit: run git gc --auto
> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
> more commands where new objects can be created.
> 
> The notably missing commands are now "rebase" and "stash". Both are
> being rewritten in C, so any use of "gc --auto" there can wait for
> that.

If cherry-pick, revert or 'rebase -i' edit the commit message then they
fork 'git commit' so gc --auto will be run there anyway. I wonder if it
would be better to call 'gc --auto' from sequencer.c at the end of a
string of successful picks, that would cover cherry-pick, 'rebase -iu'
and revert. With 'rebase -i' it might be nice to avoid calling 'gc
--auto' until the very end, rather than every time we stop for an edit
but that is probably more trouble than it is worth.

Best Wishes

Phillip

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> After reading the "Users are encouraged to run this task..." paragraph
> in the git-gc manpage I was wondering if due to gc --auto all over the
> place now (including recently in git-commit with a patch of mine) if
> we shouldn't change that advice.
> 
> I'm meaning to send some doc changes to git-gc.txt, but in the
> meantime let's address this low-hanging fruit of running gc --auto
> when we revert or cherry-pick commits, which can like git-commit
> create a significant amount of loose objects.
> 
>  builtin/revert.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/builtin/revert.c b/builtin/revert.c
> index 9a66720cfc..1b20902910 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char 
> *prefix)
>  {
>   struct replay_opts opts = REPLAY_OPTS_INIT;
>   int res;
> + const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>  
>   if (isatty(0))
>   opts.edit = 1;
> @@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char 
> *prefix)
>   res = run_sequencer(argc, argv, );
>   if (res < 0)
>   die(_("revert failed"));
> + run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>   return res;
>  }
>  
> @@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const 
> char *prefix)
>  {
>   struct replay_opts opts = REPLAY_OPTS_INIT;
>   int res;
> + const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>  
>   opts.action = REPLAY_PICK;
>   sequencer_init_config();
>   res = run_sequencer(argc, argv, );
>   if (res < 0)
>   die(_("cherry-pick failed"));
> + run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>   return res;
>  }
> 



Re: [PATCH v2 2/2] rebase -i: introduce the 'break' command

2018-10-11 Thread Phillip Wood
Hi Johannes

I think this would be a useful addition to rebase, there's one small
comment below.

On 10/10/2018 09:53, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin 
> 
> The 'edit' command can be used to cherry-pick a commit and then
> immediately drop out of the interactive rebase, with exit code 0, to let
> the user amend the commit, or test it, or look around.
> 
> Sometimes this functionality would come in handy *without*
> cherry-picking a commit, e.g. to interrupt the interactive rebase even
> before cherry-picking a commit, or immediately after an 'exec' or a
> 'merge'.
> 
> This commit introduces that functionality, as the spanking new 'break'
> command.
> 
> Suggested-by: Stefan Beller 
> Signed-off-by: Johannes Schindelin 
> ---
>  Documentation/git-rebase.txt | 3 +++
>  rebase-interactive.c | 1 +
>  sequencer.c  | 7 ++-
>  t/lib-rebase.sh  | 2 +-
>  t/t3418-rebase-continue.sh   | 9 +
>  5 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index db2faca73c..5bed1da36b 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -561,6 +561,9 @@ By replacing the command "pick" with the command "edit", 
> you can tell
>  the files and/or the commit message, amend the commit, and continue
>  rebasing.
>  
> +To interrupt the rebase (just like an "edit" command would do, but without
> +cherry-picking any commit first), use the "break" command.
> +
>  If you just want to edit the commit message for a commit, replace the
>  command "pick" with the command "reword".
>  
> diff --git a/rebase-interactive.c b/rebase-interactive.c
> index 0f4119cbae..78f3263fc1 100644
> --- a/rebase-interactive.c
> +++ b/rebase-interactive.c
> @@ -14,6 +14,7 @@ void append_todo_help(unsigned edit_todo, unsigned 
> keep_empty,
>  "s, squash  = use commit, but meld into previous commit\n"
>  "f, fixup  = like \"squash\", but discard this commit's log 
> message\n"
>  "x, exec  = run command (the rest of the line) using shell\n"
> +"b, break = stop here (continue rebase later with 'git rebase --continue')\n"
>  "d, drop  = remove commit\n"
>  "l, label  = label current HEAD with a name\n"
>  "t, reset  = reset HEAD to a label\n"
> diff --git a/sequencer.c b/sequencer.c
> index 8dd6db5a01..b209f8af46 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1416,6 +1416,7 @@ enum todo_command {
>   TODO_SQUASH,
>   /* commands that do something else than handling a single commit */
>   TODO_EXEC,
> + TODO_BREAK,
>   TODO_LABEL,
>   TODO_RESET,
>   TODO_MERGE,
> @@ -1437,6 +1438,7 @@ static struct {
>   { 'f', "fixup" },
>   { 's', "squash" },
>   { 'x', "exec" },
> + { 'b', "break" },
>   { 'l', "label" },
>   { 't', "reset" },
>   { 'm', "merge" },
> @@ -1964,7 +1966,7 @@ static int parse_insn_line(struct todo_item *item, 
> const char *bol, char *eol)
>   padding = strspn(bol, " \t");
>   bol += padding;
>  
> - if (item->command == TODO_NOOP) {
> + if (item->command == TODO_NOOP || item->command == TODO_BREAK) {
>   if (bol != eol)
>   return error(_("%s does not accept arguments: '%s'"),
>command_to_string(item->command), bol);
> @@ -3293,6 +3295,9 @@ static int pick_commits(struct todo_list *todo_list, 
> struct replay_opts *opts)
>   unlink(rebase_path_stopped_sha());
>   unlink(rebase_path_amend());
>   delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
> +
> + if (item->command == TODO_BREAK)
> + break;

Normally when rebase stops it prints a message to say where it has got
to and how to continue, it might be useful to do the same here

Best Wishes

Phillip

>   }
>   if (item->command <= TODO_SQUASH) {
>   if (is_rebase_i(opts))
> diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
> index 25a77ee5cb..584604ee63 100644
> --- a/t/lib-rebase.sh
> +++ b/t/lib-rebase.sh
> @@ -49,7 +49,7 @@ set_fake_editor () {
>   case $line in
>   squash|fixup|edit|reword|drop)
>   action="$line";;
> - exec*)
> + exec*|break)
>   echo "$line" | sed 's/_/ /g' >> "$1";;
>   "#")
>   echo '# comment' >> "$1";;
> diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
> index c145dbac38..185a491089 100755
> --- a/t/t3418-rebase-continue.sh
> +++ b/t/t3418-rebase-continue.sh
> @@ -239,5 +239,14 @@ test_rerere_autoupdate -m
>  GIT_SEQUENCE_EDITOR=: && export GIT_SEQUENCE_EDITOR
>  test_rerere_autoupdate -i
>  test_rerere_autoupdate --preserve-merges
> +unset GIT_SEQUENCE_EDITOR
> +
> +test_expect_success 'the todo command "break" works' '
> + rm -f execed 

Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-10 Thread Phillip Wood

On 09/10/2018 22:10, Stefan Beller wrote:

As I said above I've more or less come to the view that the correctness
of pythonic indentation is orthogonal to move detection as it affects
all additions, not just those that correspond to moved lines.


Makes sense.


Right so are you happy for we to re-roll with a single 
allow-indentation-change mode based on my RFC?





What is your use case, what kind of content do you process that
this patch would help you?


I wrote this because I was re-factoring some shell code than was using a
indentation step of four spaces but with tabs in the leading indentation
which the current mode does not handle.


Ah that is good to know.

I was thinking whether we want to generalize the move detection into a more
generic "detect and fade out uninteresting things" and not just focus on white
spaces (but these are most often the uninteresting things).

Over the last year we had quite a couple of large refactorings, that
would have helped by that:
* For example the hash transition plan had a lot of patches that
   were basically s/char *sha1/struct object oid/ or some variation thereof.
* Introducing struct repository

I used the word diff to look at those patches, which helped a lot, but
maybe a mode that would allow me to mark this specific replacement
uninteresting would be even better.
Maybe this can be done as a piggyback on top of the move detection as
a "move in place, but with uninteresting pattern". The problem of this
is that the pattern needs to be accounted for when hashing the entries
into the hashmaps, which is easy when doing white spaces only.


Yes the I like the idea. Yesterday I was looking at Alban's patches to 
refactor the todo list handling for rebase -i and there are a lot of '.' 
to '->' changes which weren't particularly interesting though at least 
diff-highlight made it clear if that was the only change on a line. 
Incidentally --color-moved was very useful for looking at that series.



+   if (a->s == DIFF_SYMBOL_PLUS)
+   *delta = la - lb;
+   else
+   *delta = lb - la;


When writing the original feature I had reasons
not to rely on the symbol, as you could have
moved things from + to - (or the other way round)
and added or removed indentation. That is what the
`current_longer` is used for. But given that you only
count here, we can have negative numbers, so it
would work either way for adding or removing indentation.

But then, why do we need to have a different sign
depending on the sign of the line?


The check means that we get the same delta whichever way round the lines
are compared. I think I added this because without it the highlighting
gets broken if there is increase in indentation followed by an identical
decrease on the next line.


But wouldn't we want to get that highlighted?
I do not quite understand the scenario, yet. Are both indented
and dedented part of the same block?


With --color-moved=zebra the indented lines and the de-indented lines 
should be different colors, without the test they both ended up in the 
same block.


Best Wishes

Phillip



+   } else {
+   BUG("no color_moved_ws_allow_indentation_change set");


Instead of the BUG here could we have a switch/case (or if/else)
covering the complete space of delta->have_string instead?
Then we would not leave a lingering bug in the code base.


I'm not sure what you mean, we cover all the existing
color_moved_ws_handling values, I added the BUG() call to pick up future
omissions if another mode is added. (If we go for a single mode none of
this matters)


Ah, makes sense!





Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Phillip Wood

On 10/10/2018 06:43, Junio C Hamano wrote:

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

* pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
  - diff --color-moved: fix a memory leak
  - diff --color-moved-ws: fix another memory leak
  - diff --color-moved-ws: fix a memory leak
  - diff --color-moved-ws: fix out of bounds string access
  - diff --color-moved-ws: fix double free crash

  Various fixes to "diff --color-moved-ws".

  What's the status of this topic?


I think it is ready for next - Stefan was happy with the last iteration.

Best Wishes

Phillip


Re: [PATCH 3/3] sequencer: use read_author_script()

2018-10-10 Thread Phillip Wood
On 14/09/2018 01:31, Eric Sunshine wrote:
> On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood  
> wrote:
>> Use the new function to read the author script, updating
>> read_env_script() and read_author_ident(). This means we now have a
>> single code path that reads the author script and uses sq_dequote() to
>> dequote it. This fixes potential problems with user edited scripts
>> as read_env_script() which did not track quotes properly.
>> [...]
>> Signed-off-by: Phillip Wood 
>> ---
>>  /*
>>   * Read a list of environment variable assignments (such as the 
>> author-script
>>   * file) into an environment block. Returns -1 on error, 0 otherwise.
>>   */
> 
> According to this comment, this function is capable of parsing a file
> of arbitrary "NAME=Value" lines, and indeed the original code does
> just that, but...
> 
>>  static int read_env_script(struct argv_array *env)
>>  {
>> +   char *name, *email, *date;
>>
>> -   if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
>> +   if (read_author_script(rebase_path_author_script(),
>> +  , , , 0))
> 
> ...the new implementation is able to handle only GIT_AUTHOR_NAME,
> GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE, in exactly that order.
> 
> This seems like a pretty serious (and possibly buggy) change of
> behavior, and makes the function much less useful (in general). Is it
> true that it will only ever be used for files containing that limited
> set of names? If so, the behavior change deserves mention in the
> commit message, the function comment needs updating, and the function
> itself probably ought to be renamed.

You're right the change in behavior should be mentioned explicitly, I'd
not thought about it in those terms. I'm not sure if the change is
buggy, this code is what am uses for its author script handling. To me
the point of the author-script file is to set the author details, not to
set arbitrary environment variables. We have already significantly
reduced what someone can do with this file in the transition from shell
to C as we no longer support arbitrary shell code in the file. I'd
rather try and reuse the existing code from am unless someone can
demonstrate an active use for something more general. (I'm still not
sure what use editing the author-script is - it is only of use if the
rebase stops for conflicts, it cannot be used to change the author of an
arbitrary set of commits)

>> +   strbuf_addstr(, "GIT_AUTHOR_NAME=");
>> +   strbuf_addstr(, name);
>> +   argv_array_push(env, script.buf);
>> +   strbuf_reset();
>> +   strbuf_addstr(, "GIT_AUTHOR_EMAIL=");
>> +   strbuf_addstr(, email);
>> +   argv_array_push(env, script.buf);
>> +   strbuf_reset();
>> +   strbuf_addstr(, "GIT_AUTHOR_DATE=");
>> +   strbuf_addstr(, date);
>> +   argv_array_push(env, script.buf);
>> +   strbuf_release();
> 
> Mentioned earlier[1], this can all collapse down to:
> 
> argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
> argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
> argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
> 
> However, it's unfortunate that this manual and hard-coded
> reconstruction is needed at all. If you restructure the factoring of
> this patch series, such ugliness can be avoided altogether. For
> instance, the series could be structured like this:
> 
> 1. Introduce a general-purpose function for reading a file containing
> arbitrary "NAME=Value" lines (not carrying about specific key names or
> their order) and returning them in some data structure (perhaps via
> 'string_list' as parse_key_value_squoted() in patch 2/3 does).
> 
> 2. Build read_author_script() atop #1, making it expect and extract
> GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE (possibly in
> that order, or possibly not if we don't care).
> 
> 3. Retrofit existing parsers to call one of those two functions (this
> step may happen over several patches). So, for instance,
> read_env_script() would call the generic parser from #1, whereas
> sequencer.c:read_author_ident() would call the more specific parser
> from #2.

That plan requires all new code rather than reusing tried and tested
code from am. Furthermore I'm not sure it has any advantage as far as
users are concerned. My aim with this series was to try and do something
fairly simple that reused the parsing from am, rather than build a whole
new system with its own bugs.

> More below...
> 
>> @@ -790,54 +771,25 @@ static char *get_author(const char *message)
>>  /* Read author-script a

Re: [PATCH 2/3] add read_author_script() to libgit

2018-10-10 Thread Phillip Wood
Hi Eric

Thanks for looking at this series, sorry it has taken so long for me to
reply.

On 14/09/2018 00:49, Eric Sunshine wrote:
> On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood  
> wrote:
>> Add read_author_script() to sequencer.c based on the implementation in
>> builtin/am.c and update read_am_author_script() to use
>> read_author_script(). The sequencer code that reads the author script
>> will be updated in the next commit.
>>
>> Signed-off-by: Phillip Wood 
>> ---
>> diff --git a/builtin/am.c b/builtin/am.c
>> @@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct 
>> string_list *list)
>>  static int read_am_author_script(struct am_state *state)
>>  {
> 
> This function returns 'int'...
> 
>> const char *filename = am_path(state, "author-script");
>> +   if (read_author_script(filename, >author_name,
>> +  >author_email, >author_date, 1))
>> +   exit(128);
> 
> ...but only ever exit()s...
> 
>> +   return 0;
> 
> ... or returns 0.
> 
>>  }
> 
> It's a little disturbing that it now invokes exit() directly rather
> than calling die() since die() may involve extra functionality before
> actually exiting.
> 
> What if, instead of exit()ing directly, you drop the conditional and
> instead return the value of read_author_script():
> 
> return read_author_script(...);
> 
> and let the caller deal with the zero or non-zero return value as
> usual? (True, you'd get two error messages upon failure rather than
> just one, but that's not necessarily a bad thing.)
> 
> A possibly valid response is that such a change is outside the scope
> of this series since the original code shared this odd architecture of
> sometimes returning 0, sometimes -1, and sometimes die()ing.

My aim was to try and to keep the changes to a minimum as this patch
isn't about changing the odd architecture of the original. I could add a
follow up patch that cleans things up as you suggest.

Best Wishes

Phillip



[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-09 Thread Phillip Wood
Hi Stefan

Thanks for all your comments on this, they've been really helpful.

On 25/09/2018 02:07, Stefan Beller wrote:
> On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood  
> wrote:
>>
>> From: Phillip Wood 
>>
>> This adds another mode for highlighting lines that have moved with an
>> indentation change. Unlike the existing
>> --color-moved-ws=allow-indentation-change setting this mode uses the
>> visible change in the indentation to group lines, rather than the
>> indentation string.
> 
> Wow! Thanks for putting this RFC out.
> My original vision was to be useful to python users as well,
> which counts 1 tab as 8 spaces IIUC.
> 
> The "visual" indentation you mention here sounds like
> a tab is counted as "up to the next position of (n-1) % 8",
> i.e. stop at positions 8, 16, 24... which would not be pythonic,
> but useful in e.g. our code base.

The docs for python2 state[1]

  Leading whitespace (spaces and tabs) at the beginning of a logical
  line is used to compute the indentation level of the line, which in
  turn is used to determine the grouping of statements.

  First, tabs are replaced (from left to right) by one to eight spaces
  such that the total number of characters up to and including the
  replacement is a multiple of eight (this is intended to be the same
  rule as used by Unix). The total number of spaces preceding the
  first non-blank character then determines the line’s
  indentation. Indentation cannot be split over multiple physical
  lines using backslashes; the whitespace up to the first backslash
  determines the indentation.

As I understand it that fits with the "visual" indentation implemented
by this patch.

For python3 adds a third paragraph[2]

  Indentation is rejected as inconsistent if a source file mixes tabs
  and spaces in a way that makes the meaning dependent on the worth of
  a tab in spaces; a TabError is raised in that case.

My impression is that people generally avoid mixing tabs and spaces in
python3 code, in which case I wonder if the "visual" indentation
combined with a suitable setting for core.whitespace to highlight
erroneous tabs/spaces would be enough. (I'm not a python programmer so I
could be completely wrong on that)

In any case the more I think about it the more convinced I am that
having a move detection mode for "pythonic" indentation is a mistake. If
a line is added with dodgy indentation then it is a problem whether or
not it has been moved so I think this should be handled by the
whitespace error highlighting. This would allow a single mode for move
detection with an indentation change.

[1] https://docs.python.org/2.7/reference/lexical_analysis.html#indentation
[2] https://docs.python.org/3.7/reference/lexical_analysis.html#indentation

>> This means it works with files that use a mix of
>> tabs and spaces for indentation and can cope with whitespace errors
>> where there is a space before a tab
> 
> Cool!
> 
>> (it's the job of
>> --ws-error-highlight to deal with those errors, it should affect the
>> move detection).
> 
> Not sure I understand this side note. So --ws-error-highlight can
> highlight them, but the move detection should *not*(?) be affected
> by the highlighted parts, or it should do things differently on
> whether  --ws-error-highlight is given?

I just meant that the move detection should pretend the whitespace
errors do not exist.

>> It will also group the lines either
>> side of a blank line if their indentation change matches so short
>> lines followed by a blank line followed by more lines with the same
>> indentation change will be correctly highlighted.
> 
> That sounds very useful (at least for my editor, that strips
> blank lines to be empty lines), but I would think this feature is
> worth its own commit/patch.
> 
> I wonder how much this feature is orthogonal to the existing
> problem of detecting the moved indented blocks (existing
> allow-indentation-change vs the new feature discussed first
> above)

It only works if the blank lines get moved with the non-blank lines
around it, then it matches the normal moved behavior I think. I'd like
to have it include blank context lines where the lines either side have
the same indentation change but that is trickier to implement.

>>
>> This is a RFC as there are a number of questions about how to proceed
>> from here:
>>  1) Do we need a second option or should this implementation replace
>> --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>> has any advantages for some people. There seems to have been an
>> intention [1] to get it working with mixes of tabs and spaces but
>> nothing ever came of it.
> 
> Oh, yeah, I was working on that, b

Re: [PATCH v4 4/4] rebase --skip: clean up commit message after a failed fixup/squash

2018-10-08 Thread Phillip Wood

Hi Johannes

On 02/10/2018 14:50, Johannes Schindelin wrote:

Hi Phillip,

[sorry, I just got to this mail now]


Don't worry, I'm impressed you remembered it, I'd completely forgotten 
about it.




On Sun, 6 May 2018, Phillip Wood wrote:


On 27/04/18 21:48, Johannes Schindelin wrote:


During a series of fixup/squash commands, the interactive rebase builds
up a commit message with comments. This will be presented to the user in
the editor if at least one of those commands was a `squash`.

In any case, the commit message will be cleaned up eventually, removing
all those intermediate comments, in the final step of such a
fixup/squash chain.

However, if the last fixup/squash command in such a chain fails with
merge conflicts, and if the user then decides to skip it (or resolve it
to a clean worktree and then continue the rebase), the current code
fails to clean up the commit message.

This commit fixes that behavior.

The fix is quite a bit more involved than meets the eye because it is
not only about the question whether we are `git rebase --skip`ing a
fixup or squash. It is also about removing the skipped fixup/squash's
commit message from the accumulated commit message. And it is also about
the question whether we should let the user edit the final commit
message or not ("Was there a squash in the chain *that was not
skipped*?").

For example, in this case we will want to fix the commit message, but
not open it in an editor:

pick<- succeeds
fixup   <- succeeds
squash  <- fails, will be skipped

This is where the newly-introduced `current-fixups` file comes in real
handy. A quick look and we can determine whether there was a non-skipped
squash. We only need to make sure to keep it up to date with respect to
skipped fixup/squash commands. As a bonus, we can even avoid committing
unnecessarily, e.g. when there was only one fixup, and it failed, and
was skipped.

To fix only the bug where the final commit message was not cleaned up
properly, but without fixing the rest, would have been more complicated
than fixing it all in one go, hence this commit lumps together more than
a single concern.

For the same reason, this commit also adds a bit more to the existing
test case for the regression we just fixed.

The diff is best viewed with --color-moved.

Signed-off-by: Johannes Schindelin 
---
  sequencer.c| 113 -
  t/t3418-rebase-continue.sh |  35 ++--
  2 files changed, 131 insertions(+), 17 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 56166b0d6c7..cec180714ef 100644
--- a/sequencer.c
+++ b/sequencer.c
[...]
@@ -2804,19 +2801,107 @@ static int commit_staged_changes(struct replay_opts 
*opts)
if (get_oid_hex(rev.buf, _amend))
return error(_("invalid contents: '%s'"),
rebase_path_amend());
-   if (oidcmp(, _amend))
+   if (!is_clean && oidcmp(, _amend))
return error(_("\nYou have uncommitted changes in your "
   "working tree. Please, commit them\n"
   "first and then run 'git rebase "
   "--continue' again."));
+   /*
+* When skipping a failed fixup/squash, we need to edit the
+* commit message, the current fixup list and count, and if it
+* was the last fixup/squash in the chain, we need to clean up
+* the commit message and if there was a squash, let the user
+* edit it.
+*/
+   if (is_clean && !oidcmp(, _amend) &&
+   opts->current_fixup_count > 0 &&
+   file_exists(rebase_path_stopped_sha())) {
+   const char *p = opts->current_fixups.buf;
+   int len = opts->current_fixups.len;
+
+   opts->current_fixup_count--;
+   if (!len)
+   BUG("Incorrect current_fixups:\n%s", p);
+   while (len && p[len - 1] != '\n')
+   len--;
+   strbuf_setlen(>current_fixups, len);
+   if (write_message(p, len, rebase_path_current_fixups(),
+ 0) < 0)
+   return error(_("could not write file: '%s'"),
+rebase_path_current_fixups());
+
+   /*
+* If a fixup/squash in a fixup/squash chain failed, the
+* commit message is already correct, no need to commit
+* it again.
+*
+   

[PATCH v2 4/5] diff --color-moved-ws: fix another memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

This is obvious in retrospect, it was found with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diff.c b/diff.c
index 9bf41bad0d..69f6309db6 100644
--- a/diff.c
+++ b/diff.c
@@ -969,6 +969,8 @@ static void pmb_advance_or_null_multi_match(struct 
diff_options *o,
moved_block_clear([i]);
}
}
+
+   free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
-- 
2.19.0



[PATCH v2 5/5] diff --color-moved: fix a memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Free the hashmap items as well as the hashmap itself. This was found
with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 69f6309db6..100d24b9c4 100644
--- a/diff.c
+++ b/diff.c
@@ -5768,8 +5768,8 @@ static void diff_flush_patch_all_file_pairs(struct 
diff_options *o)
if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
dim_moved_lines(o);
 
-   hashmap_free(_lines, 0);
-   hashmap_free(_lines, 0);
+   hashmap_free(_lines, 1);
+   hashmap_free(_lines, 1);
}
 
for (i = 0; i < esm.nr; i++)
-- 
2.19.0



[PATCH v2 1/5] diff --color-moved-ws: fix double free crash

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Running
  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
results in a crash due to a double free. This happens when two
potential moved blocks start with consecutive lines. As
pmb_advance_or_null_multi_match() advances it copies the ws_delta from
the last matching line to the next. When the first of our consecutive
lines is advanced its ws_delta well be copied to the second,
overwriting the ws_delta of the block containing the second line. Then
when the second line is advanced it will copy the new ws_delta to the
line below it and so on. Eventually one of these blocks will stop
matching and the ws_delta will be freed. From then on the other block
is in a use-after-free state and when it stops matching it will try to
free the ws_delta that has already been freed by the other block.

The solution is to store the ws_delta in the array of potential moved
blocks rather than with the lines. This means that it no longer needs
to be copied around and one block cannot overwrite the ws_delta of
another. Additionally it saves some malloc/free calls as we don't keep
allocating and freeing ws_deltas.

Signed-off-by: Phillip Wood 
---

Notes:
This applies on top of fab01ec52e ("diff: fix
--color-moved-ws=allow-indentation-change", 2018-09-04). Note that the
crash happens with or without that patch, but the mechanism is
slightly different without it.

Changes since v1:
 - updated comments to match new implementation.

 diff.c | 82 --
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/diff.c b/diff.c
index 9393993e33..02d885f039 100644
--- a/diff.c
+++ b/diff.c
@@ -751,7 +751,6 @@ struct moved_entry {
struct hashmap_entry ent;
const struct emitted_diff_symbol *es;
struct moved_entry *next_line;
-   struct ws_delta *wsd;
 };
 
 /**
@@ -768,6 +767,17 @@ struct ws_delta {
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+struct moved_block {
+   struct moved_entry *match;
+   struct ws_delta wsd;
+};
+
+static void moved_block_clear(struct moved_block *b)
+{
+   FREE_AND_NULL(b->wsd.string);
+   b->match = NULL;
+}
+
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
 const struct emitted_diff_symbol *b,
 struct ws_delta *out)
@@ -785,7 +795,7 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 const struct moved_entry *cur,
 const struct moved_entry *match,
-struct moved_entry *pmb,
+struct moved_block *pmb,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
@@ -805,25 +815,24 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
if (strcmp(a, b))
return 1;
 
-   if (!pmb->wsd)
+   if (!pmb->wsd.string)
/*
-* No white space delta was carried forward? This can happen
-* when we exit early in this function and do not carry
-* forward ws.
+* The white space delta is not active? This can happen
+* when we exit early in this function.
 */
return 1;
 
/*
-* The indent changes of the block are known and carried forward in
+* The indent changes of the block are known and stored in
 * pmb->wsd; however we need to check if the indent changes of the
 * current line are still the same as before.
 *
 * To do so we need to compare 'l' to 'cur', adjusting the
 * one of them for the white spaces, depending which was longer.
 */
 
-   wslen = strlen(pmb->wsd->string);
-   if (pmb->wsd->current_longer) {
+   wslen = strlen(pmb->wsd.string);
+   if (pmb->wsd.current_longer) {
c += wslen;
cl -= wslen;
} else {
@@ -873,7 +882,6 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
ret->es = l;
ret->next_line = NULL;
-   ret->wsd = NULL;
 
return ret;
 }
@@ -913,76 +921,72 @@ static void add_lines_to_move_detection(struct 
diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
struct moved_entry *match,
struct hashmap *hm,
-   struct moved_entry **pmb,
+   struct moved_block *pmb,
int pmb_nr)
 {
int i;
for (i = 0; i < pmb_nr; i++) {
-   struct moved_entry *

[PATCH v2 0/5] diff --color-moved-ws: fix double free crash

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Thanks to Stefan and Martin for their comments on v1. This version has
some small changes to patches 1-3 based on that feedback - see the
individual patches for what has changed.

Phillip Wood (5):
  diff --color-moved-ws: fix double free crash
  diff --color-moved-ws: fix out of bounds string access
  diff --color-moved-ws: fix a memory leak
  diff --color-moved-ws: fix another memory leak
  diff --color-moved: fix a memory leak

 diff.c | 95 +-
 1 file changed, 54 insertions(+), 41 deletions(-)

-- 
2.19.0



[PATCH v2 2/5] diff --color-moved-ws: fix out of bounds string access

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

When adjusting the start of the string to take account of the change
in indentation the code was not checking that the string being
adjusted was in fact longer than the indentation change. This was
detected by asan.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since v1:
 - simplified comparison as suggested by Stefan

 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 02d885f039..e492f8b74f 100644
--- a/diff.c
+++ b/diff.c
@@ -840,7 +840,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (strcmp(a, c))
+   if (al != cl || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.0



[PATCH v2 3/5] diff --color-moved-ws: fix a memory leak

2018-10-04 Thread Phillip Wood
From: Phillip Wood 

Don't duplicate the indentation string if we're not going to use it.
This was found with asan.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since v1:
 - simplified code as suggested by Martin Ågren

 diff.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index e492f8b74f..9bf41bad0d 100644
--- a/diff.c
+++ b/diff.c
@@ -786,10 +786,13 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
int d = longer->len - shorter->len;
 
+   if (strncmp(longer->line + d, shorter->line, shorter->len))
+   return 0;
+
out->string = xmemdupz(longer->line, d);
out->current_longer = (a == longer);
 
-   return !strncmp(longer->line + d, shorter->line, shorter->len);
+   return 1;
 }
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
-- 
2.19.0



Re: [PATCH 3/5] diff --color-moved-ws: fix a memory leak

2018-10-03 Thread Phillip Wood
On 02/10/2018 20:08, Stefan Beller wrote:
> On Tue, Oct 2, 2018 at 10:55 AM Phillip Wood  
> wrote:
>>
>> From: Phillip Wood 
>>
>> Don't duplicate the indentation string if we're not going to use it.
>> This was found with asan.
> 
> Makes sense,
> 
> Thanks,
> Stefan
> 
> With compute_ws_delta growing bigger here (and having only one caller),
> I wonder if we want to pass 'match' instead of match->es (and pmb_nr),
> such that the function can also take care of
> pmb[pmb_nr++].match = match;

Wouldn't we still need to increment pmb_nr in the caller though, in
which case I'm not sure it saves much.

> Then the function is not a mere compute_ws_delta, but a whole
> setup_new_pmb(...) thing. Undecided.
> 



Re: [PATCH 2/5] diff --color-moved-ws: fix out of bounds string access

2018-10-03 Thread Phillip Wood
On 02/10/2018 19:58, Stefan Beller wrote:
> On Tue, Oct 2, 2018 at 10:55 AM Phillip Wood  
> wrote:
>>
>> From: Phillip Wood 
>>
>> When adjusting the start of the string to take account of the change
>> in indentation the code was not checking that the string being
>> adjusted was in fact longer than the indentation change. This was
>> detected by asan.
>>
>> Signed-off-by: Phillip Wood 
>> ---
>>  diff.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 5a08d64497..0096bdc339 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -841,7 +841,7 @@ static int cmp_in_block_with_wsd(const struct 
>> diff_options *o,
>> al -= wslen;
>> }
>>
>> -   if (strcmp(a, c))
>> +   if (al < 0 || al != cl || memcmp(a, c, al))
> 
> If (al < 0) then al != cl, so I would think we could drop the first
> part, and only check with al != cl || memcmp

Yes, I couldn't make up my mind weather to add the al < 0 or not but of
course only one of the lengths can ever be negative, I'll remove it

> In theory we should use xdiff_compare_lines here
> as the rest of the lines could have more white space issues,
> but we catch that earlier via a die("color-moved-ws:
> allow-indentation-change cannot be combined with other
> white space modes"), so memcmp is fine.
> 
> Side note: There are comments above this code that seem
> to be also obsolete after patch 1 ("...carried forward in
> pmb->wsd; ...)

Good point I'll go through the comments and update them

Best Wishes

Phillip



Re: [PATCH 1/5] diff --color-moved-ws: fix double free crash

2018-10-03 Thread Phillip Wood
Hi Stefan

Thanks for looking at these patches

On 02/10/2018 19:49, Stefan Beller wrote:
> On Tue, Oct 2, 2018 at 10:55 AM Phillip Wood  
> wrote:
> 
>> The solution is to store the ws_delta in the array of potential moved
>> blocks rather than with the lines. This means that it no longer needs
>> to be copied around and one block cannot overwrite the ws_delta of
>> another. Additionally it saves some malloc/free calls as we don't keep
>> allocating and freeing ws_deltas.
> 
> Another solution would be to duplicate the copy-arounds, that it only
> fixes the double free, but having another layer of abstraction
> (moved block vs line) makes sense as then we don't need to copy
> it forward.
> 
> With this patch applied the diff as mentioned works and having the
> ws deltas with the blocks instead of the
> 
>>
>> Signed-off-by: Phillip Wood 
> 
> 
> 
>>  static void pmb_advance_or_null_multi_match(struct diff_options *o,
> [...]
>> for (i = 0; i < pmb_nr; i++) {
>> if (got_match[i]) {
>> /* Carry the white space delta forward */
> 
> I would think this comment is obsolete as well with this patch?

Yes you're right I should have removed that. As there are some changes
needed to some other comments I'll re-roll

Best Wishes

Phillip

> 
> With or without that nit addressed, this patch is
> Reviewed-by: Stefan Beller 
> 



[PATCH 5/5] diff --color-moved: fix a memory leak

2018-10-02 Thread Phillip Wood
From: Phillip Wood 

Free the hashmap items as well as the hashmap itself. This was found
with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 4464feacf8..94cc1b5592 100644
--- a/diff.c
+++ b/diff.c
@@ -5770,8 +5770,8 @@ static void diff_flush_patch_all_file_pairs(struct 
diff_options *o)
if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
dim_moved_lines(o);
 
-   hashmap_free(_lines, 0);
-   hashmap_free(_lines, 0);
+   hashmap_free(_lines, 1);
+   hashmap_free(_lines, 1);
}
 
for (i = 0; i < esm.nr; i++)
-- 
2.19.0



[PATCH 1/5] diff --color-moved-ws: fix double free crash

2018-10-02 Thread Phillip Wood
From: Phillip Wood 

Running
  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
results in a crash due to a double free. This happens when two
potential moved blocks start with consecutive lines. As
pmb_advance_or_null_multi_match() advances it copies the ws_delta from
the last matching line to the next. When the first of our consecutive
lines is advanced its ws_delta well be copied to the second,
overwriting the ws_delta of the block containing the second line. Then
when the second line is advanced it will copy the new ws_delta to the
line below it and so on. Eventually one of these blocks will stop
matching and the ws_delta will be freed. From then on the other block
is in a use-after-free state and when it stops matching it will try to
free the ws_delta that has already been freed by the other block.

The solution is to store the ws_delta in the array of potential moved
blocks rather than with the lines. This means that it no longer needs
to be copied around and one block cannot overwrite the ws_delta of
another. Additionally it saves some malloc/free calls as we don't keep
allocating and freeing ws_deltas.

Signed-off-by: Phillip Wood 
---
 diff.c | 73 +-
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/diff.c b/diff.c
index 9393993e33..5a08d64497 100644
--- a/diff.c
+++ b/diff.c
@@ -751,7 +751,6 @@ struct moved_entry {
struct hashmap_entry ent;
const struct emitted_diff_symbol *es;
struct moved_entry *next_line;
-   struct ws_delta *wsd;
 };
 
 /**
@@ -768,6 +767,17 @@ struct ws_delta {
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+struct moved_block {
+   struct moved_entry *match;
+   struct ws_delta wsd;
+};
+
+static void moved_block_clear(struct moved_block *b)
+{
+   FREE_AND_NULL(b->wsd.string);
+   b->match = NULL;
+}
+
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
 const struct emitted_diff_symbol *b,
 struct ws_delta *out)
@@ -785,7 +795,7 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 const struct moved_entry *cur,
 const struct moved_entry *match,
-struct moved_entry *pmb,
+struct moved_block *pmb,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
@@ -805,7 +815,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
if (strcmp(a, b))
return 1;
 
-   if (!pmb->wsd)
+   if (!pmb->wsd.string)
/*
 * No white space delta was carried forward? This can happen
 * when we exit early in this function and do not carry
@@ -822,8 +832,8 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
 * one of them for the white spaces, depending which was longer.
 */
 
-   wslen = strlen(pmb->wsd->string);
-   if (pmb->wsd->current_longer) {
+   wslen = strlen(pmb->wsd.string);
+   if (pmb->wsd.current_longer) {
c += wslen;
cl -= wslen;
} else {
@@ -873,7 +883,6 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
ret->es = l;
ret->next_line = NULL;
-   ret->wsd = NULL;
 
return ret;
 }
@@ -913,76 +922,72 @@ static void add_lines_to_move_detection(struct 
diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
struct moved_entry *match,
struct hashmap *hm,
-   struct moved_entry **pmb,
+   struct moved_block *pmb,
int pmb_nr)
 {
int i;
for (i = 0; i < pmb_nr; i++) {
-   struct moved_entry *prev = pmb[i];
+   struct moved_entry *prev = pmb[i].match;
struct moved_entry *cur = (prev && prev->next_line) ?
prev->next_line : NULL;
if (cur && !hm->cmpfn(o, cur, match, NULL)) {
-   pmb[i] = cur;
+   pmb[i].match = cur;
} else {
-   pmb[i] = NULL;
+   pmb[i].match = NULL;
}
}
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
struct moved_entry *match,
struct hashmap *hm,
-   struct moved_entry **pmb,
+   stru

[PATCH 4/5] diff --color-moved-ws: fix another memory leak

2018-10-02 Thread Phillip Wood
From: Phillip Wood 

This is obvious in retrospect, it was found with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diff.c b/diff.c
index efadd05c90..4464feacf8 100644
--- a/diff.c
+++ b/diff.c
@@ -971,6 +971,8 @@ static void pmb_advance_or_null_multi_match(struct 
diff_options *o,
moved_block_clear([i]);
}
}
+
+   free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
-- 
2.19.0



[PATCH 3/5] diff --color-moved-ws: fix a memory leak

2018-10-02 Thread Phillip Wood
From: Phillip Wood 

Don't duplicate the indentation string if we're not going to use it.
This was found with asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 0096bdc339..efadd05c90 100644
--- a/diff.c
+++ b/diff.c
@@ -785,11 +785,15 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
int d = longer->len - shorter->len;
+   int ret = !strncmp(longer->line + d, shorter->line, shorter->len);
+
+   if (!ret)
+   return ret;
 
out->string = xmemdupz(longer->line, d);
out->current_longer = (a == longer);
 
-   return !strncmp(longer->line + d, shorter->line, shorter->len);
+   return ret;
 }
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
-- 
2.19.0



[PATCH 2/5] diff --color-moved-ws: fix out of bounds string access

2018-10-02 Thread Phillip Wood
From: Phillip Wood 

When adjusting the start of the string to take account of the change
in indentation the code was not checking that the string being
adjusted was in fact longer than the indentation change. This was
detected by asan.

Signed-off-by: Phillip Wood 
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 5a08d64497..0096bdc339 100644
--- a/diff.c
+++ b/diff.c
@@ -841,7 +841,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (strcmp(a, c))
+   if (al < 0 || al != cl || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.0



Re: [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change

2018-09-24 Thread Phillip Wood
On 24/09/2018 11:06, Phillip Wood wrote:
> From: Phillip Wood 
> 
> When trying out the new --color-moved-ws=allow-indentation-change I
> was disappointed to discover it did not work if the indentation
> contains a mix of spaces and tabs. This series adds a new option that
> does. It's and RFC as there are some open questions about how to
> proceed, see the last patch for details. Once there's a clearer way
> forward I'll reroll with some documentation changes.
> 

I should have said that this series is based on top of fab01ec52e
("diff: fix --color-moved-ws=allow-indentation-change", 2018-09-04), the
result merges into current master without conflicts.

Best Wishes

Phillip

> Phillip Wood (3):
>   xdiff-interface: make xdl_blankline() available
>   diff.c: remove unused variables
>   diff: add --color-moved-ws=allow-mixed-indentation-change
> 
>  diff.c | 124 -
>  diff.h |   1 +
>  t/t4015-diff-whitespace.sh |  89 ++
>  xdiff-interface.c  |   5 ++
>  xdiff-interface.h  |   5 ++
>  5 files changed, 208 insertions(+), 16 deletions(-)
> 



[RFC PATCH 2/3] diff.c: remove unused variables

2018-09-24 Thread Phillip Wood
From: Phillip Wood 

The string lengths are not used in cmp_in_block_with_wsd() so lets
remove them.

Signed-off-by: Phillip Wood 
---
 diff.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9393993e33..0a652e28d4 100644
--- a/diff.c
+++ b/diff.c
@@ -789,7 +789,6 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
-   int al = cur->es->len, cl = l->len;
const char *a = cur->es->line,
   *b = match->es->line,
   *c = l->line;
@@ -823,13 +822,10 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
 */
 
wslen = strlen(pmb->wsd->string);
-   if (pmb->wsd->current_longer) {
+   if (pmb->wsd->current_longer)
c += wslen;
-   cl -= wslen;
-   } else {
+   else
a += wslen;
-   al -= wslen;
-   }
 
if (strcmp(a, c))
return 1;
-- 
2.19.0



[RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-09-24 Thread Phillip Wood
From: Phillip Wood 

This adds another mode for highlighting lines that have moved with an
indentation change. Unlike the existing
--color-moved-ws=allow-indentation-change setting this mode uses the
visible change in the indentation to group lines, rather than the
indentation string. This means it works with files that use a mix of
tabs and spaces for indentation and can cope with whitespace errors
where there is a space before a tab (it's the job of
--ws-error-highlight to deal with those errors, it should affect the
move detection). It will also group the lines either
side of a blank line if their indentation change matches so short
lines followed by a blank line followed by more lines with the same
indentation change will be correctly highlighted.

This is a RFC as there are a number of questions about how to proceed
from here:
 1) Do we need a second option or should this implementation replace
--color-moved-ws=allow-indentation-change. I'm unclear if that mode
has any advantages for some people. There seems to have been an
intention [1] to get it working with mixes of tabs and spaces but
nothing ever came of it.
 2) If we keep two options what should this option be called, the name
is long and ambiguous at the moment - mixed could refer to mixed
indentation length rather than a mix of tabs and spaces.
 3) Should we support whitespace flags with this mode?
--ignore-space-at-eol and --ignore-cr-at eol would be fairly simple
to support and I can see a use for them, --ignore-all-space and
--ignore-space-change would need some changes to xdiff to allow them
to apply only after the indentation. I think --ignore-blank-lines
would need a bit of work to get it working as well. (Note the
existing mode does not support any of these flags either)

[1] 
https://public-inbox.org/git/CAGZ79kasAqE+=7ciVrdjoRdu0UFjVBr8Ma502nw+3hZL=eb...@mail.gmail.com/

Signed-off-by: Phillip Wood 
---
 diff.c | 122 +
 diff.h |   1 +
 t/t4015-diff-whitespace.sh |  89 +++
 3 files changed, 199 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 0a652e28d4..45f33daa60 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
else if (!strcmp(sb.buf, "ignore-all-space"))
ret |= XDF_IGNORE_WHITESPACE;
else if (!strcmp(sb.buf, "allow-indentation-change"))
-   ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
+   ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+(ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
+   else if (!strcmp(sb.buf, "allow-mixed-indentation-change"))
+   ret = COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE |
+(ret & ~COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE);
else
error(_("ignoring unknown color-moved-ws mode '%s'"), 
sb.buf);
 
@@ -314,6 +318,9 @@ static int parse_color_moved_ws(const char *arg)
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
(ret & XDF_WHITESPACE_FLAGS))
die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
+(ret & XDF_WHITESPACE_FLAGS))
+   die(_("color-moved-ws: allow-mixed-indentation-change cannot be 
combined with other white space modes"));
 
string_list_clear(, 0);
 
@@ -763,11 +770,65 @@ struct moved_entry {
  * comparision is longer than the second.
  */
 struct ws_delta {
-   char *string;
+   union {
+   int delta;
+   char *string;
+   };
unsigned int current_longer : 1;
+   unsigned int have_string : 1;
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
+ const struct emitted_diff_symbol *b,
+ int *delta)
+{
+   unsigned int i = 0, j = 0;
+   int la, lb;
+   int ta = a->flags & WS_TAB_WIDTH_MASK;
+   int tb = b->flags & WS_TAB_WIDTH_MASK;
+   const char *sa = a->line;
+   const char *sb = b->line;
+
+   if (xdiff_is_blankline(sa, a->len, 0) &&
+   xdiff_is_blankline(sb, b->len, 0)) {
+   *delta = INT_MIN;
+   return 1;
+   }
+
+   /* skip any \v \f \r at start of indentation */
+   while (sa[i] == '\f' || sa[i] == '\v' ||
+  (sa[i] == '\r' && i < a->len - 1))
+   i++;
+   while (sb[j] == '\f' || sb[j] == '\v' ||
+  (sb[j] == '\r' &&

[RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change

2018-09-24 Thread Phillip Wood
From: Phillip Wood 

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series adds a new option that
does. It's and RFC as there are some open questions about how to
proceed, see the last patch for details. Once there's a clearer way
forward I'll reroll with some documentation changes.

Phillip Wood (3):
  xdiff-interface: make xdl_blankline() available
  diff.c: remove unused variables
  diff: add --color-moved-ws=allow-mixed-indentation-change

 diff.c | 124 -
 diff.h |   1 +
 t/t4015-diff-whitespace.sh |  89 ++
 xdiff-interface.c  |   5 ++
 xdiff-interface.h  |   5 ++
 5 files changed, 208 insertions(+), 16 deletions(-)

-- 
2.19.0



[RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available

2018-09-24 Thread Phillip Wood
From: Phillip Wood 

This will be used by the move detection code.

Signed-off-by: Phillip Wood 
---
 xdiff-interface.c | 5 +
 xdiff-interface.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 9315bc0ede..eceabfa72d 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -308,6 +308,11 @@ int xdiff_compare_lines(const char *l1, long s1,
return xdl_recmatch(l1, s1, l2, s2, flags);
 }
 
+int xdiff_is_blankline(const char *l1, long s1, long flags)
+{
+   return xdl_blankline(l1, s1, flags);
+}
+
 int git_xmerge_style = -1;
 
 int git_xmerge_config(const char *var, const char *value, void *cb)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 135fc05d72..d0008b016f 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -45,4 +45,9 @@ extern int xdiff_compare_lines(const char *l1, long s1,
  */
 extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
 
+/*
+ * Returns 1 if the line is blank, taking XDF_WHITESPACE_FLAGS into account
+ */
+extern int xdiff_is_blankline(const char *s, long len, long flags);
+
 #endif
-- 
2.19.0



  1   2   3   4   5   6   >