Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-23 Thread Prarit Bhargava



On 10/23/19 1:02 AM, Jeff King wrote:
> On Tue, Oct 22, 2019 at 07:28:47PM -0400, Prarit Bhargava wrote:
> 
>> In many projects the number of contributors is low enough that users know
>> each other and the full email address doesn't need to be displayed.
>> Displaying only the author's username saves a lot of columns on the screen.
>> For example displaying "prarit" instead of "pra...@redhat.com" saves 11
>> columns.
>>
>> Add a "%aU"|"%au" option that outputs the author's email username.
> 
> Like others, this seems potentially useful even if I probably wouldn't
> use it myself. Another more complicated way to think of it would be to
> give a list of domains to omit (so if 90% of the committers are
> @redhat.com, we can skip that, but the one-off contributor from another
> domain gets their fully qualified name.>
> But that's a lot more complicated. I don't mind doing the easy thing
> now, and even if we later grew the more complicated thing, I wouldn't be
> sad to still have this easy one as an option.

FWIW, I went through the exact same thought process as you did and came to the
same conclusion.

> 
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, 
>> char part,
>>  strbuf_add(sb, mail, maillen);
>>  return placeholder_len;
>>  }
>> +if (part == 'u' || part == 'U') {   /* username */
>> +maillen = strstr(s.mail_begin, "@") - s.mail_begin;
>> +strbuf_add(sb, mail, maillen);
>> +return placeholder_len;
>> +}
> 
> What happens if the email doesn't have an "@"? I think you'd either end
> up printing a bunch of extra cruft (because you're not limiting the
> search for "@" to the boundaries from split_ident_line) or you'd
> crash (if there's no "@" at all, and you'd get a huge maillen).
> 
> There's also no need to use the slower strstr() when looking for a
> single character. So perhaps:
> 
>   const char *at = memchr(mail, '@', maillen);
>   if (at)
>   maillen = at - mail;
>   strbuf_add(sb, mail, maillen);

TBH I had assumed that the email address was RFC2822 compliant.  Thanks for the
code.  I've incorporated it into v2.

> 
>> +test_expect_success 'log pretty %an %ae %au' '
> 
> As others noted, this could cover %aU, too (which is broken; you need to
> handle 'U' alongside 'E' and 'N' earlier in format_person_part()).

Whups.  Thanks for the pointer.  Also fixed in v2.

> 
>> +git checkout -b anaeau &&
>> +test_commit anaeau_test anaeau_test_file &&
>> +git log --pretty="%an" > actual &&
>> +git log --pretty="%ae" >> actual &&
>> +git log --pretty="%au" >> actual &&
> 
> Maybe:
> 
>   git log --pretty="%an %ae %au"
> 
> or
> 
>   git log --pretty="%an%n%ae%n%au"
> 
> which is shorter and runs more efficiently?
> 
>> +git log > full &&
>> +name=$(cat full | grep "^Author: " | awk -F "Author: " " { print \$2 } 
>> " | awk -F " <" " { print \$1 } ") &&
>> +email=$(cat full | grep "^Author: " | awk -F "<" " { print \$2 } " | 
>> awk -F ">" " { print \$1 } ") &&
>> +username=$(cat full | grep "^Author: " | awk -F "<" " { print \$2 } " | 
>> awk -F ">" " { print \$1 } " | awk -F "@" " { print \$1 } " ) &&
>> +echo "${name}" > expect &&
>> +echo "${email}" >> expect &&
>> +echo "${username}" >> expect &&
> 
> These values come from our hard-coded test setup, so it would be more
> readable to just expect those:
> 
>   {
>   echo "$GIT_AUTHOR_NAME" &&
>   echo "$GIT_AUTHOR_EMAIL" &&
>   echo "$GIT_AUTHOR_EMAIL" | sed "s/@.*//"
>   } >expect

Added to v2 (along with Brian's suggestion to test %aE, %aN, and %aL).

> 
> For the last one, also I wouldn't be upset to see test-lib.sh do
> something like:
> 
>   TEST_AUTHOR_USERNAME=author
>   TEST_AUTHOR_DOMAIN=example.com
>   GIT_AUTHOR_NAME=$TEST_AUTHOR_USERNAME@$TEST_AUTHOR_DOMAIN

I like this suggestion a lot.  I'm incorporating into v2 as well as similar
changes for the COMMITTER fields.

P.

> 
> to let tests like this pick out the individual values if they want.
> 
> -Peff
> 



Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-23 Thread Prarit Bhargava



On 10/22/19 7:48 PM, brian m. carlson wrote:
> On 2019-10-22 at 23:28:47, Prarit Bhargava wrote:
>> In many projects the number of contributors is low enough that users know
>> each other and the full email address doesn't need to be displayed.
>> Displaying only the author's username saves a lot of columns on the screen.
>> For example displaying "prarit" instead of "pra...@redhat.com" saves 11
>> columns.
>>
>> Add a "%aU"|"%au" option that outputs the author's email username.
> 
> I have no position on whether or not this is a useful change.  I don't
> think I'll end up using it, but I can imagine cases where it is useful,
> such as certain corporate environments.  It would be interesting to see
> what others think.
> 
>> diff --git a/Documentation/pretty-formats.txt 
>> b/Documentation/pretty-formats.txt
>> index b87e2e83e6d0..479a15a8ab12 100644
>> --- a/Documentation/pretty-formats.txt
>> +++ b/Documentation/pretty-formats.txt
>> @@ -163,6 +163,9 @@ The placeholders are:
>>  '%ae':: author email
>>  '%aE':: author email (respecting .mailmap, see linkgit:git-shortlog[1]
>>  or linkgit:git-blame[1])
>> +'%au':: author username
>> +'%aU':: author username (respecting .mailmap, see linkgit:git-shortlog[1]
>> +or linkgit:git-blame[1])
> 
> I think we need to actually document what "username" means here.  I
> expect you'll have people think that this magically means their
> $HOSTING_PLATFORM username, which it is not.
> 

Based on Junio's input, I'm going to change the option to "al" and "aL" where L
means "local-part" as defined by the rfc2822.txt specification, and include a
comment that says "(the portion of the email address preceding the '@' symbol)".
 Admittedly that doesn't convey the meaning of the mailbox concept of the email
address it does tell a user what is going to be output.


>> diff --git a/pretty.c b/pretty.c
>> index b32f0369531c..2a5b93022050 100644
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, 
>> char part,
>>  strbuf_add(sb, mail, maillen);
>>  return placeholder_len;
>>  }
>> +if (part == 'u' || part == 'U') {   /* username */
>> +maillen = strstr(s.mail_begin, "@") - s.mail_begin;
>> +strbuf_add(sb, mail, maillen);
>> +return placeholder_len;
>> +}
> 
> This branch doesn't appear to do anything different for the mailmap and
> non-mailmap cases.  Perhaps adding an additional test that demonstrates
> the difference would be a good idea.
> 

Thanks for the suggestion.  I'll look into it for v2.

P.



Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-23 Thread Prarit Bhargava



On 10/22/19 7:46 PM, Junio C Hamano wrote:
> Prarit Bhargava  writes:
> 
>> Subject: Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's 
>> username
> 
> Downcase "Add" (see "git shortlog --no-merges -100 master" and
> mimick the project convention).

I'll fix that.

> 
>> Add a "%aU"|"%au" option that outputs the author's email username.
> 
> Even though I personally do not see the use for it, I agree it would
> make sense to have an option to show the local part only where the
> e-mail address is shown.  
> 
> I do not know if u/U is a good mnemonic; it hints too strongly that
> it may come from GIT_{AUTHOR/COMMITTER}_NAME but that is not what
> you are doing---isn't there a letter that better conveys that this
> is about RFC 2822 local-part (cf. page 16 ieft.org/rfc/rfc2822.txt)?

I'll go with "l" and "L" for local-part as defined on page 16.  I will also
include a comment in braces that says (the portion of the email address
preceding the '@' symbol)".  Admittedly that doesn't convey the meaning of the
mailbox concept of the email address it does tell a user what is going to be 
output.

> 
>> diff --git a/Documentation/pretty-formats.txt 
>> b/Documentation/pretty-formats.txt
>> index b87e2e83e6d0..479a15a8ab12 100644
>> --- a/Documentation/pretty-formats.txt
>> +++ b/Documentation/pretty-formats.txt
>> @@ -163,6 +163,9 @@ The placeholders are:
>>  '%ae':: author email
>>  '%aE':: author email (respecting .mailmap, see linkgit:git-shortlog[1]
>>  or linkgit:git-blame[1])
>> +'%au':: author username
>> +'%aU':: author username (respecting .mailmap, see linkgit:git-shortlog[1]
>> +or linkgit:git-blame[1])
>>  '%ad':: author date (format respects --date= option)
>>  '%aD':: author date, RFC2822 style
>>  '%ar':: author date, relative
> 
>> diff --git a/pretty.c b/pretty.c
>> index b32f0369531c..2a5b93022050 100644
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, 
>> char part,
>>  strbuf_add(sb, mail, maillen);
>>  return placeholder_len;
>>  }
>> +if (part == 'u' || part == 'U') {   /* username */
>> +maillen = strstr(s.mail_begin, "@") - s.mail_begin;
>> +strbuf_add(sb, mail, maillen);
>> +return placeholder_len;
>> +}
> 
> I think users get %eu and %eU for free with this change, which should
> be documented.
> 

Did you mean %cu and %cU?  Or am I missing something with "%e"?

P.



Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-22 Thread Jeff King
On Tue, Oct 22, 2019 at 07:28:47PM -0400, Prarit Bhargava wrote:

> In many projects the number of contributors is low enough that users know
> each other and the full email address doesn't need to be displayed.
> Displaying only the author's username saves a lot of columns on the screen.
> For example displaying "prarit" instead of "pra...@redhat.com" saves 11
> columns.
> 
> Add a "%aU"|"%au" option that outputs the author's email username.

Like others, this seems potentially useful even if I probably wouldn't
use it myself. Another more complicated way to think of it would be to
give a list of domains to omit (so if 90% of the committers are
@redhat.com, we can skip that, but the one-off contributor from another
domain gets their fully qualified name.

But that's a lot more complicated. I don't mind doing the easy thing
now, and even if we later grew the more complicated thing, I wouldn't be
sad to still have this easy one as an option.

> --- a/pretty.c
> +++ b/pretty.c
> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, char 
> part,
>   strbuf_add(sb, mail, maillen);
>   return placeholder_len;
>   }
> + if (part == 'u' || part == 'U') {   /* username */
> + maillen = strstr(s.mail_begin, "@") - s.mail_begin;
> + strbuf_add(sb, mail, maillen);
> + return placeholder_len;
> + }

What happens if the email doesn't have an "@"? I think you'd either end
up printing a bunch of extra cruft (because you're not limiting the
search for "@" to the boundaries from split_ident_line) or you'd
crash (if there's no "@" at all, and you'd get a huge maillen).

There's also no need to use the slower strstr() when looking for a
single character. So perhaps:

  const char *at = memchr(mail, '@', maillen);
  if (at)
  maillen = at - mail;
  strbuf_add(sb, mail, maillen);

> +test_expect_success 'log pretty %an %ae %au' '

As others noted, this could cover %aU, too (which is broken; you need to
handle 'U' alongside 'E' and 'N' earlier in format_person_part()).

> + git checkout -b anaeau &&
> + test_commit anaeau_test anaeau_test_file &&
> + git log --pretty="%an" > actual &&
> + git log --pretty="%ae" >> actual &&
> + git log --pretty="%au" >> actual &&

Maybe:

  git log --pretty="%an %ae %au"

or

  git log --pretty="%an%n%ae%n%au"

which is shorter and runs more efficiently?

> + git log > full &&
> + name=$(cat full | grep "^Author: " | awk -F "Author: " " { print \$2 } 
> " | awk -F " <" " { print \$1 } ") &&
> + email=$(cat full | grep "^Author: " | awk -F "<" " { print \$2 } " | 
> awk -F ">" " { print \$1 } ") &&
> + username=$(cat full | grep "^Author: " | awk -F "<" " { print \$2 } " | 
> awk -F ">" " { print \$1 } " | awk -F "@" " { print \$1 } " ) &&
> + echo "${name}" > expect &&
> + echo "${email}" >> expect &&
> + echo "${username}" >> expect &&

These values come from our hard-coded test setup, so it would be more
readable to just expect those:

  {
echo "$GIT_AUTHOR_NAME" &&
echo "$GIT_AUTHOR_EMAIL" &&
echo "$GIT_AUTHOR_EMAIL" | sed "s/@.*//"
  } >expect

For the last one, also I wouldn't be upset to see test-lib.sh do
something like:

  TEST_AUTHOR_USERNAME=author
  TEST_AUTHOR_DOMAIN=example.com
  GIT_AUTHOR_NAME=$TEST_AUTHOR_USERNAME@$TEST_AUTHOR_DOMAIN

to let tests like this pick out the individual values if they want.

-Peff


Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-22 Thread brian m. carlson
On 2019-10-22 at 23:28:47, Prarit Bhargava wrote:
> In many projects the number of contributors is low enough that users know
> each other and the full email address doesn't need to be displayed.
> Displaying only the author's username saves a lot of columns on the screen.
> For example displaying "prarit" instead of "pra...@redhat.com" saves 11
> columns.
> 
> Add a "%aU"|"%au" option that outputs the author's email username.

I have no position on whether or not this is a useful change.  I don't
think I'll end up using it, but I can imagine cases where it is useful,
such as certain corporate environments.  It would be interesting to see
what others think.

> diff --git a/Documentation/pretty-formats.txt 
> b/Documentation/pretty-formats.txt
> index b87e2e83e6d0..479a15a8ab12 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -163,6 +163,9 @@ The placeholders are:
>  '%ae':: author email
>  '%aE':: author email (respecting .mailmap, see linkgit:git-shortlog[1]
>   or linkgit:git-blame[1])
> +'%au':: author username
> +'%aU':: author username (respecting .mailmap, see linkgit:git-shortlog[1]
> + or linkgit:git-blame[1])

I think we need to actually document what "username" means here.  I
expect you'll have people think that this magically means their
$HOSTING_PLATFORM username, which it is not.

> diff --git a/pretty.c b/pretty.c
> index b32f0369531c..2a5b93022050 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, char 
> part,
>   strbuf_add(sb, mail, maillen);
>   return placeholder_len;
>   }
> + if (part == 'u' || part == 'U') {   /* username */
> + maillen = strstr(s.mail_begin, "@") - s.mail_begin;
> + strbuf_add(sb, mail, maillen);
> + return placeholder_len;
> + }

This branch doesn't appear to do anything different for the mailmap and
non-mailmap cases.  Perhaps adding an additional test that demonstrates
the difference would be a good idea.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's username

2019-10-22 Thread Junio C Hamano
Prarit Bhargava  writes:

> Subject: Re: [PATCH] pretty: Add "%aU"|"%au" option to output author's 
> username

Downcase "Add" (see "git shortlog --no-merges -100 master" and
mimick the project convention).

> Add a "%aU"|"%au" option that outputs the author's email username.

Even though I personally do not see the use for it, I agree it would
make sense to have an option to show the local part only where the
e-mail address is shown.  

I do not know if u/U is a good mnemonic; it hints too strongly that
it may come from GIT_{AUTHOR/COMMITTER}_NAME but that is not what
you are doing---isn't there a letter that better conveys that this
is about RFC 2822 local-part (cf. page 16 ieft.org/rfc/rfc2822.txt)?

> diff --git a/Documentation/pretty-formats.txt 
> b/Documentation/pretty-formats.txt
> index b87e2e83e6d0..479a15a8ab12 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -163,6 +163,9 @@ The placeholders are:
>  '%ae':: author email
>  '%aE':: author email (respecting .mailmap, see linkgit:git-shortlog[1]
>   or linkgit:git-blame[1])
> +'%au':: author username
> +'%aU':: author username (respecting .mailmap, see linkgit:git-shortlog[1]
> + or linkgit:git-blame[1])
>  '%ad':: author date (format respects --date= option)
>  '%aD':: author date, RFC2822 style
>  '%ar':: author date, relative

> diff --git a/pretty.c b/pretty.c
> index b32f0369531c..2a5b93022050 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -706,6 +706,11 @@ static size_t format_person_part(struct strbuf *sb, char 
> part,
>   strbuf_add(sb, mail, maillen);
>   return placeholder_len;
>   }
> + if (part == 'u' || part == 'U') {   /* username */
> + maillen = strstr(s.mail_begin, "@") - s.mail_begin;
> + strbuf_add(sb, mail, maillen);
> + return placeholder_len;
> + }

I think users get %eu and %eU for free with this change, which should
be documented.