Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Derrick Stolee

On 2/21/2018 6:13 PM, Jeff King wrote:

On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote:


The get_cached_commit_buffer() method provides access to the buffer
loaded for a struct commit, if it was ever loadead and was not freed.

Two places use this to inform how to output information about commits.

log-tree.c uses this method to short-circuit the output of commit
information when the buffer is not cached. However, this leads to
incorrect output in 'git log --oneline' where the short-OID is written
but then the rest of the commit information is dropped and the next
commit is written on the same line.

rev-list uses this method for two reasons:

- First, if the revision walk visits a commit twice, the buffer was
   freed by rev-list in the first write. The output then does not
   match the format expectations, since the OID is written without the
   rest of the content.

I'm not sure after my earlier digging if there is even a way to trigger
this (and if so, it is probably accidental, since those lines were added
explicitly for --show-all).

And actually after re-reading the commit message for 3131b7130 again, I
think the current behavior is definitely not something that was
carefully planned. So I'd propose a commit message like below.


I only submitted my patch to avoid making you do the work of writing the 
commit message. My messages still don't have quite the right amount of 
detail (or the correct details, in this case).


Junio: please add

Reported-by: Derrick Stolee <dsto...@microsoft.com>

Thanks,

-Stolee




-- >8 --
Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()

The "--show-all" revision option shows UNINTERESTING
commits. Some of these commits may be unparsed when we try
to show them (since we may or may not need to walk their
parents to fulfill the request).

Commit 3131b71301 (Add "--show-all" revision walker flag for
debugging, 2008-02-09) resolved this by just skipping
pretty-printing for commits without their object contents
cached, saying:

   Because we now end up listing commits we may not even have been parsed
   at all "show_log" and "show_commit" need to protect against commits
   that don't have a commit buffer entry.

That was the easy fix to avoid the pretty-printer segfaulting,
but:

   1. It doesn't work for all formats. E.g., --oneline
  prints the oid for each such commit but not a trailing
  newline, leading to jumbled output.

   2. It only affects some commits, depending on whether we
  happened to parse them or not (so if they were at the
  tip of an UNINTERESTING starting point, or if we
  happened to traverse over them, you'd see more data).

   3. It unncessarily ties the decision to show the verbose
  header to whether the commit buffer was cached. That
  makes it harder to change the logic around caching
  (e.g., if we could traverse without actually loading
  the full commit objects).

These days it's safe to feed such a commit to the
pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily
load missing commit buffers, 2013-01-26), we'll load it on
demand in such a case. So let's just always show the verbose
headers.

This does change the behavior of plumbing, but:

   a. The --show-all option was explicitly introduced as a
  debugging aid, and was never documented (and has rarely
  even been mentioned on the list by git devs).

   b. Avoiding the commits was already not deterministic due
  to (2) above. So the caller might have seen full
  headers for these commits anyway, and would need to be
  prepared for it.

Signed-off-by: Jeff King <p...@peff.net>
---
  builtin/rev-list.c | 2 +-
  log-tree.c | 3 ---
  2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9e11..d320b6f1e3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
else
putchar('\n');
  
-	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {

+   if (revs->verbose_header) {
struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d6d1..22b2fb6c58 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
show_mergetag(opt, commit);
}
  
-	if (!get_cached_commit_buffer(commit, NULL))

-   return;
-
if (opt->show_notes) {
int raw;
struct strbuf notebuf = STRBUF_INIT;


Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Jeff King
On Wed, Feb 21, 2018 at 02:19:17PM -0500, Derrick Stolee wrote:

> > These behaviors are undocumented, untested, and unlikely to be
> > expected by users or other software attempting to parse this output.
> > 
> > Helped-by: Jeff King 
> > Signed-off-by: Derrick Stolee 
> 
> This would be a good time to allow multiple authors, or to just change the
> author, since this is exactly the diff you (Peff) provided in an earlier
> email. The commit message hopefully summarizes our discussion, but I welcome
> edits.

The point is moot if we take the revision I just sent (though in
retrospect I really ought to have put in a Reported-by: for you there).

But some communities are settling on Co-authored-by as a trailer for
this case. And GitHub has started parsing and showing that along with
author information:

  https://github.com/blog/2496-commit-together-with-co-authors

-Peff


Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Jeff King
On Wed, Feb 21, 2018 at 03:22:02PM -0800, Stefan Beller wrote:

> > Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()
> > ---
> >  builtin/rev-list.c | 2 +-
> >  log-tree.c | 3 ---
> >  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> Now if we'd get around to rewrite pretty.c as well, we could make it static,
> giving a stronger reason of not using that function. But it looks a bit
> complicated to me, who is not familiar in that area of the code.
> 
> Thanks for making less use of this suboptimal API,

I'm not sure the API is suboptimal. It's not wrong to ask "do you have a
cached copy of this?". It was just being used poorly here. :)

See the discussion in

  https://public-inbox.org/git/20180221184811.gd4...@sigill.intra.peff.net/

-Peff


Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Stefan Beller
> Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()
> ---
>  builtin/rev-list.c | 2 +-
>  log-tree.c | 3 ---
>  2 files changed, 1 insertion(+), 4 deletions(-)

Now if we'd get around to rewrite pretty.c as well, we could make it static,
giving a stronger reason of not using that function. But it looks a bit
complicated to me, who is not familiar in that area of the code.

Thanks for making less use of this suboptimal API,
Stefan


Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Jeff King
On Wed, Feb 21, 2018 at 02:17:11PM -0500, Derrick Stolee wrote:

> The get_cached_commit_buffer() method provides access to the buffer
> loaded for a struct commit, if it was ever loadead and was not freed.
> 
> Two places use this to inform how to output information about commits.
> 
> log-tree.c uses this method to short-circuit the output of commit
> information when the buffer is not cached. However, this leads to
> incorrect output in 'git log --oneline' where the short-OID is written
> but then the rest of the commit information is dropped and the next
> commit is written on the same line.
> 
> rev-list uses this method for two reasons:
> 
> - First, if the revision walk visits a commit twice, the buffer was
>   freed by rev-list in the first write. The output then does not
>   match the format expectations, since the OID is written without the
>   rest of the content.

I'm not sure after my earlier digging if there is even a way to trigger
this (and if so, it is probably accidental, since those lines were added
explicitly for --show-all).

And actually after re-reading the commit message for 3131b7130 again, I
think the current behavior is definitely not something that was
carefully planned. So I'd propose a commit message like below.

-- >8 --
Subject: [PATCH] commit: drop uses of get_cached_commit_buffer()

The "--show-all" revision option shows UNINTERESTING
commits. Some of these commits may be unparsed when we try
to show them (since we may or may not need to walk their
parents to fulfill the request).

Commit 3131b71301 (Add "--show-all" revision walker flag for
debugging, 2008-02-09) resolved this by just skipping
pretty-printing for commits without their object contents
cached, saying:

  Because we now end up listing commits we may not even have been parsed
  at all "show_log" and "show_commit" need to protect against commits
  that don't have a commit buffer entry.

That was the easy fix to avoid the pretty-printer segfaulting,
but:

  1. It doesn't work for all formats. E.g., --oneline
 prints the oid for each such commit but not a trailing
 newline, leading to jumbled output.

  2. It only affects some commits, depending on whether we
 happened to parse them or not (so if they were at the
 tip of an UNINTERESTING starting point, or if we
 happened to traverse over them, you'd see more data).

  3. It unncessarily ties the decision to show the verbose
 header to whether the commit buffer was cached. That
 makes it harder to change the logic around caching
 (e.g., if we could traverse without actually loading
 the full commit objects).

These days it's safe to feed such a commit to the
pretty-print code. Since be5c9fb904 (logmsg_reencode: lazily
load missing commit buffers, 2013-01-26), we'll load it on
demand in such a case. So let's just always show the verbose
headers.

This does change the behavior of plumbing, but:

  a. The --show-all option was explicitly introduced as a
 debugging aid, and was never documented (and has rarely
 even been mentioned on the list by git devs).

  b. Avoiding the commits was already not deterministic due
 to (2) above. So the caller might have seen full
 headers for these commits anyway, and would need to be
 prepared for it.

Signed-off-by: Jeff King <p...@peff.net>
---
 builtin/rev-list.c | 2 +-
 log-tree.c | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9e11..d320b6f1e3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
else
putchar('\n');
 
-   if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
+   if (revs->verbose_header) {
struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d6d1..22b2fb6c58 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
show_mergetag(opt, commit);
}
 
-   if (!get_cached_commit_buffer(commit, NULL))
-   return;
-
if (opt->show_notes) {
int raw;
struct strbuf notebuf = STRBUF_INIT;
-- 
2.16.2.555.g885a024879



Re: [PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Derrick Stolee

On 2/21/2018 2:17 PM, Derrick Stolee wrote:

The get_cached_commit_buffer() method provides access to the buffer
loaded for a struct commit, if it was ever loadead and was not freed.

Two places use this to inform how to output information about commits.

log-tree.c uses this method to short-circuit the output of commit
information when the buffer is not cached. However, this leads to
incorrect output in 'git log --oneline' where the short-OID is written
but then the rest of the commit information is dropped and the next
commit is written on the same line.

rev-list uses this method for two reasons:

- First, if the revision walk visits a commit twice, the buffer was
   freed by rev-list in the first write. The output then does not
   match the format expectations, since the OID is written without the
   rest of the content.

- Second, if the revision walk visits a commit that was marked
   UNINTERESTING, the walk may never load a buffer and hence rev-list
   will not output the verbose information.

These behaviors are undocumented, untested, and unlikely to be
expected by users or other software attempting to parse this output.

Helped-by: Jeff King 
Signed-off-by: Derrick Stolee 


This would be a good time to allow multiple authors, or to just change 
the author, since this is exactly the diff you (Peff) provided in an 
earlier email. The commit message hopefully summarizes our discussion, 
but I welcome edits.



---
  builtin/rev-list.c | 2 +-
  log-tree.c | 3 ---
  2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9..d320b6f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
else
putchar('\n');
  
-	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {

+   if (revs->verbose_header) {
struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d..22b2fb6 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
show_mergetag(opt, commit);
}
  
-	if (!get_cached_commit_buffer(commit, NULL))

-   return;
-
if (opt->show_notes) {
int raw;
struct strbuf notebuf = STRBUF_INIT;




[PATCH] commit: drop uses of get_cached_commit_buffer()

2018-02-21 Thread Derrick Stolee
The get_cached_commit_buffer() method provides access to the buffer
loaded for a struct commit, if it was ever loadead and was not freed.

Two places use this to inform how to output information about commits.

log-tree.c uses this method to short-circuit the output of commit
information when the buffer is not cached. However, this leads to
incorrect output in 'git log --oneline' where the short-OID is written
but then the rest of the commit information is dropped and the next
commit is written on the same line.

rev-list uses this method for two reasons:

- First, if the revision walk visits a commit twice, the buffer was
  freed by rev-list in the first write. The output then does not
  match the format expectations, since the OID is written without the
  rest of the content.

- Second, if the revision walk visits a commit that was marked
  UNINTERESTING, the walk may never load a buffer and hence rev-list
  will not output the verbose information.

These behaviors are undocumented, untested, and unlikely to be
expected by users or other software attempting to parse this output.

Helped-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 builtin/rev-list.c | 2 +-
 log-tree.c | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48300d9..d320b6f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -134,7 +134,7 @@ static void show_commit(struct commit *commit, void *data)
else
putchar('\n');
 
-   if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
+   if (revs->verbose_header) {
struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index fc0cc0d..22b2fb6 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,9 +659,6 @@ void show_log(struct rev_info *opt)
show_mergetag(opt, commit);
}
 
-   if (!get_cached_commit_buffer(commit, NULL))
-   return;
-
if (opt->show_notes) {
int raw;
struct strbuf notebuf = STRBUF_INIT;
-- 
2.7.4