Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Jeff King
On Sun, May 19, 2013 at 02:17:35PM +0100, John Keeping wrote: When using git cherry or git log --cherry-pick we often have a small number of commits on one side and a large number on the other. In revision.c::cherry_pick_list we store the patch IDs for the small side before comparing the

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Jeff King
On Wed, May 29, 2013 at 02:20:07AM -0400, Jeff King wrote: In the best case, we compute no patch-ids at all. And even for the average case, I'd expect our lazy calculation to only have to compute a handful of ids. Here is a not-well-tested version of the idea. I tried to contain the changes

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Jeff King
On Wed, May 29, 2013 at 03:22:25AM -0400, Jeff King wrote: revs=origin/master...origin/jk/submodule-subdirectory-ok stock|you |me --- real 0m0.501s | 0m0.078s | 0m0.098s user 0m0.480s | 0m0.056s | 0m0.084s sys 0m0.016s |

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Junio C Hamano
Jeff King p...@peff.net writes: I think such a loose patch-id could just be a hash of the filenames that were changed by the patch (e.g., the first 32-bits of the sha1 of the concatenated filenames). Computing that should be about as expensive as a tree-diff. Per observation 2 above, if two

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Jeff King
On Wed, May 29, 2013 at 11:08:46AM -0700, Junio C Hamano wrote: This has rather interesting ramifications on cherry-pick and rebase, though. Both command can handle changes that come from an old tree before some paths were renamed, but strict patch-id would not spot equivalent changes we

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-29 Thread Junio C Hamano
Jeff King p...@peff.net writes: On Wed, May 29, 2013 at 11:08:46AM -0700, Junio C Hamano wrote: This has rather interesting ramifications on cherry-pick and rebase, though. Both command can handle changes that come from an old tree before some paths were renamed, but strict patch-id would

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-20 Thread John Keeping
On Sun, May 19, 2013 at 11:36:23PM -0700, Jonathan Nieder wrote: I don't know what it should mean to try to use --cherry without --no-merges or --first-parent, so I guess this is harmless. Currently --no-merges doesn't actually get passed down this far. We do the patch ID calculations without

[RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-19 Thread John Keeping
When using git cherry or git log --cherry-pick we often have a small number of commits on one side and a large number on the other. In revision.c::cherry_pick_list we store the patch IDs for the small side before comparing the large side to this. In this case, it is likely that only a small

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-19 Thread Jonathan Nieder
John Keeping wrote: In this case, it is likely that only a small number of paths are touched by the commits on the smaller side of the range and by storing these we can ignore many commits on the other side before unpacking blobs and diffing them. I like this idea a lot. --- a/patch-ids.c

Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

2013-05-19 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes: @@ -64,6 +199,13 @@ static struct patch_id *add_commit(struct commit *commit, unsigned char sha1[20]; int pos; +if (no_add) { +if (collect_touched_paths(commit, ids, 1) 0) +return NULL; Ah, so