Re: easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-03 Thread Jeff King
On Mon, Dec 03, 2018 at 02:30:44PM -0800, Matthew DeVore wrote:

> Here is a one-liner to do it. It is Perl line noise, so it's not very cute,
> though that is subjective. The output shown below is for the Git project
> (not Linux) repository as I've currently synced it:
> 
> $ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; $long
> = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < length($long); if
> (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = substr($prev, 0, $i);
> last} } $prev = $n; END {say $long}'

Ooh, object-collision golf.

Try:

  git cat-file --batch-all-objects --batch-check='%(objectname)'

instead of "rev-list | sort". It's _much_ faster, because it doesn't
have to actually open the objects and walk the graph.

Some versions of uniq have "-w" (including GNU, but it's definitely not
in POSIX), which lets you do:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  uniq -cdw 7

to list all collisions of length 7 (it will show just the first item
from each group, but you can use -D to see them all).
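
Where uniq lacks "-w", a rough portable equivalent is to cut each name
down to the prefix before handing it to plain "uniq -cd". This is only a
sketch: the function name colliding_prefixes and its default length of 7
are made up for illustration. Unlike "-w", it prints just the shared
prefix rather than a full object name. It relies on the input being
sorted, which is how --batch-all-objects lists names by default.

```shell
# Hypothetical helper: read sorted object names on stdin and print each
# prefix of length N (first argument, default 7) that occurs more than
# once, preceded by the number of names sharing it.
colliding_prefixes() {
  # cut -c and uniq -c/-d are all in POSIX, so no GNU extensions are needed.
  cut -c "1-${1:-7}" | uniq -cd
}
```

It would be used like this:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  colliding_prefixes 7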

> > You'll always need to list them all. It's inherently an operation where
> > for each SHA-1 you need to search for other ones with that prefix up to
> > a given length.
> > 
> > Perhaps you've missed that you can use --abbrev=N for this, and just
> > grep for things that are longer than that N, e.g. for linux.git:
> > 
> >  git log --oneline --abbrev=10 --pretty=format:%h |
> >  grep -E -v '^.{10}$' |
> >  perl -pe 's/^(.{10}).*/$1/'
> 
> I think the goal was to search all object hashes, not just commits. And git
> rev-list --objects will do that.

You can add "-t --raw" to see the abbreviated tree and blob names,
though it gets tricky around handling merges.

-Peff


Re: easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-03 Thread Matthew DeVore




On 12/02/2018 05:23 AM, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Dec 02 2018, Robert P. J. Day wrote:
>
> >   as part of an upcoming git class i'm delivering, i thought it would
> > be amusing to demonstrate the maximum length of colliding SHA-1
> > prefixes in a repository (in my case, i use the linux kernel git repo
> > for most of my examples).
> >
> >   is there a way to display the objects in the object database that
> > clash in the longest object name SHA-1 prefix; i mean, short of
> > manually listing all object names, running that through cut and sort
> > and uniq and ... you get the idea.
> >
> >   is there a cute way to do that? thanks.

Here is a one-liner to do it. It is Perl line noise, so it's not very 
cute, though that is subjective. The output shown below is for the Git 
project (not Linux) repository as I've currently synced it:


$ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; 
$long = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < 
length($long); if (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = 
substr($prev, 0, $i); last} } $prev = $n; END {say $long}'


c68038ef
$ git cat-file -t c68038ef
error: short SHA1 c68038ef is ambiguous
hint: The candidates are:
hint:   c68038effe commit 2012-06-01 - vcs-svn: suppress a signed/unsigned comparison warning
hint:   c68038ef00 blob
fatal: Not a valid object name c68038ef
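
The same idea can be spelled out more readably. Once the names are
sorted, the two names sharing the longest prefix are always next to each
other, so comparing each name with the one before it is enough. Below is
a minimal sketch of that comparison as a shell function around awk. The
name longest_shared_prefix is hypothetical, and this is not the Perl
above translated line for line: on a tie it keeps the first prefix it
found.

```shell
# Hypothetical helper: read sorted object names on stdin (first field of
# each line) and print the longest prefix shared by two adjacent names.
# Prints nothing if no two adjacent names share a leading character.
longest_shared_prefix() {
  awk '
    NR > 1 {
      # Count the leading characters this name shares with the previous one.
      n = 0
      while (n < length($1) && substr($1, n + 1, 1) == substr(prev, n + 1, 1))
        n++
      # Remember the longest shared prefix seen so far.
      if (n > best) { best = n; prefix = substr($1, 1, n) }
    }
    { prev = $1 }
    END { if (best > 0) print prefix }
  '
}
```

It would be used like this:

  git rev-list --objects HEAD | sort | longest_shared_prefix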



> You'll always need to list them all. It's inherently an operation where
> for each SHA-1 you need to search for other ones with that prefix up to
> a given length.
>
> Perhaps you've missed that you can use --abbrev=N for this, and just
> grep for things that are longer than that N, e.g. for linux.git:
>
>  git log --oneline --abbrev=10 --pretty=format:%h |
>  grep -E -v '^.{10}$' |
>  perl -pe 's/^(.{10}).*/$1/'


I think the goal was to search all object hashes, not just commits. And 
git rev-list --objects will do that.


Re: easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-02 Thread Robert P. J. Day
On Sun, 2 Dec 2018, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Dec 02 2018, Robert P. J. Day wrote:
>
> >   as part of an upcoming git class i'm delivering, i thought it
> > would be amusing to demonstrate the maximum length of colliding
> > SHA-1 prefixes in a repository (in my case, i use the linux kernel
> > git repo for most of my examples).
> >
> >   is there a way to display the objects in the object database
> > that clash in the longest object name SHA-1 prefix; i mean, short
> > of manually listing all object names, running that through cut and
> > sort and uniq and ... you get the idea.
> >
> >   is there a cute way to do that? thanks.
>
> You'll always need to list them all. It's inherently an operation
> where for each SHA-1 you need to search for other ones with that
> prefix up to a given length.

  i assumed as much, just wasn't sure about the esoteric dark corners
of git i've never gotten to yet.

> Perhaps you've missed that you can use --abbrev=N for this, and just
> grep for things that are longer than that N, e.g. for linux.git:
>
> git log --oneline --abbrev=10 --pretty=format:%h |
> grep -E -v '^.{10}$' |
> perl -pe 's/^(.{10}).*/$1/'
>
> This will list the 4 objects that need more than 10 characters to be
> shown unambiguously. If you then "git cat-file -t" them you'll get
> the disambiguation help.

  that's pretty close to what i came up with, thanks.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday


Re: easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-02 Thread Ævar Arnfjörð Bjarmason


On Sun, Dec 02 2018, Robert P. J. Day wrote:

>   as part of an upcoming git class i'm delivering, i thought it would
> be amusing to demonstrate the maximum length of colliding SHA-1
> prefixes in a repository (in my case, i use the linux kernel git repo
> for most of my examples).
>
>   is there a way to display the objects in the object database that
> clash in the longest object name SHA-1 prefix; i mean, short of
> manually listing all object names, running that through cut and sort
> and uniq and ... you get the idea.
>
>   is there a cute way to do that? thanks.

You'll always need to list them all. It's inherently an operation where
for each SHA-1 you need to search for other ones with that prefix up to
a given length.

Perhaps you've missed that you can use --abbrev=N for this, and just
grep for things that are longer than that N, e.g. for linux.git:

git log --oneline --abbrev=10 --pretty=format:%h |
grep -E -v '^.{10}$' |
perl -pe 's/^(.{10}).*/$1/'

This will list the 4 objects that need more than 10 characters to be
shown unambiguously. If you then "git cat-file -t" them you'll get the
disambiguation help.
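
The grep and perl steps can also be folded into one awk call. This is
just a sketch of the same filter: the function name longer_than_abbrev
is hypothetical, and N is passed as its first argument (default 10). It
keeps only the lines longer than N characters and prints their first N
characters, which are the prefixes git had to lengthen.

```shell
# Hypothetical helper: read abbreviated object names on stdin and print
# the first N characters (first argument, default 10) of every name that
# is longer than N characters.
longer_than_abbrev() {
  awk -v n="${1:-10}" 'length($0) > n { print substr($0, 1, n) }'
}
```

It would be used like this:

  git log --abbrev=10 --pretty=format:%h | longer_than_abbrev 10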


easy way to demonstrate length of colliding SHA-1 prefixes?

2018-12-02 Thread Robert P. J. Day


  as part of an upcoming git class i'm delivering, i thought it would
be amusing to demonstrate the maximum length of colliding SHA-1
prefixes in a repository (in my case, i use the linux kernel git repo
for most of my examples).

  is there a way to display the objects in the object database that
clash in the longest object name SHA-1 prefix; i mean, short of
manually listing all object names, running that through cut and sort
and uniq and ... you get the idea.

  is there a cute way to do that? thanks.

rday