Refactoring hardcoded SHA-1 constants

2014-04-18 Thread brian m. carlson
SHA-1 is clearly on its way out.  I know that there has been discussion
in the past about moving to different algorithms.  I'm not interested in
having that discussion now.

I'd like to introduce a set of preprocessor constants that we'd use
instead of hard-coded 20s and 40s everywhere.  That way, if we decide to
support a different algorithm, doing so becomes significantly easier.
These would be used in new code.

After that, I'd like to slowly start moving existing code to use these
constants.

I would also like to consider, as a third step, turning all of the
unsigned char[20] uses into a struct containing unsigned char[20] as its
only member, like libgit2 does.  This disambiguates arbitrary byte data
from byte sequences being used to represent an SHA-1 ID.

I'm hoping the first suggestion will be mostly unobjectionable, as
having hard-coded constants is widely considered bad programming
practice.  I suspect the latter two will be more controversial.  These
changes, if implemented, would be done by hand, not by machine, and they
would be bisectable.

Comments?

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187


signature.asc
Description: Digital signature


Re: Refactoring hardcoded SHA-1 constants

2014-04-18 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

 I'd like to introduce a set of preprocessor constants that we'd use
 instead of hard-coded 20s and 40s everywhere.

Lukewarm on that.  It's hard to do consistently and unless they're
named well it can be harder to know what something like
BINARY_OBJECT_NAME_LENGTH means than plain '20' when first reading.

[...]
 I would also like to consider, as a third step, turning all of the
 unsigned char[20] uses into a struct containing unsigned char[20] as its
 only member, like libgit2 does.

That would be very welcome!

It's a nice way to steer people toward hashcmp using the type system,
and it makes it possible to use a union to enforce alignment later if
measurements show benefit.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Refactoring hardcoded SHA-1 constants

2014-04-18 Thread Michael Haggerty
 brian m. carlson wrote:
 I'd like to introduce a set of preprocessor constants that we'd use
 instead of hard-coded 20s and 40s everywhere.

I agree that it would help code clarity to have symbolic constants for
these numbers.

On 04/19/2014 12:40 AM, Jonathan Nieder wrote:
 Lukewarm on that.  It's hard to do consistently and unless they're
 named well it can be harder to know what something like
 BINARY_OBJECT_NAME_LENGTH means than plain '20' when first reading.

OK, so let's see if we can name them well.  (Though I think if the names
come into wide use then we'll get familiar with them quickly.)

libgit2 seems to use the name oid (for object ID) where we tend to
use sha1 or name.  That might be a good convention for us to move
towards.

Their constants are called GIT_OID_RAWSZ (== 20) and GIT_OID_HEXSZ (==
40).  They don't exactly roll off the tongue, I'll admit.

We wouldn't need a GIT_ prefix, I think, since our code is not meant
to be used as a library.

Let the brainstorming (and bikeshedding) begin!

1. GIT_OID_RAWSZ / GIT_OID_HEXSZ

2. OID_RAWSZ / OID_HEXSZ

3. OID_BINARY_LEN / OID_ASCII_LEN

4. BINARY_OID_LEN / ASCII_OID_LEN

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Refactoring hardcoded SHA-1 constants

2014-04-18 Thread Duy Nguyen
On Sat, Apr 19, 2014 at 7:06 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 Let the brainstorming (and bikeshedding) begin!

 1. GIT_OID_RAWSZ / GIT_OID_HEXSZ

 2. OID_RAWSZ / OID_HEXSZ

 3. OID_BINARY_LEN / OID_ASCII_LEN

 4. BINARY_OID_LEN / ASCII_OID_LEN

5. sizeof(oid) / ASCII_OID_LEN
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Refactoring hardcoded SHA-1 constants

2014-04-18 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Sat, Apr 19, 2014 at 7:06 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 Let the brainstorming (and bikeshedding) begin!

 1. GIT_OID_RAWSZ / GIT_OID_HEXSZ

 2. OID_RAWSZ / OID_HEXSZ

 3. OID_BINARY_LEN / OID_ASCII_LEN

 4. BINARY_OID_LEN / ASCII_OID_LEN

 5. sizeof(oid) / ASCII_OID_LEN

Can we safely assume sizeof(struct { uchar oid[20]; }) is 20, or on
some 64-bit platforms do we have to worry about 8-byte alignment?

In any case, if the pair of names in 1. are what is already used, I
do not see a need for us to waste time bikeshedding at all.

In an ideal world, an implementation of cmd_foo() that defines a
variable to hold an object name and then calls into libgit.a
services may link in the future with a different implementation of
the services that are currently offered by libgit.a but are
reimplemented on top of libgit2, right?  Having the same name would
help us, not hurt us, with that direction in some future, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Refactoring hardcoded SHA-1 constants

2014-04-18 Thread Duy Nguyen
On Sat, Apr 19, 2014 at 8:06 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Sat, Apr 19, 2014 at 7:06 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 Let the brainstorming (and bikeshedding) begin!

 1. GIT_OID_RAWSZ / GIT_OID_HEXSZ

 2. OID_RAWSZ / OID_HEXSZ

 3. OID_BINARY_LEN / OID_ASCII_LEN

 4. BINARY_OID_LEN / ASCII_OID_LEN

 5. sizeof(oid) / ASCII_OID_LEN

 Can we safely assume sizeof(struct { uchar oid[20]; }) is 20, or on
 some 64-bit platforms do we have to worry about 8-byte alignment?

That's an interesting point. sizeof(that struct) on x86-64 returns 20.
I haven't checked about other platforms.

We have some ondisk structures around that contain unsigned char [20]
if I remember correctly. Those would need to be handled with care
during the conversion if this becomes a real issue.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html