On Tue, Apr 22, 2014 at 6:57 PM, Stephan Beal <sgb...@googlemail.com> wrote:
> On Tue, Apr 22, 2014 at 6:48 PM, Richard Hipp <d...@sqlite.org> wrote:
>> Fossil generates some of its "GUID"s using the SHA1 hash algorithm.  Other
>> GUIDs (for example for ticket IDs) are generated using:
>>
>>         SELECT lower(hex(randomblob(20)));
>>
>> You can increase the 20 to make the GUIDs as "globally unique" as you
>> want.  The GUIDs discussed previously in this thread seem to use 16 instead of
>> 20 and thus are less unique.
>>
>
> That reminds me of a specific snippet from this article:
>
> http://www.w3.org/DesignIssues/Axioms.html#nonunique
>
> In summary: the context of a GUID defines its "scope of required
> uniqueness," and a 16-byte GUID is essentially globally unique so long as
> it has no collisions within its context(s). (i.e. who cares if SHA1s
> collide, so long as it's not in the same repo?)

First, SHA1 hashes and GUIDs, although they look alike (size
notwithstanding), are not the same thing. Hashes like SHA1 derive their
value from actual content (at a point in time), so where they apply
they are in fact better than randomly generated GUIDs. But not every
application can easily compute content hashes (using SHA1, SHA256, or
whatever other secure hashing algo) for its content. And for mutable
entities, content hashes would by definition also mutate (ignoring very
unlikely collisions), unlike GUIDs, which are arbitrary and immutable
"by design". That makes GUIDs suitable as PKs of mutable entities.
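
To make the distinction concrete, here is a minimal sketch in Python.
The Entity class and its field names are hypothetical, made up for
illustration only. It assumes a random (version 4) UUID as the GUID and
SHA1 over the UTF-8 bytes of the content as the content hash.

```python
import hashlib
import uuid


class Entity:
    """Hypothetical mutable record, for illustration only."""

    def __init__(self, content):
        # The GUID is arbitrary: assigned once, independent of content.
        self.guid = uuid.uuid4().hex
        self.content = content

    def content_hash(self):
        # The hash is derived from whatever the content is right now.
        return hashlib.sha1(self.content.encode("utf-8")).hexdigest()


entity = Entity("first draft")
guid_before = entity.guid
hash_before = entity.content_hash()

entity.content = "second draft"  # mutate the entity

print(entity.guid == guid_before)            # the GUID is unchanged
print(entity.content_hash() == hash_before)  # the content hash moved
```

After the edit the GUID still identifies the same row, while the
content hash now names a different piece of content.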

Regarding the uniqueness argument made by DRH, it's actually very hard
to generate 2 identical random-based GUIDs, given that 128 bits allow
2^128 (about 3.4 x 10^38) distinct values, which is a very, very large
number. It's good enough for my own
uses. Being of the curious type, I wrote a little test to generate a
large number of GUIDs (using boost::uuid), then sort them, then look
for the longest common prefix between neighbors (byte-wise, not
char-wise). To keep things simple, I did that in memory, so I could
only generate half a billion, and the longest common prefix I found was
7 bytes, out of the 16 bytes. Intuitively, I suspect one must generate
an increasingly large number of GUIDs to increase the common prefix
length by 1 byte each time, but I didn't verify this intuition.
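
A scaled-down version of that experiment can be sketched in Python.
This is not the original boost::uuid test: it assumes fully random
16-byte values (like randomblob(16)), whereas version 4 UUIDs fix 6 of
their 128 bits, and all function names are made up for illustration.
It also includes a rough birthday-style estimate, assuming that n
values form about n^2/2 pairs and that any given pair shares a k-byte
prefix with probability 2^(-8k).

```python
import math
import random


def random_guids(count, seed=None):
    """Return `count` random 16-byte values (seedable for repeatability)."""
    rng = random.Random(seed)
    return [rng.getrandbits(128).to_bytes(16, "big") for _ in range(count)]


def common_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_common_prefix(guids):
    """Longest byte-wise prefix shared by any two values.

    After sorting, the best-matching pair is always adjacent, so only
    neighbors need to be compared.
    """
    ordered = sorted(guids)
    pairs = zip(ordered, ordered[1:])
    return max((common_prefix_len(a, b) for a, b in pairs), default=0)


def guids_needed_for_prefix(prefix_bytes):
    """Rough count of values at which one pair is expected to share
    a prefix of `prefix_bytes` bytes: n^2 / 2 = 2^(8 * prefix_bytes)."""
    return math.sqrt(2.0 * 2.0 ** (8 * prefix_bytes))


if __name__ == "__main__":
    guids = random_guids(200_000, seed=1)
    print("longest common prefix (bytes):", longest_common_prefix(guids))
    for k in range(1, 9):
        print(k, "bytes: about %.3g values" % guids_needed_for_prefix(k))
```

Under these assumptions each extra byte of common prefix takes about
16 times as many values, and a 7-byte prefix is expected at roughly
3.8 x 10^8 values, which is in line with the half-billion run above.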

So yes, in theory, one will eventually hit a collision using a 128-bit
(integer) GUID, but in practice I don't think it matters. My
$0.02. --DD
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users