I don't believe there have been any reports of collisions, but if you are concerned, you could use SHA-1 to generate the hash. Relatively speaking, SHA-1 is slower, but still fast enough for most applications.
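A minimal sketch in plain Java of the SHA-1 suggestion: hash the identifier and use the 20-byte digest directly as the row key. It assumes string identifiers encoded as UTF-8. The class and method names are hypothetical, and nothing here touches the HBase client API; the returned bytes are simply what you would write as the row key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1RowKey {
    // Hypothetical helper: SHA-1 over the id's UTF-8 bytes; the 20-byte digest is the row key.
    static byte[] sha1RowKey(String id) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return sha1.digest(id.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = sha1RowKey("user-12345");
        System.out.println(key.length); // 20, regardless of the id's length
    }
}
```

Unlike MD5(id)+id, the key length here is fixed and does not grow with the identifier.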
I don't know how SHA-1's speed compares to MD5 plus string concatenation, but it should yield a smaller key.

Mike Segel

On Jul 20, 2012, at 11:31 AM, Damien Hardy <dha...@figarocms.fr> wrote:

> On 20/07/2012 18:22, Jonathan Bishop wrote:
>> Hi,
>>
>> I know it is commonly suggested to use an MD5 checksum to create a row
>> key from some other identifier, such as a string or long. This is usually
>> done to guard against hot-spotting and seems to work well.
>>
>> My concern is that there is no guard against collision when this is done:
>> two different strings or longs could produce the same row key. Although this
>> is very unlikely, it is bothersome to consider this possibility for large
>> systems.
>>
>> So what I usually do is concatenate the MD5 with the original identifier...
>>
>> MD5(id) + id
>>
>> which ensures that the row key is both randomly distributed and unique.
>>
>> Is this necessary, or is it common practice to just use the MD5
>> checksum itself?
>>
>> Thanks,
>>
>> Jon
>
> Hello Jonathan,
>
> md5(id)+id is the right way to avoid hotspotting and ensure uniqueness.
>
> md5(id)[0]+id could be another way, limiting the randomness of the row key
> to 16 values (the first hex character of the MD5 digest).
> You can then combine (with OR logic) 16 filters in a scanner (one for each
> character that can appear in an MD5 hex digest).
> It also limits the balancing to 16 potential regions.
>
> Cheers,
>
> --
> Damien
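A minimal sketch in plain Java of the two key layouts discussed in the thread: MD5(id)+id, and the first hex character of md5(id) followed by id. It assumes string identifiers encoded as UTF-8; the class and method names (including `allPrefixes`) are hypothetical, and no HBase API is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class SaltedRowKeys {
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // MD5(id) + id: 16 digest bytes spread keys evenly; appending the id keeps keys unique.
    static byte[] md5PlusId(String id) {
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        byte[] digest = md5(idBytes);
        byte[] key = Arrays.copyOf(digest, digest.length + idBytes.length);
        System.arraycopy(idBytes, 0, key, digest.length, idBytes.length);
        return key;
    }

    // md5(id)[0] + id: only the first hex character of the digest (16 possible values) as a prefix.
    static String hexPrefixPlusId(String id) {
        byte[] digest = md5(id.getBytes(StandardCharsets.UTF_8));
        char first = Character.forDigit((digest[0] >> 4) & 0xF, 16); // high nibble of byte 0
        return first + id;
    }

    // The 16 possible prefixes '0'..'f', i.e. one filter per prefix in an OR-combined scan.
    static char[] allPrefixes() {
        char[] prefixes = new char[16];
        for (int i = 0; i < 16; i++) {
            prefixes[i] = Character.forDigit(i, 16);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        String id = "user-12345";
        System.out.println(md5PlusId(id).length); // 26 (16 digest bytes + 10 id bytes)
        System.out.println(hexPrefixPlusId(id));
    }
}
```

The trade-off is the one raised in the thread: the full digest spreads writes across the whole key space, while the one-character prefix keeps ids ordered within each of 16 buckets, so reads need at most 16 filters but writes balance across at most 16 regions.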