Re: OT - Hash Code Creation

2011-03-17 Thread Chris Tarnas
Ok - now I understand - doing pre-splits using the full binary space does not make sense when using a limited range. I do all my splits in the base-64 character space or let HBase do them organically. thanks for the explanation. -chris On Mar 17, 2011, at 11:32 AM, Ted Dunning wrote: > Just th
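Chris describes splitting in the base-64 character space. Here is a minimal sketch of that idea. It is not code from the thread, and the class and method names are hypothetical. It assumes that row keys begin with a character from the standard base-64 alphabet (RFC 4648, with `+` and `/`). HBase orders row keys by unsigned byte value, not by position in the alphabet, so the sketch sorts the 64 characters by byte value before picking evenly spaced boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: evenly spaced one-byte split keys over the base-64 alphabet. */
final class Base64SplitKeys {
    // Assumption: keys use the standard RFC 4648 alphabet (not the URL-safe '-' and '_' variant).
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Returns regions - 1 split keys; regions must divide 64 (2, 4, 8, 16, 32 or 64). */
    static byte[][] splitKeys(int regions) {
        if (regions < 2 || regions > 64 || 64 % regions != 0) {
            throw new IllegalArgumentException("regions must be a divisor of 64 between 2 and 64");
        }
        char[] sorted = ALPHABET.toCharArray();
        Arrays.sort(sorted); // byte order: '+', '/', '0'-'9', 'A'-'Z', 'a'-'z'
        int stride = 64 / regions;
        byte[][] keys = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            // Each region starts at a character boundary, so every region gets 64 / regions characters.
            keys[i - 1] = new byte[] {(byte) sorted[i * stride]};
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] key : splitKeys(16)) {
            System.out.print(new String(key, StandardCharsets.US_ASCII) + " ");
        }
        System.out.println();
    }
}
```

For 16 regions, this yields 15 one-byte split keys, and the first one is `2`. A `byte[][]` is the form HBase takes for split keys when it creates a pre-split table. A URL-safe alphabet would need its own sorted character list.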

RE: OT - Hash Code Creation

2011-03-17 Thread Peter Haidinyak
ation Algorithm. Thanks Ted. -Pete -----Original Message----- From: Peter Haidinyak [mailto:phaidin...@local.com] Sent: Thursday, March 17, 2011 9:44 AM To: user@hbase.apache.org Subject: RE: OT - Hash Code Creation The hash code in Object is limited to an int and a quick look at HashMap and Trove'

Re: OT - Hash Code Creation

2011-03-17 Thread Ted Dunning
Just that base-64 is not uniformly distributed relative to a binary representation. This is simply because every encoded byte is one of only 64 printable characters. If you do a 256-way pre-split based on a binary interpretation of the key, 64 regions will get traffic and 192 will get none. Among other things, this can s
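Ted's 64-of-256 figure can be checked with a small experiment. This is a minimal sketch and makes three assumptions: keys are hashed with MD5 and then base-64 encoded, and the table uses a naive 256-way pre-split on the first byte. The `row-N` key format and the class name are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical demo: how many first-byte regions base-64 encoded MD5 keys can reach. */
final class Base64SkewDemo {
    /** Counts the distinct leading bytes (0-255) among sampleCount encoded keys. */
    static int distinctLeadingBytes(int sampleCount) {
        boolean[] seen = new boolean[256]; // region k of a naive split holds keys whose first byte is k
        MessageDigest md5 = md5();
        for (int i = 0; i < sampleCount; i++) {
            byte[] digest = md5.digest(("row-" + i).getBytes(StandardCharsets.UTF_8));
            byte[] encoded = Base64.getEncoder().encode(digest);
            seen[encoded[0] & 0xff] = true;
        }
        int count = 0;
        for (boolean b : seen) {
            if (b) count++;
        }
        return count;
    }

    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        int used = distinctLeadingBytes(100_000);
        System.out.println("regions with traffic: " + used + ", idle regions: " + (256 - used));
    }
}
```

The leading byte is always one of the 64 alphabet characters, so `distinctLeadingBytes` can never exceed 64 however many keys are sampled. At least 192 of the 256 byte-range regions therefore stay idle.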

Re: OT - Hash Code Creation

2011-03-17 Thread Chris Tarnas
I'm not sure I follow. Are you saying 64-bit chunks of MD5 keys are not uniformly distributed? Or that a base-64 encoding is not evenly distributed? thanks, -chris On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote: > > There can be some odd effects with this because the keys are not uniforml

Re: OT - Hash Code Creation

2011-03-17 Thread Ted Dunning
On Thu, Mar 17, 2011 at 8:21 AM, Michael Segel wrote: > > Why not keep it simple? > > Use a SHA-1 hash of your key. See: > http://codelog.blogial.com/2008/09/13/password-encryption-using-sha1-md5-java/ > (This was just the first one I found and there are others...) > SHA-1 is kind of slow. > >

Re: OT - Hash Code Creation

2011-03-17 Thread Ted Dunning
There can be some odd effects with this because the keys are not uniformly distributed. Beware if you are using pre-split tables because the region traffic can be pretty unbalanced if you do a naive split. On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas wrote: > I've been using base-64 encoding w

RE: OT - Hash Code Creation

2011-03-17 Thread Peter Haidinyak
-Pete -----Original Message----- From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas Sent: Thursday, March 17, 2011 9:21 AM To: user@hbase.apache.org Subject: Re: OT - Hash Code Creation With 24 million elements you'd probably want a 64-bit hash to minimize the risk of

Re: OT - Hash Code Creation

2011-03-17 Thread Chris Tarnas
With 24 million elements you'd probably want a 64-bit hash to minimize the risk of collision. The rule of thumb is that with a 64-bit hash key you should expect a collision when you reach about 2^32 elements in your set. I use half of a 128-bit MD5 sum (a cryptographic hash, so you can use just part of it if you want) a
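The collision arithmetic and the half-MD5 idea above can both be sketched in Java. This is a hypothetical helper, not Chris's code. It does three things:

- takes the first 8 bytes of the MD5 digest as a big-endian `long`;
- offers the same 8 bytes as unpadded base-64;
- estimates the expected number of colliding pairs with the birthday approximation n(n-1)/2 · 2^-bits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: 64-bit keys from half of an MD5 digest, plus a collision estimate. */
final class HalfMd5Key {
    private static byte[] md5(String key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    /** First 64 of the 128 digest bits, read big-endian (ByteBuffer's default order). */
    static long half64(String key) {
        return ByteBuffer.wrap(md5(key), 0, 8).getLong();
    }

    /** The same 8 bytes as unpadded base-64: 11 printable characters. */
    static String half64Base64(String key) {
        return Base64.getEncoder().withoutPadding().encodeToString(Arrays.copyOf(md5(key), 8));
    }

    /** Birthday estimate: n(n-1)/2 pairs, each colliding with probability 2^-bits. */
    static double expectedCollisions(long n, int bits) {
        return (double) n * (n - 1) / 2.0 / Math.pow(2, bits);
    }

    public static void main(String[] args) {
        System.out.println("key:            " + half64Base64("example-row"));
        System.out.println("32-bit, 24M:    " + expectedCollisions(24_000_000L, 32));
        System.out.println("64-bit, 24M:    " + expectedCollisions(24_000_000L, 64));
    }
}
```

For n = 24,000,000, this estimate gives roughly 67,000 colliding pairs with a 32-bit hash but only about 1.6 × 10^-5 with a 64-bit one. At n = 2^32 with 64 bits, the estimate reaches about 0.5, which matches the rule of thumb above.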

RE: OT - Hash Code Creation

2011-03-17 Thread Michael Segel
17 Mar 2011 00:23:00 -0700 > Subject: Re: OT - Hash Code Creation > To: user@hbase.apache.org > CC: oct...@gmail.com > > Double hashing is a fine thing. To actually answer the question, though, I > would recommend Murmurhash or JOAAT ( > http://en.wikipedia.org/wiki/Jenkins_hash

Re: OT - Hash Code Creation

2011-03-17 Thread Pete Haidinyak
Thanks, I'll give that a try. -Pete On Thu, 17 Mar 2011 00:23:00 -0700, Ted Dunning wrote: Double hashing is a fine thing. To actually answer the question, though, I would recommend Murmurhash or JOAAT ( http://en.wikipedia.org/wiki/Jenkins_hash_function) On Wed, Mar 16, 2011 at 3:48 P

Re: OT - Hash Code Creation

2011-03-17 Thread Ted Dunning
Double hashing is a fine thing. To actually answer the question, though, I would recommend Murmurhash or JOAAT ( http://en.wikipedia.org/wiki/Jenkins_hash_function) On Wed, Mar 16, 2011 at 3:48 PM, Andrey Stepachev wrote: > Try hash table with double hashing. > Something like this > > http://ww
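Jenkins' one-at-a-time hash (JOAAT) is short enough to sketch directly. This Java port follows the C version on the linked Wikipedia page. Java has no unsigned 32-bit type, so the port uses `int` wraparound for addition and logical shifts (`>>>`) for the right shifts. MurmurHash is not shown, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical port of Jenkins' one-at-a-time hash (32-bit). */
final class Joaat {
    static int hash(byte[] key) {
        int h = 0;
        for (byte b : key) {
            h += b & 0xff;   // treat each byte as unsigned, like the C uint8_t input
            h += h << 10;
            h ^= h >>> 6;    // logical shift mirrors uint32_t semantics
        }
        // Final avalanche steps
        h += h << 3;
        h ^= h >>> 11;
        h += h << 15;
        return h;
    }

    static int hash(String s) {
        return hash(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.printf("%08x%n", hash("a"));
    }
}
```

`Joaat.hash("a")` returns `0xca2e9442`, the reference value the Wikipedia page gives for that input. The result is an `int`, so it could be returned from a `hashCode()` override, but it is still only 32 bits wide.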

Re: OT - Hash Code Creation

2011-03-16 Thread Andrey Stepachev
Try a hash table with double hashing. Something like this http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm 2011/3/17 Peter Haidinyak > Hi, >This is a little off topic but this group seems pretty swift so I > thought I would ask. I am aggregating a d
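Double hashing, as suggested here, is an open-addressing scheme. A second hash supplies the probe step, so keys that land in the same first slot usually follow different probe sequences. The sketch below is minimal and hypothetical, not the java2s code. It uses a fixed power-of-two capacity and supports neither resizing nor deletion. The step is forced odd so that it is coprime with the capacity and the probe visits every slot. The mixing used for the second hash is an arbitrary choice.

```java
/** Hypothetical open-addressing set of Strings that resolves collisions by double hashing. */
final class DoubleHashingSet {
    private final String[] slots; // length is a power of two
    private final int mask;
    private int size;

    DoubleHashingSet(int capacity) {
        // A power-of-two capacity lets "& mask" replace "% capacity".
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        slots = new String[capacity];
        mask = capacity - 1;
    }

    /** First hash: chooses the starting slot. */
    private static int h1(String key) {
        return key.hashCode();
    }

    /** Second hash: chooses the probe step; OR-ing in 1 makes it odd, hence coprime with the capacity. */
    private static int h2(String key) {
        int h = key.hashCode() * 0x9E3779B9; // arbitrary odd multiplier to decorrelate from h1
        return (h ^ (h >>> 16)) | 1;
    }

    /** Slot holding key, else the first empty slot on its probe path, else -1 if the table is full. */
    private int findSlot(String key) {
        int idx = h1(key) & mask;
        int step = h2(key) & mask; // still odd because mask >= 1
        for (int probes = 0; probes < slots.length; probes++) {
            String s = slots[idx];
            if (s == null || s.equals(key)) {
                return idx;
            }
            idx = (idx + step) & mask;
        }
        return -1;
    }

    /** Adds key; returns false if it was already present. */
    boolean add(String key) {
        int idx = findSlot(key);
        if (idx < 0) {
            throw new IllegalStateException("table is full");
        }
        if (slots[idx] != null) {
            return false;
        }
        slots[idx] = key;
        size++;
        return true;
    }

    boolean contains(String key) {
        int idx = findSlot(key);
        return idx >= 0 && slots[idx] != null;
    }

    int size() {
        return size;
    }
}
```

A production table would resize well before it fills up, because open addressing slows sharply near full load. It would also need tombstones to support removal.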

OT - Hash Code Creation

2011-03-16 Thread Peter Haidinyak
Hi, This is a little off-topic, but this group seems pretty swift so I thought I would ask. I am aggregating a day's worth of log data, which means I have a Map of over 24 million elements. What would be a good algorithm to use for generating Hash Codes for these elements that cuts down on