OK, now I understand: doing pre-splits across the full binary space does not
make sense when the keys use a limited character range. I do all my splits in
the base-64 character space or let HBase do them organically.
Thanks for the explanation.
-chris
On Mar 17, 2011, at 11:32 AM, Ted Dunning wrote:
> Just that base-64 is not uniformly distributed relative to a binary
> representation. [...]
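As a rough illustration of splitting in the base-64 character space (my own sketch, not code from the thread; the class and method names are made up), the split keys can be drawn from the 64-symbol alphabet itself, sorted by raw byte value so they match HBase's lexicographic row-key ordering:

import java.util.Arrays;

// Sketch: one-byte split keys drawn only from the base-64 alphabet, so every
// region sees traffic. Assumes single-character prefixes are enough, i.e.
// numRegions <= 64.
public class Base64SplitKeys {
    // Standard base-64 alphabet in raw byte order: '+', '/', digits,
    // uppercase, lowercase.
    private static final byte[] ALPHABET =
        "+/0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            .getBytes();

    public static byte[][] splitKeys(int numRegions) {
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Evenly spaced characters out of the 64 possible leading bytes.
            keys[i - 1] = new byte[] { ALPHABET[i * ALPHABET.length / numRegions] };
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] k : splitKeys(8)) {
            System.out.println(Arrays.toString(k) + " = " + (char) k[0]);
        }
    }
}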
[...]ation Algorithm. Thanks Ted.
-Pete
-----Original Message-----
From: Peter Haidinyak [mailto:phaidin...@local.com]
Sent: Thursday, March 17, 2011 9:44 AM
To: user@hbase.apache.org
Subject: RE: OT - Hash Code Creation
Hash Code in Object is limited to an int, and a quick look at HashMap and
Trove's [...]
Just that base-64 is not uniformly distributed relative to a binary
representation. This is simply because it is all printable characters. If
you do a 256-way pre-split based on a binary interpretation of the key, 64
regions will get traffic and 192 will get none. Among other things, this
can s[...]
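To make Ted's 64-versus-192 point concrete, here is a quick throwaway check (my code, not his, using the JDK's Base64 encoder for brevity) that counts the distinct leading bytes of base-64-encoded digests; a naive 256-way binary pre-split would leave the other 192 first-byte buckets empty:

import java.security.MessageDigest;
import java.util.Base64;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class LeadingByteCheck {
    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        Random rng = new Random(42);
        Set<Byte> leading = new HashSet<>();
        byte[] key = new byte[16];
        for (int i = 0; i < 100000; i++) {
            rng.nextBytes(key);
            // Encode the binary digest as base-64 and record the first byte.
            byte[] encoded = Base64.getEncoder().encode(md5.digest(key));
            leading.add(encoded[0]);
        }
        // Prints at most 64: only the base-64 alphabet can appear.
        System.out.println("distinct leading bytes: " + leading.size());
    }
}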
I'm not sure I'm clear: are you saying 64-bit chunks of an MD5 key are not
uniformly distributed? Or that a base-64 encoding is not evenly distributed?
Thanks,
-chris
On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote:
>
> There can be some odd effects with this because the keys are not uniformly
> distributed. [...]
On Thu, Mar 17, 2011 at 8:21 AM, Michael Segel wrote:
>
> Why not keep it simple?
>
> Use a SHA-1 hash of your key. See:
> http://codelog.blogial.com/2008/09/13/password-encryption-using-sha1-md5-java/
> (This was just the first one I found and there are others...)
>
SHA-1 is kind of slow.
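For reference, the plain-JDK version of the SHA-1 approach Michael links looks roughly like this (my own sketch and names, not the code from that page):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Sha1Hash {
    // 20-byte SHA-1 digest of a string key via the standard library.
    public static byte[] sha1(String key) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JRE ships SHA-1, so this should not happen.
            throw new AssertionError(e);
        }
    }
}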
There can be some odd effects with this because the keys are not uniformly
distributed. Beware if you are using pre-split tables because the region
traffic can be pretty unbalanced if you do a naive split.
On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas wrote:
> I've been using base-64 encoding w[...]
-Pete
-----Original Message-----
From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas
Sent: Thursday, March 17, 2011 9:21 AM
To: user@hbase.apache.org
Subject: Re: OT - Hash Code Creation
With 24 million elements you'd probably want a 64-bit hash to minimize the
risk of collision; the rule of thumb is that with a 64-bit hash key you should
expect a collision when you reach about 2^32 elements in your set. I use half
of a 128-bit MD5 sum (a cryptographic hash, so you can use only parts of it
if you want) a[...]
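In code, using half of the MD5 sum as Chris describes just means keeping 8 of the digest's 16 bytes; a minimal sketch, with my own names, assuming string keys:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class HalfMd5 {
    // 64-bit hash = first 8 bytes of the 128-bit MD5 digest. Because MD5's
    // output bytes are uniformly mixed, any 8 of the 16 bytes would do.
    public static long hash64(String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8)); // 16 bytes
        return ByteBuffer.wrap(digest).getLong(); // first 8 bytes as a long
    }
}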
> Date: Thu, 17 Mar 2011 00:23:00 -0700
> Subject: Re: OT - Hash Code Creation
> To: user@hbase.apache.org
> CC: oct...@gmail.com
>
> Double hashing is a fine thing. To actually answer the question, though, I
> would recommend Murmurhash or JOAAT (
> http://en.wikipedia.org/wiki/Jenkins_hash_function)
Thanks, I'll give that a try.
-Pete
On Thu, 17 Mar 2011 00:23:00 -0700, Ted Dunning wrote:
Double hashing is a fine thing. To actually answer the question, though, I
would recommend Murmurhash or JOAAT (
http://en.wikipedia.org/wiki/Jenkins_hash_function)
Double hashing is a fine thing. To actually answer the question, though, I
would recommend Murmurhash or JOAAT (
http://en.wikipedia.org/wiki/Jenkins_hash_function)
On Wed, Mar 16, 2011 at 3:48 PM, Andrey Stepachev wrote:
> Try a hash table with double hashing.
> Something like this:
>
> http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm
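The JOAAT hash Ted recommends is short enough to write out in full; this is a transcription of the one-at-a-time function from the Wikipedia page he links (the class name is mine):

public final class Joaat {
    // Jenkins one-at-a-time hash: mix each byte in, then finalize.
    // Returns a 32-bit result.
    public static int hash(byte[] key) {
        int h = 0;
        for (byte b : key) {
            h += b & 0xff;
            h += h << 10;
            h ^= h >>> 6;
        }
        h += h << 3;
        h ^= h >>> 11;
        h += h << 15;
        return h;
    }
}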
Try a hash table with double hashing.
Something like this:
http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm
2011/3/17 Peter Haidinyak
> Hi,
> This is a little off topic but this group seems pretty swift so I
> thought I would ask. I am aggregating a d[...]
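Boiled down, the double-hashing idea behind Andrey's java2s link looks like this; a minimal sketch with my own names, assuming a prime capacity and a table that is never allowed to fill:

public class DoubleHashTable<K, V> {
    private final Object[] keys;
    private final Object[] vals;
    private final int capacity; // pick a prime so probes cover every slot

    public DoubleHashTable(int capacity) {
        this.capacity = capacity;
        this.keys = new Object[capacity];
        this.vals = new Object[capacity];
    }

    private int h1(Object k) { return (k.hashCode() & 0x7fffffff) % capacity; }

    // Second hash: never zero, so a collision always advances the probe.
    private int h2(Object k) { return 1 + (k.hashCode() & 0x7fffffff) % (capacity - 2); }

    public void put(K key, V val) {
        int i = h1(key), step = h2(key);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + step) % capacity; // probe stride comes from the second hash
        }
        keys[i] = key;
        vals[i] = val;
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int i = h1(key), step = h2(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return (V) vals[i];
            i = (i + step) % capacity;
        }
        return null;
    }
}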
Hi,
    This is a little off topic, but this group seems pretty swift so I
thought I would ask. I am aggregating a day's worth of log data, which means I
have a Map of over 24 million elements. What would be a good algorithm to use
for generating hash codes for these elements that cuts down on collisions?
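For what it's worth, the birthday math behind the replies above is easy to check (my numbers: expected colliding pairs are about n(n-1)/2 divided by 2^bits). With 24 million elements, a 32-bit hash code is already in trouble and a 64-bit one is comfortable:

public class BirthdayMath {
    public static void main(String[] args) {
        double n = 24_000_000d;
        double pairs = n * (n - 1) / 2.0; // ~2.9e14 possible pairs
        System.out.printf("32-bit: ~%.0f expected collisions%n",
                pairs / Math.pow(2, 32)); // roughly 67,000
        System.out.printf("64-bit: ~%.6f expected collisions%n",
                pairs / Math.pow(2, 64)); // roughly 0.000016
    }
}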