Thank you for your response. However, I'm still having a hard time
understanding you. Apologies for this.
So, this is where I think I'm getting confused:
Let's talk about the original rowkey, before anything has been prepended to
it. Let's call this original_rowkey.
Let's say your first
Thank you for the explanation, but I'm a little confused. The key will be
monotonically increasing, but the hash of that key will not be.
So, even though your original keys may look like 1_foobar, 2_foobar,
3_foobar, after the hashing they'd look more like 349000_1_foobar,
99_2_foobar,
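A small Python sketch of the point above: once a hash-derived prefix is prepended, keys written in increasing order no longer sort in increasing order, so consecutive writes scatter across the key space. It assumes md5 as the hash, a hypothetical 6-digit prefix, and zero-padded keys (001_foobar ...) so that the unprefixed keys really are monotonic as strings; none of these choices come from the thread.

```python
import hashlib


def hash_prefix(key: str, width: int = 6) -> str:
    """Deterministic numeric prefix derived from the key itself (md5, truncated)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Keep only `width` decimal digits, zero-padded, for readability.
    return f"{int(digest, 16) % 10**width:0{width}d}"


def prefixed_key(key: str) -> str:
    """Hypothetical helper: hash prefix + '_' + original key."""
    return f"{hash_prefix(key)}_{key}"


# Zero-padded so the original keys sort in the same order they are written.
original = [f"{i:03d}_foobar" for i in range(1, 101)]
prefixed = [prefixed_key(k) for k in original]

# HBase stores rows sorted by key. Without the prefix, write order == sort order
# (every write goes to the end of the table); with it, they no longer match.
print(prefixed[:3])
print(sorted(prefixed)[:3])
```

The original key is still fully present after the underscore, so nothing is lost by prefixing; only the sort order changes.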
Jeremy,
I think you have to be careful in how you say things.
Over time you’re going to get an even distribution, but the hash isn’t
random. It’s consistent: hash(x) = y, and it will always be the same y.
You’re taking the modulus to create n buckets (1 to n).
In each bucket, your new key
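To illustrate "consistent, not random" in Python: the bucket is a pure function of the key, yet sequential keys still spread evenly over the buckets. This is a sketch assuming md5 (the thread does not fix a hash function) and a hypothetical bucket count of 8; buckets are numbered 0 to n-1 here rather than 1 to n. It deliberately avoids Python's built-in `hash()`, which is randomized per process for strings and so would not give the same answer on every client.

```python
import hashlib
from collections import Counter


def bucket(key: str, n_buckets: int) -> int:
    """Map a row key to a bucket in 0..n_buckets-1. Same key -> same bucket, always."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_buckets


N = 8  # hypothetical number of buckets
keys = [f"{i}_foobar" for i in range(10_000)]
counts = Counter(bucket(k, N) for k in keys)

# Deterministic: recomputing gives the identical bucket, on any machine.
print(bucket("1234foobar", N) == bucket("1234foobar", N))
# Roughly 10_000 / 8 = 1250 keys per bucket.
print(sorted(counts.items()))
```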
Thank you for your response!
So I guess 'salt' is a bit of a misnomer. What I used to do is this:
1) Say that my key value is something like '1234foobar'
2) I obtain the hash of '1234foobar'. Let's say that's '54824923'
3) I mod the hash by my number of regions. Let's say I have 2000
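The steps above can be sketched in Python as follows. The thread does not say which hash is used or how the bucket is joined to the key, so md5 (first 8 bytes), a zero-padded decimal bucket, and the `_` separator are all assumptions; `salted_row_key` and `bucket_from_hash` are hypothetical names. The first print uses the example hash value from step 2.

```python
import hashlib


def bucket_from_hash(hash_value: int, n_regions: int) -> int:
    # Step 3: mod the hash by the number of regions.
    return hash_value % n_regions


def salted_row_key(key: str, n_regions: int) -> str:
    # Step 2: hash the original key (md5 is an assumption, not from the thread).
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    b = bucket_from_hash(h, n_regions)
    # Zero-pad so the prefix sorts correctly as a string (2000 regions -> 4 digits).
    width = len(str(n_regions - 1))
    return f"{b:0{width}d}_{key}"


print(bucket_from_hash(54824923, 2000))  # 923
print(salted_row_key("1234foobar", 2000))
```

Zero-padding matters because HBase compares row keys as bytes: without it, bucket 923 would sort after bucket 1000.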
We do this for almost all our tables.
On May 5, 2015 11:05 AM, jeremy p <athomewithagroove...@gmail.com> wrote:
Thank you for your response!
So I guess 'salt' is a bit of a misnomer. What I used to do is this:
1) Say that my key value is something like '1234foobar'
2) I obtain the hash of
Yes, what you described, mod(hash(rowkey), n) where n is the number of
regions, will remove the hotspotting issue.
However, if your key is sequential, your regions will only be half full after
a region split.
Look at it this way…
If I have a key that is a sequential count 1, 2, 3, 4, 5 … I am
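A rough Python simulation of the "half full after a split" point: with a sequential counter under a mod(hash(rowkey), n) prefix, every new row in a bucket sorts after all existing rows in that bucket. When that bucket's region splits at its middle key, the lower daughter region never receives another write. The bucket count (4), the md5 hash, the zero-padded counter, and splitting exactly at the median key are all simplifying assumptions, not how HBase chooses split points.

```python
import hashlib

N_BUCKETS = 4  # hypothetical number of buckets/regions


def bucket(key: str, n: int) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % n


def row_key(seq: int, n: int) -> str:
    key = f"{seq:010d}"  # sequential counter, zero-padded so string order == numeric order
    return f"{bucket(key, n):02d}_{key}"


# Phase 1: write counter values 0..9999 and keep the rows that land in bucket 0.
bucket0 = sorted(
    k for k in (row_key(i, N_BUCKETS) for i in range(10_000)) if k.startswith("00_")
)

# The region holding bucket 0 splits at its middle key (simplified split policy).
split_key = bucket0[len(bucket0) // 2]

# Phase 2: keep counting. Every new bucket-0 row sorts after all existing ones.
new_rows = [
    k for k in (row_key(i, N_BUCKETS) for i in range(10_000, 20_000)) if k.startswith("00_")
]
lower = sum(1 for k in new_rows if k < split_key)
upper = len(new_rows) - lower
print(lower, upper)  # lower is 0: the lower daughter region never grows again
```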
Yes, don’t use a salt. Salt implies that your seed is orthogonal (read: random)
to the base table row key.
You’re better off using a truncated hash (md5 is fastest) so that you can at
least use a single get().
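A Python sketch of why a truncated hash allows a single get() while a true salt does not. It only models how row keys are built, not an HBase client. The 4-byte prefix length, the 16 possible salt values, and all function names are hypothetical. With the hash prefix, a reader recomputes the exact row key from the original key. With a random salt, the reader has to try every possible salt.

```python
import hashlib
import random

PREFIX_BYTES = 4  # hypothetical truncation length for the md5 prefix
N_SALTS = 16      # hypothetical number of salt values


def hashed_row_key(key: bytes) -> bytes:
    # The prefix is a pure function of the key, so any reader can recompute it.
    return hashlib.md5(key).digest()[:PREFIX_BYTES] + key


def original_key(row_key: bytes) -> bytes:
    # Strip the fixed-length prefix to recover the original key.
    return row_key[PREFIX_BYTES:]


def salted_row_key(key: bytes) -> bytes:
    # A true salt is independent of the key: the writer's choice is not recoverable.
    return bytes([random.randrange(N_SALTS)]) + key


def candidate_salted_keys(key: bytes) -> list[bytes]:
    # A reader who only knows `key` must try every salt (N_SALTS gets, or a multi-get).
    return [bytes([s]) + key for s in range(N_SALTS)]


k = b"1234foobar"
print(hashed_row_key(k) == hashed_row_key(k))  # True: one get() is enough
print(len(candidate_salted_keys(k)))           # 16
```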
Common?
Only if your row key is mostly sequential.
Note that even with
Hello all,
I've been out of the HBase world for a while, and I'm just now jumping back
in.
As of HBase 0.94, it was still common to take a hash of your RowKey and use
that to salt the beginning of your RowKey to obtain an even distribution
among your region servers. Is this still a common