Re: RowKey hashing in HBase 1.0

2015-05-13 Thread jeremy p
Thank you for your response. However, I'm still having a hard time understanding you. Apologies for this. So, this is where I think I'm getting confused: Let's talk about the original rowkey, before anything has been prepended to it. Let's call this original_rowkey. Let's say your first

Re: RowKey hashing in HBase 1.0

2015-05-06 Thread jeremy p
Thank you for the explanation, but I'm a little confused. The key will be monotonically increasing, but the hash of that key will not be. So, even though your original keys may look like: 1_foobar, 2_foobar, 3_foobar. After the hashing, they'd look more like: 349000_1_foobar, 99_2_foobar,

Re: RowKey hashing in HBase 1.0

2015-05-06 Thread Michael Segel
Jeremy, I think you have to be careful in how you say things. While over time you’re going to get an even distribution, the hash isn’t random. It’s consistent, so that hash(x) = y and will always be the same. You’re taking the modulus to create 1 to n buckets. In each bucket, your new key

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread jeremy p
Thank you for your response! So I guess 'salt' is a bit of a misnomer. What I used to do is this: 1) Say that my key value is something like '1234foobar' 2) I obtain the hash of '1234foobar'. Let's say that's '54824923' 3) I mod the hash by my number of regions. Let's say I have 2000
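The steps in this message can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function names `bucket_for` and `prefixed_key` are hypothetical, MD5 is an assumed choice of hash (the message does not name one), and the `bucket_key` layout with a zero-padded prefix is likewise an assumption.

```python
import hashlib


def bucket_for(rowkey: str, num_buckets: int) -> int:
    """Hash the row key and reduce it modulo the number of buckets.

    MD5 is an assumed choice here; any stable hash would do.
    """
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


def prefixed_key(rowkey: str, num_buckets: int) -> str:
    """Prepend the bucket number to the original key (assumed layout)."""
    # Zero-pad so that prefixed keys sort by bucket as strings.
    width = len(str(num_buckets - 1))
    return f"{bucket_for(rowkey, num_buckets):0{width}d}_{rowkey}"


# With the hash value used as an example in the message:
# 54824923 % 2000 == 923, so that key would land in bucket 923.
print(54824923 % 2000)
print(prefixed_key("1234foobar", 2000))
```

Because the prefix is derived from the key itself, the same input always yields the same prefixed key.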

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread Koert Kuipers
We do this for almost all our tables. On May 5, 2015 11:05 AM, jeremy p athomewithagroove...@gmail.com wrote: Thank you for your response! So I guess 'salt' is a bit of a misnomer. What I used to do is this: 1) Say that my key value is something like '1234foobar' 2) I obtain the hash of

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread Michael Segel
Yes, what you described, mod(hash(rowkey), n) where n is the number of regions, will remove the hotspotting issue. However, if your key is sequential, you will only have regions half full post region split. Look at it this way… If I have a key that is a sequential count 1,2,3,4,5 … I am
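One way to see the first point (that mod(hash(rowkey), n) spreads sequential keys across buckets) is to count where a run of sequential keys lands. This is only a sketch under assumptions: MD5 as the hash, a small n of 10, and the hypothetical helper names `bucket` and `spread`. It says nothing about region splits or how full regions end up.

```python
import hashlib
from collections import Counter


def bucket(rowkey: str, n: int) -> int:
    """mod(hash(rowkey), n), with MD5 as an assumed hash function."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n


def spread(num_keys: int, n: int) -> Counter:
    """Count how many of the sequential keys 1..num_keys fall in each bucket."""
    return Counter(bucket(str(i), n) for i in range(1, num_keys + 1))


counts = spread(10_000, 10)
for b in sorted(counts):
    print(b, counts[b])
```

Each bucket should receive a roughly similar share of the keys, even though the input keys are strictly increasing.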

Re: RowKey hashing in HBase 1.0

2015-05-03 Thread Michael Segel
Yes, don’t use a salt. Salt implies that your seed is orthogonal (read: random) to the base table row key. You’re better off using a truncated hash (md5 is fastest) so that at least you can use a single get(). Common? Only if your row key is mostly sequential. Note that even with
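The difference between a random salt and a truncated hash can be sketched like this. A plain dict stands in for the table and none of this is HBase client code. The helper names, the 4-character prefix length, and the `prefix_key` layout are all assumptions made for illustration.

```python
import hashlib
import random


def hashed_key(rowkey: str, prefix_len: int = 4) -> str:
    """Prefix the key with a truncated MD5 hex digest of the key itself."""
    prefix = hashlib.md5(rowkey.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}_{rowkey}"


def salted_key(rowkey: str, num_salts: int, rng: random.Random) -> str:
    """Prefix the key with a random salt unrelated to the key."""
    return f"{rng.randrange(num_salts)}_{rowkey}"


def candidate_salted_keys(rowkey: str, num_salts: int) -> list:
    """A reader cannot recompute a random salt, so it must try them all."""
    return [f"{s}_{rowkey}" for s in range(num_salts)]


table = {}  # stand-in for a table, keyed by the stored row key

# Truncated hash: the reader rebuilds the stored key from the original key,
# so one lookup is enough.
table[hashed_key("1234foobar")] = "v1"
print(table[hashed_key("1234foobar")])

# Random salt: the reader has to probe every possible prefix.
table[salted_key("5678foobar", 8, random.Random())] = "v2"
hits = [k for k in candidate_salted_keys("5678foobar", 8) if k in table]
print(hits)
```

In the hashed case the stored key is a pure function of the original key, which is what makes a single lookup possible.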

RowKey hashing in HBase 1.0

2015-05-01 Thread jeremy p
Hello all, I've been out of the HBase world for a while, and I'm just now jumping back in. As of HBase 0.94, it was still common to take a hash of your RowKey and use that to salt the beginning of your RowKey to obtain an even distribution among your region servers. Is this still a common