Hi everyone,

I read about salting and how it is used for load balancing when row keys
are sequential. Basically, the salt should distribute sequential rows
across different region servers.

I also read this article
<http://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/>
which explains how to run MR jobs on salted tables.

The article advises building the salted row key as:

StringUtils.leftPad(Integer.toString(Math.abs(keyCore.hashCode() %
numberOfRegions)), 3, "0") + "|" + logicalKey

So you basically take the hash of the original key and reduce it modulo
the number of regions to get the salt, which is zero-padded to three
digits and prepended to the key.
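
For concreteness, here is how I understand that formula, as a minimal
self-contained sketch. The names SaltedKey and saltedKey are my own, I use
String.format in place of StringUtils.leftPad only to avoid the extra
dependency, and I assume keyCore and logicalKey are the same string:

```java
import java.util.Locale;

// Sketch only: SaltedKey and saltedKey are hypothetical names, not from the article.
final class SaltedKey {

    // Returns "<3-digit salt>|<logicalKey>", where the salt is the key's
    // hash code reduced modulo numberOfRegions (assumed to be 1..1000).
    static String saltedKey(String logicalKey, int numberOfRegions) {
        // hashCode() % n lies in (-n, n), so Math.abs cannot overflow here.
        int salt = Math.abs(logicalKey.hashCode() % numberOfRegions);
        // Zero-pad to three digits, as leftPad(..., 3, "0") does in the article.
        return String.format(Locale.ROOT, "%03d", salt) + "|" + logicalKey;
    }
}
```

With this, saltedKey("a", 10) gives "007|a", since "a".hashCode() is 97.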

You also need to pre-split the table on the salt values, so that each
region contains only rows with the same salt.
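
As far as I understand, with N salt values that means N - 1 split points,
one at each salt prefix except the first. A rough sketch of generating
them follows. SaltSplits and splitKeys are hypothetical names, I assume
three-digit salts as in the formula above, and I assume the result would
be handed to the createTable call that accepts split keys:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch only: SaltSplits and splitKeys are hypothetical names.
final class SaltSplits {

    // Returns the split points "001", "002", ..., one per salt except the
    // first, so that region i covers exactly the row keys prefixed with salt i.
    // Assumes numberOfRegions is between 1 and 1000 (three-digit salts).
    static byte[][] splitKeys(int numberOfRegions) {
        byte[][] splits = new byte[numberOfRegions - 1][];
        for (int salt = 1; salt < numberOfRegions; salt++) {
            String prefix = String.format(Locale.ROOT, "%03d", salt);
            splits[salt - 1] = prefix.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

For 10 regions this produces 9 split keys, "001" through "009".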

All of this seems reasonable. My question is, *what happens when you add
more region servers*?

Presumably you would also increase the number of regions, so you would
have to change the split strategy so that the new regions still follow the
"one-salt-for-all-rows-in-region" rule. You would also need to compute the
salt modulo the increased numberOfRegions.

All of that means that I could *mess up* queries for rows that were
written when the number of regions was smaller. For example, at the
beginning you would compute the salt modulo 10 (10 regions), and later
modulo 50 (now 50 regions), so the same key could get a different salt.
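
A small sketch of the problem I mean, using the same formula (SaltDrift
and its methods are names I made up for illustration, and I assume keyCore
and logicalKey are the same string). The same logical key produces a
different row key once the modulus changes, so a lookup built with the new
modulus would miss the old row:

```java
import java.util.Locale;

// Sketch only: SaltDrift, saltedKey and sameRowKey are hypothetical names.
final class SaltDrift {

    // Same construction as the article's formula: 3-digit salt, "|", key.
    static String saltedKey(String logicalKey, int numberOfRegions) {
        int salt = Math.abs(logicalKey.hashCode() % numberOfRegions);
        return String.format(Locale.ROOT, "%03d", salt) + "|" + logicalKey;
    }

    // True only if the key maps to the same row key before and after
    // the number of regions (and therefore the modulus) is changed.
    static boolean sameRowKey(String logicalKey, int oldRegions, int newRegions) {
        return saltedKey(logicalKey, oldRegions)
                .equals(saltedKey(logicalKey, newRegions));
    }
}
```

For the key "a" (hash code 97) this gives "007|a" with 10 regions but
"047|a" with 50, so sameRowKey("a", 10, 50) is false.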

Can anyone please explain the full procedure for doing this
salting/pre-splitting properly?

-- 
Marko Dinic
