Hi everyone, I read about salting and how it is used for load balancing when row keys are sequential. Basically, the salt should distribute sequential rows across different region servers.
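To make sure I understand the idea, here is a minimal sketch in plain Java (no HBase involved). The class name, the `salt` helper, the `row-0001` style keys and the bucket count of 10 are all made up for illustration, and I use `String.format` for the zero padding just to avoid an extra dependency.

```java
import java.util.Locale;

class SaltSpreadDemo {
    // Hypothetical helper: three-digit, zero-padded salt derived from the key's hash.
    static String salt(String logicalKey, int numberOfBuckets) {
        int bucket = Math.abs(logicalKey.hashCode() % numberOfBuckets);
        return String.format(Locale.ROOT, "%03d", bucket);
    }

    public static void main(String[] args) {
        int numberOfBuckets = 10; // assumed bucket count, one bucket per region
        for (int i = 1; i <= 5; i++) {
            // Sequential logical keys, invented for this example.
            String logicalKey = String.format(Locale.ROOT, "row-%04d", i);
            // The salt prefix decides which pre-split region the row lands in.
            System.out.println(salt(logicalKey, numberOfBuckets) + "|" + logicalKey);
        }
    }
}
```

As far as I can tell, neighbouring keys end up with different prefixes, so consecutive writes would not all hit the same region.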
I also read this article <http://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/>, which explains how to run MR jobs on salted tables. It advises generating the salt as:

    StringUtils.leftPad(Integer.toString(Math.abs(keyCore.hashCode() % numberOfRegions)), 3, "0") + "|" + logicalKey

So you basically take the hash of the original key and reduce it modulo the number of regions to get the salt. You also need to pre-split the table on the salt, so that each region contains only rows with the same salt. All of this seems reasonable.

My question is, *what happens when you add more region servers*? Presumably you would also increase the number of regions, so you would have to change the split strategy so that the new regions still follow the "one-salt-for-all-rows-in-region" rule. You would also need to take the hash modulo the increased numberOfRegions.

All of that means that I could *mess up* queries for rows that were added when the number of regions was smaller. For example, at the beginning you could be computing the salt modulo 10 (10 regions), and later modulo 50 (now 50 regions), so the salt computed for an old key at read time might no longer match the one it was written with.

Can anyone please explain the full, proper procedure for this salting/pre-splitting?

--
Marko Dinic
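
P.S. To make the concern concrete, here is a minimal sketch of the mismatch I am worried about. It is plain Java with no HBase calls. The class name, the `saltedKey` helper and the one-character key "a" are hypothetical and chosen only because the hash is easy to check by hand. I use `String.format` instead of `StringUtils.leftPad` so that it runs without extra libraries.

```java
import java.util.Locale;

class SaltMismatchDemo {
    // Hypothetical helper mirroring the formula from the article:
    // zero-padded (hash mod bucket count), then "|", then the logical key.
    static String saltedKey(String logicalKey, int numberOfBuckets) {
        int bucket = Math.abs(logicalKey.hashCode() % numberOfBuckets);
        return String.format(Locale.ROOT, "%03d", bucket) + "|" + logicalKey;
    }

    public static void main(String[] args) {
        String logicalKey = "a"; // "a".hashCode() is 97

        // Row key used when the row was written with 10 regions.
        String written = saltedKey(logicalKey, 10);
        // Row key computed at read time after growing to 50 regions.
        String lookedUp = saltedKey(logicalKey, 50);

        System.out.println(written);                  // 007|a
        System.out.println(lookedUp);                 // 047|a
        System.out.println(written.equals(lookedUp)); // false
    }
}
```

If I read this correctly, a Get for the key computed with 50 buckets would simply miss the row stored under the 10-bucket key, which is exactly the situation I would like to avoid.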