How many column families do you have? For #3, pre-splitting the table at the row keys corresponding to the peaks makes sense.
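One way to pick split keys that follow the peaks is to take quantiles of a sample of the real row keys, so that dense ranges such as the ones around 020 and 060 get more regions. The following is only a rough sketch under some assumptions: the class and method names are hypothetical, the keys are assumed to be zero-padded ASCII strings (so Java string order matches HBase's byte order), and the sample is assumed to be representative of the full data set.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: derives split keys from a key sample.
final class QuantileSplits {

    // Returns at most (numRegions - 1) strictly increasing split keys, chosen so
    // that each resulting region holds roughly the same share of the sample.
    static List<String> splitKeys(List<String> sampleKeys, int numRegions) {
        List<String> splits = new ArrayList<>();
        if (numRegions < 2 || sampleKeys.isEmpty()) {
            return splits;
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted);
        for (int i = 1; i < numRegions; i++) {
            int index = (int) ((long) i * sorted.size() / numRegions);
            String candidate = sorted.get(index);
            // Skip repeats: split keys must be distinct and ascending.
            if (splits.isEmpty()
                    || candidate.compareTo(splits.get(splits.size() - 1)) > 0) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    // Converts the split keys to the byte[][] form that the table-creation
    // calls taking explicit split keys expect.
    static byte[][] toBytes(List<String> splits) {
        byte[][] out = new byte[splits.size()][];
        for (int i = 0; i < splits.size(); i++) {
            out[i] = splits.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }
}
```

With a sample clustered around 020 and 060, most of the returned split keys fall inside those clusters, which is the effect asked about in #3. The resulting byte[][] could then be handed to the client's createTable variant that accepts split keys.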

On Apr 20, 2013, at 10:52 AM, Pal Konyves <paul.kony...@gmail.com> wrote:

> Hi,
>
> I am just reading about region splitting. By default - as I understand -
> HBase handles splitting the regions. I just don't know how to imagine on
> which key it splits the regions.
>
> 1) For example, when I write MD5 hashes of rowkeys, they are most probably
> evenly distributed from
> 000000... to FFFFF... right? When HBase starts with one region, all the
> writes go into that region, and when the HFile gets too big, it just
> takes, for example, the median value of the stored keys and splits the
> region at that key?
>
> 2) I want to bulk load tons of data with the HBase Java client API put
> operations. I want it to perform well. My keys are numeric sequential
> values (which I know from this post I cannot load into HBase sequentially,
> because the HBase tables are going to be sad
> http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> )
> So I thought I would pre-split the table into regions, and load the data
> randomized. This way I will get good distribution among region servers in
> terms of network IO from the beginning. Is that a good idea?
>
> 3) If my rowkeys are not evenly distributed in the keyspace, but they show
> some peaks or bursts, e.g. 000-999, but most of the keys gather around the
> 020 and 060 values, is it a good idea to put the region pre-splits at those
> peaks?
>
> Thanks in advance,
> Pal
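
For the evenly distributed case in 1) and 2), where the row keys are hex-encoded MD5 hashes, the split keys can simply be spaced evenly over the hash space. A minimal sketch, assuming lowercase hex keys and using only the first 8 hex characters (32 bits) as the boundary; the class name is hypothetical and the case of the split keys must match the case of the stored keys:

```java
// Hypothetical helper, not part of HBase: evenly spaced hex split keys.
final class UniformHexSplits {

    // Returns (numRegions - 1) boundaries that cut the 32-bit hex prefix
    // space 00000000..ffffffff into numRegions equally sized ranges.
    static String[] splitKeys(int numRegions) {
        if (numRegions < 2) {
            return new String[0];
        }
        long space = 1L << 32; // number of distinct 8-character hex prefixes
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            long boundary = space * i / numRegions;
            splits[i - 1] = String.format("%08x", boundary);
        }
        return splits;
    }
}
```

For four regions this gives the boundaries 40000000, 80000000 and c0000000, so writes of hashed keys are spread over all four regions from the first put onward.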