On Monday, May 21, 2012 12:33:05 PM, "Eric Newton" <[email protected]> wrote:
> You need to estimate the size of the split. First, get the id of the
> table with "tables -l" in the accumulo shell.
>
> Then, find out the size of table in hdfs:
>
> $ hadoop fs -dus /accumulo/tables/<id>
>
> Divide by 7, and use that as the split size:
>
> shell> config -t mytable -s table.split.threshold=newsize
>
> The table will automatically split out. Afterwards, you can then raise
> the split size to keep it from splitting until it gets much bigger:
>
> shell> config -t mytable -s table.split.threshold=1G
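Eric's arithmetic can be written out as a small Python sketch. The function names are hypothetical, the table size is a made-up number standing in for what `hadoop fs -dus` reports, and it assumes `table.split.threshold` accepts a plain byte count as well as suffixed values like `1G`.

```python
def split_threshold(table_bytes: int, num_servers: int) -> int:
    """Even per-server share of the table, rounded up to a whole byte."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    return -(-table_bytes // num_servers)  # ceiling division on integers

def config_command(table: str, threshold_bytes: int) -> str:
    """Shell line from the message, with the byte count as the value."""
    return f"config -t {table} -s table.split.threshold={threshold_bytes}"

# Made-up size standing in for the `hadoop fs -dus` result.
table_bytes = 7_000_000_000
print(config_command("mytable", split_threshold(table_bytes, 7)))
# config -t mytable -s table.split.threshold=1000000000
```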
It's going to be hard to get exactly 7 splits using that method. When
Accumulo sees a tablet's size is over the threshold, it attempts to split
it in half. If both of the resulting tablet sizes are above the threshold,
it splits those in half. Assuming a uniform key distribution, you're likely
to end up with 2^N tablets. 8 tablets on 7 servers would have one server
always doing twice the work of the others, so you might be better off
aiming for a larger number of tablets, which, I see now, answers your next
question.

If the key distribution isn't uniform, you may not see this 2^N behavior,
but I would still recommend having significantly more tablets than
tservers to make load balancing easier.

Billie

> -Eric
>
> On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <[email protected]> wrote:
>
> > Hi,
> >
> > I am looking for advice on how to best layout my table splits. I have
> > a 7 node cluster and my table contains ~10M records. I would like to
> > split the table equally across all the servers however I see no
> > utility to do this in this manner. I understand I can create splits
> > for some letter range but I was hoping for some way to have accumulo
> > create "n" equal splits. Is this possible? Right now the best way I
> > see to handle this is to write a utility that iterates the table,
> > keeps a count and at some given value (table size/ split count) spits
> > out the beginning and end row and then I create the split manually.
> >
> > Thanks,
> > Ralph
> >
> > __________________________________________________
> > Ralph Perko
> > Pacific Northwest National Laboratory
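The halving behavior can be checked with a toy Python model. It is not Accumulo's split code: it assumes every split divides a tablet's bytes exactly in half (the uniform-key case), and the helper names are hypothetical. Splitting at one seventh of the table size yields 8 tablets, and spreading 8 tablets over 7 servers leaves one server with two.

```python
def simulate_splits(table_bytes: float, threshold: float) -> list:
    """Toy model: any tablet larger than the threshold is cut into two
    equal halves, and each half is checked again. Equal halves stand in
    for a uniform key distribution."""
    pending = [table_bytes]
    tablets = []
    while pending:
        size = pending.pop()
        if size > threshold:
            pending.extend([size / 2, size / 2])
        else:
            tablets.append(size)
    return tablets

def tablets_per_server(num_tablets: int, num_servers: int) -> list:
    """Spread equal-sized tablets over servers as evenly as possible."""
    base, extra = divmod(num_tablets, num_servers)
    return [base + (1 if i < extra else 0) for i in range(num_servers)]

table_bytes = 7_000_000_000
tablets = simulate_splits(table_bytes, table_bytes / 7)
print(len(tablets))                # 8
print(tablets_per_server(8, 7))    # [2, 1, 1, 1, 1, 1, 1]
print(tablets_per_server(64, 7))   # [10, 9, 9, 9, 9, 9, 9]
```

With 64 tablets the busiest server holds 10 against 9 elsewhere, which is the sense in which many more tablets than tservers makes balancing easier.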

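The utility Ralph describes boils down to choosing every (row count / split count)-th row. Here is a minimal Python sketch, assuming the distinct row IDs have already been collected in sorted order by a scan; the helper name is hypothetical and no Accumulo client API is shown. It relies on Accumulo treating a split point as the inclusive end row of the tablet before it, so each returned row is the last row of one group.

```python
def equal_count_splits(sorted_rows, num_tablets):
    """Return num_tablets - 1 split rows that divide sorted_rows into groups
    of nearly equal size. sorted_rows must hold distinct row IDs in sorted
    order. Each split row is the LAST row of its group (inclusive end row)."""
    rows = list(sorted_rows)
    n = len(rows)
    if num_tablets < 2 or n < num_tablets:
        return []  # nothing sensible to split
    splits = []
    for i in range(1, num_tablets):
        # Group i-1 ends just before index i*n//num_tablets.
        last_index = i * n // num_tablets - 1
        splits.append(rows[last_index])
    return splits

rows = [f"row{i:05d}" for i in range(10)]
print(equal_count_splits(rows, 3))  # ['row00002', 'row00005']
```

The resulting rows could then be added as splits, for example with the shell's `addsplits` command (see `help addsplits` for the options your version supports). For ~10M rows, a real tool would more likely count in one pass and pick rows in a second pass rather than hold every row in memory, and the number of groups can be a multiple of the server count, following the advice to use more tablets than tservers.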