On Monday, May 21, 2012 12:33:05 PM, "Eric Newton" <[email protected]> 
wrote:
> You need to estimate the size of the split. First, get the id of the
> table with "tables -l" in the accumulo shell.
> 
> 
> Then, find out the size of the table in HDFS:
> 
> 
> $ hadoop fs -dus /accumulo/tables/<id>
> 
> 
> Divide by 7, and use that as the split size:
> 
> 
> shell> config -t mytable -s table.split.threshold=newsize
> 
> 
> The table will automatically split out. Afterwards, you can then raise
> the split size to keep it from splitting until it gets much bigger:
> 
> 
> shell> config -t mytable -s table.split.threshold=1G


It's going to be hard to get exactly 7 splits using that method.  When Accumulo
sees that a tablet's size is over the threshold, it attempts to split it in half.
If both of the resulting tablets are still above the threshold, it splits those
in half as well.  Assuming a uniform key distribution, you're likely to end up
with 2^N tablets.  With 8 tablets on 7 servers, one server would always be doing
twice the work, so you might be better off aiming for a larger number of tablets,
which, I see now, answers your next question.  If the key distribution isn't
uniform, you may not see this 2^N behavior, but I would still recommend having
significantly more tablets than tservers to make load balancing easier.
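
As a rough illustration of the halving behavior described above, here is a
small Python sketch. It is a toy model, not Accumulo code: the function names
(simulate_splits, tablets_per_server) are made up for this example, and it
assumes that every split is an exact halving (uniform keys) and that tablets
are spread over the servers as evenly as possible.

```python
def simulate_splits(total_size, threshold):
    """Toy model: halve any tablet larger than the threshold until none remain.

    Assumes a perfectly uniform key distribution, so each split yields two
    equal halves. Returns the list of final tablet sizes.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    pending = [float(total_size)]
    finished = []
    while pending:
        size = pending.pop()
        if size > threshold:
            # Over the threshold: split into two equal halves and re-check both.
            pending.extend([size / 2, size / 2])
        else:
            finished.append(size)
    return finished


def tablets_per_server(num_tablets, num_servers):
    """Spread tablets as evenly as possible and return the count per server."""
    base, extra = divmod(num_tablets, num_servers)
    return [base + 1 if i < extra else base for i in range(num_servers)]


if __name__ == "__main__":
    total = 7 * 1024**3            # hypothetical 7 GiB table
    threshold = total / 7          # "divide by 7" from the earlier suggestion
    tablets = simulate_splits(total, threshold)
    print(len(tablets))                            # 8
    print(tablets_per_server(len(tablets), 7))     # [2, 1, 1, 1, 1, 1, 1]
    print(tablets_per_server(64, 7))               # [10, 9, 9, 9, 9, 9, 9]
```

Under these assumptions a threshold of one seventh of the table size gives 8
tablets, not 7, and one server ends up holding two of them. With 64 tablets the
busiest server holds 10 against 9 on the others, which is why a larger tablet
count makes the load easier to balance.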

Billie


> -Eric
> 
> 
> 
> On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <
> [email protected] > wrote:
> 
> 
> Hi,
> 
> I am looking for advice on how best to lay out my table splits. I have
> a 7-node cluster and my table contains ~10M records. I would like to
> split the table equally across all the servers; however, I see no
> utility that does this. I understand I can create splits
> for some letter range, but I was hoping for some way to have Accumulo
> create "n" equal splits. Is this possible? Right now the best way I
> see to handle this is to write a utility that iterates over the table,
> keeps a count, and at some given value (table size / split count) spits
> out the beginning and end row, and then I create the split manually.
> 
> Thanks,
> Ralph
> 
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
