Silly question... 

If you're doing a simple export, then you end up with all of your prior regions 
as separate files in a directory, right? 

So in theory, you could find the first row and the last complete row of each 
file and then do your pre-splits based on the start key and end key that you 
find.  

That would be your tool, so to speak.
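
A rough sketch of what I mean, in Python and purely illustrative. It assumes you have already pulled the first and last row key out of each exported file (that step needs Hadoop tooling and isn't shown), and that the keys are printable ASCII without quotes. The function names are made up for this example and are not part of HBase. Only the first row of each file is used: N files give N-1 split points, because the lowest region starts at the empty key.

```python
from typing import Iterable, List, Tuple


def split_keys_from_files(file_ranges: Iterable[Tuple[bytes, bytes]]) -> List[bytes]:
    """Derive pre-split keys from (first_row, last_row) pairs, one pair per file.

    Hypothetical helper: the pairs are assumed to have been extracted from
    the export files beforehand. Python compares bytes lexicographically as
    unsigned values, which is the same ordering HBase uses for row keys.
    """
    # One candidate boundary per file: its first row key, de-duplicated and sorted.
    starts = sorted({first for first, _last in file_ranges})
    # Drop the lowest one, since the first region has no lower boundary.
    return starts[1:]


def hbase_shell_create(table: str, family: str, split_keys: List[bytes]) -> str:
    """Format an HBase shell 'create ... SPLITS => [...]' line.

    Assumes printable ASCII keys with no single quotes in them; binary keys
    would need proper escaping, which this sketch does not attempt.
    """
    quoted = ", ".join("'%s'" % key.decode("ascii") for key in split_keys)
    return "create '%s', '%s', SPLITS => [%s]" % (table, family, quoted)


if __name__ == "__main__":
    # Made-up (first_row, last_row) pairs for three exported files, in any order.
    ranges = [(b"m001", b"s999"), (b"a001", b"f420"), (b"g000", b"l777")]
    keys = split_keys_from_files(ranges)
    print(hbase_shell_create("target_table", "f1", keys))
```

The boundaries won't match the source table's region start keys exactly, since the first row in a file can sort after the start of its region, but each file's rows still land in a single region of the new table.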

But as for the point that reading these files back in will crash your 
RegionServers and HBase? 
That doesn't sound like a cluster that's well tuned or set up right.

HTH
-Mike

On May 7, 2013, at 5:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I am not aware of a tool which can pre-split table using another table's
> region boundaries as template.
> 
> Such a tool would be nice to have.
> 
> Cheers
> 
> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> 
>> Hi,
>> 
>> When we do an export, we only export the data. Then when we import
>> that back, we need to make sure the table is pre-split correctly,
>> or else we might hotspot some servers.
>> 
>> If you simply export then import without pre-splitting at all, you
>> will most probably bring some servers down because they will be
>> overwhelmed with splits and compactions.
>> 
>> Do we have any tool to pre-split a table the same way another table is
>> already split?
>> 
>> Something like
>>> duplicate 'source_table', 'target_table'
>> 
>> Which would create a new table called 'target_table' with exactly the
>> same parameters as 'source_table' and the same region boundaries?
>> 
>> If we don't have one, would it be useful to have one?
>> 
>> Or even something like:
>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>> 
>> 
>> JM
>> 
