Currently the Import tool doesn't create the table on the target cluster. If we choose approach #2, the Import tool should be enhanced with table creation capability.
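For what it's worth, here is a rough sketch of how approach #2 might derive the split points, assuming the first and last row key of each exported file have already been read somehow. The function names (derive_split_keys, shell_create_command) and the sample keys are made up for illustration and are not part of the Import tool; only the SPLITS option of the shell's create command is existing syntax.

```python
from typing import Iterable, List, Tuple


def derive_split_keys(file_ranges: Iterable[Tuple[bytes, bytes]]) -> List[bytes]:
    """Return pre-split keys from (first_row, last_row) pairs, one pair per exported file."""
    # Collect and sort the first row key of every file. Python compares bytes
    # lexicographically by unsigned byte value, the same order HBase uses for rows.
    first_rows = sorted({first for first, _last in file_ranges})
    # The first region always starts at the empty key, so the smallest
    # first row is not a split point; every later first row is.
    return first_rows[1:]


def shell_create_command(table: str, family: str, splits: List[bytes]) -> str:
    """Format an HBase shell `create` statement (assumes printable ASCII row keys)."""
    quoted = ", ".join("'{}'".format(key.decode("ascii")) for key in splits)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


if __name__ == "__main__":
    # Hypothetical key ranges, as if read from three exported files.
    ranges = [(b"row200", b"row299"), (b"row000", b"row099"), (b"row100", b"row199")]
    print(shell_create_command("target_table", "f1", derive_split_keys(ranges)))
```

The last row of each file is not needed to compute the splits; it is kept in the input only because it could be used to check that the files don't overlap. Binary row keys would need escaping before being pasted into the shell.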
Cheers

On Tue, May 7, 2013 at 4:02 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> @Mohammad: The end goal is really more about the splits than
> the model. So I don't think Lars' options are good for this use case.
>
> @Mike: I agree that things were not configured correctly. The user should
> have split the table before doing the import. I like the idea of
> looking at the files to get the region boundaries. That way you don't
> need to have the source_table still there...
>
> So we have 2 different things here.
> 1) a command on the shell to duplicate a table structure
> 2) an option on the import command to split the table regions based on
> the file names.
>
> If we agree on that I will open one JIRA for each...
>
> JM
>
> 2013/5/7 Michael Segel <michael_se...@hotmail.com>:
> > Silly question...
> >
> > If you're doing a simple export, then you end up with all of your prior
> > regions as separate files in a directory, right?
> >
> > So in theory, you could find the first row and the last complete row of
> > each file and then do your pre-splits based on the start key and end key
> > that you find.
> >
> > That would be your tool so to speak.
> >
> > But to the point that reading back in these files will cause you to
> > crash your RS and HBase?
> > That doesn't sound like it's well tuned or right.
> >
> > HTH
> >
> > -Mike
> >
> > On May 7, 2013, at 5:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> I am not aware of a tool which can pre-split a table using another table's
> >> region boundaries as a template.
> >>
> >> Such a tool would be nice to have.
> >>
> >> Cheers
> >>
> >> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> >>
> >>> Hi,
> >>>
> >>> When we are doing an export, we are only exporting the data. Then when
> >>> we are importing that back, we need to make sure the table is
> >>> pre-split correctly, else we might hotspot some servers.
> >>>
> >>> If you simply export then import without pre-splitting at all, you
> >>> will most probably bring some servers down because they will be
> >>> overwhelmed with splits and compactions.
> >>>
> >>> Do we have any tool to pre-split a table the same way another table is
> >>> already pre-split?
> >>>
> >>> Something like
> >>>> duplicate 'source_table', 'target_table'
> >>>
> >>> Which will create a new table called 'target_table' with exactly the
> >>> same parameters as 'source_table' and the same region boundaries?
> >>>
> >>> If we don't have one, would it be useful to have one?
> >>>
> >>> Or even something like:
> >>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
> >>>
> >>> JM