The cluster is stopped anyway, so there are no consistency concerns, which
means snapshots might be the best option. No need to delete anything after.

The goal is really to export the data locally, bring the cluster down, set up
a new cluster, put the data back and reload the table... the 2 clusters can't
be up at the same time...
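For reference, that snapshot route could look like the following in the HBase
shell and CLI (snapshots and ExportSnapshot ship with 0.94.6+). The table and
snapshot names and the backup path are illustrative, and the exact copy step
depends on where the intermediate storage lives:

```shell
# On the source cluster, in the HBase shell:
hbase> snapshot 'source_table', 'source_table_snap'

# From the command line, export the snapshot files out of the cluster
# before shutting it down (path is a placeholder):
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot source_table_snap -copy-to file:///backup/hbase

# Once the new cluster is up and the exported files have been copied
# into its HBase root directory, recreate the table from the snapshot:
hbase> clone_snapshot 'source_table_snap', 'source_table'
```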

2013/5/13 Matteo Bertozzi <theo.berto...@gmail.com>

> I'd go with the snapshots, since you can avoid all the I/O of the
> import/export, but the consistency model is different and you don't have
> the start/end time option... you should delete the rows < tstart and > tend
> after the clone
>
> Matteo
>
>
>
> On Tue, May 14, 2013 at 1:48 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi Jeremy,
> >
> > Thanks for sharing this.
> >
> > I will take a look at it, and also most probably give the snapshot
> > option a try...
> >
> > JM
> >
> > 2013/5/7 Jeremy Carroll <phobos...@gmail.com>
> >
> > >
> > >
> >
> https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb
> > >
> > > I wrote a quick script to do it with mechanize + ruby. I have a new
> tool
> > > which I'm polishing up that does the same thing in Python but using the
> > > HBase REST interface to get the data.
> > >
> > >
> > > On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org
> > > > wrote:
> > >
> > > > Hi,
> > > >
> > > > When we are doing an export, we are only exporting the data. Then
> > > > when we are importing it back, we need to make sure the table is
> > > > pre-split correctly, or else we might hotspot some servers.
> > > >
> > > > If you simply export then import without pre-splitting at all, you
> > > > will most probably bring some servers down, because they will be
> > > > overwhelmed with splits and compactions.
> > > >
> > > > Do we have any tool to pre-split a table the same way another table
> > > > is already pre-split?
> > > >
> > > > Something like
> > > > > duplicate 'source_table', 'target_table'
> > > >
> > > > Which will create a new table called 'target_table' with exactly the
> > > > same parameters as 'source_table' and the same region boundaries?
> > > >
> > > > If we don't have one, would it be useful to have one?
> > > >
> > > > Or even something like:
> > > > > create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
> > > >
> > > >
> > > > JM
> > > >
> > >
> >
>
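Until something like the `duplicate` command quoted above exists, the closest
built-in is to collect the source table's region start keys (from .META. or
the master web UI) and pass them to create's SPLITS option. A sketch; the
family name and split keys are made up:

```shell
# In the HBase shell; in practice the keys would come from
# 'source_table''s region start keys:
create 'target_table', 'f1', SPLITS => ['row1000', 'row2000', 'row3000']
```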
