*Simple procedure:*

1. Stop writing to the source cluster.
2. For each table, export a snapshot as described below and restore it on the destination cluster.
3. Start writing to the destination cluster.

*Not-so-simple procedure (zero downtime)*

*If your source cluster is > 2.1 (has serial replication)*, you can do the following in sequence:

1. Export a snapshot to the destination cluster. Say this starts at time *t1* and takes an hour, for example.

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $snapshot_name -copy-to hdfs://$dest_hdfs_ip:8020/hbase -bandwidth $bandwidth_in_MB

2. Set up a peer to the destination cluster at, say, time *t2* that replicates the table(s) to be migrated (works only if the source cluster has serial replication, i.e. HBase >= 2.1). Make sure replication scope is 1 on the column families you want to copy over.

    add_peer 'cluster_dest', CLUSTER_KEY => "zk1,zk2,zk3:2182:/hbase", TABLE_CFS => { "table1" => ["cf1", "cf2"] }, STATE => "ENABLED", SERIAL => true, REPLICATE_ALL => false

3. Make sure replication is propagating data, then disable the peer temporarily. Edits will start piling up in the oldWALs directory, and the logs are kept in the ZooKeeper replication queues.

    disable_peer 'cluster_dest'

4. Copy the diff data from time *t1* to the present using CopyTable. Say this starts at time *t3* (> *t2*) and takes 10 minutes, for example.

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=$copy_starttime --endtime=$copy_endtime --peer.adr=$dest_zk_addr '$full_table_name'

5. Re-enable the peer to ensure data is replicated from time *t2* onward.

    enable_peer 'cluster_dest'

6. Stop writes to the old cluster by marking the tables read-only.

    alter '$full_table_name', {METHOD => 'table_att', READONLY => true}

7. (Optional) Verify that the data is the same on the two clusters for the migration window. (Note: this comes with caveats. If the same rows are being updated, or TTLs are expiring data, some row differences are expected.)
On the source cluster:

    hbase org.apache.hadoop.hbase.mapreduce.HashTable --starttime=$verify_starttime --endtime=$verify_endtime '$full_table_name' $hash_table_full_path

On the destination cluster:

    hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=$src_zk_addr *--dryrun=true* hdfs://$src_hdfs_ip:8020$hash_table_full_path '$full_table_name' '$full_table_name'

*If your source cluster is < 2.1 (no serial replication)*, you will have some write downtime. You can skip steps 3 and 5, and repeat step 4 as many times as you want until the downtime window is very small. Otherwise it is largely the same.

---
Mallikarjun

On Thu, Apr 15, 2021 at 7:23 PM Sajid Mohammed <sajid.had...@gmail.com> wrote:

> Hello All
>
> Any one know simple procedure to migrate all Hbase Tables from one cluster
> to another in one go ?
>
>
> Thanks
> Sajid.
>
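P.S. Since the question was about moving all tables "in one go": the simple procedure can be looped over the table list. A minimal sketch, assuming placeholder cluster addresses, a hardcoded table list, and a `_migration` snapshot-naming convention of my own (it only prints the commands; pipe them to `hbase shell -n` / run the ExportSnapshot lines yourself once you have checked them):

```shell
#!/usr/bin/env bash
# Sketch: generate snapshot -> export -> restore commands for each table.
# DEST_HDFS_IP, BANDWIDTH_MB and TABLES are placeholders, not real values.
set -eu

DEST_HDFS_IP="10.0.0.1"     # placeholder: destination NameNode address
BANDWIDTH_MB=100            # throttle so the copy does not saturate the network
TABLES="table1 table2"      # placeholder: tables to migrate

for table in $TABLES; do
  snapshot="${table}_migration"
  # 1. Take a snapshot on the source cluster (hbase shell command).
  echo "snapshot '${table}', '${snapshot}'"
  # 2. Ship the snapshot to the destination cluster (CLI command).
  echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ${snapshot} -copy-to hdfs://${DEST_HDFS_IP}:8020/hbase -bandwidth ${BANDWIDTH_MB}"
  # 3. Restore on the destination cluster (hbase shell command, run there).
  echo "restore_snapshot '${snapshot}'"
done
```

Writes still have to be stopped before step 1 and redirected after step 3, exactly as in the simple procedure above.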