Kevin,

Based on your requirements, I think the snapshot feature with export will work better. Here is some info:
http://hbase.apache.org/book/ops.snapshots.html
To fully benefit from this feature, consider moving to 0.94.6+.
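For what it's worth, here is a rough sketch of how the export step could double as the "everything has arrived" signal for your second job. It assumes the snapshot was already taken in the HBase shell (snapshot 'tableName', 'snapshotName'), that the hbase launcher is on the PATH, and that the ExportSnapshot options match the ones shown in the book page above. The snapshot name and the destination URL are made up for illustration, so adjust them to your clusters.

```python
import subprocess

# Tool class name as given in the HBase book's snapshot section.
EXPORT_CLASS = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def export_snapshot_cmd(snapshot, dest_root, mappers=16):
    """Build the argv for exporting `snapshot` to the HBase root dir `dest_root`."""
    return [
        "hbase", EXPORT_CLASS,
        "-snapshot", snapshot,
        "-copy-to", dest_root,
        "-mappers", str(mappers),
    ]


def export_and_wait(snapshot, dest_root, mappers=16, run=subprocess.call):
    """Run the export and block until it exits; True only on exit code 0.

    `run` is injectable so the gating logic can be exercised without a cluster.
    """
    return run(export_snapshot_cmd(snapshot, dest_root, mappers)) == 0


if __name__ == "__main__":
    # Hypothetical names: replace with your own snapshot and Cluster B address.
    if export_and_wait("job1_output_snap", "hdfs://clusterB:8020/hbase"):
        print("export finished; safe to start the second job on Cluster B")
    else:
        print("export failed; do not start the second job")
```

Once the export has finished, the snapshot still has to be turned into a table on Cluster B (for example with clone_snapshot in the HBase shell) before the second job can read it.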

I am still curious about the hard requirement "... The second map reduce job cannot start until all the data from Cluster A has been replicated to Cluster B ...". Consider that the output of the first MapReduce job will be put into an HBase table on Cluster A. There is no need to wait until replication completes, as long as you use different row IDs so that the second output won't overwrite the first one. HBase replication will handle that situation very well.

Demai

On Mon, Nov 11, 2013 at 4:03 PM, Kevin Su <[email protected]> wrote:

> Hi,
>
> I am having trouble searching for answers regarding HBase replication, so I
> thought I would email the mailing list.
>
> Does HBase provide an API/way to see what has/hasn't been replicated yet?
>
> My use case is the following:
>
> I run a map reduce job in Cluster A and stick the output in HBase. I would
> like to transport this output to Cluster B as (part of) the input to
> another map reduce job. I hope to achieve this transport via HBase
> replication. The second map reduce job cannot start until all the data from
> Cluster A has been replicated to Cluster B. So what is the best way to
> check if everything has been replicated? Do I query Zookeeper and check if
> the RS queues are empty? Or is HBase replication not the right fit for my
> use case?
>
> I am using HBase 0.94.2.
>
> Thanks in advance for any advice!
>
> --
> Kevin
