Vivek,

You are correct: when run with the -update option, distcp copies files that 
are new and overwrites files that have changed.
As for running this in real time (i.e., as soon as data is deposited on the 
source cluster), you will have to handle that yourself.
Please be aware that if you are talking about Hive tables, you will also need 
to copy the Hive Metastore.
We copy our critical data, along with the contents of the Hive Metastore 
database, from one Production Cluster to another Production Cluster and to a 
Test Cluster on a daily basis.
Be aware that if you restore the Hive Metastore database on the destination 
cluster, any tables created solely on the destination cluster may disappear.
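
As a rough illustration of the copy step, here is a minimal Python sketch that 
builds a distcp command line. The cluster URIs and the function name are 
made up for the example; only the -update and -delete options are real 
distcp options. Leaving -delete off is what keeps files on the destination 
after they are removed from the source.

```python
def build_distcp_command(source_uri, dest_uri, delete_missing=False):
    """Return the argument list for an incremental distcp run.

    -update copies new files and overwrites files that differ.
    -delete (only valid together with -update or -overwrite) removes
    destination files that no longer exist on the source, so it is
    left off by default to keep the destination as an archive.
    """
    cmd = ["hadoop", "distcp", "-update"]
    if delete_missing:
        cmd.append("-delete")
    cmd += [source_uri, dest_uri]
    return cmd


# Hypothetical NameNode addresses and path, for illustration only.
archive_cmd = build_distcp_command(
    "hdfs://cluster-a:8020/data/events",
    "hdfs://cluster-b:8020/data/events",
)
print(" ".join(archive_cmd))
```

The resulting list could be passed to subprocess.run from a scheduled job; 
how often to run it depends on how close to real time the copy needs to be.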

David


From: Vivek Singh Raghuwanshi [mailto:[email protected]]
Sent: Wednesday, February 10, 2016 1:28 PM
To: [email protected]
Subject: Re: Hadoop Backup and Archival Cluster

Thanks David,

I want to replicate the data as soon as it reaches the cluster, and delete it 
from the source cluster after one year. I want Cluster B to work as a hot 
backup and archive, with Cluster A holding only the latest data.

As far as I know, distcp copies all the data and overwrites it. Please 
correct me if I am wrong.


On Wed, Feb 10, 2016 at 12:21 PM, David Whitmore <[email protected]> wrote:
Yes, you can run distcp to copy data from one cluster to another. distcp also 
has an option that controls whether it deletes files on the destination that 
are NOT on the source.


From: Vivek Singh Raghuwanshi [mailto:[email protected]]
Sent: Wednesday, February 10, 2016 1:16 PM
To: [email protected]
Subject: Hadoop Backup and Archival Cluster



Hi Friends,



I am planning to set up a Hadoop cluster (A) with replication to a second 
cluster (B), so that once data reaches Cluster A it is replicated to Cluster 
B. I have one question: if I delete data from Cluster A based on age, for 
example data older than one month, is it also removed from Cluster B? If so, 
how can I avoid this?

What I want to achieve:

1. Once data reaches Cluster A, it is automatically replicated to Cluster B.

2. Data older than one year is removed automatically from Cluster A, but not 
from Cluster B.

3. Anyone who wants to query the latest data can use Cluster A, while older 
data remains available on Cluster B.
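
The retention rule in point 2 could be sketched in Python as below. This is 
only an illustration under assumptions that are not part of the plan above: 
it supposes the data on Cluster A is laid out in directories named by date 
(YYYY-MM-DD), and the function name is hypothetical. It only selects the 
directories that are past the retention period; the actual removal on 
Cluster A would be a separate step, and Cluster B keeps its copies as long as 
the replication job does not propagate deletions.

```python
from datetime import date, timedelta


def expired_partitions(partition_names, today, retention_days=365):
    """Return the date-named directories older than the retention period.

    Assumes each directory name is an ISO date such as "2015-01-31";
    names that do not parse as dates are ignored, never selected.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in partition_names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue  # not a date-named directory, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Example with made-up directory names.
names = ["2015-01-01", "2016-02-01", "_tmp"]
print(expired_partitions(names, today=date(2016, 2, 10)))
```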





Regards
--
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi


