Re: Backing up HDFS?

2009-02-12 Thread Stefan Podkowinski
On Tue, Feb 10, 2009 at 2:22 AM, Allen Wittenauer wrote:
> The key here is to prioritize your data. Impossible to replicate data gets
> backed up using whatever means necessary, hard-to-regenerate data, next
> priority. Easy to regenerate and ok to nuke data, doesn't get backed up.

I think t
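The prioritization quoted above can be sketched as a small policy table. This is only an illustration in Python: the tier names, the `backup_plan` helper, and the example paths are hypothetical and do not come from the thread.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tiers mirroring the three categories in the quoted advice."""
    IRREPLACEABLE = 0        # impossible to replicate: always backed up
    HARD_TO_REGENERATE = 1   # next priority
    EASY_TO_REGENERATE = 2   # ok to nuke: not backed up


def backup_plan(datasets):
    """Return the paths that should be backed up, highest priority first.

    `datasets` maps a path to its Tier. Easy-to-regenerate data is skipped.
    """
    selected = [
        (tier, path)
        for path, tier in datasets.items()
        if tier != Tier.EASY_TO_REGENERATE
    ]
    # Sorting the (tier, path) pairs puts the lowest tier value first.
    return [path for _tier, path in sorted(selected)]


if __name__ == "__main__":
    # Example paths are made up for illustration.
    example = {
        "/data/raw-logs": Tier.IRREPLACEABLE,
        "/data/derived-index": Tier.HARD_TO_REGENERATE,
        "/tmp/scratch": Tier.EASY_TO_REGENERATE,
    }
    for path in backup_plan(example):
        print(path)
```

The point of keeping the classification explicit is that the backup job only ever sees the first two tiers, so the volume to copy off the cluster stays much smaller than the full HDFS contents.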

Re: Backing up HDFS?

2009-02-11 Thread Steve Loughran
Allen Wittenauer wrote:
> On 2/9/09 4:41 PM, "Amandeep Khurana" wrote:
>> Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)...
>
> I'm reminded of a previous job where a site administrator

Re: Backing up HDFS?

2009-02-09 Thread lohit
We copy selected files from HDFS to KFS and use an instance of KFS as the backup file system. We use distcp to take the backup.

Lohit

----- Original Message -----
From: Allen Wittenauer
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 5:22:38 PM
Subject: Re: Backing up HDFS?

On 2/9
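A minimal sketch of the distcp approach Lohit describes, written in Python. It only assembles the `hadoop distcp <src>... <dest>` command line and prints it. The host names, ports, paths, and the `kfs://` destination URI below are assumptions made for illustration; the thread does not give the actual cluster layout.

```python
import shlex


def build_distcp_command(source_paths, backup_root):
    """Build the argv for one `hadoop distcp` run.

    distcp accepts one or more source URIs followed by a single
    destination URI, so all selected paths go into one invocation.
    """
    if not source_paths:
        raise ValueError("no paths selected for backup")
    return ["hadoop", "distcp", *source_paths, backup_root]


if __name__ == "__main__":
    # Hypothetical selection of files worth backing up.
    selected = [
        "hdfs://namenode.example.com:9000/data/raw-logs",
        "hdfs://namenode.example.com:9000/data/derived-index",
    ]
    # Hypothetical KFS instance used as the backup file system.
    destination = "kfs://kfs-meta.example.com:20000/backup"

    command = build_distcp_command(selected, destination)
    # Print instead of executing; pass `command` to subprocess.run()
    # on a machine where the hadoop CLI is configured.
    print(shlex.join(command))
```

Because distcp runs as a MapReduce job, the copy is spread across the cluster rather than funneled through one machine, which matters when the selected data is many TB.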

Re: Backing up HDFS?

2009-02-09 Thread Jeff Hammerbacher
Hey,

There's also a ticket open to enable global snapshots for a single HDFS instance: https://issues.apache.org/jira/browse/HADOOP-3637. While this doesn't solve the multi-site backup issue, it does provide stronger protection against programmatic deletion of data in a single cluster.

Regards, J

Re: Backing up HDFS?

2009-02-09 Thread Allen Wittenauer
On 2/9/09 4:41 PM, "Amandeep Khurana" wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data, so the reliability of the system shouldn't be a
> concern (if at all it is)...

I'm reminded of a previous job where a site administrator refused to make tape

Re: Backing up HDFS?

2009-02-09 Thread Brian Bockelman
On Feb 9, 2009, at 6:41 PM, Amandeep Khurana wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)...

It should be. HDFS is not an archival system. Multiple replicas does

Re: Backing up HDFS?

2009-02-09 Thread Nathan Marz
Replication only protects against single node failure. If there's a fire and we lose the whole cluster, replication doesn't help. Or if there's human error and someone accidentally deletes data, then it's deleted from all the replicas. We want our backups to protect against all these scenar

Re: Backing up HDFS?

2009-02-09 Thread Amandeep Khurana
Why would you want to have another backup beyond HDFS? HDFS itself replicates your data, so the reliability of the system shouldn't be a concern (if at all it is)...

Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Mon, Feb 9, 2009 at 4:17 PM

Backing up HDFS?

2009-02-09 Thread Nathan Marz
How do people back up the data they keep on HDFS? We have many TB of data that we need to back up, but we are unclear on how to do this efficiently and reliably.