On Tue, Feb 10, 2009 at 2:22 AM, Allen Wittenauer wrote:
>
> The key here is to prioritize your data. Impossible-to-replicate data gets
> backed up by whatever means necessary; hard-to-regenerate data is the next
> priority. Easy-to-regenerate and OK-to-nuke data doesn't get backed up.
>
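The prioritization described above could be sketched roughly as follows. This is only an illustration: the tier names, the Dataset type, and the example paths are hypothetical and do not come from the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names, mirroring the three classes of data above.
    IRREPLACEABLE = 1       # impossible to replicate: back up by any means
    HARD_TO_REGENERATE = 2  # next priority
    DISPOSABLE = 3          # easy to regenerate or OK to lose: no backup


@dataclass(frozen=True)
class Dataset:
    path: str   # location of the data, e.g. a directory on HDFS
    tier: Tier


def backup_order(datasets):
    """Return the datasets that need a backup, highest priority first."""
    wanted = [d for d in datasets if d.tier is not Tier.DISPOSABLE]
    return sorted(wanted, key=lambda d: d.tier.value)


if __name__ == "__main__":
    # Example paths are made up for illustration.
    inventory = [
        Dataset("/tmp/scratch", Tier.DISPOSABLE),
        Dataset("/derived/index", Tier.HARD_TO_REGENERATE),
        Dataset("/raw/logs", Tier.IRREPLACEABLE),
    ]
    for d in backup_order(inventory):
        print(d.tier.name, d.path)
```

Keeping the tier next to each path makes the "doesn't get backed up" decision explicit instead of leaving it implicit in whichever job happens to copy the data.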
I think t
Allen Wittenauer wrote:
On 2/9/09 4:41 PM, "Amandeep Khurana" wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data, so the reliability of the system shouldn't be a
> concern (if it is one at all)...
I'm reminded of a previous job where a site administrator
We copy selected files from HDFS to KFS and use an instance of KFS as a
backup file system.
We use distcp to take the backups.
Lohit
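A minimal sketch of the distcp-based copy described above, assuming the basic `hadoop distcp <source> <destination>` invocation. The namenode and KFS metaserver addresses, the `kfs://` URI form, and the list of selected paths are placeholders, not details from the thread. By default the script only prints the commands it would run.

```python
import subprocess


def distcp_command(src_uri, dst_uri):
    """Build the argument list for a basic 'hadoop distcp SRC DST' call."""
    return ["hadoop", "distcp", src_uri, dst_uri]


def backup_paths(paths, src_fs, dst_fs, dry_run=True):
    """Copy each selected path from src_fs to the same path on dst_fs.

    With dry_run=True (the default) nothing is executed; the commands
    are only returned so they can be inspected first.
    """
    commands = [distcp_command(src_fs + p, dst_fs + p) for p in paths]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical endpoints and paths, for illustration only.
    selected = ["/raw/logs", "/derived/index"]
    for cmd in backup_paths(selected,
                            "hdfs://namenode.example.com:8020",
                            "kfs://metaserver.example.com:20000"):
        print(" ".join(cmd))
```

Running one distcp job per selected path keeps a failure in one copy from hiding the status of the others; a single job with several sources would also be possible.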
----- Original Message -----
From: Allen Wittenauer
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 5:22:38 PM
Subject: Re: Backing up HDFS?
On 2/9
Hey,
There's also a ticket open to enable global snapshots for a single HDFS
instance: https://issues.apache.org/jira/browse/HADOOP-3637. While this
doesn't solve the multi-site backup issue, it does provide stronger
protection against programmatic deletion of data in a single cluster.
Regards,
J
On 2/9/09 4:41 PM, "Amandeep Khurana" wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data, so the reliability of the system shouldn't be a
> concern (if it is one at all)...
I'm reminded of a previous job where a site administrator refused to make
tape
On Feb 9, 2009, at 6:41 PM, Amandeep Khurana wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data, so the reliability of the system shouldn't be a
> concern (if it is one at all)...
It should be. HDFS is not an archival system. Multiple replicas
does
Replication only protects against single node failure. If there's a
fire and we lose the whole cluster, replication doesn't help. Or if
there's human error and someone accidentally deletes data, then it's
deleted from all the replicas. We want our backups to protect against
all these scenar
Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data, so the reliability of the system shouldn't be a
concern (if it is one at all)...
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Mon, Feb 9, 2009 at 4:17 PM
How do people back up the data they keep on HDFS? We have many TB of
data that we need to back up, but we are unclear on how to do this
efficiently and reliably.