Hi Steve,

On top of Harsh's answer: besides backups, there is a feature called Snapshot offered by some third-party vendors like MapR. Though it's not really a backup, it gives you a point in time to which you can revert.
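For what it's worth, later Apache Hadoop releases (2.1+) added native HDFS snapshots as well. A rough sketch, where "/data/logs" is a placeholder path, not one from this thread:

```shell
# Hypothetical sketch of native HDFS snapshots (Apache Hadoop 2.1+).
# "/data/logs" is a placeholder directory, not from this thread.
DIR="/data/logs"
SNAP="weekly-$(date +%Y%m%d)"

# Guard so the sketch is harmless where no Hadoop CLI is installed.
if command -v hdfs >/dev/null 2>&1; then
  # An admin must first allow snapshots on the directory.
  hdfs dfsadmin -allowSnapshot "$DIR"
  # Create a read-only, point-in-time snapshot (cheap: no data copy).
  hdfs dfs -createSnapshot "$DIR" "$SNAP"
  # Snapshots appear under the hidden .snapshot directory.
  hdfs dfs -ls "$DIR/.snapshot"
fi
echo "snapshot name: $SNAP"
```

Like the MapR feature, this protects against accidental deletes, not against losing the cluster itself.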
Best,
Mahesh Balija,
CalsoftLabs.

On Fri, Jan 25, 2013 at 11:53 AM, Harsh J <ha...@cloudera.com> wrote:
> You need some form of spare capacity on the backup cluster that can
> withstand it. Lower replication (<3) may also be an option there to
> save yourself some disks/nodes?
>
> On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <sediso...@gmail.com> wrote:
> > Backup to disks is what we do right now. Distcp would copy across HDFS
> > clusters, meaning I will have to build another 12-node cluster? Is that
> > correct?
> >
> > On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> > <mathias.herbe...@gmail.com> wrote:
> >> Backup on tape or on disk?
> >>
> >> On disk, have another Hadoop cluster and do regular distcp.
> >>
> >> On tape, make sure you have a backup program which can back up streams,
> >> so you don't have to materialize your TB files outside of your Hadoop
> >> cluster first... (I know Simpana can't do that :-().
> >>
> >> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <sediso...@gmail.com>
> >> wrote:
> >> > Folks,
> >> >
> >> > It's been a year and my HDFS / Solr / Hive setup has been working
> >> > flawlessly. The data logs which were meaningless to my business all
> >> > of a sudden became precious, to the extent that our management wants
> >> > to back up this data. I am talking about 20 TB of active HDFS data
> >> > with an incremental of 2 TB/month. We would like to have weekly and
> >> > monthly backups up to 12 months.
> >> >
> >> > Any ideas how to do this?
> >> >
> >> > -- Steve
>
> --
> Harsh J
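The distcp approach Harsh and Mathias describe could be sketched roughly like this; the NameNode hostnames and paths are made-up placeholders, not details from this thread:

```shell
# Hypothetical weekly backup via distcp between two clusters.
# "prod-nn" / "backup-nn" are placeholder NameNode hostnames.
SRC="hdfs://prod-nn:8020/data/logs"
DST="hdfs://backup-nn:8020/backup/logs"

# Guard so the sketch is harmless where no Hadoop CLI is installed.
if command -v hadoop >/dev/null 2>&1; then
  # distcp runs as a MapReduce job across the cluster;
  # -update copies only files that changed since the last run.
  hadoop distcp -update "$SRC" "$DST"
  # Per Harsh's suggestion: lower replication on the backup cluster
  # (e.g. 2 instead of 3) to save disks/nodes.
  hdfs dfs -setrep -R 2 /backup/logs
fi
echo "backing up $SRC -> $DST"
```

With -update, the weekly run only moves the ~0.5 TB/week delta rather than the full 20 TB, though the first copy is still a full one.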