You need some form of spare capacity on the backup cluster that can absorb that growth. Lowering replication (<3) on the backup cluster may also be an option there, to save yourself some disks/nodes.
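A minimal sketch of the lower-replication idea, assuming a standard HDFS setup (the path and value are illustrative, not from the thread):

```sh
# On the backup cluster, default new files to 2 copies instead of 3
# by setting dfs.replication=2 in hdfs-site.xml, and/or reduce
# replication on data already copied over:
hdfs dfs -setrep -w -R 2 /backup/data
```

The `-w` flag waits until re-replication completes, so the command may take a while on terabytes of data.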
On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <sediso...@gmail.com> wrote:
> Backup to disks is what we do right now. Distcp would copy across HDFS
> clusters, meaning I will have to build another 12-node cluster? Is that
> correct?
>
>
> On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> <mathias.herbe...@gmail.com> wrote:
>>
>> Backup on tape or on disk?
>>
>> On disk, have another Hadoop cluster and do regular distcp.
>>
>> On tape, make sure you have a backup program which can back up streams,
>> so you don't have to materialize your TB files outside of your Hadoop
>> cluster first... (I know Simpana can't do that :-().
>>
>> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <sediso...@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > It's been a year and my HDFS / Solr / Hive setup is working flawlessly.
>> > The data logs which were meaningless to my business all of a sudden
>> > became precious, to the extent that our management wants to back up
>> > this data. I am talking about 20 TB of active HDFS data with an
>> > incremental of 2 TB/month. We would like to have weekly and monthly
>> > backups for up to 12 months.
>> >
>> > Any ideas how to do this?
>> >
>> > -- Steve

--
Harsh J
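For reference, a minimal sketch of the distcp approach discussed above — the NameNode hostnames, ports, and paths here are hypothetical, not from the thread:

```sh
# Copy /data from the production cluster to the backup cluster.
# -update : skip files whose size/checksum already match on the target,
#           so repeated weekly runs behave incrementally
# -p      : preserve file attributes (replication, block size,
#           permissions, user, group) on the copies
hadoop distcp -update -p \
    hdfs://prod-nn:8020/data \
    hdfs://backup-nn:8020/backup/data
```

distcp runs as a MapReduce job, so the copy is parallelized across the cluster; a weekly cron entry invoking it is a common way to get the schedule asked about above.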