You need some form of spare capacity on the backup cluster that can
withstand the growth. Lowering replication (<3) may also be an option
there to save yourself some disks/nodes.
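For the on-disk route mentioned below, a minimal sketch of a periodic distcp plus reduced replication on the backup side might look like this (hostnames, ports, and paths here are hypothetical, and this assumes both clusters run compatible Hadoop versions):

```shell
# Incrementally copy /data from the production cluster to the backup cluster.
# -update skips files that are already present and unchanged on the target.
hadoop distcp -update \
  hdfs://prod-nn:8020/data \
  hdfs://backup-nn:8020/backups/weekly

# On the backup cluster, drop replication to 2 to save disk,
# trading some redundancy for capacity.
hadoop fs -setrep -R 2 hdfs://backup-nn:8020/backups/weekly
```

Weekly/monthly retention could then be handled by copying into dated target directories and pruning the oldest ones, but that scheduling is left out of this sketch.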

On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <sediso...@gmail.com> wrote:
> Backup to disks is what we do right now. Distcp would copy across HDFS
> clusters, meaning I will have to build another 12-node cluster? Is that
> correct?
>
>
> On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> <mathias.herbe...@gmail.com> wrote:
>>
>> Backup on tape or on disk?
>>
>> On disk, have another Hadoop cluster and do regular distcps.
>>
>> On tape, make sure you have a backup program which can backup streams
>> so you don't have to materialize your TB files outside of your Hadoop
>> cluster first... (I know Simpana can't do that :-().
>>
>> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <sediso...@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > It's been a year and my HDFS / Solr / Hive setup is working flawlessly.
>> > The
>> > data logs which were meaningless to my business all of a sudden became
>> > precious, to the extent that our management wants to back up this data. I
>> > am
>> > talking about 20 TB of active HDFS data with an incremental of 2
>> > TB/month.
>> > We would like to have weekly and monthly backups for up to 12 months.
>> >
>> > Any ideas how to do this ?
>> >
>> > -- Steve
>
>



-- 
Harsh J
