Hi Steve,

On top of Harsh's answer: besides backup, there is a feature called
Snapshot offered by some third-party vendors such as MapR.
Though it's not really a backup; it is just a saved point in time to
which you can revert later.
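
To put rough numbers on the capacity Harsh mentions, here is a
back-of-the-envelope sketch in Python. It assumes a single full copy of
the data on the backup cluster, the 20 TB / 2 TB-per-month figures from
your original mail, and linear growth. The function name and parameters
are made up for illustration, and keeping several weekly/monthly copies
would need more space than this.

```python
def raw_capacity_tb(active_tb, growth_tb_per_month, months, replication):
    """Raw disk (TB) needed for one full copy after `months` of linear growth."""
    logical_tb = active_tb + growth_tb_per_month * months
    # HDFS stores every block `replication` times, so raw usage scales with it.
    return logical_tb * replication


for replication in (2, 3):
    print(replication, raw_capacity_tb(20, 2, 12, replication))
```

With these inputs that is 44 TB of logical data after 12 months, i.e.
88 TB raw at replication 2 versus 132 TB at replication 3.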

Best,
Mahesh Balija,
CalsoftLabs.

On Fri, Jan 25, 2013 at 11:53 AM, Harsh J <ha...@cloudera.com> wrote:

> You need enough storage capacity on the backup cluster to hold that
> data. A lower replication factor (<3) may also be an option there, to
> save yourself some disks/nodes.
>
> On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <sediso...@gmail.com> wrote:
> > Backup to disks is what we do right now. Distcp would copy across HDFS
> > clusters, meaning I will have to build another 12-node cluster? Is that
> > correct?
> >
> >
> > On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> > <mathias.herbe...@gmail.com> wrote:
> >>
> >> Backup on tape or on disk?
> >>
> >> On disk, have another Hadoop cluster and do regular distcp.
> >>
> >> On tape, make sure you have a backup program that can back up streams,
> >> so you don't have to materialize your TB files outside of your Hadoop
> >> cluster first... (I know Simpana can't do that :-().
> >>
> >> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <sediso...@gmail.com>
> >> wrote:
> >> > Folks,
> >> >
> >> > It's been a year and my HDFS / Solr / Hive setup is working flawlessly.
> >> > The data logs, which were meaningless to my business, all of a sudden
> >> > became precious, to the extent that our management wants to back up
> >> > this data. I am talking about 20 TB of active HDFS data with an
> >> > increment of 2 TB/month. We would like to have weekly and monthly
> >> > backups for up to 12 months.
> >> >
> >> > Any ideas on how to do this?
> >> >
> >> > -- Steve
> >
> >
>
>
>
> --
> Harsh J
>
