Incremental backups are attractive because they avoid copying all of your
data again on every run.

You can implement these at the application layer if your data is well
partitioned and you keep track of which partitions have already been copied.
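For example (a sketch only, with invented hosts and paths): if logs land in
date-partitioned directories and a small state file records the last
partition copied, a driver only has to distcp the partitions that closed
since the previous run:

    # Minimal sketch: incremental backup at the application layer.
    # Assumes date-partitioned input (.../logs/YYYY-MM-DD) and a local
    # state file recording the last partition already copied.
    # All hosts, paths and the state file are hypothetical.
    import subprocess
    from datetime import date, timedelta

    STATE = "/var/backup/last_partition"           # high-water mark
    SRC = "hdfs://prod-nn:8020/data/logs"          # source cluster
    DST = "hdfs://backup-nn:8020/backup/logs"      # backup cluster

    def last_done():
        with open(STATE) as f:
            return date.fromisoformat(f.read().strip())

    def mark_done(d):
        with open(STATE, "w") as f:
            f.write(d.isoformat())

    def run():
        d = last_done() + timedelta(days=1)
        while d < date.today():                    # only closed partitions
            part = d.isoformat()
            # -update makes re-running a failed day idempotent
            subprocess.check_call(["hadoop", "distcp", "-update",
                                   "%s/%s" % (SRC, part),
                                   "%s/%s" % (DST, part)])
            mark_done(d)                           # advance the mark
            d += timedelta(days=1)

    if __name__ == "__main__":
        run()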

You can also use platform-level capabilities, such as those provided by the
MapR distribution.
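For the disk route discussed downthread (Mathias's distcp suggestion plus
Harsh's point about lowering replication on the backup side), one weekly run
could look like the sketch below; the hosts, paths and week label are
invented:

    # Sketch of one weekly backup run: distcp to the backup cluster,
    # then drop replication there from 3 to 2 to save disks/nodes.
    import subprocess

    SRC = "hdfs://prod-nn:8020/data"
    DST = "hdfs://backup-nn:8020/backup/weekly/2013-W04"  # one dir per run

    # -update skips files already present and unchanged on the target
    subprocess.check_call(["hadoop", "distcp", "-update", SRC, DST])

    # replication 2 instead of 3 on the backup side (~33% less raw disk)
    subprocess.check_call(["hadoop", "fs", "-setrep", "-R", "2", DST])

Note that distcp -update compares source files against the target, so
pointing successive runs at the same target directory already behaves
incrementally.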

On Fri, Jan 25, 2013 at 3:23 PM, Harsh J <ha...@cloudera.com> wrote:

> You need some form of spare capacity on the backup cluster that can
> withstand it. Lower replication (<3) may also be an option there to
> save yourself some disks/nodes?
>
> On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <sediso...@gmail.com> wrote:
> > Backup to disks is what we do right now. Distcp would copy across HDFS
> > clusters, meaning I would have to build another 12-node cluster? Is
> > that correct?
> >
> >
> > On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts
> > <mathias.herbe...@gmail.com> wrote:
> >>
> >> Backup on tape or on disk?
> >>
> >> On disk, have another Hadoop cluster and do regular distcp.
> >>
> >> On tape, make sure you have a backup program which can back up streams
> >> so you don't have to materialize your TB files outside of your Hadoop
> >> cluster first... (I know Simpana can't do that :-().
> >>
> >> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <sediso...@gmail.com>
> >> wrote:
> >> > Folks,
> >> >
> >> > It's been a year and my HDFS / Solr / Hive setup has been working
> >> > flawlessly. The data logs that were meaningless to my business all of
> >> > a sudden became precious, to the extent that our management wants to
> >> > back up this data. I am talking about 20 TB of active HDFS data with
> >> > an incremental 2 TB/month. We would like weekly and monthly backups
> >> > retained for up to 12 months.
> >> >
> >> > Any ideas how to do this ?
> >> >
> >> > -- Steve
> >
> >
>
>
>
> --
> Harsh J
>
