Thank you for all your replies; I'm collecting my responses in one
message below:

On Tue, Aug 18, 2009 at 7:43 PM, Nicolas
Williams <nicolas.willi...@sun.com> wrote:
> On Tue, Aug 18, 2009 at 04:22:19PM -0400, Paul Kraus wrote:
>>        We have a system with some large datasets (3.3 TB and about 35
>> million files) and conventional backups take a long time (using
>> Netbackup 6.5 a FULL takes between two and three days, differential
>> incrementals, even with very few files changing, take between 15 and
>> 20 hours). We already use snapshots for day to day restores, but we
>> need the 'real' backups for DR.
>
> zfs send will be very fast for "differential incrementals ... with very
> few files changing" since zfs send is a block-level diff based on the
> differences between the selected snapshots.  Where a traditional backup
> tool would have to traverse the entire filesystem (modulo pruning based
> on ctime/mtime), zfs send simply traverses a list of changed blocks
> that's kept up by ZFS as you make changes in the first place.

Our testing indicates that incremental zfs send speed is very good,
and appears to be limited by bandwidth rather than by file count. For
example, while testing incremental sends I got the following
results:

~450,000 files sent, ~8.3 GB sent @ 690 files/sec. and 13 MB/sec.
~900,000 files sent, ~13 GB sent @ 890 files/sec. and 13 MB/sec.
~450,000 files sent, ~4.6 GB sent @ 1,800 files/sec. and 19 MB/sec.

Full zfs sends produced:

~2.5 million files, ~87 GB @ 500 files/sec. and 18 MB/sec.
~3.4 million files, ~100 GB @ 600 files/sec. and 19 MB/sec.
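
For reference, the sends in these tests were of the usual form; the
pool, dataset, and snapshot names (and the destination) below are only
placeholders:

    # incremental: streams just the blocks changed between the snapshots
    zfs send -i deptpool/project@snap.0817 deptpool/project@snap.0818 \
        | ssh backuphost "cat > /backup/project.0817-0818.zsend"

    # full: streams the whole dataset as of the named snapshot
    zfs send deptpool/project@snap.0818 \
        | ssh backuphost "cat > /backup/project.0818.zsend"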

> For a *full* backup zfs send and traditional backup tools will have
> similar results as both will be I/O bound and both will have more or
> less the same number of I/Os to do.

The zfs send FULL rates are in close agreement with what we are seeing
with a FULL NBU backup.

> Caveat: zfs send formats are not guaranteed to be backwards
> compatible, therefore zfs send is not suitable for long-term backups.

        Yup, we only need them for 5 weeks, and when we upgrade the
server (and ZFS version) we would need to do a new set of fulls.
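
If we ever did need to keep them longer, receiving each stream into a
pool on the DR side (rather than archiving the raw stream files) would
avoid the stream-format issue entirely. A sketch, with the host, pool,
and snapshot names as placeholders only:

    # initial full, then periodic incrementals, received on the DR host
    zfs send deptpool/project@full.0818 \
        | ssh drhost zfs receive drpool/project
    zfs send -i deptpool/project@full.0818 deptpool/project@inc.0825 \
        | ssh drhost zfs receive drpool/project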

On Tue, Aug 18, 2009 at 8:54 PM,  Mattias Pantzare <pant...@ludd.ltu.se> wrote:

> Conventional backups can be faster than that! I have not used
> netbackup, but you should be able to configure netbackup to run
> several backup streams in parallel. You may have to point netbackup
> to subdirs instead of the file system root.

        We have over 180 filesystems on the production server right
now, and we are really trying to avoid any manual customization of the
backup policy. In a previous incarnation this data lived on a Mac OS X
server in one FS (only about 4 TB total at that point); full backups
took so long that we manually configured three NBU policies with many
individual directories ... it was a nightmare as new data (and
directories) were added.

On Tue, Aug 18, 2009 at 10:33 PM, Mike Gerdts <mger...@gmail.com> wrote:

> This was discussed in another thread as well.
>
> http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0

        Thanks for that pointer. I had missed that thread in my
search; I just hadn't hit the right keywords. This thread got me
thinking about our data layout. Currently the data is broken up by
both department and project: each department gets a zpool, and each
project within the department gets a dataset/zfs. Departments range in
size from one mirrored pair of LUNs (512 GB) to 11 mirrored pairs of
LUNs (5.5 TB). Projects range from a few KB to 3.3 TB (and 33 million
files). The files are all relatively small (images of documents), but
there are many, many of them.
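
In zfs terms the layout looks roughly like this (the pool, device, and
dataset names below are made up for illustration):

    # one pool per department, built from mirrored pairs of LUNs
    zpool create deptA mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0

    # one dataset per project within the department's pool
    zfs create deptA/project1
    zfs create deptA/project2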

        Is there any throughput penalty for a dataset being part of
a bigger zpool ? In other words, am I more likely to get better FULL
throughput if I move the data to a dedicated zpool instead of a child
dataset ? We *can* change our model to assign each project a separate
zpool, but that would be wasteful of space. Perhaps we would move a
given project to its own zpool once it grows past a certain size
(>1 TB, maybe). But if there is no performance advantage, it's not
worth the effort.

        I had assumed that a full zfs send would just stream the
underlying zfs structure and not really deal with individual files,
but if the dataset is part of a shared zpool then I guess it has to
look at the files' metadata to determine if a given file is part of
that dataset.

P.S. We are planning to move the back-end storage to JBOD (probably
J4400), but that is not where we are today, and we can't count on that
happening soon.

-- 
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer, "The Pajama Game" @ Schenectady Light Opera Company
( http://www.sloctheater.org/ )
-> Technical Advisor, Lunacon 2010 (http://www.lunacon.org/)
-> Technical Advisor, RPI Players