I don't know of any reason why we can't turn 1 backup job per filesystem
into, say, up to 26, based on the cyrus file and directory
structure.

The cyrus file and directory structure is designed with users located
under the directories A, B, C, D, etc., to deal with the
millions-of-little-files issue at the filesystem layer.

Our backups will have to be changed to take advantage of this design
feature. There will be a little work on the front end to create the
jobs, but once that's done the full backups should finish in a couple
of hours.
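As a rough sketch of the front-end work (the $fs path here is a stand-in, not our actual mount point), the per-letter save set paths could be generated with something like this and pasted into the client's save set list:

```shell
#!/bin/sh
# Hypothetical mail filesystem root; substitute the real mount point.
fs=/var/spool/imap/user

# Build one save set path per hashed top-level directory (a-z).
savesets=""
for d in a b c d e f g h i j k l m n o p q r s t u v w x y z
do
    savesets="$savesets $fs/$d"
done

# Print one path per line, ready for the Networker save set field.
for s in $savesets
do
    echo "$s"
done
```

Grouping a few letters per job instead of one each is just a matter of concatenating entries before emitting them.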

As an aside, we are currently upgrading our backup server to a sun4v
machine. That architecture is well suited to running more jobs in
parallel.
Thanx for all your help and advice.

Ed

On Tue, 2009-08-11 at 22:47, Mike Gerdts wrote:
> On Tue, Aug 11, 2009 at 9:39 AM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
> > We backup 2 filesystems on tuesday, 2 filesystems on thursday, and 2 on
> > saturday. We backup to disk and then clone to tape. Our backup people
> > can only handle doing 2 filesystems per night.
> >
> > Creating more filesystems to increase the parallelism of our backup is
> > one solution, but it's a major redesign of the mail system.
> 
> What is magical about a 1:1 mapping of backup job to file system?
> According to the Networker manual[1], a save set in Networker can be
> configured to back up certain directories.  According to some random
> documentation about Cyrus[2], mail boxes fall under a pretty
> predictable hierarchy.
> 
> 1. http://oregonstate.edu/net/services/backups/clients/7_4/admin7_4.pdf
> 2. http://nakedape.cc/info/Cyrus-IMAP-HOWTO/components.html
> 
> Assuming that the way that your mailboxes get hashed fall into a
> structure like $fs/b/bigbird and $fs/g/grover (and not just
> $fs/bigbird and $fs/grover), you should be able to set a save set per
> top level directory or per group of a few directories.  That is,
> create a save set for $fs/a, $fs/b, etc. or $fs/a - $fs/d, $fs/e -
> $fs/h, etc.  If you are able to create many smaller save sets and turn
> the parallelism up you should be able to drive more throughput.
> 
> I wouldn't get too worried about ensuring that they all start at the
> same time[3], but it would probably make sense to prioritize the
> larger ones so that they start early and the smaller ones can fill in
> the parallelism gaps as the longer-running ones finish.
> 
> 3. That is, there is sometimes benefit in having many more jobs to run
> than you have concurrent streams.  This avoids having one save set
> that finishes long after all the others because of poorly balanced
> save sets.
> 
> -- 
> Mike Gerdts
> http://mgerdts.blogspot.com/
-- 
Ed 


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
