Matthew Ahrens wrote:
Joseph Barbey wrote:
Robert Milkowski wrote:

JB> So, normally, when the script runs, all snapshots finish in maybe a minute
JB> total. However, on Sundays, it continues to take longer and longer. On
JB> 2/25 it took 30 minutes, and this last Sunday, it took 2:11. The only
JB> special thing about Sunday's snapshots is that they are the first
JB> ones created since the full backup (using NetBackup) on Saturday. All
JB> other backups are incrementals.

hmmmmm do you have the atime property set to off?
Maybe you spend most of the time destroying snapshots due to a much
larger delta caused by atime updates? You can possibly also gain some
performance by setting atime to off.

Yep, atime is set to off for all pools and filesystems. I looked through the other properties, and nothing looked like it would affect this.

One additional weird thing. My script hits each filesystem (email-pool/A..Z) individually, so I can run zfs list -t snapshot and find out how long each snapshot actually takes. Everything runs fine until I get to around V or (normally) W. Then it can take a couple of hours on the one FS. After that, the rest go quickly.

So, what operation exactly is taking "a couple of hours on the one FS"? The only one I can imagine taking more than a minute would be 'zfs destroy', but even that should be very rare on a snapshot. Is it always the same FS that takes longer than the rest? Is the pool busy when you do the slow operation?

I've now determined that renaming the previous snapshot seems to be the problem in certain instances.

What we are currently doing through the script is keeping 2 weeks of daily snapshots of the various pools/filesystems. These snapshots are named {fs}.$Day-1, {fs}.$Day-2, and {fs}.snap. Specifically, for our 'V' filesystem, which is created under the email-pool, I will have the following snapshots:

  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]
  email-pool/[EMAIL PROTECTED]

So, my script does the following for each FS:
  Check for FS.$Day-2.  If it exists, destroy it.
  Check for FS.$Day-1.  If it exists, rename it to FS.$Day-2.
  Check for FS.snap.  If it exists, rename it to FS.$Yesterday-1 (the day it was created).
  Create FS.snap
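
For anyone following along, the four steps above can be sketched roughly as below. This is only an illustration, not the actual script: the function names, the ZFS_CMD variable and the day labels (Sun, Sat) are made up for the example, and the real snapshot names are not visible in this archive. It does assume the usual fs@snapname form that zfs destroy, zfs rename and zfs snapshot take.

```sh
#!/bin/sh
# Sketch of the per-filesystem rotation described above.
# ZFS_CMD is a hypothetical knob: set it to "echo zfs" for a dry run
# that prints the commands instead of running them.
ZFS_CMD=${ZFS_CMD:-zfs}

# Hypothetical helper: succeeds if the named snapshot exists.
snap_exists() {
    $ZFS_CMD list -t snapshot "$1" >/dev/null 2>&1
}

# rotate_fs FS TODAY YESTERDAY
#   FS        filesystem, e.g. email-pool/V
#   TODAY     label for today's weekday (assumed form, e.g. Sun)
#   YESTERDAY label for the day FS@snap was created (e.g. Sat)
rotate_fs() {
    fs=$1
    today=$2
    yesterday=$3

    # Step 1: drop the two-week-old snapshot for this weekday.
    if snap_exists "${fs}@${today}-2"; then
        $ZFS_CMD destroy "${fs}@${today}-2"
    fi

    # Step 2: last week's snapshot becomes the two-week-old one.
    if snap_exists "${fs}@${today}-1"; then
        $ZFS_CMD rename "${fs}@${today}-1" "${fs}@${today}-2"
    fi

    # Step 3: yesterday's "snap" gets its weekday name.
    if snap_exists "${fs}@snap"; then
        $ZFS_CMD rename "${fs}@snap" "${fs}@${yesterday}-1"
    fi

    # Step 4: take the new current snapshot.
    $ZFS_CMD snapshot "${fs}@snap"
}
```

With ZFS_CMD="echo zfs", calling rotate_fs email-pool/V Sun Sat prints the destroy, two renames and snapshot in that order, which matches the Destroy/Rename/Rename/Create sequence in the log below.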

I added logging to a file, along with the action just run and the time that it completed:

  Destroy email-pool/[EMAIL PROTECTED]    Sun Apr  8 00:01:04 CDT 2007
  Rename email-pool/[EMAIL PROTECTED] email-pool/[EMAIL PROTECTED]    Sun Apr  8 00:01:05 CDT 2007
  Rename email-pool/[EMAIL PROTECTED] email-pool/[EMAIL PROTECTED]    Sun Apr  8 00:54:52 CDT 2007
  Create email-pool/[EMAIL PROTECTED]    Sun Apr  8 00:54:53 CDT 2007

Since each entry is logged when the action completes, the second Rename ran from 00:01:05 until 00:54:52, so almost 54 minutes.
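
To double-check that figure, here is a small sketch that works out the gap between two HH:MM:SS completion times from the log. The helper names are invented for this example, and it assumes both times fall on the same day.

```sh
#!/bin/sh
# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# Runs in a subshell so the IFS change does not leak out.
# ${n#0} strips one leading zero so "08" is not parsed as octal.
hms_to_secs() (
    IFS=:
    set -- $1
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

# elapsed START END: print the difference as minutes and seconds.
# Assumes END is later than START on the same day.
elapsed() {
    start=$(hms_to_secs "$1")
    end=$(hms_to_secs "$2")
    diff=$(( end - start ))
    echo "$(( diff / 60 ))m$(( diff % 60 ))s"
}
```

Running elapsed 00:01:05 00:54:52 prints 53m47s.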

So, any ideas on why a rename should take so long? And again, why is this only happening on Sunday? Any other information I can provide that might help diagnose this?

Thanks again for any help on this.

--
Joe Barbey               IT Services/Network Services
office: (715) 425-4357   Davee Library room 166C
cell:   (715) 821-0008   UW - River Falls
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
