> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Mark Sandrock
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized)
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow
> night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
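For what it's worth, those two status lines are self-consistent: the "to go" figure is just a linear elapsed/fraction-done extrapolation, and the implied total keeps growing because the resilver is slowing down. A quick check, with the two reports' numbers plugged in:

```shell
# Implied total resilver time = elapsed hours / fraction done,
# using the figures from the two zpool status reports quoted above.
awk 'BEGIN {
    t1 = 306 + 0/60;       # 306h0m elapsed at 63.87% done
    t2 = 447 + 26/60;      # 447h26m elapsed at 65.07% done
    printf "report 1 implied total: %.0fh\n", t1 / 0.6387;
    printf "report 2 implied total: %.0fh\n", t2 / 0.6507;
}'
```

That gives roughly 479h implied total for the first report (479 - 306 = 173h remaining, matching the reported 173h7m) and roughly 688h for the second, so the finish line really is receding; the estimator isn't broken, the resilver rate is dropping.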
For a typical live system that has been in production for a long time, with files being created, snapshotted, partially overwritten, snapshots destroyed, etc., the blocks on disk tend to end up in largely random order. And, at least for now, resilver processes blocks in creation-time order, not disk order. So resilver time is typically limited by random-read IOPS and by the number of records in the affected vdev.

To reduce the number of records in an affected vdev, build the pool from mirrors instead of raidz, or from several small raidz1 vdevs instead of one large raidz3. Unfortunately, you're not going to be able to change that on an existing system. Roughly speaking, a 23-disk raidz3 with the capacity of 20 disks would take 40x longer to resilver than one of the mirrors in a 40-disk stripe of mirrors with the same 20-disk capacity. In rough numbers, that might be 20 days instead of 12 hours.

To reduce the IOPS cost: under normal circumstances you should disable the HBA WriteBack cache whenever a dedicated log device is present (on the X4275 that's done via the HBA's config utility; I don't know about the X4540). But during a resilver, you might enable WriteBack for the drive being resilvered. I don't know for sure that it will help, but I think it should make some difference, because the logic that led to disabling WB does not apply to resilver writes.

To reduce the number of records to resilver:

* If possible, disable the creation of new snapshots while the resilver is running.
* If possible, delete files and destroy old snapshots that are no longer needed.
* If possible, limit new writes to the system.

By the way, I'm sorry to say: don't trust the progress indicator either. You're likely to reach 100% completed and stay there for a long time, and you may even see 2T resilvered on a 1T disk...
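To put rough numbers on the mirrors-vs-raidz difference, here's a back-of-envelope model. The record count and IOPS figures below are made up for illustration; only the scaling argument comes from the discussion above:

```shell
# Resilver is IOPS-bound, so time ~ (records in affected vdev) / IOPS.
# Hypothetical numbers: 100M records in the pool, ~100 random IOPS/disk.
RECORDS=100000000
IOPS=100

# Single 23-disk raidz3 vdev: every record in the pool lives in it.
raidz_hours=$(( RECORDS / IOPS / 3600 ))

# 20-way stripe of mirrors: each vdev holds only ~1/20 of the records,
# and each record needs just one read from the surviving side.
mirror_hours=$(( RECORDS / 20 / IOPS / 3600 ))

echo "raidz3 resilver: ~${raidz_hours}h"
echo "mirror resilver: ~${mirror_hours}h"
```

The record-count effect alone gives about 20x here (roughly 11 days vs. half a day with these made-up inputs); the extra per-record reads a raidz reconstruction needs from the other columns plausibly widen the real-world gap toward the ~40x figure above.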
This is an ugly area that looks bad on its face, but it's actually physically correct, because the filesystem is in use and taking new writes during the resilver.

To reduce the IOPS load:

* If possible, limit the "live" IO to the system. Resilver runs at lower priority, so it gets delayed a lot on busy production systems.
* Definitely DON'T scrub the pool while it's resilvering.
* Maybe you can offload some of the IO by adding cache devices, a dedicated log, or RAM. That's sound in principle, but YMMV immensely, depending on your workload.

All of the above is likely to be only modestly effective. There's not much you can do if you started with a huge raidz3, for example. The single most important thing you can do to affect resilver time is to choose mirrors instead of raidz at pool-creation time.

So, as a last-ditch effort: if you "zfs send" the pool to some other storage and then recreate the local pool, the new pool is empty, so its resilver is instantly complete (zfs only resilvers used blocks). Then "zfs send" the data back to restore the pool. Besides the resilver being forcibly finished, the received data will be laid out on disk close to optimally, which will greatly help if another resilver is needed in the near future, and you create an opportunity to revisit the pool architecture, possibly in favor of mirrors instead of raidz.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
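P.S. The evacuate/recreate/restore dance might look roughly like this. Pool names, snapshot name, and device names are all hypothetical; adjust for your layout, and note the destroy step is irreversible, so verify the copy first:

```shell
# Recursive snapshot, then replicate the whole pool to scratch storage:
zfs snapshot -r tank@evacuate
zfs send -R tank@evacuate | zfs receive -F backup/tank

# Recreate the local pool, this time as a stripe of mirrors:
zpool destroy tank
zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0

# Send the data back; received blocks are written in stream order,
# leaving the pool laid out far more sequentially than before:
zfs send -R backup/tank@evacuate | zfs receive -F tank
```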