> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Paul Kraus
>
> Is resilver time related to the amount of data (TBs) or the number of objects (file + directory counts)? I have seen zpools with lots of data in very few files resilver quickly while smaller pools with lots of tiny files take much longer (no hard data here, just recollection of how long things took).
In some cases resilver time is dependent on the total amount of data (TB) and limited by sequential drive throughput. In that case it will always be fast. In other cases it is dependent on a lot of small blocks scattered randomly about. In that case it will be limited by the random access time of the devices, and it's certain to be painfully slow. But in this conversation we're trying to make a generalization, so let's define "typical," discuss how each of the above cases is possible, and reach a generalization.

Note: there is another common usage scenario, the home video server or large static sequential file store, which has precisely the opposite usage characteristics. For me that's not typical, so since I'm the person writing, here is what I'm defining as "typical":

Typical: You have a nontrivial pool with volatile data. Autosnapshots are on, which means snapshots are frequently created & destroyed. Some files & directories are deleted, created, and/or modified or appended to, in essentially random order.

It is in the nature of COW (and therefore ZFS) to write new copies of only the changed blocks, while leaving the old blocks in place, so files become progressively more fragmented as long as they are modified in the middles and ends (rather than deleted & recreated entirely).

It is also in the nature of ZFS to aggregate small writes into larger sequential writes: a bunch of small random writes get written out as a single larger sequential write. Eventually some of those blocks are changed or deleted and the snapshots holding them are destroyed, leaving a "hole" in the middle of what was formerly an aggregated sequential write. So ZFS becomes progressively more fragmented here too.

All of the above is normal for any snapshot-capable filesystem. (Different implementations reach the same result.)

Here is the part which is both a ZFS strength and a weakness: upon scrub or resilver, ZFS only touches the used blocks; it does not do the unused space. If you have a really small percentage of pool utilization, or highly sequential data, this is a strength: because you get to skip over all the unused portions of the disk, it completes faster than resilvering or scrubbing the whole disk sequentially.

Unfortunately, in my "typical" usage scenario the system has been in volatile production for an extended time, so there is significant usage in the pool, and it is highly fragmented. Also unfortunately, in a ZFS resilver (and I think scrub too) the order of resilvering blocks is NOT based on disk order, which means you don't get to simply perform a bunch of sequential disk reads and skip over the unused sectors. Instead, the heads have to thrash around, seeking small blocks all over the place, in essentially random order.

So the answer to your question, assuming my "typical" usage and assuming hard drives (not SSDs etc.), is: resilver time is dependent on neither the total quantity of data nor the total number of files/directories. It is dependent on the number of used blocks in the vdev, on precisely how fragmented and how randomly those blocks are scattered throughout the vdev, and it is limited by the random access time of the vdev.

YMMV, but here is one of my experiences: in a given pool that I admin, if I needed to resilver a whole disk including unused space, the sequential IO of the disk would be the limiting factor, and the time would be approx 2 hours.
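To put rough numbers on that gap, here is a quick back-of-envelope sketch (Python). Every figure in it (disk size, throughput, IOPS, utilization, average block size) is an assumption picked purely for illustration, not a measurement from any real pool:

    # Illustrative only: every number below is an assumption, not a
    # measurement from the pool discussed in this thread.
    disk_size_tb   = 1.0     # assumed capacity of the replaced disk
    seq_mb_per_s   = 130.0   # assumed sustained sequential throughput
    random_iops    = 150.0   # assumed random reads/sec, 7200 rpm class
    used_fraction  = 0.5     # assumed pool utilization
    avg_block_kb   = 64.0    # assumed average size of a fragmented block

    # Case 1: rebuild the whole disk sequentially, used space or not.
    seq_hours = disk_size_tb * 1e6 / seq_mb_per_s / 3600

    # Case 2: resilver only the used blocks, but in essentially random
    # order, so each block costs roughly one random seek.
    used_blocks = disk_size_tb * 1e9 * used_fraction / avg_block_kb
    rand_hours  = used_blocks / random_iops / 3600

    print("whole-disk sequential rebuild:    ~%.1f hours" % seq_hours)
    print("random-order used-block resilver: ~%.1f hours" % rand_hours)

With those made-up numbers the sequential rebuild comes out around 2 hours and the seek-bound resilver around 14, which is the same shape as the gap I describe here.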
Instead, I am using ZFS, this system is in "typical" production usage, and I am using mirrors. Hence this is the best-case scenario for a "typical" ZFS server with volatile data. My resilver took 12 hours. If I had used raidz2 with 8-2=6 data disks, it would have taken about 3 days.

So the conclusion to draw is: yes, there are situations where ZFS resilver is a strength and is limited by sequential throughput. But for what I call "typical" usage patterns, it's a weakness, and it's dramatically worse than resilvering the whole disk sequentially.
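For what it's worth, that 3-day figure is just the 12-hour mirror resilver scaled by the data width of the raidz2 vdev; that scaling is my own shorthand for the 8-2=6 arithmetic, not a measurement:

    # Assumed reading of the comparison above: scale the seek-bound
    # mirror resilver time by the raidz2 vdev's data-disk count.
    mirror_hours      = 12       # observed mirror resilver time
    raidz2_data_disks = 8 - 2    # 8-disk raidz2, double parity
    raidz2_hours = mirror_hours * raidz2_data_disks
    print(raidz2_hours, "hours =", raidz2_hours / 24.0, "days")  # 72 hours = 3.0 days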