[zfs-discuss] Re: How does ZFS write data to disks?
Writes to ZFS objects have significant data and metadata implications because of the ZFS copy-on-write implementation. As data is written into a file object, for example, the update must eventually be written to a new location on physical disk, and all of the metadata above it (from the uberblock down to the object itself) must be updated and rewritten to new locations as well. While these objects sit in cache, changes to them can be consolidated; once written out to disk, however, any further change makes the recent write obsolete and requires it all to be written once again to yet another new location on the disk. Batching transactions for 5 seconds (the trigger interval discussed in the ZFS documentation) is essential to limiting the amount of redundant rewriting that reaches the physical disk: keeping a disk busy 100% of the time writing mostly the same data over and over makes far less sense than collecting a group of changes in cache and writing them out efficiently once per trigger period.

Even with this optimization, our experience with small sequential writes (4KB or less) to zvols that have been previously written (to ensure that real space is already mapped on the physical disk) shows bandwidth values that are less than 10% of comparable larger writes (128KB or more). You can see this behavior dramatically if you compare the amount of host-initiated write data (front-end data) with the actual amount of IO performed to the physical disks (both reads and writes) to satisfy the host's front-end requests. For example, doing sequential 1MB writes to a (previously written) zvol (a simple concatenation of 5 FC drives in a JBOD), writing 2GB of data induced more than 4GB of IO to the drives; with smaller write sizes this ratio gets progressively worse.
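To make the front-end vs. back-end comparison concrete, here is a minimal sketch of the arithmetic; the per-drive counter deltas are hypothetical (Bill's actual measurement tooling isn't shown), chosen to roughly match the 2GB/4GB example above:

# Minimal sketch: write-amplification ratio from per-drive byte counters
# sampled before and after a test run (e.g. from two iostat/kstat
# snapshots). The delta values below are hypothetical.

def amplification(host_bytes, per_drive_deltas):
    """Ratio of physical disk IO (reads + writes) to host-initiated writes."""
    backend = sum(r + w for r, w in per_drive_deltas)
    return backend / host_bytes

host = 2 * 2**30                                # 2 GB written by the host
drives = [(150 * 2**20, 700 * 2**20)] * 5       # 5 FC drives (hypothetical)
print(f"back-end/front-end ratio: {amplification(host, drives):.2f}x")
# prints ~2.08x: more than 4GB of disk IO to service 2GB of host writes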
Re: [zfs-discuss] Re: How does ZFS write data to disks?
Bill Moloney wrote:
> for example, doing sequential 1MB writes to a (previously written) zvol
> (a simple concatenation of 5 FC drives in a JBOD), writing 2GB of data
> induced more than 4GB of IO to the drives (with smaller write sizes
> this ratio gets progressively worse)

How did you measure this? It would imply that rewriting a zvol is limited to below 50% of disk bandwidth, which is not something I'm seeing at all.

- Bart

--
Bart Smaalders
Solaris Kernel Performance
[EMAIL PROTECTED]
http://blogs.sun.com/barts
Re[2]: [zfs-discuss] Re: How does ZFS write data to disks?
Hello Bart,

Wednesday, May 16, 2007, 6:07:36 PM, you wrote:

BS> How did you measure this? It would imply that rewriting a zvol is
BS> limited to below 50% of disk bandwidth, which is not something I'm
BS> seeing at all.

Perhaps the zvol was created with the default 128k block size and smaller writes were then issued. Perhaps lowering volblocksize to 8k, or to whatever average (or constant?) IO size he is using, would help?

--
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
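If a volblocksize mismatch is the cause, the effect is easy to quantify with a back-of-the-envelope sketch (my illustration, not Robert's numbers): in the worst case, every small write into a large zvol block forces the whole block to be read and rewritten elsewhere under copy-on-write.

# Worst-case read-modify-write amplification when the host IO size is
# smaller than the zvol volblocksize. Ignores metadata traffic and any
# coalescing of adjacent writes within a transaction group.

def rmw_amplification(io_size, volblocksize):
    """Physical bytes moved per host byte written under copy-on-write."""
    if io_size >= volblocksize:
        return 1.0                     # whole-block write, no read needed
    return 2 * volblocksize / io_size  # read old block + write new copy

KB = 1024
for io in (4 * KB, 8 * KB, 128 * KB):
    ratio = rmw_amplification(io, 128 * KB)
    print(f"{io // KB:>3}KB writes, 128KB volblocksize: {ratio:.0f}x")
# 4KB -> 64x, 8KB -> 32x, 128KB -> 1x; matching volblocksize to the IO
# size avoids the read-modify-write cycle entirely.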
Re: [zfs-discuss] Re: How does ZFS write data to disks?
> The only issue is when using iostat commands the bursts make it a
> little harder to gauge performance. Is it safe to assume that if those
> bursts were to reach the upper performance limit that it would spread
> the writes out a bit more?

I think it's also important to note _how_ one measures performance (which is black magic at the best of times). I personally like to see averages, since a single

  #iostat -xnz 10

doesn't really tell me anything. Since ZFS likes to bundle and flush, I want my (very expensive ;) Sun storage to give me all it's got. I'm not too concerned if a 5-second flush gives the disk subsystem a good workout, but when I/O utilization is around 100% with service times of 30+ ms over a period of an hour... then I might want to wheel the drawing board into the architect's office. My 2c :)
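The averaging itself is trivial; a throwaway sketch with made-up numbers shows why any single interval is misleading when writes arrive in 5-second txg bursts:

# Throwaway sketch: bursty per-interval write bandwidth (MB/s, made-up
# values mimicking 1-second samples across 5-second txg flush cycles).
intervals = [5, 3, 6, 4, 140, 5, 4, 7, 3, 155]

print(f"peak interval : {max(intervals)} MB/s")
print(f"sustained avg : {sum(intervals) / len(intervals):.1f} MB/s")
# The peak is 155 MB/s, but the sustained rate is only ~33 MB/s; judging
# the subsystem by one sample (quiet or burst) gets it badly wrong.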
[zfs-discuss] Re: How does ZFS write data to disks?
I've noticed a similar behavior in my writes. ZFS seems to write in bursts of around 5 seconds. I assume it's just something to do with caching? I was watching the drive lights on the T2000s with a 3-disk raidz: the disks all blink for a couple of seconds, then are solid for a few seconds.

Is this behavior ok? It seems it would be better to have the disks writing the whole time instead of in bursts.

On my thumper (the intervals with write counts in the thousands are the bursts):

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
vault1      10.7T  8.32T    108    561  7.23M  24.8M
vault1      10.7T  8.32T    108    152  2.68M  5.90M
vault1      10.7T  8.32T    143    177  6.49M  11.4M
vault1      10.7T  8.32T    147    429  6.59M  27.0M
vault1      10.7T  8.32T    111  3.89K  2.84M   131M
vault1      10.7T  8.32T     74    151   460K  6.72M
vault1      10.7T  8.32T    103    180  1.71M  7.21M
vault1      10.7T  8.32T    119    144   832K  5.69M
vault1      10.7T  8.32T    110    185  2.51M  4.75M
vault1      10.7T  8.32T     94  2.17K  1.07M   137M
vault1      10.7T  8.32T     36  2.87K   354K  24.9M
vault1      10.7T  8.32T     69    140  3.36M  6.00M
vault1      10.7T  8.32T     60    177  4.78M  12.9M
vault1      10.7T  8.32T     90    198  2.82M  5.22M
vault1      10.7T  8.32T     94  1.12K  2.22M  18.1M
vault1      10.7T  8.32T     37  3.79K  2.06M   130M
vault1      10.7T  8.32T     88    254  2.43M  10.2M
vault1      10.7T  8.32T    137    147  3.64M  7.05M
vault1      10.7T  8.32T    307    415  5.84M  9.38M
vault1      10.7T  8.32T    132  4.13K  2.26M   158M
vault1      10.7T  8.32T     57  1.45K  1.89M  13.2M
vault1      10.7T  8.32T     78    148   577K  8.47M
vault1      10.7T  8.32T     17    159   749K  6.26M
vault1      10.7T  8.32T     74    248   598K  6.56M
vault1      10.7T  8.32T    178  1.20K  1.62M  23.8M
vault1      10.7T  8.32T     46  5.23K  1.01M   168M
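A quick way to pick the bursts out of output like this, rather than eyeballing drive lights, is to flag the intervals whose write-operation count spikes. A throwaway sketch (the column layout is assumed to match the table above):

# Throwaway sketch: flag txg-flush bursts in `zpool iostat` interval
# output by their write-operation spikes. Assumes the seven fields shown
# above: pool, used, avail, read ops, write ops, read bw, write bw.

def ops(field):
    """Parse an operation count like '561' or '3.89K'."""
    return float(field[:-1]) * 1000 if field.endswith("K") else float(field)

sample = """\
vault1  10.7T  8.32T  147  429    6.59M  27.0M
vault1  10.7T  8.32T  111  3.89K  2.84M  131M
vault1  10.7T  8.32T  74   151    460K   6.72M"""

for line in sample.splitlines():
    pool, used, avail, rops, wops, rbw, wbw = line.split()
    tag = "  <-- txg flush burst" if ops(wops) > 1000 else ""
    print(f"{pool}  write_ops={wops:>6}{tag}")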
Re: [zfs-discuss] Re: How does ZFS write data to disks?
On Fri, 2007-05-11 at 09:00 -0700, lonny wrote:
> I've noticed a similar behavior in my writes. ZFS seems to write in
> bursts of around 5 seconds. I assume it's just something to do with
> caching?

Yep - the ZFS equivalent of fsflush. It runs more often, so the pipes don't get as clogged. (We've had lots of rain here recently, so I'm sort of sensitive to stories of clogged pipes.)

> Is this behavior ok? It seems it would be better to have the disks
> writing the whole time instead of in bursts.

Perhaps - although not in all cases (probably not in most cases). Wouldn't it be cool to actually do some nice sequential writes to the sweet spot of the disk bandwidth curve, but not depend on it so much that a single random I/O here and there throws you for a loop?

Human analogy - it's often wiser to work smarter than harder :-)

Directly to your question - are you seeing any anomalies in file system read or write performance (bandwidth or latency)?

Bob
[zfs-discuss] Re: How does ZFS write data to disks?
On May 11, 2007, at 9:09 AM, Bob Netherton wrote:
> On Fri, 2007-05-11 at 09:00 -0700, lonny wrote:
>> I've noticed a similar behavior in my writes. ZFS seems to write in
>> bursts of around 5 seconds. I assume it's just something to do with
>> caching?
>
> Yep - the ZFS equivalent of fsflush. It runs more often, so the pipes
> don't get as clogged.
>
>> Is this behavior ok? It seems it would be better to have the disks
>> writing the whole time instead of in bursts.
>
> Perhaps - although not in all cases (probably not in most cases).
> Wouldn't it be cool to actually do some nice sequential writes to the
> sweet spot of the disk bandwidth curve, but not depend on it so much
> that a single random I/O here and there throws you for a loop?
>
> Directly to your question - are you seeing any anomalies in file
> system read or write performance (bandwidth or latency)?

No performance problems so far; the thumper and ZFS seem to handle everything we throw at them. On the T2000 internal disks we were seeing a bottleneck when using a single disk for our apps, but moving to a 3-disk raidz alleviated that.

The only issue is that when using iostat commands the bursts make it a little harder to gauge performance. Is it safe to assume that if those bursts were to reach the upper performance limit, the writes would spread out a bit more?

thanks
lonny
Re: [zfs-discuss] Re: How does ZFS write data to disks?
lonny wrote:
> [...]
> The only issue is that when using iostat commands the bursts make it a
> little harder to gauge performance. Is it safe to assume that if those
> bursts were to reach the upper performance limit, the writes would
> spread out a bit more?

The burst of activity every 5 seconds is when the transaction group is committed. Batching up the writes in this way can lead to a number of efficiencies (as Bob hinted). With heavier activity the writes will not get spread out; they will just take longer. Another way to look at the gaps of IO inactivity is that they indicate underutilisation.

Neil.
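To illustrate the consolidation Neil and Bill are describing, here is a toy model (nothing like the real DMU code, just the idea): blocks dirtied repeatedly within one transaction group cost only one physical write at commit time.

# Toy model of transaction-group batching (not the real DMU): dirty
# blocks accumulate in memory during the txg interval, and a block
# rewritten many times in that window is flushed to disk only once.

class TxgSim:
    def __init__(self):
        self.dirty = {}              # block id -> latest contents this txg
        self.physical_writes = 0

    def write(self, block_id, data):
        self.dirty[block_id] = data  # overwrite in cache: no disk IO yet

    def commit(self):
        # One physical write per dirty block (the metadata rewrites that
        # a real COW commit adds on top are ignored here).
        self.physical_writes += len(self.dirty)
        self.dirty.clear()

sim = TxgSim()
for i in range(1000):                # 1000 host writes hitting 10 blocks
    sim.write(block_id=i % 10, data=f"update {i}")
sim.commit()                         # the 5-second txg commit
print(f"host writes: 1000, physical block writes: {sim.physical_writes}")
# -> 10; without batching, each rewrite would have gone to disk (plus
#    the obsoleted copies Bill describes), hence the 5-second trigger.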