[zfs-discuss] Re: How does ZFS write data to disks?

2007-05-16 Thread Bill Moloney
writes to ZFS objects have significant data and meta-data implications, based
on the ZFS copy-on-write implementation ... as data is written into a file
object, for example, this update must eventually be written to a new location
on physical disk, and all of the meta-data (from the uberblock down to this
object) must be updated and re-written to a new location as well ... while in
cache, the changes to these objects can be consolidated, but once written out
to disk, any further changes would make this recent write obsolete and require
it all to be written once again to yet another new location on the disk ...
batching transactions for 5 seconds (the trigger discussed in the ZFS
documentation) is essential to limiting the amount of redundant re-writing
that takes place to physical disk ... keeping a disk busy 100% of the time by
writing mostly the same data over and over makes far less sense than collecting
a group of changes in cache and writing them out efficiently once per trigger
period ... even with this optimization, our experience with small, sequential
writes (4KB or less) to zvols that have been previously written (to ensure the
mapping of real space on the physical disk) shows bandwidth values that are
less than 10% of comparable larger (128KB or larger) writes ... you can see
this behavior dramatically if you compare the amount of host-initiated write
data (front-end data) to the actual amount of IO performed to the physical
disks (both reads and writes) needed to handle the host's front-end request ...
for example, doing sequential 1MB writes to a (previously written) zvol (a
simple concatenation of 5 FC drives in a JBOD) and writing 2GB of data induced
more than 4GB of IO to the drives (with smaller write sizes this ratio gets
progressively worse)
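
a rough sketch of how such a comparison can be made (the pool name, zvol name
and sizes below are only placeholders, not our actual configuration):

  zfs create -V 10g tank/testvol
  dd if=/dev/zero of=/dev/zvol/rdsk/tank/testvol bs=1024k count=10240

the first dd primes the zvol so that every block is already mapped to real
space on disk ... then, with

  iostat -xnz 5

running in one window to capture per-device traffic, issue the front-end
workload in another, e.g. 2GB of sequential 1MB writes:

  dd if=/dev/zero of=/dev/zvol/rdsk/tank/testvol bs=1024k count=2048

summing the kr/s and kw/s columns across the drives for the duration of the
run and comparing that total against the 2GB the host actually wrote gives the
front-end to back-end ratio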
 
 


Re: [zfs-discuss] Re: How does ZFS write data to disks?

2007-05-16 Thread Bart Smaalders

Bill Moloney wrote:
for example, doing sequential 1MB writes to a
(previously written) zvol (a simple concatenation of 5
FC drives in a JBOD) and writing 2GB of data induced
more than 4GB of IO to the drives (with smaller write
sizes this ratio gets progressively worse)


How did you measure this?  This would imply that rewriting
a zvol would be limited to below 50% of disk bandwidth, which is
not something I'm seeing at all.

- Bart

--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts


Re[2]: [zfs-discuss] Re: How does ZFS write data to disks?

2007-05-16 Thread Robert Milkowski
Hello Bart,

Wednesday, May 16, 2007, 6:07:36 PM, you wrote:

BS Bill Moloney wrote:
 for example, doing sequential 1MB writes to a
 (previously written) zvol (a simple concatenation of 5
 FC drives in a JBOD) and writing 2GB of data induced
 more than 4GB of IO to the drives (with smaller write
 sizes this ratio gets progressively worse)

BS How did you measure this?  This would imply that rewriting
BS a zvol would be limited at below 50% of disk bandwidth, not
BS something I'm seeing at all.

Perhaps the zvol was created with the default 128k block size, and then
smaller writes were issued. Perhaps lowering volblocksize to 8k, or to
whatever average (or constant?) IO size he is using, would help?
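
For reference, setting it would look something like this (tank/testvol is just
a placeholder name; volblocksize can only be set when the zvol is created, it
cannot be changed afterwards):

  zfs create -V 10g -o volblocksize=8k tank/testvol
  zfs get volblocksize tank/testvol

Matching volblocksize to the application's write size avoids the
read-modify-write that happens when a small write lands inside a larger
volume block.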

-- 
Best regards,
 Robert                        mailto:[EMAIL PROTECTED]
                               http://milek.blogspot.com



Re: [zfs-discuss] Re: How does ZFS write data to disks?

2007-05-12 Thread Louwtjie Burger

I think it's also important to note _how_ one measures performance
(which is black magic at the best of times).

I personally like to see averages, since doing #iostat -xnz 10 doesn't
tell me anything really. Since ZFS likes to bundle and flush, I want
my (very expensive ;) Sun storage to give me all it's got.

I'm not too concerned if a 5-second flush gives the disk subsystem a
good workout, but when I/O utilization is around 100% with service
times of 30+ ms over a period of an hour... then I might want to wheel
the drawing board into the architect's office.
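
For what it's worth, stretching the sampling interval is usually enough to
average the bursts out; something like the following (the interval, count and
pool name are arbitrary, and the first sample reports the average since boot,
so ignore it):

  iostat -xnz 60 10
  zpool iostat tank 60 10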

My 2c :)


 The only issue is when using iostat commands the bursts make it a little 
harder to gauge performance. Is it safe to assume that if those bursts were to 
reach the upper performance limit that it would spread the writes out a bit more?



[zfs-discuss] Re: How does ZFS write data to disks?

2007-05-11 Thread lonny
I've noticed a similar behavior in my writes. ZFS seems to write in bursts of
around 5 seconds. I assume it's just something to do with caching? I was
watching the drive lights on the T2000s with a 3-disk raidz and the disks all
blink for a couple of seconds, then are solid for a few seconds.

Is this behavior OK? It seems it would be better to have the disks writing the
whole time instead of in bursts.

On my thumper
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
vault1      10.7T  8.32T    108    561  7.23M  24.8M
vault1      10.7T  8.32T    108    152  2.68M  5.90M
vault1      10.7T  8.32T    143    177  6.49M  11.4M
vault1      10.7T  8.32T    147    429  6.59M  27.0M
vault1      10.7T  8.32T    111  3.89K  2.84M   131M   <== burst
vault1      10.7T  8.32T     74    151   460K  6.72M
vault1      10.7T  8.32T    103    180  1.71M  7.21M
vault1      10.7T  8.32T    119    144   832K  5.69M
vault1      10.7T  8.32T    110    185  2.51M  4.75M
vault1      10.7T  8.32T     94  2.17K  1.07M   137M   <== burst
vault1      10.7T  8.32T     36  2.87K   354K  24.9M   <== burst
vault1      10.7T  8.32T     69    140  3.36M  6.00M
vault1      10.7T  8.32T     60    177  4.78M  12.9M
vault1      10.7T  8.32T     90    198  2.82M  5.22M
vault1      10.7T  8.32T     94  1.12K  2.22M  18.1M   <== burst
vault1      10.7T  8.32T     37  3.79K  2.06M   130M   <== burst
vault1      10.7T  8.32T     88    254  2.43M  10.2M
vault1      10.7T  8.32T    137    147  3.64M  7.05M
vault1      10.7T  8.32T    307    415  5.84M  9.38M
vault1      10.7T  8.32T    132  4.13K  2.26M   158M   <== burst
vault1      10.7T  8.32T     57  1.45K  1.89M  13.2M   <== burst
vault1      10.7T  8.32T     78    148   577K  8.47M
vault1      10.7T  8.32T     17    159   749K  6.26M
vault1      10.7T  8.32T     74    248   598K  6.56M
vault1      10.7T  8.32T    178  1.20K  1.62M  23.8M   <== burst
vault1      10.7T  8.32T     46  5.23K  1.01M   168M   <== burst
 
 


Re: [zfs-discuss] Re: How does ZFS write data to disks?

2007-05-11 Thread Bob Netherton
On Fri, 2007-05-11 at 09:00 -0700, lonny wrote:
 I've noticed a similar behavior in my writes. ZFS seems to write in bursts of
  around 5 seconds. I assume it's just something to do with caching? 

Yep - the ZFS equivalent of fsflush.  Runs more often so the pipes don't
get as clogged.   We've had lots of rain here recently, so I'm sort of
sensitive to stories of clogged pipes.

 Is this behavior ok? seems it would be better to have the disks writing
  the whole time instead of in bursts.

Perhaps - although not in all cases (probably not in most cases). 
Wouldn't it be cool to actually do some nice sequential writes to
the sweet spot of the disk bandwidth curve, but not depend on it
so much that a single random I/O here and there throws you for
a loop ?

Human analogy - it's often wiser to work smarter than harder :-)

Directly to your question - are you seeing any anomalies in file
system read or write performance (bandwidth or latency)?

Bob





[zfs-discuss] Re: How does ZFS write data to disks?

2007-05-11 Thread lonny
On May 11, 2007, at 9:09 AM, Bob Netherton wrote:

> On Fri, 2007-05-11 at 09:00 -0700, lonny wrote:
>> I've noticed a similar behavior in my writes. ZFS seems to write in bursts of
>> around 5 seconds. I assume it's just something to do with caching?
>
> Yep - the ZFS equivalent of fsflush.  Runs more often so the pipes don't
> get as clogged.   We've had lots of rain here recently, so I'm sort of
> sensitive to stories of clogged pipes.
>
>> Is this behavior ok? seems it would be better to have the disks writing
>> the whole time instead of in bursts.
>
> Perhaps - although not in all cases (probably not in most cases).
> Wouldn't it be cool to actually do some nice sequential writes to
> the sweet spot of the disk bandwidth curve, but not depend on it
> so much that a single random I/O here and there throws you for
> a loop ?
>
> Human analogy - it's often more wise to work smarter than harder :-)
>
> Directly to your question - are you seeing any anomalies in file
> system read or write performance (bandwidth or latency) ?
>
> Bob


No performance problems so far; the thumper and ZFS seem to handle everything
we throw at them. On the T2000 internal disks we were seeing a bottleneck when
using a single disk for our apps, but moving to a 3-disk raidz alleviated that.

The only issue is that when using iostat commands, the bursts make it a little
harder to gauge performance. Is it safe to assume that if those bursts were to
reach the upper performance limit, the writes would be spread out a bit more?

thanks
lonny
 
 


Re: [zfs-discuss] Re: How does ZFS write data to disks?

2007-05-11 Thread Neil . Perrin

lonny wrote:

> On May 11, 2007, at 9:09 AM, Bob Netherton wrote:
>
>>> I've noticed a similar behavior in my writes. ZFS seems to write in bursts of
>>> around 5 seconds. I assume it's just something to do with caching?
>>
>> Yep - the ZFS equivalent of fsflush.  Runs more often so the pipes don't
>> get as clogged.   We've had lots of rain here recently, so I'm sort of
>> sensitive to stories of clogged pipes.
>>
>>> Is this behavior ok? seems it would be better to have the disks writing
>>> the whole time instead of in bursts.
>>
>> Perhaps - although not in all cases (probably not in most cases).
>> Wouldn't it be cool to actually do some nice sequential writes to
>> the sweet spot of the disk bandwidth curve, but not depend on it
>> so much that a single random I/O here and there throws you for
>> a loop ?
>>
>> Human analogy - it's often more wise to work smarter than harder :-)
>>
>> Directly to your question - are you seeing any anomalies in file
>> system read or write performance (bandwidth or latency) ?
>>
>> Bob
>
> No performance problems so far, the thumper and zfs seem to handle everything
> we throw at them. On the T2000 internal disks we were seeing a bottleneck when
> using a single disk for our apps but moving to a 3 disk raidz alleviated that.
>
> The only issue is when using iostat commands the bursts make it a little harder
> to gauge performance. Is it safe to assume that if those bursts were to reach
> the upper performance limit that it would spread the writes out a bit more?


The burst of activity every 5 seconds is when the transaction group is
committed. Batching up the writes in this way can lead to a number of
efficiencies (as Bob hinted). With heavier activity the writes will not get
spread out; the commits will just take longer. Another way to look at the gaps
of I/O inactivity is that they indicate underutilisation.
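
A quick way to see the shape of those commits (rather than an average over
them) is to sample at an interval shorter than the txg trigger, e.g. with
lonny's pool:

  zpool iostat vault1 1

At a 1-second interval the write columns sit near zero for several samples and
then spike when the transaction group commits; at 5-second or longer intervals
the same traffic shows up as the averaged bursts in the output above.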


Neil.