Hmm, scratch that. Maybe.

At first I missed your point that writes to a filesystem dataset are fast.
Perhaps the filesystem is indeed better cached, i.e. *maybe* zvol writes are
synchronous while filesystem writes are cached and thus asynchronous? Try
playing around with the relevant dataset attributes...
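A starting point could be to compare the non-default properties of the two
datasets side by side (a sketch using my dataset names from the test below):

# zfs get all pond/test | grep -v default
# zfs get all pond/tmpnocompress | grep -v default

If the suspicion is that synchronous (ZIL) writes are the difference, the old
global zil_disable tunable can be flipped for a quick experiment - but it
affects the whole box and is unsafe, so test systems only, and if I recall
correctly it only takes effect for datasets (re)mounted after the change:

# echo zil_disable/W0t1 | mdb -kw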

I'm running a test on my system (an snv_114 Thumper with 16GB RAM, also used
for other purposes); the CPU is mostly idle right now (2.5-3.2% kernel time,
that's about it). It seems my results are not unlike yours. Not cool, because
I wanted to play with COMSTAR iSCSI - and now I'm not sure it will perform
well ;)

I'm dd'ing 30GB to an uncompressed test zvol with the same 64KB block size
(maybe it's too small?), and zpool iostat goes like this - roughly a hundred
IOPS at about 7MB/s for a minute, then a one-second burst of 100-170MB/s and
20-25K IOPS:

pond        5.79T  4.41T      0    106      0  7.09M
pond        5.79T  4.41T      0  1.93K      0  20.7M
pond        5.79T  4.41T      0  13.3K      0   106M
pond        5.79T  4.41T      0    116      0  7.76M
pond        5.79T  4.41T      0    108      0  7.23M
pond        5.79T  4.41T      0    107      0  7.16M
pond        5.79T  4.41T      0    107      0  7.16M

or

pond        5.79T  4.41T      0    117      0  7.83M
pond        5.79T  4.41T      0  5.61K      0  49.7M
pond        5.79T  4.41T      0  19.0K    504   149M
pond        5.79T  4.41T      0    104      0  6.96M

Weird indeed.

It wrote 10GB (according to "zfs get usedbydataset pond/test") in roughly 30
minutes, after which I killed it.
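For reference, the zvol side of the test was set up roughly like this
(reconstructing from memory, so the exact size and options are approximate),
with "zpool iostat pond 1" running in another terminal:

# zfs create -V 30g -o compression=off pond/test
# dd if=/dev/zero of=/dev/zvol/rdsk/pond/test bs=65536 count=500000

(I don't recall whether I used the raw rdsk or the buffered dsk device node.)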

Now, writing to an uncompressed filesystem dataset yields quite different
numbers (although still very far from what's trumpeted as Thumper
performance):

pond        5.80T  4.40T      1  3.64K   1022   457M
pond        5.80T  4.40T      0    866    967  75.7M
pond        5.80T  4.40T      0  4.65K      0   586M
pond        5.80T  4.40T      6    802  33.4K  69.2M
pond        5.80T  4.40T     29  2.44K  1.10M   301M
pond        5.80T  4.40T     32    691   735K  25.0M
pond        5.80T  4.40T     56  1.59K  2.29M   184M
pond        5.80T  4.40T    150    768  4.61M  10.5M
pond        5.80T  4.40T      2      0  25.5K      0
pond        5.80T  4.40T      0  2.75K      0   341M
pond        5.80T  4.40T      7  3.96K   339K   497M
pond        5.80T  4.39T     85    740  3.57M  59.0M
pond        5.80T  4.39T     67      0  2.22M      0
pond        5.80T  4.39T      9  4.67K   292K   581M
pond        5.80T  4.39T      4  1.07K   126K   137M
pond        5.80T  4.39T     27    333   338K  9.15M
pond        5.80T  4.39T      5      0  28.0K  3.99K
pond        5.82T  4.37T      1  5.42K  1.67K   677M
pond        5.83T  4.37T      3  1.69K  8.36K   173M
pond        5.83T  4.37T      2      0  5.49K      0
pond        5.83T  4.37T      0  6.32K      0   790M
pond        5.83T  4.37T      2    290  7.95K  27.8M
pond        5.83T  4.37T      0  9.64K  1.23K  1.18G

The numbers are jumpy (maybe due to fragmentation, other processes, etc.),
but there are frequent spikes in excess of 500MB/s.

The whole test took relatively little time:

# time dd if=/dev/zero of=/pond/tmpnocompress/test30g bs=65536 count=500000
500000+0 records in
500000+0 records out

real    1m27.657s
user    0m0.302s
sys     0m46.976s

# du -hs /pond/tmpnocompress/test30g 
  30G   /pond/tmpnocompress/test30g
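That works out to roughly 30GiB in under 90 seconds, i.e. about 370MB/s on
average. The filesystem dataset itself was created along these lines (again
from memory):

# zfs create -o compression=off pond/tmpnocompress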

Some details about the pool:

The pool is on a Sun X4500 with 48 250GB SATA drives. It was created as a 9x5
set (9 stripes of 5-disk raidz1 vdevs) spread across the different
controllers, with the command:

# zpool create -f pond \
raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
raidz1 c0t1d0 c1t2d0 c4t3d0 c6t5d0 c7t6d0 \
raidz1 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
raidz1 c0t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \
raidz1 c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 \
raidz1 c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \
raidz1 c0t5d0 c1t5d0 c4t5d0 c5t5d0 c7t5d0 \
raidz1 c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 \
raidz1 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0 \
spare c0t7d0

Alas, while there were many blog posts, I couldn't find a definitive answer
last year as to which Thumper layout is optimal for performance and/or
reliability (given 6 controllers with 8 disks each, and 2 disks on one of the
controllers reserved for booting).

As a result, we spread each raidz1 across 5 controllers, so that losing a
single controller should, on average, have minimal impact in terms of data
loss. Since the system layout is not symmetrical, some controllers are more
important than others (the boot controller, say).
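As a rough sanity check (a sketch that assumes the usual "zpool status"
output layout and cXtYdZ device naming), something like this should confirm
that no raidz1 group ended up with two disks on the same controller:

# zpool status pond | nawk '
  $1 ~ /^raidz/                { grp++; split("", seen); ingrp = 1; next }
  $1 ~ /^(spares|logs|cache)$/ { ingrp = 0 }
  ingrp && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ {
      # controller number is the digits between "c" and "t"
      ctrl = substr($1, 2, index($1, "t") - 2)
      if (seen[ctrl]++)
          print "raidz group " grp ": controller c" ctrl " used more than once"
  }'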

//Jim