The discussion is really old: writing many small files to an NFS-mounted ZFS
filesystem is slow without an SSD ZIL because of the synchronous nature of the
NFS protocol itself. But there is something I don't really understand. My tests
on an old Opteron box with two small U160 SCSI arrays and a zpool with 4
mirrored vdevs built from 146 GB disks show mostly idle disks when untarring an
archive with many small files over NFS. Any source package can be used for this
test. I'm on zpool version 22 (still SXCE b130; the client is OpenSolaris
b130), the NFS mount options are all defaults, and NFSD_SERVERS=128.
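For completeness, the test itself is nothing special; roughly the following (mount point, export and archive name are only examples, as said any source package with many small files will do):

```shell
# Illustrative reproduction of the test; the export, mount point and
# archive name are examples, not my exact paths.
mount -F nfs server:/ib1/test /mnt/test

# Untar a source package with many small files over NFS and time it:
cd /mnt/test
time gtar xzf /var/tmp/some-package.tar.gz

# Meanwhile watch the disks on the server:
iostat -MindexC 5
zpool iostat -v ib1 5
```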
Configuration of the pool is like this:
zpool status ib1
  pool: ib1
 state: ONLINE
 scrub: scrub completed after 0h52m with 0 errors on Sat Jan 15 14:19:02 2011
config:

        NAME        STATE     READ WRITE CKSUM
        ib1         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
zpool iostat -v shows:

               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ib1          268G   276G      0    180      0   723K
  mirror    95.4G  40.6G      0     44      0   180K
    c1t4d0      -      -      0     44      0   180K
    c3t0d0      -      -      0     44      0   180K
  mirror    95.2G  40.8G      0     44      0   180K
    c1t6d0      -      -      0     44      0   180K
    c4t0d0      -      -      0     44      0   180K
  mirror    39.0G  97.0G      0     45      0   184K
    c3t3d0      -      -      0     45      0   184K
    c4t3d0      -      -      0     45      0   184K
  mirror    38.5G  97.5G      0     44      0   180K
    c3t4d0      -      -      0     44      0   180K
    c4t4d0      -      -      0     44      0   180K
----------  -----  -----  -----  -----  -----  -----
So each disk gets 40-50 IOPS, 180 ops on the whole pool (mirrored). Note that
these U320 SCSI disks should be able to handle about 150 IOPS per disk, so
there is no IOPS aggregation going on. The strange thing is the following
iostat -MindexC output:
                            extended device statistics       ---- errors ---
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0t0d0
    0.0  186.0    0.0    0.4  0.0  0.0    0.0    0.1   0   2   0   0   0   0 c1
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c1t5d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t2d0
    0.0  279.5    0.0    0.5  0.0  0.0    0.0    0.1   0   3   0   0   0   0 c3
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t2d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t3d0
    0.0   93.5    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t4d0
    0.0  279.0    0.0    0.5  0.0  0.0    0.0    0.2   0   5   0   0   0   0 c4
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.3   0   3   0   0   0   0 c4t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t2d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t1d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t3d0
Service times for the involved disks are around 0.1-0.3 ms; I think this
reflects the sequential write pattern of ZFS. The disks are at most 3% busy.
With synchronous writes I'd expect 100% busy disks. And when reading or writing
locally, the disks really do get busy, about 50 MB/sec per disk, up against the
160 MB/sec SCSI bus limit per channel (there are two U160 channels with 3 disks
each and one channel with 2 disks).
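The low busy percentages are at least self-consistent with the observed rates; a quick back-of-envelope check (numbers copied from the iostat output above):

```python
# Disk utilization ~= request rate * average service time.
# Numbers are taken from the iostat output above (per-disk writes).
w_per_s = 93.0       # writes/sec on each active disk (w/s column)
asvc_t_s = 0.0001    # ~0.1 ms average service time, in seconds

utilization = w_per_s * asvc_t_s
print(f"{utilization:.1%}")   # 0.9% -- consistent with the 1% in the %b column
```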
Richard Elling's zilstat gives:
   N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
      9552       9552       9552     671744     671744     671744    164    164      0      0
     10192      10192      10192     724992     724992     724992    177    177      0      0
      9568       9568       9568     679936     679936     679936    166    166      0      0
     11712      11712      11712     823296     823296     823296    201    201      0      0
     10784      10784      10784     765952     765952     765952    187    187      0      0
     10024      10024      10024     708608     708608     708608    173    173      0      0
About 200 ZIL ops/sec at most, all below 4 kB. As said, the disks aren't busy
during this test.
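Just as back-of-envelope arithmetic (my interpretation, not a statement about the ZFS internals): if the untar effectively issues one synchronous operation at a time, the op rate is bounded by per-op round-trip latency rather than by what the disks can sustain:

```python
# ~200 ZIL commits/sec observed during the untar; if the client serializes
# one sync op at a time, the average per-op latency is:
zil_ops_per_s = 200.0
per_op_ms = 1000.0 / zil_ops_per_s
print(f"{per_op_ms:.0f} ms per op")      # 5 ms per op

# Spread over 8 data disks at ~0.1 ms service time per write, that leaves
# each disk idle almost all of the time:
per_disk_busy = 45.0 * 0.0001            # ~45 w/s per disk from zpool iostat
print(f"{per_disk_busy:.2%} busy")       # 0.45% busy
```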
The test filesystem is configured with atime=off. logbias hardly matters; with
logbias=latency the IOPS rate is slightly lower.
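For reference, the relevant dataset settings (the dataset name here is just an example):

```shell
# Settings on the test dataset; "ib1/test" is an illustrative name.
zfs set atime=off ib1/test
zfs set logbias=throughput ib1/test   # logbias=latency was slightly slower here
zfs get atime,logbias ib1/test
```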
Attached are some bonnie++ results to show that all disks and the whole pool
are quite healthy. I get > 1000 random reads/sec locally and still nearly 900
reads/sec via NFS. For large files I easily get GBit wire speed (105 MB/sec
read) over NFS. And during the random reads of a bonnie++ or iozone run the
disks really are 80-100% busy. Only with small files does the array sit almost
idle, even though it can do way more. I have seen this on different Solaris
versions, not only on this test system. Is there any explanation for this
behaviour?
Thanks,
Michael
--
This message posted from opensolaris.org
local
Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ibmr10          16G           108972  25 89923  21           263540  26  1074   3
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 30359 99 +++++ +++ +++++ +++ 24836 99 +++++ +++ +++++ +++
ibmr10,16G,,,108972,25,89923,21,,,263540,26,1073.5,3,16,30359,99,+++++,+++,+++++,+++,24836,99,+++++,+++,+++++,+++
NFS
Version 1.03d ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
nfsibmr10 16G 50022 11 42524 14 105335 18 884.8 20
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 152 3 +++++ +++ 182 1 151 3 +++++ +++ 183 1
nfsibmr10,16G,,,50022,11,42524,14,,,105335,18,884.8,20,16,152,3,+++++,+++,182,1,151,3,+++++,+++,183,1
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss