Hi,

after some testing with ZFS I noticed that read requests are not scheduled evenly across the drives; the first drive of each mirror pair is selected predominantly.


My pool is set up as follows:

        NAME        STATE     READ WRITE CKSUM
        tpc         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0


Disk I/O after doing some benchmarking:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tpc         7.70G  50.9G     85     21  10.5M  1.08M
  mirror    1.10G  7.28G     11      3  1.47M   159K
    c1t0d0      -      -     10      2  1.34M   159K
    c4t0d0      -      -      1      2   138K   159K
  mirror    1.10G  7.27G     11      3  1.48M   159K
    c1t1d0      -      -     10      2  1.34M   159K
    c4t1d0      -      -      1      2   140K   159K
  mirror    1.09G  7.28G     12      3  1.50M   159K
    c1t2d0      -      -     10      2  1.37M   159K
    c4t2d0      -      -      0      2   128K   159K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t3d0      -      -     11      2  1.42M   158K
    c4t3d0      -      -      0      2   110K   158K
  mirror    1.10G  7.28G     11      3  1.44M   158K
    c1t4d0      -      -     10      2  1.33M   158K
    c4t4d0      -      -      0      2   112K   158K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t6d0      -      -     11      2  1.42M   158K
    c4t6d0      -      -      0      2   106K   158K
  mirror    1.11G  7.26G     12      3  1.55M   158K
    c1t7d0      -      -     11      2  1.42M   158K
    c4t7d0      -      -      1      2   130K   158K
----------  -----  -----  -----  -----  -----  -----


or with "iostat":

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   11.4    4.3 1451.1  157.1  0.0  0.3    0.4   19.6   0  17 c1t7d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5d0
   10.7    4.3 1361.4  158.4  0.0  0.3    0.4   22.1   0  18 c1t0d0
   10.9    4.3 1395.7  157.9  0.0  0.3    0.4   18.6   0  16 c1t2d0
    1.0    4.3  129.0  157.1  0.0  0.0    0.8    8.9   0   2 c4t7d0
    0.9    4.3  112.0  156.9  0.0  0.0    0.9    9.4   0   2 c4t4d0
    1.1    4.4  139.5  158.3  0.0  0.0    0.9    8.8   0   3 c4t1d0
   10.6    4.3 1354.8  157.0  0.0  0.3    0.4   18.8   0  16 c1t4d0
    0.9    4.3  109.2  157.3  0.0  0.1    0.9    9.7   0   3 c4t3d0
   10.7    4.4 1363.4  158.3  0.0  0.3    0.4   21.9   0  18 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t8d0
    1.0    4.3  127.0  157.8  0.0  0.0    0.9    9.0   0   2 c4t2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t8d0
   11.4    4.3 1449.9  156.9  0.0  0.3    0.4   20.0   0  17 c1t6d0
    0.8    4.3  105.4  156.8  0.0  0.0    0.9    8.5   0   2 c4t6d0
   11.3    4.3 1447.4  157.4  0.0  0.3    0.4   18.9   0  17 c1t3d0
    1.1    4.4  137.7  158.4  0.0  0.0    0.9    8.8   0   2 c4t0d0



As you can see, the second disk of each mirror pair (c4tXd0) gets almost no read I/O. How does ZFS decide which mirror device to read from?
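To make clear what I would have expected, here is a rough C sketch of two policies that would spread reads over a two-way mirror, one round-robin and one based on outstanding I/Os. This is purely illustrative (it is NOT ZFS's actual vdev_mirror code), and all the type and function names are made up for the example:

/*
 * Purely illustrative sketch, NOT ZFS's actual vdev_mirror code.
 * All type and function names are invented. It just shows two
 * policies that would spread reads across a two-way mirror:
 *   (a) strict round-robin
 *   (b) "least outstanding I/Os"
 */
#include <stdio.h>

typedef struct child {
    int  outstanding;   /* reads currently in flight on this disk */
    long issued;        /* total reads sent to this disk */
} child_t;

typedef struct mirror {
    child_t  child[2];  /* the two sides of the mirror pair */
    unsigned rotor;     /* next side for round-robin */
} mirror_t;

/* (a) alternate strictly between the two sides */
static int
pick_round_robin(mirror_t *m)
{
    return ((int)(m->rotor++ % 2));
}

/* (b) read from the side with fewer I/Os in flight */
static int
pick_least_loaded(mirror_t *m)
{
    return ((m->child[1].outstanding < m->child[0].outstanding) ? 1 : 0);
}

int
main(void)
{
    mirror_t m = { { { 0, 0 }, { 0, 0 } }, 0 };
    int i;

    /* round-robin: 1000 reads split 500/500 */
    for (i = 0; i < 1000; i++)
        m.child[pick_round_robin(&m)].issued++;
    printf("round-robin: side0=%ld side1=%ld\n",
        m.child[0].issued, m.child[1].issued);

    /* load-based: with 3 reads queued on side0 and 1 on side1,
       the next read goes to side1 */
    m.child[0].outstanding = 3;
    m.child[1].outstanding = 1;
    printf("least-loaded picks side %d\n", pick_least_loaded(&m));

    return (0);
}

Either policy would give both sides of a pair roughly the same read load, which is clearly not what I am seeing above.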


And another note:
SVM offers kstat values of type KSTAT_TYPE_IO. Why doesn't ZFS (at least at the zpool level)?
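For example, reading those counters via libkstat takes only a few lines. Here is a minimal sketch (the module name "md" and the instance number 5 for md5 are assumptions based on my setup; error handling is mostly omitted):

/* compile with: cc -o kio kio.c -lkstat */
#include <stdio.h>
#include <kstat.h>

int
main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t     *ksp;
    kstat_io_t  *kio;

    if (kc == NULL)
        return (1);

    /* "md" instance 5 corresponds to md5 on my box; adjust as needed */
    ksp = kstat_lookup(kc, "md", 5, NULL);
    if (ksp == NULL || ksp->ks_type != KSTAT_TYPE_IO ||
        kstat_read(kc, ksp, NULL) == -1) {
        kstat_close(kc);
        return (1);
    }

    kio = (kstat_io_t *)ksp->ks_data;
    printf("reads=%u writes=%u nread=%llu nwritten=%llu\n",
        kio->reads, kio->writes,
        (unsigned long long)kio->nread,
        (unsigned long long)kio->nwritten);

    kstat_close(kc);
    return (0);
}

Having the same kind of counters per pool (or per vdev) would let the usual tools pick ZFS up automatically.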

And BTW (not ZFS-related, but SVM):
With the introduction of the SVM bunnahabhain project (friendly names), the "iostat -n" output has become completely useless, even if you still use the old naming scheme:

% iostat -n
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.3   0   0 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 c0d1
    0.0    5.0    0.7   21.8  0.0  0.0    0.0    1.5   0   1 c3d0
    0.0    4.1    0.6   20.9  0.0  0.0    0.0    2.8   0   1 c4d0
    1.6   37.3   16.6  164.3  0.1  0.1    2.5    1.6   1   5 c2d0
    1.6   37.5   16.5  164.5  0.1  0.1    3.2    1.7   1   5 c1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
    2.9    1.9   19.3    4.8  0.0  0.2    0.3   37.2   0   1 md5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0   19.9   0   0 md12
    0.0    0.0    0.0    0.0  0.0  0.0    0.0   12.4   0   0 md13
    0.0    0.0    0.0    0.0  0.0  0.0    3.9   17.7   0   0 md14
    1.5    1.9    9.6    4.8  0.0  0.1    0.0   35.7   0   0 md15
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 md16
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 md17
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 md18
    1.5    1.9    9.6    4.8  0.0  0.1    0.0   27.7   0   0 md19

Instead of "mdXXX" I was expecting the following names:

% ls -lL /dev/md/dsk
total 0
brw-r-----   1 root     sys       85,  5 May 26 00:43 d1
brw-r-----   1 root     sys       85, 15 May 26 00:43 root-0
brw-r-----   1 root     sys       85, 19 May 26 00:43 root-1
brw-r-----   1 root     sys       85, 18 May 26 00:43 scratch
brw-r-----   1 root     sys       85, 16 May 26 00:43 scratch-0
brw-r-----   1 root     sys       85, 17 May 26 00:43 scratch-1
brw-r-----   1 root     sys       85, 14 May 25 17:51 swap
brw-r-----   1 root     sys       85, 12 May 26 00:43 swap-0
brw-r-----   1 root     sys       85, 13 May 26 00:43 swap-1




Daniel
