Sorry if this is not the correct place for this question, but I'm stuck in a 
hard place: I can't get the I/O performance I want using UFS, and ZFS seems to 
block network operations during even moderate levels of file-creation 
activity. At any rate, here goes:

We're running Solaris 10 8/07 with the full complement of 10_Recommended 
patches on 6-core T1000s, running our custom email server application which, 
not surprisingly, hits local disk very hard with many small file creates, 
writes, moves, and deletes.  On traditional UFS (on solid-state disks with a 
DiskSuite-mirrored FS), we had no system problems at full load.  On ZFS (on 
144 GB SAS drives), with a mirrored pool configured as follows:

  pool: base
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        base          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s3  ONLINE       0     0     0
            c0t1d0s3  ONLINE       0     0     0

errors: No known data errors

-----------------
NAME         USED  AVAIL  REFER  MOUNTPOINT
base        1.20G   105G  1.50K  none
base/spool   184M   105G   184M  legacy
base/usr     989M   105G   989M  legacy
base/var    48.1M   105G  48.1M  legacy
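
For what it's worth, the dataset properties most relevant to a small-file 
workload are recordsize, compression, and atime (atime updates in particular 
add write traffic on a busy spool).  The command below just dumps them; I'm 
not claiming any are mis-set here:

  # zfs get recordsize,compression,atime base/spool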

We start seeing serious kernel hiccups at low load: pauses of 1-40 seconds 
during which there is no external activity -- the network is nonresponsive, 
the console is nonresponsive, and every statistics tool simply shows a gap 
(e.g. vmstat skips n seconds of data).  Right before each pause, kernel CPU 
time pegs at 100%.  There may be a correlation with a burst of file creates 
just beforehand.
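
One test I plan to run is timing ZFS transaction-group syncs directly, on the 
theory that the pauses line up with txg commits.  A minimal DTrace sketch, 
assuming the fbt provider exposes spa_sync on this kernel (run as root):

  #!/usr/sbin/dtrace -qs

  /* Log the wall-clock time and duration of every pool sync. */
  fbt::spa_sync:entry
  {
          self->ts = timestamp;
  }

  fbt::spa_sync:return
  /self->ts/
  {
          printf("%Y  spa_sync: %d ms\n", walltimestamp,
              (timestamp - self->ts) / 1000000);
          self->ts = 0;
  }

If long spa_sync times line up with the gaps in the stats below, that would 
point at the ZFS write path rather than the network stack.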

fsstat looks like:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
  163    80   322 2.82K   177  4.91K    10   426  629K   969 2.28M /var/spool
    0    56   108 2.64K     1  3.39K    47   465  727K   164   164 /var/spool
    0    62     0 1.00K     0  1.63K     4   430 1.43M    91    91 /var/spool
    0    37     0   519     0    828     8   164  579K    49    49 /var/spool
   32     9    18   180     0    397     2     2 1.18K    16 1.63K /var/spool
   96    57   160 2.84K    86  4.54K    77   435 1.05M   645 1.48M /var/spool
    0    49    38 1.18K     9  1.77K    11   320  898K   101 26.6K /var/spool
    1    12     1   324     1    463     0     2 4.45K    25   534 /var/spool
   65     3    42   191     3    491     6    13 4.03K    21 8.76K /var/spool
 1014    71   302 1.70K    24  5.65K    19    58 85.4K   113 86.5K /var/spool
   73   415   964 2.71K   161  8.55K    12   183  300K   594  959K /var/spool
   73    76   136 2.25K   412  3.32K     9   273  460K 1.32K 2.24M /var/spool
   46    42    82 1.58K   155  2.10K    12   242  530K   976 2.32M /var/spool
   25    83   115 2.03K   170  3.08K     0   376  629K 1.25K 3.52M /var/spool
   24    68   152 1.87K    17  3.11K     4   838 2.23M   841 2.69M /var/spool
   65    57   144 6.65K    49  6.45K    24   490  812K   969  890K /var/spool
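
(These are interval samples -- fsstat prints one row per interval and 
periodically repeats its header.  Something like "fsstat /var/spool 10" 
produces this view; the 10-second interval is only illustrative, not 
necessarily what produced the rows above.)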

vmstat looks like:

kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr s0 s2 -- --   in   sy   cs us sy id
19 0 0 19502744 3999456 366 9912 2194 0 0 0 0 0 0 0 0 24868 112987 58937 49 49 2
5 0 0 19499696 4001416 345 3951 3501 0 0 0 0 0 0 0  0 36075 168337 96305 57 40 3
1 0 0 19492128 3993160 450 4542 2275 0 0 0 0 0 0 0  0 38505 128766 82580 25 46 29
14 0 0 19488608 3989880 30 207 0 0 0 0  0  0  0  0  0 3042 3932 3106  1 75 24
214 0 0 19488800 3990048 0 2 5 0  0  0  0  5  5  0  0 2647 1717 1767  0 97  3
242 0 0 19489568 3990776 0 1 0 0  0  0  0  0  0  0  0 2548 1057  387  0 100 0
[many second lacuna here]
298 0 0 19489568 3990776 0 0 0 0  0  0  0  1  0  0  0 2523 1086  587  0 100 0
43 0 0 19481184 3980456 1289 31472 2499 0 0 0 0 144 145 0 0 32371 179144 97796 44 54 2
2 0 0 19481872 3974600 2125 9766 5922 0 0 0 0 96 96 0 0 40929 170587 112274 37 55 7
0 0 0 19486568 3974504 551 1110 2632 0 0 0 0 107 107 0 0 35579 95488 77269 17 31 52
0 0 0 19485136 3969520 354 1310 2183 0 0 0 0 125 125 0 0 24527 56481 46909 11 17 72
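
Note the run queue (the leftmost r column) climbing past 200 while system 
time sits at 100% just before the gap.  To see where the kernel is spending 
that time, the next capture I intend to take during a stall is a kernel 
profile, e.g. the stock lockstat one-liner (top 20 sampled kernel functions 
over 30 seconds):

  # lockstat -kIW -D 20 sleep 30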

Network activity stops completely, as this ping output shows:

64 bytes from w.x.y.z: icmp_seq=2. time=0. ms
64 bytes from w.x.y.z: icmp_seq=3. time=60. ms
64 bytes from w.x.y.z: icmp_seq=4. time=6497. ms
64 bytes from w.x.y.z: icmp_seq=5. time=5560. ms
64 bytes from w.x.y.z: icmp_seq=6. time=4619. ms
64 bytes from w.x.y.z: icmp_seq=7. time=3692. ms
64 bytes from w.x.y.z: icmp_seq=8. time=2713. ms
64 bytes from w.x.y.z: icmp_seq=9. time=1725. ms
64 bytes from w.x.y.z: icmp_seq=10. time=743. ms
64 bytes from w.x.y.z: icmp_seq=11. time=0. ms
64 bytes from w.x.y.z: icmp_seq=12. time=0. ms

As you can see, no packets are lost -- they are all queued during the stall 
and then processed in a burst once the roughly six-second delay ends (note 
how each successive reply waits about one second less, matching the one-second 
ping interval).

Again, the only software change is the move from UFS to ZFS, and even under 
very light load we start seeing this behavior...

I'm not even sure what further telemetry or diagnostic testing I can do to 
figure out what the problem is, especially since even utilities like vmstat 
stop producing output once kernel CPU time goes to 100%.  Any insight would 
be greatly appreciated.
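
One workaround I've seen suggested for ZFS of this vintage -- and I'd treat 
it as an assumption to test, not a confirmed fix -- is capping the ARC so 
memory reclaim doesn't pile onto txg sync, via /etc/system and a reboot:

  * Cap the ZFS ARC at 2 GB; the value is illustrative, size it to the box.
  set zfs:zfs_arc_max = 0x80000000

If anyone can confirm whether that's relevant to these stalls, I'd appreciate 
it.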