On 12.01.11 16:32, Jeff Savit wrote:
Stephan,

There are a bunch of tools you can use, mostly provided with Solaris 11 Express, plus arcstat and arc_summary, which are available as downloads. The latter tools will tell you the size and state of the ARC, which may be relevant to your issue since you mention memory. For the list, could you describe the ZFS pool configuration (zpool status) and summarize output from vmstat, iostat, and zpool iostat? It might also be helpful to issue 'prstat -s rss' to see if any process is growing its resident memory size. An excellent source of information is the "ZFS Evil Tuning Guide" (just Google those words), which has a wealth of information.
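
For example (just a sketch to get you started; these are the standard Solaris 11 Express commands, and the intervals are only illustrative):

zpool status -v          # pool layout and error counters
zpool iostat -v 10       # per-vdev bandwidth, sampled every 10 seconds
vmstat 10                # free memory, paging and scan rate
prstat -s rss 10         # processes sorted by resident set size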

I hope that helps (for a start at least)
  Jeff



On 01/12/11 08:21 AM, Stephan Budach wrote:
Hi all,

I have exchanged my Dell R610 for a Sun Fire 4170 M2, which has 32 GB RAM installed. I am running Solaris 11 Express on this host and use it primarily to serve Netatalk AFP shares. From day one, I have noticed that the amount of free RAM decreases over time and that, along with it, the overall performance of ZFS decreases as well.

Now, since I am still quite a Solaris newbie, I can't seem to track down where the heck all the memory has gone and why ZFS performs so poorly after an uptime of only 5 days. Rebooting Solaris, which I did for testing, brings the performance back to reasonable levels, but otherwise I am quite at my wits' end. To give some numbers: the ZFS throughput drops to about 1/10th of its initial level, for both reads and writes.

Does anybody have some tips up their sleeve on where I should start looking for the missing memory?

Cheers,
budy

Sure - here we go. First of all, the zpool configuration:

zpool status -v
  pool: obelixData
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scan: scrub repaired 0 in 15h29m with 0 errors on Mon Nov 15 21:42:52 2010
config:

    NAME                     STATE     READ WRITE CKSUM
    obelixData               ONLINE       0     0     0
      c9t210000D023038FA8d0  ONLINE       0     0     0
      c9t210000D02305FF42d0  ONLINE       0     0     0

errors: No known data errors

This pool consists of two FC LUNs which are exported from two FC RAIDs (no comments on that one, please; I am still working on the transition to another zpool config! ;) )

Next up are arcstat.pl and arc_summary.pl:

perl /usr/local/de.jvm.scripts/arcstat.pl
time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
17:13:33     0     0      0     0    0     0    0     0    0    15G   16G
17:13:34    71     0      0     0    0     0    0     0    0    15G   16G
17:13:35     3     0      0     0    0     0    0     0    0    15G   16G
17:13:36   30K     0      0     0    0     0    0     0    0    15G   16G
17:13:37   13K     0      0     0    0     0    0     0    0    15G   16G
17:13:38    72     0      0     0    0     0    0     0    0    15G   16G
17:13:39    12     0      0     0    0     0    0     0    0    15G   16G
17:13:40    45     0      0     0    0     0    0     0    0    15G   16G
17:13:41    57     0      0     0    0     0    0     0    0    15G   16G
<State Changed>
17:13:42  1.3K     8      0     8    0     0    0     6    0    15G   16G
17:13:43    45     0      0     0    0     0    0     0    0    15G   16G
17:13:44  1.5K    15      1    13    0     2   50     4    0    15G   16G
17:13:45   122     0      0     0    0     0    0     0    0    15G   16G
17:13:46    74     0      0     0    0     0    0     0    0    15G   16G
17:13:47    88     0      0     0    0     0    0     0    0    15G   16G
17:13:48   19K    67      0    25    0    42    4    24    0    16G   16G
17:13:49   24K    31      0     0    0    31    9     0    0    15G   16G
17:13:50    41     0      0     0    0     0    0     0    0    15G   16G

perl /usr/local/de.jvm.scripts/arc_summary.pl
System Memory:
     Physical RAM:     32751 MB
     Free Memory :     5615 MB
     LotsFree:     511 MB

ZFS Tunables (/etc/system):
     set zfs:zfs_arc_max = 17179869184

ARC Size:
     Current Size:             16383 MB (arcsize)
     Target Size (Adaptive):   16384 MB (c)
     Min Size (Hard Limit):    2048 MB (zfs_arc_min)
     Max Size (Hard Limit):    16384 MB (zfs_arc_max)

ARC Size Breakdown:
     Most Recently Used Cache Size:      73%     12015 MB (p)
     Most Frequently Used Cache Size:      26%     4368 MB (c-p)

ARC Efficency:
     Cache Access Total:             300030668
     Cache Hit Ratio:      92%     277102547       [Defined State for buffer]
     Cache Miss Ratio:      7%     22928121        [Undefined State for Buffer]
     REAL Hit Ratio:       84%     253621864       [MRU/MFU Hits Only]

     Data Demand   Efficiency:    98%
     Data Prefetch Efficiency:    26%

    CACHE HITS BY CACHE LIST:
      Anon:                         4%     12706439               [ New Customer, First Cache Hit ]
      Most Recently Used:          12%     35919656 (mru)         [ Return Customer ]
      Most Frequently Used:        78%     217702208 (mfu)        [ Frequent Customer ]
      Most Recently Used Ghost:     0%     2469558 (mru_ghost)    [ Return Customer Evicted, Now Back ]
      Most Frequently Used Ghost:   2%     8304686 (mfu_ghost)    [ Frequent Customer Evicted, Now Back ]
    CACHE HITS BY DATA TYPE:
      Demand Data:                38%      105648747
      Prefetch Data:               1%      3607719
      Demand Metadata:            53%      147826874
      Prefetch Metadata:           7%      20019207
    CACHE MISSES BY DATA TYPE:
      Demand Data:                 9%      2130351
      Prefetch Data:              43%      9907868
      Demand Metadata:            23%      5396413
      Prefetch Metadata:          23%      5493489
---------------------------------------------

So, these show me that the ZFS ARC actually shouldn't have grown beyond 16 GB, right?
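
To double-check that outside the Perl scripts, the same numbers can be read from the arcstats kstats directly (a quick sketch; zfs:0:arcstats is the standard kstat path for the ZFS module):

kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max     # current ARC size vs. hard limit, in bytes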

Now for vmstat/iostat/prstat:

vmstat 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap    free   re  mf pi po fr de sr s0 s1 s2 s3   in    sy   cs us sy id
 0 0 0 22210512 9676332 29 215  0  0  0  0  1 -0  8  8 40 6460 13462 5409  2  1 97
 0 1 0 18193100 5752352 32 305  0  0  0  0  0  1 15 15 19 3245 29489 1786  0  1 98
 0 1 0 18193112 5752084 11 223  0  0  0  0  0  0  5  5 19 3628 30364 2219  0  2 97
 0 1 0 18193132 5752568  1 139  0  0  0  0  0  0 21 20 24 4125 31415 2432  0  2 98
 0 1 0 18193140 5752592 25  89  0  0  0  0  0  0  0  0 19 3070 23238 1537  0  1 99
 0 1 0 18193204 5752316 10 167  0  0  0  0  0  0  8  8 29 4117 14217 2810  0  2 98


As you can see, the free memory shrank from 32 GB to approx. 5 GB. Subtracting the ZFS ARC from the 32 GB would leave another 16 GB for Solaris and everything else, so roughly 11 GB have actually been consumed, which I can't pinpoint using prstat. And although 5 GB of free RAM should still be enough, the performance drops as soon as the free memory falls below 10 GB, which is pretty much the case after 2 work days.
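
One thing I haven't tried yet, since prstat only accounts for user processes, is the kernel-side memory summary (a sketch; assumes mdb is run as root):

echo ::memstat | mdb -k     # breaks physical memory down by consumer (Kernel, ZFS File Data, Anon, Page cache, Free)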

iostat 10
   tty        sd0           sd1           sd2           sd3            cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps  tps serv   us sy wt id
   0   24   0   0    0   61   8    3   61   8    3 1107   40    6    2  1  0 97
   0   97   0   0    0  160  16    3  160  16    3  170   15    5    0  2  0 98
   0  175   0   0    0  168  20    3  168  20    3  105   15    3    0  2  0 98
   0  111   0   0    0    0   0    0    0   0    0   90   14    4    0  1  0 99
   0   62   0   0    0  146  13    3  146  13    3  422   31    5    0  2  0 98

iostat seems to show a fairly idle system...

Last, but not least: zpool iostat barely sees read or write throughputs above 35 MB/s.
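
(If it matters, the per-LUN breakdown of that can be watched with something along these lines; the interval is just illustrative:

zpool iostat -v obelixData 10 )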

Btw, this happens regardless of whether I use netatalk or SMB for sharing.

Thanks,
budy