OK, I'll take a stab at it...

On Dec 26, 2009, at 9:52 PM, Brad wrote:

repost - sorry for cc'ing the other forums.

I'm running into an issue where there seems to be a high number of read IOPS hitting the disks, and physical free memory is fluctuating between 200 MB and 450 MB out of 16 GB total. We have the L2ARC configured on a 32 GB Intel X25-E SSD and the slog on another 32 GB X25-E SSD.

OK, this shows that memory is being used... a good thing.
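If you want to see who is actually holding that memory, ::memstat in mdb gives a breakdown (on recent Solaris 10 kernels, ZFS file data is reported separately from the free list):

   # echo ::memstat | mdb -k

A small free list by itself is expected behavior -- the ARC grows into otherwise-idle memory and gives it back under pressure.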

According to our tester, Oracle writes are extremely slow (high latency).

OK, this is a workable problem statement... another good thing.

Below is a snippet of iostat:

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
4898.3 34.2 23.2 1.4 0.1 385.3 0.0 78.1 0 1246 c1
0.0 0.8 0.0 0.0 0.0 0.0 0.0 16.0 0 1 c1t0d0
401.7 0.0 1.9 0.0 0.0 31.5 0.0 78.5 1 100 c1t1d0
421.2 0.0 2.0 0.0 0.0 30.4 0.0 72.3 1 98 c1t2d0
403.9 0.0 1.9 0.0 0.0 32.0 0.0 79.2 1 100 c1t3d0
406.7 0.0 2.0 0.0 0.0 33.0 0.0 81.3 1 100 c1t4d0
414.2 0.0 1.9 0.0 0.0 28.6 0.0 69.1 1 98 c1t5d0
406.3 0.0 1.8 0.0 0.0 32.1 0.0 79.0 1 100 c1t6d0
404.3 0.0 1.9 0.0 0.0 31.9 0.0 78.8 1 100 c1t7d0
404.1 0.0 1.9 0.0 0.0 34.0 0.0 84.1 1 100 c1t8d0
407.1 0.0 1.9 0.0 0.0 31.2 0.0 76.6 1 100 c1t9d0
407.5 0.0 2.0 0.0 0.0 33.2 0.0 81.4 1 100 c1t10d0
402.8 0.0 2.0 0.0 0.0 33.5 0.0 83.2 1 100 c1t11d0
408.9 0.0 2.0 0.0 0.0 32.8 0.0 80.3 1 100 c1t12d0
9.6 10.8 0.1 0.9 0.0 0.4 0.0 20.1 0 17 c1t13d0
0.0 22.7 0.0 0.5 0.0 0.5 0.0 22.8 0 33 c1t14d0

You are getting 400+ IOPS at ~4 KB per read out of HDDs (1.9 MB/s across
~402 reads/s works out to about 4.8 KB per read).  Count your lucky stars.
Don't expect that kind of performance as normal; it is much better than
normal.

Is this an indicator that we need more physical memory? According to http://blogs.sun.com/brendan/entry/test , the order in which a read request is satisfied is:

   0) Oracle SGA
   1) ARC
   2) vdev cache of L2ARC devices
   3) L2ARC devices
   4) vdev cache of disks
   5) disks
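You can watch how far down that chain reads are falling with the arcstats kstats -- a quick sketch, using the counter names as they appear on Solaris 10/OpenSolaris (the l2_* counters only exist once a cache device is attached):

   # kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses
   # kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses

Reads that miss both the ARC (misses) and the L2ARC (l2_misses) are the ones that end up on the spindles.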

Using arc_summary.pl, we determined that prefetch was not helping much, so we disabled it.

CACHE HITS BY DATA TYPE:
Demand Data: 22% 158853174
Prefetch Data: 17% 123009991 <---not helping???
Demand Metadata: 60% 437439104
Prefetch Metadata: 0% 2446824
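For the archives, the usual knob for file-level prefetch is the zfs_prefetch_disable tunable, set either live with mdb or persistently in /etc/system (verify the tunable exists on your release first):

   # echo zfs_prefetch_disable/W0t1 | mdb -kw

   * in /etc/system, for subsequent boots (comments in that file start with *):
   set zfs:zfs_prefetch_disable = 1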

Write IOPS started to kick in more and latency dropped on the spinning disks:

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1629.0 968.0 17.4 7.3 0.0 35.9 0.0 13.8 0 1088 c1
0.0 1.9 0.0 0.0 0.0 0.0 0.0 1.7 0 0 c1t0d0
126.7 67.3 1.4 0.2 0.0 2.9 0.0 14.8 0 90 c1t1d0
129.7 76.1 1.4 0.2 0.0 2.8 0.0 13.7 0 90 c1t2d0
128.0 73.9 1.4 0.2 0.0 3.2 0.0 16.0 0 91 c1t3d0
128.3 79.1 1.3 0.2 0.0 3.6 0.0 17.2 0 92 c1t4d0
125.8 69.7 1.3 0.2 0.0 2.9 0.0 14.9 0 89 c1t5d0
128.3 81.9 1.4 0.2 0.0 2.8 0.0 13.1 0 89 c1t6d0
128.1 69.2 1.4 0.2 0.0 3.1 0.0 15.7 0 93 c1t7d0
128.3 80.3 1.4 0.2 0.0 3.1 0.0 14.7 0 91 c1t8d0
129.2 69.3 1.4 0.2 0.0 3.0 0.0 15.2 0 90 c1t9d0
130.1 80.0 1.4 0.2 0.0 2.9 0.0 13.6 0 89 c1t10d0
126.2 72.6 1.3 0.2 0.0 2.8 0.0 14.2 0 89 c1t11d0
129.7 81.0 1.4 0.2 0.0 2.7 0.0 12.9 0 88 c1t12d0
90.4 41.3 1.0 4.0 0.0 0.2 0.0 1.2 0 6 c1t13d0
0.0 24.3 0.0 1.2 0.0 0.0 0.0 0.2 0 0 c1t14d0

Latency is reduced, but you are also now only seeing ~200 IOPS per disk
(roughly 128 reads/s + 75 writes/s), not 400+ IOPS.  This is closer to
what you would see as a max for HDDs.

I cannot tell which device is the cache device.  I would expect
to see one disk with significantly more reads than the others.
What do the l2arc stats show?
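zpool iostat breaks the traffic out per vdev, with cache and log devices listed separately, and the L2ARC counters live in the same arcstats kstat:

   # zpool iostat -v <pool> 1
   # kstat -p zfs:0:arcstats | grep l2_

l2_hits vs. l2_misses gives the hit rate, and l2_size shows how much of the 32 GB X25-E has actually been populated. Keep in mind the L2ARC warms slowly by design (l2arc_write_max defaults to 8 MB/s per device), so a recently attached cache device may still be mostly cold.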

Is it true that if your MFU stats go over 50%, more memory is needed?

That is a good indicator. It means that most of the cache entries are
frequently used. Grow your SGA and you should see this go down.

CACHE HITS BY CACHE LIST:
Anon: 10% 74845266 [ New Customer, First Cache Hit ]
Most Recently Used: 19% 140478087 (mru) [ Return Customer ]
Most Frequently Used: 65% 475719362 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 2% 20785604 (mru_ghost) [ Return Customer Evicted, Now Back ]
Most Frequently Used Ghost: 1% 9920089 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
Demand Data: 22% 158852935
Prefetch Data: 17% 123009991
Demand Metadata: 60% 437438658
Prefetch Metadata: 0% 2446824
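Those percentages are derived from arcstats, so you can sanity-check them without the script -- a sketch, same kstat names as above:

   # kstat -p zfs:0:arcstats | egrep 'mfu_hits|mru_hits|mfu_ghost_hits|mru_ghost_hits'

Note that your ghost lists are quiet (2% and 1%). Ghost hits count data that was cached, evicted, and then wanted again, so a high mfu_ghost_hits would be a stronger "buy more RAM" signal than a high MFU fraction alone.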

My theory is that since there's not enough memory for the ARC to cache the data, reads hit the L2ARC, miss there as well, and have to go to the disks. This causes contention between the reads and writes, inflating the service times.

If you have a choice of where to use memory, always choose closer to
the application. Try a larger SGA first. Be aware of large page stealing --
consider increasing the SGA immediately after a reboot and before the
database or applications are started.
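A quick way to check whether the SGA actually landed on large pages (the pgrep pattern here is a guess -- adjust ora_pmon to match your instance):

   # pagesize -a
   # pmap -xs `pgrep -f ora_pmon` | grep -i ism

If the ISM segments report the base 4K pagesize instead of a large one, the large pages were already fragmented when the SGA was allocated, which is exactly why growing it right after boot helps.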
 -- richard

uname: 5.10 Generic_141445-09 i86pc i386 i86pc
Sun Fire X4270: 11+1 raidz (SAS)
                       l2arc Intel X25-E
                       slog Intel X25-E
Thoughts?
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
