Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-13 Thread Stephan Budach

Hi all,

thanks a lot for your suggestions. I have checked all of them, and neither the 
network itself nor any of the other checks turned up a problem.


Alas, I think I know what is going on… my current zpool has two 
vdevs that are not evenly sized, as shown by zpool iostat -v:


zpool iostat -v obelixData 5
                        capacity     operations    bandwidth
pool                 alloc   free   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
obelixData           13,1T  5,84T     36    227   348K  21,5M
  c9t21D023038FA8d0  6,25T  59,3G     21     98   269K  9,25M
  c9t21D02305FF42d0  6,84T  5,78T     15    129  79,2K  12,3M
-------------------  -----  -----  -----  -----  -----  -----


So, the smaller vdev is actually more than 99% full, which is likely the root 
cause of this issue, especially since RAIDs tend to take a tremendous 
performance hit once they exceed 90% space utilization.
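
In case it is useful to anyone else, here is a quick way to turn the alloc/free 
columns above into a per-vdev fill percentage. This is only a rough sketch: it 
assumes the pool name and the c9t... device naming shown above, plus the comma 
decimal separator of my locale.

zpool iostat -v obelixData | awk '
    function gb(v) {                      # convert e.g. "6,25T" or "59,3G" to GB
        sub(",", ".", v)
        if (v ~ /T/) { sub("T", "", v); return v * 1024 }
        sub("G", "", v); return v + 0
    }
    $1 ~ /^c9t/ {                         # per-vdev rows only
        printf "%s  %.1f%% full\n", $1, 100 * gb($2) / (gb($2) + gb($3))
    }'

For the numbers above, that reports roughly 99% full for the first vdev and 
about 54% for the second.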





[zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread Stephan Budach

Hi all,

I have exchanged my Dell R610 for a Sun Fire 4170 M2 which has 
32 GB of RAM installed. I am running Sol11Expr on this host and use it 
primarily to serve Netatalk AFP shares. From day one I have noticed that 
the amount of free RAM decreased, and along with that decrease the 
overall performance of ZFS decreased as well.


Now, since I am still quite a Solaris newbie, I cannot seem to track down 
where the heck all the memory has gone and why ZFS performs so poorly 
after an uptime of only 5 days.
I can reboot Solaris, which I did for testing, and that brings the 
performance back to reasonable levels, but otherwise I am quite at my 
wits' end.
To give some numbers: the ZFS performance drops to 1/10th of the initial 
throughput, for both reads and writes.


Does anybody have some tips up their sleeve on where I should start looking 
for the missing memory?


Cheers,
budy


Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread Jeff Savit

 Stephan,

There are a bunch of tools you can use, mostly provided with Solaris 11 
Express, plus arcstat and arc_summary, which are available as downloads.  The 
latter tools will tell you the size and state of the ARC, which may be 
relevant to your issue since you mention memory.   For the list, could you 
describe the ZFS pool configuration (zpool status) and summarize the output 
from vmstat, iostat, and zpool iostat?  Also, it might be helpful to 
issue 'prstat -s rss' to see if any process is growing its resident 
memory size.  An excellent source of information is the ZFS Evil Tuning 
Guide (just Google those words), which has a wealth of information.
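
In case it saves someone a man-page lookup, these are the kinds of 
invocations I mean (just a sketch; the intervals are arbitrary and you would 
substitute your own pool name for "mypool"):

    vmstat 5                    # watch free memory and the sr/pi/po columns
    iostat -xn 5                # per-device service times and %busy
    zpool iostat -v mypool 5    # per-vdev throughput for your pool
    prstat -s rss               # processes sorted by resident set size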


I hope that helps (for a start at least)
  Jeff



On 01/12/11 08:21 AM, Stephan Budach wrote:

Hi all,

I have exchanged my Dell R610 for a Sun Fire 4170 M2 which has 
32 GB of RAM installed. I am running Sol11Expr on this host and use it 
primarily to serve Netatalk AFP shares. From day one I have noticed 
that the amount of free RAM decreased, and along with that decrease 
the overall performance of ZFS decreased as well.


Now, since I am still quite a Solaris newbie, I cannot seem to track down 
where the heck all the memory has gone and why ZFS performs so poorly 
after an uptime of only 5 days.
I can reboot Solaris, which I did for testing, and that brings the 
performance back to reasonable levels, but otherwise I am quite at my 
wits' end.
To give some numbers: the ZFS performance drops to 1/10th of the initial 
throughput, for both reads and writes.


Does anybody have some tips up their sleeve on where I should start 
looking for the missing memory?


Cheers,
budy


--


*Jeff Savit* | Principal Sales Consultant
Phone: 602.824.6275 | Email: jeff.sa...@oracle.com | Blog: 
http://blogs.sun.com/jsavit

Oracle North America Commercial Hardware
Operating Environments & Infrastructure S/W Pillar
2355 E Camelback Rd | Phoenix, AZ 85016





Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread Stephan Budach

On 12.01.11 16:32, Jeff Savit wrote:

Stephan,

There are a bunch of tools you can use, mostly provided with Solaris 
11 Express, plus arcstat and arc_summary, which are available as 
downloads.  The latter tools will tell you the size and state of the ARC, 
which may be relevant to your issue since you mention memory.   For the 
list, could you describe the ZFS pool configuration (zpool status) 
and summarize the output from vmstat, iostat, and zpool iostat?  Also, it 
might be helpful to issue 'prstat -s rss' to see if any process is 
growing its resident memory size.  An excellent source of information 
is the ZFS Evil Tuning Guide (just Google those words), which has a 
wealth of information.


I hope that helps (for a start at least)
  Jeff



On 01/12/11 08:21 AM, Stephan Budach wrote:

Hi all,

I have exchanged my Dell R610 for a Sun Fire 4170 M2 which 
has 32 GB of RAM installed. I am running Sol11Expr on this host and I 
use it primarily to serve Netatalk AFP shares. From day one I have 
noticed that the amount of free RAM decreased, and along with that 
decrease the overall performance of ZFS decreased as well.


Now, since I am still quite a Solaris newbie, I cannot seem to track down 
where the heck all the memory has gone and why ZFS performs so poorly 
after an uptime of only 5 days.
I can reboot Solaris, which I did for testing, and that brings the 
performance back to reasonable levels, but otherwise I am quite at my 
wits' end.
To give some numbers: the ZFS performance drops to 1/10th of the initial 
throughput, for both reads and writes.


Does anybody have some tips up their sleeve on where I should start 
looking for the missing memory?


Cheers,
budy



Sure - here we go. First of all, the zpool configuration:

zpool status -v
  pool: obelixData
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scan: scrub repaired 0 in 15h29m with 0 errors on Mon Nov 15 21:42:52 2010
config:

NAME                 STATE     READ WRITE CKSUM
obelixData           ONLINE       0     0     0
  c9t21D023038FA8d0  ONLINE       0     0     0
  c9t21D02305FF42d0  ONLINE       0     0     0

errors: No known data errors

This pool consists of two FC LUNs which are exported from two FC RAIDs 
(no comments on that one, please; I am still working on the transition to 
another zpool config! ;) )


Next up are arcstat.pl and arc_summary.pl:

perl /usr/local/de.jvm.scripts/arcstat.pl
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
17:13:33     0     0      0     0    0     0    0     0    0    15G   16G
17:13:34    71     0      0     0    0     0    0     0    0    15G   16G
17:13:35     3     0      0     0    0     0    0     0    0    15G   16G
17:13:36   30K     0      0     0    0     0    0     0    0    15G   16G
17:13:37   13K     0      0     0    0     0    0     0    0    15G   16G
17:13:38    72     0      0     0    0     0    0     0    0    15G   16G
17:13:39    12     0      0     0    0     0    0     0    0    15G   16G
17:13:40    45     0      0     0    0     0    0     0    0    15G   16G
17:13:41    57     0      0     0    0     0    0     0    0    15G   16G
State Changed
17:13:42  1.3K     8      0     8    0     0    0     6    0    15G   16G
17:13:43    45     0      0     0    0     0    0     0    0    15G   16G
17:13:44  1.5K    15      1    13    0     2   50     4    0    15G   16G
17:13:45   122     0      0     0    0     0    0     0    0    15G   16G
17:13:46    74     0      0     0    0     0    0     0    0    15G   16G
17:13:47    88     0      0     0    0     0    0     0    0    15G   16G
17:13:48   19K    67      0    25    0    42    4     2   40    16G   16G
17:13:49   24K    31      0     0    0    31    9     0    0    15G   16G
17:13:50    41     0      0     0    0     0    0     0    0    15G   16G

perl /usr/local/de.jvm.scripts/arc_summary.pl
System Memory:
 Physical RAM: 32751 MB
 Free Memory : 5615 MB
 LotsFree: 511 MB

ZFS Tunables (/etc/system):
 set zfs:zfs_arc_max = 17179869184

ARC Size:
 Current Size: 16383 MB (arcsize)
 Target Size (Adaptive):   16384 MB (c)
 Min Size (Hard Limit):2048 MB (zfs_arc_min)
 Max Size (Hard Limit):16384 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:  73% 12015 MB (p)
 Most Frequently Used Cache Size:  26% 4368 MB (c-p)

ARC Efficency:
 Cache Access Total: 300030668
 Cache Hit Ratio:       92%    277102547    [Defined State for buffer]
 Cache Miss Ratio:       7%    22928121     [Undefined State for Buffer]

 REAL Hit Ratio:        84%    253621864    [MRU/MFU Hits Only]

 Data Demand   Efficiency:   

Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread SR
You may need to adjust zfs_arc_max in /etc/system to avoid memory contention


http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.htm

Suresh
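
As a rough sketch (the 8 GiB figure is only an example, the value is given in 
bytes, and the change takes effect after a reboot):

* /etc/system: cap the ZFS ARC at 8 GiB (8 * 1024^3 bytes)
set zfs:zfs_arc_max = 8589934592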


Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread Stephan Budach

On 12.01.11 18:49, SR wrote:

You may need to adjust zfs_arc_max in /etc/system to avoid memory contention


http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.htm

Suresh

I thought I had already done that with this line in /etc/system:

set zfs:zfs_arc_max = 17179869184


I also think that arc_summary.pl showed exactly that...
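
To double-check that the cap really is in effect, the live ARC kstats can be 
queried; a quick sketch (both values are reported in bytes):

    kstat -p zfs:0:arcstats:c_max    # the configured ARC maximum
    kstat -p zfs:0:arcstats:size     # the current ARC size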

Cheers,
budy


--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.bud...@jvm.de
Internet: http://www.jvm.com

Managing Directors: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380



Re: [zfs-discuss] ZFS slows down over a couple of days

2011-01-12 Thread Marion Hakanson
Stephan,

The vmstat shows you are not actually short of memory; the pi and po
columns are zero, so the system is not having to do any paging, and it seems
unlikely that the system is slow directly because of a RAM shortage.  With the
ARC, it's not unusual for vmstat to show little free memory, but the system
will give up that RAM when an application asks for it.  You can tell if this
is happening a lot by:
echo ::arc | mdb -k | grep throttle

If the value of memory_throttle_count is large, that will indicate that
apps are often asking the kernel to give up ARC memory.
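
If you want to see whether that counter climbs while the slowdown develops, a
simple loop will do; just a sketch, sampling once a minute (needs root for
mdb -k):

    while true; do
        date
        echo ::arc | mdb -k | grep throttle
        sleep 60
    done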

Also, as you said, the iostat figures look idle.  You can tell more
using iostat -xn 1, which will give service-time and percent-busy
figures for the actual devices.

It could be that something about the networking involved is what is
actually slow.  You could find out if it's a local bottleneck by trying
some simple I/O tests on the server itself, maybe:
dd if=/dev/zero of=/file/in/zpool bs=1024k
and watching what iostat shows, etc.
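
Spelled out a bit more, something like the following; only a sketch, assuming
the pool is mounted at /obelixData, with the count keeping the test to roughly
10 GB:

    dd if=/dev/zero of=/obelixData/ddtest bs=1024k count=10240   # ~10 GB write to a hypothetical test file
    iostat -xn 1                                                 # run in a second terminal while dd is writing
    rm /obelixData/ddtest

Note that if compression is enabled on the dataset, writing from /dev/zero
will overstate the real throughput.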

Another test is to try a network-only test, maybe using ttcp between
the server and a client.  This could tell you whether it's the network or the
storage that's causing the slowdown.  If you don't have ttcp, try something
silly like running this on a client:
dd if=/dev/zero bs=1024k | ssh -c blowfish server dd of=/dev/null bs=1024k

You can watch network throughput on the server using:
dladm show-link -s -i 1

Regards,

Marion

