Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
> Thus far there is no evidence that there is anything wrong with your
> storage arrays, or even with zfs. The problem seems likely to be
> somewhere else in the kernel.

Agreed.  And I tend to think that the problem lies somewhere in the LDOM 
software.  I mainly just wanted to get some experienced eyes on the problem to 
see if anything sticks out before I go through the trouble of reinstalling the 
system without LDOMs (the original need for VMs in this application no longer 
exists).


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Bob Friesenhahn

On Tue, 19 Jan 2010, Scott Duckworth wrote:


> No errors reported on any disks.
>
> Nothing sticks out in /var/adm/messages on either the primary or cs0 domain.


Thus far there is no evidence that there is anything wrong with your 
storage arrays, or even with zfs. The problem seems likely to be 
somewhere else in the kernel.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
No errors reported on any disks.

$ iostat -xe
                        extended device statistics              ---- errors ----
device        r/s    w/s    kr/s    kw/s  wait  actv  svc_t  %w  %b  s/w  h/w  trn  tot
vdc0          0.6    5.6    25.0    33.5   0.0   0.1   17.3   0   2    0    0    0    0
vdc1         78.1   24.4  3199.2    68.0   0.0   4.4   43.3   0  20    0    0    0    0
vdc2         78.0   24.6  3187.6    67.6   0.0   4.5   43.5   0  20    0    0    0    0
vdc3         78.1   24.4  3196.0    67.9   0.0   4.5   43.5   0  21    0    0    0    0
vdc4         78.2   24.5  3189.8    67.6   0.0   4.5   43.7   0  21    0    0    0    0
vdc5         78.3   24.4  3200.3    67.9   0.0   4.5   43.5   0  21    0    0    0    0
vdc6         78.4   24.6  3186.5    67.7   0.0   4.5   43.5   0  21    0    0    0    0
vdc7         76.4   25.9  3233.0    67.4   0.0   4.2   40.7   0  20    0    0    0    0
vdc8         76.7   26.0  3222.5    67.1   0.0   4.2   41.1   0  21    0    0    0    0
vdc9         76.5   26.0  3233.9    67.7   0.0   4.2   40.8   0  20    0    0    0    0
vdc10        76.5   25.7  3221.6    67.2   0.0   4.2   41.5   0  21    0    0    0    0
vdc11        76.4   25.9  3228.2    67.4   0.0   4.2   41.1   0  20    0    0    0    0
vdc12        76.4   26.1  3216.2    67.4   0.0   4.3   41.6   0  21    0    0    0    0
vdc13         0.0    8.7     0.3   248.4   0.0   0.0    1.8   0   0    0    0    0    0
vdc14        95.3    8.2  2919.3    28.2   0.0   2.5   24.3   0  21    0    0    0    0
vdc15        95.9    9.4  2917.6    26.2   0.0   2.1   19.7   0  19    0    0    0    0
vdc16        95.3    8.0  2924.3    28.2   0.0   2.6   25.5   0  22    0    0    0    0
vdc17        96.1    9.4  2920.5    26.2   0.0   2.0   19.3   0  19    0    0    0    0
vdc18        95.4    8.2  2923.3    28.2   0.0   2.4   23.4   0  21    0    0    0    0
vdc19        95.8    9.3  2903.2    26.2   0.0   2.5   24.3   0  21    0    0    0    0
vdc20        95.0    8.4  2877.6    28.1   0.0   2.5   23.9   0  21    0    0    0    0
vdc21        95.9    9.5  2848.2    26.2   0.0   2.6   24.3   0  21    0    0    0    0
vdc22        95.0    8.4  2874.3    28.1   0.0   2.5   23.7   0  21    0    0    0    0
vdc23        95.7    9.5  2854.0    26.2   0.0   2.5   23.4   0  21    0    0    0    0
vdc24        95.1    8.4  2883.9    28.1   0.0   2.4   23.5   0  21    0    0    0    0
vdc25        95.6    9.4  2839.3    26.2   0.0   2.8   26.5   0  22    0    0    0    0
vdc26         0.0    6.9     0.2   319.8   0.0   0.0    2.6   0   0    0    0    0    0

Nothing sticks out in /var/adm/messages on either the primary or cs0 domain.

The SSD is a recent addition (~3 months ago), and was added in an attempt to 
counteract the poor performance we were already seeing without the SSD.
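
For reference, it was attached as a dedicated log device, roughly like this (the 
pool and device names below are placeholders, not our real ones):

$ zpool add tank log c5t0d0    # attach the SSD as a separate ZFS intent log
$ zpool status tank            # it then shows up under its own "logs" section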

I will check firmware versions tomorrow.  I do recall updating the firmware 
about 8 months ago when we upgraded CAM to support the new J4200 array.  At the 
time, it was the most recent CAM release available, not the outdated version 
that shipped on the CD in the array package.
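
(In case it helps anyone following along, a quick, rough way to see the firmware 
revision Solaris reports for each physical disk is something like the line below, 
run from the domain that owns the physical devices; the vdc devices in the guest 
likely won't show the underlying disk firmware.  The 2540/J4200 controller 
firmware is what CAM reports.)

$ iostat -En | egrep 'Errors:|Revision:'   # device lines plus Vendor/Product/Revision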

My supervisor pointed me to http://forums.sun.com/thread.jspa?threadID=5416833 
which describes what seems to be an identical problem.  It references 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6547651 which was 
reported to be fixed in Solaris 10 update 4.  No solution was posted, but it 
was pointed out that a similar configuration without LDOMs in the mix provided 
superb performance.


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Bob Friesenhahn

On Tue, 19 Jan 2010, Scott Duckworth wrote:


> [Cross-posting to ldoms-discuss]
>
> We are occasionally seeing massive time-to-completions for I/O
> requests on ZFS file systems on a Sun T5220 attached to a Sun
> StorageTek 2540 and a Sun J4200, and using an SSD drive as a ZIL
> device.  Primary access to this system is via NFS, and with NFS
> COMMITs blocking until the request has been sent to disk,
> performance has been deplorable.  The NFS server is an LDOM domain on
> the T5220.


What is the output of 'zpool status' for this pool?  What is the 
output of 'iostat -xe'?


Have you verified that your StorageTek 2540 firmware is up to date? 
BTW, new firmware comes with new CAM software, which does not 
seem to be announced in any useful fashion so you won't know about it 
unless you poll the Sun Downloads site. I have not seen any stalling 
problems with my StorageTek 2540 here.


I agree with Ray Van Dolson that the evidence supplied thus far points 
to an issue with the SSD.  Perhaps the system is noticing a problem 
and is continually resetting it.  Check for messages in 
/var/adm/messages.
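
Something along these lines (a rough sketch, not run here) should show whether 
the driver is logging resets against the SSD, or whether FMA has quietly flagged 
it:

$ egrep -i 'reset|retry|timeout|offline' /var/adm/messages   # driver-level complaints
$ fmdump -eV | more     # FMA error telemetry, often more detailed than syslog
$ fmadm faulty          # anything already diagnosed as faulted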


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Ray Van Dolson
On Tue, Jan 19, 2010 at 02:24:11PM -0800, Scott Duckworth wrote:
> [Cross-posting to ldoms-discuss]
> 
> We are occasionally seeing massive time-to-completions for I/O
> requests on ZFS file systems on a Sun T5220 attached to a Sun
> StorageTek 2540 and a Sun J4200, and using an SSD drive as a ZIL
> device.  Primary access to this system is via NFS, and with NFS
> COMMITs blocking until the request has been sent to disk, performance
> has been deplorable.  The NFS server is an LDOM domain on the T5220.
> 
> To give an idea of how bad the situation is, iotop from the DTrace
> Toolkit occasionally reports single I/O requests to 15k RPM FC disks
> that take more than 60 seconds to complete, and even requests to an
> SSD drive that take over 10 seconds to complete.  It's not uncommon
> to open a small text file in vim (or a similar editor) and have
> nothing appear for 10-30 seconds.  Browsing the web becomes a chore, as
> the browser locks up for a few seconds after doing anything.
> 
> I have a full write-up of the situation at
> http://www.cs.clemson.edu/~duckwos/zfs-performance/.  Any thoughts or
> comments are welcome.  -- 

Could your SSD be having problems?  Got another to swap in and compare
against?

Ray


[zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
[Cross-posting to ldoms-discuss]

We are occasionally seeing massive time-to-completions for I/O requests on ZFS 
file systems on a Sun T5220 attached to a Sun StorageTek 2540 and a Sun J4200, 
and using an SSD drive as a ZIL device.  Primary access to this system is via 
NFS, and with NFS COMMITs blocking until the request has been sent to disk, 
performance has been deplorable.  The NFS server is an LDOM domain on the T5220.
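
One rough way to watch the sync-write path while this is happening ("tank" below 
is a placeholder for the actual pool name):

$ nfsstat -s            # server-side NFS op counts (the v3 COMMIT counter is the interesting one)
$ zpool iostat -v tank 5    # per-vdev I/O; the log (SSD) vdev should be absorbing the sync writes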

To give an idea of how bad the situation is, iotop from the DTrace Toolkit 
occasionally reports single I/O requests to 15k RPM FC disks that take more 
than 60 seconds to complete, and even requests to an SSD drive that take over 10 
seconds to complete.  It's not uncommon to open a small text file in vim (or a 
similar editor) and have nothing appear for 10-30 seconds.  Browsing the web 
becomes a chore, as the browser locks up for a few seconds after doing anything.
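
(For anyone who wants to reproduce the measurement without the full DTrace 
Toolkit, here is a rough way to get a per-device I/O latency distribution with 
the io provider; not exactly what iotop prints, and untested as pasted here:)

$ dtrace -n '
  io:::start { ts[arg0] = timestamp; }
  io:::done /ts[arg0]/ {
    /* per-device I/O latency distribution, in milliseconds */
    @[args[1]->dev_statname] = quantize((timestamp - ts[arg0]) / 1000000);
    ts[arg0] = 0;
  }'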

I have a full write-up of the situation at 
http://www.cs.clemson.edu/~duckwos/zfs-performance/.  Any thoughts or comments 
are welcome.