Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Bob Friesenhahn

On Tue, 15 Sep 2009, Dale Ghent wrote:


As someone who currently faces kernel panics with recent U7+ kernel patches 
(on AMD64 and SPARC) related to PCI bus upset, I expect that Sun will take 
the time to make sure that the implementation is as good as it can be and 
is thoroughly tested before release.


Are you referring to the same testing that gained you this PCI panic feature 
in s10u7?


No.  The system worked with the kernel patch corresponding to baseline 
S10U7.  Problems started with later kernel patches (which seem to be 
much less tested).  Of course there could actually be a real hardware 
problem.


Regardless, when the integrity of our data is involved, I prefer to 
wait for more testing rather than to potentially have to recover the 
pool from backup.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Dale Ghent

On Sep 15, 2009, at 6:28 PM, Bob Friesenhahn wrote:


On Tue, 15 Sep 2009, Dale Ghent wrote:


Question though... why would a bug fix that can be a watershed for  
performance be held back for so long? s10u9 won't be available for  
at least 6 months from now, and with a huge environment, I try hard  
not to live off of IDRs.


As someone who currently faces kernel panics with recent U7+ kernel  
patches (on AMD64 and SPARC) related to PCI bus upset, I expect that  
Sun will take the time to make sure that the implementation is as  
good as it can be and is thoroughly tested before release.


Are you referring to the same testing that gained you this PCI panic  
feature in s10u7?


Testing is a no-brainer, and I would expect that there already exists  
some level of assurance that a CR fix is correct at the point of  
putback.


But I've dealt with many bugs, both very recently and long in the past,  
where a fix has existed in Nevada for months, even a year, before I  
got bit by the same bug in s10 and then had to go through the support  
channels to A) convince whomever I'm talking to that, yes, I'm hitting  
this bug, B) yes, there is a fix, and then C) pretty please can I have  
an IDR.


Just this week I'm wrapping up testing of an IDR which addresses an  
e1000g hardware erratum that was fixed in onnv earlier this year, in  
February. For something that addresses a hardware issue on an Intel  
chipset used in shipping Sun servers, one would think that Sustaining  
would be on the ball and get that fix integrated ASAP. But the current  
mode of operation appears to be "no CR, no backport", which leaves us  
customers needlessly running into bugs and then begging for their  
fixes... or hearing the dreaded "oh, that fix will be available two  
updates from now." Not cool.


/dale



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Bob Friesenhahn

On Tue, 15 Sep 2009, Dale Ghent wrote:


Question though... why would a bug fix that can be a watershed for 
performance be held back for so long? s10u9 won't be available for 
at least 6 months from now, and with a huge environment, I try hard 
not to live off of IDRs.


As someone who currently faces kernel panics with recent U7+ kernel 
patches (on AMD64 and SPARC) related to PCI bus upset, I expect that 
Sun will take the time to make sure that the implementation is as good 
as it can be and is thoroughly tested before release.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Richard Elling

Reference below...

On Sep 15, 2009, at 2:38 PM, Dale Ghent wrote:


On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:



On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:


On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched  
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also  
be in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in  
this CR without slowing down other prefetch patterns.  Some  
kstats have also been added to help improve the observability of  
ZFS file prefetching.


Awesome that the fix exists. I've been having a hell of a time  
with device-level prefetch on my iscsi clients causing tons of  
ultimately useless IO and have resorted to setting  
zfs_vdev_cache_max=1.


This only affects metadata. Wouldn't it be better to disable
prefetching for data?


Well, that's a surprise to me, but the zfs_vdev_cache_max=1 did  
provide relief.


Just a general description of my environment:

My setup consists of several s10uX iscsi clients which get LUNs from  
pairs of thumpers. Each thumper pair exports identical LUNs to each  
iscsi client, and the client in turn mirrors each LUN pair inside a  
local zpool. As more space is needed on a client, a new LUN is created  
on the pair of thumpers and exported to the iscsi client, which then  
picks it up, and we add a new mirrored vdev to the client's existing  
zpool.


This is so we have data redundancy across chassis, so that if one  
thumper were to fail or need patching, etc., the iscsi clients just  
see one side of their mirrors drop out.


The problem that we observed on the iscsi clients was that, when  
viewing things through 'zpool iostat -v', far more IO was being  
requested from the LUs than was being registered for the vdev those  
LUs were a member of.


Since this was an iscsi setup with stock thumpers (no SSD ZIL or  
L2ARC) serving the LUs, this apparent overhead caused far more  
unnecessary disk IO on the thumpers, thus starving out IO for data  
that was actually needed.


The working set is lots of small-ish files, entirely random IO.

If zfs_vdev_cache_max only affects metadata prefetches, which  
parameter affects data prefetches ?


There are two main areas for prefetch: the transactional object layer  
(DMU) and the pooled storage level (VDEV).  zfs_vdev_cache_max works  
at the VDEV level, obviously.  The DMU knows more about the context of  
the data and is where the intelligent prefetching algorithm works.

You can easily observe the VDEV cache statistics with kstat:
# kstat -n vdev_cache_stats
module: zfs                             instance: 0
name:   vdev_cache_stats                class:    misc
        crtime                          38.83342625
        delegations                     14030
        hits                            105169
        misses                          59452
        snaptime                        4564628.18130739

This represents a 59% cache hit rate, which is pretty decent.  But you
will notice far fewer delegations+hits+misses than real IOPS because it
is only caching metadata.
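
(Presumably the 59% comes from hits over total lookups:
105169 / (14030 + 105169 + 59452) = 105169 / 178651, or about 0.59.)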

Unfortunately, there is not a kstat for showing the DMU cache stats.
But a DTrace script can be written or, even easier, lockstat will show
if you are spending much time in the zfetch_* functions.  More details
are in the Evil Tuning Guide, including how to set zfs_prefetch_disable
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
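
For example (a sketch along those lines, not from the original post),
either of these will surface time spent in the zfetch code:

# lockstat -kIW -D 20 sleep 30
# dtrace -n 'fbt:zfs:*zfetch*:entry { @[probefunc] = count(); } tick-30s { exit(0); }'

The first profiles the kernel and shows whether zfetch_*/dmu_zfetch*
frames are hot; the second simply counts entries into those functions
for 30 seconds.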



I have to admit that disabling device-level prefetching was a shot  
in the dark, but it did result in drastically reduced contention on  
the thumpers.


That is a little bit surprising.  I would expect little metadata  
activity for iscsi
service. It would not be surprising for older Solaris 10 releases,  
though.

It was fixed in NV b70, circa July 2007.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Dale Ghent

On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:



On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:


On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched  
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also  
be in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in  
this CR without slowing down other prefetch patterns.  Some kstats  
have also been added to help improve the observability of ZFS file  
prefetching.


Awesome that the fix exists. I've been having a hell of a time with  
device-level prefetch on my iscsi clients causing tons of  
ultimately useless IO and have resorted to setting  
zfs_vdev_cache_max=1.


This only affects metadata. Wouldn't it be better to disable
prefetching for data?


Well, that's a surprise to me, but the zfs_vdev_cache_max=1 did  
provide relief.


Just a general description of my environment:

My setup consists of several s10uX iscsi clients which get LUNs from  
pairs of thumpers. Each thumper pair exports identical LUNs to each  
iscsi client, and the client in turn mirrors each LUN pair inside a  
local zpool. As more space is needed on a client, a new LUN is created  
on the pair of thumpers and exported to the iscsi client, which then  
picks it up, and we add a new mirrored vdev to the client's existing  
zpool.


This is so we have data redundancy across chassis, so that if one  
thumper were to fail or need patching, etc., the iscsi clients just  
see one side of their mirrors drop out.


The problem that we observed on the iscsi clients was that, when  
viewing things through 'zpool iostat -v', far more IO was being  
requested from the LUs than was being registered for the vdev those  
LUs were a member of.


Since this was an iscsi setup with stock thumpers (no SSD ZIL or  
L2ARC) serving the LUs, this apparent overhead caused far more  
unnecessary disk IO on the thumpers, thus starving out IO for data  
that was actually needed.


The working set is lots of small-ish files, entirely random IO.

If zfs_vdev_cache_max only affects metadata prefetches, which  
parameter affects data prefetches ?


I have to admit that disabling device-level prefetching was a shot in  
the dark, but it did result in drastically reduced contention on the  
thumpers.


/dale





Question though... why would a bug fix that can be a watershed for  
performance be held back for so long? s10u9 won't be available for  
at least 6 months from now, and with a huge environment, I try hard  
not to live off of IDRs.


Am I the only one that thinks this is way too conservative? It's  
just maddening to know that a highly beneficial fix is out there,  
but its release is based on time rather than need. Sustaining  
really needs to be more proactive when it comes to this stuff.


/dale









___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Richard Elling


On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:


On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched  
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also  
be in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in  
this CR without slowing down other prefetch patterns.  Some kstats  
have also been added to help improve the observability of ZFS file  
prefetching.


Awesome that the fix exists. I've been having a hell of a time with  
device-level prefetch on my iscsi clients causing tons of ultimately  
useless IO and have resorted to setting zfs_vdev_cache_max=1.


This only affects metadata. Wouldn't it be better to disable
prefetching for data?
 -- richard



Question though... why would a bug fix that can be a watershed for  
performance be held back for so long? s10u9 won't be available for  
at least 6 months from now, and with a huge environment, I try hard  
not to live off of IDRs.


Am I the only one that thinks this is way too conservative? It's  
just maddening to know that a highly beneficial fix is out there,  
but its release is based on time rather than need. Sustaining really  
needs to be more proactive when it comes to this stuff.


/dale







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-15 Thread Dale Ghent

On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state  
at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also be  
in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in this  
CR without slowing down other prefetch patterns.  Some kstats have  
also been added to help improve the observability of ZFS file  
prefetching.


Awesome that the fix exists. I've been having a hell of a time with  
device-level prefetch on my iscsi clients causing tons of ultimately  
useless IO and have resorted to setting zfs_vdev_cache_max=1.
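
(For reference, a sketch of how a tunable like this is typically
applied; not taken from the original message. Either live with mdb, or
persistently in /etc/system:

# echo zfs_vdev_cache_max/W0t1 | mdb -kw      (live, not persistent)
set zfs:zfs_vdev_cache_max = 1                (/etc/system entry, takes effect at next boot)

The mdb form mirrors the zfs_prefetch_disable commands Roch posts
elsewhere in this thread.)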


Question though... why would a bug fix that can be a watershed for  
performance be held back for so long? s10u9 won't be available for at  
least 6 months from now, and with a huge environment, I try hard not  
to live off of IDRs.


Am I the only one that thinks this is way too conservative? It's just  
maddening to know that a highly beneficial fix is out there, but its  
release is based on time rather than need. Sustaining really needs to  
be more proactive when it comes to this stuff.


/dale





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-13 Thread Christian Kendi


Is there already a diff of the source available?

On Sep 11, 2009, at 4:02 PM, Rich Morris wrote:


On 09/10/09 16:22, en...@businessgrade.com wrote:

Quoting Bob Friesenhahn :


On Thu, 10 Sep 2009, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched   
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will  
also  be in Solaris 10 Update 9. This fix speeds up the  
sequential  prefetch pattern described in this CR without slowing  
down other  prefetch patterns.  Some kstats have also been added  
to help  improve the observability of ZFS file prefetching.


Excellent.  What level of read improvement are you seeing?  Is the
prefetch rate improved, or does the fix simply avoid losing the
prefetch?

Thanks,

Bob


Is this fixed in snv_122 or something else?


snv_124.   See 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6859997




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-11 Thread Rich Morris

On 09/10/09 16:22, en...@businessgrade.com wrote:

Quoting Bob Friesenhahn :


On Thu, 10 Sep 2009, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched  
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also  
be in Solaris 10 Update 9. This fix speeds up the sequential  
prefetch pattern described in this CR without slowing down other  
prefetch patterns.  Some kstats have also been added to help  
improve the observability of ZFS file prefetching.


Excellent.  What level of read improvement are you seeing?  Is the
prefetch rate improved, or does the fix simply avoid losing the
prefetch?

Thanks,

Bob


Is this fixed in snv_122 or something else?


snv_124.   See 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6859997


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread Bob Friesenhahn

On Thu, 10 Sep 2009, Rich Morris wrote:


Excellent.  What level of read improvement are you seeing?  Is the prefetch 
rate improved, or does the fix simply avoid losing the prefetch?


This fix avoids using a prefetch stream when it is no longer valid.  BTW, ZFS 
prefetch appears to work well for most prefetch patterns.  But this CR found 
a pattern that should have worked well but did not.


It seems that after doing a fresh mount, the zfs prefetch is not quite 
enough to keep my hungry highly-tuned application sufficiently well 
fed.  I will have to wait and see though.


In the mean time, I need to investigate why recent Solaris 10 kernel 
patches (141415-10) cause my Sun Ultra-40M2 system to panic five 
minutes into 'zpool scrub' with a fault being reported against the 
motherboard.  Maybe a few more motherboard swaps will solve it (on 4th 
motherboard now).  141415-3 seems less likely to panic since it 
survives a full scrub (unless VirtualBox is running a Linux instance).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread Rich Morris

On 09/10/09 16:17, Bob Friesenhahn wrote:

On Thu, 10 Sep 2009, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state 
at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also be 
in Solaris 10 Update 9. This fix speeds up the sequential prefetch 
pattern described in this CR without slowing down other prefetch 
patterns.  Some kstats have also been added to help improve the 
observability of ZFS file prefetching.


Excellent.  What level of read improvement are you seeing?  Is the 
prefetch rate improved, or does the fix simply avoid losing the prefetch?


This fix avoids using a prefetch stream when it is no longer valid.  
BTW, ZFS prefetch appears to work well for most prefetch patterns.  But 
this CR found a pattern that should have worked well but did not.


-- Rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread Henrik Johansson

Hello Rich,

On Sep 10, 2009, at 9:12 PM, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state  
at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also be  
in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in this  
CR without slowing down other prefetch patterns.  Some kstats have  
also been added to help improve the observability of ZFS file  
prefetching.


Nice work, do you know if it will be released as a patch for s10u8 or  
will it only be part of the update 9 KUP?


Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread eneal

Quoting Bob Friesenhahn :


On Thu, 10 Sep 2009, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched   
state at High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also   
be in Solaris 10 Update 9. This fix speeds up the sequential   
prefetch pattern described in this CR without slowing down other   
prefetch patterns.  Some kstats have also been added to help   
improve the observability of ZFS file prefetching.


Excellent.  What level of read improvement are you seeing?  Is the
prefetch rate improved, or does the fix simply avoid losing the
prefetch?

Thanks,

Bob


Is this fixed in snv_122 or something else?







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread Bob Friesenhahn

On Thu, 10 Sep 2009, Rich Morris wrote:


On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state at High 
priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also be in 
Solaris 10 Update 9. 
This fix speeds up the sequential prefetch pattern described in this CR 
without slowing down other prefetch patterns.  Some kstats have also been 
added to help improve the observability of ZFS file prefetching.


Excellent.  What level of read improvement are you seeing?  Is the 
prefetch rate improved, or does the fix simply avoid losing the 
prefetch?


Thanks,

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-09-10 Thread Rich Morris

On 07/28/09 17:13, Rich Morris wrote:

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state at 
High priority.


CR 6859997 has recently been fixed in Nevada.  This fix will also be in 
Solaris 10 Update 9. 

This fix speeds up the sequential prefetch pattern described in this CR 
without slowing down other prefetch patterns.  Some kstats have also 
been added to help improve the observability of ZFS file prefetching.


-- Rich

CR 6859997 has been accepted and is actively being worked on.  The 
following info has been added to that CR:


This is a problem with the ZFS file prefetch code (zfetch) in 
dmu_zfetch.c.  The test script provided by the submitter (thanks Bob!) 
does no file prefetching the second time through each file.  This 
problem exists in ZFS in Solaris 10, Nevada, and OpenSolaris.


This test script creates 3000 files each 8M long so the amount of data 
(24G) is greater than the amount of memory (16G on a Thumper).  With 
the default blocksize of 128k, each of the 3000 files has 63 blocks.  
The first time through, zfetch ramps up a single prefetch stream 
normally.  But the second time through, dmu_zfetch() calls 
dmu_zfetch_find(), which thinks that the data has already been 
prefetched, so no additional prefetching is started.
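
A minimal sketch of that access pattern (illustration only; the real 
test is Bob's zfs-cache-test.ksh script, and 'tank' is a placeholder 
pool name):

zfs create tank/zfscachetest
cd /tank/zfscachetest
i=0
while [ $i -lt 3000 ]; do
    mkfile 8192000 file.$i      # 8M file = ~63 x 128K blocks
    i=$((i+1))
done
cd /
zfs unmount tank/zfscachetest ; zfs mount tank/zfscachetest
# pass 1 (right after unmount/mount): zfetch ramps up, reads go fast
time find /tank/zfscachetest -type f | cpio -C 131072 -o > /dev/null
# pass 2 (metadata still cached, data evicted): no prefetch, much slower
time find /tank/zfscachetest -type f | cpio -C 131072 -o > /dev/null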


This problem is not seen with 500 files each 48M in length (still 24G 
of data).  In that case there's still only one prefetch stream but it 
is reclaimed when one of the requested offsets is not found.  The 
reason it is not found is that the stream "strided" the first time 
through after reaching the zfetch cap, which is 256 blocks.  Files with 
no more than 256 blocks don't require a stride.  So this problem will 
only be seen when the data from a file with no more than 256 blocks is 
accessed after being tossed from the ARC.


The fix for this problem may be more feedback between the ARC and the 
zfetch code.  Or it may make sense to restart the prefetch stream 
after some time has passed or perhaps whenever there's a miss on a 
block that was expected to have already been prefetched?


On a Thumper running Nevada build 118, the first pass of this test 
takes 2 minutes 50 seconds and the second pass takes 5 minutes 22 
seconds.  If dmu_zfetch_find() is modified to restart the prefetch 
stream when the requested offset is 0 and more than 2 seconds has 
passed since the stream was last accessed, then the time needed for the 
second pass is reduced to 2 minutes 24 seconds.


Additional investigation is currently taking place to determine if 
another solution makes more sense.  And more testing will be needed to 
see what effect this change has on other prefetch patterns.


6412053 is a related CR which mentions that the zfetch code may not be 
issuing I/O at a sufficient pace.  This behavior is also seen on a 
Thumper running the test script in CR 6859997 since, even when 
prefetch is ramping up as expected, less than half of the available 
I/O bandwidth is being used.  More aggressive file prefetching could, 
however, increase memory pressure as described in CRs 6258102 
and 6469558.



-- Rich


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-28 Thread Bob Friesenhahn

On Tue, 28 Jul 2009, Rich Morris wrote:


The fix for this problem may be more feedback between the ARC and the zfetch 
code.  Or it may make sense to restart the prefetch stream after some time 
has passed or perhaps whenever there's a miss on a block that was expected to 
have already been prefetched?


Regarding this approach of waiting for a prefetch miss, this seems 
like it would produce an uneven flow of data to the application and 
not ensure that data is always available when the application goes to 
read it.  A stutter is likely to produce at least a 10ms gap (and 
possibly far greater) while the application is blocked in read() 
waiting for data.  Since zfs blocks are large, stuttering becomes 
expensive, and if the application itself needs to read ahead 128K in 
order to avoid the stutter, then it consumes memory in an expensive 
non-sharable way.  In the ideal case, zfs will always stay one 128K 
block ahead of the application's requirement and the unconsumed data 
will be cached in the ARC where it can be shared with other processes.
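
(Rough arithmetic, not from the original post: an 8M file is about 63 
128K blocks, so if every block stuttered for 10ms that alone would add 
roughly 630ms per file, several times the ~100ms it would take just to 
stream 8M off a single ~80MB/s disk.)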


For an application with real-time data requirements, it is definitely 
desireable not to stutter at all if possible.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-28 Thread Bob Friesenhahn

On Tue, 28 Jul 2009, Rich Morris wrote:


6412053 is a related CR which mentions that the zfetch code may not be 
issuing I/O at a sufficient pace.  This behavior is also seen on a Thumper 
running the test script in CR 6859997 since, even when prefetch is ramping up 
as expected, less than half of the available I/O bandwidth is being used. 
Although more aggressive file prefetching could increase memory pressure as 
described in CRs 6258102 and 6469558.


It is good to see this analysis.  Certainly the optimum prefetching 
required for an Internet video streaming server (with maybe 300 
kilobits/second per stream) is radically different than what is 
required for uncompressed 2K preview (8MB/frame) of motion picture 
frames (320 megabytes/second per stream) but zfs should be able to 
support both.


Besides real-time analysis based on current stream behavior and 
memory, it would be useful to maintain some recent history for the 
whole pool so that a pool which is usually used for 1000 slow-speed 
video streams behaves differently by default than one used for one or 
two high-speed video streams.  With this bit of hint information, 
files belonging to a pool recently producing high-speed streams can be 
ramped up quickly while files belonging to a pool which has recently 
fed low-speed streams can be ramped up more conservatively (until 
proven otherwise) in order to not flood memory and starve the I/O 
needed by other streams.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-28 Thread Rich Morris

On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:


Sun has opened internal CR 6859997.  It is now in Dispatched state at High 
priority.


CR 6859997 has been accepted and is actively being worked on.  The 
following info has been added to that CR:


This is a problem with the ZFS file prefetch code (zfetch) in dmu_zfetch.c.  
The test script provided by the submitter (thanks Bob!) does no file 
prefetching the second time through each file.  This problem exists in ZFS in 
Solaris 10, Nevada, and OpenSolaris.

This test script creates 3000 files each 8M long so the amount of data (24G) is 
greater than the amount of memory (16G on a Thumper).  With the default 
blocksize of 128k, each of the 3000 files has 63 blocks.  The first time 
through, zfetch ramps up a single prefetch stream normally.  But the second 
time through, dmu_zfetch() calls dmu_zfetch_find(), which thinks that the data 
has already been prefetched, so no additional prefetching is started.

This problem is not seen with 500 files each 48M in length (still 24G of data).  In that 
case there's still only one prefetch stream but it is reclaimed when one of the requested 
offsets is not found.  The reason it is not found is that the stream "strided" the 
first time through after reaching the zfetch cap, which is 256 blocks.  Files with no 
more than 256 blocks don't require a stride.  So this problem will only be seen when the 
data from a file with no more than 256 blocks is accessed after being tossed from the ARC.

The fix for this problem may be more feedback between the ARC and the zfetch 
code.  Or it may make sense to restart the prefetch stream after some time has 
passed or perhaps whenever there's a miss on a block that was expected to have 
already been prefetched?

On a Thumper running Nevada build 118, the first pass of this test takes 2 
minutes 50 seconds and the second pass takes 5 minutes 22 seconds.  If 
dmu_zfetch_find() is modified to restart the prefetch stream when the requested 
offset is 0 and more than 2 seconds has passed since the stream was last 
accessed, then the time needed for the second pass is reduced to 2 minutes 24 
seconds.

Additional investigation is currently taking place to determine if another 
solution makes more sense.  And more testing will be needed to see what effect 
this change has on other prefetch patterns.

6412053 is a related CR which mentions that the zfetch code may not be issuing 
I/O at a sufficient pace.  This behavior is also seen on a Thumper running the 
test script in CR 6859997 since, even when prefetch is ramping up as expected, 
less than half of the available I/O bandwidth is being used.  More aggressive 
file prefetching could, however, increase memory pressure as described in CRs 
6258102 and 6469558.


-- Rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-22 Thread Bob Friesenhahn

On Wed, 22 Jul 2009, Roch wrote:


HI Bob did you consider running the 2 runs with

echo zfs_prefetch_disable/W0t1 | mdb -kw

and see if performance is constant between the 2 runs (and low).
That would help clear the cause a bit. Sorry, I'd do it for
you but since you have the setup etc...

Revert with :

echo zfs_prefetch_disable/W0t0 | mdb -kw

-r


I see that if I update my test script so that prefetch is disabled 
before the first cpio is executed, the read performance of the first 
cpio reported by 'zpool iostat' is similar to what has been normal for 
the second cpio case (i.e. 32MB/second). This seems to indicate that 
prefetch is entirely disabled if the file has ever been read before. 
However, there is a new wrinkle in that the second cpio completes 
twice as fast with prefetch disabled even though 'zpool iostat' 
indicates the same consistent throughput.  The difference goes away if 
I triple the number of files.


With 3000 8.2MB files:
Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
14443520 blocks

real3m41.61s
user0m0.44s
sys 0m8.12s

Doing second 'cpio -C 131072 -o > /dev/null'
14443520 blocks

real1m50.12s
user0m0.42s
sys 0m7.21s

Now if I increase the number of files to 9000 8.2MB files:

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
144000768 blocks

real35m51.47s
user0m4.46s
sys 1m20.11s

Doing second 'cpio -C 131072 -o > /dev/null'
144000768 blocks

real35m22.41s
user0m4.40s
sys 1m14.22s

Notice that with 3X the files, the throughput is dramatically reduced 
and the time is the same for both cases.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-22 Thread Brad Diggs
Have you considered running your script with ZFS pre-fetching disabled 
altogether to see if the results are consistent between runs?

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 15, 2009, at 9:59 AM, Bob Friesenhahn wrote:


On Wed, 15 Jul 2009, Ross wrote:

Yes, that makes sense.  For the first run, the pool has only just  
been mounted, so the ARC will be empty, with plenty of space for  
prefetching.


I don't think that this hypothesis is quite correct.  If you use  
'zpool iostat' to monitor the read rate while reading a large  
collection of files with total size far larger than the ARC, you  
will see that there is no fall-off in read performance once the ARC  
becomes full.  The performance problem occurs when there is still  
metadata cached for a file but the file data has since been expunged  
from the cache.  The implication here is that zfs speculates that  
the file data will be in the cache if the metadata is cached, and  
this results in a cache miss as well as disabling the file read-ahead  
algorithm.  You would not want to do read-ahead on data that  
you already have in a cache.


Recent OpenSolaris seems to take a 2X performance hit rather than  
the 4X hit that Solaris 10 takes.  This may be due to improvement of  
existing algorithm function performance (optimizations) rather than  
a related design improvement.


I wonder if there is any tuning that can be done to counteract  
this? Is there any way to tell ZFS to bias towards prefetching  
rather than preserving data in the ARC?  That may provide better  
performance for scripts like this, or for random access workloads.


Recent zfs development focus has been on how to keep prefetch from  
damaging applications like database where prefetch causes more data  
to be read than is needed.  Since OpenSolaris now apparently  
includes an option setting which blocks file data caching and  
prefetch, this seems to open the door for use of more aggressive  
prefetch in the normal mode.


In summary, I agree with Richard Elling's hypothesis (which is the  
same as my own).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-20 Thread Brent Jones
On Mon, Jul 20, 2009 at 7:52 PM, Bob
Friesenhahn wrote:
> On Mon, 20 Jul 2009, Marion Hakanson wrote:

>
> It is definitely real.  Sun has opened internal CR 6859997.  It is now in
> Dispatched state at High priority.
>

Is there a way we can get a Sun person on this list to supply a little
bit more info on that CR?
Seems there's a lot of people bitten by this, from low-end to extremely
high-end hardware.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-20 Thread Bob Friesenhahn

On Mon, 20 Jul 2009, Marion Hakanson wrote:


Bob, have you tried changing your benchmark to be multithreaded?  It
occurs to me that maybe a single cpio invocation is another bottleneck.
I've definitely experienced the case where a single bonnie++ process was
not enough to max out the storage system.


It is likely that adding more cpios would cause more data to be read, 
but it would also thrash the disks with many more conflicting IOPS.



I'm not suggesting that the bug you're demonstrating is not real.  It's


It is definitely real.  Sun has opened internal CR 6859997.  It is now 
in Dispatched state at High priority.



that points out a problem.  Rather, I'm thinking that maybe the timing
comparisons between low-end and high-end storage systems on this particular
test are not revealing the whole story.


The similarity of performance between the low-end and high-end storage 
systems is a sign that the rotating rust is not a whole lot faster on 
the high-end storage systems.  Since zfs is failing to use pre-fetch, 
only one (or maybe two) disks are accessed at a time.  If more read 
I/Os are issued in parallel, then the data read rate will be vastly 
higher on the higher-end systems.


With my 12 disk array and a large sequential read, zfs can issue 12 
requests for 128K at once and since it can also queue pending I/Os, it 
can request many more than that.  Care is required since over-reading 
will penalize the system.  It is not an easy thing to get right.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-20 Thread Marion Hakanson
bfrie...@simple.dallas.tx.us said:
> No.  I am suggesting that all Solaris 10 (and probably OpenSolaris  systems)
> currently have a software-imposed read bottleneck which  places a limit on
> how well systems will perform on this simple  sequential read benchmark.
> After a certain point (which is  unfortunately not very high), throwing more
> hardware at the problem  does not result in any speed improvement.  This is
> demonstrated by  Scott Lawson's little two disk mirror almost producing the
> same  performance as our much more exotic setups. 

Apologies for reawakening this thread -- I was away last week.

Bob, have you tried changing your benchmark to be multithreaded?  It
occurs to me that maybe a single cpio invocation is another bottleneck.
I've definitely experienced the case where a single bonnie++ process was
not enough to max out the storage system.

I'm not suggesting that the bug you're demonstrating is not real.  It's
clear that subsequent runs on the same system show the degradation, and
that points out a problem.  Rather, I'm thinking that maybe the timing
comparisons between low-end and high-end storage systems on this particular
test are not revealing the whole story.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-16 Thread Bob Friesenhahn
I have received email that Sun CR numbers 6861397 & 6859997 have been 
created to get this performance problem fixed.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-16 Thread James Andrewartha
On Sun, 2009-07-12 at 16:38 -0500, Bob Friesenhahn wrote:
> In order to raise visibility of this issue, I invite others to see if 
> they can reproduce it in their ZFS pools.  The script at
> 
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Here's the results from two machines, the first has 12x400MHz US-II
CPUs, 11GB of RAM and the disks are 18GB 10krpm SCSI in a split D1000:

System Configuration:  Sun Microsystems  sun4u 8-slot Sun Enterprise
4000/5000
System architecture: sparc
System release level: 5.11 snv_101
CPU ISA list: sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8 
sparcv8-fsmuld sparcv7 sparc

Pool configuration:
  pool: space
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h22m with 0 errors on Mon Jul 13 17:18:55
2009
config:

NAME STATE READ WRITE CKSUM
spaceONLINE   0 0 0
  mirror ONLINE   0 0 0
c0t3d0   ONLINE   0 0 0
c2t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c0t2d0   ONLINE   0 0 0
c2t10d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c0t4d0   ONLINE   0 0 0
c2t12d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c0t5d0   ONLINE   0 0 0
c2t13d0  ONLINE   1 0 0  128K repaired

errors: No known data errors

zfs create space/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /space/zfscachetest 
...
Done!
zfs unmount space/zfscachetest
zfs mount space/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real11m40.67s
user0m20.32s
sys 5m27.16s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real31m29.42s
user0m19.31s
sys 6m46.39s

Feel free to clean up with 'zfs destroy space/zfscachetest'.

The second has 2x1.2GHz US-III+, 4GB RAM and 10krpm FC disks on a single
loop.

System Configuration:  Sun Microsystems  sun4u Sun Fire 480R
System architecture: sparc
System release level: 5.11 snv_97
CPU ISA list: sparcv9+vis2 sparcv9+vis sparcv9 sparcv8plus+vis2 sparcv8plus+vis 
sparcv8plus sparcv8 sparcv8-fsmuld sparcv7 sparc

Pool configuration:
  pool: space
 state: ONLINE
 scrub: none requested
config: 

NAME STATE READ WRITE CKSUM
spaceONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t34d0  ONLINE   0 0 0
c1t48d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t35d0  ONLINE   0 0 0
c1t49d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t36d0  ONLINE   0 0 0
c1t51d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t33d0  ONLINE   0 0 0
c1t52d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t38d0  ONLINE   0 0 0
c1t53d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t39d0  ONLINE   0 0 0
c1t54d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t40d0  ONLINE   0 0 0
c1t55d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t41d0  ONLINE   0 0 0
c1t56d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t42d0  ONLINE   0 0 0
c1t57d0  ONLINE   0 0 0
logs ONLINE   0 0 0
  c1t50d0ONLINE   0 0 0

errors: No known data errors

zfs create space/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /space/zfscachetest 
...
Done!
zfs unmount space/zfscachetest
zfs mount space/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real5m45.66s
user0m5.63s
sys 1m14.66s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real15m29.42s
user0m5.65s
sys 1m37.83s

Feel free to clean up with 'zfs destroy space/zfscachetest'.

James Andrewartha

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Ross
Aaah, ok, I think I understand now.  Thanks Richard.

I'll grab the updated test and have a look at the ARC ghost results when I get 
back to work tomorrow.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Richard Elling wrote:


heh. What you would be looking for is evidence of prefetching.  If 
there is a lot of prefetching, the actv will tend to be high and 
latencies relatively low.  If there is no prefetching, actv will be 
low and latencies may be higher. This also implies that if you use 
IDE disks, which cannot handle multiple outstanding I/Os, the 
performance will look similar for both runs.


Ok, here are some stats for the "poor" (initial "USB" rates) and 
"terrible" (sub-"USB" rates) cases.


"poor" (29% busy) iostat:

extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
    0.0    1.2     0.0   11.4  0.0  0.0    0.0    4.5   0   0 c1t1d0
   91.2    0.0 11654.7    0.0  0.0  0.8    0.0    9.2   0  27 c4t600A0B80003A8A0B096147B451BEd0
   95.0    0.0 12160.3    0.0  0.0  0.9    0.0    9.9   0  29 c4t600A0B800039C9B50A9C47B4522Dd0
   96.4    0.0 12333.1    0.0  0.0  0.9    0.0    9.5   0  29 c4t600A0B800039C9B50AA047B4529Bd0
   96.8    0.0 12377.9    0.0  0.0  0.9    0.0    9.5   0  30 c4t600A0B80003A8A0B096647B453CEd0
  100.4    0.0 12845.1    0.0  0.0  1.0    0.0    9.5   0  29 c4t600A0B800039C9B50AA447B4544Fd0
   93.4    0.0 11949.1    0.0  0.0  0.8    0.0    9.0   0  28 c4t600A0B80003A8A0B096A47B4559Ed0
   91.5    0.0 11705.9    0.0  0.0  0.9    0.0    9.7   0  28 c4t600A0B800039C9B50AA847B45605d0
   91.4    0.0 11680.3    0.0  0.0  0.9    0.0   10.1   0  29 c4t600A0B80003A8A0B096E47B456DAd0
   88.9    0.0 11366.7    0.0  0.0  0.9    0.0    9.7   0  27 c4t600A0B800039C9B50AAC47B45739d0
   94.3    0.0 12045.5    0.0  0.0  0.9    0.0    9.9   0  29 c4t600A0B800039C9B50AB047B457ADd0
   96.5    0.0 12339.5    0.0  0.0  0.9    0.0    9.3   0  28 c4t600A0B80003A8A0B097347B457D4d0
   87.9    0.0 11232.7    0.0  0.0  0.9    0.0   10.4   0  29 c4t600A0B800039C9B50AB447B4595Fd0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c2t202400A0B83A8A0Bd31
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t202500A0B83A8A0Bd31
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 freddy:vold(pid508)

"terrible" (8% busy) iostat:

extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
    0.0    1.8     0.0    1.0  0.0  0.0    0.0   26.6   0   1 c1t1d0
   26.8    0.0  3430.4    0.0  0.0  0.1    0.0    2.9   0   8 c4t600A0B80003A8A0B096147B451BEd0
   21.0    0.0  2688.0    0.0  0.0  0.1    0.0    3.9   0   8 c4t600A0B800039C9B50A9C47B4522Dd0
   24.0    0.0  3059.6    0.0  0.0  0.1    0.0    3.4   0   8 c4t600A0B800039C9B50AA047B4529Bd0
   27.6    0.0  3532.8    0.0  0.0  0.1    0.0    3.2   0   9 c4t600A0B80003A8A0B096647B453CEd0
   20.8    0.0  2662.4    0.0  0.0  0.1    0.0    3.1   0   6 c4t600A0B800039C9B50AA447B4544Fd0
   26.5    0.0  3392.0    0.0  0.0  0.1    0.0    2.6   0   7 c4t600A0B80003A8A0B096A47B4559Ed0
   20.6    0.0  2636.8    0.0  0.0  0.1    0.0    3.0   0   6 c4t600A0B800039C9B50AA847B45605d0
   22.9    0.0  2931.2    0.0  0.0  0.1    0.0    3.8   0   9 c4t600A0B80003A8A0B096E47B456DAd0
   21.4    0.0  2739.2    0.0  0.0  0.1    0.0    3.5   0   7 c4t600A0B800039C9B50AAC47B45739d0
   23.1    0.0  2944.4    0.0  0.0  0.1    0.0    3.7   0   9 c4t600A0B800039C9B50AB047B457ADd0
   24.9    0.0  3187.2    0.0  0.0  0.1    0.0    3.4   0   8 c4t600A0B80003A8A0B097347B457D4d0
   28.3    0.0  3622.4    0.0  0.0  0.1    0.0    2.8   0   8 c4t600A0B800039C9B50AB447B4595Fd0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c6t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c2t202400A0B83A8A0Bd31
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t202500A0B83A8A0Bd31
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 freddy:vold(pid508)

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Richard Elling



Bob Friesenhahn wrote:

On Wed, 15 Jul 2009, Richard Elling wrote:


Unfortunately, "zpool iostat" doesn't really tell you anything about
performance.  All it shows is bandwidth. Latency is what you need
to understand performance, so use iostat.


You are still thinking about this as if it was a hardware-related 
problem when it is clearly not. Iostat is useful for analyzing 
hardware-related problems in the case where the workload is too much 
for the hardware, or the hardware is non-responsive. Anyone who runs 
this crude benchmark will discover that iostat shows hardly any disk 
utilization at all, latencies are low, and read I/O rates are low 
enough that they could be satisfied by a portable USB drive.  You can 
even observe the blinking lights on the front of the drive array and 
see that it is lightly loaded.  This explains why a two disk mirror is 
almost able to keep up with a system with 40 fast SAS drives.


heh. What you would be looking for is evidence of prefetching.  If there
is a lot of prefetching, the actv will tend to be high and latencies
relatively low.  If there is no prefetching, actv will be low and
latencies may be higher. This also implies that if you use IDE disks,
which cannot handle multiple outstanding I/Os, the performance will look
similar for both runs.

Or, you could get more sophisticated and use a dtrace script to look at
the I/O behavior to determine the latency between contiguous I/O
requests. Something like iopattern is a good start, though it doesn't
try to measure the time between requests, it would be easy to add.
http://www.richardelling.com/Home/scripts-and-programs-1/iopattern
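
(A simple way to watch the actv and asvc_t columns mentioned above
while each cpio pass runs; a sketch, not from the original post:

# iostat -xnz 5

and compare the two runs.)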
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Richard Elling wrote:


Unfortunately, "zpool iostat" doesn't really tell you anything about
performance.  All it shows is bandwidth. Latency is what you need
to understand performance, so use iostat.


You are still thinking about this as if it was a hardware-related 
problem when it is clearly not. Iostat is useful for analyzing 
hardware-related problems in the case where the workload is too much 
for the hardware, or the hardware is non-responsive. Anyone who runs 
this crude benchmark will discover that iostat shows hardly any disk 
utilization at all, latencies are low, and read I/O rates are low 
enough that they could be satisfied by a portable USB drive.  You can 
even observe the blinking lights on the front of the drive array and 
see that it is lightly loaded.  This explains why a two disk mirror is 
almost able to keep up with a system with 40 fast SAS drives.


This is the opposite situation from the zfs writes which periodically 
push the hardware to its limits.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Richard Elling



Bob Friesenhahn wrote:

On Wed, 15 Jul 2009, Ross wrote:

Yes, that makes sense.  For the first run, the pool has only just 
been mounted, so the ARC will be empty, with plenty of space for 
prefetching.


I don't think that this hypothesis is quite correct.  If you use 
'zpool iostat' to monitor the read rate while reading a large 
collection of files with total size far larger than the ARC, you will 
see that there is no fall-off in read performance once the ARC becomes 
full.


Unfortunately, "zpool iostat" doesn't really tell you anything about
performance.  All it shows is bandwidth. Latency is what you need
to understand performance, so use iostat.

The performance problem occurs when there is still metadata cached for 
a file but the file data has since been expunged from the cache.  The 
implication here is that zfs speculates that the file data will be in 
the cache if the metadata is cached, and this results in a cache miss 
as well as disabling the file read-ahead algorithm.  You would not 
want to do read-ahead on data that you already have in a cache.


I realized this morning that what I posted last night might be
misleading to the casual reader. Clearly the first time through
the data is prefetched and misses the cache.  On the second
pass, it should also miss the cache, if it were a simple cache.
But the ARC tries to be more clever and has ghosts -- where
the data is no longer in cache, but the metadata is.  I suspect
the prefetching is not being used for the ghosts.  The arcstats
will show this. As benr blogs,
   "These Ghosts lists are magic. If you get a lot of hits to the
   ghost lists, it means that ARC is WAY too small and that
   you desperately need either more RAM or an L2 ARC
   device (likely, SSD). Please note, if you are considering
   investing in L2 ARC, check this FIRST."
http://www.cuddletech.com/blog/pivot/entry.php?id=979
This is the explicit case presented by your test. This also
explains why the entry from the system with an L2ARC
did not have the performance "problem."

Also, another test would be to have two large files.  Read from
one, then the other, then from the first again.  Capture arcstats
from between the reads and see if the haunting stops ;-)
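
(For example, a sketch, not from the original post: the ghost-list
counters can be pulled directly between the reads with

# kstat -p zfs:0:arcstats:mru_ghost_hits zfs:0:arcstats:mfu_ghost_hits

and compared before and after each pass.)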
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, My D. Truong wrote:


Here's an example of an OpenSolaris machine, 2008.11 upgraded to the 
117 devel release.  X4540, 32GB RAM.  The file count was bumped up 
to 9000 to be a little over double the RAM.


Your timings show a 3.1X hit so it appears that the OpenSolaris 
improvement is not as much as was assumed.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Ross wrote:

Yes, that makes sense.  For the first run, the pool has only just 
been mounted, so the ARC will be empty, with plenty of space for 
prefetching.


I don't think that this hypothesis is quite correct.  If you use 
'zpool iostat' to monitor the read rate while reading a large 
collection of files with total size far larger than the ARC, you will 
see that there is no fall-off in read performance once the ARC becomes 
full.  The performance problem occurs when there is still metadata 
cached for a file but the file data has since been expunged from the 
cache.  The implication here is that zfs speculates that the file data 
will be in the cache if the metadata is cached, and this results in a 
cache miss as well as disabling the file read-ahead algorithm.  You 
would not want to do read-ahead on data that you already have in a 
cache.


Recent OpenSolaris seems to take a 2X performance hit rather than the 
4X hit that Solaris 10 takes.  This may be due to optimizations that 
speed up the existing algorithm rather than to a design improvement.


I wonder if there is any tuning that can be done to counteract this? 
Is there any way to tell ZFS to bias towards prefetching rather than 
preserving data in the ARC?  That may provide better performance for 
scripts like this, or for random access workloads.


Recent zfs development focus has been on how to keep prefetch from 
damaging applications like databases, where prefetch causes more data to 
be read than is needed.  Since OpenSolaris now apparently includes an 
option setting which blocks file data caching and prefetch, this seems 
to open the door for use of more aggressive prefetch in the normal 
mode.
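
(I assume the option in question is the per-dataset primarycache property; 
if so, the database case is handled with something along the lines of the 
following, leaving ordinary datasets free to prefetch aggressively.  The 
dataset name is just an example.)

# cache only metadata for the database dataset; file data is then
# neither cached nor usefully prefetched for it
zfs set primarycache=metadata tank/oradata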


In summary, I agree with Richard Elling's hypothesis (which is the 
same as my own).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread My D. Truong
> It would be good to see results from a few
> OpenSolaris users running a 
> recent 64-bit kernel, and with fast storage to see if
> this is an 
> OpenSolaris issue as well.

Bob,

Here's an example of an OpenSolaris machine, 2008.11 upgraded to the 117 devel 
release.  X4540, 32GB RAM.  The file count was bumped up to 9000 to be a little 
over double the RAM.

r...@deviant:~# ./zfs-cache-test.ksh gauss
System Configuration: Sun Microsystems Sun Fire X4540
System architecture: i386
System release level: 5.11 snv_117
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 
i86

Pool configuration:
  pool: gauss
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
gauss   ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
c9t1d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c9t2d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
c9t3d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0
c8t4d0  ONLINE   0 0 0
c9t4d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c7t5d0  ONLINE   0 0 0
c8t5d0  ONLINE   0 0 0
c9t5d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c6t6d0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0
c8t6d0  ONLINE   0 0 0
c9t6d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c4t7d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0
c8t7d0  ONLINE   0 0 0
c9t7d0  ONLINE   0 0 0

errors: No known data errors

zfs create gauss/zfscachetest
Creating data file set (9000 files of 8192000 bytes) under /gauss/zfscachetest 
...
Done!
zfs unmount gauss/zfscachetest
zfs mount gauss/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
144000768 blocks

real9m15.87s
user0m5.16s
sys 1m29.32s

Doing second 'cpio -C 131072 -o > /dev/null'
144000768 blocks

real28m57.54s
user0m5.47s
sys 1m50.32s

Feel free to clean up with 'zfs destroy gauss/zfscachetest'.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Ross
Yes, that makes sense.  For the first run, the pool has only just been mounted, 
so the ARC will be empty, with plenty of space for prefetching.

On the second run however, the ARC is already full of the data that we just 
read, and I'm guessing that the prefetch code is less aggressive when there is 
already data in the ARC.  Which for normal use may be what you want - it's 
trying to keep things in the ARC in case they are needed.

However that does mean that ZFS prefetch is always going to suffer performance 
degradation on a live system, although early signs are that this might not be 
so severe in snv_117.

I wonder if there is any tuning that can be done to counteract this?  Is there 
any way to tell ZFS to bias towards prefetching rather than preserving data in 
the ARC?  That may provide better performance for scripts like this, or for 
random access workloads.

Also, could there be any generic algorithm improvements that could help.  Why 
should ZFS keep data in the ARC if it hasn't been used?  This script has 8GB 
files, but the ARC should be using at least 1GB of RAM.  That's a minimum of 
128 files in memory, none of which would have been read more than once.  If 
we're reading a new file now, prefetching should be able to displace any old 
object in the ARC that hasn't been used - in this case all 127 previous files 
should be candidates for replacement.

I wonder how that would interact with a L2ARC.  If that was fast enough I'd 
certainly want to allocate more of the ARC to prefetching.
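
(For anyone who wants to experiment, attaching an L2ARC device is a 
one-liner -- pool and device names below are only placeholders:)

# add an SSD as a cache (L2ARC) device to an existing pool
zpool add tank cache c2t5d0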

Finally, would it make sense for the ARC to always allow a certain percentage 
for prefetching, possibly with that percentage being tunable, allowing us to 
balance the needs of the two systems according to the expected usage?

Ross
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-15 Thread Joerg Schilling
Richard Elling  wrote:

> I think a picture is emerging that if you have enough RAM, the
> ARC is working very well. Which means that the ARC management
> is suspect.
>
> I propose the hypothesis that ARC misses are not prefetched.  The
> first time through, prefetching works.  For the second pass, ARC
> misses are not prefetched, so sequential reads go slower. 

You may be right: it may be that the cache is not being filled with new,
important data because it is already 100% full of unimportant data.


Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Richard Elling

I think a picture is emerging that if you have enough RAM, the
ARC is working very well. Which means that the ARC management
is suspect.

I propose the hypothesis that ARC misses are not prefetched.  The
first time through, prefetching works.  For the second pass, ARC
misses are not prefetched, so sequential reads go slower. 


For JBODs, the effect will be worse than for LUNs on a storage array
with lots of cache. benr's prefetch script will help shed light on this,
but apparently doesn't work for Solaris 10. Since the Solaris 10
source is not publicly available, someone with source access might
need to adjust it to match the Solaris 10 source.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Scott Lawson
This system has 32 GB of RAM so I will probably need to increase the 
data set size.


[r...@x tmp]#> ./zfs-cache-test.ksh nbupool
System Configuration:  Sun Microsystems  sun4v SPARC Enterprise T5220
System architecture: sparc
System release level: 5.10 Generic_141414-02
CPU ISA list: sparcv9+vis2 sparcv9+vis sparcv9 sparcv8plus+vis2 
sparcv8plus+vis sparcv8plus sparcv8 sparcv8-fsmuld sparcv7 sparc


Pool configuration:
 pool: nbupool
state: ONLINE
scrub: none requested
config:

   NAME STATE READ WRITE CKSUM
   nbupool  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t2d0   ONLINE   0 0 0
   c2t3d0   ONLINE   0 0 0
   c2t4d0   ONLINE   0 0 0
   c2t5d0   ONLINE   0 0 0
   c2t6d0   ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t7d0   ONLINE   0 0 0
   c2t8d0   ONLINE   0 0 0
   c2t9d0   ONLINE   0 0 0
   c2t10d0  ONLINE   0 0 0
   c2t11d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t12d0  ONLINE   0 0 0
   c2t13d0  ONLINE   0 0 0
   c2t14d0  ONLINE   0 0 0
   c2t15d0  ONLINE   0 0 0
   c2t16d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t17d0  ONLINE   0 0 0
   c2t18d0  ONLINE   0 0 0
   c2t19d0  ONLINE   0 0 0
   c2t20d0  ONLINE   0 0 0
   c2t21d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t22d0  ONLINE   0 0 0
   c2t23d0  ONLINE   0 0 0
   c2t24d0  ONLINE   0 0 0
   c2t25d0  ONLINE   0 0 0
   c2t26d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t27d0  ONLINE   0 0 0
   c2t28d0  ONLINE   0 0 0
   c2t29d0  ONLINE   0 0 0
   c2t30d0  ONLINE   0 0 0
   c2t31d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t32d0  ONLINE   0 0 0
   c2t33d0  ONLINE   0 0 0
   c2t34d0  ONLINE   0 0 0
   c2t35d0  ONLINE   0 0 0
   c2t36d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t37d0  ONLINE   0 0 0
   c2t38d0  ONLINE   0 0 0
   c2t39d0  ONLINE   0 0 0
   c2t40d0  ONLINE   0 0 0
   c2t41d0  ONLINE   0 0 0
 raidz1 ONLINE   0 0 0
   c2t42d0  ONLINE   0 0 0
   c2t43d0  ONLINE   0 0 0
   c2t44d0  ONLINE   0 0 0
   c2t45d0  ONLINE   0 0 0
   c2t46d0  ONLINE   0 0 0
   spares
 c2t47d0AVAIL  
 c2t48d0AVAIL  
 c2t49d0AVAIL  


errors: No known data errors

zfs create nbupool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/nbupool/zfscachetest ...

Done!
zfs unmount nbupool/zfscachetest
zfs mount nbupool/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m37.24s
user0m9.87s
sys 1m54.08s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real1m59.11s
user0m9.93s
sys 1m49.15s

Feel free to clean up with 'zfs destroy nbupool/zfscachetest'.

Scott Lawson wrote:

Bob,

Output of my run for you. System is an M3000 with 16 GB RAM and 1 zpool 
called test1
which is contained on a raid 1 volume on a 6140 with 7.50.13.10 
firmware on

the RAID controllers. RAID 1 is made up of two 146GB 15K FC disks.

This machine is brand new with a clean install of S10 05/09. It is 
destined to become an Oracle 10 server with

ZFS filesystems for zones and DB volumes.

[r...@xxx /]#> uname -a
SunOS xxx 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise
[r...@xxx /]#> cat /etc/release
  Solaris 10 5/09 s10s_u7wos_08 SPARC
  Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
Assembled 30 March 2009

[r...@xxx /]#> prtdiag -v | more
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise 
M3000 Server

System clock frequency: 1064 MHz
Memory size: 16384 Megabytes


Here is the run output for you.

[r...@xxx tmp]#> ./zfs-cache-test.ksh test1
zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/test1/zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -o 

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Scott Lawson



Bob Friesenhahn wrote:

On Wed, 15 Jul 2009, Scott Lawson wrote:


  NAME                                       STATE     READ WRITE CKSUM
  test1                                      ONLINE       0     0     0
    mirror                                   ONLINE       0     0     0
      c3t600A0B8000562264039B4A257E11d0  ONLINE       0     0     0
      c3t600A0B8000336DE204394A258B93d0  ONLINE       0     0     0
Each of these LUNS is a pair of 146GB 15K drives in a RAID1 on Crystal 
firmware on a 6140. Each LUN is 2km

apart in different data centres. 1 LUN where the server is, 1 remote.

Interestingly, by creating the mirror vdev the first run got faster, and 
the second much, much slower.  The second cpio
took an extra 2 minutes by virtue of it being a mirror. I ran the 
script once again prior to adding the mirror
and the results were pretty much the same as the first run posted. (plus 
or minus a couple of seconds, which
is to be expected as these LUNS are on prod arrays feeding other servers 
as well)


I will try these tests on some of my J4500's when I get a chance 
shortly. My interest is now piqued.




Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m25.13s
user0m2.67s
sys 0m28.40s


It is quite impressive that your little two disk mirror reads as fast 
as mega Sun systems with 38+ disks and striped vdevs to boot. Incredible!


Does this have something to do with your well-managed power and 
cooling? :-)

Maybe it is Bob, maybe it is. ;) haha.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Jorgen Lundman wrote:

You have some mighty pools there.  Something I find quite interesting is 
that those who have "mighty pools" generally obtain about the same data 
rate regardless of their relative degree of excessive "might". This causes 
me to believe that the Solaris kernel is throttling the read rate so that 
throwing more and faster hardware at the problem does not help.


Are you saying the X4500s we have are set up incorrectly, or done in a way 
which will make them run poorly?


No.  I am suggesting that all Solaris 10 (and probably OpenSolaris 
systems) currently have a software-imposed read bottleneck which 
places a limit on how well systems will perform on this simple 
sequential read benchmark.  After a certain point (which is 
unfortunately not very high), throwing more hardware at the problem 
does not result in any speed improvement.  This is demonstrated by 
Scott Lawson's little two disk mirror almost producing the same 
performance as our much more exotic setups.


Evidence suggests that SPARC systems are doing better than x86.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Tue, 14 Jul 2009, Ross wrote:


Hi Bob,

My guess is something like it's single threaded, with each file dealt with in 
order and requests being serviced by just one or two disks at a time.  With 
that being the case, an x4500 is essentially just running off 7200 rpm SATA 
drives, which really is nothing special.

A quick summary of some of the figures, with times normalized for 3000 files:

Sun x2200, single 500GB sata:   6m25.15s
Sun v490, raidz1 zpool of 6x146 sas drives on a j4200:  2m46.29s
Sun X4500, 7 sets of mirrored 500Gb SATA:  3m0.83s
Sun x4540, (unknown pool - Jorgen, what are you running?):   4m7.13s


This new one from Scott Lawson is incredible (but technically quite 
possible):


SPARC Enterprise M3000, single SAS mirror pair: 3m25.13s

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Scott Lawson wrote:


  NAME   STATE READ WRITE CKSUM
  test1  ONLINE   0 0 0
mirror   ONLINE   0 0 0
  c3t600A0B8000562264039B4A257E11d0  ONLINE   0 0 0
  c3t600A0B8000336DE204394A258B93d0  ONLINE   0 0 0

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m25.13s
user0m2.67s
sys 0m28.40s


It is quite impressive that your little two disk mirror reads as fast 
as mega Sun systems with 38+ disks and striped vdevs to boot. 
Incredible!


Does this have something to do with your well-managed power and 
cooling? :-)


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman




You have some mighty pools there.  Something I find quite interesting is 
that those who have "mighty pools" generally obtain about the same data 
rate regardless of their relative degree of excessive "might". This 
causes me to believe that the Solaris kernel is throttling the read rate 
so that throwing more and faster hardware at the problem does not help.





Are you saying the X4500s we have are set up incorrectly, or done in a 
way which will make them run poorly?


The servers came with no documentation nor advice. I have yet to find a 
good place that suggests configurations for dedicated x4500 NFS servers. 
We had to find out about NFSD_SERVERS when the first trouble came 
in. (Followed by 5 other tweaks and limits-reached troubles).


If Sun really wants to compete with NetApp, you'd think they would ship 
us hardware configured for NFS servers, not x4500s configured for 
desktops :(  They are cheap though! Nothing like being the Wal-Mart of Storage!


That is how the pools were created as well. Admittedly it may be down to 
our Vendor again.


Lund

--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Wed, 15 Jul 2009, Jorgen Lundman wrote:


Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m1.58s
user0m1.92s
sys 0m56.67s

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m5.51s
user0m1.70s
sys 0m29.53s



You have some mighty pools there.  Something I find quite interesting 
is that those who have "mighty pools" generally obtain about the same 
data rate regardless of their relative degree of excessive "might". 
This causes me to believe that the Solaris kernel is throttling the 
read rate so that throwing more and faster hardware at the problem 
does not help.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Scott Lawson
I added a second Lun identical in size as a mirror and reran test. 
Results are more in line with yours now.


./zfs-cache-test.ksh test1
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise 
M3000 Server

System architecture: sparc
System release level: 5.10 Generic_139555-08
CPU ISA list: sparcv9+vis2 sparcv9+vis sparcv9 sparcv8plus+vis2 
sparcv8plus+vis sparcv8plus sparcv8 sparcv8-fsmuld sparcv7 sparc


Pool configuration:
 pool: test1
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Wed Jul 15 
11:38:54 2009

config:

   NAME                                       STATE     READ WRITE CKSUM
   test1                                      ONLINE       0     0     0
     mirror                                   ONLINE       0     0     0
       c3t600A0B8000562264039B4A257E11d0  ONLINE       0     0     0
       c3t600A0B8000336DE204394A258B93d0  ONLINE       0     0     0


errors: No known data errors

zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/test1/zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m25.13s
user0m2.67s
sys 0m28.40s

Doing second 'cpio -C 131072 -o > /dev/null'

48000256 blocks

real8m53.05s
user0m2.69s
sys 0m32.83s

Feel free to clean up with 'zfs destroy test1/zfscachetest'.

Scott Lawson wrote:

Bob,

Output of my run for you. System is an M3000 with 16 GB RAM and 1 zpool 
called test1
which is contained on a raid 1 volume on a 6140 with 7.50.13.10 
firmware on

the RAID controllers. RAID 1 is made up of two 146GB 15K FC disks.

This machine is brand new with a clean install of S10 05/09. It is 
destined to become an Oracle 10 server with

ZFS filesystems for zones and DB volumes.

[r...@xxx /]#> uname -a
SunOS xxx 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise
[r...@xxx /]#> cat /etc/release
  Solaris 10 5/09 s10s_u7wos_08 SPARC
  Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
Assembled 30 March 2009

[r...@xxx /]#> prtdiag -v | more
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise 
M3000 Server

System clock frequency: 1064 MHz
Memory size: 16384 Megabytes


Here is the run output for you.

[r...@xxx tmp]#> ./zfs-cache-test.ksh test1
zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/test1/zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m48.94s
user0m21.58s
sys 0m44.91s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real6m39.87s
user0m21.62s
sys 0m46.20s

Feel free to clean up with 'zfs destroy test1/zfscachetest'.

Looks like a 25% performance loss for me. I was seeing around 80MB/s 
sustained

on the first run and around 60MB/s sustained on the 2nd.

/Scott.


Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue 
for a week now.  A 4X reduction in file read performance due to 
having read the file before is terrible, and of course the situation 
is considerably worse if the file was previously mmapped as well.  
Many of us have sent a lot of money to Sun and were not aware that 
ZFS is sucking the life out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. 
For example, I reproduced it on my Blade 2500 (SPARC) which uses a 
simple mirrored rpool.  On that system there is a 1.8X read slowdown 
from the file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh 



Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the 
pool to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real2m54.17s
user0m7.65s
sys 0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real11m54.65s
user0m7.70s
sys 0m35.06s

Feel free to 

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman


3 servers contained within.


Both the x4500 and x4540 are set up the way Sun shipped them to us, with minor 
changes (nfsservers=1024 etc). I was a little disappointed that they 
were identical in speed on round one, but the x4540 looked better in part 
2, which I suspect is probably just the OS version.




x4500 Sol 10 100% idle, but with 3.86T existing data. 16GB memory, 4 core.
x4500-03:/var/tmp# ./zfs-cache-test.ksh zpool1
System Configuration: Sun Microsystems Sun Fire X4500
System architecture: i386
System release level: 5.10 on10-public-x:s10idr_ldi:03/27/2009
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 
i386 i86


Pool configuration:
  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zpool1  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t0d0  ONLINE   0 0 0
c1t0d0  ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c8t0d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t4d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0
c8t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t5d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c7t5d0  ONLINE   0 0 0
c8t5d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t6d0  ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c6t6d0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0
c8t6d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t7d0  ONLINE   0 0 0
c1t7d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0
c8t7d0  ONLINE   0 0 0

errors: No known data errors

zfs create zpool1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/zpool1/zfscachetest ...

Done!
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m1.58s
user0m1.92s
sys 0m56.67s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m7.76s
user0m1.77s
sys 1m6.82s

Feel free to clean up with 'zfs destroy zpool1/zfscachetest'.






x4540 Sol svn 117, 100% idle, completely empty, 32GB memory, 8 core.

x4500-07:/var/tmp# ./zfs-cache-test.ksh zpool1
System Configuration: Sun Microsystems Sun Fire X4540
System architecture: i386
System release level: 5.11 snv_117
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 
i386 i86


Pool configuration:
  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zpool1  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t7d0  ONLINE   0 0 0
c4t7d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t0d0  ONLINE   0 0 0
c4t0d0  ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
  

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jakov Sosic
Hi!

Do you think that these issues will be seen on ZVOLs that are exported as 
iSCSI targets?
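
For context, the setup I mean is the plain zvol-backed target, roughly
like this (size and names made up):

# create a zvol and export it over iSCSI
zfs create -V 100G tank/vol0
zfs set shareiscsi=on tank/vol0
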
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Tue, 14 Jul 2009, Richard Elling wrote:


That is because file prefetch is dynamic.  benr wrote a good blog on the
subject and includes a DTrace script to monitor DMU prefetches.
http://www.cuddletech.com/blog/pivot/entry.php?id=1040


Apparently not dynamic enough.  The provided DTrace script has a 
syntax error when used for Solaris 10 U7.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Richard Elling

Bob Friesenhahn wrote:

On Tue, 14 Jul 2009, Ross wrote:

My guess is something like it's single threaded, with each file dealt 
with in order and requests being serviced by just one or two disks at 
a time.  With that being the case, an x4500 is essentially just 
running off 7200 rpm SATA drives, which really is nothing special.


Keep in mind that there is supposed to be file level read-ahead.  As 
an example, ZFS is able to read from my array at up to 551 MB/second 
when reading from a huge (64GB) file yet it is only managing 
145MB/second or so for these 8MB files sequentially accessed by cpio. 
This suggests that even for the initial read case zfs is not 
applying enough file level read-ahead (or applying it soon enough) to 
keep the disks busy.  8MB is still pretty big in the world of files. 
Perhaps it takes zfs a long time to decide that read-ahead is required.


I have yet to find a tunable for file-level read-ahead.  There are 
tunables for vdev-level read-ahead, but vdev read-ahead is pretty minor 
and increasing it may cause more harm than good.


That is because file prefetch is dynamic.  benr wrote a good blog on the
subject and includes a DTrace script to monitor DMU prefetches.
http://www.cuddletech.com/blog/pivot/entry.php?id=1040
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Halldor Runar Haflidason
On Tue Jul 14, 2009 at 11:09:32AM -0500, Bob Friesenhahn wrote:
> On Tue, 14 Jul 2009, Jorgen Lundman wrote:
>
>> I have no idea. I downloaded the script from Bob without modifications and 
>> ran it specifying only the name of our pool. Should I have changed 
>> something to run the test?
>
> If your system has quite a lot of memory, the number of files should be 
> increased to at least match the amount of memory.
>
>> We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2 running 
>> svn117 for ZFS quotas. Worth trying on both?
>
> It is useful to test as much as possible in order to fully understand the 
> situation.
>
> Since results often get posted without system details, the script is 
> updated to dump some system info and the pool configuration.  Refresh from
>
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

And mine: 

d...@pax:1512 $ pfexec ./zfs-cache-test.ksh tank
System Configuration: MICRO-STAR INTERNATIONAL CO.,LTD MS-7365
System architecture: i386
System release level: 5.11 snv_101b
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium
i486 i386 i86

Pool configuration:
  pool: tank
 state: ONLINE
 scrub: scrub completed after 3h30m with 0 errors on Tue Jul  7
19:38:45 2009
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4d0ONLINE   0 0 0
c5d0ONLINE   0 0 0
c7d0ONLINE   0 0 0

errors: No known data errors

zfs create tank/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under
/tank/zfscachetest ...
Done!
zfs unmount tank/zfscachetest
zfs mount tank/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real8m19.62s
user0m2.07s
sys 0m30.18s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real5m4.59s
user0m1.86s
sys 0m34.06s

Feel free to clean up with 'zfs destroy tank/zfscachetest'.

-- 
Regards,
Dóri
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Gaëtan Lehmann


Le 14 juil. 09 à 18:09, Bob Friesenhahn a écrit :


On Tue, 14 Jul 2009, Jorgen Lundman wrote:

I have no idea. I downloaded the script from Bob without  
modifications and ran it specifying only the name of our pool.  
Should I have changed something to run the test?


If your system has quite a lot of memory, the number of files should  
be increased to at least match the amount of memory.


We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2  
running svn117 for ZFS quotas. Worth trying on both?


It is useful to test as much as possible in order to fully  
understand the situation.


Since results often get posted without system details, the script is  
updated to dump some system info and the pool configuration.   
Refresh from


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh




Here is the result on another host with faster drives (sas 1 rpm)  
and solaris 10u7.


System Configuration: Sun Microsystems SUN FIRE X4150
System architecture: i386
System release level: 5.10 Generic_139556-08
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium  
i486 i386 i86


Pool configuration:
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: no known data errors

zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real4m56.84s
user0m1.72s
sys 0m28.48s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real13m48.19s
user0m2.07s
sys 0m44.45s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.


--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Tue, 14 Jul 2009, Ross wrote:

My guess is something like it's single threaded, with each file 
dealt with in order and requests being serviced by just one or two 
disks at a time.  With that being the case, an x4500 is essentially 
just running off 7200 rpm SATA drives, which really is nothing 
special.


Keep in mind that there is supposed to be file level read-ahead.  As 
an example, ZFS is able to read from my array at up to 551 MB/second 
when reading from a huge (64GB) file yet it is only managing 
145MB/second or so for these 8MB files sequentially accessed by cpio. 
This suggests that even for the initial read case zfs is not 
applying enough file level read-ahead (or applying it soon enough) to 
keep the disks busy.  8MB is still pretty big in the world of files. 
Perhaps it takes zfs a long time to decide that read-ahead is 
required.


I have yet to find a tunable for file-level read-ahead.  There are 
tunables for vdev-level read-ahead, but vdev read-ahead is pretty minor 
and increasing it may cause more harm than good.
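
(For reference, the vdev-level knobs I mean are the zfs_vdev_cache_* set,
which would go in /etc/system along these lines -- shown only so the names
are concrete, not as a recommendation:)

* /etc/system: vdev-level read-ahead (vdev cache) tunables, values illustrative
set zfs:zfs_vdev_cache_size=10485760
set zfs:zfs_vdev_cache_max=16384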



A quick summary of some of the figures, with times normalized for 3000 files:

Sun x2200, single 500GB sata:   6m25.15s
Sun v490, raidz1 zpool of 6x146 sas drives on a j4200:  2m46.29s
Sun X4500, 7 sets of mirrored 500Gb SATA:  3m0.83s
Sun x4540, (unknown pool - Jorgen, what are you running?):   4m7.13s


And mine:

Ultra 40-M2 / StorageTek 2540, 6 sets of mirrored 300GB SAS: 2m44.20s

I think that Jorgen implied that his system is using SAN storage with 
a mirror across two jumbo LUNs.


The raid pool of SAS drives is quicker again, but for a single 
threaded request that also seems about right.  The random read 
benefits of the mirror aren't going to take effect unless you run 
multiple reads in parallel.  What I suspect is helping here are the 
slightly better seek times of the SAS drives, along with slightly 
higher throughput due to the raid.


Once ZFS decides to apply file level read-ahead then it can issue many 
reads in parallel.  It should be able to keep at least six disks busy 
at once, leading to much better performance than we are seeing.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Angelo Rajadurai
Just FYI. I ran a slightly different version of the test. I used SSDs 
(for log & cache): 3 x 32GB SSDs, 2 mirrored for log and one for 
cache. The system is a 4150 with 12 GB of RAM. Here are the results
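
(For reference, the pool was put together along these lines, matching the
layout in the status output below:)

# mirrored data disks, mirrored SSD log, single SSD cache device
zpool create sdpool mirror c7t1d0 c7t3d0 log mirror c7t2d0 c8t5d0 cache c8t4d0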


$ pfexec ./zfs-cache-test.ksh sdpool
System Configuration:
System architecture: i386
System release level: 5.11 snv_111b
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium  
i486 i386 i86


Pool configuration:
  pool: sdpool
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 10  
11:33:01 2009

config:

NAMESTATE READ WRITE CKSUM
sdpool  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
logsONLINE   0 0 0
  mirrorONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c8t5d0  ONLINE   0 0 0
cache
  c8t4d0ONLINE   0 0 0

errors: No known data errors

zfs unmount sdpool/zfscachetest
zfs mount sdpool/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m27.06s
user0m2.05s
sys 0m30.14s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real2m47.32s
user0m2.09s
sys 0m32.32s

Feel free to clean up with 'zfs destroy sdpool/zfscachetest'.

-Angelo


On Jul 14, 2009, at 12:09 PM, Bob Friesenhahn wrote:


On Tue, 14 Jul 2009, Jorgen Lundman wrote:

I have no idea. I downloaded the script from Bob without  
modifications and ran it specifying only the name of our pool.  
Should I have changed something to run the test?


If your system has quite a lot of memory, the number of files should  
be increased to at least match the amount of memory.


We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2  
running svn117 for ZFS quotas. Worth trying on both?


It is useful to test as much as possible in order to fully  
understand the situation.


Since results often get posted without system details, the script is  
updated to dump some system info and the pool configuration.   
Refresh from


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Ross
Hi Bob,

My guess is something like it's single threaded, with each file dealt with in 
order and requests being serviced by just one or two disks at a time.  With 
that being the case, an x4500 is essentially just running off 7200 rpm SATA 
drives, which really is nothing special.

A quick summary of some of the figures, with times normalized for 3000 files:

Sun x2200, single 500GB sata:   6m25.15s
Sun v490, raidz1 zpool of 6x146 sas drives on a j4200:  2m46.29s
Sun X4500, 7 sets of mirrored 500Gb SATA:  3m0.83s
Sun x4540, (unknown pool - Jorgen, what are you running?):   4m7.13s

Taking my single SATA drive as a base, a pool of mirrored SATA is almost 
exactly twice as quick which makes sense if ZFS is reading the file off both 
drives at once.

The raid pool of SAS drives is quicker again, but for a single threaded request 
that also seems about right.  The random read benefits of the mirror aren't 
going to take effect unless you run multiple reads in parallel.  What I suspect 
is helping here are the slightly better seek times of the SAS drives, along 
with slightly higher throughput due to the raid.

What might be interesting would be to see the results off a ramdisk or SSD 
drive.
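
(A throwaway ramdisk pool for that sort of test can be put together like
this -- size and names arbitrary, and far too small for the full file set,
but enough for a quick look:)

# carve out a 4 GB ramdisk and build a scratch pool on it
ramdiskadm -a rdtest 4g
zpool create ramtank /dev/ramdisk/rdtest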

Ross
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread lists+zfs
On Tue, Jul 14, 2009 at 11:09:32AM -0500, Bob Friesenhahn wrote:
> On Tue, 14 Jul 2009, Jorgen Lundman wrote:
>
>> I have no idea. I downloaded the script from Bob without modifications 
>> and ran it specifying only the name of our pool. Should I have changed 
>> something to run the test?
>
> If your system has quite a lot of memory, the number of files should be 
> increased to at least match the amount of memory.
>
>> We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2 
>> running svn117 for ZFS quotas. Worth trying on both?
>
> It is useful to test as much as possible in order to fully understand  
> the situation.
>
> Since results often get posted without system details, the script is  
> updated to dump some system info and the pool configuration.  Refresh  
> from
>
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Whitebox Quad-core Phenom, 8G RAM, RAID-Z (3x1TB + 3x1.5TB) SATA drives via an 
AOC-USAS-L8i:

System Configuration: Gigabyte Technology Co., Ltd. GA-MA770-DS3
System architecture: i386
System release level: 5.11 snv_111b
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 
i86

Pool configuration:
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t7d0  ONLINE   0 0 0
c3t6d0  ONLINE   0 0 0
c3t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t2d0  ONLINE   0 0 0
c3t1d0  ONLINE   0 0 0
c3t0d0  ONLINE   0 0 0

errors: No known data errors

zfs create pool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /pool/zfscachetest 
...
Done!
zfs unmount pool/zfscachetest
zfs mount pool/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real4m59.33s
user0m21.83s
sys 2m56.05s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real8m28.11s
user0m22.66s
sys 3m13.26s

Feel free to clean up with 'zfs destroy pool/zfscachetest'.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

On Tue, 14 Jul 2009, Jorgen Lundman wrote:

I have no idea. I downloaded the script from Bob without modifications and 
ran it specifying only the name of our pool. Should I have changed something 
to run the test?


If your system has quite a lot of memory, the number of files should 
be increased to at least match the amount of memory.
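
The arithmetic is just RAM divided by the 8192000-byte file size; for 
example, on a 32 GB machine:

# files needed merely to match 32 GB of RAM; use roughly double to be safe
echo $(( 32 * 1024 * 1024 * 1024 / 8192000 ))    # ~4194, so something like 9000 files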


We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2 running 
svn117 for ZFS quotas. Worth trying on both?


It is useful to test as much as possible in order to fully understand 
the situation.


Since results often get posted without system details, the script is 
updated to dump some system info and the pool configuration.  Refresh 
from


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Bob Friesenhahn

Ross,

Please refresh your test script from the source.  The current script 
tells cpio to use 128k blocks and mentions the proper command in its 
progress message.  I have now updated it to display useful information 
about the system being tested, and to dump the pool configuration.
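
(Nothing exotic -- judging from the output headers, the extra reporting 
amounts to roughly the following, with the pool name passed in as before:)

prtdiag | head -1                                  # "System Configuration: ..."
echo "System architecture: $(uname -p)"
echo "System release level: $(uname -r) $(uname -v)"
echo "CPU ISA list: $(isainfo)"
zpool status $POOL       # pool configuration ($POOL = the script argument)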


It is really interesting seeing the various posted numbers.  This is 
as close as it comes to a common benchmark.  A sort of sanity check.


What is most interesting to me is the reported performance for those 
who paid for really fast storage hardware and are using what should be 
really fast storage configurations.  The reason why it is interesting 
is that there seems to be a hardware-independent cap on maximum read 
performance.  It seems that ZFS's read algorithm is rate-limiting the 
read so that regardless of how nice the hardware is, there is a peak 
read limit.


There can be no other explanation as to why an ideal configuration of 
"Thumper II" SAS type hardware is neck and neck with my own setup, and 
quite similar to another fast system as well.  My own setup is 
delivering less than 1/2 the performance that I would expect for the 
initial read (iozone says it can read 540MB/second from a huge file). 
Do the math and see if you think that zfs is giving you the 
read performance you expect based on your hardware.
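
Since cpio counts 512-byte blocks, the arithmetic for my 2540 numbers 
posted earlier works out to roughly:

# 48000247 blocks * 512 bytes ~= 24.6 GB of file data
echo "scale=1; 48000247 * 512 / 1000000 / 174.17" | bc    # first pass,  2m54.17s -> ~141 MB/s
echo "scale=1; 48000247 * 512 / 1000000 / 714.65" | bc    # second pass, 11m54.65s -> ~34 MB/s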


I think that we are encountering several bugs here.  We also have a 
general read bottleneck.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Ross
For what it's worth, I just repeated that test.  The timings are suspiciously 
similar.  This is very definitely a reproducible bug:

zfs unmount rc-pool/zfscachetest
zfs mount rc-pool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m45.69s
user0m10.22s
sys 0m53.29s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real15m47.48s
user0m10.58s
sys 1m10.96s
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman



I also ran this on my future RAID/NAS. Intel Atom 330 (D945GCLF2) dual 
core 1.6ghz, on a single HDD pool. svn_114, 64 bit, 2GB RAM.


bash-3.2$ ./zfs-cache-test.ksh zboot
zfs create zboot/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/zboot/zfscachetest ...

Done!
zfs unmount zboot/zfscachetest
zfs mount zboot/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m45.96s
user0m6.55s
sys 1m20.85s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m50.35s
user0m6.76s
sys 1m32.91s

Feel free to clean up with 'zfs destroy zboot/zfscachetest'.





Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue for 
a week now.  A 4X reduction in file read performance due to having read 
the file before is terrible, and of course the situation is considerably 
worse if the file was previously mmapped as well.  Many of us have sent 
a lot of money to Sun and were not aware that ZFS is sucking the life 
out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. For 
example, I reproduced it on my Blade 2500 (SPARC) which uses a simple 
mirrored rpool.  On that system there is a 1.8X read slowdown from the 
file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real2m54.17s
user0m7.65s
sys 0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real11m54.65s
user0m7.70s
sys 0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real13m3.91s
user2m43.04s
sys 9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real23m50.27s
user2m41.81s
sys 9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this bug.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Kurt Schreiner
On Tue, Jul 14, 2009 at 08:54:36AM +0200, Ross wrote:
> Ok, build 117 does seem a lot better.  The second run is slower,
> but not by such a huge margin.
Hm, I can't confirm this:

SunOS fred 5.11 snv_117 sun4u sparc SUNW,Sun-Fire-V440
The system has 16GB of Ram, pool is mirrored over two FUJITSU-MBA3147NC.

>-1007: sudo ksh zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (4000 files of 8192000 bytes) under /rpool/zfscachetest 
...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'tar to /dev/null'

real5m12.61s
user0m0.30s
sys 1m28.36s

Doing second 'tar to /dev/null'

real11m13.93s
user0m0.22s
sys 1m37.41s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.
 user=2.32 sec, sys=343.41 sec, elapsed=23:39.41 min, cpu use=24.3%

And here's what arcstat.pl has to say when starting the second read:

Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c  
11:53:26   11K   895  7410   854  10013  10013G   13G  
11:53:27   12K   832  6390   793  10013  10013G   13G  
11:53:28   11K   832  7390   793  10013  10013G   13G  
11:53:29   11K   832  7390   793  10013   7613G   13G  
11:53:30   12K   896  7420   854  10014  10013G   13G  
11:53:31   11K   832  7390   793  10013  10013G   13G  
11:53:32   11K   768  6360   732  10012  10013G   13G  
11:53:33   11K   832  7390   793  10013  10013G   13G  
11:53:347K   497  7   2533   244   99 4   1113G   13G  
11:53:355K   385  7   3857 00 0013G   13G  
11:53:365K   374  7   3747 00 0013G   13G  
11:53:375K   368  7   3687 00 0013G   13G  
11:53:384K   340  7   3407 00 0013G   13G  
11:53:395K   383  7   3837 00 0013G   13G  
11:53:405K   406  7   4067 00 0013G   13G  
11:53:414K   360  7   3607 00 0013G   13G  
11:53:424K   328  7   3287 00 0013G   13G  
11:53:434K   346  7   3467 00 0013G   13G  
11:53:444K   346  7   3467 00 0013G   13G  
11:53:454K   319  7   3197 00 0013G   13G  
11:53:474K   337  7   3377 00 0013G   13G  

I used tar in this run instead of cpio, just to give it a try...
[time (find . -type f | xargs -i tar cf /dev/null {} )]

Another run with Bob's new script: (rpool/zfscachetest not destroyed before
this run, so wall clock time below is lower)

>-1008: sudo ksh zfs-cache-test.ksh.1
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
64000512 blocks

real4m40.25s
user0m7.96s
sys 1m28.62s

Doing second 'cpio -C 131072 -o > /dev/null'
64000512 blocks

real11m0.08s
user0m7.37s
sys 1m38.58s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.
 user=15.35 sec, sys=187.87 sec, elapsed=15:43.65 min, cpu use=21.5%

Not much difference to the "tar"-run...

Kurt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman


Ah yes, my apologies! I haven't quite worked out why the OS X VNC server 
can't handle keyboard mappings. I have to copy/paste "@" even. As I 
pasted the output into my mail over VNC, it would have destroyed the 
(not very) "unusual" characters.



Ross wrote:

Aaah, nevermind, it looks like there's just a rogue 9 appeared in your output.  
It was just a standard run of 3,000 files.


--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Ross
Aaah, nevermind, it looks like there's just a rogue 9 appeared in your output.  
It was just a standard run of 3,000 files.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman


I have no idea. I downloaded the script from Bob without modifications 
and ran it specifying only the name of our pool. Should I have changed 
something to run the test?


We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2 running 
svn117 for ZFS quotas. Worth trying on both?


Lund




Ross wrote:

Jorgen,

Am I right in thinking the numbers here don't quite work.  48M blocks is just 
9,000 files isn't it, not 93,000?

I'm asking because I had to repeat a test earlier - I edited the script with 
vi, but when I ran it, it was still using the old parameters.  I ignored it as 
a one off, but I'm wondering if your test has done a similar thing.

Ross



x4540 running svn117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0
under 
/zpool1/zfscachetest ...

done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real4m7.13s
user0m9.27s
sys 0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real4m52.52s
user0m9.13s
sys 0m47.51s








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Ross
Jorgen,

Am I right in thinking the numbers here don't quite work.  48M blocks is just 
9,000 files isn't it, not 93,000?

I'm asking because I had to repeat a test earlier - I edited the script with 
vi, but when I ran it, it was still using the old parameters.  I ignored it as 
a one off, but I'm wondering if your test has done a similar thing.

Ross


> 
> x4540 running svn117
> 
> # ./zfs-cache-test.ksh zpool1
> zfs create zpool1/zfscachetest
> creating data file set 93000 files of 8192000 bytes0
> under 
> /zpool1/zfscachetest ...
> done1
> zfs unmount zpool1/zfscachetest
> zfs mount zpool1/zfscachetest
> 
> doing initial (unmount/mount) 'cpio -o . /dev/null'
> 48000247 blocks
> 
> real4m7.13s
> user0m9.27s
> sys 0m49.09s
> 
> doing second 'cpio -o . /dev/null'
> 48000247 blocks
> 
> real4m52.52s
> user0m9.13s
> sys 0m47.51s
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Ok, build 117 does seem a lot better.  The second run is slower, but not by 
such a huge margin. This was the end of the 98GB test:

Creating data file set (12000 files of 8192000 bytes) under /rpool/zfscachetest 
...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
192000985 blocks

real26m17.80s
user0m47.55s
sys 3m56.94s

Doing second 'cpio -o > /dev/null'
192000985 blocks

real27m14.35s
user0m46.84s
sys 4m39.85s
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Randy Jones
Bob: Sun v490, 4x1.35 processors, 32GB ram,  Solaris 10u7 working with a raidz1 
zpool made up of 6x146 sas drives on a j4200. Results of your running your 
script:

# zfs-cache-test.ksh pool2
zfs create pool2/zfscachetest
Creating data file set (6000 files of 8192000 bytes) under /pool2/zfscachetest 
...
Done!
zfs unmount pool2/zfscachetest
zfs mount pool2/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
96000512 blocks

real5m32.58s
user0m12.75s
sys 2m56.58s

Doing second 'cpio -C 131072 -o > /dev/null'
96000512 blocks

real17m26.68s
user0m12.97s
sys 4m34.33s

Feel free to clean up with 'zfs destroy pool2/zfscachetest'.
#

Same results as you are seeing.

Thanks Randy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Joerg Schilling wrote:


If you continue to use cpio and the cpio archive format, you force copying a
lot of data as the cpio archive format does use odd header sizes and starts
new files "unaligned" directly after the archive header.


Note that the output of cpio is sent to /dev/null in this test so it 
is only the reading part which is significant as long as cpio's CPU 
use is low.  Sun Service won't have a clue about 'star' since it is 
not part of Solaris 10.  It is best to stick with what they know so 
the problem report won't be rejected.


If star is truly more efficient than cpio, it may make the difference 
even more obvious.  What did you discover when you modified my test 
script to use 'star' instead?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mark Shellenbaum wrote:


I've opened the following bug to track this issue:

6859997 zfs caching performance problem

We need to track down if/when this problem was introduced or if it 
has always been there.


I think that it has always been there as long as I have been using ZFS 
(1-3/4 years).  Sometimes it takes a while for me to wake up and smell 
the coffee.


Meanwhile I have opened a formal service request (IBIS 71326296) with 
Sun Support.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 4:41 PM, Bob Friesenhahn wrote:
> On Mon, 13 Jul 2009, Jim Mauro wrote:
>
>> Bob - Have you filed a bug on this issue? I am not up to speed on this
>> thread, so I can not comment on whether or not there is a bug here, but you
>> seem to have a test case and supporting data. Filing a bug will get the
>> attention of ZFS engineering.
>
> No, I have not filed a bug report yet.  Any problem report to Sun's Service
> department seems to require at least one day's time.
>
> I was curious to see if recent OpenSolaris suffers from the same problem,
> but posted results (thus far) are not as conclusive as they are for Solaris
> 10.

It doesn't seem to be quite as bad as S10, but there is certainly a hit.

# /var/tmp/zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (400 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
6400033 blocks

real1m26.16s
user0m12.83s
sys 0m25.88s

Doing second 'cpio -o > /dev/null'
6400033 blocks

real2m44.46s
user0m12.59s
sys 0m24.34s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

# cat /etc/release
OpenSolaris 2009.06 snv_111b SPARC
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009

# uname -srvp
SunOS 5.11 snv_111b sparc

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Jim Mauro wrote:

Bob - Have you filed a bug on this issue? I am not up to speed on 
this thread, so I can not comment on whether or not there is a bug 
here, but you seem to have a test case and supporting data. Filing a 
bug will get the attention of ZFS engineering.


No, I have not filed a bug report yet.  Any problem report to Sun's 
Service department seems to require at least one day's time.


I was curious to see if recent OpenSolaris suffers from the same 
problem, but posted results (thus far) are not as conclusive as they 
are for Solaris 10.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Mike Gerdts wrote:
> >
> > Using cpio's -C option seems to not change the behavior for this bug,
> > but I did see a performance difference with the case where I hadn't
> > modified the zfs caching behavior.  That is, the performance of the
> > tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
> >> /dev/null".  At this point cpio was spending roughly 13% usr and 87%
> > sys.
>
> Interesting.  I just updated zfs-cache-test.ksh on my web site so that 
> it uses 131072 byte blocks.  I see a tiny improvement in performance 
> from doing this, but I do see a bit less CPU consumption so the CPU 
> consumption is essentially zero.  The bug remains. It seems best to 
> use ZFS's ideal block size so that issues don't get confused.

If you continue to use cpio and the cpio archive format, you force copying a 
lot of data as the cpio archive format does use odd header sizes and starts
new files "unaligned" directly after the archive header.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Mike Gerdts  wrote:

> Using cpio's -C option seems to not change the behavior for this bug,
> but I did see a performance difference with the case where I hadn't
> modified the zfs caching behavior.  That is, the performance of the
> tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
> >/dev/null".  At this point cpio was spending roughly 13% usr and 87%
> sys.

As mentioned before, a lot of the user CPU time from cpio is spent creating 
cpio archive headers, or is caused by the fact that cpio archives copy 
the file content to unaligned archive locations while the "tar" archive format
starts each new file on a modulo 512 offset in the archive. This requires a lot
of unneeded copying of file data. You can of course slightly modify parameters
even with cpio. I am not sure what you mean by "13% usr and 87%" as star
typically spends 6% of the wall clock time in user+sys CPU where the user 
CPU time is typically only 1.5% of the system CPU time.

In the "cached" case, it is obviously ZFS that's responsible for the slow down, 
regardless what cpio did in the other case.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Joerg Schilling wrote:
> >
> > cpio reads/writes in 8192 byte chunks from the filesystem.
>
> Yes, I was just reading the cpio manual page and see that.  I think 
> that re-reading the 128K zfs block 16 times to satisfy each request 
> for 8192 bytes explains the 16X performance loss when caching is 
> disabled.  I don't think that this is strictly a bug since it is what 
> the database folks are looking for.

cpio spends 1.6x more SYStem CPU time than star. This may mainly be a result
from the fact that cpio (when using the cpio archive format) reads/writes 512 
byte blocks from/to the archive file.

cpio by default spends 19x more USER CPU time than star. This seems to be a 
result of the inappropriate header structure with the cpio archive format and 
reblocking and cannot be easily changed (well you could use "scpio" - or in 
other words the "cpio" CLI personality of star, but this reduces the USER CPU
time only by 10%-50% compared to Sun cpio).

cpio is a program from the past that does not fit well in our current world.
The internal limits cannot be lifted without creating a new incompatible 
archive format.

In other words: if you use cpio for your work, you have to live with its 
problems ;-)

If you like to play with different parameter values (e.g. read sizes), cpio 
is unsuitable for tests. Star allows you to set big filesystem read sizes by
using the FIFO and playing with the FIFO size, and small filesystem read sizes by
switching off the FIFO and playing with the archive block size.
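
For anyone who wants to try that, a rough sketch of the two extremes follows;
the option spellings are from memory, so check star(1) before relying on them,
and the path is just the test dataset used elsewhere in this thread:

   # big filesystem reads through star's FIFO
   star -c f=/dev/null fs=128m -C /pool2/zfscachetest .

   # small filesystem reads with the FIFO switched off
   star -c f=/dev/null -no-fifo bs=8k -C /pool2/zfscachetest .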

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mark Shellenbaum

Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue for 
a week now.  A 4X reduction in file read performance due to having read 
the file before is terrible, and of course the situation is considerably 
worse if the file was previously mmapped as well.  Many of us have sent 
a lot of money to Sun and were not aware that ZFS is sucking the life 
out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. For 
example, I reproduced it on my Blade 2500 (SPARC) which uses a simple 
mirrored rpool.  On that system there is a 1.8X read slowdown from the 
file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest



I've opened the following bug to track this issue:

6859997 zfs caching performance problem

We need to track down if/when this problem was introduced or if it has 
always been there.



   -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Ross Walker wrote:


Have you tried limiting the ARC so it doesn't squash the page cache?


Yes, the ARC is limited to 10GB, leaving another 10GB for the OS and 
applications.  Resource limits are not the problem.  There is a ton of 
memory and CPU to go around.


Current /etc/system tunables:

set maxphys = 0x2
set zfs:zfs_arc_max = 0x28000
set zfs:zfs_write_limit_override = 0xea60
set zfs:zfs_vdev_max_pending = 5

Make sure page cache has enough for mmap plus buffers for bouncing between it 
and the ARC. I would say 1GB minimum, 2 to be safe.


In this testing mmap is not being used (cpio does not use mmap) so the 
page cache is not an issue.  It does become an issue for 'cp -r' 
though where we see the I/O be substantially (and essentially 
permanently) reduced even more for impacted files until the filesystem 
is unmounted.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mike Gerdts wrote:


Using cpio's -C option seems to not change the behavior for this bug,
but I did see a performance difference with the case where I hadn't
modified the zfs caching behavior.  That is, the performance of the
tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
>/dev/null".  At this point cpio was spending roughly 13% usr and 87%
sys.


Interesting.  I just updated zfs-cache-test.ksh on my web site so that 
it uses 131072 byte blocks.  I see a tiny improvement in performance 
from doing this, plus a bit less CPU consumption, so cpio's CPU use is 
now essentially zero.  The bug remains.  It seems best to use ZFS's 
ideal block size so that unrelated issues don't get confused with this one.
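
The read pass in the script presumably now amounts to something like the 
following (the dataset path is just an example; the exact script is at the 
URL posted earlier):

   cd /Sun_2540/zfscachetest
   time find . -type f | cpio -o -C 131072 > /dev/null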


Using an ARC monitoring script called 'arcstat.pl' I see a huge number 
of 'dmis' events when performance is poor.  The ARC size is 7GB, which 
is less than its prescribed cap of 10GB.
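
arcstat.pl is not part of Solaris 10; it only needs an interval (and an 
optional count), so one-second samples during the cpio pass are simply:

   ./arcstat.pl 1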


Better:

Time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
15:39:37   20K    1K      6    58    0    1K  100    19  100     7G   10G
15:39:38   19K    1K      5    57    0    1K  100    19  100     7G   10G
15:39:39   19K    1K      6    54    0    1K  100    18  100     7G   10G
15:39:40   17K    1K      6    51    0    1K  100    17  100     7G   10G

Worse:

Time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
15:43:24    4K   280      6   280    6     0    0     4  100     9G   10G
15:43:25    4K   277      6   277    6     0    0     4  100     9G   10G
15:43:26    4K   268      6   268    6     0    0     5  100     9G   10G
15:43:27    4K   259      6   259    6     0    0     4  100     9G   10G

An ARC stats summary from a tool called 'arc_summary.pl' is appended 
to this message.


Operation is quite consistent across the full span of files.  Since 
'dmis' is still low when things are "good" (and even when the ARC has 
surely cycled already) this leads me to believe that prefetch is 
mostly working and is usually satisfying read requests.  When things 
go bad I see that 'dmis' becomes 100% of the misses.  A hypothesis is 
that if zfs thinks that the data might be in the ARC (due to having 
seen the file before) it disables file prefetch entirely, 
assuming that it can retrieve the data from its cache.  Then once it 
finally determines that there is no cached data after all, it issues a 
read request.


Even the "better" read performance is 1/2 of what I would expect from 
my hardware and based on prior test results from 'iozone'.  More 
prefetch would surely help.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

System Memory:
 Physical RAM:  20470 MB
 Free Memory :  2511 MB
 LotsFree:  312 MB

ZFS Tunables (/etc/system):
 * set zfs:zfs_arc_max = 0x3
 set zfs:zfs_arc_max = 0x28000
 * set zfs:zfs_arc_max = 0x2
 set zfs:zfs_write_limit_override = 0xea60
 * set zfs:zfs_write_limit_override = 0xa000
 set zfs:zfs_vdev_max_pending = 5

ARC Size:
 Current Size: 8735 MB (arcsize)
 Target Size (Adaptive):   10240 MB (c)
 Min Size (Hard Limit):1280 MB (zfs_arc_min)
 Max Size (Hard Limit):10240 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:  95%9791 MB (p)
 Most Frequently Used Cache Size: 4%448 MB (c-p)

ARC Efficency:
 Cache Access Total: 827767314
 Cache Hit Ratio:      96%    800123657  [Defined State for buffer]
 Cache Miss Ratio:      3%    27643657   [Undefined State for Buffer]
 REAL Hit Ratio:       89%    743665046  [MRU/MFU Hits Only]

 Data Demand   Efficiency:    99%
 Data Prefetch Efficiency:    61%

CACHE HITS BY CACHE LIST:
  Anon:                        5%    47497010   [ New Customer, First Cache Hit ]
  Most Recently Used:         33%    271365449 (mru)   [ Return Customer ]
  Most Frequently Used:       59%    472299597 (mfu)   [ Frequent Customer ]
  Most Recently Used Ghost:    0%    1700764 (mru_ghost)   [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:  0%    7260837 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data:                73%    589582518
  Prefetch Data:               2%    20424879
  Demand Metadata:            17%    139111510
  Prefetch Metadata:           6%    51004750
CACHE MISSES BY DATA TYPE:
  Demand Data:                21%    5814459
  Prefetch Data:              46%    12788265
  Demand Metadata:            27%    7700169
	  Prefetch Metada

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross Walker
On Jul 13, 2009, at 2:54 PM, Bob Friesenhahn wrote:



On Mon, 13 Jul 2009, Brad Diggs wrote:

You might want to have a look at my blog on filesystem cache  
tuning...  It will probably help

you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html


Your post makes it sound like there is not a bug in the operating  
system.  It does not take long to see that there is a bug in the  
Solaris 10 operating system.  It is not clear if the same bug is  
shared by current OpenSolaris since it seems like it has not been  
tested.


Solaris 10 U7 reads files that it has not seen before at a constant  
rate regardless of the amount of file data it has already read.   
When the file is read a second time, the read is 4X or more slower.   
If reads were slowing down because the ARC was slow to expunge stale  
data, then that would be apparent on the first read pass.  However,  
the reads are not slowing down in the first read pass.  ZFS goes  
into the weeds if it has seen a file before but none of the file  
data is resident in the ARC.


It is pathetic that a Sun RAID array that I paid $21K for out of my  
own life savings is not able to perform better than the cheapo  
portable USB drives that I use for backup because of ZFS.  This is  
making me madder and madder by the minute.


Have you tried limiting the ARC so it doesn't squash the page cache?

Make sure page cache has enough for mmap plus buffers for bouncing  
between it and the ARC. I would say 1GB minimum, 2 to be safe.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 3:23 PM, Bob Friesenhahn wrote:
> On Mon, 13 Jul 2009, Joerg Schilling wrote:
>>
>> cpio reads/writes in 8192 byte chunks from the filesystem.
>
> Yes, I was just reading the cpio manual page and see that.  I think that
> re-reading the 128K zfs block 16 times to satisfy each request for 8192
> bytes explains the 16X performance loss when caching is disabled.  I don't
> think that this is strictly a bug since it is what the database folks are
> looking for.
>
> Bob

I did other tests with "dd bs=128k" and verified via truss that each
read(2) was returning 128K.  I thought I had seen excessive reads
there too, but now I can't reproduce that.  Creating another fs with
recordsize=8k seems to make this behavior go away - things seem to be
working as designed. I'll go update the (nota-)bug.
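
For anyone reproducing that, something along these lines should do (the 
dataset name is made up, and recordsize only affects files written after 
it is set):

   zfs create -o recordsize=8k testpool/rs8k
   # copy the test files in, then re-run the reads, e.g.
   dd if=/testpool/rs8k/file0 of=/dev/null bs=8k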

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 3:16 PM, Joerg Schilling wrote:
> Bob Friesenhahn  wrote:
>
>> On Mon, 13 Jul 2009, Mike Gerdts wrote:
>> >
>> > FWIW, I hit another bug if I turn off primarycache.
>> >
>> > http://defect.opensolaris.org/bz/show_bug.cgi?id=10004
>> >
>> > This causes really abysmal performance - but equally so for repeat runs!
>>
>> It is quite fascinating seeing the huge difference in I/O performance
>> from these various reports.  The bug you reported seems likely to be
>> that without at least a little bit of caching, it is necessary to
>> re-request the underlying 128K ZFS block several times as the program
>> does numerous smaller I/Os (cpio uses 10240 bytes?) across it.
>
> cpio reads/writes in 8192 byte chunks from the filesystem.
>
> BTW: star by default creates a shared memory based FIFO of 8 MB size and
> reads in the biggest possible size that would currently fit into the FIFO.
>
> Jörg

Using cpio's -C option seems to not change the behavior for this bug,
but I did see a performance difference with the case where I hadn't
modified the zfs caching behavior.  That is, the performance of the
tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
>/dev/null".  At this point cpio was spending roughly 13% usr and 87%
sys.

I haven't tried star, but I did see that I could also reproduce with
"cat $file | cat > /dev/null".  This seems like a worthless use of
cat, but it forces cat to actually copy data from input to output
unlike when cat can mmap input and output.  When it does that and
output is /dev/null Solaris is smart enough to avoid any reads.
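
In loop form the same reproducer looks roughly like this (test path assumed):

   for f in /testpool/zfscachetest/file*; do
       cat "$f" | cat > /dev/null
   done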

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Joerg Schilling wrote:


cpio reads/writes in 8192 byte chunks from the filesystem.


Yes, I was just reading the cpio manual page and see that.  I think 
that re-reading the 128K zfs block 16 times to satisfy each request 
for 8192 bytes explains the 16X performance loss when caching is 
disabled.  I don't think that this is strictly a bug since it is what 
the database folks are looking for.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Jim Mauro

Bob - Have you filed a bug on this issue?
I am not up to speed on this thread, so I can
not comment on whether or not there is a bug
here, but you seem to have a test case and supporting
data. Filing a bug will get the attention of ZFS
engineering.

Thanks,
/jim


Bob Friesenhahn wrote:

On Mon, 13 Jul 2009, Mike Gerdts wrote:


FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!


It is quite fascinating seeing the huge difference in I/O performance 
from these various reports.  The bug you reported seems likely to be 
that without at least a little bit of caching, it is necessary to 
re-request the underlying 128K ZFS block several times as the program 
does numerous smaller I/Os (cpio uses 10240 bytes?) across it. Totally 
disabling data caching seems best reserved for block-oriented 
databases which are looking for a substitute for directio(3C).


It is easily demonstrated that the problem seen in Solaris 10 (jury 
still out on OpenSolaris although one report has been posted) is due 
to some sort of confusion.  It is not due to delays caused by purging 
old data from the ARC.  If these delays were caused by purging data 
from the ARC, then 'zpool iostat' would start showing lower read 
performance once the ARC becomes full, but that is not the case.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Mike Gerdts wrote:
> >
> > FWIW, I hit another bug if I turn off primarycache.
> >
> > http://defect.opensolaris.org/bz/show_bug.cgi?id=10004
> >
> > This causes really abysmal performance - but equally so for repeat runs!
>
> It is quite fascinating seeing the huge difference in I/O performance 
> from these various reports.  The bug you reported seems likely to be 
> that without at least a little bit of caching, it is necessary to 
> re-request the underlying 128K ZFS block several times as the program 
> does numerous smaller I/Os (cpio uses 10240 bytes?) across it. 

cpio reads/writes in 8192 byte chunks from the filesystem.

BTW: star by default creates a shared memory based FIFO of 8 MB size and
reads in the biggest possible size that would currently fit into the FIFO.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mike Gerdts wrote:


FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!


It is quite fascinating seeing the huge difference in I/O performance 
from these various reports.  The bug you reported seems likely to be 
that without at least a little bit of caching, it is necessary to 
re-request the underlying 128K ZFS block several times as the program 
does numerous smaller I/Os (cpio uses 10240 bytes?) across it. 
Totally disabling data caching seems best reserved for block-oriented 
databases which are looking for a substitute for directio(3C).


It is easily demonstrated that the problem seen in Solaris 10 (jury 
still out on OpenSolaris although one report has been posted) is due 
to some sort of confusion.  It is not due to delays caused by purging 
old data from the ARC.  If these delays were caused by purging data 
from the ARC, then 'zpool iostat' would start showing lower read 
performance once the ARC becomes full, but that is not the case.
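
The easiest way to watch for that is a one-second zpool iostat on the pool 
under test while the cpio passes run, for example:

   zpool iostat Sun_2540 1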


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 9:34 AM, Bob Friesenhahn wrote:
> On Mon, 13 Jul 2009, Alexander Skwar wrote:
>>
>> Still on S10 U7 Sparc M4000.
>>
>> So I'm now inline with the other results - the 2nd run is WAY slower. 4x
>> as slow.
>
> It would be good to see results from a few OpenSolaris users running a
> recent 64-bit kernel, and with fast storage to see if this is an OpenSolaris
> issue as well.

Indeed it is.  Using ldoms with tmpfs as the backing store for virtual
disks, I see:

With S10u7:

# ./zfs-cache-test.ksh testpool
zfs create testpool/zfscachetest
Creating data file set (300 files of 8192000 bytes) under
/testpool/zfscachetest ...
Done!
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m30.35s
user0m9.90s
sys 0m19.81s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m43.95s
user0m9.67s
sys 0m17.96s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

# ./zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m31.14s
user0m10.09s
sys 0m20.47s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m40.24s
user0m9.68s
sys 0m17.86s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.


When I move the zpool to a 2009.06 ldom,

# /var/tmp/zfs-cache-test.ksh testpool
zfs create testpool/zfscachetest
Creating data file set (300 files of 8192000 bytes) under
/testpool/zfscachetest ...
Done!
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m30.09s
user0m9.58s
sys 0m19.83s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m44.21s
user0m9.47s
sys 0m18.18s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

# /var/tmp/zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m29.89s
user0m9.58s
sys 0m19.72s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m44.40s
user0m9.59s
sys 0m18.24s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

Notice in these runs that each time the usr+sys time of the first run
adds up to the elapsed time - the rate was choked by CPU.  This is
verified by "prstat -mL".  The second run seemed to be slow due to a
lock, since we had just demonstrated that the I/O path can do more (not
an I/O bottleneck), and "prstat -mL" shows cpio sleeping for a
significant amount of time.

FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!
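
For reference, that is the per-dataset property (valid values are all, 
metadata and none); the dataset name below is just the test filesystem:

   zfs set primarycache=none testpool/zfscachetest
   zfs get primarycache testpool/zfscachetest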

# /var/tmp/zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real4m21.57s
user0m9.72s
sys 0m36.30s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real4m21.56s
user0m9.72s
sys 0m36.19s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.


This bug report contains more detail of the configuration.  One thing
not covered in that bug report is that the S10u7 ldom has 2048 MB of
RAM and the 2009.06 ldom has 2024 MB of RAM.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread sean walmsley
Sun X4500 (thumper) with 16Gb of memory running Solaris 10 U6 with patches 
current to the end of Feb 2009.

Current ARC size is ~6GB.

ZFS filesystem created in a ~3.2 Tb pool consisting of 7 sets of mirrored 500Gb 
SATA drives.

I used 4000 8MB files for a total of 32GB.

run 1: ~140M/s average according to zpool iostat
real4m1.11s
user0m10.44s
sys 0m50.76s

run 2: ~37M/s average according to zpool iostat
real13m53.43s
user0m10.62s
sys 0m55.80s

A zfs unmount followed by a mount of the filesystem returned the performance to 
the run 1 case.

real3m58.16s
user0m11.54s
sys 0m51.95s

In summary, the second run performance drops to about 30% of the original run.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Brad Diggs wrote:

You might want to have a look at my blog on filesystem cache tuning...  It 
will probably help

you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html


Your post makes it sound like there is not a bug in the operating 
system.  It does not take long to see that there is a bug in the 
Solaris 10 operating system.  It is not clear if the same bug is 
shared by current OpenSolaris since it seems like it has not been 
tested.


Solaris 10 U7 reads files that it has not seen before at a constant 
rate regardless of the amount of file data it has already read.  When 
the file is read a second time, the read is 4X or more slower.  If 
reads were slowing down because the ARC was slow to expunge stale 
data, then that would be apparent on the first read pass.  However, 
the reads are not slowing down in the first read pass.  ZFS goes into 
the weeds if it has seen a file before but none of the file data is 
resident in the ARC.


It is pathetic that a Sun RAID array that I paid $21K for out of my 
own life savings is not able to perform better than the cheapo 
portable USB drives that I use for backup because of ZFS.  This is 
making me madder and madder by the minute.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Brad Diggs
You might want to have a look at my blog on filesystem cache  
tuning...  It will probably help

you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 4, 2009, at 2:48 AM, Phil Harman wrote:

ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
instead of the Solaris page cache. But mmap() uses the latter. So if  
anyone maps a file, ZFS has to keep the two caches in sync.


cp(1) uses mmap(2). When you use cp(1) it brings pages of the files  
it copies into the Solaris page cache. As long as they remain there  
ZFS will be slow for those files, even if you subsequently use  
read(2) to access them.


If you reboot, your cpio(1) tests will probably go fast again, until  
someone uses mmap(2) on the files again. I think tar(1) uses  
read(2), but from my iPod I can't be sure. It would be interesting  
to see how tar(1) performs if you run that test before cp(1) on a  
freshly rebooted system.
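
One way to check is to trace the syscalls each copier makes on a single 
large file (the paths here are placeholders):

   truss -t open,mmap,read -o /tmp/cp.tr  cp     /tank/bigfile /dev/null
   truss -t open,mmap,read -o /tmp/tar.tr tar cf /dev/null /tank/bigfile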


I have done some work with the ZFS team towards a fix, but it is  
only currently in OpenSolaris.


The other thing that slows you down is that ZFS only flushes to disk  
every 5 seconds if there are no synchronous writes. It would be  
interesting to see iostat -xnz 1 while you are running your tests.  
You may find the disks are writing very efficiently for one second  
in every five.


Hope this helps,
Phil

blogs.sun.com/pgdh


Sent from my iPod

On 4 Jul 2009, at 05:26, Bob Friesenhahn wrote:



On Fri, 3 Jul 2009, Bob Friesenhahn wrote:


Copy MethodData Rate
==
cpio -pdum75 MB/s
cp -r32 MB/s
tar -cf - . | (cd dest && tar -xf -)26 MB/s


It seems that the above should be ammended.  Running the cpio based  
copy again results in zpool iostat only reporting a read bandwidth  
of 33 MB/second.  The system seems to get slower and slower as it  
runs.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Interesting, I repeated the test on a few other machines running newer builds.  
First impressions are good:

snv_114, virtual machine, 1GB RAM, 30GB disk - 16% slowdown.
(Only 9GB free so I ran an 8GB test)

Doing initial (unmount/mount) 'cpio -o > /dev/null'
1683 blocks

real3m4.85s
user0m16.74s
sys 0m41.69s

Doing second 'cpio -o > /dev/null'
1683 blocks

real3m34.58s
user0m18.85s
sys 0m45.40s


And again on snv_117, Sun x2200, 40GB RAM, single 500GB sata disk:

First run (with the default 24GB set):

real6m25.15s
user0m11.93s
sys 0m54.93s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real1m9.97s
user0m12.17s
sys 0m57.80s

... d'oh!  At least I know the ARC is working :-)


The second run, with a 98GB test is running now, I'll post the results in the 
morning.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Alexander Skwar wrote:


Still on S10 U7 Sparc M4000.

So I'm now inline with the other results - the 2nd run is WAY slower. 4x
as slow.


It would be good to see results from a few OpenSolaris users running a 
recent 64-bit kernel, and with fast storage to see if this is an 
OpenSolaris issue as well.


It seems likely to be more evident with fast SAS disks or SAN devices 
rather than a few SATA disks since the SATA disks have more access 
latency.  Pools composed of mirrors should offer less read latency as 
well.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Alexander Skwar wrote:


This is a M4000 mit 32 GB RAM and two HDs in a mirror.


I think that you should edit the script to increase the file count 
since your RAM size is big enough to cache most of the data.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Alexander Skwar
Here's a more useful output, having set the number of
files to 6000 so that the data set is larger than the
amount of RAM.

--($ ~)-- time sudo ksh zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (6000 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
96000493 blocks

real8m44.82s
user0m46.85s
sys2m15.01s

Doing second 'cpio -o > /dev/null'
96000493 blocks

real29m15.81s
user0m45.31s
sys3m2.36s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

real48m40.890s
user1m47.192s
sys8m2.165s

Still on S10 U7 Sparc M4000.

So I'm now inline with the other results - the 2nd run is WAY slower. 4x
as slow.

Alexander
-- 
[[ http://zensursula.net ]]
[ Soc. => http://twitter.com/alexs77 | http://www.plurk.com/alexs77 ]
[ Mehr => http://zyb.com/alexws77 ]
[ Chat => Jabber: alexw...@jabber80.com | Google Talk: a.sk...@gmail.com ]
[ Mehr => AIM: alexws77 ]
[ $[ $RANDOM % 6 ] = 0 ] && rm -rf / || echo 'CLICK!'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Jorgen Lundman


x4540 running svn117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0 under 
/zpool1/zfscachetest ...

done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real4m7.13s
user0m9.27s
sys 0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real4m52.52s
user0m9.13s
sys 0m47.51s








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Daniel Rock

Hi,


Solaris 10U7, patched to the latest released patches two weeks ago.

Four ST31000340NS attached to two SI3132 SATA controller, RAIDZ1.

Selfmade system with 2GB RAM and an
  x86 (chipid 0x0 AuthenticAMD family 15 model 35 step 2 clock 2210 MHz)
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
processor.


On the first run throughput was ~110MB/s, on the second run only 80MB/s.

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real3m37.17s
user0m11.15s
sys 0m47.74s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real4m55.69s
user0m10.69s
sys 0m47.57s




Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Hey Bob,

Here are my results on a Dual 2.2Ghz Opteron, 8GB of RAM and 16 SATA disks 
connected via a Supermicro AOC-SAT2-MV8 (albeit with one dead drive).

Looks like a 5x slowdown to me:

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m46.45s
user0m10.29s
sys 0m58.27s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real15m50.62s
user0m10.54s
sys 1m11.86s

Ross
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Alexander Skwar
Bob,

On Sun, Jul 12, 2009 at 23:38, Bob Friesenhahn wrote:
> There has been no forward progress on the ZFS read performance issue for a
> week now.  A 4X reduction in file read performance due to having read the
> file before is terrible, and of course the situation is considerably worse
> if the file was previously mmapped as well.  Many of us have sent a lot of
> money to Sun and were not aware that ZFS is sucking the life out of our
> expensive Sun hardware.
>
> It is trivially easy to reproduce this problem on multiple machines. For
> example, I reproduced it on my Blade 2500 (SPARC) which uses a simple
> mirrored rpool.  On that system there is a 1.8X read slowdown from the file
> being accessed previously.
>
> In order to raise visibility of this issue, I invite others to see if they
> can reproduce it in their ZFS pools.  The script at
>
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh
>
> Implements a simple test.

--($ ~)-- time sudo ksh zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m7.70s
user0m24.10s
sys 1m5.99s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real1m44.88s
user0m22.26s
sys 0m51.56s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

real10m47.747s
user0m54.189s
sys 3m22.039s

This is a M4000 mit 32 GB RAM and two HDs in a mirror.

Alexander
-- 
[[ http://zensursula.net ]]
[ Soc. => http://twitter.com/alexs77 | http://www.plurk.com/alexs77 ]
[ Mehr => http://zyb.com/alexws77 ]
[ Chat => Jabber: alexw...@jabber80.com | Google Talk: a.sk...@gmail.com ]
[ Mehr => AIM: alexws77 ]
[ $[ $RANDOM % 6 ] = 0 ] && rm -rf / || echo 'CLICK!'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Gaëtan Lehmann


Hi,

Here is the result on a Dell Precision T5500 with 24 GB of RAM and two  
HD in a mirror (SATA, 7200 rpm, NCQ).


[glehm...@marvin2 tmp]$ uname -a
SunOS marvin2 5.11 snv_117 i86pc i386 i86pc Solaris
[glehm...@marvin2 tmp]$ pfexec ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /rpool/ 
zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real8m19,74s
user0m6,47s
sys 0m25,32s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real10m42,68s
user0m8,35s
sys 0m30,93s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

HTH,

Gaëtan



On 13 Jul 2009, at 01:15, Scott Lawson wrote:


Bob,

Output of my run for you. System is a M3000 with 16 GB RAM and 1  
zpool called test1
which is contained on a raid 1 volume on a 6140 with 7.50.13.10  
firmware on

the RAID controllers. RAid 1 is made up of two 146GB 15K FC disks.

This machine is brand new with a clean install of S10 05/09. It is  
destined to become an Oracle 10 server with

ZFS filesystems for zones and DB volumes.

[r...@xxx /]#> uname -a
SunOS xxx 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise
[r...@xxx /]#> cat /etc/release
 Solaris 10 5/09 s10s_u7wos_08 SPARC
 Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
  Use is subject to license terms.
   Assembled 30 March 2009

[r...@xxx /]#> prtdiag -v | more
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise  
M3000 Server

System clock frequency: 1064 MHz
Memory size: 16384 Megabytes


Here is the run output for you.

[r...@xxx tmp]#> ./zfs-cache-test.ksh test1
zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /test1/ 
zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m48.94s
user0m21.58s
sys 0m44.91s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real6m39.87s
user0m21.62s
sys 0m46.20s

Feel free to clean up with 'zfs destroy test1/zfscachetest'.

Looks like a 25% performance loss for me. I was seeing around 80MB/s  
sustained

on the first run and around 60MB/s sustained on the 2nd.

/Scott.


Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance  
issue for a week now.  A 4X reduction in file read performance due  
to having read the file before is terrible, and of course the  
situation is considerably worse if the file was previously mmapped  
as well.  Many of us have sent a lot of money to Sun and were not  
aware that ZFS is sucking the life out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple  
machines. For example, I reproduced it on my Blade 2500 (SPARC)  
which uses a simple mirrored rpool.  On that system there is a 1.8X  
read slowdown from the file being accessed previously.


In order to raise visibility of this issue, I invite others to see  
if they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Implements a simple test.  It requires a fair amount of disk space  
to run, but the main requirement is that the disk space consumed be  
more than available memory so that file data gets purged from the  
ARC. The script needs to run as root since it creates a filesystem  
and uses mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the  
script.  The pool 'rpool' is used by default, but the name of the  
pool to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under / 
Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real2m54.17s
user0m7.65s
sys 0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real11m54.65s
user0m7.70s
sys 0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /rpool/ 
zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real13m3.91s
user2m43.04s
sys 9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real23m50.27s
user2m41.81s
sys 9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this  
bu

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-12 Thread Scott Lawson

Bob,

Output of my run for you. System is a M3000 with 16 GB RAM and 1 zpool 
called test1

which is contained on a raid 1 volume on a 6140 with 7.50.13.10 firmware on
the RAID controllers. RAid 1 is made up of two 146GB 15K FC disks.

This machine is brand new with a clean install of S10 05/09. It is 
destined to become an Oracle 10 server with

ZFS filesystems for zones and DB volumes.

[r...@xxx /]#> uname -a
SunOS xxx 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise
[r...@xxx /]#> cat /etc/release
  Solaris 10 5/09 s10s_u7wos_08 SPARC
  Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
Assembled 30 March 2009

[r...@xxx /]#> prtdiag -v | more
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise 
M3000 Server

System clock frequency: 1064 MHz
Memory size: 16384 Megabytes


Here is the run output for you.

[r...@xxx tmp]#> ./zfs-cache-test.ksh test1
zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/test1/zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real4m48.94s
user0m21.58s
sys 0m44.91s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real6m39.87s
user0m21.62s
sys 0m46.20s

Feel free to clean up with 'zfs destroy test1/zfscachetest'.

Looks like a 25% performance loss for me. I was seeing around 80MB/s 
sustained

on the first run and around 60MB/s sustained on the 2nd.

/Scott.


Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue 
for a week now.  A 4X reduction in file read performance due to having 
read the file before is terrible, and of course the situation is 
considerably worse if the file was previously mmapped as well.  Many 
of us have sent a lot of money to Sun and were not aware that ZFS is 
sucking the life out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. 
For example, I reproduced it on my Blade 2500 (SPARC) which uses a 
simple mirrored rpool.  On that system there is a 1.8X read slowdown 
from the file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh 



Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real2m54.17s
user0m7.65s
sys 0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real11m54.65s
user0m7.70s
sys 0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    13m3.91s
user    2m43.04s
sys     9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    23m50.27s
user    2m41.81s
sys     9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this bug.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-12 Thread Bob Friesenhahn
There has been no forward progress on the ZFS read performance issue 
for a week now.  A 4X reduction in file read performance due to having 
read the file before is terrible, and of course the situation is 
considerably worse if the file was previously mmapped as well.  Many 
of us have sent a lot of money to Sun and were not aware that ZFS is 
sucking the life out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. 
For example, I reproduced it on my Blade 2500 (SPARC) which uses a 
simple mirrored rpool.  On that system there is a 1.8X read slowdown 
from the file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. 
The script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.
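
For readers who cannot fetch the script, the core idea is small enough to
sketch.  The following is a simplified, hypothetical outline only (the real
zfs-cache-test.ksh adds safety checks and tunables); it mirrors the default
file count and size used above:

#!/bin/ksh
# Sketch of the cache test: build a file set larger than RAM, then compare
# a cold read (after unmount/mount) against an immediate warm re-read.
POOL=${1:-rpool}
FS=$POOL/zfscachetest
DIR=/$FS

zfs create $FS || exit 1

# Write 3000 files of 8192000 bytes each (~24.6 GB total).  Note: if
# compression is enabled on the pool, /dev/zero data will not exercise
# the disks; substitute incompressible data in that case.
i=0
while [ $i -lt 3000 ]; do
        dd if=/dev/zero of=$DIR/file.$i bs=8192000 count=1 >/dev/null 2>&1
        i=$((i + 1))
done

zfs unmount $FS
zfs mount $FS                  # remount so the first read comes from disk

echo "Doing initial (unmount/mount) 'cpio -o > /dev/null'"
time find $DIR -type f | cpio -o > /dev/null

echo "Doing second 'cpio -o > /dev/null'"
time find $DIR -type f | cpio -o > /dev/null

echo "Feel free to clean up with 'zfs destroy $FS'."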


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    2m54.17s
user    0m7.65s
sys     0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    11m54.65s
user    0m7.70s
sys     0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    13m3.91s
user    2m43.04s
sys     9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    23m50.27s
user    2m41.81s
sys     9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.
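
For what it's worth, the slowdown factors quoted in this thread fall straight
out of the wall-clock times above:

# second (warm) run time divided by first (cold) run time
echo "scale=2; (11*60 + 54.65) / (2*60 + 54.17)" | bc   # Sun_2540: ~4.1X
echo "scale=2; (23*60 + 50.27) / (13*60 + 3.91)" | bc   # rpool:    ~1.8X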

I am interested to hear about systems which do not suffer from this 
bug.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-09 Thread William Bauer
I don't swear.  The word it bleeped was not a bad word.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-09 Thread William Bauer
I have a much more generic question regarding this thread.  I have a Sun T5120 
(T2 quad core, 1.4GHz) with two 10K RPM SAS drives in a mirrored pool running 
Solaris 10 u7.  The disk performance seems horrible.  I have the same apps 
running on a Sun X2100M2 (dual core 1.8GHz AMD) also running Solaris 10u7 and 
an old, really poor performing SATA drive (also with ZFS), and its disk 
performance seems at least 5x better.

I'm not offering much detail here, but I had been attributing this to what I've 
always observed--Solaris on x86 performs far better than on sparc for any app 
I've ever used.
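
One crude way to put a number behind "seems at least 5x better" before
deciding anything: time a streaming read of a large file on each box.  The
pool and path names below are made up; pick a dataset on the pool under test
and a file comfortably larger than installed RAM (16 GB here is just a
placeholder), and remount first so the read is not served from cache:

# hypothetical quick check -- repeat on both machines and compare MB/s
# (mkfile writes zeros, which is fine while compression is off, the default)
mkfile 16384m /somepool/sometest/ddtest
zfs unmount somepool/sometest && zfs mount somepool/sometest   # drop the cached copy
time dd if=/somepool/sometest/ddtest of=/dev/null bs=1024k
rm /somepool/sometest/ddtest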

I guess the real question is: is ZFS ready for production in Solaris 10, 
or should I flar this bugger up and rebuild with UFS?  This thread concerns me, 
and I really want to keep ZFS on this system for its many features.  Sorry if 
this is off-topic, but you guys got me wondering.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-07 Thread Bob Friesenhahn

On Tue, 7 Jul 2009, Joerg Schilling wrote:


Based on the prior discussions of using mmap() with ZFS and the way
ZFS likes to work, my guess is that POSIX_FADV_NOREUSE does nothing at
all and POSIX_FADV_DONTNEED probably does not work either.  These are
pretty straightforward to implement with UFS since UFS benefits from
the existing working madvise() functionality.


I did run my tests on UFS...


To clarify, you are not likely to see benefits until the system 
becomes starved for memory resources, or there is contention from 
multiple processes for read cache.  Solaris UFS is very well tuned so 
it is likely that a single process won't see much benefit.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

