Reference below...

On Sep 15, 2009, at 2:38 PM, Dale Ghent wrote:

On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:


On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:

On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:

On 07/28/09 17:13, Rich Morris wrote:
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997. It is now in Dispatched state at High priority.

CR 6859997 has recently been fixed in Nevada. This fix will also be in Solaris 10 Update 9. This fix speeds up the sequential prefetch pattern described in this CR without slowing down other prefetch patterns. Some kstats have also been added to help improve the observability of ZFS file prefetching.
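
Assuming the new counters are the ones that landed in the Nevada source under the name "zfetchstats", they should be readable the same way as the other ZFS kstats:

        # kstat -m zfs -n zfetchstats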

Awesome that the fix exists. I've been having a hell of a time with device-level prefetch on my iscsi clients causing tons of ultimately useless IO and have resorted to setting zfs_vdev_cache_max=1.
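
(For reference, and assuming the variable is still a 32-bit int, that sort of tunable is usually applied either in /etc/system for the next boot or poked into the running kernel with mdb:)

        set zfs:zfs_vdev_cache_max = 1                 (/etc/system)
        # echo zfs_vdev_cache_max/W0t1 | mdb -kw       (running kernel)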

This only affects metadata. Wouldn't it be better to disable
prefetching for data?

Well, that's a surprise to me, but setting zfs_vdev_cache_max=1 did provide relief.

Just a general description of my environment:

My setup consists of several s10uX iscsi clients which get LUNs from a pairs of thumpers. Each thumper pair exports identical LUNs to each iscsi client, and the client in turn mirrors each LUN pair inside a local zpool. As more space is needed on a client, a new LUN is created on the pair of thumpers, exported to the iscsi client, which then picks it up and we add a new mirrored vdev to the client's existing zpool.
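
Roughly, with hypothetical device names and a pool named "tank" purely for illustration, the client-side layout is built like:

        # zpool create tank mirror c4t<thumperA-LU0>d0 c4t<thumperB-LU0>d0
        ... and later, after exporting a new LUN from each thumper ...
        # zpool add tank mirror c4t<thumperA-LU1>d0 c4t<thumperB-LU1>d0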

This is so we have data redundancy across chassis, so if one thumper were to fail or need patching, etc, the iscsi clients just see one side of their mirrors drop out.

The problem that we observed on the iscsi clients was that, when viewing things through 'zpool iostat -v', far more IO was being requested from the LUs than was being registered for the vdev those LUs were members of.
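
(That is, watching the per-vdev vs. per-LU ops columns in something like the following, with "tank" again standing in for the real pool name:)

        # zpool iostat -v tank 5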

Since this was an iscsi setup with stock thumpers (no SSD ZIL or L2ARC) serving the LUs, this apparent overhead caused far more unnecessary disk IO on the thumpers, thus starving out IO for data that was actually needed.

The working set is lots of small-ish files, entirely random IO.

If zfs_vdev_cache_max only affects metadata prefetches, which parameter affects data prefetches?

There are two main areas for prefetch: the transactional object layer (DMU) and the pooled storage layer (VDEV). zfs_vdev_cache_max works at the VDEV level, obviously. The DMU knows more about the context of the data and is where the intelligent prefetching algorithm works.

You can easily observe the VDEV cache statistics with kstat:
        # kstat -n vdev_cache_stats
        module: zfs                             instance: 0
        name:   vdev_cache_stats                class:    misc
                crtime                          38.83342625
                delegations                     14030
                hits                            105169
                misses                          59452
                snaptime                        4564628.18130739

This represents a 59% cache hit rate, which is pretty decent.  But you
will notice far fewer delegations+hits+misses than real IOPS because it is
only caching metadata.
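
(That figure appears to be hits over all lookups: 105169 / (14030 + 105169 + 59452) ≈ 0.59.)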

Unfortunately, there is not a kstat for showing the DMU cache stats.
But a DTrace script can be written or, even easier, lockstat will show
if you are spending much time in the zfetch_* functions.  More details
are in the Evil Tuning Guide, including how to set zfs_prefetch_disable
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
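
A profiling run plus the prefetch-disable tunable from that guide look roughly like this (the mdb write assumes zfs_prefetch_disable is a 32-bit int):

        # lockstat -kIW -D 20 sleep 30                  (look for zfetch_* entries)
        # echo zfs_prefetch_disable/W0t1 | mdb -kw      (running kernel)
        set zfs:zfs_prefetch_disable = 1                (/etc/system, next boot)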


I have to admit that disabling device-level prefetching was a shot in the dark, but it did result in drastically reduced contention on the thumpers.

That is a little bit surprising. I would expect little metadata activity for iscsi service. It would not be surprising for older Solaris 10 releases, though.
It was fixed in NV b70, circa July 2007.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
