Reference below...

On Sep 15, 2009, at 2:38 PM, Dale Ghent wrote:

On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:


On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:

On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:

On 07/28/09 17:13, Rich Morris wrote:
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997. It is now in Dispatched state at High priority.

CR 6859997 has recently been fixed in Nevada. This fix will also be in Solaris 10 Update 9. This fix speeds up the sequential prefetch pattern described in this CR without slowing down other prefetch patterns. Some kstats have also been added to help improve the observability of ZFS file prefetching.
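
Assuming the new counters are the ones that landed in the Nevada source under the name "zfetchstats", they should be readable the same way as the other ZFS kstats:

        # kstat -m zfs -n zfetchstats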

Awesome that the fix exists. I've been having a hell of a time with device-level prefetch on my iscsi clients causing tons of ultimately useless IO and have resorted to setting zfs_vdev_cache_max=1.
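
(For reference, and assuming the variable is still a 32-bit int, that sort of tunable is usually applied either in /etc/system for the next boot or poked into the running kernel with mdb:)

        set zfs:zfs_vdev_cache_max = 1                 (/etc/system)
        # echo zfs_vdev_cache_max/W0t1 | mdb -kw       (running kernel)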

This only affects metadata. Wouldn't it be better to disable
prefetching for data?

Well, that's a surprise to me, but setting zfs_vdev_cache_max=1 did provide relief.

Just a general description of my environment:

My setup consists of several s10uX iscsi clients which get LUNs from a pairs of thumpers. Each thumper pair exports identical LUNs to each iscsi client, and the client in turn mirrors each LUN pair inside a local zpool. As more space is needed on a client, a new LUN is created on the pair of thumpers, exported to the iscsi client, which then picks it up and we add a new mirrored vdev to the client's existing zpool.
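
Roughly, with hypothetical device names and a pool named "tank" purely for illustration, the client-side layout is built like:

        # zpool create tank mirror c4t<thumperA-LU0>d0 c4t<thumperB-LU0>d0
        ... and later, after exporting a new LUN from each thumper ...
        # zpool add tank mirror c4t<thumperA-LU1>d0 c4t<thumperB-LU1>d0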

This is so we have data redundancy across chassis, so if one thumper were to fail or need patching, etc, the iscsi clients just see one side of their mirrors drop out.

The problem that we observed on the iscsi clients was that, when viewing things through 'zpool iostat -v', far more IO was being requested from the LUs than was being registered for the vdev those LUs were members of.
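
(That is, watching the per-vdev vs. per-LU ops columns in something like the following, with "tank" again standing in for the real pool name:)

        # zpool iostat -v tank 5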

Since this was an iscsi setup with stock thumpers (no SSD ZIL or L2ARC) serving the LUs, this apparent overhead caused far more unnecessary disk IO on the thumpers, thus starving out IO for data that was actually needed.

The working set is lots of small-ish files, entirely random IO.

If zfs_vdev_cache_max only affects metadata prefetches, which parameter affects data prefetches?

There are two main areas for prefetch: the transactional object layer (DMU) and the pooled storage layer (VDEV). zfs_vdev_cache_max works at the VDEV level, obviously. The DMU knows more about the context of the data and is where the intelligent prefetching algorithm works.

You can easily observe the VDEV cache statistics with kstat:
        # kstat -n vdev_cache_stats
        module: zfs                             instance: 0
        name:   vdev_cache_stats                class:    misc
                crtime                          38.83342625
                delegations                     14030
                hits                            105169
                misses                          59452
                snaptime                        4564628.18130739

This represents a 59% cache hit rate, which is pretty decent.  But you
will notice far fewer delegations+hits+misses than real IOPS because it is
only caching metadata.
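
(That figure appears to be hits over all lookups: 105169 / (14030 + 105169 + 59452) ≈ 0.59.)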

Unfortunately, there is not a kstat for showing the DMU cache stats.
But a DTrace script can be written or, even easier, lockstat will show
if you are spending much time in the zfetch_* functions.  More details
are in the Evil Tuning Guide, including how to set zfs_prefetch_disable
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
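
A profiling run plus the prefetch-disable tunable from that guide look roughly like this (the mdb write assumes zfs_prefetch_disable is a 32-bit int):

        # lockstat -kIW -D 20 sleep 30                  (look for zfetch_* entries)
        # echo zfs_prefetch_disable/W0t1 | mdb -kw      (running kernel)
        set zfs:zfs_prefetch_disable = 1                (/etc/system, next boot)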


I have to admit that disabling device-level prefetching was a shot in the dark, but it did result in drastically reduced contention on the thumpers.

That is a little bit surprising. I would expect little metadata activity for iscsi service. It would not be surprising for older Solaris 10 releases, though.
It was fixed in NV b70, circa July 2007.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
