Reference below...
On Sep 15, 2009, at 2:38 PM, Dale Ghent wrote:
On Sep 15, 2009, at 5:21 PM, Richard Elling wrote:
On Sep 15, 2009, at 1:03 PM, Dale Ghent wrote:
On Sep 10, 2009, at 3:12 PM, Rich Morris wrote:
On 07/28/09 17:13, Rich Morris wrote:
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:
Sun has opened internal CR 6859997. It is now in Dispatched
state at High priority.
CR 6859997 has recently been fixed in Nevada. This fix will also
be in Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in
this CR without slowing down other prefetch patterns. Some
kstats have also been added to help improve the observability of
ZFS file prefetching.
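For anyone on a build with the fix, the new prefetch counters should be readable the same way as the vdev cache stats quoted later in this thread. The kstat name below (zfetchstats) is the one used in later OpenSolaris builds, so treat it as an assumption for any particular release:

```shell
# Inspect the DMU prefetch (zfetch) counters; the kstat name is
# assumed from later builds and may differ on your release.
kstat -n zfetchstats
```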
Awesome that the fix exists. I've been having a hell of a time
with device-level prefetch on my iscsi clients causing tons of
ultimately useless IO and have resorted to setting
zfs_vdev_cache_max=1.
This only affects metadata. Wouldn't it be better to disable
prefetching for data?
Well, that's a surprise to me, but the zfs_vdev_cache_max=1 did
provide relief.
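For anyone wanting to apply the same workaround: the tunable can be set persistently in /etc/system (takes effect at next boot), or patched live with mdb. Both methods follow the pattern the Evil Tuning Guide documents for ZFS tunables:

```shell
# Persistent: append to /etc/system, then reboot.
echo "set zfs:zfs_vdev_cache_max = 1" >> /etc/system

# Live (no reboot), on a running kernel; 0t1 is decimal 1.
echo "zfs_vdev_cache_max/W 0t1" | mdb -kw
```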
Just a general description of my environment:
My setup consists of several s10uX iscsi clients which get LUNs from
a pair of thumpers. Each thumper pair exports identical LUNs to
each iscsi client, and the client in turn mirrors each LUN pair
inside a local zpool. As more space is needed on a client, a new LUN
is created on the pair of thumpers, exported to the iscsi client,
which then picks it up and we add a new mirrored vdev to the
client's existing zpool.
This is so we have data redundancy across chassis: if one thumper
were to fail or need patching, etc., the iscsi clients just see one
side of their mirrors drop out.
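A sketch of that layout, with hypothetical device names standing in for the iscsi LUs (c2tXd0 from one thumper, c3tXd0 from the other):

```shell
# Create the pool with one mirrored vdev; each side of the mirror is
# a LUN from a different thumper (device names are hypothetical).
zpool create tank mirror c2t1d0 c3t1d0

# Growing later: export a new LUN from each thumper, then add another
# mirrored vdev. Losing one thumper only degrades each mirror.
zpool add tank mirror c2t2d0 c3t2d0
</imports>
```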
The problem that we observed on the iscsi clients was that, when
viewing things through 'zpool iostat -v', far more IO was being
requested from the LUs than was being registered for the vdev those
LUs were a member of.
Since this is an iscsi setup with stock thumpers (no SSD ZIL or
L2ARC) serving the LUs, this apparent overhead caused far more
unnecessary disk IO on the thumpers, thus starving out IO for data
that was actually needed.
The working set is lots of small-ish files, entirely random IO.
If zfs_vdev_cache_max only affects metadata prefetches, which
parameter affects data prefetches?
There are two main areas for prefetch: the transactional object
layer (DMU) and the pooled storage level (VDEV).
zfs_vdev_cache_max works at the VDEV level, obviously. The DMU
knows more about the context of the data and is where the
intelligent prefetching algorithm works.
You can easily observe the VDEV cache statistics with kstat:
# kstat -n vdev_cache_stats
module: zfs                             instance: 0
name:   vdev_cache_stats               class:    misc
        crtime                          38.83342625
        delegations                     14030
        hits                            105169
        misses                          59452
        snaptime                        4564628.18130739
This represents a 59% cache hit rate (hits divided by
delegations + hits + misses), which is pretty decent. But you will
notice far fewer delegations+hits+misses than real IOPS because it
is only caching metadata.
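For reference, the 59% figure can be reproduced from the counters above, counting delegations and misses against the hits:

```shell
# Recompute the vdev cache hit rate from the kstat counters quoted above.
awk 'BEGIN {
    delegations = 14030; hits = 105169; misses = 59452
    printf "%.1f%%\n", 100 * hits / (delegations + hits + misses)
}'
```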
Unfortunately, there is not a kstat for showing the DMU cache stats.
But a DTrace script can be written or, even easier, lockstat will show
if you are spending much time in the zfetch_* functions. More details
are in the Evil Tuning Guide, including how to set zfs_prefetch_disable
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
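For completeness, the two steps mentioned above look roughly like this; the lockstat flags shown are one common kernel-profiling invocation, not the only one:

```shell
# Sample kernel profiling data for ~30s and look for time spent in
# the zfetch_* (DMU prefetch) functions.
lockstat -kIW -D 20 sleep 30 | grep zfetch

# If prefetch is hurting, disable file-level (DMU) prefetch per the
# Evil Tuning Guide: append to /etc/system and reboot.
echo "set zfs:zfs_prefetch_disable = 1" >> /etc/system
```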
I have to admit that disabling device-level prefetching was a shot
in the dark, but it did result in drastically reduced contention on
the thumpers.
That is a little bit surprising. I would expect little metadata
activity for iscsi
service. It would not be surprising for older Solaris 10 releases,
though.
It was fixed in NV b70, circa July 2007.
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss