Re: [zfs-discuss] ZFS QoS and priorities

2012-12-06 Thread Matt Van Mater
>
>
>
> I'm unclear on the best way to warm data... do you mean to simply `dd
> if=/volumes/myvol/data of=/dev/null`?  I have always been under the
> impression that the ARC/L2ARC has rate limiting on how much data can be
> added to the cache per interval (I can't remember the interval).  Is this
> not the case?  If there is rate limiting in place, dd-ing the data as in
> my example above would not necessarily cache all of it... it might take
> several passes to populate the cache, correct?
>

Quick update... I found at least one reference to the rate limiting I was
referring to.  It was a post from Richard ~2.5 years ago :)
http://marc.info/?l=zfs-discuss&m=127060523611023&w=2

I assume the source code reference is still valid, in which case a fill
rate of 8 MB per second into the L2ARC is extremely slow in my book and
very conservative... It would take a very long time to warm the hundreds
of gigabytes of VMs we have into cache.  Perhaps the l2arc_write_boost
tunable might be a good place to start for aggressively warming a cache,
but my preference is not to touch the tunables if I have a choice.  I'd
rather the system default be updated to reflect modern hardware, so that
everyone benefits and I'm not running some custom build.
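
For anyone who does want to experiment, my understanding is that on illumos
these are plain kernel tunables, so something along these lines should work
(values are purely illustrative, not recommendations):

  * /etc/system -- raise the L2ARC feed rate (illustrative values only)
  * 64 MB per feed interval instead of the 8 MB default
  set zfs:l2arc_write_max = 0x4000000
  * extra headroom while the ARC is still cold
  set zfs:l2arc_write_boost = 0x4000000

  # or adjust the running kernel (does not persist across a reboot):
  echo "l2arc_write_max/Z 0x4000000" | mdb -kw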


Re: [zfs-discuss] ZFS QoS and priorities

2012-12-06 Thread Matt Van Mater
>
>
> At present, I do not see async write QoS as being interesting. That leaves
> sync writes and reads
> as the managed I/O. Unfortunately, with HDDs, the variance in response
> time >> queue management
> time, so the results are less useful than the case with SSDs. Control
> theory works, once again.
> For sync writes, they are often latency-sensitive and thus have the
> highest priority. Reads have
> lower priority, with prefetch reads at lower priority still.
>
>
This makes sense for the most part, and I agree that with spinning HDDs
there might be minimal benefit.  That is why I suggested that the ARC/L2ARC
might be a reasonable starting place for an idea like this: the latencies
are orders of magnitude lower.  Perhaps what I'm looking for is a way to
give prefetch a higher priority when the system is below some load
threshold.

>
> On a related note (maybe?) I would love to see pool-wide settings that
> control how aggressively data is added/removed from ARC, L2ARC, etc.
>
> Evictions are done on an as-needed basis. Why would you want to evict more
> than needed?
> So you could fetch it again?
>
> Prefetching can be more aggressive, but we actually see busy systems
> disabling prefetch to
> improve interactive performance. Queuing theory works, once again.
>
It's not that I want evictions to occur for no reason... only that the
rate be accelerated if there is contention.  If I recall correctly, ZFS has
default values that throttle how quickly the ARC/L2ARC are updated, and the
explanation I read was that the SSDs of 6+ years ago were not capable of
the IOPS and throughput that they are today.

I know that ZFS has a prefetch capability, but I have seen fairly little
written about it; are there any good references you can point me to so I
can better understand it?  In particular, I would like to see some kind of
measurement on my systems showing how often this capability is used.
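
Partially answering my own question: I believe the file-level prefetcher
exports counters through kstat on illumos, so something like the following
should give a rough feel for how often it fires, though I haven't dug into
what each counter actually means:

  # dump the DMU prefetch (zfetch) statistics
  kstat -m zfs -n zfetchstats
  # or just the hit/miss counters
  kstat -p zfs:0:zfetchstats:hits zfs:0:zfetchstats:misses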


>  Something that would accelerate the warming of a cold pool of storage or
> be more aggressive in adding/removing cached data on a volatile dataset
> (e.g. where Virtual Machines are turned on/off frequently).  I have heard
> that some of these defaults might be changed in some future release of
> Illumos, but haven't seen any specifics saying that the idea is nearing
> fruition in release XYZ.
>
> It is easy to warm data (dd), even to put it into MFU (dd + dd). For best
> performance with
> VMs, MFU works extremely well, especially with clones.
>

I'm unclear on the best way to warm data... do you mean to simply `dd
if=/volumes/myvol/data of=/dev/null`?  I have always been under the
impression that the ARC/L2ARC has rate limiting on how much data can be
added to the cache per interval (I can't remember the interval).  Is this
not the case?  If there is rate limiting in place, dd-ing the data as in
my example above would not necessarily cache all of it... it might take
several passes to populate the cache, correct?
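
To make it concrete, something like the following is what I had in mind
(the path and the number of passes are just placeholders from my example):

  # sequential read to pull the data into ARC/L2ARC; repeat the pass a few
  # times in case the L2ARC feed thread really is capped per interval
  for i in 1 2 3; do
      dd if=/volumes/myvol/data of=/dev/null bs=1M
  done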

Forgive my naivete, but when I look at my pool under random load and see
heavy traffic hitting the spinning-disk vdevs and relatively little hitting
my L2ARC SSDs, I wonder how to make better use of their performance.  I
would think that if my L2ARC is not yet full and is showing very low
IOPS/throughput/busy/wait, then ZFS should use that opportunity to populate
the cache aggressively from the MRU or some other mechanism.
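
For what it's worth, that observation comes from eyeballing the usual tools
while the pool is busy, along these lines (the interval is a placeholder):

  # per-device service times and %busy, to compare the SSDs with the
  # spinning vdevs
  iostat -xn 5
  # how full the cache devices actually are
  kstat -p zfs:0:arcstats:l2_size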

Sorry to digress from the original thread!


Re: [zfs-discuss] ZFS QoS and priorities

2012-12-05 Thread Matt Van Mater
I don't have anything significant to add to this conversation, but I wanted
to chime in that I also find the concept of a QoS-like capability very
appealing and that Jim's recent emails resonate with me.  You're not alone!
I believe there are many use cases where granular prioritization,
controlling how the ARC, L2ARC, ZIL and underlying vdevs are used to give
priority I/O to a specific zvol, share, etc., would be useful.  My
experience is stronger on the networking side, and I envision a weighted
class-based queuing methodology (or something along those lines).  I
recognize that ZFS's architectural preference for coalescing writes and
reads into larger sequential batches might conflict with a QoS-like
capability... Perhaps ARC/L2ARC tuning might be a good starting point
towards that end?

On a related note (maybe?) I would love to see pool-wide settings that
control how aggressively data is added/removed from ARC, L2ARC, etc.
 Something that would accelerate the warming of a cold pool of storage or
be more aggressive in adding/removing cached data on a volatile dataset
(e.g. where Virtual Machines are turned on/off frequently).  I have heard
that some of these defaults might be changed in some future release of
Illumos, but haven't seen any specifics saying that the idea is nearing
fruition in release XYZ.

Matt


On Wed, Dec 5, 2012 at 10:26 AM, Jim Klimov wrote:

> On 2012-11-29 10:56, Jim Klimov wrote:
>
>> For example, I might want to have corporate webshop-related
>> databases and appservers to be the fastest storage citizens,
>> then some corporate CRM and email, then various lower priority
>> zones and VMs, and at the bottom of the list - backups.
>>
>
> On a side note, I'm now revisiting old ZFS presentations collected
> over the years, and one listed as a "TBD" item the idea that
> metaslabs with varying speeds could be used for specific tasks,
> and not only be the first to receive allocations so that a new
> pool would perform quickly.  I.e. "TBD: Workload specific freespace
> selection policies".
>
> Say, I create a new storage box and lay out some bulk file, backup
> and database datasets. Even as they are receiving their first bytes,
> I have some idea about the kind of performance I'd expect from them -
> with QoS per dataset I might destine the databases to the fast LBAs
> (and smaller seeks between tracks I expect to use frequently), and
> the bulk data onto slower tracks right from the start, and the rest
> of unspecified data would grow around the middle of the allocation
> range.
>
> These types of data would then only "creep" onto the less fitting
> metaslabs (faster for bulk, slower for DB) if the target ones run
> out of free space. Then the next-best-fitting would be used...
>
> This idea is somewhat reminiscent of hierarchical storage
> management, except that it is about static allocation at write
> time and takes place within a single disk (or set of similar
> disks), in order to provide different performance for
> different tasks.
>
> ///Jim
>


Re: [zfs-discuss] Different size / manufacturer L2ARC

2012-09-26 Thread Matt Van Mater
Excellent, thanks to you both.  I knew of both those methods and wanted
to make sure I wasn't missing something!
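
For anyone searching the archives later, the sort of numbers being
discussed can be pulled with commands along these lines on illumos (the
pool name is a placeholder):

  # per-vdev view; cache devices and their allocated space are at the bottom
  zpool iostat -v tank
  # bytes currently on the cache devices, plus hit/miss counters
  kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses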

On Wed, Sep 26, 2012 at 11:21 AM, Dan Swartzendruber wrote:

> On 9/26/2012 11:18 AM, Matt Van Mater wrote:
>
>>> If the added device is slower, you will experience a slight drop in
>>> per-op performance; however, if your working set needs another SSD,
>>> overall it might improve your throughput (as the cache hit ratio will
>>> increase).
>>
>
>> Thanks for your fast reply!  I think I know the answer to this question,
>> but what is the best way to determine how large my pool's L2ARC working
>> set is (i.e. how much of the L2ARC is in use)?
>
>
> Easiest way:
>
> zpool iostat -v
>
>
>


Re: [zfs-discuss] Different size / manufacturer L2ARC

2012-09-26 Thread Matt Van Mater
>
> If the added device is slower, you will experience a slight drop in
> per-op performance; however, if your working set needs another SSD,
> overall it might improve your throughput (as the cache hit ratio will
> increase).
>

Thanks for your fast reply!  I think I know the answer to this question,
but what is the best way to determine how large my pool's L2ARC working
set is (i.e. how much of the L2ARC is in use)?

Matt


[zfs-discuss] Different size / manufacturer L2ARC

2012-09-26 Thread Matt Van Mater
I've searched the mailing list (the evil tuning wikis are down) and
haven't seen an answer to this seemingly simple question...

I have two OCZ Vertex 4 SSDs acting as L2ARC.  I have a spare Crucial SSD
(about 1.5 years old) that isn't getting much use, and I'm curious about
adding it to the pool as a third L2ARC device.

Is there any technical reason why I can't use SSDs of different capacities
and/or manufacturers as a single ZFS pool's L2ARC?  Even if it works
technically, will this configuration negatively impact performance (e.g.
slow the entire cache down to the slowest drive's performance)?
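
For context, what I would be running is just the usual cache-device add,
something like the following (pool and device names are placeholders):

  # add the spare SSD as a third cache device
  zpool add tank cache c2t3d0
  # confirm all three cache devices show up
  zpool status tank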

Thanks!
Matt