On 31 dec 2009, at 06.01, Richard Elling wrote:

> 
> On Dec 30, 2009, at 2:24 PM, Ragnar Sundblad wrote:
> 
>> 
>> On 30 dec 2009, at 22.45, Richard Elling wrote:
>> 
>>> On Dec 30, 2009, at 12:25 PM, Andras Spitzer wrote:
>>> 
>>>> Richard,
>>>> 
>>>> That's an interesting question, whether it's worth it or not. I guess the
>>>> question is always who the targets for ZFS are (I assume everyone, though
>>>> in reality priorities have to be set, as developer resources are
>>>> limited). For a home office, no doubt thin provisioning is not of much
>>>> use; for an enterprise company the numbers might really make a difference
>>>> if we look at space used vs. space allocated.
>>>> 
>>>> There are some studies showing that thin provisioning can reduce physical
>>>> space used by up to 30%, which is huge. (Even though I understand studies
>>>> are not real life, and thin provisioning is not viable in every environment.)
>>>> 
>>>> Btw, I would like to discuss scenarios where, even though we have an
>>>> over-subscribed pool in the SAN (meaning the overall space allocated to
>>>> the systems is more than the physical space in the pool), with proper
>>>> monitoring and proactive addition of physical drives we won't let any
>>>> systems/applications attached to the SAN realize that we have thin devices.
>>>> 
>>>> Actually, that's why I believe configuring thin devices without
>>>> periodically reclaiming space is just a time bomb; if you do have the
>>>> option to periodically reclaim space, you can maintain the pool in the SAN
>>>> in a really efficient way. That's why I consider Veritas' Thin Reclamation
>>>> API a milestone in the thin device field.
>>>> 
>>>> Anyway, only the future can tell whether thin provisioning will be a major
>>>> feature in the storage world, though since Veritas has already added this
>>>> feature, I was wondering if ZFS at least has it on its roadmap.
>>> 
>>> Thin provisioning is absolutely, positively a wonderful, good thing! The
>>> question is, how does the industry handle the multitude of thin
>>> provisioning models, each layered on top of another? For example, here at
>>> the ranch I use VMware and Xen, which thinly provision virtual disks. I do
>>> this over iSCSI to a server running ZFS which thinly provisions the iSCSI
>>> target. If I had a virtual RAID array, I would probably use that, too.
>>> Personally, I think being thinner closer to the application wins over
>>> being thinner closer to dumb storage devices (disk drives).
>> 
>> I don't get it - why do we need anything more magic (or complicated)
>> than support for TRIM from the filesystems and the storage systems?
> 
> TRIM is just one part of the problem (or solution, depending on your point
> of view). The TRIM command (an ATA command; the SCSI counterpart is UNMAP,
> from the T10 protocols) allows a host to tell a block device that the data
> in a set of blocks is no longer of any value, so the block device can
> destroy the data without adverse consequences.
> 
> In a world with copy-on-write and without snapshots, it is obvious that
> there will be a lot of blocks running around that are no longer in use.
> Snapshots (and their clones) change that use case, so in a world of
> snapshots there will be fewer unused blocks. Remember, the TRIM command is
> very important to OSes like Windows or OS X, which do not have file systems
> that are copy-on-write or have decent snapshots. OTOH, ZFS does
> copy-on-write, and lots of ZFS folks use snapshots.

I don't believe that there is such a big difference between those
cases. Sure, snapshots may keep more data on disk, but only as much
as the user chooses to keep. There have been other ways to keep old
data on disk before (RCS, Solaris patch backout blurbs, logs, caches,
what have you), so it is not really a brand new world. (BTW, once
upon a time, real operating systems had (optional) file versioning
built into the operating system or file system itself.)

If there were a mechanism that always tended to keep the disk full,
that would be a different case. Snapshots may do that with the
autosnapshot and warn-and-clean-when-getting-full features of
OpenSolaris, but servers in particular will probably not be managed
that way; they will probably have a much more controlled snapshot
policy. (Especially if you want to save every possible bit of disk
space, as those guys with the big, fantastic and ridiculously
expensive storage systems always want to do - maybe that will change
in the future, though.)

> That said, adding TRIM support is not hard in ZFS. But it depends on
> lower level drivers to pass the TRIM commands down the stack. These
> ducks are lining up now.

Good.

>> I don't see why TRIM would be hard to implement for ZFS either,
>> except that you may want to keep data from a few txgs back just
>> for safety, which would probably call for some two-stage freeing
>> of data blocks (those free blocks that are to be TRIMmed, and
>> those that already are).
> 
> Once a block is freed in ZFS, ZFS no longer needs it. So the "problem"
> of TRIM in ZFS is not related to the recent txg commit history.

It may be that you want to keep a few txgs back, so that if you get
a failure where parts of the last txg are lost, you will still be
able to get an old (a few seconds or minutes old) version of your data back.

This could happen if the sync commands aren't correctly implemented
all the way (as we have seen some stories about on this list).
Maybe someone disabled syncing somewhere to improve performance.

It could also happen if a "non-volatile" caching device, such as
a storage controller, breaks in some bad way. Or maybe you just
had a bad or old battery/supercap in a device that implements
NV storage with batteries or supercaps.
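
To make that two-stage freeing concrete, here is a toy sketch in plain
Python (nothing ZFS-specific; all names and the defer count are invented
for illustration): blocks freed in txg N sit in a pending queue and are
only handed to the device as TRIMs once a few more txgs have committed,
so a slightly older version of the data can still be rolled back to in
the meantime.

from collections import deque

class DeferredTrimQueue:
    """Toy model of two-stage freeing: blocks freed in txg N are only
    TRIMmed after `defer_txgs` further txgs have committed."""

    def __init__(self, defer_txgs=3):
        self.defer_txgs = defer_txgs
        self.pending = deque()   # (txg, [block, ...]) kept in txg order

    def free_blocks(self, txg, blocks):
        # Stage 1: the allocator may reuse these blocks, but the device
        # has not yet been told that they are unused.
        self.pending.append((txg, list(blocks)))

    def txg_committed(self, current_txg, issue_trim):
        # Stage 2: anything freed more than `defer_txgs` txgs ago can
        # now be TRIMmed; it is no longer needed for a rollback.
        while self.pending and self.pending[0][0] <= current_txg - self.defer_txgs:
            _txg, blocks = self.pending.popleft()
            for b in blocks:
                issue_trim(b)

if __name__ == "__main__":
    q = DeferredTrimQueue(defer_txgs=3)
    q.free_blocks(100, ["blk-a", "blk-b"])
    q.free_blocks(101, ["blk-c"])
    q.txg_committed(102, issue_trim=lambda b: print("TRIM", b))  # nothing yet
    q.txg_committed(103, issue_trim=lambda b: print("TRIM", b))  # blk-a, blk-b

The locking between the allocator and the TRIMmer that Richard mentions
below is exactly the part this toy skips.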

> The
> issue is that traversing the free block list has to be protected by
> locks, so that the file system does not allocate a block when it is
> also TRIMming the block. Not so difficult, as long as the TRIM
> occurs relatively quickly.
> 
> I think that any TRIM implementation should be an administration
> command, like scrub. It probably doesn't make sense to have it
> running all of the time.  But on occasion, it might make sense.

I am not sure why it shouldn't run at all times, except for the fact
that it seems to be badly implemented in some SATA devices, with
latencies high enough that it interrupts any data streaming to or from
the disks.

On a general-purpose system that may not be an issue, since you may
read a lot from cache anyway, and synced writes may wait a little
without anyone even noticing.

On a special system that needs streaming performance, it might be
interesting to trim only on certain occasions, but then you will
probably have a service window for it, with a start and stop time, so
you need to be able to control the trimming process pretty exactly for
this feature to be interesting. It may turn out that such systems are
better served by not trimming at all.

On a laptop, on the other hand, you typically don't have a service
window and have no idea when it would be a good time to start
TRIMming, so continuous TRIMming may be the best option.
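
Just to make those three cases concrete, a small sketch (again plain
Python; the policy names, batch size and window hours are all made up):
a "continuous" policy trickles TRIMs out in small batches all the time,
a "window" policy only issues them inside a service window, and "never"
skips them entirely.

import datetime

def in_service_window(now, start_hour=2, stop_hour=4):
    # Assumed nightly service window, 02:00-04:00 local time.
    return start_hour <= now.hour < stop_hour

def maybe_issue_trims(pending, issue_trim, policy="continuous",
                      now=None, batch=64):
    """Issue at most `batch` TRIMs from `pending`, depending on policy."""
    now = now or datetime.datetime.now()
    if policy == "never":
        return 0
    if policy == "window" and not in_service_window(now):
        return 0
    issued = 0
    while pending and issued < batch:
        issue_trim(pending.pop())
        issued += 1
    return issued

if __name__ == "__main__":
    pending = ["blk-%d" % i for i in range(200)]
    n = maybe_issue_trims(pending, issue_trim=lambda b: None, policy="window",
                          now=datetime.datetime(2010, 1, 1, 3, 0))
    print("issued", n, "TRIMs,", len(pending), "still pending")

The batch limit is the point: even "continuous" trimming only ever puts
a bounded amount of extra work in front of the streaming I/O.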

> My concern is that people will have an expectation that they can
> use snapshots and TRIM -- the former reduces the effectiveness
> of the latter.

In my experience, disks tend to get full one way or another
anyway if you don't manage your data. I don't really see that
snapshots change that a whole lot.

>  As the price of storing bytes continues to decrease,
> will the cost of not TRIMming be a long term issue?  I think not.
> -- richard

Maybe, maybe not.

Storage will always have a cost; not even OpenStorage has
really changed that by orders of magnitude (yet, at least).

Also, currently, when SSDs for some very strange reason are
constructed from flash chips designed for firmware and slowly
changing configuration data, and can only erase in very large
chunks, TRIMming is good for the housekeeping inside the SSD drive.
A typical use case for this would be a laptop.
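
Some back-of-the-envelope arithmetic on why those large erase chunks
make TRIM attractive (the geometry and fractions below are invented for
illustration, not taken from any particular drive): before an SSD can
erase a block it must copy every page it still believes is live, and
TRIM is what lets it stop believing in pages the filesystem has already
freed.

def page_copies_per_erase(erase_block_kib=512, page_kib=4, live_fraction=0.75):
    # Pages the drive must relocate before it can erase one block.
    pages = erase_block_kib // page_kib
    return int(pages * live_fraction)

if __name__ == "__main__":
    # Without TRIM the drive thinks 75% of the pages are still live;
    # with TRIM, say, only 30% really are.  (Both fractions are made up.)
    print("without TRIM:", page_copies_per_erase(live_fraction=0.75),
          "page copies per erase")
    print("with TRIM:   ", page_copies_per_erase(live_fraction=0.30),
          "page copies per erase")

Fewer page copies per erase means less internal write amplification,
which is why the laptop case benefits even without any fancy policy.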

Happy new year, everybody!

/ragge s
