On Wed, Dec 30, 2009 at 1:40 PM, Richard Elling
<richard.ell...@gmail.com> wrote:
> On Dec 30, 2009, at 10:53 AM, Andras Spitzer wrote:
>
>> Devzero,
>>
>> Unfortunately that was my assumption as well. I don't have source level
>> knowledge of ZFS, though based on what I know it wouldn't be an easy way to
>> do it. I'm not even sure it's only a technical question, but a design
>> question, which would make it even less feasible.
>
> It is not hard, because ZFS knows the current free list, so walking that
> list
> and telling the storage about the freed blocks isn't very hard.
>
> What is hard is figuring out if this would actually improve life.  The
> reason
> I say this is because people like to use snapshots and clones on ZFS.
> If you keep snapshots, then you aren't freeing blocks, so the free list
> doesn't grow. This is a very different use case than UFS, as an example.

It seems as though the oft-mentioned block rewrite capabilities needed
for pool shrinking and for changing things like compression,
encryption, and deduplication would also show benefit here.  That is,
blocks would be rewritten in such a way as to minimize the number of
chunks of storage that are allocated.  The current HDS chunk size is
42 MB.

The most benefit would seem to come from having ZFS make a point of
reusing old but freed blocks before doing an allocation that causes
the back-end storage to allocate another chunk of disk to the
thin-provisioned LUN.  While it is important to be able to roll back a
few transactions in the event of some widely discussed failure modes,
it is probably reasonable to reuse a block freed by a txg that is
3,000 txgs old (about 1 day, assuming one txg every 30 seconds).  Such
a threshold could be used to determine whether to reuse a block or
venture into previously untouched regions of the disk.
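
To make the threshold idea concrete, here is a rough Python sketch.
It is not how the ZFS allocator works; FreedBlock, pick_block, and the
constants are hypothetical names I made up for illustration, and the
30-second txg interval is just the assumption from the paragraph
above.

```python
from dataclasses import dataclass

TXG_INTERVAL_SECONDS = 30   # assumption from the text: one txg every 30 seconds
REUSE_AGE_TXGS = 3000       # threshold suggested in the text


@dataclass
class FreedBlock:
    """Hypothetical record of a freed block (not a real ZFS structure)."""
    offset: int      # location of the block on the device
    freed_txg: int   # txg in which the block was freed


def reuse_age_hours(age_txgs=REUSE_AGE_TXGS, interval=TXG_INTERVAL_SECONDS):
    """Convert a txg-count threshold into wall-clock hours."""
    return age_txgs * interval / 3600


def pick_block(freed_blocks, current_txg, next_untouched_offset,
               min_age=REUSE_AGE_TXGS):
    """Prefer the oldest freed block that is at least min_age txgs old.

    Falls back to previously untouched space only when no freed block
    is old enough, since touching new space is what makes the back end
    allocate another chunk.
    """
    eligible = [b for b in freed_blocks
                if current_txg - b.freed_txg >= min_age]
    if eligible:
        return min(eligible, key=lambda b: b.freed_txg).offset
    return next_untouched_offset
```

With these numbers, reuse_age_hours() works out to 25 hours, which is
where the "about 1 day" figure comes from.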

This strategy would allow the SAN administrator (who is a different
person than the sysadmin) to allocate extra space to servers, while
the sysadmin controls the amount of space really used by means of
quotas.  In the event that there is an emergency need for more space,
the sysadmin can increase the quota and allow more of the allocated
SAN space to be used.  Assuming the block rewrite feature comes to
fruition, this emergency growth could be shrunk back down to the
original size once the surge in demand (or errant process) subsides.

>
> There are a few minor bumps in the road. The ATA PASSTHROUGH
> command, which allows TRIM to pass through the SATA drivers, was
> just integrated into b130. This will be more important to small servers
> than SANs, but the point is that all parts of the software stack need to
> support the effort. As such, it is not clear to me who, if anyone, inside
> Sun is champion for the effort -- it crosses multiple organizational
> boundaries.
>
>>
>> Apart from the technical possibilities, this feature looks really
>> inevitable to me in the long run especially for enterprise customers with
>> high-end SAN as cost is always a major factor in a storage design and it's a
>> huge difference if you have to pay based on the space used vs space
>> allocated (for example).
>
> If the high cost of SAN storage is the problem, then I think there are
> better ways to solve that :-)

The "SAN" could be an OpenSolaris device serving LUNs through COMSTAR.
If those LUNs are used to hold a zpool, the zpool could notify the
LUN that blocks are no longer used and the "SAN" could reclaim those
blocks.  This is just a variant of the same problem faced with
expensive SAN devices that have thin provisioning allocation units
measured in the tens of megabytes instead of hundreds to thousands of
kilobytes.
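
The granularity problem can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
name reclaimable_chunks is made up, free space is modeled as a list of
(offset, length) byte ranges, and I am treating the 42 MB chunk as
42 * 2^20 bytes.  A back end can only take back a chunk when every
byte in it is free, so small scattered frees reclaim nothing.

```python
CHUNK_SIZE = 42 * 1024 * 1024   # assumed: the 42 MB chunk taken as 42 * 2**20 bytes


def reclaimable_chunks(free_extents, chunk_size=CHUNK_SIZE):
    """Return indices of allocation chunks that are entirely free.

    free_extents is a list of (offset, length) byte ranges that the
    filesystem considers free (a hypothetical model, not a ZFS API).
    """
    # Merge touching or overlapping extents so that free space split
    # across several small extents can still cover a whole chunk.
    merged = []
    for start, length in sorted(free_extents):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    chunks = []
    for start, end in merged:
        first = -(-start // chunk_size)   # first chunk starting at or after start
        last = end // chunk_size          # first chunk not fully inside the extent
        chunks.extend(range(first, last))
    return chunks
```

A free extent of exactly one chunk at offset 0 yields chunk 0, but the
same amount of free space shifted by a single byte straddles two
chunks and frees neither of them.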

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
