On Sat, Jan 29, 2011 at 11:31:59AM -0500, Edward Ned Harvey wrote:
> What is the status of ZFS support for TRIM?
[...]

I've no idea, but because I wanted to add such support for FreeBSD/ZFS
for a while now, I'll share my thoughts.

The problem is where to put those operations. Internally, ZFS has the
ZIO_TYPE_FREE request, which represents exactly what we need: the offset
and size to free. It would be best to just pass those requests directly
to the VDEVs, but we can't do that. A transaction group might never be
committed because of a power failure, and then we would have TRIMmed
blocks that we still need after boot.
OK, maybe we could just make such operations part of the transaction
group? No, we can't do that either. If we start committing a transaction
group and execute its TRIM operations, we may still lose power before the
commit completes. TRIM operations on old blocks cannot be undone, so after
reboot we would roll back to a state that references invalid data.
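To make the failure mode in both cases concrete, here is a toy model in
Python. It is only an illustration, not ZFS code: block numbers and txg
contents are made up, and a txg's on-disk state is reduced to the set of
blocks it references. The only question that matters is whether a
TRIMmed block is still referenced by the state the pool would recover to
after a crash.

```python
from typing import Set

# Toy model, not ZFS code: blocks are plain integers and a txg's on-disk
# state is just the set of blocks it references.


def blocks_lost_on_rollback(recovery_live: Set[int], trimmed: Set[int]) -> Set[int]:
    """Blocks that were TRIMmed but are still referenced by the state we
    would boot into after a crash."""
    return recovery_live & trimmed


# txg 4 is the last committed txg; it references blocks 10, 11 and 12.
txg4_live = {10, 11, 12}

# txg 5 frees block 11. Whether we TRIM it as soon as it is freed, or while
# txg 5 is being synced, power can fail before txg 5's commit completes.
trimmed = {11}

# After reboot the pool comes back as txg 4, which still needs block 11.
print(blocks_lost_on_rollback(txg4_live, trimmed))  # {11}
```

A non-empty result means data loss, and both "TRIM on free" and "TRIM
inside the txg" can produce one.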

So why not move TRIM operations to the next transaction group? That's
doable, although we still need to be careful not to TRIM blocks that
were freed in the previous transaction group but are reallocated in the
current one (or, if we do TRIM them, we must TRIM first and then write).
Unfortunately, we don't want to TRIM blocks that soon either. Some disks
lie about cache flush operations, and because of that ZFS tries to keep
freed blocks from the last few transaction groups around, so you can
forcibly rewind to one of the previous txgs if such corruption occurs.
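The "don't TRIM what the current txg reallocated" check is essentially
range subtraction. A minimal sketch, assuming free and allocation records
are plain (offset, size) tuples in bytes (a hypothetical representation,
not the actual ZFS range-tree API):

```python
from typing import List, Tuple

# Hypothetical representation: (offset, size) in bytes.
Range = Tuple[int, int]


def subtract_ranges(freed: List[Range], allocated: List[Range]) -> List[Range]:
    """Return the parts of the freed ranges that no allocated range
    overlaps, i.e. what is still safe to TRIM."""
    result: List[Range] = []
    for f_off, f_size in freed:
        # Work with half-open [start, end) intervals.
        pieces = [(f_off, f_off + f_size)]
        for a_off, a_size in allocated:
            a_start, a_end = a_off, a_off + a_size
            remaining = []
            for start, end in pieces:
                if a_end <= start or a_start >= end:
                    remaining.append((start, end))  # no overlap, keep whole
                    continue
                if start < a_start:
                    remaining.append((start, a_start))  # part before the allocation
                if a_end < end:
                    remaining.append((a_end, end))  # part after the allocation
            pieces = remaining
        result.extend((start, end - start) for start, end in pieces)
    return result


# 16 KiB freed in the previous txg; the current txg reuses the second 4 KiB.
print(subtract_ranges([(0, 16384)], [(4096, 4096)]))
# [(0, 4096), (8192, 8192)]
```

The alternative mentioned above (TRIM first, then write) avoids the
subtraction but requires the TRIM to be ordered before the write I/O.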

My initial idea was to implement 100% reliable TRIM, so that I could
implement secure delete on top of it; e.g., if ZFS is placed on top of a
disk encryption layer, I could implement TRIM in that layer as 'overwrite
the given range with random data'. Making TRIM 100% reliable would be
very hard, IMHO, but in most cases we don't need TRIM to be that perfect.
My current idea is to delay TRIM operations for some number of
transaction groups. For example, if a block is freed in txg=5, I'll send
TRIM for it after txg=15 (if it wasn't reassigned in the meantime). It's
OK if we crash before we get to txg=15, because the only side effect is
that the TRIM is never sent, so the next write to this range might be a
little slower.
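A toy sketch of the delayed scheme, assuming integer block numbers instead
of real ranges and a hypothetical delay of 10 txgs to match the
txg=5/txg=15 example. The queue lives only in memory, so a crash simply
drops the pending TRIMs, which is the acceptable side effect described
above.

```python
from collections import defaultdict
from typing import Dict, List, Set

TRIM_DELAY_TXGS = 10  # hypothetical tunable; the example is txg 5 -> txg 15


class DelayedTrimQueue:
    """Toy model (not ZFS code): defer each TRIM for a fixed number of txgs
    and cancel it if the block is reassigned in the meantime."""

    def __init__(self, delay: int = TRIM_DELAY_TXGS) -> None:
        self.delay = delay
        # due txg -> blocks whose TRIM may be sent once that txg has synced
        self.pending: Dict[int, Set[int]] = defaultdict(set)

    def block_freed(self, block: int, txg: int) -> None:
        self.pending[txg + self.delay].add(block)

    def block_allocated(self, block: int) -> None:
        # Reassigned before its TRIM was sent: drop the pending TRIM.
        for blocks in self.pending.values():
            blocks.discard(block)

    def txg_synced(self, txg: int) -> List[int]:
        """Return blocks whose TRIM is due now that `txg` has synced."""
        due: Set[int] = set()
        for due_txg in [t for t in self.pending if t <= txg]:
            due |= self.pending.pop(due_txg)
        return sorted(due)


q = DelayedTrimQueue()
q.block_freed(100, txg=5)
q.block_freed(101, txg=5)
q.block_allocated(101)  # block 101 gets reused before txg 15
print(q.txg_synced(14))  # []
print(q.txg_synced(15))  # [100]
```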

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
