On Sat, Jan 29, 2011 at 11:31:59AM -0500, Edward Ned Harvey wrote:
> What is the status of ZFS support for TRIM?
[...]
I have no idea, but since I have wanted to add such support to FreeBSD/ZFS for a while now, I'll share my thoughts.

The problem is where to put those operations. ZFS internally has the ZIO_TYPE_FREE request, which represents exactly what we need: an offset and a size to free. It would be best to just pass those requests directly to the VDEVs, but we can't do that. A transaction group might never be committed because of a power failure, and then we would have TRIMed blocks that we still want to use after boot.

OK, maybe we could just make such an operation part of the transaction group? No, we can't do that either. If we start committing a transaction group and execute the TRIM operations, we may still have a power failure, and TRIM operations on old blocks cannot be undone, so we would fall back to invalid data.

So why not move the TRIM operations to the next transaction group? That's doable, although we still need to be careful not to TRIM blocks that were freed in the previous transaction group but are reallocated in the current one (or, if we do TRIM, we TRIM first and then write).

Unfortunately, we don't want to TRIM blocks immediately. Take into account disks that lie about cache flush operations; because of them, ZFS tries to keep freed blocks from the last few transaction groups around, so you can forcibly rewind to one of the previous txgs if such corruption occurs.

My initial idea was to implement 100% reliable TRIM, so that I could implement secure delete using it, e.g. if ZFS is placed on top of a disk encryption layer, I can implement TRIM in this layer as 'overwrite the given range with random data'. Making TRIM 100% reliable will be very hard, IMHO. But in most cases we don't need TRIM to be so perfect.

My current idea is to delay the TRIM operation for some number of transaction groups. For example, if a block is freed in txg=5, I'll send TRIM for it after txg=15 (if it wasn't reallocated in the meantime).
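To make the delayed-TRIM idea concrete, here is a rough Python sketch. It is not ZFS code: the class, the method names and the delay of 10 txgs (taken from the txg=5 / txg=15 example) are all made up for illustration, and real code would have to hook into the ZIO_TYPE_FREE handling and the allocator instead.

```python
from collections import namedtuple

# Hypothetical record of a freed range waiting to be TRIMed.
PendingTrim = namedtuple("PendingTrim", "offset size freed_txg")


class DelayedTrimQueue:
    """Illustrative only: hold freed ranges back for `delay` txgs."""

    def __init__(self, delay=10):
        self.delay = delay      # assumed value, from the txg=5 -> txg=15 example
        self.pending = []

    def block_freed(self, offset, size, txg):
        # Remember the free, but do not TRIM yet.
        self.pending.append(PendingTrim(offset, size, txg))

    def block_allocated(self, offset, size):
        # The range is in use again: drop every pending TRIM that overlaps it.
        end = offset + size
        self.pending = [p for p in self.pending
                        if p.offset + p.size <= offset or p.offset >= end]

    def txg_committed(self, txg):
        # Return the (offset, size) ranges that are now old enough to TRIM.
        ready = [p for p in self.pending if p.freed_txg + self.delay <= txg]
        self.pending = [p for p in self.pending
                        if p.freed_txg + self.delay > txg]
        return [(p.offset, p.size) for p in ready]
```

Dropping a whole pending range when any part of it is reallocated is a simplification: it can only lose a TRIM, never TRIM live data. The pending list lives in memory only, so a crash simply forgets the queued TRIMs.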
This is OK if we crash before we get to txg=15, because the only side effect is that the next write to this range might be a little slower.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss