On Fri, February 4, 2011 09:30, Pawel Jakub Dawidek wrote:
> But in most cases we don't need TRIM to be so perfect. My
> current idea is to delay TRIM operation for some number of transaction
> groups.  For example if block is freed in txg=5, I'll send TRIM for it
> after txg=15 (if it wasn't reassigned in the meantime).  This is ok if
> we crash before we get to txg=15, because the only side-effect is that
> next write to this range might be a little slower.

Off the top of my head, I can think of two instances where non-recent
blocks would be needed:
    * snapshots
    * importing via recovery mode ("zpool import -F mypool")

For the latter, given that each vdev label can have up to 128 uberblocks, 
recovery mode import can go back at least 128 transactions for a single
non-mirrored device, so you'd potentially need to not TRIM at least 128
transactions back for the worst case.

Of course if you have a pair of mirrored vdevs/disks, and each one has 128
uberblocks, that's potentially 256 txgs that you can recover from (and it
goes up the more vdevs you have of course). That may be excessive, but
perhaps there could be a sysctl tunable for the maximum number of txgs to
hold TRIMs back (defaulting to 128? 64? 32?).
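
To make the delay-plus-tunable idea concrete, here is a minimal sketch in
Python. It is not ZFS code, only an illustration of the bookkeeping: the
names DelayedTrimQueue, delay_txgs, free, reallocate and due are all made
up for this example, and the real allocator would track ranges on a vdev
rather than opaque block keys.

```python
from collections import OrderedDict


class DelayedTrimQueue:
    """Hold freed blocks for a number of txgs before they may be TRIMmed.

    Hypothetical sketch: delay_txgs plays the role of the proposed tunable.
    """

    def __init__(self, delay_txgs=128):
        self.delay_txgs = delay_txgs
        # Maps block -> txg in which the block was freed, in insertion order.
        self._pending = OrderedDict()

    def free(self, block, txg):
        """Record that `block` was freed in transaction group `txg`."""
        self._pending.pop(block, None)
        self._pending[block] = txg

    def reallocate(self, block):
        """The block was reassigned before its TRIM was sent: forget it."""
        self._pending.pop(block, None)

    def due(self, current_txg):
        """Return (and remove) blocks freed at least delay_txgs txgs ago."""
        ready = [b for b, freed in self._pending.items()
                 if current_txg - freed >= self.delay_txgs]
        for b in ready:
            del self._pending[b]
        return ready


# Pawel's example: a block freed in txg=5 with a delay of 10 txgs.
q = DelayedTrimQueue(delay_txgs=10)
q.free("blk-a", 5)
q.free("blk-b", 6)
q.reallocate("blk-b")      # reused in the meantime, so never TRIMmed
early = q.due(14)          # nothing is old enough yet
ready = q.due(15)          # blk-a becomes eligible at txg=15
```

If the process dies before due() is called, the pending entries are simply
lost, which matches the "next write might be a little slower" trade-off
above. With txgs committed every 5-30s, a delay of 128 txgs would hold
blocks back for roughly 11 to 64 minutes.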

I'm not sure how ZFS keeps track of snapshots: is there something
in-memory, or is it necessary to walk the tree? Perhaps getting a list of
snapshots, getting the oldest birth time (i.e., smallest txg), and TRIMing
any blocks with a txg lower than that number? Given that txgs are
committed every 5-30s, and I/O isn't done between them, that "idle" time
could be utilized for sending TRIM commands?

Presumably the Oracle folks are looking at this as well internally.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
