On May 27, 2011, at 6:20 AM, Jim Klimov wrote:

> > > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > > boun...@opensolaris.org] On Behalf Of Frank Van Damme
> > > 
> > > On 26-05-11 13:38, Edward Ned Harvey wrote:
> > > But what if you lose it (the vdev)? Would there be a way to
> > > reconstruct the DDT (which you need to be able to delete old,
> > > deduplicated files)?
> > > Let me guess - this requires tracing down all blocks and
> > > depends on an infamous missing feature called BPR
> > > (block-pointer rewrite)? ;)
> > 
> > How is that different from putting your DDT on a hard drive, 
> > which is what we currently do?
> I think you two might be talking about somewhat different ideas
> of implementing such DDT storage.
>  
> One approach might be like what we have now: the DDT blocks are
> spread across your pool of several top-level vdevs and are
> redundantly protected by ZFS raidz or mirroring. If one such
> top-level vdev is lost, the whole pool is faulted or dead.
>  
> Another approach might be more like a dedicated extra device
> (or a mirror/raidz of devices), akin to an L2ARC or, better, a
> ZIL device (more analogies below). This task would need a
> write-oriented medium such as a large-capacity SLC SSD;
> contention for the L2ARC device's link and the potential
> unreliability of MLC flash might make DDT storage a bad
> neighbor for L2ARC SSDs.
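
For concreteness, a rough sketch of the dedicated-device analogy as it
exists today - one SSD carved into a small slog slice and a large cache
slice (pool, controller, and slice names below are hypothetical):

    # s0: small slice as a separate log device (slog)
    zpool add tank log /dev/dsk/c1t2d0s0
    # s1: the rest of the SSD as an L2ARC cache device
    zpool add tank cache /dev/dsk/c1t2d0s1

A dedicated DDT device would presumably be attached the same way, with
its own vdev type.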

I filed an RFE for this about two years ago... I would send a URL, but
Oracle shut down the OpenSolaris bug database interface and left what
remains mostly useless :-(

> Since ZILs are usually treated as write-only devices with a
> low capacity requirement (i.e. 2-4 GB might be more than
> enough), dedicating the rest of even a 20 GB SSD to the
> DDT may be a good investment overall.
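
A rough sanity check on those numbers (the throughput figure and the
per-entry cost are assumptions - ~320 bytes per in-core DDT entry is
the commonly quoted ballpark):

    # slog: a few seconds of sustained synchronous writes
    #   ~200 MB/s x 10 s between txg commits  = ~2 GB, so 2-4 GB fits
    # DDT on the remaining ~16 GB of a 20 GB SSD:
    #   16 GB / ~320 B per entry              = ~50 M entries
    #   50 M entries x 128 KB records         = ~6 TB of unique data
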
>  
> If the ZIL device (mirror) fails, you might need to roll your
> pool back a few transactions, detach the ZIL device, and fall
> back to using HDD blocks in the main pool for the ZIL.
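
Roughly, that recovery might look like this (pool and device names are
hypothetical; removing a log device requires pool version 19 or later):

    # import in recovery mode, rewinding a few txgs if needed
    zpool import -F tank
    # detach the dead slog; the ZIL falls back to the main pool disks
    zpool remove tank c1t2d0s0
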
>  
> Since "zdb -s" can seemingly construct a DDT from scratch,
> and since for reads you still have many references to a single
> on-disk block (DDT is not used for reads, right?) - you can
> reconstruct the DDT for either in-pool or external storage.

Nope. If you lose the DDT, then you lose any or all deduped data.
Today, the DDT is treated like metadata, which means there are always
at least 2 copies in the pool.
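
You can at least measure the DDT you have, or simulate one before
enabling dedup; a couple of zdb invocations (pool name hypothetical):

    # print statistics for the pool's existing DDT
    zdb -DD tank
    # simulate dedup on a non-deduped pool and show the resulting DDT
    zdb -S tank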

> That might take some downtime, true. But if coupled with
> offline dedup (as discussed in another thread) running in the
> background, maybe not.
>  
> One thing to think about, though: what would we do when the
> dedicated DDT storage overflows - write the extra entries
> into the HDD pool like we do now?

Other designs that use fixed-size DDT areas suffer from that design
limitation -- once the area fills, they no longer dedup.

> (BTW, what do we do with a dedicated ZIL device - flush the
> TXG early?)

No, just write into the pool. Frankly, I don't see this as a problem for
real-world machines.
 -- richard
