On Tue, Jun 15, 2010 at 7:28 PM, David Magda <dma...@ee.ryerson.ca> wrote:
> On Jun 15, 2010, at 14:20, Fco Javier Garcia wrote:
>
>> I think dedup may have its greatest appeal in VDI environments (think
>> about an environment where 85% of the data that the virtual machine needs
>> is in ARC or L2ARC... it's like a dream... almost instantaneous response...
>> and you can boot a new machine in a few seconds)...
>
> This may also be accomplished by using snapshots and clones of data sets. At
> least for OS images: user profiles and documents could be something else
> entirely.

It all depends on the nature of the VDI environment.  If the VMs are
regenerated on each login, the snapshot + clone mechanism is
sufficient.  Deduplication is not needed.  However, if VMs have a long
life and get periodic patches and other software updates,
deduplication will be required if you want to remain at somewhat
constant storage utilization.

It probably makes a lot of sense to be sure that swap or page files
are on a non-dedup dataset.  Executables and shared libraries
shouldn't be getting paged out to it and the likelihood that multiple
VMs page the same thing to swap or a page file is very small.

> Another situation that comes to mind is perhaps as the back-end to a mail
> store: if you send out a message(s) with an attachment(s) to a lot of
> people, the attachment blocks could be deduped (and perhaps compressed as
> well, since base-64 adds 1/3 overhead).

It all depends on how this is stored.  If the attachments are stored
like they were in 1990 as part of an mbox format, you will be very
unlikely to get the proper block alignment.  Even storing the message
body (including headers) in the same file as the attachment may not
align the attachments because the mail headers may be different (e.g.
different recipients' messages took different paths, some were
forwarded, etc.).  If the attachments are stored in separate files or
a database format is used that stores attachments separate from the
message (with matching database + zfs block size) things may work out
favorably.
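
To illustrate the alignment problem, here is a rough sketch (not ZFS
code) that hashes a file in fixed-size blocks, which is approximately
how block-level dedup sees it. BLOCK_SIZE, fake_attachment and
shared_blocks are names I made up for the illustration, and 128K is
assumed as the recordsize.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumption: dataset recordsize of 128K


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block, starting at offset 0."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count blocks of b whose hash also appears among the blocks of a."""
    seen = set(block_hashes(a, block_size))
    return sum(1 for h in block_hashes(b, block_size) if h in seen)


def fake_attachment(n_blocks: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Deterministic, non-repeating filler standing in for an attachment."""
    out = bytearray()
    counter = 0
    while len(out) < n_blocks * block_size:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_blocks * block_size])


# Same attachment, but headers of different lengths in front of it.
attachment = fake_attachment(4)
msg_a = b"To: alice\r\n\r\n" + attachment
msg_b = b"To: bob, carol\r\n\r\n" + attachment

same_file = shared_blocks(msg_a, msg_b)            # headers shift every block
separate_files = shared_blocks(attachment, attachment)  # stored on its own
```

With the headers in the same file, every block boundary lands at a
different offset into the attachment, so none of the block hashes
match. Stored on its own, every block of the attachment matches.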

However, a system that detaches attachments and stores them separately
may just as well store each one in a file named after its SHA256 hash,
assuming that file doesn't already exist.  If it does exist, it can just
increment a reference count.  In other words, an intelligent mail
system should already dedup.  Or at least that is how I would have
written it for the last decade or so...
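
A minimal sketch of what I mean, under some loud assumptions:
AttachmentStore and its methods are hypothetical names, the reference
counts live only in memory, and there is no locking or crash safety.
A real mail store would keep the counts in its database.

```python
import hashlib
from pathlib import Path


class AttachmentStore:
    """Hypothetical store: one file per unique attachment, named by SHA-256."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.refs: dict[str, int] = {}  # digest -> number of messages using it

    def put(self, data: bytes) -> str:
        """Store an attachment, or just bump the count if it is already there."""
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():
            path.write_bytes(data)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference; delete the file when nobody uses it any more."""
        self.refs[digest] -= 1
        if self.refs[digest] == 0:
            del self.refs[digest]
            (self.root / digest).unlink()

    def refcount(self, digest: str) -> int:
        return self.refs.get(digest, 0)
```

Sending the same attachment to a thousand recipients then costs one
file and a count of 1000, regardless of what the filesystem underneath
does about dedup.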

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
