To do dedup properly, it seems like there would have to be some overly
complicated methodology for a sort of delayed dedup of the data. For speed,
you'd want your writes to go straight into the cache and get flushed out as
quickly as possible, keeping everything as ACID as possible. Then a dedup
scrubber would take what was written, do the voodoo magic of checksumming
the new data, scanning the tree to see if there are any matches, locking
the duplicates, running the usage counters up or down for that block of
data, swapping out inodes, and marking the duplicate data as free space.
It's a lofty goal, but one that is doable. I guess this is only necessary
if deduplication is done at the file level. If done at the block level, it
could possibly be done on the fly, what with the already implemented
checksumming at the block level, but then your reads will suffer because
pieces of files can potentially be spread all over hell and half of
Georgia on the vdevs. Deduplication is going to require the judicious
application of hallucinogens and man-hours. I expect that someone is up
to the task.
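
Just to make the block-level idea concrete, here's a rough sketch in
plain C. Every name in it is made up for illustration (this is nothing
like the actual ZFS internals): keep a table keyed by the block
checksum, and on each write either bump the refcount of an existing
entry or record the new block as the canonical copy.

    /*
     * Hypothetical sketch only -- not real ZFS code. A table keyed by
     * the block checksum maps to the on-disk address of the canonical
     * copy plus a reference count. (Error handling omitted.)
     */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct dedup_entry {
            uint8_t checksum[32];      /* e.g. SHA-256 of the block */
            uint64_t block_addr;       /* address of the canonical copy */
            uint64_t refcount;         /* pointers sharing this block */
            struct dedup_entry *next;
    } dedup_entry_t;

    #define DDT_BUCKETS 65536
    static dedup_entry_t *ddt[DDT_BUCKETS];

    static unsigned
    bucket_of(const uint8_t *cksum)
    {
            /* Cheap bucket index from the first two checksum bytes. */
            return ((cksum[0] | (cksum[1] << 8)) % DDT_BUCKETS);
    }

    /*
     * On write: if this checksum is already in the table, bump the
     * refcount and return the existing block's address so the new copy
     * can be freed; otherwise record the new block as canonical.
     */
    uint64_t
    dedup_write(const uint8_t *cksum, uint64_t new_addr)
    {
            dedup_entry_t *e;
            unsigned b = bucket_of(cksum);

            for (e = ddt[b]; e != NULL; e = e->next) {
                    if (memcmp(e->checksum, cksum,
                        sizeof (e->checksum)) == 0) {
                            e->refcount++;
                            return (e->block_addr); /* duplicate found */
                    }
            }
            e = malloc(sizeof (*e));
            memcpy(e->checksum, cksum, sizeof (e->checksum));
            e->block_addr = new_addr;
            e->refcount = 1;
            e->next = ddt[b];
            ddt[b] = e;
            return (new_addr);      /* first copy of this block */
    }

A real version would also have to verify the block contents on a
checksum hit (unless you trust the hash completely), persist the table
across reboots, and decrement/free entries when blocks go away -- which
is roughly where the hallucinogens come in.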

On Tue, Jul 22, 2008 at 10:39 AM, <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:
>
> > > Hi All,
> > > Is there any hope for deduplication on ZFS?
> > > Mertol Ozyoney
> > > Storage Practice - Sales Manager
> > > Sun Microsystems
> > > Email [EMAIL PROTECTED]
> >
> > There is always hope.
> >
> > Seriously though, looking at
> > http://en.wikipedia.org/wiki/Comparison_of_revision_control_software
> > there are a lot of choices of how we could implement this.
> >
> > SVN/K, Mercurial and Sun Teamware all come to mind. Simply ;) merge
> > one of those with ZFS.
> >
> > It _could_ be as simple (with SVN as an example) as using directory
> > listings to produce files which were then 'diffed'. You could then
> > view the diffs as though they were changes made to lines of source code.
> >
> > Just add a "tree" subroutine to allow you to grab all the diffs that
> > referenced changes to file 'xyz' and you would have easy access to
> > all the changes of a particular file (or directory).
> >
> > With a speed-optimized ability to use ZFS snapshots with the
> > "tree" subroutine to roll back a single file (or directory), you
> > could undo / redo your way through the filesystem.
> >
>
>
> dedup is not revision control; you seem to completely misunderstand
> the problem.
>
>
>
> > Using an LKCD
> > (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html) you
> > could "sit out" on the play and watch from the sidelines --
> > returning to the OS when you thought you were 'safe' (and if not,
> > jumping back out).
> >
>
> Now it seems you have veered even further off course.  What are you
> implying the LKCD has to do with ZFS, Solaris, or dedup, let alone
> revision control software?
>
> -Wade
>
>



-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
