On 5/4/2011 4:44 PM, Tim Cook wrote:


On Wed, May 4, 2011 at 6:36 PM, Erik Trimble <erik.trim...@oracle.com> wrote:

    On 5/4/2011 4:14 PM, Ray Van Dolson wrote:

        On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote:

            On Wed, May 4, 2011 at 12:29 PM, Erik Trimble <erik.trim...@oracle.com> wrote:

                I suspect that NetApp does the following to limit their
                resource usage: they presume the presence of some sort
                of cache that can be dedicated to the DDT (and, since
                they also control the hardware, they can make sure there
                is always one present).  Thus, they can make their code

            AFAIK, NetApp has more restrictive requirements about how
            much data can be dedup'd on each type of hardware.

            See page 29 of http://media.netapp.com/documents/tr-3505.pdf -
            Smaller pieces of hardware can only dedup 1TB volumes, and
            even the big-daddy filers will only dedup up to 16TB per
            volume, even if the volume size is 32TB (the largest volume
            available for dedup).

            NetApp solves the problem by putting rigid constraints around
            it, whereas ZFS lets you enable dedup for any size dataset.
            Both approaches have limitations, and it sucks when you hit
            them.

            -B

        That is very true, although it's worth mentioning that you can
        have quite a few dedupe/SIS-enabled FlexVols on even the
        lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled
        FlexVols).

    Stupid question - can you hit all the various SIS volumes at once
    and not get horrid performance penalties?

    If so, I'm almost certain NetApp is doing post-write dedup.  That
    way, the strictly controlled max FlexVol size helps keep the
    resource requirements down, as it can round-robin the post-write
    dedup across the FlexVols in turn.

    ZFS's problem is that it needs ALL the resources for EACH pool ALL
    the time, and can't really share them well if it expects to keep
    performance from tanking... (no pun intended)


On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely.

Outside of those two 2000-series models, the reason there are dedup limits isn't performance.

--Tim

Indirectly, yes, it is performance, since NetApp has plainly chosen post-write dedup as a way to limit the hardware resources it requires. The dedup limits on volume size are almost certainly driven by the local RAM requirements of a post-write dedup pass.
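
To put some purely illustrative numbers on that claim: if you assume
something like 4K blocks and a 32-byte fingerprint record per block (my
guesses, not NetApp's published figures), the fingerprint metadata a
post-write pass has to sort and compare grows linearly with the volume
size, so capping the volume caps the worst case. A quick back-of-envelope
sketch in Python:

# Rough sketch: fingerprint metadata a post-write dedup pass has to
# sort/compare for a single volume.  Block size and per-record size are
# illustrative assumptions, not NetApp's actual numbers; a real
# implementation would keep most of this on disk and stream it, but the
# working set still scales with volume size.

def fingerprint_bytes(volume_bytes, block_size=4 * 1024, record_size=32):
    """One fingerprint record per allocated block in the volume."""
    return (volume_bytes // block_size) * record_size

TB = 2**40
for vol_tb in (1, 16, 32):
    gib = fingerprint_bytes(vol_tb * TB) / 2.0**30
    print("%2d TB volume -> ~%d GiB of fingerprint records" % (vol_tb, gib))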

It also looks like NetApp isn't providing a dedicated DDT cache, which means that when the filer is doing dedup, it's consuming the normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be very surprised if you didn't see a noticeable performance hit while the appliance is running its dedup scans.
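
For comparison, the rough rule of thumb that gets quoted on this list for
ZFS's in-line dedup is on the order of 320 bytes of in-core DDT entry per
unique block (treat that as a ballpark, not an exact constant). With the
default 128K recordsize that works out to roughly 2.5GB of RAM per TB of
unique data, and ZFS wants that resident (or at least in ARC/L2ARC) the
whole time the pool is in use, not just during a scan:

# Ballpark in-core DDT size for ZFS in-line dedup.  The ~320 bytes per
# entry figure is the rough number commonly cited on this list, not an
# exact constant.

def zfs_ddt_gib(unique_bytes, recordsize=128 * 1024, entry_bytes=320):
    """Approximate in-core DDT size for a given amount of unique data."""
    entries = unique_bytes // recordsize
    return entries * entry_bytes / 2.0**30

TB = 2**40
for tb in (1, 4, 16):
    print("%2d TB of unique data -> ~%.1f GiB of DDT" % (tb, zfs_ddt_gib(tb * TB)))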

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
