Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Erik Trimble

On 12/12/2011 12:23 PM, Richard Elling wrote:

On Dec 11, 2011, at 2:59 PM, Mertol Ozyoney wrote:


Not exactly. What is dedup'ed is the stream only, which is in fact not very
efficient. Real dedup-aware replication takes the necessary steps to
avoid sending a block that already exists on the other storage system.

These exist outside of ZFS (e.g., rsync) and scale poorly.

Given that dedup is done at the pool level and ZFS send/receive is done at
the dataset level, how would you propose implementing a dedup-aware
ZFS send command?
  -- richard


I'm with Richard.

There is no practical "optimally efficient" way to dedup a stream from 
one system to another.  The only way to do so would be to have total 
information about the pool composition on BOTH the receiver and sender 
side.  That would involve sending the checksums for the complete pool 
blocks between the receiver and sender, which is a non-trivial overhead, 
and, indeed, would usually be far worse than simply doing what 'zfs send 
-D' does now (dedup the sending stream itself).  The only possible way 
that such a scheme would work would be if the receiver and sender were 
the same machine (note: not VMs or Zones on the same machine, but the 
same OS instance, since you would need the DDT to be shared).  And, 
that's not a use case that 'zfs send' is generally optimized for - that 
is, while it's entirely possible, it's not the primary use case for 'zfs 
send'.
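
For reference, a minimal sketch of that baseline - an internally-deduped 
stream piped to a remote receiver (pool, dataset, and host names here are 
hypothetical):

   # zfs snapshot tank/data@today
   # zfs send -D tank/data@today | ssh backuphost zfs receive -F backup/data

The -D flag dedups the stream itself; nothing about the receiver's DDT is 
ever consulted.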


Given the overhead of network communications, there's no way that 
sending block checksums between hosts can ever be more efficient than 
just sending the self-deduped whole stream (except in pedantic cases).  
Let's look at three possible implementations (all assume that the local 
sending machine does its own dedup - that is, the stream-to-be-sent is 
already deduped within itself):


(1) when constructing the stream, every time a block is read from a 
fileset (or volume), its checksum is sent to the receiving machine. The 
receiving machine then looks up that checksum in its DDT, and sends back 
a "needed" or "not-needed" reply to the sender. While this lookup is 
being done, the sender must hold the original block in RAM, and cannot 
write it out to the to-be-sent-stream.


(2) The sending machine reads all the to-be-sent blocks, creates a 
stream, AND creates a checksum table (a mini-DDT, if you will).  The 
sender communicates to the receiver this mini-DDT.  The receiver diffs 
this against its own master pool DDT, and then sends back an edited 
mini-DDT containing only the checksums that match blocks which aren't on 
the receiver.  The original sending machine must then go back and 
re-construct the stream (either as a whole, or parse the stream as it is 
being sent) to leave out the unneeded blocks.


(3) some combo of #1 and #2 where several checksums are stuffed into a 
packet, and sent over the wire to be checked at the destination, with 
the receiver sending back only those to be included in the stream.



In the first scenario, you produce a huge amount of small network packet 
traffic, which trashes network throughput, with no real expectation that 
the reduction in the send stream will be worth it.  In the second case, 
you induce a huge amount of latency into the construction of the sending 
stream - that is, the "sender" has to wait around and then spend a 
non-trivial amount of processing power on essentially double processing 
the send stream, when, in the current implementation, it just sends out 
stuff as soon as it gets it.  The third scenario is only an optimization 
of #1 and #2, and doesn't avoid the pitfalls of either.
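
To put rough, purely illustrative numbers on the first scenario: with 128 KB 
blocks, a 1 TB stream is on the order of 8 million blocks; even at an 
optimistic 0.5 ms network round trip per checksum lookup, the serialized 
lookups alone would add roughly 8,000,000 x 0.5 ms, or about 67 minutes of 
waiting (assumed figures, not measurements) - which is why the checksum 
chatter tends to cost more than the bandwidth it might save.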


That is, even if ZFS did pool-level sends, you're still trapped by the 
need to share the DDT, which induces an overhead that can't be 
reasonably made up vs simply sending an internally-deduped source stream 
in the first place.  I'm sure I can construct an instance where such DDT 
sharing would be better than the current 'zfs send' implementation; I'm 
just as sure that such an instance would be the small minority of usage, 
and that such a required implementation would radically alter the 
"typical" use case's performance to the negative.


In any case, as 'zfs send' works on filesets and volumes, and ZFS 
maintains DDT information on a pool-level, there's no way to share an 
existing whole DDT between two systems (and, given the potential size of 
a pool-level DDT, that's a bad idea anyway).


I see no ability to optimize the 'zfs send/receive' concept beyond what 
is currently done.


-Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.



On 12/12/2011 3:02 PM, Gary Driggs wrote:

On Dec 12, 2011, at 11:42 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:


please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and ZIL(SSD)

Do those appliances also use the F20 PCIe flash cards?
No, these controllers need the slots for SAS HBAs to build an HA-cluster 
configuration, or for FCoE, FC, or 10GbE HBAs.

The 7120 supports only logzilla (ZIL).
The 7320 (X4170M2 head) supports readzilla (L2ARC) and an 18GB logzilla (ZIL).
The 7420 (X4470M2 head) supports readzilla (500GB or 1TB) and logzilla.

I know the
Exadata storage cells use them but they aren't utilizing ZFS in the
Linux version of the X2-2. Has that changed with the Solaris x86
versions of the appliance? Also, does OCZ or someone make an
equivalent to the F20 now?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.wordpress.com/
http://laotsao.blogspot.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Richard Elling
On Dec 11, 2011, at 2:59 PM, Mertol Ozyoney wrote:

> Not exactly. What is dedup'ed is the stream only, which is in fact not very
> efficient. Real dedup-aware replication takes the necessary steps to
> avoid sending a block that already exists on the other storage system.

These exist outside of ZFS (e.g., rsync) and scale poorly.

Given that dedup is done at the pool level and ZFS send/receive is done at
the dataset level, how would you propose implementing a dedup-aware
ZFS send command?
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Gary Driggs
On Dec 12, 2011, at 11:42 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:

> please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and ZIL(SSD)

Do those appliances also use the F20 PCIe flash cards? I know the
Exadata storage cells use them but they aren't utilizing ZFS in the
Linux version of the X2-2. Has that changed with the Solaris x86
versions of the appliance? Also, does OCZ or someone make an
equivalent to the F20 now?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Albert Chin
On Mon, Dec 12, 2011 at 03:01:08PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." 
wrote:
> 4c@2.4ghz

Yep, that's the plan. Thanks.

> On 12/12/2011 2:44 PM, Albert Chin wrote:
> >On Mon, Dec 12, 2011 at 02:40:52PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) 
> >Ph.D." wrote:
> >>please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and
> >>ZIL(SSD)
> >>maybe try the ZFS simulator SW
> >Good point. Thanks.
> >
> >>regards
> >>
> >>On 12/12/2011 2:28 PM, Albert Chin wrote:
> >>>We're preparing to purchase an X4170M2 as an upgrade for our existing
> >>>X4100M2 server for ZFS, NFS, and iSCSI. We have a choice for CPU, some
> >>>more expensive than others. Our current system has a dual-core 1.8Ghz
> >>>Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649
> >>>2.53Ghz CPU or 4-core Intel E5620 2.4Ghz CPU would be more than
> >>>enough. Based on what we're using the system for, it should be more
> >>>I/O bound than CPU bound. We are doing compression in ZFS but that
> >>>shouldn't be too CPU intensive. Seems we should be caring more about
> >>>more cores than high Ghz.
> >>>
> >>>Recommendations?
> >>>
> >>-- 
> >>Hung-Sheng Tsao Ph D.
> >>Founder & Principal
> >>HopBit GridComputing LLC
> >>cell: 9734950840
> >>http://laotsao.wordpress.com/
> >>http://laotsao.blogspot.com/
> >>
> >
> 
> -- 
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.

4c@2.4ghz

On 12/12/2011 2:44 PM, Albert Chin wrote:

On Mon, Dec 12, 2011 at 02:40:52PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." 
wrote:

please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and
ZIL(SSD)
maybe try the ZFS simulator SW

Good point. Thanks.


regards

On 12/12/2011 2:28 PM, Albert Chin wrote:

We're preparing to purchase an X4170M2 as an upgrade for our existing
X4100M2 server for ZFS, NFS, and iSCSI. We have a choice for CPU, some
more expensive than others. Our current system has a dual-core 1.8Ghz
Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649
2.53Ghz CPU or 4-core Intel E5620 2.4Ghz CPU would be more than
enough. Based on what we're using the system for, it should be more
I/O bound than CPU bound. We are doing compression in ZFS but that
shouldn't be too CPU intensive. Seems we should be caring more about
more cores than high Ghz.

Recommendations?


--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.wordpress.com/
http://laotsao.blogspot.com/






--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.wordpress.com/
http://laotsao.blogspot.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Albert Chin
On Mon, Dec 12, 2011 at 02:40:52PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." 
wrote:
> please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and
> ZIL(SSD)
> maybe try the ZFS simulator SW

Good point. Thanks.

> regards
> 
> On 12/12/2011 2:28 PM, Albert Chin wrote:
> >We're preparing to purchase an X4170M2 as an upgrade for our existing
> >X4100M2 server for ZFS, NFS, and iSCSI. We have a choice for CPU, some
> >more expensive than others. Our current system has a dual-core 1.8Ghz
> >Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649
> >2.53Ghz CPU or 4-core Intel E5620 2.4Ghz CPU would be more than
> >enough. Based on what we're using the system for, it should be more
> >I/O bound than CPU bound. We are doing compression in ZFS but that
> >shouldn't be too CPU intensive. Seems we should be caring more about
> >more cores than high Ghz.
> >
> >Recommendations?
> >
> 
> -- 
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
> 

> 


-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
Please check out the ZFS appliance 7120 spec: 2.4GHz / 24GB memory and 
ZIL (SSD).

Maybe try the ZFS simulator SW.
Regards




On 12/12/2011 2:28 PM, Albert Chin wrote:

We're preparing to purchase an X4170M2 as an upgrade for our existing
X4100M2 server for ZFS, NFS, and iSCSI. We have a choice for CPU, some
more expensive than others. Our current system has a dual-core 1.8Ghz
Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649
2.53Ghz CPU or 4-core Intel E5620 2.4Ghz CPU would be more than
enough. Based on what we're using the system for, it should be more
I/O bound than CPU bound. We are doing compression in ZFS but that
shouldn't be too CPU intensive. Seems we should be caring more about
more cores than high Ghz.

Recommendations?



--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.wordpress.com/
http://laotsao.blogspot.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Albert Chin
We're preparing to purchase an X4170M2 as an upgrade for our existing
X4100M2 server for ZFS, NFS, and iSCSI. We have a choice for CPU, some
more expensive than others. Our current system has a dual-core 1.8Ghz
Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649
2.53Ghz CPU or 4-core Intel E5620 2.4Ghz CPU would be more than
enough. Based on what we're using the system for, it should be more
I/O bound than CPU bound. We are doing compression in ZFS but that
shouldn't be too CPU intensive. Seems we should be caring more about
more cores than high Ghz.

Recommendations?

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4540 disk replacement

2011-12-12 Thread Tony Schreiner
Thanks for that. Success. I was apprehensive about prying too hard.
Tony

On Dec 12, 2011, at 11:53 AM, Edmund White wrote:

> You need to pry the drive sled off of the disk once the screws are
> removed. There are two or four notches that hold onto the disk. You'll end
> up spreading the carrier frame slightly.
> 
> -- 
> Edmund White
> ewwh...@mac.com
> 
> 
> 
> 
> On 12/12/11 5:25 PM, "Tony Schreiner"  wrote:
> 
>> Sorry for the off-topic question.
>> I'm needing to replace a disk in a x4540 zfs file system. I have
>> replacement ST31000NSSUN disks, but it's not obvious to me how to
>> separate the original disk from its drive sled; it seems to be attached
>> by more than the usual 4 screws. Is it meant to be separated? I've looked
>> at the x4540 user guide but it does not say anything about it.
>> 
>> Tony Schreiner
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4540 disk replacement

2011-12-12 Thread Edmund White
You need to pry the drive sled off of the disk once the screws are
removed. There are two or four notches that hold onto the disk. You'll end
up spreading the carrier frame slightly.

-- 
Edmund White
ewwh...@mac.com




On 12/12/11 5:25 PM, "Tony Schreiner"  wrote:

>Sorry for the off-topic question.
>I'm needing to replace a disk in a x4540 zfs file system. I have
>replacement ST31000NSSUN disks, but it's not obvious to me how to
>separate the original disk from its drive sled; it seems to be attached
>by more than the usual 4 screws. Is it meant to be separated? I've looked
>at the x4540 user guide but it does not say anything about it.
>
>Tony Schreiner
>___
>zfs-discuss mailing list
>zfs-discuss@opensolaris.org
>http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sun X4540 disk replacement

2011-12-12 Thread Tony Schreiner
Sorry for the off-topic question.
I'm needing to replace a disk in a x4540 zfs file system. I have replacement 
ST31000NSSUN disks, but it's not obvious to me how to separate the original 
disk from its drive sled; it seems to be attached by more than the usual 4 
screws. Is it meant to be separated? I've looked at the x4540 user guide but it 
does not say anything about it.

Tony Schreiner
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Jim Klimov

On 2011-12-12 19:03, Pawel Jakub Dawidek wrote:

On Sun, Dec 11, 2011 at 04:04:37PM +0400, Jim Klimov wrote:

I would not be surprised to see that there is some disk IO
adding delays for the second case (read of a deduped file
"clone"), because you still have to determine references
to this second file's blocks, and another path of on-disk
blocks might lead to it from a separate inode in a separate
dataset (or I might be wrong). Reading this second path of
pointers to the same cached data blocks might decrease speed
a little.


As I said, the ZFS reading path involves no dedup code. Not at all.


I am not sure if we contradicted each other ;)

What I meant was that the ZFS reading path involves reading
logical data blocks at some point, consulting the cache(s)
to see whether a block is already cached (and up-to-date). These blocks
are addressed by some unique ID, and now with dedup there are
several pointers to the same block.

So, basically, reading a file involves reading ZFS metadata,
determining data block IDs, fetching them from disk or cache.

Indeed, this does not need to be dedup-aware; but if the other
chain of metadata blocks points to the same data or metadata blocks
which were already cached (for whatever reason, not limited to
dedup) - this is where the read-speed boost appears.
Likewise, if some blocks are not cached, such as metadata
needed to determine the second file's block IDs, this incurs
disk IOs and may decrease overall speed.

That's why I proposed redoing the test with re-reading both
files - now all relevant data and metadata would be cached
and we might see a bit faster read speed.

Just for kicks ;)
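
A minimal sketch of that second pass, reusing the /foo/a and /foo/b files 
from Pawel's earlier test (the timings are whatever your system produces):

   # dd if=/foo/a of=/dev/null bs=1m
   # dd if=/foo/b of=/dev/null bs=1m

By now both files' data and the relevant metadata should already sit in the 
ARC, so both reads should come back at roughly the cached speed.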

//Jim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Brad Diggs
Thanks everyone for your input on this thread.  It sounds like there is 
sufficient weight behind the affirmative that I will include this methodology 
in my performance analysis test plan.  If the performance goes well, I will 
share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread, the L1 and 
L2 ARC detect and prevent streaming reads such as those from dd from 
populating the cache.  See my previous blog post at the link below for a way 
around this protective caching control of ZFS.

http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad
Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

> You can see the original ARC case here:
> http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt
>
> On 8 Dec 2011, at 16:41, Ian Collins wrote:
>
>> On 12/ 9/11 12:39 AM, Darren J Moffat wrote:
>>> On 12/07/11 20:48, Mertol Ozyoney wrote:
>>>> Unfortunately the answer is no. Neither L1 nor L2 cache is dedup aware.
>>>> The only vendor I know that can do this is NetApp.
>>>> In fact, most of our functions, like replication, are not dedup aware.
>>>> For example, technically it's possible to optimize our replication so
>>>> that it does not send data chunks if a data chunk with the same checksum
>>>> exists in the target, without enabling dedup on target and source.
>>>
>>> We already do that with 'zfs send -D':
>>>
>>>   -D  Perform dedup processing on the stream. Deduplicated
>>>       streams cannot be received on systems that do not
>>>       support the stream deduplication feature.
>>
>> Is there any more published information on how this feature works?
>>
>> -- Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Mertol Ozyoney
I am almost sure that in cache things are still hydrated. There is an
outstanding RFE for this, and while I am not sure, I think this feature will
be implemented sooner or later. And in theory there will be little
benefit, as most dedup'ed shares are used for archive purposes...

PS: NetApps do have significantly bigger problems in the caching department,
like virtually having no L1 cache. However, it's also my duty to know where
they have an advantage...

Br
Mertol 
 
 
Mertol Özyöney | Storage Sales
Mobile: +90 533 931 0752
Email: mertol.ozyo...@oracle.com







On 12/10/11 4:05 PM, "Pawel Jakub Dawidek"  wrote:

>On Wed, Dec 07, 2011 at 10:48:43PM +0200, Mertol Ozyoney wrote:
>> Unfortunately the answer is no. Neither L1 nor L2 cache is dedup aware.
>> 
>> The only vendor I know that can do this is NetApp.
>
>And you really work at Oracle?:)
>
>The answer is definitely yes. The ARC caches on-disk blocks and dedup just
>references those blocks. When you read, dedup code is not involved at all.
>Let me show it to you with a simple test:
>
>Create a file (dedup is on):
>
>   # dd if=/dev/random of=/foo/a bs=1m count=1024
>
>Copy this file so that it is deduped:
>
>   # dd if=/foo/a of=/foo/b bs=1m
>
>Export the pool so all cache is removed and reimport it:
>
>   # zpool export foo
>   # zpool import foo
>
>Now let's read one file:
>
>   # dd if=/foo/a of=/dev/null bs=1m
>   1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec)
>
>We read file 'a' and all its blocks are in cache now. The 'b' file
>shares all the same blocks, so if ARC caches blocks only once, reading
>'b' should be much faster:
>
>   # dd if=/foo/b of=/dev/null bs=1m
>   1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec)
>
>Now look at it, 'b' was read 12.5 times faster than 'a' with no disk
>activity. Magic?:)
>
>-- 
>Pawel Jakub Dawidek   http://www.wheelsystems.com
>FreeBSD committer http://www.FreeBSD.org
>Am I Evil? Yes, I Am! http://yomoli.com
>___
>zfs-discuss mailing list
>zfs-discuss@opensolaris.org
>http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Mertol Ozyoney
Not exactly. What is dedup'ed is the stream only, which is in fact not very
efficient. Real dedup-aware replication takes the necessary steps to
avoid sending a block that already exists on the other storage system.


 
 
Mertol Özyöney | Storage Sales
Mobile: +90 533 931 0752
Email: mertol.ozyo...@oracle.com







On 12/8/11 1:39 PM, "Darren J Moffat"  wrote:

>On 12/07/11 20:48, Mertol Ozyoney wrote:
>> Unfortunately the answer is no. Neither L1 nor L2 cache is dedup aware.
>>
>> The only vendor I know that can do this is NetApp.
>>
>> In fact, most of our functions, like replication, are not dedup aware.
>
>> For example, technically it's possible to optimize our replication so that
>> it does not send data chunks if a data chunk with the same checksum
>> exists in the target, without enabling dedup on target and source.
>
>We already do that with 'zfs send -D':
>
>  -D
>
>  Perform dedup processing on the stream. Deduplicated
>  streams  cannot  be  received on systems that do not
>  support the stream deduplication feature.
>
>
>
>
>-- 
>Darren J Moffat


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Pawel Jakub Dawidek
On Sun, Dec 11, 2011 at 04:04:37PM +0400, Jim Klimov wrote:
> I would not be surprised to see that there is some disk IO
> adding delays for the second case (read of a deduped file
> "clone"), because you still have to determine references
> to this second file's blocks, and another path of on-disk
> blocks might lead to it from a separate inode in a separate
> dataset (or I might be wrong). Reading this second path of
> pointers to the same cached data blocks might decrease speed
> a little.

As I said, the ZFS reading path involves no dedup code. Not at all.
The proof would be being able to boot from ZFS with dedup turned on
even though the ZFS boot code has zero dedup code in it. Another proof would
be the ZFS source code.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] does log device (ZIL) require a mirror setup?

2011-12-12 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Thomas Nau
> 
> We use a STEC ZeusRAM as a log device for a 200TB RAID-Z2 pool.
> As they are supposed to be read only after a crash or when booting and
> those nice things are pretty expensive I'm wondering if mirroring
> the log devices is a "must / highly recommended"

Assuming you're running a recent version of zfs (zpool > 20 or so)...

The decision to mirror or not to mirror the log device hinges around one
single solitary failure condition...

In normal operation, a log device is write only.  Never gets read until
after an ungraceful system crash.  Unfortunately, it is sometimes possible
for flash memory to enter a failure state which is undetected by writes, and
only detected upon reads.  In this state, you effectively have no log
device, but you think you do.  If you're in that failure state and you have
an ungraceful crash, then you lose whatever you thought you had in the log.

Maybe it will help if you periodically remove the log device, and then read
& write the whole log device to verify it's operational (be sure to actually
detect failures if any) and then re-add the log device to the pool.  Seems
logical.  Probably nobody's ever tested it.
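
An untested sketch of that check, with made-up pool and device names (c4t0d0 
being the log device), might look like:

   # zpool remove tank c4t0d0
   # dd if=/dev/urandom of=/var/tmp/slogtest bs=1024k count=1024
   # dd if=/var/tmp/slogtest of=/dev/rdsk/c4t0d0s0 bs=1024k
   # dd if=/dev/rdsk/c4t0d0s0 of=/var/tmp/slogread bs=1024k count=1024
   # digest -a sha1 /var/tmp/slogtest /var/tmp/slogread
   # zpool add tank log c4t0d0

If the two digests match and dd reports no errors, the device at least 
survives a full write/read pass; it still proves nothing about the failure 
mode described above showing up later.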

If you have a failed unmirrored log device at the same time as an ungraceful
system crash, then you lose data.  (Up to 30 sec, or 5 sec worth, depending
on your system.)

Your decision to mirror or not to mirror all hinges around your fear of the
aforementioned coincidence of log device failure & ungraceful system crash.
Bear in mind, that mirroring does not eliminate the possibility of both log
devices being in the same undetected failure state.  It doesn't eliminate
the problem, only reduces the probability.
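
If you do decide to mirror, attaching a second device to an existing, 
unmirrored log device is a one-liner (device names hypothetical):

   # zpool attach tank c4t0d0 c4t1d0
   # zpool status tank

zpool attach mirrors the new device against the existing log vdev, and 
zpool status should then show the log as a two-way mirror.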

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss