Re: [zfs-discuss] How do I protect my zfs pools?
Donald Murray, P.Eng. wrote:
> What steps are _you_ taking to protect _your_ pools?

Replication and tape backup.

> How are you protecting your enterprise data?

Replication and tape backup.

> How often are you losing an entire pool and restoring from backups?

Never (since I started using ZFS in anger on build 66).

-- Ian.
[zfs-discuss] dedupe is in
Deduplication was committed last night by Mr. Bonwick:

Log message:
  PSARC 2009/571 ZFS Deduplication Properties
  6677093 zfs should have dedup capability

http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

Via c0t0d0s0.org.
Re: [zfs-discuss] dedupe is in
Terrific! Can't wait to read the man pages / blogs about how to use it...

Alex.

On Mon, Nov 2, 2009 at 12:21 PM, David Magda dma...@ee.ryerson.ca wrote:
> Deduplication was committed last night by Mr. Bonwick:
>
> Log message:
>   PSARC 2009/571 ZFS Deduplication Properties
>   6677093 zfs should have dedup capability
>
> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html
>
> Via c0t0d0s0.org.
Re: [zfs-discuss] dedupe is in
Why didn't one of the developers from green-bytes do the commit? :P /sarcasm
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 2:25 PM, Alex Lam S.L. alexla...@gmail.com wrote:
> Terrific! Can't wait to read the man pages / blogs about how to use it...

Alex, you may wish to check PSARC 2009/571 materials [1] for a sneak preview :)

[1] http://arc.opensolaris.org/caselog/PSARC/2009/571/

--
Regards,
Cyril
Re: [zfs-discuss] dedupe is in
David Magda wrote:
> Deduplication was committed last night by Mr. Bonwick:
>
> Log message:
>   PSARC 2009/571 ZFS Deduplication Properties
>   6677093 zfs should have dedup capability
>
> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

And PSARC 2009/479 zpool recovery support is in as well:

http://mail.opensolaris.org/pipermail/onnv-notify/2009-October/010682.html
Re: [zfs-discuss] Kernel panic on zfs import (hardware failure)
Hey,

On Sat, Oct 31, 2009 at 5:03 PM, Victor Latushkin victor.latush...@sun.com wrote:
> Donald Murray, P.Eng. wrote:
>> Hi,
>>
>> I've got an OpenSolaris 2009.06 box that will reliably panic whenever I try to
>> import one of my pools. What's the best practice for recovering (before I resort
>> to nuking the pool and restoring from backup)?
>
> Could you please post the panic stack backtrace?
>
>> There are two pools on the system: rpool and tank. The rpool seems to be fine,
>> since I can boot from a 2009.06 CD and 'zpool import -f rpool'; I can also
>> 'zpool scrub rpool', and it doesn't find any errors. Hooray! Except I don't care
>> about rpool. :-(
>>
>> If I boot from hard disk, the system begins importing zfs pools; once it's
>> imported everything I usually have enough time to log in before it panics. If I
>> boot from CD and 'zpool import -f tank', it panics.
>>
>> I've just started a 'zdb -e tank', which I found on the intertubes here:
>> http://opensolaris.org/jive/thread.jspa?threadID=49020. Zdb seems to be ...
>> doing something. Not sure _what_ it's doing, but it can't be making things
>> worse for me, right?
>
> Yes, zdb only reads, so it cannot make things worse.
>
>> I'm going to try adding the following to /etc/system, as mentioned here:
>> http://opensolaris.org/jive/thread.jspa?threadID=114906
>>
>> set zfs:zfs_recover=1
>> set aok=1
>
> Please do not rush with these settings. Let's look at the stack backtrace first.
>
> Regards,
> Victor

I think I've found the cause of my problem. I disconnected one side of each
mirror, rebooted, and imported. The system didn't panic! So one of the
disconnected drives (or cables, or controllers...) was the culprit.

I've since narrowed it down to a single 500GB drive. When that drive is
connected, a zpool import panics the system. When that drive is disconnected,
the pool imports fine.

r...@weyl:~# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed after 0h8m with 0 errors on Sun Nov 1 22:11:15 2009
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          mirror                 DEGRADED     0     0     0
            7508645614192559694  FAULTED      0     0     0  was /dev/dsk/c7t0d0s0
            c6t1d0               ONLINE       0     0     0
          mirror                 ONLINE       0     0     0
            c5t1d0               ONLINE       0     0     6  21.2G resilvered
            c7t0d0               ONLINE       0     0     0

errors: No known data errors
r...@weyl:~#

The first thing that's jumping out at me: why does the first mirror think the
missing disk was c7t0d0? I have an old zpool status from before the problem
began, and that disk used to be c6t0d0.

r...@weyl:~# zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0

errors: No known data errors
r...@weyl:~#

Victor has been very helpful, living up to his reputation. Thanks Victor! If we
determine a root cause, I'll update the list.
Things I've learned along the way:

- pools import automatically based on cached information in /etc/zfs/zpool.cache;
  if you move zpool.cache elsewhere, none of the pools will import upon rebooting;
- import problematic pools via 'zpool import -f -R /a poolname'; this doesn't
  update the cachefile, and mounts the pool on /a;
- adding the following to /etc/system didn't prevent a hardware-induced panic:
      set zfs:zfs_recover=1
      set aok=1
- crash dumps are typically saved in /var/crash/$( uname -n );
- beadm is your friend;
- redundancy is your friend (okay, I already knew that);
- if you have a zfs problem, you want Victor Latushkin to be your friend.

Cheers!
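Pulling the commands from that list together, the recovery workflow looks roughly like this (a sketch only, using the pool name from the thread; adapt device and pool names to your system):

    # Boot from the LiveCD, then import the suspect pool under an
    # alternate root so /etc/zfs/zpool.cache is not updated:
    zpool import -f -R /a tank

    # Poke at the pool with zdb; it only reads, so it cannot make
    # things worse:
    zdb -e tank

    # Crash dumps, if savecore is enabled, normally land here:
    ls /var/crash/$(uname -n)

    # Last-resort tunables for /etc/system (they did not help with the
    # hardware-induced panic described above):
    #   set zfs:zfs_recover=1
    #   set aok=1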
Re: [zfs-discuss] How do I protect my zfs pools?
Hey,

On Sun, Nov 1, 2009 at 8:48 PM, Donald Murray, P.Eng. donaldm...@gmail.com wrote:
> Hi,
>
> I may have lost my first zpool, due to ... well, we're not yet sure. The
> 'zpool import tank' causes a panic -- one which I'm not even able to capture
> via savecore.

Looks like I've found the root cause. When I disconnected half of one of my
mirrors, my pool imports cleanly.

One way to protect your zpool: don't have hardware failures. ;-)
Re: [zfs-discuss] marvell88sx2 driver build126
I have the same card and might have seen the same problem. Yesterday I upgraded
to b126 and started to migrate all my data to an 8-disk raidz2 connected to such
a card, and suddenly ZFS reported checksum errors. I thought the drives were
faulty, but you suggest the problem could have been the driver? I also noticed
that one of the drives had resilvered a small amount, just like yours.

I now use b125 and there are no checksum errors. So, is there a bug in the new
b126 driver?
Re: [zfs-discuss] dedupe is in
> Terrific! Can't wait to read the man pages / blogs about how to use it...

Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

Enjoy, and let me know if you have any questions or suggestions for follow-on posts.

Jeff
[zfs-discuss] dedup question
Does dedup work at the pool level or the filesystem/dataset level? For example,
if I were to do this:

bash-3.2$ mkfile 100m /tmp/largefile
bash-3.2$ zfs set dedup=off tank
bash-3.2$ zfs set dedup=on tank/dir1
bash-3.2$ zfs set dedup=on tank/dir2
bash-3.2$ zfs set dedup=on tank/dir3
bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
bash-3.2$ cp /tmp/largefile /tank/dir3/largefile

Would largefile get dedup'ed? Would I need to set dedup on for the pool, and
then disable it where it isn't wanted/needed?

Also, will we need to move our data around (send/recv or whatever your preferred
method is) to take advantage of dedup? I was hoping the blockpointer rewrite
code would allow an admin to simply turn on dedup and let ZFS process the pool,
eliminating excess redundancy as it went.

--
Breandan Dezendorf
brean...@dezendorf.com
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 7:20 AM, Jeff Bonwick jeff.bonw...@sun.com wrote:
>> Terrific! Can't wait to read the man pages / blogs about how to use it...
>
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
>
> Enjoy, and let me know if you have any questions or suggestions for follow-on posts.
>
> Jeff

On systems with crypto accelerators (particularly Niagara 2), does the hash
calculation code use the crypto accelerators, so long as a supported hash is used?

Assuming the answer is yes, have performance comparisons been done between weaker
hash algorithms implemented in software and sha256 implemented in hardware?

I've been waiting very patiently to see this code go in. Thank you for all your
hard work (and the work of those that helped too!).

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] dedup question
It works at the pool-wide level, with the ability to exclude at the dataset
level. The converse also works: if dedup is set to off on the top-level dataset,
lower-level datasets can still set it to on. In other words, you can include and
exclude depending on each dataset's contents.

So largefile will get deduped in the example below.

Enda

Breandan Dezendorf wrote:
> Does dedup work at the pool level or the filesystem/dataset level? For example,
> if I were to do this:
>
> bash-3.2$ mkfile 100m /tmp/largefile
> bash-3.2$ zfs set dedup=off tank
> bash-3.2$ zfs set dedup=on tank/dir1
> bash-3.2$ zfs set dedup=on tank/dir2
> bash-3.2$ zfs set dedup=on tank/dir3
> bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
> bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
> bash-3.2$ cp /tmp/largefile /tank/dir3/largefile
>
> Would largefile get dedup'ed? Would I need to set dedup on for the pool, and
> then disable where it isn't wanted/needed?
>
> Also, will we need to move our data around (send/recv or whatever your preferred
> method is) to take advantage of dedup? I was hoping the blockpointer rewrite
> code would allow an admin to simply turn on dedup and let ZFS process the pool,
> eliminating excess redundancy as it went.

--
Enda O'Connor x19781
Software Product Engineering
Patch System Test : Ireland : x19781/353-1-8199718
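The include/exclude behaviour described above looks roughly like this (a sketch; tank/media is an invented dataset name used only for illustration):

    # Enable dedup for the whole pool via the top-level dataset;
    # child datasets inherit the setting:
    zfs set dedup=on tank

    # Exclude a dataset whose contents won't benefit:
    zfs set dedup=off tank/media

    # Check what each dataset ends up using:
    zfs get -r dedup tank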
Re: [zfs-discuss] dedup question
On Mon, Nov 2, 2009 at 9:41 AM, Enda O'Connor enda.ocon...@sun.com wrote:
> it works at a pool wide level with the ability to exclude at a dataset level,
> or the converse, if set to off at top level dataset can then set lower level
> datasets to on, ie one can include and exclude depending on the datasets contents.

Great! I've been looking forward to this code for a long time. All the work and
energy is very much appreciated.

--
Breandan Dezendorf
brean...@dezendorf.com
Re: [zfs-discuss] dedupe is in
This is truly awesome news! What's the best way to dedup existing datasets? Will
send/recv work, or do we just cp things around?

Regards,
Tristan

Jeff Bonwick wrote:
>> Terrific! Can't wait to read the man pages / blogs about how to use it...
>
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
>
> Enjoy, and let me know if you have any questions or suggestions for follow-on posts.
>
> Jeff
Re: [zfs-discuss] dedupe is in
Double WOHOO! Thanks Victor!
Re: [zfs-discuss] dedupe is in
On 02.11.09 18:38, Ross wrote:
> Double WOHOO! Thanks Victor!

Thanks should go to Tim Haley, Jeff Bonwick and George Wilson ;-)
Re: [zfs-discuss] dedupe is in
Ok, thanks everyone then (but still thanks to Victor for the heads-up) :-)

On Mon, Nov 2, 2009 at 4:03 PM, Victor Latushkin victor.latush...@sun.com wrote:
> On 02.11.09 18:38, Ross wrote:
>> Double WOHOO! Thanks Victor!
>
> Thanks should go to Tim Haley, Jeff Bonwick and George Wilson ;-)
Re: [zfs-discuss] dedup question
Enda O'Connor wrote:
> it works at a pool wide level with the ability to exclude at a dataset level,
> or the converse, if set to off at top level dataset can then set lower level
> datasets to on, ie one can include and exclude depending on the datasets contents.
>
> so largefile will get deduped in the example below.

And you can use 'zdb -S' (which is a lot better now than it used to be before
dedup) to see how much benefit is there (without even turning dedup on):

bash-3.2# zdb -S rpool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M
     4    1.21K   16.6M   10.8M   10.8M    5.71K   76.9M   48.6M   48.6M
     8      395    764K    745K    745K    3.75K   6.90M   6.69M   6.69M
    16      125   2.71M    888K    888K    2.60K   54.2M   17.9M   17.9M
    32       56   2.10M    750K    750K    2.33K   85.6M   29.8M   29.8M
    64        9   22.0K   22.0K   22.0K      778   2.04M   2.04M   2.04M
   128        4   6.00K   6.00K   6.00K      594    853K    853K    853K
   256        2      8K      8K      8K      711   2.78M   2.78M   2.78M
   512        2   4.50K   4.50K   4.50K    1.47K   3.52M   3.52M   3.52M
    8K        1    128K    128K    128K    15.9K   1.99G   1.99G   1.99G
   16K        2      8K      8K      8K    50.7K    203M    203M    203M
 Total     637K   10.1G   8.04G   8.04G     730K   12.7G   10.5G   10.5G

dedup = 1.30, compress = 1.22, copies = 1.00, dedup * compress / copies = 1.58
bash-3.2#

Be careful - it can eat lots of RAM!

Many thanks to Jeff and all the team!

Regards,
Victor
Re: [zfs-discuss] Sun Flash Accelerator F20
Matthias Appel wrote:
> I am using 2x Gbit Ethernet and 4 Gig of RAM; 4 Gig of RAM for the iRAM should
> be more than sufficient (0.5 times RAM and 10s worth of IO).
>
> I am aware that this RAM is non-ECC, so I plan to mirror the ZIL device.
>
> Any considerations for this setup? Will it work as I expect (speed up sync IO,
> especially for NFS)?

We looked at the iRAM a while back and it had decent performance, but I was wary
of deploying it in remote datacenters since there was no way to monitor the
battery status. IIRC the only indicator of a bad battery is an LED on the board
face, not even on the bracket, so I'd have to go crack open the machine to make
sure the battery was holding a charge. That didn't fit our model of
maintainability, so we didn't deploy it.

Regards,
Eric
Re: [zfs-discuss] dedupe is in
>> Terrific! Can't wait to read the man pages / blogs about how to use it...
>
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
>
> Enjoy, and let me know if you have any questions or suggestions for follow-on posts.

Looking at FIPS-180-3, sections 4.1.2 and 4.1.3, I was thinking that the major
leap from SHA256 to SHA512 was a 32-bit to 64-bit step. If the implementation of
the SHA256 (or possibly SHA512 at some point) algorithm is well threaded, then
one would be able to leverage those massively multi-core Niagara T2 servers.

The SHA256 hash is based on six 32-bit functions, whereas SHA512 is based on six
64-bit functions. The CMT Niagara T2 can easily process those 64-bit hash
functions, and the multi-core CMT trend is well established. So long as
context-switch times are very low, one would think that IO with a SHA512-based
de-dupe implementation would be possible and even realistic. That would solve
the hash collision concern, I would think.

Merely thinking out loud here ...

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org  - Email related to open source for Solaris
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 11:58 AM, Dennis Clarke dcla...@blastwave.org wrote:
> Looking at FIPS-180-3 in sections 4.1.2 and 4.1.3 I was thinking that the major
> leap from SHA256 to SHA512 was a 32-bit to 64-bit step. If the implementation
> of the SHA256 (or possibly SHA512 at some point) algorithm is well threaded
> then one would be able to leverage those massively multi-core Niagara T2 servers.

And my out-loud thinking on this says that the crypto accelerator on a T2 system
does hardware acceleration of SHA256:

NAME
     n2cp - Ultra-SPARC T2 crypto provider device driver

DESCRIPTION
     The n2cp device driver is a multi-threaded, loadable hardware driver
     supporting hardware-assisted acceleration of the following cryptographic
     operations, which are built into the Ultra-SPARC T2 CMT processor:

     DES:     CKM_DES_CBC, CKM_DES_ECB
     DES3:    CKM_DES3_CBC, CKM_DES3_ECB
     AES:     CKM_AES_CBC, CKM_AES_ECB, CKM_AES_CTR
     RC4:     CKM_RC4
     MD5:     CKM_MD5, CKM_MD5_HMAC, CKM_MD5_HMAC_GENERAL, CKM_SSL3_MD5_MAC
     SHA-1:   CKM_SHA_1, CKM_SHA_1_HMAC, CKM_SHA_1_HMAC_GENERAL, CKM_SSL3_SHA1_MAC
     SHA-256: CKM_SHA256, CKM_SHA256_HMAC, CKM_SHA256_HMAC_GENERAL

According to page 35 of
http://www.slideshare.net/ramesh_r_nagappan/wirespeed-cryptographic-acceleration-for-soa-and-java-ee-security,
a T2 CPU can do 41 Gb/s of SHA256. The implication here is that this keeps the
MAUs busy but the rest of the core is still idle for things like compression,
TCP, etc.

--
Mike Gerdts
http://mgerdts.blogspot.com/
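One way to see what the kernel crypto providers on a given box actually advertise is cryptoadm (a sketch; exact output format varies by release):

    # List the mechanisms each kernel crypto provider offers; on an
    # UltraSPARC T2, the n2cp hardware provider should include CKM_SHA256:
    cryptoadm list -m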
Re: [zfs-discuss] dedupe is in
On Mon, Nov 02, 2009 at 12:58:32PM -0500, Dennis Clarke wrote:
> Looking at FIPS-180-3 in sections 4.1.2 and 4.1.3 I was thinking that the major
> leap from SHA256 to SHA512 was a 32-bit to 64-bit step.

ZFS doesn't have enough room in blkptr_t for 512-bit hashes.

Nico
Re: [zfs-discuss] dedup question
On Nov 2, 2009, at 9:07 AM, Victor Latushkin wrote:
> Enda O'Connor wrote:
>> it works at a pool wide level with the ability to exclude at a dataset level,
>> or the converse, if set to off at top level dataset can then set lower level
>> datasets to on, ie one can include and exclude depending on the datasets contents.
>>
>> so largefile will get deduped in the example below.
>
> And you can use 'zdb -S' (which is a lot better now than it used to be before
> dedup) to see how much benefit is there (without even turning dedup on):

Forgive my ignorance, but what's the advantage of this new dedup over the
existing compression option? Wouldn't full-filesystem compression naturally
de-dupe?

-Jeremy
Re: [zfs-discuss] dedup question
On Mon, Nov 2, 2009 at 9:01 PM, Jeremy Kitchen kitc...@scriptkitchen.com wrote:
> forgive my ignorance, but what's the advantage of this new dedup over the
> existing compression option? Wouldn't full-filesystem compression naturally
> de-dupe?

No, compression works on the block level. If there are two identical blocks,
compression will reduce the number of bytes to store on disk for both of them;
however, there will still be two identical copies of the compressed data.
Dedup removes the extra copy. So compression and dedup complement each other.

--
Regards,
Cyril
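A rough way to see the difference on a test pool, reusing the largefile example from earlier in the thread (a sketch; the dataset names are invented): compression shrinks each copy independently, while dedup collapses identical blocks into one on-disk copy.

    # Compression only: two identical files are each compressed
    # separately and both still allocate space:
    zfs create -o compression=on -o dedup=off tank/comp
    cp /tmp/largefile /tank/comp/a
    cp /tmp/largefile /tank/comp/b

    # Compression plus dedup: the second copy adds references, not blocks:
    zfs create -o compression=on -o dedup=on tank/both
    cp /tmp/largefile /tank/both/a
    cp /tmp/largefile /tank/both/b

    # Compare allocated space (dedup accounting is reported pool-wide):
    zpool list tank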
Re: [zfs-discuss] dedup question
> forgive my ignorance, but what's the advantage of this new dedup over the
> existing compression option?

It may provide another space-saving advantage. Depending on your data, the
savings can be very significant.

> Wouldn't full-filesystem compression naturally de-dupe?

No. Compression doesn't look back and forth across blocks; only the actual data
block is compressed and the redundant information within it removed.
Compression != deduplication!
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Thu, 29 Oct 2009 casper@sun.com wrote:
> Do you have the complete NFS trace output? My reading of the source code says
> that the file will be created with the proper gid, so I am actually believing
> that the client over-corrects the attributes after creating the file/directory.

Just wondering if you had a chance to look at the packet capture I sent and the
pointers to the Solaris source code that appear to be causing the problem that
results in ignoring the sgid bit on directory creations over NFS.

The feedback I'm getting from sustaining on my support request is that they
don't think it's broken and they're not inclined to fix it. Even if the spec
doesn't explicitly define the behavior, respecting the sgid bit on directory
creation still seems like the right thing to do.

If you agree, perhaps you could use your considerable influence to try and
improve interoperability ;)? Or perhaps put me in touch with someone in forward
development, or someone in charge of attending NFS interoperability bakeoffs,
who might be more interested in improvements?

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] sub-optimal ZFS performance
hj == Henrik Johansson henr...@henkis.net writes:

    hj  A überquota property for the whole pool would have been nice
    hj  [to get out-of-space errors instead of fragmentation]

Just make an empty filesystem with a reservation. That's what I do.

NAME               USED  AVAIL  REFER  MOUNTPOINT
andaman           3.71T   382G    18K  none
andaman/arrchive  3.07T   382G  67.7G  /arrchive
andaman/balloon     18K  1010G    18K  none

terabithia:/export/home/guest/Azureus Downloads# zfs get reservation andaman/balloon
NAME             PROPERTY     VALUE  SOURCE
andaman/balloon  reservation  628G   local
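Setting that up from scratch looks roughly like this (a sketch using the pool and dataset names from the output above; pick a reservation that suits your pool):

    # An empty, never-mounted filesystem whose only job is to hold space
    # back, so the rest of the pool hits "out of space" a bit early:
    zfs create -o mountpoint=none andaman/balloon
    zfs set reservation=628G andaman/balloon

    # Shrink the cushion later if the pool really does fill up:
    zfs set reservation=500G andaman/balloon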
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Sat, 31 Oct 2009, Al Hopper wrote:
> Kudos to you - nice technical analysis and presentation. Keep lobbying your
> point of view - I think interoperability should win out if it comes down to an
> arbitrary decision.

Thanks; but so far that doesn't look promising. Right now I've got a cron job
running every hour on the backend servers, crawling around and fixing
permissions on new directories :(.

You would have thought something like this would have been noticed in one of the
NFS interoperability bake-offs.

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
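A workaround job along those lines might look like the following (a sketch only; the path and schedule are assumptions, not the poster's actual cron job):

    # crontab entry: once an hour, re-assert the setgid bit on any
    # directory that was created without it:
    0 * * * * find /export/home -type d ! -perm -2000 -exec chmod g+s {} +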
Re: [zfs-discuss] dedup question
Jeremy Kitchen wrote:
> forgive my ignorance, but what's the advantage of this new dedup over the
> existing compression option? Wouldn't full-filesystem compression naturally
> de-dupe?

See this for example:

Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M

"Allocated" means what is actually allocated on disk; "referenced" is what would
be allocated on disk without deduplication. LSIZE denotes logical size, PSIZE
denotes physical size after compression.

The row with a reference count of 1 shows the same figures in both allocated and
referenced, and this is expected - there is only one reference to each of those
blocks. But the row with a reference count of 2 shows a good difference: without
deduplication it is 20.7 thousand blocks on disk, with logical size totalling
386M and physical size after compression of 277M. With deduplication there would
be only 9.8 thousand blocks on disk (a dedup factor of over 2x!), with logical
size totalling 184M and physical size of 132M.

So with compression but without deduplication it is 277M on disk; with
deduplication it would be only 132M - good savings!

Hope this helps,
victor
Re: [zfs-discuss] dedup question
On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
> forgive my ignorance, but what's the advantage of this new dedup over the
> existing compression option? Wouldn't full-filesystem compression naturally
> de-dupe?

If you snapshot/clone as you go, then yes, dedup will do little for you because
you'll already have done the deduplication via snapshots and clones. But dedup
will give you that benefit even if you don't snapshot/clone all your data. Not
all data can be managed hierarchically, with a single dataset at the root of a
history tree.

For example, suppose you want to create two VirtualBox VMs running the same
guest OS, sharing as much on-disk storage as possible. Before dedup you had to:
create one VM, then snapshot and clone that VM's VDI files, use an undocumented
command to change the UUID in the clones, import them into VirtualBox, and set
up the cloned VM using the cloned VDI files. (I know because that's how I manage
my VMs; it's a pain, really.) With dedup you need only enable dedup and then
install the two VMs. Clearly the dedup approach is far, far easier to use than
the snapshot/clone approach. And since you can't always snapshot/clone...

There are many examples where snapshot/clone isn't feasible but dedup can help.
For example: mail stores (though they can do dedup at the application layer by
using message IDs and hashes). For example: home directories (think of users
saving documents sent via e-mail). For example: source code workspaces (ONNV,
Xorg, Linux, whatever), where users might not think ahead to snapshot/clone a
local clone (I also tend to maintain a local SCM clone that I then
snapshot/clone to get workspaces for bug fixes and projects; it's a pain,
really). I'm sure there are many, many other examples.

The workspace example is particularly interesting: with the snapshot/clone
approach you get to deduplicate the _source code_, but not the _object code_,
while with dedup you get both dedup'ed automatically.

As for compression, that helps whether you dedup or not, and it helps by about
the same factor either way -- dedup and compression are unrelated, really.

Nico
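For comparison, the pre-dedup snapshot/clone dance described above looks roughly like this (a sketch; the dataset and VDI names are invented, and the UUID step uses VBoxManage's internalcommands, which is the undocumented command referred to):

    # Clone an installed guest's disk image at the ZFS level:
    zfs snapshot tank/vms/guest1@golden
    zfs clone tank/vms/guest1@golden tank/vms/guest2

    # Give the cloned VDI a fresh UUID so VirtualBox will register it:
    VBoxManage internalcommands sethduuid /tank/vms/guest2/guest.vdi

    # With dedup, none of the above is needed: enable it and install both guests.
    zfs set dedup=on tank/vms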
Re: [zfs-discuss] dedup question
On Mon, Nov 2, 2009 at 2:16 PM, Nicolas Williams nicolas.willi...@sun.com wrote:
> If you snapshot/clone as you go, then yes, dedup will do little for you because
> you'll already have done the deduplication via snapshots and clones. But dedup
> will give you that benefit even if you don't snapshot/clone all your data. Not
> all data can be managed hierarchically, with a single dataset at the root of a
> history tree.
>
> For example, suppose you want to create two VirtualBox VMs running the same
> guest OS, sharing as much on-disk storage as possible. Before dedup you had to:
> create one VM, then snapshot and clone that VM's VDI files, use an undocumented
> command to change the UUID in the clones, import them into VirtualBox, and
> setup the cloned VM using the cloned VDI files. (I know because that's how I
> manage my VMs; it's a pain, really.) With dedup you need only enable dedup and
> then install the two VMs.

The big difference here is when you consider a life cycle that ends long after
provisioning is complete. With clones, the images will diverge. If a year after
you install each VM you decide to do an OS upgrade, they will still be linked
but are quite unlikely to both reference many of the same blocks. However, with
deduplication, similar changes (e.g. the same patch applied, multiple copies of
the same application installed, an upgrade to the same newer OS) will result in
fewer stored copies.

This isn't a big deal if you have 2 VMs. It becomes quite significant if you
have 5000 (e.g. on a ZFS-based file server). Assuming that the deduped blocks
stay deduped in the ARC, it means that it is feasible for every block that is
accessed with any frequency to be in memory. Oh yeah, and you save a lot of disk
space.

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] dedupe is in
Okay, nice to hear ZFS can now use dedup. But how can I update my current
OpenSolaris (2009.06) or Solaris 10 (5/09) to use this? Or do I have to wait for
a new stable release of Solaris 10 / OpenSolaris?

--
Daniel
Re: [zfs-discuss] dedupe is in
On 03/11/2009, at 7:32 AM, Daniel Streicher wrote:
> But how can I update my current OpenSolaris (2009.06) or Solaris 10 (5/09) to
> use this. Or have I wait for a new stable release of Solaris 10 / OpenSolaris?

For OpenSolaris, you change your repository and switch to the development
branches - the dedup bits should be available to the public in about 3-3.5
weeks' time. There are plenty of instructions on how to do this on the net and
in this list.

For Solaris, you need to wait for the next update release.

cheers,
James
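For OpenSolaris 2009.06 the repository switch is roughly the following (a sketch; the dev repository URL is the one commonly documented at the time, so treat it as an assumption):

    # Point the image at the development repository and update:
    pkg set-publisher -O http://pkg.opensolaris.org/dev/ opensolaris.org
    pkg image-update

    # Then reboot into the new boot environment that image-update created.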
Re: [zfs-discuss] dedupe is in
Looks great - and by the time the OpenSolaris build has it, I will have a brand
new laptop to put it on ;-)

One question though - I have a file server at home with 4x750GB on raidz1. When
I upgrade to the latest build and set dedup=on, given that dedup does not have
an offline mode, is there no way to operate on the existing dataset?

As a workaround I can move files in and out of the pool through an external
500GB HDD, and with the ZFS snapshots I don't really risk losing much data if
anything goes (not too horribly, anyway) wrong.

Thanks to you guys again for the great work!

Alex.

On Mon, Nov 2, 2009 at 1:20 PM, Jeff Bonwick jeff.bonw...@sun.com wrote:
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
>
> Enjoy, and let me know if you have any questions or suggestions for follow-on posts.
>
> Jeff
Re: [zfs-discuss] dedupe is in
ZFS dedup will be in snv_128, but putbacks to snv_128 will not likely close till
the end of this week. The OpenSolaris dev repository was updated to snv_126 last
Thursday:

http://mail.opensolaris.org/pipermail/opensolaris-announce/2009-October/001317.html

So it looks like about 5 weeks before the dev repository will be updated to
snv_128. Then we see if any bugs emerge as we all rush to test it out...

Regards
Nigel Smith
Re: [zfs-discuss] dedupe is in
James Lever wrote:
> On 03/11/2009, at 7:32 AM, Daniel Streicher wrote:
>> But how can I update my current OpenSolaris (2009.06) or Solaris 10 (5/09) to
>> use this. Or have I wait for a new stable release of Solaris 10 / OpenSolaris?
>
> For OpenSolaris, you change your repository and switch to the development
> branches - should be available to public in about 3-3.5 weeks time. Plenty of
> instructions on how to do this on the net and in this list.
>
> For Solaris, you need to wait for the next update release.

At which stage a patch (kernel patch) will be released that can be applied to
pre-update-9 releases to get the latest zpool version; existing pools would then
require a 'zpool upgrade'.

Enda
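Once the kernel patch (or a new enough build) is in place, the pool-side step is the usual one (a sketch; the pool name is assumed):

    # See which on-disk versions the running kernel supports:
    zpool upgrade -v

    # Upgrade a pool to the newest supported version; note this is one-way,
    # since older kernels can no longer import the pool afterwards:
    zpool upgrade tank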
[zfs-discuss] Location of ZFS documentation (source)?
The man page documentation from the old Apple port
(http://github.com/alblue/mac-zfs/tree/master/zfs_documentation/man8/) doesn't
seem to have a corresponding source file in the onnv-gate repository
(http://hub.opensolaris.org/bin/view/Project+onnv/WebHome), although I've found
the text on-line (http://docs.sun.com/app/docs/doc/819-2240/zfs-1m).

Can anyone point me to where these are stored, so that we can update the
documentation in the Apple fork?

Alex
Re: [zfs-discuss] Location of ZFS documentation (source)?
Hi Alex,

I'm checking with some folks on how we handled this handoff for the previous
project. I'll get back to you shortly.

Thanks,
Cindy

On 11/02/09 16:07, Alex Blewitt wrote:
> The man pages documentation from the old Apple port
> (http://github.com/alblue/mac-zfs/tree/master/zfs_documentation/man8/) don't
> seem to have a corresponding source file in the onnv-gate repository
> (http://hub.opensolaris.org/bin/view/Project+onnv/WebHome) although I've found
> the text on-line (http://docs.sun.com/app/docs/doc/819-2240/zfs-1m)
>
> Can anyone point me to where these are stored, so that we can update the
> documentation in the Apple fork?
>
> Alex
Re: [zfs-discuss] dedupe is in
Mike Gerdts wrote:
> On systems with crypto accelerators (particularly Niagara 2) does the hash
> calculation code use the crypto accelerators, so long as a supported hash is used?

Not yet; it is coming. Currently ZFS has a private copy of SHA256 (for legacy
reasons). I have an RTI pending to switch it to the same copy that the crypto
framework uses. That is an optimised software implementation (SPARC, Intel and
AMD64) but won't yet use the Niagara 2 on-chip crypto. There is an issue with
very early boot and the crypto framework I have to resolve, so that will come
later.

> Assuming the answer is yes, have performance comparisons been done between
> weaker hash algorithms implemented in software and sha256 implemented in
> hardware?

I've done some comparisons on that.
Re: [zfs-discuss] Solaris disk confusion ?
zfs...@jeremykister.com said:
> # format -e c12t1d0
> selecting c12t1d0
> [disk formatted]
> /dev/dsk/c3t11d0s0 is part of active ZFS pool dbzpool. Please see zpool(1M).
>
> It is true that c3t11d0 is part of dbzpool. But why is solaris upset about
> c3t11 when i'm working with c12t1 ?? So i checked the device links, and all
> looks fine: . . .

Could it be that c12t1d0 was at some time in the past (either in this machine or
another machine) known as c3t11d0, and was part of a pool called dbzpool?

> i tried:
>   fdisk -B /dev/rdsk/c12t1d0
>   dd if=/dev/zero of=/dev/rdsk/c12t1d0p0 bs=1024k
>   dd if=/dev/zero of=/dev/rdsk/c12t1d0s2 bs=1024k
> but Solaris still has some association between c3t11 and c12t1.

You'll need to give the same dd treatment to the end of the disk as well; ZFS
puts copies of its labels at the beginning and at the end.

Oh, and you can 'fdisk -E /dev/rdsk/c12t1d0' to convert to a single, whole-disk
EFI partition (non-VTOC style).

Regards,
Marion
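Concretely, something like the following (a sketch; the device name is taken from the thread, and DISK_MB is a placeholder for the whole-device size in MB, which you would look up yourself, e.g. from format's partition table):

    # Front labels:
    dd if=/dev/zero of=/dev/rdsk/c12t1d0p0 bs=1024k count=4

    # Back labels: seek to a few MB before the end of the device and
    # zero from there to the end:
    dd if=/dev/zero of=/dev/rdsk/c12t1d0p0 bs=1024k oseek=$((DISK_MB - 4))

    # Or just relabel with a whole-disk EFI label as suggested above:
    fdisk -E /dev/rdsk/c12t1d0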
Re: [zfs-discuss] dedup question
I just stumbled across a clever visual representation of deduplication:

http://loveallthis.tumblr.com/post/166124704

It's a flowchart of the lyrics to Hey Jude. =-) Nothing is compressed, so you
can still read all of the words. Instead, all of the duplicates have been folded
together.

-cheers, CSB
Re: [zfs-discuss] dedupe is in
Great stuff, Jeff and company. You all rock. =-)

A potential topic for the follow-up posts: auto-ditto, and the philosophy behind
choosing a default threshold for creating a second copy.
Re: [zfs-discuss] automate zpool scrub
On a related note, it looks like Constantin is developing a nice SMF service for
auto scrub:

http://blogs.sun.com/constantin/entry/new_opensolaris_zfs_auto_scrub

This is an adaptation of the well-tested auto snapshot service. Amongst other
advantages, this approach means that you don't have to deal directly with cron.
I'll be interested to see the enhancement for persistent logging of scrub
events, which will simplify things a bit.

-cheers, CSB
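Until that SMF service is available, a plain cron entry does the job (a sketch; the pool name and schedule are assumptions):

    # crontab entry: kick off a scrub every Sunday at 02:00.  zpool scrub
    # returns immediately and the scrub runs in the background:
    0 2 * * 0 /usr/sbin/zpool scrub tank

    # Check on progress later:
    zpool status tank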
[zfs-discuss] More Dedupe Questions...
I'm curious as to how send/recv intersects with dedupe... If I send/recv a
deduped filesystem, is the data sent in its de-duped form, i.e. just sent once,
followed by the pointers for subsequent dupe data? Or is the data sent in
expanded form, with the recv-side system then having to redo the dedupe process?

Obviously sending it deduped is more efficient in terms of bandwidth and CPU
time on the recv side, but it may also be more complicated to achieve.

Also - do we know yet what effect block size has on dedupe? My guess is that a
smaller block size will perhaps give a better duplication match rate, but at the
cost of higher CPU usage and perhaps reduced performance, as the system will
need to store larger de-dupe hash tables.

Regards,
Tristan
Re: [zfs-discuss] More Dedupe Questions...
Tristan, there's another dedup system for zfs send in PSARC 2009/557. This can
be used independently of whether the in-pool data was deduped.

Case log: http://arc.opensolaris.org/caselog/PSARC/2009/557/
Discussion: http://www.opensolaris.org/jive/thread.jspa?threadID=115082

So I believe your deduped data is rehydrated for sending, and then (within the
send stream) this other method may be used to save space in transit. What the
pool on the receiving end does with it will depend on its local dedup settings.

HTH...

-cheers, CSB
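In other words, an ordinary send today is rehydrated on the wire, and whether it lands deduplicated depends on the receiving dataset (a sketch; the dataset and snapshot names are illustrative, and the separate in-stream dedup from PSARC 2009/557 would be an additional, independent option once it integrates):

    # Dedup again on arrival, regardless of how the sending pool stores it:
    zfs set dedup=on backup/fs
    zfs send tank/fs@snap | zfs recv backup/fs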