Re: [zfs-discuss] ZFS compression on Clearcase
On 4 Feb 2010, at 16:35, Bob Friesenhahn wrote:
> On Thu, 4 Feb 2010, Darren J Moffat wrote:
>>> Thanks - IBM basically haven't tested ClearCase with ZFS compression, therefore they don't currently support it. That may change in the future, but as things stand my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.
>> That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.
> ClearCase itself implements a versioning filesystem, so perhaps it is not being overly cautious. Compression could change aspects such as how free space is reported.

> I'd also like to echo Bob's observations here. Darren's FUD is based on limited experience of ClearCase, I expect ...

I do know how ClearCase works, and it works *above* the POSIX layer in ZFS - at the VFS layer (and higher). [I've debugged Solaris crash dumps with the ClearCase kernel modules loaded in them in the past.]

By FUD I don't mean it is wrong, but without information about a bug or observed undesirable behaviour it comes across as fear that there could be problems. Basically we need more data. What I was pointing out is that, because of the layer at which ClearCase works, there should be no problems - I'm not saying there aren't any, just that I don't see where they would be.

If there are problems with ZFS then bugs should be logged; leaving statements like "ISV x doesn't support using feature f of ZFS" hanging is harmful to the ISV's product and to ZFS when there is no bug logged and no data about why there is a problem.

-- 
Darren J Moffat
Re: [zfs-discuss] unionfs help
Nicolas Williams nicolas.willi...@sun.com wrote:
> There's no unionfs for Solaris. (For those of you who don't know, unionfs is a BSDism and is a pseudo-filesystem which presents the union of two underlying filesystems, but with all changes being made only to one of the two filesystems. The idea is that one of the underlying filesystems cannot be modified through the union, with all changes made through the union ...

...and it seems that the ideas for this FS were taken from TFS (the Translucent File System), which appeared in SunOS as early as 1986.

Jörg

-- 
EMail: jo...@schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling, D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
[zfs-discuss] ZFS send/recv checksum transmission
Are the sha256/fletcher[x]/etc. checksums sent to the receiver along with the other data/metadata? And checked upon receipt, of course. Do they chain all the way back to the uberblock, or to some calculated transfer-specific checksum value?

The idea is to carry the integrity checks through wherever possible, whether done as close as within the same zpool, or miles away.
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 02/03/2010 04:35 PM, Andrey Kuzmin wrote:
> At zfs_send level there are no files, just DMU objects (modified in some txg, which is the basis for the changed/unchanged decision).

It would be awesome if zfs send had an option to show files changed (with offset) and mode/directory changes (showing the before/after data). As is, zfs send is nice, but you require ZFS on both sides.

I would love an rsync-like tool that could avoid scanning 20 million files just to find a couple of small changes (or none at all).

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/ - jabber: j...@jabber.org
"My name is Dump, Core Dump" - "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 02/04/2010 05:10 AM, Matthew Ahrens wrote:
> This is RFE 6425091 "want 'zfs diff' to list files that have changed between snapshots", which covers both file/directory changes and file removal/creation/renaming. We actually have a prototype of zfs diff. Hopefully someday we will finish it up...

Can't wait! :-))

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/ - jabber: j...@jabber.org
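For illustration, the interface this RFE eventually produced looks roughly like the sketch below. This syntax is from later ZFS releases and is not available in the builds discussed in this thread; the dataset names are hypothetical:

    # list files changed between two snapshots of the same dataset
    zfs diff tank/home@monday tank/home@tuesday
    # output prefixes: M (modified), + (created), - (removed), R (renamed)
    #   M       /tank/home/user/report.txt
    #   +       /tank/home/user/new.txt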
[zfs-discuss] Keeping resilver/scrubbing time persistently
When a scrub/resilver finishes, you see the date and time in zpool status, but this information doesn't persist across reboots. It would be nice to be able to see the date and the time it took to scrub the pool, even after you reboot your machine :).

PS: I am talking about Solaris 10 U8.

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/ - jabber: j...@jabber.org
Re: [zfs-discuss] Pool disk replacing fails
On Fri, 5 Feb 2010, Alexander M. Stetsenko wrote:

>         NAME        STATE     READ WRITE CKSUM
>         mypool      DEGRADED     0     0     0
>           mirror    DEGRADED     0     0     0
>             c1t4d0  DEGRADED     0     0    28  too many errors
>             c1t5d0  ONLINE       0     0     0

I think your best bet is to do 'zpool detach mypool c1t4d0' followed by a 'zpool attach mypool c1t5d0 c1t4d0'.

Regards,
markm
Re: [zfs-discuss] Keeping resilver/scrubbing time persistently
On Fri, Feb 05, 2010 at 02:41:35PM +0100, Jesus Cea wrote:
> When a scrub/resilver finishes, you see the date and time in zpool status, but this information doesn't persist across reboots. It would be nice to be able to see the date and the time it took to scrub the pool, even after you reboot your machine :).
>
> PS: I am talking about Solaris 10 U8.

This is likely (RFE):

6878281 zpool should store the time of last scrub/resilver and other zpool status info in pool properties
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6878281
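Until that RFE integrates, one workaround is to record the scrub status line from cron. A minimal, untested sketch; the log path, script name, and schedule are arbitrary choices:

    #!/bin/sh
    # Append each pool's "scrub:" line from zpool status to a log file,
    # so the completion time survives a reboot.
    LOG=/var/log/zpool-scrub-history
    for pool in `zpool list -H -o name`; do
        line=`zpool status $pool | grep 'scrub:'`
        echo "`date '+%Y-%m-%d %H:%M:%S'` $pool: $line" >> $LOG
    done

Run it daily from root's crontab, e.g. "0 6 * * * /usr/local/bin/scrub-log.sh".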
Re: [zfs-discuss] Cores vs. Speed?
> Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.
>
> You are sort-of-correct that it's the write speed of the slowest disk.

My experience is not in line with that statement. RAIDZ will write a complete stripe plus parity (RAIDZ2 - two parities, etc.). The write speed of the entire stripe will be brought down to that of the slowest disk, but only for its portion of the stripe. In the case of a 5-spindle RAIDZ2, 1/3 of the stripe will be written to each of three disks, and parity info to the other two disks. The throughput would be 3x the slowest disk for read or write.

> Mirrored drives will be faster, especially for random I/O. But you sacrifice storage for that performance boost.

Is that really true? Even after glancing at the code, I don't know if zfs overlaps mirror reads across devices. Watching my rpool mirror leads me to believe that it does not. If true, then mirror reads would be no faster than a single disk. Mirror writes are no faster than the slowest disk.

As a somewhat related rant, there seems to be confusion about mirror IOPS vs. RAIDZ[123] IOPS. Assuming mirror reads are not overlapped, a mirror vdev will read and write at roughly the same throughput and IOPS as a single disk (ignoring bus and cpu constraints). Also ignoring bus and cpu constraints, a RAIDZ[123] vdev will read and write at roughly the same throughput as a single disk, multiplied by the number of data drives: three in the config being discussed. A RAIDZ[123] vdev will, however, have IOPS performance similar to that of a single disk.

A stack of mirror vdevs will, of course, perform much better than a single mirror vdev in terms of throughput and IOPS. A stack of RAIDZ[123] vdevs will also perform much better than a single RAIDZ[123] vdev in terms of throughput and IOPS. RAIDZ tends to have more CPU overhead and provides more flexibility in choosing the optimal data-to-redundancy ratio. Many read IOPS problems can be mitigated by L2ARC, even a set of small, fast disk drives. Many write IOPS problems can be mitigated by the ZIL.

My anecdotal conclusions backed by zero science,
Marty
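To make the arithmetic concrete, a hedged worked example under Marty's assumptions, with hypothetical disks of 100 MB/s streaming and 150 IOPS each, ignoring bus and CPU constraints:

    # 5-disk RAIDZ2 (3 data + 2 parity):
    #   streaming throughput ~ 3 x 100 MB/s = 300 MB/s
    #   random IOPS          ~ 150 (each logical I/O touches every data
    #                          disk, so the vdev behaves like one disk)
    #
    # 6 disks as 3 two-way mirror vdevs:
    #   streaming write      ~ 3 x 100 MB/s = 300 MB/s (each half written twice)
    #   random IOPS          ~ 3 x 150 = 450 (independent I/Os per vdev,
    #                          more if mirror reads are overlapped)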
Re: [zfs-discuss] Cores vs. Speed?
On 05/02/2010 04:11, Edward Ned Harvey wrote:
>> Data in raidz2 is striped so that it is split across multiple disks.
>
> Partial truth. Yes, the data is on more than one disk, but it's a parity hash, requiring computation overhead and a write operation on each and every disk. It's not simply striped. Whenever you read or write, you need to access all the disks (or a bunch of 'em) and use compute cycles to generate the actual data stream. I don't know enough about the underlying methods of calculating and distributing everything to say intelligently *why*, but I know this:

Well, that's not entirely true. When reading from raidz2 (non-degraded) you don't need to re-compute any parity, only the standard fs block checksum, which zfs checks regardless of the underlying redundancy. In this (sequential) sense it is faster than a single disk.

> Whenever I benchmark raid5 versus a mirror, the mirror is always faster. Noticeably and measurably faster, as in 50% to 4x faster. (50% for a single-disk mirror versus a 6-disk raid5, and 4x faster for a stripe of mirrors, 6 disks with the capacity of 3, versus a 6-disk raid5.) Granted, I'm talking about raid5 and not raidz. There is possibly a difference there, but I don't think so.

Actually, there is. One difference is that when writing to a raid-z{1|2} pool, compared to a raid-10 pool, you should get better throughput if at least 4 drives are used. Basically this is due to the fact that in RAID-10 the maximum write throughput you can get is the total aggregated throughput of half the number of used disks, and that only assuming there are no other bottlenecks between the OS and the disks, especially as you double the bandwidth requirements due to mirroring. In the case of RAID-Zn you have some extra overhead for writing the additional parity, but other than that you should get a write throughput closer to T-N (where N is the RAID-Z level) instead of T/2 as in RAID-10.

See http://milek.blogspot.com/2006/04/software-raid-5-faster-tha_114588672235104990.html

-- 
Robert Milkowski
http://milek.blogspot.com
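A hedged worked example of the T/2 vs. T-N claim, with six hypothetical 100 MB/s disks and no other bottlenecks:

    # RAID-10 (3 two-way mirrors): every byte is written twice, so the
    #   best-case write throughput is T/2 = 6 x 100 / 2 = 300 MB/s
    # RAID-Z1 (5 data + 1 parity):  only parity is extra, so roughly
    #   T-N = (6 - 1) x 100 = 500 MB/s of user data per stripe write
    # RAID-Z2 (4 data + 2 parity):  T-N = (6 - 2) x 100 = 400 MB/s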
[zfs-discuss] Recover ZFS Array after OS Crash?
Hi all,

I'm building a whole new server system for my employer, and I really want to use OpenSolaris as the OS for the new file server. One thing is keeping me back, though: is it possible to recover a ZFS RAID array after the OS crashes? I've spent hours with Google to no avail.

To be more descriptive, I plan to have a RAID-1 array for the OS, and then I will need 3 additional RAID5/RAIDZ/etc. arrays for data archiving, backups and other purposes. There is plenty of documentation on how to recover an array if one of the drives in the array fails, but what if the OS crashes? Since ZFS is software-based RAID, if the OS crashes is it even possible to recover any of the arrays?
Re: [zfs-discuss] Cores vs. Speed?
On Fri, 5 Feb 2010, Rob Logan wrote:
> well, let's look at Intel's offerings... Ram is faster than AMD's at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC.

Intel's RAM is faster because it needs to be. It is wise to see the role that architecture plays in total performance.

> Now, this gets one to 8G ECC easily... AMD's unfair advantage is all those ram slots on their multi-die MBs... A slow AMD cpu with 64G ram might be better depending on your working set / dedup requirements.

With the AMD CPU, the memory will run cooler and be cheaper. Regardless, for zfs, memory is more important than raw CPU performance.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Recover ZFS Array after OS Crash?
On Fri, Feb 05, 2010 at 08:35:15AM -0800, J wrote:
> There is plenty of documentation on how to recover an array if one of the drives in the array fails, but what if the OS crashes? Since ZFS is software-based RAID, if the OS crashes is it even possible to recover any of the arrays?

Sure, because the ZFS configuration is stored within the pool, not in the OS. Just install a new OS, attach the disks, and do a 'zpool import' to find the importable pools.

-- 
Darren
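A minimal sketch of that recovery sequence on the reinstalled system; the pool name is hypothetical, and -f is only needed if the old OS never exported the pool:

    # scan all attached disks for importable pools and list them
    zpool import
    # import a pool found by the scan
    zpool import archive1
    # force the import if the pool is flagged as in use by the dead OS
    zpool import -f archive1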
[zfs-discuss] Autoreplace property not honored?
Hi list,

I'm seeing strange behaviour with the autoreplace property. It is set to off by default, ok. I want to manually manage disk replacement, so the default of off matches my need.

# zpool get autoreplace mypool
NAME    PROPERTY     VALUE    SOURCE
mypool  autoreplace  off      default

Then I added 2 spare disks:

        spares
          c1t18d0    AVAIL
          c1t19d0    AVAIL

Ok, fine. Then I had failures with one disk of the pool and can see the following in the logs:

DESC: The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.

This is where my problem occurs: zfs automatically replaced the faulted disk with a spare, even with autoreplace=off!

# zpool status
  pool: mypool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device repaired.
 scrub: resilver completed after 0h0m with 0 errors on Thu Feb  4 00:10:25 2010
config:

        NAME           STATE     READ WRITE CKSUM
        mypool         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c0t2d0     ONLINE       0     0     0
            c0t3d0     ONLINE       0     0     0
            c0t4d0     ONLINE       0     0     0
            c0t5d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c0t6d0     ONLINE       0     0     0
            c0t7d0     ONLINE       0     0     0
            spare      DEGRADED     4     0     0
              c1t8d0   FAULTED    326     0     0  too many errors
              c1t18d0  ONLINE       0     0     4  56K resilvered
            c1t9d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t10d0    ONLINE       0     0     0
            c1t11d0    ONLINE       0     0     0
            c1t12d0    ONLINE       0     0     0
            c1t13d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t14d0    ONLINE       0     0     0
            c1t15d0    ONLINE       0     0     0
            c1t16d0    ONLINE       0     0     0
            c1t17d0    ONLINE       0     0     0
        cache
          c2d0         ONLINE       0     0     0
          c3d0         ONLINE       0     0     0
        spares
          c1t18d0      INUSE     currently in use
          c1t19d0      AVAIL

errors: No known data errors

Any idea why this was done automatically?

solaris 10U8 Generic_141445-09 - zpool version 15 - zfs version 4

Thanks for your answers.

-- 
Francois
Re: [zfs-discuss] Cores vs. Speed?
> if zfs overlaps mirror reads across devices.

it does... I have one very old disk in this mirror, and when I attach another element one can see more reads going to the faster disks... This paste isn't from right after the attach but covers the time since the reboot, and one can still see that reads are load-balanced depending on the response of the elements in the vdev.

13 % zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       7.01G   142G      0      0  1.60K  1.44K
  mirror    7.01G   142G      0      0  1.60K  1.44K
    c9t1d0s0    -      -      0      0    674  1.46K
    c9t2d0s0    -      -      0      0    687  1.46K
    c9t3d0s0    -      -      0      0    720  1.46K
    c9t4d0s0    -      -      0      0    750  1.46K

but I also support your conclusions.

Rob
Re: [zfs-discuss] Autoreplace property not honored?
Hi Francois,

The autoreplace property works independently of the spare feature. Spares are activated automatically when a device in the main pool fails.

Thanks,
Cindy

On 02/05/10 09:43, Francois wrote:
> This is where my problem occurs: zfs automatically replaced the faulted disk with a spare, even with autoreplace=off!
> [...]
> Any idea why this was done automatically?
Re: [zfs-discuss] Autoreplace property not honored?
On Fri, Feb 5, 2010 at 12:11 PM, Cindy Swearingen cindy.swearin...@sun.com wrote:
> The autoreplace property works independently of the spare feature. Spares are activated automatically when a device in the main pool fails.

I think it might be helpful to explain exactly what that means. I'll give it a shot; feel free to correct my mistake(s).

Francois: when you have autoreplace on, what that means is that if you remove the bad drive and stick in a new one to replace it, the new drive will automatically be added to the pool. To do what you're trying to do, you shouldn't have drives added as hot spares at all. If you want a cold spare, put it in the system and just leave it unassigned.

--Tim
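For completeness, a hedged sketch of the fully manual workflow Francois is after, with no hot spares configured. The device names come from his status output, and the commands assume the replacement disk goes into the same slot:

    # take the faulted disk out of service
    zpool offline mypool c1t8d0
    # ...physically swap the disk, then resilver onto the new one...
    zpool replace mypool c1t8d0
    # if a spare had already kicked in, return it to AVAIL once the
    # resilver completes
    zpool detach mypool c1t18d0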
Re: [zfs-discuss] Recover ZFS Array after OS Crash?
Ah, I see! Simple, easy, and saves me hundreds on HW-based RAID controllers ^_^

Thanks!
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
pr == Peter Radig pe...@radig.de writes:
ls == Lutz Schumann presa...@storageconcepts.de writes:

    pr I was expecting a good performance from the X25-E, but was
    pr really surprised that it is that good (only 1.7 times slower
    pr than it takes with ZIL completely disabled). So I will use the
    pr X25-E as ZIL device on my box and will not consider disabling
    pr ZIL at all to improve NFS performance.

According to Lutz, posting here ~2010-01-10, the X25-M may not actually be functioning as a ZIL unless you disable its write cache with 'hdadm'. He said he found normal hard drives respect cache flush commands in-stream, but the Intel X25-M does not. However, both do respect disabling the write cache:

    ls r...@nexenta:/volumes# hdadm write_cache off c3t5
    ls c3t5 write_cache disabled

You might want to repeat his test with the X25-E. If the X25-E is also dropping cache flush commands (it might!), you may be, compared to disabling the ZIL, slowing down your pool for no reason, and making it more fragile as well, since an exported pool with a dead ZIL cannot be imported.
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, 5 Feb 2010, Miles Nordin wrote:
> You might want to repeat his test with the X25-E. If the X25-E is also dropping cache flush commands (it might!), you may be, compared to disabling the ZIL, slowing down your pool for no reason, and making it more fragile as well, since an exported pool with a dead ZIL cannot be imported.

Others have tested the X25-E and found that with its cache enabled, it does drop flushed writes, but clearly not such a gaping chasm as the X25-M. Some time has passed, so there is the possibility that X25-E firmware has improved (or will). If Sun offers an X25-E based device for use as an slog, you can be sure that it has been qualified for this purpose, and it may contain modified firmware.

The 'E' stands for Extreme, not Enterprise, as some tend to believe.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 5, 2010 at 10:55 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> The 'E' stands for Extreme, not Enterprise, as some tend to believe.

Exactly. It would therefore be very interesting to hear about performance from anyone using a (real) enterprise SSD (which now spells STEC) as slog.

Regards,
Andrey
Re: [zfs-discuss] Cores vs. Speed?
b == Brian broco...@vt.edu writes:

     b (4) Hold backups from windows machines, mac (time machine),
     b linux.

for time machine you will probably find yourself using COMSTAR and the GlobalSAN iSCSI initiator, because Time Machine does not seem willing to work over NFS. Otherwise, for Macs you should definitely use NFS, and you should definitely use the automounter, and you should use it with the 'net' option (let Mac OS pick where to mount the fs) if you have hierarchical mounts. Anyway, for Time Machine you cannot use NFS. I'm using:

* snv_130
* globalSAN_4.0.0.197_BETA-20091110
* Mac OS X 10.5.latest

and it has seemed to basically work for the last ~1 month. I've no reason to believe these versions are special, but I suggest you get the BETA globalSAN and not the stable one.

for linux, if you mount Linux NFS filesystems from Solaris you need to use '-o sec=sys' to avoid everything showing up as guest, due to a weird corner case that I think eventually got fixed on one side or the other but probably hasn't percolated through all the stable branches yet. If you mount Solaris NFS filesystems from Linux, you may want to use '-o noacl', because Solaris NFS fabricates ACLs and feeds them to Linux even when you haven't made any, leading to annoying '+' signs in 'ls -l' and sometimes weird, unnecessary permissions problems. This happens even with NFSv3. :( What's even stupider, busybox 'mount' doesn't seem to support the noacl flag, which cost me an extra couple of hours getting an NFS-rooted system to boot.

I like the idea of smoothly transitioning to a more advanced permissions system, but IMHO the whole mess just goes to show you: let people who've been mucking about with Windows touch anything else in your codebase, and their brains are so warped by the influence of that platform on their thinking that they make a ponderous mess of it and then chant ``this shouldn't be happening'' over and over.

     b (5) Be an iSCSI target for several different Virtual Boxes.

I've been using plain statically-allocated (not dynamic) .VDIs on ZFS filesystems. I've not been using zvols nor any iSCSI yet. If you do the latter two, I suggest comparing performance with the former one - there are rumors that some cache flush knobs may need tuning.

Also, in general, when you yank the cord, the integrity of a physical machine's filesystems is guaranteed, but the same is *not* true of a virtual machine when its host's cord is yanked. It's supposed to be true when you force-virtual-powerdown the guest, but not when you yank the host's cord, because some of the same knobs were twisted to compromise integrity for performance. The compromise is probably the right one, provided you can work around it: for example, by snapshotting the guest so you can roll back if there's corruption, and keeping oft-changing files that can't be rolled back outside the guest, using either guest services' shared folders on Windows or NFS on Unix.

     b Function 4 will use compression and deduplication. Function 5
     b will use deduplication.

I've not dared to use dedup yet. In particular, the DDT needs to fit in RAM (or maybe L2ARC) to avoid performance degradations so severe you may find yourself painted into a corner (ex., 'zfs destroy' runs for a week, forcing you to give up, 'zfs send' the non-deduped filesystems elsewhere, destroy the pool, and restore from backup).

not sure a dedicated DDT vdev is the best idea, but that's discussed here:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566

What's missing, to my view, is a way to manage it: if an overgrown DDT can, in effect, trash the pool by making maintenance commands take forever, then there's got to be a way to watch the size of the DDT, maybe even cap it and disable dedup if it overgrows. That said, I haven't tried it, so I'm talking out my ass.

Also, gzip compression does not sound like it works well - suggest lzjb instead - but this might be fixed in 6586537, 6806882, or by this fix, which sounds like a fairly big deal:

http://arc.opensolaris.org/caselog/PSARC/2009/615/mail

so I would say gzip may be worth another try now, but definitely be ready to fall back to lzjb and convert with zfs send | zfs recv.

anyway... it seems many things are really improving drastically since a year ago, and thank god for the list!
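On watching the DDT size: zdb can dump dedup table statistics. A hedged sketch; the pool name is hypothetical, zdb output format varies by build, and the numbers shown are purely illustrative:

    # dump DDT statistics for pool "tank"
    zdb -DD tank
    # lines like the following give entry counts and per-entry sizes:
    #   DDT-sha256-zap-unique: 1234567 entries, size 412 on disk, 331 in core
    # estimated in-core footprint ~ entries x in-core size:
    #   1234567 x 331 bytes ~ 390 MB of RAM for this class of entries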
Re: [zfs-discuss] Recover ZFS Array after OS Crash?
On 5-Feb-10, at 11:35 AM, J wrote:
> Since ZFS is software-based RAID, if the OS crashes is it even possible to recover any of the arrays?

Being a software system, it is inherently more recoverable than hardware RAID (the latter is probably only going to be readable on exactly the same configuration, and if the constellations are aligned just right, and the black rooster has crowed four times, etc.).

As Darren says, you can simply take either or both sides of the mirror and boot or access the pool on another ZFS-capable system. It doesn't even have to use the same interfaces; last week I built a new Solaris 10 web server and migrated pool data from one half of a ZFS pool from the old server, connected by a USB/SATA adapter. This kind of flexibility (not to mention data integrity) just isn't there with HW RAID.

--Toby
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Fri, Feb 05, 2010 at 11:55:12AM -0800, Bob Friesenhahn wrote:
> Others have tested the X25-E and found that with its cache enabled, it does drop flushed writes, but clearly not such a gaping chasm as the X25-M.

I missed out on this thread. How would these dropped flushed writes manifest themselves? Something in the logs, or just worsened performance?

Ray
[zfs-discuss] Identifying firmware version of SATA controller (LSI)
Trying to track down why our two Intel X25-Es are spewing out Write/Retryable errors when being used as a ZIL (mirrored). The system is running an LSI1068E controller with an LSISASx36 expander (box built by Silicon Mechanics). The drives are fairly new, and it seems odd that both of the pair would start showing errors at the same time.

I'm trying to figure out where I can find the firmware version on the LSI controller... are the bootup messages the only place I could expect to see this? prtconf and prtdiag both don't appear to give firmware information.

We have another nearly identical box that isn't showing these errors, which is why I want to compare firmware versions... the boot logs on the good server have been rotated out, so I can't find a firmware number for the mpt0 device in its logs to compare with.

Solaris 10 U8 x86.

Thanks,
Ray
Re: [zfs-discuss] Cores vs. Speed?
On Fri, Feb 5, 2010 at 12:20 PM, Miles Nordin car...@ivy.net wrote:
> for time machine you will probably find yourself using COMSTAR and the GlobalSAN iSCSI initiator, because Time Machine does not seem willing to work over NFS. Otherwise, for Macs you should definitely use NFS,

Slightly off-topic... You can make Time Machine work with CIFS or NFS mounts by setting a system preference. The command is:

defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

I've had some success trying to get my father-in-law's system to back up to a Drobo with this. It was working the last time I was by his house, but I'm not sure if it's still working.

-B
-- 
Brandon High : bh...@freaks.com
[zfs-discuss] ZFS 'secure erase'
Two things, mostly related, that I'm trying to find answers to for our security team.

Does this scenario make sense:
* Create a filesystem at /users/nfsshare1; a user uses it for a while, then asks for the filesystem to be deleted.
* A new user asks for a filesystem and is given /users/nfsshare2. What are the chances that they could use some tool or other to read unallocated blocks to view the previous user's data?

Related to that, when files are deleted on a ZFS volume over an NFS share, how are they wiped out? Are they zeroed or anything? Same question for destroying ZFS filesystems: does the data lie about in any way? (That's largely answered by the first scenario.)

If the data is retrievable in any way, is there a way to a) securely destroy a filesystem, or b) securely erase empty space on a filesystem?

I know in some sense those questions don't apply the way they would to, say, ext3, since a filesystem doesn't have a block until a file is written.

Sorry if these questions aren't worded well. I've been in meetings for the last couple of hours.

-
Cameron Hanover
chano...@umich.edu

"Chaos was the law of nature. Order was the dream of man." --Henry Brooks Adams
Re: [zfs-discuss] Identifying firmware version of SATA controller (LSI)
rvandol...@esri.com said:
> I'm trying to figure out where I can find the firmware version on the LSI controller... are the bootup messages the only place I could expect to see this? prtconf and prtdiag both don't appear to give firmware information.
> . . .
> Solaris 10 U8 x86.

The raidctl command is your friend; it's useful for updating firmware if you choose to do so, as well. You can also find the revisions in the output of 'prtconf -Dv': search for "firm" in the long list.

Regards,
Marion
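A hedged sketch of both approaches; controller numbering is system-specific, and output formats differ between Solaris updates:

    # list RAID controllers, then show details (including firmware
    # revision) for controller 0
    raidctl -l
    raidctl -l 0

    # or search the device tree for firmware properties
    prtconf -Dv | grep -i firm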
Re: [zfs-discuss] ZFS 'secure erase'
On 2/5/10 3:49 PM -0500 c.hanover wrote:
> * A new user asks for a filesystem and is given /users/nfsshare2. What are the chances that they could use some tool or other to read unallocated blocks to view the previous user's data?

Over NFS? None.

> Related to that, when files are deleted on a ZFS volume over an NFS share, how are they wiped out? Are they zeroed or anything? Same question for destroying ZFS filesystems: does the data lie about in any way?

In both cases the data is still on disk.

> If the data is retrievable in any way, is there a way to a) securely destroy a filesystem, or b) securely erase empty space on a filesystem?

Someone else will have to answer that.

-frank
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
rvd == Ray Van Dolson rvandol...@esri.com writes:
ak == Andrey Kuzmin andrey.v.kuz...@gmail.com writes:

    rvd I missed out on this thread. How would these dropped flushed
    rvd writes manifest themselves?

presumably corrupted databases, lost mail, or strange NFS behavior after the server reboots when the clients do not. But the actual test to which I referred is benchmark-like and didn't observe any of those things. If you read my post, I gave you Lutz's name and the date he posted, and also linked to the msgid in my message's header, so go read for yourself!

A good point, though, is that drives with lying write caches are still okay if your box reboots because of a kernel panic, just not if it loses power, so they're not worthless.

    ak performance from anyone using (real) enterprise SSD (which now
    ak spells STEC) as slog.

I wonder how ACARD would do also, since it is 1/5th the cost, or whether the Seagate Pulsar will behave correctly. STEC coming in at more expensive than DRAM is like a sucker premium you pay because no one else has their act together. And according to the test Lutz did, the X25-M (and probably also the -E?) is okay so long as you disable the write cache, though you have to do it at every boot, and 'hdadm' is not bundled.

It would also be nice to convince anandtech and friends to yank power cords, too, to confirm that write flushes issued in their tests are actually obeyed, and to redo the io/s test with the write cache disabled if the device lies, so that we actually have comparable numbers. If they would do that, the $ value of a supercap would become obvious.
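Since the cache setting doesn't persist, one hedged workaround is a boot-time script. A minimal, untested sketch for Solaris 10's legacy rc mechanism; the device name c3t5 comes from Lutz's example, and the hdadm path is an assumption, since hdadm is not bundled with stock Solaris:

    cat > /etc/rc3.d/S99slogcache <<'EOF'
    #!/bin/sh
    # disable the volatile write cache on the slog SSD at every boot
    /usr/bin/hdadm write_cache off c3t5
    EOF
    chmod 744 /etc/rc3.d/S99slogcache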
Re: [zfs-discuss] ZFS 'secure erase'
ch == c hanover chano...@umich.edu writes:

    ch is there a way to a) securely destroy a filesystem,

AIUI zfs crypto will include this, some day, by forgetting the key. But for SSDs, zfs above a zvol, or zfs above a SAN that may do snapshots without your consent, I think it's just logically not a solvable problem, period, unless you have a writeable keystore outside the vdev structure.
Re: [zfs-discuss] ZFS 'secure erase'
On Fri, Feb 05, 2010 at 03:49:15PM -0500, c.hanover wrote:
> * A new user asks for a filesystem and is given /users/nfsshare2. What are the chances that they could use some tool or other to read unallocated blocks to view the previous user's data?

If the tool isn't accessing the raw disks, then the answer is: no chance. (There's no way to access the raw disks over NFS.)

> Related to that, when files are deleted on a ZFS volume over an NFS share, how are they wiped out? Are they zeroed or anything? Same question for destroying ZFS filesystems: does the data lie about in any way?

Deleting a file does not guarantee that data blocks are released: snapshots might exist that retain references to the data blocks of a file that is being deleted. Nor are blocks wiped when released.

> If the data is retrievable in any way, is there a way to a) securely destroy a filesystem, or b) securely erase empty space on a filesystem?

When ZFS crypto ships you'll be able to securely destroy encrypted datasets. Until then the only form of secure erasure is to destroy the pool and then wipe the individual disks.

Nico
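A hedged sketch of that last, drastic option; the pool and device names are hypothetical, and this is destructive and irreversible, so triple-check the device list first:

    # destroy the pool, then overwrite every member disk with zeros;
    # s2 is the traditional whole-disk slice
    zpool destroy users
    for d in c1t4d0 c1t5d0; do
        dd if=/dev/zero of=/dev/rdsk/${d}s2 bs=1024k
    done

Multiple overwrite passes, or format(1M)'s analyze/purge function, can be substituted where policy requires them.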
Re: [zfs-discuss] ZFS 'secure erase'
On Fri, Feb 05, 2010 at 04:41:08PM -0500, Miles Nordin wrote:
>     ch is there a way to a) securely destroy a filesystem,
>
> AIUI zfs crypto will include this, some day, by forgetting the key.

Right.

> but for SSDs, zfs above a zvol, or zfs above a SAN that may do snapshots without your consent, I think it's just logically not a solvable problem, period, unless you have a writeable keystore outside the vdev structure.

IIRC ZFS crypto will store encrypted blocks in the L2ARC and ZIL, so forgetting the key is sufficient to obtain a high degree of security. ZFS crypto over zvols and whatnot presents no additional problems. However, if your passphrase is guessable, then the key might be recoverable even after it's forgotten.

Nico
Re: [zfs-discuss] ZFS 'secure erase'
On Feb 5, 2010, at 4:36 PM, Nicolas Williams wrote:
> Deleting a file does not guarantee that data blocks are released: snapshots might exist that retain references to the data blocks of a file that is being deleted. Nor are blocks wiped when released.

In our particular case, there won't be snapshots of destroyed filesystems (I create the snapshots, and destroy them with the filesystem).

I'm not too sure on the particulars of NFS/ZFS, but would it be possible to create a 1GB file without writing any data to it, and then use a hex editor to access the data stored on those blocks previously? Any chance someone could make any kind of sense of the contents (allocated in the same order they were before, or what have you)?

ZFS crypto will be nice when we get either NFSv4 or NFSv3 w/krb5 for over-the-wire encryption. Until then, not much point.

-
Cameron Hanover
chano...@umich.edu
[zfs-discuss] Hybrid storage ... thing
I saw this in /. and thought I'd point it out to this list. It appears to act as an L2 cache for a single drive, in theory providing better performance.

http://www.silverstonetek.com/products/p_contents.php?pno=HDDBOOSTarea

-B
-- 
Brandon High : bh...@freaks.com
Indecision is the key to flexibility.
Re: [zfs-discuss] ZFS 'secure erase'
On 2/5/10 5:08 PM -0500 c.hanover wrote:
> would it be possible to create a 1GB file without writing any data to it, and then use a hex editor to access the data stored on those blocks previously?

No, not over NFS, and also not locally. You'd be creating a sparse file, which doesn't allocate space on disk in any filesystem (not just zfs). So when you read it back, you get back all 0s. The only way to actually allocate the space on disk is to write to it, and then of course you read back the data you wrote, not what was previously there.
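A quick demonstration sketch on a local ZFS filesystem; the path is hypothetical, and mkfile's -n flag creates the file without allocating blocks:

    # create a 1GB sparse file
    mkfile -n 1g /users/nfsshare2/sparse
    # almost no space is actually allocated
    du -h /users/nfsshare2/sparse
    # reading it back yields only zeros; od collapses the repeats to "*"
    od -A d -x /users/nfsshare2/sparse | head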
Re: [zfs-discuss] ZFS 'secure erase'
On Fri, Feb 05, 2010 at 05:08:02PM -0500, c.hanover wrote:
> In our particular case, there won't be snapshots of destroyed filesystems (I create the snapshots, and destroy them with the filesystem).

OK.

> I'm not too sure on the particulars of NFS/ZFS, but would it be possible to create a 1GB file without writing any data to it, and then use a hex editor to access the data stored on those blocks previously?

Absolutely not. That is, you can create a 1GB file without writing to it, but it will appear to contain all zeros.

> Any chance someone could make any kind of sense of the contents (allocated in the same order they were before, or what have you)?

No. See above.

> ZFS crypto will be nice when we get either NFSv4 or NFSv3 w/krb5 for over-the-wire encryption. Until then, not much point.

You can use NFS with krb5 over-the-wire encryption _now_.

Nico
Re: [zfs-discuss] ZFS 'secure erase'
On Feb 5, 2010, at 5:19 PM, Nicolas Williams wrote:
> You can use NFS with krb5 over-the-wire encryption _now_.

I know; that's just something I'm working out the particulars of before we decide if/when we want to offer it in production. I've got it working to some extent now.

-
Cameron Hanover
chano...@umich.edu

"Tact is for people who aren't witty enough to be sarcastic."
Re: [zfs-discuss] Hybrid storage ... thing
> I saw this in /. and thought I'd point it out to this list. It appears to act as an L2 cache for a single drive, in theory providing better performance.
>
> http://www.silverstonetek.com/products/p_contents.php?pno=HDDBOOSTarea

It's a neat device, but the notion of a hybrid drive is nothing new. As with any block-based caching, this device has no notion of the semantic meaning of a given block, so there's only so much intelligence it can bring to bear on the problem.

Adam

-- 
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl
Re: [zfs-discuss] ZFS send/recv checksum transmission
On Feb 5, 2010, at 3:11 AM, grarpamp wrote:
> Are the sha256/fletcher[x]/etc. checksums sent to the receiver along with the other data/metadata?

No. Checksums are made on the records, and there could be a different record size for the sending and receiving file systems. The stream itself is checksummed with fletcher4.

> And checked upon receipt, of course.

Of course.

> Do they chain all the way back to the uberblock, or to some calculated transfer-specific checksum value?

I suppose one could say a calculated transfer fletcher4 checksum value.

> The idea is to carry the integrity checks through wherever possible, whether done as close as within the same zpool, or miles away.

yes.
-- richard
Re: [zfs-discuss] Cores vs. Speed?
On Feb 5, 2010, at 10:49 AM, Robert Milkowski mi...@task.gda.pl wrote:
> In the case of RAID-Zn you have some extra overhead for writing the additional parity, but other than that you should get a write throughput closer to T-N (where N is the RAID-Z level) instead of T/2 as in RAID-10.

That hasn't been my experience with raidz. I get the maximum read and write IOPS of the slowest drive in the vdev, which makes sense, because each write spans all drives and each read spans all drives (except the parity drives), so they end up having the performance characteristics of a single drive.

Now, if you have enough drives you can create multiple raidz vdevs to get the IOPS up, but you need a lot more drives than what multiple mirror vdevs can provide, IOPS-wise, with the same number of spindles.

-Ross
Re: [zfs-discuss] ZFS 'secure erase'
You might also want to note that with traditional filesystems, the 'shred' utility will securely erase data, but no tools like that will work for zfs: because ZFS is copy-on-write, "overwriting" a file in place simply writes new blocks elsewhere and leaves the old ones behind.
Re: [zfs-discuss] ZFS send/recv checksum transmission
> No. Checksums are made on the records, and there could be a different record size for the sending and receiving file systems.

Oh. So there's a zfs read to RAM somewhere, which checks the sums on disk, and then entirely new stream checksums are made while sending it all off to the pipe. I see the bit about different zfs block sizes perhaps preventing use of the actual on-disk checksums in the transfer itself - including, thereby, the chain to the uberblock in the transfer. Thanks for that part.

> The stream itself is checksummed with fletcher4. I suppose one could say a calculated transfer fletcher4 checksum value.

Hmm, is that configurable? Say, to match the checksums being used on the filesystem itself, i.e. sha256? It would seem odd to send with fewer bits than what is used on disk.

>> The idea is to carry the integrity checks through wherever possible, whether done as close as within the same zpool, or miles away.
> yes.

Was thinking that plaintext ethernet/wan and even some of the 'weaker' ssl algorithms would be candidates to back with sha256 in a transfer. Not really needed for a 'within the box only' unix pipe though.
Re: [zfs-discuss] ZFS send/recv checksum transmission
On Feb 5, 2010, at 7:20 PM, grarpamp wrote:
> Hmm, is that configurable? Say, to match the checksums being used on the filesystem itself, i.e. sha256? It would seem odd to send with fewer bits than what is used on disk.

Do you expect the same errors in the pipe as you do on disk?

> Was thinking that plaintext ethernet/wan and even some of the 'weaker' ssl algorithms would be candidates to back with sha256 in a transfer. Not really needed for a 'within the box only' unix pipe though.

most folks use ssh.
-- richard
Re: [zfs-discuss] ZFS send/recv checksum transmission
>>> Hmm, is that configurable? Say, to match the checksums being used on the filesystem itself, i.e. sha256? ... Was thinking that plaintext ethernet/wan and even some of the 'weaker' ssl algorithms ...
>> Do you expect the same errors in the pipe as you do on disk?

Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk] is assumed to handle data with integrity. So say netcat is used as transport, zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv, and your wire takes some undetected/uncorrected hits, and the hits also happen to make it past fletcher4... it kind of nullifies the SA's choice/thought that sha256 would be used throughout all zfs operations.

I didn't see notation in the man page that checksums are indeed used in send/recv operations... In any case, at least something is used over the bare wire :)
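For a plaintext transport like netcat, a stronger end-to-end check can be bolted on outside the protocol. A hedged, minimal sketch: the host, port, and dataset names are hypothetical, netcat availability and flag syntax vary by implementation, digest(1) ships with Solaris 10, and buffering the stream in a file trades disk space for a simple comparison:

    # sender:
    zfs send pool/fs@snap > /var/tmp/stream
    digest -a sha256 /var/tmp/stream        # note the value
    nc receiver 9000 < /var/tmp/stream

    # receiver:
    nc -l -p 9000 > /var/tmp/stream
    digest -a sha256 /var/tmp/stream        # must match the sender's value
    zfs recv tank/fs < /var/tmp/stream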
Re: [zfs-discuss] Cores vs. Speed?
> Intel's RAM is faster because it needs to be.

I'm confused how AMD's dual-channel, two-way-interleaved 128-bit DDR2-667 into an on-cpu controller is faster than Intel's Lynnfield dual-channel, rank- and channel-interleaved DDR3-1333 into an on-cpu controller.

http://www.anandtech.com/printarticle.aspx?i=3634

> With the AMD CPU, the memory will run cooler and be cheaper.

Cooler, yes, but only $2 more per gig for 2x the bandwidth?

http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050
http://www.newegg.com/Product/Product.aspx?Item=N82E16820134652

And if one uses all 16 slots, that 667MHz simm runs at 533MHz with AMD. The same is true for Lynnfield: if one uses registered DDR3, one only gets 800MHz with all 6 slots (single or dual rank).

> Regardless, for zfs, memory is more important than raw CPU

agreed! but everything must be balanced.

Rob
Re: [zfs-discuss] ZFS send/recv checksum transmission
On Feb 5, 2010, at 8:09 PM, grarpamp wrote:
> So say netcat is used as transport, zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv, and your wire takes some undetected/uncorrected hits, and the hits also happen to make it past fletcher4... it kind of nullifies the SA's choice/thought that sha256 would be used throughout all zfs operations.

Hold it right there, fella. SHA256 is not used for everything ZFS, so expecting it to be so will set the stage for disappointment. You can set the data to be checksummed with SHA256.

> I didn't see notation in the man page that checksums are indeed used in send/recv operations...

It is an implementation detail. But if you can make the case for why it is required to be inside the protocol, rather than its transport, then please file an RFE.

> In any case, at least something is used over the bare wire :)

Lots of things are used on the bare wire, and there are many hops along the way. This is another good reason to use ssh, or some other end-to-end verification mechanism. UNIX pipes are a great invention! :-)
-- richard
Re: [zfs-discuss] ZFS send/recv checksum transmission
> Hold it right there, fella. SHA256 is not used for everything ZFS,

Well, ok, and in my limited knowhow... "zfs set checksum=sha256" only covers user-scribbled data [POSIX file metadata, file contents, directory structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

> You can set the data to be checksummed with SHA256.

Definitely, as indeed set above :)

>> I didn't see notation in the man page that checksums are indeed used in send/recv operations...
> It is an implementation detail. But if you can make the case for why it is required to be inside the protocol, rather than its transport, then please file an RFE.

The case had to have been previously made to include fletcher4 in the zfs send/recv protocol, so sha256 would just be an update to the user's options - similar to how f4 was an available on-disk update to f2, z3 to z2 to z1, etc. I was really only looking to see what, if anything, was currently used in the protocol, not actually proposing an update. Now I know :)

Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon.

> UNIX pipes are a great invention! :-)

Yeah, I suppose a pipe to ssh has enough bits to catch things these days. Netcat might be different; ergo, at least f4 as already implemented.

debug1: kex: server->client aes128-ctr hmac-sha1 none
debug1: kex: client->server aes128-ctr hmac-sha1 none

Thanks.