Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Hi, I agree 100% with Chris. Notice the "on their own" part of the original post. Yes, nobody wants to run zfs send or (s)tar by hand. That's why Chris's script is so useful: you set it up, forget it, and it gets the job done for 80% of home users. On another note, I was positively surprised by the availability of CrashPlan for OpenSolaris: http://crashplan.com/ Their free service allows you to back up your stuff to a friend's system over the net in an encrypted way; the paid-for service uses CrashPlan's data centers at pricing below Amazon S3's. While this may not be everyone's solution, I find it significant that they explicitly support OpenSolaris. This either means they're OpenSolaris fans or that they see potential in OpenSolaris home server users. Cheers, Constantin On 03/20/10 01:31 PM, Chris Gerhard wrote: I'll say it again: neither 'zfs send' or (s)tar is an enterprise (or even home) backup system on their own one or both can be components of the full solution. Up to a point. zfs send | zfs receive does make a very good back up scheme for the home user with a moderate amount of storage. Especially when the entire back up will fit on a single drive which I think would cover the majority of home users. Using external drives and incremental zfs streams allows for extremely quick back ups of large amounts of data. It certainly does for me. http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/ -- Sent from OpenSolaris, http://www.opensolaris.org/ Constantin Gonzalez Sun Microsystems GmbH, Germany Principal Field Technologist Blog: constantin.glez.de Tel.: +49 89/4 60 08-25 91 Twitter: @zalez Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Thomas Schroeder ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
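A minimal sketch of the rolling-incremental approach described above, assuming a hypothetical dataset tank/home and an external backup pool named extbackup (neither name comes from the original posts): the first run sends a full stream, later runs send only the delta between the previous and the current snapshot.

    # day 1: snapshot and full send into the external pool (creates extbackup/home)
    zfs snapshot tank/home@backup-20100322
    zfs send tank/home@backup-20100322 | zfs receive extbackup/home

    # day 2: snapshot again and send only the changes since day 1
    zfs snapshot tank/home@backup-20100323
    zfs send -i tank/home@backup-20100322 tank/home@backup-20100323 | \
        zfs receive -F extbackup/home    # -F rolls the target back if it was touched

Older snapshots on the source can then be destroyed once the corresponding incremental has been received on the external drive.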
Re: [zfs-discuss] Proposition of a new zpool property.
Robert Milkowski wrote: To add my 0.2 cents... I think starting/stopping scrub belongs to cron, smf, etc. and not to zfs itself. However what would be nice to have is an ability to freeze/resume a scrub and also limit its rate of scrubbing. One of the reason is that when working in SAN environments one have to take into account more that just a server where a scrub will be running as while it might not impact the server it might cause an issue for others, etc. There's an RFE for this (pause/resume a scrub), or rather there was - unfortunately, it's got subsumed into another RFE/BUG and the pause/resume requirement got lost. I'll see about reinstating it. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22.03.2010 02:13, Edward Ned Harvey wrote: Actually ... Why should there be a ZFS property to share NFS, when you can already do that with share and dfstab? And still the zfs property exists. Probably because it is easy to create new filesystems and clone them; as NFS only works per filesystem you need to edit dfstab every time when you add a filesystem. With the nfs property, zfs create the NFS export, etc. Either I'm missing something, or you are. If I export /somedir and then I create a new zfs filesystem /somedir/foo/bar then I don't have to mess around with dfstab, because it's a subdirectory of an exported directory, it's already accessible via NFS. So unless I misunderstand what you're saying, you're wrong. This is the only situation I can imagine, where you would want to create a ZFS filesystem and have it default to NFS exported. Actually, I can see some reasons for this. Some of us want directories mounted in the same place on all servers. Consider the following:
zfs inherit sharenfs pool/nfs
zfs create -o mountpoint=/home pool/nfs/home
zfs create -o mountpoint=/webpages pool/nfs/www
zfs create -o mountpoint=/someotherdir pool/nfs/otherdir
etc. So, I do see the point of the sharenfs attribute. ;) //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
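A small illustration of that idea (the pool and mountpoint names are hypothetical): set sharenfs once on a parent dataset and every filesystem created beneath it is exported automatically, with no dfstab editing required.

    zfs set sharenfs=on pool/nfs
    zfs create -o mountpoint=/home pool/nfs/home        # inherits sharenfs=on
    zfs create -o mountpoint=/webpages pool/nfs/www     # inherits sharenfs=on
    zfs get -r sharenfs pool/nfs    # SOURCE column shows "inherited from pool/nfs"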
Re: [zfs-discuss] Proposition of a new zpool property.
On 21.03.2010 01:25, Robert Milkowski wrote: To add my 0.2 cents... I think starting/stopping scrub belongs to cron, smf, etc. and not to zfs itself. However what would be nice to have is an ability to freeze/resume a scrub and also limit its rate of scrubbing. One of the reason is that when working in SAN environments one have to take into account more that just a server where a scrub will be running as while it might not impact the server it might cause an issue for others, etc. Does cron happen to know how many other scrubs are running, bogging down your IO system? If the scrub scheduling was integrated into zfs itself, it would be a small step to include smf/sysctl settings for maximum number of parallel scrubs, meaning the next scrub could sit waiting until the running ones are finished. //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22/03/2010 01:13, Edward Ned Harvey wrote: Actually ... Why should there be a ZFS property to share NFS, when you can already do that with share and dfstab? And still the zfs property exists. Probably because it is easy to create new filesystems and clone them; as NFS only works per filesystem you need to edit dfstab every time when you add a filesystem. With the nfs property, zfs create the NFS export, etc. Either I'm missing something, or you are. If I export /somedir and then I create a new zfs filesystem /somedir/foo/bar then I don't have to mess around with dfstab, because it's a subdirectory of an exported directory, it's already accessible via NFS. So unless I misunderstand what you're saying, you're wrong. No, it is not a subdirectory; it is a filesystem mounted on top of the subdirectory. So unless you use NFSv4 with mirror mounts or an automounter, other NFS versions will show you the contents of the directory and not the filesystem. It doesn't matter whether it is zfs or not. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22/03/2010 08:49, Andrew Gabriel wrote: Robert Milkowski wrote: To add my 0.2 cents... I think starting/stopping scrub belongs to cron, smf, etc. and not to zfs itself. However what would be nice to have is an ability to freeze/resume a scrub and also limit its rate of scrubbing. One of the reason is that when working in SAN environments one have to take into account more that just a server where a scrub will be running as while it might not impact the server it might cause an issue for others, etc. There's an RFE for this (pause/resume a scrub), or rather there was - unfortunately, it's got subsumed into another RFE/BUG and the pause/resume requirement got lost. I'll see about reinstating it. have you got the rfe/bug numbers? I will try to find some time and get it implemented... -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] David Plaunt is currently away.
I will be out of the office starting 22/03/2010 and will not return until 06/04/2010. Hello, I am currently working on a project and out of the office. I will be checking my message twice a day but may be unavailable to follow up on your requests. If the matter requires immediate attention please send your request to t...@brucetelecom.com or contact technical support at 1 866 517 2000 x 2 / 519 368 2000 x 2. Thank you, David ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
Does cron happen to know how many other scrubs are running, bogging down your IO system? If the scrub scheduling was integrated into zfs itself, It doesn't need to. Crontab entry: /root/bin/scruball.sh /root/bin/scruball.sh:
#!/usr/bin/bash
for filesystem in filesystem1 filesystem2 filesystem3 ; do
  zfs scrub $filesystem
done
If you were talking about something else, for example, multiple machines all scrubbing a SAN at the same time, then ZFS can't solve that any better than cron, because it would require inter-machine communication to coordinate. I contend a shell script could actually handle that better than a built-in zfs property anyway. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22.03.2010 13:35, Edward Ned Harvey wrote: Does cron happen to know how many other scrubs are running, bogging down your IO system? If the scrub scheduling was integrated into zfs itself, It doesn't need to. Crontab entry: /root/bin/scruball.sh /root/bin/scruball.sh: #!/usr/bin/bash for filesystem in filesystem1 filesystem2 filesystem3 ; do zfs scrub $filesystem done If you were talking about something else, for example, multiple machines all scrubbing a SAN at the same time, then ZFS can't solve that any better than cron, because it would require inter-machine communication to coordinate. I contend a shell script could actually handle that better than a built-in zfs property anyway. IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in parallel as a result. //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
no, it is not a subdirectory it is a filesystem mounted on top of the subdirectory. So unless you use NFSv4 with mirror mounts or an automounter other NFS version will show you contents of a directory and not a filesystem. It doesn't matter if it is a zfs or not. Ok, I learned something here, that I want to share: If you create a new zfs filesystem as a subdir of a zfs filesystem which is exported via nfs and shared via cifs ... The cifs clients see the contents of the child zfs filesystems. But, as Robert said above, nfs clients do not see the contents of the child zfs filesystem. So, if you nest zfs filesystems inside each other (I don't) then the sharenfs property of a parent can be inherited by a child, and if that's your desired behavior, it's a cool feature. For that matter, even if you do set the property, and you create a new child filesystem with inheritance, that only means the server will auto-export the filesystem. It doesn't mean the client will auto-mount it, right? So what's the 2nd half of the solution? Assuming you want the clients to see the subdirectories as the server does. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in paralell as a result. You're right. I did that wrong. Sorry 'bout that. So either way, if there's a zfs property for scrub, that still doesn't prevent multiple scrubs from running simultaneously. So ... Presently there's no way to avoid the simultaneous scrubs either way, right? You have to home-cook scripts to detect which scrubs are running on which filesystems, and serialize the scrubs. With, or without the property. Don't get me wrong - I'm not discouraging the creation of the property. But if you want to avoid simul-scrub, you'd first have to create a mechanism for that, and then you could create the autoscrub. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
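A hedged sketch of such a home-cooked serializer (the pool names are hypothetical, and the "in progress" string is simply what zpool status prints while a scrub runs): start one scrub, poll until it finishes, then move on to the next pool.

    #!/usr/bin/bash
    # scrub pools one at a time instead of all at once
    for pool in tank backup archive ; do
        zpool scrub $pool
        # zpool scrub returns immediately, so wait for this one to complete
        while zpool status $pool | grep "in progress" > /dev/null ; do
            sleep 300
        done
    done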
Re: [zfs-discuss] ZFS+CIFS: Volume Shadow Services, or Simple Symlink?
Not being a CIFS user, could you clarify/confirm for me.. is this just a presentation issue, ie making a directory icon appear in a gooey windows explorer (or mac or whatever equivalent) view for people to click on? The windows client could access the .zfs/snapshot dir via typed pathname if it knows to look, or if it's made visible, yes? You are correct. A CIFS client by default will not show the hidden .zfs directory, but if you check the "show hidden files" box you'll see it, or if you type it into the address bar you can access it. However, my users were used to having a hidden .snapshots directory in every directory. I didn't want to tell them "You have to go to the parent of all directories and type in .zfs", mostly because they can't remember zfs ... So the softlink just makes it visible and easy to remember. I promise you will never catch me creating backups of ZFS via CIFS. ;-) Never say never.. Hehehehe. Given the alternatives, I think this is a safe one. I will never back up a ZFS filesystem via a CIFS client. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22/03/2010 12:50, Edward Ned Harvey wrote: no, it is not a subdirectory it is a filesystem mounted on top of the subdirectory. So unless you use NFSv4 with mirror mounts or an automounter other NFS version will show you contents of a directory and not a filesystem. It doesn't matter if it is a zfs or not. Ok, I learned something here, that I want to share: If you create a new zfs filesystem as a subdir of a zfs filesystem which is exported via nfs and shared via cifs ... The cifs clients see the contents of the child zfs filesystems. But, as Robert said above, nfs clients do not see the contents of the child zfs filesystem. So, if you nest zfs filesystems inside each other (I don't) then the sharenfs property of a parent can be inherited by a child, and if that's your desired behavior, it's a cool feature. For that matter, even if you do set the property, and you create a new child filesystem with inheritance, that only means the server will auto-export the filesystem. It doesn't mean the client will auto-mount it, right? So what's the 2nd half of the solution? Assuming you want the clients to see the subdirectories as the server does. look for mirror mounts feature in NFSv4. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
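A brief client-side illustration of the mirror-mount behaviour mentioned above (the server name and paths are hypothetical): with an NFSv4 mount, stepping into a nested filesystem on the server triggers an automatic mount of it on the client, so newly created child filesystems become visible without touching the client's vfstab.

    # on the NFS client
    mount -F nfs -o vers=4 server:/somedir /mnt/somedir
    ls /mnt/somedir/foo/bar    # the child filesystem is mirror-mounted on first access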
Re: [zfs-discuss] ZFS Performance on SATA Drive
Hi, thanks for all the replies. I have found the real culprit: the hard disk was faulty. I changed the hard disk, and now ZFS performance is much better. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22.03.2010 13:54, Edward Ned Harvey wrote: IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in paralell as a result. You're right. I did that wrong. Sorry 'bout that. So either way, if there's a zfs property for scrub, that still doesn't prevent multiple scrubs from running simultaneously. So ... Presently there's no way to avoid the simultaneous scrubs either way, right? You have to home-cook scripts to detect which scrubs are running on which filesystems, and serialize the scrubs. With, or without the property. Don't get me wrong - I'm not discouraging the creation of the property. But if you want to avoid simul-scrub, you'd first have to create a mechanism for that, and then you could create the autoscrub. Which is exactly why I wanted it cooked in in the zfs code itself. zfs knows how many fs'es it's scrubbing. //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Intel SASUC8I - worth every penny
I've moved to 7200RPM 2.5" laptop drives over 3.5" drives, for a combination of reasons: lower-power, better performance than a comparable sized 3.5" drives, and generally lower-capacities meaning resilver times are smaller. They're a bit more $/GB, but not a lot. If you can stomach the extra cost (they run $220), I'd actually recommend getting a 8x2.5" in 2x5.25" enclosure from Supermicro. It works nicely, plus it gives you a nice little place to put your SSD. :-) -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Regarding the 2.5" laptop drives, do the inherent error detection properties of ZFS subdue any concerns over a laptop drive's higher bit error rate or rated MTBF? I've been reading about OpenSolaris and ZFS for several months now and am incredibly intrigued, but have yet to implement the solution in my lab. Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Intel SASUC8I - worth every penny
Cooper Hubbell wrote: Regarding the 2.5" laptop drives, do the inherent error detection properties of ZFS subdue any concerns over a laptop drive's higher bit error rate or rated MTBF? I've been reading about OpenSolaris and ZFS for several months now and am incredibly intrigued, but have yet to implement the solution in my lab. Thanks! So far as I know, laptop drives have no higher error rates (i.e. unrecoverable errors per 1 billion bits read/written), and similar MTBF to standard consumer SATA drives. Looking at a couple of spec sheets, MTBF is about 600,000 hrs for laptop drives, and 700,000 hrs for consumer 3.5" drives. Frankly, if I was concerned about individual component failures, I'd look outside the consumer space (in all form factors). In both cases, they're not terribly reliable, which is why ZFS is so great. :-) And, yes, to answer your question, this is (one of) the main points behind ZFS - being able to provide a reliable service from unreliable parts. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Intel SASUC8I - worth every penny
On 22.03.2010 16:24, Cooper Hubbell wrote: I've moved to 7200RPM 2.5 laptop drives over 3.5 drives, for a combination of reasons: lower-power, better performance than a comparable sized 3.5 drives, and generally lower-capacities meaning resilver times are smaller. They're a bit more $/GB, but not a lot. If you can stomach the extra cost (they run $220), I'd actually recommend getting a 8x2.5 in 2x5.25 enclosure from Supermicro. It works nicely, plus it gives you a nice little place to put your SSD. :-) -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Regarding the 2.5 laptop drives, do the inherent error detection properties of ZFS subdue any concerns over a laptop drive's higher bit error rate or rated MTBF? I've been reading about OpenSolaris and ZFS for several months now and am incredibly intrigued, but have yet to implement the solution in my lab. Well ... the price difference means you can have mirrors of the laptop drives and still save money compared to the enterprise ones. With a modern patrol-reading (scrub or hardware raid) array-setup, and with some redundancy, you can re-implement I to mean inexpensive not independent in RAID. ;) //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On Sat, March 20, 2010 07:31, Chris Gerhard wrote: Up to a point. zfs send | zfs receive does make a very good back up scheme for the home user with a moderate amount of storage. Especially when the entire back up will fit on a single drive which I think would cover the majority of home users. My own fit on a single external drive; but I've noticed that I have a rather small configuration (1.2TB nominal, less than 800GB used). Most people I hear describing building home NAS setups put between 4 and 10 of the biggest drives they can buy in them -- much more capacity than mine (but then I built mine in 2006, too). I'm not clear how much of it they ever fill up :-). Using external drives and incremental zfs streams allows for extremely quick back ups of large amounts of data. It certainly does for me. http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/ So far, for me it allows for endless failures and a LOT of reboots to free stuck IO subsystems. Your script seems to be using a simple zfs send -i; what I'm trying to do is use an incremental replication stream, a -R -I thing (from memory; hope that's right!). This should propagate (for example) my every-2-hours snapshots over onto the backup, even though I only back up to a given drive every two or three days (three backup drives, rotating one off-site). Unfortunately, though, it doesn't work; it hangs during the receive eventually. I'm waiting for the 2010.$Spring stable release to see how it behaves there before I get really energetic about debugging; for now I'm just forcing pool recreation and full backups each time (destroying the filesystem also hangs). Given that full backups run through to completion, whereas incrementals fail even though they're pushing a lot less data and take a lot less time, I'm not inclined to blame my USB hardware. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
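For reference, a sketch of the replication-stream variant described above, with hypothetical pool and snapshot names; -R makes it a recursive replication stream and -I includes every intermediate snapshot between the two named ones, so the every-2-hours snapshots come along too.

    # send everything (descendant filesystems, properties, and all
    # intermediate snapshots) accumulated since the last backup snapshot
    zfs send -R -I tank@backup-old tank@backup-new | zfs receive -F -d backuppool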
Re: [zfs-discuss] Rethinking my zpool
Thank you to all who responded. This response in particular was very helpful and I think I will stick with my current zpool configuration (choice a if you're reading below). I primarily host VMware virtual machines over NFS from this server's predecessor and this server will be doing the same thing. I think the 6 x 2-way mirror configuration gives me the best mix of performance and fault tolerance. Regards, Chris Dunbar On Mar 19, 2010, at 5:44 PM, Erik Trimble wrote: Chris Dunbar - Earthside, LLC wrote: Hello, After being immersed in this list and other ZFS sites for the past few weeks I am having some doubts about the zpool layout on my new server. It's not too late to make a change so I thought I would ask for comments. My current plan to to have 12 x 1.5 TB disks in a what I would normally call a RAID 10 configuration. That doesn't seem to be the right term here, but there are 6 sets of mirrored disks striped together. I know that smaller sets of disks are preferred, but how small is small? I am wondering if I should break this into two sets of 6 disks. I do have a 13th disk available as a hot spare. Would it be available for either pool if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities. Thank you, Chris Dunbar There's not much benefit I can see to having two pools if both are using the same configuration (i.e all mirrors or all raidz). There are reasons to do so, but I don't see that they would be of any real benefit for what you describe. A Hot spare disk can be assigned to multiple pools (often referred to as a global hot spare). Preferences for raidz[123] configs is to have 4-6 data disks in the vdev. Realistically speaking, you have several different (practical) configurations possible, in order of general performance:
(a) 6 x 2-way mirrors + 1 pool hot spare - 9TB usable
(b) 4 x 3-way mirrors + 1 pool hot spare - 6TB usable
(c) 1 6-disk raidz + 1 7-disk raidz - 16.5TB usable
(d) 2 6-disk raidz + 1 pool hot spare - 15TB usable
(e) 1 6-disk raidz2 + 1 7-disk raidz2 - 13.5TB usable
(f) 2 6-disk raidz2 + 1 pool hot spare - 12TB usable
(g) 1 6-disk raidz3 + 1 7-disk raidz3 - 10.5TB usable
(h) 1 13-disk raidz3 - 15TB usable
Given the size of your disks, resilvering is likely to have a significant time problem in any RAIDZ[123] configuration. That is, unless you are storing (almost exclusively) very large files, resilver time is going to be significant, and can potentially be radically higher than a mirrored config. The mirroring configs will out-perform raidz[123] on everything except large streaming write/reads, and even then, it's a toss-up. Overall, the (a), (d), and (f) configurations generally offer the best balance of redundancy, space, and performance. Here's the chances to survive disk failures (assuming hot spares are unable to be used; that is, all disk failures happen in a short period of time) - note that all three can always survive a single disk failure:
(a) 90% for 2, 73% for 3, 49% for 4, 25% for 5.
(d) 55% for 2, 27% for 3, 0% for 4 or more
(f) 100% for 2, 80% for 3, 56% for 4, 0% for 5.
Depending on your exact requirements, I'd go with (a) or (f) as the best choices - (a) if performance is more important, (f) if redundancy overrides performance.
-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
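For concreteness, a hedged sketch of how layout (a) above might be created (the pool name and disk device names are hypothetical): six two-way mirrors striped together, plus a global hot spare.

    zpool create tank \
        mirror c1t0d0 c1t1d0 \
        mirror c1t2d0 c1t3d0 \
        mirror c1t4d0 c1t5d0 \
        mirror c2t0d0 c2t1d0 \
        mirror c2t2d0 c2t3d0 \
        mirror c2t4d0 c2t5d0 \
        spare c2t6d0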
Re: [zfs-discuss] Proposition of a new zpool property.
On Mar 22, 2010, at 7:30 AM, Svein Skogen wrote: On 22.03.2010 13:54, Edward Ned Harvey wrote: IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in paralell as a result. You're right. I did that wrong. Sorry 'bout that. So either way, if there's a zfs property for scrub, that still doesn't prevent multiple scrubs from running simultaneously. So ... Presently there's no way to avoid the simultaneous scrubs either way, right? You have to home-cook scripts to detect which scrubs are running on which filesystems, and serialize the scrubs. With, or without the property. Don't get me wrong - I'm not discouraging the creation of the property. But if you want to avoid simul-scrub, you'd first have to create a mechanism for that, and then you could create the autoscrub. Which is exactly why I wanted it cooked in in the zfs code itself. zfs knows how many fs'es it's scrubbing. Nit: ZFS does not scrub file systems. ZFS scrubs pools. In most deployments I've done or seen there are very few pools, with many file systems. For appliances like NexentaStor or Oracle's Sun OpenStorage platforms, the default smallest unit of deployment is one disk. In other words, there is no case where multiple scrubs compete for the resources of a single disk because a single disk only participates in one pool. In general, resource management works when you are resource constrained. Hence, it is quite acceptable to implement concurrent scrubs. Bottom line: systems engineering is still required for optimal system operation. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Mar 19, 2010, at 1:28 PM, Richard Jahnel wrote: The way we do this here is:
zfs snapshot voln...@snapnow
# code to break on error and email not shown.
zfs send -i voln...@snapbefore voln...@snapnow | pigz -p4 -1 > file
# code to break on error and email not shown.
scp /dir/file u...@remote:/dir/file
# code to break on error and email not shown.
ssh u...@remote gzip -t /dir/file
# code to break on error and email not shown.
ssh u...@remote gunzip /dir/file | zfs receive volname
It works for me and it sends a minimum amount of data across the wire which is tested to minimize the chance of inflight issues. Except on Sundays when we do a full send. NB. deduped streams should further reduce the snapshot size. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 22.03.2010 18:10, Richard Elling wrote: On Mar 22, 2010, at 7:30 AM, Svein Skogen wrote: On 22.03.2010 13:54, Edward Ned Harvey wrote: IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in paralell as a result. You're right. I did that wrong. Sorry 'bout that. So either way, if there's a zfs property for scrub, that still doesn't prevent multiple scrubs from running simultaneously. So ... Presently there's no way to avoid the simultaneous scrubs either way, right? You have to home-cook scripts to detect which scrubs are running on which filesystems, and serialize the scrubs. With, or without the property. Don't get me wrong - I'm not discouraging the creation of the property. But if you want to avoid simul-scrub, you'd first have to create a mechanism for that, and then you could create the autoscrub. Which is exactly why I wanted it cooked in in the zfs code itself. zfs knows how many fs'es it's scrubbing. Nit: ZFS does not scrub file systems. ZFS scrubs pools. In most deployments I've done or seen there are very few pools, with many file systems. For appliances like NexentaStor or Oracle's Sun OpenStorage platforms, the default smallest unit of deployment is one disk. In other words, there is no case where multiple scrubs compete for the resources of a single disk because a single disk only participates in one pool. In general, resource management works when you are resource constrained. Hence, it is quite acceptable to implement concurrent scrubs. Bottom line: systems engineering is still required for optimal system operation. -- richard When you hook up a monstrosity like 96 disks (the limit of those supermicro 2.5" drive sas enclosures discussed on this list recently) to two 4-lane sas-controllers, the bottleneck is likely to be your controller, your pci-express-bus, or your memory bandwidth. You still want to be able to put some constraints on how much you're pushing the hardware. ;) //Svein -- Sending mail from a temporary set up workstation, as my primary W500 is off for service. PGP not installed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On Mar 22, 2010, at 10:36 AM, Svein Skogen wrote: On 22.03.2010 18:10, Richard Elling wrote: On Mar 22, 2010, at 7:30 AM, Svein Skogen wrote: On 22.03.2010 13:54, Edward Ned Harvey wrote: IIRC it's zpool scrub, and last time I checked, the zpool command exited (with status 0) as soon as it had started the scrub. Your command would start _ALL_ scrubs in paralell as a result. You're right. I did that wrong. Sorry 'bout that. So either way, if there's a zfs property for scrub, that still doesn't prevent multiple scrubs from running simultaneously. So ... Presently there's no way to avoid the simultaneous scrubs either way, right? You have to home-cook scripts to detect which scrubs are running on which filesystems, and serialize the scrubs. With, or without the property. Don't get me wrong - I'm not discouraging the creation of the property. But if you want to avoid simul-scrub, you'd first have to create a mechanism for that, and then you could create the autoscrub. Which is exactly why I wanted it cooked in in the zfs code itself. zfs knows how many fs'es it's scrubbing. Nit: ZFS does not scrub file systems. ZFS scrubs pools. In most deployments I've done or seen there are very few pools, with many file systems. For appliances like NexentaStor or Oracle's Sun OpenStorage platforms, the default smallest unit of deployment is one disk. In other words, there is no case where multiple scrubs compete for the resources of a single disk because a single disk only participates in one pool. In general, resource management works when you are resource constrained. Hence, it is quite acceptable to implement concurrent scrubs. Bottom line: systems engineering is still required for optimal system operation. -- richard When you hook up a monstrosity like 96 disks (the limit of those supermicro 2.5-drive sas enclosures discussed on this list recently) to two 4-lane sas-controllers, the bottleneck is likely to be your controller, your pci-express-bus, or your memory bandwidth. You still want to be able to put some constraints into how much your pushing the hardware. ;) Scrub tends to be a random workload dominated by IOPS, not bandwidth. But if you are so inclined to create an unbalanced system... Bottom line: systems engineering is still required for optimal system operation :-) -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On 03/22/10 11:02, Richard Elling wrote: Scrub tends to be a random workload dominated by IOPS, not bandwidth. you may want to look at this again post build 128; the addition of metadata prefetch to scrub/resilver in that build appears to have dramatically changed how it performs (largely for the better). - Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On Mar 22, 2010, at 11:33 AM, Bill Sommerfeld wrote: On 03/22/10 11:02, Richard Elling wrote: Scrub tends to be a random workload dominated by IOPS, not bandwidth. you may want to look at this again post build 128; the addition of metadata prefetch to scrub/resilver in that build appears to have dramatically changed how it performs (largely for the better). Yes, it is better. But still nowhere near platter speed. All it takes is one little seek... -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] snapshots as versioning tool
This may be a bit dimwitted since I don't really understand how snapshots work. I mean the part concerning COW (copy on write) and how it takes so little room. But here I'm not asking about that. It appears to me that the default snapshot setup shares some aspects of a vcs (version control system) tool. I wonder if any of you use it that way. Here is one thing I've considered but not done yet. When I do video projects or any projects for that matter, I sometimes want backups every 10 minutes or so, so as not to lose some piece of script that isn't finished or the like. Or with something like a flash project, you might want to make sure you will be able to recover a version from a while back. So doing the project on a zfs filesystem (maybe as an nfs or cifs mount) would offer a way to do that. I wondered if it would be possible to run a snapshot system independent of the default one. I mean so a default setup of auto snapshotting would continue unaffected. I'm thinking of scripting something like 10 minute snapshots during the time I'm working on a project, then just turn it off when not working on it. When the project is done... zap all those snapshots. Am I missing something basic that makes this a poor use of zfs? Oh, something I meant to ask... is there some standard way to tell, before calling for a snapshot, if the directory structure has changed at all, other than aging I mean. Is there something better than running `diff -r [...]' between the existing structure and the last snapshot? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
This is totally doable, and a reasonable use of zfs snapshots - we do some similar things. You can easily determine if the snapshot has changed by checking the output of zfs list for the snapshot. --M -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Harry Putnam Sent: Monday, March 22, 2010 1:34 PM To: zfs-discuss@opensolaris.org Subject: [zfs-discuss] snapshots as versioning tool This may be a bit dimwitted since I don't really understand how snapshots work. I mean the part concerning COW (copy on right) and how it takes so little room. But here I'm not asking about that. It appears to me that the default snapshot setup shares some aspects of a vcs (version control system) tool. I wonder if any of you use it that way. Here is one thing I've considered but not done yet. When I do video projects or any projects for that matter. I sometimes want backups every 10 minutes or so, so as not to loose some piece of script that isn't finished or the like. Or with something like a flash project, you might want to make sure you will be able to recover a version from a while back. So doing the project on zfs filesystem (maybe as nfs or cifs mount) would offer a way to do that. I wondered if it would be possible to run a snapshot system independent of the default one. I mean so a default setup of auto snapshotting would continue unaffected. I'm thinking of scripting something like 10 minute snapshots during the time I'm working on a project, then just turn it off when not working on it. When project is done... zap all those snapshots. Am I missing something basic that make this a poor use of zfs? Oh, something I meant to ask... is there some standard way to tell before calling for a snapshot, if the directory structure has changed at all, other than aging I mean. Is there something better than running `diff -r [...]' between existing structure and last snapshot. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
On 03/23/10 09:34 AM, Harry Putnam wrote: This may be a bit dimwitted since I don't really understand how snapshots work. I mean the part concerning COW (copy on right) and how it takes so little room. But here I'm not asking about that. It appears to me that the default snapshot setup shares some aspects of a vcs (version control system) tool. It does, but on a filesystem rather than file level. Or to put it another way, less fine grained than a traditional VCS. I wonder if any of you use it that way. I do for things I don't change very often, such as system configuration files. I always snapshot my root pool before making any changes to files under /etc for example. Here is one thing I've considered but not done yet. When I do video projects or any projects for that matter. I sometimes want backups every 10 minutes or so, so as not to loose some piece of script that isn't finished or the like. Or with something like a flash project, you might want to make sure you will be able to recover a version from a while back. So doing the project on zfs filesystem (maybe as nfs or cifs mount) would offer a way to do that. I wondered if it would be possible to run a snapshot system independent of the default one. I mean so a default setup of auto snapshotting would continue unaffected. You can, but I think you would be better off using a traditional VCS (such as Subversion) that works well with binary files. If you have to work in windows, this is your best option (Tortoise SVN is the only reason I know to use windows!). I'm thinking of scripting something like 10 minute snapshots during the time I'm working on a project, then just turn it off when not working on it. When project is done... zap all those snapshots. Am I missing something basic that make this a poor use of zfs? You don't really get to track versions of a file. I find I commit very frequently (as soon as a new test passes) and use SVN as an undo if I mess up a change. Tying commits to changes is different from tying them to time. Oh, something I meant to ask... is there some standard way to tell before calling for a snapshot, if the directory structure has changed at all, other than aging I mean. Is there something better than running `diff -r [...]' between existing structure and last snapshot. Not really; ZFS diff is in the works, but not here yet. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
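A one-line illustration of the snapshot-before-editing habit mentioned above (the boot environment dataset name is hypothetical):

    zfs snapshot rpool/ROOT/opensolaris@before-etc-change
    # if the edit goes wrong, roll straight back:
    zfs rollback rpool/ROOT/opensolaris@before-etc-change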
Re: [zfs-discuss] snapshots as versioning tool
This may be a bit dimwitted since I don't really understand how snapshots work. I mean the part concerning COW (copy on right) and how it takes so little room. COW and snapshots are very simple to explain. Suppose you're chugging along using your filesystem, and then one moment, you tell the filesystem to freeze. Well, suppose a minute later you tell the FS to overwrite some block that's in use already. Instead of overwriting the actual block on disk, the FS will overwrite some unused space, and report back to you that the operation is completed. So now there's a copy of the block as it was at the moment of the freeze, and there's another copy of the block as it looks later in time. The FS only needs to freeze the FS tables, to remember which blocks belonged to which files in each of the snapshots. Hence, Copy On Write. That being said, it's an inaccurate description to say COW takes so little room. If anything, it takes more room than a filesystem which can't do COW, because the FS must not delete any of the old blocks belonging to any of the old snapshots of the filesystem. The more frequently you take snapshots, and the older your oldest snap is, and the more volatile your data is, changing large sequences of blocks rapidly ... The more disk space will be consumed. No block is free, as long as any one of the snaps references it. But suppose you have n snapshots. In a non-COW filesystem, you would have n-times the data. While in COW, you still have 1x the total used data size, plus the byte differentials necessary to resurrect any/all of the old snapshots. I'm thinking of scripting something like 10 minute snapshots during the time I'm working on a project, then just turn it off when not working on it. When project is done... zap all those snapshots. Yup, that's absolutely easy. Just set up a cron job to snap every 10 minutes, using a unique string in the snapname, like @myprojectsnap ... and when you're all done, you zfs destroy anything which matches @myprojectsnap ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
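In practice that can be a crontab entry plus a cleanup loop; the dataset name and the @myprojectsnap tag below are illustrative, following the naming suggested above.

    # crontab entry: snapshot the project filesystem every 10 minutes
    0,10,20,30,40,50 * * * * /usr/sbin/zfs snapshot z3/projects@myprojectsnap-`date +\%Y\%m\%d\%H\%M`

    # when the project is done, destroy everything carrying the tag
    for snap in `zfs list -H -o name -t snapshot | grep '@myprojectsnap'` ; do
        zfs destroy $snap
    done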
Re: [zfs-discuss] Proposition of a new zpool property.
In other words, there is no case where multiple scrubs compete for the resources of a single disk because a single disk only participates in one pool. Excellent point. However, the problem scenario was described as SAN. I can easily imagine a scenario where some SAN administrator created a pool of raid 5+1 or raid 0+1, and the pool is divided up into 3 LUNs which are presented to 3 different machines. Hence, when Machine A is hammering on the disks, it could also affect Machine B or C. The catch that I keep repeating, is that even a zfs property couldn't possibly solve that problem. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
Matt Cowger mcow...@salesforce.com writes: This is totally doable, and a reasonable use of zfs snapshots - we do some similar things. Good, thanks for the input. You can easily determine if the snapshot has changed by checking the output of zfs list for the snapshot. Do you mean to just grep it out of the output of zfs list -t snapshot Or is there some finer grained way to get it? (I mean barring feeding the exact snapshot name to zfs list [ which would mean finding the name first, of course] ) Here, it appears adding anything more to that command line causes it to fail. zfs list -t snapshot z3/projects cannot open 'z3/projects': operation not applicable to datasets of this type An example command line from your usage might be handy. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
zfs list | grep '@'
zpool/f...@1154758          324G   -  461G  -
zpool/f...@1208482          6.94G  -  338G  -
zpool/f...@daily.netbackup  1.07G  -  344G  -
zpool/f...@1154758          1.77G  -  242G  -
zpool/f...@1208482          2.26G  -  261G  -
zpool/f...@daily.netbackup  323M   -  266G  -
First column there shows the size of the snapshot (e.g. how much has changed). -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Harry Putnam Sent: Monday, March 22, 2010 2:23 PM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] snapshots as versioning tool Matt Cowger mcow...@salesforce.com writes: This is totally doable, and a reasonable use of zfs snapshots - we do some similar things. Good, thanks for the input. You can easily determine if the snapshot has changed by checking the output of zfs list for the snapshot. Do you mean to just grep it out of the output of zfs list -t snapshot Or is there some finer grained way to get it? (I mean barring feeding the exact snapshot name to zfs list [ which would mean finding the name first, of course] ) Here, it appears adding anything more to that command line causes it to fail. zfs list -t snapshot z3/projects cannot open 'z3/projects': operation not applicable to datasets of this type An example command line from your usage might be handy. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
On Mon, Mar 22, 2010 at 1:58 PM, Ian Collins i...@ianshome.com wrote: On 03/23/10 09:34 AM, Harry Putnam wrote: Oh, something I meant to ask... is there some standard way to tell before calling for a snapshot, if the directory structure has changed at all, other than aging I mean. Is there something better than running `diff -r [...]' between existing structure and last snapshot. Not really, there is ZFS diff is in the woks, but not here yet. Someone pointed out that you can use bart, but that also scans the directories. It might do what you want, but it doesn't work at the zpool / zfs level, just at the file level layer. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Proposition of a new zpool property.
On Mon, Mar 22, 2010 at 12:21 PM, Richard Elling richard.ell...@gmail.com wrote: Yes, it is better. But still nowhere near platter speed. All it takes is one little seek... True, dat. I find that scrubs start very slow (< 20MB/s) with the disks at near-100% utilization. Towards the end of the scrub, speeds are up in the 250+ MB/s range. It's on very slow disk (8x WD Green), so the seek penalty is high. I suspect this is because data and metadata have been scattered across the disk due to churn from snapshots, etc. I've never noticed a slowdown in regular use though, in fact local disk on my clients tends to be the bottleneck when copying files. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Mon, Mar 22, 2010 at 10:26 AM, Richard Elling richard.ell...@gmail.com wrote: NB. deduped streams should further reduce the snapshot size. I haven't seen a lot of discussion on the list regarding send dedup, but I understand it'll use the DDT if you have dedup enabled on your dataset. What's the process and penalty for using it on a dataset that is not already deduped? Does it build a DDT for just the data in the send? -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR 6880994 and pkg fix
On 03/21/10 03:24 PM, Richard Elling wrote: I feel confident we are not seeing a b0rken drive here. But something is clearly amiss and we cannot rule out the processor, memory, or controller. Absolutely no question of that, otherwise this list would be flooded :-). However, the purpose of the post wasn't really to diagnose the hardware but to ask about the behavior of ZFS under certain error conditions. Frank reports that he sees this on the same file, /lib/libdlpi.so.1, so I'll go out on a limb and speculate that there is something in the bit pattern for that file that intermittently triggers a bit flip on this system. I'll also speculate that this error will not be reproducible on another system. Hopefully not, but you never know :-). However, this instance is different. The example you quote shows both expected and actual checksums to be the same. This time the expected and actual checksums are different and fmdump isn't flagging any bad_ranges or set-bits (the behavior you observed is still happening, but orthogonal to this instance at different times and not always on this file). Since the file itself is OK, and the expected checksums are always the same, neither the file nor the metadata appear to be corrupted, so it appears that both are making it into memory without error. It would seem therefore that it is the actual checksum calculation that is failing. But, only at boot time, the calculated (bad) checksums differ (out of 16, 10, 3, and 3 are the same [1]) so it's not consistent. At this point it would seem to be cpu or memory, but why only at boot? IMO it's an old and feeble power supply under strain pushing cpu or memory to a margin not seen during normal operation, which could be why diagnostics never see anything amiss (and the importance of a good power supply). FWIW the machine passed everything vts could throw at it for a couple of days. Anyone got any suggestions for more targeted diagnostics? There were several questions embedded in the original post, and I'm not sure any of them have really been answered:
o Why is the file flagged by ZFS as fatally corrupted still accessible? [is this new behavior from b111b vs b125?]
o What possible mechanism could there be for the /calculated/ checksums of /four/ copies of just one specific file to be bad and no others?
o Why did this only happen at boot to just this one file which also is peculiarly subject to the bitflips you observed, also mostly at boot (sometimes at scrub)?
I like the feeble power supply answer, but why just this one file? Bizarre...
# zpool get failmode rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  failmode  wait   default
This machine is extremely memory limited, so I suspect that libdlpi.so.1 is not in a cache. Certainly, a brand new copy wouldn't be, and there's no problem writing and (much later) reading the new copy (or the old one, for that matter). It remains to be seen if the brand new copy gets clobbered at boot (the machine, for all its faults, remains busily up and operational for months at a time). Maybe I should schedule a reboot out of curiosity :-). This sort of specific error analysis is possible after b125. See CR6867188 for more details. Wasn't this in b125? IIRC we upgraded to b125 for this very reason. There certainly seems to be an overwhelming amount of data in the various logs! Cheers -- Frank [1] This could be (3+1) * 4 where in one instance all 3+1 happen to be the same. Does ZFS really read all 4 copies 4 times (by fmdump timestamp, 8 within 1uS, 40mS later, another 8, again within 1uS)?
Not sure what the fmdump timestamps mean, so it's hard to find any pattern. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On 03/22/10 05:04 PM, Brandon High wrote: On Mon, Mar 22, 2010 at 10:26 AM, Richard Elling richard.ell...@gmail.com wrote: NB. deduped streams should further reduce the snapshot size. I haven't seen a lot of discussion on the list regarding send dedup, but I understand it'll use the DDT if you have dedup enabled on your dataset. The send code (which is user-level) builds its own DDT no matter what, but it will use existing checksums if on-disk dedup is already in effect. What's the process and penalty for using it on a dataset that is not already deduped? The penalty is the cost of doing the checksums. Does it build a DDT for just the data in the send? Yes, currently limited to 20% of physical memory size. Lori -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
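In command form, the deduplicated stream referred to above is just the -D flag on zfs send (the pool, dataset, and host names here are hypothetical); duplicate blocks are emitted only once, at the cost of the user-level DDT described above.

    zfs send -D -i tank/data@snap1 tank/data@snap2 | \
        ssh backuphost zfs receive tank/data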
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Thu, Mar 18, 2010 at 10:38:00PM -0700, Rob wrote: Can a ZFS send stream become corrupt when piped between two hosts across a WAN link using 'ssh'? No. SSHv2 uses HMAC-MD5 and/or HMAC-SHA-1, depending on what gets negotiated, for integrity protection. The chances of random on-the-wire corruption going undetected by link-layer CRCs, TCP's CRC and SSHv2's MACs are infinitesimally small. I suspect the chances of local bit flips due to cosmic rays and what not are higher. A bigger problem is that SSHv2 connections do not survive corruption on the wire. That is, if corruption is detected then the connection gets aborted. If you were zfs send'ing 1TB across a long, narrow link and corruption hit the wire while sending the last block, you'd have to re-send the whole thing (but even then such corruption would still have to get past link-layer and TCP checksums -- I've seen it happen, so it is possible, but it is also unlikely). Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshots as versioning tool
You can easily determine if the snapshot has changed by checking the output of zfs list for the snapshot. Do you mean to just grep it out of the output of zfs list -t snapshot I think the point is: You can easily tell how many MB changed in a snapshot, and therefore you can easily tell yes the snapshot changed. But unfortunately, no you can't easily tell which files changed. Yet. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
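A tiny sketch of that yes/no check (the dataset and snapshot names are hypothetical): a snapshot's used property grows as the live filesystem diverges from it, so a nonzero value for the most recent snapshot is a rough signal that something has changed since it was taken.

    # rough heuristic: how much space is the latest snapshot holding on to?
    used=`zfs get -H -p -o value used z3/projects@myprojectsnap-201003221300`
    if [ "$used" -gt 0 ] ; then
        echo "data has changed since that snapshot was taken"
    fi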
[zfs-discuss] zfs send/receive and file system properties
I am trying to coordinate properties and data between 2 file servers. On file server 1 I have: zfs get all zfs52/export/os/sles10sp2 NAME PROPERTY VALUE SOURCE zfs52/export/os/sles10sp2 type filesystem - zfs52/export/os/sles10sp2 creation Mon Mar 22 15:28 2010 - zfs52/export/os/sles10sp2 used 662M - zfs52/export/os/sles10sp2 available 49.4G - zfs52/export/os/sles10sp2 referenced 661M - zfs52/export/os/sles10sp2 compressratio 2.88x - zfs52/export/os/sles10sp2 mounted yes - zfs52/export/os/sles10sp2 quota 50G local zfs52/export/os/sles10sp2 mountpoint /export/os/sles10sp2 local zfs52/export/os/sles10sp2 sharenfs r...@192.168.0.0/16,ro...@192.168.0.0/24 inherited from zfs52/export/os zfs52/export/os/sles10sp2 checksum on default zfs52/export/os/sles10sp2 compression gzip local ... I use zfs send zfs52/export/os/sles10...@hpffs52_201003221747 | ssh -c blowfish hpffs51 zfs receive -d zfs51 to copy it to another system, expecting the same mountpoint, sharenfs, quota and compression properties. On the other system I see: zfs get all zfs51/export/os/sles10sp2 NAME PROPERTY VALUE SOURCE zfs51/export/os/sles10sp2 type filesystem - zfs51/export/os/sles10sp2 creation Mon Mar 22 20:00 2010 - zfs51/export/os/sles10sp2 used 1.76G - zfs51/export/os/sles10sp2 available 10.5T - zfs51/export/os/sles10sp2 referenced 1.76G - zfs51/export/os/sles10sp2 compressratio 1.00x - zfs51/export/os/sles10sp2 mounted yes - zfs51/export/os/sles10sp2 quota none default zfs51/export/os/sles10sp2 mountpoint /export/os/sles10sp2 inherited from zfs51/export zfs51/export/os/sles10sp2 sharenfs r...@192.168.0.0/16:@172.16.20.0/24:hpffs24-bkup:hpffs01-bkup,ro...@192.168.0.0/24:@172.16.20.0/24:hpffs24-bkup:hpffs01-bkup inherited from zfs51/export zfs51/export/os/sles10sp2 checksum on default zfs51/export/os/sles10sp2 compression off default The sharenfs and mountpoints came across fine, but what happened to compression and quota? Is there an option I need? Len Zaifman Systems Manager, High Performance Systems The Centre for Computational Biology The Hospital for Sick Children 555 University Ave. Toronto, Ont M5G 1X8 tel: 416-813-5513 email: leona...@sickkids.ca This e-mail may contain confidential, personal and/or health information (information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
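A plain send stream does not carry locally set properties, which would explain the output above; note that the SOURCE column shows the received sharenfs and mountpoint were simply inherited from zfs51/export rather than transmitted. Two sketches that might help (the snapshot name below is a placeholder, and -R also drags along any descendant filesystems and their snapshots):

# (a) a replication stream carries local properties such as compression and quota
#     (@somesnap is a placeholder for the real snapshot name)
zfs send -R zfs52/export/os/sles10sp2@somesnap | ssh -c blowfish hpffs51 zfs receive -d zfs51

# (b) or simply re-set the properties by hand after an ordinary send/receive
zfs set compression=gzip zfs51/export/os/sles10sp2
zfs set quota=50G zfs51/export/os/sles10sp2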
[zfs-discuss] LSISAS2004 support
All, I did some digging and I was under the impression that the mr_sas driver was to support the LSISAS2004 HBA controller from LSI. I did add the PCI ID to the driver alias for mr_sas, but the driver still showed up as unattached (see below). Did I miss something, or was my assumption that this controller was supported in the dev branch flawed? I'm running: SunOS 5.11 snv_134 i86pc i386 i86pc Solaris. Thanks in advance for any pointers. node name: pci1000,3010 Vendor: LSI Logic / Symbios Logic Device: SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] Sub-Vendor: LSI Logic / Symbios Logic binding name: pciex1000,70 devfs path: /p...@0,0/pci8086,3...@3/pci1000,3010 pci path: 3,0,0 compatible name: (pciex1000,70.1000.3010.2)(pciex1000,70.1000.3010)(pciex1000,70.2)(pciex1000,70)(pciexclass,010700)(pciexclass,0107)(pci1000,70.1000.3010.2)(pci1000,70.1000.3010)(pci1000,3010)(pci1000,70.2)(pci1000,70)(pciclass,010700)(pciclass,0107) driver name: mr_sas driver state: Detached assigned-addresses: 81030010 reg: 3 compatible: pciex1000,70.1000.3010.2 model: Serial Attached SCSI Controller power-consumption: 1 devsel-speed: 0 interrupts: 1 subsystem-vendor-id: 1000 subsystem-id: 3010 unit-address: 0 class-code: 10700 revision-id: 2 vendor-id: 1000 device-id: 70 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
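For what it's worth, a hedged sketch of the usual alias-then-reattach sequence, using the alias string from the compatible names above. Also, as far as I know the Fusion-MPT SAS-2 parts are normally handled by mpt_sas rather than mr_sas (which targets the MegaRAID SAS-2 family), so it may be worth trying the alias against that driver instead:

# add the PCI ID as an alias for the driver, then rebuild the device nodes for it
update_drv -a -i '"pciex1000,70"' mr_sas
devfsadm -i mr_sas
# check whether the node is now bound and attached
prtconf -D | grep -i sas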
Re: [zfs-discuss] CR 6880994 and pkg fix
On Mar 22, 2010, at 4:21 PM, Frank Middleton wrote: On 03/21/10 03:24 PM, Richard Elling wrote: I feel confident we are not seeing a b0rken drive here. But something is clearly amiss and we cannot rule out the processor, memory, or controller. Absolutely no question of that, otherwise this list would be flooded :-). However, the purpose of the post wasn't really to diagnose the hardware but to ask about the behavior of ZFS under certain error conditions. Frank reports that he sees this on the same file, /lib/libdlpi.so.1, so I'll go out on a limb and speculate that there is something in the bit pattern for that file that intermittently triggers a bit flip on this system. I'll also speculate that this error will not be reproducible on another system. Hopefully not, but you never know :-). However, this instance is different. The example you quote shows both expected and actual checksums to be the same. Look again, the checksums are different. This time the expected and actual checksums are different and fmdump isn't flagging any bad_ranges or set-bits (the behavior you observed is still happening, but orthogonal to this instance at different times and not always on this file). don't forget the -V flag :-) Since the file itself is OK, and the expected checksums are always the same, neither the file nor the metadata appear to be corrupted, so it appears that both are making it into memory without error. It would seem therefore that it is the actual checksum calculation that is failing. But only at boot time, and the calculated (bad) checksums differ (out of 16, 10, 3, and 3 are the same [1]), so it's not consistent. At this point it would seem to be CPU or memory, but why only at boot? IMO it's an old and feeble power supply under strain pushing the CPU or memory to a margin not seen during normal operation, which could be why diagnostics never see anything amiss (hence the importance of a good power supply). FWIW the machine passed everything vts could throw at it for a couple of days. Anyone got any suggestions for more targeted diagnostics? There were several questions embedded in the original post, and I'm not sure any of them have really been answered: o Why is the file flagged by ZFS as fatally corrupted still accessible? [Is this new behavior from b111b vs b125?] o What possible mechanism could there be for the /calculated/ checksums of /four/ copies of just one specific file to be bad and no others? Broken CPU, HBA, bus, or memory. o Why did this only happen at boot, to just this one file, which is also peculiarly subject to the bit flips you observed, also mostly at boot (sometimes at scrub)? I like the feeble power supply answer, but why just this one file? Bizarre... Broken CPU, HBA, bus, memory, or power supply. # zpool get failmode rpool NAME PROPERTY VALUE SOURCE rpool failmode wait default This machine is extremely memory limited, so I suspect that libdlpi.so.1 is not in a cache. Certainly, a brand new copy wouldn't be, and there's no problem writing and (much later) reading the new copy (or the old one, for that matter). It remains to be seen if the brand new copy gets clobbered at boot (the machine, for all its faults, remains busily up and operational for months at a time). Maybe I should schedule a reboot out of curiosity :-). This sort of specific error analysis is possible after b125. See CR6867188 for more details. Wasn't this in b125? IIRC we upgraded to b125 for this very reason. There certainly seems to be an overwhelming amount of data in the various logs! 
Cheers -- Frank [1] This could be (3+1) * 4 where in one instance all 3+1 happen to be the same. Does ZFS really read all 4 copies 4 times (by fmdump timestamp, 8 within 1µs, then 40ms later another 8, again within 1µs)? Not sure what the fmdump timestamps mean, so it's hard to find any pattern. Transient failures are some of the most difficult to track down. Not all transient failures are random. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
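To make the -V suggestion concrete (assuming the events of interest are the ZFS checksum ereports): the full payload, including the expected/actual checksums and any bad_ranges or bit counts, can be pulled straight out of the error log:

# -e reads the FMA error log, -V prints the whole ereport payload,
# -c restricts the output to ZFS checksum events
fmdump -eV -c ereport.fs.zfs.checksum | less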
[zfs-discuss] pool use from network poor performance
Hi, I now have two pools: rpool, a 2-way mirror (PATA), and data, a 4-way raidz2 (SATA). If I access the data pool over the network (SMB, NFS, FTP, SFTP, etc.) I get only about 200 KB/s at most, compared to rpool, which gives XX MB/s to and from the network, so only the data pool is slow. Any ideas what the reason might be and how to track it down? Locally the data pool works reasonably fast for me. # date ; mkfile 1G testfile ; date Tuesday, March 23, 2010 07:52:19 AM EET Tuesday, March 23, 2010 07:52:36 AM EET Some information about the system: # cat /etc/release OpenSolaris Development snv_134 X86 Copyright 2010 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 01 March 2010 # isainfo -v 64-bit amd64 applications ahf sse3 sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications ahf sse3 sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
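A few starting points for narrowing this down, sketched rather than definitive (the test file path below assumes the mkfile above was run in the data pool's default mountpoint):

# per-vdev throughput while a client copies over smb/nfs: is the pool itself busy or idle?
zpool iostat -v data 5
# did the NIC negotiate the expected speed and duplex?
dladm show-phys
# local sequential read as a baseline for the pool (path assumed, see above)
dd if=/data/testfile of=/dev/null bs=1M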