Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Responses inline below... On Sat, Mar 20, 2010 at 00:57, Edward Ned Harvey wrote: > > 1. NDMP for putting "zfs send" streams on tape over the network. So > > Tell me if I missed something here. I don't think I did. I think this > sounds like crazy talk. > > I used NDMP up till November, when we replaced our NetApp with a Solaris > Sun > box. In NDMP, to choose the source files, we had the ability to browse the > fileserver, select files, and specify file matching patterns. My point is: > NDMP is file based. It doesn't allow you to spawn a process and backup a > data stream. > > Unless I missed something. Which I doubt. ;-) > > You clearly know more about NDMP than I do. I'm still learning. I forgot that you previously mentioned the file-based nature of NDMP. I'm still wondering about that in the longer term, but yeah, this is my mistake. I'll end up doing some deeper diving on this topic, I can see. But this was just me seeking clarity. Maybe Fishworks appliances would benefit from the presence of NDMP but if you're using a standard server running (Open)Solaris, it looks like a non-starter. > > > To Ed Harvey: > > > > Some questions about your use of NetBackup on your secondary server: > > > > 1. Do you successfully backup ZVOLs? We know NetBackup should be able > > to capture datasets (ZFS file systems) using straight POSIX semantics. > > I wonder if I'm confused by that question. "backup zvols" to me, would > imply something at a lower level than the filesystem. No, we're not doing > that. We just specify "backup the following directory and all of its > subdirectories." Just like any other typical backup tool. > > The reason we bought NetBackup is because it intelligently supports all the > permissions, ACL's, weird (non-file) file types, and so on. And it > officially supports ZFS, and you can pay for an enterprise support > contract. > > Basically, I consider the purchase cost of NetBackup to be insurance. > Although I never plan to actually use it for anything, because all our > bases > are covered by "zfs send" to hard disks and tapes. I actually trust the > "zfs send" solution more, but I can't claim that I, or anything I've ever > done, is 100% infallible. So I need a commercial solution too, just so I > can point my finger somewhere if needed. > Yeah, I get all the reasons you state for using NetBackup. Makes total sense. And I asked this question to be clear about support for backing up ZVOLs outside if ZFS-specific tools e.g. zfs(1M). I didn't actually think NetBackup could capture ZVOLs, for the reasons you listed, but I wanted to be absolutely clear. Asking the wrong questions is the leading cause of wrong answers, as a former boss of mine used to say. > > > > 2. What version of NetBackup are you using? > > I could look it up, but I'd have to VPN in and open up a console, etc etc. > We bought it in November, so it's whatever was current 4-5 months ago. > > Ok. Thanks. > > > 3. You simply run the NetBackup agent locally on the (Open)Solaris > > server? > > Yup. We're doing no rocket science with it. Ours is the absolute most > basic NetBackup setup you could possibly have. We're not using 90% of the > features of NetBackup. It's installed on a Solaris 10 server, with locally > attached tape library, and it does backups directly from local disk to > local > tape. > > This is an advantage of Solaris being a 1st class citizen in the NetBackup world. For a Unified Storage appliance, however, NDMP for file level backup may be a reasonable choice (as Darren postulated earlier). 
But if you just buy a server and install Solaris, then the NetBackup Solaris agent is the easiest route, as you've shown. Thanks again, Ed, for your time and generosity. And thank you to all contributors to this thread for indulging my curiosity. -- "You can choose your friends, you can choose the deals." - Equity Private "If Linux is faster, it's a Solaris bug." - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
> A pool with a 4-wide raidz2 is a completely nonsensical idea. It has > the same amount of accessible storage as two striped mirrors. And would > be slower in terms of IOPS, and be harder to upgrade in the future > (you'd need to keep adding four drives for every expansion with raidz2 > - with mirrors you only need to add another two drives to the pool). > > Just my $0.02 Here's my $0.04: Suppose you had 4 disks, configured as 2 mirrors. And you want to expand by adding another mirror. No problem. Suppose you had 4 disks, configured as raidz2. And you want to expand by adding a mirror. No problem. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
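For anyone following along at home, the expansion difference being argued about looks roughly like this; the pool and disk names below are invented for illustration, not taken from the thread:

# pool of mirrors: grow it two disks at a time
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
zpool add tank mirror c1t4d0 c1t5d0

# pool with one 4-disk raidz2 vdev
zpool create tank2 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0

# option 1: keep the layout uniform by adding another 4-disk raidz2
zpool add tank2 raidz2 c2t4d0 c2t5d0 c2t6d0 c2t7d0

# option 2 (instead): as the post above notes, zpool will also accept a
# mirror vdev in a raidz2 pool -- I believe it just warns about the
# mismatched replication level unless you force it
zpool add -f tank2 mirror c2t4d0 c2t5d0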
Re: [zfs-discuss] Q : recommendations for zpool configuration
> I have noted there is now raidz2 and been thinking which would be > better. > A pool with 2 mirrors or one pool with 4 disks raidz2 If you use raidz2, made of 4 disks, you will have usable capacity of 2 disks, and you can tolerate any 2 disks failing. If you use 2 mirrors, you will have a total of 4 disks and usable capacity of 2 disks. Your redundancy is not quite as good as above ... You could survive a failed disk in the first mirror, and a failed disk in the second mirror, but you could not survive two failed disks that are in the same mirror. If you use raidz2, your reliability might be a little bit higher. If you use 2 mirrors, your performance will certainly be higher for random IO operations. So you must choose what you care about more: Performance or reliability. Both ways are good ways. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
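To make the two 4-disk layouts concrete, a rough sketch; the pool name and disk names are invented, not from the original posts:

# the same 4 disks as two mirrors striped together (better random IOPS)
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

# or as a single 4-disk raidz2 (survives any two disk failures)
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0

Both give roughly two disks' worth of usable space; the difference is which two failures you can ride out and how the pool behaves under random I/O.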
Re: [zfs-discuss] ZFS/OSOL/Firewire...
> It would appear that the bus bandwidth is limited to about 10MB/sec > (~80Mbps) which is well below the theoretical 400Mbps that 1394 is > supposed to be able to handle. I know that these two disks can go > significantly higher since I was seeing 30MB/sec when they were used on > Macs previously in the same daisy-chain configuration. I have not done 1394 in solaris or opensolaris. But I have used it in windows, mac, and Linux. Many times for each one. I never have even the remotest problem with it in any of these other platforms. I consider it more universally reliable, even than USB, because occasionally I see a bad USB driver on some boot CD or something, which can only drive USB around 11Mbit. Again, I've never had anything but decent performance out of 1394. Generally speaking, I use 1394 on: Dell laptops Lenovo laptops Apple laptops Apple XServe HP laptops ... and maybe some dell servers... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> 1. NDMP for putting "zfs send" streams on tape over the network. So Tell me if I missed something here. I don't think I did. I think this sounds like crazy talk. I used NDMP up till November, when we replaced our NetApp with a Solaris Sun box. In NDMP, to choose the source files, we had the ability to browse the fileserver, select files, and specify file matching patterns. My point is: NDMP is file based. It doesn't allow you to spawn a process and backup a data stream. Unless I missed something. Which I doubt. ;-) > To Ed Harvey: > > Some questions about your use of NetBackup on your secondary server: > > 1. Do you successfully backup ZVOLs? We know NetBackup should be able > to capture datasets (ZFS file systems) using straight POSIX semantics. I wonder if I'm confused by that question. "backup zvols" to me, would imply something at a lower level than the filesystem. No, we're not doing that. We just specify "backup the following directory and all of its subdirectories." Just like any other typical backup tool. The reason we bought NetBackup is because it intelligently supports all the permissions, ACL's, weird (non-file) file types, and so on. And it officially supports ZFS, and you can pay for an enterprise support contract. Basically, I consider the purchase cost of NetBackup to be insurance. Although I never plan to actually use it for anything, because all our bases are covered by "zfs send" to hard disks and tapes. I actually trust the "zfs send" solution more, but I can't claim that I, or anything I've ever done, is 100% infallible. So I need a commercial solution too, just so I can point my finger somewhere if needed. > 2. What version of NetBackup are you using? I could look it up, but I'd have to VPN in and open up a console, etc etc. We bought it in November, so it's whatever was current 4-5 months ago. > 3. You simply run the NetBackup agent locally on the (Open)Solaris > server? Yup. We're doing no rocket science with it. Ours is the absolute most basic NetBackup setup you could possibly have. We're not using 90% of the features of NetBackup. It's installed on a Solaris 10 server, with locally attached tape library, and it does backups directly from local disk to local tape. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> > I'll say it again: neither 'zfs send' or (s)tar is an enterprise (or > > even home) backup system on their own one or both can be components > of > > the full solution. I would be pretty comfortable with a solution thusly designed: #1 A small number of external disks, "zfs send" onto the disks and rotate offsite. Then, you're satisfying the ability to restore individual files, but you're not satisfying the archivability, longevity of tapes. #2 Also, "zfs send" onto tapes. So if ever you needed something older than your removable disks, it's someplace reliable, just not readily accessible if you only want a subset of files. > I'm in the fortunate position of having my backups less than the size > of a > large single drive; so I'm rotating three backup drives, and intend to It's of course convenient if your backup fits entirely inside a single removable disk, but that's not a requirement. You could always use removable stripesets, or raidz, or whatever you wanted. For example, you could build a raidz removable volume out of 5 removable disks if you wanted. Just be sure you attach all 5 disks before you "zpool import" ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
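A rough sketch of what #1 and #2 could look like in practice; the pool name, snapshot name, and device paths below are placeholders rather than anything from the thread:

# #1: replicate the latest snapshot onto a pool built on removable disk(s)
zfs snapshot -r tank@offsite-2010-03-20
zfs send -R tank@offsite-2010-03-20 | zfs receive -Fd offsite1
zpool export offsite1    # now detach the drive(s) and rotate them offsite

# #2: the same stream straight onto tape via the non-rewinding device
zfs send -R tank@offsite-2010-03-20 | dd of=/dev/rmt/0n obs=1048576

On later runs you would normally send an incremental (-i or -I) against the previous snapshot instead of the full stream.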
[zfs-discuss] ZFS+CIFS: Volume Shadow Services, or Simple Symlink?
> > ZFS+CIFS even provides > > Windows Volume Shadow Services so that Windows users can do this on > > their own. > > I'll need to look into that, when I get a moment. Not familiar with > Windows Volume Shadow Services, but having people at home able to do > this > directly seems useful. Even in a fully supported, all-MS environment, I've found the support for "Previous Versions" is spotty and sort of unreliable at best. Not to mention, I think the user interface is simply non-intuitive. As an alternative, here's what I do: ln -s .zfs/snapshot snapshots Voila. All Windows or Mac or Linux or whatever users are able to easily access snapshots. It's worth noting that in the default config of zfs-auto-snapshot, the snaps are created with non-CIFS-compatible characters in the filename (the ":" colon character in the time). So I also make it a habit during installation to modify the zfs-auto-snapshot scripts and substitute that character. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
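To spell that out: the symlink is created at the top of the shared filesystem, and the colon fix is just a naming change. The dataset path below is an example, and the variable name shown for the auto-snapshot tweak is purely illustrative (the actual method script differs between releases):

# at the top of each filesystem shared over CIFS
cd /export/home
ln -s .zfs/snapshot snapshots

# default auto-snapshot names look like zfs-auto-snap:frequent-2010-03-19-12:15;
# the idea is to build the timestamp with a CIFS-safe separator instead, e.g.
DATE=`date +%F-%H.%M.%S`    # rather than %F-%H:%M:%S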
Re: [zfs-discuss] ZFS/OSOL/Firewire...
> "k" == Khyron writes: k> FireWire is an Apple technology, so they have a vested k> interest in making sure it works well [...] They could even k> have a specific chipset that they exclusively use in their k> systems, yes, you keep repeating yourselves, but there are only a few firewire host chips, like ohci and lynx, and apple uses the same ones as everyone else, no magic. Why would you speak such a complicated fantasy out loud without any reason to believe it other than your imaginations? I also tried to use firewire on Solaris long ago and had a lot of problems with it, both with the driver stack in Solaris and with the embedded software inside a cheaper non-Oxford case (Prolific). I think y'all forum users shuold stick to SAS/SATA for external disks and avoid firewire and USB both. Realize, though, that it is not just the chip driver but the entire software stack that influences speed and reliability. Even above what you normally consider the firewire stack, above all the mid-layer and scsi emulation stuff, Mac OS X for example is rigorous about handling force-unmounting, both with umount -f and disks that go away without warning. FreeBSD OTOH has major problems with force-unmounting, panicing and waiting forever. Solaris has problems too with freezing zpool maintenance commands, access to pools unrelated to the one with the device that went away, and NFS serving anything while any zpool is frozen. This is a problem even if you don't make a habit of yanking disks because it can make diagnosing problems really difficult: what if your case, like my non-Oxford one, has a firmware bug that makes it freeze up sometimes? or a flakey power supply or lose cable? If the OS does not stay up long enough to report the case detached, and stay sane enough for you to figure out what makes it retach (waiting a while, rebooting the case, jiggling the power connector, jiggling the data connector) then you will probably never figure out what's wrong with it, as I didn't for months while if I'd had the same broken case on a Mac I'd have realized almost immediately that it sometimes detaches itself for no reason and retaches when I cycle it's power switch but not when I plug/unplug its data cable and not when I reboot the Mac, so I'd know the case had buggy firmware, while with Solaris I just get these craazy panic messages. Once your exception handling reaches a certain level of crappyness, you cannot touch anything without everything collapsing. And on Solaris all this freezing/panicing behavior depends a lot which disk driver yuo're using while Mac OS X it's, meh, basically working the same for SATA, USB, Firewire, or NFS client, and also you can mount images with hdiutil over NFS without getting weird checksum errors or deadlocks like you do with file or lofiadm-backed ZFS. 
(globalsan iscsi is still a mess though, worse than all other mac disk drivers and worse than the solaris initiator) I do not like the Mac OS much because it's slow, because the hardware's overpriced and fragile, because the only people running it inside VM's are using piratebay copies, and because I distrust Apple and strongly disapprove of their master plan both in intent and practice like the way they crippled dtrace, the displayport bullshit, and their terrible developer relations like nontransparent last-minute API yanking and ``agreements'' where you even have to agree not to discuss the agreement, and in general of their honing a talent for manipulating people into exploitable corners by slowly convincing them it's okay to feel lazy and entitled. But yes they've got some things relevant to server-side storage working better than Solaris does like handling flakey disks sanely, and providing source for the stable supported version of their OS not just the development version. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> > ZFS+CIFS even provides > > Windows Volume Shadow Services so that Windows users can do this on > > their own. > > I'll need to look into that, when I get a moment. Not familiar with > Windows Volume Shadow Services, but having people at home able to do > this > directly seems useful. I'd like to spin off this discussion into a new thread. Any replies to this one will surely just get buried in the (many messages) in this very long thread... New thread: ZFS+CIFS: Volume Shadow Services, or Simple Symlink? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Erik, I don't think there was any confusion about the block nature of "zfs send" vs. the file nature of star. I think what this discussion is coming down to is the best ways to utilize "zfs send" as a backup, since (as Darren Moffat has noted) it supports all the ZFS objects and metadata. I see 2 things coming out of this: 1. NDMP for putting "zfs send" streams on tape over the network. So the question I have now is for anyone who has used or is using NDMP on OSol. How well does it work? Pros? Cons? If people aren't using it, why not? I think this is one area where there are some gains to be made on the OSol backup front. I still need to go back and look at the best ways to use local tape drives on OSol file servers running ZFS to capture ZFS objects and metadata (ZFS ACLs, ZVOLs, etc.). 2. A new tool is required to provide some of the functionality desired, at least as a supported backup method from Sun. While someone in the community may be interested in developing such a tool, Darren also noted that the requisite APIs are private currently and still in flux. They haven't yet stabilized and been published. To Ed Harvey: Some questions about your use of NetBackup on your secondary server: 1. Do you successfully backup ZVOLs? We know NetBackup should be able to capture datasets (ZFS file systems) using straight POSIX semantics. 2. What version of NetBackup are you using? 3. You simply run the NetBackup agent locally on the (Open)Solaris server? I thank everyone who has participated in this conversation for sharing their thoughts, experiences and realities. It has been most informational. On Fri, Mar 19, 2010 at 13:11, erik.ableson wrote: > On 19 mars 2010, at 17:11, Joerg Schilling wrote: > > >> I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet > >> the implication is that a tar archive stored on a tape is considered a > >> backup ? > > > > You cannot get a single file out of the zfs send datastream. > > zfs send is a block-level transaction with no filesystem dependencies - it > could be transmitting a couple of blocks that represent a portion of a file, > not necessarily an entire file. And since it can also be used to host a > zvol with any filesystem format imaginable it doesn't want to know. > -- "You can choose your friends, you can choose the deals." - Equity Private "If Linux is faster, it's a Solaris bug." - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 19 mars 2010, at 17:11, Joerg Schilling wrote: >> I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet >> the implication is that a tar archive stored on a tape is considered a >> backup ? > > You cannot get a single file out of the zfs send datastream. zfs send is a block-level transaction with no filesystem dependencies - it could be transmitting a couple of blocks that represent a portion of a file, not necessarily an entire file. And since it can also be used to host a zvol with any filesystem format imaginable it doesn't want to know. Going back to star as an example - from the man page : "Star archives and extracts multiple files to and from a single file called a tarfile. A tarfile is usually a magnetic tape, but it can be any file. In all cases, appearance of a directory name refers to the files and (recursively) subdirectories of that directory." This process pulls files (repeat: files! not blocks) off of the top of a filesystem so it needs to be presented a filesystem with interpretable file objects (like almost all backup tools). ZFS confuses the issue by integrating volume management with filesystem management. zfs send is dealing with the volume and the blocks that represent the volume without any file-level dependence. It addresses an entirely different type of backup need, that is to be able to restore or mirror (especially mirror to another live storage system) an entire volume at a point in time. It does not replace the requirement for file-level backups which deal with a different level of granularity. Simply because the restore use-case is different. For example, on my Mac servers, I run two different backup strategies concurrently - one is bootable clone from which I can restart the computer immediately in the case of a drive failure. At the same time, I use the Time Machine backups for file level granularity that allows me to easily find a particular file at a particular moment. Before Time Machine, this role was fulfilled with Retrospect to a tape drive. However, a block-level dump to tape had little interest in the first use case since the objective is to minimize the RTO. For disaster recovery purposes any of these backup objects can be externalized. Offsite rotation of the disks used allow the management of the RPO. Remember that files exist in a filesystem context and need to be backed up in this context. Volumes exist in another context and can be replicated/backed up in this context. zfs send/recv = EMC MirrorView, NetApp Snap Mirror, EqualLogic Auto-replication, HP StorageWorks Continuous Access, DataCore AIM, etc. zfs send/recv ≠ star, Backup Exec, CommVault, ufsdump, bacula, zmanda, Retrospect, etc. Erik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
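To illustrate the "mirror to another live storage system" use case Erik describes, volume-level replication with zfs send/recv looks roughly like this; the host and dataset names are placeholders:

# initial full copy of the volume to the remote pool
zfs snapshot tank/vol1@rep-1
zfs send tank/vol1@rep-1 | ssh backuphost zfs receive backup/vol1

# later point-in-time updates only ship the changed blocks
zfs snapshot tank/vol1@rep-2
zfs send -i tank/vol1@rep-1 tank/vol1@rep-2 | ssh backuphost zfs receive -F backup/vol1

The -F on the incremental receive assumes the far side is treated as read-only between updates and can be rolled back to the last common snapshot.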
Re: [zfs-discuss] Rethinking my zpool
12 disks in mirrored pairs is a small configuration. The "smaller" sets you refer to might be the number of disks in a raidz/raidz2/raidz3 top-level vdev. You say performance is one of your top priorities, but what is the workload? Mostly read? Mostly write? Random? Sequential? See the ZFS Best Practices guide on the solarisinternals.com site for guidance on how to select your pool layout. http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide In particular this part: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] sympathetic (or just multiple) drive failures
Most discussions I have seen about RAID 5/6 and why it stops "working" seem to base their conclusions solely on single-drive characteristics and statistics. It seems to me there is a missing component in the discussion of drive failures in the real-world context of a system that lives in an environment shared by all the system components - for instance, the video of the disks slowing down when they are yelled at is a good visual example of the negative effect of vibration on drives. http://www.youtube.com/watch?v=tDacjrSCeq4 I thought the Google and CMU papers talked about a surprisingly high (higher than expected) rate of multiple drive failures of drives "nearby" each other, but I couldn't find it when I re-skimmed the papers now. What are people's experiences with multiple drive failures? Given that we often use same brand/model/batch drives (even though we are not supposed to), same enclosure, same rack, etc. for a given raid 5/6/z1/z2/z3 system, should we be paying more attention to harmonics, vibration/isolation and non-intuitive system-level statistics that might be inducing close-proximity drive failures rather than just throwing more parity drives at the problem? What if our enclosure and environmental factors increase the system-level statistics for multiple drive failures beyond the (used by everyone) single-drive failure statistics, to the point where it is essentially negating the positive effect of adding parity drives? I realize this issue is not addressed because there is too much variability in the environments, etc., but I thought it would be interesting to see if anyone has experienced much in terms of close-time-proximity multiple drive failures. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Usage of hot spares and hardware allocation capabilities.
Responses inline... On Tue, Mar 16, 2010 at 07:35, Robin Axelsson wrote: > I've been informed that newer versions of ZFS supports the usage of hot > spares which is denoted for drives that are not in use but available for > resynchronization/resilvering should one of the original drives fail in the > assigned storage pool. > That is the definition of a hot spare, at least informally. ZFS has supported this for some time (if not from the beginning; I'm not in a position to answer that). It is *not* new. > > I'm a little sceptical about this because even the hot spare will be > running for the same duration as the other disks in the pool and therefore > will be exposed to the same levels of hardware degradation and failures > unless it is put to sleep during the time it is not being used for storage. > So, is there a sleep/hibernation/standby mode that the hot spares operate in > or are they on all the time regardless of whether they are in use or not? > Not that I am aware of or have heard others report. No such "sleep mode" exists. Sounds like you want a Copan storage system. AFAIK, hot spares are always spinning, that's why they are hot. > > Usually the hot spare is on a not so well-performing SAS/SATA controller, > so given the scenario of a hard drive failure upon which a hot spare has > been used for resilvering of say a raidz2 cluster, can I move the resilvered > hot spare to the faster controller by letting it take the faulty hard > drive's space using the "zpool offline", "zpool online" commands? > Usually? That's not my experience, from multiple vendors hardware RAID arrays. Usually it's on a channel used by storage disks. Maybe someone else has seen otherwise. I'd be personally curious to know what system puts a spare on a lower performance channel. That risks slowing the entire device (RAID set/group) when the hot spare kicks in. As for your questions, that doesn't make a lot of sense to me. I don't even get how that would work, but I'm not "Wile E. Coyote, Super Genius" either. > > To be more general; are the hard drives in the pool "hard coded" to their > SAS/SATA channels or can I swap their connections arbitrarily if I would > want to do that? Will zfs automatically identify the association of each > drive of a given pool or tank and automatically reallocate them to put the > pool/tank/filesystem back in place? > No. Each disk in the pool has a unique ID, as I understand. Thus, you should be able to move a disk to another location (channel, slot) and it would still be a part of the same pool and VDEV. All of that said, I saw this post when it originally came in. I notice no one has responded to it until now. I don't know about anyone else, but I know that I was offended when I read this. I know for myself, I wasn't sure how to take this when I read it. Maybe you should not assume that people on this list don't know what hot sparing is, or that ZFS just learned. Just a suggestion. -- "You can choose your friends, you can choose the deals." - Equity Private "If Linux is faster, it's a Solaris bug." - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
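For reference, here is roughly what the hot-spare handling discussed above looks like from the command line; the pool and device names are invented:

# add one disk as a spare; the same device can be listed as a spare in
# more than one pool (a "global" hot spare)
zpool add tank spare c3t7d0
zpool add tank2 spare c3t7d0

# after a spare has taken over for a failed disk, replacing the failed
# disk resilvers onto its replacement and the spare goes back to being
# available
zpool replace tank c2t1d0 c4t0d0

Whether the resilvered data ends up on the controller you want is then a question of which physical disk you hand to zpool replace, rather than of offline/online juggling.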
Re: [zfs-discuss] ZFS/OSOL/Firewire...
The point I think Bob was making is that FireWire is an Apple technology, so they have a vested interest in making sure it works well on their systems and with their OS. They could even have a specific chipset that they exclusively use in their systems, although I don't see why others couldn't source it (with the exception that others may be too cheap to do so). Given these factors, it makes sense that FireWire performs brilliantly on Apple hardware/software, while everyone else makes the bare minimum (or less) investment in it, if that much. So those open drivers, while they could be useful for learning or other purposes, may not be directly usable for the systems people are running with OpenSolaris. At least, that's what I think Bob meant. On Fri, Mar 19, 2010 at 17:08, Alex Blewitt wrote: > On 19 Mar 2010, at 15:30, Bob Friesenhahn wrote: > > > On Fri, 19 Mar 2010, Khyron wrote: > >> Getting better FireWire performance on OpenSolaris would be nice though. > >> Darwin drivers are open...hmmm. > > > > OS-X is only (legally) used on Apple hardware. Has anyone considered > that since Firewire is important to Apple, they may have selected a > particular Firewire chip which performs particularly well? > > Darwin is open-source. > > http://www.opensource.apple.com/source/xnu/xnu-1486.2.11/ > > http://www.opensource.apple.com/source/IOFireWireFamily/IOFireWireFamily-417.4.0/ > > Alex -- "You can choose your friends, you can choose the deals." - Equity Private "If Linux is faster, it's a Solaris bug." - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rethinking my zpool
Chris Dunbar - Earthside, LLC wrote: Hello, After being immersed in this list and other ZFS sites for the past few weeks I am having some doubts about the zpool layout on my new server. It's not too late to make a change so I thought I would ask for comments. My current plan is to have 12 x 1.5 TB disks in what I would normally call a RAID 10 configuration. That doesn't seem to be the right term here, but there are 6 sets of mirrored disks striped together. I know that "smaller" sets of disks are preferred, but how small is small? I am wondering if I should break this into two sets of 6 disks. I do have a 13th disk available as a hot spare. Would it be available for either pool if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities. Thank you, Chris Dunbar

There's not much benefit I can see to having two pools if both are using the same configuration (i.e. all mirrors or all raidz). There are reasons to do so, but I don't see that they would be of any real benefit for what you describe. A hot spare disk can be assigned to multiple pools (often referred to as a "global" hot spare). The preference for raidz[123] configs is to have 4-6 data disks in the vdev. Realistically speaking, you have several different (practical) configurations possible, in order of general performance:

(a) 6 x 2-way mirrors + 1 pool hot spare -> 9TB usable
(b) 4 x 3-way mirrors + 1 pool hot spare -> 6TB usable
(c) 1 6-disk raidz + 1 7-disk raidz -> 16.5TB usable
(d) 2 6-disk raidz + 1 pool hot spare -> 15TB usable
(e) 1 6-disk raidz2 + 1 7-disk raidz2 -> 13.5TB usable
(f) 2 6-disk raidz2 + 1 pool hot spare -> 12TB usable
(g) 1 6-disk raidz3 + 1 7-disk raidz3 -> 10.5TB usable
(h) 1 13-disk raidz3 -> 15TB usable

Given the size of your disks, resilvering is likely to have a significant time problem in any RAIDZ[123] configuration. That is, unless you are storing (almost exclusively) very large files, resilver time is going to be significant, and can potentially be radically higher than with a mirrored config. The mirroring configs will out-perform raidz[123] on everything except large streaming writes/reads, and even then, it's a toss-up. Overall, the (a), (d), and (f) configurations generally offer the best balance of redundancy, space, and performance. Here are the chances of surviving disk failures (assuming hot spares are unable to be used; that is, all disk failures happen in a short period of time) - note that all three can always survive a single disk failure:

(a) 90% for 2, 73% for 3, 49% for 4, 25% for 5.
(d) 55% for 2, 27% for 3, 0% for 4 or more
(f) 100% for 2, 80% for 3, 56% for 4, 0% for 5.

Depending on your exact requirements, I'd go with (a) or (f) as the best choices - (a) if performance is more important, (f) if redundancy overrides performance. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
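If it helps to see (a) and (f) spelled out as commands, they would be created along these lines; the disk names are placeholders:

# (a) 6 x 2-way mirrors + 1 hot spare
zpool create tank \
  mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0 \
  mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0 mirror c2t4d0 c2t5d0 \
  spare c2t6d0

# (f) 2 x 6-disk raidz2 + 1 hot spare
zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
  spare c2t6d0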
Re: [zfs-discuss] Rethinking my zpool
Brandon High wrote: On Fri, Mar 19, 2010 at 5:32 AM, Chris Dunbar - Earthside, LLC mailto:cdun...@earthside.net>> wrote: if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities. Performance and fault tolerance are somewhat conflicting. You'll have good fault tolerance and performance using a wide raidz3 stripe, eg: 12-disk raidz3 with a spare. Actually, except on certain loads (large, streaming write/read), this config is going to give pretty poor performance. You'll have the best fault tolerance using small raidz3 stripes with a spare, for instance 2 x 6-disk raidz3. This uses 50% of your disks for redundancy. You'll have slightly better performance and slightly worse fault tolerance using raidz2 instead in both cases above. I would not recommend using raidz, as it will offer almost no real fault tolerance with the size of drives you're using. Realistically, a 2 x 6-disk raidz2 with a hot spare will provide /almost/ the same level of redundancy as 2 x 6-disk raidz3, and about 30% better performance and space. (he said he had 13 disks) You'll have your best performance and fault tolerance using 3-way mirrors, but you sacrifice 2/3 of your disks to do it. Actually, I think that raidz3 is higher tolerance still, but the performance difference will be huge. 2-way mirrors is slightly worse for fault tolerance (below raidz2 I believe) and good performance. Yes - see my followup post for percentages of failures. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On 19 Mar 2010, at 15:30, Bob Friesenhahn wrote: > On Fri, 19 Mar 2010, Khyron wrote: >> Getting better FireWire performance on OpenSolaris would be nice though. >> Darwin drivers are open...hmmm. > > OS-X is only (legally) used on Apple hardware. Has anyone considered that > since Firewire is important to Apple, they may have selected a particular > Firewire chip which performs particularly well? Darwin is open-source. http://www.opensource.apple.com/source/xnu/xnu-1486.2.11/ http://www.opensource.apple.com/source/IOFireWireFamily/IOFireWireFamily-417.4.0/ Alex ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rethinking my zpool
On Fri, Mar 19, 2010 at 5:32 AM, Chris Dunbar - Earthside, LLC < cdun...@earthside.net> wrote: > if I went with two? Finally, would I be better off with raidz2 or something > else instead of the striped mirrored sets? Performance and fault tolerance > are my highest priorities. > Performance and fault tolerance are somewhat conflicting. You'll have good fault tolerance and performance using a wide raidz3 stripe, eg: 12-disk raidz3 with a spare. You'll have the best fault tolerance using small raidz3 stripes with a spare, for instance 2 x 6-disk raidz3. This uses 50% of your disks for redundancy. You'll have slightly better performance and slightly worse fault tolerance using raidz2 instead in both cases above. I would not recommend using raidz, as it will offer almost no real fault tolerance with the size of drives you're using. You'll have your best performance and fault tolerance using 3-way mirrors, but you sacrifice 2/3 of your disks to do it. Actually, I think that raidz3 is higher tolerance still, but the performance difference will be huge. 2-way mirrors is slightly worse for fault tolerance (below raidz2 I believe) and good performance. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Error in zfs list output?
> "bh" == Brandon High writes: bh> I think I'm seeing an error in the output from zfs list with bh> regards to snapshot space utilization. no bug. You just need to think harder about it: the space used cannot be neatly put into buckets next to each snapshot that add to the total, just because of...math. To help understand, suppose you decide, just to fuck things up, that from now on every time you take a snapshot you take two snapshots, with exactly zero filesystem writing happening between the two. What do you want 'zfs list' to say now? What does happen if you do that, is it says all snapshots use zero space. the space shown in zfs list is the amount you'd get back if you deleted this one snapshot. Yes, every time you delete a snapshot, all the numbers reshuffle. Yes, there is a whole cat's cradle of space accounting information hidden in there that does not come out through 'zfs list'. pgpzRUSk68FzY.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
no, but I'm slightly paranoid that way. ;) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On 03/20/10 09:28 AM, Richard Jahnel wrote: They way we do this here is: zfs snapshot voln...@snapnow [i]#code to break on error and email not shown.[/i] zfs send -i voln...@snapbefore voln...@snapnow | pigz -p4 -1> file [i]#code to break on error and email not shown.[/i] scp /dir/file u...@remote:/dir/file [i]#code to break on error and email not shown.[/i] shh u...@remote "gzip -t /dir/file" [i]#code to break on error and email not shown.[/i] shh u...@remote "gunzip< /dir/file | zfs receive volname It works for me and it sends a minimum amount of data across the wire which is tested to minimize the chance of inflight issues. Excpet on Sundays when we do a full send. Don't you trust the stream checksum? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
The way we do this here is:

zfs snapshot voln...@snapnow
# code to break on error and email not shown.
zfs send -i voln...@snapbefore voln...@snapnow | pigz -p4 -1 > file
# code to break on error and email not shown.
scp /dir/file u...@remote:/dir/file
# code to break on error and email not shown.
ssh u...@remote "gzip -t /dir/file"
# code to break on error and email not shown.
ssh u...@remote "gunzip < /dir/file | zfs receive volname"

It works for me and it sends a minimum amount of data across the wire which is tested to minimize the chance of in-flight issues. Except on Sundays when we do a full send. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
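For comparison, if you did trust the stream and the transport end to end, the staged-file steps collapse into a direct pipe, roughly as below; user@remote and volname stand in for the elided names above, and -F assumes the receiving side isn't written to between runs:

zfs snapshot volname@snapnow
zfs send -i volname@snapbefore volname@snapnow | ssh user@remote "zfs receive -F volname"

The gzip -t step above buys an explicit whole-file integrity check before anything is handed to zfs receive, which is presumably the point of staging the file at all.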
[zfs-discuss] Error in zfs list output?
I think I'm seeing an error in the output from zfs list with regards to snapshot space utilization. In the first list, there are 818M used by snapshots, but the snaps listed aren't using anything close to that amount. If I destroy the first snapshot, then the second one suddenly jumps in space used to 813M, which seems about right and the USEDSNAP column makes sense. Is this a bug in snapshot accounting or reporting, or is there something I missed?

r...@basestar:/export/vmware# zfs list -t all -r -o space tank/export/volumes/caliban
NAME                                                                  AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank/export/volumes/caliban                                           3.09T  1.59G      818M    813M              0          0
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-10:00         -  2.88M         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-11:00         -  2.80M         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-12:00         -   200K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:15       -   174K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:30       -   252K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:45       -   340K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-13:00         -      0         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-13:00       -      0         -       -              -          -

r...@basestar:/export/vmware# zfs destroy tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-10:00
r...@basestar:/export/vmware# zfs list -t all -r -o space tank/export/volumes/caliban
NAME                                                                  AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank/export/volumes                                                   3.09T  39.3G         0   47.1K              0      39.3G
tank/export/volumes/caliban                                           3.09T  1.59G      815M    813M              0          0
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-11:00         -   813M         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-12:00         -   200K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:15       -   174K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:30       -   252K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:45       -   340K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-13:00         -      0         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-13:00       -      0         -       -              -          -

r...@basestar:/export/vmware# zfs destroy tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-11:00
r...@basestar:/export/vmware# zfs list -t all -r -o space tank/export/volumes/caliban
NAME                                                                  AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank/export/volumes/caliban                                           3.09T   815M     2.04M    813M              0          0
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-12:00         -   200K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:15       -   174K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:30       -   252K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-12:45       -   340K         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:hourly-2010-03-19-13:00         -      0         -       -              -          -
tank/export/volumes/cali...@zfs-auto-snap:frequent-2010-03-19-13:00       -      0         -       -              -          -

-- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On Fri, March 19, 2010 12:25, Darren J Moffat wrote: > On 19/03/2010 17:19, David Dyer-Bennet wrote: >> >> On Fri, March 19, 2010 11:33, Darren J Moffat wrote: >>> On 19/03/2010 16:11, joerg.schill...@fokus.fraunhofer.de wrote: Darren J Moffat wrote: > I'm curious, why isn't a 'zfs send' stream that is stored on a tape > yet > the implication is that a tar archive stored on a tape is considered > a > backup ? You cannot get a single file out of the zfs send datastream. >>> >>> I don't see that as part of the definition of a backup - you obviously >>> do - so we will just have to disagree on that. >> >> I used to. Now I think more in terms of getting it from a snapshot >> maintained online on the original storage server. > > Exactly! The single file retrieval due to user error case is best > achieved by an automated snapshot system. ZFS+CIFS even provides > Windows Volume Shadow Services so that Windows users can do this on > their own. I'll need to look into that, when I get a moment. Not familiar with Windows Volume Shadow Services, but having people at home able to do this directly seems useful. >> The overall storage strategy has to include retrieving files lost due to >> user error over some time period, whether that's months or years. And >> having to restore an entire 100TB backup to "spare disk" somewhere to >> get >> one file is clearly not on. > > Completely agree, no where was I suggesting that 'zfs send' out to tape > should be the whole backup strategy. I even pointed to a presentation > given at LOSUG that shows how someone is doing this. Sorry, didn't mean to sound like I was arguing with you (or suggest we disagreed in that area); I intended to pontificate on the problem in general. > I'll say it again: neither 'zfs send' or (s)tar is an enterprise (or > even home) backup system on their own one or both can be components of > the full solution. I'm seeing what a lot of professional and serious amateur photographers are building themselves for storage on a mailing list I'm on. Nearly always it consists of two layers of storage servers, often with one off-site (most of them keep current photos on LOCAL disk, instead of my choice of working directly off storage server disk). I'm in the fortunate position of having my backups less than the size of a large single drive; so I'm rotating three backup drives, and intend to be taking one of them off-site regularly (still in the process of converting to this new scheme; the previous scheme used off-site optical disks). I use ZFS for the removable drives, so I can if necessary reach into them and drag out single files fairly easily if necessary (but "necessary" would require something happening to the online snapshot first). People with much bigger configurations look like they save money using tape for the archival / disaster restore storage, but it's not economically viable at my level. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
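The "reach into them and drag out single files" part is what makes a ZFS-formatted backup drive nicer to live with than a stream file, and it looks something like this; the pool name, snapshot name, and paths are made up:

zpool import backup1
ls /backup1/home/.zfs/snapshot/
cp /backup1/home/.zfs/snapshot/weekly-2010-03-14/photos/img_1234.nef /export/home/photos/
zpool export backup1

(The mountpoints depend on how the datasets were received into the backup pool, so the paths above are illustrative only.)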
Re: [zfs-discuss] zpool I/O error
Hi Cindy, Here's the zpool status:

]# zpool status -v
  pool: oradata_fs1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 1 errors on Thu Mar 18 17:00:12 2010
config:

        NAME                             STATE     READ WRITE CKSUM
        oradata_fs1                      DEGRADED     0     0    26
          c4t60060E801439970139970030d0  DEGRADED     0     0   128  too many errors

errors: Permanent errors have been detected in the following files:

        oradata_fs1:<0x0>
#

That doesn't really seem to help. For what it's worth, this is a SAN LUN. What am I missing? - Original Message From: Cindy Swearingen To: Grant Lowe Cc: zfs-discuss@opensolaris.org Sent: Fri, March 19, 2010 10:21:45 AM Subject: Re: [zfs-discuss] zpool I/O error Hi Grant, An I/O error generally means that there is some problem either accessing the disk or disks in this pool, or a disk label got clobbered. Does zpool status provide any clues about what's wrong with this pool? Thanks, Cindy On 03/19/10 10:26, Grant Lowe wrote: > Hi all, > > I'm trying to delete a zpool and when I do, I get this error: > > # zpool destroy oradata_fs1 > cannot open 'oradata_fs1': I/O error > # > The pools I have on this box look like this: > > #zpool list > NAME SIZE USED AVAIL CAP HEALTH ALTROOT > oradata_fs1 532G 119K 532G 0% DEGRADED - > rpool 136G 28.6G 107G 21% ONLINE - > # > > Why can't I delete this pool? This is on Solaris 10 5/09 s10s_u7. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 19/03/2010 17:19, David Dyer-Bennet wrote: On Fri, March 19, 2010 11:33, Darren J Moffat wrote: On 19/03/2010 16:11, joerg.schill...@fokus.fraunhofer.de wrote: Darren J Moffat wrote: I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet the implication is that a tar archive stored on a tape is considered a backup ? You cannot get a single file out of the zfs send datastream. I don't see that as part of the definition of a backup - you obviously do - so we will just have to disagree on that. I used to. Now I think more in terms of getting it from a snapshot maintained online on the original storage server. Exactly! The single file retrieval due to user error case is best achieved by an automated snapshot system. ZFS+CIFS even provides Windows Volume Shadow Services so that Windows users can do this on their own. The overall storage strategy has to include retrieving files lost due to user error over some time period, whether that's months or years. And having to restore an entire 100TB backup to "spare disk" somewhere to get one file is clearly not on. Completely agree, no where was I suggesting that 'zfs send' out to tape should be the whole backup strategy. I even pointed to a presentation given at LOSUG that shows how someone is doing this. I'll say it again: neither 'zfs send' or (s)tar is an enterprise (or even home) backup system on their own one or both can be components of the full solution. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool I/O error
Hi Grant, An I/O error generally means that there is some problem either accessing the disk or disks in this pool, or a disk label got clobbered. Does zpool status provide any clues about what's wrong with this pool? Thanks, Cindy On 03/19/10 10:26, Grant Lowe wrote: Hi all, I'm trying to delete a zpool and when I do, I get this error: # zpool destroy oradata_fs1 cannot open 'oradata_fs1': I/O error # The pools I have on this box look like this: #zpool list NAME SIZE USED AVAILCAP HEALTH ALTROOT oradata_fs1 532G 119K 532G 0% DEGRADED - rpool 136G 28.6G 107G21% ONLINE - # Why can't I delete this pool? This is on Solaris 10 5/09 s10s_u7. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool I/O error
On Fri, Mar 19, 2010 at 1:26 PM, Grant Lowe wrote: > Hi all, > > I'm trying to delete a zpool and when I do, I get this error: > > # zpool destroy oradata_fs1 > cannot open 'oradata_fs1': I/O error > # > > The pools I have on this box look like this: > > #zpool list > NAME SIZE USED AVAILCAP HEALTH ALTROOT > oradata_fs1 532G 119K 532G 0% DEGRADED - > rpool 136G 28.6G 107G21% ONLINE - > # > > Why can't I delete this pool? This is on Solaris 10 5/09 s10s_u7. > Please send the result of zpool status. Your devices are probably all offline but that shouldn't stop you from removing it, at least not on OpenSolaris. -- Giovanni ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On Fri, March 19, 2010 11:33, Darren J Moffat wrote: > On 19/03/2010 16:11, joerg.schill...@fokus.fraunhofer.de wrote: >> Darren J Moffat wrote: >> >>> I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet >>> the implication is that a tar archive stored on a tape is considered a >>> backup ? >> >> You cannot get a single file out of the zfs send datastream. > > I don't see that as part of the definition of a backup - you obviously > do - so we will just have to disagree on that. I used to. Now I think more in terms of getting it from a snapshot maintained online on the original storage server. The overall storage strategy has to include retrieving files lost due to user error over some time period, whether that's months or years. And having to restore an entire 100TB backup to "spare disk" somewhere to get one file is clearly not on. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 19/03/2010 16:11, joerg.schill...@fokus.fraunhofer.de wrote: Darren J Moffat wrote: I'm curious, why isn't a 'zfs send' stream that is stored on a tape yet the implication is that a tar archive stored on a tape is considered a backup ? You cannot get a single file out of the zfs send datastream. I don't see that as part of the definition of a backup - you obviously do - so we will just have to disagree on that. ZFS system attributes (as used by the CIFS server and locally) ? star does support such things for Linux and FreeBSD, the problem on Solaris is that the documentation of the interfaces for this Solaris local feature is poor. The was Sun tar archives the attibutes is non-portable. Could you point to documentation? getattrat(3C) / setattrat(3C) Even has example code in it. This is what ls(1) uses. It could be easily possible to add portable support integrated into the framework that already supports FreeBSD and Linux attributes. Great, do you have a time frame for when you will have this added to star then ? ZFS dataset properties (compression, checksum etc) ? Where is the documentation of the interfaces? There isn't any for those because the libzfs interfaces are currently still private. The best you can currently do is to parse the output of 'zfs list' eg. zfs list -H -o compression rpool/export/home Not ideal but it is the only publicly documented interface for now. As long as there is no interface that supports what I did discuss with Jeff Bonwick in September 2004: - A public interface to get the property state That would come from libzfs. There are private interfaces just now that are very likely what you need zfs_prop_get()/zfs_prop_set(). They aren't documented or public though and are subject to change at any time. - A public interface to read the file raw in compressed form I think you are missing something about how ZFS works here. Files aren't in a compressed form. Some blocks of a file may be compressed if compression is enabled on the dataset. Note that for compression and checksum properties they only indicate what algorithm will be used to compress (or checksum) blocks for new writes. It doesn't say what algorithm the blocks of a given file are compressed with. In fact for any given file some blocks may be compressed and some not. The reasons for a block not being compressed include: 1) it didn't compress 2) it was written when compression=off 3) it didn't compress enough. It is even possible that if the user changed the value of compression blocks within a file are compressed with a different algorithm. So you won't ever get this because ZFS just doesn't work like that. In fact even 'zfs send' doesn't even store compressed data. The 'zfs send' stream has the blocks in the form that they exist in the in memory ARC ie uncompressed. In kernel it is possible to ask for a block in its RAW (ie compressed) form but that is only for consumers of arc_read() and zio_read() - way way way below the ZPL layer and applications like star. - A public interface to write the file raw in compressed form Not even a private API exists for this. There is no capability to send a RAW (ie compressed) block to arc_write() or zio_write(). I am not sure whether this is of relevance for a backup. If there is a need to change the states, on a directory base, there is a need for an easy to use public interface. I don't understand what you mean by that, can you give me an example. 
-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
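Until those libzfs interfaces become public, recording dataset properties alongside a backup has to go through the CLI. A crude sketch follows; the pool name and the particular property list are arbitrary choices for illustration, not anything prescribed by ZFS or by this thread:

# capture selected properties for every dataset under tank, so they can
# be reapplied by hand (zfs set ...) after a restore
for ds in `zfs list -H -o name -r tank`; do
    zfs get -H -o name,property,value compression,checksum,atime $ds
done > /backup/tank.properties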
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Fri, 19 Mar 2010, David Dyer-Bennet wrote: I don't think of stream crypto as inherently including validity checking, though in practice I suppose it would always be a good idea. This is obviously a vital and necessary function of ssh in order to defend against "man in the middle" attacks. The main requirement is to make sure that the transferred data can not be deciphered or modified by something other than the two end-points. I don't know if ssh includes retry logic to request that modified data be retransmitted. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool I/O error
Hi all, I'm trying to delete a zpool and when I do, I get this error:

# zpool destroy oradata_fs1
cannot open 'oradata_fs1': I/O error
#

The pools I have on this box look like this:

# zpool list
NAME          SIZE   USED    AVAIL   CAP   HEALTH    ALTROOT
oradata_fs1   532G   119K    532G     0%   DEGRADED  -
rpool         136G   28.6G   107G    21%   ONLINE    -
#

Why can't I delete this pool? This is on Solaris 10 5/09 s10s_u7.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Darren J Moffat wrote:
> I'm curious, why isn't a 'zfs send' stream that is stored on a tape considered a backup, yet the implication is that a tar archive stored on a tape is considered a backup?

You cannot get a single file out of the zfs send datastream.

>> ZFS system attributes (as used by the CIFS server and locally) ?
>>
>> star does support such things for Linux and FreeBSD; the problem on Solaris is that the documentation of the interfaces for this Solaris-local feature is poor. The way Sun tar archives the attributes is non-portable.
>>
>> Could you point to documentation?
>
> getattrat(3C) / setattrat(3C)
>
> Even has example code in it. This is what ls(1) uses.

It could be easily possible to add portable support integrated into the framework that already supports FreeBSD and Linux attributes.

>> ZFS dataset properties (compression, checksum etc) ?
>>
>> Where is the documentation of the interfaces?
>
> There isn't any for those because the libzfs interfaces are currently still private. The best you can currently do is to parse the output of 'zfs list', eg.
>
>   zfs list -H -o compression rpool/export/home
>
> Not ideal but it is the only publicly documented interface for now.

As long as there is no interface that supports what I did discuss with Jeff Bonwick in September 2004:

- A public interface to get the property state
- A public interface to read the file raw in compressed form
- A public interface to write the file raw in compressed form

I am not sure whether this is of relevance for a backup. If there is a need to change the states, on a directory base, there is a need for an easy to use public interface.

Jörg
-- 
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de (uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Fri, March 19, 2010 09:49, Bob Friesenhahn wrote: > On Fri, 19 Mar 2010, David Dyer-Bennet wrote: >> >> However, these legacy mechanisms aren't guaranteed to give you the >> less-than-one-wrong-bit-in-10^15 level of accuracy people tend to want >> for >> enterprise backups today (or am I off a couple of orders of magnitude >> there?). They were defined when data rates were much slower and data >> volumes much lower. > > Are you sure? Have you done any research on this? You are saying > that NSA+-grade crypto on the stream is insufficient to detect a > modification to the data? I was referring to the tcp and hardware-level checksums. I specifically said I didn't know if SSH did anything on top of that (other people have since said that it does, and it might well be plenty good enough; also that ZFS itself has checksums in the send stream). I don't think of stream crypto as inherently including validity checking, though in practice I suppose it would always be a good idea. > It seems that the main failure mode would be disconnect by ssh. Sure, can't guarantee against aborted connections at whatever level (actual interruption of IP connectivity). But those are generally detected and reported as an error; one shouldn't be left with the impression the transfer succeeded. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Damon (and others)

For those wanting the ability to perform file backups/restores along with all metadata, without resorting to third-party applications: if you have a Sun support contract, log a call asking that your organisation be added to the list of users who want to see RFE #5004379 "want comprehensive backup strategy" implemented. I logged this last month and was told there are now 5 organisations asking for this. Considering this topic seems to crop up regularly on zfs-discuss, I'm guessing the actual number of interested people is higher, but they don't know how to register their interest.

JR
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> One of the reasons I am investigating solaris for > this is sparse volumes and dedupe could really help > here. Currently we use direct attached storage on > the dom0s and allocate an LVM to the domU on > creation. Just like your example above, we have lots > of those "80G to start with please" volumes with 10's > of GB unused. I also think this data set would > dedupe quite well since there are a great many > identical OS files across the domUs. Is that > assumption correct? This is one reason I like NFS - thin by default, and no wasted space within a zvol. zvols can be thin as well, but opensolaris will not know the inside format of the zvol, and you may still have a lot of wasted space after a while as files inside of the zvol come and go. In theory dedupe should work well, but I would be careful about a possible speed hit. > I've not seen an example of that before. Do you mean > having two 'head units' connected to an external JBOD > enclosure or a proper HA cluster type configuration > where the entire thing, disks and all, are > duplicated? I have not done any type of cluster work myself, but from what I have read on Sun's site, yes, you could connect the same jbod to two head units, active/passive, in an HA cluster, but no duplicate disks/jbod. When the active goes down, passive detects this and takes over the pool by doing an import. During the import, any outstanding transactions on the zil are replayed, whether they are on a slog or not. I believe this is how Sun does it on their open storage boxes (7000 series). Note - two jbods could be used, one for each head unit, making an active/active setup. Each jbod is active on one node, passive on the other. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
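As a concrete illustration of the sparse-zvol-plus-dedup idea discussed above, a rough sketch (pool and volume names are invented, and the dedup behaviour should be measured before relying on it):

  # enable dedup for everything created under this dataset
  zfs set dedup=on tank/vms

  # create a thin (sparse) 80G zvol for a domU; blocks are only allocated as they are written
  zfs create -s -V 80G tank/vms/domu01

  # check how well the data actually dedupes
  zpool get dedupratio tank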
Re: [zfs-discuss] ZFS/OSOL/Firewire...
On Fri, 19 Mar 2010, Khyron wrote:
> Getting better FireWire performance on OpenSolaris would be nice though. Darwin drivers are open...hmmm.

OS-X is only (legally) used on Apple hardware. Has anyone considered that since Firewire is important to Apple, they may have selected a particular Firewire chip which performs particularly well?

Bob
-- 
Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rethinking my zpool
You will get much better random IO with mirrors, and better reliability when a disk fails with raidz2. Six sets of mirrors are fine for a pool. From what I have read, a hot spare can be shared across pools. I think the correct term would be "load balanced mirrors", vs RAID 10. What kind of performance do you need? Maybe raidz2 will give you the performance you need. Maybe not. Measure the performance of each configuration and decide for yourself. I am a big fan of iometer for this type of work. -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
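For reference, a rough sketch of the six-mirror layout plus a hot spare (device names are invented; substitute your own):

  # six 2-way mirrors striped together, plus one hot spare
  zpool create tank \
    mirror c1t0d0 c1t1d0 \
    mirror c1t2d0 c1t3d0 \
    mirror c1t4d0 c1t5d0 \
    mirror c1t6d0 c1t7d0 \
    mirror c2t0d0 c2t1d0 \
    mirror c2t2d0 c2t3d0 \
    spare  c2t4d0

  # the same spare device can also be added to a second pool, making it shared
  # zpool add otherpool spare c2t4d0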
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
On 19/03/2010 14:57, joerg.schill...@fokus.fraunhofer.de wrote:
> Darren J Moffat wrote:
>> That assumes you are writing the 'zfs send' stream to a file or file-like media. In many cases people using 'zfs send' for their backup strategy are writing it back out using 'zfs recv' into another pool. In those cases the files can even be restored over NFS/CIFS by using the .zfs/snapshot directory
>
> If you unpack the datastream from zfs send on a machine in a different location that is safe against e.g. a fire that destroys the main machine, you may call it a backup.

I'm curious, why isn't a 'zfs send' stream that is stored on a tape considered a backup, yet the implication is that a tar archive stored on a tape is considered a backup?

>> ZFS system attributes (as used by the CIFS server and locally) ?
>
> star does support such things for Linux and FreeBSD; the problem on Solaris is that the documentation of the interfaces for this Solaris-local feature is poor. The way Sun tar archives the attributes is non-portable.
>
> Could you point to documentation?

getattrat(3C) / setattrat(3C)

Even has example code in it. This is what ls(1) uses.

>> ZFS dataset properties (compression, checksum etc) ?
>
> Where is the documentation of the interfaces?

There isn't any for those because the libzfs interfaces are currently still private. The best you can currently do is to parse the output of 'zfs list', eg.

  zfs list -H -o compression rpool/export/home

Not ideal but it is the only publicly documented interface for now.
-- 
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Mike Gerdts wrote: > > another server, where the data is immediately fed through "zfs receive" then > > it's an entirely viable backup technique. > > Richard Elling made an interesting observation that suggests that > storing a zfs send data stream on tape is a quite reasonable thing to > do. Richard's background makes me trust his analysis of this much > more than I trust the typical person that says that zfs send output is > poison. If it is on tape you can restore the whole filesystem if you have a new empty one to restore to but you cannot do all the typical usages of backups. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Darren J Moffat wrote:
> That assumes you are writing the 'zfs send' stream to a file or file-like media. In many cases people using 'zfs send' for their backup strategy are writing it back out using 'zfs recv' into another pool. In those cases the files can even be restored over NFS/CIFS by using the .zfs/snapshot directory

If you unpack the datastream from zfs send on a machine in a different location that is safe against e.g. a fire that destroys the main machine, you may call it a backup.

>> Star implements incremental backups and restores based on POSIX compliant archives.
>
> ZFS filesystems have functionality beyond POSIX and some of that is really very important for some people (especially those using CIFS)

As I have mentioned many times in the past, star, in contrast to other archivers I know, has the right infrastructure to add support for additional metadata easily. The main problem seems to be that some people inside Sun signal that they are not interested in star, and that this discourages customers who do not maintain their own software infrastructure. Adding missing features, on the other hand, only makes sense if there is interest in using these features.

> Does Star (or any other POSIX archiver) backup:
> ZFS ACLs ?

Now that libsec finally supports the needed features, it only needs to be defined and implemented. I have been waiting for a few years for a discussion to define the textual format to be used in the tar headers...

> ZFS system attributes (as used by the CIFS server and locally) ?

star does support such things for Linux and FreeBSD; the problem on Solaris is that the documentation of the interfaces for this Solaris-local feature is poor. The way Sun tar archives the attributes is non-portable. Could you point to documentation?

> ZFS dataset properties (compression, checksum etc) ?

Where is the documentation of the interfaces?

> If it doesn't then it is providing an "archive" of the data in the filesystem, not a full/incremental copy of the ZFS dataset. Which, depending on the requirements of the backup, may not be enough. In other words you have data/metadata missing from your backup.
>
> The only tool I'm aware of today that provides a copy of the data, and all of the ZPL metadata and all the ZFS dataset properties, is 'zfs send'.

I encourage you to collaborate... Provide information for documentation of the interfaces and help to discuss the archive format extensions for the missing features.

Jörg
-- 
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de (uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Fri, 19 Mar 2010, David Dyer-Bennet wrote:
> However, these legacy mechanisms aren't guaranteed to give you the less-than-one-wrong-bit-in-10^15 level of accuracy people tend to want for enterprise backups today (or am I off a couple of orders of magnitude there?). They were defined when data rates were much slower and data volumes much lower.

Are you sure? Have you done any research on this? You are saying that NSA+-grade crypto on the stream is insufficient to detect a modification to the data?

It seems that the main failure mode would be disconnect by ssh.

Bob
-- 
Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
On Fri, March 19, 2010 02:28, homerun wrote: > Greetings > > I would like to get your recommendation how setup new pool. > > I have 4 new 1.5TB disks reserved to new zpool. > I planned to crow/replace existing small 4 disks ( raidz ) setup with new > bigger one. > > As new pool will be bigger and will have more personally important data to > be stored long time, i like to ask your recommendations should i create > recreate pool or just replace existing devices. Replacing existing drives runs risks to the data -- you're deliberately reducing yourself to no redundancy for a while (while the resilver happens). It would probably be faster, and definitely safer, to back up the data, recreate the pool, and restore the data. > I have noted there is now raidz2 and been thinking witch woul be better. > A pool with 2 mirrors or one pool with 4 disks raidz2 A pool with 2 mirrors will have the same available space as a 4-disk raidz2. It will generally perform better. For small numbers of disks, I'm a big fan of using mirrors rather than RAIDZ. I've got an 8-disk hot-swap bay currently occupied by 3 2-disk pairs (with 2 slots for future expansion; maybe a hot spare, and a space to attach an additional disk during upgrades). When expanding a vdev by replacing devices, it can be done much more safely with a mirror than a RAIDZ group. With a mirror, you can attach a THIRD disk (in fact you can attach any number; one guy wrote about creating a 47-way mirror). So, instead of replacing one disk with a bigger one (eliminating your redundancy during the resilver), attach the bigger one as a third disk. When that resilver is done, you can attach the other new disk, if you have bay space; or detach one of the small disks and THEN attach the other new disk. When the second resilver is done, detach the last small disk, and you have now increased your mirror vdev size without ever reducing your redundancy below 2 copies. There's no equivalent process for a RAIDZ group. > So at least could some explain these new raidz configurations RAIDZ is "single parity" -- one drive is redundant data. A RAIDZ vdev will withstand the failure of one drive without loss of data, but NOT the failure of 2 or more. A RAIDZ pool of N drives (all the same size) has N-1 drives worth of available capacity. RAIDZ2 is "double parity" -- two drives are given to redundant data. A RAIDZ2 vdev will withstand the failure of one or two drives without loss of data, but NOT the failure of 3 or more. A RAIDZ2 pool of N drives (all the same size) has N-2 drives worth of available capacity. A problem with modern large drives is that they take a long time to "resilver" in case of failure and replacement. During that period, if you started with one redundant drive, you're down to no redundant drives, meaning that a failure during the resilver could lose your data. (This is one of the many reasons you should have backups *in addition* to using redundant vdevs). This has driven people to develop higher levels of redundancy in parity schemes, such as RAIDZ2 (and RAIDZ3). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
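In command form, the attach-then-detach upgrade David describes looks roughly like this (device names are made up; wait for each resilver to finish, checking zpool status, before detaching anything):

  # existing mirror: c1t0d0 + c1t1d0 (small disks); c2t0d0/c2t1d0 are the new, bigger disks
  zpool attach tank c1t0d0 c2t0d0   # now a 3-way mirror while c2t0d0 resilvers
  zpool detach tank c1t0d0          # after the resilver: drop one small disk
  zpool attach tank c1t1d0 c2t1d0   # attach the second big disk
  zpool detach tank c1t1d0          # after that resilver: drop the last small disk
  # on recent builds the vdev can then grow via: zpool set autoexpand=on tank
  # (or export/import the pool)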
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
On Fri, March 19, 2010 00:38, Rob wrote: > Can a ZFS send stream become corrupt when piped between two hosts across a > WAN link using 'ssh'? > > For example a host in Australia sends a stream to a host in the UK as > follows: > > # zfs send tank/f...@now | ssh host.uk receive tank/bar In general, errors would be detected by TCP (or by lower-level hardware media error-checking), and the packet retransmitted. I'm not sure what error-checking ssh does on top of that (if any). However, these legacy mechanisms aren't guaranteed to give you the less-than-one-wrong-bit-in-10^15 level of accuracy people tend to want for enterprise backups today (or am I off a couple of orders of magnitude there?). They were defined when data rates were much slower and data volumes much lower. In addition, memory errors on the receiving host (after the TCP stack turns the data over to the application), if undetected, could leave you with corrupted data; not sure what the probability is there. Every scheme has SOME weak spots. The well-designed ones at least tell you the bit error rate. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
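One way to add an end-to-end check on top of TCP and ssh is to hash the stream on both sides before it is received; a rough sketch, assuming staging space on both hosts and Solaris digest(1) (host, pool and snapshot names are made up):

  # sending side: capture the stream to a file, checksum it, then ship both
  zfs send tank/foo@now > /var/tmp/foo_now.zfs
  digest -a sha1 /var/tmp/foo_now.zfs > /var/tmp/foo_now.sha1
  scp /var/tmp/foo_now.zfs /var/tmp/foo_now.sha1 host.uk:/var/tmp/

  # receiving side: verify the digest matches before feeding it to zfs receive
  digest -a sha1 /var/tmp/foo_now.zfs
  zfs receive tank/bar < /var/tmp/foo_now.zfs

The obvious downside is the intermediate file; piping directly is faster, but then you are trusting ssh/TCP (plus the checksums inside the stream itself) for integrity.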
[zfs-discuss] Rethinking my zpool
Hello,

After being immersed in this list and other ZFS sites for the past few weeks, I am having some doubts about the zpool layout on my new server. It's not too late to make a change, so I thought I would ask for comments. My current plan is to have 12 x 1.5 TB disks in what I would normally call a RAID 10 configuration. That doesn't seem to be the right term here, but there are 6 sets of mirrored disks striped together. I know that "smaller" sets of disks are preferred, but how small is small? I am wondering if I should break this into two sets of 6 disks. I do have a 13th disk available as a hot spare. Would it be available for either pool if I went with two? Finally, would I be better off with raidz2 or something else instead of the striped mirrored sets? Performance and fault tolerance are my highest priorities.

Thank you, Chris Dunbar
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
> Now, NDMP doesn't do you much good for a locally attached tape drive, as Darren and Svein pointed out. However, provided the software which is installed on this fictional server can talk to the tape in an appropriate way, then all you have to do is pipe "zfs send" into it. Right? What did I miss?

Actually there is a case where NDMP is useful when the tape drive is locally attached: if the data server is an appliance that you cannot (either technically, or by policy, or both) install any backup agents onto. The SS7000 falls into this category. The SS7000 allows for a locally attached tape drive; the backup control software runs on another machine and talks with the local NDMP service to move the data from local disk to local tape.
-- 
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
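For a plain (Open)Solaris box with its own drive, the non-NDMP equivalent is simply to point the stream at the tape device; a minimal sketch (tape device path and snapshot name are made up):

  # write a recursive replication stream to a no-rewind tape device
  zfs snapshot -r tank@weekly
  zfs send -R tank@weekly | dd of=/dev/rmt/0n obs=1048576

  # restore later by reading it back
  dd if=/dev/rmt/0n ibs=1048576 | zfs receive -Fd tank

The usual caveat from this thread applies: a damaged block anywhere in the stream can make the whole thing unrestorable, so verify the tape after writing it.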
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> You do not need to mirror the L2ARC devices, as the > system will just hit disk as necessary. Mirroring > sounds like a good idea on the SLOG, but this has > been much discussed on the forums. Ah, ok. > Interesting. I find IOPS is more proportional to the > number of VMs vs disk space. > > User: I need a VM that will consume up to 80G in two > years, so give me an 80G disk. > Me: OK, but recall we can expand disks and > filesystems on the fly, without downtime. > User: Well, that is cool, but 80G to start with > please. > Me: One of the reasons I am investigating solaris for this is sparse volumes and dedupe could really help here. Currently we use direct attached storage on the dom0s and allocate an LVM to the domU on creation. Just like your example above, we have lots of those "80G to start with please" volumes with 10's of GB unused. I also think this data set would dedupe quite well since there are a great many identical OS files across the domUs. Is that assumption correct? > I also believe the SLOG and L2ARC will make using > high RPM disks not as necessary. But, from what I > have read, higher RPM disks will greatly help with > scrubs and reslivers. Maybe two pools - one with fast > mirrored SAS, another with big SATA. Or all SATA, but > one pool with mirrors, another with raidz2. Many > options. But measure to see what works for you. > iometer is great for that, I find. Yes. As part of testing this I had planned to look at the performance of the config and try some other options too, such as using a volume of 2 x mirrors. Its a classic case of balancing performance, cost and redundancy/time to resilver. > One of the benefits of a SLOG on the SAS/SATA bus is > for a cluster. If one node goes down, the other can > bring up the pool, check the ZIL for any necessary > transactions, and apply them. To do this with battery > backed cache, you would need fancy interconnects > between the nodes, cache mirroring, etc. All of those > things that SAN array products do. I've not seen an example of that before. Do you mean having two 'head units' connected to an external JBOD enclosure or a proper HA cluster type configuration where the entire thing, disks and all, are duplicated? Matt. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
>>>> sata disks don't understand the prioritisation, so
>
> Er, the point was exactly that there is no discrimination, once the request is handed to the disk.

So, are you saying that SCSI drives do understand prioritisation (i.e. TCQ supports the schedule from ZFS), while SATA/NCQ drives don't, or does it just boil down to what Richard told us, SATA disks being too slow?

> If the internal-to-disk queue is enough to keep the heads saturated / seek bound, then a new high-priority-in-the-kernel request will get to the disk sooner, but may languish once there.

Thanks. That makes sense to me.

> You can shorten the number of outstanding IO's per vdev for the pool overall, or preferably the number scrub will generate (to avoid penalising all IO).

That sounds like a meaningful approach to addressing bottlenecks caused by zpool scrub.

> The tunables for each of these should be found readily, probably in the Evil Tuning Guide.

I think I should try to digest the Evil Tuning Guide occasionally with respect to this topic. Thanks for pointing me in a direction. Maybe what you have suggested above (shortening the number of I/Os issued by scrub) is already possible? If not, I think it would be a meaningful improvement to request.

> Disks with write cache effectively do this [command queueing] for writes, by pretending they complete immediately, but reads would block the channel until satisfied. (This is all for ATA which lacked this, before NCQ. SCSI has had these capabilities for a long time).

As scrub is about reads, are you saying that this is still a problem with SATA/NCQ drives, or not? I am unsure what you mean at this point.

>>> limiting the number of concurrent IO's handed to the disk to try and avoid saturating the heads.
>>
>> Indeed, that was what I had in mind. With the addition that I think it is also necessary to avoid saturating other components, such as CPU.
>
> Less important, since prioritisation can be applied there too, but potentially also an issue. Perhaps you want to keep the cpu fan speed/noise down for a home server, even if the scrub runs longer.

Well, the only thing that was really remarkable while scrubbing was CPU load constantly near 100%. I still think that is at least contributing to the collapse of concurrent payload. I.e., it's all about services that take place in the kernel: CIFS, ZFS, iSCSI. Mostly, about concurrent load within ZFS itself. That means an implicit trade-off while a file is being provided over CIFS, i.e..

> AHCI should be fine. In practice if you see actv > 1 (with a small margin for sampling error) then ncq is working.

Ok, and how is that with respect to mpt? My assertion that mpt will support NCQ is mainly based on the marketing information provided by LSI that these controllers offer NCQ support with SATA drives. How (by which tool) do I get to this "actv" parameter?

Regards, Tonmaus
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
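To answer the last question concretely: actv is a column in iostat's extended output, and the queue/scrub tunables can be inspected with mdb. A sketch (the tunable names vary between builds, so treat them as examples to check against the Evil Tuning Guide):

  # per-device extended statistics every 5 seconds; actv = commands active on the device
  iostat -xnz 5

  # peek at queue-depth / scrub related tunables on a live kernel
  echo zfs_vdev_max_pending/D | mdb -k
  echo zfs_scrub_limit/D | mdb -k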
Re: [zfs-discuss] dedup rollback taking a long time.
My rollback "finished" yesterday after about 7.5 days. It still wasn't ready to receive the last snapshot, so I rm'ed all the files (took 14 hours) and then issued the rollback command again, 2 minutes this time. Ok, I now have many questions, some due to a couple of responses (which don't appear on the http://opensolaris.org/jive website) One response was. "I think it has been shown by others that dedup requires LOTS of RAM and to be safe, an SSD L2ARC, especially with large (multi-TB) datasets. Dedup is still very new, too. People seem to forget that." The other was "My only suggestion is if the machine is still showing any disk activity to try adding more RAM. I don't know this for a fact but it seems that destroying deduped data when the dedup table doesn't fit in RAM is pathologically slow because the entire table is traversed for every deletion, or at least enough of it to hit the disk on every delete. I've seen a couple of people report that the process was able to complete in a sane amount of time after adding more RAM. This information is based on what I remember of past conversations and is all available in the archives as well." I currently have 4 GB of RAM, and can't get anymore in this box (4 x 2 TB hard drives), so it sounds like I need bigger hardware. So the question is how much more. According to one post I have read, the poster claimed that the dedup table would fill 13.4GB for his 1.7 TB file space, assuming this is true (8GB per 1TB), then do modern servers have enough RAM space to use dedup effectively. Is a SSD fast enough, or does the whole DDT need to be held in RAM? I am currently getting a planning a new file server for the company which need to have space for approx 16 TB of files (twice what we are currently using) and this will need to be much more focused to performance. So would the 2 solutions have similar performance, and what results does turning on compress give? Both will have 20 Hard disks (2 rpool, 2 SDD cache, and 14 data as mirrored pairs, and 2 hot spares) non- dedup. 16 x 2 TB giving 14 TB file system space ( 2 spares) 2 x 80 GB SSD cache 16 GB RAM (2 GB for system, 14GB for ZFS, is this fine for non dedup?) dedup ( I am getting a 2.7 ratio at the moment on the secondary backup) 14 x 1 TB giving 6 TB of file system space ( dedup of 2.3 and 2 spare slots for upgrade) 2 x 160 GB SSD cache 64 GB RAM (2GB system, 6GB ZFS, 48 DDT, yes, I know I can't seperate ZFS and DDT.) The second system will be more upgradeable/future proof, but do people think the performance would be similar? Thanks John -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
Thanks!! I will go with raidz2; it seems to be the best choice for me. The data fault tolerance vs. amount of space trade-off suits me. Thanks all!!!
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool import problem
Hello All,

I have a problem with the import of pools. On the source system the pools are configured with emcpower devices on slice 2 (emcpower1c):

zpool create mypool emcpower1c

When I try to do an import on another host with mpxio enabled, I get this result:

  pool: ora_system.2
    id: 9755850482304172097
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        ora_system.2                               UNAVAIL  insufficient replicas
          c3t60060160C9AC1C0088A2B6770331DF11d0s0  UNAVAIL  corrupted data

Sometimes it tries to import from slice 2, but many times it picks another slice. I can work around this by linking the right slice (2) into a directory and importing with zpool import -d $dir.

Can you explain this behaviour?

Rgds, Mark.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
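Spelling out the workaround described above (directory and device names are illustrative only):

  # expose only the slice-2 emcpower device to zpool import
  mkdir /var/tmp/zpooldev
  ln -s /dev/dsk/emcpower1c /var/tmp/zpooldev/emcpower1c
  zpool import -d /var/tmp/zpooldev ora_system.2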
Re: [zfs-discuss] ZFS/OSOL/Firewire...
Funny, I thought the same thing up until a couple of years ago, when I thought Apple should have bought Sun :-)

Regards,
Erik Ableson
+33.6.80.83.58.28
Sent from my iPhone

On 19 March 2010, at 09:41, Khyron wrote:
> Of course, I'm the only person I know who said that Sun should have bought Apple 10 years ago. What do I know?
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
On Fri, Mar 19, 2010 at 12:59:39AM -0700, homerun wrote: > Thanks for comments > > So possible choises are : > > 1) 2 2-way mirros > 2) 4 disks raidz2 > > BTW , can raidz have spare ? so is there one posible choise more : > 3 disks raidz with 1 spare ? raidz2 is basically this, with a pre-silvered spare. With an unsilvered spare, you have no redundancy until the resilver completes, and if there are latent errors in the remaining non-redundant disks you may lose data. Other choices: - 4way raidz3 - 4way mirror Same space and fault tolerance, different performance. This is an easier choice, closer (but still not completely) to the nonsensical. Another choice again: - 2 separate pools, each a 2-disk mirror Data in one pool, backed up regularly by snapshot replication to the second. Same space as a 4-way mirror, but this has tolerance to some other kinds of problems that a single pool does not. Better still would be a backup pool in another machine/site. Perhaps the disks you are replacing can go to this purpose? -- Dan. pgpagv9YCTq4k.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
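A minimal sketch of the two-pool, replicate-by-snapshot idea (pool and snapshot names are invented):

  # take a recursive snapshot of the data pool and copy it to the backup pool
  zfs snapshot -r data@2010-03-19
  zfs send -R data@2010-03-19 | zfs receive -Fd backup

  # later, send only the changes since the previous snapshot
  zfs send -R -i data@2010-03-12 data@2010-03-19 | zfs receive -Fd backup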
Re: [zfs-discuss] ZFS/OSOL/Firewire...
I'm also a Mac user. I use Mozy instead of DropBox, but it sounds like DropBox should get a place at the table. I'm about to download it in a few minutes. I'm right now re-cloning my internal HD due to some HFS+ weirdness. I have to completely agree that ZFS would be a great addition to MacOS X, and the best imaginable replacement for HFS+. The file system and associated problems are my only complaint with the entire OS. I guess my browser usage pattern is just too much for HFS+. Of course, I'm the only person I know who said that Sun should have bought Apple 10 years ago. What do I know? Getting better FireWire performance on OpenSolaris would be nice though. Darwin drivers are open...hmmm. On Thu, Mar 18, 2010 at 18:19, David Magda wrote: > On Mar 18, 2010, at 14:23, Bob Friesenhahn wrote: > > On Thu, 18 Mar 2010, erik.ableson wrote: >> >>> >>> Ditto on the Linux front. I was hoping that Solaris would be the >>> exception, but no luck. I wonder if Apple wouldn't mind lending one of the >>> driver engineers to OpenSolaris for a few months... >>> >> >> Perhaps the issue is the filesystem rather than the drivers. Apple users >> have different expectations regarding data loss than Solaris and Linux users >> do. >> > > Apple users (of which I am one) expect things to Just Work. :) > > And there are Apple users and Apple users: > > http://daringfireball.net/2010/03/ode_to_diskwarrior_superduper_dropbox > > If anyone Apple is paying attention, perhaps you could re-open discussions > with now-Oracle about getting ZFS into Mac OS. :) > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- "You can choose your friends, you can choose the deals." - Equity Private "If Linux is faster, it's a Solaris bug." - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
Thanks for the comments.

So the possible choices are:

1) 2 2-way mirrors
2) 4-disk raidz2

BTW, can raidz have a spare? So is there one more possible choice: a 3-disk raidz with 1 spare?

Here I prefer data availability over performance. And if I need to expand or change the setup at some point, that is a problem for that time.
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
On Fri, Mar 19, 2010 at 06:34:50PM +1100, taemun wrote:
> A pool with a 4-wide raidz2 is a completely nonsensical idea.

No, it's not - not completely.

> It has the same amount of accessible storage as two striped mirrors. And would be slower in terms of IOPS, and be harder to upgrade in the future

All that is true. If those things weren't as important to you as error recovery, raidz2 makes fine sense: a 4-way raidz2 can tolerate the loss of any 2 disks. The mirror pool may die with the loss of the wrong 2 disks.

> Just my $0.02

Cost and benefit valuation are left to the user according to their circumstances.
-- 
Dan.
pgpl2dPbBOdFY.pgp Description: PGP signature
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
On Fri, Mar 19, 2010 at 2:34 PM, taemun wrote:
> A pool with a 4-wide raidz2 is a completely nonsensical idea. It has the same amount of accessible storage as two striped mirrors. And would be slower in terms of IOPS, and be harder to upgrade in the future (you'd need to keep adding four drives for every expansion with raidz2 - with mirrors you only need to add another two drives to the pool).
> Just my $0.02

But it can survive the failure of any 2 disks in the pool.

In a striped mirror:
  mirror1  diskA diskB
  mirror2  diskC diskD
In the event that diskA and diskB (or diskC and diskD) fail together, the entire pool is lost.

In raidz2:
  raidz2-1  diskA diskB diskC diskD
Any combination of 2 disks can fail at the same time and the pool will still be intact.
-- 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q : recommendations for zpool configuration
A pool with a 4-wide raidz2 is a completely nonsensical idea. It has the same amount of accessible storage as two striped mirrors. And would be slower in terms of IOPS, and be harder to upgrade in the future (you'd need to keep adding four drives for every expansion with raidz2 - with mirrors you only need to add another two drives to the pool). Just my $0.02 On 19 March 2010 18:28, homerun wrote: > Greetings > > I would like to get your recommendation how setup new pool. > > I have 4 new 1.5TB disks reserved to new zpool. > I planned to crow/replace existing small 4 disks ( raidz ) setup with new > bigger one. > > As new pool will be bigger and will have more personally important data to > be stored long time, i like to ask your recommendations should i create > recreate pool or just replace existing devices. > > I have noted there is now raidz2 and been thinking witch woul be better. > A pool with 2 mirrors or one pool with 4 disks raidz2 > > So at least could some explain these new raidz configurations > > Thanks > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Ahhh, this has been...interesting...some real "personalities" involved in this discussion. :p The following is long-ish but I thought a re-cap was in order. I'm sure we'll never finish this discussion, but I want to at least have a new plateau or base from which to consider these questions. I've just read through EVERY post to this thread, so I want to recap the best points in the vein of the original thread, and set a new base for continuing the conversation. Personally, I'm less interested in the archival case; rather, I'm looking for the best way to either recover from a complete system failure or recover an individual file or file set from some backup media, most likely tape. Now let's put all of this together, along with some definitions. First, the difference between archival storage (to tape or other) and backup. I think the best definition provided in this thread came from Darren Moffat as well. As Carsten Aulbert mentioned, this discussion is fairly useless until we start using the same terminology to describe a set of actions. For this discussion, I am defining archival as taking the data and placing it on some media - likely tape, but not necessarily - in the simplest format possible that could hopefully be read by another device in the future. This could exclude capturing NTFS/NFSv4/ZFS ACLs, Solaris extended attributes, or zpool properties (aka metadata for purposes of this discussion). With an archive, we may not go back and touch the data for a long time, if ever again. Backup, OTOH, is the act of making a perfect copy of the data to some media (in my interest tape, but again, not necessarily) which includes all of the metadata associated with that data. Such a copy would allow perfect re-creation of the data in a new environment, recovery from a complete system failure, or single file (or file set) recovery. With a backup, we have the expectation that we may need to return to it shortly after it is created, so we have to be able to trust it...now. Data restored from this backup needs to be an exact replica of the original source - ZFS pool and dataset properties, extended attributes, and ZFS ACLs included. Now that I hopefully have common definitions for this conversation (and I hope I captured Darren's meaning accurately), I'll divide this into 2 sections, starting with NDMP. NDMP: For those who are unaware (and to clarify my own understanding), I'll take a moment to describe NDMP. NDMP was invented by NetApp to allow direct backup of their Filers to tape backup servers, and eventually onto tape. It is designed to remove the need for indirect backup by backing up the NFS or CIFS shared file systems on the clients. Instead, we backup the shared file systems directly from the Filer (or other file server - say Fishworks box or OpenSolaris server) to the backup server via the network. We avoid multiple copies of the shared file systems. NDMP is a network-based delivery mechanism to get data from a storage server to a backup server, which is why the backup software must also speak NDMP. Hopefully, my description is mostly accurate, and it is clear why this might be useful for people using (Open)Solaris + ZFS for tape backup or archival purposes. Darren Moffat made the point that NDMP could be used to do the tape splitting, but I'm not sure this is accurate. If "zfs send" from a file server running (Open)Solaris to a tape drive over NDMP is viable -- which it appears to be to me -- then the tape splitting would be handled by the tape backup application. 
In my world, that's typically NetBackup or some similar enterprise offering. I see no reason why it couldn't be Amanda or Bacula or Arkeia or something else. THIS is why I am looking for faster progress on NDMP. Now, NDMP doesn't do you much good for a locally attached tape drive, as Darren and Svein pointed out. However, provided the software which is installed on this fictional server can talk to the tape in an appropriate way, then all you have to do is pipe "zfs send" into it. Right? What did I miss? ZVOLs and NTFS/NFSv4/ZFS ACLs: The answer is "zfs send" to both of my questions about ZVOLs and ACLs. At the center of all of this attention is "zfs send". As Darren Moffat pointed out, it has all the pieces to do a proper, complete and correct backup. The big remaining issue that I see is how do you place a "zfs send" stream on a tape in a reliable fashion. CR 6936195 would seem to handle one complaint from Svein, Miles Nordin and others about reliability of the send stream on the tape. Again, I think NDMP may help answer this question for file servers without attached tape devices. For those with attached tape devices, what's the equivalent answer? Who is doing this, and how? I believe we've seen Ed Harvey say "NetBackup" and Ian Collins say "NetVault". Do these products capture all the metadata required to call this copy a "backup"? That's my next question. Finally, Damon Atkins said: "But
[zfs-discuss] Q : recommendations for zpool configuration
Greetings

I would like to get your recommendations on how to set up a new pool.

I have 4 new 1.5 TB disks reserved for the new zpool. I planned to grow/replace the existing small 4-disk (raidz) setup with the new, bigger one.

As the new pool will be bigger and will hold more personally important data to be stored for a long time, I would like to ask for your recommendations: should I recreate the pool or just replace the existing devices?

I have noted there is now raidz2 and have been thinking which would be better: a pool with 2 mirrors, or one pool with 4 disks in raidz2.

So, at least, could someone explain these new raidz configurations?

Thanks
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss