Re: [zfs-discuss] pool layout vs resilver times
On 1/5/13 11:42 AM, "Russ Poyner" wrote:
>I'm configuring a box with 24x 3TB consumer SATA drives
>The box is a supermicro with 36 bays controlled through a single LSI
>9211-8i.

My recollection is that it's far from best practice to have SATA drives connected to a SAS expander; better to either use SAS drives or use one of the Supermicro chassis designs that doesn't use expanders in its backplanes and control the drives with multiple LSI cards. If you've already purchased the configuration as described, you may be a little stuck with it, but my understanding is that the combination of SATA drives and SAS expanders is a large economy-sized bucket of pain.
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Sonnet Tempo SSD supported?
On 12/3/12 5:28 PM, "Peter Tripp" wrote:
>This product only makes sense if you're trying to run OpenIndiana on a
>Mac Pro, which in my experience is more trouble than it's worth, but to
>each their own I guess.

I could make a case for it in some other environments. Say you're using a SuperMicro 4U chassis with 24x 3.5" drives split into two zpools, and you'd like to use SSDs for L2ARC and ZIL. If you mirror each ZIL and use single drives for each L2ARC, that's 6 drive bays you'd be sacrificing-- or you could use 3 PCI slots, which might be available depending on your configuration, and combine nearline SAS hard drives (to play nicely with SAS expanders) with SATA SSDs (because SAS SSDs are painfully expensive). Obviously this all depends on the controller in use on the cards-- I'll probably be getting one to play with in the Jan-Feb timeframe, but as of now I have no knowledge of that subject.
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Zvol vs zfs send/zfs receive
On 9/16/12 10:40 AM, "Richard Elling" wrote:
>With a zvol of 8K blocksize, 4K sector disks, and raidz you will get 12K (data
>plus parity) written for every block, regardless of how many disks are in
>the set.
>There will also be some metadata overhead, but I don't know of a metadata
>sizing formula for the general case.
>
>So the bad news is, 4K sector disks with small blocksize zvols tend to
>have space utilization more like mirroring. The good news is that performance
>is also more like mirroring.
> -- richard

Ok, that makes sense. And since there's no way to change the blocksize of a zvol after creation (AFAIK), I can either live with the size, find 3TB drives with 512-byte sectors (I think Seagate Constellations would work) and do yet another send/receive, or create a new zvol with a larger blocksize and copy the files from one zvol to the other. (Leaning toward option 3 because the files are mostly largish graphics files and the like.) Thanks for the help!
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
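A minimal sketch of option 3, assuming a new volume name of RichRAID64k (hypothetical) and an illustrative volblocksize-- the property can only be set at creation time:

# create a replacement zvol with a larger block size
zfs create -o volblocksize=64K -V 5T archive1/RichRAID64k

# publish it as a second COMSTAR LU alongside the old one,
# then copy file-by-file from the Mac side
sbdadm create-lu /dev/zvol/rdsk/archive1/RichRAID64k

# once the copy checks out, reclaim the space
zfs destroy archive1/RichRAID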
Re: [zfs-discuss] Zvol vs zfs send/zfs receive
> The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
> That... doesn't look right. (Comparing zfs list -t snapshot and looking at
> the 5.34 ref for the snapshot vs zfs list on the new system and looking at
> space used.)
>
> Is this a problem? Should I be panicking yet?

Well, the zfs send/receive finally finished, at a size of 9.56TB:

root@archive:/home/admin# zfs get all archive1/RichRAID
NAME               PROPERTY              VALUE                  SOURCE
archive1/RichRAID  type                  volume                 -
archive1/RichRAID  creation              Fri Sep 14  4:17 2012  -
archive1/RichRAID  used                  9.56T                  -
archive1/RichRAID  available             1.10T                  -
archive1/RichRAID  referenced            9.56T                  -
archive1/RichRAID  compressratio         1.00x                  -
archive1/RichRAID  reservation           none                   default
archive1/RichRAID  volsize               5.08T                  local
archive1/RichRAID  volblocksize          8K                     -
archive1/RichRAID  checksum              on                     default
archive1/RichRAID  compression           off                    default
archive1/RichRAID  readonly              off                    default
archive1/RichRAID  copies                1                      default
archive1/RichRAID  refreservation        none                   default
archive1/RichRAID  primarycache          all                    default
archive1/RichRAID  secondarycache        all                    default
archive1/RichRAID  usedbysnapshots       0                      -
archive1/RichRAID  usedbydataset         9.56T                  -
archive1/RichRAID  usedbychildren        0                      -
archive1/RichRAID  usedbyrefreservation  0                      -
archive1/RichRAID  logbias               latency                default
archive1/RichRAID  dedup                 off                    default
archive1/RichRAID  mlslabel              none                   default
archive1/RichRAID  sync                  standard               default
archive1/RichRAID  refcompressratio      1.00x                  -
archive1/RichRAID  written               9.56T                  -

So used is 9.56TB and volsize is 5.08TB (the logical size of the volume). The Mac connected to the FC target sees a 5.6TB volume with 5.1TB used, so that part makes sense-- but where did the other 4TB go? (I'm about at the point where I'm just going to create and export another volume on a second zpool and then let the Mac copy from one zvol to the other-- this is starting to feel like voodoo here.)
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
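Since the destination pool is an ashift=12 raidz2, Richard's arithmetic from later in this thread accounts for most of the gap; a back-of-the-envelope check, with all figures illustrative:

# each 8K volblock = two 4K data sectors + two 4K parity sectors (raidz2)
# => roughly 16K on disk per 8K of logical data, a ~2x inflation
$ echo '5.08 * 16 / 8' | bc -l
10.16
# ~10T expected, in the neighborhood of the 9.56T actually reported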
[zfs-discuss] Zvol vs zfs send/zfs receive
I need a bit of a sanity check here.

1) I have a RAIDZ2 of 8 1TB drives, so 6TB usable, running on an ancient version of OpenSolaris (snv_134 I think). On that zpool (miniraid) I have a zvol (RichRAID) that's using almost the whole FS. It's shared out via COMSTAR Fibre Channel target mode. I'd like to move that zvol to a newer server with a larger zpool. Sounds like a job for ZFS send/receive, right?

2) Since ZFS send/receive is snapshot-based, I need to create a snapshot. Unfortunately I did not realize that zvols require disk space sufficient to duplicate the zvol, and my zpool wasn't big enough. After a false start (zpool add is dangerous when low on sleep) I added a 250GB mirror and a pair of 3TB drives as a mirror to miniraid and was able to successfully snapshot the zvol: miniraid/RichRAID@exportable. (I ended up booting off an OI 151a5 USB stick to make that work, since I don't believe snv_134 could handle a 3TB disk.)

3) Now it's easy, right? I enabled root login via SSH on the new host, which is running a zpool "archive1" consisting of a single RAIDZ2 of 3TB drives using ashift=12, and did a ZFS send:

zfs send miniraid/RichRAID@exportable | ssh root@newhost zfs receive archive1/RichRAID

It asked for the root password, I gave it that password, and it was off and running. GigE ain't super fast, but I've got time.

The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB. That... doesn't look right. (Comparing zfs list -t snapshot and looking at the 5.34 ref for the snapshot vs zfs list on the new system and looking at space used.)

Is this a problem? Should I be panicking yet?
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
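For long transfers like this, a sketch of one way to keep an eye on progress-- pv is a third-party tool and may not be installed on either host, so treat that part as an assumption:

zfs snapshot miniraid/RichRAID@exportable
# pv shows bytes transferred and throughput as the stream goes past
zfs send miniraid/RichRAID@exportable | pv | \
    ssh root@newhost zfs receive archive1/RichRAID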
[zfs-discuss] Question on 4k sectors
Hi. Is the problem with ZFS supporting 4K sectors, or is the problem mixing 512-byte and 4K sector disks in one pool, or something else? I have seen a lot of discussion on the 4K issue but I haven't understood what the actual problem ZFS has with 4K sectors is. It's getting harder and harder to find large disks with 512-byte sectors, so what should we do? TIA...
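For anyone following along, one way to see what sector size ZFS actually chose for a pool-- a sketch, with "tank" as a placeholder pool name:

# ashift is recorded per top-level vdev: 9 means 512-byte sectors, 12 means 4K
zdb -C tank | grep ashift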
[zfs-discuss] Zombie damaged zpool won't die
In the beginning, I created a mirror named DumpFiles on FreeBSD. Later, I decided to move those drives to a new Solaris 11 server-- but rather than import the old pool I'd create a new pool. And I liked the DumpFiles name, so I stuck with it. Oops. Now whenever I run zpool import, it shows a faulted zpool that I can't import and can't delete:

root@backbone:/home/dpooser# zpool import
  pool: DumpFiles
    id: 16375225052759912554
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        DumpFiles                    FAULTED  corrupted data
          mirror-0                   ONLINE
            c8t5000C5001B03A749d0p0  ONLINE
            c9t5000C5001B062211d0p0  ONLINE

I deleted the new DumpFiles pool; no change. The -f flag doesn't help with the import, and I've deleted the zpool.cache and rebooted without any luck. Any suggestions appreciated-- there is no data on those drives that I'm worried about, but I'd like to get rid of that error.
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
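Since the data is expendable, one blunt option is to wipe the ZFS labels from both former mirror members so import stops seeing them. This is a sketch only-- triple-check the device names before pointing dd at anything:

# ZFS writes two 256K labels at the front of each device (and two at the end);
# zeroing the first 512K kills labels L0 and L1
dd if=/dev/zero of=/dev/rdsk/c8t5000C5001B03A749d0p0 bs=512k count=1
dd if=/dev/zero of=/dev/rdsk/c9t5000C5001B062211d0p0 bs=512k count=1
# labels L2/L3 live in the last 512K of the device; seek to the end
# (or simply repartition/re-label the disks) to clear those as well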
[zfs-discuss] 4k sector support in Solaris 11?
If I want to use a batch of new Seagate 3TB Barracudas with Solaris 11, will zpool let me create a new pool with ashift=12 out of the box, or will I need to play around with a patched zpool binary (or the iSCSI loopback)?
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] ZFS on Dell with FreeBSD
On 10/19/11 9:14 AM, "Albert Shih" wrote:
>When we buy a MD1200 we need a RAID PERC H800 card on the server

No, you need a card that includes 2 external x4 SFF-8088 SAS connectors. I'd recommend an LSI SAS 9200-8e HBA flashed with the IT firmware-- then it presents the individual disks, and ZFS can handle redundancy and recovery.
--
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com
[zfs-discuss] ZIL, L2ARC, rpool -- partitions and mirrors oh my!
Putting together a server for a friend's recording studio. He's planning to do audio editing off the server, so low latency is a big deal. My plan is to create a pool of two 8-drive RAIDZ2 vdevs and then accelerate them... But how?

OS is going to be the latest OpenIndiana. I have a pair of 40GB SSDs (Crucial) with good write speeds and a pair of 64GB SSDs with good read speeds. I'd like to mirror the root pool. My initial thought was to mirror the 40GB SSDs for the ZIL and partition the two 64s: mirror two slices for the rpool and use the other two slices for the L2ARC. If there's a smarter way to do it, suggestions gratefully accepted. My current ZFS storage servers are all built around sustained reads/sustained writes, so tuning the ZIL and L2ARC are still outside my experience.
--
Dave Pooser
Manager of Information Services
Alford Media Services, Inc.
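In zpool terms the plan above would look roughly like this-- device names and slice numbers are hypothetical, and the rpool mirror itself would be set up at install time:

# mirrored slog from the two 40GB SSDs
zpool add tank log mirror c4t0d0 c4t1d0
# L2ARC needs no redundancy, so the 64GB SSDs' second slices
# go in as independent cache devices
zpool add tank cache c4t2d0s1 c4t3d0s1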
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
Edward Ned Harvey wrote:
> Well ...
> Slice all 4 drives into 13G and 60G.
> Use a mirror of 13G for the rpool.
> Use 4x 60G in some way (raidz, or stripe of mirrors) for tank
> Use a mirror of 13G appended to tank

Hi Edward! Thanks for your post. I think I understand what you are saying, but I don't know how to actually do most of that. If I am going to make a new install of Solaris 10, does it give me the option to slice and dice my disks and to issue zpool commands? Until now I have only used Solaris on Intel boxes, using both complete drives as a mirror. Can you please tell me the steps to carry out your suggestion? I imagine I can slice the drives in the installer and then set up a 4-way root mirror (stupid, but as you say, not much choice) on the 13G section. Or maybe one root mirror on two slices, and then have 13G aux storage left to mirror for something like /var/spool? What would you recommend?

I didn't understand what you suggested about appending a 13G mirror to tank. Would that be something like RAID10 without actually being RAID10, so I could still boot from it? How would the system use it? In this setup the installer will put everything on the root mirror, so I will have to move things around later? Like /var and /usr or whatever I don't want on the root mirror? And then I just make a RAID10 like Jim was saying with the other 4x 60G slices? How should I move mountpoints that aren't separate ZFS filesystems?

> The only conclusion you can draw from that is: First take it as a given
> that you can't boot from a raidz volume. Given, you must have one mirror.

Thanks, I will keep it in mind.

> Then you raidz all the remaining space that's capable of being put into a
> raidz... And what you have left is a pair of unused space, equal to the
> size of your boot volume. You either waste that space, or you mirror it
> and put it into your tank.

So RAID10 sounds like the only reasonable choice since there are an even number of slices-- I mean, is RAIDZ1 even possible with 4 slices?
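In case it helps, a rough sketch of the layout Edward is describing, with hypothetical disk and slice names (the 13G root mirror itself would be created by the installer):

# tank: stripe of mirrors across the 60G slices (s1) of all four disks
zpool create tank mirror c0t0d0s1 c0t1d0s1 mirror c0t2d0s1 c0t3d0s1
# "append the leftover 13G mirror": the 13G slices (s0) of the two disks
# NOT used by the root mirror become one more mirrored vdev in tank
zpool add tank mirror c0t2d0s0 c0t3d0s0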
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
Hello!

> I don't see the problem. Install the OS onto a mirrored partition, and
> configure all the remaining storage however you like - raid or mirror or
> whatever.

I didn't understand your point of view until I read the next paragraph.

> My personal preference, assuming 4 disks, since the OS is mostly reads and
> only a little bit of writes, is to create a 4-way mirrored 100G partition
> for the OS, and the remaining 900G of each disk (or whatever) becomes
> either a stripe of mirrors or raidz, as appropriate in your case, for the
> storage pool.

Oh, you are talking about 1T drives and my servers are all 4x 73G! So it's a fairly big deal, since I have little storage to waste and still want to be able to survive losing one drive. I should have given the numbers at the beginning, sorry. Given this meager storage, do you have any suggestions? Thank you.
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
Hello Jim! I understood ZFS doesn't like slices, but from your reply maybe I should reconsider. I have a few older servers with 4 bays x 73G. If I make a root mirror pool and swap on the other 2 as you suggest, then I would have about 63G x 4 left over. If so, then I am back to wondering what to do about 4 drives. Is raidz1 worthwhile in this scenario? That is less redundancy than a mirror and much less than a 3-way mirror, isn't it? Is it even possible to do raidz2 on 4 slices? Or would two 2-way mirrors be better? I don't understand what RAID10 is-- is it simply a stripe of two mirrors? Or would it be best to do a 3-way mirror and a hot spare? I would like to be able to tolerate losing one drive without loss of integrity.

I will be doing new installs of Solaris 10. Is there an option in the installer for me to issue ZFS commands and set up pools, or do I need to format the disks before installing, and if so, how do I do that? Thank you.
Re: [zfs-discuss] Is another drive worth anything? [Summary]
Many thanks to all who responded. I learned a lot from this thread! For now I have decided to make a 3-way mirror because of the read performance. I don't want to take a risk on an unmirrored drive. Instead of replying to everyone separately I am following the Sun Managers convention, since I read that newsgroup occasionally also. Here's a summary of the responses.

Jim Klimov wrote:
> Well, you can use this drive as a separate "scratch area", as a separate
> single-disk pool, without redundancy. You'd have a separate spindle for
> some dedicated tasks with data you're okay with losing.

I thought about that, and I really don't like losing data. I also don't generate much temporary data, so I love ZFS because it makes mirroring easy. On my other systems where I don't have ZFS I run hourly backups from drive to drive. Consumer drives are pretty good these days, but you never know when one will fail. I had a failure recently on a Linux box, and although I didn't lose data because I back up hourly, it's still annoying to deal with. If I hadn't had another good drive with that data on it I would have lost critical data.

> You can also make the rpool a three-way mirror which may increase read
> speeds if you have enough concurrency. And when one drive breaks, your
> rpool is still mirrored.

I think that's the best suggestion. I didn't realize a 3-way mirror would help performance, but you and several others said it does, so that's what I will do. Thanks for the suggestions, Jim.

Roy pointed out a theoretical 50% read increase when adding the third drive. Thanks Roy!

Edward Ned Harvey wrote:
> In my benchmarking, I found 2-way mirror reads 1.97x the speed of a single
> disk, and a 3-way mirror reads 2.91x a single disk.

Always great having hard data to base a decision on! That helped me make my decision! Thanks Edward!

Jim Klimov answered a question that came up based on comments that read performance was improved in a three-way mirror:
> Writes in a mirror are deemed to be not faster than the slowest disk - all
> two or three drives must commit a block before it is considered written
> (in sync write mode), likewise for TXG sync but with some optimization by
> caching and write-coalescing.

Thanks Jim! Good to know.

Edward Ned Harvey pointed out: "If you make it a 3-way mirror, your write performance will be unaffected, but your read performance will increase 50% over a 2-way mirror. All 3 drives can read different data simultaneously for the net effect of 3x a single disk read performance."

Bob clarified the theoretical benefit of adding a third drive to a mirror by saying: "I think that a read performance increase of (at most) 33.3% is more correct. You might obtain (at most) 50% over one disk by mirroring it. Zfs makes a random selection of which disk to read from in a mirror set so the improvement is not truly linear."

Thanks guys, that makes sense.

Daniel Carosone suggested keeping the extra drive around in case of a failure and in the meantime using an SSD in the 3rd SATA slot. He pointed out a few other options that could help with performance besides creating a 3-way mirror when he wrote:
> Namely, leave the third drive on the shelf as a cold spare, and use the
> third sata connector for an ssd, as L2ARC, ZIL or even possibly both
> (which will affect selection of which device to use).

That's not an option for me right now, but I am planning to revisit SSD again when the consumer drives are reliable enough and don't have wear issues.
Right now overall integrity and long service life are more important than absolute performance on this box, although since I have the integrity with the ZFS mirror I could add an SSD. But I really don't want to deal with another failure as long as I don't have to. I do want additional performance if I can afford it, but not at the expense of possible data loss.

Daniel also wrote:
> L2ARC is likely to improve read latency (on average) even more than a
> third submirror. ZIL will be unmirrored, but may improve writes at an
> acceptable risk for development system. If this risk is acceptable, you
> may wish to consider whether setting sync=disabled is also acceptable at
> least for certain datasets.

I don't know what L2ARC is, but I'll take a look on the net. I did hear about ZIL but don't understand it fully, but I figured spending 500G on ZIL would be unwise. By that I mean I understand ZIL doesn't require much storage, but since I can't add a drive or slice with less storage than the other drives in a mirror to that mirror, I would be forced to waste a lot of storage to implement ZIL.

> Finally, if you're considering spending money, can you increase the RAM
> instead? If so, do that first.

This mobo is maxed out at 4G; it's a socket 775 I bought a couple of years ago. I have always seen the benefits of more RAM, and I agree with you it helps more than people generally believe. Next time I buy a new box I am hoping to get more RAM.
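For the archives, growing a 2-way mirror to a 3-way one is a single attach; device names here are hypothetical:

# attach a third disk to an existing member of the mirror
zpool attach rpool c0t1d0s0 c0t2d0s0
# for a root pool on x86, make the new disk bootable too
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t2d0s0
# then watch the resilver finish before relying on the new side
zpool status rpool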
Re: [zfs-discuss] Good SLOG devices?
On 3/2/11 9:42 AM, "David Dyer-Bennet" wrote:
>Says "call for price". I know what that means; it means "If you have to
>ask, you can't afford it."

I called. It's $3k-- not a fit for my archive servers, but an interesting idea for a database server I'm building. Probably not a great product for the home hobbyist, though. :^)
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/28/11 4:23 PM, "Garrett D'Amore" wrote:
>Drives are ordered in the order they are *enumerated* when they *first*
>show up in the system. *Ever*.

Is the same true of controllers? That is, will c12 remain c12, or /pci@0,0/pci8086,340c@5 remain /pci@0,0/pci8086,340c@5, even if other controllers are active?
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/27/11 11:13 PM, "James C. McPherson" wrote:
>/pci@0,0/pci8086,340c@5/pci1000,3020@0
>and
>/pci@0,0/pci8086,340e@7/pci1000,3020@0
>
>which are in different slots on your motherboard and connected to
>different PCI Express Root Ports - which should help with transfer
>rates amongst other things. Have a look at /usr/share/hwdata/pci.ids
>for 340[0-9a-f] after the line which starts with 8086.

That's the information I needed; I now have the drives allocated across multiple controllers for the fault tolerance I was looking for. Thanks for all your help-- not only can I fully, unequivocally retract my "failed bit" crack, but I just ordered two more of these cards for my next project! :^)
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/27/11 10:06 PM, "James C. McPherson" wrote:
>I've arranged these by devinfo path:
>
>1st controller
>
>c10t2d0
>/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
>c15t5000CCA222E006B6d0
>/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@8/disk@w5000cca222e006b6,0
>c13t5000CCA222DF92A0d0
>/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0
>c12t5000CCA222E0533Fd0
>/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@20/disk@w5000cca222e0533f,0
>
>The most likely reason why you're seeing a c10t2d0 is because the
>disk is failing to respond in the required fashion for a particular
>SCSI INQUIRY command when the disk is attached to the system.

That's an inexpensive SSD used as the boot disk, so it's different enough from the other devices that I can't say I'm stunned it behaves differently.

>2nd controller
>c16t5000CCA222DDD7BAd0
>/pci@0,0/pci8086,340c@5/pci1000,3020@0/iport@2/disk@w5000cca222ddd7ba,0
>3rd controller
>c14t5000CCA222DF8FBEd0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0
>c18t5000CCA222DEAFE6d0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@2/disk@w5000cca222deafe6,0
>c19t5000CCA222E0A3DEd0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@4/disk@w5000cca222e0a3de,0
>c20t5000CCA222E046B7d0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@8/disk@w5000cca222e046b7,0
>c17t5000CCA222DF3CECd0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@20/disk@w5000cca222df3cec,0

So I mentioned I'm dense tonight, right? Is the key the part that says 340x@y, so each controller will have a different letter associated with it and a different number after the @? (That is, presumably in this system there's a 340b@4 and a 340d@6 if I add more drives and try 'format' again?)

>>I'd like to revise and extend my remarks and replace that with "a
>>suboptimal choice for this project."
>Not knowing your other requirements for the project, I'll settle
>for this version :)

Actually at this point I think I have to re-revise it to "just fine for this project, had I brains enough to comprehend the output of 'format'." :^)
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/27/11 4:07 PM, "James C. McPherson" wrote:
>I misread your initial email, sorry.

No worries-- I probably could have written it more clearly.

>So your disks are connected to separate PHYs on the HBA, by virtue
>of their cabling. You can see this for yourself by looking at the
>iport@xx element in the physical paths:
>
>1. c13t5000CCA222DF92A0d0
>/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0
>
>2. c14t5000CCA222DF8FBEd0
>/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0
>
>The "xx" part is a bitmask, starting from 0, which gives you an
>indication of which PHY the device is attached to.
>
>Your disk #1 above is connected to iport@10, which is PHY #4 when
>you have x1 ports:
>
>PHY  iport@
>0    1
>1    2
>2    4
>3    8
>4    10
>5    20
>6    40
>7    80

OK, bear with me for a moment because I'm feeling extra dense this evening. The PHY tells me which port on the HBA I'm connected to. What tells me which HBA? That's the information I care most about, and if that information is contained up there I'll do a happy dance and head on in to the office to start building zpools.

>With the information above about the PHY/iport relationship, I
>hope you can now see better what your physical layout is. Also,
>please remember that using MPxIO means you have a single virtual
>controller, and the driver stack handles the translation to physical
>for you so you don't have to worry about that aspect. Of course,
>if you want to worry about it, feel free.

Well, I want to make sure that a single controller failure can't cause any of my RAIDz2 vdevs to fault. I know I can do that manually by building the vdevs in such a way that no more than two drives are on a single controller. If the virtual controller is smart enough to do that automagically-- when I'm using SATA disks and a backplane that doesn't support multipathing-- then I have no complaints and I owe you a beer or three the next time you're in the Dallas area. But that seems unlikely to me, and so I think I have to worry about it. I'd love to be wrong, though!

>Personally, having worked on the mpt_sas(7d) project, I'm disappointed
>that you believe the card and its driver are "a failed bit".

I'd like to revise and extend my remarks and replace that with "a suboptimal choice for this project." In fact, if I can't make this work, my backup plan is to take some of my storage towers that have only one HBA, put the 9211s in them, and grab the LSISAS3081 cards out of those towers for this beast. So those cards will still get productive use-- not a failed bit, at worst just not serving the purpose I had in mind.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/27/11 11:18 AM, "Roy Sigurd Karlsbakk" wrote:
>I cannot but agree. On Linux and Windoze (haven't tested FreeBSD), drives
>connected to an LSI9211 show up in the correct order, but not on
>OI/osol/S11ex (IIRC), and fmtopo doesn't always show a mapping between
>device name and slot, since that relies on the SES hardware being
>properly supported. The answer I've got for this issue is, it's not an
>issue, since it's that way by design etc. This doesn't make sense when
>Linux/Windows show the drives in the correct order. IMHO this looks more
>like a design flaw in the driver code

Especially since the SAS3081 cards work as expected. I guess I'll start looking for some more of the 3Gb SAS controllers and chalk the 9211s up as a failed bit.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Format returning bogus controller info
On 2/27/11 5:15 AM, "James C. McPherson" wrote:
>On 27/02/11 05:24 PM, Dave Pooser wrote:
>>On 2/26/11 7:43 PM, "Bill Sommerfeld" wrote:
>>
>>>On your system, c12 is the mpxio virtual controller; any disk which is
>>>potentially multipath-able (and that includes the SAS drives) will
>>>appear as a child of the virtual controller (rather than appear as the
>>>child of two or more different physical controllers).
>>
>>Hmm... That makes sense, except that my drives are all SATA because I'm
>>cheap^H^H^H fiscally conservative. :^)
>
>They're attached to a SAS hba, which is doing translations for them
>using SATL - SAS to ATA Translation Layer.

Yeah, but they're still not multipathable, are they?

>>'stmsboot -L' displayed no mappings,
>
>this is because mpt_sas(7d) controllers - which you have - are using
>MPxIO by default. Running stmsboot -L will only show mappings if you've
>enabled or disabled MPxIO
>
>>but I went ahead and tried stmsboot
>>-d to disable multipathing;
>
>... and now you have disabled MPxIO, stmsboot -L should show mappings.

Nope:

locadmin@bigdawg2:~# stmsboot -L
stmsboot: MPXIO disabled

>>after reboot instead of seeing nine disks on a
>>single controller I now see ten different controllers (in a machine that
>>has four PCI controllers and one motherboard controller):
>
>This is a side effect of how your expanders are configured to operate
>on your motherboard.

But there shouldn't be any expanders in the system-- the front backplane has six SFF-8087 ports to control 24 drives, and the rear backplane has three more SFF-8087 ports to control 12 more drives. Each of those ports is connected directly to an SFF-8087 port on an LSI 9211-8i controller, except that the ninth port is connected to the integrated LSI 2008 controller on the motherboard.

>If you're lucky, your expanders and the enclosure that they're
>configured into will show up with one or more SES targets. If
>that's the case, you might be able to see bay numbers with the
>fmtopo command - when you run it as root:
>
># /usr/lib/fm/fmd/fmtopo -V
>
>If this doesn't work for you, then you'll have to resort to the
>tried and tested use of dd to /dev/null for each disk, and see
>which lights blink.

I can live with that-- but I really want to know which (real, not virtual) controllers the disks are connected to. I want to build 3 8-disk RAIDz2 vdevs now (with room for a fourth for expansion later), and I really want to make sure each of those vdevs has fewer than three disks per controller, so a single controller failure can degrade my vdevs but not kill them. Probably my next step is going to be to take a look with Nexenta Core or FreeBSD (or maybe SolEx11 for a temporary eval) and see if either of those gives me a saner view, but other suggestions would be appreciated.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
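Once the physical paths pin each disk to an HBA, the layout I'm after would look something like this-- a sketch using six of the disks visible so far, with no more than two or three per vdev from any one controller:

# two disks from the pci8086,340a HBA, one from 340c, three from 340e
zpool create uber1 raidz2 \
    c13t5000CCA222DF92A0d0 c15t5000CCA222E006B6d0 \
    c16t5000CCA222DDD7BAd0 \
    c14t5000CCA222DF8FBEd0 c17t5000CCA222DF3CECd0 c18t5000CCA222DEAFE6d0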
Re: [zfs-discuss] Format returning bogus controller info
On 2/26/11 7:43 PM, "Bill Sommerfeld" wrote:
>On your system, c12 is the mpxio virtual controller; any disk which is
>potentially multipath-able (and that includes the SAS drives) will
>appear as a child of the virtual controller (rather than appear as the
>child of two or more different physical controllers).

Hmm... That makes sense, except that my drives are all SATA because I'm cheap^H^H^H fiscally conservative. :^) 'stmsboot -L' displayed no mappings, but I went ahead and tried stmsboot -d to disable multipathing; after reboot, instead of seeing nine disks on a single controller I now see ten different controllers (in a machine that has four PCI controllers and one motherboard controller):

locadmin@bigdawg2:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c10t2d0
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
       1. c13t5000CCA222DF92A0d0
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0
       2. c14t5000CCA222DF8FBEd0
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0
       3. c15t5000CCA222E006B6d0
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@8/disk@w5000cca222e006b6,0
       4. c16t5000CCA222DDD7BAd0
          /pci@0,0/pci8086,340c@5/pci1000,3020@0/iport@2/disk@w5000cca222ddd7ba,0
       5. c17t5000CCA222DF3CECd0
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@20/disk@w5000cca222df3cec,0
       6. c18t5000CCA222DEAFE6d0
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@2/disk@w5000cca222deafe6,0
       7. c19t5000CCA222E0A3DEd0
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@4/disk@w5000cca222e0a3de,0
       8. c20t5000CCA222E046B7d0
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@8/disk@w5000cca222e046b7,0
       9. c21t5000CCA222E0533Fd0
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@20/disk@w5000cca222e0533f,0

So now I'm more baffled than when I started. Any other suggestions will be gratefully accepted...
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
[zfs-discuss] Format returning bogus controller info
The hardware: SuperMicro 847A chassis (36 drive bays in 4U)-- the A means there are 9 SFF-8087 ports on the backplanes, each controlling 4 drives; no expanders here. SuperMicro X8DTH-6F motherboard with integrated LSI 2008 SAS chipset, flashed to IT firmware, connected to one backplane port. Four LSI 9211-8i SAS controllers, flashed to IT firmware, each connected to two backplane ports.

The OS: OpenSolaris b134, installed off a USB stick created using the instructions at <http://blogs.sun.com/clayb/entry/creating_opensolaris_usb_sticks_is>

The problem: While trying to add drives one at a time so I can identify them for later use, I noticed two interesting things: the controller information is unlike any I've seen before, and out of nine disks added after the boot drive, all nine are attached to c12-- and no single controller has more than eight ports. The output of format:

locadmin@bigdawg2:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c10t2d0
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
       1. c12t5000CCA222DDD7BAd0
          /scsi_vhci/disk@g5000cca222ddd7ba
       2. c12t5000CCA222DEAFE6d0
          /scsi_vhci/disk@g5000cca222deafe6
       3. c12t5000CCA222DF3CECd0
          /scsi_vhci/disk@g5000cca222df3cec
       4. c12t5000CCA222DF8FBEd0
          /scsi_vhci/disk@g5000cca222df8fbe
       5. c12t5000CCA222DF92A0d0
          /scsi_vhci/disk@g5000cca222df92a0
       6. c12t5000CCA222E0A3DEd0
          /scsi_vhci/disk@g5000cca222e0a3de
       7. c12t5000CCA222E006B6d0
          /scsi_vhci/disk@g5000cca222e006b6
       8. c12t5000CCA222E046B7d0
          /scsi_vhci/disk@g5000cca222e046b7
       9. c12t5000CCA222E0533Fd0
          /scsi_vhci/disk@g5000cca222e0533f

Specify disk (enter its number): ^C

Any suggestions?
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] video files residing on zfs used from cifs fail to work
On 11/21/10 Nov 21, 8:43 PM, "Harry Putnam" wrote:
> When *.mov files reside on a windows host, and assuming your browser
> has the right plugins, you can open them with either quicktime player
> or firefox (which also uses the quicktime player).
>
> But I find if the files are on a zfs server the same files fail to
> play.
>
> Is it a local phenomenon or a common problem?

We don't have that problem, and we have roughly 25TB of QuickTime files on an OpenSolaris box shared over CIFS to mostly Mac clients.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Apparent SAS HBA failure-- now what?
On 11/6/10 Nov 6, 2:35 PM, "Khushil Dep" wrote:
> Similar to what I've seen before: SATA disks in an 846 chassis with hardware
> and transport errors. Though on that occasion it was an E2 chassis with
> interposers. How long has this system been up? Is it production, or can you
> offline it and check that the firmware on all the LSI controllers is up to
> date and matching?

It's been up for about 6 months. I can offline them.

> Do an fmdump -u UUID -V on those faults and get the serial numbers of the
> disks that have failed. Trial and error unless you wrote down which went
> where, I'm afraid.

Here's the thing, though-- I'm really not at all sure it's the disks that failed. The idea that coincidentally I'd have eight of 24 disks report major errors, all at the same time (because I scrub weekly and didn't catch any errors last scrub), all on the same controller-- well, that seems much less likely than the idea that I just have a bad controller that needs replacing.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Apparent SAS HBA failure-- now what?
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t0d0   Soft Errors: 0 Hard Errors: 1 Transport Errors: 8
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 8
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t2d0   Soft Errors: 0 Hard Errors: 2 Transport Errors: 16
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t3d0   Soft Errors: 0 Hard Errors: 3 Transport Errors: 13
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t4d0   Soft Errors: 0 Hard Errors: 2 Transport Errors: 19
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t5d0   Soft Errors: 0 Hard Errors: 1 Transport Errors: 1
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t6d0   Soft Errors: 0 Hard Errors: 2 Transport Errors: 12
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t7d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 9
Vendor: ATA  Product: Hitachi HDS72202  Revision: A20N  Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Apparent SAS HBA failure-- now what?
Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 06 06:33:23 896d10f1-fa11-69bb-ae78-d18a56fd3288 ZFS-8000-HC    Major

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=uberdisk1
              faulted but still in service
Problem in  : zfs://pool=uberdisk1
              faulty

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures.  Refer to http://sun.com/msg/ZFS-8000-HC for more
              information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 06 06:33:30 989d0590-9e27-cd11-cba5-d7dbf7127ce1 ZFS-8000-FD    Major

Fault class : fault.fs.zfs.vdev.io
Affects     : zfs://pool=uberdisk3/vdev=e0209de35309a6f8
              faulted but still in service
Problem in  : zfs://pool=uberdisk3/vdev=e0209de35309a6f8
              faulty

Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 06 06:33:51 a2d736ac-14e9-cbf7-db28-84e25bfd4a3e ZFS-8000-HC    Major

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=uberdisk3
              faulted but still in service
Problem in  : zfs://pool=uberdisk3
              faulty

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures.  Refer to http://sun.com/msg/ZFS-8000-HC for more
              information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
[zfs-discuss] Apparent SAS HBA failure-- now what?
My setup: A SuperMicro 24-drive chassis with Intel dual-processor motherboard, three LSI SAS3081E controllers, and 24 SATA 2TB hard drives, divided into three pools with each pool a single eight-disk RAID-Z2. (Boot is an SSD connected to motherboard SATA.)

This morning I got a cheerful email from my monitoring script: "Zchecker has discovered a problem on bigdawg." The full output is below, but I have one unavailable pool and two degraded pools, with all my problem disks connected to controller c10. I have multiple spare controllers available.

First question-- is there an easy way to identify which controller is c10?

Second question-- what is the best way to handle replacement (of either the bad controller, or of all three controllers if I can't identify the bad controller)? I was thinking that I should be able to shut the server down, remove the controller(s), install the replacement controller(s), check to see that all the drives are visible, run zpool clear for each pool and then do another scrub to verify the problem has been resolved. Does that sound like a good plan?

===
  pool: uberdisk1
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 3h7m, 24.08% done, 9h52m to go
config:

        NAME         STATE     READ WRITE CKSUM
        uberdisk1    UNAVAIL     55     0     0  insufficient replicas
          raidz2     UNAVAIL    112     0     0  insufficient replicas
            c9t0d0   ONLINE       0     0     0
            c9t1d0   ONLINE       0     0     0
            c9t2d0   ONLINE       0     0     0
            c10t0d0  UNAVAIL    433     0     0  experienced I/O failures
            c10t1d0  REMOVED      0     0     0
            c10t2d0  ONLINE      74     0     0
            c11t1d0  ONLINE       0     0     0
            c11t2d0  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list

  pool: uberdisk2
 state: DEGRADED
 scrub: scrub in progress for 3h3m, 32.26% done, 6h24m to go
config:

        NAME         STATE     READ WRITE CKSUM
        uberdisk2    DEGRADED     0     0     0
          raidz2     DEGRADED     0     0     0
            c9t3d0   ONLINE       0     0     0
            c9t4d0   ONLINE       0     0     0
            c9t5d0   ONLINE       0     0     0
            c10t3d0  REMOVED      0     0     0
            c10t4d0  REMOVED      0     0     0
            c11t3d0  ONLINE       0     0     0
            c11t4d0  ONLINE       0     0     0
            c11t5d0  ONLINE       0     0     0

errors: No known data errors

  pool: uberdisk3
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 2h58m, 31.95% done, 6h19m to go
config:

        NAME         STATE     READ WRITE CKSUM
        uberdisk3    DEGRADED     1     0     0
          raidz2     DEGRADED     4     0     0
            c9t6d0   ONLINE       0     0     0
            c9t7d0   ONLINE       0     0     0
            c10t5d0  ONLINE       5     0     0
            c10t6d0  ONLINE     989     4     0
            c10t7d0  REMOVED      0     0     0
            c11t6d0  ONLINE       0     0     0
            c11t7d0  ONLINE       0     0     0
            c11t8d0  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
[zfs-discuss] Changing vdev controller
I have a 14-drive pool, two 7-drive raidz2 vdevs, with L2ARC and slog devices attached. I had a port go bad on one of my controllers (both are SAT2-MV8s), so I need to replace it (I have no spare ports on either card). My spare controller is an LSI 1068-based 8-port card. My plan is to remove the L2ARC and slog from the pool (to try to minimize any glitches), export the pool, change the controller, re-import, and then add the L2ARC and slog back. Is that basically the correct process, or are there any tips for avoiding potential issues? Thanks.
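That matches my understanding of the process. As a sketch, with hypothetical device names (and noting that slog removal requires pool version 19 or later):

zpool remove tank c6t0d0     # cache (L2ARC) devices can always be removed
zpool remove tank c6t1d0     # slog removal needs pool version >= 19
zpool export tank
# ...swap the controller and recable, then...
zpool import tank
zpool add tank log c6t1d0
zpool add tank cache c6t0d0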
Re: [zfs-discuss] ZFS disk space monitoring with SNMP
I just query for the percentage in use via SNMP (net-snmp). In my snmpd.conf I have:

extend .1.3.6.1.4.1.2021.60 drive15 /usr/gnu/bin/sh /opt/utils/zpools.ksh rpool space

and zpools.ksh is:

#!/bin/ksh
export PATH=/usr/bin:/usr/sbin:/sbin
export LD_LIBRARY_PATH=/usr/lib
zpool list -H -o capacity ${1} | sed -e 's/%//g'
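From the monitoring side you'd then poll that subtree; something along these lines, with the host and community string as placeholders:

# the script's output appears under the nsExtendOutput columns of that OID
snmpwalk -v2c -c public storagehost .1.3.6.1.4.1.2021.60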
Re: [zfs-discuss] dedup status
Can you provide some specifics to see how bad the writes are?
Re: [zfs-discuss] Opensolaris is apparently dead
On 8/14/10 Aug 14, 2:57 PM, "Edward Ned Harvey" wrote:
>> Or Btrfs. It may not be ready for production now, but it could become a
>> serious alternative to ZFS in one year's time or so. (I have been using
>
> I will much sooner pay for sol11 instead of use btrfs. Stability & speed &
> maturity greatly outweigh a few hundred dollars a year, if you run your
> business on it.

The flip side is that if Oracle convinces enough people that ZFS is a shrinking market (how long do you think the BSDs will support a proprietary filesystem?), then there will be a lot more interest in the BTRFS project, much of it from the same folks who have experience producing enterprise-grade ZFS. Speaking for myself, if Solaris 11 doesn't include COMSTAR I'm going to have to take a serious look at another alternative for our show storage towers.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
David Dyer-Bennet wrote:
> On Tue, August 10, 2010 13:23, Dave Pacheco wrote:
>> David Dyer-Bennet wrote:
>>> My full backup still doesn't complete. However, instead of hanging the
>>> entire disk subsystem as it did on 111b, it now issues error messages.
>>> Errors at the end.
>>> [...]
>>> cannot receive incremental stream: most recent snapshot of
>>> bup-wrack/fsfs/zp1/ddb does not match incremental source
>>> bash-4.0$
>>>
>>> The bup-wrack pool was newly-created, empty, before this backup started.
>>> The backup commands were:
>>>
>>> zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"
>>>
>>> I don't see how anything could be creating snapshots on bup-wrack while
>>> this was running. That pool is not normally mounted (it's on a single
>>> external USB drive, I plug it in for backups). My script for doing
>>> regular snapshots of zp1 and rpool doesn't reference any of the bup-*
>>> pools. I don't see how this snapshot mismatch can be coming from
>>> anything but the send/receive process.
>>>
>>> There are quite a lot of snapshots; dailys for some months, 2-hour ones
>>> for a couple of weeks. Most of them are empty or tiny.
>>>
>>> Next time I will try WITHOUT -v on both ends, and arrange to capture
>>> the expanded version of the command with all the variables filled in,
>>> but I don't expect any different outcome. Any other ideas?
>>
>> Is it possible that snapshots were renamed on the sending pool during
>> the send operation?
>
> I don't have any scripts that rename a snapshot (in fact I didn't know it
> was possible until just now), and I don't have other users with
> permission to make snapshots (either delegated or by root access). I'm
> not using the Sun auto-snapshot thing, I've got a much-simpler script of
> my own (hence I know what it does). So I don't at the moment see how one
> would be getting renamed.
>
> It's possible that a snapshot was *deleted* on the sending pool during
> the send operation, however. Also that snapshots were created (however, a
> newly created one would be after the one specified in the zfs send -R,
> and hence should be irrelevant). (In fact it's certain that snapshots
> were created and I'm nearly certain of deleted.)
>
> If that turns out to be the problem, that'll be annoying to work around
> (I'm making snapshots every two hours and deleting them after a couple of
> weeks). Locks between admin scripts rarely end well, in my experience.
> But at least I'd know what I had to work around.
>
> Am I looking for too much here? I *thought* I was doing something that
> should be simple and basic and frequently used nearly everywhere, and
> hence certain to work. "What could go wrong?", I thought :-). If I'm
> doing something inherently dicey I can try to find a way to back off; as
> my primary backup process, this needs to be rock-solid.

It's certainly a reasonable thing to do and it should work. There have been a few problems around deleting and renaming snapshots as they're being sent, but the delete issues were fixed in build 123 by having zfs_send hold snapshots being sent (as long as you've upgraded your pool past version 18), and it sounds like you're not doing renames, so your problem may be unrelated.

-- Dave

--
David Pacheco, Sun Microsystems Fishworks.  http://blogs.sun.com/dap/
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
David Dyer-Bennet wrote:
> My full backup still doesn't complete. However, instead of hanging the
> entire disk subsystem as it did on 111b, it now issues error messages.
> Errors at the end.
> [...]
> cannot receive incremental stream: most recent snapshot of
> bup-wrack/fsfs/zp1/ddb does not match incremental source
> bash-4.0$
>
> The bup-wrack pool was newly-created, empty, before this backup started.
> The backup commands were:
>
> zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"
>
> I don't see how anything could be creating snapshots on bup-wrack while
> this was running. That pool is not normally mounted (it's on a single
> external USB drive, I plug it in for backups). My script for doing
> regular snapshots of zp1 and rpool doesn't reference any of the bup-*
> pools. I don't see how this snapshot mismatch can be coming from anything
> but the send/receive process.
>
> There are quite a lot of snapshots; dailys for some months, 2-hour ones
> for a couple of weeks. Most of them are empty or tiny.
>
> Next time I will try WITHOUT -v on both ends, and arrange to capture the
> expanded version of the command with all the variables filled in, but I
> don't expect any different outcome. Any other ideas?

Is it possible that snapshots were renamed on the sending pool during the send operation?

-- Dave

--
David Pacheco, Sun Microsystems Fishworks.  http://blogs.sun.com/dap/
Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?
I've been looking at using consumer 2.5" drives also; I think the ones I've settled on are the Hitachi 7K500 500GB. These are 7200 RPM-- I'm concerned the 5400s might be a little too low, performance-wise. The main reasons for Hitachi: performance seems to be among the top 2 or 3 in the laptop-drive segment, I've found Hitachi to be pretty reliable, and perhaps most importantly there is the Hitachi Feature Tool, which allows you to disable the head-unload feature. The setting is persistent across reboots, so you don't need to reapply it each boot.
[zfs-discuss] ZFS bug - CVE-2010-2392
Looks like the bug affects builds through snv_137. Patches are available from the usual location-- <https://pkg.sun.com/opensolaris/support> for OpenSolaris.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Legality and the future of zfs...
> Ok guys, can we please kill this thread about commodity versus enterprise
> hardware?

+1
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] preparing for future drive additions
On 7/14/10 Jul 14, 2:58 PM, "Daniel Taylor" wrote:
> I was thinking of mirroring the drives and then converting to raidz somehow?

Not possible. You can start with a mirror and then add another mirror; the filesystem will spread data across both mirrors in a way analogous* to RAID 10.

*You can't really compare ZFS to conventional RAID implementations, but if you look at it from 50,000 feet and squint you get the similarities.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
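A sketch of that grow-by-mirrors path, with hypothetical device names:

# start with one mirror...
zpool create tank mirror c1t0d0 c1t1d0
# ...later, widen the stripe with a second mirror; ZFS spreads new writes
# across both vdevs, though existing data is not rebalanced
zpool add tank mirror c1t2d0 c1t3d0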
Re: [zfs-discuss] Legality and the future of zfs...
On 7/12/10 Jul 12, 10:49 AM, "Linder, Doug" wrote:
> Out of sheer curiosity - and I'm not disagreeing with you, just wondering -
> how does ZFS make money for Oracle when they don't charge for it? Do you
> think it's such an important feature that it's a big factor in customers
> picking Solaris over other platforms?

I'm looking at a new web server for the company, and am considering Solaris specifically because of ZFS. (Oracle's lousy sales model-- specifically the unwillingness to quote a price for a Solaris support contract without my having to send multiple emails to multiple addresses-- may yet push me back to my default CentOS platform, but to the extent that Oracle is even in the running, it's because of ZFS.)
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com
Re: [zfs-discuss] Please trim posts
I trimmed, and then got complained at by a mailing list user that the context of what I was replying to was missing. Can't win :P
Re: [zfs-discuss] zfs send to S7000
Martijn de Munnik wrote: I have several home directories on a Solaris server. I want to move these home directories to a S7000 storage. I know I can use zfs send | zfs receive to move zfs filesystems. Can this be done to a S7000 storage using ssh? No. Check out the shadow migration feature, described in the administration guide: http://wikis.sun.com/display/FishWorks/Documentation -- Dave -- David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance drop during scrub?
On 5/2/10 3:12 PM, "Bob Friesenhahn" wrote: > On the flip-side, using 'zfs scrub' puts more stress on the system > which may make it more likely to fail. It increases load on the power > supplies, CPUs, interfaces, and disks. A system which might work fine > under normal load may be stressed and misbehave under scrub. Using > scrub on a weak system could actually increase the chance of data > loss. If my system is going to fail under the stress of a scrub, it's going to fail under the stress of a resilver. From my perspective, I'm not as scared of data corruption as I am of data corruption *that I don't know about.* I only keep backups for a finite amount of time. If I scrub every week, and my zpool dies during a scrub, then I know it's time to pull out last week's backup, where I know (thanks to scrubbing) the data was not corrupt. I've lived the experience where a user comes to me because he tried to open a seven-year-old file and it was corrupt. Not a blankety-blank thing I could do, because we only retain backup tapes for four years and the four-year-old tape had a backup of the file post-corruption. Data loss may be unavoidable, but that's why we keep backups. It's the invisible data loss that makes life suboptimal. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
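For anyone wanting to adopt the weekly-scrub habit, it's a one-line cron job; a sketch with a hypothetical pool name (the entry runs every Sunday at 3am):

0 3 * * 0 /usr/sbin/zpool scrub tank

followed later by a check that nothing turned up:

# zpool status -x tank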
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On 4/26/10 10:10 AM, "Richard Elling" wrote: > SAS shines with multiple connections to one or more hosts. Hence, SAS > is quite popular when implementing HA clusters. So that would be how one builds something like the active/active controller failover in standalone RAID boxes. Is there a good resource on doing something like that with an OpenSolaris storage server? I could see that as a project I might want to attempt. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Identifying drives
I have one storage server with 24 drives, spread across three controllers and split into three RAIDz2 pools. Unfortunately, I have no idea which bay holds which drive. Fortunately, this server is used for secondary storage, so I can take it offline for a bit. My plan is to use zpool export to take each pool offline, and then dd to do a sustained read off each drive in turn and watch the blinking lights to see which drive is which. In a nutshell:

zpool export uberdisk1
zpool export uberdisk2
zpool export uberdisk3
dd if=/dev/rdsk/c9t0d0 of=/dev/null
dd if=/dev/rdsk/c9t1d0 of=/dev/null
[etc. 22 more times]
zpool import uberdisk1
zpool import uberdisk2
zpool import uberdisk3

Are there any glaring errors in my reasoning here? My thinking is I should probably identify these disks before any problems develop, in case of erratic read errors that are enough to make me replace the drive without being enough to make the hardware ID it as bad. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
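The same plan as a loop, to save typing the dd line 24 times; a sketch using the device names from the post, with a bounded count so each drive blinks for a predictable stretch:

for d in c9t0d0 c9t1d0 c9t2d0; do
    echo "now reading $d"
    dd if=/dev/rdsk/$d of=/dev/null bs=1048576 count=4096
    sleep 5
done

(Extend the list with the remaining devices; the sleep gives you time to note which bay stopped blinking.)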
[zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
I'm building another 24-bay rackmount storage server, and I'm considering what drives to put in the bays. My chassis is a Supermicro SC846A, so the backplane supports SAS or SATA; my controllers are LSI3081E, again supporting SAS or SATA. Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM drive in both SAS and SATA configurations; the SAS model offers one quarter the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and costs 10% more than its enterprise SATA twin. (They also offer a Barracuda XT SATA drive; it's roughly 20% less expensive than the Constellation drive, but rated at 60% the MTBF of the others and a predicted rate of nonrecoverable errors an order of magnitude higher.) Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface? -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mac OS X clients with ZFS server
On 4/25/10 6:11 PM, "Rich Teer" wrote: > I tried going to that URL, but got a 404 error... :-( What's the correct > one, please? <http://code.google.com/p/maczfs/> -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mac OS X clients with ZFS server
On 4/25/10 6:07 PM, "Rich Teer" wrote: > Sounds fair enough! Let's move this to email; meanwhile, what's the > packet sniffing incantation I need to use? On Solaris I'd use snoop, > but I don't htink Mac OS comes with that! Use Wireshark (formerly Ethereal); works great for me. It does require X11 on your machine. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
> IMHO, whether a dedicated log device needs redundancy > (mirrored), should > be determined by the dynamics of each end-user > environment (zpool version, > goals/priorities, and budget). > Well, I populate a chassis with dual HBAs because my _perception_ is they tend to fail more than other cards. Please help me with my perception of the X1. :-) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
Or, DDRDrive X1 ? Would the X1 need to be mirrored? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
The Acard device mentioned in this thread looks interesting: http://opensolaris.org/jive/thread.jspa?messageID=401719 -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
> > On 18 apr 2010, at 00.52, Dave Vrona wrote: > > > Ok, so originally I presented the X-25E as a > "reasonable" approach. After reading the follow-ups, > I'm second guessing my statement. > > > > Any decent alternatives at a reasonable price? > > How much is reasonable? :-) How about $1000 per device? $2000 for a mirrored pair. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
Ok, so originally I presented the X-25E as a "reasonable" approach. After reading the follow-ups, I'm second guessing my statement. Any decent alternatives at a reasonable price? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD best practices
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> >
> > > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dave Vrona
> > >
> > > 2) ZIL write cache. It appears some have disabled the write cache on the X-25E. This results in a 5 fold performance hit but it eliminates a potential mechanism for data loss. Is this valid? If I can mirror ZIL, I imagine this is no longer a concern?
>
> Ahh, I see there may have been some confusion there, because your question wasn't asked right. ;-)
>
> "Disabling ZIL" is not the same thing as "disabling write cache." Those two terms are not to be mixed.

My statement was less than perfectly worded. I specifically meant disabling the write cache on the X-25E that is holding the ZIL. I certainly didn't mean to imply disabling the ZIL.

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
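For reference, on drives whose driver exposes it, the write cache can be inspected and toggled from the expert mode of format(1M); a rough sketch of the interactive path (the exact menu entries can vary by controller and driver):

# format -e
(select the X-25E from the disk list)
format> cache
cache> write_cache
write_cache> display
write_cache> disable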
[zfs-discuss] SSD best practices
Hi all, I'm planning a new build based on a SuperMicro chassis with 16 bays. I am looking to use up to 4 of the bays for SSD devices. After reading many posts about SSDs I believe I have a _basic_ understanding of a reasonable approach to utilizing SSDs for ZIL and L2ARC. Namely:

ZIL: Intel X-25E
L2ARC: Intel X-25M

So, I am somewhat unclear about a couple of details surrounding the deployment of these devices.

1) Mirroring. Leaving cost out of it, should ZIL and/or L2ARC SSDs be mirrored?

2) ZIL write cache. It appears some have disabled the write cache on the X-25E. This results in a 5 fold performance hit but it eliminates a potential mechanism for data loss. Is this valid? If I can mirror ZIL, I imagine this is no longer a concern?

3) SATA devices on a SAS backplane. Assuming the main drives are SAS, what impact do the SATA SSDs have? Any performance impact? I realize I could use an onboard SATA controller for the SSDs; however, this complicates things in terms of the mounting of these drives.

thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
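For what it's worth, once the mirroring decision is made, the attach commands themselves are simple; a sketch with hypothetical pool and device names:

# zpool add tank log mirror c4t0d0 c4t1d0
# zpool add tank cache c4t2d0 c4t3d0

Log devices can be mirrored as above; cache (L2ARC) devices cannot be configured as a mirror, but losing one is harmless since the L2ARC contents are just a copy of what's already in the pool.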
Re: [zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
> What do you mean by overpromised and underdelivered? Well, when I did a quick Google search this <http://wordpress.fusetnt.com/2009/08/areca-are-liars-the-arc-1300ix-16-does -not-support-solaris/> was one of the first results I got. (I know, a different card-- but the same company, and if they fudge compatibility information on one product....) -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
Now that Erik has made me all nervous about my "3xRAIDz2 of 8x2TB 7200RPM disks" approach, I'm considering moving forward using more and smaller 2.5" disks instead. The problem is that at eight drives per LSI 3018, I run out of PCIe slots quickly. The ARC-1680 cards would appear to offer greater drive densities, but a quick Google search shows that they've overpromised and underdelivered on Solaris support in the past. Is anybody currently using those cards on OpenSolaris? -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: clarification on meaning of the autoreplace property
> Hi Dave,
>
> I'm unclear about the autoreplace behavior with one spare that is connected to two pools. I don't see how it could work if the autoreplace property is enabled on both pools, which formats and replaces a spare disk that might be in-use in another pool (?) Maybe I misunderstand.

Because I already partitioned the disk into slices, then indicated the proper slice as the spare.

> 1. I think autoreplace behavior might be inconsistent when a device is removed. CR 6935332 was filed recently but is not available yet through our public bug database.
>
> 2. The current issue with adding a spare disk to a ZFS root pool is that if a root pool mirror disk fails and the spare kicks in, the bootblock is not applied automatically. We're working on improving this experience.

While the bootblock may not have been applied automatically, the root pool did show resilvering, but the storage pool did not (at least per the status report).

> My advice would be to create a 3-way mirrored root pool until we have a better solution for root pool spares.

That would be sort of a different topic; I'm just interested in understanding the functionality of the hot spare at this point.

> 3. For simplicity and ease of recovery, consider using your disks as whole disks, even though you must use slices for the root pool.

I can't do this with a RAID 10 configuration on the storage pool and a mirrored root pool; I only have places for 5 disks on a 2RU/3.5" drive server.

> If one disk is part of two pools and it fails, two pools are impacted.

Yes. This is why I used slices instead of a whole disk for the hot spare.

> The beauty of ZFS is no longer having to deal with slice administration, except for the root pool.
>
> I like your mirror pool configurations but I would simplify it by converting store1 to using whole disks, and keep separate spare disks.

I would have done that from the beginning with more chassis space.

> One for the store1 pool, and either create a 3-way mirrored root pool or keep a spare disk connected to the system but unconfigured.

I still need confirmation on whether the hot spare function will work with slices. I saw no errors when executing the commands for the hot spare slices, but I got this funny response when I ran the test [...]

Dave

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS: clarification on meaning of the autoreplace property
From pages 29, 83, 86, 90 and 284 of the 10/09 Solaris ZFS Administration Guide, it sounds like a disk designated as a hot spare will:

1. Automatically take the place of a bad drive when needed.
2. Automatically be detached back to the spare pool when a new device is inserted and brought up to replace the original compromised one.

Should this work the same way for slices? I have four active disks in a RAID 10 configuration for a storage pool, and the same disks are used for mirrored root configurations, but only one of the possible mirrored root slice pairs is currently active. I wanted to designate slices on a 5th disk as hot spares for the two existing pools, so after partitioning the 5th disk (#4) identically to the four existing disks, I ran:

# zpool add rpool spare c0t4d0s0
# zpool add store1 spare c0t4d0s7
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0
        spares
          c0t4d0s0    AVAIL
errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        store1        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t2d0s7  ONLINE       0     0     0
            c0t3d0s7  ONLINE       0     0     0
        spares
          c0t4d0s7    AVAIL
errors: No known data errors

So it looked like everything was set up how I was hoping, until I emulated a disk failure by pulling one of the online disks. The root pool responded how I expected, but the storage pool, on slice 7, did not appear to perform the autoreplace. Not too long after pulling one of the online disks:

# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 10.02% done, 0h5m to go
config:
        NAME            STATE     READ WRITE CKSUM
        rpool           DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c0t0d0s0    ONLINE       0     0     0
            spare       DEGRADED    84     0     0
              c0t1d0s0  REMOVED      0     0     0
              c0t4d0s0  ONLINE       0     0    84  329M resilvered
        spares
          c0t4d0s0      INUSE     currently in use
errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        store1        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t2d0s7  ONLINE       0     0     0
            c0t3d0s7  ONLINE       0     0     0
        spares
          c0t4d0s7    AVAIL
errors: No known data errors

I was able to convert the state of store1 to DEGRADED by writing to a file in that storage pool, but it always listed the spare as available, at the same time as showing c0t1d0s7 as REMOVED in the same pool. Based on the manual, I expected the system to bring a reinserted disk back online automatically, but zpool status still showed it as "REMOVED". To get it back online:

# zpool detach rpool c0t4d0s0
# zpool clear rpool
# zpool clear store1

Then status showed *both* pools resilvering. So the questions are:

1. Does autoreplace work on slices, or just complete disks?
2. Is there a problem replacing a "bad" disk with the same disk to get the autoreplace function to work?

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
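One more variable worth checking in a test like this: the autoreplace pool property itself defaults to off, and spare handling differs depending on its setting. A quick sketch:

# zpool get autoreplace rpool store1
# zpool set autoreplace=on store1

With autoreplace=off, a replacement device must be brought in manually with 'zpool replace'.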
Re: [zfs-discuss] Listing snapshots in a pool
Try:

zfs list -r -t snapshot zp1

-- Dave

On 2/21/10 5:23 PM, David Dyer-Bennet wrote: I thought this was simple. Turns out not to be.

bash-3.2$ zfs list -t snapshot zp1
cannot open 'zp1': operation not applicable to datasets of this type

Fails equally on all the variants of pool name that I've tried, including "zp1/" and "zp1/@" and such. You can do "zfs list -t snapshot" and get a list of all snapshots in all pools. You can do "zfs list -r -t snapshot zp1" and get a recursive list of snapshots in zp1. But you can't, with any options I've tried, get a list of top-level snapshots in a given pool. (It's easy, of course, with grep, to get the bigger list and then filter out the subset you want). Am I missing something? Has this been added after snv_111b? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Speed question: 8-disk RAIDZ2 vs 10-disk RAIDZ3
> If I go to 10x 2TB in a RAIDZ3, will the extra spindles increase > speed, or will the extra parity writes reduce speed, or will the two factors > offset and leave things a wash? I should mention that the usage of this system is as storage for large (5-300GB) video files, so what's most important is sequential write speed. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Speed question: 8-disk RAIDZ2 vs 10-disk RAIDZ3
I currently am getting good speeds out of my existing system (8x 2TB in a RAIDZ2 exported over fibre channel), but there's no such thing as too much speed, and these other two drive bays are just begging for drives in them. If I go to 10x 2TB in a RAIDZ3, will the extra spindles increase speed, or will the extra parity writes reduce speed, or will the two factors offset and leave things a wash? (My goal is to be able to survive one controller failure, so if I add more drives I'll have to add redundancy to compensate for the fact that one controller would then be able to take out three drives.) I've considered adding a drive for the ZIL instead, but my experiments in disabling the ZIL (using the evil tuning guide at <http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29>) didn't show any speed increase. (I know it's a bad idea to run the system with the ZIL disabled; I disabled it only to measure its impact on my write speeds and re-enabled it after testing was complete.)

Current system:
OpenSolaris dev release b132
Intel S5500BC mainboard (latest firmware)
Intel E5506 Xeon 2.13GHz
8GB RAM
3x LSI 3018 PCIe SATA controllers (latest IT firmware)
8x 2TB Hitachi 7200RPM SATA drives (2 connected to each LSI and 2 to motherboard SATA ports)
2x 60GB Imation M-class SSD (boot mirror)
Qlogic 2440 PCIe Fibre Channel HBA

-- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
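For comparing the two layouts, a crude sequential-write probe is probably enough; a sketch with hypothetical pool and file names (note that /dev/zero data is highly compressible if compression is enabled on the dataset):

# dd if=/dev/zero of=/tank/testfile bs=1048576 count=16384
# zpool iostat -v tank 5

Run the same thing against the 8-disk and 10-disk layouts and compare the sustained MB/s.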
Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> I'm off to straighten out my controller distribution, check to see if I have write caching turned off on the motherboard ports, install the b132 build, and possibly grab some dinner while I'm about it. I'll report back to the list with any progress or lack thereof.

OK, the issue seems to be resolved now-- I'm seeing write speeds in excess of 160MB/s. What I did to fix things:

1) Redistributed drives across controllers to match my actual configuration-- thanks to Nigel for pointing that one out
2) Set my motherboard controller to AHCI mode-- thanks to Richard and Thomas for suggesting that. Once I made that change I no longer saw the "raidz contains devices of different sizes" error, so it looks like Bob was right about the source of that error
3) Upgraded to OpenSolaris 2010.03 preview b132 which appears to correct a problem in 2009.06 where iSCSI (and apparently FC) forced all writes to be synchronous -- thanks to Richard for that pointer.

Five hours from tearing my hair out to toasting a success-- this list is a great resource! -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> on my motherboard, i can make the onboard sata ports show up as IDE or SATA, > you may look into that. It would probably be something like AHCI mode. Yeah, I changed the motherboard setting from "enhanced" to AHCI and now those ports show up as SATA. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
> So which hard drives are connected to which controllers?
> And what device drivers are those controllers using?

0. c7t0d0 /p...@0,0/pci8086,3...@3/pci1000,3...@0/s...@0,0
1. c7t1d0 /p...@0,0/pci8086,3...@3/pci1000,3...@0/s...@1,0
2. c8t0d0 /p...@0,0/pci8086,3...@7/pci1000,3...@0/s...@0,0
3. c8t1d0 /p...@0,0/pci8086,3...@7/pci1000,3...@0/s...@1,0
4. c9t0d0 /p...@0,0/pci8086,3...@9/pci1000,3...@0/s...@0,0
5. c9t1d0 /p...@0,0/pci8086,3...@9/pci1000,3...@0/s...@1,0
6. c10d0 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
7. c10d1 /p...@0,0/pci-...@1f,2/i...@0/c...@1,0
8. c11d0 /p...@0,0/pci-...@1f,2/i...@1/c...@0,0
9. c11d1 /p...@0,0/pci-...@1f,2/i...@1/c...@1,0

> Strange that you say that there are two hard drives per controller, but three drives are showing high %b.
>
> And strange that you have c7, c8, c9, c10, c11 which looks like FIVE controllers!

c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard has 6 SATA ports which are presented as two controllers (presumably c10 and c11), one for ports 0-3 and one for ports 4 and 5; both currently use the PCI-IDE drivers. And as you say, it's odd that there are three drives on c10 and c11, since they should have only two of the raidz2 drives; I need to go double-check my cabling. The way it's *supposed* to be configured is:

c7: two RAIDZ2 drives and one of the boot mirror drives
c8: two RAIDZ2 drives
c9: two RAIDZ2 drives
c10: one RAIDZ2 drive and one of the boot mirror drives
c11: one RAIDZ2 drive

(The theory here is that since this server is going to spend its life being shipped places in the back of a truck, I want to make sure that no single controller failure can either render it unbootable or destroy the RAIDZ2.) That said, I think that this is probably *a* tuning problem but not *the* tuning problem, since I was getting acceptable performance over CIFS and miserable performance over FC. Richard Elling suggested I try the latest dev release to see if I'm encountering a bug that forces synchronous writes, so I'm off to straighten out my controller distribution, check to see if I have write caching turned off on the motherboard ports, install the b132 build, and possibly grab some dinner while I'm about it. I'll report back to the list with any progress or lack thereof. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export
[iostat listing truncated in the archive; the surviving rows are below -- columns are r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device]

   ...                                                0   6 c8t0d0
   0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c8t1d0
   0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c9t0d0
   0.0  191.0    0.0 1816.2  0.0  0.1    0.0    0.5   0   6 c9t1d0

-- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
Ah, I didn't see the original post. If you're using an old COMSTAR version prior to build 115, maybe the metadata placed at the first 64K of the volume is causing problems? http://mail.opensolaris.org/pipermail/storage-discuss/2009-September/007192.html The clone and create-lu process works for mounting cloned volumes under linux with b130. I don't have any windows clients to test with. -- Dave On 2/8/10 11:23 AM, Scott Meilicke wrote: Sure, but that will put me back into the original situation. -Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
Use create-lu to give the clone a different GUID:

sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

-- Dave

On 2/8/10 10:34 AM, Scott Meilicke wrote: Thanks Dan. When I try the clone then import:

pfexec zfs clone data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me "sbdadm: guid in use", which makes sense, now that I see it. The man pages make it look like I cannot give it another GUID during the import. Any other thoughts? I *could* delete the current lu, import, get my data off and reverse the process, but that would take the current volume off line, which is not what I want to do. Thanks, Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
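Putting the whole flow together for anyone landing here later, a sketch (the GUID comes from the create-lu output, and the stmfadm view options depend on your host and target groups):

# zfs clone data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 data01/san/gallardo/g-testandlab
# sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab
# stmfadm add-view <GUID from the create-lu output>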
Re: [zfs-discuss] ZFS + fsck
Thanks for taking the time to write this - very useful info :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Crazy Phantom Zpools Again
I just did a fresh reinstall of OpenSolaris and I'm again seeing the phenomenon described in http://article.gmane.org/gmane.os.solaris.opensolaris.zfs/26259 which I posted many months ago and got no reply to. Can someone *please* help me figure out what's going on here? Thanks in Advance, -- Dave Abrahams BoostPro Computing http://boostpro.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS commands hang after several zfs receives
> The case has been identified and I've just received an IDR, which I will test next week. I've been told the issue is fixed in update 8, but I'm not sure if there is an nv fix target.

Anyone know if there is an OpenSolaris fix for this issue, and when? These seem to be related: http://www.opensolaris.org/jive/thread.jspa?threadID=112808 -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool status OK but zfs filesystem seems hung
Thanks for the reply, but this seems to be a bit different. A couple of things I failed to mention:
1) This is a secondary pool and not the root pool.
2) The snapshots are trimmed to only keep 80 or so.

The system boots and runs fine; it's just an issue for this secondary pool and filesystem. It seems to be directly related to I/O-intensive operations, as the (full) backup seems to trigger it; I've never seen it happen with incremental backups... Thanks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool status OK but zfs filesystem seems hung
Hello all, I have a situation where zpool status shows no known data errors, but all processes on a specific filesystem are hung. This has happened twice before since we installed OpenSolaris 2009.06 snv_111b. For instance, there are two file systems in this pool: 'zfs get all' on one filesystem returns without issue; when run on the other filesystem it hangs. Also a 'df -h' hangs, etc. This file system has many different operations running on it:

1) It receives incremental snapshots every 30 minutes, continuously.
2) Every night a clone is made from one of the received snapshot streams, then a filesystem backup is taken on that clone (the backup is a directory traversal); once the backup completes the clone is destroyed.

We tried to upgrade to the latest build but ran into the current 'checksum' issue in build snv_122, so we rolled back.

# uname -a
SunOS lahar2 5.11 snv_111b i86pc i386 i86pc

# zpool status zdisk1
  pool: zdisk1
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zdisk1      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
        spares
          c7t6d0    AVAIL
errors: No known data errors

The filesystem is currently in this 'hung' state; are there any commands I can run to help debug the issue? TIA -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
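For what it's worth, the nightly cycle described above reduces to something like this sketch (dataset names hypothetical):

# zfs clone zdisk1/fs@latest-received zdisk1/backup-clone
# (run the directory-traversal backup against the clone's mountpoint)
# zfs destroy zdisk1/backup-clone

If the hang always lands inside this sequence, narrowing down which of the three steps blocks would be a useful data point.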
Re: [zfs-discuss] check a zfs rcvd file
Dick Hoogendijk wrote: Some time ago there was some discussion on zfs send | rcvd TO A FILE. Apart from the disadvantages, which I now know, someone mentioned a CHECK to be at least sure that the file itself was OK (without one or more bits that fell over). I lost this reply and would love to hear about this check again. In other words, how can I be sure of the validity of the received file in the next command line:

# zfs send -Rv rpool@090902 > /backup/snaps/rpool.090902

I only want to know how to check the integrity of the received file.

You should be able to generate a sha1sum/md5sum of the zfs send stream on the fly with 'tee':

# zfs send -R rpool@090902 | tee /backups/snaps/rpool.090902 | sha1sum

Compare the output of that with the sha1sum of the file on-disk:

# sha1sum /backups/snaps/rpool.090902

This only guarantees that the file contains the exact same bits as the zfs send stream. It does not verify the ZFS format/integrity of the stream - the only way to do that is to zfs recv the stream into ZFS. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
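A small variation, since Solaris ships digest(1) even where sha1sum isn't installed; a sketch of the same trick:

# zfs send -R rpool@090902 | tee /backup/snaps/rpool.090902 | digest -a sha1
# digest -a sha1 /backup/snaps/rpool.090902

If the two hashes match, the file holds exactly the bits that came out of zfs send.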
Re: [zfs-discuss] Status/priority of 6761786
Richard Elling wrote: On Aug 28, 2009, at 12:15 AM, Dave wrote: Thanks, Trevor. I understand the RFE/CR distinction. What I don't understand is how this is not a bug that should be fixed in all solaris versions. In a former life, I worked at Sun to identify things like this that affect availability and lobbied to get them fixed. There are opposing forces at work: the functionality is correct as designed versus availability folks think it should go faster. It is difficult to build the case that code changes should be made for availability when other workarounds exist. It will be more fruitful for you to examine the implementation and see if there is a better way to improve the efficiencies of your snapshot processes. For example, the case can be made for a secondary data store containing long-term snapshots which can allow you to further optimize the primary data store for performance and availability. -- richard This is unfortunate, but it seems this may be the only option if I want to import a pool within a reasonable amount of time. It's very frustrating to know that it can be fixed (evidenced by the S10U6 fix), but won't be fixed in Nevada/OpenSolaris - or so it seems. It may be filed as an RFE, but in my opinion it is most definitely a bug. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Status/priority of 6761786
Thanks, Trevor. I understand the RFE/CR distinction. What I don't understand is how this is not a bug that should be fixed in all Solaris versions. The related ID 6612830 says it was fixed in Sol 10 U6, which was a while ago. I am using OpenSolaris, so I would really appreciate confirmation that it has been fixed in OpenSolaris as well. I can't tell by the info on the bugs DB - it seems like it hasn't been fixed in OpenSolaris. If it has, then the status should reflect it as Fixed/Closed in the bug database...

-- Dave

Trevor Pretty wrote:

Dave

Yep, that's an RFE (Request For Enhancement); that's how things are reported to engineers to fix things inside Sun. If it's an honest-to-goodness CR = bug (however, it normally needs a real support-paying customer to have a problem to go from RFE to CR), the "responsible engineer" evaluates it, and eventually gets it fixed, or not. When I worked at Sun I logged a lot of RFEs; only a few were accepted as bugs and fixed.

Click on the "new Search" link and look at the type and state menus. It gives you an idea of the states an RFE and a CR go through. It's probably documented somewhere, but I can't find it. Part of the joy of Sun putting out in public something most other vendors would not dream of doing. Oh, and it doesn't help that both RFEs and CRs are labelled "bug" at http://bugs.opensolaris.org

So, looking at your RFE: it tells you which version of Nevada it was reported against (translating this into an OpenSolaris version is easy - NOT!). Look at "Related Bugs: 6612830 <http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6612830>". This will tell you the responsible engineer (Richard Morris) and when it was fixed: "Release Fixed: solaris_10u6(s10u6_01) (Bug ID: 2160894 <http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2160894>)". Although, as nothing in life is guaranteed, it looks like another bug 2160894 has been identified and that's not yet on bugs.opensolaris.org

Hope that helps.

Trevor

Dave wrote: Just to make sure we're looking at the same thing: http://bugs.opensolaris.org/view_bug.do?bug_id=6761786 This is not an issue of auto snapshots. If I have a ZFS server that exports 300 zvols via iSCSI and I have daily snapshots retained for 14 days, that is a total of 4200 snapshots. According to the link/bug report above it will take roughly 5.5 hours to import my pool (even when the pool is operating perfectly fine and is not degraded or faulted). This is obviously unacceptable to anyone in an HA environment. Hopefully someone close to the issue can clarify. -- Dave

Blake wrote: I think the value of auto-snapshotting zvols is debatable. At least, there are not many folks who need to do this. What I'd rather see is a default property of 'auto-snapshot=off' for zvols. Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote: On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers wrote: Dave, It's logged as an RFE (Request for Enhancement), not as a CR (bug). The status is 3-Accepted / P1 RFE. RFEs are generally looked at in a much different way than a CR. ..Remco

Seriously? It's considered "works as designed" for a system to take 5+ hours to boot? Wow.
--Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Status/priority of 6761786
Just to make sure we're looking at the same thing: http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that exports 300 zvols via iSCSI and I have daily snapshots retained for 14 days, that is a total of 4200 snapshots. According to the link/bug report above it will take roughly 5.5 hours to import my pool (even when the pool is operating perfectly fine and is not degraded or faulted). This is obviously unacceptable to anyone in an HA environment. Hopefully someone close to the issue can clarify.

-- Dave

Blake wrote: I think the value of auto-snapshotting zvols is debatable. At least, there are not many folks who need to do this. What I'd rather see is a default property of 'auto-snapshot=off' for zvols. Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote: On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers wrote: Dave, It's logged as an RFE (Request for Enhancement), not as a CR (bug). The status is 3-Accepted / P1 RFE. RFEs are generally looked at in a much different way than a CR. ..Remco

Seriously? It's considered "works as designed" for a system to take 5+ hours to boot? Wow. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
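On Blake's point: if the snapshots are coming from the auto-snapshot (Time Slider) service, zvols can already be opted out per dataset via its user property; a sketch with a hypothetical dataset name:

# zfs set com.sun:auto-snapshot=false data01/volumes

Descendant zvols inherit the setting, so one command can cover a whole tree of exported volumes.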
[zfs-discuss] Status/priority of 6761786
Can anyone from Sun comment on the status/priority of bug ID 6761786? Seems like this would be a very high priority bug, but it hasn't been updated since Oct 2008. Has anyone else with thousands of volume snapshots experienced the hours long import process? -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to find poor performing disks
Maybe you can run a DTrace probe using Chime? http://blogs.sun.com/observatory/entry/chime Initial Traces -> Device IO -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs deduplication
I don't think Sun is at liberty to discuss ZFS deduplication at this point in time: http://www.itworld.com/storage/71307/sun-tussles-de-duplication-startup Hopefully, the matter is resolved and discussions can proceed openly. "Send lawyers, guns and money." - Warren Zevon -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40
> I don't mean to be offensive Russel, but if you do > ever return to ZFS, please promise me that you will > never, ever, EVER run it virtualized on top of NTFS > (a.k.a. worst file system ever) in a production > environment. Microsoft Windows is a horribly > unreliable operating system in situations where > things like protecting against data corruption are > important. Microsoft knows this Oh WOW! Whether or not our friend Russel virtualized on top of NTFS (he didn't - he used raw disk access) this point is amazing! System5 - based on this thread I'd say you can't really make this claim at all. Solaris suffered a crash and the ZFS filesystem lost EVERYTHING! And there aren't even any recovery tools? HANG YOUR HEADS!!! Recovery from the same situation is EASY on NTFS. There are piles of tools out there that will recover the file system, and failing that, locate and extract data. The key parts of the file system are stored in multiple locations on the disk just in case. It's been this way for over 10 years. I'd say it seems from this thread that my data is a lot safer on NTFS than it is on ZFS! I can't believe my eyes as I read all these responses blaming system engineering and hiding behind ECC memory excuses and "well, you know, ZFS is intended for more Professional systems and not consumer devices, etc etc." My goodness! You DO realize that Sun has this website called opensolaris.org which actually proposes to have people use ZFS on commodity hardware, don't you? I don't see a huge warning on that site saying "ATTENTION: YOU PROBABLY WILL LOSE ALL YOUR DATA". I recently flirted with putting several large Unified Storage 7000 systems on our corporate network. The hype about ZFS is quite compelling and I had positive experience in my lab setting. But because of not having Solaris capability on our staff we went in another direction instead. Reading this thread, I'm SO glad we didn't put ZFS in production in ANY way. Guys, this is the real world. Stuff happens. It doesn't matter what the reason is - hardware lying about cache commits, out-of-order commits, failure to use ECC memory, whatever. It is ABSOLUTELY unacceptable for the filesystem to be entirely lost. No excuse or rationalization of any type can be justified. There MUST be at least the base suite of tools to deal with this stuff. without it, ZFS simply isn't ready yet. I am saving a copy of this thread to show my colleagues and also those Sun Microsystems sales people that keep calling. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix
Anyone (Ross?) creating ZFS pools over iSCSI connections will want to pay attention to snv_121 which fixes the 3 minute hang after iSCSI disk problems: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649 Yay! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
Haudy Kazemi wrote: I think a better question would be: what kind of tests would be most promising for turning some subclass of these lost pools reported on the mailing list into an actionable bug? my first bet would be writing tools that test for ignored sync cache commands leading to lost writes, and apply them to the case when iSCSI targets are rebooted but the initiator isn't. I think in the process of writing the tool you'll immediately bump into a defect, because you'll realize there is no equivalent of a 'hard' iSCSI mount like there is in NFS. and there cannot be a strict equivalent to 'hard' mounts in iSCSI, because we want zpool redundancy to preserve availability when an iSCSI target goes away. I think the whole model is wrong somehow. I'd surely hope that a ZFS pool with redundancy built on iSCSI targets could survive the loss of some targets whether due to actual failures or necessary upgrades to the iSCSI targets (think OS upgrades + reboots on the systems that are offering iSCSI devices to the network.) I've had a mirrored zpool created from solaris iSCSI target servers in production since April 2008. I've had disks die and reboots of the target servers - ZFS has handled them very well. My biggest wish is to be able to tune the iSCSI timeout value so ZFS can failover reads/writes to the other half of the mirror quicker than it does now (about 180 seconds on my config). A minor gripe considering the features that ZFS provides. I've also had the zfs server (the initiator aggregating the mirrored disks) unintentionally power cycled with the iscsi zpool imported. The pool re-imported and scrubbed fine. ZFS is definitely my FS of choice - by far. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
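For anyone wanting to reproduce a setup like that, a bare-bones sketch (IQNs, addresses, and device names hypothetical; the actual cNtNdN names come from what the initiator discovers):

# iscsiadm add static-config iqn.1986-03.com.sun:02:target-a,192.168.10.1:3260
# iscsiadm add static-config iqn.1986-03.com.sun:02:target-b,192.168.20.1:3260
# devfsadm -i iscsi
# zpool create tank mirror c2t0d0 c3t0d0

With the two targets reached over separate NICs and switches, as described above, either side can drop without taking the pool down.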
[zfs-discuss] Things I Like About ZFS
I'll start:

- The commands are easy to remember -- all two of them. Which is easier, SVM or ZFS, to mirror your disks? I've been using SVM for years and still have to break out the manual to use metadb, metainit, metastat, metattach, metadetach, etc. I hardly ever have to break out the ZFS manual. I can actually remember the commands and options to do things. Don't even start me on VxVM.

- Boasting to the unconverted. We still have a lot of VxVM and SVM on Solaris, and LVM on AIX, in the office. The other admins are always having issues with storage migrations, full filesystems, Live Upgrade, corrupted root filesystems, etc. I love being able to offer solutions to their immediate problems, and follow it up with, "You know, if your box was on ZFS this wouldn't be an issue."

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
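To make the first point concrete, here is roughly the comparison (device names hypothetical). ZFS, mirroring the root pool in one command:

# zpool attach rpool c0t0d0s0 c0t1d0s0

SVM, the short version of the same job:

# metadb -a -f -c 2 c0t0d0s7 c0t1d0s7
# metainit d11 1 1 c0t0d0s0
# metainit d12 1 1 c0t1d0s0
# metainit d10 -m d11
# metaroot d10
# (reboot, then:)
# metattach d10 d12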
Re: [zfs-discuss] Server Cloning With ZFS?
Cindy, my question is about what "system specific info" is maintained that would need to be changed. To take my example: my E450, "homer", has disks that are failing, it's a big clunky server anyway, and management wants to decommission it. But we have an old 220R racked up doing nothing, and it's not scheduled for disposal. What would be wrong with this (spelled out as commands below)?

1) Create a recursive snapshot of the root pool on homer.
2) zfs send this snapshot to a file on some NFS server.
3) Boot my 220R (same architecture as the E450) into single user mode from a DVD.
4) Create a zpool on the 220R's local disks.
5) zfs receive the snapshot created in step 2 to the new pool.
6) Set the bootfs property.
7) Reboot the 220R.

Now my 220R comes up as "homer", with its IP address, users, root pool filesystems, any software that was installed in the old homer's root pool, etc. Since ZFS filesystems don't care about the underlying disk structure -- they only care about the pool, and I've already created a pool for them on the 220R using the disks it has -- there shouldn't be any storage-type "system specific info" to change, right? And sure, the 220R might have a different number and speed of CPUs, and more or less RAM than the E450 had. But when you upgrade a server in place you don't have to manually configure the CPUs or RAM, and how is this different? The only thing I can think of that I might need to change, in order to bring up my 220R and have it "be" homer, is the network interfaces, from hme to bge or whatever. And that's a simple config setting. I don't care about Flash. Actually, if you wanted to provision new servers based on a golden image like you can with Flash, couldn't you just take a recursive snapshot of a zpool as above, "receive" it in an empty zpool on another server, set your bootfs, and do a sys-unconfig? So my big question is: with a server on ZFS root, what "system specific info" would still need to be changed? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
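Spelling those seven steps out as commands, roughly (boot environment and file names are hypothetical; the bootblock step is the SPARC variant, since both boxes are SPARC):

# On homer:
zfs snapshot -r rpool@migrate
zfs send -R rpool@migrate > /net/nfsserver/backups/homer-rpool.zfs

# On the 220R, booted single-user from DVD:
zpool create rpool c0t0d0s0
zfs receive -Fd rpool < /net/nfsserver/backups/homer-rpool.zfs
zpool set bootfs=rpool/ROOT/s10be rpool
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0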
[zfs-discuss] Server Cloning With ZFS?
So I had an E450 running Solaris 8 with a VxVM-encapsulated root disk. I upgraded it to Solaris 10 ZFS root using this method:

- Unencapsulate the root disk
- Remove VxVM components from the second disk
- Live Upgrade from 8 to 10 on the now-unused second disk
- Boot to the new Solaris 10 install
- Create a ZFS pool on the now-unused first disk
- Use Live Upgrade to migrate root filesystems to the ZFS pool
- Add the now-unused second disk to the ZFS pool as a mirror

Now my E450 is running Solaris 10 5/09 with ZFS root, and all the same users, software, and configuration that it had previously. That is pretty slick in itself. But the server itself is dog slow and more than half the disks are failing, and maybe I want to clone the server on new(er) hardware. With ZFS, this should be a lot simpler than it used to be, right? A new server has new hardware, new disks with different names and different sizes. But that doesn't matter anymore. There's a procedure in the ZFS manual to recover a corrupted server by using zfs receive to reinstall a copy of the boot environment into a newly created pool on the same server. But what if I used zfs send to save a recursive snapshot of my root pool on the old server, booted my new server (with the same architecture) from the DVD in single user mode and created a ZFS pool on its local disks, and did zfs receive to install the boot environments there? The filesystems don't care about the underlying disks. The pool hides the disk specifics. There's no vfstab to edit. Off the top of my head, all I can think to have to change is the network interfaces. And that change is as simple as "cd /etc ; mv hostname.hme0 hostname.qfe0" or whatever. Is there anything else I'm not thinking of? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR# 6574286, remove slog device
Richard Elling wrote: Will Murnane wrote: On Wed, May 20, 2009 at 12:42, Miles Nordin wrote: "djm" == Darren J Moffat writes: djm> a) it was highly dangerous and involved using multiple djm> different zfs kernel modules was well as however...utter hogwash! Nothing is ``highly dangerous'' when your pool is completely unreadable. It is if you turn your "unreadable but fixable" pool into a "completely unrecoverable" pool. If my pool loses its log disk, I'm waiting for an official tool to fix it. Whoa. The slog is a top-level vdev like the others. The current situation is that loss of a top-level vdev results in a pool that cannot be imported. If you are concerned about the loss of a top-level vdev, then you need to protect them. For slogs, mirrors work. For the main pool, mirrors and raidz[12] work. There was a conversation regarding whether it would be a best practice to always mirror the slog. Since the recovery from slog failure modes is better than that of the other top-level vdevs, the case for recommending a mirrored slog is less clear. If you are paranoid, then mirror the slog. -- richard I can't test this myself at the moment, but the reporter of Bug ID 6733267 says even one failed slog from a pair of mirrored slogs will prevent an exported zpool from being imported. Has anyone tested this recently? -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR# 6574286, remove slog device
Eric Schrock wrote:

On May 19, 2009, at 12:57 PM, Dave wrote: If you don't have mirrored slogs and the slog fails, you may lose any data that was in a txg group waiting to be committed to the main pool vdevs - you will never know if you lost any data or not.

None of the above is correct. First off, you only lose data if the slog fails *and* the machine panics/reboots before the transaction group is synced (5-30s by default depending on load, though there is a CR filed to immediately sync on slog failure). You will not lose any data once the txg is synced - syncing the transaction group does not require reading from the slog, so failure of the log device does not impact normal operation.

Thanks for correcting my statement. There is still a potential window of roughly 60 seconds for data loss if there are 2 transaction groups waiting to sync with a 30 second txg commit timer, correct?

The latter half of the above statement is also incorrect. Should you find yourself in the double-failure described above, you will get an FMA fault that describes the nature of the problem and the implications. If the slog is truly dead, you can 'zpool clear' (or 'fmadm repair') the fault and use whatever data you still have in the pool. If the slog is just missing, you can insert it and continue without losing data. In no cases will ZFS silently continue without committed data.

How will it know that data was actually lost? Or does it just alert you that it's possible data was lost? There's also the worry that the pool is not importable if you did have the double-failure scenario and the log really is gone (re: bug ID 6733267), e.g. if you had done a 'zpool import -o cachefile=none mypool'.

-- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR# 6574286, remove slog device
Paul B. Henson wrote: I was checking with Sun support regarding this issue, and they say "The CR currently has a high priority and the fix is understood. However, there is no eta, workaround, nor IDR." If it's a high priority, and it's known how to fix it, I was curious as to why has there been no progress? As I understand, if a failure of the log device occurs while the pool is active, it automatically switches back to an embedded pool log. It seems removal would be as simple as following the failure path to an embedded log, and then update the pool metadata to remove the log device. Is it more complicated than that? We're about to do some testing with slogs, and it would make me a lot more comfortable to deploy one in production if there was a backout plan :)... If you don't have mirrored slogs and the slog fails, you may lose any data that was in a txg group waiting to be committed to the main pool vdevs - you will never know if you lost any data or not. I think this thread is the latest discussion about slogs and their behavior: https://opensolaris.org/jive/thread.jspa?threadID=102392&tstart=0 -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How recoverable is an 'unrecoverable error'?
Carson Gaspar wrote: Tim wrote (although it wasn't his error originally): Unless you want to have a different response for each of the repair methods, I'd just drop that part: status: One or more devices has experienced an error. The error has been automatically corrected by zfs. Data on the pool is unaffected. "Data on the pool are unaffected." Data is plural. Not to nitpick, but I think most people would prefer the singular 'data' when referring to the storage of data. The plural 'data' in this case is very awkward. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] reboot when copying large amounts of data
Tim wrote:

On Thu, Mar 12, 2009 at 2:22 PM, Blake wrote: I've managed to get the data transfer to work by rearranging my disks so that all of them sit on the integrated SATA controller. So, I feel pretty certain that this is either an issue with the Supermicro AOC-SAT2-MV8 card, or with PCI-X on the motherboard (though I would think that the integrated SATA would also be using the PCI bus?). The motherboard, for those interested, is an H8DME-2 (not, I now find after buying this box from Silicon Mechanics, a board that's on the Solaris HCL...) <http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm> So now I'm considering one of LSI's HBAs - what do list members think about this device: <http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm>

I believe the MCP55's SATA controllers are actually PCI-E based.

I use Tyan 2927 motherboards. They have on-board nVidia MCP55 chipsets, which is the same chipset as the X4500 (IIRC). I wouldn't trust the MCP55 chipset in OpenSolaris. I had random disk hangs even while the machine was mostly idle. In Feb 2008 I bought AOC-SAT2-MV8 cards and moved all my drives to these add-in cards. I haven't had any issues with drives hanging since. There does not seem to be any problem with the SAT2-MV8 under heavy load in my servers from what I've seen. When the SuperMicro AOC-USAS-L8i came out later last year, I started using them instead. They work better than the SAT2-MV8s. This card needs a 3U or bigger case: http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm This is the low-profile card that will fit in a 2U: http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm They both work in normal PCI-E slots on my Tyan 2927 mobos. Finding good non-Sun hardware that works very well under OpenSolaris is frustrating, to say the least. Good luck. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs related google summer of code ideas - your vote
C. Bergström wrote:

Bob Friesenhahn wrote:

I don't know if anyone has noticed that the topic is "google summer of code". There is only so much that a starving college student can accomplish from a dead-start in 1-1/2 months. The ZFS equivalent of eliminating world hunger is not among the tasks which may be reasonably accomplished, yet tasks at this level of effort are all that I have seen mentioned here.

May I interject a bit.. I'm silently collecting this task list and even outside of gsoc may help try to arrange it from a community perspective. Of course this will be volunteer-based unless /we/ get a sponsor or Sun beats /us/ to it. So all the crazy ideas are welcome..

I would really like to see a feature like 'zfs diff fs@snap1 fs@othersnap' that would report the paths of files that have been added, deleted, or changed between snapshots - something like the sketch below. If this could be done at the ZFS level instead of the application level it would be very cool.

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
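To make the request concrete, here is roughly the interface I have in mind - entirely hypothetical, since no such command exists today, and the dataset and snapshot names are placeholders:

# hypothetical: list per-path changes between two snapshots
zfs diff tank/home@snap1 tank/home@othersnap

# imagined output, one line per changed path:
# M  /tank/home/dave/report.odt    (modified)
# +  /tank/home/dave/new-file.txt  (added)
# -  /tank/home/dave/old-file.txt  (deleted)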
Re: [zfs-discuss] zfs related google summer of code ideas - your vote
Gary Mills wrote:

On Wed, Mar 04, 2009 at 06:31:59PM -0700, Dave wrote:

Gary Mills wrote:

On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:

"gm" == Gary Mills writes:
gm> I suppose my RFE for two-level ZFS should be included,

It's simply a consequence of ZFS's end-to-end error detection. There are many different components that could contribute to such errors. Since only the lower ZFS has data redundancy, only it can correct the error. Of course, if something in the data path consistently corrupts the data regardless of its origin, it won't be able to correct the error. The same thing can happen in the simple case, with one ZFS over physical disks.

I would argue against building this into ZFS. Any corruption happening on the wire should not be the responsibility of ZFS. If you want to make sure your data is not corrupted over the wire, use IPsec. If you want to prevent corruption in RAM, use ECC sticks, etc.

But what if the `wire' is a SCSI bus? Would you want ZFS to do error correction in that case? There are many possible wires. Every component does its own error checking of some sort, but in its own domain. This brings us back to end-to-end error checking again. Since we are designing a filesystem, that's where the reliability should reside.

ZFS can't eliminate or prevent all errors. You should have a split backplane/multiple controllers and a minimum 2-way mirror if you're concerned about this from a local component POV. Same with iSCSI: I run a minimum 2-way mirror from my ZFS server over 2 different NICs, across 2 gigabit switches w/trunking, to two different disk shelves for exactly this reason (roughly the layout sketched below). I do not stack ZFS layers, since it degrades performance and really doesn't provide any benefit. What's your reason for stacking zpools? I can't recall the original argument for this.

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
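For reference, the shape of that layout is just an ordinary mirror whose two halves arrive over separate NIC/switch paths; a sketch with placeholder device names for the two iSCSI LUNs:

# each side of the mirror is an iSCSI LUN reached over a different
# NIC and switch, so one failed path still leaves a full copy
zpool create tank mirror c2t0d0 c3t0d0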
Re: [zfs-discuss] zfs related google summer of code ideas - your vote
Gary Mills wrote:

On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:

"gm" == Gary Mills writes:
gm> I suppose my RFE for two-level ZFS should be included,

Not that my opinion counts for much, but I wasn't deaf to it---I did respond.

I appreciate that.

I thought it was kind of based on a mistaken understanding. It included this strangeness of the upper ZFS ``informing'' the lower one when corruption had occurred on the network, and the lower ZFS was supposed to do something with the physical disks... to resolve corruption on the network? Why? IIRC several others pointed out the same bogosity.

It's simply a consequence of ZFS's end-to-end error detection. There are many different components that could contribute to such errors. Since only the lower ZFS has data redundancy, only it can correct the error. Of course, if something in the data path consistently corrupts the data regardless of its origin, it won't be able to correct the error. The same thing can happen in the simple case, with one ZFS over physical disks.

I would argue against building this into ZFS. Any corruption happening on the wire should not be the responsibility of ZFS. If you want to make sure your data is not corrupted over the wire, use IPsec. If you want to prevent corruption in RAM, use ECC sticks, etc.

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS disable startup import
smart trams wrote:

Hi All, what I want is a way to disable the startup import process of ZFS, so that on every server reboot I can manually import the pools and mount them on the required mount points. zpool attributes like mountpoint=legacy or canmount affect pool mounting behavior, and I've found no command for disabling the startup import process. My systems are Solaris running on SPARC.

Why do I need this feature? Good question! I have an active/standby clustered environment with one shared SAN disk and two servers. The shared disk has one ZFS pool [xpool] that must always be imported and mounted on exactly one server at any time. When the active server dies, my cluster software [Veritas Cluster] detects the problem, imports 'xpool' [with the -f switch] on the standby server, and starts the applications. Everything is happy up to that point. But when the dead server boots back up, it tries to claim 'xpool' and lists it as one of its pools - note that I'm not talking about mounting it anywhere, only listing it among its current pools. The problem is that both nodes then try to write to the pool, and the pool becomes inconsistent. What I want is to disable this ZFS behaviour and force it to wait until my cluster software decides which server is active.

Use the cachefile=none option whenever you import the pool on either server:

zpool import -o cachefile=none xpool

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
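A rough sketch of the failover sequence with that option (xpool as in your setup; when and how each step runs is up to VCS):

# on whichever node takes ownership - never record the pool in the
# default cachefile, so it is not auto-imported at boot
zpool import -o cachefile=none xpool

# on a clean failback, release the pool first
zpool export xpool

# if the old owner died uncleanly, the new owner forces the import
zpool import -f -o cachefile=none xpool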
Re: [zfs-discuss] Confused about zfs recv -d, apparently
Frank Cusack wrote:

When you try to back up the '/' part of the root pool, it will get mounted on the altroot itself, which is of course already occupied. At that point, the receive will fail. So far as I can tell, mounting the received filesystem is the last step in the process. So I guess maybe you could replicate everything except '/', finally replicate '/', and just ignore the error message. I haven't tried this. You have to do '/' last because the receive stops at that point even if there is more data in the stream.

Wouldn't it be relatively easy to add an option to 'zfs receive' to not mount the received filesystem, or to set the canmount property to 'off' when receiving? Is there an RFE for this, or has it been added to a more recent release already?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
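To sketch what I mean - the -u (do not mount) flag below is the behavior I'm asking about, not necessarily something every build has, and the pool and snapshot names are placeholders:

# receive the replication stream but leave the received
# filesystems unmounted
zfs send -R rpool@backup | zfs receive -u -d backuppool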
Re: [zfs-discuss] Two zvol devices one volume?
Henrik Johansson wrote:

I tried to export the zpool also, and I got this; the strange part is that it sometimes still thinks that the ubuntu-01-dsk01 dataset exists:

# zpool export zpool01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
cannot unmount '/zpool01/dump': Device busy

But:

# zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist

Regards

I have seen this 'phantom dataset' with a pool on nv93. I created a zpool, created a dataset, then destroyed the zpool. When creating a new zpool on the same partitions/disks as the destroyed zpool, upon export I receive the same message as you describe above, even though I never created the dataset in the new pool. Creating a dataset of the same name and then destroying it doesn't seem to get rid of it, either. I never did remember to file a bug for it...

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
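For anyone who wants to try to reproduce it, the sequence was roughly this (disk and dataset names are placeholders; I was on nv93):

# create a pool, put a zvol in it, destroy the whole pool
zpool create zpool01 c1t0d0
zfs create -V 10g zpool01/dsk01
zpool destroy zpool01

# recreate a pool of the same name on the same disk, then export;
# the export complained about the long-destroyed dataset
zpool create zpool01 c1t0d0
zpool export zpool01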
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Blake wrote:

I'm sure it's very hard to write good error-handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no?

The ability to force a rollback to an older uberblock in order to regain access to the pool (in the case of a corrupt current uberblock) should be the ZFS developers' very top priority, IMO. I'd offer to do it myself, but I have nowhere near the ability to do so.

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
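As a strawman for what such a tool's interface might look like - the flag is entirely hypothetical, and the pool name is a placeholder:

# hypothetical: discard the damaged current uberblock and import
# the pool at the most recent older txg that passes verification
zpool import -F tank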
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Will Murnane wrote:

On Thu, Feb 12, 2009 at 20:05, Tim wrote:

Are you selectively ignoring responses to this thread or something? Dave has already stated he *HAS IT WORKING TODAY*.

No, I saw that post. However, I saw one unequivocal "it doesn't work" earlier (even if I can't show it to you), which implies to me that whether the card works in a particular setup is somewhat finicky. So here's one datapoint:

Dave wrote: Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard.

but the thread that Brandon linked to does not contain a datapoint. For what it's worth, I think these are the only two datapoints I've seen; most threads about this card end up debating back and forth whether it will work, with nobody actually buying and testing the card.

I can tell you that the USAS-L8i absolutely works fine with a Tyan 2927 in a Chenbro RM31616 3U rackmount chassis. In fact, I have two of the USAS-L8i in this chassis because I forgot that, unlike the 8-port AOC-SAT2-MV8, the USAS-L8i can support up to 122 drives. I have 8 drives connected to the first USAS-L8i. They are set up in a raidz2 and I get 90-120MB/sec read and 60-75MB/sec write during my rsyncs from Linux machines (this Solaris box is only used to store backup data). I plan on removing the second USAS-L8i and connecting all 16 drives to the first USAS-L8i when I need more storage capacity. I have no doubt that it will work as intended. I will report to the list otherwise.

-- Dave

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
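For completeness, the pool on that first card is nothing exotic; a sketch with placeholder device names:

# eight drives on the USAS-L8i in a single double-parity vdev
zpool create backup raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    c1t4d0 c1t5d0 c1t6d0 c1t7d0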
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Dave wrote:

Brent wrote: Does anyone know if this card will work in a standard PCI Express slot?

Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard. The AOC-SAT2-MV8 also works in a regular PCI slot (although it is a PCI-X card).

Please let the list know if you try the USASLP-L8i (the low-profile version of this card). The USAS-L8i only fits in a 3U or bigger rackmount chassis.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Brent wrote: Does anyone know if this card will work in a standard PCI Express slot?

Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard. The AOC-SAT2-MV8 also works in a regular PCI slot (although it is a PCI-X card).

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss