Re: [zfs-discuss] ZFS RaidZ recommendation
Eric Andersen wrote:
> I find Erik Trimble's statements regarding a 1 TB limit on drives to be a very bold statement. I don't have the knowledge or the inclination to argue the point, but I am betting that we will continue to see advances in storage technology on par with what we have seen in the past. If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name. Again, I emphasize that this assumption is not based on any sort of knowledge other than past experience with the ever-growing storage capacity of physical disks.

Why, thank you for recognizing my bold, God-like predictive powers. It comes from my obviously self-descriptive name, which means Powerful/Eternal Ruler. *wink*

Ahem. I'm not saying that hard drive manufacturers have (quite yet) hit their ability to increase storage densities - indeed, I do expect to see 4TB drives some time in the next couple of years. What I am saying is that it doesn't matter if areal densities continue to increase - we're at the point now with 1TB drives where the predictable hard error rate is only just below the level we can tolerate. That is, error rates (errors per X bits read/written) have dropped linearly over the past 3 decades, while densities are on a rather severe geometric increase, and data transfer rates have effectively stopped increasing at all.

What this means is that while you can build a higher-capacity disk, the time you can effectively use it is dropping (i.e. before it experiences a non-recoverable error and has to be replaced), and the time it takes to copy all the data from one drive to another is increasing. If X = (time to use) and Y = (time to copy off data), then when X < 2*Y, you're screwed. In fact, from an economic standpoint, when X < 100*Y, you're pretty much screwed. And 1TB drives are about the place where they can still just pass this test. 1.5TB drives and up aren't going to be able to pass it.

Everything I've said applies not only to 3.5" drives, but to 2.5" drives. It's a problem with the basic Winchester hard drive technology. We just get a bit more breathing space (maybe two technology cycles, which in the HD sector means about 3 years) with the 2.5" form factor. But even they are doomed shortly.

I got a pack of Bud with your name on it. :-)

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
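A back-of-envelope version of Erik's X-versus-Y test can be run with bc. The inputs are round numbers assumed for illustration (a 1TB drive, 100MB/s sustained transfer, and a 10^-14 unrecoverable-read-error rate, a vendor-typical spec-sheet figure), not figures from this thread:

    # Y: hours to copy the whole 1TB drive off at 100MB/s
    $ echo 'scale=1; (10^12 / (100 * 10^6)) / 3600' | bc
    2.7
    # TB you can expect to read before one unrecoverable error (10^14 bits / 8)
    $ echo 'scale=1; (10^14 / 8) / 10^12' | bc
    12.5

Under those assumptions you get roughly a dozen full-drive passes per expected unrecoverable error; doubling the capacity doubles Y while the error budget stays fixed, which is the squeeze being described.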
[zfs-discuss] Replacing disk in zfs pool
Hi all,

I need to replace a disk in a zfs pool on a production server (X4240 running Solaris 10) today and won't have access to my documentation there. That's why I would like to have a good plan on paper before driving to that location. :-)

The current tank pool looks as follows:

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0

errors: No known data errors

Note that disk c1t15d0 is in use and has taken over the duty of c1t6d0. c1t6d0 failed and was replaced with a new disk a couple of months ago. However, the new disk does not show up in /dev/rdsk and /dev/dsk. I was told that the disk has to be initialized first with the SCSI BIOS. I am going to do so today (reboot the server). Once the disk shows up in /dev/rdsk I am planning to do the following:

        zpool attach tank c1t7d0 c1t6d0

This hopefully gives me a three-way mirror:

          mirror     ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0

And then a

        zpool detach tank c1t15d0

to get c1t15d0 out of the mirror, to finally have

          mirror     ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0

again. Is that a good plan? I am then intending to do

        zpool add tank mirror c1t14d0 c1t15d0

to add another 146GB to the pool. Please let me know if I am missing anything. This is a production server. A failure of the pool would be fatal.

Thanks a lot,

Andreas
Re: [zfs-discuss] Replacing disk in zfs pool
On 9 apr 2010, at 10.58, Andreas Höschler wrote:

> I need to replace a disk in a zfs pool on a production server (X4240 running Solaris 10) today and won't have access to my documentation there. [zpool status output trimmed]
>
> Note that disk c1t15d0 is in use and has taken over the duty of c1t6d0. c1t6d0 failed and was replaced with a new disk a couple of months ago. However, the new disk does not show up in /dev/rdsk and /dev/dsk. I was told that the disk has to be initialized first with the SCSI BIOS. I am going to do so today (reboot the server). Once the disk shows up in /dev/rdsk I am planning to do the following:

I don't think that the BIOS-and-rebooting part ever has to be true, or at least I hope not. You shouldn't have to reboot just because you replace a hot-plug disk. Depending on the hardware and the state of your system, it might not be the problem at all, and rebooting may not help.

Are the device links for c1t6* gone in /dev/(r)dsk? Then someone must have run a devfsadm -C or something like that. You could try devfsadm -sv to see if it wants to (re)create any device links. If you think it looks good, run it with devfsadm -v.

If it is the HBA/raid controller acting up and not showing recently inserted drives, you should be able to talk to it with a program from within the OS: raidctl for some LSI HBAs, and arcconf for some Sun/StorageTek HBAs.

> zpool attach tank c1t7d0 c1t6d0
>
> This hopefully gives me a three-way mirror of c1t15d0, c1t7d0 and c1t6d0. And then a zpool detach tank c1t15d0 to get c1t15d0 out of the mirror, to finally have the two-way mirror of c1t6d0 and c1t7d0 again. Is that a good plan?
I believe so, and I tried it, as I don't actually do this very often by hand (only in my test shell scripts, which I currently run some dozens of times a day :-):

-bash-4.0$ pfexec zpool create tank mirror c3t5d0 c3t6d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0

errors: No known data errors
-bash-4.0$ pfexec zpool attach tank c3t6d0 c3t7d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr 9 11:30:13 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0  73.5K resilvered

errors: No known data errors
-bash-4.0$ pfexec zpool detach tank c3t5d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr 9 11:30:13 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0  73.5K resilvered

errors: No known data errors
-bash-4.0$

> I am then intending to do
>
> zpool add tank mirror c1t14d0 c1t15d0

I believe that too:

-bash-4.0$ pfexec zpool add tank mirror c3t1d0 c3t2d0
-bash-4.0$ zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Apr 9
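For the devfsadm suggestion above, the safe order of operations is a dry run first; nothing here changes the system until the second command:

    # dry run: report (verbosely) the device links devfsadm would create
    devfsadm -sv
    # if the proposed c1t6d0 links look right, create them for real
    devfsadm -v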
Re: [zfs-discuss] Replacing disk in zfs pool
Hi Ragnar,

>> I need to replace a disk in a zfs pool on a production server (X4240 running Solaris 10) today and won't have access to my documentation there. [...] I was told that the disk has to be initialized first with the SCSI BIOS. I am going to do so today (reboot the server).
>
> I don't think that the BIOS-and-rebooting part ever has to be true, or at least I hope not. You shouldn't have to reboot just because you replace a hot-plug disk.

Hard to believe! But that's the most recent state of affairs. Not even the Sun technician got the disk to show up in /dev/dsk. They have replaced it 3 times, assuming it to be defective! :-)

I tried to remotely reboot the server (with LOM) and go into the SCSI BIOS to initialize the disk, but the BIOS requires a key combination to initialize the disk that does not go through the remote connection (don't remember which one). That's why I am planning to drive to the remote location and do it manually with a server reboot and keyboard and screen attached, like in the very old days. :-(

> Depending on the hardware and the state of your system, it might not be the problem at all, and rebooting may not help. Are the device links for c1t6* gone in /dev/(r)dsk? Then someone must have run a devfsadm -C or something like that. You could try devfsadm -sv to see if it wants to (re)create any device links. If you think it looks good, run it with devfsadm -v. If it is the HBA/raid controller acting up and not showing recently inserted drives, you should be able to talk to it with a program from within the OS: raidctl for some LSI HBAs, and arcconf for some Sun/StorageTek HBAs.

I have /usr/sbin/raidctl on that machine and just studied the man page of this tool, but I couldn't find any hint on how to initialize a disk such as c1t16d0. It just talks about setting up RAID volumes!? :-(

Thanks a lot,

Andreas
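For what it's worth, on the LSI HBAs that raidctl does support, you can at least check whether the controller sees the new disk, even though raidctl has no operation for initializing a single drive. The controller number below is a placeholder, and the supported options vary between Solaris updates, so check raidctl(1M) on your box first:

    # list the RAID controllers raidctl recognizes
    raidctl -l
    # show the disks/volumes presented by a given controller, e.g. controller 1
    raidctl -l 1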
Re: [zfs-discuss] Replacing disk in zfs pool
On 04/ 9/10 08:58 PM, Andreas Höschler wrote:
> zpool attach tank c1t7d0 c1t6d0
>
> This hopefully gives me a three-way mirror [...] And then a zpool detach tank c1t15d0 to get c1t15d0 out of the mirror [...] Is that a good plan? I am then intending to do zpool add tank mirror c1t14d0 c1t15d0 to add another 146GB to the pool. Please let me know if I am missing anything.

That looks OK and safe.

> This is a production server. A failure of the pool would be fatal.

To whom??

-- 
Ian.
Re: [zfs-discuss] Replacing disk in zfs pool
On 9 apr 2010, at 12.04, Andreas Höschler wrote:
> Hard to believe! But that's the most recent state of affairs. Not even the Sun technician got the disk to show up in /dev/dsk. They have replaced it 3 times, assuming it to be defective! :-)
>
> I tried to remotely reboot the server (with LOM) and go into the SCSI BIOS to initialize the disk, but the BIOS requires a key combination to initialize the disk that does not go through the remote connection (don't remember which one). That's why I am planning to drive to the remote location and do it manually with a server reboot and keyboard and screen attached, like in the very old days. :-(

Yes, this is one of the many reasons why you shouldn't ever be forced to do anything in a non-booted state (like in a BIOS setup thing or the like). :-(

> I have /usr/sbin/raidctl on that machine and just studied the man page of this tool, but I couldn't find any hint on how to initialize a disk such as c1t16d0. It just talks about setting up RAID volumes!? :-(

If the HBA/raid controller really is the problem at all, it is probably that it wants you to tell it how it should present the disk to the computer (as part of a raid, as a jbod disk, etc. etc.). It could also be that it just wants you to initialize the disk for it, or that it sees that the disk has been used in another raid configuration before and wants you to acknowledge that you want to reinitialize it. Hopefully you can just slot the disk in, in a straight-through, auto-replace, jbod-like mode. But this might not even be the problem. What HBA/raid controller do you have?
(If you have a STK-RAID-INT or similar, chances are that it actually is the Adaptec/Intel thing, and you will have to get the software for it here: http://www.intel.com/support/go/sunraid.htm You can just download it and use .../cmdline/arcconf directly, no need to install anything.)

It may also be something with cfgadm, which you may have to use on some models (X4500, I believe) when you are replacing disks. I don't have one of those machines, and I haven't understood why you should have to use cfgadm on those systems either.

/ragge
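If it does turn out to be the Adaptec-based STK card, the downloaded arcconf can show whether the controller sees the replaced disk at all. The controller number 1 below is an assumption; use whatever arcconf reports for your box:

    # dump controller, logical-device and physical-device status for controller 1
    ./arcconf getconfig 1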
[zfs-discuss] Snv_126 Kernel PF Panic
Hey All,

I'm having some issues with a snv_126 file server running on an HP ML370 G6 server with an Adaptec RAID card (31605). The server has the rpool, plus two raidz2 data pools (1.5TB and 1.0TB respectively). I have been using e-sata to back up the pools to a pool that contains 3x 1.5TB drives every week. This has all worked great for the last 4 or so months.

Starting last week, the machine would panic and reboot when attempting to perform a backup. This week, the machine has been randomly rebooting every 3-15 hours (with or without the backup pool attached), complaining of:

(#pf Page fault) rp=ff0010568eb0 addr=30 occurred in module zfs due to a NULL pointer dereference

I use cron to perform a scrub of all pools every night, and there have been no errors whatsoever. Below is the output from mdb $C on the core dump:

rcher...@stubborn2:/var/crash/Stubborn2$ mdb 0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp rootnex scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci fctl md lofs fcip fcp cpc random crypto smbsrv nfs logindmux ptm ufs nsmb sppp ipc ]
> $C
ff000f4ef3b0 vdev_is_dead+0xc(0)
ff000f4ef3d0 vdev_readable+0x16(0)
ff000f4ef410 vdev_mirror_child_select+0x61(ff02fa41da10)
ff000f4ef450 vdev_mirror_io_start+0xda(ff02fa41da10)
ff000f4ef490 zio_vdev_io_start+0x1ba(ff02fa41da10)
ff000f4ef4c0 zio_execute+0xa0(ff02fa41da10)
ff000f4ef4e0 zio_nowait+0x42(ff02fa41da10)
ff000f4ef580 arc_read_nolock+0x82d(0, ff02d716b000, ff02e3fdc000, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670)
ff000f4ef620 arc_read+0x75(0, ff02d716b000, ff02e3fdc000, ff02e3a7f928, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670)
ff000f4ef6c0 dbuf_prefetch+0x131(ff02e3a80018, 20)
ff000f4ef710 dmu_zfetch_fetch+0xa8(ff02e3a80018, 20, 1)
ff000f4ef750 dmu_zfetch_dofetch+0xb8(ff02e3a80278, ff02f4c52868)
ff000f4ef7b0 dmu_zfetch_find+0x436(ff02e3a80278, ff000f4ef7c0, 1)
ff000f4ef870 dmu_zfetch+0xac(ff02e3a80278, 2b, 4000, 1)
ff000f4ef8d0 dbuf_read+0x170(ff02f3d8ea00, 0, 2)
ff000f4ef950 dnode_hold_impl+0xed(ff02e2a2f040, 1591, 1, ff02e4e71478, ff000f4ef998)
ff000f4ef980 dnode_hold+0x2b(ff02e2a2f040, 1591, ff02e4e71478, ff000f4ef998)
ff000f4ef9e0 dmu_tx_hold_object_impl+0x4a(ff02e4e71478, ff02e2a2f040, 1591, 2, 0, 0)
ff000f4efa00 dmu_tx_hold_bonus+0x2a(ff02e4e71478, 1591)
ff000f4efa50 zfs_inactive+0x99(ff030213ae80, ff02d4ed6d88, 0)
ff000f4efaa0 fop_inactive+0xaf(ff030213ae80, ff02d4ed6d88, 0)
ff000f4efac0 vn_rele+0x5f(ff030213ae80)
ff000f4efae0 smb_node_free+0x7d(ff02f098b2a0)
ff000f4efb10 smb_node_release+0x9a(ff02f098b2a0)
ff000f4efb30 smb_ofile_delete+0x76(ff03026d5d18)
ff000f4efb60 smb_ofile_release+0x84(ff03026d5d18)
ff000f4efb80 smb_request_free+0x23(ff02fa4b0058)
ff000f4efbb0 smb_session_worker+0x6e(ff02fa4b0058)
ff000f4efc40 taskq_d_thread+0xb1(ff02e51b9e90)
ff000f4efc50 thread_start+8()

I can provide any other info that may be needed. Thank you in advance for your help!

Rob

-- 
Rob Cherveny
Manager of Information Technology
American Junior Golf Association
770.868.4200
Re: [zfs-discuss] Replacing disk in zfs pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Andreas Höschler
>
>> I don't think that the BIOS and rebooting part ever has to be true, or at least I hope not. You shouldn't have to reboot just because you replace a hot-plug disk.
>
> Hard to believe! But that's the most recent state of affairs. Not even the Sun technician got the disk to show up in /dev/dsk. They have replaced it 3 times, assuming it to be defective! :-)

I recently went through an exercise very similar to this on an x4275. I also tried to configure the HBA via the ILOM but couldn't find any way to do it. I also thought about shutting down the system, but never did that. I couldn't believe the Sun support tech didn't know (and took days to figure out) how to identify or configure the RAID HBA card installed, and identify the correct HBA configuration software.

In my case (probably different for you) I have a StorageTek, and the software is located here:
http://www.intel.com/support/motherboards/server/sunraid/index.htm
The manual is located here:
http://docs.sun.com/source/820-1177-13/index.html

I don't know how to identify what card is installed in your system. All the usual techniques (/var/adm/messages and prtdiag and prtconf) give me nothing that I can see identifies my StorageTek.

Once you have the RAID reconfiguration software ... I had to initialize the disk (although it was already initialized, it was incorrect) and I had to make a simple volume on that disk. Then it appeared as a device, reported by format.

Just like you, I had a scheduled downtime window, and I attempted to do all the above during that window. It was not necessary. I prepared in advance by using a different system, adding and removing disks. On the other system (which had no HBA) I needed to use the commands

devfsadm -Cv
cfgadm -al

So you may need those.

The first support guy I talked to said the RAID configuration utility for the HBA was raidctl, which seems to be built into every system, but I don't think that's accurate. I am not aware of any situation where that is useful; but who knows, it might be for you.
Re: [zfs-discuss] Replacing disk in zfs pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I don't know how to identify what card is installed in your system.

Actually, this is useful:

prtpicl -v | less

Search for RAID. On my system, I get this snippet (out of 3723 lines of output):

:DeviceID              0
:UnitAddress           13
:pci-msi-capid-pointer 0xa0
:device-id             0x285
:vendor-id             0x9005
:revision-id           0x9
:class-code            0x10400
:unit-address          0
:subsystem-id          0x286
:subsystem-vendor-id   0x108e
:interrupts            0x1
:devsel-speed          0
:power-consumption     01 00 00 00 01 00 00 00
:model                 RAID controller

According to this page
http://kb.qlogic.com/KanisaPlatform/Publishing/130/10441_f.html
the important information is:

:device-id             0x285
:vendor-id             0x9005
:subsystem-id          0x286
:subsystem-vendor-id   0x108e

Now ... if you have a device-id and vendor-id which are not listed on that qlogic page (mine is not), then how do you look up your product based on this information? And once you know the model of HBA you have, how do you locate the driver configuration utility for that card?

My advice is put some ownage on your Sun support tech.
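A quick way to pull just those ID lines out of the 3700-odd lines of output, assuming GNU grep (/usr/gnu/bin/grep on OpenSolaris) for the -B option:

    # print the 20 properties preceding any 'RAID controller' model string,
    # then keep only the PCI ID lines
    prtpicl -v | /usr/gnu/bin/grep -i -B20 'RAID controller' | egrep 'device-id|vendor-id|subsystem'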
[zfs-discuss] backup pool
Hi all,

I want to back up a pool called mpool. I want to do this by doing a zfs send of an mpool snapshot and receiving it into a different pool called bpool, all on the same machine. I'm sharing various filesystems via zfs sharenfs and sharesmb.

Sending and receiving the entire pool works as expected, including incremental updates. But after exporting and importing bpool, all shares get activated. All nfs shares get duplicated, albeit with a different root. But the cifs shares really get duplicated: looking at the output from sharemgr, the share from bpool, which got mounted last, takes precedence over the real share.

What I want is a second pool which is a copy of the first, including all properties. I don't want to turn off sharing by setting sharenfs and sharesmb to off, because when I need to restore the pool I would also need to set all the sharing properties again.

Currently I use the following strategy:

# zpool create -m none -O canmount=noauto bpool c5t15d0 c5t16d0
# zfs snapshot -r tp...@00
# zfs send -R tp...@00 | zfs recv -vFud bpool
# zfs set canmount=noauto [each filesystem in bpool]
# zpool export bpool
# zpool import bpool

After the import of bpool there are no extra shares in sharemgr, and all properties are still intact except the canmount property.

Can I either send or receive the canmount=noauto property? (PSARC/2009/510) I know that I need at least version 22 for that. I tried it on a b134 with version 22 pools but couldn't get it to work. How can I prevent mounting filesystems during zpool import? I know how to mount on a different root, but that doesn't solve my problem. Why can't the canmount zfs property be inherited?

Any suggestion and/or strategy to accomplish this will be more than welcome.

Thank you for your interest and time,

Frederik
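The bracketed step above ("each filesystem in bpool") can be scripted; a minimal sketch, assuming every dataset under bpool should get the property:

    # set canmount=noauto on bpool and everything beneath it
    for fs in $(zfs list -H -o name -r bpool); do
        zfs set canmount=noauto "$fs"
    done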
Re: [zfs-discuss] Replacing disk in zfs pool
On 9 apr 2010, at 14.17, Edward Ned Harvey wrote:
> I recently went through an exercise very similar to this on an x4275. I also tried to configure the HBA via the ILOM but couldn't find any way to do it.

Oh no, this is a BIOS system. The card is an autonomous entity that lives a life of its own, and can barely be communicated with or supervised. :-( You either have to set the card up under a hook in the BIOS boot dialog, or with special proprietary software from the operating system that may or may not work with your installation. (Some (or all?) Areca cards have ethernet ports so that you can talk to the card directly. :-)

You can do it with ILOM under the BIOS boot sequence.

> I also thought about shutting down the system, but never did that. I couldn't believe the Sun support tech didn't know (and took days to figure out) how to identify or configure the RAID HBA card installed, and identify the correct HBA configuration software.

Well, maybe he wasn't used to systems like this, and thought that the system design would be a little coherent, integrated and sane? :-)

> The first support guy I talked to said the RAID configuration utility for the HBA was raidctl, which seems to be built into every system, but I don't think that's accurate. I am not aware of any situation where that is useful; but who knows, it might be for you.

raidctl is for LSI cards with LSI1020/1030/1064/1068 controllers only.

/ragge
Re: [zfs-discuss] ZFS RaidZ recommendation
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Eric Andersen
>
> I back up my pool to 2 external 2TB drives that are simply striped, using zfs send/receive followed by a scrub. As of right now, I only have 1.58TB of actual data. ZFS send over USB 2.0 capped out at 27MB/s. The scrub for 1.5TB of backup data on the USB drives took roughly 14 hours. As needed, I'll destroy the backup pool and add more drives. I looked at a lot of different options for external backup, and decided to go with cheap (USB).

I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub.

You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s.

I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris. The machine crashes, you have to power cycle, boot up in failsafe mode, import the pool(s) and then reboot once normal.

I am wondering, how long have you been doing what you're doing? Do you leave your drives connected all the time? Have you seen similar reliability issues? What external hardware are you using?

I started doing this on one system (via eSATA) about a year ago. It worked flawlessly for about 4 months before the disk started crashing. I started doing it on another system (via USB) about 6 months ago. It just started crashing a couple of weeks ago. I am now in the market to try and identify any *well made* external enclosures. The best I've seen so far is the Dell RD1000, but we're talking crazy overpriced, and hard drives that are too small to be useful to me.

> If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name.

I solemnly pledge to do it anyway. And why wait? ;-)
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
ONStor sells a ZFS based machine:
http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1354658,00.html
It seems more like FreeNAS or something?
Re: [zfs-discuss] ZFS RaidZ recommendation
No idea about the build quality, but is this the sort of thing you're looking for?

Not cheap, integrated RAID (sigh), but one cable only:
http://www.pc-pitstop.com/das/fit-500.asp

Cheap, simple, 4 eSATA connections on one box:
http://www.pc-pitstop.com/sata_enclosures/scsat4eb.asp

Still cheap, uses 4x SFF-8470 for a single-cable connection:
http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

Slightly more expensive, but an integrated port multiplier means only one standard eSATA cable is required:
http://www.pc-pitstop.com/sata_port_multipliers/scsat05b.asp

On 9 Apr 2010, at 15:14, Edward Ned Harvey wrote:
> I am now in the market to try and identify any *well made* external enclosures. The best I've seen so far is the Dell RD1000, but we're talking crazy overpriced, and hard drives that are too small to be useful to me.
Re: [zfs-discuss] backup pool
Use the -u option on the receiving side. From the zfs(1M) man page:

  -u   File system that is associated with the received stream is not mounted.

NB this works for root pools, too.
 -- richard

On Apr 9, 2010, at 5:33 AM, F. Wessels wrote:
> I want to back up a pool called mpool by doing a zfs send of an mpool snapshot and receiving it into a different pool called bpool. [...] After exporting and importing bpool, all shares get activated. [...] How can I prevent mounting filesystems during zpool import?
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
On Apr 9, 2010, at 7:07 AM, Orvar Korvar wrote:
> ONStor sells a ZFS based machine:
> http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1354658,00.html
> It seems more like FreeNAS or something?

It doesn't look like a ZFS-based product... too many limitations. Also, LSI bought the company last year.
http://www.lsi.com/storage_home/products_home/nas_gateways/index.html
 -- richard
Re: [zfs-discuss] ZFS RaidZ recommendation
You may be absolutely right. CPU clock frequency certainly has hit a wall at around 4GHz. However, this hasn't stopped CPUs from getting progressively faster. I know this is mixing apples and oranges, but my point is that no matter what limits or barriers computing technology hits, someone comes along and finds a way to engineer around them. I have no idea what storage technology will look like years from now, but I will be very surprised if the limitations you've listed have held back advances in storage devices. No idea what those devices will look like or how they'll work. If someone had told me roughly 10 years ago that I would be using multi-core processors at the same clock speed as my Pentium 4, I would probably have scoffed at the idea. Here we are. I'm a drinker, not a prophet ;-)

Like I said, I've built my system planning to upgrade with bigger-capacity drives when I start running out of space, rather than adding more drives. This is almost certainly unrealistic. I've always built my systems around planned upgradeability, but whenever it does come time for an upgrade, it never makes sense to do so. It's usually much more cost effective to just build a new system with newer and better technology. It should take me a long while to fill up 9TB, but there was a time when I thought a single gigabyte was a ridiculous amount of storage too.

Eric

On Apr 8, 2010, at 11:21 PM, Erik Trimble wrote:
> Why thank you for recognizing my bold, God-like predictive powers. [...] What I am saying is that it doesn't matter if areal densities continue to increase - we're at the point now with 1TB drives where the predictable hard error rate is only just below the level we can tolerate. [...] If X = (time to use) and Y = (time to copy off data), then when X < 2*Y, you're screwed. In fact, from an economic standpoint, when X < 100*Y, you're pretty much screwed. And 1TB drives are about the place where they can still just pass this test. 1.5TB drives and up aren't going to be able to pass it. [...]
> I got a pack of Bud with your name on it. :-)
Re: [zfs-discuss] ZFS RaidZ recommendation
> I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub.
>
> You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s.

I did strongly consider going with eSATA. What I really wanted to use was FireWire 800, as it is reasonably fast and the ability to daisy-chain devices is very appealing, but some of the stuff I've read regarding the state of OpenSolaris FireWire drivers scared me off. I decided against eSATA because I don't have any eSATA ports. I could buy a controller or run SATA-to-eSATA cables off the four available onboard ports, but either way, when/if I run out of ports, that's it. With USB, I can always use a hub if needed (at even slower speeds). If OpenSolaris supported SATA port multipliers, I'd have definitely gone with eSATA.

The speed issue isn't really critical to me, especially if I'm doing incremental send/receives. Recovering my data from backup will be a drag, but it is what it is. I decided cheap and simple was best, and went with USB.

> I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris.

This is what I've overwhelmingly heard as well. Most people point to the controllers in the enclosures. If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route. For cost and simplicity it's hard to beat externals.

> I am wondering, how long have you been doing what you're doing? Do you leave your drives connected all the time? Have you seen similar reliability issues? What external hardware are you using?

Not long (1 week), so I'm just getting started. I don't leave the drives connected. Plug them in, do a backup, zpool export, unplug and throw them in my safe. It's far from great, but it beats what I had before (nothing). I plan to do an incremental zfs send/receive every 2-4 weeks depending on how much new data I have. I can't attest to any sort of reliability as I've only been at it for a very short period of time.

I am using 2TB WD Elements drives (cheap). This particular model (WDBAAU0020HBK-NESN) hasn't been on the market terribly long. There is one review on Newegg of someone having issues with one from the start. It sucks, but I think the reality is that it's pretty much a crapshoot when it comes to the reliability of external drives/enclosures.

> I started doing this on one system (via eSATA) about a year ago. It worked flawlessly for about 4 months before the disk started crashing. I started doing it on another system (via USB) about 6 months ago. It just started crashing a couple of weeks ago. I am now in the market to try and identify any *well made* external enclosures.

If you find something good, please let me know. There are a lot of different solutions for a lot of different scenarios and price points. I went with cheap. I won't be terribly surprised if these drives end up flaking out on me. You usually get what you pay for. What I have isn't great, but it's better than nothing. Hopefully, I'll never need to recover data from them. If they end up proving to be too unreliable, I'll have to look at other options.

Eric
[zfs-discuss] zfs send hangs
My zfs filesystem hangs when transferring large filesystems (>500GB) with a couple dozen snapshots between servers using zfs send/receive with netcat. The transfer hangs about halfway through and is unkillable, freezing all IO to the filesystem and requiring a hard reboot. I have attempted this three times and failed every time.

On the destination server I use:

nc -l -p 8023 | zfs receive -vd sas

On the source server I use:

zfs send -vR promise1/rbac...@daily.1 | nc mothra 8023

The filesystems on both servers are the same (zfs version 3). The source zpool is version 22 (build 129), and the destination zpool is version 14 (build 111b).

Rsync does not have this problem and performs extremely well. However, it will not transfer snapshots. Two other send/receives (234GB and 451GB) between the same servers have worked fine without hanging.

Thanks,

Daniel Bakken
Re: [zfs-discuss] backup pool
try again...

On Apr 9, 2010, at 5:33 AM, F. Wessels wrote:
> What I want is a second pool which is a copy of the first, including all properties. I don't want to turn off sharing by setting sharenfs and sharesmb to off, because when I need to restore the pool I would also need to set all the sharing properties again.

I'll challenge this notion. Re-sharing the copy is a disaster recovery scenario, not a restore scenario. There should be no case where you want to share both copies simultaneously to the same client, because then your copies diverge and you lose the original-to-backup relationship.

> Currently I use the following strategy:
> # zpool create -m none -O canmount=noauto bpool c5t15d0 c5t16d0
> # zfs snapshot -r tp...@00
> # zfs send -R tp...@00 | zfs recv -vFud bpool
> # zfs set canmount=noauto [each filesystem in bpool]

Instead do

zfs set sharesmb=off
zfs set sharenfs=off

All property settings are recorded in the zpool history, so you can't lose the settings for sharenfs or sharesmb.

> Can I either send or receive the canmount=noauto property? (PSARC/2009/510) I know that I need at least version 22 for that. I tried it on a b134 with version 22 pools but couldn't get it to work.

I do not know of a method for injecting property changes into a send stream. This might be an interesting RFE, but I fear the HCI for such a feature is a bigger problem.

> How can I prevent mounting filesystems during zpool import? I know how to mount on a different root, but that doesn't solve my problem. Why can't the canmount zfs property be inherited?

I don't see any definitive statement in the ARC case logs. However, I believe that trying to teach people how to zfs create -o canmount=noauto is far more difficult than teaching how to set canmount on an existing file system.

> Any suggestion and/or strategy to accomplish this will be more than welcome.

I have reservations about using zfs send -R because it rarely suits my needs. While it appears to save keystrokes, it makes policy management more difficult.
 -- richard
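On the zpool-history point: the recorded settings can be reviewed later with, for example:

    # -l (long format) adds timestamps plus the user and host for each change
    zpool history -l bpool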
Re: [zfs-discuss] zfs send hangs
On Fri, April 9, 2010 13:20, Daniel Bakken wrote:
> My zfs filesystem hangs when transferring large filesystems (>500GB) with a couple dozen snapshots between servers using zfs send/receive with netcat. The transfer hangs about halfway through and is unkillable, freezing all IO to the filesystem and requiring a hard reboot. I have attempted this three times and failed every time.
>
> On the destination server I use:
> nc -l -p 8023 | zfs receive -vd sas
> On the source server I use:
> zfs send -vR promise1/rbac...@daily.1 | nc mothra 8023

I have problems using incremental replication streams that sound similar (hangs, IO system disruption). I'm on build 111b, that is, 2009.06. I'm hoping things will clear up when 2010.$Spring comes out, which should be soon. Your data point is not helping my confidence there, though!

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
[zfs-discuss] L2ARC L2_Size kstat fluctuate
Hi all

I ran an OLTP Filebench workload. I set:

ARC max size = 2GB
L2ARC SSD device size = 32GB
working set (dataset) = 10GB, 10 files, 1GB each

After running the workload for 6 hours and monitoring kstat, I noticed that l2_size from kstat reached 10GB, which is great. However, l2_size then started to drop all the way to 7GB, which means the workload will go back to the HDD to retrieve some data that is no longer on the L2ARC device.

I understand that the L2ARC size reflected by zpool iostat is much larger because of COW, and that l2_size from kstat is the actual size of the L2ARC data. So can anyone tell me why I am losing my working set from the l2_size actual data!!!

Here is a copy of my kstat l2_size samples:

l2_size 56832 l2_size 328063488 l2_size 779794944 l2_size 1354787328 l2_size 1930713600 l2_size 2455841280 l2_size 2968873472 l2_size 3490916864 l2_size 3973593600 l2_size 4464867840 l2_size 4936317440 l2_size 5397862912 l2_size 5798283776 l2_size 6284609536 l2_size 6719334400 l2_size 7115446784 l2_size 747960 l2_size 7824894464 l2_size 8199109120 l2_size 8547932672 l2_size 8882055680 l2_size 9143912960 l2_size 9405434368 l2_size 9589115392 l2_size 9793055232 l2_size 9947593216 l2_size 10077579776 l2_size 10177542656 l2_size 10236250624 l2_size 10363714048 l2_size 10405505536 l2_size 9461303808 l2_size 9211787776 l2_size 8871764480 l2_size 8693268992 l2_size 8734097920 l2_size 8538903040 l2_size 8259551744 l2_size 7984349696 l2_size 7858135552 l2_size 7729111552 l2_size 7832486400 l2_size 7676416512 l2_size 7613940224 l2_size 7503409664 l2_size 7400632832 l2_size 7296352768 l2_size 7234888192 l2_size 7274947072 l2_size 7197770240 l2_size 7367848448 l2_size 7386595840 l2_size 7368700416 l2_size 7402328576 l2_size 7281926656 l2_size 7201276416 l2_size 7230919168 l2_size 7558078976 l2_size 7546552832 l2_size 7368802816 l2_size 7312437248 l2_size 7202963456 l2_size 7373578240 l2_size 7438184448 l2_size 7240036352 l2_size 7408721920 l2_size 7306350592 l2_size 7216246784 l2_size 7517110272 l2_size 7336427520 l2_size 7386693632 l2_size 7367741440 l2_size 7457832960 l2_size 7296126976 l2_size 7176265728 l2_size 6986084352 l2_size 7133356032 l2_size 7126814720 l2_size 7047786496 l2_size 7396147200 l2_size 7543431168 l2_size 7586426880 l2_size 7466901504
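For reference, samples like the above can be taken straight from the live kernel statistic; this prints the L2ARC size in bytes once a second:

    kstat -p zfs:0:arcstats:l2_size 1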
[zfs-discuss] about backup and mirrored pools
When I started using zfs a while back, I got the impression that setting my home server up with mirror sets rather than some kind of raidz would offer the most reliable setup for my data. My data is just what you'd expect on a home LAN... no real commercial value involved. I've since created 2 zpools beyond rpool, each with a single mirror set.

I happened to notice someone's config posted here recently where a single zpool was made up of several mirror sets:

  From: Andreas Höschler ahoe...@smartsoft.de
  Subject: Replacing disk in zfs pool
  Newsgroups: gmane.os.solaris.opensolaris.zfs
  To: zfs-discuss@opensolaris.org
  Date: Fri, 9 Apr 2010 10:58:16 +0200
  Message-ID: 099e714d-43b6-11df-83fb-000393ca0...@smartsoft.de

I hadn't even thought of such a setup, but wonder now if that would have been a better way to go. My needs are small, and the zfs server acts mostly as NAS for the home LAN.

I've been thinking of the mirrors on the zfs server as the final stopping place for my backups. I'm thinking the mirrors are reliable enough that I don't do even more backups of the backup zpools, other than auto snapshots. I'm thinking a crippled mirror can be recovered rather than needing a backup of it, and that short of 2 mirrored disks dying at the same time, I'm in pretty good shape. Am I way wrong on this?

Further, I'm curious if it would make more versatile use of the space if I were to put the mirrored pairs into one big pool containing 3 mirrored pairs (6 discs). Where they had been separate pools, one might fill up while another stayed fairly empty; if they were all in a single pool, none would fill up until they all filled up.
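For concreteness, the single-pool-of-mirrors layout being asked about would be built like this (device names are placeholders):

    # one pool made of three mirrored pairs; free space is shared across all three
    zpool create tank mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0
    # an existing pool can also be grown one mirrored pair at a time
    zpool add tank mirror c1t8d0 c1t9d0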
Re: [zfs-discuss] zfs send hangs
I had some issues with direct send/receives myself. In the end I elected to send to a gz file, then scp that file across and receive from the file on the other side. This has been working fine 3 times a day for about 6 months now. Two sets of systems are doing this so far, one set running b111b and one set running b133.
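A sketch of that file-based approach, with placeholder pool, snapshot, host and path names:

    # on the source: dump the replication stream to a compressed file
    zfs send -R sourcepool/fs@snap | gzip > /var/tmp/fs-snap.zfs.gz
    scp /var/tmp/fs-snap.zfs.gz desthost:/var/tmp/
    # on the destination: receive from the file
    gzcat /var/tmp/fs-snap.zfs.gz | zfs receive -vd destpool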
Re: [zfs-discuss] about backup and mirrored pools
On Fri, April 9, 2010 14:38, Harry Putnam wrote:
> I happened to notice someone's config posted here recently where a single zpool was made up of several mirror sets. [...] I hadn't even thought of such a setup, but wonder now if that would have been a better way to go.

Probably, unless you need different performance out of the two, or something.

> My needs are small, and the zfs server acts mostly as NAS for the home LAN.

That's the job mine does; keeping all those photos, and a little music.

> I've been thinking of the mirrors on the zfs server as the final stopping place for my backups. [...] I'm thinking a crippled mirror can be recovered rather than needing a backup of it, and that short of 2 mirrored disks dying at the same time, I'm in pretty good shape. Am I way wrong on this?

Well, my own thinking doesn't consider that adequate for my own data; which is not identical to thinking you're actually wrong, of course. Issues I see include: flood, fire, foes, bugs, user error. rm -rf / will destroy your data just as well on the mirror as on a single disk, as will hacker break-ins. OS and driver bugs can corrupt both sides of the mirror. And burning your house down, or flooding it perhaps (depending on where your server is; mine's in the basement, so if we flood, it gets wet), will destroy your data. I make and keep off-site backups, formerly on optical media, moving towards external disk drives.

> So where they had been separate pools, one might fill up while another stayed fairly empty; if they were all in a single pool, none would fill up until they all filled up.

Yes, that's the advantage. I'm running three mirror vdevs in one data pool.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Re: [zfs-discuss] about backup and mirrored pools
Mirrored sets do protect against disk failure, but most of the time you'll find proper backups are better, as most issues are more on the order of "oops" than "blowed up, sir". Perhaps mirrored sets with daily snapshots and a knowledge of how to mount snapshots as clones, so that you can pull a copy of that file you deleted 3 days ago. :)

If you're especially paranoid, a 3-way mirror set with copies set to 2. =)
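Both suggestions in command form, with placeholder dataset and snapshot names:

    # pull a deleted file straight out of a snapshot via the hidden .zfs directory
    cp /tank/home/.zfs/snapshot/daily-3/lostfile /tank/home/
    # or make a writable clone of the snapshot and browse it
    zfs clone tank/home@daily-3 tank/recovered
    # the paranoid variant: a three-way mirror with two copies of every block
    zpool create safe mirror c1t1d0 c1t2d0 c1t3d0
    zfs set copies=2 safe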
Re: [zfs-discuss] L2ARC L2_Size kstat fluctuate
On 09 April, 2010 - Abdullah Al-Dahlawi sent me these 27K bytes:
> I ran an OLTP Filebench workload [...] After running the workload for 6 hours and monitoring kstat, I noticed that l2_size from kstat reached 10GB, which is great. However, l2_size then started to drop all the way to 7GB [...] So can anyone tell me why I am losing my working set from the l2_size actual data!!!

Maybe the data in the l2arc was invalidated, because the original data was rewritten?

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] L2ARC L2_Size kstat fluctuate
Hi Tomas

I understand from a previous post
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg36914.html
that if the data gets invalidated, the l2arc size shown by zpool iostat is the one that changes (always growing, because of COW), not the actual size shown by kstat, which represents the size of the up-to-date data in l2arc.

My only conclusion for this fluctuation in kstat l2_size is that data has indeed been invalidated and did not make it back to l2arc from the tail of the ARC. Am I right?

On Fri, Apr 9, 2010 at 4:33 PM, Tomas Ögren st...@acc.umu.se wrote:
> Maybe the data in the l2arc was invalidated, because the original data was rewritten?

-- 
Abdullah Al-Dahlawi
PhD Candidate
George Washington University
Department of Electrical & Computer Engineering

Check The Fastest 500 Super Computers Worldwide
http://www.top500.org/list/2009/11/100
Re: [zfs-discuss] ZFS RaidZ recommendation
On Fri, Apr 09, 2010 at 10:21:08AM -0700, Eric Andersen wrote: If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route. I'm tending to like bare drives. If you have the chassis space, there are 5-in-3 bays that don't need extra drive carriers; they just slot a bare 3.5" drive. See e.g. http://www.newegg.com/Product/Product.aspx?Item=N82E16817994077 A 5-way raidz backup pool would be quite useful. Otherwise, there are eSATA docking stations for 1 or 2 drives. Overall, it's cheap and you're far more in control of the unknowns of controllers and chips. Then there are simple boxes to protect the drives in storage/transport, ranging from little silicone sleeves to 5-way hard plastic boxes. If we still are capped out at 2TB as the limit for a physical device in 2 years, I solemnly pledge now that I will drink a six-pack of beer in his name. I solemnly pledge to do it anyway. And why wait? ;-) +6 -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send hangs
On 04/10/10 06:20 AM, Daniel Bakken wrote: My zfs filesystem hangs when transferring large filesystems (500GB) with a couple dozen snapshots between servers using zfs send/receive with netcat. The transfer hangs about halfway through and is unkillable, freezing all IO to the filesystem and requiring a hard reboot. I have attempted this three times and failed every time. On the destination server I use: nc -l -p 8023 | zfs receive -vd sas On the source server I use: zfs send -vR promise1/rbac...@daily.1 | nc mothra 8023 The filesystems on both servers are the same (zfs version 3). The source zpool is version 22 (build 129), and the destination server is version 14 (build 111b). Consider upgrading. I used to see issues like this on Solaris before update 8 (which uses version 15). -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
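A sketch of checking what's in play before upgrading (note that pool upgrades are one-way; older builds cannot import an upgraded pool):

# List the ZFS and pool versions this build supports:
zfs upgrade -v
zpool upgrade -v

# List filesystems and pools still running older versions:
zfs upgrade
zpool upgrade

# Upgrade everything on the host (irreversible):
zpool upgrade -a
zfs upgrade -a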
Re: [zfs-discuss] ZFS RaidZ recommendation
On Fri, Apr 9, 2010 at 6:14 AM, Edward Ned Harvey solar...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Eric Andersen I back up my pool to 2 external 2TB drives that are simply striped, using zfs send/receive followed by a scrub. As of right now, I only have 1.58TB of actual data. ZFS send over USB 2.0 capped out at 27MB/s. The scrub for 1.5TB of backup data on the USB drives took roughly 14 hours. I'll destroy the backup pool and add more drives as needed. I looked at a lot of different options for external backup, and decided to go with cheap (USB). I am doing something very similar. I back up to external USB drives, which I leave connected to the server for days at a time ... zfs send followed by scrub. You might want to consider eSATA instead of USB. Just a suggestion. You should be able to go about 4x-6x faster than 27MB/s. I have found external enclosures to be unreliable. For whatever reason, they commonly just flake out and have to be power cycled. This is unfortunately disastrous to solaris/opensolaris. The machine crashes, you have to power cycle, boot up in failsafe mode, import the pool(s) and then reboot once normal. I think your best bet for an external enclosure is to use a real chassis, like a Supermicro with a SAS backplane or similar. Your local whitebox seller (or Newegg, or Silicon Mechanics) should be able to sell something like this. Sans Digital makes a few 4- and 8-drive cases that (for the money) look like they may not suck, with eSATA, USB or SAS connections. $300 for an 8-drive eSATA/PMP chassis, $400 for 8-drive SAS/SATA. I haven't used them, but from the specs they look not horrible. http://www.newegg.com/Product/Product.aspx?Item=N82E16816111071 http://www.newegg.com/Product/Product.aspx?Item=N82E16816111092 -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about backup and mirrored pools
On Fri, 9 Apr 2010, Harry Putnam wrote: Am I way wrong on this? And further, I'm curious if it would make more versatile use of the space if I were to put the mirrored pairs into one big pool containing 3 mirrored pairs (6 disks). Besides more versatile use of the space, you would get 3X the performance. Luckily, since you are using mirrors, you can easily migrate disks from your existing extra pools to the coalesced pool. Just make sure to scrub first in order to have confidence that there won't be data loss. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
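One conservative sketch of such a migration for a single pair (pool and device names are hypothetical; this empties and retires the old pool rather than splitting its mirror, trading speed for safety):

# Confirm the source pool is healthy first:
zpool scrub oldpool
zpool status oldpool

# Copy everything into the big pool, then retire the old one:
zfs snapshot -r oldpool@move
zfs send -R oldpool@move | zfs receive -d bigpool
zpool destroy oldpool

# Re-add the freed pair as another top-level mirror vdev:
zpool add bigpool mirror c2t0d0 c2t1d0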
Re: [zfs-discuss] L2ARC L2_Size kstat fluctuate
On 09 April, 2010 - Abdullah Al-Dahlawi sent me these 5,3K bytes: Hi Tomas, I understand from a previous post http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg36914.html that if the data gets invalidated, the L2ARC size that is shown by zpool iostat is the one that changes (always growing because of COW), not the actual size shown by kstat, which represents the size of the up-to-date data in the L2ARC. My only conclusion for this fluctuation in kstat l2_size is that data has indeed been invalidated and did not make it back to the L2ARC from the tail of the ARC. Am I right? Sounds plausible. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
Now that Erik has made me all nervous about my 3x RAIDZ2 of 8x 2TB 7200RPM disks approach, I'm considering moving forward using more, smaller 2.5" disks instead. The problem is that at eight drives per LSI 3018, I run out of PCIe slots quickly. The ARC-1680 cards would appear to offer greater drive densities, but a quick Google search shows that they've overpromised and underdelivered on Solaris support in the past. Is anybody currently using those cards on OpenSolaris? -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Sat, Apr 10 at 7:22, Daniel Carosone wrote: On Fri, Apr 09, 2010 at 10:21:08AM -0700, Eric Andersen wrote: If I could find a reasonable backup method that avoided external enclosures altogether, I would take that route. I'm tending to like bare drives. If you have the chassis space, there are 5-in-3 bays that don't need extra drive carriers; they just slot a bare 3.5" drive. See e.g. http://www.newegg.com/Product/Product.aspx?Item=N82E16817994077 I have a few of the 3-in-2 versions of that same enclosure from the same manufacturer, and they installed in about 2 minutes in my tower case. The 5-in-3 doesn't have grooves in the sides like their 3-in-2 does, so some cases may not accept the 5-in-3 if your case has tabs to support devices like DVD drives in the 5.25" slots. The grooves are clearly visible in this picture: http://www.newegg.com/Product/Product.aspx?Item=N82E16817994075 The doors are a bit light, perhaps, but it works just fine for my needs and holds drives securely. The small fans are a bit noisy, but since the box lives in the basement I don't really care. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send hangs
-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Daniel Bakken My zfs filesystem hangs when transferring large filesystems (500GB) with a couple dozen snapshots between servers using zfs send/receive with netcat. The transfer hangs about halfway through and is unkillable, freezing all IO to the filesystem and requiring a hard reboot. I have attempted this three times and failed every time. The behavior you've described is typical of having a device simply disappear from a zpool. For example, if you have a zpool on a single external disk and you accidentally disconnect the external disk ... *poof* ... you need to power cycle. If you're using all raidz, or mirrored, or redundant drives, then it's typical behavior for a failing or flaky disk controller. Even if your system is not using external disks, you'd better consider the possibility that you've got some flaky or buggy hardware. I suggest doing a zfs send to /dev/null. And run a scrub. And see if the system simply dies from doing large sustained IO. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
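A sketch of that isolation test (the dataset and snapshot names here are placeholders, since the original name is truncated in the post above):

# Stress the source pool with no network or receiver in the path:
zfs send -vR promise1/somefs@daily.1 > /dev/null

# Then exercise every device in the pool and check for errors:
zpool scrub promise1
zpool status -v promise1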
Re: [zfs-discuss] compression property not received
On Wed, Apr 7, 2010 at 10:47 AM, Daniel Bakken dan...@economicmodeling.com wrote: When I send a filesystem with compression=gzip to another server with compression=on, compression=gzip is not set on the received filesystem. I am using: [...] Is compression set on the dataset, or is it being inherited from a parent dataset? I think only locally set properties are preserved. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
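A quick sketch of Brandon's check (dataset name hypothetical):

# SOURCE reads "local" for explicitly set values and
# "inherited from ..." for values coming from a parent dataset:
zfs get -o name,property,value,source compression tank/data

# If only local properties survive a send, setting it locally should help:
zfs set compression=gzip tank/data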
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
At 11:19 AM +1000 2/19/10, James C. McPherson wrote: On 19/02/10 12:51 AM, Maurice Volaski wrote: For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For those of you who have been running xVM without MSI support, could you please confirm whether the devices exhibiting the problem are internal to your host, or connected via jbod. And if via jbod, please confirm the model number and cables. I have a SuperMicro X8DTN motherboard and an LSI SAS3081E-R, with firmware 1.28.02.00-IT. I have 24 drives attached to the backplane of the system with a single mini-SAS cable, probably not even 18 inches long. All the drives are WD RE4-GP. OpenSolaris, snv_130, is running on VMware, but I am using PCI passthrough for the LSI card. It turns out that the mpt_sas HBAs are affected the same way: Hi Maurice, this is very interesting to note. I'll pass the info along to the relevant team (they're in Beijing, so away for another few days due to Spring Festival). I have identified the culprit: the Western Digital drive WD2002FYPS-01U1B0. It's not clear if they can fix it in firmware, but Western Digital is replacing my drives.
Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:45:10 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:45:10 thecratewall Log info 0x31110630 received for target 13.
Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:45:10 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:45:10 thecratewall Log info 0x31110630 received for target 13.
Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:45:10 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:45:10 thecratewall Log info 0x31110630 received for target 13.
Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:47:57 thecratewall scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110630
Feb 17 04:47:57 thecratewall scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110630
Feb 17 04:47:57 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall Log info 0x31110630 received for target 33.
Feb 17 04:47:57 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:47:57 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall Log info 0x31110630 received for target 33.
Feb 17 04:47:57 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall Log info 0x31110630 received for target 33.
Feb 17 04:47:57 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:47:57 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall Log info 0x31110630 received for target 33.
Feb 17 04:47:57 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Feb 17 04:47:57 thecratewall scsi: [ID 365881 kern.info] /p...@0,0/pci15ad,7...@15/pci1000,3...@0 (mpt_sas0):
Feb 17 04:47:57 thecratewall Log info 0x31110630 received for target 33.
Feb 17 04:47:57 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
-- James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog -- Maurice Volaski, maurice.vola...@einstein.yu.edu Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] vPool unavailable but RaidZ1 is online
On Sun, Apr 04, 2010 at 07:13:58AM -0700, Kevin wrote: I am trying to recover a raid set; there are only three drives that are part of the set. I attached a disk and discovered it was bad. It was never part of the raid set. Are you able to tell us more precisely what you did with this disk? For example, exactly how did you attach the disk? Maybe it was in fact added as a non-redundant second vdev? Can you attach the output of zdb -l from one of the pool devices? zdb -h may reveal more accurately the history of what was attached and how. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
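A sketch of the diagnostics Dan is asking for (device path and pool name are hypothetical):

# Dump the vdev labels from one member disk; the pool config
# embedded in the labels shows every top-level vdev:
zdb -l /dev/rdsk/c1t2d0s0

# Dump the pool history, including past attach/add operations:
zdb -h tank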
Re: [zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
Hi David, why not just use a couple of SAS expanders? Regards, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss