Re: [zfs-discuss] Motherboard for home zfs/solaris file server
OK, I am ready to try. Two last questions before I go for it:
- Which version of (Open)Solaris for ECC support (which seems to have been dropped from 2009.06) and a general as-few-headaches-as-possible installation?
- Do you think this issue with the AMD Athlon II X2 250 (http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3572p=2cp=4) would affect Cool'n'Quiet support in Solaris?
Thanks for your insight.
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] x4540 dead HDD replacement, remains configured.
x4540 snv_117

We lost a HDD last night, and it seemed to take out most of the bus or something and forced us to reboot. (We have yet to experience losing a disk that didn't force a reboot, mind you). So today, I'm looking at replacing the broken HDD, but no amount of work makes it turn on the blue LED. After trying that for an hour, we just replaced the HDD anyway. But no amount of work will make it use/recognise it. (We tried more than one working spare HDD too). For example:

# zpool status
        raidz1        DEGRADED     0     0     0
          c5t1d0      ONLINE       0     0     0
          c0t5d0      ONLINE       0     0     0
          spare       DEGRADED     0     0  285K
            c1t5d0    UNAVAIL      0     0     0  cannot open
            c4t7d0    ONLINE       0     0     0  4.13G resilvered
          c2t5d0      ONLINE       0     0     0
          c3t5d0      ONLINE       0     0     0
        spares
          c4t7d0      INUSE     currently in use

# zpool offline zpool1 c1t5d0
        raidz1        DEGRADED     0     0     0
          c5t1d0      ONLINE       0     0     0
          c0t5d0      ONLINE       0     0     0
          spare       DEGRADED     0     0  285K
            c1t5d0    OFFLINE      0     0     0
            c4t7d0    ONLINE       0     0     0  4.13G resilvered
          c2t5d0      ONLINE       0     0     0
          c3t5d0      ONLINE       0     0     0

# cfgadm -al
Ap_Id                Type       Receptacle   Occupant     Condition
c1                   scsi-bus   connected    configured   unknown
c1::dsk/c1t5d0       disk       connected    configured   failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al c1::dsk/c1t5d0
disk       connected    configured   failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al c1::dsk/c1t5d0
disk       connected    configured   failed

# hdadm offline slot 13
 1:    5:    9:    13:   17:   21:   25:   29:   33:   37:   41:   45:
 c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
 ^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al c1::dsk/c1t5d0
disk       connected    configured   failed

# fmadm faulty
FRU : HD_ID_47 (hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0) faulty

# fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

# format | grep c1t5d0
#

# hdadm offline slot 13
 1:    5:    9:    13:   17:   21:   25:   29:   33:   37:   41:   45:
 c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
 ^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al c1::dsk/c1t5d0
disk       connected    configured   failed

# ipmitool sunoem led get|grep 13
hdd13.fail.led   | ON
hdd13.ok2rm.led  | OFF

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device

Bah, why were they changed to SCSI? Increasing the size of the hammer...

# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al c1::dsk/c1t5d0
disk       connected    configured   failed

I am fairly certain that if I reboot, it will all come back ok again. But I would like to believe that I should be able to replace a disk without rebooting on a X4540. Any other commands I should try?

Lund

-- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
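For comparison, the sequence that would normally be expected to work for swapping a failed disk on this class of hardware (a sketch only, using the pool and device names from the output above) is roughly:

# zpool offline zpool1 c1t5d0              (take the failed disk out of service)
# cfgadm -c unconfigure c1::dsk/c1t5d0     (release the attachment point; the OK-to-remove LED should then light)
(physically swap the drive)
# cfgadm -c configure c1::dsk/c1t5d0       (configure the new disk in the same slot)
# zpool replace zpool1 c1t5d0              (resilver onto the new disk)
# zpool detach zpool1 c4t7d0               (return the hot spare once the resilver completes, if it does not detach on its own)

The thread above shows this breaking down at the unconfigure step, which is what makes the case unusual.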
Re: [zfs-discuss] Lundman home NAS
The case is made by Chyangfun, and the model made for Mini-ITX motherboards is called CGN-S40X. They had 6 pcs left last I talked to them, and need a 3-week lead for more if I understand it correctly. I need to finish my LCD panel work before I will open shop to sell these.

As for temperature, I have only checked the server HDDs so far (on my wiki) but will test with green HDDs tonight. I do not know if Solaris can retrieve the Atom chipset temperature readings. The parts I used should be listed on my wiki.

Anon wrote: I have the same case which I use as direct attached storage. I never thought about using it with a motherboard inside. Could you provide a complete parts list? What sort of temperatures at the chip, chipset, and drives did you find? Thanks!

-- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
I suspect this is what it is all about:

# devfsadm -v
devfsadm[16283]: verbose: no devfs node or mismatched dev_t for /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0:a
[snip]

and indeed:

brw-r-      1 root sys  30, 2311 Aug  6 15:34 s...@4,0:wd
crw-r-      1 root sys  30, 2311 Aug  6 15:24 s...@4,0:wd,raw
drwxr-xr-x  2 root sys         2 Aug  6 14:31 s...@5,0
drwxr-xr-x  2 root sys         2 Apr 17 17:52 s...@6,0
brw-r-      1 root sys  30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-      1 root sys  30, 2432 Jul  6 09:48 s...@6,0:a,raw

Perhaps because it was booted with the dead disk in place, it never configured the entire sd5 mpt driver. Why the other hard disks work I don't know. I suspect the only way to fix this is to reboot again.

Lund

Jorgen Lundman wrote:
[snip]
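If stale /devices entries are the culprit, one cheap thing that might be worth trying before a reboot (purely a guess based on the devfsadm output above) is forcing a cleanup and rebuild of the device nodes and then repeating the configure/replace steps:

# devfsadm -Cv                             (-C removes dangling /dev and /devices entries, -v reports what changed)
# cfgadm -c configure c1::dsk/c1t5d0
# zpool replace zpool1 c1t5d0

No guarantee it helps if the sd/mpt instance never attached in the first place, but it costs nothing to try.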
Re: [zfs-discuss] Shrinking a zpool?
It is unfortunately a very difficult problem, and will take some time to solve even with the application of all possible resources (including the majority of my time). We are updating CR 4852783 at least once a month with progress reports.

Matt, should these progress reports be visible via [1]? Right now it doesn't seem to be available. Moreover, it says the last update was 6-May-2009. May I suggest using this forum (zfs-discuss) to periodically report the progress? Chances are that most of the people waiting for this feature are reading this list.

[1] http://bugs.opensolaris.org/view_bug.do?bug_id=4852783

-- Regards, Cyril ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
On Wed, Aug 5, 2009 at 11:48 PM, Jorgen Lundman lund...@gmo.jp wrote:
[snip]

I have a pair of X4540's also, and getting any kind of drive status or failure alert is a lost cause. I've opened several cases with Sun with the following issues:

ILOM/BMC can't see any drives (status, FRU, firmware, etc)
FMA cannot see a drive failure (you can pull a drive, and it could be hours before 'zpool status' will show a failed drive, even during a 'zfs scrub')
Hot swapping drives rarely works; the system will not see the new drive until a reboot

Things I've tried that Sun has suggested:

New BIOS
New controller firmware
New ILOM firmware
Upgrading to new releases of OSol (currently on 118, no luck)
Replacing the ILOM card
Custom FMA configs

Nothing works, and my cases with Sun have been open for about 6 months now, with no resolution in sight. Given that Sun now makes the 7000, I can only assume their support on the more whitebox version, AKA X4540, is either near an end, or they don't intend to support any advanced monitoring whatsoever. Sad, really... as my $900 Dell and HP servers can send SMS, Jabber messages, SNMP traps, etc, on ANY IPMI event, hardware issue, and what have you, without any tinkering or excuses.

-- Brent Jones br...@servuhome.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
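For what it's worth, when trying to establish whether FMA ever saw a pulled or failed drive at all, the raw error telemetry can be inspected directly with the standard commands (nothing X4540-specific here):

# fmdump -e                                (one-line summary of error events received)
# fmdump -eV                               (full detail, including the device path of the complaining disk)
# fmadm faulty                             (anything actually diagnosed as a fault)

If fmdump -e shows nothing when a drive is pulled, the problem is below FMA (driver or HBA level) rather than in the diagnosis engines.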
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
Whoah! We have yet to experience losing a disk that didn't force a reboot Do you have any notes on how many times this has happened Jorgen, or what steps you've taken each time? I appreciate you're probably more concerned with getting an answer to your question, but if ZFS needs a reboot to cope with failures on even an x4540, that's an absolute deal breaker for everything we want to do with ZFS. Ross -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I setting 'zil_disable' to increase ZFS/iscsi performance ?
Or use UFS filesystem ? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Brian Kolaci wrote: They understand the technology very well. Yes, ZFS is very flexible with many features, and most are not needed in an enterprise environment where they have high-end SAN storage that is shared between Sun, IBM, linux, VMWare ESX and Windows. Local disk is only for the OS image. There is no need to have an M9000 be a file server. They have NAS for that. They use SAN across the enterprise and it gives them the ability to fail-over to servers in other data centers very quickly. Different business groups cannot share the same pool for many reasons. Each business group pays for their own storage. There are legal issues as well, and in fact cannot have different divisions on the same frame let alone shared storage. But they're in a major virtualization push to the point that nobody will be allowed to be on their own physical box. So the big push is to move to VMware, and we're trying to salvage as much as we can to move them to containers and LDoms. That being the case, I've recommended that each virtual machine on either a container or LDom should be allocated their own zpool, and the zonepath or LDom disk image be on their own zpool. This way when (not if) they need to migrate to another system, they have one pool to move over. They use fixed sized LUNs, so the granularity is a 33GB LUN, which can be migrated. This is also the case for their clusters as well as SRDF to their COB machines. If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [cifs-discuss] ZFS CIFS problem with Ubuntu, NFS as an alternative?
Afshin, thanks for the response. You seem to be everywhere on the forum... Respect... :-) The ACL on the files I tried are the same, I always do a chmod -R when changing ACLs on the dataset/directory. You got a recommendation for a network trace tool? I could do it on OpenSolaris (file server) or Ubuntu (client, virtual machine on VBox 2.2.4). I'm not familiar with making network traces. Many thanks for your help. Chris -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
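In case it helps, a capture can be taken on either end with the stock tools; for example, on the OpenSolaris file server (interface name and client address are only placeholders):

# snoop -d e1000g0 -o /tmp/cifs-trace.snoop host 192.168.1.50

or on the Ubuntu client:

$ sudo tcpdump -i eth0 -w /tmp/cifs-trace.pcap host 192.168.1.10

Both capture files open in Wireshark, which makes it easier to compare a failing request against a working one.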
Re: [zfs-discuss] Shrinking a zpool?
If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. You can move zpools between computers, you can't move individual file systems. Remember that there is a SAN involved. The disk array does not run Solaris. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
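A rough sketch of that kind of move, assuming a pool named tank on SAN LUNs that can be presented to both hosts (the LUN re-presentation step depends entirely on the array):

hostA# zpool export tank
(re-zone / re-present the LUNs to host B if they are not already visible there)
hostB# devfsadm -c disk                    (make sure the LUNs have device nodes on host B)
hostB# zpool import tank

The data itself never moves; only ownership of the pool does.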
Re: [zfs-discuss] Shrinking a zpool?
Hi Matt Thanks for this update, and the confirmation to the outside world that this problem is being actively worked on with significant resources. But I would like to support Cyril's comment. AFAIK, any updates you are making to bug 4852783 are not available to the outside world via the normal bug URL. It would be useful if we were able to see them. I think it is frustrating for the outside world that it cannot see Sun's internal source code repositories for work in progress, and only see the code when it is complete and pushed out. And so there is no way to judge what progress is being made, or to actively help with code reviews or testing. Best Regards Nigel Smith -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Mattias Pantzare wrote: If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. You can move zpools between computers, you can't move individual file systems. send/receive? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On Thu, Aug 6, 2009 at 12:45, Ian Collins i...@ianshome.com wrote: Mattias Pantzare wrote: If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. You can move zpools between computers, you can't move individual file systems. send/receive? :-)

What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import? And you still need to shrink the pool. Move a 100Gb application from server A to server B using send/receive and you will have 100Gb stuck on server A that you can't use on server B where you really need it.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Nigel Smith wrote: Hi Matt Thanks for this update, and the confirmation to the outside world that this problem is being actively worked on with significant resources. But I would like to support Cyril's comment. AFAIK, any updates you are making to bug 4852783 are not available to the outside world via the normal bug URL. It would be useful if we were able to see them. I think it is frustrating for the outside world that it cannot see Sun's internal source code repositories for work in progress, and only see the code when it is complete and pushed out. That is no different to the vast majority of Open Source projects either. Open Source and Open Development usually don't give you access to individuals work in progress. Compare this to Linux kernel development, you usually don't get to see the partially implemented drivers or changes until they are requesting integration into the kernel. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
On 4-Aug-09, at 16:54 , Ian Collins wrote: Use a CompactFlash card (the board has a slot) for root, 8 drives in raidz2 tank, backup the root regularly If booting/running from CompactFlash works, then I like this one. Backing up root should be trivial since you can back it up into your big storage pool. Usually root contains mostly non-critical data. The nice SAS backplane seems too precious to waste for booting. Do you know if it is possible to put just grub, stage2, kernel on the CF card, instead of the entire root? You can move some of root to another device, but I don't think you can move the bulk - /usr. See: http://docs.sun.com/source/820-4893-13/compact_flash.html#50589713_78631 Good link. So I suppose I can move /var out and that would deal with most (all?) of the writes. Good plan! A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS rollback got hanged
On a Solaris 10 box with a ZFS filesystem, I took a snapshot of the whole box (root dir) and then made some changes in the /opt dir (30-40 MB). After this, when I tried to roll back the snapshot, the box hung. Has anyone faced similar issues? Does it depend on the size of the changes we make? Please comment on this.
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Ross wrote: But with export / import, are you really saying that you're going to physically move 100GB of disks from one system to another? zpool export/import would not move anything on disk. It just changes which host the pool is attached to. This is exactly how cluster failover works in the SS7000 systems. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
Well, to be fair, there were some special cases. I know we had 3 separate occasions with broken HDDs when we were using UFS. 2 of these appeared to hang, and the 3rd only hung once we replaced the disk. This is most likely due to us using UFS on a zvol (for quotas). We got an IDR patch, and eventually this was released as "UFS 3-way deadlock writing log with zvol". I forget the number right now, but the patch is out.

This is the very first time we have lost a disk in a purely-ZFS system, and I was somewhat hoping that this would be the time everything went smoothly. But it did not. However, I have also experienced (once) a disk dying in such a way that it took out the chain in a NetApp, so perhaps the disk died like this here too (it is really dead). But still disappointing.

Power cycling the x4540 takes about 7 minutes (service to service), but with Solaris snv_116(?) and up it can do quiesce-reboots, which take about 57 seconds. In this case, we had to power cycle.

Ross wrote: Whoah! We have yet to experience losing a disk that didn't force a reboot Do you have any notes on how many times this has happened Jorgen, or what steps you've taken each time? I appreciate you're probably more concerned with getting an answer to your question, but if ZFS needs a reboot to cope with failures on even an x4540, that's an absolute deal breaker for everything we want to do with ZFS. Ross

-- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS will bring IO when the file is VERY short-lived?
Thanks. :) I have tested it in my system, it's great. But, you know, ZIO is pipelined, which means the IO request may already have been sent, and when you unlink the file the IO stage is in progress. So, would it be cancelled, or not?

From: Bob Friesenhahn bfrie...@simple.dallas.tx.us To: Chookiex hexcoo...@yahoo.com Cc: zfs-discuss@opensolaris.org Sent: Wednesday, August 5, 2009 11:25:45 PM Subject: Re: [zfs-discuss] Would ZFS will bring IO when the file is VERY short-lived?

On Tue, 4 Aug 2009, Chookiex wrote: You know, ZFS affords a very big buffer for write IO. So, when we write a file, the first stage is to put it in the buffer. But what if the file is VERY short-lived? Does it bring IO to disk? Or does it just put the metadata and data in memory, and then remove it?

This depends on timing, available memory, and if the writes are synchronous. Synchronous writes are sent to disk immediately. Buffered writes seem to be very well buffered and small created files are not persisted until the next TXG sync interval and if they are immediately deleted it is as if they did not exist at all. This leads to a huge improvement in observed performance.

% while true
do
  rm -f crap.dat
  dd if=/dev/urandom of=crap.dat count=200
  rm -f crap.dat
  sleep 1
done

I just verified this by running the above script and running a tool which monitors zfs read and write requests.

Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
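One way to see whether those short-lived files ever reach the disks is to leave a pool I/O monitor running in another terminal while the loop runs (pool name assumed to be tank):

% zpool iostat tank 1

If the write bandwidth column stays near zero while the loop creates and deletes files, the data is living and dying entirely in the in-memory TXG state, as Bob describes.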
Re: [zfs-discuss] Shrinking a zpool?
On Aug 6, 2009, at 5:36 AM, Ian Collins i...@ianshome.com wrote:

Brian Kolaci wrote:
[snip]

If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. -- Ian.

For failover scenarios you need a pool per application, so they can move the application between servers which may be in different datacenters, and each app on one server can fail over to a different server. So the storage needs to be partitioned as such. The failover entails moving or rerouting SAN.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On Thu, 6 Aug 2009, Cyril Plisko wrote: May I suggest using this forum (zfs-discuss) to periodically report the progress ? Chances are that most of the people waiting for this feature reading this list. Sun has placed themselves in the interesting predicament that being open about progress on certain high-profile enterprise features (such as shrink and de-duplication) could cause them to lose sales to a competitor. Perhaps this is a reason why Sun is not nearly as open as we would like them to be. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
But why do you have to attach to a pool? Surely you're just attaching to the root filesystem anyway? And as Richard says, since filesystems can be shrunk easily and it's just as easy to detach a filesystem from one machine and attach to it from another, why the emphasis on pools? For once I'm beginning to side with Richard, I just don't understand why data has to be in separate pools to do this. The only argument I can think of is for performance since pools use completely separate sets of disks. I don't know if zfs offers a way to throttle filesystems, but surely that could be managed at the network interconnect level? I have to say that I have no experience of enterprise class systems, these questions are purely me playing devils advocate as I learn :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I setting 'zil_disable' to increase ZFS/iscsi performance ?
You can use a separate SSD ZIL. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
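For reference, adding a dedicated log device is a one-liner once the SSD is visible to the system (the device name here is just a placeholder):

# zpool add tank log c2t0d0

One caveat: on the releases current at this time a log device cannot be removed from the pool again once added, so it is worth confirming the ZIL is actually the bottleneck first, as suggested later in the thread.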
Re: [zfs-discuss] Would ZFS will bring IO when the file is VERY short-lived?
On Thu, 6 Aug 2009, Chookiex wrote: But, you know, ZIO is pipelined, it means that the IO request may be sent, and when you unlink the file, the IO stage is in progress. so, would it be canceled else? In POSIX filesystems, if a file is still open when it is unlinked, then the file directory entry goes away but the file still exists as long as a process has an open file handle to it. This helps avoid certain problems which otherwise would exist. I doubt that ZFS ever cancels I/O in progress. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of todays computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. On Wed, Aug 5, 2009 at 12:16 PM, Adam Sherman asher...@versature.comwrote: On 5-Aug-09, at 12:07 , Thomas Burgess wrote: i would be VERY surprised if you couldn't fit these in there SOMEWHERE, the sata to compactflash adapter i got was about 1.75 inches across and very very thin, i was able to mount them side by side on top of the drive tray in my machine, you can easily make a bracket...i know a guy who used double sided tape! but, check out this picturehttp:// www.newegg.com/Product/Product.aspx?Item=N82E16812186051 most of them can be found like this, they are VERY VERY thin and can be mounted just about anywhere. they don't get very hot. I've used them on a few machines, opensolaris and freebsd. I'm a big fan of compact flash. What about USB sticks? Is there a difference in practice? Thanks for the advice, A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
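A sketch of what that mirroring looks like with two CF-to-SATA adapters on a ZFS root (device names are placeholders; the second card needs the same slice layout as the first):

# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2     (copy the label to the second card)
# zpool attach rpool c1t0d0s0 c1t1d0s0                             (mirror the root pool onto it)
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

The installgrub step matters: without it the second card is not bootable if the first one dies.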
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote:
On 6-Aug-09, at 11:32 , Thomas Burgess wrote:
[snip]
This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp But I can't confirm it will show both cards as separate disks…

My read is that it won't (which is supported by the single SATA data connector), but it will do the mirroring for you. I know that I generally prefer to let ZFS handle the redundancy for me, but for you it may be enough to let this do the mirroring for the root pool. It seems too expensive to get 2. Do they have a cheaper one that takes only 1 CF card?

-Kyle

A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
On 6-Aug-09, at 11:50 , Kyle McDonald wrote: i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of todays computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp But I can't confirm it will show both cards as separate disks… My read is that it won't (which is supported by the single SATA data connector,) but it will do the mirroring for you. Turns out the FAQ page explains that it will not, too bad. I know that I generally prefer to let ZFS handle the redundancy for me, but for you it may be enough to let this do the mirroring for the root pool. I'm with you there. It seems too expensive to get 2. Do they have a cheaper one that takes only 1 CF card? I just ordered a pair of the Syba units, cheap enough too test out anyway. Now to find some reasonably priced 8GB CompactFlash cards… Thanks, A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
i've seen these before, if i remember right, it has a jumper on it to set as a sort of onboard raid0 or raid1... i'm not sure if it has a jbod mode though. personally i prefer the small single cf to sata adapters, you'd be surprised how thin they are, you can attach them with screws or even hot glue or double sided tape... they are as thin as the cards themselves and 1.75 inches across so 2 of them will fit across a 3.5 drive tray. Depending on the case, you can often make a custom mount for them... i know i have with several cases... i've yet to find one i couldn't fit them into SOMEWHERE

On Thu, Aug 6, 2009 at 11:43 AM, Adam Sherman asher...@versature.com wrote:
[snip]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs snapshot of zoned ZFS dataset
I have a ZFS filesystem (e.g. tank/zone1/data) which is delegated to a zone as a dataset. As root in the global zone, I can zfs snapshot and zfs send this filesystem:

zfs snapshot tank/zone1/data
and
zfs send tank/zone1/data

without any problem. When I zfs allow another user (e.g. amanda) with:

zfs allow -ldu amanda mount,create,rename,snapshot,destroy,send,receive

this user amanda CAN do zfs snapshot and zfs send on ZFS filesystems in the global zone, but it can not do these commands for the dataset delegated to the zone (whilst root can do it), and I get a permission denied. A truss shows me:

ioctl(3, ZFS_IOC_SNAPSHOT, 0x080469D0)          Err#1 EPERM [sys_mount]
fstat64(2, 0x08045BF0)                          = 0
cannot create snapshot 'tank/zone1/d...@test'
write(2, " c a n n o t   c r e a t".., 53)      = 53

Which setting am I missing to allow user amanda to do this? Anyone experiencing the same?

Regards, Marcel
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
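Not an answer, but two things that might be worth checking (dataset name taken from the example above): whether the delegation is actually recorded on that dataset, and whether its zoned property is set, since a dataset handed over to a zone is treated specially from the global zone:

# zfs get zoned tank/zone1/data            (datasets delegated to a zone normally show zoned=on)
# zfs allow tank/zone1/data                (prints the permissions currently in effect there)

If zoned=on turns out to be the blocker, trying the same zfs allow / zfs snapshot from inside zone1, or against a non-delegated sibling dataset, would help narrow it down.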
Re: [zfs-discuss] Pool Layout Advice Needed
if it's this one http://www.newegg.com/Product/Product.aspx?Item=N82E16812186051 it works perfectly. I've used them on several machines. They just show up as sata drives. That unit also has a very tiny red led that lights up. It's QUITE bright, but you likely won't see it if it's inside the case.

On Thu, Aug 6, 2009 at 11:54 AM, Adam Sherman asher...@versature.com wrote:
[snip]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote: On 6-Aug-09, at 11:50 , Kyle McDonald wrote: i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of todays computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp But I can't confirm it will show both cards as separate disks… My read is that it won't (which is supported by the single SATA data connector,) but it will do the mirroring for you. Turns out the FAQ page explains that it will not, too bad. I know that I generally prefer to let ZFS handle the redundancy for me, but for you it may be enough to let this do the mirroring for the root pool. I'm with you there. It seems too expensive to get 2. Do they have a cheaper one that takes only 1 CF card? I just ordered a pair of the Syba units, cheap enough too test out anyway. Oh. I was looking and if you have an IDE socket, this will do separate master/slave devices: (no IDE cable needed, it plugs right into the MB - There's another that uses a cable if you prefer.) http://www.addonics.com/products/flash_memory_reader/adeb44idecf.asp And 2 of these (which look remarkably like the Syba ones) would work too: http://www.addonics.com/products/flash_memory_reader/adsahdcf.asp They're only 30 each so 2 of those are less than the dual one. -Kyle Now to find some reasonably priced 8GB CompactFlash cards… Thanks, A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
I've had SOME problems with the IDE ones in the past. It depends on the card you get with IDE; the SATA ones tend to work regardless... I'm not saying not to use IDE, i'm just saying you might have to research your CF cards if you do. Not all IDE-CF will boot.

On Thu, Aug 6, 2009 at 11:59 AM, Kyle McDonald kmcdon...@egenera.com wrote:
[snip]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I setting 'zil_disable' to increase ZFS/iscsi performance ?
On Aug 6, 2009, at 11:09 AM, Scott Meilicke no-re...@opensolaris.org wrote: You can use a separate SSD ZIL. Yes, but to see if a separate ZIL will make a difference, the OP should try his iSCSI workload first with the ZIL, then temporarily disable the ZIL and re-try his workload. Nothing worse than buying expensive hardware to find it doesn't solve your problem. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
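For the test Ross describes, the usual (test-only, data-at-risk) way to turn the ZIL off on these builds was either an /etc/system entry or a live poke with mdb; this is from memory, so double-check before relying on it:

set zfs:zil_disable = 1                    (in /etc/system, takes effect at the next boot)

or at runtime:

# echo zil_disable/W0t1 | mdb -kw

Either way the setting is only picked up by datasets mounted (or shared out) after the change, and it should be reverted once the measurement is done, since it defeats the synchronous-write guarantees iSCSI clients expect.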
[zfs-discuss] zfs incremental send stream size
I'm puzzled by the size reported for incremental zfs send | zfs receive. I'd expect the stream to be roughly the same size as the used blocks reported by zfs list. Can anyone explain why the stream size reported is so much larger than the used data in the source snapshots? Thanks.

% zfs list -r -t snapshot mail/00 | tail -4
mail/0...@.nightly       1.98M      -  34.1G  -
mail/0...@0400.hourly    1.67M      -  34.1G  -
mail/0...@0800.hourly    1.43M      -  34.1G  -
mail/0...@1000.hourly        0      -  34.1G  -

# zfs send -i mail/0...@.nightly mail/0...@0400.hourly | zfs receive -v -F mailtest/00
receiving incremental stream of mail/0...@0400.hourly into mailtest/0...@0400.hourly
received 17.9MB stream in 4 seconds (4.49MB/sec)

# zfs send -i mail/0...@0400.hourly mail/0...@0800.hourly | zfs receive -v -F mailtest/00
receiving incremental stream of mail/0...@0800.hourly into mailtest/0...@0800.hourly
received 15.1MB stream in 1 seconds (15.1MB/sec)

# zfs send -i mail/0...@0800.hourly mail/0...@1000.hourly | zfs receive -v -F mailtest/00
receiving incremental stream of mail/0...@1000.hourly into mailtest/0...@1000.hourly
received 13.7MB stream in 2 seconds (6.86MB/sec)

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
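One way to take the receive side out of the picture and measure the raw stream (names from the output above) is simply:

# zfs send -i mail/0...@.nightly mail/0...@0400.hourly | wc -c

As for the gap itself, a plausible (if partial) explanation is that the stream carries every record that changed between the two snapshots in full, plus stream metadata, whereas the USED column of a snapshot only counts blocks uniquely held by that snapshot, so the two numbers are not measuring the same thing.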
Re: [zfs-discuss] Shrinking a zpool?
On Aug 6, 2009, at 7:59 AM, Ross wrote: But why do you have to attach to a pool? Surely you're just attaching to the root filesystem anyway? And as Richard says, since filesystems can be shrunk easily and it's just as easy to detach a filesystem from one machine and attach to it from another, why the emphasis on pools? For once I'm beginning to side with Richard, I just don't understand why data has to be in separate pools to do this. welcome to the dark side... bwahahahaa :-) The way I've always done such migrations in the past is to get everything ready in parallel, then restart the service pointing to the new data. The cost is a tiny bit and a restart, which isn't a big deal for most modern system architectures. If you have a high availability cluster, just add it to the list of things to do when you do a weekly/monthly/quarterly failover. Now, if I was to work in a shrink, I would do the same because shrinking moves data and moving data is risky. Perhaps someone could explain how they do a rollback from a shrink? Snapshots? I think the problem at the example company is that they make storage so expensive that the (internal) customers spend way too much time and money trying to figure out how to optimally use it. The storage market is working against this model by reducing the capital cost of storage. ZFS is tackling many of the costs related to managing storage. Clearly, there is still work to be done, but the tide is going out and will leave expensive storage solutions high and dry. Consider how different the process would be as the total cost of storage approaches zero. Would shrink need to exist? The answer is probably no. But the way shrink is being solved in ZFS has another application. Operators can still make mistakes with add vs attach so the ability to remove a top-level vdev is needed. Once this is solved, shrink is also solved. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Greg Mason wrote: What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import? We have a similar situation. Our home directory storage is based on many X4540s. Currently, we use rsync to migrate volumes between systems, but our process could very easily be switched over to zfs send/receive (and very well may be in the near future). What this looks like, if using zfs send/receive, is we perform an initial send (get the bulk of the data over), and then at a planned downtime, do an incremental send to catch up the destination. This catch-up phase is usually a very small fraction of the overall size of the volume. The only downtime required is from just before the final snapshot you send (the last incremental), and when the send finishes, and turning up whatever service(s) on the destination system. If the filesystem has a lot of write activity, you can run multiple incrementals to decrease the size of that last snapshot. As far as backing out goes, you can simply destroy the destination filesystem, and continue running on the original system, if all hell breaks loose (of course that never happens, right? :)

That is how I migrate services (zones) and their data between hosts with one of my clients. The big advantage of zfs send/receive over rsync is the final replication is very fast. Run a send/receive just before the migration, then top up after the service shuts down. The last one we moved was a mail server with 1TB of small files and the downtime was under 2 minutes. The biggest delay was sending the start and done text messages!

When everything checks out (which you can safely assume when the recv finishes, thanks to how ZFS send/recv works), you then just have to destroy the original filesystem. It is correct in that this doesn't shrink the pool, but it's at least a workaround to be able to swing filesystems around to different systems. If you had only one filesystem in the pool, you could then safely destroy the original pool. This does mean you'd need 2x the size of the LUN during the transfer though.

For replication of ZFS filesystems, we use a similar process, with just a lot of incremental sends.

Same here.

-- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
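For anyone who has not done it, a sketch of that sequence for a single filesystem (all names are placeholders):

# zfs snapshot tank/home@migrate1
# zfs send tank/home@migrate1 | ssh hostB zfs receive -F newpool/home      (bulk copy, service still running)
(stop the service)
# zfs snapshot tank/home@final
# zfs send -i @migrate1 tank/home@final | ssh hostB zfs receive -F newpool/home
(start the service against hostB)

Repeating the incremental step before the outage shrinks the final delta further; only the last send happens inside the downtime window, which is why a terabyte of small files can be moved with only a couple of minutes of outage.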
Re: [zfs-discuss] limiting the ARC cache during early boot, without /etc/system
On 08/06/09 14:28, Matt Ingenthron wrote: If ZFS is not being used significantly, then the ARC should not grow. The ARC grows based on usage (i.e. the amount of ZFS files/data accessed). Hence, if you are sure that the ZFS usage is low, things should be fine. I understand that it won't grow, but I want it to be smaller than the default. Like I said, I have a use case where I would like to pre-allocate as many large pages as possible. How can I constrain or shrink it before I start my other applications? Thanks in advance, - Matt p.s.: I just found there may not be any large pages on domUs, so maybe it doesn't matter so much

Hi Matt! Besides the /etc/system setting, you could also export all the pools, use mdb to set the same variable that /etc/system sets, and then import the pools again. I don't know of any other mechanism to limit ZFS's memory footprint. If you don't do ZFS boot, manually import the pools after the application starts, so you get your pages first.

Steffen
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
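For the record, the two mechanisms Steffen mentions look roughly like this for a 512 MB cap (the value is only an example, and the mdb syntax is from memory, so verify on a test box first):

set zfs:zfs_arc_max = 0x20000000           (in /etc/system, applied at boot)

or at runtime, with the pools exported as described above:

# echo zfs_arc_max/Z 0x20000000 | mdb -kw

Whether the runtime write is honoured depends on whether the ARC has already been initialised and populated, which is presumably why the suggestion is to do it while the pools are exported and before the cache has grown.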
[zfs-discuss] Replacing faulty disk in ZFS pool
Dear managers,

one of our servers (X4240) shows a faulty disk:

-bash-3.00# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device repaired.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          mirror    DEGRADED     0     0     0
            c1t6d0  FAULTED      0    19     0  too many errors
            c1t7d0  ONLINE       0     0     0

errors: No known data errors

I derived the following possible approaches to solve the problem:

1) A way to reestablish redundancy would be to use the command

   zpool attach tank c1t7d0 c1t15d0

to add c1t15d0 to the virtual device c1t6d0 + c1t7d0. We would still have the faulty disk in the virtual device. We could then detach the faulty disk with the command

   zpool detach tank c1t6d0

2) Another approach would be to add a spare disk to tank

   zpool add tank spare c1t15d0

and then use replace to replace the faulty disk:

   zpool replace tank c1t6d0 c1t15d0

In theory that is easy, but since I have never done this and since this is a production server, I would appreciate it if someone with more experience would look over my agenda before I issue these commands. What is the difference between the two approaches? Which one do you recommend? And is that really all that has to be done, or am I missing a bit? I mean, can c1t6d0 be physically replaced after issuing zpool detach tank c1t6d0 or zpool replace tank c1t6d0 c1t15d0? I also found the command zpool offline tank ... but am not sure whether this should be used in my case.

Hints are greatly appreciated!

Thanks a lot,

Andreas
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote:
On 6-Aug-09, at 11:32 , Thomas Burgess wrote:
[snip]
This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp

Take care, the SATA model didn't work with Solaris. I haven't tried the current builds (I last tried with nv_101). The IDE model works fine.

-- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Excellent advice, thanks Ian.

A.

-- Adam Sherman +1.613.797.6819

On 2009-08-06, at 15:16, Ian Collins i...@ianshome.com wrote: Adam Sherman wrote: On 4-Aug-09, at 16:54 , Ian Collins wrote: Use a CompactFlash card (the board has a slot) for root, 8 drives in raidz2 tank, backup the root regularly If booting/running from CompactFlash works, then I like this one. Backing up root should be trivial since you can back it up into your big storage pool. Usually root contains mostly non-critical data. The nice SAS backplane seems too precious to waste for booting. Do you know if it is possible to put just grub, stage2, kernel on the CF card, instead of the entire root? You can move some of root to another device, but I don't think you can move the bulk - /usr. See: http://docs.sun.com/source/820-4893-13/compact_flash.html#50589713_78631 Good link. So I suppose I can move /var out and that would deal with most (all?) of the writes. Good plan! I also moved most of /opt out to save space. This ended up being a costly mistake, the environment I ended up with didn't play well with Live Upgrade. So I suggest whatever you do, make sure you can create a new BE and boot into it before committing. -- Ian.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Bob Friesenhahn wrote: Sun has placed themselves in the interesting predicament that being open about progress on certain high-profile enterprise features (such as shrink and de-duplication) could cause them to lose sales to a competitor. Perhaps this is a reason why Sun is not nearly as open as we would like them to be.

I agree that it is difficult for Sun, at this time, to be more 'open', especially for ZFS, as we still await the resolution of Oracle purchasing Sun, the court case with NetApp over patents, and now the GreenBytes issue! But I would say they are more likely to avoid losing sales by confirming what enhancements they are prioritising. I think people will wait if they know work is being done, and progress being made, although not indefinitely. I guess it depends on the rate of progress of ZFS compared to say btrfs. I would say that maybe Sun should have held back on announcing the work on deduplication, as it just seems to have ramped up frustration, now that it seems no more news is forthcoming. It's easy to be wise after the event and time will tell.

Thanks
Nigel Smith
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Hi Andreas, Good job for using a mirrored configuration. :-) Your various approaches would work. My only comment about #2 is that it might take some time for the spare to kick in for the faulted disk. Both 1 and 2 would take a bit more time than just replacing the faulted disk with a spare disk, like this: # zpool replace tank c1t6d0 c1t15d0 Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational and use zpool clear to clear the previous errors on the pool. If the system is used heavily, then you might want to run the zpool scrub when system use is reduced. If you were going to physically replace c1t6d0 while it was still attached to the pool, then you might offline it first. Cindy On 08/06/09 13:17, Andreas Höschler wrote: Dear managers, one of our servers (X4240) shows a faulty disk: -bash-3.00# zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors pool: tank state: DEGRADED status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the faulted device, or use 'zpool clear' to mark the device repaired. scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 c1t6d0 FAULTED 0 19 0 too many errors c1t7d0 ONLINE 0 0 0 errors: No known data errors I derived the following possible approaches to solve the problem: 1) A way to reestablish redundancy would be to use the command zpool attach tank c1t7d0 c1t15d0 to add c1t15d0 to the virtual device c1t6d0 + c1t7d0. We still would have the faulty disk in the virtual device. We could then detach the faulty disk with the command zpool detach tank c1t6d0 2) Another approach would be to add a spare disk to tank zpool add tank spare c1t15d0 and then replace the faulty disk: zpool replace tank c1t6d0 c1t15d0 In theory that is easy, but since I have never done that and since this is a productive server I would appreciate it if someone with more experience would look over my agenda before I issue these commands. What is the difference between the two approaches? Which one do you recommend? And is that really all that has to be done or am I missing a bit? I mean, can c1t6d0 be physically replaced after issuing zpool detach tank c1t6d0 or zpool replace tank c1t6d0 c1t15d0? I also found the command zpool offline tank ... but am not sure whether this should be used in my case. Hints are greatly appreciated! Thanks a lot, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Hi Cindy, Good job for using a mirrored configuration. :-) Thanks! Your various approaches would work. My only comment about #2 is that it might take some time for the spare to kick in for the faulted disk. Both 1 and 2 would take a bit more time than just replacing the faulted disk with a spare disk, like this: # zpool replace tank c1t6d0 c1t15d0 You mean I can execute zpool replace tank c1t6d0 c1t15d0 without having made c1t15d0 a spare disk first with zpool add tank spare c1t15d0 ? After doing that c1t6d0 is offline and ready to be physically replaced? Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational That would be zpool scrub tank in my case!? and use zpool clear to clear the previous errors on the pool. I assume the complete command for my case is zpool clear tank Why do we have to do that? Couldn't zfs realize that everything is fine again after executing zpool replace tank c1t6d0 c1t15d0? If the system is used heavily, then you might want to run the zpool scrub when system use is reduced. That would be now! :-) If you were going to physically replace c1t6d0 while it was still attached to the pool, then you might offline it first. Ok, this sounds like approach 3) zpool offline tank c1t6d0 physically replace c1t6d0 with a new one zpool online tank c1t6d0 Would that be it? Thanks a lot! Regards, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Thomas Burgess wrote: that's strange...it works for me. At least the ones i've used have worked with opensolaris freebsd and linux. It just shows up as a normal sata drive. did you try more than one type of compactflash card? with the IDE unit, it was ALWAYS due to the card. Most of them would work SOMEWHAT but not all of them would boot...but i've had no problems at all with the sata versions. Same card works in an IDE adapter. The issues must have been fixed in later builds, I'll try again. On Thu, Aug 6, 2009 at 3:18 PM, Ian Collins i...@ianshome.com wrote: Adam Sherman wrote: On 6-Aug-09, at 11:32 , Thomas Burgess wrote: i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of today's computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp Take care, the SATA model didn't work with Solaris. I haven't tried the current builds (I last tried with nv_101). The IDE model works fine. -- Ian. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Andreas, More comments below. Cindy On 08/06/09 14:18, Andreas Höschler wrote: Hi Cindy, Good job for using a mirrored configuration. :-) Thanks! Your various approaches would work. My only comment about #2 is that it might take some time for the spare to kick in for the faulted disk. Both 1 and 2 would take a bit more time than just replacing the faulted disk with a spare disk, like this: # zpool replace tank c1t6d0 c1t15d0 You mean I can execute zpool replace tank c1t6d0 c1t15d0 without having made c1t15d0 a spare disk first with Yes, that is correct. zpool add tank spare c1t15d0 ? After doing that c1t6d0 is offline and ready to be physically replaced? Yes, that is correct. Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational That would be zpool scrub tank in my case!? Yes. and use zpool clear to clear the previous errors on the pool. I assume the complete command for my case is zpool clear tank Why do we have to do that? Couldn't zfs realize that everything is fine again after executing zpool replace tank c1t6d0 c1t15d0? Yes, sometimes the clear is not necessary but it will also clear the error counts if need be. If the system is used heavily, then you might want to run the zpool scrub when system use is reduced. That would be now! :-) If you were going to physically replace c1t6d0 while it was still attached to the pool, then you might offline it first. Ok, this sounds like approach 3) zpool offline tank c1t6d0 physically replace c1t6d0 with a new one zpool online tank c1t6d0 Would that be it? Those steps would be like this: zpool offline tank c1t6d0 physically replace c1t6d0 with a new one zpool replace tank c1t6d0 zpool online tank c1t6d0 On some hardware, you must unconfigure the disk before replacing it, such as after taking it offline. I'm not sure if the x4240 is in that category. If you do the replacement with another known good disk (c1t15d0) then you do not have to unconfigure the failed disk first. See Example 11-1 for more information: http://docs.sun.com/app/docs/doc/819-5461/gbbvf?a=view Thanks a lot! Regards, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
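On hardware that does need the unconfigure step Cindy mentions, the whole sequence would look roughly like this; the attachment point name c1::dsk/c1t6d0 is an assumption about how the controller enumerates on this particular box:
# zpool offline tank c1t6d0
# cfgadm -c unconfigure c1::dsk/c1t6d0
(physically swap the disk)
# cfgadm -c configure c1::dsk/c1t6d0
# zpool replace tank c1t6d0
# zpool online tank c1t6d0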
Re: [zfs-discuss] zfs incremental send stream size
On 08/06/09 12:19, Robert Lawhead wrote: I'm puzzled by the size reported for incremental zfs send|zfs receive. I'd expect the stream to be roughly the same size as the used blocks reported by zfs list. Can anyone explain why the stream size reported is so much larger than the used data in the source snapshots? Thanks. Part of the reason is that the send stream contains a lot of records for free blocks and free objects. I'm working on a fix to the send stream format that will eliminate some of that. Lori % zfs list -r -t snapshot mail/00 | tail -4 mail/0...@.nightly 1.98M - 34.1G - mail/0...@0400.hourly 1.67M - 34.1G - mail/0...@0800.hourly 1.43M - 34.1G - mail/0...@1000.hourly 0 - 34.1G - # zfs send -i mail/0...@.nightly mail/0...@0400.hourly | zfs receive -v -F mailtest/00 receiving incremental stream of mail/0...@0400.hourly into mailtest/0...@0400.hourly received 17.9MB stream in 4 seconds (4.49MB/sec) # zfs send -i mail/0...@0400.hourly mail/0...@0800.hourly | zfs receive -v -F mailtest/00 receiving incremental stream of mail/0...@0800.hourly into mailtest/0...@0800.hourly received 15.1MB stream in 1 seconds (15.1MB/sec) # zfs send -i mail/0...@0800.hourly mail/0...@1000.hourly | zfs receive -v -F mailtest/00 receiving incremental stream of mail/0...@1000.hourly into mailtest/0...@1000.hourly received 13.7MB stream in 2 seconds (6.86MB/sec) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
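If you just want to see how large an incremental stream is without doing a receive, piping it through wc gives a quick byte count (the pool/snapshot names here are placeholders, not the ones above):
# zfs send -i tank/fs@snap1 tank/fs@snap2 | wc -c
The number still includes the free-block/free-object records Lori describes, so expect it to be larger than the USED column that zfs list reports for the snapshot.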
[zfs-discuss] Problem importing pool
Hello, I am having a problem importing a pool in 2009.06 that was created on zfs-fuse (ubuntu 8.10). Basically, I was having issues with a controller, and took a disk offline. After restarting with a new controller, I was unable to import the pool (in ubuntu). Someone had suggested that I try to import the pool in opensolaris, but I have not had any luck so far. I cannot import the pool because the drive is offline, and I cannot online the drive because the pool isn't imported. ja...@blackbox:~# zpool import pool: archive id: 282447908044376699 state: UNAVAIL status: The pool was last accessed by another system. action: The pool cannot be imported due to damaged devices or data. see: http://www.sun.com/msg/ZFS-8000-EY config: archive UNAVAIL insufficient replicas raidz1 DEGRADED c7d0p1 ONLINE c8d1p1 OFFLINE c8d0p1 ONLINE mirror UNAVAIL corrupted data c10d0p0 ONLINE c10d1p0 ONLINE ja...@blackbox:~# zpool import -f archive cannot import 'archive': invalid vdev configuration Any suggestions? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
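One non-destructive thing worth trying first is to dump the vdev labels and see whether the devices still agree on the pool configuration; zdb can do that read-only. The device paths below just follow the listing above and are assumptions about where the disks sit now:
# zdb -l /dev/dsk/c7d0p1
# zdb -l /dev/dsk/c10d0p0
If the labels on the two halves of the mirror vdev disagree, or one set is missing, that would go some way to explaining the "corrupted data" and "invalid vdev configuration" messages.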
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Hi all, zpool add tank spare c1t15d0 ? After doing that c1t6d0 is offline and ready to be physically replaced? Yes, that is correct. Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational That would be zpool scrub tank in my case!? Yes. and use zpool clear to clear the previous errors on the pool. I assume the complete command for my case is zpool clear tank Why do we have to do that? Couldn't zfs realize that everything is fine again after executing zpool replace tank c1t6d0 c1t15d0? Yes, sometimes the clear is not necessary but it will also clear the error counts if need be. I have done zpool add tank spare c1t15d0 zpool replace tank c1t6d0 c1t15d0 now and waited for the completion of the resilvering process. zpool status now gives me scrub: resilver completed after 0h22m with 0 errors on Thu Aug 6 22:55:37 2009 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 spare DEGRADED 0 0 0 c1t6d0 FAULTED 0 19 0 too many errors c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 spares c1t15d0 INUSE currently in use errors: No known data errors This does look like a final step is missing. Can I simply physically replace c1t6d0 now or do I have to do zpool offline tank c1t6d0 first? Moreover it seems I have to run a zpool clear in my case to get rid of the DEGRADED message!? What is the missing bit here? zpool offline tank c1t6d0 physically replace c1t6d0 with a new one zpool replace tank c1t6d0 zpool online tank c1t6d0 Just out of curiosity (since I used the other road this time), how does the replace command know what exactly to do here? In my case I ordered the system specifically to replace c1t6d0 with c1t15d0 by doing zpool replace tank c1t6d0 c1t15d0 but if I simply issue zpool replace tank c1t6d0 it ...!?? Thanks a lot, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Andreas, I think you can still offline the faulted disk, c1t6d0. The difference between these two replacements: zpool replace tank c1t6d0 c1t15d0 zpool replace tank c1t6d0 is that in the second case, you are telling ZFS that c1t6d0 has been physically replaced in the same location. This would be equivalent but unnecessary syntax: zpool replace tank c1t6d0 c1t6d0 Another option is to set the autoreplace pool property to on, which will do the replacement steps (zpool replace) after you physically replace the disk in the same physical location as the faulted disk. This is also described in Example 11-1, here: http://docs.sun.com/app/docs/doc/819-5461/gbbvf?a=view After you physically replace c1t6d0, then you might have to detach the spare, c1t15d0, back to the spare pool, like this: # zpool detach tank c1t15d0 I'm not sure this step is always necessary... cs On 08/06/09 15:05, Andreas Höschler wrote: Hi all, zpool add tank spare c1t15d0 ? After doing that c1t6d0 is offline and ready to be physically replaced? Yes, that is correct. Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational That would be zpool scrub tank in my case!? Yes. and use zpool clear to clear the previous errors on the pool. I assume the complete command for my case is zpool clear tank Why do we have to do that? Couldn't zfs realize that everything is fine again after executing zpool replace tank c1t6d0 c1t15d0? Yes, sometimes the clear is not necessary but it will also clear the error counts if need be. I have done zpool add tank spare c1t15d0 zpool replace tank c1t6d0 c1t15d0 now and waited for the completion of the resilvering process. zpool status now gives me scrub: resilver completed after 0h22m with 0 errors on Thu Aug 6 22:55:37 2009 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 spare DEGRADED 0 0 0 c1t6d0 FAULTED 0 19 0 too many errors c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 spares c1t15d0 INUSE currently in use errors: No known data errors This does look like a final step is missing. Can I simply physically replace c1t6d0 now or do I have to do zpool offline tank c1t6d0 first? Moreover it seems I have to run a zpool clear in my case to get rid of the DEGRADED message!? What is the missing bit here? zpool offline tank c1t6d0 physically replace c1t6d0 with a new one zpool replace tank c1t6d0 zpool online tank c1t6d0 Just out of curiosity (since I used the other road this time), how does the replace command know what exactly to do here? In my case I ordered the system specifically to replace c1t6d0 with c1t15d0 by doing zpool replace tank c1t6d0 c1t15d0 but if I simply issue zpool replace tank c1t6d0 it ...!?? Thanks a lot, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
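For reference, the autoreplace route Cindy mentions is just a pool property, along these lines:
# zpool set autoreplace=on tank
# zpool get autoreplace tank
With that set, inserting a new disk into the same slot as the failed one should trigger the replacement automatically (hardware permitting), after which the spare can be detached as shown above.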
Re: [zfs-discuss] Sol10u7: can't zpool remove missing hot spare
Hi Kyle, Except that in the case of spares, you can't replace them. You'll see a message like the one below. Cindy # zpool create pool mirror c1t0d0 c1t1d0 spare c1t5d0 # zpool status pool: pool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM poolONLINE 0 0 0 mirrorONLINE 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 spares c1t5d0AVAIL # zpool replace pool c1t5d0 c2t5d0 cannot replace c1t5d0 with c2t5d0: device is reserved as a hot spare On 08/05/09 14:04, Kyle McDonald wrote: Will Murnane wrote: I'm using Solaris 10u6 updated to u7 via patches, and I have a pool with a mirrored pair and a (shared) hot spare. We reconfigured disks a while ago and now the controller is c4 instead of c2. The hot spare was originally on c2, and apparently on rebooting it didn't get found. So, I looked up what the new name for the hot spare was, then added it to the pool with zpool add home1 spare c4t19d0. I then tried to remove the original name for the hot spare: r...@box:~# zpool remove home1 c2t0d8 r...@box:~# zpool status home1 pool: home1 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM home1ONLINE 0 0 0 mirror ONLINE 0 0 0 c4t17d0 ONLINE 0 0 0 c4t24d0 ONLINE 0 0 0 spares c2t0d8 UNAVAIL cannot open c4t19d0AVAIL errors: No known data errors So, how can I convince the pool to release its grasp on c2t0d8? Have you tried making a sparse file with mkfile in /tmp and then ZFS replace'ing c2t0d8 with the file, and then zfs remove'ing the file? I don't know if it will work, but at least at the time of the remove, the device will exist. -Kyle Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Hi Cindy, I think you can still offline the faulted disk, c1t6d0. OK, here it gets tricky. I have NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 spare DEGRADED 0 0 0 c1t6d0 FAULTED 019 0 too many errors c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 spares c1t15d0 INUSE currently in use now. When I issue the command zpool offline tank c1t6d0 I get cannot offline c1t6d0: no valid replicas ?? However zpool detach tank c1t6d0 seems to work! pool: tank state: ONLINE scrub: resilver completed after 0h22m with 0 errors on Thu Aug 6 22:55:37 2009 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 errors: No known data errors This looks like I can remove and physically replace c1t6d0 now! :-) Thanks, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
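Per Cindy's earlier advice, a scrub run at a quiet time, followed by a clear, would confirm the rebuilt mirror is healthy before you stop watching it (the pool name follows this thread):
# zpool scrub tank
# zpool status -x
# zpool clear tank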
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
Dang. This is a bug we talked about recently that is fixed in Nevada and an upcoming Solaris 10 release. Okay, so you can't offline the faulted disk, but you were able to replace it and detach the spare. Cool beans... Cindy On 08/06/09 15:35, Andreas Höschler wrote: Hi Cindy, I think you can still offline the faulted disk, c1t6d0. OK, here it gets tricky. I have NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 spare DEGRADED 0 0 0 c1t6d0 FAULTED 019 0 too many errors c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 spares c1t15d0 INUSE currently in use now. When I issue the command zpool offline tank c1t6d0 I get cannot offline c1t6d0: no valid replicas ?? However zpool detach tank c1t6d0 seems to work! pool: tank state: ONLINE scrub: resilver completed after 0h22m with 0 errors on Thu Aug 6 22:55:37 2009 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t15d0 ONLINE 0 0 0 c1t7d0 ONLINE 0 0 0 errors: No known data errors This looks like I can remove and physically replace c1t6d0 now! :-) Thanks, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On Thu, 6 Aug 2009, Nigel Smith wrote: I guess it depends on the rate of progress of ZFS compared to say btrfs. Btrfs is still an infant whereas zfs is now into adolescence. I would say that maybe Sun should have held back on announcing the work on deduplication, as it just seems to I still have not seen any formal announcement from Sun regarding deduplication. Everything has been based on remarks from code developers. It is not as concrete and definite as Apple's announcement of zfs inclusion in Snow Leopard Server. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On Thu, Aug 6, 2009 at 16:59, Ross no-re...@opensolaris.org wrote: But why do you have to attach to a pool? Surely you're just attaching to the root filesystem anyway? And as Richard says, since filesystems can be shrunk easily and it's just as easy to detach a filesystem from one machine and attach to it from another, why the emphasis on pools? What filesystems are you talking about? A zfs pool can be attached to one and only one computer at any given time. All file systems in that pool are attached to the same computer. For once I'm beginning to side with Richard, I just don't understand why data has to be in separate pools to do this. All accounting for data and free blocks is done at the pool level. That is why you can share space between file systems. You could write code that made ZFS a cluster file system, maybe just for the pool, but that is a lot of work and would require all attached computers to talk to each other. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On 6 aug 2009, at 23.52, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: I still have not seen any formal announcement from Sun regarding deduplication. Everything has been based on remarks from code developers. To be fair, the official what's new document for 2009.06 states that dedup will be part of the next OSOL release in 2010. Or at least that we should look out for it ;) We're already looking forward to the next release due in 2010. Look out for great new features like an interactive installation for SPARC, the ability to install packages directly from the repository during the install, offline IPS support, a new version of the GNOME desktop, ZFS deduplication and user quotas, cloud integration and plenty more! As always, you can follow active development by adding the dev/ repository. Henrik http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
On Fri, 7 Aug 2009, Henrik Johansson wrote: We're already looking forward to the next release due in 2010. Look out for great new features like an interactive installation for SPARC, the ability to install packages directly from the repository during the install, offline IPS support, a new version of the GNOME desktop, ZFS deduplication and user quotas, cloud integration and plenty more! As always, you can follow active development by adding the dev/ repository. Clearly I was wrong and the ZFS deduplication announcement *is* as concrete as Apple's announcement of zfs support in Snow Leopard Server. Sorry about that. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
erik.ableson wrote: You're running into the same problem I had with 2009.06 as they have corrected a bug where the iSCSI target prior to 2009.06 didn't completely honor SCSI sync commands issued by the initiator. I think I've hit the same thing. I'm using an iscsi volume as the target for Time Machine backups for my new MacBook Pro using the GlobalSAN initiator. Running against an iscsi volume on my zfs pool, with both the Mac and the Solaris box on gigE, I was seeing the Time Machine backup (of 90GB of data) running at about 600-700KB (yes, KB) per second. This would mean a backup time on the order of (optimistically) 45 hours, so I decided to give your suggestion a go. For my freewheeling home use where everything gets tried, crashed, patched and put back together with baling twine (and is backed up elsewhere...) I've mounted a RAM disk of 1GB which is attached to the pool as a ZIL and you see the performance run in cycles where the ZIL loads up to saturation, flushes out to disk and keeps going. I did write a script to regularly dd the ram disk device out to a file so that I can recreate with the appropriate signature if I have to reboot the osol box. This is used with the GlobalSAN initiator on OS X as well as various Windows and Linux machines, physical and VM. Assuming this is a test system that you're playing with and you can destroy the pool with impunity, and you don't have an SSD lying around to test with, try the following: ramdiskadm -a slog 2g (or whatever size you can manage reasonably with the available physical RAM - try vmstat 1 2 to determine available memory) zpool add poolname log /dev/ramdisk/slog I used a 2GB ram disk (the machine has 12GB of RAM) and this jumped the backup up to somewhere between 18-40MB/s, which means that I'm only a couple of hours away from finishing my backup. This is, as far as I can tell, magic (since I started this message nearly 10GB of data have been transferred, when it took from 6am this morning to get to 20GB.) The transfer speed drops like crazy when the write to disk happens, but it jumps right back up afterwards. If you want to perhaps reuse the slog later (ram disks are not preserved over reboot) write the slog volume out to disk and dump it back in after restarting. dd if=/dev/ramdisk/slog of=/root/slog.dd Now my only question is: what do I do when it's done? If I reboot and the ram disk disappears, will my tank be dead? Or will it just continue without the slog? I realize that I'm probably totally boned if the system crashes, so I'm copying off the stuff that I really care about to another pool (the Mac's already been backed up to a USB drive.) Have I meddled in the affairs of wizards? Is ZFS subtle and quick to anger? Steve -- Stephen Green http://blogs.sun.com/searchguy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
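One caution: on builds of this vintage a separate log device cannot simply be removed from the pool again, and a pool whose slog has vanished at import time can be very hard to get back, so treat this strictly as disposable-data territory. The recreate-after-reboot step Erik alludes to would look roughly like the following, assuming the slog image was saved to /root/slog.dd as above (sizes and paths are only examples):
# ramdiskadm -a slog 2g
# dd if=/root/slog.dd of=/dev/ramdisk/slog bs=1024k
# zpool status poolname
Whether ZFS accepts the restored copy depends on how much was written to the slog after the last dd, so don't bet the pool on it.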
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
I believe there are a couple of ways that work. The commands I've always used are to attach the new disk as a spare (if not already) and then replace the failed disk with the spare. I don't know if there are advantages or disadvantages but I also have never had a problem doing it this way. Andreas Höschler wrote: Dear managers, one of our servers (X4240) shows a faulty disk: -bash-3.00# zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors pool: tank state: DEGRADED status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the faulted device, or use 'zpool clear' to mark the device repaired. scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 c1t6d0 FAULTED 0 19 0 too many errors c1t7d0 ONLINE 0 0 0 errors: No known data errors I derived the following possible approaches to solve the problem: 1) A way to reestablish redundancy would be to use the command zpool attach tank c1t7d0 c1t15d0 to add c1t15d0 to the virtual device c1t6d0 + c1t7d0. We still would have the faulty disk in the virtual device. We could then detach the faulty disk with the command zpool detach tank c1t6d0 2) Another approach would be to add a spare disk to tank zpool add tank spare c1t15d0 and then replace the faulty disk: zpool replace tank c1t6d0 c1t15d0 In theory that is easy, but since I have never done that and since this is a productive server I would appreciate it if someone with more experience would look over my agenda before I issue these commands. What is the difference between the two approaches? Which one do you recommend? And is that really all that has to be done or am I missing a bit? I mean, can c1t6d0 be physically replaced after issuing zpool detach tank c1t6d0 or zpool replace tank c1t6d0 c1t15d0? I also found the command zpool offline tank ... but am not sure whether this should be used in my case. Hints are greatly appreciated! Thanks a lot, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Replacing faulty disk in ZFS pool
If he adds the spare and then manually forces a replace, it will take no more time than any other way. I do this quite frequently and without needing the scrub which does take quite a lot of time. cindy.swearin...@sun.com wrote: Hi Andreas, Good job for using a mirrored configuration. :-) Your various approaches would work. My only comment about #2 is that it might take some time for the spare to kick in for the faulted disk. Both 1 and 2 would take a bit more time than just replacing the faulted disk with a spare disk, like this: # zpool replace tank c1t6d0 c1t15d0 Then you could physically replace c1t6d0 and add it back to the pool as a spare, like this: # zpool add tank spare c1t6d0 For a production system, the steps above might be the most efficient. Get the faulted disk replaced with a known good disk so the pool is no longer degraded, then physically replace the bad disk when you have the time and add it back to the pool as a spare. It is also good practice to run a zpool scrub to ensure the replacement is operational and use zpool clear to clear the previous errors on the pool. If the system is used heavily, then you might want to run the zpool scrub when system use is reduced. If you were going to physically replace c1t6d0 while it was still attached to the pool, then you might offline it first. Cindy On 08/06/09 13:17, Andreas Höschler wrote: Dear managers, one of our servers (X4240) shows a faulty disk: -bash-3.00# zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0s0 ONLINE 0 0 0 c1t1d0s0 ONLINE 0 0 0 errors: No known data errors pool: tank state: DEGRADED status: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the faulted device, or use 'zpool clear' to mark the device repaired. scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t5d0 ONLINE 0 0 0 c1t4d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 c1t6d0 FAULTED 0 19 0 too many errors c1t7d0 ONLINE 0 0 0 errors: No known data errors I derived the following possible approaches to solve the problem: 1) A way to reestablish redundancy would be to use the command zpool attach tank c1t7d0 c1t15d0 to add c1t15d0 to the virtual device c1t6d0 + c1t7d0. We still would have the faulty disk in the virtual device. We could then detach the faulty disk with the command zpool detach tank c1t6d0 2) Another approach would be to add a spare disk to tank zpool add tank spare c1t15d0 and then replace the faulty disk: zpool replace tank c1t6d0 c1t15d0 In theory that is easy, but since I have never done that and since this is a productive server I would appreciate it if someone with more experience would look over my agenda before I issue these commands. What is the difference between the two approaches? Which one do you recommend? And is that really all that has to be done or am I missing a bit? I mean, can c1t6d0 be physically replaced after issuing zpool detach tank c1t6d0 or zpool replace tank c1t6d0 c1t15d0? I also found the command zpool offline tank ... but am not sure whether this should be used in my case. Hints are greatly appreciated!
Thanks a lot, Andreas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] BELATED FOLLOWUP Re: Zpool scrub in cron hangs u3/u4 server, stumps tech support.
I'm realizing I never sent the answer to this story, which is that the server needed more RAM. We knew the ARC cache was implicated but had missed just how much RAM zfs needs for the ARC cache, and this server had a LOT of file systems. THOUSANDS. Partially because a lot of this information wasn't in the ZETG until recently... This all came to light when we crossed some sort of line and the server started hanging intermittently and seemingly at random, but increasingly frequent intervals. Taking the server from 2G to 10G made the problem disappear. 6G would have been sufficient (possibly 4G) but that week the price was the same for 4G vs 8G. I omit the part of the story where we became mired in arc cache variable changes, because that's probably just relevant to u3/u4 users. I did take my replacement servers up to u6/u7 On Tue, Feb 17, 2009 at 11:56 PM, Elizabeth Schwartzbetsy.schwa...@gmail.com wrote: I've got a server that freezes when I run a zpool scrub from cron. Zpool scrub runs fine from the command line, no errors. The freeze happens within 30 seconds of the zpool scrub happening. The one core dump I succeeded in taking showed an arccache eating up all the ram. The server's running Solaris 10 u3, kernel patch 127727-11 but it's been patched and seems to have some u4 features (particularly, the arc variables) The only bug report I could find shows a similar bug patched in 120011-14, a patch which I installed many months ago. Sun support threw up their hands and said to install Solaris 10 u6, which I'm not really happy about doing as a bug fix to a production server running a supported version of Sun OS. Once Upon a Time, Sun used to offer *patches* to paying customers for operating system bugs. I quote the latest ticket note in disgust: I really don't know what to tell you. S10u6 has many enhancements and improvments to zfs, but most can be gained though patchs with the exception of new features. I'm trying to escalate the ticket, but really, I'm angry. I've been a big champion of staying with Sun/Solaris over Linux and one of the reasons has been that traditionally Sun had really good tech support, and you could *get* patches if you needed them. If the answer is going to be we don't know what the bug is but maybe a later release will fix it - or not that's not very reassuring. Any thoughts - besides upgrading? Which we'll do, but it's a production server so I don't want to rush it. -- Unix Systems Administrator Harvard Graduate School of Design ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
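For anyone who lands on this thread with similar symptoms, the two usual steps are to watch the ARC with kstat and, if more RAM really isn't an option, cap it in /etc/system; the 4 GB value below is only an example, not a recommendation:
# kstat -m zfs -n arcstats | egrep 'size|c_max'
/etc/system entry (takes effect after a reboot):
set zfs:zfs_arc_max = 0x100000000
More RAM, as Elizabeth found, is the less fiddly fix when the box hosts thousands of file systems.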