[zfs-discuss] iscsi share on different subnet?
I have a ZFS/Xen server for my home network. The box itself has two physical NICs. I want Dom0 to be on my management network and the guest domains to be on the dmz and private networks. The private network is where all my home computers are, and I would like to export iSCSI volumes directly to them - without having to create a firewall rule granting them access to the management network. After some searching, I have yet to find a way to specify the subnet an iSCSI target is visible to - is there any way to do that? Another idea, I suppose, would be to have one of the guest domains mount the volume and then export it itself, but this would be less performant and more complicated... Thanks, Kent
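One way this can be done with the pre-COMSTAR iscsitadm target daemon - sketched here under the assumption that the private-network interface is 192.168.1.10 and the target's local name is "myvol" (both hypothetical) - is to bind the target to a target portal group that contains only the private-network portal, so the target is only advertised on that subnet:

# create a target portal group containing only the private-network address
iscsitadm create tpgt 1
iscsitadm modify tpgt -i 192.168.1.10 1
# bind the target to that portal group
iscsitadm modify target -p 1 myvol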
[zfs-discuss] how to destroy a pool by id?
Over the course of multiple OpenSolaris installs, I first created a pool called "tank" and then, later, reusing some of the same drives, I created another pool called tank. I can `zpool export tank`, but when I `zpool import tank`, I get:

bash-3.2# zpool import tank
cannot import 'tank': more than one matching pool
import by numeric ID instead

Then, using just `zpool import`, I see the IDs:

bash-3.2# zpool import
  pool: tank
    id: 15608629750614119537
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        tank        ONLINE
          raidz2    ONLINE
            c3t0d0  ONLINE
            c3t4d0  ONLINE
            c4t0d0  ONLINE
            c4t4d0  ONLINE
            c5t0d0  ONLINE
            c5t4d0  ONLINE
          raidz2    ONLINE
            c3t1d0  ONLINE
            c3t5d0  ONLINE
            c4t1d0  ONLINE
            c4t5d0  ONLINE
            c5t1d0  ONLINE
            c5t5d0  ONLINE

  pool: tank
    id: 3280066346390919920
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:
        tank          ONLINE
          raidz2      ONLINE
            c4t1d0p0  ONLINE
            c3t1d0p0  ONLINE
            c4t4d0p0  ONLINE
            c3t4d0p0  ONLINE
            c3t5d0p0  ONLINE
            c3t0d0p0  ONLINE

How can I destroy the pool 3280066346390919920 so I don't have to specify the ID to import tank in the future? Thanks, kent
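One possible way to get rid of the stale pool (a sketch; `zpool import` accepts the numeric ID plus a new name, which sidesteps the name clash, and the temporary name "oldtank" here is arbitrary):

# import the stale pool by its numeric ID under a temporary name
zpool import -f 3280066346390919920 oldtank
# then destroy it
zpool destroy oldtank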
Re: [zfs-discuss] ATA UDMA data parity error
For the archive, I swapped the mobo and all is good now... (I copied 100GB into the pool without a crash.) One problem I had was that Solaris would hang whenever booting - even with all the aoc-sat2-mv8 cards pulled out. It turns out that switching the BIOS field "USB 2.0 Controller Mode" from HiSpeed to FullSpeed makes the difference - any ideas why? Thanks, Kent
Re: [zfs-discuss] ATA UDMA data parity error
Thanks for the note, Anton. I let memtest86 run overnight and it found no issues. I've also now moved the cards around and have confirmed that slot #3 on the mobo is bad (all my aoc-sat2-mv8 cards, cables, and backplanes are OK). However, I think it's more than just slot #3 that has a fault, because when I have all three cards plugged into mobo slots other than #3, they all work fine individually, but when I run the exact same per-card tests in parallel, the system crashes. I'm now going to have the system integrator that built my system send me a new mobo (ugh!) Thanks again, Kent

Anton B. Rang wrote:
Definitely a hardware problem (possibly compounded by a bug). Some key phrases and routines:

"ATA UDMA data parity error" - This one actually looks like a misnomer. At least, I'd normally expect a data parity error not to crash the system! (It should result in a retry or EIO.)

"PCI(-X) Express Fatal Error" - This one's more of an issue -- it indicates that the PCI Express bus had an error.

"pcie_pci:pepb_err_msi_intr" - This indicates an error on the PCI bus which has been reflected through to the PCI Express bus. There should be more detail, but it's hard to figure it out from what's below. (The report is showing multiple errors, including both parity errors and system errors, which seems unlikely unless there's a hardware design flaw or a software bug.)

Others have suggested the power supply or memory, but in my experience these types of errors are more often due to a faulty system backplane or card (and occasionally a bad bridge chip).
[zfs-discuss] ATA UDMA data parity error
Hey all, I'm not sure if this is a ZFS bug or a hardware issue I'm having - any pointers would be great! The following contents include:
- high-level info about my system
- my first thought on debugging this
- stack trace
- format output
- zpool status output
- dmesg output

High-Level Info About My System
- fresh install of b78
- first time trying to do anything IO-intensive with ZFS
- command was `cp -r /cdrom /tank/sxce_b78_disk1` - but `cp -r /usr /tank/usr` also fails
- system has 24 sata/sas drive bays, but only 12 of them are populated
- system has three AOC-SAT2-MV8 cards plugged into 6 mini-sas backplanes
  - card1 (c3) - bp1 (c3t0d0, c3t1d0) - bp2 (c3t4d0, c3t5d0)
  - card2 (c4) - bp1 (c4t0d0, c4t1d0) - bp2 (c4t4d0, c4t5d0)
  - card3 (c5) - bp1 (c5t0d0, c5t1d0) - bp2 (c5t4d0, c5t5d0)
- system has one Barcelona Opteron (step BA) - the one with the potential look-aside cache bug... though it's not clear this is related...

My First Thought On Debugging This
After crashing my system several times (using `cp -r /usr /tank/usr`) and comparing the outputs, I noticed that the stack trace always points to device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1, which corresponds to all the drives connected to aoc-sat2-mv8 card #3 (i.e. c5). But looking at the `format` output, this device path only differs from the other devices in that there is a ",1" trailing the /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED] part. Further, again looking at the `format` output, c3 devices have 4/disk, c4 devices have 6/disk, and c5 devices also have 6/disk. The only other thing I can add is that if I boot a Xen kernel, which I was *not* using for all these tests, the following IRQ notices are reported:

SunOS Release 5.11 Version snv_78 64-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: san
NOTICE: IRQ17 is shared
Reading ZFS config: done
Mounting ZFS filesystems: (1/1)
NOTICE: IRQ20 is shared
NOTICE: IRQ21 is shared
NOTICE: IRQ22 is shared

Any ideas?

Stack Trace (note: I've done this a few times and it's always /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1)
---
ATA UDMA data parity error
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x478f39ab.0x160dc688 (0x5344bb5958)
PLATFORM: i86pc, CSN: -, HOSTNAME: san
SOURCE: SunOS, REV: 5.11 snv_78
DESC: Errors have been detected that require a reboot to ensure system integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

panic[cpu3]/thread=ff000f7c2c80: pcie_pci-0: PCI(-X) Express Fatal Error

ff000f7c2bc0 pcie_pci:pepb_err_msi_intr+d2 ()
ff000f7c2c20 unix:av_dispatch_autovect+78 ()
ff000f7c2c60 unix:dispatch_hardint+2f ()
ff000fd09fd0 unix:switch_sp_and_call+13 ()
ff000fd0a020 unix:do_interrupt+a0 ()
ff000fd0a030 unix:cmnint+ba ()
ff000fd0a130 genunix:avl_first+1e ()
ff000fd0a1f0 zfs:metaslab_group_alloc+d1 ()
ff000fd0a2c0 zfs:metaslab_alloc_dva+1b7 ()
ff000fd0a360 zfs:metaslab_alloc+82 ()
ff000fd0a3b0 zfs:zio_dva_allocate+8a ()
ff000fd0a3d0 zfs:zio_next_stage+b3 ()
ff000fd0a400 zfs:zio_checksum_generate+6e ()
ff000fd0a420 zfs:zio_next_stage+b3 ()
ff000fd0a490 zfs:zio_write_compress+239 ()
ff000fd0a4b0 zfs:zio_next_stage+b3 ()
ff000fd0a500 zfs:zio_wait_for_children+5d ()
ff000fd0a520 zfs:zio_wait_children_ready+20 ()
ff000fd0a540 zfs:zio_next_stage_async+bb ()
ff000fd0a560 zfs:zio_nowait+11 ()
ff000fd0a870 zfs:dbuf_sync_leaf+1ac ()
ff000fd0a8b0 zfs:dbuf_sync_list+51 ()
ff000fd0a900 zfs:dbuf_sync_indirect+cd ()
ff000fd0a940 zfs:dbuf_sync_list+5e ()
ff000fd0a9b0 zfs:dnode_sync+23b ()
ff000fd0a9f0 zfs:dmu_objset_sync_dnodes+55 ()
ff000fd0aa70 zfs:dmu_objset_sync+13d ()
ff000fd0aac0 zfs:dsl_dataset_sync+5d ()
ff000fd0ab30 zfs:dsl_pool_sync+b5 ()
ff000fd0abd0 zfs:spa_sync+208 ()
ff000fd0ac60 zfs:txg_sync_thread+19a ()
ff000fd0ac70 unix:thread_start+8 ()

syncing file systems... 1 1 done
ereport.io.pciex.rc.fe-msg ena=5344b8176c00c01 detector=[ version=0 scheme=
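For anyone retracing this after the reboot, the same telemetry can be pulled back out of FMA's logs with standard Solaris commands (a sketch; the output is simply whatever your fmd has recorded):

# dump the raw error reports fmd logged around the panic
fmdump -eV | more
# list any faults FMA has actually diagnosed
fmadm faulty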
Re: [zfs-discuss] ATA UDMA data parity error
On a lark, I decided to create a new pool not including any devices connected to card #3 (i.e. c5). It crashes again, but this time with a slightly different dump (see below) - actually, there are two dumps below: the first is using the xVM kernel and the second is not. Any ideas? Kent

[NOTE: this one is using the xVM kernel - see below for the dump without the xVM kernel]

# zpool destroy tank
# zpool status
no pools available
# zpool create tank raidz2 c3t0d0 c3t4d0 c4t0d0 c4t4d0 raidz2 c3t1d0 c3t5d0 c4t1d0 c4t5d0
# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
errors: No known data errors
# ls /tank
# cp -r /usr /tank/usr
Jan 17 08:48:53 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: device reset
Jan 17 08:48:53 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: link lost
Jan 17 08:48:53 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:53 san  port 5: link established
Jan 17 08:48:55 san marvell88sx: WARNING: marvell88sx1: port 4: DMA completed after timed out
Jan 17 08:48:55 san last message repeated 14 times
Jan 17 08:48:55 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: device reset
Jan 17 08:48:55 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: link lost
Jan 17 08:48:55 san sata: NOTICE: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Jan 17 08:48:55 san  port 4: link established
Jan 17 08:48:55 san scsi: WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san  Error for Command: write    Error Level: Retryable
Jan 17 08:48:55 san scsi:  Requested Block: 11893    Error Block: 11893
Jan 17 08:48:55 san scsi:  Vendor: ATA    Serial Number:
Jan 17 08:48:55 san scsi:  Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san  Error for Command: write    Error Level: Retryable
Jan 17 08:48:55 san scsi:  Requested Block: 11983    Error Block: 11983
Jan 17 08:48:55 san scsi:  Vendor: ATA    Serial Number:
Jan 17 08:48:55 san scsi:  Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Jan 17 08:48:55 san  Error for Command: write    Error Level: Retryable
Jan 17 08:48:55 san scsi:  Requested Block: 12988    Error Block: 12988
Jan 17 08:48:55 san scsi:  Vendor: ATA    Serial Number:
Jan 17 08:48:55 san scsi:  Sense Key: No_Additional_Sense
Jan 17 08:48:55 san scsi:  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Jan 17 08:48:55 san scsi: WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 4:
Re: [zfs-discuss] ATA UDMA data parity error
Below I create zpools isolating one card at a time:
- when just card #1 - it works
- when just card #2 - it fails
- when just card #3 - it works

And then again using the two cards that seem to work:
- when cards #1 and #3 - it fails

So, at first I thought I had narrowed it down to a card, but my last test shows that it still fails when the zpool uses two cards that succeed individually... The only thing I can think to point out here is that those two cards are on different buses - one connected to an NEC uPD720400 and the other connected to an AIC-7902, which itself is then connected to the NEC uPD720400. Any ideas? Thanks, Kent

OK, doing it again using just card #1 (i.e. c3) works!
# zpool destroy tank
# zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0
# cp -r /usr /tank/usr
cp: cycle detected: /usr/ccs/lib/link_audit/32
cp: cannot access /usr/lib/amd64/libdbus-1.so.2

Doing it again using just card #2 (i.e. c4) still fails:
# zpool destroy tank
# zpool create tank raidz2 c4t0d0 c4t4d0 c4t1d0 c4t5d0
# cp -r /usr /tank/usr
cp: cycle detected: /usr/ccs/lib/link_audit/32
cp: cannot access /usr/lib/amd64/libdbus-1.so.2
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
WARNING: marvell88sx1: error on port 1: ATA UDMA data parity error
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x478f6148.0x376ebd4b (0xbf8f86652d)
PLATFORM: i86pc, CSN: -, HOSTNAME: san
SOURCE: SunOS, REV: 5.11 snv_78
DESC: Errors have been detected that require a reboot to ensure system integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

panic[cpu3]/thread=ff000f7bcc80: pcie_pci-0: PCI(-X) Express Fatal Error

ff000f7bcbc0 pcie_pci:pepb_err_msi_intr+d2 ()
ff000f7bcc20 unix:av_dispatch_autovect+78 ()
ff000f7bcc60 unix:dispatch_hardint+2f ()
ff000f786ac0 unix:switch_sp_and_call+13 ()
ff000f786b10 unix:do_interrupt+a0 ()
ff000f786b20 unix:cmnint+ba ()
ff000f786c10 unix:mach_cpu_idle+b ()
ff000f786c40 unix:cpu_idle+c8 ()
ff000f786c60 unix:idle+10e ()
ff000f786c70 unix:thread_start+8 ()

syncing file systems... done
ereport.io.pciex.rc.fe-msg ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED] ] rc-status=87c source-id=200 source-valid=1
ereport.io.pciex.rc.mue-msg ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED] ] rc-status=87c
ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED] ] pci-sec-status=6000 pci-bdg-ctrl=3
ereport.io.pci.sec-ma ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED] ] pci-sec-status=6000 pci-bdg-ctrl=3
ereport.io.pciex.bdg.sec-perr ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED] ] sue-status=1800 source-id=200 source-valid=1
ereport.io.pciex.bdg.sec-serr ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED] ] sue-status=1800
ereport.io.pci.sec-rserr ena=bf8f828ea700c01 detector=[ version=0 scheme=dev device-path=/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED] ] pci-sec-status=6420 pci-bdg-ctrl=7
dumping to /dev/dsk/c2t0d0s1, offset 215547904, content: kernel
NOTICE: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED]: port 0: device reset
100% done:

And doing it again using just card #3 (i.e. c5) works!
# zpool destroy tank
cannot open 'tank': no such pool
(interesting)
# zpool create tank raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0
# cp -r /usr /tank/usr

And doing it again using cards #1 and #3 (i.e. c3 and c5) fails!
# zpool destroy tank
# zpool create tank raidz2 c3t0d0 c3t4d0 c3t1d0 c3t5d0 raidz2 c5t0d0 c5t4d0 c5t1d0 c5t5d0
# cp -r /usr /tank/usr
cp: cycle detected: /usr/ccs/lib/link_audit/32
cp: cannot access /usr/lib/amd64/libdbus-1.so.2
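To double-check which bridge each controller instance actually hangs off, plain Solaris commands suffice (a sketch; pick any device on the controller you're curious about). The physical path embedded in a /dev/dsk symlink is the quickest view:

# the symlink target is the full device-tree path, bridges included
ls -l /dev/dsk/c4t0d0s0
# and the device tree annotated with driver names and instance numbers
prtconf -D | more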
Re: [zfs-discuss] how to create whole disk links?
Eric Schrock wrote:
Or just let ZFS work its magic ;-)

Oh, I didn't realize that `zpool create` could be fed vdevs that didn't exist in /dev/dsk/ - and, as a bonus, it also creates the /dev/dsk/ links!

# zpool create -f tank raidz2 c3t0d0 c3t4d0 c4t0d0 c4t4d0 c5t0d0 c5t4d0
# ls -l /dev/dsk/ | grep :wd$
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t0d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t1d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t2d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t3d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t4d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 78 Dec 27 17:32 c3t5d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED],1/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 76 Dec 28 12:45 c4t0d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 76 Dec 27 22:38 c4t1d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 76 Dec 27 22:38 c4t4d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 76 Dec 28 12:45 c5t0d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd
lrwxrwxrwx 1 root root 76 Dec 28 12:45 c5t4d0 -> ../../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1033,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:wd

Thanks for the pointer! Kent
[zfs-discuss] aoc-sat2-mv8 (was: LSI SAS3081E = unstable drive numbers?)
Kent Watsen wrote:
So, I picked up an AOC-SAT2-MV8 off eBay for not too much, and then I got a 4xSATA-to-one-SFF-8087 cable to connect it to one of my six backplanes. But, as fortune would have it, the cable I bought has SATA connectors that are physically too big to plug into the AOC-SAT2-MV8 - since the AOC-SAT2-MV8 stacks two SATA connectors on top of each other...

As a temporary solution, I hooked up the reverse breakout cable using ports 1, 3, 5, and 7 on the aoc-sat2-mv8 - the cables fit this way because it's using only one port from each stack. Anyway, the good news is that the drives showed up in Solaris right away and their IDs are stable between hot-swaps and reboots. So I'll be keeping the aoc-sat2-mv8 (anybody want a SAS3081E?). I've already ordered more cables for the aoc-sat2-mv8 and will report which ones work when I get them.

Thanks, Kent
Re: [zfs-discuss] LSI SAS3081E = unstable drive numbers?
Paul Jochum wrote:
What the lsiutil does for me is clear the persistent mapping for all of the drives on a card.

Since James confirms that I'm doomed to ad hoc methods for tracking device-ids to bays, I'm interested in knowing whether your ability to clear the persistent mapping for *all* of the drives on the card somehow gets you to a stable state? It seems to me that, if the tool is run from the BIOS, like the LSI Configuration Utility, then it kind of defeats the uptime that I'm trying to achieve by having a RAID system in the first place... Is it, by chance, a userland program?

I don't know of a way to disable the mapping completely (but that does sound like a nice option). Since Sun is reselling this card now (that is how I got my cards), I wonder if they can put in a request to LSI to provide this enhancement?

Yes, please! - can someone from Sun please request LSI to add a feature to selectively disable their card's persistent drive mapping? Thanks, Kent
Re: [zfs-discuss] LSI SAS3081E = unstable drive numbers?
Kent Watsen wrote:
Given that manually tracking shifting ids doesn't sound appealing to me, would using a SATA controller like the AOC-SAT2-MV8 resolve the issue? Given that I currently only have one LSI HBA, I'd need to get 2 more for all 24 drives ---or--- I could get 3 of these SATA controllers plus 6 discrete-to-8087 reverse breakout cables. Going down the LSI route would cost about $600 while going down the AOC-SAT2-MV8 route would cost about $400. I understand that the SATA controllers are less performant, but I'd gladly exchange some performance that I'm likely never to need to simplify my administrative overhead...

So, I picked up an AOC-SAT2-MV8 off eBay for not too much, and then I got a 4xSATA-to-one-SFF-8087 cable to connect it to one of my six backplanes. But, as fortune would have it, the cable I bought has SATA connectors that are physically too big to plug into the AOC-SAT2-MV8 - since the AOC-SAT2-MV8 stacks two SATA connectors on top of each other... Any chance anyone knows of a 4xSATA-to-SFF-8087 reverse breakout cable that will plug into the AOC-SAT2-MV8? Alternatively, does anyone know how hard it would be to splice the SATA cables that came with the AOC-SAT2-MV8 into the cable I bought?

Also, the cable I got actually had 5 cables going into the SFF-8087 - 4 SATA and another that is rectangular and has holes for either 7 or 8 pins (not sure if it's for 7 or 8 pins, as one of the holes appears a bit different). Any idea what this fifth cable is for?

Thanks, Kent
Re: [zfs-discuss] LSI SAS3081E = unstable drive numbers?
Eric Schrock wrote:
For x86 systems, you can use ipmitool to manipulate the led state (ipmitool sunoem led ...). On older galaxy systems, you can only set the fail LED ('io.hdd0.led'), as the ok2rm LED is not physically connected to anything. On newer systems, you can set both the 'fail' and 'ok2rm' LEDs. You cannot change the activity LED except by manually sending the 'set sensor reading' IPMI command (not available via ipmitool). For external enclosures, you'll need a SES control program. Both of these problems are being worked on under the FMA sensor framework to create a unified view through libtopo. Until that's complete, you'll be stuck using ad hoc methods.

Hi Eric, I've looked at your blog and have tried your suggestions, but I need a little more advice. I am on an x86 system running SXCE snv_74 - the system has 6 SAS backplanes but, according to the integrator, no real scsi enclosure services. According to the man page, I should be able to use `sdr elist generic` to list LEDs, but that command doesn't return any output:

# ipmitool sdr elist generic
# /* there was no output */

Is it not returning sensor ids because I don't have real scsi enclosure services? Is there anything I can do, or am I doomed to ad hoc methods forever?

Thanks, Kent
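For reference, a sketch of the commands in question - the get/set syntax follows Eric's description above, and the sensor name io.hdd0.led comes from his note, so whether it actually exists on this board is exactly what the empty listing suggests it may not:

# list the LED sensors the service processor exposes
ipmitool sdr elist generic
# if the sensor exists, query and light the fail LED for the first disk bay
ipmitool sunoem led get io.hdd0.led
ipmitool sunoem led set io.hdd0.led ON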
[zfs-discuss] LSI SAS3081E = unstable drive numbers?
Based on recommendations from this list, I asked the company that built my box to use an LSI SAS3081E controller. The first problem I noticed was that the drive numbers were ordered incorrectly. That is, given that my system has 24 bays (6 rows, 4 bays/row), the drive numbers from top-to-bottom, left-to-right were 6, 1, 0, 2, 4, 5 - even though when the system boots, each drive is scanned in perfect order (I can tell by watching the LEDs blink). I contacted LSI tech support and they explained:

start response
SAS treats device IDs differently than SCSI. LSI SAS controllers remember devices in the order they were discovered by the controller. This memory is persistent across power cycles. It is based on the world wide name (WWN) given uniquely to every SAS device. This allows your boot device to remain your boot device no matter where it migrates in the SAS topology. In order to clear the memory of existing devices you need at least one device that will not be present in your final configuration. Re-boot the machine and enter the LSI configuration utility (CTRL-C). Then find your way to SAS Topology. To see more options, press CTRL-M. Choose the option to clear all non-present device IDs. This clears the persistent memory of all devices not present at that time. Exchange the drives. The system will now remember the order it finds the drives after the next boot cycle.
end response

Sure enough, I was able to physically reorder my drives so they were 0, 1, 2, 4, 5, 6 - so, apparently, the company that put my system together moved the drives around after they were initially scanned. But where is 3? (answer below). Then I tried another test:

1. make the first disk blink
# dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
10+0 records in
10+0 records out

2. pull disk '0' out and replace it with a brand new disk
# dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
dd: /dev/dsk/c2t0d0p0: open: No such file or directory

3. scratch head and try again with '3' (I had previously cleared the LSI controller's memory)
# dd if=/dev/dsk/c2t3d0p0 of=/dev/null count=10
10+0 records in
10+0 records out

So, it seems my SAS controller is being too smart for its own good - it tracks the drives themselves, not the drive bays. If I hot-swap a brand new drive into a bay, Solaris will see it as a new disk, not a replacement for the old disk. How can ZFS support this? I asked LSI tech support again and got:

start quote
I don't have the knowledge to answer that, so I'll just say this: most vendors, including Sun, set up the SAS HBA to use enclosure/slot naming, which means that if a drive is swapped, it does NOT get a new name (after all, the enclosure and slot did not change).
end quote

So, now I turn to you... Here is some information about my system:

Specs:
Motherboard: SuperMicro H8DME-2 Rev 2.01 - BIOS: AMI v2.58
HBA: LSI SAS3081E (SN: P068170707) installed in Slot #5 - LSI Configuration Utility v6.16.00.00 (2007-05-07)
Backplane: CI-Design 12-6412-01BR
HBA connected to BP via two SFF-8087-to-SFF-8087 cables
OS: SXCE b74

Details:
* Chassis has 24 SAS/SATA bays
* There are 6 backplanes - one for each *row* of drives
* I currently have only 6 drives installed (see pic)
* The LSI card is plugged into backplanes 1 & 2
* The LSI card is NOT configured to do any RAID - it's only JBOD, as I'm using Solaris's ZFS (software RAID)

Question:
* I only plan to use SATA drives - would using a SATA controller like Supermicro's AOC-SAT2-MV8 help?

Thanks again, Kent
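Since reading from the raw device lights the activity LED, the ad hoc "find my drive" trick looks like this (a sketch; substitute whichever device you are hunting, and loop so the LED stays lit long enough to walk to the rack):

# keep the disk busy until interrupted so its activity LED blinks steadily
while true; do
    dd if=/dev/rdsk/c2t3d0p0 of=/dev/null bs=1024k count=100
done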
Re: [zfs-discuss] LSI SAS3081E = unstable drive numbers?
Wow, how fortunate for me that you are on this list! I guess I do have a follow-up question... If each new drive gets a new id when plugged into the system - and I learn to discover that drive's id using dmesg or iostat and use `zpool replace` correctly - when a drive fails, what will it take for me to physically find it? I'm hoping there is a command, like dd, that I can use to make that drive's LED blink, but I don't know if I can trust that `dd` will work at all when the drive has failed! Since I don't have enclosure services, does that mean my best option is to manually track id-to-bay mappings? (envision a clipboard hanging on the rack)

Given that manually tracking shifting ids doesn't sound appealing to me, would using a SATA controller like the AOC-SAT2-MV8 resolve the issue? Given that I currently only have one LSI HBA, I'd need to get 2 more for all 24 drives ---or--- I could get 3 of these SATA controllers plus 6 discrete-to-8087 reverse breakout cables. Going down the LSI route would cost about $600 while going down the AOC-SAT2-MV8 route would cost about $400. I understand that the SATA controllers are less performant, but I'd gladly exchange some performance that I'm likely never to need to simplify my administrative overhead...

Thanks, Kent

James C. McPherson wrote:
Hi Kent, I'm one of the team that works on Solaris' mpt driver, which we recently enhanced to deliver mpxio support with SAS. I have a bit of knowledge about your issue :-)

Kent Watsen wrote:
Based on recommendations from this list, I asked the company that built my box to use an LSI SAS3081E controller. The first problem I noticed was that the drive numbers were ordered incorrectly. That is, given that my system has 24 bays (6 rows, 4 bays/row), the drive numbers from top-to-bottom, left-to-right were 6, 1, 0, 2, 4, 5 - even though when the system boots, each drive is scanned in perfect order (I can tell by watching the LEDs blink). I contacted LSI tech support and they explained:

start response
SAS treats device IDs differently than SCSI. LSI SAS controllers remember devices in the order they were discovered by the controller. This memory is persistent across power cycles. It is based on the world wide name (WWN) given uniquely to every SAS device. This allows your boot device to remain your boot device no matter where it migrates in the SAS topology. In order to clear the memory of existing devices you need at least one device that will not be present in your final configuration. Re-boot the machine and enter the LSI configuration utility (CTRL-C). Then find your way to SAS Topology. To see more options, press CTRL-M. Choose the option to clear all non-present device IDs. This clears the persistent memory of all devices not present at that time. Exchange the drives. The system will now remember the order it finds the drives after the next boot cycle.
end response

Firstly, yes, the LSI SAS hbas do use persistent mapping, with a logical target id by default. This is where the hba does the translation between the physical disk device's SAS address (which you'll see in prtconf -v as the devid), and an essentially arbitrary target number which gets passed up to the OS - in this case Solaris. The support person @ LSI was correct about deleting all those mappings. Yes, the controller is being smart and tracking the actual device rather than a particular bay/slot mapping. This isn't so bad, mostly. The effect for you is that you can't assume that the replaced device is going to have the same target number as the old one (in fact, I'd call that quite unlikely), so you'll have to see what the new device name is by checking your dmesg or iostat -En output.

Sure enough, I was able to physically reorder my drives so they were 0, 1, 2, 4, 5, 6 - so, apparently, the company that put my system together moved the drives around after they were initially scanned. But where is 3? (answer below). Then I tried another test:

1. make the first disk blink
# dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
10+0 records in
10+0 records out

2. pull disk '0' out and replace it with a brand new disk
# dd if=/dev/dsk/c2t0d0p0 of=/dev/null count=10
dd: /dev/dsk/c2t0d0p0: open: No such file or directory

3. scratch head and try again with '3' (I had previously cleared the LSI controller's memory)
# dd if=/dev/dsk/c2t3d0p0 of=/dev/null count=10
10+0 records in
10+0 records out

So, it seems my SAS controller is being too smart for its own good - it tracks the drives themselves, not the drive bays. If I hot-swap a brand new drive into a bay, Solaris will see it as a new disk, not a replacement for the old disk. How can ZFS support this? I asked the LSI tech support again and got:

start quote
I don't have the knowledge to answer that, so I'll just
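For the archive, the replace step James alludes to would look something like this (a sketch; the two device names are whatever dmesg or `iostat -En` reveal for the old and new drives):

# the pool knew the failed drive as c2t0d0; the swapped-in disk appeared as c2t3d0
zpool replace tank c2t0d0 c2t3d0
# watch the resilver progress
zpool status tank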
Re: [zfs-discuss] LSI SAS3081E = unstable drive numbers?
Hi Paul, Already in my LSI Configuration Utility I have an option to clear the persistent mapping for drives not present, but then the card resumes its normal persistent-mapping logic. What I really want is to disable the persistent-mapping logic completely - is `lsiutil` doing that for you? Thanks, Kent

Paul Jochum wrote:
Hi Kent: I have run into the same problem before, and have worked with LSI and Sun support to fix it. LSI calls this persistent drive mapping, and here is how to clear it:

1) Obtain the latest version of the program lsiutil from LSI. They don't seem to have the Solaris versions on their website, but I got it by email when entering a ticket into their support system. I know that they have a version for Solaris x86 (and I believe a Sparc version also). The version I currently have is: LSI Logic MPT Configuration Utility, Version 1.52, September 7, 2007

2) Execute the lsiutil program on your target box.
a) First it will ask you to select which card to use (I have multiple cards in my machine; I don't know if it will ask if you only have 1 card in your box)
b) Then you need to select option 15 (it is a hidden option, not shown on the menu)
c) Then you select option 10 (Clear all persistent mappings)
d) Then option 0 multiple times to get out of the program
e) I normally then reboot the box, and the next time it comes up, the drives are back in order.
e) Or (instead of rebooting) option 99, to reset the chip (causes new mappings to be established), then option 8 (to verify lower target IDs), then devfsadm. After devfsadm completes, lsiutil option 42 should display valid device names (in /dev/rdsk), and format should find the devices so that you can label them.

Hope this helps. I happened to need it last night again (I normally have to run it after re-imaging a box, assuming that I don't want to save the data that was on those drives). Paul Jochum
Re: [zfs-discuss] ZFS Solaris 10u5 Proposed Changes
How does one access the PSARC database to look up the descriptions of these features? Sorry if this has been asked before! - I tried Google before posting this :-[ Kent

George Wilson wrote:
ZFS Fans, Here's a list of features that we are proposing for Solaris 10u5. Keep in mind that this is subject to change.

Features:
PSARC 2007/142 zfs rename -r
PSARC 2007/171 ZFS Separate Intent Log
PSARC 2007/197 ZFS hotplug
PSARC 2007/199 zfs {create,clone,rename} -p
PSARC 2007/283 FMA for ZFS Phase 2
PSARC/2006/465 ZFS Delegated Administration
PSARC/2006/577 zpool property to disable delegation
PSARC/2006/625 Enhancements to zpool history
PSARC/2007/121 zfs set copies
PSARC/2007/228 ZFS delegation amendments
PSARC/2007/295 ZFS Delegated Administration Addendum
PSARC/2007/328 zfs upgrade

Stay tuned for a finalized list of RFEs and fixes. Thanks, George
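For anyone else looking: assuming the ARC caselog layout on opensolaris.org at the time, each case number above should map to a URL of the form http://www.opensolaris.org/os/community/arc/caselog/2007/142/ (substitute the year and case number), which holds the case materials.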
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Probably not - my box has 10 drives and two very thirsty FX74 processors, and it draws 450W max. At 1500W, I'd be more concerned about power bills and cooling than the UPS!

Yeah - good point, but I need my TV! - or so I tell my wife so I can play with all this gear :-X

Cheers, Kent
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
David Edmondson wrote:
One option I'm still holding on to is to also use the ZFS system as a Xen server - that is, OpenSolaris would be running in Dom0... Given that the Xen hypervisor has a pretty small cpu/memory footprint, do you think it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to Dom0 and bump the memory up by 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support necessary for a reasonable number (16) of paravirtualised domains. If the guest domains end up using HVM then the dom0 load is higher, but we haven't done the work to quantify this properly yet.

A tasty insight - a million thanks! I think if I get 2 quad-cores and 16GB mem, I'd be able to stomach the overhead of 25% cpu and 25% mem going to the host - as having a dedicated SAN plus another totally-redundant Xen box would be more expensive.

Cheers! Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out to Richard, it's (theoretically) a 2x IOPS difference, which can be very significant for some people.

Yeah, but my point is whether it would be noticeable to *me* (yes, I am a bit self-centered).

I would say no, not even close to pushing it. Remember, we're measuring performance in MBytes/s, and video throughput is measured in Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air is going to be pretty rare). So I'm figuring you're just scratching the surface of even a minimal array. Put it this way: can a single, modern hard drive keep up with an ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say you have headroom for a *lot* of streams.

Sweet! I should probably hang up this thread now, but there are too many other juicy bits to respond to...

I wasn't sure, with your workload. I know with mine, I'm seeing the data store as being mostly temporary. With that much data streaming in and out, are you planning on archiving *everything*? Cos that's only one month's worth of HD video.

Well, not to downplay the importance of my TV recordings - which is really a laugh because I'm not really a big TV watcher - I simply don't want to ever have to think about this again after getting it set up.

I'd consider tuning a portion of the array for high throughput, and another for high redundancy as an archive for whatever you don't want to lose. Whether that's by setting copies=2, or by having a mirrored zpool (smart for an archive, because you'll be less sensitive to the write performance that suffers there), it's up to you... ZFS gives us a *lot* of choices. (But then you knew that, and it's what brought you to the list :)

All true, but if 4(4+2) serves all my needs, I think that it's simpler to administer, as I can arbitrarily allocate space as needed without needing to worry about what kind of space it is - all the space is good and fast space...

I also committed to having at least one hot spare, which, after staring at relling's graphs for days on end, seems to be the cheapest, easiest way of upping the MTTDL for any array. I'd recommend it.

No doubt that a hot spare gives you a bump in MTTDL, but double parity trumps it big time - check out Richard's blog...

As I understand it, 5(2+1) would scale to better IOPS performance than 4(4+2), and IOPS represents the performance baseline; as you ask the array to do more and more at once, it'll look more like random seeks. What you get from those bigger zvol groups of 4+2 is higher performance per zvol. That said, my few datapoints on 4+1 RAID-Z groups (running on 2 controllers) suggest that that configuration runs into a bottleneck somewhere, and underperforms what's expected.

Er? Can anyone fill in the missing blank here?

Oh, the bus will far exceed your needs, I think. The exercise is to specify something that handles what you need without breaking the bank, no?

Bank, smank - I build a system every 5+ years and I want it to kick ass all the way until I build the next one - cheers!

BTW, where are these HDTV streams coming from/going to? Ethernet? A capture card? (and which ones will work with Solaris?)

Glad you asked - for the list's sake, I'm using two HDHomeRun tuners (http://www.silicondust.com/wiki/products/hdhomerun) - actually, I bought 3 of them because I felt like I needed a spare :-D

Yeah, perhaps I've been a bit too circumspect about it, but I haven't been all that impressed with my PCI-X bus configuration. Knowing what I know now, I might've spec'd something different. Of all the suggestions that've gone out on the list, I was most impressed with Tim Cook's: "Won't come cheap, but this mobo comes with 6x pci-x slots... should get the job done :) http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm" That has 3x 133MHz PCI-X slots each connected to the Southbridge via a different PCIe bus, which sounds worthy of being the core of the demi-Thumper you propose.

Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI: http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html (note: LSI also sells matching PCI-X HBA controllers, in case you need to balance your mobo's architecture) ...But it all depends on what you intend to spend.

(This is what I was going to say in my next blog entry on the system:) We're talking about benchmarks that are really far past what you say is your most taxing workload. I say I'm disappointed with the contention on my bus putting limits on maximum throughput, but really, what I have far outstrips my ability to get data into or out of the system.

So moving to the PCIe-based cards should fix that - no?

So all of my disappointment is in theory. Seems like this
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Hey Adam,

My first posting contained my use cases, but I'd say that video recording/serving will dominate the disk utilization - that's why I'm pushing for 4 striped sets of RAIDZ2 - I think that it would be all-around goodness.

It sounds good that way, but (in theory) you'll see random I/O suffer a bit when using RAID-Z2: the extra parity will drag performance down a bit.

I know what you are saying, but I wonder if it would be noticeable? I think my worst-case scenario would be 3 myth frontends watching 1080p content while 4 tuners are recording 1080p content - with each 1080p stream being 27Mb/s, that would be 108Mb/s writes and 81Mb/s reads (all sequential I/O) - does that sound like it would even come close to pushing a 4(4+2) array?

The RAS guys will flinch at this, but have you considered 8*(2+1) RAID-Z1? That configuration showed up in the output of the program I posted back in July (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):

24 bays w/ 500 GB drives having MTBF=5 years
- can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
- can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of 28911.68 years
- can have 4 (4+1) w/ 4 spares providing 8000 GB with MTTDL of 684.38 years
- can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
- can have 2 (8+1) w/ 6 spares providing 8000 GB with MTTDL of 380.21 years
- can have 2 (8+2) w/ 4 spares providing 8000 GB with MTTDL of 416328.12 years

But it is 91 times more likely to fail, and this system will contain data that I don't want to risk losing.

I don't want to over-pimp my links, but I do think my blogged experiences with my server (also linked in another thread) might give you something to think about: http://lindsay.at/blog/archive/tag/zfs-performance/

I see that you also set up a video server (myth?). From your blog, I think you are doing 5(2+1) (plus a hot spare?) - this is what my program says about a 16-bay system:

16 bays w/ 500 GB drives having MTBF=5 years
- can have 5 (2+1) w/ 1 spares providing 5000 GB with MTTDL of 1825.00 years
- can have 4 (2+2) w/ 0 spares providing 4000 GB with MTTDL of 43367.51 years
- can have 3 (4+1) w/ 1 spares providing 6000 GB with MTTDL of 912.50 years
- can have 2 (4+2) w/ 4 spares providing 4000 GB with MTTDL of 2497968.75 years
- can have 1 (8+1) w/ 7 spares providing 4000 GB with MTTDL of 760.42 years
- can have 1 (8+2) w/ 6 spares providing 4000 GB with MTTDL of 832656.25 years

Note that your MTTDL isn't quite as bad as 8(2+1), since you have three fewer stripes. Also, it's interesting for me to note that you have 5 stripes and my 4(4+2) setup would have just one less - so the question to answer is whether your extra stripe is better than my 2 extra disks in each raid set?

Testing 16 disks locally, however, I do run into noticeable I/O bottlenecks, and I believe it's down to the top limits of the PCI-X bus.

Yes, too bad Supermicro doesn't make a PCIe-based version... But still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s whereas a 64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster than my worst-case scenario from above, where 7 1080p streams would be 189Mb/s...

As far as a mobo with good PCI-X architecture - check out the latest from Tyan (http://tyan.com/product_board_detail.aspx?pid=523) - it has three 133/100MHz PCI-X slots.

I use a Tyan in my server, and have looked at a lot of variations, but I hadn't noticed that one. It has some potential. Still, though, take a look at the block diagram on the datasheet: that actually looks like 1x PCI-X 133MHz slot and a bridge sharing 2x 100MHz slots. My benchmarks so far show that putting a controller on a 100MHz slot is measurably slower than 133MHz, but contention over a single bridge can be even worse.

Hmmm, I hadn't thought about that... Here is another new mobo from Tyan (http://tyan.com/product_board_detail.aspx?pid=517) - its datasheet shows the PCI-X buses configured the same way as your S3892.

Thanks! Kent
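Putting numbers on the worst case above is just unit conversion:

4 recording streams x 27 Mbit/s = 108 Mbit/s ~= 13.5 MB/s of writes
3 playback streams x 27 Mbit/s =  81 Mbit/s ~= 10.1 MB/s of reads

That combined ~24 MB/s of sequential I/O is a small fraction of what even a single modern drive sustains, let alone a 4(4+2) array.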
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Nit: small, random read I/O may suffer. Large random read or any random write workloads should be OK.

Given that video serving is all sequential reads, is it correct that raidz2, specifically 4(4+2), would be just fine?

For 24 data disks there are enough combinations that it is not easy to pick from. The attached RAIDoptimizer output may help you decide on the trade-offs.

Wow! - thanks for running it with 24 disks!

For a description of the theory behind it, see my blog: http://blogs.sun.com/relling

I used your theory to write my own program (posted in July), but yours is way more complete.

I recommend loading it into StarOffice

Nice little plug ;-)

and using graphs or sorts to reorder the data, based on your priorities.

Interesting - my 4(4+2) has 282 iops, whereas 8(2+1) has 565 iops - exactly double, which is kind of expected given that it has twice as many stripes... Also, it helps to see that the iops extremes are 12(raid1) with 1694 iops and 2(10+2) with 141 iops - so 4(4+2) is not a great 24-disk performer, but isn't 282 iops probably overkill for my home network?

Yes, I (obviously :-) recommend http://www.sun.com/storagetek/storage_networking/hba/sas/specs.xml

Very nice - I think I'll be getting 3 of these! Thanks, Kent
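A quick sanity check on those figures (just arithmetic on the numbers above, assuming the usual rule of thumb that each RAID-Z vdev delivers roughly one disk's worth of small random reads):

282 iops / 4 vdevs ~= 565 iops / 8 vdevs ~= 70 iops per RAID-Z vdev
1694 iops / 12 mirrors ~= 141 iops per mirror (a mirror can serve reads from both halves)

So random-read iops scale with vdev count, which is exactly why 8(2+1) doubles 4(4+2).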
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Sorry, but looking again at the RMP page, I see that the chassis I recommended is actually different than the one we have. I can't find our chassis on its own online, but here's what we bought: http://www.siliconmechanics.com/i10561/intel-storage-server.php?cat=625

That is such a cool looking case!

From their picture gallery, you can't see the back, but it has space for 3.5" drives in the back. You can put hot-swap trays back there for your OS drives. The guys at Silicon Mechanics are great, so you could probably call them to ask who makes this chassis. They may also be able to build you a partial system, if you like.

An excellent suggestion, but after configuring the nServ K501 (because I want quad-core AMD) the way I want it, their price is almost exactly the same as my thrifty-shopper price - unlike RackMountPro, which seems to add about 20% overhead - so I'll probably order the whole system from them, sans the host bus adapter, as I'll use the Sun card Richard suggested.

Thanks! Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
[CC-ing xen-discuss regarding the question below]

Probably a 64-bit dual core with 4GB of (ECC) RAM would be a good starting point.

Agreed.

So I was completely out of the ballpark - I hope the ZFS Wiki can be updated to contain some sensible hardware-sizing information... One option I'm still holding on to is to also use the ZFS system as a Xen server - that is, OpenSolaris would be running in Dom0... Given that the Xen hypervisor has a pretty small cpu/memory footprint, do you think it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to Dom0 and bump the memory up by 512MB?

Thanks, Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
I will only comment on the chassis, as this is made by AIC (short for American Industrial Computer), and I have three of these in service at my work. These chassis are quite well made, but I have experienced the following two problems: [snip]

Oh my, thanks for the heads-up! Charlie at RMP said that they were the most popular - so I assumed that they were solid...

For all new systems, I've gone with this chassis instead (I just noticed Rackmount Pro sells 'em also): http://rackmountpro.com/productpage.php?prodid=2043

But I was hoping for resiliency and easy replacement for the OS drive - hot-swap RAID1 seemed like a no-brainer... This case has one internal and one external 3.5" drive bay. I could use a CF reader for resiliency and reduce the need for replacement - assuming I spool logs to the internal drive so as not to burn out the CF. Alternatively, I could put a couple of 2.5" drives into a single 3.5" bay for RAID1 resiliency, but I'd have to shut down to replace a drive... What do you recommend?

One other thing, that you may know already: Rackmount Pro will try to sell you 3ware cards, which work great in the Linux/Windows environment, but aren't supported in OpenSolaris, even in JBOD mode. You will need alternate SATA host adapters for this application.

Indeed, but why pay for a RAID controller when you only need SATA ports? - that's why I was thinking of picking up three of these bad boys (http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm) for about $100 each.

Good luck,

Getting there - can anybody clue me in on how much CPU/memory ZFS needs? I have an old 1.2GHz box with 1GB of memory lying around - would it be sufficient? Thanks! Kent
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Fun exercise! :)

Indeed! - though my wife and kids don't seem to appreciate it so much ;)

I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for the OS + 4*(4+2) RAIDZ2 for the SAN]

What are you *most* interested in for this server? Reliability? Capacity? High performance? Reading or writing? Large contiguous reads or small seeks? One thing that I did that got good feedback from this list was picking apart the requirements of the most demanding workflow I imagined for the machine I was speccing out.

My first posting contained my use cases, but I'd say that video recording/serving will dominate the disk utilization - that's why I'm pushing for 4 striped sets of RAIDZ2 - I think that it would be all-around goodness.

I'm learning more and more about this subject as I test the server (not all that dissimilar to what you've described, except with only 18 disks) I now have. I'm frustrated at the relative unavailability of PCIe SATA controller cards that are ZFS-friendly (i.e., JBOD), and the relative unavailability of motherboards that support both the latest CPUs as well as have a good PCI-X architecture.

Good point - another reply I just sent noted a PCI-X sata controller card, but I'd prefer a PCIe card - do you have a recommendation on a PCIe card? As far as a mobo with good PCI-X architecture - check out the latest from Tyan (http://tyan.com/product_board_detail.aspx?pid=523) - it has three 133/100MHz PCI-X slots.

If you come across some potential solutions, I think a lot of people here will thank you for sharing...

Will keep the list posted! Thanks, Kent
[zfs-discuss] hardware sizing for a zfs-based system?
Hi all, I'm putting together an OpenSolaris ZFS-based system and need help picking hardware. I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for the OS + 4*(4+2) RAIDZ2 for the SAN] http://rackmountpro.com/productpage.php?prodid=2418

Regarding the mobo, cpus, and memory - I searched Google and the ZFS site, and all I came up with so far is that, for a dedicated iSCSI-based SAN, I'll need about 1 GB of memory and a low-end processor - can anyone clarify exactly how much memory/cpu I'd need to be in the safe zone? Also, are there any mobo/chipsets that are particularly well suited for a dedicated iSCSI-based SAN?

This is for my home network, which includes internet/intranet services (mail, web, ldap, samba, netatalk, code repository), build/test environments (for my cross-platform projects), and a video server (mythtv backend). Right now, the aforementioned run on two separate machines, but I'm planning to consolidate them into a single Xen-based server. One idea I have is to host a Xen server on this same machine - that is, an OpenSolaris-based Dom0 serving ZFS-based volumes to the DomU guest machines. But if I go this way, then I'd be looking at a 4-socket Opteron mobo to use with AMD's just-released quad-core CPUs and tons of memory. My biggest concern with this approach is getting PSUs large enough to power it all - if anyone has experience on this front, I'd love to hear about it too.

Thanks! Kent
[zfs-discuss] pool analysis
Richard's blog analyzes MTTDL as a function of N+P+S: http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

But to understand how to best utilize an array with a fixed number of drives, I add the following constraints:
- N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
- all sets in an array should be configured similarly
- the MTTDL for S sets is equal to (MTTDL for one set)/S

I got the following results by varying the NUM_BAYS parameter in the source code below:

4 bays w/ 300 GB drives having MTBF=4 years
- can have 1 (2+1) w/ 1 spares providing 600 GB with MTTDL of 5840.00 years
- can have 1 (2+2) w/ 0 spares providing 600 GB with MTTDL of 799350.00 years
- can have 0 (4+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (4+2) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 4 spares providing 0 GB with MTTDL of Inf years

8 bays w/ 300 GB drives having MTBF=4 years
- can have 2 (2+1) w/ 2 spares providing 1200 GB with MTTDL of 2920.00 years
- can have 2 (2+2) w/ 0 spares providing 1200 GB with MTTDL of 399675.00 years
- can have 1 (4+1) w/ 3 spares providing 1200 GB with MTTDL of 1752.00 years
- can have 1 (4+2) w/ 2 spares providing 1200 GB with MTTDL of 2557920.00 years
- can have 0 (8+1) w/ 8 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 8 spares providing 0 GB with MTTDL of Inf years

12 bays w/ 300 GB drives having MTBF=4 years
- can have 4 (2+1) w/ 0 spares providing 2400 GB with MTTDL of 365.00 years
- can have 3 (2+2) w/ 0 spares providing 1800 GB with MTTDL of 266450.00 years
- can have 2 (4+1) w/ 2 spares providing 2400 GB with MTTDL of 876.00 years
- can have 2 (4+2) w/ 0 spares providing 2400 GB with MTTDL of 79935.00 years
- can have 1 (8+1) w/ 3 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 2 spares providing 2400 GB with MTTDL of 426320.00 years

16 bays w/ 300 GB drives having MTBF=4 years
- can have 5 (2+1) w/ 1 spares providing 3000 GB with MTTDL of 1168.00 years
- can have 4 (2+2) w/ 0 spares providing 2400 GB with MTTDL of 199837.50 years
- can have 3 (4+1) w/ 1 spares providing 3600 GB with MTTDL of 584.00 years
- can have 2 (4+2) w/ 4 spares providing 2400 GB with MTTDL of 1278960.00 years
- can have 1 (8+1) w/ 7 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 6 spares providing 2400 GB with MTTDL of 426320.00 years

20 bays w/ 300 GB drives having MTBF=4 years
- can have 6 (2+1) w/ 2 spares providing 3600 GB with MTTDL of 973.33 years
- can have 5 (2+2) w/ 0 spares providing 3000 GB with MTTDL of 159870.00 years
- can have 4 (4+1) w/ 0 spares providing 4800 GB with MTTDL of 109.50 years
- can have 3 (4+2) w/ 2 spares providing 3600 GB with MTTDL of 852640.00 years
- can have 2 (8+1) w/ 2 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 0 spares providing 4800 GB with MTTDL of 13322.50 years

24 bays w/ 300 GB drives having MTBF=4 years
- can have 8 (2+1) w/ 0 spares providing 4800 GB with MTTDL of 182.50 years
- can have 6 (2+2) w/ 0 spares providing 3600 GB with MTTDL of 133225.00 years
- can have 4 (4+1) w/ 4 spares providing 4800 GB with MTTDL of 438.00 years
- can have 4 (4+2) w/ 0 spares providing 4800 GB with MTTDL of 39967.50 years
- can have 2 (8+1) w/ 6 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 4 spares providing 4800 GB with MTTDL of 213160.00 years

While it's true that RAIDZ2 is much safer than RAIDZ, it seems that any RAIDZ configuration will outlive me, and so I conclude that RAIDZ2 is unnecessary in a practical sense... This conclusion surprises me given the amount of attention people give to double-parity solutions - what am I overlooking?

Thanks, Kent

Source Code (compile with: cc -std=c99 -lm filename) [it's more than 80 columns - sorry!]

    #include <stdio.h>
    #include <math.h>

    #define NUM_BAYS            24
    #define DRIVE_SIZE_GB       300
    #define MTBF_YEARS          4
    #define MTTR_HOURS_NO_SPARE 16
    #define MTTR_HOURS_SPARE    4

    int main() {
        printf("\n");
        printf("%u bays w/ %u GB drives having MTBF=%u years\n",
               NUM_BAYS, DRIVE_SIZE_GB, MTBF_YEARS);
        for (int num_drives = 2; num_drives <= 8; num_drives *= 2) {
            for (int num_parity = 1; num_parity <= 2; num_parity++) {
                double mttdl;
                int mtbf_hours       = MTBF_YEARS * 365 * 24;
                int total_num_drives = num_drives + num_parity;
                int num_instances    = NUM_BAYS / total_num_drives;
                int num_spares       = NUM_BAYS % total_num_drives;
                double mttr = num_spares == 0 ? MTTR_HOURS_NO_SPARE
                                              : MTTR_HOURS_SPARE;
                /* [The archived message was truncated here; the remainder is
                   reconstructed so the program reproduces the output above,
                   using the MTTDL formulas from Richard's blog.] */
                if (num_parity == 1) {
                    /* single parity: MTTDL = MTBF^2 / (G*(G-1)*MTTR) */
                    mttdl = (double)mtbf_hours * mtbf_hours /
                            (total_num_drives * (total_num_drives - 1) * mttr);
                } else {
                    /* double parity: MTTDL = MTBF^3 / (G*(G-1)*(G-2)*MTTR^2) */
                    mttdl = (double)mtbf_hours * mtbf_hours * mtbf_hours /
                            (total_num_drives * (total_num_drives - 1) *
                             (total_num_drives - 2) * mttr * mttr);
                }
                /* divide across sets and convert hours to years
                   (0 instances yields Inf, matching the output above) */
                mttdl = mttdl / num_instances / (365 * 24);
                printf("- can have %d (%d+%d) w/ %d spares providing %d GB "
                       "with MTTDL of %.2f years\n",
                       num_instances, num_drives, num_parity, num_spares,
                       num_instances * num_drives * DRIVE_SIZE_GB, mttdl);
            }
        }
        return 0;
    }
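[Worked check - an editorial addition, not part of the original message. The 24-bay "8 (2+1) w/ 0 spares" row follows from the single-parity formula with G = 3 drives per set, S = 8 sets, MTTR = 16 h (no spares), and MTBF = 4 years = 35040 h:]

    \mathrm{MTTDL} = \frac{\mathrm{MTBF}^2}{G\,(G-1)\cdot\mathrm{MTTR}\cdot S}
                   = \frac{35040^2}{3 \cdot 2 \cdot 16 \cdot 8}\ \mathrm{h}
                   = 1{,}598{,}700\ \mathrm{h} \approx 182.50\ \mathrm{years}

[which matches the table.]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss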
Re: [zfs-discuss] pool analysis
>> But to understand how to best utilize an array with a fixed number of drives, I add the following constraints:
>> - N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
>> - all sets in an array should be configured similarly
>> - the MTTDL for S sets is equal to (MTTDL for one set)/S
> Yes, these are reasonable and will reduce the problem space, somewhat.

Actually, I wish I could get more insight into why N can only be 2, 4, or 8. In contemplating a 16-bay array, I often think that 3 (3+2) + 1 spare would be perfect, but I have no understanding of what N=3 implies...

>> While it's true that RAIDZ2 is much safer than RAIDZ, it seems that any RAIDZ configuration will outlive me and so I conclude that RAIDZ2 is unnecessary in a practical sense... This conclusion surprises me given the amount of attention people give to double-parity solutions - what am I overlooking?
> You are overlooking statistics :-). As I discuss in http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent the MTBF (F == death) of children aged 5-14 in the US is 4,807 years, but clearly no child will live anywhere close to 4,807 years.

Thanks - I hadn't seen that blog entry yet...

>> #define MTTR_HOURS_NO_SPARE 16
> I think this is optimistic :-)

Not really for me, as the array is in my basement - so I assume that I'll swap in a drive when I get home from work ;)

> There are many more facets to looking at these sorts of analyses, which is why I wrote RAIDoptimizer.

Is RAIDoptimizer the name of a spreadsheet you developed - is it publicly available?

Thanks, Kent
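[Editorial note, not from the thread: the statistics point can be made concrete. If time-to-data-loss is exponentially distributed, an array with a given MTTDL loses data within t years with probability 1 - exp(-t/MTTDL). A minimal sketch in C, using MTTDL values taken from the 24-bay table above:]

    #include <stdio.h>
    #include <math.h>

    /* Probability of at least one data-loss event within `years`,
     * assuming exponentially distributed (memoryless) time to loss. */
    static double p_loss(double mttdl_years, double years) {
        return 1.0 - exp(-years / mttdl_years);
    }

    int main(void) {
        printf("8x(2+1), MTTDL 182.5 yr:  P(loss in 10 yr) = %.1f%%\n",
               100.0 * p_loss(182.5, 10.0));
        printf("2x(8+2), MTTDL 213160 yr: P(loss in 10 yr) = %.4f%%\n",
               100.0 * p_loss(213160.0, 10.0));
        return 0;
    }

[This prints roughly 5.3% for the single-parity layout and 0.0047% for the double-parity one: an MTTDL that "outlives me" can still carry a material chance of loss over a ten-year service life.]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss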
[zfs-discuss] 8+2 or 8+1+spare?
Hi all, I'm new here and to ZFS, but I've been lurking for quite some time... My question is simple: which is better, 8+2 or 8+1+spare? Both follow the (N+P) N={2,4,8} P={1,2} rule, but 8+2 results in a total of 10 disks, which is one disk more than the 3 <= num-disks <= 9 rule allows. But 8+2 has a much better MTTDL than 8+1+spare, and so I'm trying to understand how bad it would really be - what doesn't work/scale? Thanks, Kent
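[Editorial worked comparison, not part of the original message, using the same simple model as the "pool analysis" thread above (MTBF = 4 years = 35040 h; MTTR = 4 h with a hot spare, 16 h without):]

    \text{8+1+spare:}\quad \mathrm{MTTDL} = \frac{35040^2}{9 \cdot 8 \cdot 4}
        = 4{,}263{,}200\ \mathrm{h} \approx 487\ \mathrm{years}

    \text{8+2, no spare:}\quad \mathrm{MTTDL} = \frac{35040^3}{10 \cdot 9 \cdot 8 \cdot 16^2}
        = 233{,}410{,}200\ \mathrm{h} \approx 26{,}645\ \mathrm{years}

[In this model the second parity drive wins by roughly 55x: it protects throughout the resilver window, whereas a spare merely shortens that window.]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss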
Re: [zfs-discuss] 8+2 or 8+1+spare?
I think that the 3 <= num-disks <= 9 rule only applies to RAIDZ and that it was changed to 4 <= num-disks <= 10 for RAIDZ2, but I might be remembering wrong. Can anybody confirm that 3 <= num-disks <= 9 applies only to RAIDZ and that 4 <= num-disks <= 10 applies to RAIDZ2? Thanks, Kent ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 8+2 or 8+1+spare?
> Don't confuse vdevs with pools. If you add two 4+1 vdevs to a single pool it still appears to be one place to put things. ;)

Newbie oversight - thanks! Kent
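[Editorial sketch of the point, with hypothetical device names: two 4+1 raidz vdevs created in one pool still present a single place to put things.]

    # one pool, two top-level raidz vdevs; ZFS stripes data across both
    zpool create tank \
        raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
        raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

[Afterwards `zpool list` reports a single pool, tank, and every dataset in it draws on both sets.]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss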
Re: [zfs-discuss] 8+2 or 8+1+spare?
> Another reason to recommend spares is when you have multiple top-level vdevs and want to amortize the spare cost over multiple sets. For example, if you have 19 disks then 2x 8+1 raidz + spare amortizes the cost of the spare across two raidz sets. -- richard

Interesting - I hadn't realized that a spare could be used across sets. Thanks! Kent
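[Editorial sketch of Richard's 19-disk layout, with hypothetical device names: the single hot spare serves both raidz sets.]

    # two 8+1 raidz vdevs plus one spare shared by both
    zpool create tank \
        raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 \
        raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 \
        spare c2t0d0

[The spare sits idle until a drive in either set fails, and can then resilver into whichever vdev needs it.]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss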
Re: [zfs-discuss] 8+2 or 8+1+spare?
Rob Logan wrote:
>> which is better, 8+2 or 8+1+spare?
> 8+2 is safer for the same speed. 8+2 requires a little more math, so it's slower in theory (unlikely to be seen). (4+1)*2 is 2x faster, and in theory is less likely to have wasted space in a transaction group (unlikely to be seen).

I keep reading that (4+1)*2 is 2x faster, but if all the data I care about is in one of the two sets, does it follow that my access to just that data is also 2x faster? - or is it more that simultaneous read/write of the entire array is (globally) 2x faster?

Thanks, Kent ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 8+2 or 8+1+spare?
John-Paul Drawneek wrote:
> Your data gets striped across the two sets, so what you get is a raidz stripe giving you the 2x faster.
>
>   tank
>     ---raidz
>        --devices
>     ---raidz
>        --devices
>
> Sorry for the diagram. So you've got your zpool tank with a raidz stripe.

Thanks - I think you all have hammered this point home for me now - all this confusion stems from my not realizing that sets are merged into a single striped pool... ugh! ;)

Kent ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss