Re: [zfs-discuss] WD caviar/mpt issues
On Wed, Jun 23, 2010 at 2:43 PM, Jeff Bacon wrote:
>>>> Swapping the 9211-4i for a MegaRAID ELP (mega_sas) improves
>>>> performance by 30-40% instantly and there are no hangs anymore so I'm
>>>> guessing it's something related to the mpt_sas driver.
>
> Wait. The mpt_sas driver by default uses scsi_vhci, and scsi_vhci by
> default does load-balance round-robin. Have you tried setting
> load-balance="none" in scsi_vhci.conf?

That didn't help.

--
Giovanni Tirloni
gtirl...@sysdroid.com
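For anyone following along: the setting under discussion lives in
/kernel/drv/scsi_vhci.conf. A minimal sketch of the change that was tried
(illustrative, not Giovanni's exact file; the boot archive must be rebuilt
and the box rebooted before it takes effect):

  # /kernel/drv/scsi_vhci.conf (excerpt)
  # The default is round-robin across paths; "none" pins I/O to a
  # single active path while leaving automatic failover in place.
  load-balance="none";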
[zfs-discuss] Open Solaris installation help for backup application
This forum has been tremendously helpful, but I decided to get some help
from a Solaris guru to install Solaris for a backup application. I do not
want to disturb the flow of this forum, but where can I post to get some
paid help on this forum? We are located in the San Francisco Bay Area. Any
help would be appreciated.
Re: [zfs-discuss] raid-z - not even iops distribution
On Jun 23, 2010, at 1:48 PM, Robert Milkowski wrote:
>
> 128GB.
>
> Does it mean that for datasets used for databases and similar
> environments, where basically all blocks have a fixed size and there is
> no other data, all parity information will end up on one (z1) or two
> (z2) specific disks?

What's the record size on those datasets? 8k?

-Ross
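A quick way to answer that question (the dataset name here is the "test"
pool from earlier in the thread):

  # Show the recordsize in effect and whether it was set locally or inherited
  zfs get -o name,property,value,source recordsize test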
Re: [zfs-discuss] raid-z - not even iops distribution
> Does it mean that for datasets used for databases and similar
> environments, where basically all blocks have a fixed size and there is
> no other data, all parity information will end up on one (z1) or two
> (z2) specific disks?

No. There are always smaller writes to metadata that will distribute
parity.

What is the total width of your raidz1 stripe?

Adam

--
Adam Leventhal, Fishworks              http://blogs.sun.com/ahl
Re: [zfs-discuss] raid-z - not even iops distribution
128GB.

Does it mean that for datasets used for databases and similar environments,
where basically all blocks have a fixed size and there is no other data,
all parity information will end up on one (z1) or two (z2) specific disks?

On 23/06/2010 17:51, Adam Leventhal wrote:
> Hey Robert,
>
> How big of a file are you making? RAID-Z does not explicitly do the
> parity distribution that RAID-5 does. Instead, it relies on non-uniform
> stripe widths to distribute IOPS.
>
> Adam
>
> On Jun 18, 2010, at 7:26 AM, Robert Milkowski wrote:
>> Hi,
>>
>> zpool create test raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 \
>>                   raidz c0t1d0 c1t1d0 c2t1d0 c3t1d0 \
>>                   raidz c0t2d0 c1t2d0 c2t2d0 c3t2d0 \
>>                   raidz c0t3d0 c1t3d0 c2t3d0 c3t3d0 \
>>                   [...]
>>                   raidz c0t10d0 c1t10d0 c2t10d0 c3t10d0
>>
>> zfs set atime=off test
>> zfs set recordsize=16k test
>> (I know...)
>>
>> Now if I create one large file with filebench and simulate a
>> random-read workload with 1 or more threads, then disks on the c2 and
>> c3 controllers are getting about 80% more reads. This happens on both
>> 111b and snv_134. I would rather expect all of them to get about the
>> same number of iops.
>>
>> Any idea why?
>>
>> --
>> Robert Milkowski
>> http://milek.blogspot.com
>
> --
> Adam Leventhal, Fishworks              http://blogs.sun.com/ahl
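A back-of-the-envelope sketch of why a fixed 16k recordsize can pin parity
to particular columns. It assumes 512-byte sectors and the raidz
allocator's padding rule (each allocation rounded up to a multiple of
nparity+1 sectors, as in vdev_raidz.c of that era); treat it as an
illustration, not a statement about Robert's exact on-disk layout:

  # One 16k record on a 4-disk raidz1 with 512B sectors
  D=32                            # data sectors: 16384 / 512
  P=$(( (D + 2) / 3 ))            # parity: ceil(D / (4 - 1)) = 11 sectors
  A=$(( ((D + P + 1) / 2) * 2 ))  # round 43 up to a multiple of 2 -> 44
  echo "allocation: $A sectors; starting-column shift: $(( A % 4 ))"

Since 44 mod 4 == 0, every identically sized allocation starts on the same
child disk (assuming roughly contiguous allocation), so the parity column
never rotates - exactly the alignment that the non-uniform stripe widths
Adam mentions would normally break up.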
Re: [zfs-discuss] WD caviar/mpt issues
> I'm using iozone to get some performance numbers and I/O hangs when
> it's doing the writing phase.
>
> This pool has:
>
> 18 x 2TB SAS disks as 9 data mirrors
> 2 x 32GB X-25E as log mirror
> 1 x 160GB X-160M as cache
>
> iostat shows "2" I/O operations active and SSDs at 100% busy when
> it's stuck.

Interesting. I have a SM 847E2 chassis with 33 Constellation 2TB SAS
drives and 3 Vertex LE 100G SSDs, dual-connected across a pair of
9211-8is, sol10u8 with the May patchset, and it runs like a champ - I left
several bonnie++ processes running on it for three days straight thrashing
the pool, not even a blip. (The rear and front backplanes are separately
cabled to the controllers.)

(That's with load-balance="none", in deference to Josh Simon's
observations - I'm not really willing to lock the paths because I want the
auto-failover. I'm going to be dropping in another pair of 9211-4is and
connecting the back 12 drives to them since I have the PCIe slots, though
it's probably not especially necessary.)

I wonder if the expander chassis work better if you're running with the
dual-expander-chip backplane? So far all of my testing with the 2TB SAS
drives has been with single-expander-chip backplanes. Hm, might have to
give that a try; it never came up simply because both of my
dual-expander-chip-backplane JBODs were filled and in use, which just
recently changed.

> My plan is to use the newest SC846E26 chassis with 2 cables but right
> now what I have available for testing is the SC846E1.

Agreed. I just got my first 847E2 chassis in today - I've been waiting for
months for them to be available, and I'm not entirely sure there's any
real stock (sorta like SM's quad-socket Magny-Cours boards - a month ago
they didn't even have any boards in the USA available for RMA; they got
one batch in and sold it in a week or so).

>> Swapping the 9211-4i for a MegaRAID ELP (mega_sas) improves
>> performance by 30-40% instantly and there are no hangs anymore so I'm
>> guessing it's something related to the mpt_sas driver.

Wait. The mpt_sas driver by default uses scsi_vhci, and scsi_vhci by
default does load-balance round-robin. Have you tried setting
load-balance="none" in scsi_vhci.conf?

-bacon
Re: [zfs-discuss] WD caviar/mpt issues
Gack, that's the same message we're seeing with the mpt controller with
SATA drives. I've never seen it with a SAS drive before.

Has anyone noticed a trend of 2TB SATA drives en masse not working well
with the LSI SASx28/x36 expander chips? I can seemingly reproduce it on
demand - hook >4 2TB disks to one of my Supermicro chassis, spin up the
array, and beat on it. (The last part is optional; merely hooking up the
WD Caviar Blacks and attempting an import is sometimes sufficient.)

Sun guys, I've got piles of hardware; if you want a testbed, you got it.

>> What's your read/write mix, and what are you using for CPU/mem? How
>> many drives?
>
> I'm using iozone to get some performance numbers and I/O hangs when
> it's doing the writing phase.
>
> This pool has:
>
> 18 x 2TB SAS disks as 9 data mirrors
> 2 x 32GB X-25E as log mirror
> 1 x 160GB X-160M as cache
>
> iostat shows "2" I/O operations active and SSDs at 100% busy when
> it's stuck.
>
> There are timeout messages when this happens:
>
> Jun 23 00:05:51 osol-x8-hba scsi: [ID 107833 kern.warning] WARNING:
> /p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
> Jun 23 00:05:51 osol-x8-hba   Disconnected command timeout for Target 11
> Jun 23 00:05:51 osol-x8-hba scsi: [ID 365881 kern.info]
> /p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
> Jun 23 00:05:51 osol-x8-hba   Log info 0x3114 received for target 11.
Re: [zfs-discuss] c5->c9 device name change prevents beadm activate
Cindy Swearingen wrote:
> On 06/23/10 10:40, Evan Layton wrote:
>> On 6/23/10 4:29 AM, Brian Nitz wrote:
>>> I saw a problem while upgrading from build 140 to 141 where beadm
>>> activate {build141BE} failed because installgrub failed:
>>>
>>> # BE_PRINT_ERR=true beadm activate opensolarismigi-4
>>> be_do_installgrub: installgrub failed for device c5t0d0s0.
>>> Unable to activate opensolarismigi-4.
>>> Unknown external error.
>>>
>>> The reason installgrub failed is that it is attempting to install
>>> grub on c5t0d0s0, which is where my root pool is:
>>>
>>> # zpool status
>>>   pool: rpool
>>>  state: ONLINE
>>> status: The pool is formatted using an older on-disk format. The pool
>>>         can still be used, but some features are unavailable.
>>> action: Upgrade the pool using 'zpool upgrade'. Once this is done,
>>>         the pool will no longer be accessible on older software
>>>         versions.
>>>   scan: scrub repaired 0 in 5h3m with 0 errors on Tue Jun 22 22:31:08 2010
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         rpool       ONLINE       0     0     0
>>>           c5t0d0s0  ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> But the raw device doesn't exist:
>>>
>>> # ls -ls /dev/rdsk/c5*
>>> /dev/rdsk/c5*: No such file or directory
>>>
>>> Even though the zfs pool still sees it as c5, the actual device seen
>>> by format is c9t0d0s0. Is there any workaround for this problem? Is
>>> it a bug in install, zfs or somewhere else in ON?
>>
>> In this instance beadm is a victim of the zpool configuration
>> reporting the wrong device. This does appear to be a ZFS issue since
>> the device actually being used is not what zpool status is reporting.
>> I'm forwarding this on to the ZFS alias to see if anyone has any
>> thoughts there.
>>
>> -evan
>
> Hi Evan,
>
> I suspect that some kind of system, hardware, or firmware event changed
> this device name. We could identify the original root pool device with
> the zpool history output from this pool.
>
> Brian, you could boot this system from the OpenSolaris LiveCD and
> attempt to import this pool to see if that will update the device info
> correctly. If that doesn't help, then create /dev/rdsk/c5* symlinks to
> point to the correct device.

I've seen this kind of device name change in a couple of contexts now,
related to installs, image-updates, etc. I think we need to understand why
this is happening.

Prior to OpenSolaris and the new installer, we used to go to a fair amount
of trouble to make sure that device names, once assigned, never changed.
Various parts of the system depended on device names remaining the same
across upgrades and other system events.

Does anyone know why these device names are changing? Because that seems
like the root of the problem. Creating symlinks with the old names seems
like a band-aid, which could cause problems down the road - what if some
other device on the system gets assigned that name on a future update?

Lori
Re: [zfs-discuss] c5->c9 device name change prevents beadm activate
On 06/23/10 10:40, Evan Layton wrote:
> On 6/23/10 4:29 AM, Brian Nitz wrote:
>> I saw a problem while upgrading from build 140 to 141 where beadm
>> activate {build141BE} failed because installgrub failed:
>>
>> # BE_PRINT_ERR=true beadm activate opensolarismigi-4
>> be_do_installgrub: installgrub failed for device c5t0d0s0.
>> Unable to activate opensolarismigi-4.
>> Unknown external error.
>>
>> The reason installgrub failed is that it is attempting to install grub
>> on c5t0d0s0, which is where my root pool is:
>>
>> # zpool status
>>   pool: rpool
>>  state: ONLINE
>> status: The pool is formatted using an older on-disk format. The pool
>>         can still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>         pool will no longer be accessible on older software versions.
>>   scan: scrub repaired 0 in 5h3m with 0 errors on Tue Jun 22 22:31:08 2010
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         rpool       ONLINE       0     0     0
>>           c5t0d0s0  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> But the raw device doesn't exist:
>>
>> # ls -ls /dev/rdsk/c5*
>> /dev/rdsk/c5*: No such file or directory
>>
>> Even though the zfs pool still sees it as c5, the actual device seen
>> by format is c9t0d0s0. Is there any workaround for this problem? Is it
>> a bug in install, zfs or somewhere else in ON?
>
> In this instance beadm is a victim of the zpool configuration reporting
> the wrong device. This does appear to be a ZFS issue since the device
> actually being used is not what zpool status is reporting. I'm
> forwarding this on to the ZFS alias to see if anyone has any thoughts
> there.
>
> -evan

Hi Evan,

I suspect that some kind of system, hardware, or firmware event changed
this device name. We could identify the original root pool device with the
zpool history output from this pool.

Brian, you could boot this system from the OpenSolaris LiveCD and attempt
to import this pool to see if that will update the device info correctly.
If that doesn't help, then create /dev/rdsk/c5* symlinks to point to the
correct device.

Thanks,

Cindy
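A sketch of the LiveCD approach Cindy suggests (the pool name comes from
this thread; the -f flag and the altroot are assumptions - adjust as
needed):

  # From the OpenSolaris LiveCD: re-importing forces ZFS to rescan the
  # device tree and rewrite the stale c5* path in the pool's config.
  zpool import                  # verify rpool shows up under its new name
  zpool import -f -R /a rpool   # import under an alternate root
  zpool export rpool            # export cleanly, then reboot into the BE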
Re: [zfs-discuss] raid-z - not even iops distribution
Hey Robert,

How big of a file are you making? RAID-Z does not explicitly do the parity
distribution that RAID-5 does. Instead, it relies on non-uniform stripe
widths to distribute IOPS.

Adam

On Jun 18, 2010, at 7:26 AM, Robert Milkowski wrote:
> Hi,
>
> zpool create test raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 \
>                   raidz c0t1d0 c1t1d0 c2t1d0 c3t1d0 \
>                   raidz c0t2d0 c1t2d0 c2t2d0 c3t2d0 \
>                   raidz c0t3d0 c1t3d0 c2t3d0 c3t3d0 \
>                   [...]
>                   raidz c0t10d0 c1t10d0 c2t10d0 c3t10d0
>
> zfs set atime=off test
> zfs set recordsize=16k test
> (I know...)
>
> Now if I create one large file with filebench and simulate a
> random-read workload with 1 or more threads, then disks on the c2 and
> c3 controllers are getting about 80% more reads. This happens on both
> 111b and snv_134. I would rather expect all of them to get about the
> same number of iops.
>
> Any idea why?
>
> --
> Robert Milkowski
> http://milek.blogspot.com

--
Adam Leventhal, Fishworks              http://blogs.sun.com/ahl
Re: [zfs-discuss] c5->c9 device name change prevents beadm activate
On 6/23/10 4:29 AM, Brian Nitz wrote:
> I saw a problem while upgrading from build 140 to 141 where beadm
> activate {build141BE} failed because installgrub failed:
>
> # BE_PRINT_ERR=true beadm activate opensolarismigi-4
> be_do_installgrub: installgrub failed for device c5t0d0s0.
> Unable to activate opensolarismigi-4.
> Unknown external error.
>
> The reason installgrub failed is that it is attempting to install grub
> on c5t0d0s0, which is where my root pool is:
>
> # zpool status
>   pool: rpool
>  state: ONLINE
> status: The pool is formatted using an older on-disk format. The pool
>         can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on older software versions.
>   scan: scrub repaired 0 in 5h3m with 0 errors on Tue Jun 22 22:31:08 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     0
>           c5t0d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> But the raw device doesn't exist:
>
> # ls -ls /dev/rdsk/c5*
> /dev/rdsk/c5*: No such file or directory
>
> Even though the zfs pool still sees it as c5, the actual device seen by
> format is c9t0d0s0. Is there any workaround for this problem? Is it a
> bug in install, zfs or somewhere else in ON?

In this instance beadm is a victim of the zpool configuration reporting
the wrong device. This does appear to be a ZFS issue since the device
actually being used is not what zpool status is reporting. I'm forwarding
this on to the ZFS alias to see if anyone has any thoughts there.

-evan
Re: [zfs-discuss] raid-z - not even iops distribution
Reaching into the dusty regions of my brain, I seem to recall that since
RAID-Z does not work like a traditional RAID-5 - particularly because of
its variably sized stripes - the data may not hit all of the disks, but it
will always be redundant. I apologize for not having a reference for this
assertion, so I may be completely wrong.

I assume your hardware is recent, the controllers are on PCIe x4 buses,
etc.

-Scott
Re: [zfs-discuss] COMSTAR iSCSI and two Windows computers
Look again at how XenServer does storage. I think you will find it already
has a solution, both for iSCSI and NFS.
Re: [zfs-discuss] WD caviar/mpt issues
On Wed, Jun 23, 2010 at 10:14 AM, Jeff Bacon wrote:
>>> Have I missed any changes/updates in the situation?
>>
>> I've been getting very bad performance out of a LSI 9211-4i card
>> (mpt_sas) with Seagate Constellation 2TB SAS disks, SM SC846E1 and
>> Intel X-25E/M SSDs. Long story short, I/O will hang for over 1 minute
>> at random under heavy load.
>
> Hm. That I haven't seen. Is this hang as in some drive hangs up with
> iostat busy% at 100 and nothing else happening (can't talk to a disk),
> or a hang as perceived by applications under load?
>
> What's your read/write mix, and what are you using for CPU/mem? How
> many drives?

I'm using iozone to get some performance numbers and I/O hangs when it's
doing the writing phase.

This pool has:

18 x 2TB SAS disks as 9 data mirrors
2 x 32GB X-25E as log mirror
1 x 160GB X-160M as cache

iostat shows "2" I/O operations active and SSDs at 100% busy when it's
stuck.

There are timeout messages when this happens:

Jun 23 00:05:51 osol-x8-hba scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba   Disconnected command timeout for Target 11
Jun 23 00:05:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba   Log info 0x3114 received for target 11.
Jun 23 00:05:51 osol-x8-hba   scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Jun 23 00:05:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba   Log info 0x3114 received for target 11.
Jun 23 00:05:51 osol-x8-hba   scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Jun 23 00:11:51 osol-x8-hba scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba   Disconnected command timeout for Target 11
Jun 23 00:11:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba   Log info 0x3114 received for target 11.
Jun 23 00:11:51 osol-x8-hba   scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Jun 23 00:11:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba   Log info 0x3114 received for target 11.
Jun 23 00:11:51 osol-x8-hba   scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc

> I wonder if maybe your SSDs are flooding the channel. I have a (many)
> 847E2 chassis, and I'm considering putting in a second pair of
> controllers and splitting the drives front/back so it's 24/12 vs all
> 36 on one pair.

My plan is to use the newest SC846E26 chassis with 2 cables, but right now
what I have available for testing is the SC846E1.

I like the fact that SM uses the LSI chipsets in their backplanes. It's
been a good experience so far.

>> Swapping the 9211-4i for a MegaRAID ELP (mega_sas) improves
>> performance by 30-40% instantly and there are no hangs anymore so I'm
>> guessing it's something related to the mpt_sas driver.
>
> Well, I sorta hate to swap out all of my controllers (bother, not to
> mention the cost) but it'd be nice to have raidutil/lsiutil back.

As much as I would like to blame faulty hardware for this issue, I only
pointed out that using the MegaRAID doesn't show the problem because
that's what I've been using without any issues in this particular setup.

This system will be available to me for quite some time, so if anyone
wants all kinds of tests to understand what's happening, I would be happy
to provide those.
--
Giovanni Tirloni
gtirl...@sysdroid.com
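For anyone trying to reproduce this: Giovanni's exact iozone flags aren't
in the thread, but a throughput run of the general shape that exercises
the write phase would look something like this (the file paths, sizes,
record size and thread count are illustrative):

  # 4 writer threads, 2 GB per file, 128 KB records; -i 0 runs the
  # write/rewrite phase where the hangs were observed, -i 1 adds reads.
  iozone -t 4 -s 2g -r 128k -i 0 -i 1 \
      -F /pool/f1 /pool/f2 /pool/f3 /pool/f4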
Re: [zfs-discuss] WD caviar/mpt issues
>> Have I missed any changes/updates in the situation?
>
> I've been getting very bad performance out of a LSI 9211-4i card
> (mpt_sas) with Seagate Constellation 2TB SAS disks, SM SC846E1 and
> Intel X-25E/M SSDs. Long story short, I/O will hang for over 1 minute
> at random under heavy load.

Hm. That I haven't seen. Is this hang as in some drive hangs up with
iostat busy% at 100 and nothing else happening (can't talk to a disk), or
a hang as perceived by applications under load?

What's your read/write mix, and what are you using for CPU/mem? How many
drives?

I wonder if maybe your SSDs are flooding the channel. I have a (many)
847E2 chassis, and I'm considering putting in a second pair of controllers
and splitting the drives front/back so it's 24/12 vs all 36 on one pair.

> Swapping the 9211-4i for a MegaRAID ELP (mega_sas) improves
> performance by 30-40% instantly and there are no hangs anymore so I'm
> guessing it's something related to the mpt_sas driver.

Well, I sorta hate to swap out all of my controllers (bother, not to
mention the cost) but it'd be nice to have raidutil/lsiutil back.
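As a footnote on the "iostat busy% at 100" case Jeff describes, a handy
way to watch for it (standard Solaris iostat, nothing exotic):

  # Extended per-device stats once a second, skipping idle devices; a
  # wedged drive shows actv stuck (e.g. at 2) and %b at 100 with no
  # throughput, matching the symptom described above.
  iostat -xnz 1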