Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
What are you running there? snv or OpenSolaris? Could you try booting directly from an OpenSolaris 2009.06 live disc? Once I was running that build, every single hot plug I tried worked flawlessly. I tried for several hours to replicate the problems that caused me to log that bug report, but the issue appeared to be completely resolved.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Well, I upgraded to b124, disabling ACPI because of [1], and I get exactly the same behaviour. I've removed the device from the zpool and tried dd-ing from the device while removing it; it still hangs all IO on the system until the disk is re-inserted. I'm running the kernel with -v (from diagnosing the ACPI issue), and nothing enlightening is printed in dmesg.

1: http://defect.opensolaris.org/bz/show_bug.cgi?id=11739
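One way to put numbers on a hang like this, instead of relying on dmesg, is to timestamp the reads yourself. The Python sketch below is an illustration under assumptions, not a tool anyone in this thread used: it reads a device or file sequentially in a worker thread and reports when no read completes within a stall timeout. The function name is made up, and if the whole system stalls (as described here), the watchdog process may not get scheduled either, so a missing report proves nothing.

```python
import threading
import time


def read_with_watchdog(path, block_size=1 << 20, stall_timeout=10.0, max_blocks=None):
    """Read `path` sequentially and detect a stalled read.

    Returns (bytes_read, stalled, error). `stalled` is True when no read
    completed within `stall_timeout` seconds; the blocked reader thread is
    left behind as a daemon, because a read stuck in the kernel cannot be
    cancelled from Python.
    """
    state = {"bytes": 0, "last": time.monotonic(), "done": False, "error": None}
    lock = threading.Lock()

    def reader():
        try:
            # Unbuffered binary reads so each read() maps to one read(2) call.
            with open(path, "rb", buffering=0) as f:
                blocks = 0
                while max_blocks is None or blocks < max_blocks:
                    chunk = f.read(block_size)
                    if not chunk:  # end of file/device
                        break
                    blocks += 1
                    with lock:
                        state["bytes"] += len(chunk)
                        state["last"] = time.monotonic()
        except OSError as exc:  # e.g. EIO if the driver reports the disk gone
            with lock:
                state["error"] = exc
        finally:
            with lock:
                state["done"] = True

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    while True:
        worker.join(timeout=0.5)
        with lock:
            if state["done"]:
                return state["bytes"], False, state["error"]
            if time.monotonic() - state["last"] > stall_timeout:
                return state["bytes"], True, None
```

For example, call `read_with_watchdog("/dev/rdsk/c1t0d0p0", stall_timeout=30)` while pulling the disk; that raw device name is only a placeholder, so substitute the one `format` or `cfgadm` shows on your system.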
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Tue, Oct 13, 2009 at 9:42 AM, Aaron Brady wrote:
> I did, but as tcook suggests running a later build, I'll try an image-update (though, 111 > 2008.11, right?)

It should be, yes. b111 was released in April of 2009.

--Tim
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
I did, but as tcook suggests running a later build, I'll try an image-update (though, 111 > 2008.11, right?)
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Hi Tim, that doesn't help in this case; it's a complete lockup, apparently caused by driver issues. However, the good news for Insom is that the bug is closed because the problem now appears to be fixed. I tested it and found that it no longer occurs in OpenSolaris 2008.11 or 2009.06. If you move to a newer build of OpenSolaris you should be fine.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Tue, Oct 13, 2009 at 8:54 AM, Aaron Brady wrote:
> All's gone quiet on this issue, and the bug is closed, but I'm having exactly the same problem; pulling a disk on this card, under OpenSolaris 111, is pausing all IO (including, weirdly, network IO), and using the ZFS utilities (zfs list, zpool list, zpool status) causes a hang until I replace the disk.

Did you set your failmode to continue?

--Tim
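For reference, failmode is a pool property taking wait (the default), continue or panic; it is read with `zpool get failmode <pool>` and changed with `zpool set failmode=continue <pool>`. It governs what ZFS does when the pool loses access to its devices, so it is unlikely to help when I/O simply never returns from the controller driver, which fits the reply that it doesn't help in this case. A small Python helper for checking the setting, assuming the default four-column NAME PROPERTY VALUE SOURCE output of `zpool get`; the helper names are made up for illustration:

```python
import subprocess

FAILMODE_VALUES = {"wait", "continue", "panic"}


def parse_failmode(output):
    """Return {pool: failmode} parsed from `zpool get failmode ...` output.

    Assumes the default layout: a header line (NAME PROPERTY VALUE SOURCE)
    followed by one whitespace-separated row per pool.
    """
    modes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip blank lines and the header
        pool, prop, value = fields[0], fields[1], fields[2]
        if prop != "failmode":
            continue
        if value not in FAILMODE_VALUES:
            raise ValueError(f"unexpected failmode {value!r} for pool {pool!r}")
        modes[pool] = value
    return modes


def get_failmode(pool):
    """Run `zpool get failmode <pool>` and return its value (needs zpool on PATH)."""
    result = subprocess.run(["zpool", "get", "failmode", pool],
                            capture_output=True, text=True, check=True)
    return parse_failmode(result.stdout)[pool]
```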
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
All's gone quiet on this issue, and the bug is closed, but I'm having exactly the same problem; pulling a disk on this card, under OpenSolaris 111, is pausing all IO (including, weirdly, network IO), and using the ZFS utilities (zfs list, zpool list, zpool status) causes a hang until I replace the disk.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, Feb 12, 2009 at 5:16 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
> On Thu, 12 Feb 2009, Ross Smith wrote:
>> As far as I'm concerned, the 7000 series is a new hardware platform,
>
> You are joking, right? Have you ever looked at the photos of these "new" systems or compared them to other Sun systems? They are just re-purposed existing systems with a bit of extra secret sauce added.
>
> Bob

Ya, that *secret sauce* is what makes it a new system. And out of the last four X4240s I've ordered, two had to have new motherboards installed within a week, and one had to have a new power supply. The other appears to have a DVD-ROM drive going flaky. So the fact they're based on existing hardware isn't exactly confidence-inspiring either. Sun's old SPARC gear: rock solid. The newer x64 has been leaving a bad taste in my mouth TBQH. The engineering behind the systems when I open them up is absolutely phenomenal. The failure rate, however, is downright scary.

--Tim
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, 12 Feb 2009, Ross Smith wrote:
> As far as I'm concerned, the 7000 series is a new hardware platform,

You are joking, right? Have you ever looked at the photos of these "new" systems or compared them to other Sun systems? They are just re-purposed existing systems with a bit of extra secret sauce added.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Heh, yeah, I've thought the same kind of thing in the past. The problem is that the argument doesn't really work for system admins.

As far as I'm concerned, the 7000 series is a new hardware platform, with relatively untested drivers, running a software solution that I know is prone to locking up when hardware faults are handled badly by drivers. Fair enough, that actual solution is out of our price range, but I would still be very dubious about purchasing it. At the very least I'd be waiting a year for other people to work the kinks out of the drivers.

Which is a shame, because ZFS has so many other great features it's easily our first choice for a storage platform. The one and only concern we have is its reliability. We have snv_106 running as a test platform now. If I felt I could trust ZFS 100% I'd roll it out tomorrow.

On Thu, Feb 12, 2009 at 4:25 PM, Tim wrote:
> On Thu, Feb 12, 2009 at 9:25 AM, Ross wrote:
>> [...]
>> And yes, I'm repeating myself, but I can't understand why this is not being acted on. [...]
>
> I'd imagine for the exact same reason short-stroking/right-sizing isn't a concern.
>
> "We don't have this problem in the 7000 series, perhaps you should buy one of those".
>
> ;)
>
> --Tim
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, Feb 12, 2009 at 9:25 AM, Ross wrote:
> This sounds like exactly the kind of problem I've been shouting about for 6 months or more. [...]
>
> And yes, I'm repeating myself, but I can't understand why this is not being acted on. [...]

I'd imagine for the exact same reason short-stroking/right-sizing isn't a concern.

"We don't have this problem in the 7000 series, perhaps you should buy one of those".

;)

--Tim
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
This sounds like exactly the kind of problem I've been shouting about for 6 months or more. I posted a huge thread on availability on these forums because I had concerns over exactly this kind of hanging.

ZFS doesn't trust hardware or drivers when it comes to your data: everything is checksummed. However, when it comes to seeing whether devices are responding, and checking for faults, it blindly trusts whatever the hardware or driver tells it. Unfortunately, that means ZFS is vulnerable to any unexpected bug or error in the storage chain. I've encountered at least two hang conditions myself (and I'm not exactly a heavy user), and I've seen several others on the forums, including a few on X4500s.

Now, I do accept that errors like this will be few and far between, but they still mean you run the risk that a badly handled error condition can hang your entire server, instead of just one drive. Solaris can handle things like CPUs or memory going faulty, for crying out loud. Its RAID storage system had better be able to handle a disk failing.

Sun seem to be taking the approach that these errors should be dealt with in the driver layer. And while that's technically correct, a reliable storage system had damn well better be able to keep the server limping along while we wait for patches to the storage drivers.

ZFS absolutely needs an error-handling layer between the volume manager and the devices. It needs to time out items that are not responding, and it needs to drop bad devices if they could cause problems elsewhere.

And yes, I'm repeating myself, but I can't understand why this is not being acted on. Right now the error checking appears to be such that if an unexpected or badly handled error condition occurs in the driver stack, the pool or server hangs, whereas the expected behavior would be for just one drive to fail. The absolute worst-case scenario should be that an entire controller has to be taken offline (and I would hope that the controllers in an X4500 would be running separate instances of the driver software).

None of those conditions should be fatal. Good storage designs cope with them all, and good error handling at the ZFS layer is absolutely vital when you have projects like COMSTAR introducing more and more types of storage device for ZFS to work with.

Each extra type of storage introduces yet more software into the equation, and increases the risk of finding faults like this. While they will be rare, they should be expected, and ZFS should be designed to handle them.
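To make the proposal in this post concrete, here is a minimal Python sketch of a per-device timeout layer: each read goes to a replica with a deadline, and a device that errors or fails to answer in time is marked faulted and skipped, so one hung disk costs one timeout rather than the whole pool. This is a hypothetical model of the idea, not how ZFS or the marvell88sx driver works; the `DeviceWatchdog` class and the device names are made up, and devices are modelled as plain Python callables.

```python
import concurrent.futures as cf


class DeviceWatchdog:
    """Hypothetical timeout layer between a mirror and its devices."""

    def __init__(self, devices, timeout=5.0):
        # devices: mapping of device name -> callable(block) -> bytes.
        self.devices = dict(devices)
        self.timeout = timeout
        self.faulted = set()
        # One worker per device, so requests stuck on faulted devices
        # cannot use up every worker thread.
        self._executor = cf.ThreadPoolExecutor(max_workers=max(1, len(self.devices)))

    def read(self, block):
        """Return data for `block` from the first healthy replica that answers."""
        for name, device in self.devices.items():
            if name in self.faulted:
                continue
            future = self._executor.submit(device, block)
            try:
                return future.result(timeout=self.timeout)
            except cf.TimeoutError:
                self.faulted.add(name)  # not responding: stop waiting on it
            except OSError:
                self.faulted.add(name)  # device reported an error
        raise OSError(f"no healthy replica answered for block {block}")

    def close(self):
        # Don't wait for hung requests; their threads finish (or not) on their own.
        self._executor.shutdown(wait=False)
```

A real implementation would also have to report faulted devices, retry them later and resilver; the sketch only shows the "time out and drop" part.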
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
> Yup, it was an absolute nightmare to diagnose on top of everything else. It definitely doesn't happen in Windows, either. I really want somebody to try snv_94 on a Thumper to see if you get the same behaviour there, or whether it's unique to Supermicro's Marvell card.

On a Thumper under S10U5 we recently had a hardware failure of one disk. This caused all I/O to the entire 46-disk pool to hang. zpool status commands were also hanging. Reset commands from the service processor timed out unsuccessfully. The system had to be power-cycled manually, and after that, booting took about 30 minutes. At that point the bad disk could be unconfigured with cfgadm and then hot-swapped with a warranty replacement.

So it appears that bug 6735931 also affects the X4500 upon disk hardware failure, in a way that seriously impairs the entire system's fault tolerance. I would be willing to test any T-patch coming out soon.

I found this thread after seeing a total failure of a hot unplug of a 1.5TB disk from a (different) newly assembled system with three AOC-SAT2-MV8 cards and 24 disks + one hot spare. After removing one disk, the entire system also froze instead of initiating a resilver with the hot spare. Clearly the marvell88sx driver cannot handle disk outages in any environment.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
I don't think it's just b94; I recall this behavior for as long as I've had the card. I'd also be interested to know if the Sun driver team has ever even tested with this card. I realize it's probably not a top priority, but it sure would be nice to have it working properly.

On 8/20/08, Ross Smith <[EMAIL PROTECTED]> wrote:
> [...]
> Yup, it was an absolute nightmare to diagnose on top of everything else. It definitely doesn't happen in Windows, either. I really want somebody to try snv_94 on a Thumper to see if you get the same behaviour there, or whether it's unique to Supermicro's Marvell card.
> [...]
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Ross Smith wrote:
> > > Without fail, cfgadm changes the status from "disk" to "sata-port" when I unplug a device attached to port 6 or 7, but most of the time unplugging disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
> >
> > That does seem inconsistent, or at least, it's not what I'd expect.
>
> Yup, it was an absolute nightmare to diagnose on top of everything else. It definitely doesn't happen in Windows, either. I really want somebody to try snv_94 on a Thumper to see if you get the same behaviour there, or whether it's unique to Supermicro's Marvell card.

That's a very good question.

> > > Often the system hung completely when you pulled one of the disks 0-5, and wouldn't respond again until you re-inserted it.
> > >
> > > I'm 99.99% sure this is a driver issue for this controller.
> >
> > Have you logged a bug on it yet?
>
> Yup, 6735931. Added the information about it working in Windows today too.

Heh... I should have recognised that; I moved it from the triage queue to driver/sata :-)

James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp
http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
> > Without fail, cfgadm changes the status from "disk" to "sata-port" when I unplug a device attached to port 6 or 7, but most of the time unplugging disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
>
> That does seem inconsistent, or at least, it's not what I'd expect.

Yup, it was an absolute nightmare to diagnose on top of everything else. It definitely doesn't happen in Windows, either. I really want somebody to try snv_94 on a Thumper to see if you get the same behaviour there, or whether it's unique to Supermicro's Marvell card.

> > Often the system hung completely when you pulled one of the disks 0-5, and wouldn't respond again until you re-inserted it.
> >
> > I'm 99.99% sure this is a driver issue for this controller.
>
> Have you logged a bug on it yet?

Yup, 6735931. Added the information about it working in Windows today too.

Ross
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Ross wrote:
> lol, I got bored after 13 pages and a whole day of going back through my notes to pick out the relevant information.
>
> Besides, I did mention that I was using cfgadm to see what was connected :-p. If you're really interested, most of my troubleshooting notes have been posted to the forum, but unfortunately Sun's software has split it into three or four pieces. Just search for posts talking about the AOC-SAT2-MV8 card to find them.
>
> Without fail, cfgadm changes the status from "disk" to "sata-port" when I unplug a device attached to port 6 or 7, but most of the time unplugging disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.

That does seem inconsistent, or at least, it's not what I'd expect.

> Often the system hung completely when you pulled one of the disks 0-5, and wouldn't respond again until you re-inserted it.
>
> I'm 99.99% sure this is a driver issue for this controller.

Have you logged a bug on it yet?

James C. McPherson
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
lol, I got bored after 13 pages and a whole day of going back through my notes to pick out the relevant information.

Besides, I did mention that I was using cfgadm to see what was connected :-p. If you're really interested, most of my troubleshooting notes have been posted to the forum, but unfortunately Sun's software has split them into three or four pieces. Just search for posts about the AOC-SAT2-MV8 card to find them.

Without fail, cfgadm changes the status from "disk" to "sata-port" when I unplug a device attached to port 6 or 7, but most of the time unplugging disks 0-5 results in no change in cfgadm until I also attach disk 6 or 7. Often the system hung completely when you pulled one of the disks 0-5, and wouldn't respond again until you re-inserted it.

I'm 99.99% sure this is a driver issue for this controller.
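Since the symptom shows up as a difference between cfgadm listings taken before and after a pull, it can help to diff the snapshots mechanically across many pulls. A minimal Python sketch, assuming the usual `cfgadm -al` columns (Ap_Id, Type, Receptacle, Occupant, Condition) and that an occupied SATA port's Ap_Id carries a `::dsk/...` suffix; the function names are made up for illustration:

```python
def parse_cfgadm(text):
    """Map attachment point -> Type from `cfgadm -al` style output.

    Assumes the columns: Ap_Id Type Receptacle Occupant Condition.
    """
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Ap_Id":
            continue  # skip the header and short/blank lines
        ap_id = fields[0].split("::")[0]  # "sata0/6::dsk/c1t6d0" -> "sata0/6"
        ports[ap_id] = fields[1]
    return ports


def type_changes(before, after):
    """Return {port: (old_type, new_type)} for ports whose Type changed."""
    old, new = parse_cfgadm(before), parse_cfgadm(after)
    return {port: (old.get(port), new.get(port))
            for port in sorted(set(old) | set(new))
            if old.get(port) != new.get(port)}
```

Capture `cfgadm -al` before and after pulling a disk (for example with `subprocess.run(["cfgadm", "-al"], capture_output=True, text=True).stdout`) and pass both strings to `type_changes`; a pulled disk whose port is missing from the result is one whose removal never registered in cfgadm.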
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Well, when you leave out a bunch of relevant information, you also leave people guessing! :-) Regardless, is it possible that all of your testing was done with ZFS and not just the "raw" disk? If so, ZFS may not notice that the disk has been hot-unplugged until it tries to access the drive. I don't know this for certain, but it would be consistent with what you have described so far.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Fri, Aug 15, 2008 at 10:07:31AM -0500, Tim wrote:
> You could always try FreeBSD :)
>
> > Unfortunately for me, Windows doesn't support ZFS... right now it's looking a whole load more stable.

Nope: FreeBSD doesn't have proper power management either.

florin
--
Bruce Schneier expects the Spanish Inquisition. http://geekz.co.uk/schneierfacts/fact/163
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Oh god no, I'm already learning three new operating systems; now is not a good time to add a fourth.

Ross <-- Windows admin now working with Ubuntu, OpenSolaris and ESX

On Fri, 15 Aug 2008 10:07:31 -0500, Tim wrote:
> You could always try FreeBSD :)
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
You could always try FreeBSD :)

--Tim

On Fri, Aug 15, 2008 at 9:44 AM, Ross <[EMAIL PROTECTED]> wrote:
> [...]
> Unfortunately for me, Windows doesn't support ZFS... right now it's looking a whole load more stable.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Haven't a clue, but I've just gotten around to installing Windows on this box to test, and I can confirm that hot plug works just fine in Windows.

Drives appear and disappear in Device Manager the second I unplug the hardware. Any drive, either controller. So far I've done a couple of dozen removals, pulling individual drives or as many as half a dozen at once. I've even gone as far as to immediately pull a drive I had only just connected. Windows has no problems at all.

Unfortunately for me, Windows doesn't support ZFS... right now it's looking a whole load more stable.

Ross

> I don't have any extra cards lying around and can't really take my server down, so my immediate question would be: Is there any sort of PCI bridge chip on the card? I know in my experience I've seen all sorts of headaches with less than stellar bridge chips. Specifically some of the IBM bridge chips. Food for thought.
> --Tim
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
I don't have any extra cards lying around and can't really take my server down, so my immediate question would be: is there any sort of PCI bridge chip on the card? In my experience I've seen all sorts of headaches with less-than-stellar bridge chips, specifically some of the IBM bridge chips. Food for thought.

--Tim

On Thu, Aug 14, 2008 at 5:24 AM, Ross <[EMAIL PROTECTED]> wrote:
> This is the problem when you try to write up a good summary of what you found. [...]
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
This is the problem when you try to write up a good summary of what you found. I've got pages and pages of notes of all the tests I did here, far more than I could include in that PDF.

What makes me think it's the driver is that I've done much of what you suggested. I've replicated the exact same behaviour on two different cards, individually and with both cards attached to the server. It's also consistent across many different brands and types of drive, and occurs even if I have just 4 drives connected out of 8 on a single controller.

I did wonder whether it could be hardware related, so I tested plugging and unplugging drives while the computer was booting. While doing that, and hot-plugging drives in the BIOS, at no point did I see the system hang, which tends to confirm my thought that it's driver related.

I was also able to power on the system with all drives connected, wait for the controllers to finish scanning the drives, then remove a few at the GRUB boot screen. When I then continue booting Solaris, the correct state is detected every time for all drives.

Based on that, it appears to be purely a problem with detecting the insertion/removal event after Solaris has loaded its drivers. Initial detection is fine; it's only hot-swap detection on ports 0-5 that fails. I know it sounds weird, but trust me, I checked this pretty carefully, and experience has taught me never to assume computers won't behave in odd ways.

I do appreciate my diagnosis may be wrong, as I have very limited knowledge of Solaris' internals, but that is my best guess right now.

Ross
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Looking at what you wrote, you claim that hot-plug events on ports 6 and 7 generally work, but those on the other ports are not immediately discovered. Since there is no special code for ports 6 and 7, and no one else has reported this sort of behavior, I would suspect a hardware issue: possibly poor signaling over the SATA cables, or possibly marginal power. See if things behave differently with fewer disks attached, or when mixing and matching cables and drives.
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
OK, I've now reported most of the problems I found, but I have additional information to add to bugs 6667199 and 667208. Can anybody tell me how I go about reporting that to Sun?

Thanks,
Ross