Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-14 Thread Ross
What are you running there?  snv or OpenSolaris?

Could you try booting directly from an OpenSolaris 2009.06 live disc?
Once I was running that build, every single hot plug I tried worked flawlessly.
I tried for several hours to replicate the problems that caused me to log that
bug report, but the issue appeared completely resolved.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-14 Thread Aaron Brady
Well, I upgraded to b124, disabling ACPI because of [1], and I get exactly the 
same behaviour. I've removed the device from the zpool, and tried dd-ing from 
the device while I remove it; it still hangs all IO on the system until the 
disk is re-inserted.
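
A minimal sketch of the same kind of read test with per-read timing, for anyone who wants numbers rather than just "it hangs". The function name and defaults are illustrative assumptions, not an existing tool. If a read blocks while the disk is out, the script blocks too; it can only report how long that read was stuck once it finally returns.

```python
import time


def timed_reads(path, block_size=1 << 20, max_blocks=64, stall_s=1.0):
    """Read `path` sequentially and time every read call.

    Returns (latencies, stalls): latencies holds the seconds each read
    took, stalls holds the indexes of reads that took >= stall_s.
    """
    latencies = []
    # buffering=0 gives unbuffered reads, so each call goes to the OS.
    with open(path, "rb", buffering=0) as dev:
        for _ in range(max_blocks):
            start = time.monotonic()
            chunk = dev.read(block_size)
            latencies.append(time.monotonic() - start)
            if not chunk:  # end of file or device
                break
    stalls = [i for i, secs in enumerate(latencies) if secs >= stall_s]
    return latencies, stalls
```

Call it with the raw device path you were dd-ing from, pull the disk partway through, and check which read indexes show up in `stalls`.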

I'm running the kernel with -v (from diagnosing the ACPI issue) and nothing 
enlightening is printed in dmesg.

1: http://defect.opensolaris.org/bz/show_bug.cgi?id=11739


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Tim Cook
On Tue, Oct 13, 2009 at 9:42 AM, Aaron Brady  wrote:

> I did, but as tcook suggests running a later build, I'll try an
> image-update (though, 111 > 2008.11, right?)
>


It should be, yes.  b111 was released in April of 2009.

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Aaron Brady
I did, but as tcook suggests running a later build, I'll try an image-update 
(though, 111 > 2008.11, right?)


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Ross
Hi Tim, that doesn't help in this case - it's a complete lockup apparently 
caused by driver issues.

However, the good news for Insom is that the bug is closed because the problem
now appears fixed.  I tested it and found that it's no longer occurring in
OpenSolaris 2008.11 or 2009.06.

If you move to a newer build of OpenSolaris you should be fine.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Tim Cook
On Tue, Oct 13, 2009 at 8:54 AM, Aaron Brady  wrote:

> All's gone quiet on this issue, and the bug is closed, but I'm having
> exactly the same problem; pulling a disk on this card, under OpenSolaris
> 111, is pausing all IO (including, weirdly, network IO), and using the ZFS
> utilities (zfs list, zpool list, zpool status) causes a hang until I replace
> the disk.
> --
>


Did you set your failmode to continue?


--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-10-13 Thread Aaron Brady
All's gone quiet on this issue, and the bug is closed, but I'm having exactly 
the same problem; pulling a disk on this card, under OpenSolaris 111, is 
pausing all IO (including, weirdly, network IO), and using the ZFS utilities 
(zfs list, zpool list, zpool status) causes a hang until I replace the disk.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Tim
On Thu, Feb 12, 2009 at 5:16 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Thu, 12 Feb 2009, Ross Smith wrote:
>
>>
>> As far as I'm concerned, the 7000 series is a new hardware platform,
>>
>
> You are joking right?  Have you ever looked at the photos of these "new"
> systems or compared them to other Sun systems?  They are just re-purposed
> existing systems with a bit of extra secret sauce added.
>
> Bob
>

Ya, that *secret sauce* is what makes it a new system.  And out of the last
4 x4240's I've ordered, two had to have new motherboards installed within a
week, and one had to have a new power supply.  The other appears to have a
dvd rom drive going flaky.  So the fact they're based on existing hardware
isn't exactly confidence inspiring either.

Sun's old sparc gear: rock solid.  The newer x64 has been leaving a bad
taste in my mouth TBQH.  The engineering behind the systems when I open them
up is absolutely phenomenal.  The failure rate, however, is downright scary.

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Bob Friesenhahn

On Thu, 12 Feb 2009, Ross Smith wrote:

> As far as I'm concerned, the 7000 series is a new hardware platform,

You are joking right?  Have you ever looked at the photos of these
"new" systems or compared them to other Sun systems?  They are just
re-purposed existing systems with a bit of extra secret sauce added.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Ross Smith
Heh, yeah, I've thought the same kind of thing in the past.  The
problem is that the argument doesn't really work for system admins.

As far as I'm concerned, the 7000 series is a new hardware platform,
with relatively untested drivers, running a software solution that I
know is prone to locking up when hardware faults are handled badly by
drivers.  Fair enough, that actual solution is out of our price range,
but I would still be very dubious about purchasing it.  At the very
least I'd be waiting a year for other people to work the kinks out of
the drivers.

Which is a shame, because ZFS has so many other great features it's
easily our first choice for a storage platform.  The one and only
concern we have is its reliability.  We have snv_106 running as a test
platform now.  If I felt I could trust ZFS 100% I'd roll it out
tomorrow.



On Thu, Feb 12, 2009 at 4:25 PM, Tim  wrote:
>
>
> On Thu, Feb 12, 2009 at 9:25 AM, Ross  wrote:
>>
>> This sounds like exactly the kind of problem I've been shouting about for
>> 6 months or more.  I posted a huge thread on availability on these forums
>> because I had concerns over exactly this kind of hanging.
>>
>> ZFS doesn't trust hardware or drivers when it comes to your data -
>> everything is checksummed.  However, when it comes to seeing whether devices
>> are responding, and checking for faults, it blindly trusts whatever the
>> hardware or driver tells it.  Unfortunately, that means ZFS is vulnerable to
>> any unexpected bug or error in the storage chain.  I've encountered at least
>> two hang conditions myself (and I'm not exactly a heavy user), and I've seen
>> several others on the forums, including a few on x4500's.
>>
>> Now, I do accept that errors like this will be few and far between, but
>> they still mean you have the risk that a badly handled error condition can
>> hang your entire server, instead of just one drive.  Solaris can handle
>> things like CPU's or Memory going faulty for crying out loud.  Its raid
>> storage system had better be able to handle a disk failing.
>>
>> Sun seem to be taking the approach that these errors should be dealt with
>> in the driver layer.  And while that's technically correct, a reliable
>> storage system had damn well better be able to keep the server limping along
>> while we wait for patches to the storage drivers.
>>
>> ZFS absolutely needs an error handling layer between the volume manager
>> and the devices.  It needs to timeout items that are not responding, and it
>> needs to drop bad devices if they could cause problems elsewhere.
>>
>> And yes, I'm repeating myself, but I can't understand why this is not
>> being acted on.  Right now the error checking appears to be such that if an
>> unexpected, or badly handled error condition occurs in the driver stack, the
>> pool or server hangs.  Whereas the expected behavior would be for just one
>> drive to fail.  The absolute worst case scenario should be that an entire
>> controller has to be taken offline (and I would hope that the controllers in
>> an x4500 would be running separate instances of the driver software).
>>
>> None of those conditions should be fatal; good storage designs cope
>> with them all, and good error handling at the ZFS layer is absolutely vital
>> when you have projects like Comstar introducing more and more types of
>> storage device for ZFS to work with.
>>
>> Each extra type of storage introduces yet more software into the equation,
>> and increases the risk of finding faults like this.  While they will be
>> rare, they should be expected, and ZFS should be designed to handle them.
>
>
> I'd imagine for the exact same reason short-stroking/right-sizing isn't a
> concern.
>
> "We don't have this problem in the 7000 series, perhaps you should buy one
> of those".
>
> ;)
>
> --Tim
>


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Tim
On Thu, Feb 12, 2009 at 9:25 AM, Ross  wrote:

> This sounds like exactly the kind of problem I've been shouting about for 6
> months or more.  I posted a huge thread on availability on these forums
> because I had concerns over exactly this kind of hanging.
>
> ZFS doesn't trust hardware or drivers when it comes to your data -
> everything is checksummed.  However, when it comes to seeing whether devices
> are responding, and checking for faults, it blindly trusts whatever the
> hardware or driver tells it.  Unfortunately, that means ZFS is vulnerable to
> any unexpected bug or error in the storage chain.  I've encountered at least
> two hang conditions myself (and I'm not exactly a heavy user), and I've seen
> several others on the forums, including a few on x4500's.
>
> Now, I do accept that errors like this will be few and far between, but
> they still mean you have the risk that a badly handled error condition can
> hang your entire server, instead of just one drive.  Solaris can handle
> things like CPU's or Memory going faulty for crying out loud.  Its raid
> storage system had better be able to handle a disk failing.
>
> Sun seem to be taking the approach that these errors should be dealt with
> in the driver layer.  And while that's technically correct, a reliable
> storage system had damn well better be able to keep the server limping along
> while we wait for patches to the storage drivers.
>
> ZFS absolutely needs an error handling layer between the volume manager and
> the devices.  It needs to timeout items that are not responding, and it
> needs to drop bad devices if they could cause problems elsewhere.
>
> And yes, I'm repeating myself, but I can't understand why this is not being
> acted on.  Right now the error checking appears to be such that if an
> unexpected, or badly handled error condition occurs in the driver stack, the
> pool or server hangs.  Whereas the expected behavior would be for just one
> drive to fail.  The absolute worst case scenario should be that an entire
> controller has to be taken offline (and I would hope that the controllers in
> an x4500 would be running separate instances of the driver software).
>
> None of those conditions should be fatal; good storage designs cope
> with them all, and good error handling at the ZFS layer is absolutely vital
> when you have projects like Comstar introducing more and more types of
> storage device for ZFS to work with.
>
> Each extra type of storage introduces yet more software into the equation,
> and increases the risk of finding faults like this.  While they will be
> rare, they should be expected, and ZFS should be designed to handle them.
>


I'd imagine for the exact same reason short-stroking/right-sizing isn't a
concern.

"We don't have this problem in the 7000 series, perhaps you should buy one
of those".

;)

--Tim


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-12 Thread Ross
This sounds like exactly the kind of problem I've been shouting about for 6 
months or more.  I posted a huge thread on availability on these forums because 
I had concerns over exactly this kind of hanging.

ZFS doesn't trust hardware or drivers when it comes to your data - everything 
is checksummed.  However, when it comes to seeing whether devices are 
responding, and checking for faults, it blindly trusts whatever the hardware or 
driver tells it.  Unfortunately, that means ZFS is vulnerable to any unexpected 
bug or error in the storage chain.  I've encountered at least two hang 
conditions myself (and I'm not exactly a heavy user), and I've seen several 
others on the forums, including a few on x4500's.

Now, I do accept that errors like this will be few and far between, but they 
still mean you have the risk that a badly handled error condition can hang
your entire server, instead of just one drive.  Solaris can handle things like 
CPU's or Memory going faulty for crying out loud.  Its raid storage system had 
better be able to handle a disk failing.

Sun seem to be taking the approach that these errors should be dealt with in 
the driver layer.  And while that's technically correct, a reliable storage 
system had damn well better be able to keep the server limping along while we 
wait for patches to the storage drivers.

ZFS absolutely needs an error handling layer between the volume manager and the 
devices.  It needs to timeout items that are not responding, and it needs to 
drop bad devices if they could cause problems elsewhere.

And yes, I'm repeating myself, but I can't understand why this is not being 
acted on.  Right now the error checking appears to be such that if an 
unexpected, or badly handled error condition occurs in the driver stack, the 
pool or server hangs.  Whereas the expected behavior would be for just one 
drive to fail.  The absolute worst case scenario should be that an entire 
controller has to be taken offline (and I would hope that the controllers in an 
x4500 would be running separate instances of the driver software).

None of those conditions should be fatal; good storage designs cope with
them all, and good error handling at the ZFS layer is absolutely vital when you 
have projects like Comstar introducing more and more types of storage device 
for ZFS to work with.

Each extra type of storage introduces yet more software into the equation, and 
increases the risk of finding faults like this.  While they will be rare, they 
should be expected, and ZFS should be designed to handle them.
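
A minimal sketch of the kind of layer described above, as a toy model only: it is not how ZFS or any Solaris driver works, and every name in it (`DevicePool`, `read`, `faulted`) is hypothetical. Each "device" is just a callable that either returns data or blocks; reads that exceed a deadline mark that one device as faulted while the rest of the pool keeps answering.

```python
import concurrent.futures
import time


class DevicePool:
    """Toy pool that times out slow devices instead of hanging on them."""

    def __init__(self, devices, timeout_s):
        self.devices = dict(devices)   # name -> callable that returns data
        self.timeout_s = timeout_s
        self.faulted = set()           # devices dropped after a timeout
        # One worker per device, so a stuck read cannot starve the others.
        self._executor = concurrent.futures.ThreadPoolExecutor(
            max_workers=len(self.devices)
        )

    def read(self, name):
        if name in self.faulted:
            raise OSError(f"{name}: device is faulted")
        future = self._executor.submit(self.devices[name])
        try:
            return future.result(timeout=self.timeout_s)
        except concurrent.futures.TimeoutError:
            # Stop waiting and drop only this device. The stuck call keeps
            # running in its worker thread; a real implementation would also
            # have to cancel or abandon the outstanding I/O.
            self.faulted.add(name)
            raise OSError(f"{name}: no response in {self.timeout_s}s, faulted")


if __name__ == "__main__":
    pool = DevicePool(
        {"disk0": lambda: b"data", "disk1": lambda: time.sleep(0.5)},
        timeout_s=0.05,
    )
    print(pool.read("disk0"))
    try:
        pool.read("disk1")
    except OSError as err:
        print(err)
    print(pool.read("disk0"))  # still served after disk1 was dropped
```

The point of the sketch is only the policy: the caller waits a bounded time per device, and one unresponsive device costs you that device rather than the whole pool.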


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2009-02-11 Thread Peter Schultze
> Yup, was an absolute nightmare to diagnose on top of everything else.
> Definitely doesn't happen in Windows either.  I really want somebody to try
> snv_94 on a Thumper to see if you get the same behaviour there, or whether
> it's unique to Supermicro's Marvell card.

On a Thumper under S10U5 we recently had a hardware failure
of one disk. This caused all I/O to the entire 46 disk pool to hang.
zpool status commands also were hanging. Reset commands
from the service processor timed out unsuccessfully. The system
had to be power cycled manually. After that booting took about
30 minutes. At this point the bad disk could be unconfigured
with cfgadm and then hot swapped with a warranty replacement.

So it appears that bug 6735931 also affects the X4500 upon disk
hardware failure, in a way that seriously impairs the entire system's
fault tolerance.

I would be willing to test any T-patch coming out soon.

I found this thread after seeing a total failure of a hot unplug
of a 1.5TB disk from a (different) newly assembled system with 3 AOC-SAT2-MV8
cards and 24 disks + one hot spare. After removing one disk
the entire system also froze, instead of initiating a resilver
process with the hot spare. Clearly the marvell88sx driver cannot handle
disk outages in any environment.


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Tim
I don't think it's just b94, I recall this behavior for as long as I've
had the card.  I'd also be interested to know if the Sun driver team
has ever even tested with this card.  I realize it's probably not a top
priority, but it sure would be nice to have it working properly.






On 8/20/08, Ross Smith <[EMAIL PROTECTED]> wrote:
>
>> > Without fail, cfgadm changes the status from "disk" to "sata-port" when
>> > I
>> > unplug a device attached to port 6 or 7, but most of the time unplugging
>> > disks 0-5 results in no change in cfgadm, until I also attach disk 6 or
>> > 7.
>>
>> That does seem inconsistent, or at least, it's not what I'd expect.
>
> Yup, was an absolute nightmare to diagnose on top of everything else.
> Definitely doesn't happen in Windows either.  I really want somebody to try
> snv_94 on a Thumper to see if you get the same behaviour there, or whether
> it's unique to Supermicro's Marvell card.
>
>> > Often the system hung completely when you pulled one of the disks 0-5,
>> > and wouldn't respond again until you re-inserted it.
>> >
>> > I'm 99.99% sure this is a driver issue for this controller.
>>
>> Have you logged a bug on it yet?
>
> Yup, 6735931.  Added the information about it working in Windows today too.
>
> Ross
>


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread James C. McPherson
Ross Smith wrote:
> > > Without fail, cfgadm changes the status from "disk" to "sata-port" when I
> > > unplug a device attached to port 6 or 7, but most of the time unplugging
> > > disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
> >
> > That does seem inconsistent, or at least, it's not what I'd expect.
>
> Yup, was an absolute nightmare to diagnose on top of everything else.
> Definitely doesn't happen in Windows either.  I really want somebody to try
> snv_94 on a Thumper to see if you get the same behaviour there, or
> whether it's unique to Supermicro's Marvell card.

That's a very good question.

>  > > Often the system hung completely when you pulled one of the disks 0-5,
>  > > and wouldn't respond again until you re-inserted it.
>  > >
>  > > I'm 99.99% sure this is a driver issue for this controller.
>  >
>  > Have you logged a bug on it yet?
> 
> Yup, 6735931.  Added the information about it working in Windows today too.


Heh... I should have recognised that, I moved it from the
triage queue to driver/sata :-)


James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Ross Smith

> > Without fail, cfgadm changes the status from "disk" to "sata-port" when I
> > unplug a device attached to port 6 or 7, but most of the time unplugging
> > disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.
> 
> That does seem inconsistent, or at least, it's not what I'd expect.

Yup, was an absolute nightmare to diagnose on top of everything else.  
Definitely doesn't happen in Windows either.  I really want somebody to try snv_94
on a Thumper to see if you get the same behaviour there, or whether it's unique 
to Supermicro's Marvell card.

> > Often the system hung completely when you pulled one of the disks 0-5,
> > and wouldn't respond again until you re-inserted it.
> > 
> > I'm 99.99% sure this is a driver issue for this controller.
> 
> Have you logged a bug on it yet?

Yup, 6735931.  Added the information about it working in Windows today too.

Ross



Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread James C. McPherson
Ross wrote:
> lol, I got bored after 13 pages and a whole day of going back through my
> notes to pick out the relevant information.
> 
> Besides, I did mention that I was using cfgadm to see what was connected
> :-p.  If you're really interested, most of my troubleshooting notes have
> been posted to the forum, but unfortunately Sun's software has split it
> into three or four pieces.  Just search for posts talking about the
> AOC-SAT2-MV8 card to find them.
> 
> Without fail, cfgadm changes the status from "disk" to "sata-port" when I
> unplug a device attached to port 6 or 7, but most of the time unplugging
> disks 0-5 results in no change in cfgadm, until I also attach disk 6 or 7.

That does seem inconsistent, or at least, it's not what I'd expect.

> Often the system hung completely when you pulled one of the disks 0-5,
> and wouldn't respond again until you re-inserted it.
> 
> I'm 99.99% sure this is a driver issue for this controller.

Have you logged a bug on it yet?


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Ross
lol, I got bored after 13 pages and a whole day of going back through my notes 
to pick out the relevant information.

Besides, I did mention that I was using cfgadm to see what was connected :-p.  
If you're really interested, most of my troubleshooting notes have been posted 
to the forum, but unfortunately Sun's software has split it into three or four 
pieces.  Just search for posts talking about the AOC-SAT2-MV8 card to find them.

Without fail, cfgadm changes the status from "disk" to "sata-port" when I 
unplug a device attached to port 6 or 7, but most of the time unplugging disks 
0-5 results in no change in cfgadm, until I also attach disk 6 or 7.

Often the system hung completely when you pulled one of the disks 0-5, and 
wouldn't respond again until you re-inserted it.

I'm 99.99% sure this is a driver issue for this controller.
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-20 Thread Brian D. Horn
Well, when you leave out a bunch of relevant information you also leave
people guessing! :-)

Regardless, is it possible that all of your testing was done with ZFS and not
just the "raw" disk?  If so, it is possible that ZFS isn't noticing the hot
unplugging of the disk until it tries to access the drive.  I don't know this,
but it would be consistent with what you have related to date.
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Florin Iucha
On Fri, Aug 15, 2008 at 10:07:31AM -0500, Tim wrote:
> You could always try FreeBSD :)
>
> > Unfortunately for me, Windows doesn't support ZFS...  right now it's
> > looking a whole load more stable.

Nope: FreeBSD doesn't have proper power management either.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Ross Smith

Oh god no, I'm already learning three new operating systems, now is not a good 
time to add a fourth.
 
Ross<-- Windows admin now working with Ubuntu, OpenSolaris and ESX



Date: Fri, 15 Aug 2008 10:07:31 -0500
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
CC: zfs-discuss@opensolaris.org

You could always try FreeBSD :)

--Tim

On Fri, Aug 15, 2008 at 9:44 AM, Ross <[EMAIL PROTECTED]> wrote:

Haven't a clue, but I've just gotten around to installing windows on this box
to test and I can confirm that hot plug works just fine in windows.

Drives appear and disappear in device manager the second I unplug the
hardware.  Any drive, either controller.  So far I've done a couple of dozen
removals, pulling individual drives, or as many as half a dozen at once.  I've
even gone as far as to immediately pull a drive I only just connected.  Windows
has no problems at all.

Unfortunately for me, Windows doesn't support ZFS...  right now it's
looking a whole load more stable.

Ross


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Tim
You could always try FreeBSD :)

--Tim

On Fri, Aug 15, 2008 at 9:44 AM, Ross <[EMAIL PROTECTED]> wrote:

> Haven't a clue, but I've just gotten around to installing windows on this
> box to test and I can confirm that hot plug works just fine in windows.
>
> Drives appear and disappear in device manager the second I unplug the
> hardware.  Any drive, either controller.  So far I've done a couple of dozen
> removals, pulling individual drives, or as many as half a dozen at once.
>  I've even gone as far as to immediately pull a drive I only just connected.
>  Windows has no problems at all.
>
> Unfortunately for me, Windows doesn't support ZFS...  right now it's
> looking a whole load more stable.
>
> Ross
>
>
>


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-15 Thread Ross
Haven't a clue, but I've just gotten around to installing windows on this box 
to test and I can confirm that hot plug works just fine in windows.

Drives appear and disappear in device manager the second I unplug the
hardware.  Any drive, either controller.  So far I've done a couple of dozen 
removals, pulling individual drives, or as many as half a dozen at once.  I've 
even gone as far as to immediately pull a drive I only just connected.  Windows 
has no problems at all.

Unfortunately for me, Windows doesn't support ZFS...  right now it's looking a 
whole load more stable.

Ross


> I don't have any extra cards lying around and can't really take my server
> down, so my immediate question would be:
> Is there any sort of PCI bridge chip on the card?  I know in my experience
> I've seen all sorts of headaches with less than stellar bridge chips.
> Specifically some of the IBM bridge chips.
>
> Food for thought.
>
> --Tim
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-14 Thread Tim
I don't have any extra cards lying around and can't really take my server
down, so my immediate question would be:
Is there any sort of PCI bridge chip on the card?  I know in my experience
I've seen all sorts of headaches with less than stellar bridge chips.
Specifically some of the IBM bridge chips.

Food for thought.

--Tim





On Thu, Aug 14, 2008 at 5:24 AM, Ross <[EMAIL PROTECTED]> wrote:

> This is the problem when you try to write up a good summary of what you
> found.  I've got pages and pages of notes of all the tests I did here, far
> more than I could include in that PDF.
>
> What makes me think it's the driver is that I've done much of what you
> suggested.  I've replicated the exact same behaviour on two different cards,
> individually and with both cards attached to the server.  It's also
> consistent across many different brands and types of drive, and occurs even
> if I have just 4 drives connected out of 8 on a single controller.
>
> I did wonder whether it could be hardware related, so I tested plugging and
> unplugging drives while the computer was booting.  While doing that and
> hot-plugging drives in the BIOS, at no point did I see any hanging of the
> system, which tends to confirm my thought that it's driver related.
>
> I was also able to power on the system with all drives connected, wait for
> the controllers to finish scanning the drives, then remove a few at the GRUB
> boot screen.  From there when I continue to boot Solaris, the correct state
> is detected every time for all drives.
>
> Based on that, it appears that it's purely a problem with detection of the
> insertion / removal event after Solaris has loaded its drivers.  Initial
> detection is fine, it's purely hot swap detection on ports 0-5 that fails.
>  I know it sounds weird, but trust me I checked this pretty carefully, and
> experience has taught me never to assume computers won't behave in odd ways.
>
> I do appreciate my diagnosis may be wrong as I have very limited knowledge
> of Solaris' internals, but that is my best guess right now.
>
> Ross
>
>


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-14 Thread Ross
This is the problem when you try to write up a good summary of what you found.  
I've got pages and pages of notes of all the tests I did here, far more than I 
could include in that PDF.

What makes me think it's the driver is that I've done much of what you suggested.
I've replicated the exact same behaviour on two different cards, individually 
and with both cards attached to the server.  It's also consistent across many 
different brands and types of drive, and occurs even if I have just 4 drives 
connected out of 8 on a single controller.

I did wonder whether it could be hardware related, so I tested plugging and 
unplugging drives while the computer was booting.  While doing that and 
hot-plugging drives in the BIOS, at no point did I see any hanging of the 
system, which tends to confirm my thought that it's driver related.

I was also able to power on the system with all drives connected, wait for the 
controllers to finish scanning the drives, then remove a few at the GRUB boot 
screen.  From there when I continue to boot Solaris, the correct state is 
detected every time for all drives.

Based on that, it appears that it's purely a problem with detection of the 
insertion / removal event after Solaris has loaded its drivers.  Initial 
detection is fine, it's purely hot swap detection on ports 0-5 that fails.  I 
know it sounds weird, but trust me I checked this pretty carefully, and 
experience has taught me never to assume computers won't behave in odd ways.

I do appreciate my diagnosis may be wrong as I have very limited knowledge of 
Solaris' internals, but that is my best guess right now.

Ross
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-13 Thread Brian D. Horn
Looking at what you wrote, you claim that hot plug events on ports 6 and 7
generally work, but other ports are not immediately discovered.  Since
there is no special code for ports 6 & 7 and no one else has reported this
sort of behavior, it would make me think that you have a hardware issue.
Possibly poor signaling over the SATA cables or possibly marginal power.

See if things behave differently with fewer disks attached or when mixing
and matching cables/drives.
 
 


Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-11 Thread Ross
Ok, I've now reported most of the problems I found, but have additional 
information to add to bugs 6667199 and 667208.  Can anybody tell me how I go 
about reporting that to Sun?

thanks,

Ross
 
 