Re: [zfs-discuss] X4500 device disconnect problem persists

2008-03-11 Thread Gerry Haskins
Do *NOT* install 127871-02 on a Solaris 10 system.

127871-02 is an immature Feature patch associated with Solaris 10 Update 5.  
Its only purpose is for constructing pre-release builds of Solaris 10 Update 
5 for internal Sun testing.  It is *not* to be installed on pre-U5 systems.

127871-02 comes from a different internal source code branch from normal 
Sustaining (bug fix) patches.

Installing 127871-02 on a Solaris 10 system will leave the system in an 
undefined state.  Please see the warnings in the patch README file.  If this 
patch has been installed on a production system, please back it out immediately.
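
If you are unsure whether 127871 ended up on a system, a minimal sketch like the following can scan saved `showrev -p` output for it. It assumes the usual one-line-per-patch `Patch: NNNNNN-RR ...` format; `find_patch` and the sample text are hypothetical, and `patchrm` is mentioned only as the usual Solaris 10 tool for backing a patch out.

```python
import re

# Assumption: `showrev -p` prints one "Patch: <base>-<rev> ..." line per
# installed patch. The optional "T" also matches T-patch IDs such as
# T127871-02, in case one is recorded that way (also an assumption).
PATCH_LINE = re.compile(r"^Patch:\s+T?(\d{6})-(\d{2})\b")


def find_patch(showrev_output: str, base_id: str) -> list:
    """Return every installed revision of base_id, e.g. ['127871-02']."""
    found = []
    for line in showrev_output.splitlines():
        m = PATCH_LINE.match(line.strip())
        if m and m.group(1) == base_id:
            found.append(f"{m.group(1)}-{m.group(2)}")
    return found


if __name__ == "__main__":
    # Illustrative input only; not output captured from any host in this thread.
    sample = (
        "Patch: 125205-07 Obsoletes: Requires: Incompatibles: Packages: ...\n"
        "Patch: 127871-02 Obsoletes: Requires: Incompatibles: Packages: ...\n"
    )
    for patch in find_patch(sample, "127871"):
        print(f"{patch} is installed; back it out (for example with patchrm)")
```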

Please let me know who gave you this patch as you should not have been given it.

A later revision of this patch (or an accumulating patch) will be released to 
SunSolve once Solaris 10 Update 5 ships in April/May.  This later revision will 
be a normal Sustaining patch which you can install on any Solaris 10 system.  
But until then, it is not safe to install 127871.

Best Wishes,

Gerry Haskins
Senior Engineering Manager
Software Product Engineering
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-12-29 Thread Peter Eriksson
Still no news on when a real patch will be released for this issue?
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-16 Thread roland egle
We are having the same problem.

First with 125025-05 and then also with 125205-07,
on Solaris 10 Update 4, now with all patches.


We opened a case and got

T-PATCH 127871-02

We installed the Marvell driver binaries 3 days ago:

T127871-02/SUNWckr/reloc/kernel/misc/sata
T127871-02/SUNWmv88sx/reloc/kernel/drv/marvell88sx
T127871-02/SUNWmv88sx/reloc/kernel/drv/amd64/marvell88sx
T127871-02/SUNWsi3124/reloc/kernel/drv/si3124
T127871-02/SUNWsi3124/reloc/kernel/drv/amd64/si3124 

It seems that this resolves the device reset problem and the nfsd crash on
an X4500 with one raidz2 pool and a lot of ZFS file systems.
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-15 Thread Peter Eriksson
Speaking of error recovery due to bad blocks: does anyone know whether the SATA
disks that are delivered with the Thumper have enterprise or desktop
firmware/settings by default? If I'm not mistaken, one of the differences is
that the enterprise variant gives up on bad blocks more quickly and reports
them to the operating system, whereas the desktop variant will keep on
retrying forever (or almost, at least)...
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Dan Poltawski
I've just discovered patch 125205-07, which wasn't installed on our system 
because we don't have SUNWhea.

Has anyone with problems tried this patch, and has it helped at all?
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Peter Tribble
On 11/13/07, Dan Poltawski [EMAIL PROTECTED] wrote:
 I've just discovered patch 125205-07, which wasn't installed on our system 
 because we don't have SUNWhea..

 Has anyone with problems tried this patch, and has it helped at all?

We were having a pretty rough time running S10U4. While I was away on vacation
125205-06 was applied and apparently made some difference, although the
problem doesn't seem to have entirely vanished. (It's gone far enough away that
users aren't complaining, but I think we still want to put the -07 version of
the patch on when we can, and I too would like confirmation that it's helping
and hasn't introduced any other regressions.)
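
Checking whether a host already has -07 rather than -06 comes down to comparing Solaris patch IDs of the form `BASE-REV`, where the same base with a higher revision is the later release. A minimal sketch; `parse_patch_id` and `is_at_least` are hypothetical helper names.

```python
def parse_patch_id(patch_id: str) -> tuple:
    """Split an ID such as '125205-07' (or 'T127871-02') into ('125205', 7)."""
    base, rev = patch_id.lstrip("T").split("-")
    return base, int(rev)


def is_at_least(installed: str, required: str) -> bool:
    """True if `installed` is the same base patch at `required`'s revision or later."""
    i_base, i_rev = parse_patch_id(installed)
    r_base, r_rev = parse_patch_id(required)
    return i_base == r_base and i_rev >= r_rev


if __name__ == "__main__":
    print(is_at_least("125205-06", "125205-07"))  # False: -06 predates -07
    print(is_at_least("125205-07", "125205-07"))  # True
```

This deliberately ignores obsoletion: a patch with a different base ID that obsoletes 125205 would not be recognized by this comparison.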

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Lida Horn
The "reset: no matching NCQ I/O found" issue appears to be related to the
error recovery for bad blocks on the disk. In general it should be harmless,
but I have looked into this. If there is someone out there who:
1) Is hitting this issue, and
2) Is running recent Solaris Nevada bits (not Solaris 10), and
3) Is willing to try out an experimental driver,

I can provide a new binary (with which I've done some testing already)
which would appear to deal with this issue and do better and quicker error
recovery. Remember that the underlying problem still appears to be bad blocks
on the disk, so until those blocks are rewritten or mapped away there will
still be slow responses and error messages generated each and every time those
blocks are read.
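
Because a genuinely bad block produces the same error each time it is read, one rough check is to count how often each (device, block) pair is reported in /var/adm/messages. A sketch, assuming unwrapped log lines in the format quoted in this thread (a WARNING line ending in "(sdNN):" followed by an "Error Block:" line); `repeated_error_blocks` is a hypothetical name.

```python
import re
from collections import Counter

# Assumes unwrapped /var/adm/messages lines in the format quoted in this
# thread: a WARNING line ending in "(sdNN):" followed by a line containing
# "Error Block: <n>".
DEVICE_RE = re.compile(r"\((sd\d+)\):\s*$")
BLOCK_RE = re.compile(r"Error Block:\s*(\d+)")


def repeated_error_blocks(lines, min_count=2):
    """Return {(device, block): count} for blocks reported at least min_count times."""
    counts = Counter()
    device = None
    for line in lines:
        m = DEVICE_RE.search(line)
        if m:
            device = m.group(1)  # remember which sd instance the next error is for
            continue
        m = BLOCK_RE.search(line)
        if m and device is not None:
            counts[(device, int(m.group(1)))] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```

For example, `repeated_error_blocks(open("/var/adm/messages"))` lists blocks seen at least twice. Repeat offenders are candidates for the rewriting or remapping described above; a single hit proves little on its own.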

Regards,
Lida
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-08 Thread Dan Poltawski
That is interesting; again, we're having the same problem with our X4500s.

I am trying to work out what is causing the problem with NFS: restarting the
service makes it try to stop, and it never comes back up.

Rebooting the whole box fails; it just hangs until a hard reset.
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-07 Thread Kelly Kane
We have this identical problem on all 10 or so of our Thumpers. They're running 
stock Solaris 10, whatever came with them. We think it's starting to cause 
problems, as we will see a rash of those errors on one of our machines, and 
then NFS will stop serving.
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-10-28 Thread Michael
I got it too. It's a brand-new X4500 (my second eval box, after the other one
used to freeze up). I got this while running a Java program that reads a 128G
file while writing a 100G file in 2 threads with 128K blocks.

Oct 29 00:56:28 zeta1 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx2: device on port 1 reset: no matching NCQ I/O found
Oct 29 00:56:28 zeta1 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx2: device on port 1 reset: device disconnected or device error
Oct 29 00:56:28 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 00:56:28 zeta1  port 1: device reset
Oct 29 00:56:28 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 00:56:28 zeta1  port 1: link lost
Oct 29 00:56:28 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 00:56:28 zeta1  port 1: link established
Oct 29 00:56:28 zeta1 marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx2: error on port 1:
Oct 29 00:56:28 zeta1 marvell88sx: [ID 517869 kern.info] device disconnected
Oct 29 00:56:28 zeta1 marvell88sx: [ID 517869 kern.info] device connected
Oct 29 00:56:28 zeta1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd13):
Oct 29 00:56:28 zeta1   Error for Command: read(10)   Error Level: Retryable
Oct 29 00:56:28 zeta1 scsi: [ID 107833 kern.notice] Requested Block: 186994359 Error Block: 186994359
Oct 29 00:56:28 zeta1 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Oct 29 00:56:28 zeta1 scsi: [ID 107833 kern.notice] Sense Key: No Additional Sense
Oct 29 00:56:28 zeta1 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0


I did a complete reinstall of Solaris 10 U4 and then applied 
NAME: Solaris 10_x86 Recommended Patch Cluster
DATE: Oct/26/07

Release: 5.10
Kernel architecture: i86pc
Application architecture: i386
Hardware provider:
Domain:
Kernel version: SunOS 5.10 Generic_127112-02

Looks like it's back to setting sata:sata_func_enable = 0x5 in /etc/system for me.
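
The `0x5` value is a bitmask. The sketch below decodes it, assuming the flag values used by the OpenSolaris `sata` module (queuing = 0x1, NCQ = 0x2, event processing = 0x4). Those values are an assumption, not something stated in this thread, though they agree with the thread's description of this setting as the way to turn NCQ off. `decode_func_enable` is a hypothetical helper.

```python
# Assumed bit values, taken from the OpenSolaris sata module's sata.h;
# verify them against the headers for your release before relying on them.
SATA_ENABLE_QUEUING = 0x1
SATA_ENABLE_NCQ = 0x2
SATA_ENABLE_PROCESS_EVENTS = 0x4

FLAG_NAMES = {
    SATA_ENABLE_QUEUING: "queuing",
    SATA_ENABLE_NCQ: "ncq",
    SATA_ENABLE_PROCESS_EVENTS: "process_events",
}


def decode_func_enable(value: int) -> list:
    """Return the names of the flags set in a sata_func_enable value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


if __name__ == "__main__":
    print(decode_func_enable(0x7))  # ['queuing', 'ncq', 'process_events']
    print(decode_func_enable(0x5))  # ['queuing', 'process_events']: NCQ cleared
```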
 
 


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-10-28 Thread Willi Burmeister
Hi,

we have the same problem. Our X4500 has Solaris 10 11/06 and (nearly) 
every kernel- and driver-related patch installed. 

Nothing is set in /etc/system.

fmdump is not showing any errors:

--
# fmdump
TIME UUID SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty
--

from /var/adm/messages:
--
Oct 29 04:49:49 celeborn marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx4: device on port 1 reset: DMA command timeout
Oct 29 04:49:49 celeborn sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 04:49:49 celeborn  port 1: device reset
Oct 29 04:49:49 celeborn marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx4: device on port 1 reset: device disconnected or device error
Oct 29 04:49:49 celeborn sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 04:49:49 celeborn  port 1: device reset
Oct 29 04:49:49 celeborn sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 04:49:49 celeborn  port 1: link lost
Oct 29 04:49:49 celeborn sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 29 04:49:49 celeborn  port 1: link established
Oct 29 04:49:49 celeborn marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx4: error on port 1:
Oct 29 04:49:49 celeborn marvell88sx: [ID 517869 kern.info] device disconnected
Oct 29 04:49:49 celeborn marvell88sx: [ID 517869 kern.info] device connected
Oct 29 04:49:49 celeborn scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd15):
Oct 29 04:49:49 celeborn   Error for Command: write(10)   Error Level: Retryable
Oct 29 04:49:49 celeborn scsi: [ID 107833 kern.notice]  Requested Block: 107400869 Error Block: 107400869
Oct 29 04:49:49 celeborn scsi: [ID 107833 kern.notice]  Vendor: ATA Serial Number: 
Oct 29 04:49:49 celeborn scsi: [ID 107833 kern.notice]  Sense Key: No Additional Sense
Oct 29 04:49:49 celeborn scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
--

--
# pca -l missing
Using /usr/local/etc/patchdiag.xref from Oct/18/07
Host: celeborn (SunOS 5.10/Generic_127112-01/i386/i86pc)

Patch  IR   CR RSB Age Synopsis
------ -- - -- --- --- -------------------------------------------------
125333 01 < 02 -S-  14 JDS 3_x86: Macromedia Flash Player Plugin Patch
125546 -- < 01 ---  19 GNOME 2.6.0_x86: GNOME Performance Meter
127733 -- < 01 ---  11 SunOS 5.10_x86: sd Patch
127748 -- < 01 ---  11 SunOS 5.10_x86: pciehpc patch
127887 -- < 01 ---  11 SunOS 5.10_x86: ipf patch
--


Willi



Re: [zfs-discuss] X4500 device disconnect problem persists

2007-10-27 Thread Lida Horn
Stuart Anderson wrote:
 After applying 125205-07 on two X4500 machines running Sol10U4 and
 removing set sata:sata_func_enable = 0x5 from /etc/system to
 re-enable NCQ, I am again observing drive disconnect error messages.
 This in spite of the patch description which claims multiple fixes
 in this area:

 6587133 repeated DMA command timeouts and device resets on x4500
 6538627 x4500 message logs contain multiple device disk resets but nothing 
 logged in FMA
 6564956 Disparity error for marvell88sx3 was shown during boot-time
 for example,

 Has anyone else had any better luck with this?
   
I have never seen this before. Please let me know all the patches you have
added to your machine. It would appear that you are having some sort of
hardware issue, but apparently you provided only part of what was in
/var/adm/messages below, which makes it hard to say for certain. What you have
below may all be related to a single device error (note the time stamps). Are
you saying this occurs over and over again?

By the way, only the fix for CR 6587133 deals with repeated device resets.
The fix for CR 6538627 just added the reason for the reset to the logged
message, and the fix for CR 6564956 removed the single reset that used to be
required right after boot. That reset happened only once per port per boot.
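
The point about the time stamps can be checked mechanically: several "reset:" notices for the same controller and port within the same second most likely describe one event, so counting distinct (timestamp, controller, port) keys says more than counting lines. A rough sketch, assuming unwrapped `marvell88sx` lines in the format quoted in this thread; `reset_episodes` is a hypothetical helper, and grouping by the one-second timestamp is a simplification.

```python
import re

# Matches lines such as:
# "Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE:
#  marvell88sx3: device on port 1 reset: no matching NCQ I/O found"
# (one unwrapped line in the real log).
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+marvell88sx:.*?"
    r"(marvell88sx\d+): device on port (\d+) reset:"
)


def reset_episodes(lines):
    """Group reset notices by (timestamp, controller, port).

    Several reset lines in the same second for the same port count as one
    episode; many distinct keys suggest a recurring problem.
    """
    episodes = {}
    for line in lines:
        m = RESET_RE.search(line.strip())
        if m:
            key = (m.group(1), m.group(2), int(m.group(3)))
            episodes.setdefault(key, []).append(line)
    return episodes
```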

Regards,
Lida


 Thanks.


 Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: no matching NCQ I/O found
 Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: device disconnected or device error
 Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Oct 26 16:25:34 thumper2  port 1: device reset
 Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Oct 26 16:25:34 thumper2  port 1: link lost
 Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Oct 26 16:25:34 thumper2  port 1: link established
 Oct 26 16:25:34 thumper2 marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 1:
 Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info] device disconnected
 Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info] device connected
 Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd25):
 Oct 26 16:25:34 thumper2   Error for Command: read(10)   Error Level: Retryable
 Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 521002402 Error Block: 521002402
 Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA Serial Number: 
 Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: No Additional Sense
 Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

   



[zfs-discuss] X4500 device disconnect problem persists

2007-10-26 Thread Stuart Anderson
After applying 125205-07 on two X4500 machines running Sol10U4 and
removing set sata:sata_func_enable = 0x5 from /etc/system to
re-enable NCQ, I am again observing drive disconnect error messages.
This is in spite of the patch description, which claims multiple fixes
in this area:

6587133 repeated DMA command timeouts and device resets on x4500
6538627 x4500 message logs contain multiple device disk resets but nothing logged in FMA
6564956 Disparity error for marvell88sx3 was shown during boot-time
for example,

Has anyone else had any better luck with this?


Thanks.


Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: no matching NCQ I/O found
Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: device disconnected or device error
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: device reset
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link lost
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link established
Oct 26 16:25:34 thumper2 marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 1:
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info] device disconnected
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info] device connected
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd25):
Oct 26 16:25:34 thumper2   Error for Command: read(10)   Error Level: Retryable
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 521002402 Error Block: 521002402
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA Serial Number: 
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: No Additional Sense
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson