Re: [zfs-discuss] Re: Significant pauses during zfs writes

2006-08-28 Thread Michael Schuster - Sun Microsystems

Robert Milkowski wrote:

Hello Michael,

Wednesday, August 23, 2006, 12:49:28 PM, you wrote:

MSSM Roch wrote:


MSSM I sent this output offline to Roch; here are the essential ones and (first)
MSSM his reply:


So it looks like this:

6421427 netra x1 slagged by NFS over ZFS leading to long spins in the ATA driver code
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427



Are there any workarounds?
Or maybe some code not yet integrated that we could try?


not that I know of - Roch may be better informed than me though.

cheers
Michael
--
Michael Schuster  +49 89 46008-2974 / x62974
visit the online support center:  http://www.sun.com/osc/

Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] ZFS se6920

2006-08-28 Thread Robert Milkowski
Hello Wee,

Saturday, August 26, 2006, 6:43:05 PM, you wrote:

WYT Thanks to all who have responded.  I spent 2 weekends working through
WYT the best practices that Jerome recommended -- it's quite a mouthful.

WYT On 8/17/06, Roch [EMAIL PROTECTED] wrote:
 My general principles are:

 If you can, to improve your 'Availability' metrics,
 let ZFS handle one level of redundancy;

WYT Cool.  This is a good way to take advantage of the
WYT error-detection/correcting feature in ZFS.  We will definitely take
WYT this suggestion!

 For Random Read performance prefer mirrors over
 raid-z. If you use raid-z, group together a smallish
 number of volumes.

 Set up volumes that correspond to a small number of
 drives (the smallest you can bear) with a volume
 interlace in the [1M-4M] range.

WYT I have a hard time picturing this wrt the 6920 storage pool.  The
WYT internal disks in the 6920 present up to 2 VDs per array (6-7 disks
WYT each?).  The storage pool will be built from a bunch of these VDs and
WYT may be further partitioned into several volumes, each of which is
WYT presented to a ZFS host.  What should the storage profile look like?
WYT I can probably do a stripe profile since I can leave the redundancy to
WYT ZFS.

IMHO if you have a VD, make just one partition and present it as a LUN to
ZFS. Do not present several partitions from the same disks to ZFS as
different LUNs.
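
E.g. something like this - the device names are made up, just to
illustrate the two layouts Roch describes, with each VD exported whole
as one LUN:

  # mirrors: better random-read performance, ZFS handles the redundancy
  zpool create tank mirror c4t0d0 c4t1d0 mirror c4t2d0 c4t3d0

  # raid-z alternative: keep each group smallish, e.g. 4-5 LUNs per vdev
  zpool create tank raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0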


WYT To complicate matters, we are likely going to attach all our 3510s to
WYT the 6920 and use some of these for the ZFS volumes, so further
WYT restrictions may apply.  Are we better off doing a direct attach?

You can attach 3510 JBODs (I guess) directly - but currently there are
restrictions - only one host and no MPxIO. If that's OK, it looks like
you'll get better performance than going with the 3510 head unit.

PS. I did try it with MPxIO and two hosts connected, with several JBODs -
and I did see FC loop logout/login, etc.
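
If you want to double-check whether MPxIO is actually picking up the
paths, something like this should do - assuming Solaris 10 with the
stock multipathing tools, nothing ZFS-specific:

  # list multipathed logical units and how many paths each one has
  mpathadm list lu

  # for fp(7D)-attached FC devices MPxIO is controlled by mpxio-disable
  # in /kernel/drv/fp.conf ("no" means enabled)
  grep mpxio-disable /kernel/drv/fp.conf

  # enable MPxIO globally (needs a reboot afterwards)
  stmsboot -e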


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                   http://milek.blogspot.com



Re: [zfs-discuss] Re: Significant pauses during zfs writes

2006-08-28 Thread George Wilson

A fix for this should be integrated shortly.

Thanks,
George

Michael Schuster - Sun Microsystems wrote:

Robert Milkowski wrote:

Hello Michael,

Wednesday, August 23, 2006, 12:49:28 PM, you wrote:

MSSM Roch wrote:


MSSM I sent this output offline to Roch; here are the essential ones and (first)
MSSM his reply:


So it looks like this:

6421427 netra x1 slagged by NFS over ZFS leading to long spins in the ATA driver code
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421427



Are there any workarounds?
Or maybe some code not yet integrated that we could try?


not that I know of - Roch may be better informed than me though.

cheers
Michael



[zfs-discuss] Sol 10 x86_64 intermittent SATA device locks up server

2006-08-28 Thread Shannon Roddy
Hello All,

I have two SATA cards with 5 drives each in one zfs pool.  One of the
devices has been intermittently failing, and the problem is that the
entire box seems to lock up on occasion when this happens.  I currently
have the SATA cable to that device disconnected in the hopes that the
box will at least stay up for now.  This is a new build that I am
burning in, in the hopes that it will serve as some NFS space for our
Solaris boxen.  Below is the output from zpool status:

bash-3.00# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankDEGRADED 0 0 0
  raidz ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
  raidz DEGRADED 0 0 0
c2t1d0  ONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
c2t4d0  ONLINE   0 0 0
c2t5d0  UNAVAIL 4263 0  cannot open

errors: No known data errors


And below is some info from /var/adm/messages:

Aug 29 12:42:08 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   link data receive error - crc
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   link data receive error - state
Aug 29 12:42:08 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   device error
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt
Aug 29 12:42:08 localhost marvell88sx: [ID 702911 kern.notice]   EDMA self disabled
Aug 29 12:43:08 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:43:08 localhost marvell88sx: [ID 702911 kern.notice]   device disconnected
Aug 29 12:43:08 localhost marvell88sx: [ID 702911 kern.notice]   device connected
Aug 29 12:43:08 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt
Aug 29 12:43:10 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:43:10 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt
Aug 29 12:43:10 localhost marvell88sx: [ID 702911 kern.notice]   link data receive error - crc
Aug 29 12:43:10 localhost marvell88sx: [ID 702911 kern.notice]   link data receive error - state
Aug 29 12:43:11 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:43:11 localhost marvell88sx: [ID 702911 kern.notice]   device error
Aug 29 12:43:11 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt
Aug 29 12:43:11 localhost marvell88sx: [ID 702911 kern.notice]   EDMA self disabled
Aug 29 12:44:10 localhost marvell88sx: [ID 812917 kern.warning] WARNING: marvell88sx1: error on port 5:
Aug 29 12:44:10 localhost marvell88sx: [ID 702911 kern.notice]   device disconnected
Aug 29 12:44:10 localhost marvell88sx: [ID 702911 kern.notice]   device connected
Aug 29 12:44:10 localhost marvell88sx: [ID 702911 kern.notice]   SError interrupt


My question is, shouldn't it be possible for Solaris to stay up even
with an intermittent drive error?  I have a replacement drive and cable
on order to see if that fixes the problem.
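
Once the replacement arrives, my plan is simply the following - assuming
the new disk shows up under the same c2t5d0 name, corrections welcome:

  # if it is the same disk coming back, just re-open it
  zpool online tank c2t5d0

  # if the disk was physically swapped, resilver onto the new one
  zpool replace tank c2t5d0

  # then verify the pool state
  zpool status -v tank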

Thanks!
