Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-04-22 Thread Haudy Kazemi

Brad Hill wrote:

I've seen reports of a recent Seagate firmware update
bricking drives again.

What's the output of 'zpool import' from the LiveCD?  It sounds like
more than 1 drive is dropping off.




r...@opensolaris:~# zpool import
  pool: tank
    id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
          raidz1    DEGRADED
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  UNAVAIL  cannot open
            c6t4d0  ONLINE

  pool: rpool
    id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool     ONLINE
          c3d0s0  ONLINE
  
1.) Here's a similar report from last summer from someone running ZFS on 
FreeBSD.  No resolution there either:

raidz vdev marked faulted with only one faulted disk
http://kerneltrap.org/index.php?q=mailarchive/freebsd-fs/2008/6/15/2132754

2.) This older thread from Dec 2007, about a different raidz1 problem titled 
'Faulted raidz1 shows the same device twice', suggests trying these commands 
(see the link for the context they were run under; a sketch adapted to this 
thread's pool follows the command list):

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg13214.html

# zdb -l /dev/dsk/c18t0d0

# zpool export external
# zpool import external

# zpool clear external
# zpool scrub external
# zpool clear external
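
Adapted to this thread's pool (a sketch only, not verified on the affected 
system: the device names come from the 'zpool import' output above, and 
depending on how the disks were handed to ZFS the label may sit on the s0 
slice rather than the whole device), checking the on-disk labels would look 
something like:

# zdb -l /dev/dsk/c6t0d0
# zdb -l /dev/dsk/c6t0d0s0
(repeat for c6t1d0 through c6t4d0)

A readable pool member should report four identical labels whose pool_guid 
matches the id shown by 'zpool import' (16342816386332636568 here); missing 
or mismatched labels on the ONLINE disks would help explain the 'corrupted 
data' state.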

3.) Do you have ECC RAM? Have you verified that your memory, CPU, and 
motherboard are reliable?


4.) 'Bad exchange descriptor' (the message text Solaris uses for the EBADE 
errno) is mentioned very sparingly across the net, mostly in system error 
tables.  Also here: 
http://opensolaris.org/jive/thread.jspa?threadID=88486&tstart=165


5.) More raidz setup caveats, at least on MacOS: 
http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000346.html




Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-30 Thread Pål Baltzersen
Take the new disk out as well; a foreign/bad non-zero disk label may cause 
trouble too.

I've experienced tool core dumps with a foreign disk (partition) label, which 
could be the case here if the replacement is a recycled disk. In my case it was 
fixed by plugging the disk into a Linux desktop and blanking the label with 
'dd if=/dev/zero of=/dev/sdc bs=512 count=4', where /dev/sdc was the device it 
was assigned (linux: fdisk -l).
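
One caveat worth adding, since ZFS keeps two copies of its label at the start 
of a device and two more at the end: wiping only the first few sectors can 
leave stale labels behind. A fuller wipe on a Linux host might look like the 
sketch below (assuming /dev/sdc is the recycled disk; double-check the device 
name with 'fdisk -l' before running it):

SECTORS=$(blockdev --getsz /dev/sdc)           # disk size in 512-byte sectors
dd if=/dev/zero of=/dev/sdc bs=512 count=2048  # zero the first 1 MB (front labels, partition table)
dd if=/dev/zero of=/dev/sdc bs=512 seek=$((SECTORS - 2048)) count=2048  # zero the last 1 MB (rear labels)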


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-28 Thread Brad Hill
Yes. I have disconnected the bad disk and booted with nothing in the slot, and 
also with a known-good replacement disk on the same SATA port. Neither changes 
anything.

I'm running 2008.11 on the box and 2008.11 (snv_101b_rc2) on the LiveCD. I'll 
give booting from the latest build a shot and see if that makes any kind of 
difference.

Thanks for the suggestions.

Brad

 Just a thought, but have you physically disconnected
 the bad disk?  It's not unheard of for a bad disk to
 cause problems with others.
 
 Failing that, it's the corrupted data bit that's
 worrying me, it sounds like you may have other
 corruption on the pool (always a risk with single
 parity raid), but I'm worried that it's not giving
 you any more details as to what's wrong.
 
 Also, what version of OpenSolaris are you running?
 Could you maybe try booting off a CD of the latest
 build?  There are often improvements in the way ZFS
 copes with errors, so it's worth a try.  I don't
 think it's likely to help, but I wouldn't discount
  it.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Brad Hill
Any ideas on this? It looks like a potential bug to me, or there is something 
that I'm not seeing.

Thanks again!


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Chris Du
Did you know the 7200.11 has firmware bugs?

Go to the Seagate website to check.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Blake
I guess you could try 'zpool import -f'.  This is a pretty odd status,
I think.  I'm pretty sure raidz1 should survive a single disk failure.

Perhaps a more knowledgeable list member can explain.

On Sat, Jan 24, 2009 at 12:48 PM, Brad Hill b...@thosehills.com wrote:
 I've seen reports of a recent Seagate firmware update
 bricking drives again.

 What's the output of 'zpool import' from the LiveCD?  It sounds like
 more than 1 drive is dropping off.


 r...@opensolaris:~# zpool import
   pool: tank
     id: 16342816386332636568
  state: FAULTED
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: http://www.sun.com/msg/ZFS-8000-EY
 config:

         tank        FAULTED  corrupted data
           raidz1    DEGRADED
             c6t0d0  ONLINE
             c6t1d0  ONLINE
             c6t2d0  ONLINE
             c6t3d0  UNAVAIL  cannot open
             c6t4d0  ONLINE

   pool: rpool
     id: 9891756864015178061
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: http://www.sun.com/msg/ZFS-8000-EY
 config:

         rpool     ONLINE
           c3d0s0  ONLINE


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Brad Hill
r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hoping someone has seen that before... the Google is seriously letting me down 
on that one.
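
One way to get more detail on where the import is dying (a sketch, assuming 
the stock truss utility and zdb's -e option, which reads the on-disk state of 
a pool that is not imported):

r...@opensolaris:~# truss -f -o /tmp/import.truss zpool import -f tank
r...@opensolaris:~# zdb -e tank

The tail of /tmp/import.truss should show the last system calls before the 
abort, and zdb -e walks the pool's on-disk metadata without importing it, 
which may point at the damaged piece.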

 I guess you could try 'zpool import -f'.  This is a pretty odd status,
 I think.  I'm pretty sure raidz1 should survive a single disk failure.

 Perhaps a more knowledgeable list member can explain.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Blake
This is outside the scope of my knowledge/experience.  Maybe there is
now a core file you can examine?  That might help you at least see
what's going on?
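
A couple of stock (Open)Solaris tools can pull a stack trace out of that core 
file (a sketch; it assumes the dump landed in the current directory under the 
default name 'core'):

# pstack core        (quick user-level stack of the aborted zpool process)
# mdb core
> ::status           (shows what terminated the process)
> $C                 (stack backtrace with frame pointers)
> ::quit

That stack should at least show which part of the import path hit the 'Bad 
exchange descriptor' error.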

On Tue, Jan 27, 2009 at 10:32 PM, Brad Hill b...@thosehills.com wrote:
 r...@opensolaris:~# zpool import -f tank
 internal error: Bad exchange descriptor
 Abort (core dumped)

 Hoping someone has seen that before... the Google is seriously letting me 
 down on that one.

 I guess you could try 'zpool import -f'.  This is a pretty odd status,
 I think.  I'm pretty sure raidz1 should survive a single disk failure.

 Perhaps a more knowledgeable list member can explain.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Brad Hill
I do, thank you. The disk that went out sounds like it had a head crash or some 
such: loud clicking shortly after spin-up, then it spins down and gives me 
nothing. The BIOS doesn't even detect it properly, so I can't do a firmware 
update.


 Did you know the 7200.11 has firmware bugs?

 Go to the Seagate website to check.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-27 Thread Ross
Just a thought, but have you physically disconnected the bad disk?  It's not 
unheard of for a bad disk to cause problems with others.

Failing that, it's the corrupted data bit that's worrying me, it sounds like 
you may have other corruption on the pool (always a risk with single parity 
raid), but I'm worried that it's not giving you any more details as to what's 
wrong.

Also, what version of OpenSolaris are you running?  Could you maybe try booting 
off a CD of the latest build?  There are often improvements in the way ZFS 
copes with errors, so it's worth a try.  I don't think it's likely to help, but 
I wouldn't discount it.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-24 Thread Brad Hill
 I've seen reports of a recent Seagate firmware update
 bricking drives again.
 
 What's the output of 'zpool import' from the LiveCD?  It sounds like
 more than 1 drive is dropping off.


r...@opensolaris:~# zpool import
  pool: tank
    id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
          raidz1    DEGRADED
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  UNAVAIL  cannot open
            c6t4d0  ONLINE

  pool: rpool
    id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool     ONLINE
          c3d0s0  ONLINE


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-23 Thread Blake
I've seen reports of a recent Seagate firmware update bricking drives again.

What's the output of 'zpool import' from the LiveCD?  It sounds like
more than 1 drive is dropping off.



On Thu, Jan 22, 2009 at 10:52 PM, Brad Hill b...@thosehills.com wrote:
 I would get a new 1.5 TB and make sure it has the new firmware and replace
 c6t3d0 right away - even if someone here comes up with a magic solution, you
 don't want to wait for another drive to fail.

 The replacement disk showed up today but I'm unable to replace the one marked 
 UNAVAIL:

 r...@blitz:~# zpool replace tank c6t3d0
 cannot open 'tank': pool is unavailable

 I would in this case also immediately export the pool (to prevent any
 write attempts) and see about a firmware update for the failed drive
 (probably need windows for this).

 While I didn't export first, I did boot with a livecd and tried to force the 
 import with that:

 r...@opensolaris:~# zpool import -f tank
 internal error: Bad exchange descriptor
 Abort (core dumped)

 Hopefully someone on this list understands what situation I am in and how to 
 resolve it. Again, many thanks in advance for any suggestions you all have to 
 offer.


Re: [zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.

2009-01-22 Thread Brad Hill
 I would get a new 1.5 TB and make sure it has the new firmware and replace
 c6t3d0 right away - even if someone here comes up with a magic solution, you
 don't want to wait for another drive to fail.

The replacement disk showed up today but I'm unable to replace the one marked 
UNAVAIL:

r...@blitz:~# zpool replace tank c6t3d0
cannot open 'tank': pool is unavailable
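
(For reference, 'zpool replace' only works against an imported pool, so the 
intended sequence, sketched here on the assumption that the import were to 
succeed, would be roughly:

r...@blitz:~# zpool import -f tank
r...@blitz:~# zpool replace tank c6t3d0
r...@blitz:~# zpool status -v tank

with the last command used to watch the resilver. Since the import itself is 
failing, the replace cannot proceed yet.)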

 I would in this case also immediately export the pool (to prevent any 
 write attempts) and see about a firmware update for the failed drive 
 (probably need windows for this).

While I didn't export first, I did boot with a livecd and tried to force the 
import with that:

r...@opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hopefully someone on this list understands what situation I am in and how to 
resolve it. Again, many thanks in advance for any suggestions you all have to 
offer.