Re: [zfs-discuss] checksum errors on Sun Fire X4500

2009-01-22 Thread Carsten Aulbert
Hi Jay,

Jay Anderson wrote:
> I have b105 running on a Sun Fire X4500, and I am constantly seeing checksum 
> errors reported by zpool status. The errors are showing up over time on every 
> disk in the pool. In normal operation there might be errors on two or three 
> disks each day, and sometimes there are enough errors so it reports "too many 
> errors," and the disk goes into a degraded state. I have had to remove the 
> spares from the pool because otherwise the spares get pulled into the pool to 
> replace the drives. There are no reported hardware problems with any of the 
> drives. I have run scrub multiple times, and this also generates checksum 
> errors. After the scrub completes the checksum errors continue to occur 
> during normal operation.
> 
> This problem also occurred with b103. Before that Solaris 10u4 was installed 
> on the server, and it never had any checksum errors. With the OpenSolaris 
> builds I am running CIFS Server, and that's the only difference in server 
> function from when Solaris 10u4 was installed on it.
> 
> Is this a known issue? Any suggestions or workarounds?

We had something similar: two or three disk slots started to act weird
and failed quite often, usually starting with a high error rate. After
we had exchanged two hard drives, the Sun hotline arranged to replace
the backplane; essentially the whole chassis was replaced.

Since then, we have not encountered anything like this anymore.

So it *might* be the backplane or a broken Marvell controller, but it's
hard to judge.
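
One way to get a bit more evidence, as a rough sketch and not a tested
tool: sum the CKSUM column of "zpool status" per controller and see
whether the errors cluster on one of them. This assumes the usual
cXtYdZ device names and that the cN part identifies the controller a
disk hangs off; the function name cksum_by_controller is made up for
this example.

```python
import re
import sys
from collections import defaultdict

# Matches vdev lines such as "    c0t1d0  ONLINE  0  0  3".
# Assumption: devices use cXtYdZ names and the line carries the
# READ, WRITE and CKSUM counters in that order.
COUNTER = r"([\d.]+[KMGT]?)"
DEVICE_LINE = re.compile(
    r"^\s*c(\d+)t\d+d\d+(?:s\d+)?\s+\S+\s+"
    + COUNTER + r"\s+" + COUNTER + r"\s+" + COUNTER
)

SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_count(text):
    """Convert a counter such as '3' or '2K' to an integer."""
    if text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def cksum_by_controller(status_text):
    """Sum the CKSUM column per controller (the cN part of cNtNdN)."""
    totals = defaultdict(int)
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            controller = "c" + match.group(1)
            totals[controller] += parse_count(match.group(4))
    return dict(totals)


if __name__ == "__main__":
    # Usage: zpool status | python cksum_by_controller.py
    for controller, total in sorted(cksum_by_controller(sys.stdin.read()).items()):
        print(controller, total)
```

If the totals are spread evenly over all controllers, a single bad
controller looks less likely; if one cN stands out, that controller or
its part of the backplane would be the first thing to look at.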

HTH

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] checksum errors on Sun Fire X4500

2009-01-22 Thread Jay Anderson
I have b105 running on a Sun Fire X4500, and I am constantly seeing checksum 
errors reported by zpool status. The errors are showing up over time on every 
disk in the pool. In normal operation there might be errors on two or three 
disks each day, and sometimes there are enough errors so it reports "too many 
errors," and the disk goes into a degraded state. I have had to remove the 
spares from the pool because otherwise the spares get pulled into the pool to 
replace the drives. There are no reported hardware problems with any of the 
drives. I have run scrub multiple times, and this also generates checksum 
errors. After the scrub completes the checksum errors continue to occur during 
normal operation.

This problem also occurred with b103. Before that Solaris 10u4 was installed on 
the server, and it never had any checksum errors. With the OpenSolaris builds I 
am running CIFS Server, and that's the only difference in server function from 
when Solaris 10u4 was installed on it.

Is this a known issue? Any suggestions or workarounds?

Thank you.
-- 
This message posted from opensolaris.org