Hi.
Sorry for forwarding this, but maybe it will be more visible this way.
I really think something strange is going on here and it's
virtually impossible that I have a problem with hardware and get
CKSUM errors (many of them) only for ditto blocks.
This is a forwarded message
From: Robert Milkowski <[EMAIL PROTECTED]>
To: Robert Milkowski <[EMAIL PROTECTED]>
Date: Sunday, July 9, 2006, 8:44:16 PM
Subject: [zfs-discuss] zpool status and CKSUM errors
===8<==Original message text===
Hello Robert,
Thursday, July 6, 2006, 1:49:34 AM, you wrote:
RM> Hello Eric,
RM> Monday, June 12, 2006, 11:21:24 PM, you wrote:
ES>> I reproduced this pretty easily on a lab machine. I've filed:
ES>> 6437568 ditto block repair is incorrectly propagated to root vdev
ES>> To track this issue. Keep in mind that you do have a flakey
ES>> controller/lun/something. If this had been a user data block, your data
ES>> would be gone.
RM> I believe that something else is also happening here.
RM> I can see CKSUM errors on two different servers (v240 and T2000), all
RM> on non-redundant zpools, and every time it looks like a ditto block
RM> helped. That is just improbable.
RM> And while on T2000 from fmdump -ev I get:
RM> Jul 05 19:59:43.8786 ereport.io.fire.pec.btp 0x14e4b8015f612002
RM> Jul 05 20:05:28.9165 ereport.io.fire.pec.re  0x14e5f951ce12b002
RM> Jul 05 20:05:58.5381 ereport.io.fire.pec.re  0x14e614e78f4c9002
RM> Jul 05 20:05:58.5389 ereport.io.fire.pec.btp 0x14e614e7b6ddf002
RM> Jul 05 23:34:11.1960 ereport.io.fire.pec.re  0x1513869a6f7a6002
RM> Jul 05 23:34:11.1967 ereport.io.fire.pec.btp 0x1513869a95196002
RM> Jul 06 00:09:17.1845 ereport.io.fire.pec.re  0x151b2fca4c988002
RM> Jul 06 00:09:17.1852 ereport.io.fire.pec.btp 0x151b2fca72e6b002
RM> on v240 fmdump shows nothing for over a month and I'm sure I did zpool
RM> clear on that server later.
RM> v240:
RM> bash-3.00# zpool status nfs-s5-s7
RM> pool: nfs-s5-s7
RM> state: ONLINE
RM> status: One or more devices has experienced an unrecoverable error. An
RM> attempt was made to correct the error. Applications are unaffected.
RM> action: Determine if the device needs to be replaced, and clear the errors
RM> using 'zpool clear' or replace the device with 'zpool replace'.
RM> see: http://www.sun.com/msg/ZFS-8000-9P
RM> scrub: none requested
RM> config:
RM>         NAME                             STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                        ONLINE       0     0   167
RM>           c4t600C0FF009258F28706F5201d0  ONLINE       0     0   167
RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool clear nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM> pool: nfs-s5-s7
RM> state: ONLINE
RM> scrub: none requested
RM> config:
RM>         NAME                             STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                        ONLINE       0     0     0
RM>           c4t600C0FF009258F28706F5201d0  ONLINE       0     0     0
RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool scrub nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM> pool: nfs-s5-s7
RM> state: ONLINE
RM> scrub: scrub in progress, 0.01% done, 269h24m to go
RM> config:
RM>         NAME                             STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                        ONLINE       0     0     0
RM>           c4t600C0FF009258F28706F5201d0  ONLINE       0     0     0
RM> errors: No known data errors
RM> bash-3.00#
RM> We'll see the result - I hope I won't have to stop it in the
RM> morning. Anyway, I have a feeling that nothing will be reported.
RM> ps. I've got several similar pools on those two servers and I see
RM> CKSUM errors on all of them with the same result - it's almost
RM> impossible.
OK, the scrub actually took several days to complete.
During the scrub I already saw some CKSUM errors, and now there are
many of them again; however, the scrub itself reported no errors at all.
bash-3.00# zpool status nfs-s5-s7
pool: nfs-s5-s7
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed with 0 errors on Sun Jul 9 02:56:19 2006
config:
        NAME                             STATE     READ WRITE CKSUM
        nfs-s5-s7                        ONLINE       0     0    18
          c4t600C0FF009258F28706F5201d0  ONLINE       0     0    18
errors: No known data errors
bash-3.00#
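To watch for this pattern on several pools (a scrub that completes with 0 errors while the CKSUM counters are non-zero), here is a minimal sketch in Python. It assumes the `zpool status` text has the layout shown above; the function names are made up for illustration, and the sample text is retyped from the output above, reading the last column as 18.

```python
SAMPLE = """\
  pool: nfs-s5-s7
 state: ONLINE
 scrub: scrub completed with 0 errors on Sun Jul 9 02:56:19 2006
config:
        NAME                             STATE     READ WRITE CKSUM
        nfs-s5-s7                        ONLINE       0     0    18
          c4t600C0FF009258F28706F5201d0  ONLINE       0     0    18
errors: No known data errors
"""


def parse_zpool_status(text):
    """Return (scrub_text, {name: cksum_count}) from `zpool status` output."""
    scrub = None
    counts = {}
    in_config = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("scrub:"):
            scrub = line[len("scrub:"):].strip()
        elif line.startswith("NAME"):
            in_config = True    # header row of the config table
        elif line.startswith("errors:"):
            in_config = False   # the table ends before the errors line
        elif in_config and line:
            fields = line.split()
            # Assumed column order: NAME STATE READ WRITE CKSUM
            if len(fields) >= 5 and fields[4].isdigit():
                counts[fields[0]] = int(fields[4])
    return scrub, counts


def clean_scrub_but_cksum_errors(text):
    """True if the scrub reported 0 errors but some CKSUM counter is non-zero."""
    scrub, counts = parse_zpool_status(text)
    scrub_clean = scrub is not None and "with 0 errors" in scrub
    return scrub_clean and any(n > 0 for n in counts.values())


if __name__ == "__main__":
    print(clean_scrub_but_cksum_errors(SAMPLE))
```

In practice the text would come from running `zpool status <pool>` on each server and feeding the output to `clean_scrub_but_cksum_errors`; counters that are not plain integers are simply skipped by this sketch.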
--
Best regards,
Robert