Re[3]: [zfs-discuss] zpool status and CKSUM errors

Robert Milkowski Sun, 09 Jul 2006 11:44:59 -0700

Hello Robert,

Thursday, July 6, 2006, 1:49:34 AM, you wrote:


RM> Hello Eric,

RM> Monday, June 12, 2006, 11:21:24 PM, you wrote:

ES>> I reproduced this pretty easily on a lab machine.  I've filed:

ES>> 6437568 ditto block repair is incorrectly propagated to root vdev

ES>> To track this issue.  Keep in mind that you do have a flakey
ES>> controller/lun/something.  If this had been a user data block, your data
ES>> would be gone.


RM> I believe that something else is also happening here.
RM> I can see CKSUM errors on two different servers (v240 and T2000), all
RM> on non-redundant zpools, and every time it looks like a ditto block
RM> helped - hey, that's just improbable.

RM> And while on T2000 from fmdump -ev I get:

RM> Jul 05 19:59:43.8786 ereport.io.fire.pec.btp  0x14e4b8015f612002
RM> Jul 05 20:05:28.9165 ereport.io.fire.pec.re   0x14e5f951ce12b002
RM> Jul 05 20:05:58.5381 ereport.io.fire.pec.re   0x14e614e78f4c9002
RM> Jul 05 20:05:58.5389 ereport.io.fire.pec.btp  0x14e614e7b6ddf002
RM> Jul 05 23:34:11.1960 ereport.io.fire.pec.re   0x1513869a6f7a6002
RM> Jul 05 23:34:11.1967 ereport.io.fire.pec.btp  0x1513869a95196002
RM> Jul 06 00:09:17.1845 ereport.io.fire.pec.re   0x151b2fca4c988002
RM> Jul 06 00:09:17.1852 ereport.io.fire.pec.btp  0x151b2fca72e6b002
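
To see at a glance how the fire events above split between classes, the
listing can be tallied with a few lines of Python. This is only a rough
sketch: it assumes the fmdump output has been saved as plain text, and the
name count_ereports is made up for illustration. It ignores anything on a
line other than the first field starting with "ereport.", so quote prefixes
and wrapped ENA values do not disturb the count.

```python
from collections import Counter

# Sample copied from the T2000 listing above, one event per line.
SAMPLE = """\
Jul 05 19:59:43.8786 ereport.io.fire.pec.btp  0x14e4b8015f612002
Jul 05 20:05:28.9165 ereport.io.fire.pec.re   0x14e5f951ce12b002
Jul 05 20:05:58.5381 ereport.io.fire.pec.re   0x14e614e78f4c9002
Jul 05 20:05:58.5389 ereport.io.fire.pec.btp  0x14e614e7b6ddf002
Jul 05 23:34:11.1960 ereport.io.fire.pec.re   0x1513869a6f7a6002
Jul 05 23:34:11.1967 ereport.io.fire.pec.btp  0x1513869a95196002
Jul 06 00:09:17.1845 ereport.io.fire.pec.re   0x151b2fca4c988002
Jul 06 00:09:17.1852 ereport.io.fire.pec.btp  0x151b2fca72e6b002
"""


def count_ereports(text):
    """Count events per ereport class in saved fmdump output (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        # Take the first whitespace-separated field that names an ereport class.
        for field in line.split():
            if field.startswith("ereport."):
                counts[field] += 1
                break
    return counts


if __name__ == "__main__":
    for cls, n in count_ereports(SAMPLE).most_common():
        print(n, cls)
```

The .re and .btp events arrive within milliseconds of each other in three
of the four later pairs, which is easier to spot once the classes are
counted separately.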


RM> on the v240 fmdump shows nothing for over a month, and I'm sure I ran
RM> zpool clear on that server since then.


RM> v240:
RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM> status: One or more devices has experienced an unrecoverable error.  An
RM>         attempt was made to correct the error.  Applications are unaffected.
RM> action: Determine if the device needs to be replaced, and clear the errors
RM>         using 'zpool clear' or replace the device with 'zpool replace'.
RM>    see: http://www.sun.com/msg/ZFS-8000-9P
RM>  scrub: none requested
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0   167
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0   167

RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool clear nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM>  scrub: none requested
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0     0
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool scrub nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM>  scrub: scrub in progress, 0.01% done, 269h24m to go
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0     0
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

RM> errors: No known data errors
RM> bash-3.00#

RM> We'll see the result - I hope I won't have to stop it in the
RM> morning. Anyway, I have a feeling that nothing will be reported.


RM> ps. I've got several similar pools on those two servers and I see
RM> CKSUM errors on all of them with the same result - it's almost
RM> impossible.


OK, the scrub actually took several days to complete.
During the scrub I already saw some CKSUM errors, and now there are
many of them again; however, the scrub itself reported no errors at all.

bash-3.00# zpool status nfs-s5-s7
  pool: nfs-s5-s7
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Sun Jul  9 02:56:19 2006
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0    18
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0    18

errors: No known data errors
bash-3.00#
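
Since the counters keep coming back after every zpool clear, it may be
worth logging them at intervals and comparing against the fmdump
timestamps. Below is a minimal sketch that pulls the READ/WRITE/CKSUM
columns out of saved zpool status text. The name parse_error_counters is
hypothetical, and the code assumes plain integer counters laid out as in
the output above.

```python
# Sample copied from the zpool status output above (status/action lines omitted).
SAMPLE = """\
  pool: nfs-s5-s7
 state: ONLINE
 scrub: scrub completed with 0 errors on Sun Jul  9 02:56:19 2006
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0    18
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0    18

errors: No known data errors
"""


def parse_error_counters(text):
    """Return {name: (read, write, cksum)} for each row of the config table."""
    counters = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        # The table starts right after the column header line.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        # A data row is: name, state, then three integer counters.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            counters[fields[0]] = tuple(int(f) for f in fields[2:])
        elif fields and fields[0].startswith("errors:"):
            break
    return counters


if __name__ == "__main__":
    for name, (rd, wr, ck) in parse_error_counters(SAMPLE).items():
        print(name, "READ", rd, "WRITE", wr, "CKSUM", ck)
```

Run from cron with a timestamp per sample, something like this would show
whether the CKSUM count grows steadily or jumps at particular times.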


-- 
Best regards,
 Robert                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
