My guess is that you have some defective hardware in the system that's 
causing bit flips in the checksum or the data payload.
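To illustrate why either kind of flip shows up the same way, here's a rough sketch of the verify-on-read idea. It uses SHA-256 purely as a stand-in; it is not claimed to be the checksum your pool uses, and `checksum`, `flip_bit` and `verify` are hypothetical helpers:

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # SHA-256 is only a stand-in; it is not claimed to be the checksum
    # this pool actually uses. Any checksum shows the same effect.
    return hashlib.sha256(block).digest()


def flip_bit(buf: bytes, bit_index: int) -> bytes:
    # Return a copy of buf with exactly one bit inverted, simulating a
    # single-bit error from faulty memory, CPU, cabling or controller.
    data = bytearray(buf)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


def verify(block: bytes, stored: bytes) -> bool:
    # On read, recompute the checksum and compare it with the stored one.
    return checksum(block) == stored


block = b"some file data written to the pool " * 8
stored = checksum(block)

print(verify(block, stored))                # True: clean read
print(verify(flip_bit(block, 42), stored))  # False: payload bit flipped
print(verify(block, flip_bit(stored, 7)))   # False: checksum bit flipped
```

Both failures look identical from the outside, so the CKSUM counter alone can't tell you whether the data or the checksum was hit.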

I'd suggest running some sort of system diagnostics for a few hours to 
see if you can locate the bad piece of hardware.

My suspicion would be your memory or CPU, but that's just a wild guess, 
based on the number of errors you have and the fact that they're spread 
over every disk in the pool, on both controllers.
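For what it's worth, here's a quick sketch that tallies the per-disk CKSUM column from the `zpool status data` output quoted below. `parse_count` and `cksum_by_device` are hypothetical helpers, and treating the "K" suffix as 1024 is an assumption (the counts are approximate either way):

```python
# Leaf-device rows (NAME STATE READ WRITE CKSUM) copied from the
# `zpool status data` output quoted below.
ZPOOL_LEAF_ROWS = """\
c0t0d0  ONLINE       0     0 10.72
c0t1d0  ONLINE       0     0 4.59K
c0t2d0  ONLINE       0     0 5.18K
c0t3d0  ONLINE       0     0 9.10K
c1t0d0  ONLINE       0     0 7.64K
c1t1d0  ONLINE       0     0 3.75K
c1t2d0  ONLINE       0     0 4.39K
c1t3d0  ONLINE       0     0 6.04K
"""


def parse_count(field: str) -> float:
    # zpool abbreviates large counts with a suffix such as "K".
    # Assumption: 1024-based scaling; the values are approximate anyway.
    scale = {"K": 1024, "M": 1024 ** 2}
    if field[-1] in scale:
        return float(field[:-1]) * scale[field[-1]]
    return float(field)


def cksum_by_device(rows: str) -> dict:
    # Map each device name to its (approximate) checksum error count.
    counts = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) == 5:  # NAME STATE READ WRITE CKSUM
            counts[fields[0]] = parse_count(fields[4])
    return counts


counts = cksum_by_device(ZPOOL_LEAF_ROWS)
affected = [dev for dev, n in counts.items() if n > 0]
# In cXtYdZ device names, the "cX" prefix identifies the controller.
controllers = sorted({dev.split("t")[0] for dev in affected})

print(f"{len(affected)} of {len(counts)} disks report checksum errors")
print("controllers involved:", controllers)
```

Every disk on both controllers is affected, which is why a single bad disk or cable seems less likely to me than something they all share.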

Could it be that you have been corrupting data for some time and just 
not known it?

Oh, and I'd also look into your disk controller and make sure that there 
are no newer patches for it, just in case it's one with a known problem 
(which was worked around in the driver).

I *think* there was an issue with at least one or two...

Cheers!

Nathan.

Sandro wrote:
> hi folks
> 
> I've been running my fileserver at home with Linux for a couple of years, and 
> last week I finally reinstalled it with Solaris 10 u4.
> 
> I borrowed a bunch of disks from a friend, copied over all the files, 
> reinstalled my fileserver and copied the data back.
> 
> Everything went fine, but after a few days quite a lot of files have become 
> corrupted.
> Here's the output:
> 
>  # zpool status data
>   pool: data
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 422 errors on Mon Feb 25 00:32:18 2008
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         data        ONLINE       0     0 5.52K
>           raidz1    ONLINE       0     0 5.52K
>             c0t0d0  ONLINE       0     0 10.72
>             c0t1d0  ONLINE       0     0 4.59K
>             c0t2d0  ONLINE       0     0 5.18K
>             c0t3d0  ONLINE       0     0 9.10K
>             c1t0d0  ONLINE       0     0 7.64K
>             c1t1d0  ONLINE       0     0 3.75K
>             c1t2d0  ONLINE       0     0 4.39K
>             c1t3d0  ONLINE       0     0 6.04K
> 
> errors: 388 data errors, use '-v' for a list
> 
> Last night I found out about this; it told me there were errors in about 50 
> files.
> So I scrubbed the whole pool, and it found a lot more corrupted files.
> 
> The temporary system I used to hold the data while installing Solaris on 
> my fileserver is running nv build 80, and there are no errors on it.
> 
> What could be causing these errors?
> I don't see any hardware errors on my disks:
> 
>  # iostat -En | grep -i error
> c3d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c4d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t0d0           Soft Errors: 574 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t0d0           Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t1d0           Soft Errors: 14 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t2d0           Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t3d0           Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t1d0           Soft Errors: 548 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t2d0           Soft Errors: 14 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t3d0           Soft Errors: 548 Hard Errors: 0 Transport Errors: 0
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> 
> There are a lot of soft errors, though.
> Linux said that one disk had gone bad, but I figured the SATA cable was 
> somehow broken, so I replaced it before installing Solaris. And Solaris 
> didn't and doesn't see any actual hardware errors on the disks, does it?
>  
>  
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss