D'oh. One more thing.

We had a problem in builds b120 through b123 that caused random checksum errors on RAIDZ configs. This info is still in the ZFS troubleshooting guide.

See if a zpool clear resolves these errors. If that works, then I would
upgrade to a more recent build and see if the problem is resolved.

If not, then see the recommendation below.



On 04/15/11 13:18, Cindy Swearingen wrote:
Hi Karl...

I just saw this same condition on another list. I think the poster
resolved it by replacing the HBA.

Drives go bad but they generally don't all go bad at once, so I would
suspect some common denominator like the HBA/controller, cables, and
so on.

See what FMA thinks by running fmdump like this:

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3

This output will tell you when the problem started.
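
For a longer fmdump log, the first occurrence can be picked out with a short script. The following is only a minimal sketch in Python, not part of FMA: parse_fmdump is a hypothetical helper name, it assumes the three-column layout shown above, and because fmdump prints no year in this format the caller has to supply one (2011 is assumed here from the date of this thread).

```python
from datetime import datetime

# Sample text copied from the fmdump output above.
SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3
"""


def parse_fmdump(text, year):
    """Return (timestamp, uuid, msg_id) tuples sorted by time.

    Hypothetical helper: assumes each event line has exactly five
    whitespace-separated fields (month, day, time, UUID, message ID).
    """
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skips the header line and blank lines
        month, day, clock, uuid, msg_id = fields
        stamp = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f"
        )
        events.append((stamp, uuid, msg_id))
    return sorted(events)


events = parse_fmdump(SAMPLE, 2011)  # year is an assumption
first_time, first_uuid, first_msg = events[0]
print("first fault:", first_time, first_msg)
print("total faults:", len(events))
```

The earliest timestamp is the point to compare against any hardware or cabling changes made around that time.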

Depending on what fmdump says, which probably indicates multiple drive
problems, I would run diagnostics on the HBA or get it replaced.

Always have good backups.



On 04/15/11 12:52, Karl Rossing wrote:

One of our zfs volumes seems to be having some errors. So I ran zpool scrub and it's currently showing the following.

-bash-3.2$ pfexec /usr/sbin/zpool status -x
  pool: vdipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 3h10m, 13.53% done, 20h16m to go

        NAME         STATE     READ WRITE CKSUM
        vdipool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c9t14d0  ONLINE       0     0    12  6K repaired
            c9t15d0  ONLINE       0     0    13  167K repaired
            c9t16d0  ONLINE       0     0    11  5.50K repaired
            c9t17d0  ONLINE       0     0    20  10K repaired
            c9t18d0  ONLINE       0     0    15  7.50K repaired
          c9t19d0    AVAIL

errors: No known data errors
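
The CKSUM column above can be tallied per device to check whether the errors are confined to one disk or spread across the whole vdev. This is a rough sketch in Python under stated assumptions: cksum_by_device is a hypothetical helper, the device-name pattern (cNtNdN) is taken from this listing only, and the counts are assumed to be plain integers (zpool status may abbreviate large counts, which this sketch does not handle).

```python
import re

# Device table copied from the zpool status output above.
STATUS = """\
        NAME         STATE     READ WRITE CKSUM
        vdipool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c9t14d0  ONLINE       0     0    12  6K repaired
            c9t15d0  ONLINE       0     0    13  167K repaired
            c9t16d0  ONLINE       0     0    11  5.50K repaired
            c9t17d0  ONLINE       0     0    20  10K repaired
            c9t18d0  ONLINE       0     0    15  7.50K repaired
"""

# Matches lines such as "c9t14d0  ONLINE  0  0  12 ..." (assumed naming scheme).
DEVICE_LINE = re.compile(r"^\s*(c\d+t\d+d\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)")


def cksum_by_device(text):
    """Map each disk name to its CKSUM error count (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(5))
    return counts


counts = cksum_by_device(STATUS)
affected = [dev for dev, n in counts.items() if n > 0]
all_affected = len(affected) == len(counts)

print("checksum errors per device:", counts)
print("every disk in the vdev affected:", all_affected)
```

When every disk in the vdev shows checksum errors, as here, a shared component is a more likely suspect than the individual drives.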

I have another server connected to the same jbod using drives c8t1d0 to c8t13d0 and it doesn't seem to have any errors.

I'm wondering how it could have gotten so screwed up?


zfs-discuss mailing list
