Richard Elling wrote:

status: One or more devices has experienced an unrecoverable error.  An
  attempt was made to correct the error.  Applications are unaffected.
  NAME        STATE     READ WRITE CKSUM
  rpool       ONLINE       0     0     0
    c1d0s0    ONLINE       0     0     1
errors: No known data errors

# zpool clear rpool
# zpool status -v
pool: rpool
state: ONLINE
scrub: scrub completed after 0h47m with 0 errors on Tue Apr 14 23:53:48 2009
config:
  NAME        STATE     READ WRITE CKSUM
  rpool       ONLINE       0     0     0
    c1d0s0    ONLINE       0     0     0
errors: No known data errors

Now I wonder where that error came from. It was just a single checksum error. It didn't go away with an earlier scrub, and it seemingly left no trace of badness on the drive. Is it something serious? At the very least it looks a tad contradictory: the error is "unrecoverable", yet "Applications are unaffected.", and once cleared, no error is left at all.

Since there are "no known data errors," it was fixed, and the scrub
should succeed without errors.  You cannot conclude that the drive
is completely free of faults using scrub, you can only test the areas
of the drive which have active data.
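
(To re-check just the active data after clearing the counters, the usual sequence would be something like:

# zpool scrub rpool
# zpool status -v rpool

If a checksum error shows up again in that run, it points at a persistent problem; if nothing reappears, a transient is more likely. Unused areas of the drive are never read by a scrub, so they would need a different tool.)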

I didn't conclude that.
I conclude that when an 'unrecoverable error' is found, 'zpool clear' cannot actually recover it. Still, there was one CKSUM error before, and it wouldn't go away until the 'clear'; after the 'clear', even that one disappeared.


As Cindy notes, more detailed info is available in FMA.  But know
that ZFS can detect transient faults, as well as permanent faults,
almost anywhere in the data path.
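
(On OpenSolaris the raw error telemetry can be pulled with something like:

# fmdump -eV

which prints the individual ereports; 'fmadm faulty' would additionally list anything the diagnosis engine has actually flagged as faulted.)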

So here is the corresponding output:

Feb 16 2009 23:18:47.848442332 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
   class = ereport.io.scsi.cmd.disk.dev.uderr
   ena = 0xd0dd396561a00001
   detector = (embedded nvlist)
   nvlist version: 0
       version = 0x0
       scheme = dev
       device-path = /p...@0,0/pci1565,3...@4,1/stor...@4/d...@0,0
       devid = id1,s...@f00551e8c4980493b000551a00000
   (end detector)

   driver-assessment = fail
   op-code = 0x1a
   cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
   pkt-reason = 0x0
   pkt-state = 0x1f
   pkt-stats = 0x0
   stat-code = 0x0
un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0

   un-decode-value =
   __ttl = 0x1
   __tod = 0x499983d7 0x329233dc

Mar 27 2009 22:27:42.314752029 ereport.fs.zfs.checksum
nvlist version: 0
   class = ereport.fs.zfs.checksum
   ena = 0xb393a3ba200001
   detector = (embedded nvlist)
   nvlist version: 0
       version = 0x0
       scheme = zfs
       pool = 0xf6bd78c1d3b3c878
       vdev = 0x38287e797d1642bc
   (end detector)

   pool = rpool
   pool_guid = 0xf6bd78c1d3b3c878
   pool_context = 0
   pool_failmode = continue
   vdev_guid = 0x38287e797d1642bc
   vdev_type = disk
   vdev_path = /dev/dsk/c2d0s0
   vdev_devid = id1,c...@awdc_wd6400aaks-65a7b0=_____wd-wmasy4847131/a
   parent_guid = 0xf6bd78c1d3b3c878
   parent_type = root
   zio_err = 50
   zio_offset = 0x13a4c00000
   zio_size = 0x20000
   zio_objset = 0x13f
   zio_object = 0x20ff4
   zio_level = 0
   zio_blkid = 0xa
   __ttl = 0x1
   __tod = 0x49cce25e 0x12c2bc1d

Apr 13 2009 21:29:35.739718381 ereport.fs.zfs.checksum
nvlist version: 0
   class = ereport.fs.zfs.checksum
   ena = 0xb6afed32000001
   detector = (embedded nvlist)
   nvlist version: 0
       version = 0x0
       scheme = zfs
       pool = 0xf6bd78c1d3b3c878
       vdev = 0x38287e797d1642bc
   (end detector)

   pool = rpool
   pool_guid = 0xf6bd78c1d3b3c878
   pool_context = 0
   pool_failmode = continue
   vdev_guid = 0x38287e797d1642bc
   vdev_type = disk
   vdev_path = /dev/dsk/c1d0s0
   vdev_devid = id1,c...@awdc_wd6400aaks-65a7b0=_____wd-wmasy4847131/a
   parent_guid = 0xf6bd78c1d3b3c878
   parent_type = root
   zio_err = 50
   zio_offset = 0x421660000
   zio_size = 0x20000
   zio_objset = 0x107
   zio_object = 0x38dbf
   zio_level = 0
   zio_blkid = 0x4
   __ttl = 0x1
   __tod = 0x49e33e3f 0x2c1734ed

#
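
If it helps to pin down which file the checksum errors hit, the zio_objset/zio_object pairs in the ereports above can probably be mapped back to a filename with zdb. A rough sketch (object IDs converted from hex, the dataset name is a placeholder, and the exact zdb options may differ between builds):

# zdb -d rpool
  (look for the dataset whose ID is 319, i.e. 0x13f from the first checksum ereport)
# zdb -ddddd rpool/<that_dataset> 135156
  (object 0x20ff4; for a plain file, zdb prints its path)

The same could be repeated for the second ereport (objset 0x107, object 0x38dbf).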

So I haven't had that many errors in the last two months: three.
I'm sorry, but my question remains unanswered: where did the unrecoverable error come from, and how could it go away?

Uwe

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
