Richard Elling wrote:
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     1

errors: No known data errors
# zpool clear rpool
# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h47m with 0 errors on Tue Apr 14 23:53:48 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     0

errors: No known data errors
Now I wonder where that error came from. It was just a single
checksum error; an earlier scrub couldn't make it go away, and it
seemingly left no trace of badness on the drive. Something serious?
At least it looks a tad contradictory: the error is "unrecoverable",
yet "Applications are unaffected", and once cleared, no error is
left.
Since there are "no known data errors," it was fixed, and the scrub
should succeed without errors. You cannot conclude from a scrub that
the drive is completely free of faults; a scrub can only test the
areas of the drive which hold active data.
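That workflow can be sketched as follows (assuming the pool is named rpool, as above; a scrub reads and verifies checksums only for blocks that currently hold live data, so free space is never touched):

```shell
# Start a scrub: traverse all live data in the pool and verify checksums.
# Blocks that are not referenced by any dataset are not read at all.
zpool scrub rpool

# Watch scrub progress and the per-vdev READ/WRITE/CKSUM error counters.
zpool status -v rpool

# The counters persist across scrubs; they are only reset explicitly.
zpool clear rpool
```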
I didn't conclude that.
I conclude that when an 'unrecoverable error' is found, 'zpool clear'
cannot recover it. Still, there was one CKSUM error before, and it
wouldn't go away until the 'clear'; after the 'clear', even that one
disappeared.
As Cindy notes, more detailed info is available in FMA. But know
that ZFS can detect transient faults, as well as permanent faults,
almost anywhere in the data path.
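Those FMA details can be pulled from the fault manager's telemetry log; a sketch using the stock Solaris fmdump/fmadm tools (the class filter shown matches the ereports pasted below):

```shell
# Dump every error report (ereport) in full nvlist detail.
fmdump -eV

# Restrict the dump to ZFS checksum ereports only.
fmdump -eV -c ereport.fs.zfs.checksum

# Show any faults FMA has actually diagnosed from those ereports.
fmadm faulty
```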
So this is the corresponding output:
Feb 16 2009 23:18:47.848442332 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.uderr
        ena = 0xd0dd396561a00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /p...@0,0/pci1565,3...@4,1/stor...@4/d...@0,0
                devid = id1,s...@f00551e8c4980493b000551a00000
        (end detector)
        driver-assessment = fail
        op-code = 0x1a
        cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        stat-code = 0x0
        un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0
        un-decode-value =
        __ttl = 0x1
        __tod = 0x499983d7 0x329233dc
Mar 27 2009 22:27:42.314752029 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb393a3ba200001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf6bd78c1d3b3c878
                vdev = 0x38287e797d1642bc
        (end detector)
        pool = rpool
        pool_guid = 0xf6bd78c1d3b3c878
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x38287e797d1642bc
        vdev_type = disk
        vdev_path = /dev/dsk/c2d0s0
        vdev_devid = id1,c...@awdc_wd6400aaks-65a7b0=_____wd-wmasy4847131/a
        parent_guid = 0xf6bd78c1d3b3c878
        parent_type = root
        zio_err = 50
        zio_offset = 0x13a4c00000
        zio_size = 0x20000
        zio_objset = 0x13f
        zio_object = 0x20ff4
        zio_level = 0
        zio_blkid = 0xa
        __ttl = 0x1
        __tod = 0x49cce25e 0x12c2bc1d
Apr 13 2009 21:29:35.739718381 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb6afed32000001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf6bd78c1d3b3c878
                vdev = 0x38287e797d1642bc
        (end detector)
        pool = rpool
        pool_guid = 0xf6bd78c1d3b3c878
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x38287e797d1642bc
        vdev_type = disk
        vdev_path = /dev/dsk/c1d0s0
        vdev_devid = id1,c...@awdc_wd6400aaks-65a7b0=_____wd-wmasy4847131/a
        parent_guid = 0xf6bd78c1d3b3c878
        parent_type = root
        zio_err = 50
        zio_offset = 0x421660000
        zio_size = 0x20000
        zio_objset = 0x107
        zio_object = 0x38dbf
        zio_level = 0
        zio_blkid = 0x4
        __ttl = 0x1
        __tod = 0x49e33e3f 0x2c1734ed
#
So I have not had that many errors in the last two months: three.
I'm sorry, but my question remains unanswered: where did the
unrecoverable error come from, and how could it go away?
Uwe
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss