I’m seeing a lot of “zdb_blkptr_cb: Got error 50” messages when I run zdb,
along with checksum errors in the final summary. What do they mean? Some
background:
The server has 8 x 4 TB HGST Deskstar NAS SATA drives in a raidz2 pool named
zones, connected to an LSI 9211-8i running whatever IT firmware Joyent
recommended last spring. (I forget the exact version.) The pool has a ZeusRAM
slog connected to an LSI 9240-4i, flashed with the same revision of IT firmware.
It’s currently running SmartOS 20150205T075858Z and is being used for rsync
backups.
One backup job regularly wedges the server to the point that zpool iostat
reports no I/O and I can’t log on via SSH or on the console, and I may need to
reset the machine to recover. I suspect that there is a problem with the ZFS
dataset used by this one job, perhaps corruption from an earlier configuration
of the server.
zpool scrub reported no errors:
  scan: scrub repaired 0 in 49h24m with 0 errors on Thu Feb 12 12:41:54 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        zones                      ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t5000CCA23DCD5A55d0  ONLINE       0     0     0
            c0t5000CCA23DCD5AD7d0  ONLINE       0     0     0
            c0t5000CCA23DCD27C2d0  ONLINE       0     0     0
            c0t5000CCA23DCD59C5d0  ONLINE       0     0     0
            c0t5000CCA23DCDDE1Fd0  ONLINE       0     0     0
            c0t5000CCA23DCDDEADd0  ONLINE       0     0     0
            c0t5000CCA23DCDE77Fd0  ONLINE       0     0     0
            c0t5000CCA23DCDFED8d0  ONLINE       0     0     0
        logs
          c3t5000A72A3005FA86d0    ONLINE       0     0     0
Then I ran
# zdb -bbcsLvv zones
and got lots of errors like
zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, c>
DVA[0]=<0:1722b9ed0000:2d000> [L0 ZFS plain file] fletcher4 uncompressed LE
contiguous unique single size=20000L/20000P birth=4646796L/4646796P fill=1
cksum=382fabd1bc7e:e16cea23bda658d:528f72ca32596b16:36dc5ff33e51e606 -- skipping
zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, 37>
DVA[0]=<0:1722ba50c000:15000> [L0 ZFS plain file] fletcher4 lz4 LE contiguous
unique single size=20000L/d000P birth=4646796L/4646796P fill=1
cksum=171643231dfd:2588189f0f08af1:833c1ab9ba7956de:b022589524762816 -- skipping
and so on, ending with this summary:
                                          capacity  operations   bandwidth  ---- errors ----
description                             used avail  read write  read write  read write cksum
zones                                  26.6T 2.45T 5.38K     0  470M     0     0     0 4.88K
  raidz2                               26.6T 2.45T 5.38K     0  470M     0     0     0 11.4K
    /dev/dsk/c0t5000CCA23DCD5A55d0s0               2.66K     0 73.7M     0     0     0    25
    /dev/dsk/c0t5000CCA23DCD5AD7d0s0               2.66K     0 73.7M     0     0     0    18
    /dev/dsk/c0t5000CCA23DCD27C2d0s0               2.66K     0 73.7M     0     0     0    19
    /dev/dsk/c0t5000CCA23DCD59C5d0s0               2.66K     0 73.7M     0     0     0    29
    /dev/dsk/c0t5000CCA23DCDDE1Fd0s0               2.66K     0 73.7M     0     0     0    21
    /dev/dsk/c0t5000CCA23DCDDEADd0s0               2.66K     0 73.7M     0     0     0    13
    /dev/dsk/c0t5000CCA23DCDE77Fd0s0               2.66K     0 73.7M     0     0     0    20
    /dev/dsk/c0t5000CCA23DCDFED8d0s0               2.66K     0 73.7M     0     0     0    19
  log /dev/dsk/c3t5000A72A3005FA86d0s0    4K 7.44G     0     0    15     0     0     0     0
What is “error 50”, and what are the checksum errors that zdb reports but
zpool scrub doesn’t pick up?
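In case it helps narrow things down, here is a minimal Python sketch for
tallying the "Got error" lines from saved zdb output by dataset and object. It
is not part of zdb. It assumes the four values in the angle brackets are
objset, object, level and block id, with the block id printed in hex; the name
tally_errors and the SAMPLE list are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, c>
# Assumption: the fields are <objset, object, level, blkid>, blkid in hex.
ERROR_LINE = re.compile(
    r"zdb_blkptr_cb: Got error (\d+) reading "
    r"<(\d+), (\d+), (-?\d+), ([0-9a-fA-F]+)>"
)


def tally_errors(lines):
    """Group the failing block ids by (objset, object)."""
    blocks = defaultdict(set)
    for line in lines:
        match = ERROR_LINE.search(line)
        if match is None:
            continue  # not an error line (e.g. the DVA/cksum continuation)
        _errno, objset, obj, _level, blkid = match.groups()
        blocks[(int(objset), int(obj))].add(int(blkid, 16))
    return dict(blocks)


# The two error lines quoted above, used here as sample input.
SAMPLE = [
    "zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, c>",
    "DVA[0]=<0:1722b9ed0000:2d000> [L0 ZFS plain file] fletcher4 uncompressed LE",
    "zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, 37>",
]

for (objset, obj), blkids in sorted(tally_errors(SAMPLE).items()):
    print(f"objset {objset} object {obj}: {len(blkids)} bad block(s)")
```

On the two lines quoted above, both errors land in the same object (439542 in
objset 400), so if that holds for the full output, the damage may be confined
to a small number of files.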
Thanks,
Chris