I’m seeing a bunch of "zdb_blkptr_cb: Got error 50" messages when I run zdb, 
and checksum errors in the final summary. What do they mean? Some background:

The server has 8 x 4 TB HGST Deskstar NAS SATA drives in a raidz2 pool named 
zones, connected through an LSI 9211-8i with whatever IT firmware Joyent 
recommended last spring. (I forget the exact version.) The pool has a ZeusRAM 
slog connected to an LSI 9240-4i, flashed with the same revision of IT firmware.

It’s currently running SmartOS 20150205T075858Z and is being used for rsync 
backups.

One backup job regularly wedges the server to the point that zpool iostat 
reports no I/O and I can’t log on via SSH or on the console, and I may need to 
reset the machine to recover. I suspect that there is a problem with the ZFS 
dataset used by this one job, perhaps corruption from an earlier configuration 
of the server.

zpool scrub reported no errors:

  scan: scrub repaired 0 in 49h24m with 0 errors on Thu Feb 12 12:41:54 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        zones                      ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t5000CCA23DCD5A55d0  ONLINE       0     0     0
            c0t5000CCA23DCD5AD7d0  ONLINE       0     0     0
            c0t5000CCA23DCD27C2d0  ONLINE       0     0     0
            c0t5000CCA23DCD59C5d0  ONLINE       0     0     0
            c0t5000CCA23DCDDE1Fd0  ONLINE       0     0     0
            c0t5000CCA23DCDDEADd0  ONLINE       0     0     0
            c0t5000CCA23DCDE77Fd0  ONLINE       0     0     0
            c0t5000CCA23DCDFED8d0  ONLINE       0     0     0
        logs
          c3t5000A72A3005FA86d0    ONLINE       0     0     0

Then I ran

        # zdb -bbcsLvv zones

and got lots of errors like

zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, c> 
DVA[0]=<0:1722b9ed0000:2d000> [L0 ZFS plain file] fletcher4 uncompressed LE 
contiguous unique single size=20000L/20000P birth=4646796L/4646796P fill=1 
cksum=382fabd1bc7e:e16cea23bda658d:528f72ca32596b16:36dc5ff33e51e606 -- skipping

zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, 37> 
DVA[0]=<0:1722ba50c000:15000> [L0 ZFS plain file] fletcher4 lz4 LE contiguous 
unique single size=20000L/d000P birth=4646796L/4646796P fill=1 
cksum=171643231dfd:2588189f0f08af1:833c1ab9ba7956de:b022589524762816 -- skipping

etc. with final output

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
zones                     26.6T 2.45T 5.38K     0  470M     0     0     0 4.88K
  raidz2                  26.6T 2.45T 5.38K     0  470M     0     0     0 11.4K
    /dev/dsk/c0t5000CCA23DCD5A55d0s0  2.66K     0 73.7M     0     0     0    25
    /dev/dsk/c0t5000CCA23DCD5AD7d0s0  2.66K     0 73.7M     0     0     0    18
    /dev/dsk/c0t5000CCA23DCD27C2d0s0  2.66K     0 73.7M     0     0     0    19
    /dev/dsk/c0t5000CCA23DCD59C5d0s0  2.66K     0 73.7M     0     0     0    29
    /dev/dsk/c0t5000CCA23DCDDE1Fd0s0  2.66K     0 73.7M     0     0     0    21
    /dev/dsk/c0t5000CCA23DCDDEADd0s0  2.66K     0 73.7M     0     0     0    13
    /dev/dsk/c0t5000CCA23DCDE77Fd0s0  2.66K     0 73.7M     0     0     0    20
    /dev/dsk/c0t5000CCA23DCDFED8d0s0  2.66K     0 73.7M     0     0     0    19
  log /dev/dsk/c3t5000A72A3005FA86d0s0    4K 7.44G     0     0    15     0     0     0     0

What is "error 50", and what are the checksum errors reported by zdb that zpool 
scrub doesn’t pick up?
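
In case it helps anyone reproduce this, here is a rough sketch (Python, not 
part of zdb) for tallying the zdb_blkptr_cb messages per object. It assumes the 
bracketed tuple is <objset, object, level, blkid> with the block ID in hex, 
which is my reading of the output rather than something I have confirmed. The 
names BOOKMARK_RE, tally_errors and sample are my own, and sample holds only 
the two messages quoted above.

```python
import re
from collections import Counter

# Assumed layout of the bracketed tuple: <objset, object, level, blkid>,
# with blkid printed in hexadecimal. This is an assumption, not verified.
BOOKMARK_RE = re.compile(
    r"zdb_blkptr_cb: Got error (\d+) reading "
    r"<(\d+), (\d+), (-?\d+), ([0-9a-fA-F]+)>"
)


def tally_errors(lines):
    """Count error messages per (error number, objset, object)."""
    counts = Counter()
    for line in lines:
        match = BOOKMARK_RE.search(line)
        if match is None:
            continue  # continuation lines of a wrapped record are ignored
        err, objset, obj, _level, _blkid = match.groups()
        counts[(int(err), int(objset), int(obj))] += 1
    return counts


# The two messages quoted in this mail (first line of each record only).
sample = [
    "zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, c> ",
    "zdb_blkptr_cb: Got error 50 reading <400, 439542, 0, 37> ",
]

for (err, objset, obj), n in sorted(tally_errors(sample).items()):
    print(f"error {err}: objset {objset}, object {obj}: {n} block(s)")
```

Run against the full zdb log (for example, tally_errors(open(path)) with the 
path of a saved log), this should show whether the errors cluster in a single 
dataset or object, such as the one used by the problem backup job.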

Thanks,
Chris


