I would like some help diagnosing permanent errors on my files. The machine in question has twelve 1TB disks connected to an Areca RAID card. I installed OpenSolaris build 134 and, according to zpool history, created the pool with:
  zpool create bigraid raidz2 c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 c4t0d6 c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3

I then backed up 806G of files to the machine and had the backup program verify them. It failed. The check is still running, but so far it has found four files where the checksum of the backup copy doesn't match the checksum of the original file. zpool status shows problems:

$ sudo zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0   536
          raidz2-0  DEGRADED     0     0 3.14K
            c4t0d0  ONLINE       0     0     0
            c4t0d1  ONLINE       0     0     0
            c4t0d2  ONLINE       0     0     0
            c4t0d3  ONLINE       0     0     0
            c4t0d4  ONLINE       0     0     0
            c4t0d5  ONLINE       0     0     0
            c4t0d6  ONLINE       0     0     0
            c4t0d7  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t1d1  ONLINE       0     0     0
            c4t1d2  ONLINE       0     0     0
            c4t1d3  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x18>
        <metadata>:<0x3a>

So it appears that one of the disks is bad, but if only one disk failed, how would a raidz2 pool develop permanent errors? The numbers in the CKSUM column keep growing, but is that just because the backup verification keeps hitting the damaged blocks as it runs?

Previous postings about permanent errors said to look at fmdump -eV, but that output has 437543 lines, and I don't really know how to interpret what I see. I did check the vdev_path with

$ fmdump -eV | grep vdev_path | sort | uniq -c

to see if the errors were confined to certain disks, but every disk in the array is listed, albeit with different frequencies:

   2189   vdev_path = /dev/dsk/c4t0d0s0
   1077   vdev_path = /dev/dsk/c4t0d1s0
   1077   vdev_path = /dev/dsk/c4t0d2s0
   1097   vdev_path = /dev/dsk/c4t0d3s0
     25   vdev_path = /dev/dsk/c4t0d4s0
     25   vdev_path = /dev/dsk/c4t0d5s0
     20   vdev_path = /dev/dsk/c4t0d6s0
   1072   vdev_path = /dev/dsk/c4t0d7s0
   1092   vdev_path = /dev/dsk/c4t1d0s0
   2222   vdev_path = /dev/dsk/c4t1d1s0
   2221   vdev_path = /dev/dsk/c4t1d2s0
   1149   vdev_path = /dev/dsk/c4t1d3s0

What should I make of this? That all the disks are bad? That seems unlikely. I found another thread, http://opensolaris.org/jive/thread.jspa?messageID=399988, where the problem finally came down to bad memory, so I'll test that. Any other suggestions?
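For what it's worth, here is what I'm planning to try next. First, breaking the fmdump events down by error class the same way I broke them down by vdev_path, to see whether these are checksum errors, I/O errors, or something else entirely. I'm assuming the class lines in the -eV output can be grepped the same way the vdev_path lines can:

$ fmdump -eV | grep "class = " | sort | uniq -c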
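Second, since zpool status shows "scrub: none requested", once the verify pass finishes I'll run a scrub so ZFS walks the whole pool itself, and then watch whether the CKSUM counts level off or keep climbing. I'm assuming a scrub is safe to run with the pool in this DEGRADED state; please say so if it isn't:

$ sudo zpool scrub bigraid
$ sudo zpool status -v bigraid    # re-run periodically to watch scrub progress and error counts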