On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady said:
> In a recent message, I detailed the excessive checksum errors that
> occurred after replacing a disk.  It seems that after a resilver
> completes, it leaves a large number of blocks in the pool which fail
> to checksum properly.  Afterward, it is necessary to scrub the pool in
> order to correct these errors.
>
> After some testing, it seems that this only occurs with RAID-Z.  The
> same behavior can be observed on both snv_59 and snv_60, though I do
> not have any other installs to test at the moment.

A colleague at work and I followed the same steps today on an SXCE build 61
system, including running a digest on /test/file, and can confirm the exact
same (and disturbing) result. My colleague mentioned that he has witnessed
the same 'resilver' behavior on builds 57 and 60.

The box on which these steps were performed was 'luupgraded' from SXCE 60
to 61 using the SUNWlu* packages from 61.

# cat /etc/release
                        Solaris Nevada snv_61 X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 26 March 2007
# mkdir /tmp/test
# mkfile 64m /tmp/test/0 /tmp/test/1
# zpool create test raidz /tmp/test/0 /tmp/test/1
# mkfile 16m /test/file
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
#
# zpool export test
# rm /tmp/test/0
# zpool import -d /tmp/test test
# mkfile 64m /tmp/test/0
# zpool replace test /tmp/test/0
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
# zpool status test
  pool: test
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Apr 11 15:19:15 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0     0
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors
# zpool scrub test
#
# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Wed Apr 11 15:22:30 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0    17
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors

I don't think these checksum errors are a good sign.
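For anyone repeating this on more builds, it may be easier to pull the CKSUM column out of the 'zpool status' output than to eyeball it each time. Below is a minimal sketch, assuming the config table keeps the NAME STATE READ WRITE CKSUM layout shown above; cksum_errors is a hypothetical helper name of my own, not a ZFS command, and the sample input is simply the post-scrub table from this test.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): read 'zpool status' text on stdin
# and print "<device> <count>" for every row whose CKSUM column is not 0.
cksum_errors() {
    awk '
        # The config table begins at its header row.
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }
        # The "errors:" summary line marks the end of the table.
        $1 == "errors:"               { in_config = 0 }
        # Table rows have exactly five fields; the fifth is CKSUM.
        in_config && NF == 5 && $5 != "0" { print $1, $5 }
    '
}

# Demo on the post-scrub table from this test (assumed layout).
cksum_errors <<'EOF'
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0    17
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors
EOF
# prints: /tmp/test/0 17
```

On a live system the same function would presumably be used as 'zpool status test | cksum_errors', run once after the resilver and once after the scrub, so the two results can be compared across builds.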
The sha1 digest of the file *does* come out the same, so the question
arises: is the resilver process truly broken, even though in this test case
the file appears to be unchanged according to its sha1 digest?

Marco

-- 
# make mistake
make: don't know how to make mistake. Stop
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss