Re: [zfs-discuss] How can a mirror lose a file?
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
>
> This can happen if there is a failure in a common system component
> during the write (e.g. main memory, HBA, PCI bus, CPU, bridges, etc.)

I bet that's the cause. Because as "sol" said ... "Doesn't it require the almost impossible scenario of exactly the same sector being trashed on both disks?"

Basically, yeah. And it's time to start thinking up ways "almost impossible" isn't quite as impossible as you thought it was.

Regular scrubs, snapshots, and backups are your friends.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How can a mirror lose a file?
Hi Sol,

What kind of disks?

You should be able to use the fmdump -eV command to identify when the checksum errors occurred.

Thanks,

Cindy

On 07/28/10 13:41, sol wrote:
> Hi
>
> Having just done a scrub of a mirror I've lost a file and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However the
> zpool status shows checksum errors not I/O errors and I'm not sure what
> that means in this case.
>
> I thought that a zfs mirror would be the "ultimate" in protection but it's
> not! Any ideas why and how to protect against this in the future?
>
> (BTW it's osol official release 2009.06 snv_111b)
>
> # zpool status -v
>   pool: liver
>  state: ONLINE
> status: One or more devices has experienced an error resulting in
>         data corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 3h31m with 1 errors
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         liver        ONLINE       0     0     1
>           mirror     ONLINE       0     0     2
>             c9d0p0   ONLINE       0     0     2
>             c10d0p0  ONLINE       0     0     2
>
> errors: Permanent errors have been detected in the following files:
Re: [zfs-discuss] How can a mirror lose a file?
On 07/29/10 07:41 AM, sol wrote:
> Hi
>
> Having just done a scrub of a mirror I've lost a file and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However the
> zpool status shows checksum errors not I/O errors and I'm not sure what
> that means in this case.
>
> I thought that a zfs mirror would be the "ultimate" in protection but it's
> not! Any ideas why and how to protect against this in the future?

Bad memory? Use ECC memory and test it.

-- 
Ian.
Re: [zfs-discuss] How can a mirror lose a file?
On Jul 28, 2010, at 12:41 PM, sol wrote:

> Having just done a scrub of a mirror I've lost a file and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However the
> zpool status shows checksum errors not I/O errors and I'm not sure what
> that means in this case.

It means that the data read back from the disk is not what ZFS thought it wrote.

> I thought that a zfs mirror would be the "ultimate" in protection but it's
> not!

Are you saying you would rather have the data silently corrupted?

> Any ideas why and how to protect against this in the future?

This can happen if there is a failure in a common system component during the write (e.g. main memory, HBA, PCI bus, CPU, bridges, etc.)

> (BTW it's osol official release 2009.06 snv_111b)

On more modern releases, the details of the corruption are shown in the FMA dump. However, this feature does not exist in OpenSolaris 2009.06.

 -- richard

>
> # zpool status -v
>   pool: liver
>  state: ONLINE
> status: One or more devices has experienced an error resulting in
>         data corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 3h31m with 1 errors
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         liver        ONLINE       0     0     1
>           mirror     ONLINE       0     0     2
>             c9d0p0   ONLINE       0     0     2
>             c10d0p0  ONLINE       0     0     2
>
> errors: Permanent errors have been detected in the following files:

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
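
To make the "common component" explanation above concrete, here is a toy Python model of a two-way mirror write. It is only a sketch and not how ZFS is implemented: the names mirrored_write and scrub are hypothetical, SHA-256 stands in for whatever checksum the pool uses, and the assumption is that the checksum is computed from the intended data before a single bit flips in a shared buffer that is then written to both disks.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Stand-in block checksum (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(block).hexdigest()


def mirrored_write(data: bytes, corrupt_in_memory: bool = False):
    """Toy two-way mirror write. Hypothetical model, not real ZFS code."""
    expected = checksum(data)  # checksum of what we intended to write
    buffer = bytearray(data)
    if corrupt_in_memory:
        # Assumed failure: one bit flips in a shared component (e.g. RAM)
        # after the checksum was computed but before the data hits disk.
        buffer[0] ^= 0x01
    # Both sides of the mirror receive the same, possibly damaged, buffer.
    disk_a = bytes(buffer)
    disk_b = bytes(buffer)
    return expected, disk_a, disk_b


def scrub(expected: str, disk_a: bytes, disk_b: bytes) -> dict:
    """Compare each copy against the stored checksum, as a scrub would."""
    ok_a = checksum(disk_a) == expected
    ok_b = checksum(disk_b) == expected
    # Repair is only possible if at least one copy still matches.
    return {"disk_a_ok": ok_a, "disk_b_ok": ok_b, "repairable": ok_a or ok_b}


healthy = scrub(*mirrored_write(b"important file contents"))
damaged = scrub(*mirrored_write(b"important file contents", corrupt_in_memory=True))
print("healthy write:", healthy)
print("damaged write:", damaged)
```

In this model neither disk misbehaves, yet both copies fail verification and there is no good copy to repair from. That matches the pattern in the status output: zero READ and WRITE errors, identical CKSUM counts on both halves of the mirror.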
[zfs-discuss] How can a mirror lose a file?
Hi

Having just done a scrub of a mirror I've lost a file and I'm curious how this can happen in a mirror. Doesn't it require the almost impossible scenario of exactly the same sector being trashed on both disks? However the zpool status shows checksum errors not I/O errors and I'm not sure what that means in this case.

I thought that a zfs mirror would be the "ultimate" in protection but it's not! Any ideas why and how to protect against this in the future?

(BTW it's osol official release 2009.06 snv_111b)

# zpool status -v
  pool: liver
 state: ONLINE
status: One or more devices has experienced an error resulting in
        data corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 3h31m with 1 errors
config:

        NAME         STATE     READ WRITE CKSUM
        liver        ONLINE       0     0     1
          mirror     ONLINE       0     0     2
            c9d0p0   ONLINE       0     0     2
            c10d0p0  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:
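
For readers who want to tell checksum errors apart from I/O errors programmatically, the following Python sketch reads the config table from the status output above. The function name error_counters is hypothetical, and the parser assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout with plain integer counters, which may not hold on other releases.

```python
SAMPLE = """\
config:

        NAME         STATE     READ WRITE CKSUM
        liver        ONLINE       0     0     1
          mirror     ONLINE       0     0     2
            c9d0p0   ONLINE       0     0     2
            c10d0p0  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:
"""


def error_counters(status_text):
    """Map each device row to its READ/WRITE/CKSUM counters.

    Assumes plain integer counters in a five-column table, as in the
    output posted in this thread.
    """
    rows = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row found; device rows follow
            continue
        if in_table and len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            name, _state, read, write, cksum = fields
            rows[name] = {"read": int(read), "write": int(write), "cksum": int(cksum)}
    return rows


counters = error_counters(SAMPLE)
# Devices that failed checksum verification without any I/O errors.
checksum_only = [
    name for name, c in counters.items()
    if c["cksum"] > 0 and c["read"] == 0 and c["write"] == 0
]
print(checksum_only)
```

On the posted output every row lands in checksum_only: the disks answered every read and write request, but the data they returned did not match the stored checksum.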