Re: [zfs-discuss] How can a mirror lose a file?

2010-07-28 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
> 
> This can happen if there is a failure in a common system component
> during the write (e.g., main memory, HBA, PCI bus, CPU, bridges, etc.)

I bet that's the cause. As "sol" said: "Doesn't it require the almost
impossible scenario of exactly the same sector being trashed on both
disks?"

Basically, yes. And it's time to start thinking about the ways in which
"almost impossible" isn't quite as impossible as you thought. Regular
scrubs, snapshots, and backups are your friends.
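A minimal sketch of such a routine in Python, assuming a pool named liver (from sol's post) and a hypothetical snapshot naming scheme of auto-YYYYMMDD-HHMM. It only wraps the standard zfs snapshot -r and zpool scrub commands and, by default, just prints them (a dry run). Snapshots live in the same pool as the data, so they are not a backup by themselves; copying them off the pool (for example with zfs send) is left out here.

```python
import subprocess
from datetime import datetime, timezone


def maintenance_commands(pool: str, now: datetime) -> list:
    """Build the commands for one maintenance pass (hypothetical helper)."""
    # Hypothetical naming scheme: <pool>@auto-YYYYMMDD-HHMM
    snapshot = f"{pool}@auto-{now.strftime('%Y%m%d-%H%M')}"
    return [
        # Recursive snapshot of every dataset in the pool.
        ["zfs", "snapshot", "-r", snapshot],
        # Start a scrub; zpool returns at once and the scrub runs in the background.
        ["zpool", "scrub", pool],
    ]


def run_maintenance(pool: str, dry_run: bool = True) -> None:
    for cmd in maintenance_commands(pool, datetime.now(timezone.utc)):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_maintenance("liver")  # dry run: prints the commands only
```

Run it from cron or another scheduler at an interval that suits the pool, then check zpool status -v once the scrub has finished.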

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How can a mirror lose a file?

2010-07-28 Thread Cindy Swearingen

Hi Sol,

What kind of disks?

You should be able to use the fmdump -eV command to identify when the
checksum errors occurred.

Thanks,

Cindy



On 07/28/10 13:41, sol wrote:

> Hi
>
> Having just done a scrub of a mirror, I've lost a file, and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However, the
> zpool status shows checksum errors, not I/O errors, and I'm not sure what
> that means in this case.
>
> I thought that a ZFS mirror would be the "ultimate" in protection, but it's not!
> Any ideas why, and how to protect against this in the future?
>
> (BTW, it's the official OpenSolaris 2009.06 release, snv_111b.)
>
> # zpool status -v
>   pool: liver
>  state: ONLINE
> status: One or more devices has experienced an error resulting in
>         data corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 3h31m with 1 errors
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         liver        ONLINE       0     0     1
>           mirror     ONLINE       0     0     2
>             c9d0p0   ONLINE       0     0     2
>             c10d0p0  ONLINE       0     0     2
>
> errors: Permanent errors have been detected in the following files:



Re: [zfs-discuss] How can a mirror lose a file?

2010-07-28 Thread Ian Collins

On 07/29/10 07:41 AM, sol wrote:

> Hi
>
> Having just done a scrub of a mirror, I've lost a file, and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However, the
> zpool status shows checksum errors, not I/O errors, and I'm not sure what
> that means in this case.
>
> I thought that a ZFS mirror would be the "ultimate" in protection, but it's not!
> Any ideas why, and how to protect against this in the future?

Bad memory? Use ECC memory and test it.

--
Ian.



Re: [zfs-discuss] How can a mirror lose a file?

2010-07-28 Thread Richard Elling
On Jul 28, 2010, at 12:41 PM, sol wrote:
> Having just done a scrub of a mirror, I've lost a file, and I'm curious how this
> can happen in a mirror. Doesn't it require the almost impossible scenario
> of exactly the same sector being trashed on both disks? However, the
> zpool status shows checksum errors, not I/O errors, and I'm not sure what
> that means in this case.

It means that the data read back from the disk is not what ZFS thought
it wrote.

> I thought that a ZFS mirror would be the "ultimate" in protection, but it's not!

Are you saying you would rather have the data silently corrupted?

> Any ideas why, and how to protect against this in the future?

This can happen if there is a failure in a common system component
during the write (e.g., main memory, HBA, PCI bus, CPU, bridges, etc.)
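A toy model in Python makes the point concrete. It is a hypothetical sketch, not how ZFS is implemented internally. It assumes, as ZFS does, that a block's checksum is stored apart from the block itself (in ZFS, in the parent block pointer) and that the mirror writes the same in-memory buffer to both disks; SHA-256 from hashlib stands in for whichever checksum the pool actually uses. A copy damaged on one disk is detected and repaired from the other side. A buffer damaged in memory after its checksum was computed lands identically on both disks, so every copy fails verification and the error is permanent, which would be consistent with the checksum errors counted on both c9d0p0 and c10d0p0 in sol's zpool status output.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in for the pool's checksum algorithm (assumption: SHA-256).
    return hashlib.sha256(data).digest()


class ToyMirror:
    """Hypothetical two-way mirror; checksums are kept apart from the data,
    loosely like the checksum ZFS stores in a block's parent pointer."""

    def __init__(self) -> None:
        self.sides = [{}, {}]   # one dict per disk: block id -> bytes
        self.checksums = {}     # block id -> expected checksum

    def write(self, block_id: str, buf: bytearray,
              corrupt_in_memory: bool = False) -> None:
        self.checksums[block_id] = checksum(bytes(buf))
        if corrupt_in_memory:
            # Simulate a bad common component (RAM, HBA, bus) flipping a bit
            # after the checksum was computed but before the data hit the disks.
            buf[0] ^= 0x01
        for side in self.sides:          # the same buffer goes to both disks
            side[block_id] = bytes(buf)

    def read(self, block_id: str) -> bytes:
        expected = self.checksums[block_id]
        good = next((s[block_id] for s in self.sides
                     if checksum(s[block_id]) == expected), None)
        if good is None:
            # Every copy fails verification: nothing to heal from.
            raise OSError(f"block {block_id}: checksum error on all sides (permanent error)")
        for side in self.sides:
            if checksum(side[block_id]) != expected:
                side[block_id] = good    # self-heal the bad copy
        return good


if __name__ == "__main__":
    m = ToyMirror()

    # Case 1: one disk silently corrupts its copy; the other side heals it.
    m.write("a", bytearray(b"hello"))
    m.sides[0]["a"] = b"jello"
    print(m.read("a"), m.sides[0]["a"])   # b'hello' b'hello'

    # Case 2: the buffer is damaged in memory; both copies are wrong.
    m.write("b", bytearray(b"world"), corrupt_in_memory=True)
    try:
        m.read("b")
    except OSError as err:
        print(err)  # block b: checksum error on all sides (permanent error)
```

The second case is the one a mirror cannot survive: redundancy protects against a disk returning bad data, not against the host handing both disks the same bad data.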

> (BTW, it's the official OpenSolaris 2009.06 release, snv_111b.)

On more modern releases, the details of the corruption are shown in the 
FMA dump.  However, this feature does not exist in OpenSolaris 2009.06.
 -- richard

>
> # zpool status -v
>   pool: liver
>  state: ONLINE
> status: One or more devices has experienced an error resulting in
>         data corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 3h31m with 1 errors
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         liver        ONLINE       0     0     1
>           mirror     ONLINE       0     0     2
>             c9d0p0   ONLINE       0     0     2
>             c10d0p0  ONLINE       0     0     2
>
> errors: Permanent errors have been detected in the following files:

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com





[zfs-discuss] How can a mirror lose a file?

2010-07-28 Thread sol
Hi

Having just done a scrub of a mirror, I've lost a file, and I'm curious how this
can happen in a mirror. Doesn't it require the almost impossible scenario
of exactly the same sector being trashed on both disks? However, the
zpool status shows checksum errors, not I/O errors, and I'm not sure what
that means in this case.

I thought that a ZFS mirror would be the "ultimate" in protection, but it's not!
Any ideas why, and how to protect against this in the future?

(BTW, it's the official OpenSolaris 2009.06 release, snv_111b.)

# zpool status -v
  pool: liver
 state: ONLINE
status: One or more devices has experienced an error resulting in
        data corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 3h31m with 1 errors
config:

        NAME         STATE     READ WRITE CKSUM
        liver        ONLINE       0     0     1
          mirror     ONLINE       0     0     2
            c9d0p0   ONLINE       0     0     2
            c10d0p0  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:
