On 19-Apr-09, at 10:38 AM, Uwe Dippel wrote:
casper....@sun.com wrote:
We are back at square one; or, at the subject line.
I did a zpool status -v; everything was hunky-dory.
Next, a power failure, and two hours later this is what zpool status -v thinks:
zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //etc/svc/repository-boot-20090419_174236
I know, the hard-core defenders of ZFS will repeat for the
umpteenth time that I should be grateful that ZFS can NOTICE the
problem and inform me about it.
:-)
The file is created on boot, and I assume this one was created directly
after the first boot following the power failure.
Am I correct in thinking that:
the last boot happened on 2009/04/19_17:42:36
the system hasn't rebooted since that time
Good guess, but wrong. Another two to go ... :)
Others might want to repeat that this is not supposed to happen
in the first place.
ZFS guarantees that this cannot happen, unless the hardware is
bad. "Bad" here means: the hardware doesn't do what ZFS
believes the hardware promises.
But anything can cause this:
hardware problems:
- bad memory
- bad disk
- bad disk controller
- bad power supply
software problems:
- memory corruption through any odd driver
- a bug in any part of the ZFS stack
Memory would still be a hardware problem. I remember a
particular case where ZFS continuously found checksum errors;
replacing the power supply fixed that.
Chances are that Ubuntu, as dual-boot on the same box, never finds
anything wrong, never crashes, etc.
Why should it? It isn't designed to do so.
And again, someone will inform me that this is the beauty of ZFS:
That I know of the corruption.
After a scrub, what I see is:
zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h48m with 1 errors on Sun Apr 19 19:09:26 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     1
          c1d0s0    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        <0xa6>:<0x4f002>
Which file to replace?
Have you thoroughly checked your hardware?
Why are you running a non-redundant pool?
--Toby
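For what it's worth, an error reported as <dataset>:<object> (rather than a path) can sometimes still be mapped back to a file with zdb. A rough sketch; the dataset name rpool/ROOT/opensolaris is an assumption for illustration, and zdb's output format varies between releases:

```shell
# The ids from the report are hex: 0xa6 = 166 (objset), 0x4f002 = 324610 (object).
# First, list the pool's datasets and find the one whose objset ID is 166:
zdb -d rpool

# Then dump that object in the matching dataset; if a "path" line is
# printed, it names the damaged file (dataset name assumed here):
zdb -ddddd rpool/ROOT/opensolaris 0x4f002

# If the file turns out to be expendable, remove or restore it, then
# clear the error counters and re-scrub to confirm the pool is clean:
zpool clear rpool
zpool scrub rpool
```

If the object has already been freed, zdb won't find it, and a clear plus scrub may be all that is needed.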
Seriously, what is a normal user expected to do here? No, I don't
have a backup of a file that has only recently been created; true,
at 17:42 on April 19th.
Reinstall? When everything was okay 12 hours ago, after some 30
crashes due to power failures that were, until recently,
rectified with crashes at boot, Failsafe, reboot?
A system that had been going up and down without much hassle for
1.5 years, both on OpenSolaris on UFS and on Ubuntu?
(Let's not forget the thread started with my question "Why do I
have to Failsafe so frequently after a power failure, to correct a
corrupted bootarchive?")
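The usual remedy for that corrupted-boot-archive symptom, from failsafe mode, is to rebuild the archive against the mounted boot environment. A sketch, assuming the failsafe boot has mounted (or offered to mount) the root filesystem at /a:

```shell
# From failsafe: rebuild the boot archive of the BE mounted at /a,
# then reboot normally.
bootadm update-archive -R /a
reboot
```

That rebuilds the archive so the next normal boot no longer trips over the stale copy.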
Uwe
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss