On Sat, Apr 10, 2010 at 10:08 AM, Edward Ned Harvey
<solar...@nedharvey.com> wrote:

>  Due to recent experiences, and discussion on this list, my colleague and
> I performed some tests:
>
>
>
> Using Solaris 10, fully upgraded.  (zpool version 15 is the latest there,
> which lacks the log device removal introduced in zpool version 19.)
> However you lose an unmirrored log device, the OS crashes and the whole
> zpool is permanently gone, even after reboots.
>
>
>
> Using OpenSolaris, upgraded to latest, which includes zpool version 22.
> (Or was it 23?  I forget now.)  Anyway, it’s >=19 so it has log device
> removal.
>
> 1.       Created a pool, with unmirrored log device.
>
> 2.       Started benchmark of sync writes, verified the log device getting
> heavily used.
>
> 3.       Yank out the log device.
>
> Behavior was good.  The pool became “degraded” which is to say, it started
> using the primary storage for the ZIL, performance presumably degraded, but
> the system remained operational and error free.
>
> I was able to restore perfect health by running “zpool remove” on the
> failed log device and then “zpool add” with a new log device.
>
>
>
> Next:
>
> 1.       Created a pool, with unmirrored log device.
>
> 2.       Started benchmark of sync writes, verified the log device getting
> heavily used.
>
> 3.       Yank out both power cords.
>
> 4.       While the system is down, also remove the log device.
>
> (OOoohhh, that’s harsh.)  I created a situation where an unmirrored log
> device is known to have unplayed records, there is an ungraceful shutdown,
> **and** the device disappears.  That’s the absolute worst case scenario
> possible, other than the whole building burning down.  Anyway, the system
> behaved as well as it possibly could.  During boot, the faulted pool did not
> come up, but the OS came up fine.  My “zpool status” showed this:
>
>
>
> # zpool status
>
>   pool: junkpool
>  state: FAULTED
> status: An intent log record could not be read.
>         Waiting for adminstrator intervention to fix the faulted pool.
> action: Either restore the affected device(s) and run 'zpool online',
>         or ignore the intent log records by running 'zpool clear'.
>    see: http://www.sun.com/msg/ZFS-8000-K4
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         junkpool    FAULTED      0     0     0  bad intent log
>           c8t4d0    ONLINE       0     0     0
>           c8t5d0    ONLINE       0     0     0
>         logs
>           c8t3d0    UNAVAIL      0     0     0  cannot open
>
>
>
> (---------------------------)
>
> I know the unplayed log device data is lost forever.  So I clear the error,
> remove the faulted log device, and acknowledge that I have lost the last few
> seconds of written data, up to the system crash:
>
>
>
> # zpool clear junkpool
> # zpool status
>
>   pool: junkpool
>  state: DEGRADED
> status: One or more devices could not be opened.  Sufficient replicas exist for
>         the pool to continue functioning in a degraded state.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-2Q
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         junkpool    DEGRADED     0     0     0
>           c8t4d0    ONLINE       0     0     0
>           c8t5d0    ONLINE       0     0     0
>         logs
>           c8t3d0    UNAVAIL      0     0     0  cannot open
>
>
>
> # zpool remove junkpool c8t3d0
> # zpool status junkpool
>
>   pool: junkpool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         junkpool    ONLINE       0     0     0
>           c8t4d0    ONLINE       0     0     0
>           c8t5d0    ONLINE       0     0     0
>
>
>


Awesome!  Thanks for letting us know the results of your tests, Ed; that's
extremely helpful.  I was actually interested in grabbing some of the
cheaper Intel SSDs for home use, but didn't want to waste my money if ZFS
wasn't going to handle the various log device failure modes gracefully.
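
For anyone who wants to script the check, here is a rough Python sketch of
the recovery sequence Ed walked through.  It only parses `zpool status`
text for a single pool and suggests commands; it does not run anything.
The function names (parse_zpool_status, suggest_commands) are made up for
illustration, and it assumes the output layout shown above and a pool
version >= 19 so that log device removal is available.

```python
def parse_zpool_status(text):
    """Return (pool, state, unavailable_log_devices) for one pool.

    Assumes the `zpool status` layout quoted in this thread.
    """
    pool = None
    state = None
    unavailable_logs = []
    in_logs = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("pool:"):
            pool = line.split(":", 1)[1].strip()
        elif line.startswith("state:"):
            state = line.split(":", 1)[1].strip()
        elif line == "logs":
            in_logs = True
        elif line in ("cache", "spares") or line.startswith("errors:"):
            # Another section started, so we are past the log devices.
            in_logs = False
        elif in_logs and line:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "UNAVAIL":
                unavailable_logs.append(fields[0])
    return pool, state, unavailable_logs


def suggest_commands(pool, state, unavailable_logs):
    """Suggest (not execute) the commands used in the tests above.

    Note: 'zpool clear' on a pool with a bad intent log discards the
    unplayed log records, i.e. the last few seconds of sync writes.
    """
    commands = []
    if state == "FAULTED" and unavailable_logs:
        commands.append(["zpool", "clear", pool])
    for device in unavailable_logs:
        commands.append(["zpool", "remove", pool, device])
    return commands
```

Fed the FAULTED output above, this should suggest `zpool clear junkpool`
followed by `zpool remove junkpool c8t3d0`; for the DEGRADED case it
suggests only the remove.  Adding a replacement log device afterwards
would be a separate `zpool add`.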

--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss