[zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

Edward Ned Harvey Sat, 10 Apr 2010 08:09:37 -0700

Due to recent experiences, and discussion on this list, my colleague and I
performed some tests:


 

Using Solaris 10, fully upgraded.  (zpool version 15 is the latest there,
which does not have the log device removal that was introduced in zpool
version 19.)  If you lose an unmirrored log device in any way, the OS will
crash, and the whole zpool is permanently gone, even after reboots.

 

Using OpenSolaris, upgraded to latest, which includes zpool version 22.  (Or
was it 23?  I forget now.)  Anyway, it's >=19 so it has log device removal.
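
Since the difference between the two outcomes comes down to the pool version, it may help to check it before relying on an unmirrored log device. The following is only a sketch: the function name is made up for illustration, the threshold of 19 is taken from this post, and it assumes "zpool get version" prints the usual NAME / PROPERTY / VALUE / SOURCE columns with a numeric value.

```shell
# Hypothetical helper: succeeds if the pool's on-disk version is >= 19,
# the version this post says introduced log device removal.
log_removal_supported() {
    # Assumes output such as:
    #   NAME      PROPERTY  VALUE    SOURCE
    #   junkpool  version   22       default
    pool_version=$(zpool get version "$1" | awk '$2 == "version" { print $3 }')

    # Refuse to guess when the value is missing or not a plain number.
    case "$pool_version" in
        ''|*[!0-9]*) return 2 ;;
    esac

    [ "$pool_version" -ge 19 ]
}
```

Usage would look like "log_removal_supported junkpool && echo ok".  Running "zpool upgrade -v" lists the versions the installed software supports.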

1. Created a pool with an unmirrored log device.
2. Started a benchmark of sync writes, and verified the log device was
   getting heavily used.
3. Yanked out the log device.

Behavior was good.  The pool became "degraded" which is to say, it started
using the primary storage for the ZIL, performance presumably degraded, but
the system remained operational and error free.

I was able to restore perfect health by running "zpool remove" on the failed
log device, and then "zpool add" to attach a new log device.
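
For anyone wanting to repeat this first test, the setup and the repair could be sketched roughly as below. The function names are invented for illustration. The pool layout (two striped data disks plus one log device) is inferred from the "zpool status" output later in this post, and the replacement device c8t6d0 in the usage line is purely hypothetical.

```shell
# Step 1: create a pool from the given data disks plus one unmirrored log device.
# usage: create_test_pool POOL LOGDEV DATADEV...
create_test_pool() {
    pool=$1
    logdev=$2
    shift 2
    zpool create "$pool" "$@" log "$logdev"
}

# Step 2 (verification): print per-device I/O statistics once per second,
# so writes landing on the log device are visible while the benchmark runs.
# usage: watch_log_usage POOL
watch_log_usage() {
    zpool iostat -v "$1" 1
}

# Repair after the log device was pulled: drop the dead device, add a new one.
# usage: replace_failed_log POOL FAILED_LOGDEV NEW_LOGDEV
replace_failed_log() {
    zpool remove "$1" "$2" && zpool add "$1" log "$3"
}
```

For example: "create_test_pool junkpool c8t3d0 c8t4d0 c8t5d0", then after pulling the device, "replace_failed_log junkpool c8t3d0 c8t6d0".  The remove step only works on pool versions that have log device removal.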

 

Next:

1. Created a pool with an unmirrored log device.
2. Started a benchmark of sync writes, and verified the log device was
   getting heavily used.
3. Yanked out both power cords.
4. While the system was down, also removed the log device.

(OOoohhh, that's harsh.)  I created a situation where an unmirrored log
device is known to have unplayed records, there is an ungraceful shutdown,
*and* the device disappears.  That's the absolute worst case scenario
possible, other than the whole building burning down.  Anyway, the system
behaved as well as it possibly could.  During boot, the faulted pool did not
come up, but the OS came up fine.  My "zpool status" showed this:

 

# zpool status
  pool: junkpool
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        junkpool    FAULTED      0     0     0  bad intent log
          c8t4d0    ONLINE       0     0     0
          c8t5d0    ONLINE       0     0     0
        logs
          c8t3d0    UNAVAIL      0     0     0  cannot open

 

(---------------------------)
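
If you want a script to notice this condition, one possible approach is to scan the "zpool status" text for devices listed under "logs" that are in the UNAVAIL state. This is a rough sketch with a made-up function name; it assumes the output layout shown in this post and has not been checked against other releases.

```shell
# Hypothetical helper: reads "zpool status" output on stdin and prints the
# name of every entry in a "logs" section whose STATE column is UNAVAIL.
unavailable_log_devices() {
    awk '
        # A new pool block starts: we are no longer inside a logs section.
        $1 == "pool:" { in_logs = 0 }

        # Single-word lines ("config:", "logs", "cache", "spares") mark
        # section boundaries; only "logs" turns the flag on.
        NF == 1 { in_logs = ($1 == "logs") }

        # Device rows look like: NAME STATE READ WRITE CKSUM [message]
        in_logs && NF >= 2 && $2 == "UNAVAIL" { print $1 }
    '
}
```

Usage would be "zpool status | unavailable_log_devices".  With a mirrored log, the mirror vdev itself would also be printed if it were UNAVAIL.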

I know the unplayed log device data is lost forever.  So I clear the error,
remove the faulted log device, and acknowledge that I have lost the last few
seconds of written data, up to the system crash:

 

# zpool clear junkpool
# zpool status
  pool: junkpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        junkpool    DEGRADED     0     0     0
          c8t4d0    ONLINE       0     0     0
          c8t5d0    ONLINE       0     0     0
        logs
          c8t3d0    UNAVAIL      0     0     0  cannot open

 

# zpool remove junkpool c8t3d0
# zpool status junkpool
  pool: junkpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        junkpool    ONLINE       0     0     0
          c8t4d0    ONLINE       0     0     0
          c8t5d0    ONLINE       0     0     0
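
The recovery sequence from the transcript above could be wrapped up as follows. This is just a sketch with an invented function name. Note that it deliberately discards the unplayed intent log records, so it is only appropriate when the log device really is gone for good.

```shell
# Hypothetical helper: give up on a lost log device and bring the pool back.
# usage: discard_lost_log POOL LOGDEV
discard_lost_log() {
    # Ignore the unreadable intent log records (the last few seconds of
    # sync writes before the crash are lost at this point).
    zpool clear "$1" || return 1

    # Drop the missing log device from the pool configuration.
    zpool remove "$1" "$2" || return 1

    # Show the result; the pool should now report ONLINE.
    zpool status "$1"
}
```

With the names used in this post, that would be "discard_lost_log junkpool c8t3d0".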

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
