> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jonathan Loran
>
> But here's what's keeping me up at night: We're running zpool v15,
> which as I understand it means if an X25e log fails, the pool is toast.
> Obviously, the log devices are not mirrored. My bad :( I guess this
> begs the first question, which is:
>
> - if the machine is running, and the log device fails, AND the failure
> is detected as such, will the ZIL roll back into the main pool drives?
> If so, are we saved?
Because you're at pool v15, it doesn't matter whether the log device fails while you're running, while you're offline and trying to come back online, or anywhere in between. If an unmirrored log device fails and the pool version is less than 19, the pool is lost. There are supposedly techniques to recover, so it's not necessarily a "data unrecoverable by any means" situation, but you certainly couldn't recover without a server crash, or at least a shutdown, and it would be a nightmare at best. The system will not "fall back" to keeping the ZIL in the main pool; that was a feature created in v19.

> - Second question, how about this: partition the two X25E drives into
> two, and then mirror each half of each drive as log devices for each
> pool. Am I missing something with this scheme? On boot, will the GUID
> for each pool get found by the system from the partitioned log drives?

I'm afraid it's too late for that, unless you're willing to destroy & recreate your pool. You cannot remove the existing log device, you cannot shrink it, and you cannot replace it with a smaller one. The only things you can do right now are:

(a) Start mirroring that log device with another device of the same size or larger; or

(b) Buy another SSD which is larger than the first. Create a slice on the 2nd which is equal to the size of the first, and mirror the first onto that slice of the 2nd. After the resilver, detach the first drive and replace it with another one of the larger drives. Slice the 3rd drive just like the 2nd, and mirror the 2nd drive's slice onto it. Now you've got a mirrored & sliced log device without any downtime, but you had to buy two drives twice the size in order to do it; or

(c) Destroy & recreate your whole pool, but learn from your mistake. This time, slice each SSD, and mirror the slices to form the log device.

BTW, ask me how I know this in such detail? Because I made the same mistake last year.
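For what it's worth, option (b) above would look roughly like the sketch below. All device names ("tank", c1t0d0, c2t0d0s0, c3t0d0s0) are stand-ins for illustration; the slices would be created beforehand with format(1M) so each slice matches the size of the original log device.

```shell
# Step 1: mirror the existing unmirrored log device (c1t0d0, the first
# X25-E) onto an equal-sized slice of the new, larger SSD.
zpool attach tank c1t0d0 c2t0d0s0

# Step 2: wait for the resilver to finish, then drop the original drive.
zpool status tank            # watch until the resilver completes
zpool detach tank c1t0d0

# Step 3: slice the third drive identically to the second, then attach
# its slice so the log device is mirrored across two sliced SSDs.
zpool attach tank c2t0d0s0 c3t0d0s0
```

The same `zpool attach` in step 1, by itself, is all option (a) amounts to: attaching a second whole device of equal or larger size to the existing log device.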
There was one interesting possibility we considered, but didn't actually implement. We are running a stripe of mirrors, so we considered breaking the mirrors and creating a new pool out of the "other halves," using the SSD properly sliced this time. With "zfs send" we would replicate all the snapshots over to the new pool, up to a very recent point in time. Then, instead of scheduling a long service window, we'd only need a very short one: shut down briefly, send that one final snapshot to the new pool, destroy the old pool, rename the new pool to take the old name, and bring the system back up. As soon as the system is up again, start mirroring and resilvering (er ... initial silvering), and of course slice the SSD before attaching the mirror.

Naturally there is some risk in running unmirrored long enough to send the snaps... and so forth. Anyway, just an option to consider.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
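The migration sketched above could go something like this. Pool names ("tank" for the old pool, "tank2" for the new one built from the detached mirror halves) and snapshot names are hypothetical:

```shell
# Initial full replication of all snapshots while the system stays up.
zfs snapshot -r tank@migrate1
zfs send -R tank@migrate1 | zfs receive -Fdu tank2

# ... later, during the short service window ...
zfs snapshot -r tank@migrate2
zfs send -R -I tank@migrate1 tank@migrate2 | zfs receive -Fdu tank2

# Retire the old pool and take over its name.
zpool destroy tank
zpool export tank2
zpool import tank2 tank

# Then re-attach the freed disks to restore the mirrors, and slice
# the SSD before attaching it as the log mirror.
```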