Hello.
Recently a disk died in one of my servers running FreeBSD 12.2
(12.2-RELEASE-p2). When it died, I got a bunch of dmesg errors saying
that a number of I/O commands were stuck, and the OS became partially
livelocked (I could still log in, but could barely do anything). So,
considering this is a mirrored pool, and "I have done it many times
before, nothing could be safer!", I sent a reset to the server via IPMI.
It was quite discouraging to find this after a successful boot from the
intact zroot. The pool was initially imported already, showing the same
devastating state; I have since exported it and tried zpool import -F:
[root@db0:~]# zpool import
   pool: data
     id: 15967028801499953224
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-5E
 config:

        data                   FAULTED  corrupted data
          9566965891719887395  FAULTED  corrupted data
          nvd0                 ONLINE
# zpool import -F data
cannot import 'data': one or more devices is currently unavailable
Well, yes, I do have a replica, and I didn't lose a single bit of data,
but it's still a tragedy to lose a pool after one silly reset (and I
have done this literally a hundred times before on various servers and
FreeBSD versions).
So, a couple of questions:
- is it worth trying FreeBSD 13 to recover the pool? (just for the
experience, to see whether it can still be recovered)
- is this more dangerous with NVMe drives, or would it also happen with
SATA SSDs or rotational drives?
- would a zpool checkpoint have saved me in this case?
Thanks.
Eugene.