Hello.

Recently a disk died in one of my servers running 12.2 (12.2-RELEASE-p2). dmesg filled up with errors about I/O commands being stuck, and the OS became partially livelocked (I could still log in, but could barely do anything). Considering this is a mirrored pool, and "I have done it many times before, nothing could be safer!", I sent a reset to the server via IPMI.

It was quite discouraging to find the following after a successful boot from the intact zroot. (Yes, I've already tried zpool import -F after an export; initially the pool was already imported and showed the same devastating state.)


[root@db0:~]# zpool import
   pool: data
     id: 15967028801499953224
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        data                   FAULTED  corrupted data
          9566965891719887395  FAULTED  corrupted data
          nvd0                 ONLINE


# zpool import -F data
cannot import 'data': one or more devices is currently unavailable


Well, yes, I do have a replica and I didn't lose a single bit of data, but it's still a tragedy to lose a pool after one silly reset (and I have done this literally a hundred times before on various servers and FreeBSD versions).

So, a couple of questions:

- Is it worth trying FreeBSD 13 to recover the pool? (Just for the experience, to see whether it can still be recovered.)

- Is this more dangerous with NVMe drives, or would it also happen with SSDs or rotational drives?

- Would zpool checkpoint have saved me in this case?
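
To make the last question concrete: as far as I understand it, the checkpoint workflow would look roughly like the sketch below. The pool name `data` is the one from the output above; the `run` wrapper, the `DRY_RUN` variable and the function names are made up for illustration and are not part of ZFS. By default the script only prints the commands and does not touch any pool.

```shell
#!/bin/sh
# Sketch of a checkpoint-based safety net for a pool.
# DRY_RUN and run() are illustrative helpers, not ZFS features.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"     # dry run: only print the command
    else
        "$@"            # real run: execute the command
    fi
}

# Take a checkpoint of the whole pool before a risky operation.
take_checkpoint() {
    run zpool checkpoint "$1"
}

# Rewind the pool to its checkpoint; the pool has to be exported first.
rewind_to_checkpoint() {
    run zpool export "$1" &&
    run zpool import --rewind-to-checkpoint "$1"
}

# Discard the checkpoint once it is no longer needed.
discard_checkpoint() {
    run zpool checkpoint -d "$1"
}

take_checkpoint data
rewind_to_checkpoint data
discard_checkpoint data
```

Whether rewinding like this would get past a FAULTED top-level vdev is exactly what I don't know.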


Thanks.

Eugene.

