Hi all,

On our octopus-latest cluster I accidentally destroyed an up+in OSD with the 
command

  ceph-volume lvm zap /dev/DEV

It executed the dd command and then failed at the lvm commands with "device 
busy". Problem number one is that the OSD continued working fine. Hence, there 
is no indication of the corruption; it is a silent corruption. Problem number 
two - the real one - is: why does ceph-volume not check whether the OSD that 
the device belongs to is still up+in? "ceph osd destroy" does that, for 
example. I seem to remember that "ceph-volume lvm zap --osd-id" also checks, 
but I'm not sure.

Has this been changed in versions later than octopus?

I think it is extremely dangerous to provide a tool that allows the silent 
corruption of an entire ceph cluster. The corruption is only discovered on 
restart, and by then it would be too late (unless there is an unofficial 
recovery procedure somewhere).

I would prefer that ceph-volume lvm zap employ the same strict sanity checks 
as other ceph commands to avoid accidents. In my case it was a typo, one wrong 
letter.
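
To make the kind of check concrete, here is a minimal sketch of a wrapper-side 
guard. It is not how ceph-volume works internally. It assumes that 
"ceph-volume lvm list --format json" returns a mapping from OSD id to a list 
of LV entries carrying "devices" and "lv_path" fields, and that 
"ceph osd dump --format json" returns an "osds" list with "osd", "up" and "in" 
fields. The function names are hypothetical.

```python
import json
import subprocess


def osds_on_device(lvm_list, device):
    """Return the OSD ids whose LVs live on `device`.

    `lvm_list` is the parsed output of `ceph-volume lvm list --format json`
    (assumed shape: {"<osd id>": [{"devices": [...], "lv_path": ...}, ...]}).
    """
    ids = set()
    for osd_id, lvs in lvm_list.items():
        for lv in lvs:
            if device in lv.get("devices", []) or device == lv.get("lv_path"):
                ids.add(int(osd_id))
    return ids


def active_osds(osd_dump):
    """Return the ids of OSDs that are up or in.

    `osd_dump` is the parsed output of `ceph osd dump --format json`
    (assumed shape: {"osds": [{"osd": 0, "up": 1, "in": 1, ...}, ...]}).
    """
    return {o["osd"] for o in osd_dump.get("osds", [])
            if o.get("up") or o.get("in")}


def safe_to_zap(device, lvm_list, osd_dump):
    """True only if no OSD backed by `device` is still up or in."""
    return not (osds_on_device(lvm_list, device) & active_osds(osd_dump))


def run_json(*cmd):
    """Run a command and parse its stdout as JSON."""
    return json.loads(subprocess.check_output(cmd))


def guarded_zap(device):
    """Hypothetical wrapper: refuse to zap a device with a live OSD on it."""
    lvm_list = run_json("ceph-volume", "lvm", "list", "--format", "json")
    osd_dump = run_json("ceph", "osd", "dump", "--format", "json")
    if not safe_to_zap(device, lvm_list, osd_dump):
        raise SystemExit(f"refusing to zap {device}: OSD still up or in")
    subprocess.check_call(["ceph-volume", "lvm", "zap", device])
```

With a guard like this, a typo that points at a device backing a running OSD 
would abort before anything is written. A device that is not listed by 
ceph-volume at all would still be zapped, so this only covers the case 
described above.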

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
