I'm seeing this in a slightly different form on Bionic/Queens. Our LVs are encrypted (thanks, Vault), and rebooting a host fairly consistently leaves at least one OSD failing to come back. The LVs themselves all appear in the listing; the difference between a working and a non-working OSD is that the non-working one is missing its block.db and block.wal symlinks.
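To spot which OSDs are in that state after a reboot (for OSDs deployed with separate db/wal LVs), a quick check along these lines can help; this is only a sketch and assumes the default /var/lib/ceph/osd layout:

# flag OSD data dirs whose block.db / block.wal links are missing or dangling
for osd in /var/lib/ceph/osd/ceph-*; do
  for link in block.db block.wal; do
    # "! -e" follows symlinks, so it catches both absent and dangling links
    if [ ! -e "$osd/$link" ]; then
      echo "$osd: $link missing or dangling"
    fi
  done
done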
See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I make the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal

this results in a permissions error when accessing the device:

bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

Note that the osd-db/osd-wal links for this OSD (dm-11/dm-12) are owned root:root, whereas the working OSDs' links are ceph:ceph. I tried changing the ownership to ceph:ceph, but that made no difference. I have also tried (using `systemctl edit lvm2-monitor.service`) adding the following delay to lvm2-monitor, but that has not changed the behaviour either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 60

https://bugs.launchpad.net/bugs/1828617

Title:
  Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Ubuntu 18.04.2 Ceph deployment, with Ceph OSD devices using LVM volumes on top of udev-based physical devices. LVM is supposed to create PVs from the devices via the /dev/disk/by-dname/ links created by udev. However, on reboot it sometimes happens (not always, it looks like a race condition) that the Ceph services cannot start and pvdisplay does not show any volumes. The /dev/disk/by-dname/ folder nevertheless contains all the necessary devices by the end of the boot process.

  The behaviour can be fixed manually by running "/sbin/lvm pvscan --cache --activate ay /dev/nvme0n1" to re-activate the LVM components, after which the services can be started.
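A possible workaround, untested and only a sketch based on the manual pvscan fix described above: run the re-activation from a small oneshot unit pulled in and ordered before ceph-osd.target. The unit name and file path below are made up; adjust to taste.

# /etc/systemd/system/ceph-lvm-pvscan.service  (hypothetical unit name)
[Unit]
Description=Re-activate LVM volumes before Ceph OSDs start (LP: #1828617 workaround)
Before=ceph-osd.target

[Service]
Type=oneshot
RemainAfterExit=yes
# same command as the manual fix; append a device argument (e.g. /dev/nvme0n1) to narrow it
ExecStart=/sbin/lvm pvscan --cache --activate ay

[Install]
WantedBy=ceph-osd.target

Followed by `systemctl daemon-reload && systemctl enable ceph-lvm-pvscan.service`. This only automates the manual pvscan step; whether it reliably avoids the race would still need testing.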