Ryan,

We believe this is a bug, as we expect curtin to wipe the disks. In this case it fails to wipe them, which occasionally causes issues with our automation deploying Ceph on those disks. The underlying problem may be a race condition in LVM when wiping a large number of disks/VGs/LVs sequentially.

To clarify my previous testing: I was mistaken in thinking that MAAS used the commissioning OS as the ephemeral OS to deploy from. That is not the case; MAAS uses the specified deployment OS as the ephemeral image. This means all of my previous testing was actually done with Bionic on the 4.15 kernel, which supports the race-condition theory: the error does not always reproduce, and it was just a coincidence that I was changing the commissioning OS at the time.
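For what it's worth, one way our automation could tolerate this kind of transient LVM race is to retry the removal with a short back-off. This is only an illustrative sketch of that idea, not curtin's actual code; the `retry` helper and the volume-group argument are assumptions for the example.

```shell
#!/bin/sh
# Hypothetical retry wrapper: run a command up to $1 times, backing off
# between attempts, to ride out transient "device busy" style failures.
retry() {
    tries=$1; shift
    n=0
    until "$@"; do
        n=$((n + 1))
        [ "$n" -ge "$tries" ] && return 1
        sleep 1   # back off; in practice one might also run `udevadm settle` here
    done
    return 0
}

# Example (commented out): remove every LV in a VG, retrying each up to
# 3 times. "$1" would be the volume group name.
# for lv in $(lvs --noheadings -o lv_path "$1"); do
#     retry 3 lvremove --force "$lv" || echo "failed to remove $lv" >&2
# done
```

Even with a wrapper like this, the real fix is for curtin to detect and surface the lvremove failure rather than continuing silently.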
I tested this morning and was able to reproduce the issue with Bionic 4.15 and Xenial 4.4; however, I have yet to reproduce it using either the Bionic or Xenial HWE kernels. I will upload the curtin logs and config from my reproducer now.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871874

Title:
  lvremove occasionally fails on nodes with multiple volumes and curtin does not catch the failure

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1871874/+subscriptions