>
> Ryan,
>   We believe this is a bug as we expect curtin to wipe the disks. In this
>   case it's failing to wipe the disks and occasionally that causes issues
>   with our automation deploying ceph on those disks.

I'm still unclear on what actual error you believe is happening.
Note that a failure of lvremove is not a fatal error from curtin's
perspective because we will be destroying data on the underlying physical
disk or partition.


Looking at your debug info:

1) your curtin-install.log does not show any failures of the lvremove
command

2) if the curtin-install-cfg.yaml is correct, then you've marked

  wipe: superblock

on all of the devices on top of which you build logical volumes. With this
setting curtin wipes the logical volume *and* the underlying device.

Even if the writes to the lv fail, or if lvremove fails, as long as the
wipe of the underlying disk/partition succeeds, the LVM metadata and
partition table on the disk will be cleared, rendering the content unusable.


Look at sda1, which holds the lv_root LV:

shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-24'
Running command ['dmsetup', 'splitname', 'vgroot-lvroot', '-c', '--noheadings', 
'--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] 
(capture=True)

# here we start wiping the logical device by writing 1M of zeros at the
# start of the device and at the end of the device
Wiping lvm logical volume: /dev/vgroot/lvroot
wiping 1M on /dev/vgroot/lvroot at offsets [0, -1048576]

# now we remove the lv device and then the vg if it's empty
using "lvremove" on vgroot/lvroot
Running command ['lvremove', '--force', '--force', 'vgroot/lvroot'] with 
allowed return codes [0] (capture=False)
  Logical volume "lvroot" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 
'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Running command ['pvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 
'vg_name,pv_name'] with allowed return codes [0] (capture=True)
Running command ['vgremove', '--force', '--force', 'vgroot'] with allowed 
return codes [0, 5] (capture=False)
  Volume group "vgroot" successfully removed

# now the vg was created from /dev/sda1, here curtin wipes the device with
# 1M of zeros at the start and end of this partition
Wiping lvm physical volume: /dev/sda1
wiping 1M on /dev/sda1 at offsets [0, -1048576]
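
To make the "offsets [0, -1048576]" lines concrete, here is a minimal
sketch of that kind of wipe: 1M of zeros at the start and 1M at the end,
with a negative offset counted from the end of the device. This is not
curtin's actual code; wipe_offsets and demo are hypothetical names, and the
demo runs against a temporary file standing in for a block device.

```python
import os
import tempfile

WIPE_SIZE = 1024 * 1024  # 1 MiB, matching "wiping 1M" in the log


def wipe_offsets(path, offsets=(0, -WIPE_SIZE), size=WIPE_SIZE):
    """Write `size` zero bytes at each offset.

    Negative offsets are counted from the end of the file/device.
    Hypothetical helper for illustration, not curtin's implementation.
    """
    zeros = b"\0" * size
    with open(path, "rb+") as fp:
        for offset in offsets:
            whence = os.SEEK_END if offset < 0 else os.SEEK_SET
            fp.seek(offset, whence)
            fp.write(zeros)
        # Push the zeros out to the underlying storage before returning.
        fp.flush()
        os.fsync(fp.fileno())


def demo():
    # Stand-in for a block device: a 4 MiB file filled with 0xff bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\xff" * (4 * WIPE_SIZE))
        path = tmp.name
    try:
        wipe_offsets(path)
        with open(path, "rb") as fp:
            return fp.read()
    finally:
        os.unlink(path)
```

After demo() the first and last 1M of the file are zeros and the middle is
untouched. Those two regions are the ones the explanation above relies on to
clear the metadata, which is why the wipe of the underlying partition matters
more than whether lvremove itself succeeded.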


In the scenario where you see the lvremove command fail, what is the outcome
on the system?  Does curtin fail the install?  Does the install succeed but
something fail after booting into the new system?  If the latter, what
commands fail, and can you show the output?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871874

Title:
  lvremove occasionally fails on nodes with multiple volumes and curtin
  does not catch the failure
