The issue is a failure to mount the root filesystem on a stacked setup
of LUKS on top of RAID1 when the RAID1 array is degraded (e.g., a member
is missing). In detail, the problem comes from a combination of factors:

(a) The initramfs script for cryptroot is currently present in two
initramfs stages: local-top and local-block. The problem is that if the
script fails in the local-top phase, it panics and drops to a console,
so the boot process cannot continue and subsequent scripts are not
executed automatically.

(b) The mdadm initramfs script that handles degraded arrays runs in the
local-block stage. It implements a heuristic that attempts a regular
array assembly (2/3*ROOTDELAY) times and then assembles the array as
degraded, in what is called the "poor man last resort" mechanism.
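The counter-based heuristic in (b) can be sketched as below. This is a
simplified model, not the actual mdadm hook: the function names are
hypothetical, the counter is a plain integer passed in and out rather
than whatever state the real script keeps between invocations, and the
threshold uses integer division, as a shell $(( )) expression would.

```python
from typing import Tuple


def mdadm_local_block(count: int, rootdelay: int) -> Tuple[int, str]:
    """One hypothetical invocation of mdadm's local-block hook.

    `count` is how many regular assembly attempts were already made.
    Returns (new_count, action).
    """
    threshold = 2 * rootdelay // 3  # integer arithmetic, e.g. 2 * 30 // 3 == 20
    if count < threshold:
        # Still within the grace period: try a regular (non-degraded)
        # assembly and record one more attempt.
        return count + 1, "assemble"
    # Grace period over: last resort, start the array degraded.
    return count, "run-degraded"


def runs_until_degraded(rootdelay: int) -> int:
    """Number of invocations until the hook falls back to degraded mode."""
    count, runs = 0, 0
    while True:
        runs += 1
        count, action = mdadm_local_block(count, rootdelay)
        if action == "run-degraded":
            return runs


if __name__ == "__main__":
    print(runs_until_degraded(30))  # 21
```

With the default ROOTDELAY of 30 the threshold is 20, so in this model
the array is only started degraded on the 21st invocation of the hook.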

So, the first and far more serious issue is cryptroot's early failure in
the local-top phase. The idea I implemented to fix this is to allow some
retries in the local-block stage, given that local-block should loop for
a while running its scripts (at least according to the documentation and
Debian's initramfs code). But guess what?

(c) In Ubuntu, we have wait-for-root, which, in my shallow
understanding, aims to speed up the boot. Basically, wait-for-root
consumes almost all of the ROOTDELAY time (30s by default, if not
specified), and local-block scripts run only once. Except... mdadm,
which has the previously mentioned heuristic of running 2/3*ROOTDELAY
times. To cope with that, there is a hack in initramfs-tools just for
mdadm (!), as per commit:
salsa.debian.org/kernel-team/initramfs-tools/-/commit/033c948bb0.
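A quick model shows why a counter-based fallback like the one in (b)
cannot fire when local-block runs only once. It assumes each local-block
pass invokes the mdadm hook exactly once; the function is hypothetical,
mirrors only the 2/3*ROOTDELAY threshold, and says nothing about how the
initramfs-tools commit above actually handles mdadm.

```python
def fallback_reached(passes: int, rootdelay: int = 30) -> bool:
    """Would a 2/3*ROOTDELAY counter reach its degraded fallback within
    `passes` local-block passes (one mdadm invocation per pass)?"""
    threshold = 2 * rootdelay // 3
    count = 0
    for _ in range(passes):
        if count >= threshold:
            return True  # this invocation would start the array degraded
        count += 1       # otherwise: one more regular assembly attempt
    return False


if __name__ == "__main__":
    print(fallback_reached(1))   # single pass, as with wait-for-root
    print(fallback_reached(21))  # enough passes for the fallback
```

With a single pass the counter only reaches 1 out of 20, so without
looping (or special handling) the degraded fallback never triggers.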

So, fixing cryptroot's inability to mount a root device on top of a
degraded RAID1 is a matter of coordinating mdadm and cryptroot and (if my
approach is taken) looping on local-block. Below are the steps I took to
circumvent this long-standing issue:

1) Allow cryptsetup to retry in the local-block stage, relying on a
heuristic based on ROOTDELAY (we try 1/4*ROOTDELAY times) and on
initramfs looping in the local-block phase.

2) Reduce the mdadm heuristic so that it doesn't "beat" the cryptroot
attempts, i.e., cryptroot must get more attempts than mdadm spends on
regular assembly before degrading the array. For this, we reduced the
heuristic to 1/5*ROOTDELAY.

3) Make local-block on Ubuntu loop again, while still relying on
wait-for-root as a first step. I also removed that heinous mdadm hack
from initramfs-tools; it isn't needed if local-block loops.
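The interaction between the three steps above can be checked with a toy
simulation. It is a sketch under assumptions not taken from the actual
scripts: each local-block pass runs the mdadm hook and then the cryptroot
hook once, regular assembly never succeeds because a member is missing,
cryptroot unlocks as soon as the degraded array is running, and the
number of passes is capped at ROOTDELAY. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BootState:
    """Hypothetical state shared by the simulated local-block hooks."""
    mdadm_count: int = 0         # regular assembly attempts made by mdadm
    crypt_attempts: int = 0      # unlock attempts made by cryptroot
    array_running: bool = False  # True once the RAID1 is started degraded


def mdadm_hook(s: BootState, rootdelay: int, num: int, den: int) -> None:
    # Regular assembly never succeeds here; once num/den * ROOTDELAY
    # attempts are used up, start the array degraded.
    if s.mdadm_count < num * rootdelay // den:
        s.mdadm_count += 1
    else:
        s.array_running = True


def cryptroot_hook(s: BootState, rootdelay: int) -> Optional[bool]:
    # True: unlocked; False: retry budget (ROOTDELAY/4) exhausted;
    # None: not yet, try again on the next local-block pass.
    s.crypt_attempts += 1
    if s.array_running:
        return True
    if s.crypt_attempts >= rootdelay // 4:
        return False
    return None


def simulate(rootdelay: int = 30, num: int = 1, den: int = 5) -> Tuple[bool, int]:
    """Loop local-block until cryptroot unlocks or gives up.

    Returns (unlocked, pass_number).
    """
    s = BootState()
    for n in range(1, rootdelay + 1):       # assumed cap on the number of passes
        mdadm_hook(s, rootdelay, num, den)  # assumed order: mdadm, then cryptroot
        result = cryptroot_hook(s, rootdelay)
        if result is not None:
            return result, n
    return False, rootdelay


if __name__ == "__main__":
    print(simulate(30, 1, 5))  # (True, 7)  proposed 1/5 threshold
    print(simulate(30, 2, 3))  # (False, 7) original 2/3 threshold
```

In this model, with ROOTDELAY=30 cryptroot gets 30//4 = 7 attempts. With
the 1/5 threshold (6 regular assembly attempts), mdadm starts the array
degraded on pass 7, just in time for cryptroot's last attempt; with the
original 2/3 threshold (20), cryptroot gives up on pass 7, long before
mdadm falls back. The margin is a single pass, so the outcome depends on
the assumed hook order.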

Below I'll submit Groovy debdiffs to gather reviews on my approach.
There is also a PPA with the built packages, in case somebody else wants
to give them a try:
https://launchpad.net/~gpiccoli/+archive/ubuntu/lp1879980

Thanks,

Guilherme

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1879980

Title:
  Fail to boot with LUKS on top of RAID1 if the array is broken/degraded

Status in cryptsetup package in Ubuntu:
  Confirmed
Status in initramfs-tools package in Ubuntu:
  Confirmed
Status in mdadm package in Ubuntu:
  Confirmed

Bug description:
  Description will be saved for further SRU template, the details of the
  issue will be exposed in comments
