Public bug reported:

On Kubuntu 18.04.1 it is possible to make a (non-bitmap!) RAID1 accept an
out-of-sync disk without recovery, as if it were in sync.
"$ cat /proc/mdstat" shows the dirty disk as "U" (= up) immediately after it is
added, WITHOUT a resync.

I was able to reproduce this twice.

This can lead to arbitrary filesystem corruption, because the RAID1 then
contains two different filesystem states.
RAID1 balances reads across all disks, so either disk's state may be returned
for any given read.


Steps to reproduce:

1. Install via the network installer and create the following partition layout
manually:
{sda1, sdb1} -> md RAID1 -> btrfs -> /boot
{sda2, sdb2} -> md RAID1 -> dm-crypt -> btrfs -> /
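
For reference, a roughly equivalent layout created by hand would look like the
sketch below (the installer does all of this itself; the mapper name is just my
assumption):
# Create the two mirrors
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
$ mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# /boot directly on md0
$ mkfs.btrfs /dev/md0
# / on dm-crypt on top of md1
$ cryptsetup luksFormat /dev/md1
$ cryptsetup open /dev/md1 md1_crypt
$ mkfs.btrfs /dev/mapper/md1_crypt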

2. After the system is installed, ensure the RAID arrays have no bitmap. I
won't provide instructions for this, as my 16 GiB disks were apparently small
enough that no bitmap was created. Check "$ cat /proc/mdstat" to confirm there
is no bitmap.

3. Boot with sdb physically disconnected. Boot will now hang at "Begin: Waiting
for encrypted source device ...". That will time out after a few minutes and
drop to an initramfs shell, complaining that the disk doesn't exist. This is a
separate bug, filed at #1196693.
To make it bootable again, do the following workaround in the initramfs shell:
$ mdadm --run /dev/md0
$ mdadm --run /dev/md1
# Reduce the size of the array to stop initramfs-tools from waiting for sdb forever.
$ mdadm --grow -n 1 --force /dev/md0
$ mdadm --grow -n 1 --force /dev/md1
$ reboot

After "$ reboot", boot up the system fully with sdb still disconnected.
Now the state of the two disks should be out of sync - booting surely produces 
at least one write.
Reboot and apply the same procedure to sdb, with sda disconnected.
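
To double-check that the two halves really diverged, comparing the superblock
event counters should work once both disks are connected again (I did not
record this output during my reproduction):
# Differing "Events" counts on the two members mean they are out of sync
$ mdadm --examine /dev/sda1 /dev/sdb1 | grep -E '^/dev|Events'
$ mdadm --examine /dev/sda2 /dev/sdb2 | grep -E '^/dev|Events'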

4. Boot from one of the disks and do this:
$ mdadm /dev/md0 --add /dev/sdb1
$ mdadm /dev/md1 --add /dev/sdb2
# The sdb partitions should now be listed as (S), i.e. spare
$ cat /proc/mdstat
# Grow the array to use up the spares
$ mdadm /dev/md0 --grow -n 2 
$ mdadm /dev/md1 --grow -n 2 
# Now the bug shows itself: mdstat will report the array as in sync immediately:
$ cat /proc/mdstat
# And the kernel log will show that a recovery was started,
# BUT that it completed in less than a second:
$ dmesg
[144.255918] md: recovery of RAID array md0
[144.256176] md: md0: recovery done
[151.776281] md: recovery of RAID array md1
[151.776667] md: md1: recovery done
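
The md sysfs interface should also be able to demonstrate the resulting
inconsistency (I did not run this as part of my original reproduction):
# Start an explicit consistency check
$ echo check > /sys/block/md0/md/sync_action
# Wait for the check to finish, then read the mismatch counter;
# a non-zero value means the two members hold different data
$ cat /proc/mdstat
$ cat /sys/block/md0/md/mismatch_cnt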

Note: I'm not sure whether this is a bug in mdadm or in the kernel.
I'm filing this as an mdadm bug for now; if it turns out to be a kernel bug,
please re-assign.

** Affects: mdadm (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1801555

Title:
  [FATAL] mdadm --grow adds dirty disk to RAID1 without recovery
