I was able to reproduce the hang using the mdhang.sh script
(modified for my RAID device location) on 22.04 with kernel 5.15 several
times, usually within 10-15 minutes.
Using the 22.04 HWE kernel 6.5, I was not able to reproduce the problem
in a 25+ hour run of the same script. Seems someth
Hi. I have the same problem on raid5x2 arrays assembled as a stripe.
The total volume of the array is 97 TB, with 16 GB free.
Do other users hitting this bug also have very little free space?
I couldn’t reproduce this condition on a test bench under a synthetic load.
Ubuntu 22.04.3 LTS
kernel 5.15.0-72-generic
--
Hello, got the same issue with Ubuntu 22.04.3 LTS (GNU/Linux
5.15.0-91-generic x86_64). Also tried 5.19 kernel and got the same
problem.
Jan 7 02:28:26 cache4 systemd[1]: Starting MD array scrubbing...
Jan 7 02:28:26 cache4 root: mdcheck start checking /dev/md0
Jan 7 08:28:44 cache4 kernel: [29
Linux version 6.4.12-200.fc38.x86_64
(mockbuild@30894952d3244f1ab967aeda9ed417f6) (gcc (GCC) 13.2.1 20230728
(Red Hat 13.2.1-1), GNU ld version 2.39-9.fc38) #1 SMP PREEMPT_DYNAMIC
Wed Aug 23 17:46:49 UTC 2023
  230 ?        I<     0:00  \_ [md]
 1377 ?        S      6:15  \_ [md0_raid1]
  156
Noticed this issue on Ubuntu 20.04 with an md RAID device. The system
exhibited the same behavior other users have noted: high CPU usage
and the terminal locking up until the system is rebooted.
[14715252.569157] INFO: task md1_raid4:1763945 blocked for more than 120
seconds.
[14715252.570228] N
On Ubuntu 22.04 (5.15.0-83-generic #92-Ubuntu), our storage system ran
into this bug. mdcheck ran on the scheduled 1st day of the month and
then hung 6 hours later.
Oct 1 06:52:13 server1 systemd[1]: Starting MD array scrubbing...
Oct 1 06:52:13 server1 root: mdcheck start checking /dev/md0
Oc
I can't add much in terms of data, but this is a +1. My symptoms were
virtually identical to Chad Wagner's.
This happened on July 1st, 2023, on my machine, the day the check started.
It got stuck at 98%.
The only cure I could find was to reboot the system.
--
** Also affects: linux-signed-hwe-5.15 (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935
Title:
kernel io hangs
Seeing this bug on Ubuntu 20.04 and Ubuntu 22.04 as well, with both
normal and HWE kernels.
To add some more information, this bug seems to randomly appear during
the initial RAID 6 creation process as well, where the array is mounted
but completely empty and not accessed - so it's likely to origi
I hit this bug as well on Ubuntu 22.04 with kernel 5.15.0-67-generic.
We have a single RAID 5 across 3 drives for 28 TB. I'm switching to the
workaround from comment #5.
Jun 4 12:48:11 server1 kernel: [1622699.548591] INFO: task md0_raid5:406
blocked for more than 120 seconds.
Jun 4 12:48:11 server1
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux-signed-hwe-5.4 (Ubuntu)
Status: New => Confirmed
--
** Also affects: linux-signed-hwe-5.4 (Ubuntu)
Importance: Undecided
Status: New
--
I hit this bug with Ubuntu 22.04 (Jammy) on kernel 5.15.0-56.
For testing, I have set up a RAID1 and a RAID5 in a VM. I put the disks
for each RAID on a separate controller. Based on the 'mdhang' script
(see comment #11), I was able to reproduce the error easily.
I ran 2 'mdhang' scripts at the sa
Comment #5 (https://bugs.launchpad.net/ubuntu/+source/linux-signed-
hwe-5.11/+bug/1942935/comments/5) has been a stable workaround for me
(basically reverting to a continuous resync, like 18.04).
My newer machines are using ZFS with raidz2 pools.
--
Turns out my issue was a faulty drive, and the system would lock up when
mdadm hit the bad sectors on resync. The issue seemed to be lower
in the blockdev code, causing a deadlock.
I replaced the drive and the problem went away.
--
The second of the patches mentioned in #27 (with git SHA 1e2677...) has,
I believe, been backported to Ubuntu kernels 5.15.0-48 and 5.4.0-126.
We've still hit this with Ubuntu Jammy on 5.15.0-53, so I guess the
first commit needs to be backported as well.
--
It won't let me change the state back to active.
Every time I try, nothing happens, and array_state is always idle.
--
Yeah, that's the same issue as this one. The raid is doing a consistency
check (mdcheck), is transitioned to an "idle" state, and hits a deadlock
that causes all I/O through the md device to block. The workaround is to
change the array state back to active.
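The state flip described above can be sketched as a small shell snippet. This version uses a temporary mock of the sysfs file so it can run anywhere; on a real system the path would be /sys/block/<md device>/md/array_state and the write needs root. The device name md0 is an assumption.

```shell
# Mock sysfs file standing in for /sys/block/md0/md/array_state
# (assumption: the stuck array is md0; adjust to match your system).
STATE_FILE="$(mktemp -d)/array_state"
echo idle > "$STATE_FILE"                # a deadlocked array reports "idle"

# The workaround: write "active" back into array_state.
if [ "$(cat "$STATE_FILE")" = "idle" ]; then
    echo active > "$STATE_FILE"
fi
cat "$STATE_FILE"
```

On the real device this is the one-liner reported later in the thread: echo active > /sys/block/md0/md/array_state (as root).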
I made the changes in #
Digging further, I think I might be running into this bug:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc...@molgen.mpg.de/T/
--
I'm running checkarray manually; I took off all the start and stop stuff
like you did.
echo active > /sys/block/md0/md/array_state doesn't fix it.
I must not have gotten all the trace last time. I've attached it here.
** Attachment added: "kernel log snippet"
https://bugs.launchpad.net/ubuntu
I believe that to resolve the deadlock you want to do:
echo active > /sys/block/md1/md/array_state
Not "idle". You should see a hung task for mdcheck in there somewhere as
well, and it only occurs while the raid is resyncing (md_resync should be
running), at least for me. I use the workaround in comment 5.
Looks like two patches are landing in next to resolve this:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=1e267742
** Tags added: patch
--
Same issue on impish 5.13.13 kernel, running in VBox.
--
** Patch added: "md-reap-sync-thread.patch"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942935/+attachment/5526028/+files/md-reap-sync-thread.patch
** Tags added: apport-collected impish
** Description changed:
It seems to always occur during an mdcheck/resync, if I am logged in
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed
--
** Tags removed: hirsute
--
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New
--
Here is Donald Buczek's reproducer script. I set up an Ubuntu 20.04 VM
with the latest linux-image-generic and was able to reproduce it within
maybe 10 or 15 minutes. Exactly the same issue.
Filesystem layout built as follows:
# assemble raid devices
mdadm --create /dev/md0 --level=1 --raid-devices=
The patch hasn't made it into mainline from what I have seen; it looks
like it died back in March waiting for feedback from additional kernel
developers. From what I have gathered, this is a deadlock scenario
directly caused by pausing the resync while the system is under heavy
write activity.
Don
One of the systems was using the linux-generic package; in practice:
Linux pedabackup 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC
2021 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/version_signature
Ubuntu 5.4.0-80.90-generic 5.4.124
--
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: linux-signed-hwe-5.11 (Ubuntu)
Status: New => Confirmed
--
I think I've seen this issue once on each of two different systems, so I
think this is a software issue. Does anybody know if the patch in
comment #6 is going to be included in Ubuntu 20.04 LTS?
--
Here is the proposed patch; it doesn't appear to have been applied. The
last report was with 5.11-rc5.
https://lore.kernel.org/linux-raid/1613177399-22024-1-git-send-email-guoqing.ji...@cloud.ionos.com/
--
Similar report here on 5.10.0-rc4:
https://www.spinics.net/lists/raid/msg66654.html
I ended up masking the services introduced with 20.04 LTS and switched
back to the crontab:
systemctl mask mdcheck_continue.service mdcheck_continue.timer mdcheck_start.service mdcheck_start.timer
cat > /etc/cron
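The truncated crontab setup above presumably recreates the pre-20.04 cron-driven check. A cron.d line of that shape might look like the following; the schedule, file path, and checkarray location are assumptions based on the older Debian/Ubuntu mdadm packaging, not taken from this comment:

```shell
# Hypothetical /etc/cron.d/mdadm entry in the spirit of the old packaging:
# run checkarray on all arrays early Sunday morning, at idle I/O priority.
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
```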
Hi Kleber,
I installed it later yesterday, but I won't know until the next resync.
This has been a problem since at least the linux 5.4 kernel that shipped
with Ubuntu 20.04. I don't think I had these problems on Ubuntu 18.04
LTS, on the same hardware, running linux-image-generic at that time.
Hello Chad Wagner,
Thank you for reporting this issue. Could you please try installing the
latest 20.04 HWE kernel and check whether the problem persists? The
version currently in focal-updates is 5.11.0-34.36~20.04.1.
--
** Attachment added: "screenlog.txt"
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5523575/+files/screenlog.txt
--