I propose to close this. This is clearly fixed with 4.4 on the host, and
rolling that out is covered by bug 1602577.
It can be closed for auto-package-testing either way as our arm64 nova
compute nodes now run 4.4.23.
** Changed in: linux (Ubuntu)
Status: Confirmed => Fix Released
** Chan
For the record, I now use two arm64 xenial (4.4) instances on a host
with kernel 4.8, and things are looking really good. See latest posts to
bug 1602577.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launch
@cjwatson: Ah, ok. I may have misread the history here. I had gleaned
that the xenial kernel (as a host) was more unstable - but for different
reasons.
Regardless, I have pulled the wily backport build I prepared, because it
was frequently triggering a WARN() condition. Looks like my backport
atte
@dannf, this bug seems to be *worse* in xenial than in wily, so I don't
think backporting a change from xenial to wily is going to help matters?
--
I wonder if this might be a dupe of LP: #1549494? We fixed that in
xenial, but haven't backported the fix to wily. I haven't been able to
reproduce this issue myself, but I uploaded a wily kernel w/ a
backported fix to ppa:dannf/test, in case someone else can test it. It
corresponds to the git bran
OK, ignore the last two messages, it eventually booted, it just seems
that the host was rather slow.
--
I'm reproducing rcu_sched timeouts all the time with a 4.4 kernel on a
far slower ARM64 host with the same cloud images.
[ 157.555837] INFO: rcu_sched self-detected stall on CPU
[ 157.561551] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 157.562669] 2-...: (14960 ticks this GP) idle=5b5/140
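When comparing console logs across hosts and kernels, it can help to count how often a guest reports RCU stalls. The sketch below assumes the console log or `dmesg` output has been saved to a file. The helper name `count_rcu_stalls` is hypothetical and not part of any Ubuntu tooling. A single stall can print both the "self-detected" and the "detected stalls" line, as in the excerpt above, so treat the number as a rough indicator rather than an exact event count.

```shell
# count_rcu_stalls: hypothetical helper (not Ubuntu tooling).
# Counts rcu_sched stall report lines ("self-detected stall" and
# "detected stalls") in a saved console log or dmesg dump.
count_rcu_stalls() {
    # -c prints the number of matching lines; -E enables the alternation.
    # grep exits 1 when nothing matches (after printing 0), so `|| true`
    # keeps the 0 without signalling failure.
    grep -cE 'INFO: rcu_sched (self-detected stall|detected stalls)' "$1" || true
}
```

Usage: `count_rcu_stalls console.log`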
..and for one more datapoint, QEMU seems to be hung spinning on a futex:
futex(0xb05520, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xb054f4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xb05520, 1104376) = 1
--
We're seeing rather similar symptoms on Launchpad builders after
upgrading the guests from wily to xenial (console-log not very
informative, e.g. https://pastebin.canonical.com/160898/plain/; build
output appears hung; I can't tell for sure that it's the same thing,
this is just a guess). These ar
Filed LP#1602577 for the host instability issue on 4.4
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1531768
Title:
[arm64] lockups some time after booting
Status in Auto Package Testi
yep, file a separate bug, the perf data will be useful. Thanks.
--
Hi,
I'm sorry but 4.4 is too unstable on the hosts. We have to reboot and/or
power cycle them multiple times a day. We're back on 4.2 everywhere.
Haw gathered some perf data on a failing 4.4 host. Perhaps we can start
digging into the issue from there? It should perhaps be a separate bug as
well.
Tha
> can you try using the following kernel parameters on the VM and see if this
> helps:
> rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1
the instance on swirlix16 (on 4.2 kernel) hung again (twice), with the
attached console log. This now has the above kernel parameters, but I'm
afraid it
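Before deciding that the suggested parameters make no difference, it may be worth confirming that they actually reached the running guest kernel. The sketch below reads `/proc/cmdline` and reports any missing parameter. The helper name `check_cmdline` and the `CMDLINE_FILE` override are hypothetical and included only so the helper can be tested against a sample file.

```shell
# check_cmdline: hypothetical helper. Prints "missing: <param>" for each
# given kernel parameter absent from the running kernel's command line,
# and returns non-zero if any are missing.
# Reads /proc/cmdline unless CMDLINE_FILE (hypothetical override) is set.
check_cmdline() {
    local cmdline_file="${CMDLINE_FILE:-/proc/cmdline}"
    local cmdline param missing=0
    # Pad with spaces so each parameter matches only as a whole word
    # (e.g. kthread_prio=9 does not match kthread_prio=90).
    cmdline=" $(cat "$cmdline_file") "
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                          # present
            *) echo "missing: $param"; missing=1 ;;   # absent
        esac
    done
    return "$missing"
}
```

Usage inside the guest: `check_cmdline rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1 && echo "all present"`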
hloeung | pitti: yeah, I believe work was done to get swirlix01-09 to
4.4
--
[hloeung@ragnar tmp]$ for i in {01..09} 16; do ssh swirlix${i}.bos01.scalingstack "uname -a"; done
Linux swirlix01 4.4.0-30-generic #49~14.04.1-Ubuntu SMP Thu Jun 30 22:20:09 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
Linux swirlix02 4.4.0-30-generic #49~14.04.1-Ubuntu SMP Thu Jun 30 22:20:09 UTC
lxd-armhf1 (on swirlix01) has run without any lockup since the host
kernel update to 4.4. I created a new lxd-armhf2 yesterday (on
swirlix08) which also survived without any workaround. At the same time
I created a new lxd-armhf3 (on swirlix16) which has locked up pretty
well every < 15 minutes (I
Thanks Colin, great work! I'll deploy this ASAP.
FYI, at least some of the VM hosts in scalingstack got updated to a 4.4
kernel. Not sure how much that changes your investigations.
--
Also, can we clarify something: do the ARM hosts provide KVM? If not,
one should really run the VMs with just one CPU.
--
This article throws some light onto things:
https://lwn.net/Articles/518953/
"Second, the greater the number of idle CPUs, the more work RCU must do
when forcing quiescent states. Yes, the busier the system, the less work
RCU needs to do! The reason for the extra work is that RCU is not
permitted
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/274251.html
--
Bit more digging, I see that the CPU goes into idle either by a single
WFI (wait for an interrupt) shallow sleep or a deeper
arm_cpuidle_suspend() - the latter is akin to turning off the CPU. I
wonder if we're seeing some issues with the wakeup latency taking a long
time inside QEMU when the host
On an idle Xenial cloud image I'm seeing:
[ 1485.236760] [] __switch_to+0x90/0xa8
[ 1485.236772] [] __tick_nohz_idle_enter+0x50/0x3f0
[ 1485.236776] [] tick_nohz_idle_enter+0x40/0x70
[ 1485.236785] [] cpu_startup_entry+0x288/0x2d8
[ 1485.236791] [] secondary_start_kernel+0x120/0x130
[ 1485.236795]
Bisecting is proving problematic as 4.3 kernels don't boot.
--
I wonder if it is possible to test with a recent 4.4 Xenial kernel on
the host to see if that helps.
--
Can't repro the bug on 4.4 kernel on host. Will try 4.3 now
--
Testing with 4.4 on the host and the VM is showing:
[ 335.699014] sched: RT throttling activated
[ 337.600831] hrtimer: interrupt took 2939683820 ns
..which shows us that the host is suffering from some very large
scheduling latencies that are causing the VM some grief.
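For scale, the hrtimer report above means a single timer interrupt took about 2.94 s (2939683820 ns / 10^9). The sketch below extracts such reports from a guest's kernel log and converts them to seconds, which can make logs from different host kernels easier to compare. The helper name `hrtimer_delays` is hypothetical.

```shell
# hrtimer_delays: hypothetical helper. Reads kernel log lines on stdin and
# prints each "hrtimer: interrupt took N ns" delay in seconds (3 decimals).
hrtimer_delays() {
    awk '/hrtimer: interrupt took [0-9]+ ns/ {
        # Find the field "took"; the next field is the delay in ns.
        for (i = 1; i < NF; i++)
            if ($i == "took") printf "%.3f s\n", $(i + 1) / 1e9
    }'
}
```

Usage inside the guest: `dmesg | hrtimer_delays`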
--
Can trip it with stress-ng context switching with 4.2.0-38-generic
--
Finally able to trip an RCU timeout. 3.19.0-61-generic kernel on host,
xenial on server, host busy on async i/o requests (via stress-ng):
[ 825.195520] systemd[1]: Started Journal Service.
[ 900.108730] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 900.110254] 0-...: (4 GPs behind) idle=750
Thanks William, I'm going to soak test with those older kernels and see
if I can trip the hang on these.
--
The production hardware is mcdivitt as well, running trusty with
lts-vivid or lts-wily.
--
I've been running Xenial host + Xenial VM on a mcdivitt 8 core box and
not been able to reproduce this issue. I'm going to keep it running for
one more day.
Do we have any idea of what the host(s) hardware is? I'm starting to
wonder if it is a host/VM interaction issue.