On 14.03.22 14:02, Greg Gallagher wrote:
> On Mon, Mar 14, 2022 at 8:33 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> > On 04.03.22 00:45, Greg Gallagher wrote:
> > > On Thu, Mar 3, 2022 at 1:20 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> > > > On 02.03.22 16:44, Greg Gallagher wrote:
> > > > > On Wed, Mar 2, 2022 at 1:48 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > something is going wrong on arm64 with latest ipipe version,
> > > > > > see e.g.
> > > > > >
> > > > > > https://source.denx.de/Xenomai/xenomai-images/-/jobs/398455/raw
> > > > > > (same thing seen on HiKey as well)
> > > > > >
> > > > > > Could you have a look?
> > > > > >
> > > > > > Thanks,
> > > > > > Jan
> > > > > >
> > > > > > --
> > > > > > Siemens AG, Technology
> > > > > > Competence Center Embedded Linux
> > > > >
> > > > > I'll take a look, it will be close to the end of the week but I'll
> > > > > aim to have it root caused by the weekend.
> > > >
> > > > Just tried locally with xenomai-images and qemu-arm64 (just run
> > > > smokey):
> > > >
> > > > [ 408.747349] Kernel panic - not syncing: kernel stack overflow
> > > > [ 408.747591] CPU: 0 PID: 1577 Comm: systemd-journal Tainted: G        W         5.4.180+ #1
> > > > [ 408.747762] Hardware name: linux,dummy-virt (DT)
> > > > [ 408.747852] I-pipe domain: Xenomai
> > > > [ 408.747941] Call trace:
> > > > ...
> > > > [ 408.761131] do_debug_exception+0x94/0x240
> > > > [ 408.761255] el1_dbg+0x18/0x8c
> > > > [ 408.761329] this_cpu_has_cap+0x60/0x7c
> > > > [ 408.761423] erratum_1418040_thread_switch+0x18/0x5c
> > > > [ 408.761534] __switch_to+0xf8/0x154
> > > > [ 408.761622] xnarch_switch_to+0x5c/0xc4
> > > > [ 408.761711] pipeline_switch_to+0x14/0x84
> > > > [ 408.761803] ___xnsched_run+0x154/0x240
> > > > [ 408.761889] pipeline_schedule+0x30/0x40
> > > > [ 408.761999] xnintr_core_clock_handler+0x250/0x260
> > > > [ 408.762107] dispatch_irq_head+0x84/0x120
> > > > [ 408.762198] __ipipe_dispatch_irq+0x19c/0x1c4
> > > > [ 408.762293] __ipipe_grab_irq+0x5c/0xa0
> > > > [ 408.762377] gic_handle_irq+0x54/0xb0
> > > > [ 408.762457] handle_arch_irq_pipelined+0x14/0x60
> > > > [ 408.762557] el0_irq_naked+0x5c/0x84
> > > > [ 408.762905] SMP: stopping secondary CPUs
> > > >
> > > > This dbg trap from erratum_1418040_thread_switch looks suspicious,
> > > > and if I had to bet, I would say it somehow relates to [1], which
> > > > came with v5.4.176. But more logical would be [2], due to its switch
> > > > from static to dynamic cpu_has_cap - but that has already been in
> > > > since v5.4.80...
> > > >
> > > > Jan
> > > >
> > > > [1] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b
> > > > [2] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/71eea3d3df94ccdcf3b616d27d68d6c028c1968f
> > >
> > > I just built a new image and I’ll have time to look into this probably
> > > tomorrow.
> > >
> > > Thanks for the help :)
> >
> > Any news on this? Do you need further support?
> >
> > Jan
>
> Still working on it, I unfortunately haven’t had a lot of time to
> focus on this. I should have more time this week.
>
> If anyone has any ideas or patches they’d like me to try I can test them
> as well.
>

Looks like we have a bunch of new !preemptible() assertions in the
switching path due to that erratum_1418040_thread_switch. Those sometimes
trigger over Xenomai tasks, and that will cause the debug trap, followed
by a ride to fault recursion hell.

I've hacked two away, and things seem to run smoothly again. Needs more
careful analysis, though. The same goes for the erratum_1418040_new_exec
path: it is unclear whether it needs hard preemption off.

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1e16c4e00e771..8f74d2830e1b9 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -37,7 +37,7 @@ static bool __maybe_unused
 is_affected_midr_range_list(const struct arm64_cpu_capabilities *entry,
 			    int scope)
 {
-	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+	WARN_ON(scope != SCOPE_LOCAL_CPU /*|| preemptible()*/);
 	return is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list);
 }
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index acdef8d76c64d..0d2242c0fe6b7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2023,7 +2023,7 @@ static void __init mark_const_caps_ready(void)
 
 bool this_cpu_has_cap(unsigned int n)
 {
-	if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
+	if (/*!WARN_ON(preemptible()) &&*/ n < ARM64_NCAPS) {
 		const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
 
 		if (cap)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 68c078ab0250c..879ecf0237c88 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -517,9 +517,9 @@ static void erratum_1418040_thread_switch(struct task_struct *next)
 
 static void erratum_1418040_new_exec(void)
 {
-	preempt_disable();
+	unsigned long flags = hard_preempt_disable();
 	erratum_1418040_thread_switch(current);
-	preempt_enable();
+	hard_preempt_enable(flags);
 }
 
 /*

Jan

--
Siemens AG, Technology
Competence Center Embedded Linux