Re: CONFIG_NO_HZ_FULL = y but still have arch-timer on isolation CPUs

2022-03-15 Thread Ivan Jiang via Xenomai
Dear Hongzhan:

My kernel is 4.19.165. Indeed, with Xenomai removed the behavior is the
same.
Maybe it is a bug in this kernel. How can I find out whether it has been
fixed by a later patch?
Thank you.

Ivan

On 2022/3/16 11:30, "Chen, Hongzhan" wrote:

Hi Ivan

I hope the discussion at
https://stackoverflow.com/questions/60322119/completely-eliminating-the-timer-tick-in-modern-linux-5-0
helps.
It seems that more kernel parameters need to be set for your requirement,
but I have not tried this myself. Is the behavior the same when you
remove Xenomai?

Regards

Hongzhan Chen

-----Original Message-----
From: Xenomai  On Behalf Of Ivan Jiang via 
Xenomai
Sent: Tuesday, March 15, 2022 12:01 PM
To: xenomai@xenomai.org
Subject: CONFIG_NO_HZ_FULL = y but still have arch-timer on isolation CPUs

Dear Guys:

I have set the following config options:

CONFIG_NO_HZ_FULL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_PREEMPT=y
CONFIG_CPU_IDLE=n
CONFIG_ARM_CPUIDLE=n
CONFIG_CPU_FREQ=n

and, via setenv, the kernel parameters:

isolcpus=1 xenomai.supported_cpus=0x02 nohz_full=1 irqaffinity=0 rcu_nocbs=1

The CPU is a dual-core Cortex-A55. I use CPU 0 as the Linux CPU and CPU 1 as
the isolated core.

However, cat /proc/interrupts shows the arch_timer counts still increasing
at the same rate on both CPUs.

It seems that CONFIG_NO_HZ_FULL=y has no effect.

The boot log is below:

[0.00] rcu: Preemptible hierarchical RCU implementation.
[0.00] rcu: RCU dyntick-idle grace-period acceleration is enabled.
[0.00] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
[0.00] rcu: RCU priority boosting: priority 1 delay 500 ms.
[0.00]  Tasks RCU enabled.
[0.00] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[0.00] GICv3: Distributor has no Range Selector support
[0.00] GICv3: no VLPI support, no direct LPI support
[0.00] GICv3: CPU0: found redistributor 0 region 0:0x1194
[0.00] NO_HZ: Full dynticks CPUs: 1.
[0.00] rcu: Offload RCU callbacks from CPUs: 1.
[0.00] arch_timer: cp15 timer(s) running at 24.00MHz (virt).

Please help me analyze this situation.

Much appreciated,

Chen.
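
One way to put numbers on the observation above is to sample the arch_timer
line of /proc/interrupts twice and compare the per-CPU deltas. The following
is a minimal user-space sketch and not part of the original thread: the helper
names parse_irq_counts() and ticks_per_second() are hypothetical. Note that,
according to the kernel's NO_HZ documentation, a nohz_full CPU only stops its
tick while a single runnable task occupies it, so the result depends on what
is running on CPU 1 while the samples are taken.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one line in /proc/interrupts layout ("label: count count ... name")
 * and store the leading per-CPU counters in counts[].
 * Returns the number of counters found, or -1 if there is no "label:" prefix.
 * max_cpus should be the number of CPU columns, so that numeric fields
 * after the controller name are never mistaken for counters.
 */
int parse_irq_counts(const char *line, unsigned long long *counts, int max_cpus)
{
	const char *p;
	int n = 0;

	assert(line != NULL && counts != NULL);

	p = strchr(line, ':');
	if (!p)
		return -1;
	p++;

	while (n < max_cpus) {
		unsigned long long v;
		char *end;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;	/* first non-numeric token, e.g. the controller name */
		v = strtoull(p, &end, 10);
		if (*end != '\0' && !isspace((unsigned char)*end))
			break;	/* token only starts with digits: not a counter */
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Tick rate on one CPU between two samples taken interval_s seconds apart. */
unsigned long long ticks_per_second(unsigned long long before,
				    unsigned long long after,
				    unsigned int interval_s)
{
	if (interval_s == 0 || after < before)
		return 0;
	return (after - before) / interval_s;
}
```

Reading the arch_timer line twice, a known number of seconds apart, and
passing each CPU's two counters to ticks_per_second() gives the tick rate per
core. A rate on CPU 1 close to the configured HZ would confirm the report; a
rate near zero would suggest the tick is in fact being stopped.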












RE: CONFIG_NO_HZ_FULL = y but still have arch-timer on isolation CPUs

2022-03-15 Thread Chen, Hongzhan via Xenomai
Hi Ivan

I hope the discussion at
https://stackoverflow.com/questions/60322119/completely-eliminating-the-timer-tick-in-modern-linux-5-0
helps.
It seems that more kernel parameters need to be set for your requirement,
but I have not tried this myself. Is the behavior the same when you
remove Xenomai?

Regards

Hongzhan Chen

-----Original Message-----
From: Xenomai  On Behalf Of Ivan Jiang via Xenomai
Sent: Tuesday, March 15, 2022 12:01 PM
To: xenomai@xenomai.org
Subject: CONFIG_NO_HZ_FULL = y but still have arch-timer on isolation CPUs

Dear Guys:

I have set the following config options:

CONFIG_NO_HZ_FULL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_PREEMPT=y
CONFIG_CPU_IDLE=n
CONFIG_ARM_CPUIDLE=n
CONFIG_CPU_FREQ=n

and, via setenv, the kernel parameters:

isolcpus=1 xenomai.supported_cpus=0x02 nohz_full=1 irqaffinity=0 rcu_nocbs=1

The CPU is a dual-core Cortex-A55. I use CPU 0 as the Linux CPU and CPU 1 as
the isolated core.

However, cat /proc/interrupts shows the arch_timer counts still increasing
at the same rate on both CPUs.

It seems that CONFIG_NO_HZ_FULL=y has no effect.

The boot log is below:

[    0.00] rcu: Preemptible hierarchical RCU implementation.
[    0.00] rcu: RCU dyntick-idle grace-period acceleration is enabled.
[    0.00] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
[    0.00] rcu: RCU priority boosting: priority 1 delay 500 ms.
[    0.00]  Tasks RCU enabled.
[    0.00] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.00] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.00] GICv3: Distributor has no Range Selector support
[    0.00] GICv3: no VLPI support, no direct LPI support
[    0.00] GICv3: CPU0: found redistributor 0 region 0:0x1194
[    0.00] NO_HZ: Full dynticks CPUs: 1.
[    0.00] rcu: Offload RCU callbacks from CPUs: 1.
[    0.00] arch_timer: cp15 timer(s) running at 24.00MHz (virt).

Please help me analyze this situation.

Much appreciated,

Chen.

 

 

 



Re: waitqueue vs. mutex behavior

2022-03-15 Thread Jan Kiszka via Xenomai
On 15.03.22 19:27, Matt Klass via Xenomai wrote:
> Using Xenomai 3.0.10, with kernel 4.9.128-05789, on armv7, we're having
> problems with the functionality of rtdm_waitqueues. The code was written by
> a Xenomai-adept developer who has since left for greener pastures.
> 
> We have two functions that use rtdm_waitqueue_lock/unlock on the same
> rtdm_waitqueue_t to manage access to a shared data structure. One is an
> rtdm_task_t that runs periodically every 1ms, the second is an IOCTL
> handler.
> 
> Problem: In some circumstances, one of the two functions will acquire the
> lock, and access the shared data structure. But before the first function
> releases the lock, the second function seems to also acquire the lock, and
> begins its own access to the shared data structure. The second
> function releases its lock after its work is complete, and then when the
> first function tries to release the lock, it gets an "already unlocked"
> error from Xenomai:
> 
> [Xenomai] lock 80f10020 already unlocked on CPU #0
>   last owner = kernel/xenomai/sched.c:908 (___xnsched_run(), CPU #0)
> [<8010ed78>] (unwind_backtrace) from [<8010b5f0>] (show_stack+0x10/0x14)
> [<8010b5f0>] (show_stack) from [<801c8c08>] (xnlock_dbg_release+0x12c/0x138)
> [<801c8c08>] (xnlock_dbg_release) from [<801be110>] (___xnlock_put+0xc/0x38)
> [<801be110>] (___xnlock_put) from [<7f000434>]
> (myengine_rtdm_waitqueue_unlock_with_num+0xf8/0x13c [engine_rtnet])
> [<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num [engine_rtnet]) from
> [<7f00ace8>] (engine_rtnet_periodic_task+0x604/0x660 [engine_rtnet])
> [<7f00ace8>] (engine_rtnet_periodic_task [engine_rtnet]) from [<801c73ac>]
> (kthread_trampoline+0x68/0xa4)
> [<801c73ac>] (kthread_trampoline) from [<80147190>] (kthread+0x108/0x110)
> [<80147190>] (kthread) from [<80107cd4>] (ret_from_fork+0x18/0x24)
> 
> 
> These waitqueues were originally mutexes, and the above-mentioned adept
> committed this change to waitqueues seven years ago with the following
> comment: "Use Wait Queue instead of Mutex, because Mutex can't be called
> from the non-RT context."
> 
> We'd expect that once one of the functions obtains the lock on the
> waitqueue, the other would be blocked until the first function releases the
> lock. It's quite possible, likely really, that we don't understand the
> differences between mutexes and waitqueues. We've looked at the online
> Xenomai documentation on waitqueues, but we have not been enlightened.
> 
> 
> Would you have any suggestions on things we should do (or not do) to figure
> out what's going on?
> 

rtdm_waitqueue_lock/unlock is surely no replacement for
rtdm_mutex_lock/unlock in non-RT contexts. It exists in order
to prepare the caller for waiting in a queue, and that waiting shares
the same constraint that rtdm_mutex_lock has: the caller must be RT.
Furthermore, the lock will obviously be dropped while the caller is
blocked on the waitqueue.

If you need synchronization between RT and non-RT contexts, you should
use rtdm_lock_get_irqsave/put_irqrestore AND keep the code in the
critical section short. Definitely no code that could sleep, call random
Linux functions or do even worse things. Alternatively, you need to make
sure the non-RT caller is promoted to RT on entry.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
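
The point that a lock is dropped while its holder is blocked can be shown
with a user-space analogy. The sketch below uses POSIX threads, not the RTDM
API: pthread_cond_wait() releases its mutex for as long as the caller is
blocked, so a second context can enter the "protected" section before the
first one has explicitly unlocked. All names here (struct shared,
second_context, lock_dropped_while_waiting) are hypothetical and serve only to
illustrate the behavior described in the reply; this is not how Xenomai
implements waitqueues.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct shared {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int ready;	/* wake-up condition for the waiter */
	int entered;	/* set by the second context inside the section */
};

static void *second_context(void *arg)
{
	struct shared *s = arg;

	assert(s != NULL);
	/* Succeeds only once the waiter has released the mutex by waiting. */
	pthread_mutex_lock(&s->lock);
	s->entered = 1;
	s->ready = 1;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/*
 * Returns 1 if the second context ran inside the critical section while the
 * first context was blocked in it, or -1 if setup failed.
 */
int lock_dropped_while_waiting(void)
{
	struct shared s = { .ready = 0, .entered = 0 };
	pthread_t t;
	int seen;

	if (pthread_mutex_init(&s.lock, NULL) != 0)
		return -1;
	if (pthread_cond_init(&s.cond, NULL) != 0) {
		pthread_mutex_destroy(&s.lock);
		return -1;
	}

	pthread_mutex_lock(&s.lock);	/* first context "holds" the lock */
	if (pthread_create(&t, NULL, second_context, &s) != 0) {
		pthread_mutex_unlock(&s.lock);
		pthread_cond_destroy(&s.cond);
		pthread_mutex_destroy(&s.lock);
		return -1;
	}
	while (!s.ready)
		pthread_cond_wait(&s.cond, &s.lock);	/* lock released while blocked */
	seen = s.entered;
	pthread_mutex_unlock(&s.lock);

	pthread_join(t, NULL);
	pthread_cond_destroy(&s.cond);
	pthread_mutex_destroy(&s.lock);
	return seen;
}
```

The design consequence is the same as in the reply above: any invariant on the
shared data must hold at every point where the holder may block, because
another context may run inside the section at that moment.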



Re: [PATCH] arm64: ipipe: Fix section mismatch of __ipipe_tsc_register

2022-03-15 Thread Greg Gallagher via Xenomai
On Tue, Mar 15, 2022 at 3:00 PM Jan Kiszka  wrote:

> From: Jan Kiszka 
>
> The kernel warns:
>
> The function dw_apb_clocksource_register() references
> the function __init __ipipe_tsc_register().
> This is often because dw_apb_clocksource_register lacks a __init
> annotation or the annotation of __ipipe_tsc_register is wrong.
>
> Signed-off-by: Jan Kiszka 
> ---
>
> Developed for 5.4 but probably also 4.19 material.
>
>  arch/arm64/kernel/ipipe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/ipipe.c b/arch/arm64/kernel/ipipe.c
> index 9dcb54c636395..a60787230fd84 100644
> --- a/arch/arm64/kernel/ipipe.c
> +++ b/arch/arm64/kernel/ipipe.c
> @@ -228,7 +228,7 @@ void __ipipe_root_sync(void)
>
>  static struct __ipipe_tscinfo tsc_info;
>
> -void __init __ipipe_tsc_register(struct __ipipe_tscinfo *info)
> +void __ipipe_tsc_register(struct __ipipe_tscinfo *info)
>  {
> tsc_info = *info;
> __ipipe_hrclock_freq = info->freq;
> --
> 2.34.1
>
>
> --
> Siemens AG, Technology
> Competence Center Embedded Linux


I’ll apply it to 5.4 and check if it’s needed for 4.19.

-Greg

>
>


Re: [PATCH] arm64: ipipe: Make erratum_1418040_thread_switch compatible with I-pipe

2022-03-15 Thread Greg Gallagher via Xenomai
On Tue, Mar 15, 2022 at 2:56 PM Jan Kiszka  wrote:

> From: Jan Kiszka 
>
> First of all, this erratum hook is called from __switch_to, thus
> potentially also from the primary domain. Some of the functions it calls
> check if preemption was disabled under Linux - which may not be the case
> when invoked from primary domain. Rather than adding a costly check for
> ipipe_root_p to this hot-path, simply turn the check off if I-pipe is
> enabled.
>
> As the hook can be called from primary context, we need to protect its
> setup for new execs against those contexts via hard_preempt_disable.
>
> Signed-off-by: Jan Kiszka 
> ---
>
> This is for 5.4 only; older kernels do not have the erratum fix.
>
> Philippe, the hardening of erratum_1418040_new_exec() could be a topic
> for dovetail as well. preemptible() is fully oob-aware there, though.
>
>  arch/arm64/kernel/cpu_errata.c | 3 ++-
>  arch/arm64/kernel/cpufeature.c | 3 ++-
>  arch/arm64/kernel/process.c    | 4 ++--
>  3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpu_errata.c
> b/arch/arm64/kernel/cpu_errata.c
> index 1e16c4e00e771..7fd7d1c8b9fcc 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -37,7 +37,8 @@ static bool __maybe_unused
>  is_affected_midr_range_list(const struct arm64_cpu_capabilities *entry,
> int scope)
>  {
> -   WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
> +   WARN_ON(scope != SCOPE_LOCAL_CPU ||
> +   (preemptible() && !IS_ENABLED(CONFIG_IPIPE)));
> return is_midr_in_range_list(read_cpuid_id(),
> entry->midr_range_list);
>  }
>
> diff --git a/arch/arm64/kernel/cpufeature.c
> b/arch/arm64/kernel/cpufeature.c
> index acdef8d76c64d..d65287cc2148b 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2023,7 +2023,8 @@ static void __init mark_const_caps_ready(void)
>
>  bool this_cpu_has_cap(unsigned int n)
>  {
> -   if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
> +   if (!WARN_ON(!IS_ENABLED(CONFIG_IPIPE) && preemptible()) &&
> +   n < ARM64_NCAPS) {
> const struct arm64_cpu_capabilities *cap =
> cpu_hwcaps_ptrs[n];
>
> if (cap)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 68c078ab0250c..879ecf0237c88 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -517,9 +517,9 @@ static void erratum_1418040_thread_switch(struct
> task_struct *next)
>
>  static void erratum_1418040_new_exec(void)
>  {
> -   preempt_disable();
> +   unsigned long flags = hard_preempt_disable();
> erratum_1418040_thread_switch(current);
> -   preempt_enable();
> +   hard_preempt_enable(flags);
>  }
>
>  /*
> --
> 2.34.1


Hi Jan,

Thanks for the patch, I’ll have time to test it tomorrow.

Thanks

Greg

>
>


[PATCH] arm64: ipipe: Fix section mismatch of __ipipe_tsc_register

2022-03-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

The kernel warns:

The function dw_apb_clocksource_register() references
the function __init __ipipe_tsc_register().
This is often because dw_apb_clocksource_register lacks a __init
annotation or the annotation of __ipipe_tsc_register is wrong.

Signed-off-by: Jan Kiszka 
---

Developed for 5.4 but probably also 4.19 material.

 arch/arm64/kernel/ipipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/ipipe.c b/arch/arm64/kernel/ipipe.c
index 9dcb54c636395..a60787230fd84 100644
--- a/arch/arm64/kernel/ipipe.c
+++ b/arch/arm64/kernel/ipipe.c
@@ -228,7 +228,7 @@ void __ipipe_root_sync(void)
 
 static struct __ipipe_tscinfo tsc_info;
 
-void __init __ipipe_tsc_register(struct __ipipe_tscinfo *info)
+void __ipipe_tsc_register(struct __ipipe_tscinfo *info)
 {
 	tsc_info = *info;
 	__ipipe_hrclock_freq = info->freq;
-- 
2.34.1


-- 
Siemens AG, Technology
Competence Center Embedded Linux



[PATCH] arm64: ipipe: Make erratum_1418040_thread_switch compatible with I-pipe

2022-03-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

First of all, this erratum hook is called from __switch_to, thus
potentially also from the primary domain. Some of the functions it calls
check if preemption was disabled under Linux - which may not be the case
when invoked from primary domain. Rather than adding a costly check for
ipipe_root_p to this hot-path, simply turn the check off if I-pipe is
enabled.

As the hook can be called from primary context, we need to protect its
setup for new execs against those contexts via hard_preempt_disable.

Signed-off-by: Jan Kiszka 
---

This is for 5.4 only; older kernels do not have the erratum fix.

Philippe, the hardening of erratum_1418040_new_exec() could be a topic 
for dovetail as well. preemptible() is fully oob-aware there, though.

 arch/arm64/kernel/cpu_errata.c | 3 ++-
 arch/arm64/kernel/cpufeature.c | 3 ++-
 arch/arm64/kernel/process.c    | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1e16c4e00e771..7fd7d1c8b9fcc 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -37,7 +37,8 @@ static bool __maybe_unused
 is_affected_midr_range_list(const struct arm64_cpu_capabilities *entry,
 			    int scope)
 {
-	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+	WARN_ON(scope != SCOPE_LOCAL_CPU ||
+		(preemptible() && !IS_ENABLED(CONFIG_IPIPE)));
 	return is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list);
 }
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index acdef8d76c64d..d65287cc2148b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2023,7 +2023,8 @@ static void __init mark_const_caps_ready(void)
 
 bool this_cpu_has_cap(unsigned int n)
 {
-	if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
+	if (!WARN_ON(!IS_ENABLED(CONFIG_IPIPE) && preemptible()) &&
+	    n < ARM64_NCAPS) {
 		const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
 
 		if (cap)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 68c078ab0250c..879ecf0237c88 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -517,9 +517,9 @@ static void erratum_1418040_thread_switch(struct task_struct *next)
 
 static void erratum_1418040_new_exec(void)
 {
-	preempt_disable();
+	unsigned long flags = hard_preempt_disable();
 	erratum_1418040_thread_switch(current);
-	preempt_enable();
+	hard_preempt_enable(flags);
 }
 
 /*
-- 
2.34.1
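
The effect of the changed WARN_ON condition in
is_affected_midr_range_list() can be checked with a small truth-table model.
This is a user-space sketch, not kernel code: warns_before(), warns_after()
and check_noop_without_ipipe() are hypothetical names, and the three booleans
stand in for the scope check, preemptible() and IS_ENABLED(CONFIG_IPIPE).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: wrong scope, or called while preemptible. */
bool warns_before(bool scope_is_local, bool preemptible)
{
	return !scope_is_local || preemptible;
}

/* Condition after the patch: the preemptible check is off with I-pipe. */
bool warns_after(bool scope_is_local, bool preemptible, bool ipipe_enabled)
{
	return !scope_is_local || (preemptible && !ipipe_enabled);
}

/* With I-pipe disabled, both conditions agree for every input. */
void check_noop_without_ipipe(void)
{
	for (int scope = 0; scope <= 1; scope++)
		for (int preempt = 0; preempt <= 1; preempt++)
			assert(warns_after(scope, preempt, false) ==
			       warns_before(scope, preempt));
}
```

In this model the patch changes behavior only when I-pipe is enabled, where a
preemptible caller no longer triggers the warning; a wrong scope still warns
in every configuration.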



waitqueue vs. mutex behavior

2022-03-15 Thread Matt Klass via Xenomai
Using Xenomai 3.0.10, with kernel 4.9.128-05789, on armv7, we're having
problems with the functionality of rtdm_waitqueues. The code was written by
a Xenomai-adept developer who has since left for greener pastures.

We have two functions that use rtdm_waitqueue_lock/unlock on the same
rtdm_waitqueue_t to manage access to a shared data structure. One is an
rtdm_task_t that runs periodically every 1ms, the second is an IOCTL
handler.

Problem: In some circumstances, one of the two functions will acquire the
lock, and access the shared data structure. But before the first function
releases the lock, the second function seems to also acquire the lock, and
begins its own access to the shared data structure. The second
function releases its lock after its work is complete, and then when the
first function tries to release the lock, it gets an "already unlocked"
error from Xenomai:

[Xenomai] lock 80f10020 already unlocked on CPU #0
  last owner = kernel/xenomai/sched.c:908 (___xnsched_run(), CPU #0)
[<8010ed78>] (unwind_backtrace) from [<8010b5f0>] (show_stack+0x10/0x14)
[<8010b5f0>] (show_stack) from [<801c8c08>] (xnlock_dbg_release+0x12c/0x138)
[<801c8c08>] (xnlock_dbg_release) from [<801be110>] (___xnlock_put+0xc/0x38)
[<801be110>] (___xnlock_put) from [<7f000434>]
(myengine_rtdm_waitqueue_unlock_with_num+0xf8/0x13c [engine_rtnet])
[<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num [engine_rtnet]) from
[<7f00ace8>] (engine_rtnet_periodic_task+0x604/0x660 [engine_rtnet])
[<7f00ace8>] (engine_rtnet_periodic_task [engine_rtnet]) from [<801c73ac>]
(kthread_trampoline+0x68/0xa4)
[<801c73ac>] (kthread_trampoline) from [<80147190>] (kthread+0x108/0x110)
[<80147190>] (kthread) from [<80107cd4>] (ret_from_fork+0x18/0x24)


These waitqueues were originally mutexes, and the above-mentioned adept
committed this change to waitqueues seven years ago with the following
comment: "Use Wait Queue instead of Mutex, because Mutex can't be called
from the non-RT context."

We'd expect that once one of the functions obtains the lock on the
waitqueue, the other would be blocked until the first function releases the
lock. It's quite possible, likely really, that we don't understand the
differences between mutexes and waitqueues. We've looked at the online
Xenomai documentation on waitqueues, but we have not been enlightened.


Would you have any suggestions on things we should do (or not do) to figure
out what's going on?


Many thanks,
Matt


Re: ipipe-5.4: arm64 regression

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 14:02, Greg Gallagher wrote:
> 
> 
> On Mon, Mar 14, 2022 at 8:33 AM Jan Kiszka wrote:
> 
> On 04.03.22 00:45, Greg Gallagher wrote:
> >
> >
> > On Thu, Mar 3, 2022 at 1:20 PM Jan Kiszka wrote:
> >
> >     On 02.03.22 16:44, Greg Gallagher wrote:
> >     >
> >     >
> >     > On Wed, Mar 2, 2022 at 1:48 AM Jan Kiszka wrote:
> >     >
> >     >     Hi Greg,
> >     >
> >     >     something is going wrong on arm64 with latest ipipe version,
> >     >     see e.g.
> >     >
> >     >     https://source.denx.de/Xenomai/xenomai-images/-/jobs/398455/raw
> >     >     (same thing seen on HiKey as well)
> >     >
> >     >     Could you have a look?
> >     >
> >     >     Thanks,
> >     >     Jan
> >     >
> >     >     --
> >     >     Siemens AG, Technology
> >     >     Competence Center Embedded Linux
> >     >
> >     >
> >     > I'll take a look, it will be close to the end of the week but i'll aim
> >     > to have it root caused by the weekend.
> >     >
> >
> >     Just tried locally with xenomai-images and qemu-arm64 (just run smokey):
> >
> >     [  408.747349] Kernel panic - not syncing: kernel stack overflow
> >     [  408.747591] CPU: 0 PID: 1577 Comm: systemd-journal Tainted: G        W         5.4.180+ #1
> >     [  408.747762] Hardware name: linux,dummy-virt (DT)
> >     [  408.747852] I-pipe domain: Xenomai
> >     [  408.747941] Call trace:
> >     ...
> >     [  408.761131]  do_debug_exception+0x94/0x240
> >     [  408.761255]  el1_dbg+0x18/0x8c
> >     [  408.761329]  this_cpu_has_cap+0x60/0x7c
> >     [  408.761423]  erratum_1418040_thread_switch+0x18/0x5c
> >     [  408.761534]  __switch_to+0xf8/0x154
> >     [  408.761622]  xnarch_switch_to+0x5c/0xc4
> >     [  408.761711]  pipeline_switch_to+0x14/0x84
> >     [  408.761803]  ___xnsched_run+0x154/0x240
> >     [  408.761889]  pipeline_schedule+0x30/0x40
> >     [  408.761999]  xnintr_core_clock_handler+0x250/0x260
> >     [  408.762107]  dispatch_irq_head+0x84/0x120
> >     [  408.762198]  __ipipe_dispatch_irq+0x19c/0x1c4
> >     [  408.762293]  __ipipe_grab_irq+0x5c/0xa0
> >     [  408.762377]  gic_handle_irq+0x54/0xb0
> >     [  408.762457]  handle_arch_irq_pipelined+0x14/0x60
> >     [  408.762557]  el0_irq_naked+0x5c/0x84
> >     [  408.762905] SMP: stopping secondary CPUs
> >
> >     This dbg trap from erratum_1418040_thread_switch looks suspicious, and
> >     if I had to bet, I would say it somehow relates to [1] which came with
> >     v5.4.176. But more logical would be [2] due to its switch from static to
> >     dynamic cpu_has_cap - but that is already in since v5.4.80...
> >
> >     Jan
> >
> >     [1] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b
> >
> >     [2] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/71eea3d3df94ccdcf3b616d27d68d6c028c1968f
> >
> >
> >
> >     --
> >     Siemens AG, Technology
> >     Competence Center Embedded Linux
> >
> >
> > I just built a new 

Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62

2022-03-15 Thread Scott Reed via Xenomai




On 3/15/22 7:32 AM, Jan Kiszka wrote:
> On 14.03.22 18:45, Scott Reed wrote:
>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>> Hello,
>>>>>
>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>
>>>>> The issue is that as soon as a PCIe MSI interrupt occurs, the system
>>>>> hangs with no message output on the serial console or in
>>>>> /var/log/messages.
>>>>>
>>>>> The platform I am working on is an "i.MX 6 Quad" and I am upgrading
>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to a 5.4.151
>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>
>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>
>>>>> I have had a stable system running for some time with Linux 4.14.62
>>>>> and Xenomai 3.07, although I did need to patch the PCIe driver [1].
>>>>> Some time back, I also tried to move to 4.14.110 with I-pipe and saw
>>>>> the same scenario of my system hanging on the first PCIe MSI
>>>>> interrupt, so I backed out to 4.14.62. Now I am trying to move to
>>>>> 5.4.151, but see the same hang.
>>>>
>>>> What about 4.19.y-cip? Specifically because of
>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>
>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>
>>> To do a quick test, I just applied the change from the commit you
>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>>> help (hang still occurs with first interrupt).
>>>
>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>
>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>> and I-pipe?
>>>>>
>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>> the problem. Would this be recommended?
>>>>
>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>
>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>> it will not be too much effort and report back.
>>
>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>> on my platform.
>>
>> The kernel boots without a problem, but the FEC Ethernet port on the
>> i.MX 6 is not working (cannot ping in or out).
>
> Do you have or did you have any custom patches on top?

Only a patch to add the device tree include (dtsi) for our imx6 SOC:
   μQ7-962 - μQseven standard module with NXP i.MX 6 Processor

>> I looked at the trace with Wireshark and it looks like, when pinging
>> out, the ARP packet is corrupt and therefore failing. The ARP
>> packet is corrupt in that it looks like various bits are flipped. For
>> example, the source MAC address should be
>>    00:09:cc:02:c1:b6
>> but is
>>    00:01:cc:02:01:36 or
>>    00:09:cc:02:c1:36
>> Wireshark also complains about the Frame check sequence
>> ([FCS Status: Unverified]).
>>
>> I can provide Wireshark dumps if someone is interested, but
>> at this point I do not want to fight with getting a 5.10.x kernel
>> to work, as I was pretty far along moving to a 5.4.x kernel with
>> ipipe before running into the original problem posted (with ipipe
>> my system freezes on the first PCIe MSI interrupt. Note: without
>> ipipe, I do not see any issues).
>>
>> As mentioned, I first saw this problem a while ago when trying
>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>> backed down to 4.14.62+ipipe, which works.
>>
>> I guess my next strategy is to try to figure out what changed
>> between 4.14.62+ipipe and 4.14.110+ipipe that triggers/causes
>> the hang, as I hope the delta between them is not too large.
>>
>> If anyone has other suggestions or tips, they are more than welcome.
>
> As I wrote before: try the latest 4.19-cip-ipipe first.

OK. Will do.

> Jan
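
The corrupted source MAC addresses quoted in this message can be compared bit
by bit with the expected one by XOR-ing them. The helper below is a minimal
sketch and not part of the original thread; the name mac_bit_diff() is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * XOR two 6-byte MAC addresses into diff[] and return the number of bits
 * that differ between them.
 */
unsigned int mac_bit_diff(const uint8_t a[6], const uint8_t b[6],
			  uint8_t diff[6])
{
	unsigned int flipped = 0;

	assert(a != NULL && b != NULL && diff != NULL);

	for (size_t i = 0; i < 6; i++) {
		diff[i] = (uint8_t)(a[i] ^ b[i]);
		/* Clear the lowest set bit until none is left. */
		for (uint8_t d = diff[i]; d != 0; d = (uint8_t)(d & (d - 1)))
			flipped++;
	}
	return flipped;
}
```

Applied to the addresses above, 00:09:cc:02:c1:b6 against 00:01:cc:02:01:36
gives the mask 00:08:00:00:c0:80 (four bits), and against 00:09:cc:02:c1:36
the mask 00:00:00:00:00:80 (one bit). In both captures every differing bit is
set in the expected address and clear in the observed one.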





Re: [PATCH v8 0/4] Kernel-Shark and libtraceevent plugins

2022-03-15 Thread Jan Kiszka via Xenomai
On 15.03.22 02:32, Chen, Hongzhan wrote:
> Looks good to me. Thanks for your help.
> 

Thanks for your effort! The last "few percentages" to get
autoconf/automake integration were indeed tougher than I thought.

Jan

> Regards
> 
> Hongzhan Chen
> 
> -----Original Message-----
> From: Jan Kiszka  
> Sent: Monday, March 14, 2022 6:01 PM
> To: xenomai@xenomai.org
> Cc: Chen, Hongzhan 
> Subject: [PATCH v8 0/4] Kernel-Shark and libtraceevent plugins
> 
> Changes in v8:
>  - drop explicit deps again - no longer needed after refreshing local
>    libtracecmd installation
> 
> Changes in v7:
>  - reworked installation
>  - fixed build of kernelshark plugin (missing dep)
>  - dropped applied first patch
> 
> Jan
> 
> 
> CC: Hongzhan Chen 
> 
> Hongzhan Chen (3):
>   build: add options to build plugins of kernelshark and libtraceevent
>   KernelShark: Add xenomai_cobalt_switch_events plugin for KernelShark
>   libtraceevent: Add xenomai_schedparams plugin for libtraceevent
> 
> Jan Kiszka (1):
>   libs: Silence installation output of libtool
> 
>  Makefile.am   |   4 +
>  configure.ac  |  35 
>  lib/alchemy/Makefile.am   |   2 +
>  lib/analogy/Makefile.am   |   2 +
>  lib/cobalt/Makefile.am|   2 +
>  lib/copperplate/Makefile.am   |   2 +
>  lib/mercury/Makefile.am   |   3 +-
>  lib/psos/Makefile.am  |   2 +
>  lib/smokey/Makefile.am|   2 +
>  lib/trank/Makefile.am |   3 +-
>  lib/vxworks/Makefile.am   |   2 +
>  tracing/Makefile.am   |  13 ++
>  tracing/README|  84 +
>  tracing/kernelshark/CobaltSwitchEvents.cpp| 156 
>  tracing/kernelshark/Makefile.am   |  20 ++
>  .../xenomai_cobalt_switch_events.c| 174 ++
>  .../xenomai_cobalt_switch_events.h|  58 ++
>  tracing/libtraceevent/Makefile.am |  19 ++
>  .../plugin_xenomai_schedparams.c  | 158 
>  19 files changed, 739 insertions(+), 2 deletions(-)
>  create mode 100644 tracing/Makefile.am
>  create mode 100644 tracing/README
>  create mode 100644 tracing/kernelshark/CobaltSwitchEvents.cpp
>  create mode 100644 tracing/kernelshark/Makefile.am
>  create mode 100644 tracing/kernelshark/xenomai_cobalt_switch_events.c
>  create mode 100644 tracing/kernelshark/xenomai_cobalt_switch_events.h
>  create mode 100644 tracing/libtraceevent/Makefile.am
>  create mode 100644 tracing/libtraceevent/plugin_xenomai_schedparams.c
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] cobalt/sched: Use nr_cpumask_bits instead of BITS_PER_LONG

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 22:13, Bezdeka, Florian via Xenomai wrote:
> On Mon, 2022-03-14 at 21:05 +, Bezdeka, Florian via Xenomai wrote:
>> Hi Richard,
>>
>> On Mon, 2022-03-14 at 21:38 +0100, Richard Weinberger via Xenomai
>> wrote:
>>> BITS_PER_LONG is too broad, the max number of usable bits is limited
>>> by nr_cpumask_bits.
>>
>> I agree, BITS_PER_LONG seems wrong. But couldn't it be too small as
>> well? It depends on NR_CPUS which might be > BITS_PER_LONG.
> 
> Sorry that was unclear. I assume that the size of the cpumask depends
> somehow on NR_CPUS, which might be > BITS_PER_LONG. So BITS_PER_LONG
> might be too small AND might be too broad.
> 

Good hint, I've adjusted this on merge.

Thanks,
Jan

>>
>> Regards,
>> Florian
>>
>>> Found while debugging a system with CONFIG_DEBUG_PER_CPU_MAPS enabled.
>>>
>>> Signed-off-by: Richard Weinberger 
>>> ---
>>>  kernel/cobalt/sched.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
>>> index 88c4951ed814..aa65fd7f5d63 100644
>>> --- a/kernel/cobalt/sched.c
>>> +++ b/kernel/cobalt/sched.c
>>> @@ -1370,7 +1370,7 @@ static int affinity_vfile_show(struct xnvfile_regular_iterator *it,
>>> unsigned long val = 0;
>>> int cpu;
>>>  
>>> -   for (cpu = 0; cpu < BITS_PER_LONG; cpu++)
>>> +   for (cpu = 0; cpu < nr_cpumask_bits; cpu++)
>>> if (cpumask_test_cpu(cpu, &cobalt_cpu_affinity))
>>> val |= (1UL << cpu);
>>>  
>>> @@ -1395,7 +1395,7 @@ static ssize_t affinity_vfile_store(struct xnvfile_input *input)
>>> affinity = xnsched_realtime_cpus; /* Reset to default. */
>>> else {
>>> cpumask_clear(&affinity);
>>> -   for (cpu = 0; cpu < BITS_PER_LONG; cpu++, val >>= 1) {
>>> +   for (cpu = 0; cpu < nr_cpumask_bits; cpu++, val >>= 1) {
>>> if (val & 1) {
>>> /*
>>>  * The new dynamic affinity must be a strict
>>
> 
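Florian's point above (the loop bound can be both too broad and too small) can be illustrated with a small userspace sketch. This is not the code that was merged; it only shows one way to clamp the loop to the smaller of the two limits when a CPU set is packed into a single unsigned long. The helper name pack_mask() and the byte-per-CPU array standing in for a kernel cpumask are assumptions made for illustration.

```c
#include <limits.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: cpus[] holds one byte per CPU (non-zero = set),
 * nr_bits plays the role of nr_cpumask_bits. The result can only carry
 * BITS_PER_LONG bits, so the loop stops at whichever limit is smaller.
 * Shifting 1UL by BITS_PER_LONG or more would be undefined behaviour.
 */
static unsigned long pack_mask(const unsigned char *cpus, unsigned int nr_bits)
{
	unsigned int limit = nr_bits < BITS_PER_LONG ? nr_bits : BITS_PER_LONG;
	unsigned long val = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < limit; cpu++)
		if (cpus[cpu])
			val |= 1UL << cpu;

	return val;
}
```

With nr_bits = 2 and only CPU 1 set, the sketch returns 0x2. With nr_bits = 128 and CPUs 0, 3 and 100 set, it returns 0x9, because CPU 100 cannot be represented in one unsigned long and is silently dropped.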

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 18:45, Scott Reed wrote:
> 
> 
> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>
>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>> Hello,
>>>>
>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>
>>>> The issue is that as soon as a PCIe MSI interrupt occurs, the system
>>>> hangs with no message output on the serial console or in
>>>> /var/log/messages.
>>>>
>>>> The platform I am working on is an "i.MX 6 Quad" and I am upgrading
>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to a 5.4.151
>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>
>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>
>>>> I have had a stable system running for some time with Linux 4.14.62 with
>>>> Xenomai 3.07, although I did need to patch the PCIe driver [1]. Also,
>>>> some time back, I tried to move to 4.14.110 with I-pipe and saw the
>>>> same scenario of my system hanging on the first PCIe MSI interrupt,
>>>> so I backed out to 4.14.62. Now I am trying to move to 5.4.151, but
>>>> see the same hang.
>>>
>>> What about 4.19.y-cip? Specifically because of
>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>
>>>
>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>
>> To do a quick test, I just applied the change from the commit you
>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>> help (hang still occurs with first interrupt).
>>
>>>

>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>
>>>> What are other people's experiences with using PCIe MSI interrupts
>>>> and I-pipe?
>>>>
>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>> the problem. Would this be recommended?
>>>
>>> If you can migrate your test with reasonable effort, yes, definitely.
>>
>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>> it will not be too much effort and report back.
> 
> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
> on my platform.
> 
> The kernel boots without a problem, but the FEC Ethernet port on the
> i.MX 6 is not working (cannot ping in or out).

Do you have or did you have any custom patches on top?

> 
> I looked at the trace with Wireshark and it looks like when pinging
> out that the ARP packet is corrupt and therefore failing. The ARP
> packet is corrupt in that it looks like various bits are flipped. For
> example, the source MAC address should be
>   00:09:cc:02:c1:b6
> but is
>   00:01:cc:02:01:36 or
>   00:09:cc:02:c1:36
> Wireshark also complains about the frame check sequence
> ([FCS Status: Unverified]).
> 
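To see exactly which bits differ between the expected source MAC address and the ones captured, the addresses can be XORed byte by byte. The following is a minimal sketch for that check only; the function name mac_bit_diff() is made up for illustration and is not part of any driver or tool mentioned in this thread.

```c
#include <stdint.h>

#define MAC_LEN 6

/*
 * Hypothetical helper: XOR the expected and observed MAC addresses.
 * Each set bit in diff[] marks a bit position that differs.
 * Returns the total number of differing bits.
 */
static int mac_bit_diff(const uint8_t *expected, const uint8_t *observed,
			uint8_t *diff)
{
	int flipped = 0;
	int i;

	for (i = 0; i < MAC_LEN; i++) {
		uint8_t b;

		diff[i] = expected[i] ^ observed[i];
		/* Count set bits by clearing the lowest one each round. */
		for (b = diff[i]; b; b &= (uint8_t)(b - 1))
			flipped++;
	}

	return flipped;
}
```

For 00:09:cc:02:c1:b6 versus 00:01:cc:02:01:36 the XOR is 00:08:00:00:c0:80 (four bits differ), and versus 00:09:cc:02:c1:36 it is 00:00:00:00:00:80 (one bit differs). In both captures every differing bit is a 1 in the expected address and a 0 in the observed one.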
> I can provide Wireshark dumps if someone is interested, but for me
> at this point I do not want to fight with getting a 5.10.x kernel
> to work as I was pretty far along moving to a 5.4.x kernel with
> ipipe before running into the original problem posted (with ipipe
> my system freezes on the first PCIe MSI interrupt. Note: without
> ipipe, I do not see any issues).
> 
> As mentioned, I first saw this problem a while ago when trying
> to move from 4.14.62+ipipe to 4.14.110+ipipe, and at that time
> I went back to 4.14.62+ipipe, which works.
> 
> I guess my next strategy is to try to figure out what changed
> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
> the hang as I hope the delta between them is not too large.
> 
> If anyone has other suggestions or tips, they are more than welcome.

As I wrote before: try the latest 4.19-cip-ipipe first.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux