Re: [Xenomai 3.1 PATCH v2] process: update clockfreq when receive corresponding event.

2022-06-17 Thread Jan Kiszka via Xenomai
On 17.06.22 08:08, Jan Kiszka via Xenomai wrote:
> On 16.06.22 02:18, Hongzhan Chen via Xenomai wrote:
>> 1. When there is clockfreq param passed down via command line, we
>>do not update clockfreq even if we receive event of updating clockfreq.
>>Or else, we update the clockfreq with notified value.
>> 2. At the same time, we would like to update clockfreq param showing
>>in sys filesystem after apply updated clockfreq.
>>
>> Signed-off-by: Hongzhan Chen 
>> ---
>>  include/cobalt/kernel/init.h  |  2 ++
>>  kernel/cobalt/init.c  | 12 
>>  kernel/cobalt/posix/process.c |  3 ++-
>>  3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/cobalt/kernel/init.h b/include/cobalt/kernel/init.h
>> index 41dd531a8..36d1ea290 100644
>> --- a/include/cobalt/kernel/init.h
>> +++ b/include/cobalt/kernel/init.h
>> @@ -51,4 +51,6 @@ void cobalt_remove_state_chain(struct notifier_block *nb);
>>  
>>  void cobalt_call_state_chain(enum cobalt_run_states newstate);
>>  
>> +void cobalt_update_clockfreq(unsigned long freq);
>> +
>>  #endif /* !_COBALT_KERNEL_INIT_H_ */
>> diff --git a/kernel/cobalt/init.c b/kernel/cobalt/init.c
>> index dbe321c3b..558030292 100644
>> --- a/kernel/cobalt/init.c
>> +++ b/kernel/cobalt/init.c
>> @@ -53,6 +53,16 @@ module_param_named(timerfreq, timerfreq_arg, ulong, 0444);
>>  static unsigned long clockfreq_arg;
>>  module_param_named(clockfreq, clockfreq_arg, ulong, 0444);
>>  
>> +static bool passed_clockfreq;
>> +
>> +void cobalt_update_clockfreq(unsigned long freq)
>> +{
>> +if (!passed_clockfreq) {
>> +xnclock_update_freq(freq);
>> +clockfreq_arg = freq;
>> +}
>> +}
>> +
>>  #ifdef CONFIG_SMP
>>  static unsigned long supported_cpus_arg = -1;
>>  module_param_named(supported_cpus, supported_cpus_arg, ulong, 0444);
>> @@ -150,6 +160,8 @@ static int __init mach_setup(void)
>>  
>>  if (clockfreq_arg == 0)
>>  clockfreq_arg = sysinfo.sys_hrclock_freq;
>> +else
>> +passed_clockfreq = clockfreq_arg != 0;
> 
> If you assign in the conditional branch, you don't need to check
> clockfreq_arg, just set it true.
> 

Fixed up while merging.
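
For illustration, the override logic after this fix-up can be sketched in plain user-space C. This is only a minimal sketch, not the Cobalt code: setup_clockfreq(), update_clockfreq() and applied_freq are hypothetical stand-ins for mach_setup(), cobalt_update_clockfreq() and the effect of xnclock_update_freq(). Only clockfreq_arg and passed_clockfreq are names from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Names taken from the patch. */
static unsigned long clockfreq_arg;	/* module parameter, 0 means "not given" */
static bool passed_clockfreq;		/* true if the user supplied clockfreq= */

/* Hypothetical stand-in for the frequency xnclock_update_freq() would apply. */
static unsigned long applied_freq;

/* Hypothetical model of the clockfreq handling in mach_setup(). */
static void setup_clockfreq(unsigned long cmdline_freq, unsigned long hw_freq)
{
	clockfreq_arg = cmdline_freq;
	passed_clockfreq = false;

	if (clockfreq_arg == 0)
		clockfreq_arg = hw_freq;
	else
		passed_clockfreq = true;	/* this branch already implies != 0 */

	/* The real code logs an error and aborts init on a null frequency. */
	assert(clockfreq_arg != 0);
	applied_freq = clockfreq_arg;
}

/* Model of cobalt_update_clockfreq(): a user-supplied value always wins. */
static void update_clockfreq(unsigned long freq)
{
	if (!passed_clockfreq) {
		applied_freq = freq;
		clockfreq_arg = freq;
	}
}
```

With no clockfreq= given, a later update_clockfreq(2903999000UL) replaces both the applied value and the one shown in sysfs. With clockfreq= given, the update is ignored.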

Both patches now applied. This one to stable/v3.1.x and stable/v3.0.x.
The I-pipe patch to 5.4, 4.19-cip and 4.4-cip. Tests are running, please
also check.

Thanks,
Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [I-PIPE PATCH v2] x86/tsc: I-PIPE : notify I-PIPE about updated clockfreq

2022-06-17 Thread Jan Kiszka via Xenomai
On 16.06.22 02:18, Hongzhan Chen via Xenomai wrote:
> When there is refined tsc clock, notify Xenomai to apply it.
> Linux may schedule a delayed work to refine tsc clock and update
> tsc_khz which happens after Xenomai finishes init, but tsc_scale and
> tsc_shift still keep the value depending on the original tsc clock
> which is outdated. The difference between two clocks may cause
> unexpected timing drift.
> 
> For example:
>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>   cat /sys/module/xenomai/parameters/clockfreq
>   2899886000
>   After patching, we like to use 2903.999 MHz.
> 
> Signed-off-by: Hongzhan Chen 
> ---
>  arch/x86/kernel/tsc.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 835856efd71f..37faaf9a9e6c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1294,6 +1294,7 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   u64 tsc_stop, ref_stop, delta;
>   unsigned long freq;
>   int cpu;
> + unsigned int ipipe_freq;
>  
>   /* Don't bother refining TSC on unstable systems */
>   if (tsc_unstable)
> @@ -1345,6 +1346,9 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + ipipe_freq = tsc_khz * 1000;
> +  __ipipe_report_clockfreq_update(ipipe_freq);
   ^^^
Wrong indentation - fixed up on merge.

Jan

> +
>   /* Update the sched_clock() rate to match the clocksource one */
>   for_each_possible_cpu(cpu)
>   set_cyc2ns_scale(tsc_khz, cpu, tsc_stop);

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [I-PIPE PATCH v2] x86/tsc: I-PIPE : notify I-PIPE about updated clockfreq

2022-06-17 Thread Jan Kiszka via Xenomai
On 17.06.22 08:09, Jan Kiszka via Xenomai wrote:
> On 16.06.22 02:18, Hongzhan Chen via Xenomai wrote:
>> When there is refined tsc clock, notify Xenomai to apply it.
>> Linux may schedule a delayed work to refine tsc clock and update
>> tsc_khz which happens after Xenomai finishes init, but tsc_scale and
>> tsc_shift still keep the value depending on the original tsc clock
>> which is outdated. The difference between two clocks may cause
>> unexpected timing drift.
>>
>> For example:
>>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>>   cat /sys/module/xenomai/parameters/clockfreq
>>   2899886000
>>   After patching, we like to use 2903.999 MHz.
>>
>> Signed-off-by: Hongzhan Chen 
>> ---
>>  arch/x86/kernel/tsc.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>> index 835856efd71f..37faaf9a9e6c 100644
>> --- a/arch/x86/kernel/tsc.c
>> +++ b/arch/x86/kernel/tsc.c
>> @@ -1294,6 +1294,7 @@ static void tsc_refine_calibration_work(struct 
>> work_struct *work)
>>  u64 tsc_stop, ref_stop, delta;
>>  unsigned long freq;
>>  int cpu;
>> +unsigned int ipipe_freq;
>>  
>>  /* Don't bother refining TSC on unstable systems */
>>  if (tsc_unstable)
>> @@ -1345,6 +1346,9 @@ static void tsc_refine_calibration_work(struct 
>> work_struct *work)
>>  /* Inform the TSC deadline clockevent devices about the recalibration */
>>  lapic_update_tsc_freq();
>>  
>> +ipipe_freq = tsc_khz * 1000;
> 
> You are still using a separate variable.
> 

Seems I missed a reply on that in the list - spam filters or who knows...

__ipipe_report_clockfreq_update() is defined in an awkward way that
enforces this, I see. It would have been better to define it like this:

#define __ipipe_report_clockfreq_update(freq)                         \
        do {                                                          \
                unsigned int __freq = (freq);                         \
                __ipipe_notify_kevent(IPIPE_KEVT_CLOCKFREQ, &__freq); \
        } while (0)

OTOH, we are in maintenance mode, so let's ignore this.
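
To show why the macro shape matters, here is a minimal user-space sketch. It assumes, as the remark above implies, that the existing macro takes the address of its argument and so needs an lvalue. demo_notify_kevent(), DEMO_KEVT_CLOCKFREQ and report_refined_tsc() are hypothetical stand-ins, not I-pipe symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event id; the real IPIPE_KEVT_CLOCKFREQ value is not shown here. */
#define DEMO_KEVT_CLOCKFREQ 1

static unsigned int last_reported_freq;

/* Hypothetical stand-in for __ipipe_notify_kevent(): takes a pointer payload. */
static int demo_notify_kevent(int kevent, void *data)
{
	assert(kevent == DEMO_KEVT_CLOCKFREQ && data != NULL);
	last_reported_freq = *(unsigned int *)data;
	return 0;
}

/*
 * Same shape as the suggested macro: the temporary lives inside the macro,
 * so the argument may be any expression, not only an lvalue.
 */
#define demo_report_clockfreq_update(freq)				\
	do {								\
		unsigned int freq_ = (freq);				\
		demo_notify_kevent(DEMO_KEVT_CLOCKFREQ, &freq_);	\
	} while (0)

/* Caller side: no separate ipipe_freq variable is needed. */
static unsigned int report_refined_tsc(unsigned int tsc_khz)
{
	demo_report_clockfreq_update(tsc_khz * 1000);
	return last_reported_freq;
}
```

The do { } while (0) wrapper keeps the temporary scoped to the macro and lets the macro be used like a single statement, for example in an unbraced if/else.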

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [I-PIPE PATCH v2] x86/tsc: I-PIPE : notify I-PIPE about updated clockfreq

2022-06-17 Thread Jan Kiszka via Xenomai
On 16.06.22 02:18, Hongzhan Chen via Xenomai wrote:
> When there is refined tsc clock, notify Xenomai to apply it.
> Linux may schedule a delayed work to refine tsc clock and update
> tsc_khz which happens after Xenomai finishes init, but tsc_scale and
> tsc_shift still keep the value depending on the original tsc clock
> which is outdated. The difference between two clocks may cause
> unexpected timing drift.
> 
> For example:
>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>   cat /sys/module/xenomai/parameters/clockfreq
>   2899886000
>   After patching, we like to use 2903.999 MHz.
> 
> Signed-off-by: Hongzhan Chen 
> ---
>  arch/x86/kernel/tsc.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 835856efd71f..37faaf9a9e6c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1294,6 +1294,7 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   u64 tsc_stop, ref_stop, delta;
>   unsigned long freq;
>   int cpu;
> + unsigned int ipipe_freq;
>  
>   /* Don't bother refining TSC on unstable systems */
>   if (tsc_unstable)
> @@ -1345,6 +1346,9 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + ipipe_freq = tsc_khz * 1000;

You are still using a separate variable.

> +  __ipipe_report_clockfreq_update(ipipe_freq);
> +
>   /* Update the sched_clock() rate to match the clocksource one */
>   for_each_possible_cpu(cpu)
>   set_cyc2ns_scale(tsc_khz, cpu, tsc_stop);

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [Xenomai 3.1 PATCH v2] process: update clockfreq when receive corresponding event.

2022-06-17 Thread Jan Kiszka via Xenomai
On 16.06.22 02:18, Hongzhan Chen via Xenomai wrote:
> 1. When there is clockfreq param passed down via command line, we
>do not update clockfreq even if we receive event of updating clockfreq.
>Or else, we update the clockfreq with notified value.
> 2. At the same time, we would like to update clockfreq param showing
>in sys filesystem after apply updated clockfreq.
> 
> Signed-off-by: Hongzhan Chen 
> ---
>  include/cobalt/kernel/init.h  |  2 ++
>  kernel/cobalt/init.c  | 12 
>  kernel/cobalt/posix/process.c |  3 ++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/cobalt/kernel/init.h b/include/cobalt/kernel/init.h
> index 41dd531a8..36d1ea290 100644
> --- a/include/cobalt/kernel/init.h
> +++ b/include/cobalt/kernel/init.h
> @@ -51,4 +51,6 @@ void cobalt_remove_state_chain(struct notifier_block *nb);
>  
>  void cobalt_call_state_chain(enum cobalt_run_states newstate);
>  
> +void cobalt_update_clockfreq(unsigned long freq);
> +
>  #endif /* !_COBALT_KERNEL_INIT_H_ */
> diff --git a/kernel/cobalt/init.c b/kernel/cobalt/init.c
> index dbe321c3b..558030292 100644
> --- a/kernel/cobalt/init.c
> +++ b/kernel/cobalt/init.c
> @@ -53,6 +53,16 @@ module_param_named(timerfreq, timerfreq_arg, ulong, 0444);
>  static unsigned long clockfreq_arg;
>  module_param_named(clockfreq, clockfreq_arg, ulong, 0444);
>  
> +static bool passed_clockfreq;
> +
> +void cobalt_update_clockfreq(unsigned long freq)
> +{
> + if (!passed_clockfreq) {
> + xnclock_update_freq(freq);
> + clockfreq_arg = freq;
> + }
> +}
> +
>  #ifdef CONFIG_SMP
>  static unsigned long supported_cpus_arg = -1;
>  module_param_named(supported_cpus, supported_cpus_arg, ulong, 0444);
> @@ -150,6 +160,8 @@ static int __init mach_setup(void)
>  
>   if (clockfreq_arg == 0)
>   clockfreq_arg = sysinfo.sys_hrclock_freq;
> + else
> + passed_clockfreq = clockfreq_arg != 0;

If you assign in the conditional branch, you don't need to check
clockfreq_arg, just set it true.

>  
>   if (clockfreq_arg == 0) {
>   printk(XENO_ERR "null clock frequency? Aborting.\n");
> diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
> index 6d1c1c427..f762fef2e 100644
> --- a/kernel/cobalt/posix/process.c
> +++ b/kernel/cobalt/posix/process.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1382,7 +1383,7 @@ static inline int handle_clockfreq_event(unsigned int 
> *p)
>  {
>   unsigned int newfreq = *p;
>  
> - xnclock_update_freq(newfreq);
> + cobalt_update_clockfreq(newfreq);
>  
>   return KEVENT_PROPAGATE;
>  }


-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 3.2.x] cobalt/x86: Account for changes to switch_fpu_finish in 5.4.182

2022-06-15 Thread Jan Kiszka via Xenomai
On 15.06.22 17:20, Jan Kiszka via Xenomai wrote:
> From: Jan Kiszka 
> 
> The signature of switch_fpu_finish changed in stable 5.4.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  kernel/cobalt/arch/x86/ipipe/thread.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cobalt/arch/x86/ipipe/thread.c 
> b/kernel/cobalt/arch/x86/ipipe/thread.c
> index dd97a5d32c..7e28903a42 100644
> --- a/kernel/cobalt/arch/x86/ipipe/thread.c
> +++ b/kernel/cobalt/arch/x86/ipipe/thread.c
> @@ -425,7 +425,9 @@ void xnarch_leave_root(struct xnthread *root)
>  #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)
>   /* restore current's fpregs */
>   __cpu_invalidate_fpregs_state();
> -#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,2,0)
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,4,182)
> + switch_fpu_finish(current);
> +#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5,2,0)
>   switch_fpu_finish(&current->thread.fpu);
>  #else
>   switch_fpu_finish(&current->thread.fpu, raw_smp_processor_id());

Actually 3.1.x material as well - we already supported 5.4 there.
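
As a rough illustration of how the version ladder in the patch resolves, here is a user-space sketch. It assumes the classic KERNEL_VERSION() encoding from linux/version.h (newer kernels additionally clamp the sublevel at 255). The enum and pick_variant() are hypothetical names used only for this example.

```c
#include <assert.h>

/* Classic encoding from <linux/version.h>; redefined here for user space. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical labels for the three call signatures in the patch. */
enum fpu_finish_variant {
	FPU_FINISH_TASK,	/* switch_fpu_finish(current) */
	FPU_FINISH_FPU,		/* switch_fpu_finish(fpu) */
	FPU_FINISH_FPU_CPU	/* switch_fpu_finish(fpu, cpu) */
};

/* Mirrors the #if / #elif / #else ladder, evaluated at run time. */
static enum fpu_finish_variant pick_variant(unsigned int version_code)
{
	/* The patch only reaches this ladder for 4.14 and later. */
	assert(version_code >= KERNEL_VERSION(4, 14, 0));

	if (version_code >= KERNEL_VERSION(5, 4, 182))
		return FPU_FINISH_TASK;
	if (version_code >= KERNEL_VERSION(5, 2, 0))
		return FPU_FINISH_FPU;
	return FPU_FINISH_FPU_CPU;
}
```

Note that a plain >= comparison against 5.4.182 also matches every later kernel series, so the check as written relies on 5.4 being the newest series built against this code path.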

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [xenomai-images][PATCH] Update Isar revision

2022-06-15 Thread Jan Kiszka via Xenomai
On 15.06.22 18:00, Jan Kiszka via Xenomai wrote:
> From: Jan Kiszka 
> 
> This brings changes to image types that are easy to account for. Also
> the image file name changed, so adjust readme and start-qemu.sh.
> 
> The update fixes logging issues, thus helps a lot with analyzing failing
> builds, specifically in CI.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  README.md   | 2 +-
>  conf/machine/hikey.conf | 2 +-
>  conf/machine/qemu-machine.inc   | 2 +-
>  conf/machine/x86-64-efi.conf| 2 +-
>  kas.yml | 2 +-
>  recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb | 2 ++
>  start-qemu.sh   | 2 +-
>  7 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/README.md b/README.md
> index b38e131..5f6c4c7 100644
> --- a/README.md
> +++ b/README.md
> @@ -41,7 +41,7 @@ armhf(`board-beagle-bone-black.yml`) and 
> arm64(`board-hikey.yml`) targets.
>  Each physical target will generate ready-to-boot images under
>  `build/tmp/deploy/images/`. To flash, e.g., the HiKey image to an SD card, 
> run
>  
> -dd 
> if=build/tmp/deploy/images/hikey/demo-image-hikey-xenomai-demo-hikey.wic.img \
> +dd 
> if=build/tmp/deploy/images/hikey/demo-image-hikey-xenomai-demo-hikey.wic \
> of=/dev/ bs=1M status=progress
>  
>  ## CI Build
> diff --git a/conf/machine/hikey.conf b/conf/machine/hikey.conf
> index 6ad4611..344a3de 100644
> --- a/conf/machine/hikey.conf
> +++ b/conf/machine/hikey.conf
> @@ -11,7 +11,7 @@
>  
>  DISTRO_ARCH = "arm64"
>  
> -IMAGE_FSTYPES ?= "wic-img"
> +IMAGE_FSTYPES ?= "wic"
>  IMAGER_INSTALL += "${GRUB_BOOTLOADER_INSTALL}"
>  
>  IMAGE_PREINSTALL_append = " firmware-ti-connectivity"
> diff --git a/conf/machine/qemu-machine.inc b/conf/machine/qemu-machine.inc
> index 7771a11..d9ee2dd 100644
> --- a/conf/machine/qemu-machine.inc
> +++ b/conf/machine/qemu-machine.inc
> @@ -9,7 +9,7 @@
>  # SPDX-License-Identifier: MIT
>  #
>  
> -IMAGE_FSTYPES = "ext4-img"
> +IMAGE_FSTYPES = "ext4"
>  
>  IMAGE_INSTALL_remove += "expand-on-first-boot"
>  ROOTFS_EXTRA = "1024"
> diff --git a/conf/machine/x86-64-efi.conf b/conf/machine/x86-64-efi.conf
> index cb3ed85..036bdcd 100644
> --- a/conf/machine/x86-64-efi.conf
> +++ b/conf/machine/x86-64-efi.conf
> @@ -11,5 +11,5 @@
>  
>  DISTRO_ARCH = "amd64"
>  
> -IMAGE_FSTYPES ?= "wic-img"
> +IMAGE_FSTYPES ?= "wic"
>  IMAGER_INSTALL += "${GRUB_BOOTLOADER_INSTALL}"
> diff --git a/kas.yml b/kas.yml
> index 6b3b32d..7a06aca 100644
> --- a/kas.yml
> +++ b/kas.yml
> @@ -22,7 +22,7 @@ repos:
>  
>isar:
>  url: https://github.com/ilbers/isar.git
> -refspec: a960a4e52c50ef4a15e3827685fa9cfffead
> +refspec: 660d23fe898297524628a058d6121d9e425694f9
>  layers:
>meta:
>  
> diff --git a/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb 
> b/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
> index 6231e9e..00e4058 100644
> --- a/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
> +++ b/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
> @@ -20,3 +20,5 @@ SRC_URI = " \
>  SRCREV = "${AUTOREV}"
>  
>  S = "${WORKDIR}/git"
> +
> +SRC_URI += "file://stable.patch"
> diff --git a/start-qemu.sh b/start-qemu.sh
> index 68516cc..33b6e65 100755
> --- a/start-qemu.sh
> +++ b/start-qemu.sh
> @@ -83,7 +83,7 @@ fi
>  shift 1
>  
>  ${QEMU_PATH}${QEMU} \
> - -drive 
> file=${IMAGE_PREFIX}.ext4.img,discard=unmap,if=none,id=disk,format=raw \
> + -drive 
> file=${IMAGE_PREFIX}.ext4,discard=unmap,if=none,id=disk,format=raw \
>   -m 1G -serial mon:stdio -netdev user,id=net \
>   -kernel ${IMAGE_PREFIX}-${KERNEL_SUFFIX} -append "${KERNEL_CMDLINE}" \
>   -initrd ${IMAGE_PREFIX}-initrd.img ${QEMU_EXTRA_ARGS} "$@"

Strike it. Missed some cases, and it contains a debug line.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



[xenomai-images][PATCH] Update Isar revision

2022-06-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

This brings changes to image types that are easy to account for. Also
the image file name changed, so adjust readme and start-qemu.sh.

The update fixes logging issues, thus helps a lot with analyzing failing
builds, specifically in CI.

Signed-off-by: Jan Kiszka 
---
 README.md   | 2 +-
 conf/machine/hikey.conf | 2 +-
 conf/machine/qemu-machine.inc   | 2 +-
 conf/machine/x86-64-efi.conf| 2 +-
 kas.yml | 2 +-
 recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb | 2 ++
 start-qemu.sh   | 2 +-
 7 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index b38e131..5f6c4c7 100644
--- a/README.md
+++ b/README.md
@@ -41,7 +41,7 @@ armhf(`board-beagle-bone-black.yml`) and 
arm64(`board-hikey.yml`) targets.
 Each physical target will generate ready-to-boot images under
 `build/tmp/deploy/images/`. To flash, e.g., the HiKey image to an SD card, run
 
-dd 
if=build/tmp/deploy/images/hikey/demo-image-hikey-xenomai-demo-hikey.wic.img \
+dd 
if=build/tmp/deploy/images/hikey/demo-image-hikey-xenomai-demo-hikey.wic \
of=/dev/ bs=1M status=progress
 
 ## CI Build
diff --git a/conf/machine/hikey.conf b/conf/machine/hikey.conf
index 6ad4611..344a3de 100644
--- a/conf/machine/hikey.conf
+++ b/conf/machine/hikey.conf
@@ -11,7 +11,7 @@
 
 DISTRO_ARCH = "arm64"
 
-IMAGE_FSTYPES ?= "wic-img"
+IMAGE_FSTYPES ?= "wic"
 IMAGER_INSTALL += "${GRUB_BOOTLOADER_INSTALL}"
 
 IMAGE_PREINSTALL_append = " firmware-ti-connectivity"
diff --git a/conf/machine/qemu-machine.inc b/conf/machine/qemu-machine.inc
index 7771a11..d9ee2dd 100644
--- a/conf/machine/qemu-machine.inc
+++ b/conf/machine/qemu-machine.inc
@@ -9,7 +9,7 @@
 # SPDX-License-Identifier: MIT
 #
 
-IMAGE_FSTYPES = "ext4-img"
+IMAGE_FSTYPES = "ext4"
 
 IMAGE_INSTALL_remove += "expand-on-first-boot"
 ROOTFS_EXTRA = "1024"
diff --git a/conf/machine/x86-64-efi.conf b/conf/machine/x86-64-efi.conf
index cb3ed85..036bdcd 100644
--- a/conf/machine/x86-64-efi.conf
+++ b/conf/machine/x86-64-efi.conf
@@ -11,5 +11,5 @@
 
 DISTRO_ARCH = "amd64"
 
-IMAGE_FSTYPES ?= "wic-img"
+IMAGE_FSTYPES ?= "wic"
 IMAGER_INSTALL += "${GRUB_BOOTLOADER_INSTALL}"
diff --git a/kas.yml b/kas.yml
index 6b3b32d..7a06aca 100644
--- a/kas.yml
+++ b/kas.yml
@@ -22,7 +22,7 @@ repos:
 
   isar:
 url: https://github.com/ilbers/isar.git
-refspec: a960a4e52c50ef4a15e3827685fa9cfffead
+refspec: 660d23fe898297524628a058d6121d9e425694f9
 layers:
   meta:
 
diff --git a/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb 
b/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
index 6231e9e..00e4058 100644
--- a/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
+++ b/recipes-xenomai/xenomai/xenomai_stable-3.0.x.bb
@@ -20,3 +20,5 @@ SRC_URI = " \
 SRCREV = "${AUTOREV}"
 
 S = "${WORKDIR}/git"
+
+SRC_URI += "file://stable.patch"
diff --git a/start-qemu.sh b/start-qemu.sh
index 68516cc..33b6e65 100755
--- a/start-qemu.sh
+++ b/start-qemu.sh
@@ -83,7 +83,7 @@ fi
 shift 1
 
 ${QEMU_PATH}${QEMU} \
-   -drive 
file=${IMAGE_PREFIX}.ext4.img,discard=unmap,if=none,id=disk,format=raw \
+   -drive 
file=${IMAGE_PREFIX}.ext4,discard=unmap,if=none,id=disk,format=raw \
-m 1G -serial mon:stdio -netdev user,id=net \
-kernel ${IMAGE_PREFIX}-${KERNEL_SUFFIX} -append "${KERNEL_CMDLINE}" \
-initrd ${IMAGE_PREFIX}-initrd.img ${QEMU_EXTRA_ARGS} "$@"
-- 
2.35.3



[PATCH 3.2.x] cobalt/x86: Account for changes to switch_fpu_finish in 5.4.182

2022-06-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

The signature of switch_fpu_finish changed in stable 5.4.

Signed-off-by: Jan Kiszka 
---
 kernel/cobalt/arch/x86/ipipe/thread.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/cobalt/arch/x86/ipipe/thread.c 
b/kernel/cobalt/arch/x86/ipipe/thread.c
index dd97a5d32c..7e28903a42 100644
--- a/kernel/cobalt/arch/x86/ipipe/thread.c
+++ b/kernel/cobalt/arch/x86/ipipe/thread.c
@@ -425,7 +425,9 @@ void xnarch_leave_root(struct xnthread *root)
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)
/* restore current's fpregs */
__cpu_invalidate_fpregs_state();
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,2,0)
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,4,182)
+   switch_fpu_finish(current);
+#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5,2,0)
switch_fpu_finish(&current->thread.fpu);
 #else
switch_fpu_finish(&current->thread.fpu, raw_smp_processor_id());
-- 
2.35.3



Re: [PATCH] rtdm/drvlib: Prevent pagefaults on arm on io mapping

2022-06-15 Thread Jan Kiszka via Xenomai
On 15.06.22 10:30, Philippe Gerum wrote:
> 
> Jan Kiszka  writes:
> 
>> On 15.06.22 09:54, Philippe Gerum wrote:
>>>
>>> Jan Kiszka via Xenomai  writes:
>>>
>>>> On 23.05.22 16:04, Gunter Grau via Xenomai wrote:
>>>>> From: Gunter Grau 
>>>>>
>>>>> When mapping io memory into userspace an extra simulated pagefault for all
>>>>> pages is added to prevent later pagefaults because of copy on write
>>>>> mechanisms. This happens only on architectures that have defined the
>>>>> needed cobalt_machine.prefault function.
>>>>>
>>>>> Signed-off-by: Gunter Grau 
>>>>> ---
>>>>>  kernel/cobalt/rtdm/drvlib.c | 10 +-
>>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/cobalt/rtdm/drvlib.c b/kernel/cobalt/rtdm/drvlib.c
>>>>> index 4eaf3a57c..db8431ee1 100644
>>>>> --- a/kernel/cobalt/rtdm/drvlib.c
>>>>> +++ b/kernel/cobalt/rtdm/drvlib.c
>>>>> @@ -1761,6 +1761,7 @@ static int mmap_iomem_helper(struct vm_area_struct 
>>>>> *vma, phys_addr_t pa)
>>>>>  {
>>>>>   pgprot_t prot = PAGE_SHARED;
>>>>>   unsigned long len;
>>>>> + int ret;
>>>>>  
>>>>>   len = vma->vm_end - vma->vm_start;
>>>>>  #ifndef CONFIG_MMU
>>>>> @@ -1774,8 +1775,15 @@ static int mmap_iomem_helper(struct vm_area_struct 
>>>>> *vma, phys_addr_t pa)
>>>>>  #endif
>>>>>   vma->vm_page_prot = pgprot_noncached(prot);
>>>>>  
>>>>> - return remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
>>>>> + ret = remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
>>>>>  len, vma->vm_page_prot);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + if (cobalt_machine.prefault)
>>>>> + cobalt_machine.prefault(vma);
>>>>> +
>>>>> + return ret;
>>>>>  }
>>>>>  
>>>>>  static int mmap_buffer_helper(struct rtdm_fd *fd, struct vm_area_struct 
>>>>> *vma)
>>>>
>>>> Wow, that was likely broken by the refactoring in c8e9e166, long ago.
>>>>
>>>> Applied to next
>>>>
>>>
>>> The prefault hook has always been specifically about COW-breaking, I/O
>>> memory has no business with this, so there is no point in having the
>>> iomem helper calling the prefaulting hook.
>>>
>>> I suspect that rtdm_mmap_to_user() should be called instead of
>>> rtdm_iomap_to_user() in the case at hand.
>>>
>>
>> If Gunter is mapping IO memory, rtdm_mmap_to_user is surely not the
>> right thing.
> 
> The xenomai2 implementation had a single helper dealing with I/O and
> kernel memory mappings (from virtual and linear memory) altogether. So I
> would not find impossible that wrong assumptions could be made from
> this implementation.
> 
>>
>> Could it be that the prefault callback also has the side effect of
>> setting all page table entries that would otherwise only be filled lazily?
> 
> Yes, this could prevent PTE misses, however I would expect minor faults
> to take place, those should be handled directly from the out-of-band
> stage.  Otherwise, this _might_ be an issue with the interrupt pipeline
> used. The prefault for ARM was kind of a hack to work around
> shortcomings from the generic COW-breaking mechanism implemented by the
> I-pipe on ARM (IIRC, this had to do with LPAE support).
> 

Gunter, could you take a function-trace of that very first fault without
your patch applied? Then we may see better what actually happens here,
specifically if your patch just happens to paper over a real issue.

BTW, have you only tested it with I-pipe kernels so far, or did you also
see the issue with dovetail (5.10+)?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] rtdm/drvlib: Prevent pagefaults on arm on io mapping

2022-06-15 Thread Jan Kiszka via Xenomai
On 15.06.22 09:54, Philippe Gerum wrote:
> 
> Jan Kiszka via Xenomai  writes:
> 
>> On 23.05.22 16:04, Gunter Grau via Xenomai wrote:
>>> From: Gunter Grau 
>>>
>>> When mapping io memory into userspace an extra simulated pagefault for all
>>> pages is added to prevent later pagefaults because of copy on write
>>> mechanisms. This happens only on architectures that have defined the
>>> needed cobalt_machine.prefault function.
>>>
>>> Signed-off-by: Gunter Grau 
>>> ---
>>>  kernel/cobalt/rtdm/drvlib.c | 10 +-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/cobalt/rtdm/drvlib.c b/kernel/cobalt/rtdm/drvlib.c
>>> index 4eaf3a57c..db8431ee1 100644
>>> --- a/kernel/cobalt/rtdm/drvlib.c
>>> +++ b/kernel/cobalt/rtdm/drvlib.c
>>> @@ -1761,6 +1761,7 @@ static int mmap_iomem_helper(struct vm_area_struct 
>>> *vma, phys_addr_t pa)
>>>  {
>>> pgprot_t prot = PAGE_SHARED;
>>> unsigned long len;
>>> +   int ret;
>>>  
>>> len = vma->vm_end - vma->vm_start;
>>>  #ifndef CONFIG_MMU
>>> @@ -1774,8 +1775,15 @@ static int mmap_iomem_helper(struct vm_area_struct 
>>> *vma, phys_addr_t pa)
>>>  #endif
>>> vma->vm_page_prot = pgprot_noncached(prot);
>>>  
>>> -   return remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
>>> +   ret = remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
>>>len, vma->vm_page_prot);
>>> +   if (ret)
>>> +   return ret;
>>> +
>>> +   if (cobalt_machine.prefault)
>>> +   cobalt_machine.prefault(vma);
>>> +
>>> +   return ret;
>>>  }
>>>  
>>>  static int mmap_buffer_helper(struct rtdm_fd *fd, struct vm_area_struct 
>>> *vma)
>>
>> Wow, that was likely broken by the refactoring in c8e9e166, long ago.
>>
>> Applied to next
>>
> 
> The prefault hook has always been specifically about COW-breaking, I/O
> memory has no business with this, so there is no point in having the
> iomem helper calling the prefaulting hook.
> 
> I suspect that rtdm_mmap_to_user() should be called instead of
> rtdm_iomap_to_user() in the case at hand.
> 

If Gunter is mapping IO memory, rtdm_mmap_to_user is surely not the
right thing.

Could it be that the prefault callback also has the side effect of
setting all page table entries that would otherwise only be filled lazily?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/2] x86: ipipe: Enable FPU tests unconditionally

2022-06-15 Thread Jan Kiszka via Xenomai
On 15.06.22 09:44, Bezdeka, Florian (T CED SES-DE) wrote:
> On Tue, 2022-06-14 at 20:11 +0200, Jan Kiszka wrote:
>> On 08.06.22 18:59, Bezdeka, Florian (T CED SES-DE) wrote:
>>> On Wed, 2022-06-08 at 17:02 +0200, Jan Kiszka wrote:
>>>> On 25.05.22 11:56, Florian Bezdeka wrote:
>>>>> Parts of the FPU tests were skipped when one of the following config
>>>>> options was enabled, shadowing a real test issue that was triggered by
>>>>> high load on the system. The options:
>>>>>   - CONFIG_X86_USE_3DNOW
>>>>>   - CONFIG_MD_RAID456
>>>>>   - CONFIG_MD_RAID456_MODULE
>>>>>
>>>>> As the FPU initialization is fixed now, we can enable the tests
>>>>> unconditionally.
>>>>>
>>>>> Signed-off-by: Florian Bezdeka 
>>>>> ---
>>>>>  .../arch/x86/ipipe/include/asm/xenomai/fptest.h | 13 -
>>>>>  1 file changed, 13 deletions(-)
>>>>>
>>>>> diff --git a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h 
>>>>> b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>>>> index ccf7afa11..7a2b17d75 100644
>>>>> --- a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>>>> +++ b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>>>> @@ -36,19 +36,6 @@ static inline void fp_init(void)
>>>>>
>>>>>  static inline int fp_linux_begin(void)
>>>>>  {
>>>>> -#if defined(CONFIG_X86_USE_3DNOW) \
>>>>> -   || defined(CONFIG_MD_RAID456) || defined(CONFIG_MD_RAID456_MODULE)
>>>>> -   /* Ther kernel uses x86 FPU, we can not also use it in our tests. */
>>>>> -   static int once = 0;
>>>>> -   if (!once) {
>>>>> -   once = 1;
>>>>> -   printk("%s:%d: Warning: Linux is compiled to use FPU in "
>>>>> -  "kernel-space.\nFor this reason, switchtest can not "
>>>>> -  "test using FPU in Linux kernel-space.\n",
>>>>> -  __FILE__, __LINE__);
>>>>> -   }
>>>>> -   return -EBUSY;
>>>>> -#endif /* 3DNow or RAID 456 */
>>>>> kernel_fpu_begin();
>>>>> /* kernel_fpu_begin() does no re-initialize the fpu context, but
>>>>>fp_regs_set() implicitely expects an initialized fpu context, so
>>>>
>>>> Hmm, I'm not yet fully convinced from reading both commit logs that the
>>>> one fix actually obsoletes this check. Did it really only paper over a
>>>> simple bug?
>>>
>>> I don't have the full history here, but it seems that this was kind of
>>> double protection.
>>>
>>> So far all tests did not bring up any further issues.
>>>
>>> On systems with RAID (=systems with one of the mentioned options
>>> enabled) FPU usage is much more likely and bugs would trigger more
>>> likely. I would like to enable the FPU systems especially on such
>>> systems.
>>>
>>> But: In case we have more undiscovered bugs in this area, it might
>>> happen that we damage a RAID based file system. It seems Gilles had
>>> such a system and tried to prevent FS damage this way.
>>>
>>
>> OK, it's just a test setup in the end - let's dare it.
>>
>> Applied both to stable/v3.2.
> 
> I have prepared backports for stable/v3.1.x and stable/v3.0.x as well.
> If there is interest I could easily send them out. I was just waiting
> for feedback to avoid reworking them all.

Great, please share then.

> 
> Do we try to keep the stable branches "synchronized" even for testing
> issues?

More or less. In this case, 3.2 could rush forward first, though, as
there is no testing on next.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [REMINDER] Mailing list rehosting

2022-06-15 Thread Jan Kiszka via Xenomai
On 21.05.22 12:48, Philippe Gerum via Xenomai wrote:
> 
> The Xenomai mailing list server will be migrated to [1] on June 18
> 2022. In the meantime, the current server will be operating as
> usual. However, the existing subscriptions to xenomai@xenomai.org will
> NOT be automatically transferred to the new server.
> 
> This means that:
> 
> - The Xenomai mailing list service will be inaccessible on June 18
>   from 4pm to 6pm CET for maintenance duties.
> 
> - Starting from June 18 at 6pm CET, all posts to xenomai@xenomai.org
>   will be forwarded to xeno...@lists.linux.dev, the current mailing list
>   archive will move to [2] afterwards.
> 
> - If you want to keep on receiving e-mails from the Xenomai mailing list
>   after this date, please subscribe to xeno...@lists.linux.dev by
>   sending an empty mail to xenomai+subscr...@lists.linux.dev.
> 
> NOTE: xeno...@lists.linux.dev is a public list, you only need to
> subscribe for receiving e-mails.
> 
> Thanks,
> 
> [1] https://subspace.kernel.org/lists.linux.dev.html
> [2] https://lore.kernel.org/xenomai/
> 

Reminder @all that this switch is happening this weekend. Make sure you
are already subscribed to the new list.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [I-PIPE Xenoami3.1 PATCH 2/2] x86/tsc: I-PIPE : notify I-PIPE about updated clockfreq

2022-06-14 Thread Jan Kiszka via Xenomai
On 27.05.22 08:22, Hongzhan Chen via Xenomai wrote:
> When there is refined tsc clock, notify Xenomai to apply it.

Xenomai is conceptually not known to I-pipe. This patch is about calling
the well-defined kevent hook when the TSC frequency changes after a
recalibration. ARM does something similar on frequency changes.

> Linux may schedule a delayed work to refine tsc clock and update
> tsc_khz which happens after Xenomai finishes init, but tsc_scale and
> tsc_shift still keep the value depending on the original tsc clock
> which is outdated. The difference between two clocks may cause
> unexpected timing drift.
> 
> For example:
>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>   cat /sys/module/xenomai/parameters/clockfreq
>   2899886000
>   After patching, we like to use 2903.999 MHz.
> 
> Signed-off-by: Hongzhan Chen 
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 835856efd71f..e2ca733d76ee 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1294,6 +1294,7 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   u64 tsc_stop, ref_stop, delta;
>   unsigned long freq;
>   int cpu;
> + unsigned int ipipe_freq;
>  
>   /* Don't bother refining TSC on unstable systems */
>   if (tsc_unstable)
> @@ -1345,6 +1346,10 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + /* notify xenomai about updated clockfreq */

Drop the comment, it's misleading.

> + ipipe_freq = tsc_khz * 1000;
> +  __ipipe_report_clockfreq_update(ipipe_freq);

Why not simply

__ipipe_report_clockfreq_update(tsc_khz * 1000);

?

> +
>   /* Update the sched_clock() rate to match the clocksource one */
>   for_each_possible_cpu(cpu)
>   set_cyc2ns_scale(tsc_khz, cpu, tsc_stop);

The patch looks like a valid candidate for 5.4, 4.19-cip and even 4.4-cip.
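
For a feeling of the magnitude, here is a small userspace sketch (plain C,
not kernel code) using the two frequencies from the log excerpt above. The
helper names are made up for illustration, and the division is only a
stand-in: according to the commit message, the real conversion goes through
tsc_scale/tsc_shift.

```c
#include <stdint.h>

/* Frequencies taken from the dmesg excerpt quoted above, in Hz. */
#define TSC_DETECTED_HZ 2899886000ULL
#define TSC_REFINED_HZ  2903999000ULL

/* Hypothetical helper: relative frequency error in ppm, rounded down. */
static uint64_t freq_error_ppm(uint64_t stale_hz, uint64_t actual_hz)
{
	uint64_t delta = actual_hz > stale_hz ? actual_hz - stale_hz
					      : stale_hz - actual_hz;

	return delta * 1000000ULL / actual_hz;
}

/*
 * Hypothetical helper: nanoseconds a converter reports for one real
 * second if it still assumes the stale frequency. actual_hz ticks
 * elapse per real second, and each one is scaled by 1e9 / stale_hz.
 */
static uint64_t ns_reported_per_second(uint64_t stale_hz, uint64_t actual_hz)
{
	return actual_hz * 1000000000ULL / stale_hz;
}
```

With these inputs the error comes out at about 1416 ppm, i.e. roughly
1.4 ms of drift per second as long as the stale value is in use.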

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [Cobalt Xenoami3.1 PATCH 1/2] process: update clockfreq when receive corresponding event.

2022-06-14 Thread Jan Kiszka via Xenomai
On 27.05.22 08:22, Hongzhan Chen via Xenomai wrote:
> 1. When there is clockfreq param passed down via command line, we
>do not update clockfreq even if we receive event of updating clockfreq.
>Or else, we update the clockfreq with notified value.
> 2. At the same time, we would like to update clockfreq param showing
>in sys filesystem after apply updated clockfreq.
> 
> Signed-off-by: Hongzhan Chen 
> 
> diff --git a/include/cobalt/kernel/init.h b/include/cobalt/kernel/init.h
> index 41dd531a8..4313b 100644
> --- a/include/cobalt/kernel/init.h
> +++ b/include/cobalt/kernel/init.h
> @@ -51,4 +51,6 @@ void cobalt_remove_state_chain(struct notifier_block *nb);
>  
>  void cobalt_call_state_chain(enum cobalt_run_states newstate);
>  
> +void cobalt_update_clockfreq_arg(unsigned long updatedclockfreq);

Just "freq" is sufficient as parameter name.

> +
>  #endif /* !_COBALT_KERNEL_INIT_H_ */
> diff --git a/kernel/cobalt/include/asm-generic/xenomai/machine.h 
> b/kernel/cobalt/include/asm-generic/xenomai/machine.h
> index 25764f989..aaa3edc97 100644
> --- a/kernel/cobalt/include/asm-generic/xenomai/machine.h
> +++ b/kernel/cobalt/include/asm-generic/xenomai/machine.h
> @@ -61,6 +61,7 @@ struct cobalt_pipeline {
>  #ifdef CONFIG_SMP
>   cpumask_t supported_cpus;
>  #endif
> + unsigned int passed_clockfreq;

bool

But I would rather make this a private flag in cobalt/init.c, see below.

>  };
>  
>  extern struct cobalt_pipeline cobalt_pipeline;
> diff --git a/kernel/cobalt/init.c b/kernel/cobalt/init.c
> index dbe321c3b..a6cfc1e06 100644
> --- a/kernel/cobalt/init.c
> +++ b/kernel/cobalt/init.c
> @@ -53,6 +53,19 @@ module_param_named(timerfreq, timerfreq_arg, ulong, 0444);
>  static unsigned long clockfreq_arg;
>  module_param_named(clockfreq, clockfreq_arg, ulong, 0444);
>  
> +void cobalt_update_clockfreq_arg(unsigned long updatedclockfreq)
> +{
> + spl_t s;
> +
> + xnlock_get_irqsave(, s);
> +
> + clockfreq_arg = updatedclockfreq;
> +
> + xnlock_put_irqrestore(, s);

How is synchronization supposed to work here?

And when is clockfreq_arg supposed to be consumed after mach_setup? By
whom? /sys/module/xenomai/parameters/?

To make this function more useful, it could perform the "was passed"
check and also set clockfreq_arg and call xnclock_update_freq() if it
was not defined via the command line.
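
Roughly along these lines, as a userspace mock-up rather than a finished
patch: xnclock_update_freq() is stubbed out, applied_freq only exists to
observe the stub, and mock_mach_setup() is a made-up stand-in for the
relevant part of mach_setup().

```c
#include <stdbool.h>

static unsigned long clockfreq_arg;	/* mirrors the module parameter */
static bool passed_clockfreq;		/* private flag, no pipeline field */
static unsigned long applied_freq;	/* mock state, only for this sketch */

/* Stub standing in for the real xnclock_update_freq(). */
static void xnclock_update_freq(unsigned long freq)
{
	applied_freq = freq;
}

/* Hypothetical stand-in for the clockfreq handling in mach_setup(). */
static void mock_mach_setup(unsigned long cmdline_freq, unsigned long hw_freq)
{
	passed_clockfreq = cmdline_freq != 0;
	clockfreq_arg = passed_clockfreq ? cmdline_freq : hw_freq;
	applied_freq = clockfreq_arg;	/* mock: pretend the clock is set up */
}

/* Does the "was passed" check itself, so callers need no condition. */
void cobalt_update_clockfreq(unsigned long freq)
{
	if (passed_clockfreq)
		return;

	xnclock_update_freq(freq);
	clockfreq_arg = freq;
}
```

The kevent handler could then call cobalt_update_clockfreq(newfreq) without
looking at any flag.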

> +
> +}
> +EXPORT_SYMBOL_GPL(cobalt_update_clockfreq_arg);

Why export this internal helper? Xenomai is always built into the kernel.

> +
>  #ifdef CONFIG_SMP
>  static unsigned long supported_cpus_arg = -1;
>  module_param_named(supported_cpus, supported_cpus_arg, ulong, 0444);
> @@ -148,8 +161,11 @@ static int __init mach_setup(void)
>   if (timerfreq_arg == 0)
>   timerfreq_arg = sysinfo.sys_hrtimer_freq;
>  
> - if (clockfreq_arg == 0)
> + if (clockfreq_arg == 0) {
>   clockfreq_arg = sysinfo.sys_hrclock_freq;
> + cobalt_pipeline.passed_clockfreq = 0;

Global variables are already zero-initialized.

> + } else
> + cobalt_pipeline.passed_clockfreq = 1;

cobalt_pipeline.passed_clockfreq = clockfreq_arg != 0;

>  
>   if (clockfreq_arg == 0) {
>   printk(XENO_ERR "null clock frequency? Aborting.\n");
> diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
> index 6d1c1c427..41ea47b0d 100644
> --- a/kernel/cobalt/posix/process.c
> +++ b/kernel/cobalt/posix/process.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1382,7 +1383,13 @@ static inline int handle_clockfreq_event(unsigned int 
> *p)
>  {
>   unsigned int newfreq = *p;
>  
> - xnclock_update_freq(newfreq);
> + /* when there is no para in commandline
> +  * passed down to set clockfreq

Comment does not tell anything that isn't in the code already.

> +  */
> + if (!cobalt_pipeline.passed_clockfreq) {
> + xnclock_update_freq(newfreq);
> + cobalt_update_clockfreq_arg(newfreq);
> + }

See my remark above: If you push xnclock_update_freq into
cobalt_update_clockfreq, you can simplify the logic here and just call
cobalt_update_clockfreq unconditionally.

>  
>   return KEVENT_PROPAGATE;
>  }

And now I understand that none of this is needed for 3.2, only the
I-pipe patch to send the kevent.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/2] x86: ipipe: Enable FPU tests unconditionally

2022-06-14 Thread Jan Kiszka via Xenomai
On 08.06.22 18:59, Bezdeka, Florian (T CED SES-DE) wrote:
> On Wed, 2022-06-08 at 17:02 +0200, Jan Kiszka wrote:
>> On 25.05.22 11:56, Florian Bezdeka wrote:
>>> Parts of the FPU tests were skipped when one of the following config
>>> options was enabled, shadowing a real test issue that was triggered by
>>> high load on the system. The options:
>>>   - CONFIG_X86_USE_3DNOW
>>>   - CONFIG_MD_RAID456
>>>   - CONFIG_MD_RAID456_MODULE
>>>
>>> As the FPU initialization is fixed now, we can enable the tests
>>> unconditionally.
>>>
>>> Signed-off-by: Florian Bezdeka 
>>> ---
>>>  .../arch/x86/ipipe/include/asm/xenomai/fptest.h | 13 -
>>>  1 file changed, 13 deletions(-)
>>>
>>> diff --git a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h 
>>> b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>> index ccf7afa11..7a2b17d75 100644
>>> --- a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>> +++ b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
>>> @@ -36,19 +36,6 @@ static inline void fp_init(void)
>>>
>>>  static inline int fp_linux_begin(void)
>>>  {
>>> -#if defined(CONFIG_X86_USE_3DNOW) \
>>> -   || defined(CONFIG_MD_RAID456) || defined(CONFIG_MD_RAID456_MODULE)
>>> -   /* Ther kernel uses x86 FPU, we can not also use it in our tests. */
>>> -   static int once = 0;
>>> -   if (!once) {
>>> -   once = 1;
>>> -   printk("%s:%d: Warning: Linux is compiled to use FPU in "
>>> -  "kernel-space.\nFor this reason, switchtest can not "
>>> -  "test using FPU in Linux kernel-space.\n",
>>> -  __FILE__, __LINE__);
>>> -   }
>>> -   return -EBUSY;
>>> -#endif /* 3DNow or RAID 456 */
>>> kernel_fpu_begin();
>>> /* kernel_fpu_begin() does no re-initialize the fpu context, but
>>>fp_regs_set() implicitely expects an initialized fpu context, so
>>
>> Hmm, I'm not yet fully convinced from reading both commit logs that the
>> one fix actually obsoletes this check. Did it really only paper over a
>> simple bug?
> 
> I don't have the full history here, but it seems that this was kind of
> double protection.
> 
> So far all tests did not bring up any further issues.
> 
> On systems with RAID (= systems with one of the mentioned options
> enabled) FPU usage is much more likely, and bugs would be more likely to
> trigger. I would like to enable the FPU tests especially on such
> systems.
> 
> But: In case we have more undiscovered bugs in this area, it might
> happen that we damage a RAID based file system. It seems Gilles had
> such a system and tried to prevent FS damage this way.
> 

OK, it's just a test setup in the end - let's dare it.

Applied both to stable/v3.2.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] rtdm/drvlib: Prevent pagefaults on arm on io mapping

2022-06-14 Thread Jan Kiszka via Xenomai
On 23.05.22 16:04, Gunter Grau via Xenomai wrote:
> From: Gunter Grau 
> 
> When mapping io memory into userspace an extra simulated pagefault for all
> pages is added to prevent later pagefaults because of copy on write
> mechanisms. This happens only on architectures that have defined the
> needed cobalt_machine.prefault function.
> 
> Signed-off-by: Gunter Grau 
> ---
>  kernel/cobalt/rtdm/drvlib.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cobalt/rtdm/drvlib.c b/kernel/cobalt/rtdm/drvlib.c
> index 4eaf3a57c..db8431ee1 100644
> --- a/kernel/cobalt/rtdm/drvlib.c
> +++ b/kernel/cobalt/rtdm/drvlib.c
> @@ -1761,6 +1761,7 @@ static int mmap_iomem_helper(struct vm_area_struct 
> *vma, phys_addr_t pa)
>  {
>   pgprot_t prot = PAGE_SHARED;
>   unsigned long len;
> + int ret;
>  
>   len = vma->vm_end - vma->vm_start;
>  #ifndef CONFIG_MMU
> @@ -1774,8 +1775,15 @@ static int mmap_iomem_helper(struct vm_area_struct 
> *vma, phys_addr_t pa)
>  #endif
>   vma->vm_page_prot = pgprot_noncached(prot);
>  
> - return remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
> + ret = remap_pfn_range(vma, vma->vm_start, pa >> PAGE_SHIFT,
>  len, vma->vm_page_prot);
> + if (ret)
> + return ret;
> +
> + if (cobalt_machine.prefault)
> + cobalt_machine.prefault(vma);
> +
> + return ret;
>  }
>  
>  static int mmap_buffer_helper(struct rtdm_fd *fd, struct vm_area_struct *vma)

Wow, that was likely broken by the refactoring in c8e9e166, long ago.

Applied to next
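
For readers following along, the control flow of the fix boils down to an
optional per-architecture hook that only runs after a successful mapping. A
minimal userspace sketch of that pattern follows; all mock_* names are
invented for illustration, and remap_ret merely stands in for the return
value of remap_pfn_range().

```c
#include <errno.h>
#include <stddef.h>

struct mock_vma {
	int prefaulted;		/* set by the hook, for observation only */
};

struct mock_machine {
	/* Optional hook, NULL on architectures that do not need it. */
	void (*prefault)(struct mock_vma *vma);
};

static void mock_arm_prefault(struct mock_vma *vma)
{
	vma->prefaulted = 1;
}

/* Mirrors the shape of the patched mmap_iomem_helper(). */
static int mock_mmap_iomem(const struct mock_machine *machine,
			   struct mock_vma *vma, int remap_ret)
{
	int ret = remap_ret;	/* stand-in for remap_pfn_range() */

	if (ret)
		return ret;

	if (machine->prefault)
		machine->prefault(vma);

	return ret;
}
```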

Thanks,
Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: compile conflict with Boost

2022-06-14 Thread Jan Kiszka via Xenomai
On 14.06.22 16:18, Russell Johnson via Xenomai wrote:
> From 452e8b2ca8ecd53571a6b1f5d8b9ab23cd67f99d Mon Sep 17 00:00:00 2001
> From: Russell Johnson 
> Date: Tue, 14 Jun 2022 08:10:14 -0600
> Subject: [PATCH] fixing conflict with C++ [[fallthough]], and maybe at some
>  point in the future with the C2X standard
> 

We are missing the DCO (Signed-off-by) here. It's a trivial change, but
we should uphold the policy for all trees.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated clockfreq.

2022-06-08 Thread Jan Kiszka via Xenomai
On 09.06.22 04:06, Chen, Hongzhan wrote:
>> -----Original Message-----
>> From: Jan Kiszka  
>> Sent: Wednesday, June 8, 2022 11:21 PM
>> To: Chen, Hongzhan ; xenomai@xenomai.org
>> Subject: Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated clockfreq.
>>
>> On 02.06.22 14:56, Chen, Hongzhan wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jan Kiszka  
>>>> Sent: Thursday, June 2, 2022 5:54 PM
>>>> To: Chen, Hongzhan ; xenomai@xenomai.org
>>>> Subject: Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated 
>>>> clockfreq.
>>>>
>>>> On 27.05.22 08:22, Hongzhan Chen via Xenomai wrote:
>>>>> When there is refined tsc clock, notify Xenomai to apply it.
>>>>> Linux may schedule a delayed work to refine tsc clock and update
>>>>> tsc_khz, which happens after Xenomai finishes init, but tsc_scale and
>>>>> tsc_shift still keep the values derived from the original tsc clock,
>>>>> which is outdated. The difference between two clocks may cause
>>>>> timing issue.
>>>>>
>>>>> For example:
>>>>>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>>>>>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>>>>>   cat /sys/module/xenomai/parameters/clockfreq
>>>>>   2899886000
>>>>>   After patching, we like to use 2903.999 MHz.
>>>>>
>>>>> The patchset includes IPIPE patch and cobalt-patch.
>>>>>
>>>>
>>>> Sounds reasonable, but you could help me with reviewing this by already
>>>> answering:
>>>>
>>>> - How does dovetail (and xenomai 3.2 or evl) address this?
>>>
>>> So far, I have not found a similar issue on Dovetail-based setups.
>>> Dovetail goes through the vDSO uniformly, so there is no such issue, but
>>> I-pipe keeps depending on the tsc_khz value it got at first for the
>>> translation, even after the TSC clock frequency was refined and changed.
>>
>> Right, that is the reason...
>>
>>>
>>>> - Why is this tagged "3.1" only?
>>>> - Which I-pipe series is this targeting (5.4, or also 4.19)?
>>>
>>> Currently, I have only reproduced this issue and verified the patch on
>>> 5.4.133 + Xenomai 3.1. But according to the report in [1], the issue can
>>> also be found on 4.19. I think my patch may work there, but I have not
>>> verified it on 4.19.
>>>
>>
>> Please check stable/v3.2.x first (or as well) as that is the latest
>> stable with I-pipe still included. Once the fix is merged there, we can
>> pick it for 3.1 and possibly even 3.0 as well.
> 
> The code between 3.2 and 3.1 involving this patchset is quite different now. 
> For 3.2, we already dropped code related to clockfreq parameter
> in 6d2989b6da73ec52fe8c990798be8a637e4db5b9 by Philippe.
> But in my patchset for 3.1, we have to update value of  clockfreq parameter 
> to show correct clock freq after we update to refined clock freq.
> To keep consistency between 3.1 and 3.2 for I-PIPE based code, do we need to 
> revert this part of code related to clockfreq parameter for 3.2?

Do you see the problem of 3.1+ipipe with 3.2 as well? I suspect so, but
please confirm. Then we do need a solution there, too. How to do that is
the next question; I need to dive into that again.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] x86/fpu: fix compile error without kernel_fpu_disabled()

2022-06-08 Thread Jan Kiszka via Xenomai
On 26.05.22 18:19, Philippe Gerum wrote:
> 
> jamiens...@163.com writes:
> 
>> From: Jamie Huang 
>>
>> In v5.18-evl-rebase, function kernel_fpu_disabled() has been removed in
>> commit 59f5ede3bc0f("x86/fpu: Prevent FPU state corruption"), so we will
>> get compile error when CONFIG_DOVETAIL is enabled:
>> arch/x86/kernel/fpu/core.c:931:6: error: implicit declaration of function 
>> ‘kernel_fpu_disabled’; did you mean ‘perf_pmu_disable’? 
>> [-Werror=implicit-function-declaration]
>> if (kernel_fpu_disabled()) {
>> ^~~
>> perf_pmu_disable
>> cc1: all warnings being treated as errors
>> So, fix it.
>>
>> Signed-off-by: Jamie Huang 
>> ---
>>  arch/x86/kernel/fpu/core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index 6a0d1e7f278f..c3adbbb73226 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -928,7 +928,7 @@ void fpu__suspend_inband(void)
>>   * preemption of an inband kernel context currently using the
>>   * fpu by a thread which resumes on the oob stage.
>>   */
>> -if (kernel_fpu_disabled()) {
>> +if (this_cpu_read(in_kernel_fpu)) {
>>  save_fpregs_to_fpstate(kfpu);
>>  __cpu_invalidate_fpregs_state();
>>  oob_fpu_set_preempt(>thread.fpu);
> 
> Merged adding the same fixup to the comment nearby, thanks.
> 

Could you backport to 5.15 as well?

https://gitlab.com/Xenomai/xenomai-hacker-space/-/jobs/2563340960

Thanks,
Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/2] net/drivers: Remove ARM64 restriction for fec driver

2022-06-08 Thread Jan Kiszka via Xenomai
On 23.05.22 16:19, Gunter Grau via Xenomai wrote:
> From: Johann Wiens 
> 
> As described in commit 04fab252f5d2ec3fe47be266f94714c3dda624bd, the
> fec driver was added with the intention to have it working on an
> i.MX8 target.
> But i.MX SoC specific quirks are also handled already since the driver
> was originally ported from the Linux kernel.
> It was now tested on i.MX6 and i.MX7 also (raw packet socket only).
> 
> Signed-off-by: Gunter Grau 
> ---
>  kernel/drivers/net/drivers/Kconfig | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/kernel/drivers/net/drivers/Kconfig 
> b/kernel/drivers/net/drivers/Kconfig
> index 3a90dd8ab..ecbad2cd0 100644
> --- a/kernel/drivers/net/drivers/Kconfig
> +++ b/kernel/drivers/net/drivers/Kconfig
> @@ -101,8 +101,6 @@ config XENO_DRIVERS_NET_DRV_MACB
>  
>  endif
>  
> -if ARM64
> -
>  config XENO_DRIVERS_NET_FEC
>  depends on XENO_DRIVERS_NET
>  tristate "Freescale FEC"
> @@ -113,8 +111,6 @@ config XENO_DRIVERS_NET_FEC
>  For built-in 10/100 Fast ethernet controller on Freescale i.MX
>  processors.
>  
> -endif
> -
>  source "drivers/xenomai/net/drivers/experimental/Kconfig"
>  
>  endmenu

Thanks, both applied. Let's see if the driver is already built in CI for
arm64 or if it takes another patch.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated clockfreq.

2022-06-08 Thread Jan Kiszka via Xenomai
On 02.06.22 14:56, Chen, Hongzhan wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Kiszka  
>> Sent: Thursday, June 2, 2022 5:54 PM
>> To: Chen, Hongzhan ; xenomai@xenomai.org
>> Subject: Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated clockfreq.
>>
>> On 27.05.22 08:22, Hongzhan Chen via Xenomai wrote:
>>> When there is refined tsc clock, notify Xenomai to apply it.
>>> Linux may schedule a delayed work to refine tsc clock and update
>>> tsc_khz, which happens after Xenomai finishes init, but tsc_scale and
>>> tsc_shift still keep the values derived from the original tsc clock,
>>> which is outdated. The difference between two clocks may cause
>>> timing issue.
>>>
>>> For example:
>>>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>>>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>>>   cat /sys/module/xenomai/parameters/clockfreq
>>>   2899886000
>>>   After patching, we like to use 2903.999 MHz.
>>>
>>> The patchset includes IPIPE patch and cobalt-patch.
>>>
>>
>> Sounds reasonable, but you could help me with reviewing this by already
>> answering:
>>
>> - How does dovetail (and xenomai 3.2 or evl) address this?
> 
> So far, I have not found a similar issue on Dovetail-based setups.
> Dovetail goes through the vDSO uniformly, so there is no such issue, but
> I-pipe keeps depending on the tsc_khz value it got at first for the
> translation, even after the TSC clock frequency was refined and changed.

Right, that is the reason...

> 
>> - Why is this tagged "3.1" only?
>> - Which I-pipe series is this targeting (5.4, or also 4.19)?
> 
> Currently, I have only reproduced this issue and verified the patch on
> 5.4.133 + Xenomai 3.1. But according to the report in [1], the issue can
> also be found on 4.19. I think my patch may work there, but I have not
> verified it on 4.19.
> 

Please check stable/v3.2.x first (or as well) as that is the latest
stable with I-pipe still included. Once the fix is merged there, we can
pick it for 3.1 and possibly even 3.0 as well.

Jan

> Regards
> 
> Hongzhan Chen
> 
> 
> [1]: https://xenomai.org/pipermail/xenomai/2022-May/047770.html
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 1/1] drivers/serial/16550A_pci.h: allow custom baud_base with pci cards

2022-06-08 Thread Jan Kiszka via Xenomai
On 27.05.22 23:48, Richard Weinberger via Xenomai wrote:
> On Fri, May 27, 2022 at 11:35 PM Konstantin Smola via Xenomai
>  wrote:
>>
>> pci probe was overwriting baud_base with default values, ignoring baud_base 
>> arguments passed in while loading driver.
>>
>> Signed-off-by: Konstantin Smola 
>> ---
>>  kernel/drivers/serial/16550A_pci.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/drivers/serial/16550A_pci.h 
>> b/kernel/drivers/serial/16550A_pci.h
>> index 91b0390..b747a10 100644
>> --- a/kernel/drivers/serial/16550A_pci.h
>> +++ b/kernel/drivers/serial/16550A_pci.h
>> @@ -244,7 +244,8 @@ static int rt_16550_pci_probe(struct pci_dev *pdev,
>> io[i] = base_addr + port * board->port_ofs;
>> irq[i] = pdev->irq;
>> irqtype[i] = board->irqtype;
>> -   baud_base[i] = board->baud_base;
>> + if (baud_base[i] == 0)
>> +  baud_base[i] = board->baud_base;
> 
> This assumes that the i-th baud_base you specify as module parameter
> will also be
> the i-th probed PCI driver.
> But PCI can probe devices in any order, even userspace can unbind/bind
> them at will.
> 

Yeah, this whole index-based param passing at least became fragile over
the past 15 years (time passed...), if it wasn't back then already. But
I think this is not getting worse with this patch, is it?

What is getting a little bit worse is the coding style. Please fix the
indentation.
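
Something like the following is what I would expect, shown here as a
standalone sketch with tab indentation. MAX_DEVICES and
assign_baud_base() are made up for this example; the driver does the
assignment inline in rt_16550_pci_probe().

```c
#define MAX_DEVICES 8

/* Mirrors the per-index module parameter array; 0 means "not set". */
static unsigned int baud_base[MAX_DEVICES];

/*
 * Hypothetical helper: keep a user-provided baud_base for slot i and
 * fall back to the board default otherwise. The slot index follows
 * probe order, so this inherits the fragility discussed above.
 */
static void assign_baud_base(int i, unsigned int board_default)
{
	if (baud_base[i] == 0)
		baud_base[i] = board_default;
}
```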

Jan

PS: Some "rtsetserial" to claim and parameterize an RTDM UART would
likely be nicer, long-term.

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/2] x86: ipipe: Enable FPU tests unconditionally

2022-06-08 Thread Jan Kiszka via Xenomai
On 25.05.22 11:56, Florian Bezdeka wrote:
> Parts of the FPU tests were skipped when one of the following config
> options was enabled, shadowing a real test issue that was triggered by
> high load on the system. The options:
>   - CONFIG_X86_USE_3DNOW
>   - CONFIG_MD_RAID456
>   - CONFIG_MD_RAID456_MODULE
> 
> As the FPU initialization is fixed now, we can enable the tests
> unconditionally.
> 
> Signed-off-by: Florian Bezdeka 
> ---
>  .../arch/x86/ipipe/include/asm/xenomai/fptest.h | 13 -
>  1 file changed, 13 deletions(-)
> 
> diff --git a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h 
> b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
> index ccf7afa11..7a2b17d75 100644
> --- a/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
> +++ b/kernel/cobalt/arch/x86/ipipe/include/asm/xenomai/fptest.h
> @@ -36,19 +36,6 @@ static inline void fp_init(void)
>  
>  static inline int fp_linux_begin(void)
>  {
> -#if defined(CONFIG_X86_USE_3DNOW) \
> - || defined(CONFIG_MD_RAID456) || defined(CONFIG_MD_RAID456_MODULE)
> - /* Ther kernel uses x86 FPU, we can not also use it in our tests. */
> - static int once = 0;
> - if (!once) {
> - once = 1;
> - printk("%s:%d: Warning: Linux is compiled to use FPU in "
> -"kernel-space.\nFor this reason, switchtest can not "
> -"test using FPU in Linux kernel-space.\n",
> -__FILE__, __LINE__);
> - }
> - return -EBUSY;
> -#endif /* 3DNow or RAID 456 */
>   kernel_fpu_begin();
>   /* kernel_fpu_begin() does no re-initialize the fpu context, but
>  fp_regs_set() implicitely expects an initialized fpu context, so

Hmm, I'm not yet fully convinced from reading both commit logs that the
one fix actually obsoletes this check. Did it really only paper over a
simple bug?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] switchtest: Cleanup FPU tests after ipipe -> dovetail transition

2022-06-08 Thread Jan Kiszka via Xenomai
On 25.05.22 09:47, Florian Bezdeka wrote:
> FPU usage in kernel space was allowed / enabled with ipipe, but is no
> longer available for dovetail based kernels. That allows us to clean
> up the FPU related tests of the switchtest utility.
> 
> fp_kernel_supported() can be removed as all supported architectures
> returned 0 already. That allows us to remove the FPU feature test flag
> RTTST_SWTEST_USE_FPU as well.
> 
> Signed-off-by: Florian Bezdeka 
> ---
>  include/rtdm/uapi/testing.h   |  1 -
>  .../arch/arm/include/asm/xenomai/fptest.h |  5 
>  .../arch/arm64/include/asm/xenomai/fptest.h   |  5 
>  .../arch/x86/include/asm/xenomai/fptest.h | 11 -
>  kernel/drivers/testing/switchtest.c   | 23 ---
>  testsuite/switchtest/switchtest.c |  1 -
>  6 files changed, 46 deletions(-)
> 
> diff --git a/include/rtdm/uapi/testing.h b/include/rtdm/uapi/testing.h
> index f8207b8c7..49f0b7f25 100644
> --- a/include/rtdm/uapi/testing.h
> +++ b/include/rtdm/uapi/testing.h
> @@ -68,7 +68,6 @@ struct rttst_swtest_task {
>  
>  /* Possible values for struct rttst_swtest_task::flags. */
>  #define RTTST_SWTEST_FPU 0x1
> -#define RTTST_SWTEST_USE_FPU 0x2 /* Only for kernel-space tasks. */
>  #define RTTST_SWTEST_FREEZE  0x4 /* Only for kernel-space tasks. */
>  
>  struct rttst_swtest_dir {
> diff --git a/kernel/cobalt/arch/arm/include/asm/xenomai/fptest.h 
> b/kernel/cobalt/arch/arm/include/asm/xenomai/fptest.h
> index ca1752206..fc177fcb5 100644
> --- a/kernel/cobalt/arch/arm/include/asm/xenomai/fptest.h
> +++ b/kernel/cobalt/arch/arm/include/asm/xenomai/fptest.h
> @@ -30,11 +30,6 @@
>  
>  #include 
>  
> -static inline int fp_kernel_supported(void)
> -{
> - return 0;
> -}
> -
>  static inline int fp_linux_begin(void)
>  {
>   return -ENOSYS;
> diff --git a/kernel/cobalt/arch/arm64/include/asm/xenomai/fptest.h 
> b/kernel/cobalt/arch/arm64/include/asm/xenomai/fptest.h
> index bc9dc342e..0958d5e2a 100644
> --- a/kernel/cobalt/arch/arm64/include/asm/xenomai/fptest.h
> +++ b/kernel/cobalt/arch/arm64/include/asm/xenomai/fptest.h
> @@ -13,11 +13,6 @@
>  
>  #define have_fp (ELF_HWCAP & HWCAP_FP)
>  
> -static inline int fp_kernel_supported(void)
> -{
> - return 0;
> -}
> -
>  static inline int fp_linux_begin(void)
>  {
>   return -ENOSYS;
> diff --git a/kernel/cobalt/arch/x86/include/asm/xenomai/fptest.h 
> b/kernel/cobalt/arch/x86/include/asm/xenomai/fptest.h
> index 83a6413d5..55818f853 100644
> --- a/kernel/cobalt/arch/x86/include/asm/xenomai/fptest.h
> +++ b/kernel/cobalt/arch/x86/include/asm/xenomai/fptest.h
> @@ -24,17 +24,6 @@
>  #include 
>  #include 
>  
> -/*
> - * We do NOT support out-of-band FPU operations in kernel space for a
> - * reason: this is a mess. Out-of-band FPU is just fine and makes a
> - * lot of sense for many real-time applications, but you have to do
> - * that from userland.
> - */
> -static inline int fp_kernel_supported(void)
> -{
> - return 0;
> -}
> -
>  static inline int fp_linux_begin(void)
>  {
>   kernel_fpu_begin();
> diff --git a/kernel/drivers/testing/switchtest.c 
> b/kernel/drivers/testing/switchtest.c
> index b5bc256df..9072717d5 100644
> --- a/kernel/drivers/testing/switchtest.c
> +++ b/kernel/drivers/testing/switchtest.c
> @@ -416,9 +416,6 @@ static void rtswitch_ktask(void *cookie)
>   rtswitch_pend_rt(ctx, task->base.index);
>  
>   while (!rtdm_task_should_stop()) {
> - if (task->base.flags & RTTST_SWTEST_USE_FPU)
> - fp_regs_set(fp_features, task->base.index + i * 1000);
> -
>   switch(i % 3) {
>   case 0:
>   /* to == from means "return to last task" */
> @@ -437,17 +434,6 @@ static void rtswitch_ktask(void *cookie)
>   rtswitch_to_rt(ctx, task->base.index, to);
>   }
>  
> - if (task->base.flags & RTTST_SWTEST_USE_FPU) {
> - expected = task->base.index + i * 1000;
> - fp_val = fp_regs_check(fp_features, expected, report);
> -
> - if (fp_val != expected) {
> - if (task->base.flags & RTTST_SWTEST_FREEZE)
> - xntrace_user_freeze(0, 0);
> - handle_ktask_error(ctx, fp_val);
> - }
> - }
> -
>   if (++i == 400)
>   i = 0;
>   }
> @@ -465,15 +451,6 @@ static int rtswitch_create_ktask(struct rtswitch_context 
> *ctx,
>   char name[30];
>   int err;
>  
> - /*
> -  * Silently disable FP tests in kernel if FPU is not supported
> -  * there. Typical case is math emulation support: we can use
> -  * it from userland as a synthetic FPU, but there is no sane
> -  * way to use it from kernel-based threads (Xenomai or Linux).
> -  */
> - if (!fp_kernel_supported())
> - 

Re: [Cobalt Xenoami3.1 PATCH 0/2] notify Xenomai udpated clockfreq.

2022-06-02 Thread Jan Kiszka via Xenomai
On 27.05.22 08:22, Hongzhan Chen via Xenomai wrote:
> When there is refined tsc clock, notify Xenomai to apply it.
> Linux may schedule a delayed work to refine tsc clock and update
> tsc_khz, which happens after Xenomai finishes init, but tsc_scale and
> tsc_shift still keep the values derived from the original tsc clock,
> which is outdated. The difference between two clocks may cause
> timing issue.
> 
> For example:
>   [ 0.001731] tsc: Detected 2899.886 MHz TSC
>   [ 5.588387] tsc: Refined TSC clocksource calibration: 2903.999 MHz
>   cat /sys/module/xenomai/parameters/clockfreq
>   2899886000
>   After patching, we like to use 2903.999 MHz.
> 
> The patchset includes IPIPE patch and cobalt-patch.
> 

Sounds reasonable, but you could help me with reviewing this by already
answering:

 - How does dovetail (and xenomai 3.2 or evl) address this?
 - Why is this tagged "3.1" only?
 - Which I-pipe series is this targeting (5.4, or also 4.19)?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [RFC] Rust API for evl

2022-05-31 Thread Jan Kiszka via Xenomai
On 31.05.22 15:37, Philippe Gerum via Xenomai wrote:
> 
> I've been getting my feet wet with Rust for a few weeks now, assessing
> the real-time latency figures I could get from an existing (C++)
> application once fully rewritten in this language.  It turned out that
> performance was on par with the original implementation with memory
> safety on top, among other upsides (like having quite some fun coding in
> Rust in the first place).
> 
> Having Rust as a Tier 1 language for Xenomai4/evl along with the
> existing C interface definitely makes sense to me. The goal would be to
> have a 'revl' interface providing the EVL services the idiomatic Rust
> way, available as a crate on top of the FFI bindings to libevl which
> have just landed [1].
> 
> Whether you are a Rustacean or not, if you are willing to discuss and
> help with this, let me know.
> 
> [1] https://source.denx.de/Xenomai/xenomai4/evl-sys.git
> 

I think this is a very interesting experiment, and I'm excited to see
that there are apparently no RT traps in the runtime. Do you have some
example code somewhere as well?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Reminder: Xenomai community call on Wednesday, Jun 1, 2022, UTC 7:00 AM

2022-05-31 Thread Jan Kiszka via Xenomai
On 01.06.22 05:22, Chen, Hongzhan via Xenomai wrote:
> 
> We have the Xenomai community call today.
> 
> Topics may include but are not limited to upstream/downstream project 
> plans, status updates, and technical discussions. It's an open online 
> meeting that anyone can join and ask questions.
> 

Unfortunately, I won't be able to join today.

Jan

PS: Looking forward to meeting a few of you in person at the high
altitude rt workshop next week!

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] cobalt: ipipe: intr: Fix return value check of ipipe_set_irq_affinity

2022-05-24 Thread Jan Kiszka via Xenomai
On 20.05.22 16:29, Gunter Grau via Xenomai wrote:
> From: Julian Haller 
> 
> ipipe_set_irq_affinity directly returns the value of the regular
> irq_set_affinity method. As described in irq.h in the linux kernel,
> the following return values indicate a success:
> 
> /*
>  * Return value for chip->irq_set_affinity()
>  *
>  * IRQ_SET_MASK_OK  - OK, core updates irq_common_data.affinity
>  * IRQ_SET_MASK_OK_NOCOPY - OK, chip did update irq_common_data.affinity
>  * IRQ_SET_MASK_OK_DONE - Same as IRQ_SET_MASK_OK for core. Special code to
>  *support stacked irqchips, which indicates skipping
>  *all descendent irqchips.
>  */
> enum {
> IRQ_SET_MASK_OK = 0,
> IRQ_SET_MASK_OK_NOCOPY,
> IRQ_SET_MASK_OK_DONE,
> };
> 
> As one example, the GIC in i.MX6 devices returns IRQ_SET_MASK_OK_DONE
> on success. Fix the xnintr_attach function by treating all non-negative
> return values as success.
> 
> Signed-off-by: Gunter Grau 
> ---
>  kernel/cobalt/ipipe/intr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/cobalt/ipipe/intr.c b/kernel/cobalt/ipipe/intr.c
> index 378c7f07d..cb15597f7 100644
> --- a/kernel/cobalt/ipipe/intr.c
> +++ b/kernel/cobalt/ipipe/intr.c
> @@ -869,7 +869,7 @@ int xnintr_attach(struct xnintr *intr, void *cookie, 
> const cpumask_t *cpumask)
>   return -EINVAL;
>   }
>   ret = ipipe_set_irq_affinity(intr->irq, *effective_mask);
> - if (ret)
> + if (ret < 0)
>   return ret;
>  #endif /* CONFIG_SMP */
>  

Thanks, applied to the 3.2 stable tree. Other versions are not affected.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] x86/fpu: fix compile error without kernel_fpu_disabled()

2022-05-24 Thread Jan Kiszka via Xenomai
On 16.05.22 17:59, Bezdeka, Florian via Xenomai wrote:
> On Mon, 2022-05-16 at 23:32 +0800, Jamie Huang via Xenomai wrote:
>> From: Jamie 
> 
> It's up to Jan but I guess a complete name would be nice here. Same
> for the Signed-off-by below.

Philippe is maintaining the dovetail tree but, yes, we generally need a
real name to be able to check back regarding
https://developercertificate.org if any questions should arise in the
future.

Jan

> 
>>
>> In v5.18-evl-rebase, function kernel_fpu_disabled() has been removed in
>> commit 59f5ede3bc0f("x86/fpu: Prevent FPU state corruption"), so we will
>> get compile error when CONFIG_DOVETAIL is enabled:
>> arch/x86/kernel/fpu/core.c:931:6: error: implicit declaration of function 
>> ‘kernel_fpu_disabled’; did you mean ‘perf_pmu_disable’? 
>> [-Werror=implicit-function-declaration]
>>   if (kernel_fpu_disabled()) {
>>   ^~~
>>   perf_pmu_disable
>> cc1: all warnings being treated as errors
>> So, fix it.
>>
>> Signed-off-by: Jamie 
>> ---
>>  arch/x86/kernel/fpu/core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> index 6a0d1e7f278f..c3adbbb73226 100644
>> --- a/arch/x86/kernel/fpu/core.c
>> +++ b/arch/x86/kernel/fpu/core.c
>> @@ -928,7 +928,7 @@ void fpu__suspend_inband(void)
>>   * preemption of an inband kernel context currently using the
>>   * fpu by a thread which resumes on the oob stage.
>>   */
>> -if (kernel_fpu_disabled()) {
>> +if (this_cpu_read(in_kernel_fpu)) {
>>  save_fpregs_to_fpstate(kfpu);
>>  __cpu_invalidate_fpregs_state();
>>  oob_fpu_set_preempt(>thread.fpu);
> 
> Reviewed-By: Florian Bezdeka 
> 
> I noticed that while investigating the FPU test issue that I already
> reported, but missed that we already have a dovetail branch which is
> affected.
> 
> Non-Git reference would be
> https://lore.kernel.org/lkml/20220501193102.588689...@linutronix.de/
> 
> 
> Best regards,
> Florian Bezdeka
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Machine freezes under Ubuntu 20.04

2022-05-24 Thread Jan Kiszka via Xenomai
On 16.05.22 16:22, Arturo Laurenzi wrote:
> On Mon, 16 May 2022 at 15:56, Arturo Laurenzi
>  wrote:
>>
>>> The call-stack is not reported as fully reliable. Are you running with
>>> CONFIG_DEBUG_INFO=y? Do you have CONFIG_UNWINDER_ORC=y?
>>>
>>> Assuming it is reliable, we may try to run some irq-work that no longer
>>> exists. But that's speculation.
>>>
>>> What may help here is ftrace dump on panic, see
>>> https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#ftrace-dump-on-oops
>>>
>>> Jan
>>
>> Hi Jan, all,
>> sorry for getting back to you after so long. Our test machine hasn't
>> been available for a period of time, due to independent reasons.
>> We now enable debug information when building the kernel, as well as
>> ftrace support. You might want to check the attached .config
>> for correctness.
> 
> Sorry, missing attachment!
> 
>> We add the cmd line parameter that you suggested
>> (ftrace_dump_on_oops), and enable function_graphs as current tracer.
>> Here's the resulting serial dump.
>>
>> [  444.320303] kernel tried to execute NX-protected page - exploit
>> attempt? (uid: 1000)
>> [  444.320306] BUG: unable to handle page fault for address: 963f5a327040
>> [  444.320309] #PF: supervisor instruction fetch in kernel mode
>> [  444.320311] #PF: error_code(0x0011) - permissions violation
>> [  444.320313] PGD 44e001067 P4D 44e001067 PUD 800181e3
>> [  444.320323] Oops: 0011 [#1] SMP PTI IRQ_PIPELINE
>> [  444.320326] CPU: 7 PID: 4206 Comm: xbot2-core Not tainted
>> 5.10.89-xeno-ipipe-3.1+ #7
>> [  444.320328] Hardware name:  /TS175, BIOS BQKLR112 07/04/2017
>> [  444.320330] IRQ stage: Linux
>> [  444.320333] RIP: 0010:0x963f5a327040
>> [  444.320336] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 <00> 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 08 a4 8
>> [  444.320338] RSP: 0018:981f00238f90 EFLAGS: 00010202
>> [  444.320343] RAX: 963f5a327040 RBX: 981f0339fdd0 RCX: 
>> 
>> [  444.320346] RDX: 981f0339fdd8 RSI: 96421fd8 RDI: 
>> 981f0339fdd0
>> [  444.320348] RBP: 0339fe70 R08: 0048 R09: 
>> 963f349f3000
>> [  444.320350] R10: 0002 R11: 963f349e0a60 R12: 
>> 0339fe70
>> [  444.320353] R13:  R14: 0024 R15: 
>> 
>> [  444.320355] FS:  7f52856c5600() GS:96421fd8()
>> knlGS:
>> [  444.320358] CS:  0010 DS:  ES:  CR0: 80050033
>> [  444.320360] CR2: 963f5a327040 CR3: 00019a0d4005 CR4: 
>> 003706e0
>> [  444.320363] DR0:  DR1:  DR2: 
>> 
>> [  444.320365] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [  444.320368] Call Trace:
>> [  444.320370]  
>> [  444.320372]  ? irq_work_single+0x2c/0x40
>> [  444.320375]  ? irq_work_run_list+0x2d/0x40
>> [  444.320377]  ? irq_work_run+0x14/0x30
>> [  444.320380]  ? inband_work_interrupt+0xa/0x10
>> [  444.320382]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320384]  ? handle_synthetic_irq+0x61/0xf0
>> [  444.320387]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320389]  ? asm_call_irq_on_stack+0x12/0x20
>> [  444.320391]  
>> [  444.320394]  ? arch_do_IRQ_pipelined+0xbe/0x140
>> [  444.320396]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320399]  ? sync_current_irq_stage+0x1af/0x230
>> [  444.320401]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320403]  ? __inband_irq_enable+0x47/0x50
>> [  444.320406]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320408]  ? _raw_spin_unlock_irqrestore+0x1e/0x20
>> [  444.320411]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320413]  ? __set_cpus_allowed_ptr+0xa1/0x230
>> [  444.320415]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320418]  ? sched_setaffinity+0x1b0/0x290
>> [  444.320420]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320423]  ? __x64_sys_sched_setaffinity+0x4e/0x90
>> [  444.320425]  ? ftrace_graph_caller+0xa0/0xa0
>> [  444.320427]  ? do_syscall_64+0x3f/0x90
>> [  444.320430]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  444.320432] Modules linked in: fuse rtpacket binfmt_misc nls_ascii
>> nls_cp437 vfat fat i915 rt_e1000e i2c_algo_bit evdev drm_kms_helper
>> cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fon
>> [  444.320553] Dumping ftrace buffer:
>> [  444.320556] -
>> [  444.320558] CPU:5 [LOST 200196842 EVENTS]
>> [  444.320561]  5)   0.903 us|  } /* down_write */
>> [  444.320563]  5)   0.255 us|  
>> anon_vma_interval_tree_insert();
>> [  444.320566]  5)   0.175 us|  up_write();
>> [  444.320568]  5)   7.399 us|} /* anon_vma_clone */
>> [  444.320571]  5)   |kmem_cache_alloc() {
>> [  444.320573]  5)   |  _cond_resched() {
>> [  444.320576]  5)   

Re: Wrong CPU count on Apollo Lake platform

2022-05-24 Thread Jan Kiszka via Xenomai
On 20.05.22 13:41, dabbede--- via Xenomai wrote:
> Dear Xenomai community,
> 
> I'm writing here because of the following issue: I have compiled a
> xenomai-patched kernel that, when used on an Intel Atom E3950 (Apollo
> Lake family), return the wrong cpu count (i.e. nproc returns 1 instead
> of 4, and also lscpu claims CPU(s): 1).
> The very same kernel, when run on an Intel i5-7440EQ, returns nproc=4,
> which is correct.
> I'm using CONFIG_GENERIC_CPU=y, and this, to my knowledge, should be a
> safer option to handle both Atom and i5 or other platforms. I've also
> tried to use CONFIG_MATOM or CONFIG_MCORE2 without noticeable
> differences.
> 
> I attach here the full config and the dmesg that I obtain on the two
> platforms. Comparing the logs I've noticed the message "BIOS bug, no
> explicit IRQ entries, using default mptable. (tell your hw vendor)"
> but I'm not sure whether this is the cause or an effect of the
> problem.
> 
> Can you help me suggesting which CONFIG option I should change?
> 

[0.057400] Processor #0 (Bootup-CPU)
[0.057403] BIOS bug, no explicit IRQ entries, using default mptable. (tell 
your hw vendor)
[0.057412] Processors: 1
[0.057416] smpboot: Allowing 1 CPUs, 0 hotplug CPUs

Seems first of all unrelated to Xenomai.

Did you already try a Xenomai/I-pipe-free kernel built with defconfig
settings to see whether that is able to detect all CPUs? If it is not,
you should really look for a BIOS update for your board. Or complain to
its vendor about breaking standards and, thus, Linux.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: RTNet: sendto(): EAGAIN error

2022-05-24 Thread Jan Kiszka via Xenomai
On 13.05.22 14:51, Mauro S. via Xenomai wrote:
> On 05/05/22 17:04, Mauro S. via Xenomai wrote:
>> On 05/05/22 15:05, Jan Kiszka wrote:
>>> On 03.05.22 17:18, Mauro S. via Xenomai wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to use RTNet with TDMA.
>>>>
>>>> I successfully set up my bus:
>>>>
>>>> - 1GBps speed
>>>> - 3 devices
>>>> - cycle time 1ms
>>>> - timeslots with 200us offset
>>>>
>>>> I wrote a simple application that in parallel receives and sends UDP
>>>> packets on TDMA bus.
>>>>
>>>> - sendto() is done to the broadcast address, port 
>>>> - recvfrom() is done on the port 
>>>>
>>>> Application sends a small packet (5 bytes) in a periodic task with 1ms
>>>> period and prio 51. Receive is done in a non-periodic task with prio
>>>> 50.
>>>>
>>>> Application is running on all the three devices, and I can see packets
>>>> are sent and received correctly by all the devices.
>>>>
>>>> But after a while, all send() calls on all devices fail with error
>>>> EAGAIN.
>>>>
>>>> Could this error be related to some internal buffer/queue that becomes
>>>> full? Or am I missing something?
>>>
>>> When you get EAGAIN on sender side, cleanup of TX buffers likely failed,
>>> and the socket ran out of buffers to send further frames. That may be
>>> related to TX IRQs not making it. Check the TX IRQ counter on the
>>> sender, if it increases at the same pace as you send packets.
>>>
>>> Jan
>>>
>>
>> Thanks Jan for your fast answer.
>>
>> I forgot to mention that I'm using the rt_igb driver.
>>
>> I have only one IRQ field in /proc/xenomai/irq, counting both TX and RX
>>
>>   cat /proc/xenomai/irq | grep rteth0
>>    125: 0   0 2312152 0   rteth0-TxRx-0
>>
>> I did this test:
>>
>> * on the master I send a packet every 1ms in a periodic RT task
>> (period 1ms, prio 51) with my test app.
>>
>> * on the master I see an increment of about 2000 IRQs per second: I
>> guess 1000 are for my sent packets (1 packet every ms), and 1000 for
>> the TDMA sync packet. In fact I see the "rtifconfig" RX counter almost
>> stationary (only 8 packets every 2-3 seconds, refresh requests from
>> slaves?), TX counter incrementing in about 2000 packets per second.
>>
>> * on the two slaves (they are running nothing) I observe the same rate
>> (about 2000 IRQs per second). I see the "rtifconfig" TX counter almost
>> stationary (only 4 packets every 2-3 seconds), RX counter incrementing
>> in about 2000 packets per second.
>>
>> * if I stop sending packets with my app, I can see all the rates at
>> about 1000 per second
>>
>> If I start send-receive on all the three devices, I can see a IRQ rate
>> around 4000 IRQs per second on all devices (1000 sync, 1000 send and
>> 1000 + 1000 receive).
>>
>> I observed that if I only send from master and receive on slaves the
>> problem does not appear. Or if I send/receive from all, but with a
>> packet every 2ms, the problem does not appear.
>>
>> Could be a CPU performance problem (4k IRQs per second are too much
>> for an Intel Atom x5-E8000 CPU @ 1.04GHz)?
>>
>>
>> Thanks in advance, regards
>>
> 
> Hi all,
> 
> I did further tests.
> 
> First of all I modified my code to wait for the TDMA sync event before doing a
> send. I'm doing it with RTMAC_RTIOC_WAITONCYCLE ioctl (the .h file that
> defines it is not exported in userland, I need to copy
> kernel/drivers/net/stack/include/rtmac.h file in my project dir to
> include it).
> 
> I send one broadcast packet each TDMA cycle (1ms) from each device
> (total 3 devices), and each device also receive the packets from the
> other two (I use two different sockets to send and receive).
> 
> The first problem that I detected is that the EAGAIN error happens
> anyway (only with less frequency): I expected to have this error
> disappearing, since I send one packet synced with TDMA cycle time, then
> the rtskbs queue should remain empty (or at most with a single packet
> queued). I tried to change the cycle time (2ms, then 4ms) but the
> problem remains.
> 
> The only mode that seems not to have the EAGAIN error (or at least has it
> much less frequently) is to send the packet every two TDMA cycles,
> independently of the cycle duration (

Re: [linux-dovetail][PATCH 0/3] Add TSC function supports and MFD driver for TGPIO

2022-05-18 Thread Jan Kiszka via Xenomai
On 18.05.22 19:45, Bezdeka, Florian (T CED SES-DE) wrote:
> On Tue, 2022-05-17 at 09:53 +0800, Zqiang via Xenomai wrote:
>> The following modifications are mainly to add TSC function for TGPIO
>> and add intel-ehl-gpio driver divides and initializes the PSE TGPIO
>> resources.
>>
>> Christopher Hall (1):
>>   x86/tsc: Add TSC support functions to support ART driven Time-Aware
>> GPIO
>>
>> D, Lakshmi Sowjanya (1):
>>   ptp: tgpio: PSE TGPIO crosststamp, counttstamp
>>
>> Raymond Tan (1):
>>   mfd: intel-ehl-gpio: Introduce MFD framework to PSE GPIO/TGPIO
> 
> I haven't looked at the details yet and I missed the Community meeting
> today where this topic might have been discussed.
> 
> I'm asking myself if the dovetail tree is the right tree to take those
> patches. Have these patches already been sent to upstream / Linux? If
> so, we should fetch them within the next rebase cycle, right?
> 
> If not: I feel we shouldn't apply them here and follow the "upstream
> first" approach.

Yup, those should be confirmed upstream first. I'm not seeing them in
Linus' tree yet, or anything related. I thought EHL upstreaming was
completed by now?

Once they are confirmed, backporting accepted patches to certain stable
trees is much safer than having to change APIs again once that feature
hits upstream later.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: 16550A: failed to get the IRQ line free

2022-05-09 Thread Jan Kiszka via Xenomai
On 06.05.22 19:32, C Smith via Xenomai wrote:
> I have three serial devices connected using 16550A.ko driver. The card
> is a Moxa PCI 4-port card, where all ports share IRQ 18.
> Several times per minute in dmesg I get :
> [Xenomai] xnintr_edge_vec_handler: failed to get the IRQ18 line free
> 
> Yet I don't think I am losing any serial packet data (I check that with CRCs).
> This is the code which tries to handle shared interrupts and generates
> the message:
> intr.c :
> static void xnintr_edge_vec_handler(unsigned int irq, void *cookie)
> ...
> if (counter > MAX_EDGEIRQ_COUNTER)
> printk(XENO_ERR "%s: failed to get the IRQ%d line free\n",
>__FUNCTION__, irq);
> 
> Does this message mean serial data can be corrupted, or is it harmless?
> Is there something I can test for you on my system?
> 
> I'm using Xenomai 3.1.2, ipipe kernel 4.19.229, X86-64.
> thanks.  -C Smith

This check is a heuristic to detect whether we have a continuously
firing IRQ source - or a buggy handler that always claims to have done
something. If we spend 128 iterations in the handler loop without ever
seeing a single run without any handled IRQ, we terminate and pray that
this will not cause more harm than staying in the loop.

Granted, the longer the chain of shared handlers gets, the more likely it
becomes that heavy IRQ load can trigger this case.
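
The heuristic described above can be modelled roughly as follows. This is
a simplified user-space sketch, not the actual xnintr_edge_vec_handler()
code: it counts complete passes over the chain, and the names
shared_isr_t, run_shared_edge_chain(), stuck_isr() and draining_isr()
are hypothetical. Only the limit name MAX_EDGEIRQ_COUNTER and the value
128 are taken from this thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGEIRQ_COUNTER 128

/* Hypothetical handler type: returns true if its device had work to do. */
typedef bool (*shared_isr_t)(void *ctx);

/*
 * Simplified model of the shared edge-IRQ loop: keep walking the chain
 * until one complete pass sees no handled interrupt, or give up once
 * the pass counter exceeds MAX_EDGEIRQ_COUNTER.
 * Returns true if the line went quiet, false if the limit was hit
 * (the case in which the real code prints "failed to get the IRQ line free").
 */
static bool run_shared_edge_chain(shared_isr_t *chain, void **ctx, size_t n)
{
	int counter = 0;
	bool handled;

	do {
		handled = false;
		for (size_t i = 0; i < n; i++)
			if (chain[i](ctx[i]))
				handled = true;
	} while (handled && ++counter <= MAX_EDGEIRQ_COUNTER);

	return !handled;
}

/* Sample handler that always claims to have handled something. */
static bool stuck_isr(void *ctx)
{
	(void)ctx;
	return true;
}

/* Sample handler that drains a finite number of pending events. */
static bool draining_isr(void *ctx)
{
	int *pending = ctx;

	if (*pending == 0)
		return false;
	(*pending)--;
	return true;
}
```

A chain containing only draining_isr() terminates as soon as its backlog
is empty, while adding stuck_isr() to the chain makes the loop run into
the limit. A long chain under heavy load can look like the latter even
when every handler is well-behaved.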

But maybe your problem has a different reason: Are you sure you have
edge-triggered IRQs here? Legacy INTx is usually rather level-triggered.
Checking the driver... we hard-code edge, hmm. Maybe because the
platform IRQs on x86 are edge. What does lspci -vv tell us about that
card? What does Linux report (/proc/interrupts) when binding it to a
normal driver? We likely need to make irqtype a module parameter or
configure it automatically.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 1/2] debian: Remove --enable-smp

2022-05-06 Thread Jan Kiszka via Xenomai
On 06.05.22 08:55, Richard Weinberger wrote:
> - Original Message -
>> From: "Jan Kiszka" 
>> On 05.05.22 22:06, Richard Weinberger via Xenomai wrote:
>>> ping? :-)
>>>
>>
>> Thanks for the reminder - merged.
> 
> Did you also apply "[PATCH 2/2] doc: Remove references to --enable-smp"?
>  
>> But "configure --help" itself still talks about "--enable-smp". Should
>> likely be changed to "--disable-smp" to reflect the default.
> 
> Hm, I thought (hoped?) --help is auto generated by autoconf.
> Will check.
> 

Your series did not touch configure.ac.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 1/2] debian: Remove --enable-smp

2022-05-06 Thread Jan Kiszka via Xenomai
On 05.05.22 22:06, Richard Weinberger via Xenomai wrote:
> ping? :-)
> 

Thanks for the reminder - merged.

But "configure --help" itself still talks about "--enable-smp". Should
likely be changed to "--disable-smp" to reflect the default.

Jan

> On Wed, Apr 13, 2022 at 2:05 PM Richard Weinberger via Xenomai
>  wrote:
>>
>> SMP is now enabled by default for all architectures.
>> No need to use --enable-smp anymore.
>>
>> Signed-off-by: Richard Weinberger 
>> ---
>>  debian/rules | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/debian/rules b/debian/rules
>> index 3fe6bece93c9..60094734630d 100755
>> --- a/debian/rules
>> +++ b/debian/rules
>> @@ -17,7 +17,6 @@ CONFIG_OPTS = --prefix=/usr \
>>  --includedir=/usr/include/xenomai \
>>  --mandir=/usr/share/man \
>>  --with-testdir=/usr/lib/xenomai/testsuite \
>> ---enable-smp \
>>  --enable-lazy-setsched \
>>  --enable-debug=symbols \
>>  --enable-dlopen-libs
>> --
>> 2.26.2
>>
>>
> 
> 


-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Can't compile userspace application with compiler switch -std=c++11 and Xenomai 3.2

2022-05-05 Thread Jan Kiszka via Xenomai
On 05.05.22 16:56, Grau, Gunter via Xenomai wrote:
> 
> Hi,
> 
> We are trying to port our application to Xenomai 3.2.
> The source is c++ and we use in some parts C++11 elements. Therefore the 
> compile switch is set to -std=c++11.
> When now compiling with this option we see following output when using 
> Xenomai 3.2:
> 
> In file included from 
> /home/delphi/sw/oss/IntelliVue-OSS-R18-FEATUREBUILD-5.4-Kernel-88/sdk-iv-monitor/target-sysroot/usr/xenomai/include/cobalt/semaphore.h:24,
>  from 
> /home/delphi/sw/oss/IntelliVue-OSS-R18-FEATUREBUILD-5.4-Kernel-88/sdk-iv-monitor/target-sysroot/usr/xenomai/include/cobalt/sys/cobalt.h:25,
>  from 
> /home/delphi/sw/oss/IntelliVue-OSS-R18-FEATUREBUILD-5.4-Kernel-88/sdk-iv-monitor/target-sysroot/usr/xenomai/include/copperplate/clockobj.h:113,
>  from 
> /home/delphi/sw/oss/IntelliVue-OSS-R18-FEATUREBUILD-5.4-Kernel-88/sdk-iv-monitor/target-sysroot/usr/xenomai/include/alchemy/timer.h:22,
>  from 
> /home/delphi/sw/oss/IntelliVue-OSS-R18-FEATUREBUILD-5.4-Kernel-88/sdk-iv-monitor/target-sysroot/usr/xenomai/include/alchemy/heap.h:22,
>  from ../PROG/Test.cpp:39:
> /target-sysroot/usr/xenomai/include/cobalt/uapi/kernel/urw.h: In function 
> 'void __try_read_start(const urw_t*, urwstate_t*)':
> /target-sysroot/usr/xenomai/include/cobalt/uapi/kernel/urw.h:57:19: error: 
> expected primary-expression before 'volatile'
>  #define READ_ONCE ACCESS_ONCE
>^~~
> /target-sysroot/usr/xenomai/include/cobalt/uapi/kernel/urw.h:64:10: note: in 
> expansion of macro 'READ_ONCE'
>   token = READ_ONCE(urw->sequence);
>   ^
> /target-sysroot/usr/xenomai/include/cobalt/uapi/kernel/urw.h:57:19: error: 
> expected ')' before 'volatile'
>  #define READ_ONCE ACCESS_ONCE
>^~~
> /target-sysroot/usr/xenomai/include/cobalt/uapi/kernel/urw.h:64:10: note: in 
> expansion of macro 'READ_ONCE'
>   token = READ_ONCE(urw->sequence);
>   ^
> 
> This worked with Xenomai 3.1.
> It looks like the issue is related to the usage of "typeof" in the 
> ACCESS_ONCE macro. This seems not to be allowed if you use -std option:
> https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate-Keywords
> I have now a workaround by defining a macro prior the inclusion of "heap.h":
> #define typeof __typeof__
> But I am not happy with this. Is there a way to do this better? Is -std now 
> not allowed with Xenomai?

The answer lies in

https://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate-Keywords

So we should convert to __typeof__ in-place in the affected kernel
header, possibly also elsewhere. We do have several occurrences of
__typeof__ in interface headers already.
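
As an illustration of the proposed conversion, here is a minimal sketch
of a kernel-style ACCESS_ONCE() spelled with the alternate keyword. The
exact macro body in urw.h may differ, and struct seq_demo and
read_sequence() are hypothetical names used only for this example.

```c
/*
 * Same shape as a kernel-style ACCESS_ONCE(), but using __typeof__ so
 * that it is also accepted when GCC runs in a strict -std=c11 or
 * -std=c++11 mode, where the plain "typeof" keyword is not recognized.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define READ_ONCE ACCESS_ONCE

/* Hypothetical stand-in for the urw_t structure read in urw.h. */
struct seq_demo {
	unsigned int sequence;
};

/* Reads the sequence field exactly once through a volatile access. */
static unsigned int read_sequence(const struct seq_demo *s)
{
	return READ_ONCE(s->sequence);
}
```

Unlike the "#define typeof __typeof__" workaround in the application,
this keeps the fix inside the header and does not redefine anything in
the includer's namespace.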

Patch(es) welcome.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH v2] utils/net/rtnet.in: fixes after shellcheck inspection

2022-05-05 Thread Jan Kiszka via Xenomai
On 05.05.22 17:12, Mauro S. wrote:
> Signed-off-by: Mauro Salvini 
> ---
>  utils/net/rtnet.in | 76 +++---
>  1 file changed, 38 insertions(+), 38 deletions(-)
> 
> diff --git a/utils/net/rtnet.in b/utils/net/rtnet.in
> index f81a7bb0a..9f136b804 100644
> --- a/utils/net/rtnet.in
> +++ b/utils/net/rtnet.in
> @@ -9,7 +9,7 @@ RTNETCFG="@sysconfdir@/rtnet.conf"
> 
>  debug_func() {
>  echo "$*"
> -    eval $*
> +    eval "$@"
>  }
> 
>  usage() {
> @@ -33,35 +33,35 @@ EOF
>  init_rtnet() {
>  modprobe rtnet >/dev/null || exit 1
>  modprobe rtipv4 >/dev/null || exit 1
> -    modprobe $RT_DRIVER $RT_DRIVER_OPTIONS >/dev/null || exit 1
> +    modprobe "$RT_DRIVER" "$RT_DRIVER_OPTIONS" >/dev/null || exit 1
> 
>  for dev in $REBIND_RT_NICS; do
> -    if [ -d /sys/bus/pci/devices/$dev/driver ]; then
> -    echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
> +    if [ -d /sys/bus/pci/devices/"$dev"/driver ]; then
> +    echo "$dev" > /sys/bus/pci/devices/"$dev"/driver/unbind
>  fi
> -    echo $dev > /sys/bus/pci/drivers/$RT_DRIVER/bind
> +    echo "$dev" > /sys/bus/pci/drivers/"$RT_DRIVER"/bind
>  done
> 
>  for PROTOCOL in $RT_PROTOCOLS; do
> -    modprobe rt$PROTOCOL >/dev/null || exit 1
> +    modprobe rt"$PROTOCOL" >/dev/null || exit 1
>  done
> 
> -    if [ $RT_LOOPBACK = "yes" ]; then
> +    if [ "$RT_LOOPBACK" = "yes" ]; then
>  modprobe rt_loopback >/dev/null || exit 1
>  fi
> 
> -    if [ $RTCAP = "yes" ]; then
> +    if [ "$RTCAP" = "yes" ]; then
>  modprobe rtcap >/dev/null || exit 1
>  fi
> 
> -    if [ $RT_LOOPBACK = "yes" ]; then
> +    if [ "$RT_LOOPBACK" = "yes" ]; then
>  $RTIFCONFIG rtlo up 127.0.0.1
>  fi
> 
> -    if [ $RTCAP = "yes" ]; then
> +    if [ "$RTCAP" = "yes" ]; then
>  ifconfig rteth0 up
>  ifconfig rteth0-mac up
> -    if [ $RT_LOOPBACK = "yes" ]; then
> +    if [ "$RT_LOOPBACK" = "yes" ]; then
>  ifconfig rtlo up
>  fi
>  fi
> @@ -74,9 +74,9 @@ init_rtnet() {
>  submit_cfg() {
>  case "$STATION_TYPE" in
>  master)
> -    $RTIFCONFIG rteth0 up $STATION_IP
> +    $RTIFCONFIG rteth0 up "$STATION_IP"
> 
> -    $TDMACFG rteth0 master $TDMA_CYCLE
> +    $TDMACFG rteth0 master "$TDMA_CYCLE"
>  eval "$TDMA_SLOTS"
> 
>  IPADDR=$STATION_IP
> @@ -96,7 +96,7 @@ submit_cfg() {
>  ADD_STAGE1_CMDS="ifconfig vnic0 up $STATION_IP"
> 
>  echo "$TDMA_SLOTS$ADD_STAGE1_CMDS" | \
> -    $RTCFG rteth0 add $RTCFG_CLIENT -stage1 -
> +    $RTCFG rteth0 add "$RTCFG_CLIENT" -stage1 -
>  ;;
>  backup-master)
>  if [ ! "$STATION_IP" = "" ]; then
> @@ -117,7 +117,7 @@ submit_cfg() {
>  fi
> 
>  echo "\$TDMACFG rteth0 detach;\$TDMACFG rteth0 master
> $TDMA_CYCLE -b $TDMA_BACKUP_OFFS;$TDMA_SLOTS$ADD_STAGE1_CMDS" | \
> -    $RTCFG rteth0 add $RTCFG_CLIENT -stage1 - $STAGE_2_OPT
> +    $RTCFG rteth0 add "$RTCFG_CLIENT" -stage1 - "$STAGE_2_OPT"
>  ;;
>  esac
> 
> @@ -142,16 +142,16 @@ start_master() {
>  #   Sync / Master Slot / + TDMA_OFFSET us / Slave 1 /
>  #   + TDMA_OFFSET us / Slave 2 / + TDMA_OFFSET us / ... / Slave n
> 
> -    $RTIFCONFIG rteth0 up $IPADDR $NETMASK_OPT
> +    $RTIFCONFIG rteth0 up "$IPADDR" $NETMASK_OPT
> 
> -    $TDMACFG rteth0 master $TDMA_CYCLE
> +    $TDMACFG rteth0 master "$TDMA_CYCLE"
>  $TDMACFG rteth0 slot 0 0
> 
>  OFFSET=$TDMA_OFFSET
>  for SLAVE in $TDMA_SLAVES; do
>  echo "\$TDMACFG rteth0 slot 0 $OFFSET;ifconfig vnic0 up
> \$IPADDR \$NETMASK_OPT" | \
> -    $RTCFG rteth0 add $SLAVE -stage1 - $STAGE_2_OPT
> -    OFFSET=$(($OFFSET+$TDMA_OFFSET))
> +    $RTCFG rteth0 add "$SLAVE" -stage1 - "$STAGE_2_OPT"
> +    OFFSET=$((OFFSET+TDMA_OFFSET))
>  done
>  else
>  # Get setup from TDMA_CONFIG file:
> @@ -185,12 +185,12 @@ start_master() {
>  # slot ...
>  #
> 
> -    if [ ! -r $TDMA_CONFIG ]; then
> +    if [ ! -r "$TDMA_CONFIG" ]; then
>  echo "Could not read $TDMA_CONFIG"
>  exit 1
>  fi
> 
> -    while read ARG1 ARG2 ARG3 ARG4 ARG5 ARG6; do
> +    while read -r ARG1 ARG2 ARG3 ARG4 ARG5 ARG6; do
>  case "$ARG1" in
>  "master:")
>  submit_cfg
> @@ -217,7 +217,7 @@ start_master() {
>  STATION_MAC="$ARG2"
>  ;;
>  "stage2")
> -    STATION_STAGE_2="$ARG2"
> +    STATION_STAGE_2_SRC="$ARG2"
>  ;;
>  "slot")
>  TDMA_SLOTS="$TDMA_SLOTS\$TDMACFG rteth0 slot $ARG2 $ARG3"
> @@ -233,13 +233,13 @@ start_master() {
>  TDMA_SLOTS="$TDMA_SLOTS;"
>  ;;
>  esac
> -    done < $TDMA_CONFIG
> +    done < "$TDMA_CONFIG"
>  submit_cfg
>  fi
> 
> -    ifconfig vnic0 up $IPADDR $NETMASK_OPT
> +    ifconfig vnic0 up "$IPADDR" $NETMASK_OPT
> 
> -    echo -n "Waiting for all slaves..."
> +    echo "Waiting for all slaves..."
>  $RTCFG 

Re: RTNet: sendto(): EAGAIN error

2022-05-05 Thread Jan Kiszka via Xenomai
On 03.05.22 17:18, Mauro S. via Xenomai wrote:
> Hi all,
> 
> I'm trying to use RTNet with TDMA.
> 
> I successfully set up my bus:
> 
> - 1GBps speed
> - 3 devices
> - cycle time 1ms
> - timeslots with 200us offset
> 
> I wrote a simple application that in parallel receives and sends UDP
> packets on TDMA bus.
> 
> - sendto() is done to the broadcast address, port 
> - recvfrom() is done on the port 
> 
> Application sends a small packet (5 bytes) in a periodic task with 1ms
> period and prio 51. Receive is done in a non-periodic task with prio 50.
> 
> Application is running on all the three devices, and I can see packets
> are sent and received correctly by all the devices.
> 
> But after a while, all send() calls on all devices fail with error EAGAIN.
> 
> Could this error be related to some internal buffer/queue that becomes
> full? Or am I missing something?

When you get EAGAIN on sender side, cleanup of TX buffers likely failed,
and the socket ran out of buffers to send further frames. That may be
related to TX IRQs not making it. Check the TX IRQ counter on the
sender, if it increases at the same pace as you send packets.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] utils/net/rtnet.in: fixes after shellcheck inspection

2022-05-05 Thread Jan Kiszka via Xenomai
On 03.05.22 10:22, Mauro S. via Xenomai wrote:
> Signed-off-by: Mauro Salvini 
> ---
>  utils/net/rtnet.in | 76 +++---
>  1 file changed, 38 insertions(+), 38 deletions(-)
> 
> diff --git a/utils/net/rtnet.in b/utils/net/rtnet.in
> index f81a7bb0a..06e796cd2 100644
> --- a/utils/net/rtnet.in
> +++ b/utils/net/rtnet.in
> @@ -9,7 +9,7 @@ RTNETCFG="@sysconfdir@/rtnet.conf"
> 
>  debug_func() {
>  echo "$*"
> -    eval $*
> +    eval "$*"

That may not work for all inputs as it breaks up formerly quoted
arguments and separates them by $IFS. You rather want "$@" to preserve
their structure.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] doc: Fix coreclk typo

2022-05-05 Thread Jan Kiszka via Xenomai
On 29.04.22 21:14, Richard Weinberger via Xenomai wrote:
> It's coreclk not coreclck.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  doc/asciidoc/MIGRATION.adoc | 6 +++---
>  doc/asciidoc/man1/autotune.adoc | 8 
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/asciidoc/MIGRATION.adoc b/doc/asciidoc/MIGRATION.adoc
> index dce7f40e9cbe..7a32b992183d 100644
> --- a/doc/asciidoc/MIGRATION.adoc
> +++ b/doc/asciidoc/MIGRATION.adoc
> @@ -226,11 +226,11 @@ of the Xenomai core clock:
>  
>  --
>  /* change the user gravity (default) */
> -# echo 3000 > /proc/xenomai/clock/coreclck
> +# echo 3000 > /proc/xenomai/clock/coreclk
>  /* change the IRQ gravity */
> -# echo 1000i > /proc/xenomai/clock/coreclck
> +# echo 1000i > /proc/xenomai/clock/coreclk
>  /* change the user and kernel gravities */
> -# echo "2000u 1000k" > /proc/xenomai/clock/coreclck
> +# echo "2000u 1000k" > /proc/xenomai/clock/coreclk
>  --
>  
>  +interfaces+ removed::
> diff --git a/doc/asciidoc/man1/autotune.adoc b/doc/asciidoc/man1/autotune.adoc
> index 3462e6b3fb91..2f9866c4d52a 100644
> --- a/doc/asciidoc/man1/autotune.adoc
> +++ b/doc/asciidoc/man1/autotune.adoc
> @@ -125,17 +125,17 @@ estimation. Although this delay may vary across 
> hardware platforms,
>  running for 30 seconds is common.
>  
>  Once the gravity values are known for a particular hardware, one may
> -write them to +/proc/xenomai/clock/coreclck+ from some system init
> +write them to +/proc/xenomai/clock/coreclk+ from some system init
>  script to set up the Xenomai core clock accordingly, instead of
>  running the auto-tuner after each boot e.g:
>  
>  --
>  /* change the user gravity to 1728 ns (default) */
> -# echo 1728 > /proc/xenomai/clock/coreclck
> +# echo 1728 > /proc/xenomai/clock/coreclk
>  /* change the IRQ gravity to 129 ns */
> -# echo 129i > /proc/xenomai/clock/coreclck
> +# echo 129i > /proc/xenomai/clock/coreclk
>  /* change the user and kernel gravities to 1728 and 907 ns resp. */
> -# echo "1728u 907k" > /proc/xenomai/clock/coreclck
> +# echo "1728u 907k" > /proc/xenomai/clock/coreclk
>  --
>  
>  Alternatively, the gravity values can be statically defined in the

thanks, applied

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Interrupt handler illicit call

2022-05-02 Thread Jan Kiszka via Xenomai
On 02.05.22 20:08, C Smith wrote:
> On Sun, May 1, 2022 at 11:25 PM Jan Kiszka  wrote:
>>
>> On 29.04.22 21:01, Richard Weinberger via Xenomai wrote:
>>> On Fri, Apr 29, 2022 at 9:04 AM C Smith via Xenomai  
>>> wrote:
>>>> int Lp_port_handler(rtdm_irq_t *irq_handle_p)
>>>> {
>>>>static int err;
>>>>unsigned long next;
>>>>rtdm_irq_t *handle_p;
>>>>
>>>>handle_p = rtdm_irq_get_arg(irq_handle_p, rtdm_irq_t);
>>>>
>>>>next = rtdm_clock_read();
>>>>// do some timing calculations with 'next' var here ...
>>>>err = rtdm_irq_enable(handle_p);   //re-enable this for subsequent 
>>>> interrupts
>>>
>>> You don't need this.
>>> Unconditionally enabling the interrupt line will confuse the IRQ subsystem.
>>>
>>
>> ...and it was never supported in primary mode. If it worked, then by
>> chance and by disabling related checks. Enable/disable are not masking APIs.
>>
>> Jan
>>
>>>>   return 0;
>>>
>>> Please use RTDM_IRQ_HANDLED here instead of raw values.
>>>
>>>> }
>>>
>>> Thanks,
>>> //richard
> 
> I was thinking the rtdm_irq_enable() was like an STI, but that was a
> mistake.  I looked at the 16550A and CAN drivers for examples of
> interrupt handlers, and indeed I found I must return RTDM_IRQ_HANDLED
> too like Richard says, or Xenomai de-registers the interrupt since no
> one appears to handle it. There doesn't seem to be an interrupt
> example in Xenomai 3.1 or Xeno3.2 sources. Perhaps I should submit a
> demo interrupt handler module as a patch?

Well, we have quite a few real drivers in the codebase. Maybe rather
enable one to serve as a reference, e.g. by adding a document that
describes the relevant bits in a bit more detail? The UART drivers may
serve as input here as they do not come with their own core.
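
For readers following the thread, the handler shape discussed above can be
sketched roughly as follows. This is a minimal sketch, not code from the
Xenomai tree: the rtdm_* definitions at the top are userspace stand-ins
(assumptions) so the fragment compiles on its own, whereas a real driver
gets them from <rtdm/driver.h>. struct lp_dev and lp_port_handler are
hypothetical names.

```c
#include <stdint.h>

/*
 * Userspace stand-ins (assumptions) so this sketch builds outside a
 * Xenomai kernel tree. A real driver gets all of these from
 * <rtdm/driver.h>; the numeric values below are placeholders only.
 */
typedef struct { void *arg; } rtdm_irq_t;
typedef uint64_t nanosecs_abs_t;
#define RTDM_IRQ_NONE     1
#define RTDM_IRQ_HANDLED  2
#define rtdm_irq_get_arg(irq_handle, type) ((type *)(irq_handle)->arg)

/* Stand-in clock: a counter that advances by one on every read. */
static nanosecs_abs_t fake_clock;

static nanosecs_abs_t rtdm_clock_read(void)
{
	return ++fake_clock;
}

/* Hypothetical per-device state, registered as the handler argument. */
struct lp_dev {
	nanosecs_abs_t last_irq;	/* timestamp of the latest interrupt */
	unsigned long irq_count;	/* number of interrupts seen so far */
};

/*
 * Handler shape suggested in the thread: fetch the context, do the
 * time-critical work, and report the interrupt as handled. There is
 * no rtdm_irq_enable() call in here.
 */
static int lp_port_handler(rtdm_irq_t *irq_handle)
{
	struct lp_dev *dev = rtdm_irq_get_arg(irq_handle, struct lp_dev);

	dev->last_irq = rtdm_clock_read();
	dev->irq_count++;

	return RTDM_IRQ_HANDLED;
}
```

In a real module the handler would be registered with rtdm_irq_request(),
passing the device state as its argument; the stand-ins above would then
be dropped in favour of the RTDM headers.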

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Interrupt handler illicit call

2022-05-02 Thread Jan Kiszka via Xenomai
On 29.04.22 21:01, Richard Weinberger via Xenomai wrote:
> On Fri, Apr 29, 2022 at 9:04 AM C Smith via Xenomai  
> wrote:
>> int Lp_port_handler(rtdm_irq_t *irq_handle_p)
>> {
>>static int err;
>>unsigned long next;
>>rtdm_irq_t *handle_p;
>>
>>handle_p = rtdm_irq_get_arg(irq_handle_p, rtdm_irq_t);
>>
>>next = rtdm_clock_read();
>>// do some timing calculations with 'next' var here ...
>>err = rtdm_irq_enable(handle_p);   //re-enable this for subsequent interrupts
> 
> You don't need this.
> Unconditionally enabling the interrupt line will confuse the IRQ subsystem.
> 

...and it was never supported in primary mode. If it worked, then by
chance and by disabling related checks. Enable/disable are not masking APIs.

Jan

>>   return 0;
> 
> Please use RTDM_IRQ_HANDLED here instead of raw values.
> 
>> }
> 
> Thanks,
> //richard
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] utils/net/rtnet.in: add delay during master configuration

2022-04-22 Thread Jan Kiszka via Xenomai
On 22.04.22 16:25, Mauro S. via Xenomai wrote:
> Some cards are slow to get the connection link up after the
> "rtifconfig rteth0 up" command, e.g. on an Atom-x5 with an Intel I210
> (rt_igb driver) I detected approximately 3 seconds to get the link up.
> 
> On master, the "rtifconfig rteth0 up" is followed by TDMA configuration and
> start. After the TDMA start, the sync packet is sent at the defined
> cycle time.
> 
> Sometimes, after "rtnet start", the dmesg fills with this error:
> 
>   TDMA: Failed to transmit sync frame!
> 
> and the rt driver locks. Then, the kernel watchdog is triggered and the
> NIC is hw-reset by the kernel, producing more errors and another lock.
> Sometimes the dmesg only fills with the error message and the NIC does
> not lock.
> This happens because the interface is not up and ready to handle
> the sync packets when TDMA is started.
> 
> This patch introduces a configurable delay between the "rtifconfig
> rteth0 up" and the TDMA start on the master host. This avoids these
> kinds of problems.
> 

Thanks for the enhancement! Somehow the patch was attached, rather than
inlined, and that results in it being dropped from the distributed
emails. So I'm pasting it here.

We need a legal "Signed-off-by" line (https://developercertificate.org/).

> ---
>  utils/net/rtnet.in | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/utils/net/rtnet.in b/utils/net/rtnet.in
> index f81a7bb0a..8c2dcdf6e 100644
> --- a/utils/net/rtnet.in
> +++ b/utils/net/rtnet.in
> @@ -15,15 +15,19 @@ debug_func() {
>  usage() {
>  cat << EOF
>  Usage:
> -$0 [-cf ] [-v] [-c] {start|stop}
> +$0 [-cf ] [-d ] [-v] [-c] {start|stop}
>   Start or stop station according to configuration file
>  
> -$0 [-cf ] [-v] [-c] master  [ ...]
> +$0 [-cf ] [-d ] [-v] [-c] master  
> [ ...]
>   Start station as master for given list of slaves
>  
>  $0 [-cf ] [-v] capture
>   Start only passive realtime capturing
>  
> +The parameter -d allows to introduce a delay in seconds between the
> + "rtifconfig rtethX up" command and the TDMA start on the host
> + configured as master. Useful to avoid errors/card locks when the
> + RT NIC is slow to get the link up.
>  The additional switch -v enables verbose output.
>  The additional switch -c enables capturing mode to allow use of a network
>   analyzer such as Wireshark (if rtnet was built with --enable-rtcap).
> @@ -76,6 +80,10 @@ submit_cfg() {
>   master)
>   $RTIFCONFIG rteth0 up $STATION_IP
>  
> + if [ -n "$UPDELAY" ]; then
> + sleep $UPDELAY
> + fi
> +
>   $TDMACFG rteth0 master $TDMA_CYCLE
>   eval "$TDMA_SLOTS"
>  
> @@ -144,6 +152,10 @@ start_master() {
>  
>   $RTIFCONFIG rteth0 up $IPADDR $NETMASK_OPT
>  
> + if [ -n "$UPDELAY" ]; then
> + sleep $UPDELAY
> + fi
> +
>   $TDMACFG rteth0 master $TDMA_CYCLE
>   $TDMACFG rteth0 slot 0 0
>  
> @@ -258,6 +270,11 @@ else
>  exit 1
>  fi
>  
> +if [ "$1" = "-d" ]; then
> +UPDELAY="$2"
> +shift 2
> +fi
> +
>  if [ "$1" = "-v" ]; then
>  echo "Turning on verbose mode"
>  RTIFCONFIG="debug_func $RTIFCONFIG"
> -- 
> 2.17.1
> 

Pragmatic approach, I'm fine with it - as we have no proper interface to
read back the current link state (at least as far as I remember).

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Machine freezes under Ubuntu 20.04

2022-04-19 Thread Jan Kiszka via Xenomai
On 19.04.22 12:02, Arturo Laurenzi wrote:
> Sorry for the delayed answer, it took us some time to instrument our
> setup for broadcasting the kernel output over serial,
> and now we have some interesting results.
> See below.
> 
>> On 05.04.22 15:43, Arturo Laurenzi wrote:
>>>> On 04.04.22 15:21, Arturo Laurenzi via Xenomai wrote:
>>>
>>>>>
>>>>> Recently, we have started a transition towards Ubuntu 20.04, and things
>>>>> have started to break.
>>>>>
>>>>> The first attempt was to install kernel 5.4.151 and stick to ipipe. Under
>>>>> this setup, we experience issues even before starting our applications. We
>>>>> have seen random crashes while compiling with GCC, sporadic "System Program
>>>>> Problem Detected" popups by Ubuntu, and others. We even tried to re-install
>>>>> OS and kernel from scratch with no luck.
>>>>
>>>> A reference setup for this kernel line can be found in xenomai-images
>>>> (https://source.denx.de/Xenomai/xenomai-images). Would be good to
>>>> understand which deviation from it makes the difference for which
>>>> component (see also further questions below).
>>>
>>> I'm attaching the config we're using (from /boot/config-$(uname -r)).
>>> If that makes sense, we're going to try to configure the kernel
>>> according to this file
>>> (https://source.denx.de/Xenomai/xenomai-images/-/blob/master/recipes-kernel/linux/files/amd64_defconfig).
>>> What kernel version do you recommend to try?
>>>
>>
>> Always the latest of the individual kernel series.
> 
> We still have to test the reference .config file, as we gave higher
> priority to the kernel output over serial stuff.
> 
>>>>>
>>>>> The second attempt was to stick to our old kernel 4.19.140. All the weird
>>>>> issues disappear and the system is stable. However, we are unable to have
>>>>> the system pass our suite of "stress tests", which basically involve starting,
>>>>> running, and killing process B multiple times in a cyclic fashion, while
>>>>> process A runs in the background. After a short while (minutes), the whole
>>>>> system just hangs, forcing us to do a hard reset. Only once, we managed to
>>>>> get this kernel oops after rebooting (journalctl -k -b -1 --no-pager).
>>>>>
>>>>
>>>> For reliably recording crashes, it is highly recommended to use a UART
>>>> as kernel debug output.
>>>
>>> Will do ASAP and let you know.
> 
> Done, see below.
> 
>>>>> The third attempt was to try out kernel 5.10.89 plus the new dovetail
>>>>> patch, and Xenomai v3.2.1. Again, all the weird issues are gone and the
>>>>> system is stable. However, we are unable to have the system pass our suite
>>>>> of "stress tests". Differently from 4.19-ipipe, the system resists for a
>>>>> longer time before hanging (few hours sometimes), but this also varies a
>>>>> lot.
>>>>>
>>>>> After some more investigation, we found out something interesting. By
>>>>> removing the code that interacts with Process A, Process B is then able to
>>>>> run "forever" (overnight at least), but *only if Process A is not running*.
>>>>> Otherwise, the system will hang. In other words, the mere presence of
>>>>> Process A is affecting Process B, even though both IDDP and ZMQ have been
>>>>> removed from B and replaced with fake data. Furthermore, the system does
>>>>> not freeze if we set B1's scheduling policy to SCHED_OTHER.
>>>>
>>>> Do you have the Xenomai watchdog enabled, thus will you be able to tell
>>>> RT application "hangs" (infinite loop at high prio) apart from real
>>>> hangs/crashes?
>>>
>>> Yes. When we try a while(true) inside a RT context, we see the
>>> watchdog killing our application
>>> as expected.
>>>
>>>
>>>>>
>>>>> From these - rather heuristic - tests, it looks like there could be some
>>>>> coupling between unrelated processes which causes some sort of bug, that is
>>>>> probably related to some interaction with mutexes/condvars, when these are
>>>>> used from a RT context. This issue shows up (or at least we have seen it)
>>>>> only under Ubuntu 20.04 (GCC 9.x), whereas a 18.04 build (GCC 7.x) looks
>>>>> fine.
>>>>
>>>> Ubuntu toolchains are known for aggressively enabling certain security
>>>> features. Maybe one that we didn't check yet flipped between 18.04 and
>>>> 20.04 - if that switch is the only difference between working and
>>>> non-working builds in your case. GCC itself should be fine, we are
>>>> testing with gcc-10 via Debian 11 in our CI.
>>>>
>>>> Can you check whether the toolchain change breaks the kernel (kernel
>>>> with old toolchain runs fine with userspace built via new toolchain)?
>>>
>>> We have tried this, and still the system freezes after a while. We
>>> followed the procedure that follows:
>>>  1) generate binaries for our "working" kernel 4.19.140-xeno-ipipe-3.1
>>> on a Ubuntu 18 machine (make deb-pkg)
>>>  2) copy the whole /usr/xenomai directory (compiled with the 18.04
>>> toolchain) to the test machine with Ubuntu 20.04
>>>  3) install the kernel binaries to the test 

[xenomai-images][PATCH 2/2] ci: Apply DNS hack also on test instances

2022-04-14 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

We've seen DNS issues with the AWS defaults also on the test jobs.
Therefore, expand the scope of the hack to all jobs by generalizing
.add-proxy-config to .common-config.

Signed-off-by: Jan Kiszka 
---
 ci/gitlab-ci-base.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ci/gitlab-ci-base.yml b/ci/gitlab-ci-base.yml
index 057b9dd..7c65ca0 100644
--- a/ci/gitlab-ci-base.yml
+++ b/ci/gitlab-ci-base.yml
@@ -26,8 +26,9 @@ variables:
 default:
   image: ghcr.io/siemens/kas/kas-isar:2.6.3
 
-.add-proxy-config:
+.common-config:
   before_script:
+- sudo sh -c "echo 'nameserver 8.8.8.8' > /etc/resolv.conf"
 - mkdir -p -m=700 ~/.ssh
 - if [ -n "$https_proxy" ]; then
   echo "ProxyCommand socat - PROXY:$(echo $https_proxy | sed 
's|.*://\([^:]*\).*|\1|'):%h:%p,proxyport=$(echo $https_proxy | sed 
's|.*:\([0-9]*\)$|\1|')" >> ~/.ssh/config;
@@ -35,16 +36,15 @@ default:
   fi
 
 .build:
-  extends: .add-proxy-config
+  extends: .common-config
   stage: build
   script:
-- sudo sh -c "echo 'nameserver 8.8.8.8' > /etc/resolv.conf"
 - echo "Building 
kas.yml:board-${TARGET}.yml${XENOMAI_BUILD_OPTION}${LINUX_BUILD_OPTION}${BUILD_OPTIONS}:opt-ci.yml"
 - kas build 
kas.yml:board-${TARGET}.yml${XENOMAI_BUILD_OPTION}${LINUX_BUILD_OPTION}${BUILD_OPTIONS}:opt-ci.yml
 - if [ -n "${USE_S3_BUCKET}" ]; then scripts/deploy_to_aws.sh ${TARGET}; fi
 
 .test:
-  extends: .add-proxy-config
+  extends: .common-config
   stage: test
   script:
 - scripts/install-lavacli.sh
-- 
2.34.1




[xenomai-images][PATCH 1/2] Update Isar revision

2022-04-14 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

Various fixes and better control over sstate.

Signed-off-by: Jan Kiszka 
---
 kas.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kas.yml b/kas.yml
index 62ddbb4..6b3b32d 100644
--- a/kas.yml
+++ b/kas.yml
@@ -22,7 +22,7 @@ repos:
 
   isar:
 url: https://github.com/ilbers/isar.git
-refspec: eeefa03185e3f259d08ff94295b0981a19eddf55
+refspec: a960a4e52c50ef4a15e3827685fa9cfffead
 layers:
   meta:
 
-- 
2.34.1




[xenomai-images][PATCH 0/2] Isar update and CI tuning

2022-04-14 Thread Jan Kiszka via Xenomai
See patches for details.

Jan

Jan Kiszka (2):
  Update Isar revision
  ci: Apply DNS hack also on test instances

 ci/gitlab-ci-base.yml | 8 
 kas.yml   | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.34.1




Re: [PATCH v2 0/9] Revive alchemy, pSOS and VxWorks tests

2022-04-14 Thread Jan Kiszka via Xenomai
On 14.04.22 17:41, Richard Weinberger wrote:
> - Original Message -
>>> Did the nucleus CPU scheduler guarantee that giving another task
>>> the same priority of the calling task will favour the caller?
>>> Now the gifted task seems to win.
>>
>> Did you configure with --enable-lazy-setsched? If not, set_prio should
>> send the caller to Linux, and that will definitely cause some scheduling
>> change.
> 
> Yes. --enable-lazy-setsched is set.
> 

Then it's better to trace than to speculate.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH v2 0/9] Revive alchemy, pSOS and VxWorks tests

2022-04-14 Thread Jan Kiszka via Xenomai
On 14.04.22 17:13, Richard Weinberger wrote:
> - Original Message -
>> From: "Jan Kiszka" 
>>>   task5 fails:
>>> [9] at task-5.c:79
>>> [1] at task-5.c:23
>>> [10] at task-5.c:87
>>> [3] at task-5.c:40
>>> [11] at task-5.c:95
>>> [4] at task-5.c:45
>>> [5] at task-5.c:50
>>> [6] at task-5.c:55
>>> [2] at task-5.c:28
>>> [7] at task-5.c:60
>>>0"003.160| BUG in __traceobj_check_abort(): [FGND] wrong return status:
>>>   task-5.c:63 => EINVAL (want OK)
> 
> 
> This failure is a little trickier.
> 
> Line 62 is:
> ret = rt_task_set_priority(&t_bgnd, info.prio + 1);
> ret is EINVAL, that's why the assert in line 63 fails.
> It fails because t_bgnd has already terminated.
> 
> This concurs also with the above marker [2].
> [2] is reached when t_bgnd is done.
> 
> The foreground task does:
> ret = rt_task_inquire(NULL, &info);
> traceobj_assert(&trobj, ret == 0 && info.prio == 21);
> 
> traceobj_mark(&trobj, 6);
> 
> ret = rt_task_set_priority(&t_bgnd, info.prio);
> traceobj_check(&trobj, ret, 0);
> 
> traceobj_mark(&trobj, 7);
> 
> ret = rt_task_set_priority(&t_bgnd, info.prio + 1);
> traceobj_check(&trobj, ret, 0);
> 
> traceobj_mark(&trobj, 8);
> 
> So it asks for its own priority, it must be 21, that's okay.
> Then it raises the priority of t_bgnd from 20 to 21
> and assumes that no scheduling happens. But this seems to fail.
> 
> Did the nucleus CPU scheduler guarantee that giving another task
> the same priority of the calling task will favour the caller?
> Now the gifted task seems to win.

Did you configure with --enable-lazy-setsched? If not, set_prio should
send the caller to Linux, and that will definitely cause some scheduling
change.
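
The ordering the test relies on can be illustrated with a toy model of a
fixed-priority FIFO run queue. This is only a sketch of POSIX
SCHED_FIFO-style ordering, where a task whose priority is changed is
queued at the tail of its new level; it is not the Xenomai scheduler, and
all names (toy_task, toy_rq, toy_set_prio, ...) are hypothetical.

```c
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical task descriptor for the toy model. */
struct toy_task {
	const char *name;
	int prio;		/* higher value = more urgent */
	unsigned long seq;	/* queueing order within a priority level */
};

struct toy_rq {
	struct toy_task *tasks[MAX_TASKS];
	size_t count;
	unsigned long next_seq;
};

/* Queue a task at the tail of its priority level. */
static void toy_enqueue(struct toy_rq *rq, struct toy_task *t)
{
	if (rq->count >= MAX_TASKS)
		return;
	t->seq = rq->next_seq++;
	rq->tasks[rq->count++] = t;
}

/* Change the priority: the task moves to the tail of the new level. */
static void toy_set_prio(struct toy_rq *rq, struct toy_task *t, int prio)
{
	t->prio = prio;
	t->seq = rq->next_seq++;
}

/* Pick the highest priority; among equals, the earliest queued wins. */
static struct toy_task *toy_pick_next(const struct toy_rq *rq)
{
	struct toy_task *best = NULL;
	size_t i;

	for (i = 0; i < rq->count; i++) {
		struct toy_task *t = rq->tasks[i];

		if (!best || t->prio > best->prio ||
		    (t->prio == best->prio && t->seq < best->seq))
			best = t;
	}
	return best;
}
```

In this model, raising the background task from 20 to 21 leaves the
foreground task (already at 21) in front, and only the step to 22 lets
the background task run. Whether the real scheduler behaves this way in
the failing setup is exactly what tracing would have to show.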

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH v2] testsuite: Add test for x86 port io

2022-04-14 Thread Jan Kiszka via Xenomai
On 13.04.22 23:59, Richard Weinberger via Xenomai wrote:
> Test case for the following regression:
> https://www.xenomai.org/pipermail/xenomai/2022-March/047451.html
> 
> Signed-off-by: Richard Weinberger 
> ---
> Changes since v1:
>   - Make sure to restore SA upon failure.
> ---
>  configure.ac   |  2 +
>  testsuite/smokey/Makefile.am   |  7 +++
>  testsuite/smokey/x86io/Makefile.am |  7 +++
>  testsuite/smokey/x86io/x86io.c | 77 ++
>  4 files changed, 93 insertions(+)
>  create mode 100644 testsuite/smokey/x86io/Makefile.am
>  create mode 100644 testsuite/smokey/x86io/x86io.c
> 
> diff --git a/configure.ac b/configure.ac
> index 019453793..62506de69 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -169,6 +169,7 @@ esac
>  
>  AC_MSG_RESULT([$target_cpu_arch])
>  XENO_TARGET_ARCH=$target_cpu_arch
> +AM_CONDITIONAL(XENO_X86,[test x$target_cpu_arch = xx86])
>  AC_ENABLE_SHARED
>  AC_PROG_LIBTOOL
>  
> @@ -1044,6 +1045,7 @@ AC_CONFIG_FILES([ \
>   testsuite/smokey/gdb/Makefile \
>   testsuite/smokey/y2038/Makefile \
>   testsuite/smokey/can/Makefile
> + testsuite/smokey/x86io/Makefile
>   testsuite/clocktest/Makefile \
>   testsuite/xeno-test/Makefile \
>   utils/Makefile \
> diff --git a/testsuite/smokey/Makefile.am b/testsuite/smokey/Makefile.am
> index 4a9773f58..79dc61e9f 100644
> --- a/testsuite/smokey/Makefile.am
> +++ b/testsuite/smokey/Makefile.am
> @@ -82,6 +82,10 @@ DIST_SUBDIRS = \
>   xddp\
>   y2038
>  
> +if XENO_X86
> +DIST_SUBDIRS += x86io
> +endif
> +
>  if XENO_COBALT
>  if CONFIG_XENO_LIBS_DLOPEN
>  COBALT_SUBDIRS += dlopen
> @@ -89,6 +93,9 @@ endif
>  if XENO_PSHARED
>  COBALT_SUBDIRS += memory-pshared
>  endif
> +if XENO_X86
> +COBALT_SUBDIRS += x86io
> +endif
>  wrappers = $(XENO_POSIX_WRAPPERS)
>  SUBDIRS = $(COBALT_SUBDIRS)
>  else
> diff --git a/testsuite/smokey/x86io/Makefile.am 
> b/testsuite/smokey/x86io/Makefile.am
> new file mode 100644
> index 0..d623af042
> --- /dev/null
> +++ b/testsuite/smokey/x86io/Makefile.am
> @@ -0,0 +1,7 @@
> +noinst_LIBRARIES = libx86io.a
> +
> +libx86io_a_SOURCES = x86io.c
> +
> +libx86io_a_CPPFLAGS =\
> + @XENO_USER_CFLAGS@  \
> + -I$(top_srcdir)/include
> diff --git a/testsuite/smokey/x86io/x86io.c b/testsuite/smokey/x86io/x86io.c
> new file mode 100644
> index 0..4db228660
> --- /dev/null
> +++ b/testsuite/smokey/x86io/x86io.c
> @@ -0,0 +1,77 @@
> +/*
> + * Test for a working iopl() after a regression on 5.15:
> + * https://www.xenomai.org/pipermail/xenomai/2022-March/047451.html
> + *
> + * Copyright (C) 2022 sigma star gmbh
> + * Author Richard Weinberger 
> + *
> + * Released under the terms of GPLv2.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define PORT (0x378)
> +
> +static int saw_segv;
> +
> +static void *tfn(void *d)
> +{
> + struct sched_param schedp = {0};
> + int ret;
> +
> + schedp.sched_priority = 1;
> + __Terrno(ret, sched_setscheduler(0, SCHED_FIFO, &schedp));
> +
> + (void)inb(PORT);
> +
> + return (void *)(unsigned long)ret;
> +}
> +
> +static void sgfn(int sig, siginfo_t *si, void *ctx)
> +{
> + saw_segv = 1;
> +}
> +
> +smokey_test_plugin(x86io, SMOKEY_NOARGS, "Check x86 port io");
> +
> +int run_x86io(struct smokey_test *t, int argc, char *const argv[])
> +{
> + struct sigaction sa, old_sa;
> + unsigned long ptret;
> + pthread_t pt;
> + int ret;
> +
> + memset(&sa, 0, sizeof(sa));
> + sa.sa_sigaction = sgfn;
> + sa.sa_flags = SA_SIGINFO;
> + sigemptyset(&sa.sa_mask);
> +
> + if (!__Terrno(ret, sigaction(SIGSEGV, &sa, &old_sa)))
> + goto out;
> +
> + if (!__Terrno(ret, iopl(3)))
> + goto out_restore;
> +
> + if (!__T(ret, pthread_create(&pt, NULL, tfn, NULL)))
> + goto out_restore;
> +
> + if (!__T(ret, pthread_join(pt, (void *
> + goto out_restore;
> +
> +out_restore:
> + sigaction(SIGSEGV, &old_sa, NULL);
> +
> +out:
> + if (ret)
> + return -ret;
> + if (ptret)
> + return -ptret;
> + if (saw_segv)
> + return -EFAULT;
> + return 0;
> +}

OK, let's throw this at our lab and see if anything explodes.

Thanks, applied.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 3/9] testsuite: Add a simple test driver for alchemytests

2022-04-14 Thread Jan Kiszka via Xenomai
On 13.04.22 23:58, Richard Weinberger via Xenomai wrote:
> In their current shape, every alchemy test has to be a single
> program and does not use the smokey test framework.
> 
> alchemytest_driver uses smokey and runs each test as new process.

Maybe rather call this "wrapper" or "loader" - driver reminded me first
of a kernel driver.

Jan

> 
> Signed-off-by: Richard Weinberger 
> ---
>  testsuite/alchemytests/Makefile.am  | 11 +++
>  testsuite/alchemytests/alchemytest_driver.c | 84 +
>  2 files changed, 95 insertions(+)
>  create mode 100644 testsuite/alchemytests/alchemytest_driver.c
> 
> diff --git a/testsuite/alchemytests/Makefile.am 
> b/testsuite/alchemytests/Makefile.am
> index 35df0d49c..9159a0b77 100644
> --- a/testsuite/alchemytests/Makefile.am
> +++ b/testsuite/alchemytests/Makefile.am
> @@ -146,3 +146,14 @@ task10_SOURCES = task-10.c
>  task10_CPPFLAGS = $(alchemycppflags)
>  task10_LDADD = $(alchemyldadd) -lpthread -lrt -lm
>  task10_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +alchemytest_driver_SOURCES = alchemytest_driver.c
> +alchemytest_driver_CPPFLAGS =\
> + $(XENO_USER_CFLAGS) \
> + -I$(top_srcdir)/include
> +alchemytest_driver_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +alchemytest_driver_LDADD =   \
> + ../../lib/smokey/libsmokey@CORE@.la \
> + @XENO_CORE_LDADD@   \
> + @XENO_USER_LDADD@   \
> + -lpthread -lrt
> diff --git a/testsuite/alchemytests/alchemytest_driver.c 
> b/testsuite/alchemytests/alchemytest_driver.c
> new file mode 100644
> index 0..45323507d
> --- /dev/null
> +++ b/testsuite/alchemytests/alchemytest_driver.c
> @@ -0,0 +1,84 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static char *mydir;
> +
> +#define TEST(name) \
> + smokey_test_plugin(name, SMOKEY_NOARGS, "Run external test"); \
> + static int run_##name(struct smokey_test *t, int argc, char *const argv[]) \
> + { \
> + return __run_extprog(t, argc, argv); \
> + }
> +
> +static int __run_extprog(struct smokey_test *t, int argc, char *const argv[])
> +{
> + int ret;
> + char *tst_path;
> +
> + ret = asprintf(&tst_path, "%s/%s --cpu-affinity=0", mydir, t->name);
> + if (ret == -1)
> + return -ENOMEM;
> +
> + ret = system(tst_path);
> + free(tst_path);
> +
> + return ret;
> +}
> +
> +TEST(alarm1)
> +TEST(buffer1)
> +TEST(event1)
> +TEST(heap1)
> +TEST(heap2)
> +TEST(mq1)
> +TEST(mq2)
> +TEST(mq3)
> +TEST(mutex1)
> +TEST(pipe1)
> +TEST(sem1)
> +TEST(sem2)
> +TEST(task1)
> +TEST(task2)
> +TEST(task3)
> +TEST(task4)
> +TEST(task5)
> +TEST(task6)
> +TEST(task7)
> +TEST(task8)
> +TEST(task9)
> +TEST(task10)
> +
> +int main(int argc, char *const argv[])
> +{
> + struct smokey_test *t;
> + int ret, fails = 0;
> +
> + if (argc > 0)
> + mydir = dirname(argv[0]);
> + else
> + mydir = ".";
> +
> + if (pvlist_empty(&smokey_test_list))
> + return 0;
> +
> + for_each_smokey_test(t) {
> + ret = t->run(t, argc, argv);
> + if (ret) {
> + fails++;
> + if (smokey_keep_going)
> + continue;
> + if (smokey_verbose_mode)
> + error(1, -ret, "test %s failed", t->name);
> + return 1;
> + }
> + smokey_note("%s OK", t->name);
> + }
> +
> + return fails != 0;
> +}
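
One detail of the run-as-a-new-process approach: on POSIX systems,
system() returns a wait status rather than the plain exit code of the
child. A minimal sketch of decoding it, assuming a POSIX shell is
available; run_command and its mapping to errno-style return values are
hypothetical and not part of the patch:

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run 'cmd' through the shell and map the outcome to 0 on success or a
 * negative errno-style value on failure.
 */
static int run_command(const char *cmd)
{
	int status = system(cmd);

	if (status == -1)
		return -errno;		/* child could not be created */
	if (WIFSIGNALED(status))
		return -EINTR;		/* child was killed by a signal */
	if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
		return -EIO;		/* child reported a failure */

	return 0;
}
```

For example, run_command("exit 3") yields -EIO, while the raw value
returned by system() for the same command is the encoded wait status.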

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/6] testsuite: Hook up alchemytests

2022-04-14 Thread Jan Kiszka via Xenomai
On 14.04.22 13:17, Richard Weinberger wrote:
> On Thu, Apr 14, 2022 at 1:12 PM Jan Kiszka via Xenomai
>  wrote:
>> With only up to here applied:
>>
>> make[2]: Entering directory 'xenomai/build/testsuite/alchemytests'
>> make[2]: *** No rule to make target 'alchemytest_driver.c', needed by
>> 'alchemytest_driver.o'.  Stop.
> 
> Ah, alchemytest_driver.c comes in a follow up patch.
> Will be fixed in a later series.

Oops, this was on v1 - but it seems v2 was not different in this regard.

> 
>>> +task10_SOURCES = task-10.c
>>> +task10_CPPFLAGS = $(alchemycppflags)
>>> +task10_LDADD = $(alchemyldadd) -lpthread -lrt -lm
>>> +task10_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
>>
>> Lots of repetitions. Can we use at least some macro for them, or can't
> 
> My automake-fu is weak. Does automake support something like that?

I would have to dig this up myself, but I would be surprised if there is
nothing like that.

> 
>> we assign CPPFLAGS, LDADD and LDFLAGS globally (in this file)?
> 
> I can give this a try. So far I didn't use global variables because building
> the tests and the test driver is slightly different and may change.
> 

Ah, that is the reason.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH v2 0/9] Revive alchemy, pSOS and VxWorks tests

2022-04-14 Thread Jan Kiszka via Xenomai
On 13.04.22 23:58, Richard Weinberger via Xenomai wrote:
> This patch series is a first attempt to integrate the currently abandoned
> alchemy, pSOS and VxWorks tests into Xenomai's test suite.
> Since each test assumes it runs as its own process, a test driver is needed
> which executes each test separately.
> The driver makes use of the smokey framework.
> 
> Test results on a x86 VM (5.15.19, Xenomai master as of today):
> - Alchemy:
>   test2 fails:
>   [8] at task-2.c:71
>   [1] at task-2.c:24
>   [9] at task-2.c:79
>   [4] at task-2.c:43
>   [10] at task-2.c:87
>   [5] at task-2.c:48
>   [11] at task-2.c:92
>   [2] at task-2.c:29
>   [6] at task-2.c:52
>  0"022.972| BUG in __traceobj_check_abort(): [FGND] wrong return status:
> task-2.c:55 => EINVAL (want OK)
> 
>   task5 fails:
>   [9] at task-5.c:79
>   [1] at task-5.c:23
>   [10] at task-5.c:87
>   [3] at task-5.c:40
>   [11] at task-5.c:95
>   [4] at task-5.c:45
>   [5] at task-5.c:50
>   [6] at task-5.c:55
>   [2] at task-5.c:28
>   [7] at task-5.c:60
>  0"003.160| BUG in __traceobj_check_abort(): [FGND] wrong return status:
> task-5.c:63 => EINVAL (want OK)

Seems we need to fix those at least.

> 
>   If Xenomai was configured with --enable-lores-clock, tests mq1, mutex1,
>   pipe1 and sem1 fail due to:
>   undefined symbol: __clockobj_ticks_to_timeout

Missing lib dependency, likely. Cannot try out myself yet as patch 2
does not build.

> 
> - pSOS (Xenomai has to be configured with --enable-lores-clock):

If we add this test to smokey, and it requires lores-clock, we either
need to enable that in debian/rules or the Debian package built in
xenomai-images so that CI/CT will not fail. Well, it will fail so far
due to the test case problems. But that would be next.

>   rn1 fails:
>   0"001.017| BUG in __traceobj_assert_failed(): [rn1] trace assertion failed:
>   rn-1.c:46 => "ret == 0"
>   task2 fails:
>   [8] at task-2.c:73
>   [1] at task-2.c:23
>   [9] at task-2.c:81
>   [4] at task-2.c:44
>   [10] at task-2.c:89
>   [5] at task-2.c:49
>   [11] at task-2.c:94
>   [2] at task-2.c:28
>   [3] at task-2.c:33
>   [6] at task-2.c:53
>  0"004.756| BUG in __traceobj_assert_failed(): [FGND] trace assertion failed:
> task-2.c:56 => "ret == 0"
> 
>   task6 fails:
>   [9] at task-6.c:79
>   [1] at task-6.c:22
>   [10] at task-6.c:87
>   [3] at task-6.c:39
>   [11] at task-6.c:95
>   [4] at task-6.c:44
>   [5] at task-6.c:49
>   [6] at task-6.c:54
>   [2] at task-6.c:27
>   [7] at task-6.c:59
>  0"002.870| BUG in __traceobj_assert_failed(): [FGND] trace assertion failed:
> task-6.c:62 => "ret == 0 && oldprio == myprio"
> 
>   task8 runs forever (100% CPU)
> 
> - VxWorks:
>   task2 fails:
>   [8] at task-2.c:85
>   [1] at task-2.c:32
>   [9] at task-2.c:91
>   [4] at task-2.c:57
>   [10] at task-2.c:97
>   [5] at task-2.c:62
>   [11] at task-2.c:103
>   [2] at task-2.c:37
>   [12] at task-2.c:109
>   [6] at task-2.c:66
>  0"005.790| BUG in __traceobj_assert_failed(): [foregroundTask] trace assertion failed:
> task-2.c:69 => "ret == 0"
> 

So... reviving means fixing first, unfortunately. Then enable CI/CT, and
then we can merge. The pattern looks scalable, but then we could also
apply it stepwise: first alchemy, then vxworks and finally psos with
that lores-clock topic.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/6] testsuite: Hook up alchemytests

2022-04-14 Thread Jan Kiszka via Xenomai
On 08.04.22 10:03, Richard Weinberger via Xenomai wrote:
> Build them using Xenomai's build system.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  configure.ac   |   1 +
>  testsuite/Makefile.am  |   6 +-
>  testsuite/alchemytests/Makefile.am | 148 +
>  3 files changed, 153 insertions(+), 2 deletions(-)
>  create mode 100644 testsuite/alchemytests/Makefile.am
> 

With only up to here applied:

make[2]: Entering directory 'xenomai/build/testsuite/alchemytests'
make[2]: *** No rule to make target 'alchemytest_driver.c', needed by
'alchemytest_driver.o'.  Stop.

> diff --git a/configure.ac b/configure.ac
> index 019453793..8fd86e5a1 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1046,6 +1046,7 @@ AC_CONFIG_FILES([ \
>   testsuite/smokey/can/Makefile
>   testsuite/clocktest/Makefile \
>   testsuite/xeno-test/Makefile \
> + testsuite/alchemytests/Makefile \
>   utils/Makefile \
>   utils/hdb/Makefile \
>   utils/can/Makefile \
> diff --git a/testsuite/Makefile.am b/testsuite/Makefile.am
> index 4932f6d33..e027485fb 100644
> --- a/testsuite/Makefile.am
> +++ b/testsuite/Makefile.am
> @@ -7,7 +7,8 @@ SUBDIRS +=\
>   gpiotest\
>   spitest \
>   switchtest  \
> - xeno-test
> + xeno-test   \
> + alchemytests
>  endif
>  
>  DIST_SUBDIRS =   \
> @@ -18,4 +19,5 @@ DIST_SUBDIRS =  \
>   smokey  \
>   spitest \
>   switchtest  \
> - xeno-test
> + xeno-test   \
> + alchemytests
> diff --git a/testsuite/alchemytests/Makefile.am 
> b/testsuite/alchemytests/Makefile.am
> new file mode 100644
> index 0..35df0d49c
> --- /dev/null
> +++ b/testsuite/alchemytests/Makefile.am
> @@ -0,0 +1,148 @@
> +testdir = @XENO_TEST_DIR@
> +
> +CCLD = $(top_srcdir)/scripts/wrap-link.sh $(CC)
> +
> +test_PROGRAMS = alchemytest_driver \
> + alarm1 \
> + buffer1\
> + event1 \
> + heap1  \
> + heap2  \
> + mq1\
> + mq2\
> + mq3\
> + mutex1 \
> + pipe1  \
> + sem1   \
> + sem2   \
> + task1  \
> + task2  \
> + task3  \
> + task4  \
> + task5  \
> + task6  \
> + task7  \
> + task8  \
> + task9  \
> + task10
> +
> +alchemycppflags =\
> + $(XENO_USER_CFLAGS) \
> + -I$(top_srcdir)/include
> +
> +alchemyldadd =   \
> + ../../lib/alchemy/libalchemy@CORE@.la   \
> + ../../lib/copperplate/libcopperplate@CORE@.la   \
> + @XENO_CORE_LDADD@   \
> + @XENO_USER_LDADD@   \
> + -lpthread -lrt -lm
> +
> +alarm1_SOURCES = alarm-1.c
> +alarm1_CPPFLAGS = $(alchemycppflags)
> +alarm1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +alarm1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +event1_SOURCES = event-1.c
> +event1_CPPFLAGS = $(alchemycppflags)
> +event1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +event1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +heap1_SOURCES = heap-1.c
> +heap1_CPPFLAGS = $(alchemycppflags)
> +heap1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +heap1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +heap2_SOURCES = heap-2.c
> +heap2_CPPFLAGS = $(alchemycppflags)
> +heap2_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +heap2_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +buffer1_SOURCES = buffer-1.c
> +buffer1_CPPFLAGS = $(alchemycppflags)
> +buffer1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +buffer1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +mutex1_SOURCES = mutex-1.c
> +mutex1_CPPFLAGS = $(alchemycppflags)
> +mutex1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +mutex1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +pipe1_SOURCES = pipe-1.c
> +pipe1_CPPFLAGS = $(alchemycppflags)
> +pipe1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +pipe1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +mq1_SOURCES = mq-1.c
> +mq1_CPPFLAGS = $(alchemycppflags)
> +mq1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +mq1_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +mq2_SOURCES = mq-2.c
> +mq2_CPPFLAGS = $(alchemycppflags)
> +mq2_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +mq2_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +mq3_SOURCES = mq-3.c
> +mq3_CPPFLAGS = $(alchemycppflags)
> +mq3_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +mq3_LDFLAGS = @XENO_AUTOINIT_LDFLAGS@
> +
> +sem1_SOURCES = sem-1.c
> +sem1_CPPFLAGS = $(alchemycppflags)
> +sem1_LDADD = $(alchemyldadd) -lpthread -lrt -lm
> +sem1_LDFLAGS = 

Re: [PATCH 0/6] Revive alchemy tests

2022-04-13 Thread Jan Kiszka via Xenomai
On 13.04.22 16:14, Richard Weinberger wrote:
> - Original Message -
>> From: "Jan Kiszka" 
>> To: "richard" , "xenomai" 
>> Sent: Wednesday, 13 April 2022 14:56:16
>> Subject: Re: [PATCH 0/6] Revive alchemy tests
> 
>> On 08.04.22 10:03, Richard Weinberger via Xenomai wrote:
>>> This patch series is a first attempt to integrate the currently abandoned
>>> alchemy tests into Xenomai's test suite.
>>> Since each test assumes it runs as its own process, a test driver is needed
>>> which executes each test separately.
>>> The driver makes use of the smokey framework.
>>>
>>
>> A valuable step forward. Just wondering, before actually taking it, if
>> that can be a pattern for the rest as well (psos, vxworks) or if there
>> is anything that should be considered in addition to allow them
>> following later on.
> 
> I think the same approach will work for psos and vxworks too.
> If you're fine with the proposed approach I'd prepare a v2 of this series
> with psos and vxworks included.

Thanks! Will have a look and let you know if I have remarks on details
as well.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Does Xenomai 3.2.x work with 4.14.x kernels?

2022-04-13 Thread Jan Kiszka via Xenomai
On 13.04.22 11:25, Scott Reed via Xenomai wrote:
> Hello,
> 
> I am trying to build a 4.14.110 kernel+ipipe+xenomai_3.2.7 and running into
> issues when trying to compile. Namely, the 4.14.x kernel does not seem
> to understand the "__kernel_timespec" struct.
> 
> Is it known if Xenomai 3.2.x works with 4.14.x kernels?

We no longer actively support 4.14.x, and we apparently broke backward
compatibility at some point. This is likely fixable locally, but no
longer maintainable upstream.

> 
> Some background information
> ---
> Our platform is based on an ARM iMx6q.
> 
> I have tried to move to a 5.4.x kernel, but ran into another issue
> in that we have an RTDM driver which uses PCI MSI interrupts and
> when I try to register the interrupt with rtdm_irq_request the
> function returns EINVAL(-22).
> 
> When working with a vanilla 5.4.x kernel and non-RTDM version of
> the driver, there are no issues when registering the interrupt.
> 
> I started to dive into this problem, but saw that as of kernel
> 4.16, the PCI MSI interrupts for our platform have been changed
> to be handled as chained interrupts.
> 
> In this forum, I have seen multiple discussions and patches regarding
> chained interrupts for X86, but not for ARM and began to think that
> maybe chained interrupt support in xenomai is not complete for ARM.
> 
> Could this be the case?

In fact, those patches are for ARM and not x86 (the latter was always
fine). Please check the latest ipipe-arm releases, or even dovetail (5.10+).

> 
> For this reason, I backed down to the 4.14.x kernel, but now have run
> into the compiling issue as mentioned at the start of this email.

When starting something new, do not use outdated kernels anymore.

> 
> One last piece of background information, my initial motivation to move
> to a new xenomai was that we moved to GCC 10.2 (in the meantime GCC 11.1)
> and I was concerned regarding the ARM r7 register clobbering issues/patches
> posted to the forum as I am fighting against a bug in our system where 
> registers get corrupted albeit not r7 (typically r0 with a value of 
> 0xFFFA (-38)). This problem seems to have come in after we upgraded
> to GCC 10.2

If you are in a hurry and have a working version based on older things,
moving backward can be a valid intermediate option. But thinking ahead,
using a mixture of old (kernel) and new (xenomai, toolchain) components
is almost never a good idea: You are too far from mainstream, and pieces
will fall off left and right.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Analogy supported DAQ

2022-04-13 Thread Jan Kiszka via Xenomai
On 12.04.22 18:36, Ivan F. Valerio via Xenomai wrote:
> Hello,
> 
> I am looking to use Xenomai's analogy to connect my application to a NI
> PCIe DAQ card. I found your list of supported hardware for analogy, and
> the only PCIe found in your list are no longer supported by National
> Instruments(PCIe-6251 & PCIe-6259). Are there any other PCIe cards
> supported by Analogy? How would I go about confirming compatibility with
> the newer NI PCIe cards?

Analogy was about to be removed from Xenomai because no one had
contributed updates to it for a long while, in particular (but not only)
drivers. One user pointed out that at least the core is still used
(with out-of-tree drivers), so that move was postponed. In that light,
it is a bit unlikely that you will get much more feedback.

We are surely open to new contributors "reviving" part of Analogy again,
specifically with respect to newer hardware. The first step to assess the
effort would likely be comparing older drivers and their manuals (or
kernel drivers, if any) to newer ones, e.g. whether the register
interface is just an extension or rather something completely different.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 0/6] Revive alchemy tests

2022-04-13 Thread Jan Kiszka via Xenomai
On 08.04.22 10:03, Richard Weinberger via Xenomai wrote:
> This patch series is a first attempt to integrate the currently abandoned
> alchemy tests into Xenomai's test suite.
> Since each test assumes it runs as its own process, a test driver is needed
> which executes each test separately.
> The driver makes use of the smokey framework.
> 

A valuable step forward. Just wondering, before actually taking it, if
that can be a pattern for the rest as well (psos, vxworks) or if there
is anything that should be considered in addition to allow them
to follow later on.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] configure.ac: Enable SMP by default

2022-04-13 Thread Jan Kiszka via Xenomai
On 08.04.22 10:59, Richard Weinberger via Xenomai wrote:
> In 2022 it is hard to find ARM or x86 systems without SMP.
> Other CPU architectures are no longer supported by Xenomai.
> So, make SMP support opt-out.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  configure.ac | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 019453793..867fc4636 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -385,12 +385,9 @@ if test x$use_registry = xy; then
>  fi
>  AM_CONDITIONAL(XENO_REGISTRY,[test x$use_registry = xy])
>  
> -dnl SMP support (default: on for cobalt/x86, off otherwise)
> +dnl SMP support (default: on)
>  
> -CONFIG_SMP=
> -if test $target_cpu_arch = x86 -a $rtcore_type = cobalt; then
> - CONFIG_SMP=y
> -fi
> +CONFIG_SMP=y
>  AC_MSG_CHECKING(for SMP support)
>  AC_ARG_ENABLE(smp,
>   AS_HELP_STRING([--enable-smp], [Enable SMP support]),

Thanks, applied.

We could now drop the explicit --enable-smp from debian/rules as well,
but that's nothing urgent.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 5/5] drivers: ipc: enable non-blocking write from regular threads

2022-04-13 Thread Jan Kiszka via Xenomai
On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
> From: Philippe Gerum 
> 
> Regular threads should be allowed to write to RTIPC sockets provided
> MSG_DONTWAIT is implicitly set for such a request. This would match
> the existing behavior with other synchronization objects, such as
> semaphores and events, avoiding unnecessary restrictions on usage.
> 
> Signed-off-by: Philippe Gerum 
> ---
>  kernel/drivers/ipc/bufp.c  | 8 ++--
>  kernel/drivers/ipc/iddp.c  | 8 ++--
>  kernel/drivers/ipc/rtipc.c | 4 ++--
>  kernel/drivers/ipc/xddp.c  | 6 +-
>  4 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/drivers/ipc/bufp.c b/kernel/drivers/ipc/bufp.c
> index fd533dba27..565409dd6f 100644
> --- a/kernel/drivers/ipc/bufp.c
> +++ b/kernel/drivers/ipc/bufp.c
> @@ -655,11 +655,15 @@ static ssize_t bufp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct bufp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __bufp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __bufp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __bufp_bind_socket(struct rtipc_private *priv,
> @@ -682,7 +686,7 @@ static int __bufp_bind_socket(struct rtipc_private *priv,
>   __test_and_set_bit(_BUFP_BINDING, &sk->status))
>   ret = -EADDRINUSE;
>   cobalt_atomic_leave(s);
> - 
> +
>   if (ret)
>   return ret;
>  
> diff --git a/kernel/drivers/ipc/iddp.c b/kernel/drivers/ipc/iddp.c
> index a553902326..05d0193394 100644
> --- a/kernel/drivers/ipc/iddp.c
> +++ b/kernel/drivers/ipc/iddp.c
> @@ -255,7 +255,7 @@ static ssize_t __iddp_recvmsg(struct rtdm_fd *fd,
>   }
>  
>   /* We want to pick one buffer from the queue. */
> - 
> +
>   for (;;) {
>   ret = rtdm_sem_timeddown(&sk->insem, timeout, toseq);
>   if (unlikely(ret)) {
> @@ -522,11 +522,15 @@ static ssize_t iddp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct iddp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __iddp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __iddp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __iddp_bind_socket(struct rtdm_fd *fd,
> diff --git a/kernel/drivers/ipc/rtipc.c b/kernel/drivers/ipc/rtipc.c
> index 859bdab2f2..211b496ec5 100644
> --- a/kernel/drivers/ipc/rtipc.c
> +++ b/kernel/drivers/ipc/rtipc.c
> @@ -428,7 +428,7 @@ static int rtipc_select(struct rtdm_fd *fd, struct 
> xnselector *selector,
>   struct xnselect *block;
>   spl_t s;
>   int ret;
> - 
> +
>   if (type != XNSELECT_READ && type != XNSELECT_WRITE)
>   return -EINVAL;
>  
> @@ -480,7 +480,7 @@ static struct rtdm_driver rtipc_driver = {
>   .read_rt=   rtipc_read,
>   .read_nrt   =   NULL,
>   .write_rt   =   rtipc_write,
> - .write_nrt  =   NULL,
> + .write_nrt  =   rtipc_write, /* MSG_DONTWAIT. */
>   .select =   rtipc_select,
>   },
>  };
> diff --git a/kernel/drivers/ipc/xddp.c b/kernel/drivers/ipc/xddp.c
> index ae5b720c0c..2ca0da5fd4 100644
> --- a/kernel/drivers/ipc/xddp.c
> +++ b/kernel/drivers/ipc/xddp.c
> @@ -657,11 +657,15 @@ static ssize_t xddp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct xddp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __xddp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __xddp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __xddp_bind_socket(struct rtipc_private *priv,

Since this one is logically unrelated to the rest of the series, I have
already picked it for next.
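
For readers skimming the archive, the flag selection in the patch can be sketched as a small user-space model in C. This is only an illustration under stated assumptions, not the RTIPC code: struct model_sock, model_sendmsg(), model_write(), MODEL_BUFSZ and MODEL_WOULD_SLEEP are hypothetical names, and the in_secondary_domain argument stands in for the is_secondary_domain() check used in the patch. The model assumes that a send which cannot complete immediately fails with -EWOULDBLOCK when MSG_DONTWAIT is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* MSG_DONTWAIT */
#include <sys/types.h>	/* ssize_t */

#define MODEL_BUFSZ 8			/* hypothetical buffer size */
#define MODEL_WOULD_SLEEP (-1000)	/* hypothetical marker: caller would block */

/* Toy socket with a fixed-size output buffer. */
struct model_sock {
	char buf[MODEL_BUFSZ];
	size_t fill;
};

/* Stand-in for a __*_sendmsg() helper: copy if there is room. */
static ssize_t model_sendmsg(struct model_sock *sk, const void *data,
			     size_t len, int flags)
{
	assert(sk->fill <= MODEL_BUFSZ);
	if (len > MODEL_BUFSZ - sk->fill) {
		if (flags & MSG_DONTWAIT)
			return -EWOULDBLOCK;	/* fail instead of sleeping */
		return MODEL_WOULD_SLEEP;	/* a real-time caller would wait here */
	}
	memcpy(sk->buf + sk->fill, data, len);
	sk->fill += len;
	return (ssize_t)len;
}

/* Mirrors the shape of the patched write handlers. */
static ssize_t model_write(struct model_sock *sk, const void *data,
			   size_t len, bool in_secondary_domain)
{
	int flags = 0;

	if (in_secondary_domain)
		flags = MSG_DONTWAIT;	/* regular threads must never block */

	return model_sendmsg(sk, data, len, flags);
}
```

In this model a regular thread writing to a full buffer gets -EWOULDBLOCK back right away, while the same call from a real-time context reaches the point where the real driver would sleep.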

Thanks,
Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 1/5] cobalt/events: add auto-clear feature

2022-04-13 Thread Jan Kiszka via Xenomai
On 07.04.22 19:09, Philippe Gerum wrote:
> 
> Jan Kiszka  writes:
> 
>> On 07.04.22 13:16, Philippe Gerum wrote:
>>>
>>> Jan Kiszka  writes:
>>>
>>>> On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
>>>>> From: Philippe Gerum 
>>>>>
>>>>> The current implementation does not atomically consume+clear the event
>>>>> set to be received by the waiter(s), which makes it useless for
>>>>> anything but a plain one-time latch due to the race window this opens
>>>>> with a consume[A]->signal[B]->clear[A] sequence.
>>>>>
>>>>> To address this issue, let's provide the auto-clear feature with
>>>>> __cobalt_event_wait().
>>>>>
>>>>> This change affects the ABI by adding the auto-clear mode as an opt-in
>>>>> feature, enabled by passing COBALT_EVENT_AUTOCLEAR to
>>>>> cobalt_event_init().
>>>>>
>>>>
>>>> Makes sense, but shouldn't autoclear be rather the default then? Which
>>>> users are affected? None in-tree so far?
>>>>
>>>
>>> There is one user in copperplate:
>>> https://source.denx.de/Xenomai/xenomai/-/blob/28158391258eea52650856bef5d3ed6ebaaf813b/lib/copperplate/eventobj.c#L87
>>>
>>> which indirectly affects rt_event_signal() from the alchemy API:
>>> https://source.denx.de/Xenomai/xenomai/-/blob/28158391258eea52650856bef5d3ed6ebaaf813b/lib/alchemy/event.c#L453
>>>
>>> Nobody raised the issue so far with alchemy, which is why I refrained
>>> from turning the autoclear mode on by default so far. This is debatable,
>>> since no documentation explains the limitation on usage caused by not
>>> having the autoclear mode set.
>>>
>>
>> So, the pattern via Alchemy would be
>>
>> Thread A            Thread B
>>
>> rt_event_wait()
>>                     rt_event_signal()
>> rt_event_clear()
>>
>> That would force users to perform a state check via a side-channel after
>> clearing the event to avoid starting to wait if the condition was met
>> again.
>>
>> OK, but how could users request the new mode in rt_event_create()? There
>> is not even an EV_AUTOCLEAR flag for it. Do you have more patches pending?
>>
> 
> The alternatives I see are:
> 
> - assume that some people might be expecting the current - fragile to
>   say the least - behavior, which means that we should add a flag to
>   rt_event_create() in order to enable the auto-clear mode for all
>   others.
> 
> - consider the current behavior as broken beyond recognition, and force
>   in the auto-clear mode for all alchemy events, at the expense of
>   requiring the folks who have been using a side-channel to paper over
>   the current misdesign, to fix their stuff the right way, based on the
>   auto-clear behavior.
> 
> IMHO, everyone would be better off with #2, because it would just work
> in all cases. The side-channel would simply become a useless but
> innocuous noise if present.
> 
> Which way we should go is debatable, which is why I did not issue any
> patch changing the alchemy interface yet.
> 

Let's go for option #2 then and include the alchemy changes as well.
Will only target the next major release anyway.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



[Call for Participation] Xenomai Community Meet-Up

2022-04-11 Thread Jan Kiszka via Xenomai
The Xenomai community is planning to hold a virtual meet-up with
presentations and live discussions on May 25th. To accommodate
different time zones, the meet-up will be split into an Asian morning
session, followed by a European one.

Sessions will be recorded and shared afterwards. Attendance is free for
everyone; only a registration is needed to join the platform.

We are happy to already announce several speakers and topics such as

- "Xenomai in Industry Control System"
  by Yebin Liang (Han's Smart Control Technology Co., Ltd)

- "Motion Control Applications for Special Effects and Entertainment"
  by Steve Rosenbluth (Concept Overdrive Inc)

- "Autonomous mobile robots in warehouses running on Xenomai"
  by Sebastian Schmolorz (KION Mobile Automation)

- "Xenomai Motion Control Solution and Tuning Based on Intel Edge
  Controls for Industrial"
  by Wei Zhang and Yu Yan (Intel)

- "The Xenomai Project: Today and Tomorrow"
  by Jan Kiszka (Siemens)

...and more.  There are still some free slots available, and we are
therefore looking for further talks from the community:

- How are you using Xenomai?
- Are you working on ports or extensions of it?
- Do you have best practices to share?
- What do you expect from Xenomai in the future?

Please send your proposals consisting of a title and short abstract to 
jan.kis...@siemens.com and hongzhan.c...@intel.com.

Important dates:

- CFP closes:            Friday, April 29, 2022
- CFP notifications:     Wednesday, May 4, 2022
- Schedule announcement: Friday, May 6, 2022
- Recording Due Date:    Wednesday, May 11, 2022
- Event:                 Wednesday, May 25, 2022

Looking forward to seeing you at the meet-up!

Rick, Hongzhan, Jan



Re: 5.10-dovetail regression?

2022-04-07 Thread Jan Kiszka via Xenomai
On 07.04.22 17:24, Philippe Gerum wrote:
> 
> Jan Kiszka  writes:
> 
>> Hi Philippe,
>>
>> does this already ring some bell?
>>
>> https://source.denx.de/Xenomai/xenomai-images/-/jobs/419210
>>
>> Only triggers with qemu-amd64, not on real HW and not with 5.15.
>>
> 
> I could not reproduce locally, but visual inspection revealed something
> fishy in #8e2c09ee5323. Could you try this on the failing kernel? TIA,
> 
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 2651c6cfd034..da6735d45a8a 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -644,8 +644,8 @@ void clockevents_exchange_device(struct 
> clock_event_device *old,
>* to the release list, keep it around but mark it as
>* reserved.
>*/
> + list_del(&old->list);
>   if (tick_check_is_proxy(new)) {
> - list_del(&old->list);
>   clockevents_switch_state(old, CLOCK_EVT_STATE_RESERVED);
>   } else {
>   clockevents_switch_state(old, CLOCK_EVT_STATE_DETACHED);
> 

It didn't reproduce locally for me either, even though I used the same
image. But the patch helped on the CI system.

Thanks,
Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 1/5] cobalt/events: add auto-clear feature

2022-04-07 Thread Jan Kiszka via Xenomai
On 07.04.22 13:16, Philippe Gerum wrote:
> 
> Jan Kiszka  writes:
> 
>> On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
>>> From: Philippe Gerum 
>>>
>>> The current implementation does not atomically consume+clear the event
>>> set to be received by the waiter(s), which makes it useless for
>>> anything but a plain one-time latch due to the race window this opens
>>> with a consume[A]->signal[B]->clear[A] sequence.
>>>
>>> To address this issue, let's provide the auto-clear feature with
>>> __cobalt_event_wait().
>>>
>>> This change affects the ABI by adding the auto-clear mode as an opt-in
>>> feature, enabled by passing COBALT_EVENT_AUTOCLEAR to
>>> cobalt_event_init().
>>>
>>
>> Makes sense, but shouldn't autoclear be rather the default then? Which
>> users are affected? None in-tree so far?
>>
> 
> There is one user in copperplate:
> https://source.denx.de/Xenomai/xenomai/-/blob/28158391258eea52650856bef5d3ed6ebaaf813b/lib/copperplate/eventobj.c#L87
> 
> which indirectly affects rt_event_signal() from the alchemy API:
> https://source.denx.de/Xenomai/xenomai/-/blob/28158391258eea52650856bef5d3ed6ebaaf813b/lib/alchemy/event.c#L453
> 
> Nobody raised the issue so far with alchemy, which is why I refrained
> from turning the autoclear mode on by default so far. This is debatable,
> since no documentation explains the limitation on usage caused by not
> having the autoclear mode set.
> 

So, the pattern via Alchemy would be

Thread A            Thread B

rt_event_wait()
                    rt_event_signal()
rt_event_clear()

That would force users to perform a state check via a side-channel after
clearing the event to avoid starting to wait if the condition was met
again.
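
To make the race concrete, here is a minimal single-threaded sketch in C that replays the consume[A]->signal[B]->clear[A] ordering against a toy flag group. It is a model under stated assumptions, not the Cobalt implementation: struct ev_model, the ev_model_*() helpers and EV_MODEL_ANY are hypothetical names. Only the wait test (rbits && rbits == testval) and the auto-clear step follow the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EV_MODEL_ANY 0x1	/* hypothetical, plays the role of COBALT_EVENT_ANY */

/* Toy flag group; "autoclear" models COBALT_EVENT_AUTOCLEAR. */
struct ev_model {
	unsigned int value;
	bool autoclear;
};

static void ev_model_signal(struct ev_model *ev, unsigned int bits)
{
	ev->value |= bits;
}

static void ev_model_clear(struct ev_model *ev, unsigned int bits)
{
	ev->value &= ~bits;
}

/* Non-blocking wait: true if the request is satisfied, bits in *rbits_out. */
static bool ev_model_trywait(struct ev_model *ev, unsigned int bits,
			     int mode, unsigned int *rbits_out)
{
	unsigned int rbits = ev->value & bits;
	unsigned int testval = (mode & EV_MODEL_ANY) ? rbits : bits;

	assert(rbits_out != NULL);
	if (rbits == 0 || rbits != testval)
		return false;
	if (ev->autoclear)
		ev->value &= ~rbits;	/* consume and clear in one step */
	*rbits_out = rbits;
	return true;
}

/*
 * Replay: B signals, A consumes, B signals again, A clears (manual mode
 * only), A waits again. Returns how many of B's two signals A observed.
 */
static int ev_model_replay_race(bool autoclear)
{
	struct ev_model ev = { .value = 0, .autoclear = autoclear };
	unsigned int got;
	int seen = 0;

	ev_model_signal(&ev, 0x1);			/* B: first signal */
	if (ev_model_trywait(&ev, 0x1, EV_MODEL_ANY, &got))
		seen++;					/* A: consume */
	ev_model_signal(&ev, 0x1);			/* B: lands in the window */
	if (!autoclear)
		ev_model_clear(&ev, 0x1);		/* A: wipes B's second signal */
	if (ev_model_trywait(&ev, 0x1, EV_MODEL_ANY, &got))
		seen++;
	return seen;
}
```

With manual clearing, A's clear also erases B's second signal, so ev_model_replay_race(false) counts one observed signal; with auto-clear it counts two. The side-channel state check mentioned above exists to recover that lost signal in the manual case.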

OK, but how could users request the new mode in rt_event_create()? There
is not even an EV_AUTOCLEAR flag for it. Do you have more patches pending?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



5.10-dovetail regression?

2022-04-07 Thread Jan Kiszka via Xenomai
Hi Philippe,

does this already ring some bell?

https://source.denx.de/Xenomai/xenomai-images/-/jobs/419210

Only triggers with qemu-amd64, not on real HW and not with 5.15.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 5/5] drivers: ipc: enable non-blocking write from regular threads

2022-04-07 Thread Jan Kiszka via Xenomai
On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
> From: Philippe Gerum 
> 
> Regular threads should be allowed to write to RTIPC sockets provided
> MSG_DONTWAIT is implicitly set for such a request. This would match
> the existing behavior with other synchronization objects, such as
> semaphores and events, avoiding unnecessary restrictions on usage.
> 
> Signed-off-by: Philippe Gerum 
> ---
>  kernel/drivers/ipc/bufp.c  | 8 ++--
>  kernel/drivers/ipc/iddp.c  | 8 ++--
>  kernel/drivers/ipc/rtipc.c | 4 ++--
>  kernel/drivers/ipc/xddp.c  | 6 +-
>  4 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/drivers/ipc/bufp.c b/kernel/drivers/ipc/bufp.c
> index fd533dba27..565409dd6f 100644
> --- a/kernel/drivers/ipc/bufp.c
> +++ b/kernel/drivers/ipc/bufp.c
> @@ -655,11 +655,15 @@ static ssize_t bufp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct bufp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __bufp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __bufp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __bufp_bind_socket(struct rtipc_private *priv,
> @@ -682,7 +686,7 @@ static int __bufp_bind_socket(struct rtipc_private *priv,
>   __test_and_set_bit(_BUFP_BINDING, &sk->status))
>   ret = -EADDRINUSE;
>   cobalt_atomic_leave(s);
> - 
> +
>   if (ret)
>   return ret;
>  
> diff --git a/kernel/drivers/ipc/iddp.c b/kernel/drivers/ipc/iddp.c
> index a553902326..05d0193394 100644
> --- a/kernel/drivers/ipc/iddp.c
> +++ b/kernel/drivers/ipc/iddp.c
> @@ -255,7 +255,7 @@ static ssize_t __iddp_recvmsg(struct rtdm_fd *fd,
>   }
>  
>   /* We want to pick one buffer from the queue. */
> - 
> +
>   for (;;) {
>   ret = rtdm_sem_timeddown(&sk->insem, timeout, toseq);
>   if (unlikely(ret)) {
> @@ -522,11 +522,15 @@ static ssize_t iddp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct iddp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __iddp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __iddp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __iddp_bind_socket(struct rtdm_fd *fd,
> diff --git a/kernel/drivers/ipc/rtipc.c b/kernel/drivers/ipc/rtipc.c
> index 859bdab2f2..211b496ec5 100644
> --- a/kernel/drivers/ipc/rtipc.c
> +++ b/kernel/drivers/ipc/rtipc.c
> @@ -428,7 +428,7 @@ static int rtipc_select(struct rtdm_fd *fd, struct 
> xnselector *selector,
>   struct xnselect *block;
>   spl_t s;
>   int ret;
> - 
> +
>   if (type != XNSELECT_READ && type != XNSELECT_WRITE)
>   return -EINVAL;
>  
> @@ -480,7 +480,7 @@ static struct rtdm_driver rtipc_driver = {
>   .read_rt=   rtipc_read,
>   .read_nrt   =   NULL,
>   .write_rt   =   rtipc_write,
> - .write_nrt  =   NULL,
> + .write_nrt  =   rtipc_write, /* MSG_DONTWAIT. */
>   .select =   rtipc_select,
>   },
>  };
> diff --git a/kernel/drivers/ipc/xddp.c b/kernel/drivers/ipc/xddp.c
> index ae5b720c0c..2ca0da5fd4 100644
> --- a/kernel/drivers/ipc/xddp.c
> +++ b/kernel/drivers/ipc/xddp.c
> @@ -657,11 +657,15 @@ static ssize_t xddp_write(struct rtdm_fd *fd,
>   struct rtipc_private *priv = rtdm_fd_to_private(fd);
>   struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
>   struct xddp_socket *sk = priv->state;
> + int flags = 0;
>  
>   if (sk->peer.sipc_port < 0)
>   return -EDESTADDRREQ;
>  
> - return __xddp_sendmsg(fd, &iov, 1, 0, &sk->peer);
> + if (is_secondary_domain())
> + flags = MSG_DONTWAIT;
> +
> + return __xddp_sendmsg(fd, &iov, 1, flags, &sk->peer);
>  }
>  
>  static int __xddp_bind_socket(struct rtipc_private *priv,

Does this patch have any dependency on 1-4, or was it just bundled with
them by chance?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH 2/5] cobalt/events: add single-waiter delivery mode

2022-04-07 Thread Jan Kiszka via Xenomai
On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
> From: Philippe Gerum 
> 
> Allow for sending a set of event bits to the heading waiter only,
> instead of broadcasting them to all waiters.
> 
> This change affects the ABI, since we add a set of operation flags to
> the cobalt_event_sync service, to pass the new COBALT_EVENT_BCAST
> flag. It also affects the internal API as follows:
> 
> - the new cobalt_event_broadcast() call behaves like
>   cobalt_event_post() formerly did, which is broadcasting the event to
>   all waiters atomically.
> 
> - the new cobalt_event_signal() call implements the single-waiter
>   delivery mode we introduce.
> 
> - the former cobalt_event_post() is now a wrapper to
>   cobalt_event_broadcast(), marked with a deprecation flag.

Strictly speaking, the deprecation only comes with patch 4.

Can we have some test cases for the new features as well? Who will use them?
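
As a starting point for such test cases, the difference between the two delivery modes could be sketched in plain C as follows. This is a user-space model under stated assumptions, not the Cobalt code: struct model_waiter, model_satisfied(), model_deliver() and MODEL_ANY are hypothetical names, and "heading waiter" is assumed here to mean the first queued waiter whose wait condition is satisfied by the posted bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ANY 0x1	/* hypothetical, plays the role of COBALT_EVENT_ANY */

/* One queued waiter: the bits it waits for and its wait mode. */
struct model_waiter {
	unsigned int bits;
	int mode;
	bool woken;
};

/* Same test as in the quoted patch: rbits && rbits == testval. */
static bool model_satisfied(const struct model_waiter *w, unsigned int value)
{
	unsigned int rbits = value & w->bits;
	unsigned int testval = (w->mode & MODEL_ANY) ? rbits : w->bits;

	return rbits != 0 && rbits == testval;
}

/*
 * Walk the queue in order and wake satisfied waiters. With bcast set,
 * every satisfied waiter is woken; otherwise only the first one is.
 * Returns the number of waiters woken by this call.
 */
static size_t model_deliver(struct model_waiter *q, size_t n,
			    unsigned int value, bool bcast)
{
	size_t woken = 0;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (q[i].woken || !model_satisfied(&q[i], value))
			continue;
		q[i].woken = true;
		woken++;
		if (!bcast)
			break;	/* single-waiter mode stops at the first match */
	}
	return woken;
}
```

A test built along these lines would queue several waiters on the same bit, post once in each mode, and check that broadcast wakes all matching waiters while the single-waiter mode wakes exactly one.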

Jan

> 
> Signed-off-by: Philippe Gerum 
> ---
>  include/cobalt/sys/cobalt.h | 12 +++-
>  include/cobalt/uapi/event.h | 10 +-
>  kernel/cobalt/posix/event.c | 37 +++--
>  kernel/cobalt/posix/event.h |  3 ++-
>  lib/cobalt/internal.c   | 20 
>  5 files changed, 45 insertions(+), 37 deletions(-)
> 
> diff --git a/include/cobalt/sys/cobalt.h b/include/cobalt/sys/cobalt.h
> index 46096e8801..1035a29d52 100644
> --- a/include/cobalt/sys/cobalt.h
> +++ b/include/cobalt/sys/cobalt.h
> @@ -106,9 +106,19 @@ int cobalt_event_init(cobalt_event_t *event,
> unsigned int value,
> int flags);
>  
> -int cobalt_event_post(cobalt_event_t *event,
> +int cobalt_event_signal(cobalt_event_t *event,
> unsigned int bits);
>  
> +int cobalt_event_broadcast(cobalt_event_t *event,
> + unsigned int bits);
> +
> +/* Backward compatibility with 3.2.x and earlier. */
> +static inline int cobalt_event_post(cobalt_event_t *event,
> + unsigned int bits)
> +{
> + return cobalt_event_broadcast(event, bits);
> +}
> +
>  int cobalt_event_wait(cobalt_event_t *event,
> unsigned int bits,
> unsigned int *bits_r,
> diff --git a/include/cobalt/uapi/event.h b/include/cobalt/uapi/event.h
> index 14ddbcf567..52b71d4e3b 100644
> --- a/include/cobalt/uapi/event.h
> +++ b/include/cobalt/uapi/event.h
> @@ -22,9 +22,6 @@
>  
>  struct cobalt_event_state {
>   __u32 value;
> - __u32 flags;
> -#define COBALT_EVENT_PENDED  0x1
> - __u32 nwaiters;
>  };
>  
>  struct cobalt_event;
> @@ -36,8 +33,11 @@ struct cobalt_event;
>  #define COBALT_EVENT_AUTOCLEAR  0x4
>  
>  /* Wait mode. */
> -#define COBALT_EVENT_ALL  0x0
> -#define COBALT_EVENT_ANY  0x1
> +#define COBALT_EVENT_ALL0x0
> +#define COBALT_EVENT_ANY0x1
> +
> +/* Sync mode. */
> +#define COBALT_EVENT_BCAST  0x1
>  
>  struct cobalt_event_shadow {
>   __u32 state_offset;
> diff --git a/kernel/cobalt/posix/event.c b/kernel/cobalt/posix/event.c
> index 0a1236ad73..4433e733fd 100644
> --- a/kernel/cobalt/posix/event.c
> +++ b/kernel/cobalt/posix/event.c
> @@ -85,8 +85,6 @@ COBALT_SYSCALL(event_init, current,
>   synflags = (flags & COBALT_EVENT_PRIO) ? XNSYNCH_PRIO : XNSYNCH_FIFO;
>   xnsynch_init(&event->synch, synflags, NULL);
>   state->value = value;
> - state->flags = 0;
> - state->nwaiters = 0;
>   stateoff = cobalt_umm_offset(umm, state);
>   XENO_BUG_ON(COBALT, stateoff != (__u32)stateoff);
>  
> @@ -152,37 +150,31 @@ int __cobalt_event_wait(struct cobalt_event_shadow 
> __user *u_event,
>   goto out;
>   }
>  
> - state->flags |= COBALT_EVENT_PENDED;
>   rbits = state->value & bits;
>   testval = mode & COBALT_EVENT_ANY ? rbits : bits;
>   if (rbits && rbits == testval) {
>   if (event->flags & COBALT_EVENT_AUTOCLEAR)
>   state->value &= ~rbits;
> - goto done;
> + goto out;
>   }
>  
>   if (timeout == XN_NONBLOCK) {
>   ret = -EWOULDBLOCK;
> - goto done;
> + goto out;
>   }
>  
>   ewc.value = bits;
>   ewc.mode = mode;
>   xnthread_prepare_wait();
> - state->nwaiters++;
>   info = xnsynch_sleep_on(&event->synch, timeout, tmode);
>   if (info & XNRMID) {
>   ret = -EIDRM;
>   goto out;
>   }
> - if (info & (XNBREAK|XNTIMEO)) {
> - state->nwaiters--;
> + if (info & (XNBREAK|XNTIMEO))
>   ret = (info & XNBREAK) ? -EINTR : -ETIMEDOUT;
> - } else
> + else
>   rbits = ewc.value;
> -done:
> - if (!xnsynch_pended_p(>synch))
> - state->flags &= ~COBALT_EVENT_PENDED;
>  out:
>   xnlock_put_irqrestore(&nklock, s);
>  
> @@ -240,9 +232,10 @@ COBALT_SYSCALL(event_wait64, primary,
>  }
>  
>  COBALT_SYSCALL(event_sync, current,
> -(struct cobalt_event_shadow __user *u_event))
> + (struct 

Re: [PATCH 1/5] cobalt/events: add auto-clear feature

2022-04-07 Thread Jan Kiszka via Xenomai
On 06.04.22 17:56, Philippe Gerum via Xenomai wrote:
> From: Philippe Gerum 
> 
> The current implementation does not atomically consume+clear the event
> set to be received by the waiter(s), which makes it useless for
> anything but a plain one-time latch due to the race window this opens
> with a consume[A]->signal[B]->clear[A] sequence.
> 
> To address this issue, let's provide the auto-clear feature with
> __cobalt_event_wait().
> 
> This change affects the ABI by adding the auto-clear mode as an opt-in
> feature, enabled by passing COBALT_EVENT_AUTOCLEAR to
> cobalt_event_init().
> 

Makes sense, but shouldn't autoclear be rather the default then? Which
users are affected? None in-tree so far?

Jan

> Signed-off-by: Philippe Gerum 
> ---
>  configure.ac|   1 +
>  include/cobalt/uapi/event.h |   7 +-
>  kernel/cobalt/posix/event.c |  17 +++-
>  testsuite/smokey/Makefile.am|   2 +
>  testsuite/smokey/events/Makefile.am |   8 ++
>  testsuite/smokey/events/events.c| 123 
>  6 files changed, 152 insertions(+), 6 deletions(-)
>  create mode 100644 testsuite/smokey/events/Makefile.am
>  create mode 100644 testsuite/smokey/events/events.c
> 
> diff --git a/configure.ac b/configure.ac
> index 5611b5b8db..15d3e46e9a 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1002,6 +1002,7 @@ AC_CONFIG_FILES([ \
>   testsuite/smokey/gdb/Makefile \
>   testsuite/smokey/y2038/Makefile \
>   testsuite/smokey/can/Makefile
> + testsuite/smokey/events/Makefile \
>   testsuite/clocktest/Makefile \
>   testsuite/xeno-test/Makefile \
>   utils/Makefile \
> diff --git a/include/cobalt/uapi/event.h b/include/cobalt/uapi/event.h
> index 8710e8e25f..14ddbcf567 100644
> --- a/include/cobalt/uapi/event.h
> +++ b/include/cobalt/uapi/event.h
> @@ -30,9 +30,10 @@ struct cobalt_event_state {
>  struct cobalt_event;
>  
>  /* Creation flags. */
> -#define COBALT_EVENT_FIFO0x0
> -#define COBALT_EVENT_PRIO0x1
> -#define COBALT_EVENT_SHARED  0x2
> +#define COBALT_EVENT_FIFO   0x0
> +#define COBALT_EVENT_PRIO   0x1
> +#define COBALT_EVENT_SHARED 0x2
> +#define COBALT_EVENT_AUTOCLEAR  0x4
>  
>  /* Wait mode. */
>  #define COBALT_EVENT_ALL  0x0
> diff --git a/kernel/cobalt/posix/event.c b/kernel/cobalt/posix/event.c
> index 052c686050..0a1236ad73 100644
> --- a/kernel/cobalt/posix/event.c
> +++ b/kernel/cobalt/posix/event.c
> @@ -146,7 +146,7 @@ int __cobalt_event_wait(struct cobalt_event_shadow __user 
> *u_event,
>   if (bits == 0) {
>   /*
>* Special case: we don't wait for any event, we only
> -  * return the current flag group value.
> +  * return the pending ones without consuming them.
>*/
>   rbits = state->value;
>   goto out;
> @@ -155,8 +155,11 @@ int __cobalt_event_wait(struct cobalt_event_shadow 
> __user *u_event,
>   state->flags |= COBALT_EVENT_PENDED;
>   rbits = state->value & bits;
>   testval = mode & COBALT_EVENT_ANY ? rbits : bits;
> - if (rbits && rbits == testval)
> + if (rbits && rbits == testval) {
> + if (event->flags & COBALT_EVENT_AUTOCLEAR)
> + state->value &= ~rbits;
>   goto done;
> + }
>  
>   if (timeout == XN_NONBLOCK) {
>   ret = -EWOULDBLOCK;
> @@ -239,7 +242,7 @@ COBALT_SYSCALL(event_wait64, primary,
>  COBALT_SYSCALL(event_sync, current,
>  (struct cobalt_event_shadow __user *u_event))
>  {
> - unsigned int bits, waitval, testval;
> + unsigned int bits, waitval, testval, consumed = 0;
>   struct xnthread_wait_context *wc;
>   struct cobalt_event_state *state;
>   struct event_wait_context *ewc;
> @@ -275,10 +278,18 @@ COBALT_SYSCALL(event_sync, current,
>   if (waitval && waitval == testval) {
>   state->nwaiters--;
>   ewc->value = waitval;
> + consumed |= waitval;
>   xnsynch_wakeup_this_sleeper(&event->synch, p);
>   }
>   }
>  
> + /*
> +  * If some flags were consumed and auto-clear is enabled,
> +  * clear the former.
> +  */
> + if (consumed && (event->flags & COBALT_EVENT_AUTOCLEAR))
> + state->value &= ~consumed;
> +
>   xnsched_run();
>  out:
>   xnlock_put_irqrestore(&nklock, s);
> diff --git a/testsuite/smokey/Makefile.am b/testsuite/smokey/Makefile.am
> index 4a9773f586..3f0521282b 100644
> --- a/testsuite/smokey/Makefile.am
> +++ b/testsuite/smokey/Makefile.am
> @@ -14,6 +14,7 @@ COBALT_SUBDIRS =\
>   bufp\
>   can \
>   cpu-affinity\
> + events  \
>   fpu-stress  \
>   gdb \
>   iddp\
> @@ -53,6 +54,7 @@ DIST_SUBDIRS =  \
>   can \
>   cpu-affinity\
>   dlopen  \
> + events

Re: [PATCH] Remove __work from PIPELINE_INBAND_WORK_INITIALIZER

2022-04-07 Thread Jan Kiszka via Xenomai
On 06.04.22 23:06, Richard Weinberger via Xenomai wrote:
> ipipe took a copy of the queued work; __work was used to determine how many
> bytes had to be copied.
> With dovetail, no copy is taken and the __work parameter is no longer
> useful, so we can get rid of it.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  include/cobalt/kernel/dovetail/pipeline/inband_work.h | 2 +-
>  include/cobalt/kernel/rtdm/driver.h   | 3 +--
>  kernel/cobalt/rtdm/drvlib.c   | 3 +--
>  kernel/cobalt/rtdm/fd.c   | 3 +--
>  kernel/cobalt/thread.c| 8 +++-
>  kernel/drivers/udd/udd.c  | 3 +--
>  6 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/include/cobalt/kernel/dovetail/pipeline/inband_work.h 
> b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> index af3d70fc6..a69bf651f 100644
> --- a/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> +++ b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> @@ -17,7 +17,7 @@ struct pipeline_inband_work {
>   struct irq_work work;
>  };
>  
> -#define PIPELINE_INBAND_WORK_INITIALIZER(__work, __handler)  \
> +#define PIPELINE_INBAND_WORK_INITIALIZER(__handler)  \
>   {   \
>   .work = IRQ_WORK_INIT((void (*)(struct irq_work *))__handler), \
>   }
> diff --git a/include/cobalt/kernel/rtdm/driver.h 
> b/include/cobalt/kernel/rtdm/driver.h
> index 930da34ed..d5f3dad34 100644
> --- a/include/cobalt/kernel/rtdm/driver.h
> +++ b/include/cobalt/kernel/rtdm/driver.h
> @@ -924,8 +924,7 @@ static inline void rtdm_nrtsig_init(rtdm_nrtsig_t 
> *nrt_sig,
>   rtdm_nrtsig_handler_t handler, void *arg)
>  {
>   nrt_sig->inband_work = (struct pipeline_inband_work)
> - PIPELINE_INBAND_WORK_INITIALIZER(*nrt_sig,
> -  __rtdm_nrtsig_execute);
> + PIPELINE_INBAND_WORK_INITIALIZER(__rtdm_nrtsig_execute);
>   nrt_sig->handler = handler;
>   nrt_sig->arg = arg;
>  }
> diff --git a/kernel/cobalt/rtdm/drvlib.c b/kernel/cobalt/rtdm/drvlib.c
> index 4ae1ed672..4eaf3a57c 100644
> --- a/kernel/cobalt/rtdm/drvlib.c
> +++ b/kernel/cobalt/rtdm/drvlib.c
> @@ -1666,8 +1666,7 @@ static void lostage_schedule_work(struct 
> pipeline_inband_work *inband_work)
>  static struct lostage_trigger_work {
>   struct pipeline_inband_work inband_work; /* Must be first. */
>  } nrt_work =  {
> - .inband_work = PIPELINE_INBAND_WORK_INITIALIZER(nrt_work,
> - lostage_schedule_work),
> + .inband_work = PIPELINE_INBAND_WORK_INITIALIZER(lostage_schedule_work),
>  };
>  
>  /**
> diff --git a/kernel/cobalt/rtdm/fd.c b/kernel/cobalt/rtdm/fd.c
> index bbeea06ae..3c26534f3 100644
> --- a/kernel/cobalt/rtdm/fd.c
> +++ b/kernel/cobalt/rtdm/fd.c
> @@ -304,8 +304,7 @@ static void lostage_trigger_close(struct 
> pipeline_inband_work *inband_work)
>  static struct lostage_trigger_close {
>   struct pipeline_inband_work inband_work; /* Must be first. */
>  } fd_closework =  {
> - .inband_work = PIPELINE_INBAND_WORK_INITIALIZER(fd_closework,
> - lostage_trigger_close),
> + .inband_work = PIPELINE_INBAND_WORK_INITIALIZER(lostage_trigger_close),
>  };
>  
>  static void __put_fd(struct rtdm_fd *fd, spl_t s)
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index dfaef7564..41804b24f 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -151,7 +151,7 @@ static int map_kthread(struct xnthread *thread, struct 
> kthread_arg *ka)
>   trace_cobalt_lostage_request("wakeup", current);
>  
>   ka->inband_work = (struct pipeline_inband_work)
> - PIPELINE_INBAND_WORK_INITIALIZER(*ka, do_parent_wakeup);
> + PIPELINE_INBAND_WORK_INITIALIZER(do_parent_wakeup);
>   pipeline_post_inband_work(ka);
>  
>   xnlock_get_irqsave(&nklock, s);
> @@ -2047,8 +2047,7 @@ void xnthread_relax(int notify, int reason)
>  {
>   struct task_struct *p = current;
>   struct lostage_wakeup wakework = {
> - .inband_work = PIPELINE_INBAND_WORK_INITIALIZER(wakework,
> - lostage_task_wakeup),
> + .inband_work = 
> PIPELINE_INBAND_WORK_INITIALIZER(lostage_task_wakeup),
>   .task = p,
>   };
>   struct xnthread *thread = xnthread_current();
> @@ -2393,8 +2392,7 @@ void __xnthread_signal(struct xnthread *thread, int 
> sig, int arg)
>   return;
>  
>   sigwork->inband_work = (struct pipeline_inband_work)
> - PIPELINE_INBAND_WORK_INITIALIZER(*sigwork,
> -  lostage_task_signal);
> + PIPELINE_INBAND_WORK_INITIALIZER(lostage_task_signal);
>   
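
Note on the macro change above: with the one-argument form, a caller only names the handler, and the enclosing request keeps the work item as its first member so the handler can get back to it. The sketch below models this in user space. The irq_work and IRQ_WORK_INIT definitions are simplified stand-ins for the kernel ones, and demo_request, demo_handler and demo_post are hypothetical. Unlike the Cobalt handlers, which take a struct pipeline_inband_work pointer and rely on the cast in the macro, the handler here takes a struct irq_work pointer so the call stays within standard C.

```c
#include <stddef.h>

/* Simplified stand-ins for kernel types so the sketch builds in user space. */
struct irq_work {
	void (*func)(struct irq_work *);
};
#define IRQ_WORK_INIT(_func) { .func = (_func) }

struct pipeline_inband_work {
	struct irq_work work;
};

/* One-argument form, as introduced by the patch. */
#define PIPELINE_INBAND_WORK_INITIALIZER(__handler)			\
	{								\
		.work = IRQ_WORK_INIT((void (*)(struct irq_work *))__handler), \
	}

/* Hypothetical request type; the work item must stay the first member. */
struct demo_request {
	struct pipeline_inband_work inband_work; /* Must be first. */
	int payload;
	int handled;
};

/* Recover the enclosing request from the embedded work item. */
static void demo_handler(struct irq_work *work)
{
	struct demo_request *rq = (struct demo_request *)work;

	rq->handled = rq->payload;
}

/* Stand-in for posting the work: this model runs the handler at once. */
static void demo_post(struct demo_request *rq)
{
	rq->inband_work.work.func(&rq->inband_work.work);
}
```

A caller would initialize the request with `.inband_work = PIPELINE_INBAND_WORK_INITIALIZER(demo_handler)` and then post it. In the real kernel the handler runs later from in-band context, which is why the lifetime of a stack-allocated request matters.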

Re: pthread_cancel services not documented

2022-04-07 Thread Jan Kiszka via Xenomai
On 07.04.22 10:20, Yunjie Gu via Xenomai wrote:
> Hi All,
> 
> I'm trying to use Xenomai with the POSIX interface, and I found some
> differences in the documentation for Xenomai 2 and Xenomai 3. In Xenomai
> 2, pthread_cancel related services (including pthread_setcancelstate, etc.)
> are clearly documented. In Xenomai 3, however, only pthread_kill is
> documented, but it seems other services are still there in the source code. My
> guess is that pthread_cancel related services are managed by Linux
> rather than the Cobalt core and therefore were not explicitly documented in
> Xenomai 3, and we can use them in the same way as we do in Linux. Can
> anyone confirm this? See the links below
> 
> https://xenomai.org/documentation/xenomai-2.6/html/api/group__posix__cancel.html
> 
> https://xenomai.org/documentation/xenomai-3/html/xeno3prm/group__cobalt__api__thread.html#gae1a96424296ef872696c7fb90a8ae9aa
> 

Yes, only Xenomai-wrapped pthread services appear in the documentation.
You can consider other services available, but those will generally
work by migrating the caller first to Linux mode.
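
As an illustration, the standard POSIX cancellation calls can be used as on plain Linux. This is a minimal sketch, assuming these calls are not wrapped by libcobalt (as the reply above implies) and are therefore served by the regular C library, which may migrate a real-time caller to Linux mode. The names worker and run_and_cancel are made up for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Worker that can only be cancelled outside its protected section. */
static void *worker(void *arg)
{
	int prev, unused;

	(void)arg;
	for (;;) {
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &prev);
		/* ... work that must not be interrupted goes here ... */
		pthread_setcancelstate(prev, &unused);
		pthread_testcancel();	/* explicit cancellation point */
		sched_yield();
	}
	return NULL;
}

/* Returns 1 if the worker ended by cancellation, 0 if not, -1 on error. */
static int run_and_cancel(void)
{
	pthread_t tid;
	void *res;

	if (pthread_create(&tid, NULL, worker, NULL))
		return -1;
	if (pthread_cancel(tid))
		return -1;
	if (pthread_join(tid, &res))
		return -1;
	return res == PTHREAD_CANCELED;
}
```

Since such calls may leave real-time mode, it seems safer to keep them out of the time-critical path, for instance by cancelling threads only during shutdown.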

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH v2] x86: dovetail: reinstate I/O bitmap on user entry

2022-04-06 Thread Jan Kiszka via Xenomai
On 06.04.22 11:13, Wolfgang Denk via Xenomai wrote:
> Dear Philippe,
> 
> In message <87ee2colqg@xenomai.org> you wrote:
>>
>>> BTW: Do you have a Patchwork instance for the Xenomai mailing list?
>>
>> Unfortunately not.
>>
>>> It makes dealing with patch
>>> series so much easier.
>>>
>>> Or even better, can we have Xenomai at lore.kernel.org? Then we can use b4. :-)
>>
>> It seems unlikely that kernel.org would consider hosting us since the
>> project does not upstream patches to the mainline kernel.
> 
> Did you try it?  I don't think this is a mandatory requirement.
> For example, U-Boot also does not upstream any kernel patches.
> 

I agree, we should give that a try [1]. Philippe, would you do that as
mailing list host, or should I (and then ask you to sync with them
regarding the archive)?

Jan

[1] https://korg.docs.kernel.org/lore.html

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack

2022-04-05 Thread Jan Kiszka via Xenomai
On 05.04.22 19:23, Richard Weinberger wrote:
> - Original Message -
>>> How about additionally widening the suspected race window by adding a
>>> delay to lostage_task_wakeup?
>>
>> Excellent idea! :-)
> 
> Yeah, with a delay in lostage_task_wakeup() my WARN_ON_ONCE() triggers
> very quickly.
> 
> [  123.237698] [ cut here ]
> [  123.238755] WARNING: CPU: 1 PID: 1411 at kernel/xenomai/thread.c:2158 
> xnthread_relax+0x5d4/0x680
> [  123.240698] Modules linked in: loader(OE) tun bridge stp llc nft_fib_inet 
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv6 nft_reject 
> nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables 
> xeno_can_peak_pci xeno_can_sja1000 xeno_can xeno_16550A libcrc32c nfnetlink 
> xeno_rtipc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm snd_timer 
> snd soundcore sunrpc pktcdvd rt_e1000 rt_e1000_new crc32_pclmul rtnet 
> i2c_piix4 bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect 
> sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm e1000 serio_raw crc32c_intel 
> ata_generic pata_acpi floppy qemu_fw_cfg fuse
> [  123.252790] CPU: 1 PID: 1411 Comm: app Tainted: G   OE 
> 5.15.9xeno3.2-x8664G-rw #6
> [  123.255001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [  123.257443] IRQ stage: Linux
> [  123.258090] RIP: 0010:xnthread_relax+0x5d4/0x680
> [  123.259136] Code: 18 05 00 00 e8 5d 8d 13 00 41 8b 97 18 05 00 00 48 8d b3 
> 6c 02 00 00 48 c7 c7 a0 8f 6d 82 e8 5c 96 cc 00 0f 0b e9 9d fc ff ff <0f> 0b 
> e9 af fd ff ff 65 44 8b 2d a5 2d ca 7e 41 83 fd 03 77 7a 45
> [  123.263163] RSP: 0018:88811381fb60 EFLAGS: 00010202
> [  123.264330] RAX: 0022 RBX: c9bf6408 RCX: 
> 8137e13b
> [  123.265893] RDX:  RSI: 0004 RDI: 
> 88811381fc18
> [  123.267474] RBP: 111022703f6e R08: ed1022703f84 R09: 
> ed1022703f84
> [  123.269095] R10: 88811381fc1b R11: ed1022703f83 R12: 
> 88811381fc10
> [  123.270662] R13: 888104f03e00 R14:  R15: 
> c9bf6428
> [  123.27] FS:  7fb4d0aca700() GS:88811b08() 
> knlGS:
> [  123.273983] CS:  0010 DS:  ES:  CR0: 80050033
> [  123.275262] CR2: 00a74008 CR3: 000111938003 CR4: 
> 00170ea0
> [  123.276823] Call Trace:
> [  123.277401]  
> [  123.277878]  ? xnthread_wait_period+0x4c0/0x4c0
> [  123.278899]  ? xnsynch_release+0x690/0x690
> [  123.279828]  ? __cobalt_sem_destroy+0x2dd/0x630
> [  123.280848]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.281812]  ? xnthread_set_periodic+0x3a0/0x3a0
> [  123.282855]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.283816]  ? ktime_get_mono_fast_ns+0xdb/0x120
> [  123.284852]  ? xnlock_dbg_release+0xd9/0x170
> [  123.285812]  prepare_for_signal+0x297/0x3a0
> [  123.286765]  ? CoBaLt_serialdbg+0x140/0x140
> [  123.287709]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.288906]  handle_head_syscall+0x6e2/0x810
> [  123.289867]  ? __cobalt_cond_wait_prologue+0xf60/0xf60
> [  123.291017]  ? CoBaLt_trace+0x650/0x650
> [  123.291887]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.293088]  pipeline_syscall+0x8e/0x140
> [  123.293979]  syscall_enter_from_user_mode+0x30/0x80
> [  123.295076]  do_syscall_64+0x1d/0xa0
> [  123.295895]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 

Ok, we are seeing clearer but not everything. Could you find out if work
and thread are running on different CPUs? Or, via tracing, what led to
this case otherwise?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack

2022-04-05 Thread Jan Kiszka via Xenomai
On 05.04.22 17:53, Richard Weinberger wrote:
> - Original Message -
>> From: "Jan Kiszka" 
>> I would like to have an explanation or proof points (traces, assertions)
>> that we actually see xnthread_relax overtaking the delivery of its own
>> wakework.
> 
> I can re-test with something like that:
> 
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index beda67e18..4c100b645 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -2159,6 +2159,7 @@ void xnthread_relax(int notify, int reason)
> pipeline_clear_mayday();
>  
> trace_cobalt_shadow_relaxed(thread);
> +   WARN_ON_ONCE(irq_work_is_busy(&wakework.inband_work.work));
>  }
>  EXPORT_SYMBOL_GPL(xnthread_relax);
> 
> But I fear this might take some time. The KASAN splat happened only once
> and also only after the test ran for almost 5 days.

How about additionally widening the suspected race window by adding a
delay to lostage_task_wakeup?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack

2022-04-05 Thread Jan Kiszka via Xenomai
On 05.04.22 15:10, Richard Weinberger wrote:
> On Tue, Apr 5, 2022 at 3:02 PM Bezdeka, Florian via Xenomai
>  wrote:
>> I'm not sure if waiting is really what we want. I like the idea of
>> moving the work into struct xnthread as Jan already suggested
>> internally.
> 
> Well, the wait is cheap, it does not involve scheduling.
> I'm not sure whether further bloating struct xnthread is wise either.
> 

Let's not optimize before we are actually sure that the issue is what we
assume it to be.

I would like to have an explanation or proof points (traces, assertions)
that we actually see xnthread_relax overtaking the delivery of its own
wakework.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Machine freezes under Ubuntu 20.04

2022-04-05 Thread Jan Kiszka via Xenomai
On 05.04.22 15:43, Arturo Laurenzi wrote:
>> On 04.04.22 15:21, Arturo Laurenzi via Xenomai wrote:
> 
>>>
>>> Recently, we have started a transition towards Ubuntu 20.04, and things
>>> have started to break.
>>>
>>> The first attempt was to install kernel 5.4.151 and stick to ipipe. Under
>>> this setup, we experience issues even before starting our applications. We
>>> have seen random crashes while compiling with GCC, sporadic "System Program
>>> Problem Detected" popups by Ubuntu, and others. We even tried to re-install
>>> OS and kernel from scratch with no luck.
>>
>> A reference setup for this kernel line can be found in xenomai-images
>> (https://source.denx.de/Xenomai/xenomai-images). Would be good to
>> understand which deviation from it makes the difference for which
>> component (see also further questions below).
> 
> I'm attaching the config we're using (from /boot/config-$(uname -r)).
> If that makes sense, we're going to try to configure the kernel
> according to this file
> (https://source.denx.de/Xenomai/xenomai-images/-/blob/master/recipes-kernel/linux/files/amd64_defconfig).
> What kernel version do you recommend to try?
> 

Always the latest of the individual kernel series.

>>>
>>> The second attempt was to stick to our old kernel 4.19.140. All the weird
>>> issues disappear and the system is stable. However, we are unable to have
>>> the system pass our suite of "stress tests", which basically involve 
>>> starting,
>>> running, and killing process B multiple times in a cyclic fashion, while
>>> process A runs in the background. After a short while (minutes), the whole
>>> system just hangs, forcing us to do a hard reset. Only once, we managed to
>>> get this kernel oops after rebooting (journalctl -k -b -1 --no-pager).
>>>
>>
>> For reliably recording crashes, it is highly recommended to use a UART
>> as kernel debug output.
> 
> Will do ASAP and let you know.
> 
>>> The third attempt was to try out kernel 5.10.89 plus the new dovetail
>>> patch, and Xenomai v3.2.1. Again, all the weird issues are gone and the
>>> system is stable. However, we are unable to have the system pass our suite
>>> of "stress tests". Differently from 4.19-ipipe, the system resists for a
>>> longer time before hanging (few hours sometimes), but this also varies a
>>> lot.
>>>
>>> After some more investigation, we found out something interesting. By
>>> removing the code that interacts with Process A, Process B is then able to
>>> run "forever" (overnight at least), but *only if Process A is not running*.
>>> Otherwise, the system will hang. In other words, the mere presence of
>>> Process A is affecting Process B, even though both IDDP and ZMQ have been
>>> removed from B and replaced with fake data. Furthermore, the system does
>>> not freeze if we set B1's scheduling policy to SCHED_OTHER.
>>
>> Do you have the Xenomai watchdog enabled, thus will you be able to tell
>> RT application "hangs" (infinite loop at high prio) apart from real
>> hangs/crashes?
> 
> Yes. When we try a while(true) inside a RT context, we see the
> watchdog killing our application
> as expected.
> 
> 
>>>
>>> From these - rather heuristic - tests, it looks like there could be some
>>> coupling between unrelated processes which causes some sort of bug, that is
>>> probably related to some interaction with mutexes/condvars, when these are
>>> used from a RT context. This issue shows up (or at least we have seen it)
>>> only under Ubuntu 20.04 (GCC 9.x), whereas a 18.04 build (GCC 7.x) looks
>>> fine.
>>
>> Ubuntu toolchains are known for aggressively enabling certain security
>> features. Maybe one that we didn't check yet flipped between 18.04 and
>> 20.04 - if that switch is the only difference between working and
>> non-working builds in your case. GCC itself should be fine, we are
>> testing with gcc-10 via Debian 11 in our CI.
>>
>> Can you check whether the toolchain change breaks the kernel (kernel
>> with old toolchain runs fine with userspace built via new toolchain)?
> 
> We have tried this, and still the system freezes after a while. We
> followed the procedure that follows:
>  1) generate binaries for our "working" kernel 4.19.140-xeno-ipipe-3.1
> on a Ubuntu 18 machine (make deb-pkg)
>  2) copy the whole /usr/xenomai directory (compiled with the 18.04
> toolchain) to the test machine with Ubuntu 20.04
>  3) install the kernel binaries to the test machine
>  4) re-compile our application
> Is this ok?
> 

Wait, these are three variables: kernel, Xenomai application and Ubuntu
userspace. Does your system also break when using both kernel and
application binaries from a Ubuntu 18 build? Or will it start to break
once you recompile the Xenomai application with Ubuntu 20 toolchain?

>>>
>>> The purpose of this message is twofold.
>>> First, to see if these symptoms might "ring a bell" to anyone in the
>>> community, who might be able to suggest a fix.
>>> Second, we'd like to ask what you would do to debug this issue. Which tool

Re: Machine freezes under Ubuntu 20.04

2022-04-04 Thread Jan Kiszka via Xenomai
On 04.04.22 15:21, Arturo Laurenzi via Xenomai wrote:
> Dear Xenomai community,
> in our lab we use Xenomai + RTnet to control complex EtherCAT-based robotic
> platforms (research prototypes).
> 
> Our infrastructure is made of two multi-threaded processes, let's say A and
> B, as follows.
> 
> Process A is an ethercat master, wrapped to expose both a RT and NRT
> interface to other processes:
>  - A1: ecat master (SOEM-based, uses RTnet), SCHED_FIFO
>  - A2: iddp end-point, SCHED_FIFO
>  - A3: zmq server, xddp end-point, SCHED_OTHER
> 
> Process B is our "control process" where algorithms actually run:
>  - B1: control thread, SCHED_FIFO
>  - B2: communication thread, SCHED_OTHER
> 
> The two processes interact in two ways.
> The first is zmq-based, and happens between B1 and A3 during the
> initialization phase (so, before the time-critical part of thread B1).
> The second is iddp-based. Both endpoints (A2 and B1) will bind/connect to a
> set of pipes, to realize a bi-directional communication channel that is
> RT-safe.
> 
> This usually works fine under the following setup:
> 
> CPU: Intel Core i7-7820EQ @3.00 GHz
> OS/Kernel: Ubuntu 18.04 + Linux 4.19.140-xeno-ipipe-3.1
> Xenomai: v3.1 (Cobalt + Posix API)
> Compiler: default GCC (v7.5)
> 
> Recently, we have started a transition towards Ubuntu 20.04, and things
> have started to break.
> 
> The first attempt was to install kernel 5.4.151 and stick to ipipe. Under
> this setup, we experience issues even before starting our applications. We
> have seen random crashes while compiling with GCC, sporadic "System Program
> Problem Detected" popups by Ubuntu, and others. We even tried to re-install
> OS and kernel from scratch with no luck.

A reference setup for this kernel line can be found in xenomai-images
(https://source.denx.de/Xenomai/xenomai-images). Would be good to
understand which deviation from it makes the difference for which
component (see also further questions below).

> 
> The second attempt was to stick to our old kernel 4.19.140. All the weird
> issues disappear and the system is stable. However, we are unable to have
> the system pass our suite of "stress tests", which basically involve starting,
> running, and killing process B multiple times in a cyclic fashion, while
> process A runs in the background. After a short while (minutes), the whole
> system just hangs, forcing us to do a hard reset. Only once, we managed to
> get this kernel oops after rebooting (journalctl -k -b -1 --no-pager).
> 

For reliably recording crashes, it is highly recommended to use a UART
as kernel debug output.

> 
> *dic 20 17:07:10 com-exp-dev kernel: BUG: unable to handle kernel paging
> request at fffeee9e41b1*
> *dic 20 17:07:10 com-exp-dev kernel: PGD 42080c067 P4D 42080c067 PUD 0*
> *dic 20 17:07:10 com-exp-dev kernel: Oops: 0010 [#1] SMP PTI*
> *dic 20 17:07:10 com-exp-dev kernel: CPU: 1 PID: 134 Comm: kworker/u16:1
> Not tainted 4.19.140-xeno-ipipe-3.1 #1*
> *dic 20 17:07:10 com-exp-dev kernel: Hardware name:  /TS175, BIOS BQKLR112
> 07/04/2017*
> *dic 20 17:07:10 com-exp-dev kernel: I-pipe domain: Linux*
> *dic 20 17:07:10 com-exp-dev kernel: Workqueue: efi_rts_wq efi_call_rts*
> *dic 20 17:07:10 com-exp-dev kernel: RIP: 0010:0xfffeee9e41b1*
> *dic 20 17:07:10 com-exp-dev kernel: Code: Bad RIP value.*
> *dic 20 17:07:10 com-exp-dev kernel: RSP: 0018:a6170334fd28 EFLAGS:
> 00010246*
> *dic 20 17:07:10 com-exp-dev kernel: RAX: 02ff RBX:
>  RCX: fffeee9e73b8*
> *dic 20 17:07:10 com-exp-dev kernel: RDX: 00a1 RSI:
> 8884d8371400 RDI: a61704f8fdcc*
> *dic 20 17:07:10 com-exp-dev kernel: RBP: 8884d8371000 R08:
> fffeee9e73b8 R09: a61704f8fdd0*
> *dic 20 17:07:10 com-exp-dev kernel: R10: 02ff R11:
> 0018 R12: 8884d8371000*
> *dic 20 17:07:10 com-exp-dev kernel: R13: 8884d8371400 R14:
> a61704f8fdcc R15: 8884c8331d84*
> *dic 20 17:07:10 com-exp-dev kernel: FS:  ()
> GS:8884df50() knlGS:*
> *dic 20 17:07:10 com-exp-dev kernel: CS:  0010 DS:  ES:  CR0:
> 80050033*
> *dic 20 17:07:10 com-exp-dev kernel: CR2: fffeee9e4187 CR3:
> 00042080a005 CR4: 003606e0*
> *dic 20 17:07:10 com-exp-dev kernel: DR0:  DR1:
>  DR2: *
> *dic 20 17:07:10 com-exp-dev kernel: DR3:  DR6:
> fffe0ff0 DR7: 0400*
> *dic 20 17:07:10 com-exp-dev kernel: Call Trace:*
> *dic 20 17:07:10 com-exp-dev kernel:  ? __switch_to_asm+0x35/0x70*
> *dic 20 17:07:10 com-exp-dev kernel:  ? __switch_to_asm+0x41/0x70*
> *dic 20 17:07:10 com-exp-dev kernel:  ? __switch_to_asm+0x35/0x70*
> *dic 20 17:07:10 com-exp-dev kernel:  ? __switch_to_asm+0x41/0x70*
> *dic 20 17:07:10 com-exp-dev kernel:  ? efi_call+0x58/0x90*
> *dic 20 17:07:10 com-exp-dev kernel:  ? __switch_to_asm+0x41/0x70*
> *dic 20 17:07:10 com-exp-dev kernel:  ? 

Re: [PATCH] Alchemy: Fix rt_task_unblock() for RT_MUTEX

2022-04-04 Thread Jan Kiszka via Xenomai
On 04.04.22 14:17, Richard Weinberger wrote:
> - Original Message -
>> From: "Jan Kiszka" 
>> To: "richard" , "xenomai" 
>> Sent: Monday, April 4, 2022 13:21:45
>> Subject: Re: [PATCH] Alchemy: Fix rt_task_unblock() for RT_MUTEX
> 
>> On 30.03.22 22:16, Richard Weinberger via Xenomai wrote:
>>> Starting with Xenomai 3, RT_MUTEX is based on libcobalt's pthread mutex
>>> implementation.
>>> POSIX requires that pthread_mutex_lock() shall not return EINTR,
>>> this requirement breaks rt_task_unblock() if a RT_TASK blocks on
>>> a RT_MUTEX.
>>>
>>> To restore the functionality provide a new function,
>>> pthread_mutex_lock_eintr_np().
>>> It can get interrupted and will return EINTR to the caller.
>>>
>>> Signed-off-by: Richard Weinberger 
>>> ---
>>>  include/cobalt/pthread.h |  2 ++
>>>  lib/alchemy/mutex.c  |  2 +-
>>>  lib/cobalt/mutex.c   | 14 --
>>>  3 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/cobalt/pthread.h b/include/cobalt/pthread.h
>>> index 3e9bd47053bc..2994c2467219 100644
>>> --- a/include/cobalt/pthread.h
>>> +++ b/include/cobalt/pthread.h
>>> @@ -62,6 +62,8 @@ COBALT_DECL(int, pthread_mutex_destroy(pthread_mutex_t
>>> *mutex));
>>>  
>>>  COBALT_DECL(int, pthread_mutex_lock(pthread_mutex_t *mutex));
>>>  
>>> +COBALT_DECL(int, pthread_mutex_lock_eintr_np(pthread_mutex_t *mutex));
>>> +
>>>  COBALT_DECL(int, pthread_mutex_timedlock(pthread_mutex_t *mutex,
>>>  const struct timespec *to));
>>>  
>>> diff --git a/lib/alchemy/mutex.c b/lib/alchemy/mutex.c
>>> index f8933858647a..bb97395142aa 100644
>>> --- a/lib/alchemy/mutex.c
>>> +++ b/lib/alchemy/mutex.c
>>> @@ -327,7 +327,7 @@ int rt_mutex_acquire_timed(RT_MUTEX *mutex,
>>>  
>>> /* Slow path. */
>>> if (abs_timeout == NULL) {
>>> -   ret = -__RT(pthread_mutex_lock(&mcb->lock));
>>> +   ret = -__RT(pthread_mutex_lock_eintr_np(&mcb->lock));
>>
>> This won't build for mercury, will it?
> 
> Uff yes. Just tried building with --with-core=mercury and it failed. ;-\
> 
> So, when mercury is enabled, alchemy will use NPTL.
> This means rt_task_unblock() cannot work on mercury because
> pthread_mutex_lock() will not return EINTR either.
> 
> Fixing the build for mercury should be easy by adding an ifdef 
> CONFIG_XENO_MERCURY.
> But I'm not sure how to fix unblocking for the mercury case.

Either by ignoring that case on mercury (it has no predecessor in
Xenomai 2, and it can't be fixed via posix means) or by redefining the
behavior for everyone, i.e. by exempting rt_mutex_acquire* from
rt_task_unblock.
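
For reference, the retry behavior the patch changes can be modelled in plain C. This is only a sketch: lock_slow_path, lock_syscall_fn and fake_lock are hypothetical names, the callback stands in for the Cobalt lock syscall, and the sign convention (negative errno from the call, positive errno returned to the caller) is an assumption based on the usual pthread convention.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock syscall: returns 0 or -errno. */
typedef int (*lock_syscall_fn)(void *ctx);

/*
 * Slow path of a mutex lock. With want_eintr == false the call is
 * restarted until it stops reporting -EINTR (POSIX behavior); with
 * want_eintr == true the interruption is handed back to the caller.
 */
static int lock_slow_path(lock_syscall_fn do_lock, void *ctx, bool want_eintr)
{
	int ret;

	do
		ret = do_lock(ctx);
	while (ret == -EINTR && !want_eintr);

	return -ret;	/* pthread style: 0 or a positive errno value */
}

/* Test double: reports -EINTR for the first *ctx calls, then succeeds. */
static int fake_lock(void *ctx)
{
	int *interrupts_left = ctx;

	if (*interrupts_left > 0) {
		(*interrupts_left)--;
		return -EINTR;
	}
	return 0;
}
```

With two pending interruptions, the POSIX-style call swallows both and returns 0, while the interruptible variant returns EINTR after the first one. That difference is what would let an unblock request reach the caller.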

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] Alchemy: Fix rt_task_unblock() for RT_MUTEX

2022-04-04 Thread Jan Kiszka via Xenomai
On 30.03.22 22:16, Richard Weinberger via Xenomai wrote:
> Starting with Xenomai 3, RT_MUTEX is based on libcobalt's pthread mutex
> implementation.
> POSIX requires that pthread_mutex_lock() shall not return EINTR,
> this requirement breaks rt_task_unblock() if a RT_TASK blocks on
> a RT_MUTEX.
> 
> To restore the functionality provide a new function, 
> pthread_mutex_lock_eintr_np().
> It can get interrupted and will return EINTR to the caller.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  include/cobalt/pthread.h |  2 ++
>  lib/alchemy/mutex.c  |  2 +-
>  lib/cobalt/mutex.c   | 14 --
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/include/cobalt/pthread.h b/include/cobalt/pthread.h
> index 3e9bd47053bc..2994c2467219 100644
> --- a/include/cobalt/pthread.h
> +++ b/include/cobalt/pthread.h
> @@ -62,6 +62,8 @@ COBALT_DECL(int, pthread_mutex_destroy(pthread_mutex_t 
> *mutex));
>  
>  COBALT_DECL(int, pthread_mutex_lock(pthread_mutex_t *mutex));
>  
> +COBALT_DECL(int, pthread_mutex_lock_eintr_np(pthread_mutex_t *mutex));
> +
>  COBALT_DECL(int, pthread_mutex_timedlock(pthread_mutex_t *mutex,
>const struct timespec *to));
>  
> diff --git a/lib/alchemy/mutex.c b/lib/alchemy/mutex.c
> index f8933858647a..bb97395142aa 100644
> --- a/lib/alchemy/mutex.c
> +++ b/lib/alchemy/mutex.c
> @@ -327,7 +327,7 @@ int rt_mutex_acquire_timed(RT_MUTEX *mutex,
>  
>   /* Slow path. */
>   if (abs_timeout == NULL) {
> - ret = -__RT(pthread_mutex_lock(&mcb->lock));
> + ret = -__RT(pthread_mutex_lock_eintr_np(&mcb->lock));

This won't build for mercury, will it?

Jan

>   goto done;
>   }
>  
> diff --git a/lib/cobalt/mutex.c b/lib/cobalt/mutex.c
> index 73e45a1c4396..2ef02a175c13 100644
> --- a/lib/cobalt/mutex.c
> +++ b/lib/cobalt/mutex.c
> @@ -314,7 +314,7 @@ COBALT_IMPL(int, pthread_mutex_destroy, (pthread_mutex_t 
> *mutex))
>   *
>   * @apitags{xthread-only, switch-primary}
>   */
> -COBALT_IMPL(int, pthread_mutex_lock, (pthread_mutex_t *mutex))
> +static int __pthread_mutex_lock(pthread_mutex_t *mutex, bool want_eintr)
>  {
>   struct cobalt_mutex_shadow *_mutex =
>   &((union cobalt_mutex_union *)mutex)->shadow_mutex;
> @@ -373,7 +373,7 @@ slow_path:
>  
>   do
>   ret = XENOMAI_SYSCALL1(sc_cobalt_mutex_lock, _mutex);
> - while (ret == -EINTR);
> + while (ret == -EINTR && !want_eintr);
>  
>   if (ret == 0)
>   _mutex->lockcnt = 1;
> @@ -392,6 +392,16 @@ protect:
>   goto fast_path;
>  }
>  
> +COBALT_IMPL(int, pthread_mutex_lock, (pthread_mutex_t *mutex))
> +{
> + return __pthread_mutex_lock(mutex, false);
> +}
> +
> +COBALT_IMPL(int, pthread_mutex_lock_eintr_np, (pthread_mutex_t *mutex))
> +{
> + return __pthread_mutex_lock(mutex, true);
> +}
> +
>  /**
>   * Attempt, during a bounded time, to lock a mutex.
>   *

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] Alchemy: Fix rt_task_unblock() for RT_MUTEX

2022-04-04 Thread Jan Kiszka via Xenomai
On 04.04.22 11:39, Richard Weinberger via Xenomai wrote:
> On Fri, Apr 1, 2022 at 9:16 AM Bezdeka, Florian via Xenomai
>  wrote:
>>> With my changes at least the application works again and the tests
>>> on the customer side pass.
>>
>> We would need such / similar tests as well inside the Xenomai testsuite
>> to make sure we don't break it again.
> 
> I'm a bit unsure where to place the test.
> Under testsuite/ there seems to be not a single alchemy-related test.
> Shall I create a new folder?

There is lib/alchemy/testsuite/ - but also
https://gitlab.com/Xenomai/xenomai-hacker-space/-/issues/32.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



dovetail-5.10: build-breakage on ARM

2022-03-29 Thread Jan Kiszka via Xenomai
Hi Philippe,

likely by accident: 5.10-dovetail-rebase lost disable_irq_if_pipelined /
enable_irq_if_pipelined macros. 5.16 was affected as well, 5.15 and 5.17
are not.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: xeno-test fails when kernel 64bit userland 32bit

2022-03-25 Thread Jan Kiszka via Xenomai
On 25.03.22 14:01, Paal Tamas wrote:
>  
> 
> Jan Kiszka  írta:
> 
> On 25.03.22 13:26, Paal Tamas via Xenomai wrote:
> > Dear All,
> > I am using xenomai_3.2.1 with kernel 5.10-dovetail on x86_64 platform. 
> I compile the kernel to 64bit, but I need to have the userspace as 32bit. So 
> I configured the userspace compilation the following way:
> > ./configure --with-core=cobalt --enable-smp --enable-pshared 
> --host=i686-linux CFLAGS="-m32 -D_FILE_OFFSET_BITS=64" LDFLAGS=-m32 
> host_alias=i686-linux
> > I tried to start RTNET, but I get "Inappropriate ioctl for device" 
> errors. I reported this issue in another thread.
> > The problem seems to be not RTNET related, since xeno-test fails with 
> the same error: desktop:/usr/xenomai/bin$ sudo ./xeno-test
> > Started child 2202: /bin/bash /usr/xenomai/bin/xeno-test-run-wrapper 
> ./xeno-test
> > ++ echo 0
> > ++ testdir=/usr/xenomai/bin
> > ++ which systemctl
> > ++ systemctl is-active --quiet systemd-timesyncd
> > ++ timesyncd_was_running=true
> > ++ systemctl stop systemd-timesyncd
> > ++ /usr/xenomai/bin/smokey --run random_alloc_rounds=64 
> pattern_check_rounds=64
> > arith OK
> > bufp skipped (no kernel support)
> > cpu_affinity skipped (no kernel support)
> > fpu_stress OK
> > gdb OK
> > iddp skipped (no kernel support)
> > leaks OK
> > memory_coreheap OK
> > memory_heapmem OK
> > memory_tlsf OK
> > setup.c:96, ioctl(fd, IOC_RT_IFINFO, ): Inappropriate ioctl for 
> device
> > /usr/xenomai/bin/smokey: test net_packet_dgram failed: Inappropriate 
> ioctl for device
> > + start_timesyncd
> > + true
> > + systemctl start systemd-timesyncd
> > + timesyncd_was_running=false
> > child 2202 returned: exited with status 1
> > What am I doing wrong?
> 
> Nothing, known issue in principle:
> https://gitlab.com/Xenomai/xenomai-hacker-space/-/issues/21
> 
> Jan
> 
> -- 
> Siemens AG, Technology
> Competence Center Embedded Linux
> 
> Jan,
> 

The citation settings of your email client are misconfigured.

>  
> 
> Thank you for the fast response!
> 
> Do you have any guess when this "mixed system" will work?
> 

There are multiple issues to be debugged and patched. Someone has to sit
down and do at least the debugging work. And we need that image for the
test lab so that the result is validated during regular runs. So far
none of those to-dos have a schedule yet.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: xeno-test fails when kernel 64bit userland 32bit

2022-03-25 Thread Jan Kiszka via Xenomai
On 25.03.22 13:26, Paal Tamas via Xenomai wrote:
> Dear All,
> I am using xenomai_3.2.1 with kernel 5.10-dovetail on x86_64 platform. I 
> compile the kernel to 64bit, but I need to have the userspace as 32bit. So I 
> configured the userspace compilation the following way:
> ./configure --with-core=cobalt --enable-smp --enable-pshared 
> --host=i686-linux CFLAGS="-m32 -D_FILE_OFFSET_BITS=64" LDFLAGS=-m32 
> host_alias=i686-linux
> I tried to start RTNET, but I get "Inappropriate ioctl for device" errors. I 
> reported this issue in another thread.
> The problem seems to be not RTNET related, since xeno-test fails with the 
> same error: desktop:/usr/xenomai/bin$ sudo ./xeno-test
> Started child 2202: /bin/bash /usr/xenomai/bin/xeno-test-run-wrapper 
> ./xeno-test
> ++ echo 0
> ++ testdir=/usr/xenomai/bin
> ++ which systemctl
> ++ systemctl is-active --quiet systemd-timesyncd
> ++ timesyncd_was_running=true
> ++ systemctl stop systemd-timesyncd
> ++ /usr/xenomai/bin/smokey --run random_alloc_rounds=64 
> pattern_check_rounds=64
> arith OK
> bufp skipped (no kernel support)
> cpu_affinity skipped (no kernel support)
> fpu_stress OK
> gdb OK
> iddp skipped (no kernel support)
> leaks OK
> memory_coreheap OK
> memory_heapmem OK
> memory_tlsf OK
> setup.c:96, ioctl(fd, IOC_RT_IFINFO, ): Inappropriate ioctl for device
> /usr/xenomai/bin/smokey: test net_packet_dgram failed: Inappropriate ioctl 
> for device
> + start_timesyncd
> + true
> + systemctl start systemd-timesyncd
> + timesyncd_was_running=false
> child 2202 returned: exited with status 1
> What am I doing wrong?

Nothing, known issue in principle:
https://gitlab.com/Xenomai/xenomai-hacker-space/-/issues/21

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: Bug in xenomai 3.2.x on ARM

2022-03-22 Thread Jan Kiszka via Xenomai
On 22.03.22 18:31, Greg Gallagher via Xenomai wrote:
> On Tue, Mar 22, 2022 at 12:49 PM François Legal via Xenomai <
> xenomai@xenomai.org> wrote:
> 
>> Hello,
>>
>> trying to port a running Xenomai 3.1.x/Linux 4.4.x system to Xenomai
>> 3.2.x/Linux 4.4.x running on Zynq7000, I found a (maybe) bug in intr.c
>>
>> In xnintr_attach, the call to ipipe_set_irq_affinity (when in SMP) expects
>> a 0 return code. However, on my platform, I end up in irq_gic.c in
>> gic_set_affinity() which returns IRQ_SET_MASK_OK_DONE (2) when everything
>> is fine.
>>
>> This IRQ_SET_MASK_OK_DONE does not seem very common in the kernel (I could
>> find a few occurrences, on GICv3 so maybe AARCH64 has the same problem,
>> plus some PCI devices).
>>
>> I was wondering if this was something to fix in ipipe or xenomai ?
>>
>> François
> 
> 
> Do the newer kernels have the same return value? Do we test 3.2.x on the
> older kernels? Maybe just cip ?
> 

We limited testing to 4.4-cip with 3.0.x. Newer kernels require newer
Xenomai, and vice versa.

But let's first understand if the issue is 4.4-specific.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
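
A small user-space sketch of the return-code question raised in this thread. The enum values mirror include/linux/irq.h in mainline; the two helper names are hypothetical, and this is not the fix that went into I-pipe or Xenomai. It only illustrates why a check that accepts nothing but 0 misreads IRQ_SET_MASK_OK_DONE as a failure, assuming errors are reported as negative errno values.

```c
#include <stdbool.h>

/* Values as defined in include/linux/irq.h (mainline kernel). */
enum {
	IRQ_SET_MASK_OK = 0,		/* OK, core updates the affinity mask */
	IRQ_SET_MASK_OK_NOCOPY = 1,	/* OK, chip already updated the mask */
	IRQ_SET_MASK_OK_DONE = 2,	/* OK, and skip any stacked parent chips */
};

/* Hypothetical helper: only negative values (errno codes) count as errors. */
bool set_affinity_succeeded(int ret)
{
	return ret >= 0;
}

/* The stricter check described in the report: only 0 is accepted. */
bool set_affinity_succeeded_strict(int ret)
{
	return ret == 0;
}
```

With the strict variant, a GIC driver returning IRQ_SET_MASK_OK_DONE would be treated as an error although the affinity was applied.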



Re: [rtnet] kernel bug during slave configuration

2022-03-21 Thread Jan Kiszka via Xenomai
On 21.03.22 17:25, Jan Kiszka via Xenomai wrote:
> On 21.03.22 15:40, Mauro via Xenomai wrote:
>> Hi all,
>>
>> I'm using Xenomai 3.1.2 on a Intel Atom x5-E8000 64bit with an Intel
>> I210 gigabit ethernet controller. Linux kernel is 5.4.181.
>>
>> I have two identical devices, one configured as master:
>>
>> -
>> $ cat /etc/rtnet.conf
>> prefix="/usr"
>> exec_prefix="/usr"
>> RTNET_MOD="/lib/modules/`uname -r`/kernel/drivers/xenomai/net"
>> RTIFCONFIG="/usr/sbin/rtifconfig"
>> RTCFG="/usr/sbin/rtcfg"
>> TDMACFG="/usr/sbin/tdmacfg"
>> MODULE_EXT=".ko"
>>
>> RT_DRIVER="rt_igb"
>> RT_DRIVER_OPTIONS=""
>>
>> REBIND_RT_NICS=":03:00.0"
>>
>> IPADDR="10.0.0.1"
>> NETMASK=""
>>
>> RT_LOOPBACK="yes"
>> RT_PROTOCOLS="udp packet"
>> RTCAP="no"
>>
>> STAGE_2_SRC=""
>> STAGE_2_DST=""
>> STAGE_2_CMDS=""
>>
>> TDMA_MODE="master"
>> TDMA_SLAVES="10.0.0.2"
>> TDMA_CYCLE="1000"
>> TDMA_OFFSET="200"
>> #TDMA_CONFIG="/etc/tdma.conf"
>> -
>>
>> and one as slave:
>>
>> -
>> $ cat /etc/rtnet.conf
>> prefix="/usr"
>> exec_prefix="/usr"
>> RTNET_MOD="/lib/modules/`uname -r`/kernel/drivers/xenomai/net"
>> RTIFCONFIG="/usr/sbin/rtifconfig"
>> RTCFG="/usr/sbin/rtcfg"
>> TDMACFG="/usr/sbin/tdmacfg"
>> MODULE_EXT=".ko"
>>
>> RT_DRIVER="rt_igb"
>> RT_DRIVER_OPTIONS=""
>>
>> REBIND_RT_NICS=":03:00.0"
>>
>> IPADDR="10.0.0.2"
>> NETMASK=""
>> RT_LOOPBACK="yes"
>> RT_PROTOCOLS="udp packet"
>> RTCAP="no"
>>
>> STAGE_2_SRC=""
>> STAGE_2_DST=""
>> STAGE_2_CMDS=""
>>
>> TDMA_MODE="slave"
>>
>> TDMA_SLAVES="10.0.0.2 10.0.0.3 10.0.0.4"
>> TDMA_CYCLE="5000"
>> TDMA_OFFSET="200"
>> #TDMA_CONFIG="/etc/tdma.conf"
>> -
>>
>> I start rtnet with "rtnet start" on master
>>
>> $ rtnet start
>> Waiting for all slaves...
>>
>> dmesg on master shows:
>>
>> ...
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> TDMA: Failed to transmit sync frame!
>> rt_igb: rteth0: igb: rteth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
>> Control: RX/TX
>>
>>
>> Then, I start rtnet with "rtnet start" on slave
>>
>> $ rtnet start
>> Stage 1: searching for master...
>> Stage 2: waiting for other slaves...
>> Stage 3: waiting for common setup completion...ioctl: Invalid argument
>>
>> dmesg on slave shows:
>>
>> *** RTnet for Xenomai v3.1.2 ***
>>
>> RTnet: initialising real-time networking
>> rt_igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>> rt_igb: Copyright (c) 2007-2014 Intel Corporation.
>> igb :03:00.0: removed PHC on eth1
>> RTnet: registered rteth0
>> rt_igb :03:00.0: Intel(R) Gigabit Ethernet Network Connection
>> rt_igb :03:00.0: rteth0: (PCIe:2.5Gb/s:Width x1) 00:30:d6:2b:78:c9
>> rt_igb :03:00.0: rteth0: PBA No: FF-0FF
>> rt_igb :03:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
>> rt_loopback: initializing loopback interface...
>> RTnet: registered rtlo
>> RTcfg: init real-time configuration distribution protocol
>> RTmac: init realtime media access control
>> RTmac/TDMA: init time division multiple access control mechanism
>> udevd[401]: Error changing net interface name vnic0 to : Invalid argument
>> udevd[401]: could not rename interface '5' from 'vnic0' to '': Invalid
>> argument
>> rt_igb: rteth0: igb: rteth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
>> Control: RX/TX
>> usercopy: Kernel memory exposure attempt detected from SLUB object
>> 'rtskb_slab_pool' (offset 219, size 66)!
>> invalid opcode:  [#1] PREEMPT SMP PTI
>> CPU: 

Re: [rtnet] kernel bug during slave configuration

2022-03-21 Thread Jan Kiszka via Xenomai
On 21.03.22 15:40, Mauro via Xenomai wrote:
> Hi all,
> 
> I'm using Xenomai 3.1.2 on a Intel Atom x5-E8000 64bit with an Intel
> I210 gigabit ethernet controller. Linux kernel is 5.4.181.
> 
> I have two identical devices, one configured as master:
> 
> -
> $ cat /etc/rtnet.conf
> prefix="/usr"
> exec_prefix="/usr"
> RTNET_MOD="/lib/modules/`uname -r`/kernel/drivers/xenomai/net"
> RTIFCONFIG="/usr/sbin/rtifconfig"
> RTCFG="/usr/sbin/rtcfg"
> TDMACFG="/usr/sbin/tdmacfg"
> MODULE_EXT=".ko"
> 
> RT_DRIVER="rt_igb"
> RT_DRIVER_OPTIONS=""
> 
> REBIND_RT_NICS=":03:00.0"
> 
> IPADDR="10.0.0.1"
> NETMASK=""
> 
> RT_LOOPBACK="yes"
> RT_PROTOCOLS="udp packet"
> RTCAP="no"
> 
> STAGE_2_SRC=""
> STAGE_2_DST=""
> STAGE_2_CMDS=""
> 
> TDMA_MODE="master"
> TDMA_SLAVES="10.0.0.2"
> TDMA_CYCLE="1000"
> TDMA_OFFSET="200"
> #TDMA_CONFIG="/etc/tdma.conf"
> -
> 
> and one as slave:
> 
> -
> $ cat /etc/rtnet.conf
> prefix="/usr"
> exec_prefix="/usr"
> RTNET_MOD="/lib/modules/`uname -r`/kernel/drivers/xenomai/net"
> RTIFCONFIG="/usr/sbin/rtifconfig"
> RTCFG="/usr/sbin/rtcfg"
> TDMACFG="/usr/sbin/tdmacfg"
> MODULE_EXT=".ko"
> 
> RT_DRIVER="rt_igb"
> RT_DRIVER_OPTIONS=""
> 
> REBIND_RT_NICS=":03:00.0"
> 
> IPADDR="10.0.0.2"
> NETMASK=""
> RT_LOOPBACK="yes"
> RT_PROTOCOLS="udp packet"
> RTCAP="no"
> 
> STAGE_2_SRC=""
> STAGE_2_DST=""
> STAGE_2_CMDS=""
> 
> TDMA_MODE="slave"
> 
> TDMA_SLAVES="10.0.0.2 10.0.0.3 10.0.0.4"
> TDMA_CYCLE="5000"
> TDMA_OFFSET="200"
> #TDMA_CONFIG="/etc/tdma.conf"
> -
> 
> I start rtnet with "rtnet start" on master
> 
> $ rtnet start
> Waiting for all slaves...
> 
> dmesg on master shows:
> 
> ...
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> TDMA: Failed to transmit sync frame!
> rt_igb: rteth0: igb: rteth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: RX/TX
> 
> 
> Then, I start rtnet with "rtnet start" on slave
> 
> $ rtnet start
> Stage 1: searching for master...
> Stage 2: waiting for other slaves...
> Stage 3: waiting for common setup completion...ioctl: Invalid argument
> 
> dmesg on slave shows:
> 
> *** RTnet for Xenomai v3.1.2 ***
> 
> RTnet: initialising real-time networking
> rt_igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
> rt_igb: Copyright (c) 2007-2014 Intel Corporation.
> igb :03:00.0: removed PHC on eth1
> RTnet: registered rteth0
> rt_igb :03:00.0: Intel(R) Gigabit Ethernet Network Connection
> rt_igb :03:00.0: rteth0: (PCIe:2.5Gb/s:Width x1) 00:30:d6:2b:78:c9
> rt_igb :03:00.0: rteth0: PBA No: FF-0FF
> rt_igb :03:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
> rt_loopback: initializing loopback interface...
> RTnet: registered rtlo
> RTcfg: init real-time configuration distribution protocol
> RTmac: init realtime media access control
> RTmac/TDMA: init time division multiple access control mechanism
> udevd[401]: Error changing net interface name vnic0 to : Invalid argument
> udevd[401]: could not rename interface '5' from 'vnic0' to '': Invalid
> argument
> rt_igb: rteth0: igb: rteth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: RX/TX
> usercopy: Kernel memory exposure attempt detected from SLUB object
> 'rtskb_slab_pool' (offset 219, size 66)!
> invalid opcode:  [#1] PREEMPT SMP PTI
> CPU: 0 PID: 419 Comm: rtcfg Tainted: G    W 5.4.181-xeno #1
> Hardware name: Default string Default string/69823 MSC
> Q7-BW-E8000-13N0220C PCBFTX, BIOS V1.20#KW050220A 03/16/2018
> I-pipe domain: Linux
> RIP: 0010:usercopy_abort+0x7b/0x7d
> Code: bb 48 c7 c2 30 1d e5 bb 4c 0f 45 de 48 c7 c6 a3 0c e4 bb 57 48 0f
> 45 f2 4c 89 d1 4c 89 da 48 c7 c7 d0 1c e5 bb e8 d9 95 ff ff <0f> 0b 49
> 8d 0c 24 4c 8d 03 48 29 d1 31 f6 41 8d 55 00 48 c7 c7 72
> RSP: 0018:b7dac07efbf0 EFLAGS: 00010246
> RAX: 006b RBX: 0042 RCX: 
> RDX:  RSI: 9df43661b4c8 RDI: 
> RBP: b7dac07efc08 R08: 02b5 R09: 0101
> R10: 0001 R11: 0400 R12: 9df4371a38db
> R13: 0001 R14: 9df4371a391d R15: 9df43896af48
> FS:  7fe873ae6540() GS:9df43660()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7fa6254b14e0 CR3: 77026000 CR4: 001006f0
> Call Trace:
>  __check_heap_object+0xed/0x120
>  __check_object_size+0x14c/0x160
>  copy_stage_1_data+0x50/0x80 [rtcfg]
>  rtnet_rtpc_dispatch_call+0x187/0x360 [rtnet]
>  ? cleanup_cmd_del+0x70/0x70 [rtcfg]
>  ? finish_wait+0x90/0x90
>  rtcfg_ioctl+0xa2/0x250 [rtcfg]
>  ? rtdev_get_by_name+0xa6/0xd0 [rtnet]
>  rtnet_ioctl+0xe4/0x180 [rtnet]
>  do_vfs_ioctl+0x40c/0x670
>  ? 

Re: [PATCH] ipipe: noarch: Fix handling of PCIe MSI interrupts for dwc PCIe controller

2022-03-21 Thread Jan Kiszka via Xenomai
On 18.03.22 17:53, Scott Reed wrote:
> Handling of PCIe MSI interrupts resulted in system
> hanging or high latencies.
> 
> Fix is to replace the missed call to generic_handle_irq() with
> ipipe_handle_demuxed_irq().
> 
> Signed-off-by: Scott Reed 
> ---
>  drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index c9fd4e4966ba..7b566da64438 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -100,7 +100,7 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>   irq = irq_find_mapping(pp->irq_domain,
>  (i * MAX_MSI_IRQS_PER_CTRL) +
>  pos);
> - generic_handle_irq(irq);
> + ipipe_handle_demuxed_irq(irq);
>   pos++;
>   }
>   }

Thanks, applied to noarch, ipipe/master and stable/4.19.x.

Greg, you can pick up.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
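
For readers unfamiliar with the loop touched by this patch, here is a minimal user-space sketch of the demultiplexing step: each MSI controller exposes a status word, and every set bit is turned into a vector number of the form (i * MAX_MSI_IRQS_PER_CTRL) + pos, as in the diff. The function name and the value 32 for MAX_MSI_IRQS_PER_CTRL are assumptions for illustration. In the real driver the number is then mapped with irq_find_mapping() and, with I-pipe, passed to ipipe_handle_demuxed_irq().

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MSI_IRQS_PER_CTRL 32	/* assumed to match the dwc driver */

/*
 * Collect the MSI vector numbers pending in one controller's status word.
 * ctrl is the controller index (the "i" in the diff), status its pending bits.
 * Returns how many vector numbers were written to out (at most max).
 */
size_t demux_msi_status(unsigned int ctrl, uint32_t status,
			unsigned int *out, size_t max)
{
	size_t n = 0;
	unsigned int pos;

	for (pos = 0; pos < MAX_MSI_IRQS_PER_CTRL && n < max; pos++) {
		if (status & (UINT32_C(1) << pos))
			out[n++] = ctrl * MAX_MSI_IRQS_PER_CTRL + pos;
	}
	return n;
}
```

The point of the patch is what happens to each vector afterwards: a demuxed interrupt has to enter the interrupt pipeline rather than the regular Linux handler directly.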



Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62

2022-03-17 Thread Jan Kiszka via Xenomai
On 17.03.22 16:24, Scott Reed wrote:
> 
> 
> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>> On 16.03.22 10:58, Scott Reed wrote:
>>>
>>>
>>> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>>>
>>>>
>>>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>>>
>>>>>>
>>>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>>>
>>>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>>>
>>>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>>>> hangs with no message output on the serial console or in
>>>>>>>>> /var/log/messages.
>>>>>>>>>
>>>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to
>>>>>>>>> 5.4.151
>>>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>>>
>>>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe
>>>>>>>>> MSI
>>>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed
>>>>>>>>> MAC.
>>>>>>>>>
>>>>>>>>> I have had a stable system running for some time with Linux 4.14.62 with
>>>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1].
>>>>>>>>> Also
>>>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
>>>>>>>>> interrupt
>>>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to
>>>>>>>>> 5.4.151,
>>>>>>>>> but
>>>>>>>>> see the same hang.
>>>>>>>>
>>>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Actually, that commit is also missing from the last tagged 5.4
>>>>>>>> ipipe
>>>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head
>>>>>>>> instead.
>>>>>>>
>>>>>>> To do a quick test, I just applied the change from the commit you
>>>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
>>>>>>> did not
>>>>>>> help (hang still occurs with first interrupt).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>>>
>>>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>>>> and I-pipe?
>>>>>>>>>
>>>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>>>> the problem. Would this be recommended?
>>>>>>>>
>>>>>>>> If you can migrate your test with reasonable effort, yes,
>>>>>>>> definitely.
>>>>>>>
>>>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes
>>>>>>> that
>>>>>>> it will not be too much effort and report back.
>>>>>>
>>>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the
>>>>>> first
>>>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103
>>>>>> kernel
>>>>>> on my platform.
>

Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62

2022-03-16 Thread Jan Kiszka via Xenomai
On 16.03.22 10:58, Scott Reed wrote:
> 
> 
> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>
>>
>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>
>>>>
>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>
>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>
>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>> hangs with no message output on the serial console or in
>>>>>>> /var/log/messages.
>>>>>>>
>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>
>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>
>>>>>>> I have had a stable system running for some time with Linux 4.14.62 with
>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
>>>>>>> interrupt
>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>>>> but
>>>>>>> see the same hang.
>>>>>>
>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>
>>>>> To do a quick test, I just applied the change from the commit you
>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
>>>>> did not
>>>>> help (hang still occurs with first interrupt).
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>
>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>> and I-pipe?
>>>>>>>
>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>> the problem. Would this be recommended?
>>>>>>
>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>
>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>>> it will not be too much effort and report back.
>>>>
>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>> on my platform.
>>>>
>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>> i.MX 6 is not working (cannot ping in or out).
>>>
>>> Do you have or did you have any custom patches on top?
>>
>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>     μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>
>>>
>>>>
>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>> example, the source MAC address should be
>>>>    00:09:cc:02:c1:b6
>>>> but is
>>>>    00:01:cc:02:01:36 or
>>>>    00:09:cc:02:c1:36
>>>> Wireshark also complains about the Frame check sequence
>>>> ([FCS Status: Unverified]
>>>>
>>>> 

Re: CONFIG_NO_HZ_FULL = y but still have arch-timer on isolation CPUs

2022-03-16 Thread Jan Kiszka via Xenomai
On 15.03.22 05:01, Ivan Jiang via Xenomai wrote:
> Dear Guys:
> 
>  
> 
>    I’ve set the configs like this
> 
>    CONFIG_NO_HZ_FULL = y
> 
> CONFIG_RCU_NOCB_CPU=y
> 
> CONFIG_PREEMPT=y
> 
> CONFIG_CPU_IDLE=n
> 
> CONFIG_ARM_CPUIDLE=n
> 
> CONFIG_CPU_FREQ=n 
> 
> And setenv isolcpus=1 xenomai.supported_cpus=0x02 nohz_full=1  irqaffinity=0  
>  rcu_nocbs=1
> 
> The CPU is Cortex-A55 Dual core and I use CPU 0 as Linux CPU and CPU1 for 
> isolation core.
> 
> But cat /proc/interrupts still shows the arch_timer counts increasing at the 
> same time on both CPUs.
> 
> Seems NO_HZ_FULL = y has no effect.
> 

The boundary conditions for nohz-full are challenging, already for
"normal" Linux apps. So far, Xenomai does not take any steps to support
this, and so it would be no surprise if the condition for turning off
the scheduler and, thus, also the timer tick are not met.

But as you already found out: Even plain Linux does not succeed in your
case. Maybe explore this first with a more recent vanilla kernel, a
single syscall-free CPU-bound task and official documentation on
nohz-full. After that starts to work, it could indeed become a Xenomai
topic (though likely no longer for I-pipe, i.e. anything before 5.10).

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
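
To make the "are the arch_timer counts still increasing" check repeatable, two snapshots of the arch_timer line from /proc/interrupts can be compared per CPU. The following is a minimal sketch under the assumption that the line has the usual "IRQ: count count ... chip ..." layout; both function names are hypothetical, and max should be set to the number of CPUs so that parsing stops before the chip columns.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the per-CPU counters from one /proc/interrupts line, e.g.
 *   " 11:       1000       2000     GICv3  30 Level     arch_timer"
 * Returns the number of counters stored in counts (at most max).
 */
int parse_irq_counts(const char *line, unsigned long *counts, int max)
{
	const char *p = strchr(line, ':');
	int n = 0;

	if (!p)
		return 0;
	p++;
	while (n < max) {
		char *end;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;		/* reached the chip name or end of line */
		counts[n++] = strtoul(p, &end, 10);
		p = end;
	}
	return n;
}

/* True if the counter of one CPU advanced between two snapshots. */
bool tick_still_running(unsigned long before, unsigned long after)
{
	return after > before;
}
```

Taking the snapshots a few seconds apart while the single CPU-bound task runs on the isolated core shows whether the tick has actually stopped there.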



Re: waitqueue vs. mutex behavior

2022-03-15 Thread Jan Kiszka via Xenomai
On 15.03.22 19:27, Matt Klass via Xenomai wrote:
> Using Xenomai 3.0.10, with kernel 4.9.128-05789, on armv7, we're having
> problems with the functionality of rtdm_waitqueues. The code was written by
> a Xenomai-adept developer who has since left for greener pastures.
> 
> We have two functions that use rtdm_waitqueue_lock/unlock on the same
> rtdm_waitqueue_t to manage access to a shared data structure. One is an
> rtdm_task_t that runs periodically every 1ms, the second is an IOCTL
> handler.
> 
> Problem: In some circumstances, one of the two functions will acquire the
> lock, and access the shared data structure. But before the first function
> releases the lock, the second function seems to also acquire the lock, and
> begin to access its own access of the shared data structure. The second
> function releases its lock after its work is complete, and then when the
> first function tries to release the lock, it gets an "already unlocked"
> error from Xenomai:
> 
> [Xenomai] lock 80f10020 already unlocked on CPU #0
>   last owner = kernel/xenomai/sched.c:908 (___xnsched_run(), CPU #0)
> [<8010ed78>] (unwind_backtrace) from [<8010b5f0>] (show_stack+0x10/0x14)
> [<8010b5f0>] (show_stack) from [<801c8c08>] (xnlock_dbg_release+0x12c/0x138)
> [<801c8c08>] (xnlock_dbg_release) from [<801be110>] (___xnlock_put+0xc/0x38)
> [<801be110>] (___xnlock_put) from [<7f000434>]
> (myengine_rtdm_waitqueue_unlock_with_num+0xf8/0x13c [engine_rtnet])
> [<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num [engine_rtnet]) from
> [<7f00ace8>] (engine_rtnet_periodic_task+0x604/0x660 [engine_rtnet])
> [<7f00ace8>] (engine_rtnet_periodic_task [engine_rtnet]) from [<801c73ac>]
> (kthread_trampoline+0x68/0xa4)
> [<801c73ac>] (kthread_trampoline) from [<80147190>] (kthread+0x108/0x110)
> [<80147190>] (kthread) from [<80107cd4>] (ret_from_fork+0x18/0x24)
> 
> 
> These waitqueues were originally mutexes, and the above-mentioned adept
> committed this change to waitqueues seven years ago with the following
> comment: "Use Wait Queue instead of Mutex, because Mutex can't be called
> from the non-RT context."
> 
> We'd expect that once one of the functions obtains the lock on the
> waitqueue, the other would be blocked until the first function releases the
> lock. It's quite possible, likely really, that we don't understand the
> differences between mutexes and waitqueues. We've looked at the online
> Xenomai documentation on waitqueues, but we have not been enlightened.
> 
> 
> Would you have any suggestions on things we should do (or not do) to figure
> out what's going on?
> 

rtdm_waitqueue_lock/unlock is surely no replacement for
rtdm_mutex_lock/unlock to be used in non-rt contexts. It exists in order
to prepare the caller for waiting in a queue, and that waiting shares
the same constraint that rtdm_mutex_lock has: the caller must be RT.
Furthermore, the lock will obviously be dropped while being blocked on
the waitqueue.

If you need synchronization between RT and non-RT contexts, you should
use rtdm_lock_get_irqsave/put_irqrestore AND have little code in the
critical section. Definitely not any code that could sleep, call random
Linux functions or do even worse things. Or you need to make sure to
promote the non-RT caller to RT on entry.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
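
The remark that the lock is dropped while the caller is blocked on the waitqueue can be illustrated with a user-space analogy. The sketch below uses POSIX threads, not RTDM: pthread_cond_wait() releases the mutex while the waiter sleeps, so a second context can take the same lock although the first has not yet reached its unlock call. All names are hypothetical, and the RTDM waitqueue services differ in detail.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool waiter_inside;	/* waiter is between its lock and unlock */
static bool ready;		/* event the waiter sleeps on */

static void *waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	waiter_inside = true;
	while (!ready)
		pthread_cond_wait(&cond, &lock);	/* drops lock while asleep */
	waiter_inside = false;
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Returns true if the lock was taken while the waiter was inside its section. */
bool lock_is_dropped_while_waiting(void)
{
	pthread_t thread;
	bool overlap_seen = false;

	waiter_inside = false;
	ready = false;
	if (pthread_create(&thread, NULL, waiter, NULL))
		return false;

	while (!overlap_seen) {
		pthread_mutex_lock(&lock);
		if (waiter_inside) {
			/* We hold the lock, yet the waiter has not unlocked. */
			overlap_seen = true;
			ready = true;
			pthread_cond_signal(&cond);
		}
		pthread_mutex_unlock(&lock);
		sched_yield();
	}
	pthread_join(thread, NULL);
	return overlap_seen;
}
```

This is why a lock/unlock pair around a wait does not give the exclusion a plain mutex section would; for short RT/non-RT critical sections the reply above points to rtdm_lock_get_irqsave/put_irqrestore instead.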



[PATCH] arm64: ipipe: Fix section mismatch of __ipipe_tsc_register

2022-03-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

The kernel warns:

The function dw_apb_clocksource_register() references
the function __init __ipipe_tsc_register().
This is often because dw_apb_clocksource_register lacks a __init
annotation or the annotation of __ipipe_tsc_register is wrong.

Signed-off-by: Jan Kiszka 
---

Developed for 5.4 but probably also 4.19 material.

 arch/arm64/kernel/ipipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/ipipe.c b/arch/arm64/kernel/ipipe.c
index 9dcb54c636395..a60787230fd84 100644
--- a/arch/arm64/kernel/ipipe.c
+++ b/arch/arm64/kernel/ipipe.c
@@ -228,7 +228,7 @@ void __ipipe_root_sync(void)
 
 static struct __ipipe_tscinfo tsc_info;
 
-void __init __ipipe_tsc_register(struct __ipipe_tscinfo *info)
+void __ipipe_tsc_register(struct __ipipe_tscinfo *info)
 {
tsc_info = *info;
__ipipe_hrclock_freq = info->freq;
-- 
2.34.1


-- 
Siemens AG, Technology
Competence Center Embedded Linux



[PATCH] arm64: ipipe: Make erratum_1418040_thread_switch compatible with I-pipe

2022-03-15 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

First of all, this erratum hook is called from __switch_to, thus
potentially also from the primary domain. Some of the functions it calls
check if preemption was disabled under Linux - which may not be the case
when invoked from primary domain. Rather than adding a costly check for
ipipe_root_p to this hot-path, simply turn the check off if I-pipe is
enabled.

As the hook can be called from primary context, we need to protect its
setup for new execs against those contexts via hard_preempt_disable.

Signed-off-by: Jan Kiszka 
---

This is for 5.4 only; older kernels do not have the erratum fix.

Philippe, the hardening of erratum_1418040_new_exec() could be a topic 
for dovetail as well. preemptible() is fully oob-aware there, though.

 arch/arm64/kernel/cpu_errata.c | 3 ++-
 arch/arm64/kernel/cpufeature.c | 3 ++-
 arch/arm64/kernel/process.c| 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1e16c4e00e771..7fd7d1c8b9fcc 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -37,7 +37,8 @@ static bool __maybe_unused
 is_affected_midr_range_list(const struct arm64_cpu_capabilities *entry,
int scope)
 {
-   WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+   WARN_ON(scope != SCOPE_LOCAL_CPU ||
+   (preemptible() && !IS_ENABLED(CONFIG_IPIPE)));
return is_midr_in_range_list(read_cpuid_id(), entry->midr_range_list);
 }
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index acdef8d76c64d..d65287cc2148b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2023,7 +2023,8 @@ static void __init mark_const_caps_ready(void)
 
 bool this_cpu_has_cap(unsigned int n)
 {
-   if (!WARN_ON(preemptible()) && n < ARM64_NCAPS) {
+   if (!WARN_ON(!IS_ENABLED(CONFIG_IPIPE) && preemptible()) &&
+   n < ARM64_NCAPS) {
const struct arm64_cpu_capabilities *cap = cpu_hwcaps_ptrs[n];
 
if (cap)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 68c078ab0250c..879ecf0237c88 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -517,9 +517,9 @@ static void erratum_1418040_thread_switch(struct task_struct *next)
 
 static void erratum_1418040_new_exec(void)
 {
-   preempt_disable();
+   unsigned long flags = hard_preempt_disable();
erratum_1418040_thread_switch(current);
-   preempt_enable();
+   hard_preempt_enable(flags);
 }
 
 /*
-- 
2.34.1



Re: ipipe-5.4: arm64 regression

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 14:02, Greg Gallagher wrote:
> 
> 
> On Mon, Mar 14, 2022 at 8:33 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
> On 04.03.22 00:45, Greg Gallagher wrote:
> >
> >
> > On Thu, Mar 3, 2022 at 1:20 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >
> >     On 02.03.22 16:44, Greg Gallagher wrote:
> >     >
> >     >
> >     > On Wed, Mar 2, 2022 at 1:48 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >     >
> >     >     Hi Greg,
> >     >
> >     >     something is going wrong on arm64 with latest ipipe version,
> >     see e.g.
> >     >
> >     >     https://source.denx.de/Xenomai/xenomai-images/-/jobs/398455/raw
> >     >     (same thing seen on HiKey as well)
> >     >
> >     >     Could you have a look?
> >     >
> >     >     Thanks,
> >     >     Jan
> >     >
> >     >     --
> >     >     Siemens AG, Technology
> >     >     Competence Center Embedded Linux
> >     >
> >     >
> >     > I'll take a look, it will be close to the end of the week
> but i'll aim
> >     > to have it root caused by the weekend.
> >     >
> >
> >     Just tried locally with xenomai-images and qemu-arm64 (just
> run smokey):
> >
> >     [  408.747349] Kernel panic - not syncing: kernel stack overflow
> >     [  408.747591] CPU: 0 PID: 1577 Comm: systemd-journal Tainted: G        W         5.4.180+ #1
> >     [  408.747762] Hardware name: linux,dummy-virt (DT)
> >     [  408.747852] I-pipe domain: Xenomai
> >     [  408.747941] Call trace:
> >     ...
> >     [  408.761131]  do_debug_exception+0x94/0x240
> >     [  408.761255]  el1_dbg+0x18/0x8c
> >     [  408.761329]  this_cpu_has_cap+0x60/0x7c
> >     [  408.761423]  erratum_1418040_thread_switch+0x18/0x5c
> >     [  408.761534]  __switch_to+0xf8/0x154
> >     [  408.761622]  xnarch_switch_to+0x5c/0xc4
> >     [  408.761711]  pipeline_switch_to+0x14/0x84
> >     [  408.761803]  ___xnsched_run+0x154/0x240
> >     [  408.761889]  pipeline_schedule+0x30/0x40
> >     [  408.761999]  xnintr_core_clock_handler+0x250/0x260
> >     [  408.762107]  dispatch_irq_head+0x84/0x120
> >     [  408.762198]  __ipipe_dispatch_irq+0x19c/0x1c4
> >     [  408.762293]  __ipipe_grab_irq+0x5c/0xa0
> >     [  408.762377]  gic_handle_irq+0x54/0xb0
> >     [  408.762457]  handle_arch_irq_pipelined+0x14/0x60
> >     [  408.762557]  el0_irq_naked+0x5c/0x84
> >     [  408.762905] SMP: stopping secondary CPUs
> >
> >     This dbg trap from erratum_1418040_thread_switch looks
> suspicious, and
> >     if I had to bet, I would say it somehow relates to [1] which
> came with
> >     v5.4.176. But more logical would [2] due to its switch from
> static to
> >     dynamic cpu_has_cap - but that is already in since v5.4.80...
> >
> >     Jan
> >
> >     [1]
> >   
>  
> https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b
> 
> <https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b>
> >   
>  
> <https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d

Re: ipipe-5.4: arm64 regression

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 14:02, Greg Gallagher wrote:
> 
> 
> On Mon, Mar 14, 2022 at 8:33 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
> On 04.03.22 00:45, Greg Gallagher wrote:
> >
> >
> > On Thu, Mar 3, 2022 at 1:20 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >
> >     On 02.03.22 16:44, Greg Gallagher wrote:
> >     >
> >     >
> >     > On Wed, Mar 2, 2022 at 1:48 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >     >
> >     >     Hi Greg,
> >     >
> >     >     something is going wrong on arm64 with latest ipipe version,
> >     see e.g.
> >     >
> >     >     https://source.denx.de/Xenomai/xenomai-images/-/jobs/398455/raw
> >     >     (same thing seen on HiKey as well)
> >     >
> >     >     Could you have a look?
> >     >
> >     >     Thanks,
> >     >     Jan
> >     >
> >     >     --
> >     >     Siemens AG, Technology
> >     >     Competence Center Embedded Linux
> >     >
> >     >
> >     > I'll take a look, it will be close to the end of the week
> but i'll aim
> >     > to have it root caused by the weekend.
> >     >
> >
> >     Just tried locally with xenomai-images and qemu-arm64 (just
> run smokey):
> >
> >     [  408.747349] Kernel panic - not syncing: kernel stack overflow
> >     [  408.747591] CPU: 0 PID: 1577 Comm: systemd-journal Tainted:
> G   
> >         W         5.4.180+ #1
> >     [  408.747762] Hardware name: linux,dummy-virt (DT)
> >     [  408.747852] I-pipe domain: Xenomai
> >     [  408.747941] Call trace:
> >     ...
> >     [  408.761131]  do_debug_exception+0x94/0x240
> >     [  408.761255]  el1_dbg+0x18/0x8c
> >     [  408.761329]  this_cpu_has_cap+0x60/0x7c
> >     [  408.761423]  erratum_1418040_thread_switch+0x18/0x5c
> >     [  408.761534]  __switch_to+0xf8/0x154
> >     [  408.761622]  xnarch_switch_to+0x5c/0xc4
> >     [  408.761711]  pipeline_switch_to+0x14/0x84
> >     [  408.761803]  ___xnsched_run+0x154/0x240
> >     [  408.761889]  pipeline_schedule+0x30/0x40
> >     [  408.761999]  xnintr_core_clock_handler+0x250/0x260
> >     [  408.762107]  dispatch_irq_head+0x84/0x120
> >     [  408.762198]  __ipipe_dispatch_irq+0x19c/0x1c4
> >     [  408.762293]  __ipipe_grab_irq+0x5c/0xa0
> >     [  408.762377]  gic_handle_irq+0x54/0xb0
> >     [  408.762457]  handle_arch_irq_pipelined+0x14/0x60
> >     [  408.762557]  el0_irq_naked+0x5c/0x84
> >     [  408.762905] SMP: stopping secondary CPUs
> >
> >     This dbg trap from erratum_1418040_thread_switch looks
> suspicious, and
> >     if I had to bet, I would say it somehow relates to [1] which
> came with
> >     v5.4.176. But more logical would be [2] due to its switch from
> static to
> >     dynamic cpu_has_cap - but that is already in since v5.4.80...
> >
> >     Jan
> >
> >     [1]
> >   
>  
> https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b
> 
> <https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b>
> >   
>  
> <https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d

Re: [PATCH v8 0/4] Kernel-Shark and libtraceevent plugins

2022-03-15 Thread Jan Kiszka via Xenomai
On 15.03.22 02:32, Chen, Hongzhan wrote:
> Looks good to me. Thanks for your help.
> 

Thanks for your effort! The last "few percentages" to get
autoconf/automake integration were indeed tougher than I thought.

Jan

> Regards
> 
> Hongzhan Chen
> 
> -----Original Message-----
> From: Jan Kiszka  
> Sent: Monday, March 14, 2022 6:01 PM
> To: xenomai@xenomai.org
> Cc: Chen, Hongzhan 
> Subject: [PATCH v8 0/4] Kernel-Shark and libtraceevent plugins
> 
> Changes in v8:
>  - drop explicit deps again - no longer needed after refreshing local
>    libtracecmd installation
> 
> Changes in v7:
>  - reworked installation
>  - fixed build of kernelshark plugin (missing dep)
>  - dropped applied first patch
> 
> Jan
> 
> 
> CC: Hongzhan Chen 
> 
> Hongzhan Chen (3):
>   build: add options to build plugins of kernelshark and libtraceevent
>   KernelShark: Add xenomai_cobalt_switch_events plugin for KernelShark
>   libtraceevent: Add xenomai_schedparams plugin for libtraceevent
> 
> Jan Kiszka (1):
>   libs: Silence installation output of libtool
> 
>  Makefile.am                                   |   4 +
>  configure.ac                                  |  35
>  lib/alchemy/Makefile.am                       |   2 +
>  lib/analogy/Makefile.am                       |   2 +
>  lib/cobalt/Makefile.am                        |   2 +
>  lib/copperplate/Makefile.am                   |   2 +
>  lib/mercury/Makefile.am                       |   3 +-
>  lib/psos/Makefile.am                          |   2 +
>  lib/smokey/Makefile.am                        |   2 +
>  lib/trank/Makefile.am                         |   3 +-
>  lib/vxworks/Makefile.am                       |   2 +
>  tracing/Makefile.am                           |  13 ++
>  tracing/README                                |  84
>  tracing/kernelshark/CobaltSwitchEvents.cpp    | 156
>  tracing/kernelshark/Makefile.am               |  20 ++
>  .../xenomai_cobalt_switch_events.c            | 174 ++
>  .../xenomai_cobalt_switch_events.h            |  58 ++
>  tracing/libtraceevent/Makefile.am             |  19 ++
>  .../plugin_xenomai_schedparams.c              | 158
>  19 files changed, 739 insertions(+), 2 deletions(-)
>  create mode 100644 tracing/Makefile.am
>  create mode 100644 tracing/README
>  create mode 100644 tracing/kernelshark/CobaltSwitchEvents.cpp
>  create mode 100644 tracing/kernelshark/Makefile.am
>  create mode 100644 tracing/kernelshark/xenomai_cobalt_switch_events.c
>  create mode 100644 tracing/kernelshark/xenomai_cobalt_switch_events.h
>  create mode 100644 tracing/libtraceevent/Makefile.am
>  create mode 100644 tracing/libtraceevent/plugin_xenomai_schedparams.c
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Re: [PATCH] cobalt/sched: Use nr_cpumask_bits instead of BITS_PER_LONG

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 22:13, Bezdeka, Florian via Xenomai wrote:
> On Mon, 2022-03-14 at 21:05 +, Bezdeka, Florian via Xenomai wrote:
>> Hi Richard,
>>
>> On Mon, 2022-03-14 at 21:38 +0100, Richard Weinberger via Xenomai
>> wrote:
>>> BITS_PER_LONG is too broad, the max number of usable bits is limited
>>> by nr_cpumask_bits.
>>
>> I agree, BITS_PER_LONG seems wrong. But couldn't it be too small as
>> well? It depends on NR_CPUS which might be > BITS_PER_LONG.
> 
> Sorry that was unclear. I assume that the size of the cpumask depends
> somehow on NR_CPUS, which might be > BITS_PER_LONG. So BITS_PER_LONG
> might be too small AND might be too broad.
> 

Good hint, I've adjusted this on merge.

Thanks,
Jan

>>
>> Regards,
>> Florian
>>
>>> Found while debugging a system with CONFIG_DEBUG_PER_CPU_MAPS enabled.
>>>
>>> Signed-off-by: Richard Weinberger 
>>> ---
>>>  kernel/cobalt/sched.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
>>> index 88c4951ed814..aa65fd7f5d63 100644
>>> --- a/kernel/cobalt/sched.c
>>> +++ b/kernel/cobalt/sched.c
>>> @@ -1370,7 +1370,7 @@ static int affinity_vfile_show(struct 
>>> xnvfile_regular_iterator *it,
>>> unsigned long val = 0;
>>> int cpu;
>>>  
>>> -   for (cpu = 0; cpu < BITS_PER_LONG; cpu++)
>>> +   for (cpu = 0; cpu < nr_cpumask_bits; cpu++)
>>> if (cpumask_test_cpu(cpu, &cobalt_cpu_affinity))
>>> val |= (1UL << cpu);
>>>  
>>> @@ -1395,7 +1395,7 @@ static ssize_t affinity_vfile_store(struct 
>>> xnvfile_input *input)
>>> affinity = xnsched_realtime_cpus; /* Reset to default. */
>>> else {
>>> cpumask_clear(&affinity);
>>> -   for (cpu = 0; cpu < BITS_PER_LONG; cpu++, val >>= 1) {
>>> +   for (cpu = 0; cpu < nr_cpumask_bits; cpu++, val >>= 1) {
>>> if (val & 1) {
>>> /*
>>>  * The new dynamic affinity must be a strict
>>
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux
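
The point raised in the thread above, that BITS_PER_LONG may be both too
broad and too small as a loop bound, can be illustrated outside the kernel.
The following is a minimal userspace sketch, not the change that was merged:
mask_to_ulong() and the bool array standing in for a cpumask are hypothetical
names introduced for illustration. It clamps the loop to the smaller of the
mask width (the role nr_cpumask_bits plays in the patch) and the width of
unsigned long, so it neither tests bits beyond the mask nor shifts 1UL
further than the type allows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Width of unsigned long in bits; userspace stand-in for BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Pack the first bits of a mask into one unsigned long.
 * "bits" is a hypothetical stand-in for a cpumask, "nbits" for its width.
 */
unsigned long mask_to_ulong(const bool *bits, size_t nbits)
{
	/* Never go past the mask, never shift past the word. */
	size_t limit = nbits < SKETCH_BITS_PER_LONG ? nbits : SKETCH_BITS_PER_LONG;
	unsigned long val = 0;
	size_t cpu;

	assert(bits != NULL || nbits == 0);

	for (cpu = 0; cpu < limit; cpu++)
		if (bits[cpu])
			val |= 1UL << cpu;

	return val;
}
```

With a 4-bit mask only bits 0..3 are examined; with a mask wider than an
unsigned long, the result is truncated to the low bits that fit into one
word, which is the inherent limit of exporting the affinity as a single value.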



Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62

2022-03-15 Thread Jan Kiszka via Xenomai
On 14.03.22 18:45, Scott Reed wrote:
> 
> 
> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>
>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>> Hello,
>>>>
>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>
>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>> hangs with no message output on the serial console or in
>>>> /var/log/messages.
>>>>
>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.0.7 to 5.4.151
>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>
>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>
>>>> I have had a stable system running for some time with Linux 4.14.62 and
>>>> Xenomai 3.0.7, although I did need to patch the PCIe driver [1]. Some
>>>> time back, I also tried to move to 4.14.110 with I-pipe and saw the
>>>> same scenario of my system hanging on the first PCIe MSI interrupt,
>>>> so I backed out to 4.14.62. Now I am trying to move to 5.4.151, but
>>>> see the same hang.
>>>
>>> What about 4.19.y-cip? Specifically because of
>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>
>>>
>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>
>> To do a quick test, I just applied the change from the commit you
>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>> help (hang still occurs with first interrupt).
>>
>>>
>>>>
>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>
>>>> What are other people's experiences with using PCIe MSI interrupts
>>>> and I-pipe?
>>>>
>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>> the problem. Would this be recommended?
>>>
>>> If you can migrate your test with reasonable effort, yes, definitely.
>>
>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>> it will not be too much effort and report back.
> 
> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
> on my platform.
> 
> The kernel boots without a problem, but the FEC Ethernet port on the
> i.MX 6 is not working (cannot ping in or out).

Do you have or did you have any custom patches on top?

> 
> I looked at the trace with Wireshark and it looks like when pinging
> out that the ARP packet is corrupt and therefore failing. The ARP
> packet is corrupt in that it looks like various bits are flipped. For
> example, the source MAC address should be
>   00:09:cc:02:c1:b6
> but is
>   00:01:cc:02:01:36 or
>   00:09:cc:02:c1:36
> Wireshark also complains about the frame check sequence
> ([FCS Status: Unverified]).
> 
> I can provide Wireshark dumps if someone is interested, but for me
> at this point I do not want to fight with getting a 5.10.x kernel
> to work as I was pretty far along moving to a 5.4.x kernel with
> ipipe before running into the original problem posted (with ipipe
> my system freezes on the first PCIe MSI interrupt. Note: without
> ipipe, I do not see any issues).
> 
> As mentioned, I first saw this problem a while ago when trying
> to move from 4.14.62+ipipe to 4.14.110+ipipe, and at that time
> I went back to 4.14.62+ipipe, which works.
> 
> I guess my next strategy is to try to figure out what changed
> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
> the hang as I hope the delta between them is not too large.
> 
> If anyone has other suggestions or tips, they are more than welcome.

As I wrote before: try the latest 4.19-cip-ipipe first.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux
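
The bit flips described in the quoted report can be made explicit by XOR-ing
the expected source MAC address with each observed one. The following is a
minimal sketch for doing that offline; parse_mac() and mac_flipped_bits() are
hypothetical helper names introduced here, not part of any tool mentioned in
the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAC_LEN 6

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes; 0 on success, -1 if malformed. */
int parse_mac(const char *text, uint8_t out[MAC_LEN])
{
	unsigned int b[MAC_LEN];
	char trailing;
	int i;

	assert(text != NULL && out != NULL);

	/* Exactly six fields must match; a seventh match means trailing junk. */
	if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != MAC_LEN)
		return -1;

	for (i = 0; i < MAC_LEN; i++)
		out[i] = (uint8_t)b[i];
	return 0;
}

/* Store the per-byte XOR in diff[] and return the number of differing bits. */
int mac_flipped_bits(const uint8_t expected[MAC_LEN],
		     const uint8_t observed[MAC_LEN],
		     uint8_t diff[MAC_LEN])
{
	int count = 0;
	int i;

	for (i = 0; i < MAC_LEN; i++) {
		uint8_t d = expected[i] ^ observed[i];

		diff[i] = d;
		/* Clear the lowest set bit until none remain. */
		while (d) {
			d = (uint8_t)(d & (d - 1));
			count++;
		}
	}
	return count;
}
```

Applied to the addresses quoted above, the expected address 00:09:cc:02:c1:b6
differs from 00:01:cc:02:01:36 by 0x08, 0xc0 and 0x80 in bytes 1, 4 and 5
(four bits), and from 00:09:cc:02:c1:36 by 0x80 in byte 5 only. In each case
a bit that should be 1 arrived as 0.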



Re: ipipe-5.4: arm64 regression

2022-03-14 Thread Jan Kiszka via Xenomai
On 04.03.22 00:45, Greg Gallagher wrote:
> 
> 
> On Thu, Mar 3, 2022 at 1:20 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
> On 02.03.22 16:44, Greg Gallagher wrote:
> >
> >
> > On Wed, Mar 2, 2022 at 1:48 AM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >
> >     Hi Greg,
> >
> >     something is going wrong on arm64 with latest ipipe version,
> see e.g.
> >
> >     https://source.denx.de/Xenomai/xenomai-images/-/jobs/398455/raw
> >     (same thing seen on HiKey as well)
> >
> >     Could you have a look?
> >
> >     Thanks,
> >     Jan
> >
> >     --
> >     Siemens AG, Technology
> >     Competence Center Embedded Linux
> >
> >
> > I'll take a look, it will be close to the end of the week but i'll aim
> > to have it root caused by the weekend.
> >
> 
> Just tried locally with xenomai-images and qemu-arm64 (just run smokey):
> 
> [  408.747349] Kernel panic - not syncing: kernel stack overflow
> [  408.747591] CPU: 0 PID: 1577 Comm: systemd-journal Tainted: G   
>     W         5.4.180+ #1
> [  408.747762] Hardware name: linux,dummy-virt (DT)
> [  408.747852] I-pipe domain: Xenomai
> [  408.747941] Call trace:
> ...
> [  408.761131]  do_debug_exception+0x94/0x240
> [  408.761255]  el1_dbg+0x18/0x8c
> [  408.761329]  this_cpu_has_cap+0x60/0x7c
> [  408.761423]  erratum_1418040_thread_switch+0x18/0x5c
> [  408.761534]  __switch_to+0xf8/0x154
> [  408.761622]  xnarch_switch_to+0x5c/0xc4
> [  408.761711]  pipeline_switch_to+0x14/0x84
> [  408.761803]  ___xnsched_run+0x154/0x240
> [  408.761889]  pipeline_schedule+0x30/0x40
> [  408.761999]  xnintr_core_clock_handler+0x250/0x260
> [  408.762107]  dispatch_irq_head+0x84/0x120
> [  408.762198]  __ipipe_dispatch_irq+0x19c/0x1c4
> [  408.762293]  __ipipe_grab_irq+0x5c/0xa0
> [  408.762377]  gic_handle_irq+0x54/0xb0
> [  408.762457]  handle_arch_irq_pipelined+0x14/0x60
> [  408.762557]  el0_irq_naked+0x5c/0x84
> [  408.762905] SMP: stopping secondary CPUs
> 
> This dbg trap from erratum_1418040_thread_switch looks suspicious, and
> if I had to bet, I would say it somehow relates to [1] which came with
> v5.4.176. But more logical would be [2] due to its switch from static to
> dynamic cpu_has_cap - but that is already in since v5.4.80...
> 
> Jan
> 
> [1] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/a6d588572568c7431a9a3dc17f3c75962a2f070b
> [2] https://source.denx.de/Xenomai/ipipe-arm64/-/commit/71eea3d3df94ccdcf3b616d27d68d6c028c1968f
> 
> 
> -- 
> Siemens AG, Technology
> Competence Center Embedded Linux
> 
> 
> I just built a new image and I’ll have time to look into this probably
> tomorrow.
> 
> Thanks for the help :)
> 

Any news on this? Do you need further support?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


