Re: sideeffect of rcu_nocbs on periodic Alchemy task

2019-09-17 Thread Jan Kiszka via Xenomai

On 17.09.19 15:57, jk.beh...@web.de wrote:

Hi Jan,
wow, that was a really quick response.
Does "testing latest rt release" mean that I also have to use a new Linux kernel 
version?

Currently I am running linux 4.14.71-rt44.
The most recent rt patch for Linux 4.14 seems to be "patch-4.14.137-rt66.patch".
Would you recommend using that?


I would recommend v5.2.14-rt7 to cross-check if the issue is version-agnostic.

Jan


Thanks for your comments
Jochen
 >> Hello,
 >>
 >> I am running an Alchemy application (Xenomai 3.0.9) over Linux
 >> preempt-rt on a dual-core
 >> Atom (E3930) and noticed a side effect of the Linux boot parameter
 >> "rcu_nocbs".
 >
 >To clarify: You are using Xenomai in mercury mode (userspace libs only) on a
 >stock preempt-rt kernel, right? Then a kernel splat is best reported to the
 >preempt-rt community. But first make sure you have also tested the latest
 >release, to clarify whether it is a stable -rt issue or a general one.
 >
 >Jan
 >
 >>
 >> Whenever the kernel boot parameter "rcu_nocbs=1" is set, I get the
 >> following Linux
 >> kernel warning when the Alchemy task (TestTask) is terminated.
 >>
 >>
 >> Sep 17 13:02:43 localhost kernel: [ 97.342398] [ cut here
 >> ]
 >> Sep 17 13:02:43 localhost kernel: [ 97.342412] WARNING: CPU: 0 PID:
 >> 530 at
 >> /home/behnkjoc/prc2020/poky/build-tca5-32/tmp/work-shared/congatec-tca5
 >> -32/kernel-source/kernel/rcu/tree_plugin.h:310
 >> rcu_note_context_switch+0x2a0/0x4d0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342414] Modules linked in:
 >> ec_generic(O) ec_master(O) spidev nls_iso8859_1 cmdlinepart
 >> intel_spi_platform intel_spi spi_nor mtd spi_pxa2xx_platform joydev
 >> intel_rapl intel_powerclamp coretemp crc32_pclmul snd_hda_codec_hdmi
 >> pcbc aesni_intel aes_i586 crypto_simd cryptd intel_cstate
 >> intel_rapl_perf i2c_i801 lpc_ich pcspkr idma64 virt_dma intel_lpss_
 >> pci intel_lpss input_leds mac_hid i915 video snd_hda_intel
 >> drm_kms_helper snd_hda_codec snd_hda_core snd_hwdep snd_pcm
 >> hid_multitouch drm mei_me snd_timer fb_sys_fops syscopyarea sysfillrect
 >> snd
 >> mei sysimgblt soundcore shpchp sch_fq_codel nfsd autofs4
 >> Sep 17 13:02:43 localhost kernel: [ 97.342470] CPU: 0 PID: 530 Comm:
 >> TestTask Tainted: G O 4.14.71-rt44 #1
 >> Sep 17 13:02:43 localhost kernel: [ 97.342472] task: f3347300
 >> task.stack: f32ea000
 >> Sep 17 13:02:43 localhost kernel: [ 97.342476] EIP:
 >> rcu_note_context_switch+0x2a0/0x4d0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342478] EFLAGS: 00010002 CPU:
 >> 0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342480] EAX: 0001 EBX:
 >>  ECX: 0001 EDX: 
 >> Sep 17 13:02:43 localhost kernel: [ 97.342482] ESI:  EDI:
 >> f3347300 EBP: f32ebeec ESP: f32ebed0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342485] DS: 007b ES: 007b FS:
 >> 00d8 GS: 00e0 SS: 0068
 >> Sep 17 13:02:43 localhost kernel: [ 97.342487] CR0: 80050033 CR2:
 >> 06338490 CR3: 332d4000 CR4: 003406d0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342489] Call Trace:
 >> Sep 17 13:02:43 localhost kernel: [ 97.342501] ?
 >> unpin_current_cpu+0x53/0x80
 >> Sep 17 13:02:43 localhost kernel: [ 97.342507] __schedule+0x85/0x700
 >> Sep 17 13:02:43 localhost kernel: [ 97.342511] ?
 >> _raw_spin_unlock_irqrestore+0x17/0x50
 >> Sep 17 13:02:43 localhost kernel: [ 97.342514] ?
 >> rt_spin_unlock+0x24/0x50
 >> Sep 17 13:02:43 localhost kernel: [ 97.342517] schedule+0x41/0xe0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342521]
 >> hrtimer_wait_for_timer+0x5d/0x90
 >> Sep 17 13:02:43 localhost kernel: [ 97.342525] ?
 >> wait_woken+0x70/0x70
 >> Sep 17 13:02:43 localhost kernel: [ 97.342530]
 >> timer_wait_for_callback+0x40/0x50
 >> Sep 17 13:02:43 localhost kernel: [ 97.342533]
 >> SyS_timer_delete+0x6b/0x140
 >> Sep 17 13:02:43 localhost kernel: [ 97.342538]
 >> do_int80_syscall_32+0x6b/0xf0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342542]
 >> entry_INT80_32+0x31/0x31
 >> Sep 17 13:02:43 localhost kernel: [ 97.342545] EIP: 0xb22c68d0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342546] EFLAGS: 0282 CPU:
 >> 0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342548] EAX: ffda EBX:
 >> 0002 ECX:  EDX: b1d00480
 >> Sep 17 13:02:43 localhost kernel: [ 97.342550] ESI: b1d005e0 EDI:
 >> b22cc000 EBP: b1fc7318 ESP: b1fc72b0
 >> Sep 17 13:02:43 localhost kernel: [ 97.342552] DS: 007b ES: 007b FS:
 >>  GS: 0033 SS: 007b
 >> Sep 17 13:02:43 localhost kernel: [ 97.342556] Code: c3 83 e8 01 39
 >> c2 0f 85 27 02 00 00 83 f9 0f 8d 97 f8 02 00 00 0f 87 78 01 00 00 8b 04
 >> 8d f8 52 51 c3 e9 a4 9b 83 00 8d 74 26 00 <0f> 0b 80
 >> bf f4 02 00 00 00 0f 85 a6 fd ff ff e9 1c ff ff ff 8d
 >> Sep 17 13:02:43 localhost kernel: [ 97.342600] ---[ end trace
 >> 0002 ]---
 >> The issue can be reproduced with the following simple program
 >> ///////////////////////////////////////////////////////////////
 >> // Test application
 >> ///////////////////////////////////////////////////////////////

Aw: Re: sideeffect of rcu_nocbs on periodic Alchemy task

2019-09-17 Thread JK.Behnke--- via Xenomai

Re: sideeffect of rcu_nocbs on periodic Alchemy task

2019-09-17 Thread Jan Kiszka via Xenomai

On 17.09.19 15:16, JK.Behnke--- via Xenomai wrote:

Hello,

I am running an Alchemy application (Xenomai 3.0.9) over Linux
preempt-rt on a dual-core
Atom (E3930) and noticed a side effect of the Linux boot parameter
"rcu_nocbs".


To clarify: You are using Xenomai in mercury mode (userspace libs only) on a
stock preempt-rt kernel, right? Then a kernel splat is best reported to the
preempt-rt community. But first make sure you have also tested the latest
release, to clarify whether it is a stable -rt issue or a general one.


Jan



Whenever the kernel boot parameter "rcu_nocbs=1" is set, I get the
following Linux
kernel warning when the Alchemy task (TestTask) is terminated.


The issue can be reproduced with the following simple program

sideeffect of rcu_nocbs on periodic Alchemy task

2019-09-17 Thread JK.Behnke--- via Xenomai
   Hello,

   I am running an Alchemy application (Xenomai 3.0.9) over Linux
   preempt-rt on a dual-core
   Atom (E3930) and noticed a side effect of the Linux boot parameter
   "rcu_nocbs".

   Whenever the kernel boot parameter "rcu_nocbs=1" is set, I get the
   following Linux
   kernel warning when the Alchemy task (TestTask) is terminated.


   The issue can be reproduced with the following simple program
   ///////////////////////////////////////////////////////////////
   // Test application
   ///////////////////////////////////////////////////////////////
   #include <stdio.h>
   #include <unistd.h>  // usleep
   #include <alchemy/task.h>
   #define CPU_AFFINITY_DEFAULT 0
   #define MAIN_TASK_NAME  "MainTask"
   #define MAIN_TASK_PRIO  0
   #define MAIN_TASK_MODE  0
   #define TESTTASK_NAME  "TestTask"
   #define TESTTASK_PRIO  10
   #define TESTTASK_MODE  0
   #define TESTTASK_STACKSIZE 0x100000  // 1 MB
   #define TESTTASK_PERIOD_NS (5 * 1000000)  // 5 ms
   typedef struct {
       RT_TASK TaskDescr;
       int nEndTask;
       int Period_ns;
   } TESTTASK_CONTEXT;
   RT_TASK g_MainTask;
   int g_nRun = 1;
   void TestTask(void *pData) {
       TESTTASK_CONTEXT *pCtx = (TESTTASK_CONTEXT *)pData;
       unsigned long overrun;
       int nErr = 0;
       printf("TestTask starting..
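
   [Editor's note] The call trace above (SyS_timer_delete -> timer_wait_for_callback ->
   hrtimer_wait_for_timer) suggests that in mercury mode the periodic Alchemy task is
   backed by an ordinary POSIX per-process timer, and that the warning fires when that
   timer is deleted on task termination. A minimal plain-POSIX sketch of the same
   create/arm/delete syscall pattern (hypothetical, not the original test program, and
   it will not reproduce the RCU splat unless run on an rt kernel booted with
   rcu_nocbs) could look like this:

   ```c
   #include <signal.h>
   #include <stdio.h>
   #include <time.h>
   #include <unistd.h>

   int main(void)
   {
       timer_t tm;
       struct sigevent sev = { 0 };
       struct itimerspec its = { 0 };

       /* No notification needed for this sketch. */
       sev.sigev_notify = SIGEV_NONE;

       if (timer_create(CLOCK_MONOTONIC, &sev, &tm)) {
           perror("timer_create");
           return 1;
       }

       /* Arm a 5 ms periodic timer, roughly what a periodic task setup does. */
       its.it_value.tv_nsec = 5 * 1000 * 1000;
       its.it_interval.tv_nsec = 5 * 1000 * 1000;
       if (timer_settime(tm, 0, &its, NULL)) {
           perror("timer_settime");
           return 1;
       }

       usleep(20000); /* let a few periods elapse */

       /* This is the step the trace points at: deleting an armed timer. */
       if (timer_delete(tm)) {
           perror("timer_delete");
           return 1;
       }

       puts("timer deleted cleanly");
       return 0;
   }
   ```

   On a stock kernel this runs silently to completion; the question in this thread is
   why the equivalent in-kernel path warns under rcu_nocbs.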

Re: [PATCH] net/tcp: Account for connection teardown handshake

2019-09-17 Thread Jan Kiszka via Xenomai

On 17.09.19 14:01, Sebastian Smolorz wrote:

When closing a TCP connection a handshake procedure is executed between the
peers. The close routine of the rttcp driver did not participate in
detecting the end of this handshake but rather waited one second inside
a close call unconditionally. Especially when peers are directly connected
this is a waste of time which can hurt a lot in some situations.

This patch replaces the msleep(1000) call with a timed wait on a
semaphore which gets signalled when the termination handshake is complete.

Signed-off-by: Sebastian Smolorz 
---
  kernel/drivers/net/stack/ipv4/tcp/tcp.c | 29 +++--
  1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/kernel/drivers/net/stack/ipv4/tcp/tcp.c 
b/kernel/drivers/net/stack/ipv4/tcp/tcp.c
index 54bafa80f..81089afd1 100644
--- a/kernel/drivers/net/stack/ipv4/tcp/tcp.c
+++ b/kernel/drivers/net/stack/ipv4/tcp/tcp.c
@@ -147,6 +147,9 @@ struct tcp_socket {
struct rtskb_queue retransmit_queue;
struct timerwheel_timer timer;

+   struct semaphore close_sem;


Sounds rather like a job for struct completion.


+   rtdm_nrtsig_t close_sig;
+
  #ifdef CONFIG_XENO_DRIVERS_NET_RTIPV4_TCP_ERROR_INJECTION
unsigned int packet_counter;
unsigned int error_rate;
@@ -1042,6 +1045,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
rt_tcp_send(ts, TCP_FLAG_ACK);
/* data receiving is not possible anymore */
rtdm_sem_destroy(&ts->sock.pending_sem);
+   rtdm_nrtsig_pend(&ts->close_sig);
goto feed;
} else if (ts->tcp_state == TCP_FIN_WAIT1) {
/* Send ACK */
@@ -1105,6 +1109,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
ts->tcp_state = TCP_CLOSE;
rtdm_lock_put_irqrestore(&ts->socket_lock, context);
/* socket destruction will be done on close() */
+   rtdm_nrtsig_pend(&ts->close_sig);
goto drop;
} else if (ts->tcp_state == TCP_FIN_WAIT1) {
ts->tcp_state = TCP_FIN_WAIT2;
@@ -1119,6 +1124,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
ts->tcp_state = TCP_TIME_WAIT;
rtdm_lock_put_irqrestore(&ts->socket_lock, context);
/* socket destruction will be done on close() */
+   rtdm_nrtsig_pend(&ts->close_sig);
goto feed;
}
}
@@ -1190,6 +1196,11 @@ static int rt_tcp_window_send(struct tcp_socket *ts, u32 
data_len, u8 *data_ptr)
return ret;
  }

+static void rt_tcp_close_signal_handler(rtdm_nrtsig_t *nrtsig, void *arg)
+{
+   up((struct semaphore *)arg);
+}
+
  static int rt_tcp_socket_create(struct tcp_socket *ts)
  {
rtdm_lockctx_t context;
@@ -1226,6 +1237,10 @@ static int rt_tcp_socket_create(struct tcp_socket *ts)
timerwheel_init_timer(&ts->timer, rt_tcp_retransmit_handler, ts);
rtskb_queue_init(&ts->retransmit_queue);

+   sema_init(&ts->close_sem, 0);
+   rtdm_nrtsig_init(&ts->close_sig, rt_tcp_close_signal_handler,
+&ts->close_sem);
+
  #ifdef CONFIG_XENO_DRIVERS_NET_RTIPV4_TCP_ERROR_INJECTION
ts->packet_counter = counter_start;
ts->error_rate = error_rate;
@@ -1237,6 +1252,7 @@ static int rt_tcp_socket_create(struct tcp_socket *ts)
/* enforce maximum number of TCP sockets */
if (free_ports == 0) {
rtdm_lock_put_irqrestore(&tcp_socket_base_lock, context);
+   rtdm_nrtsig_destroy(&ts->close_sig);
return -EAGAIN;
}
free_ports--;
@@ -1338,6 +1354,8 @@ static void rt_tcp_socket_destruct(struct tcp_socket *ts)

rtdm_event_destroy(&ts->conn_evt);

+   rtdm_nrtsig_destroy(&ts->close_sig);
+
/* cleanup already collected fragments */
rt_ip_frag_invalidate_socket(sock);

@@ -1362,6 +1380,7 @@ static void rt_tcp_close(struct rtdm_fd *fd)
struct rt_tcp_dispatched_packet_send_cmd send_cmd;
rtdm_lockctx_t context;
int signal = 0;
+   int ret;

rtdm_lock_get_irqsave(&ts->socket_lock, context);

@@ -1380,7 +1399,10 @@ static void rt_tcp_close(struct rtdm_fd *fd)
/* result is ignored */

/* Give the peer some time to reply to our FIN. */
-   msleep(1000);
+   ret = down_timeout(&ts->close_sem, msecs_to_jiffies(1000));
+   if (ret)
+   rtdm_printk("rttcp: waiting for FIN-ACK handshake returned %d\n",
+   ret);


Do we consider a timeout worth a kernel log entry? This could also be caused
by a lost connection, right?


And should we make the waiting time configurable?


} else if (ts->tcp_state == TCP_CLOSE_WAIT) {
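
[Editor's note] Jan's struct-completion suggestion above could be sketched against
this patch roughly as follows (untested illustration only; the field name close_comp
is hypothetical). Note the inverted return convention: wait_for_completion_timeout()
returns 0 on timeout and the remaining jiffies otherwise, whereas down_timeout()
returns an error code on timeout.

```diff
-	struct semaphore close_sem;
+	struct completion close_comp;
 	rtdm_nrtsig_t close_sig;

 static void rt_tcp_close_signal_handler(rtdm_nrtsig_t *nrtsig, void *arg)
 {
-	up((struct semaphore *)arg);
+	complete((struct completion *)arg);
 }

-	sema_init(&ts->close_sem, 0);
+	init_completion(&ts->close_comp);
 	rtdm_nrtsig_init(&ts->close_sig, rt_tcp_close_signal_handler,
-			 &ts->close_sem);
+			 &ts->close_comp);

 	/* Give the peer some time to reply to our FIN. */
-	ret = down_timeout(&ts->close_sem, msecs_to_jiffies(1000));
-	if (ret)
+	if (!wait_for_completion_timeout(&ts->close_comp,
+					 msecs_to_jiffies(1000)))
 		/* timed out or connection lost */
```

A completion also avoids any question of stale semaphore counts if the nrtsig were
ever pended more than once before close() runs.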

[PATCH] net/tcp: Account for connection teardown handshake

2019-09-17 Thread Sebastian Smolorz via Xenomai
When closing a TCP connection a handshake procedure is executed between the
peers. The close routine of the rttcp driver did not participate in
detecting the end of this handshake but rather waited one second inside
a close call unconditionally. Especially when peers are directly connected
this is a waste of time which can hurt a lot in some situations.

This patch replaces the msleep(1000) call with a timed wait on a
semaphore which gets signalled when the termination handshake is complete.

Signed-off-by: Sebastian Smolorz 
---
 kernel/drivers/net/stack/ipv4/tcp/tcp.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/kernel/drivers/net/stack/ipv4/tcp/tcp.c 
b/kernel/drivers/net/stack/ipv4/tcp/tcp.c
index 54bafa80f..81089afd1 100644
--- a/kernel/drivers/net/stack/ipv4/tcp/tcp.c
+++ b/kernel/drivers/net/stack/ipv4/tcp/tcp.c
@@ -147,6 +147,9 @@ struct tcp_socket {
struct rtskb_queue retransmit_queue;
struct timerwheel_timer timer;

+   struct semaphore close_sem;
+   rtdm_nrtsig_t close_sig;
+
 #ifdef CONFIG_XENO_DRIVERS_NET_RTIPV4_TCP_ERROR_INJECTION
unsigned int packet_counter;
unsigned int error_rate;
@@ -1042,6 +1045,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
rt_tcp_send(ts, TCP_FLAG_ACK);
/* data receiving is not possible anymore */
rtdm_sem_destroy(&ts->sock.pending_sem);
+   rtdm_nrtsig_pend(&ts->close_sig);
goto feed;
} else if (ts->tcp_state == TCP_FIN_WAIT1) {
/* Send ACK */
@@ -1105,6 +1109,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
ts->tcp_state = TCP_CLOSE;
rtdm_lock_put_irqrestore(&ts->socket_lock, context);
/* socket destruction will be done on close() */
+   rtdm_nrtsig_pend(&ts->close_sig);
goto drop;
} else if (ts->tcp_state == TCP_FIN_WAIT1) {
ts->tcp_state = TCP_FIN_WAIT2;
@@ -1119,6 +1124,7 @@ static void rt_tcp_rcv(struct rtskb *skb)
ts->tcp_state = TCP_TIME_WAIT;
rtdm_lock_put_irqrestore(&ts->socket_lock, context);
/* socket destruction will be done on close() */
+   rtdm_nrtsig_pend(&ts->close_sig);
goto feed;
}
}
@@ -1190,6 +1196,11 @@ static int rt_tcp_window_send(struct tcp_socket *ts, u32 
data_len, u8 *data_ptr)
return ret;
 }

+static void rt_tcp_close_signal_handler(rtdm_nrtsig_t *nrtsig, void *arg)
+{
+   up((struct semaphore *)arg);
+}
+
 static int rt_tcp_socket_create(struct tcp_socket *ts)
 {
rtdm_lockctx_t context;
@@ -1226,6 +1237,10 @@ static int rt_tcp_socket_create(struct tcp_socket *ts)
timerwheel_init_timer(&ts->timer, rt_tcp_retransmit_handler, ts);
rtskb_queue_init(&ts->retransmit_queue);

+   sema_init(&ts->close_sem, 0);
+   rtdm_nrtsig_init(&ts->close_sig, rt_tcp_close_signal_handler,
+&ts->close_sem);
+
 #ifdef CONFIG_XENO_DRIVERS_NET_RTIPV4_TCP_ERROR_INJECTION
ts->packet_counter = counter_start;
ts->error_rate = error_rate;
@@ -1237,6 +1252,7 @@ static int rt_tcp_socket_create(struct tcp_socket *ts)
/* enforce maximum number of TCP sockets */
if (free_ports == 0) {
rtdm_lock_put_irqrestore(&tcp_socket_base_lock, context);
+   rtdm_nrtsig_destroy(&ts->close_sig);
return -EAGAIN;
}
free_ports--;
@@ -1338,6 +1354,8 @@ static void rt_tcp_socket_destruct(struct tcp_socket *ts)

rtdm_event_destroy(&ts->conn_evt);

+   rtdm_nrtsig_destroy(&ts->close_sig);
+
/* cleanup already collected fragments */
rt_ip_frag_invalidate_socket(sock);

@@ -1362,6 +1380,7 @@ static void rt_tcp_close(struct rtdm_fd *fd)
struct rt_tcp_dispatched_packet_send_cmd send_cmd;
rtdm_lockctx_t context;
int signal = 0;
+   int ret;

rtdm_lock_get_irqsave(&ts->socket_lock, context);

@@ -1380,7 +1399,10 @@ static void rt_tcp_close(struct rtdm_fd *fd)
/* result is ignored */

/* Give the peer some time to reply to our FIN. */
-   msleep(1000);
+   ret = down_timeout(&ts->close_sem, msecs_to_jiffies(1000));
+   if (ret)
+   rtdm_printk("rttcp: waiting for FIN-ACK handshake returned %d\n",
+   ret);
} else if (ts->tcp_state == TCP_CLOSE_WAIT) {
/* Send FIN in CLOSE_WAIT */
send_cmd.ts = ts;
@@ -1394,7 +1416,10 @@ static void rt_tcp_close(struct rtdm_fd *fd)
/* result is ignored */

/* Give the peer some time to reply to our FIN. */
-   msleep(1

Re: Static build of rtnet

2019-09-17 Thread Jan Kiszka via Xenomai

On 17.09.19 10:29, Lange Norbert wrote:




-Original Message-
From: Jan Kiszka 
Sent: Dienstag, 17. September 2019 09:42
To: Lange Norbert ; Xenomai
(xenomai@xenomai.org) 
Subject: Re: Static build of rtnet



On 16.09.19 11:13, Lange Norbert via Xenomai wrote:

Hello,

I haven't tested this in a while, but building rtnet static will crash the
kernel when this module initializes.
With the various fixes and cleanups in master/next (like rtdm_available) that
might be worth a look?

I would hope to build a static kernel one day, and so far there are two
roadblocks:

-   rtnet (+ rtpacket) crashing when built statically

-   symbol name clashes with linux + rt drivers enabled (I could work on
fixing that for rt_igb at least)




Do you mean removing the "depends on m"?


Yes, ideally I would use a kernel without loadable modules, so kernel
upgrades/changes don't affect the rootfs (ideally read-only apart from a few
places).


Possibly, that moves the
initialization order in a way that causes trouble. I also just added another
case that exploits the module [1], but that would be solvable. More critical is
understanding the crashes.


I did a quick test removing the "depends on m" about a year ago; I brought this
up now because it might fit with the recent cleanups.


I don't think recent cleanups have changed the situation. Someone has to sit 
down, analyze the crashes, resolve them, and propose all changes for upstream. 
Also, startup scripts need to be adjusted to accept non-module RTnet.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: Static build of rtnet

2019-09-17 Thread Lange Norbert via Xenomai


> -Original Message-
> From: Jan Kiszka 
> Sent: Dienstag, 17. September 2019 09:42
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) 
> Subject: Re: Static build of rtnet
>
>
>
> On 16.09.19 11:13, Lange Norbert via Xenomai wrote:
> > Hello,
> >
> > I haven't tested this in a while, but building rtnet static will crash the
> > kernel when this module initializes.
> > With the various fixes and cleanups in master/next (like rtdm_available)
> > that might be worth a look?
> >
> > I would hope to build a static kernel one day, and so far there are two
> > roadblocks:
> >
> > -   rtnet (+ rtpacket) crashing when built statically
> >
> > -   symbol name clashes with linux + rt drivers enabled (I could work on
> > fixing that for rt_igb at least)
> >
>
> Do you mean removing the "depends on m"?

Yes, ideally I would use a kernel without loadable modules, so kernel
upgrades/changes don't affect the rootfs (ideally read-only apart from a few
places).

> Possibly, that moves the
> initialization order in a way that causes troubles. I also just added another 
> case
> that exploits the module [1], but that would be solvable. More critical is
> understanding the crashes.

I did a quick test removing the "depends on m" about a year ago; I brought this
up now because it might fit with the recent cleanups.

Regards, Norbert





Re: Static build of rtnet

2019-09-17 Thread Jan Kiszka via Xenomai

On 16.09.19 11:13, Lange Norbert via Xenomai wrote:

Hello,

I haven't tested this in a while, but building rtnet static will crash the
kernel when this module initializes.
With the various fixes and cleanups in master/next (like rtdm_available) that
might be worth a look?

I would hope to build a static kernel one day, and so far there are two
roadblocks:

-   rtnet (+ rtpacket) crashing when built statically

-   symbol name clashes with linux + rt drivers enabled (I could work on
fixing that for rt_igb at least)



Do you mean removing the "depends on m"? Possibly, that moves the 
initialization order in a way that causes trouble. I also just added another 
case that exploits the module [1], but that would be solvable. More critical is 
understanding the crashes.


Jan

[1] https://xenomai.org/pipermail/xenomai/2019-September/041583.html

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: RTDM open, open_rt, and open_nrt

2019-09-17 Thread Per Oberg via Xenomai
- On 16 Sep 2019, at 19:01, Jan Kiszka jan.kis...@siemens.com wrote:

> On 16.09.19 17:33, Per Oberg wrote:

> > - On 16 Sep 2019, at 16:59, Jan Kiszka jan.kis...@siemens.com wrote:

> >> On 16.09.19 14:41, Per Oberg via Xenomai wrote:
> >>> - On 16 Sep 2019, at 14:36, Per Öberg p...@wolfram.com wrote:

>  - On 16 Sep 2019, at 11:34, Jan Kiszka jan.kis...@siemens.com wrote:

> > On 16.09.19 09:32, Per Oberg via Xenomai wrote:
> >> Hello list

> >> I am trying to understand how rtdm works, and possibly why, from a 
> >> historical context. If there is a good place to read up on this stuff, 
> >> please let me know.

> >> It seems like in the rtdm-api there is only open, but no open_rt or 
> >> open_nrt.
> >> More specifically we have:
> >> - read_rt / read_nrt
> >> - recvmsg_rt / recvmsg_nrt
> >> - ioctl_rt / ioctl_nrt
> >> - .. etc.

> >> However, when studying an old xenomai2->3 ported driver, it seems there 
> >> used to be open_rt and open_nrt. The problem I was having before (see my 
> >> background comment below) arose because open had been mapped to the old 
> >> open_nrt code, which in turn used an rt-lock, thus mixing the two. When 
> >> switching to a regular mutex it "worked", as in it didn't complain.

> >> In a short discussion Jan Kiszka gave me the impression that open 
> >> could possibly
> >> end up being rt or nrt depending on situation.

> >> PÖ: I'm guessing that open is always non-rt and therefore a rtdm_lock 
> >> should be
> >> used? ...

> >> JK: This depends. If the open code needs to synchronize only with 
> >> other non-RT
> >> JK: paths, normal Linux locks are fine. If there is the need to sync 
> >> with the
> >> JK: interrupt handler or some of the _rt callbacks, rtdm_lock & Co. is 
> >> needed.

> >> So, how does this work? And why was (if it was) open_nrt and open_rt 
> >> replaced
> >> with a common open?

> > The original RTDM design foresaw the use case of creating and destroying 
> > resources like file descriptors for devices in RT context. That idea was 
> > dropped, as the trend in the core was clearly making it less realistic.
> > Therefore, we removed open/socket_rt from Xenomai 3.

> > If you have a driver that exploited open_rt, you need to remove all 
> > rt-sleeping operations from its open function. Whether rtdm_lock is an 
> > appropriate alternative depends on the driver's locking structure and the 
> > code run under the lock. rtdm_lock_get makes the lock holder unpreemptible.
> > So, if rtdm_mutex was chosen because of lengthy code under the lock, that 
> > would not be a good alternative. Then we would have to discuss what exactly 
> > is run there, and why.

>  Ok, can I read up on this somewhere? I found [1], is that still valid in 
>  this context? (Oh, and can we expect a third edition perhaps? =))

>  [1] Building Embedded Linux Systems: Concepts, Techniques, Tricks, and 
>  Traps, 2nd Edition, Kindle Edition

> >> Basic locking principles should be covered there, not sure if it had a
> >> Xenomai/RTDM section. If so, check if it was written/updated after 2015.

>> It has, but it's written in 2008, with references to a paper you wrote 
>> ("The Real-Time Driver Model and First Applications").

> >> Background
> >> 
> >> I recently wrote about a driver which warned about "drvlib.c:1349
> >> rtdm_mutex_timedlock". I got good answers which led me to some more 
> >> general questions, but instead of continuing in the old thread I thought 
> >> it better to start a new one, since it's not about the initial problem.
> >> The driver in question is the Peak Linux Driver for their CAN hardware, 
> >> see [1].

> >> [1] https://www.peak-system.com/fileadmin/media/linux/index.htm

> > Did you inform them about their problem already? Maybe they are willing 
> > to fix
> > it. We can't, it's not upstream code.

>  No, I haven't, but I will. The reason I haven't yet is because I was 
>  under the
>  impression that this didn't happen to them. I'm trying to compile 
>  everything
>  (driver, lib, and application) in a Yocto based SDK setup and it seems 
>  like
>  compilation flags and environment variables are getting squashed in 
>  interesting
>  ways. My reasoning so far was that I got this wrong somehow.

> >>> Forget that, I did actually ask them, and they answered in a manner that 
> >>> suggested I was doing something wrong (wrong compilation flags or user 
> >>> privileges). I never got rid of the warning though, and it fell into the 
> >>> dark corners of the backlog.