RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Monday, 16 December 2019 15:30
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: stalled head domain with 3.1rc4
>
> On 16.12.19 14:45, Jan Kiszka wrote:
> > On 16.12.19 14:03, Lange Norbert wrote:
> >>> I still need to find the actually used pattern in the latest kernel.
> >>> It's not the one I suspected.
> >>>
> >>> Do I have your config for this setup already?
> >>
> >> Attached now
> >
> > Analyzing... not very different to mine /wrt tracing.
> >
> > Can you reproduce the issue by running only some of the Xenomai test
> > cases while turning on tracing?
> >
>
> Please retry without CONFIG_JUMP_LABEL. I think this brings in the
> unsupported dynamic (and we should make it depend on !IPIPE).

Yes, I can't reproduce it anymore. Thanks.
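
For readers who want to check their own build for this combination, here is a minimal Python sketch. It assumes the kernel configuration is available as plain `.config` text; the helper names `config_enabled` and `jump_label_conflict` are made up for illustration and are not part of Xenomai or the I-pipe patch. It only reports whether `CONFIG_JUMP_LABEL` is enabled together with `CONFIG_IPIPE`, the combination suspected above.

```python
import re


def config_enabled(config_text: str, symbol: str) -> bool:
    """Return True if `symbol` is set to y or m in kernel .config text."""
    pattern = re.compile(rf"^{re.escape(symbol)}=(y|m)\s*$", re.MULTILINE)
    return pattern.search(config_text) is not None


def jump_label_conflict(config_text: str) -> bool:
    """Hypothetical check: I-pipe and jump labels enabled at the same time."""
    return (config_enabled(config_text, "CONFIG_IPIPE")
            and config_enabled(config_text, "CONFIG_JUMP_LABEL"))


# Illustrative config fragments, not taken from the attached config.
sample = """\
CONFIG_IPIPE=y
CONFIG_JUMP_LABEL=y
# CONFIG_FOO is not set
"""
fixed = sample.replace("CONFIG_JUMP_LABEL=y", "# CONFIG_JUMP_LABEL is not set")

print(jump_label_conflict(sample))
print(jump_label_conflict(fixed))
```

On a real system the text would typically be read from the `.config` file in the kernel build tree.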






Re: stalled head domain with 3.1rc4

2019-12-16 Thread Jan Kiszka via Xenomai

On 16.12.19 14:45, Jan Kiszka wrote:
> On 16.12.19 14:03, Lange Norbert wrote:
>>> I still need to find the actually used pattern in the latest kernel.
>>> It's not the one I suspected.
>>>
>>> Do I have your config for this setup already?
>>
>> Attached now
>
> Analyzing... not very different to mine /wrt tracing.
>
> Can you reproduce the issue by running only some of the Xenomai test
> cases while turning on tracing?
>

Please retry without CONFIG_JUMP_LABEL. I think this brings in the
unsupported dynamic (and we should make it depend on !IPIPE).


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: stalled head domain with 3.1rc4

2019-12-16 Thread Jan Kiszka via Xenomai

On 16.12.19 14:03, Lange Norbert wrote:
>> I still need to find the actually used pattern in the latest kernel.
>> It's not the one I suspected.
>>
>> Do I have your config for this setup already?
>
> Attached now

Analyzing... not very different to mine /wrt tracing.

Can you reproduce the issue by running only some of the Xenomai test
cases while turning on tracing?


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Monday, 16 December 2019 13:57
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: stalled head domain with 3.1rc4
>
> On 16.12.19 13:50, Lange Norbert wrote:
> >
> >> -----Original Message-----
> >> From: Jan Kiszka
> >> Sent: Friday, 13 December 2019 14:44
> >> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> >> Subject: Re: stalled head domain with 3.1rc4
> >>
> >> On 13.12.19 14:35, Lange Norbert wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Jan Kiszka
> >>>> Sent: Friday, 13 December 2019 14:13
> >>>> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> >>>> Subject: Re: stalled head domain with 3.1rc4
> >>>>
> >>>> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> >>>>> Same thing with panic trace enabled (another, longer trace with 4000
> >>>>> samples attached)
> >>>>>

Re: stalled head domain with 3.1rc4

2019-12-16 Thread Jan Kiszka via Xenomai

On 16.12.19 13:50, Lange Norbert wrote:
>
>> -----Original Message-----
>> From: Jan Kiszka
>> Sent: Friday, 13 December 2019 14:44
>> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
>> Subject: Re: stalled head domain with 3.1rc4
>>
>> On 13.12.19 14:35, Lange Norbert wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jan Kiszka
>>>> Sent: Friday, 13 December 2019 14:13
>>>> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
>>>> Subject: Re: stalled head domain with 3.1rc4
>>>>
>>>> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
>>>>> Same thing with panic trace enabled (another, longer trace with 4000
>>>>> samples attached)
>>>>>

RE: stalled head domain with 3.1rc4

2019-12-16 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Friday, 13 December 2019 14:44
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: stalled head domain with 3.1rc4
>
> On 13.12.19 14:35, Lange Norbert wrote:
> >
> >> -----Original Message-----
> >> From: Jan Kiszka
> >> Sent: Friday, 13 December 2019 14:13
> >> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> >> Subject: Re: stalled head domain with 3.1rc4
> >>
> >> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> >>> Same thing with panic trace enabled (another, longer trace with 4000
> >>> samples attached)
> >>>

Re: stalled head domain with 3.1rc4

2019-12-13 Thread Jan Kiszka via Xenomai
On 13.12.19 14:35, Lange Norbert wrote:
>
>> -----Original Message-----
>> From: Jan Kiszka
>> Sent: Friday, 13 December 2019 14:13
>> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
>> Subject: Re: stalled head domain with 3.1rc4
>>
>> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
>>> Same thing with panic trace enabled (another, longer trace with 4000
>>> samples attached)
>>>

RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai


> -----Original Message-----
> From: Jan Kiszka
> Sent: Friday, 13 December 2019 14:13
> To: Lange Norbert; Xenomai (xenomai@xenomai.org)
> Subject: Re: stalled head domain with 3.1rc4
>
> On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> > Same thing with panic trace enabled (another, longer trace with 4000
> > samples attached)
> >

Re: stalled head domain with 3.1rc4

2019-12-13 Thread Jan Kiszka via Xenomai
On 13.12.19 13:25, Lange Norbert via Xenomai wrote:
> Same thing with panic trace enabled (another, longer trace with 4000 samples 
> attached)
> 
> [  292.743618] I-pipe: Detected stalled head domain, probably caused by a bug.
> [  292.743618] A critical section may have been left unterminated.
> [  292.757195] CPU: 0 PID: 1159 Comm: trace-cmd Tainted: GW 
> 4.19.84-xeno8-static #1
> [  292.765986] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, 
> BIOS 5.12.30.21.20 08/05/2019
> [  292.775304] I-pipe domain: Linux
> [  292.778546] Call Trace:
> [  292.781005]  
> [  292.783034]  dump_stack+0x8c/0xc0
> [  292.786363]  ipipe_root_only.cold+0x11/0x32
> [  292.790560]  ipipe_stall_root+0xe/0x60
> [  292.794322]  __ipipe_trap_prologue+0x11d/0x2f0
> [  292.798782]  int3+0x45/0x70
> [  292.801592] RIP: 0010:xntimer_start+0x3a/0x330
> [  292.806050] Code: 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 
> 4c 8b 37 48 63 40 18 4d 8b a6 90 00 00 00 4c 03 24 c5 00 e3f
> [  292.824832] RSP: 0018:97d43ac03e78 EFLAGS: 0082
> [  292.830075] RAX:  RBX: 00025090 RCX: 
> 
> [  292.837219] RDX:  RSI: 000c6130 RDI: 
> 97d43aeb0708
> [  292.844367] RBP: 97d43aeb0708 R08:  R09: 
> 0027e6d0
> [  292.851514] R10: 0043f5344961 R11: 0043f5344961 R12: 
> 97d43aebb020
> [  292.858658] R13:  R14: 9e03bca0 R15: 
> 000c6130
> [  292.865804]  ? xntimer_start+0x3a/0x330
> [  292.869653]  program_htick_shot+0x8d/0x130
> [  292.873761]  clockevents_program_event+0x88/0xe0
> [  292.878392]  hrtimer_interrupt+0x140/0x230
> [  292.882502]  smp_apic_timer_interrupt+0x46/0x110
> [  292.887132]  __ipipe_do_sync_stage+0x15d/0x1c0
> [  292.891592]  __ipipe_handle_irq+0xa0/0x220
> [  292.895699]  ipipe_reschedule_interrupt+0x12/0x40
> [  292.900412]  
> [  292.902525] RIP: 0010:smp_call_function_many+0x1b6/0x250
> [  292.907848] Code: e8 4f 23 6c 00 3b 05 5d 5f 01 01 89 c7 0f 83 c4 fe ff ff 
> 48 63 c7 48 8b 0b 48 03 0c c5 00 e3 f1 9d 8b 41 18 a8 01 745
> [  292.926626] RSP: 0018:ab24c0c9bb40 EFLAGS: 0202 ORIG_RAX: 
> ff15
> [  292.934210] RAX: 0003 RBX: 97d43aeb4c00 RCX: 
> 97d43b2b7ac0
> [  292.941357] RDX: 0001 RSI:  RDI: 
> 0001
> [  292.948500] RBP: 9d017b70 R08: 97d43aeb4c08 R09: 
> 0002e248
> [  292.955644] R10: 97d43aeb7780 R11: 97d43a003800 R12: 
> 
> [  292.962789] R13: 97d43aeb4c08 R14: 0004 R15: 
> 0001
> [  292.969936]  ? optimize_nops.isra.0+0x90/0x90
> [  292.974306]  ? optimize_nops.isra.0+0x90/0x90
> [  292.978673]  ? xntimer_start+0x39/0x330
> [  292.982519]  ? xntimer_start+0x3a/0x330
> [  292.986368]  on_each_cpu+0x28/0x50
> [  292.989782]  ? xntimer_start+0x39/0x330
> [  292.993630]  text_poke_bp+0x68/0xde
> [  292.997128]  ? trace_event_raw_event_cobalt_thread_suspend+0xe0/0xe0
> [  293.003495]  __jump_label_transform.isra.0+0x102/0x150
> [  293.008645]  arch_jump_label_transform+0x2e/0x40
> [  293.013276]  __jump_label_update+0x67/0xa0
> [  293.017382]  static_key_slow_inc_cpuslocked+0x75/0x80
> [  293.022445]  static_key_slow_inc+0x16/0x20
> [  293.026555]  tracepoint_probe_register_prio+0x1f3/0x2a0
> [  293.031790]  ? trace_event_raw_event_cobalt_thread_suspend+0xe0/0xe0
> [  293.038155]  __ftrace_event_enable_disable+0x6f/0x230
> [  293.043217]  __ftrace_set_clr_event_nolock+0xe6/0x130
> [  293.048280]  system_enable_write+0xaa/0xe0
> [  293.052392]  do_iter_write+0x140/0x180
> [  293.056151]  vfs_writev+0xa6/0xf0
> [  293.059484]  do_writev+0x5f/0x100
> [  293.062813]  do_syscall_64+0x82/0x4e0
> [  293.066489]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  293.071554] RIP: 0033:0x45874c
> [  293.074619] Code: ed 01 48 29 d0 49 83 c5 10 49 8b 55 08 48 63 dd 48 29 c2 
> 49 01 45 00 49 89 55 08 49 63 7f 78 4c 89 e0 4c 89 ee 48 898
> [  293.093397] RSP: 002b:7ffc91a57a00 EFLAGS: 0202 ORIG_RAX: 
> 0014
> [  293.100983] RAX: ffda RBX: 0002 RCX: 
> 0045874c
> [  293.108129] RDX: 0002 RSI: 7ffc91a57a10 RDI: 
> 0005
> [  293.115275] RBP: 0002 R08: 00b7d4e0 R09: 
> 8080808080808080
> [  293.122422] R10: 0005 R11: 0202 R12: 
> 0014
> [  293.129569] R13: 7ffc91a57a10 R14: 0001 R15: 
> 00b7d4e0
> [  293.136722] I-pipe tracer log (100 points):
> [  293.140917]  |*#func0 ipipe_trace_panic_freeze+0x0 
> (ipipe_root_only+0xcf)
> [  293.149511]  |*#func0 ipipe_root_only+0x0 
> (ipipe_stall_root+0xe)
> [  293.157323]  |*#func   -1 ipipe_stall_root+0x0 
> (__ipipe_trap_prologue+0x11d)
> [  293.165833]  |*#func   -1 ipipe_test_root+0x0 
> (__ipipe_trap_prologue+0xbf)
> [  29
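
The call trace quoted above shows an `int3` trap taken in `xntimer_start` while `text_poke_bp` and the jump-label code were on the stack. A minimal Python sketch for spotting that pattern in a saved kernel log follows. It assumes the log is available as a string; the function names `frames` and `involves_jump_label` are hypothetical, and the symbol list is simply copied from this trace rather than being an exhaustive set.

```python
import re

# Matches "symbol+0xoffset/0xsize" as printed in kernel call traces.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Symbols seen in the trace from this thread (an assumption, not a complete list).
PATCHING_SYMBOLS = {
    "text_poke_bp",
    "__jump_label_transform.isra.0",
    "arch_jump_label_transform",
    "__jump_label_update",
}


def frames(oops_text: str) -> list:
    """Return the symbol names of all frames found in the log text."""
    return FRAME_RE.findall(oops_text)


def involves_jump_label(oops_text: str) -> bool:
    """True if any frame belongs to the jump-label patching path."""
    return any(sym in PATCHING_SYMBOLS for sym in frames(oops_text))


# A few lines copied from the trace above.
sample = """\
[  292.798782]  int3+0x45/0x70
[  292.801592] RIP: 0010:xntimer_start+0x3a/0x330
[  292.993630]  text_poke_bp+0x68/0xde
[  293.003495]  __jump_label_transform.isra.0+0x102/0x150
"""

print(frames(sample))
print(involves_jump_label(sample))
```

This is only a triage aid: it flags that code patching was active when the trap hit and does not prove a cause.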

RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
nc  -24 __update_load_avg_se+0x0 (update_load_avg+0x341)
[  293.638525]    #func  -25 cgroup_rstat_updated+0x0 (__cgroup_account_cputime+0x24)
[  293.647558]    #func  -25 __cgroup_account_cputime+0x0 (update_curr+0x101)
[  293.655891]    #func  -26 cpuacct_charge+0x0 (update_curr+0xe4)
[  293.663270]    #func  -26 update_min_vruntime+0x0 (update_curr+0x73)
[  293.671087]    #func  -26 update_curr+0x0 (task_tick_fair+0x3d)
[  293.678468]    #func  -27 task_tick_fair+0x0 (scheduler_tick+0x5d)
[  293.686107]    #func  -27 __accumulate_pelt_segments+0x0 (update_irq_load_avg+0x22c)
[  293.695310]    #func  -28 update_irq_load_avg+0x0 (scheduler_tick+0x4b)
[  293.703381]    #func  -28 update_rq_clock+0x0 (scheduler_tick+0x4b)
[  293.711104]    #func  -28 _raw_spin_lock+0x0 (scheduler_tick+0x3c)
[  293.718744]    #func  -29 scheduler_tick+0x0 (update_process_times+0x69)
[  293.726901]  | #end     0x8001  -29 ipipe_test_root+0x55 (<>)
[  293.733930]  | #begin   0x8001  -30 ipipe_test_root+0x40 (<>)
[  293.740959]    #func  -30 ipipe_test_root+0x0 (irq_work_run_list+0xe)
[  293.748862]    #func  -30 rcu_segcblist_ready_cbs+0x0 (rcu_check_callbacks+0x16d)
[  293.757803]    #func  -31 rcu_segcblist_ready_cbs+0x0 (rcu_check_callbacks+0x16d)
[  293.766742]    #func  -31 rcu_check_callbacks+0x0 (update_process_times+0x41)
[  293.775335]  | #end     0x8001  -32 ipipe_stall_root+0x53 (<>)
[  293.782451]  | #begin   0x8001  -32 ipipe_stall_root+0x47 (<>)
[  293.789569]  | #end     0x8001  -32 ipipe_root_only+0x74 (<>)
[  293.796601]  | #begin   0x8001  -33 ipipe_root_only+0x68 (<>)
[  293.803633]    #func  -33 ipipe_root_only+0x0 (ipipe_stall_root+0xe)
[  293.811445]    #func  -33 ipipe_stall_root+0x0 (update_process_times+0x3a)
[  293.819778]  | #end     0x8001  -34 ipipe_root_only+0x74 (<>)
[  293.826809]  | #begin   0x8001  -34 ipipe_root_only+0x68 (<>)
[  293.833837]    #func  -35 ipipe_root_only+0x0 (ipipe_restore_root+0xe)
[  293.841821]    #func  -35 ipipe_restore_root+0x0 (update_process_times+0x3a)
[  293.850327]  | #end     0x8001  -35 ipipe_stall_root+0x53 (<>)
[  293.857444]  | #begin   0x8001  -36 ipipe_stall_root+0x47 (<>)
[  293.864560]  | #end     0x8001  -36 ipipe_root_only+0x74 (<>)
[  293.871590]  | #begin   0x8001  -37 ipipe_root_only+0x68 (<>)
[  293.878622]    #func  -37 ipipe_root_only+0x0 (ipipe_stall_root+0xe)
[  293.886434]    #func  -37 ipipe_stall_root+0x0 (raise_softirq+0x1f)
[  293.894163]  | #end     0x8001  -38 ipipe_test_root+0x55 (<>)
[  293.901192]  | #begin   0x8001  -38 ipipe_test_root+0x40 (<>)
[  293.908221]    #func  -38 ipipe_test_root+0x0 (raise_softirq+0x13)
[  293.915861]    #func  -39 raise_softirq+0x0 (update_process_times+0x3a)
[  293.923933]    #func  -39 hrtimer_run_queues+0x0 (run_local_timers+0x1a)
[  293.932092]    #func  -39 run_local_timers+0x0 (update_process_times+0x3a)
[  293.960301] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enabl1


RE: stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
Attached is the trace starting 1 second before the bug (might help you more).
(The last one was the same trace, cut at the time of the bug.)

> -----Original Message-----
> From: Lange Norbert 
> Sent: Freitag, 13. Dezember 2019 11:54
> To: Lange Norbert 
> Cc: Philippe Gerum (r...@xenomai.org) 
> Subject: RE: stalled head domain with 3.1rc4
>
> I have now removed the calls to recv/send_mmsg and instead call the single
> *msg variant in a loop. This makes the bug appear less often,
> but it has now triggered once when stopping the trace, so there might be
> something useful in there for you.
> (The last sendmsg/recvmsg pair at 1842.622889 -> 1842.622956 is the IDDP
> socket used to wake up the other process.)
>
> [ 1842.420470] I-pipe: Detected stalled head domain, probably caused by a bug.
> [ 1842.420470] A critical section may have been left unterminated.
> [ 1842.434053] CPU: 0 PID: 1353 Comm: trace-cmd Not tainted 4.19.84-xeno8-static #1
> [ 1842.441456] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.20 08/05/2019
> [ 1842.450773] I-pipe domain: Linux
> [ 1842.454014] Call Trace:
> [ 1842.456472]  
> [ 1842.458502]  dump_stack+0x8c/0xc0
> [ 1842.461829]  ipipe_stall_root+0xc/0x30
> [ 1842.465591]  __ipipe_trap_prologue+0x100/0x210
> [ 1842.470045]  int3+0x45/0x70
> [ 1842.472854] RIP: 0010:xntimer_start+0x3a/0x330
> [ 1842.477308] Code: 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 10 48 8b 47 70 4c 8b 37 48 63 40 18 4d 8b a6 90 00 00 00 4c 03 24 c5 00 d3f
> [ 1842.496083] RSP: 0018:8fe9fba03e80 EFLAGS: 0082
> [ 1842.501324] RAX:  RBX: 00025090 RCX:
> [ 1842.508468] RDX:  RSI: 0003b55f RDI: 8fe9fba305c8
> [ 1842.515609] RBP: 8fe9fba305c8 R08:  R09: 01acc52f873d
> [ 1842.522754] R10: 01acc52b974d R11: 01acc52b974d R12: 8fe9fba3aee0
> [ 1842.529898] R13:  R14: b223bbe0 R15: 0003b55f
> [ 1842.537044]  ? xntimer_start+0x3a/0x330
> [ 1842.540889]  ? enqueue_hrtimer+0x36/0x90
> [ 1842.544823]  program_htick_shot+0x83/0x100
> [ 1842.548931]  clockevents_program_event+0x88/0xe0
> [ 1842.553561]  hrtimer_interrupt+0x140/0x230
> [ 1842.557669]  smp_apic_timer_interrupt+0x46/0x110
> [ 1842.562296]  __ipipe_do_sync_stage+0x130/0x180
> [ 1842.566751]  __ipipe_handle_irq+0x94/0x200
> [ 1842.570860]  apic_timer_interrupt+0x12/0x40
> [ 1842.575054]  
> [ 1842.577163] RIP: 0010:smp_call_function_many+0x1b6/0x250
> [ 1842.582485] Code: e8 6f a0 6b 00 3b 05 dd 60 01 01 89 c7 0f 83 c4 fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 00 d3 11 b2 8b 41 18 a8 01 745
> [ 1842.601264] RSP: 0018:957380bbfba8 EFLAGS: 0202 ORIG_RAX: ff13
> [ 1842.608846] RAX: 0003 RBX: 8fe9fba34ac0 RCX: 8fe9fbbb8680
> [ 1842.615989] RDX: 0001 RSI:  RDI: 0003
> [ 1842.623133] RBP: b12179a0 R08: 8fe9fba34ac8 R09:
> [ 1842.630276] R10: 000a R11: f000 R12:
> [ 1842.637417] R13: 8fe9fba34ac8 R14: 0004 R15: 0001
> [ 1842.644565]  ? optimize_nops.isra.0+0x90/0x90
> [ 1842.648934]  ? smp_call_function_many+0x191/0x250
> [ 1842.653650]  ? optimize_nops.isra.0+0x90/0x90
> [ 1842.658015]  ? xntimer_start+0x39/0x330
> [ 1842.661859]  ? xntimer_start+0x3a/0x330
> [ 1842.665705]  on_each_cpu+0x28/0x50
> [ 1842.669116]  ? xntimer_start+0x39/0x330
> [ 1842.672959]  text_poke_bp+0x91/0xde
> [ 1842.676460]  __jump_label_transform.isra.0+0x102/0x150
> [ 1842.681610]  arch_jump_label_transform+0x2e/0x40
> [ 1842.686239]  __jump_label_update+0x67/0xa0
> [ 1842.690348]  __static_key_slow_dec_cpuslocked+0x30/0x80
> [ 1842.695583]  static_key_slow_dec+0x23/0x50
> [ 1842.699689]  tracepoint_probe_unregister+0x176/0x1b0
> [ 1842.704661]  trace_event_reg+0x31/0xa0
> [ 1842.708421]  ? mutex_lock+0x13/0x30
> [ 1842.711921]  __ftrace_event_enable_disable+0x120/0x230
> [ 1842.717072]  __ftrace_set_clr_event_nolock+0xe6/0x130
> [ 1842.722133]  system_enable_write+0xaa/0xe0
> [ 1842.726240]  __vfs_write+0x34/0x190
> [ 1842.729739]  ? __check_heap_object+0x5/0x120
> [ 1842.734021]  ? __check_object_size+0x136/0x147
> [ 1842.738474]  ? rcu_all_qs+0x5/0x80
> [ 1842.741884]  vfs_write+0xb6/0x190
> [ 1842.745210]  ksys_write+0x57/0xd0
> [ 1842.748537]  do_syscall_64+0x78/0x3c0
> [ 1842.752212]  ? __do_page_fault+0x207/0x400
> [ 1842.756319]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1842.761381] RIP: 0033:0x45f5d9
> [ 1842.76] Code: 89 d6 0f 05 c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 
> f8
> 4d 89 c2 48 89 f7 4d 89 c8 48 89
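
The change described in the quoted message (calling the single *msg variant
in a loop instead of the batched recv/send_mmsg calls) could look roughly like
the sketch below. This is only an illustration under assumptions: the helper
names send_msgs_in_loop and recv_msgs_in_loop are hypothetical, the error
handling policy is made up, and plain POSIX sendmsg()/recvmsg() are used here,
whereas the real application would go through the Cobalt-wrapped calls on its
RTNet packet sockets.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not from the original application): send n
 * messages one by one with sendmsg() rather than as one batch.
 * Returns the number of messages sent, or -1 if the first one failed
 * (errno is then set by sendmsg()).
 */
static int send_msgs_in_loop(int fd, struct msghdr *msgs, size_t n, int flags)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (sendmsg(fd, &msgs[i], flags) < 0)
			return i > 0 ? (int)i : -1;
	}
	return (int)n;
}

/*
 * Hypothetical receive counterpart: fill up to n messages with
 * recvmsg() and store each received length in lens[i].
 * Returns the number of messages received, or -1 if the first failed.
 */
static int recv_msgs_in_loop(int fd, struct msghdr *msgs, ssize_t *lens,
			     size_t n, int flags)
{
	size_t i;

	for (i = 0; i < n; i++) {
		lens[i] = recvmsg(fd, &msgs[i], flags);
		if (lens[i] < 0)
			return i > 0 ? (int)i : -1;
	}
	return (int)n;
}
```

The return convention loosely mirrors the batched calls (count of messages
handled, -1 only if nothing was handled), so a caller could switch between the
two variants with little change. The loop costs one syscall per packet instead
of one per batch.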

stalled head domain with 3.1rc4

2019-12-13 Thread Lange Norbert via Xenomai
Just had a bug message pop up. It's triggered by enabling tracing while we
have 2 processes running that use IDDP, XDDP and RTNet (just packet sockets,
no IP stack).
Some points:

-   trace-cmd stores its output in tmp, so it shouldn't touch filesystems
other than tmpfs and sysfs

-   upon starting this, our process complains about a 150ms hole in CPU
time (likely the time of the bug)

-   it seems to happen only the first time after a boot

-   running trace-cmd "dry" (without our processes) doesn't trigger the
bug. Neither does running it with active communication disabled in our
project (up to 15 eth packets per millisecond in both directions via packet
socket, using the new send/recv_mmsg calls).

-   the system seems to remain stable afterwards

-   a trace is attached; it was not taken right after triggering the bug
(then it would just contain our project in error state) but shows our project
with active communication (i.e. trace-cmd started a second time after a bug)


# trace-cmd record -e 'cobalt*'
[  160.443596] I-pipe: Detected stalled head domain, probably caused by a bug.
[  160.443596] A critical section may have been left unterminated.
[  160.457178] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.84-xeno8-static #1
[  160.464323] Hardware name: TQ-Group TQMxE39M/Type2 - Board Product Name, BIOS 5.12.30.21.20 08/05/2019
[  160.473640] I-pipe domain: Linux
[  160.476877] Call Trace:
[  160.479345]  dump_stack+0x8c/0xc0
[  160.482672]  ipipe_stall_root+0xc/0x30
[  160.486436]  __ipipe_trap_prologue+0x100/0x210
[  160.490894]  int3+0x45/0x70
[  160.493702] RIP: 0010:xnthread_resume+0x75/0x3a0
[  160.498329] Code: 0f eb 00 74 21 31 c0 ba 01 00 00 00 f0 0f b1 15 c5 0f eb 00 85 c0 0f 85 db 02 00 00 4c 8b 2c 24 89 1d af 0f eb 00 4d0
[  160.517108] RSP: 0018:9934400a7dd8 EFLAGS: 0046
[  160.522349] RAX: 0001 RBX: 0001 RCX: 7f37aa603700
[  160.529490] RDX: 0001 RSI: 0080 RDI: 9934405dc240
[  160.536631] RBP: 9934405dc240 R08: 000f7df7 R09: 9140f8cb2800
[  160.543774] R10: 03b3 R11: 000b8c4a R12: 00025090
[  160.550918] R13: 0003 R14: 0080 R15: 0080
[  160.558064]  ? xnthread_resume+0x75/0x3a0
[  160.562083]  ? xnthread_resume+0x1f/0x3a0
[  160.566104]  ipipe_migration_hook+0xda/0x1d0
[  160.570385]  complete_domain_migration+0x79/0xe0
[  160.575011]  __ipipe_switch_tail+0x39/0x50
[  160.579118]  __schedule+0x2d0/0x890
[  160.582615]  schedule_idle+0x28/0x40
[  160.586203]  do_idle+0x101/0x130
[  160.589440]  cpu_startup_entry+0x6f/0x80
[  160.593373]  start_secondary+0x169/0x1b0
[  160.597312]  secondary_startup_64+0xa4/0xb0



Mit besten Grüßen / Kind regards

NORBERT LANGE

AT-RD3

ANDRITZ HYDRO GmbH
Eibesbrunnergasse 20
1120 Vienna / AUSTRIA
p: +43 50805 56684
norbert.la...@andritz.com
andritz.com




-- next part --
A non-text attachment was scrubbed...
Name: trace.dat.xz
Type: application/octet-stream
Size: 2775472 bytes
Desc: trace.dat.xz
URL: