> -----Original Message-----
> From: Jan Kiszka <jan.kis...@siemens.com>
> Sent: Dienstag, 25. Februar 2020 18:05
> To: Lange Norbert <norbert.la...@andritz.com>; Greg Gallagher
> <g...@embeddedgreg.com>
> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 25.02.20 16:55, Lange Norbert wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Kiszka <jan.kis...@siemens.com>
> >> Sent: Dienstag, 25. Februar 2020 16:47
> >> To: Lange Norbert <norbert.la...@andritz.com>; Greg Gallagher
> >> <g...@embeddedgreg.com>
> >> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Greg Gallagher <g...@embeddedgreg.com>
> >>>> Sent: Dienstag, 25. Februar 2020 16:24
> >>>> To: Lange Norbert <norbert.la...@andritz.com>
> >>>> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>;
> Philippe
> >>>> Gerum (r...@xenomai.org) <r...@xenomai.org>
> >>>> Subject: Re: Interrupt timeout
> >>>>
> >>>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> >> ATTACHMENTS.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >>>> <xenomai@xenomai.org> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I hope you can give me some pointers to understand why this Bug
> >>>> happened.
> >>>>> It seems an interrupt got lost somehow, maybe some issue with
> >>>> leveltriggers?
> >>>>>
> >>>>> Note that I run on an Apollo Lake, which would normally use
> >>>>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The
> system
> >>>>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>>>
> >>>>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
> >> 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci
> Kernel
> >>>>> modules: sdhci_pci
> >>>>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel
> modules:
> >>>>> sdhci_pci
> >>>>>
> >>>>> # cat /proc/interrupts
> >>>>>               CPU0       CPU1       CPU2       CPU3
> >>>>>      0:         50          0          0          0   IO-APIC    2-edge 
> >>>>>      timer
> >>>>>      4:          0          0        498          0   IO-APIC    4-edge 
> >>>>>      ttyS0
> >>>>>      8:          0          0          0          0   IO-APIC    8-edge 
> >>>>>      rtc0
> >>>>>      9:          0          0          0          0   IO-APIC    
> >>>>> 9-fasteoi   acpi
> >>>>>     20:         20          0          0          0   IO-APIC   
> >>>>> 20-fasteoi   i801_smbus
> >>>>>     39:          0       1315          0          0   IO-APIC   
> >>>>> 39-fasteoi   mmc0
> >>>>>    129:          0          1          0          0   PCI-MSI 
> >>>>> 1048576-edge      enp2s0
> >>>>>    130:          0          0        839          0   PCI-MSI 
> >>>>> 1048577-edge      enp2s0-rx-0
> >>>>>    131:          0          0          0        773   PCI-MSI 
> >>>>> 1048578-edge      enp2s0-rx-1
> >>>>>    132:        985          0          0          0   PCI-MSI 
> >>>>> 1048579-edge      enp2s0-tx-0
> >>>>>    133:          0        773          0          0   PCI-MSI 
> >>>>> 1048580-edge      enp2s0-tx-1
> >>>>>    134:          0          0          0          1   PCI-MSI 
> >>>>> 1572864-edge      enp3s0
> >>>>>    135:      11464          0          0          0   PCI-MSI 
> >>>>> 1572865-edge      enp3s0-
> rx-0
> >>>>>    136:          0        781          0          0   PCI-MSI 
> >>>>> 1572866-edge      enp3s0-rx-1
> >>>>>    137:          0          0        899          0   PCI-MSI 
> >>>>> 1572867-edge      enp3s0-tx-0
> >>>>>    138:          0          0          0       9701   PCI-MSI 
> >>>>> 1572868-edge      enp3s0-tx-
> 1
> >>>>>    139:          1          0          0          0   PCI-MSI 
> >>>>> 2097152-edge      enp4s0
> >>>>>    140:          0       1985          0          0   PCI-MSI 
> >>>>> 2097153-edge      enp4s0-
> TxRx-
> >> 0
> >>>>>    141:          0          0        774          0   PCI-MSI 
> >>>>> 2097154-edge      enp4s0-
> TxRx-1
> >>>>>    142:          0          0          0       1905   PCI-MSI 
> >>>>> 2097155-edge      enp4s0-
> TxRx-
> >> 2
> >>>>>    143:        775          0          0          0   PCI-MSI 
> >>>>> 2097156-edge      enp4s0-
> TxRx-3
> >>>>>    144:          0          0      97790          0   PCI-MSI 
> >>>>> 344064-edge      xhci_hcd
> >>>>>    NMI:          0          0          0          0   Non-maskable 
> >>>>> interrupts
> >>>>>    LOC:      47839     147583      13807      32602   Local timer 
> >>>>> interrupts
> >>>>>    SPU:          0          0          0          0   Spurious 
> >>>>> interrupts
> >>>>>    PMI:          0          0          0          0   Performance 
> >>>>> monitoring interrupts
> >>>>>    IWI:          0          0          0          0   IRQ work 
> >>>>> interrupts
> >>>>>    RTR:          0          0          0          0   APIC ICR read 
> >>>>> retries
> >>>>>    RES:      11955       6931       6567       4946   Rescheduling 
> >>>>> interrupts
> >>>>>    CAL:        268        223        210        217   Function call 
> >>>>> interrupts
> >>>>>    TLB:         63         57         61         50   TLB shootdowns
> >>>>>    TRM:          0          0          0          0   Thermal event 
> >>>>> interrupts
> >>>>>    THR:          0          0          0          0   Threshold APIC 
> >>>>> interrupts
> >>>>>    MCE:          0          0          0          0   Machine check 
> >>>>> exceptions
> >>>>>    MCP:          5          6          6          6   Machine check 
> >>>>> polls
> >>>>>    ERR:          0
> >>>>>    MIS:          0
> >>>>>    PIN:          0          0          0          0   Posted-interrupt 
> >>>>> notification event
> >>>>>    NPI:          0          0          0          0   Nested 
> >>>>> posted-interrupt event
> >>>>>    PIW:          0          0          0          0   Posted-interrupt 
> >>>>> wakeup event
> >>>>>
> >>>>>
> >>>>
> >>>>> [  238.245509] mmc0: Timeout waiting for hardware interrupt.
> >>>>> [  238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [  238.257422] mmc0: sdhci: Sys addr:  0x20000008 |
> >>>>> Version:  0x00001002 [  238.263900] mmc0: sdhci: Blk size:  0x00007200
> >>>>> | Blk cnt:  0x00000000 [  238.270373] mmc0: sdhci: Argument:
> 0x00118678
> >> |
> >>>> Trn mode: 0x0000002b
> >>>>> [  238.276848] mmc0: sdhci: Present:   0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [  238.283323] mmc0: sdhci: Power:     0x0000000b | Blk gap:
> 0x00000080
> >>>>> [  238.289797] mmc0: sdhci: Wake-up:   0x00000000 | Clock:
> 0x00000007
> >>>>> [  238.296265] mmc0: sdhci: Timeout:   0x00000007 | Int stat:
> 0x00000003
> >>>>> [  238.302734] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [  238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [  238.315682] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [  238.322154] mmc0: sdhci: Cmd:       0x0000193a | Max curr:
> 0x00000000
> >>>>> [  238.328618] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
> >>>>> [  238.335080] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [  238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [  238.346016]
> mmc0:
> >>>>> sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 238.353180] mmc0: sdhci:
> >>>> ============================================
> >>>>> [  248.485507] mmc0: Timeout waiting for hardware interrupt.
> >>>>> [  248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [  248.497412] mmc0: sdhci: Sys addr:  0x20000008 |
> >>>>> Version:  0x00001002 [  248.503887] mmc0: sdhci: Blk size:  0x00007200
> >>>>> | Blk cnt:  0x00000000 [  248.510362] mmc0: sdhci: Argument:
> 0x00000000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [  248.516838] mmc0: sdhci: Present:   0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [  248.523313] mmc0: sdhci: Power:     0x0000000b | Blk gap:
> 0x00000080
> >>>>> [  248.529790] mmc0: sdhci: Wake-up:   0x00000000 | Clock:
> 0x00000007
> >>>>> [  248.536266] mmc0: sdhci: Timeout:   0x0000000e | Int stat:
> 0x00018002
> >>>>> [  248.542742] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [  248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [  248.555693] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [  248.562167] mmc0: sdhci: Cmd:       0x00000c1b | Max curr:
> 0x00000000
> >>>>> [  248.568643] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
> >>>>> [  248.575117] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [  248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [  248.586065]
> mmc0:
> >>>>> sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 248.593228] mmc0: sdhci:
> >>>> ============================================
> >>>>> [  249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >>>>> [  249.143462] rcu:     1-...!: (1 GPs behind)
> >> idle=84a/1/0x4000000000000000
> >>>> softirq=5875/5876 fqs=2479
> >>>>> [  249.152458] rcu:     (detected by 3, t=21016 jiffies, g=5793, q=391)
> >>>>> [  249.158588] Sending NMI from CPU 3 to CPUs 1:
> >>>>> [  249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
> >>>>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [  249.174284] rcu:
> >> RCU
> >>>> grace-period kthread stack dump:
> >>>>> [  249.179364] rcu_sched       I    0    10      2 0x80000000
> >>>>> [  249.184882] Call Trace:
> >>>>> [  249.187360]  ? __schedule+0x340/0x8a0 [  249.191046]
> >>>>> schedule+0x32/0x80 [  249.194211]  schedule_timeout+0x180/0x360 [
> >>>>> 249.198248]  ? __next_timer_interrupt+0xd0/0xd0 [  249.202806]
> >>>>> rcu_gp_kthread+0x522/0xec0 [  249.206667]  ?
> >> call_rcu_sched+0x20/0x20
> >>>>> [  249.210528]  kthread+0x115/0x130 [  249.213779]  ?
> >>>>> kthread_create_worker_on_cpu+0x70/0x70
> >>>>> [  249.218861]  ret_from_fork+0x20/0x30 [  256.449655] igb
> >>>>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [  258.725508]
> mmc0:
> >>>>> Timeout waiting for hardware cmd interrupt.
> >>>>> [  258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [  258.737761] mmc0: sdhci: Sys addr:  0x20000008 |
> >>>>> Version:  0x00001002 [  258.744238] mmc0: sdhci: Blk size:  0x00007200
> >>>>> | Blk cnt:  0x00000000 [  258.750716] mmc0: sdhci: Argument:
> 0x00010000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [  258.757191] mmc0: sdhci: Present:   0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [  258.763666] mmc0: sdhci: Power:     0x0000000b | Blk gap:
> 0x00000080
> >>>>> [  258.770143] mmc0: sdhci: Wake-up:   0x00000000 | Clock:
> 0x00000007
> >>>>> [  258.776620] mmc0: sdhci: Timeout:   0x0000000e | Int stat:
> 0x00018001
> >>>>> [  258.783097] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [  258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [  258.796044] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [  258.802518] mmc0: sdhci: Cmd:       0x00000d1a | Max curr:
> 0x00000000
> >>>>> [  258.808994] mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0xffffffff
> >>>>> [  258.815468] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [  258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [  258.826411]
> mmc0:
> >>>>> sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 258.833578] mmc0: sdhci:
> >>>> ============================================
> >>>>> [  268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> >>>>> [  268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [  268.977759] mmc0: sdhci: Sys addr:  0x20000008 |
> >>>>> Version:  0x00001002 [  268.984233] mmc0: sdhci: Blk size:  0x00007200
> >>>>> | Blk cnt:  0x00000000 [  268.990704] mmc0: sdhci: Argument:
> 0x00010000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [  268.997174] mmc0: sdhci: Present:   0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [  269.003647] mmc0: sdhci: Power:     0x0000000b | Blk gap:
> 0x00000080
> >>>>> [  269.010118] mmc0: sdhci: Wake-up:   0x00000000 | Clock:
> 0x00000007
> >>>>> [  269.016590] mmc0: sdhci: Timeout:   0x0000000e | Int stat:
> 0x00018001
> >>>>> [  269.023067] mmc0: sdhci: Int enab:  0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [  269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [  269.036011] mmc0: sdhci: Caps:      0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [  269.042484] mmc0: sdhci: Cmd:       0x00000d1a | Max curr:
> 0x00000000
> >>>>> [  269.048958] mmc0: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
> >>>>> [  269.055432] mmc0: sdhci: Resp[2]:   0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [  269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [  269.066373]
> mmc0:
> >>>>> sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 269.073542] mmc0: sdhci:
> >>>> ============================================
> >>>>>
> >>>>>
> >>>>> Kind regards,
> >>>>> Norbert
> >>>> I've seen this before when I was playing around with getting the ipipe
> to
> >> run
> >>>> on my edison board.  I don't think it's ipipe related.  I found this 
> >>>> thread
> on
> >>>> patchwork:
> >>>>
> >>>> https://hes32-
> >>>>
> >>
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
> >>>> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-
> >> a301-
> >>>> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> >>>> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
> >>>
> >>> Ah thanks,
> >>>
> >>> Quote: "Commands and resets are under spin lock, so no possibility for
> >> preemption,
> >>> and certainly a few microseconds isn't going to make any difference to
> >> these
> >>> timeouts, so I don't see how this patch could help."
> >>>
> >>> So I guess this is not an issue under plain Linux, but Cobalt could 
> >>> preempt
> >> for a
> >>> couple of ms (or even longer if you managed to get an RT thread stuck).
> >>
> >> That may also be different on PREEMPT-RT.
> >>
> >>>>
> >>>> There was another thread on the Ubuntu forums that solved the issue
> by
> >>>> disabling the sdcard in the bios.
> >>>
> >>> I can't do that, as this is the eMMC the system is running from.
> >>>
> >>> BTW, if I would want to limit such issues, could I limit Linux IRQs (apart
> from
> >> a handful of necessary maintainance/timer) to a single core?
> >>>
> >>
> >> Yes, that's the best option on a multicore system for both sides, RT and
> >> Linux.
> >
> > Next question: how?
> >
> > I played around with isolcpus, but that had severe sideeffects:
> > https://hes32-
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fxenom
> ai.org%2fpipermail%2fxenomai%2f2019%2dOctober%2f041902.html&umid=c
> 44405e5-b378-4e77-a655-
> 510a18e1f8a0&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> 57fcb874bfb6cbc083c56dba173ad30125d41e08
> >
> > This was before Philippe fixed some serious scheduler bugs,
> > but I stopped trying to do anything "smart" after ending up with random
> weirdness.
> >
>
> isolcpus should work. If not, we need to debug and fix.

See it does, atleast I did not run into problems yet.

I added "irqaffinity=0 isolcpus=1-3" to the cmdline, which binds the driver 
irqs to core 0,
and prevents automatic scheduling to the remaining cpus.

I am a bit unclear how the Xenomai-side is supposed to work.

Setting "xenomai.supported_cpus=0xe" will strictly deny running anything there,
My process did bail out with
[   27.726681] [Xenomai] thread rtgate[793] switched to non-rt CPU0, aborted.

That means I either have to
echo 0xfffffffe > /proc/xenomai/affinity

or set affinity manually (early enough) each time.
Whatever I do, I guess xenomais "printf" thread will always have to run
on a cobalt-enabled thread, even if its just using linux sycalls?

Why is there no xenomai.affinity kernel commandline parameter (or alternatively 
an xenomai.isolcpus)?


>
> Less convenient alternative: manually tuning /proc/irq/*/smp_affinity or
> instructing irqbalanced to keep some cores free. cgroups with CPU
> affinities can help to keep tasks away.

Yes, systemd-system.conf has an CPUAffinity option. The Linux-side seems solved
with the two kernel parameters anyways.

Thanks,
Norbert

________________________________

This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You
________________________________

Reply via email to