RE: rcu_preempt detected expedited stalls on CPUs/tasks

Zhang, Qiang1 via Xenomai Mon, 11 Apr 2022 01:59:34 -0700

Additional information:

[  443.996057] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  444.002166] rcu:     5-...0: (0 ticks this GP) idle=8de/1/0x4000000000000000 softirq=722/722 fqs=9227 last_accelerate: fcd8/8c9e, nonlazy_posted: 0, L.
[  444.015487] rcu:     (detected by 3, t=36767 jiffies, g=11061, q=792)
[  444.021677] Task dump for CPU 5:
[  444.024906] kworker/5:2     R  running task        0   263      2 0x0000022a
[  444.031969] Workqueue: events igb_watchdog_task
[  444.036503] Call trace:
[  444.038952]  __switch_to+0xe8/0x138
[  444.042442]  0xffff800639210000
[  444.045588] rcu: rcu_preempt kthread starved for 18306 jiffies! g11061 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=5


The rcu_preempt kthread is in the running state, but it cannot get CPU5.
Since rcu_preempt is an RT-FIFO kthread, it should normally get CPU5 ahead of
igb_watchdog_task.
The igb_watchdog_task call trace is still incomplete. Maybe you can check
whether interrupts or preemption are disabled somewhere in the
igb_watchdog_task function.

[  444.055951] rcu: RCU grace-period kthread stack dump:
[  444.061006] rcu_preempt     R  running task        0    10      2 0x00000028
[  444.068066] Call trace:
[  444.070514]  __switch_to+0xe8/0x138
[  444.074006]  __schedule+0x2c0/0x828
[  444.077497]  schedule+0x38/0xa8
[  444.080641]  schedule_timeout+0x98/0x440
[  444.084568]  rcu_gp_kthread+0x4ac/0x8b0
[  444.088407]  kthread+0x12c/0x130
[  444.091638]  ret_from_fork+0x14/0x1c
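
As a rough aid for reading the figures in the stall report above, the following sketch pulls the jiffies counts out of two of the log lines and converts them to seconds. The tick rate is not stated anywhere in this thread: ASSUMED_HZ = 250 is an assumption (a common arm64 default), and the helper names are hypothetical, chosen only for illustration.

```python
import re

# Assumption: CONFIG_HZ is not given in this thread. 250 is a common
# arm64 default and is used here for illustration only.
ASSUMED_HZ = 250

# Patterns for the two lines of interest in the stall report.
STALL_RE = re.compile(r"t=(\d+) jiffies, g=(\d+)")
STARVED_RE = re.compile(r"kthread starved for (\d+) jiffies")


def jiffies_to_seconds(jiffies, hz=ASSUMED_HZ):
    """Convert a jiffies count to seconds for a given tick rate."""
    return jiffies / hz


def summarize(log_text, hz=ASSUMED_HZ):
    """Extract the stall duration, grace-period number, and kthread starvation time."""
    out = {}
    m = STALL_RE.search(log_text)
    if m:
        out["stall_seconds"] = jiffies_to_seconds(int(m.group(1)), hz)
        out["grace_period"] = int(m.group(2))
    m = STARVED_RE.search(log_text)
    if m:
        out["starved_seconds"] = jiffies_to_seconds(int(m.group(1)), hz)
    return out


# Two lines copied from the report above.
LOG = """\
[  444.015487] rcu:     (detected by 3, t=36767 jiffies, g=11061, q=792)
[  444.045588] rcu: rcu_preempt kthread starved for 18306 jiffies! g11061 f0x2
"""

if __name__ == "__main__":
    print(summarize(LOG))
```

If the tick rate really is 250, the stall has lasted roughly 147 seconds and the grace-period kthread has been starved for roughly 73 seconds; with a different CONFIG_HZ the numbers scale accordingly.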

On 2022/4/11 16:24, "Xenomai on behalf of Ivan Jiang via
Xenomai" <xenomai-boun...@xenomai.org on behalf of xenomai@xenomai.org> wrote:

    Dear Qiang:

        Here is some more information:
    [  121.127758] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    [  121.133887] rcu:     0-...0: (5 ticks this GP) idle=b5a/1/0x4000000000000002 softirq=11605/11606 fqs=2336
    [  121.143294] rcu:     (detected by 4, t=5255 jiffies, g=4137, q=10)
    [  121.149225] Task dump for CPU 0:
    [  121.152457] tcf-agent       R  running task        0   384      1 0x0000020a
    [  121.159520] Call trace:
    [  121.161987]  __switch_to+0xe8/0x138
    [  121.165481]  0xffff000008e4c000
    [  220.120678] audit: type=1701 audit(1649325109.997:4): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=241 comm="systemd-journal" exe="/lib/systemd/systemd-journald" sig=6 res=1
    [  280.369229] systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
    [  280.376682] systemd[1]: systemd-udevd.service: Killing process 255 (systemd-udevd) with signal SIGABRT.
    [  280.386241] audit: type=1701 audit(1649325109.997:5): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=255 comm="systemd-udevd" exe="/lib/systemd/systemd-udevd" sig=6 res=1
    [  310.369200] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating.
    [  370.619174] systemd[1]: systemd-udevd.service: State 'stop-sigabrt' timed out. Terminating.
    [  400.619175] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
    [  400.627584] systemd[1]: systemd-journald.service: Killing process 241 (systemd-journal) with signal SIGKILL.

    Thank you.
    BR.,
    Ivan


    On 2022/4/11 16:11, "Zhang, Qiang1" <qiang1.zh...@intel.com> wrote:


            Dear Xenomai community:

                When I use the Intel I211 Ethernet chip (PCIe 2 Gigabit), I get
an RCU error like the one below:
                rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 5-... } 5335 jiffies s: 37 root: 0x20/.
                [  392.174721] rcu: blocking rcu_node structures:
                [  392.179174] Task dump for CPU 5:
                [  392.182411] kworker/5:2     R  running task        0   263      2 0x0000022a
                [  392.189479] Workqueue: events igb_watchdog_task
                [  392.194018] Call trace:
                [  392.196471]  __switch_to+0xe8/0x138
                [  392.199961]  0xffff800639210000

        Hi, please provide the full RCU stall information. This output indicates
that CPU5 is not reporting an RCU quiescent state.
        This may be because the state cannot be reported in time, for example
because interrupts or preemption are disabled for a long time.
        See Documentation/RCU/stallwarn.rst.

        Thanks
        Zqiang



                        The environment is as follows:
                        Cobalt 3.1 ipipe 4.19.106 or 4.19.198 ARM64
                        Boot command line: isolcpus=1,2 xenomai.supported_cpus=0x06 nohz_full=1,2 rcu_nocbs=1,2 irqaffinity=0,3,4,5 nmi_interrupt=0
                        Because I set irqaffinity to the non-real-time (NRT)
CPUs (CPU0, CPU3, CPU4, CPU5), the task dump appears for CPU 0, 3, 4, or 5 at
random.
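
To make the CPU partitioning in the boot parameters above easier to check, here is a small sketch that converts between CPU lists and bitmasks. It shows that xenomai.supported_cpus=0x06 corresponds to CPUs 1 and 2, and that the irqaffinity list covers the remaining CPUs. The function names are hypothetical, and NR_CPUS = 6 is an assumption based only on the CPU numbers (0 to 5) that appear in this thread.

```python
def cpus_to_mask(cpus):
    """Build a bitmask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """List the CPU numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Assumption: the board has 6 CPUs (0..5); the thread never states the count.
NR_CPUS = 6

rt_cpus = [1, 2]         # isolcpus / nohz_full / rcu_nocbs
irq_cpus = [0, 3, 4, 5]  # irqaffinity

rt_mask = cpus_to_mask(rt_cpus)
irq_mask = cpus_to_mask(irq_cpus)
all_mask = (1 << NR_CPUS) - 1

if __name__ == "__main__":
    print(f"real-time mask: {rt_mask:#04x}")   # compare with xenomai.supported_cpus
    print(f"IRQ mask:       {irq_mask:#04x}")
    print("disjoint:", rt_mask & irq_mask == 0)
    print("covers all CPUs:", rt_mask | irq_mask == all_mask)
```

Under the six-CPU assumption the two sets are disjoint and together cover every CPU, so all device interrupts, including those of the igb driver, are steered to the non-real-time CPUs where the stalls are reported.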
                        Once the error messages had appeared, after 10 minutes
the system no longer responded to anything. I am not running any threads that
use this Ethernet port, only the ping command.
                        The .config file contains these options:
                CONFIG_NO_HZ_FULL=y
                CONFIG_RCU_NOCB_CPU=y
                CONFIG_PREEMPT=y
                CONFIG_CPU_IDLE=n
                CONFIG_ARM_CPUIDLE=n
                CONFIG_CPU_FREQ=n
                Could you please help me figure out what the problem could be?

                        Thank you.

            Best Regards,
            Ivan
