Hi,

I am seeing an occasional kernel soft lockup. I have TIPC v4.7, and the
kernel dump occurs when the system is going down for a reboot.

The kernel dump is:

<0>NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [exfx:1474]
<6>Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O) ipifwd(PO)
...
<6>
<6>GPR00: c15333e8 a4e0fb80 a4ee3600 a51748ac 00000000 ae475024 a537feec fffffffd
<6>GPR08: a2197408 00000001 00000001 00000004 80691c00
<6>NIP [80691c40] _raw_spin_lock_bh+0x40/0x70
<6>LR [c1534f30] tipc_nametbl_unsubscribe+0x50/0x120 [tipc]
<6>Call Trace:
<6>[a4e0fba0] [c15333e8] tipc_named_reinit+0xf8/0x820 [tipc]
<6>[a4e0fbb0] [c15336a0] tipc_named_reinit+0x3b0/0x820 [tipc]
<6>[a4e0fbd0] [c1540bac] tipc_nl_publ_dump+0x50c/0xed0 [tipc]
<6>[a4e0fc00] [c154164c] tipc_conn_sendmsg+0xdc/0x170 [tipc]
<6>[a4e0fc30] [c1533c9c] tipc_subscrp_report_overlap+0xbc/0xd0 [tipc]
<6>[a4e0fc70] [c153425c] tipc_topsrv_stop+0x45c/0x4f0 [tipc]
<6>[a4e0fca0] [c1534788] tipc_nametbl_remove_publ+0x58/0x110 [tipc]
<6>[a4e0fcd0] [c1534c48] tipc_nametbl_withdraw+0x68/0x140 [tipc]
<6>[a4e0fd00] [c153cc24] tipc_nl_node_dump_link+0x1904/0x45d0 [tipc]
<6>[a4e0fd30] [c153d838] tipc_nl_node_dump_link+0x2518/0x45d0 [tipc]
<6>[a4e0fd70] [804f2870] sock_release+0x30/0xf0
<6>[a4e0fd80] [804f2944] sock_close+0x14/0x30
<6>[a4e0fd90] [80105844] __fput+0x94/0x200
<6>[a4e0fdb0] [8003dca4] task_work_run+0xd4/0x100
<6>[a4e0fdd0] [80023620] do_exit+0x280/0x980
<6>[a4e0fe10] [80024c48] do_group_exit+0x48/0xb0
<6>[a4e0fe30] [80030344] get_signal+0x244/0x4f0
<6>[a4e0fe80] [80007734] do_signal+0x34/0x1c0
<6>[a4e0ff30] [800079a8] do_notify_resume+0x68/0x80
<6>[a4e0ff40] [8000fa1c] do_user_signal+0x74/0xc4


From the stack dump it looks like tipc_named_reinit is trying to
acquire nametbl_lock.

From looking at the call chain I can see that tipc_conn_sendmsg can
end up calling conn_put, which will go on to call tipc_named_reinit
via tipc_sock_release.

As tipc_nametbl_withdraw (from the stack dump) has already acquired
nametbl_lock, tipc_named_reinit cannot get it, and so the process hangs.
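
To illustrate the pattern described above, here is a minimal user-space
sketch, not TIPC code. It assumes a single non-recursive lock that an outer
function takes and an inner function, reached while the lock is still held,
tries to take again. The function names (nametbl_lock_init, outer_withdraw,
inner_unsubscribe) are hypothetical stand-ins. An error-checking pthread
mutex is used so that the second acquisition returns EDEADLK instead of
hanging; a kernel spinlock has no such check and would simply spin, which is
what the soft lockup watchdog reports.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK under strict -std */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the name table lock (user space only). */
static pthread_mutex_t nametbl_lock;

/* Initialise the lock as an error-checking mutex, so that a second
 * acquisition by the current owner returns EDEADLK instead of blocking. */
static int nametbl_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!rc)
        rc = pthread_mutex_init(&nametbl_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Models the inner path (connection/subscription teardown) that needs
 * the lock. Returns 0 on success or the pthread error code. */
static int inner_unsubscribe(void)
{
    int rc = pthread_mutex_lock(&nametbl_lock);

    if (rc)
        return rc;  /* EDEADLK when the caller already owns the lock */
    pthread_mutex_unlock(&nametbl_lock);
    return 0;
}

/* Models the outer path (withdraw) that takes the lock and then, via a
 * chain of calls, re-enters the inner path while still holding it. */
static int outer_withdraw(void)
{
    int rc = pthread_mutex_lock(&nametbl_lock);

    if (rc)
        return rc;
    rc = inner_unsubscribe();
    pthread_mutex_unlock(&nametbl_lock);
    return rc;
}
```

After nametbl_lock_init() succeeds, outer_withdraw() returns EDEADLK because
the inner call finds the lock already owned by the same thread, while calling
inner_unsubscribe() on its own returns 0.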


The call to tipc_sock_release (added in commit 333f796235a527
<http://git.atlnz.lc/cgit/cgit.cgi/upstream_imports/linux-stable.git/commit/?id=333f796235a52727db7e0a13888045f3aa3d5335>)
seems to have changed the behaviour such that it tries to do a lot more
when shutting the connection down.


If there is other information I can provide please let me know.

Regards,

John
------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
