Thanks for the update. This seems fairly critical given that it results in a Call Trace. However, is this an actual fault? It doesn't appear to be an Oops/Panic.
Any ETA on a fix for this?

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: March-08-17 11:57 AM
To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net
Subject: RE: Constant Illegal FSM event / Resetting Link errors

This looks very much like the deadlock that Partha tried to fix in commit d094c4d5f5c7e1b2 ("tipc: add subscription refcount..") in 4.10. It is quite likely that this is the root of the problem with the reset broadcast link too.
Unfortunately that patch didn't solve the problem, and we are waiting for a more complete solution from Ying (which he has been promising for a while ;)

///jon

> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Wednesday, March 08, 2017 05:41 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> Subject: RE: Constant Illegal FSM event / Resetting Link errors
>
> I don't see any logs at all coming from the 2 TIPC 1.7 nodes.
>
> However, in looking at the logs for the card that is spamming the logs, I see this:
>
> Mar 7 00:20:55 [SEQ 006552] myVMslot12 kernel: [51021.039638] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u32:0:27462]
> Mar 7 00:20:55 [SEQ 006553] myVMslot12 kernel: [51021.041638] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [gtt:29011]
> Mar 7 00:20:55 [SEQ 006554] myVMslot12 kernel: [51021.041660] Modules linked in: iptable_mangle iptable_raw nf_log_ipv4 nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp iTCO_wdt ipmiq_drv(O) sio_mmc(O) xt_physdev br_netfilter bridge event_drv(O) stp llc nf_conntrack_ipv4 nf_defrag_ipv4 lockd grace ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack pt_timer_info(O) ip6table_filter ip6_tables ddi(O) iTCO_vendor_support usb_storage i2c_i801 pcspkr intel_ips ixgbe igb lpc_ich mfd_core i2c_algo_bit i2c_core ioatdma dca ptp pps_core mdio tpm_tis tpm sunrpc [last unloaded: iTCO_wdt]
> Mar 7 00:20:55 [SEQ 006555] myVMslot12 kernel: [51021.041662] CPU: 3 PID: 29011 Comm: gtt Tainted: G O 4.4.0 #24
> Mar 7 00:20:55 [SEQ 006556] myVMslot12 kernel: [51021.041663] Hardware name: PT AMC124/Base Board Product Name, BIOS LGNAJFIP.PTI.0012.P15 01/15/2014
> Mar 7 00:20:55 [SEQ 006557] myVMslot12 kernel: [51021.041664] task: ffff88034cc63680 ti: ffff8802eba20000 task.ti: ffff8802eba20000
> Mar 7 00:20:55 [SEQ 006558] myVMslot12 kernel: [51021.041670] RIP: 0010:[<ffffffff810c34c4>] [<ffffffff810c34c4>] queued_spin_lock_slowpath+0x44/0x160
> Mar 7 00:20:55 [SEQ 006559] myVMslot12 kernel: [51021.041671] RSP: 0018:ffff8802eba239e8 EFLAGS: 00000202
> Mar 7 00:20:55 [SEQ 006560] myVMslot12 kernel: [51021.041671] RAX: 00000000000c0101 RBX: ffff8802e092e3c0 RCX: 0000000000000001
> Mar 7 00:20:55 [SEQ 006561] myVMslot12 kernel: [51021.041672] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff88034ddcf140
> Mar 7 00:20:55 [SEQ 006562] myVMslot12 kernel: [51021.041673] RBP: ffff8802eba239e8 R08: 0000000000000101 R09: ffffffffa022f8a0
> Mar 7 00:20:55 [SEQ 006563] myVMslot12 kernel: [51021.041674] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88034ddcf140
> Mar 7 00:20:55 [SEQ 006564] myVMslot12 kernel: [51021.041674] R13: 000000000000c350 R14: ffff8803499aa760 R15: ffff8803499aa768
> Mar 7 00:20:55 [SEQ 006565] myVMslot12 kernel: [51021.041675] FS: 0000000000000000(0000) GS:ffff88035fc60000(0000) knlGS:0000000000000000
> Mar 7 00:20:55 [SEQ 006566] myVMslot12 kernel: [51021.041676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 7 00:20:55 [SEQ 006567] myVMslot12 kernel: [51021.041677] CR2: 00007fe84f9b1070 CR3: 0000000001c0a000 CR4: 00000000000006e0
> Mar 7 00:20:55 [SEQ 006568] myVMslot12 kernel: [51021.041677] Stack:
> Mar 7 00:20:55 [SEQ 006569] myVMslot12 kernel: [51021.041679] ffff8802eba239f8 ffffffff816e087c ffff8802eba23a38 ffffffffa021d6b3
> Mar 7 00:20:55 [SEQ 006570] myVMslot12 kernel: [51021.041680] ffff8802eba23a18 ffff8802e092e3c0 ffff88034ddce000 ffff8803499aa770
> Mar 7 00:20:55 [SEQ 006571] myVMslot12 kernel: [51021.041682] ffff8803499aa760 ffff8803499aa768 ffff8802eba23a58 ffffffffa021a8f9
> Mar 7 00:20:55 [SEQ 006572] myVMslot12 kernel: [51021.041682] Call Trace:
> Mar 7 00:20:55 [SEQ 006573] myVMslot12 kernel: [51021.041686] [<ffffffff816e087c>] _raw_spin_lock_bh+0x2c/0x40
> Mar 7 00:20:55 [SEQ 006574] myVMslot12 kernel: [51021.041691] [<ffffffffa021d6b3>] tipc_nametbl_unsubscribe+0x63/0x120 [tipc]
> Mar 7 00:20:55 [SEQ 006575] myVMslot12 kernel: [51021.041695] [<ffffffffa021a8f9>] tipc_subscrp_delete+0x39/0x60 [tipc]
> Mar 7 00:20:55 [SEQ 006576] myVMslot12 kernel: [51021.041699] [<ffffffffa021a9d0>] tipc_subscrb_release_cb+0x70/0xb0 [tipc]
> Mar 7 00:20:55 [SEQ 006577] myVMslot12 kernel: [51021.041703] [<ffffffffa0229683>] tipc_conn_kref_release+0x133/0x140 [tipc]
> Mar 7 00:20:55 [SEQ 006578] myVMslot12 kernel: [51021.041706] [<ffffffffa02296a6>] conn_put+0x16/0x20 [tipc]
> Mar 7 00:20:55 [SEQ 006579] myVMslot12 kernel: [51021.041710] [<ffffffffa0229eb4>] tipc_conn_sendmsg+0x114/0x190 [tipc]
> Mar 7 00:20:55 [SEQ 006580] myVMslot12 kernel: [51021.041714] [<ffffffffa021adf5>] tipc_subscrp_send_event+0xd5/0xf0 [tipc]
> Mar 7 00:20:55 [SEQ 006581] myVMslot12 kernel: [51021.041718] [<ffffffffa021b038>] tipc_subscrp_report_overlap+0x98/0xb0 [tipc]
> Mar 7 00:20:55 [SEQ 006582] myVMslot12 kernel: [51021.041721] [<ffffffffa021c96e>] tipc_nameseq_remove_publ+0x12e/0x1d0 [tipc]
> Mar 7 00:20:55 [SEQ 006583] myVMslot12 kernel: [51021.041725] [<ffffffffa021cfd6>] tipc_nametbl_remove_publ+0x66/0xf0 [tipc]
> Mar 7 00:20:55 [SEQ 006584] myVMslot12 kernel: [51021.041729] [<ffffffffa021d3dd>] tipc_nametbl_withdraw+0x6d/0x130 [tipc]
> Mar 7 00:20:55 [SEQ 006585] myVMslot12 kernel: [51021.041735] [<ffffffffa022410a>] tipc_sk_withdraw+0xba/0x100 [tipc]
> Mar 7 00:20:55 [SEQ 006586] myVMslot12 kernel: [51021.041739] [<ffffffff811edbac>] ? __d_free+0x1c/0x20
> Mar 7 00:20:55 [SEQ 006587] myVMslot12 kernel: [51021.041743] [<ffffffffa0227489>] tipc_release+0xa9/0x1c0 [tipc]
> Mar 7 00:20:55 [SEQ 006588] myVMslot12 kernel: [51021.041744] [<ffffffff811edbac>] ? __d_free+0x1c/0x20
> Mar 7 00:20:55 [SEQ 006589] myVMslot12 kernel: [51021.041746] [<ffffffff811ee02c>] ? dentry_free+0x4c/0x90
> Mar 7 00:20:55 [SEQ 006590] myVMslot12 kernel: [51021.041750] [<ffffffff815bd158>] sock_release+0x28/0x90
> Mar 7 00:20:55 [SEQ 006591] myVMslot12 kernel: [51021.041752] [<ffffffff815bd4c2>] sock_close+0x12/0x20
> Mar 7 00:20:55 [SEQ 006592] myVMslot12 kernel: [51021.041754] [<ffffffff811db605>] __fput+0xb5/0x200
> Mar 7 00:20:55 [SEQ 006593] myVMslot12 kernel: [51021.041757] [<ffffffff811db79e>] ____fput+0xe/0x10
> Mar 7 00:20:55 [SEQ 006594] myVMslot12 kernel: [51021.041760] [<ffffffff8109abbb>] task_work_run+0x7b/0x90
> Mar 7 00:20:55 [SEQ 006595] myVMslot12 kernel: [51021.041763] [<ffffffff81082616>] do_exit+0x2b6/0xa80
> Mar 7 00:20:55 [SEQ 006596] myVMslot12 kernel: [51021.041766] [<ffffffff81002186>] ? do_audit_syscall_entry+0x66/0x70
> Mar 7 00:20:55 [SEQ 006597] myVMslot12 kernel: [51021.041767] [<ffffffff81082ec2>] do_group_exit+0x42/0xa0
> Mar 7 00:20:55 [SEQ 006598] myVMslot12 kernel: [51021.041768] [<ffffffff81082f37>] SyS_exit_group+0x17/0x20
> Mar 7 00:20:55 [SEQ 006599] myVMslot12 kernel: [51021.041770] [<ffffffff816e0c97>] entry_SYSCALL_64_fastpath+0x12/0x6a
> Mar 7 00:20:55 [SEQ 006600] myVMslot12 kernel: [51021.041785] Code: 00 00 00 eb 02 89 c6 f7 c6 00 ff ff ff 75 41 83 fe 01 89 ca 89 f0 41 0f 44 d0 f0 0f b1 17 39 f0 75 e3 83 fa 01 75 04 eb 0d f3 90 <8b> 07 84 c0 75 f8 66 c7 07 01 00 5d c3 8b 37 81 fe 00 01 00 00
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: March-08-17 11:32 AM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-discuss...@lists.sourceforge.net
> Subject: RE: Constant Illegal FSM event / Resetting Link errors
>
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Wednesday, March 08, 2017 05:29 PM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: Constant Illegal FSM event / Resetting Link errors
> >
> > There are 7 nodes in the system running 4.9.11 TIPC (on 4.4.0 x86-64
> > kernels), and 2 nodes in the system running TIPC 1.7 (on 2.6.20 PPC
> > kernels).
>
> Are there any corresponding printouts on the other 4.9 nodes? And on the
> 2.6 nodes?
> I am especially curious about the latter ones.
>
> ///jon
>
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: March-08-17 11:21 AM
> > To: Butler, Peter <pbut...@sonusnet.com>; tipc-discuss...@lists.sourceforge.net
> > Subject: RE: Constant Illegal FSM event / Resetting Link errors
> >
> > Hi Peter,
> > Is this only the 4.9.11 code, or is it in combination with tipc 1.7
> > nodes you mentioned earlier?
> > ///jon
> >
> > > -----Original Message-----
> > > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > > Sent: Wednesday, March 08, 2017 03:33 PM
> > > To: tipc-discussion@lists.sourceforge.net
> > > Subject: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors
> > >
> > > 8 nodes in mesh, running TIPC from kernel 4.9.11.
> > >
> > > The following log messages are continually being spammed (many times per second):
> > >
> > > Mar 8 00:17:31 [SEQ 409067] myVMslot12 kernel: [ 130.406118] Resetting link Link <broadcast-link> state 2000
> > > Mar 8 00:17:31 [SEQ 409068] myVMslot12 kernel: [ 130.406120] XMTQ: 28 [53-80], BKLGQ: 0, SNDNX: 81, RCVNX: 1
> > > Mar 8 00:17:31 [SEQ 409069] myVMslot12 kernel: [ 130.406121] Failed msg: usr 0, typ 1, len 104, err 0
> > > Mar 8 00:17:31 [SEQ 409070] myVMslot12 kernel: [ 130.406123] sqno 53, prev: 100100c, src: 100100c
> > > Mar 8 00:17:31 [SEQ 409071] myVMslot12 kernel: [ 130.406124] Illegal FSM event fa110e in state 2000 on link broadcast-link
> > > Mar 8 00:17:31 [SEQ 409072] myVMslot12 kernel: [ 130.413162] Retransmission failure on link <broadcast-link>
> > > Mar 8 00:17:31 [SEQ 409073] myVMslot12 kernel: [ 130.419300] Resetting link Link <broadcast-link> state 2000
> > > Mar 8 00:17:31 [SEQ 409074] myVMslot12 kernel: [ 130.419309] XMTQ: 28 [53-80], BKLGQ: 0, SNDNX: 81, RCVNX: 1
> > > Mar 8 00:17:31 [SEQ 409075] myVMslot12 kernel: [ 130.419310] Failed msg: usr 0, typ 1, len 104, err 0
> > > Mar 8 00:17:31 [SEQ 409076] myVMslot12 kernel: [ 130.419311] sqno 53, prev: 100100c, src: 100100c
> > > Mar 8 00:17:31 [SEQ 409077] myVMslot12 kernel: [ 130.419313] Illegal FSM event fa110e in state 2000 on link broadcast-link
> > > Mar 8 00:17:32 [SEQ 409081] myVMslot12 kernel: [ 130.701416] Retransmission failure on link <broadcast-link>
> > > Mar 8 00:17:32 [SEQ 409082] myVMslot12 kernel: [ 130.707070] Resetting link Link <broadcast-link> state 2000
> > > Mar 8 00:17:32 [SEQ 409083] myVMslot12 kernel: [ 130.707072] XMTQ: 28 [53-80], BKLGQ: 0, SNDNX: 81, RCVNX: 1
> > > Mar 8 00:17:32 [SEQ 409084] myVMslot12 kernel: [ 130.707074] Failed msg: usr 0, typ 1, len 104, err 0
> > > Mar 8 00:17:32 [SEQ 409085] myVMslot12 kernel: [ 130.707075] sqno 53, prev: 100100c, src: 100100c
> > > Mar 8 00:17:32 [SEQ 409086] myVMslot12 kernel: [ 130.707076] Illegal FSM event fa110e in state 2000 on link broadcast-link
> > > Mar 8 00:17:32 [SEQ 409087] myVMslot12 kernel: [ 130.713887] Retransmission failure on link <broadcast-link>
> > > Mar 8 00:17:32 [SEQ 409088] myVMslot12 kernel: [ 130.719700] Resetting link Link <broadcast-link> state 2000
> > > Mar 8 00:17:32 [SEQ 409089] myVMslot12 kernel: [ 130.719702] XMTQ: 28 [53-80], BKLGQ: 0, SNDNX: 81, RCVNX: 1
> > > Mar 8 00:17:32 [SEQ 409090] myVMslot12 kernel: [ 130.719703] Failed msg: usr 0, typ 1, len 104, err 0
> > > Mar 8 00:17:32 [SEQ 409091] myVMslot12 kernel: [ 130.719704] sqno 53, prev: 100100c, src: 100100c
> > > Mar 8 00:17:32 [SEQ 409092] myVMslot12 kernel: [ 130.719705] Illegal FSM event fa110e in state 2000 on link broadcast-link
> > > Mar 8 00:17:32 [SEQ 409093] myVMslot12 kernel: [ 130.726752] Retransmission failure on link <broadcast-link>
> > >
> > > ------------------------------------------------------------------------------
> > > Announcing the Oxford Dictionaries API! The API offers
> > > world-renowned dictionary content that is easy and intuitive to
> > > access. Sign up for an account today to start using our lexical
> > > data to power your apps and projects. Get started today and enter
> > > our developer competition.
> > > http://sdm.link/oxford
> > > _______________________________________________
> > > tipc-discussion mailing list
> > > tipc-discussion@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion