Hi Jon,

Sorry for the delay; I could not work because my child was sick.

The crash is caused by the last commit:
"tipc: reduce risk of user starvation during link congestion"

I examined the crash today. It is an out-of-bounds write past the end of
skb->cb[48]. The maximum size allowed for the control buffer is 48 bytes,
whereas the new struct tipc_skb_cb is 64 bytes. The extra bytes overwrite
the skb->destructor callback, which lies right after skb->cb.
The sizeof struct sk_buff_head itself is 48 bytes.

crash> p *(struct sk_buff*)0xffff88003f007600
:
  dev = 0xffff88003f985000,
  cb = "\000\00\000",
  _skb_refdst = 0,
  destructor = 0x1000000000000, << insane function pointer >>

I think the simpler way is to place these packets ('pkts') into the
backlogq, allow temporary over-committing, and keep the wakeup mechanism
as it is. This way, we transmit the packets in tipc_link_advance_backlog()
instead of in link_prepare_wakeup(). It is misleading that
link_prepare_wakeup() transmits packets.

/Partha

On 11/30/2016 07:48 PM, Jon Maloy wrote:
> Weird. Looks like a corrupted incoming buffer directly at startup,
> before any of my new code is active. Is this repeatable?
>
> ///jon
>
>
> On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote:
>> Hi Jon,
>>
>> With your patches, I get the following crash when loading the tipc
>> module. Leaving home now, so couldnt investigate further.
>>
>> [ 58.201114] tipc: Started in single node mode
>> [ 58.212991] Started in network mode
>> [ 58.213796] Own node address <1.1.1>, network identity 4711
>> [ 58.238416] 8021q: adding VLAN 0 to HW filter on device data0
>> [ 58.252217] 8021q: adding VLAN 0 to HW filter on device data1
>> [ 58.270822] Enabled bearer <eth:data0>, discovery domain <1.1.0>,
>> priority 10
>> [ 58.571114] general protection fault: 0000 [#1] SMP
>> [ 58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel
>> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio
>> [ 58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15
>> [ 58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [ 58.572031] task: ffffffff81c0d540 task.stack: ffffffff81c00000
>> [ 58.572031] RIP: 0010:[<ffffffff8162f10d>] [<ffffffff8162f10d>]
>> skb_release_head_state+0x4d/0xa0
>> [ 58.572031] RSP: 0018:ffff880037c03ba0 EFLAGS: 00010246
>> [ 58.572031] RAX: 0001000000000000 RBX: ffff880033fffa00 RCX:
>> 00000000000000ff
>> [ 58.572031] RDX: 0000000000000000 RSI: ffff880037c03bca RDI:
>> ffff880033fffa00
>> [ 58.572031] RBP: ffff880037c03ba8 R08: ffffffffa005f2c0 R09:
>> 0000000000000000
>> [ 58.572031] R10: ffff880035b0f0a0 R11: ffffea0000000000 R12:
>> ffff880033fffa00
>> [ 58.572031] R13: ffffffffa0048fd4 R14: ffffffff81cfbec0 R15:
>> ffff880033718000
>> [ 58.572031] FS: 0000000000000000(0000) GS:ffff880037c00000(0000)
>> knlGS:0000000000000000
>> [ 58.572031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 58.572031] CR2: 0000000000851bf0 CR3: 0000000035b00000 CR4:
>> 00000000000006f0
>> [ 58.572031] Stack:
>> [ 58.572031] ffff880033fffa00 ffff880037c03bc0 ffffffff8162f2b2
>> ffff880033fffa00
>> [ 58.572031] ffff880037c03be8 ffffffff8162f327 ffff880033fffa00
>> 0000000000000000
>> [ 58.572031] ffff880035b32540 ffff880037c03c68 ffffffffa0048fd4
>> 0000000000000082
>> [ 58.572031] Call Trace:
>> [ 58.572031] <IRQ> [ 58.572031] [<ffffffff8162f2b2>]
>> skb_release_all+0x12/0x30
>> [ 58.572031] [<ffffffff8162f327>] kfree_skb+0x37/0xa0
>> [ 58.572031] [<ffffffffa0048fd4>] tipc_disc_rcv+0x84/0x1d0 [tipc]
>> [ 58.572031] [<ffffffffa0053ddc>] tipc_rcv+0x3ac/0x3c0 [tipc]
>> [ 58.572031] [<ffffffff81093457>] ? find_busiest_group+0x117/0x940
>> [ 58.572031] [<ffffffffa0043088>] tipc_l2_rcv_msg+0x48/0x60 [tipc]
>> [ 58.572031] [<ffffffff81641245>] __netif_receive_skb_core+0x2e5/0xa60
>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>> [ 58.572031] [<ffffffff81643a8b>] __netif_receive_skb+0x1b/0x70
>> [ 58.572031] [<ffffffff81643b0d>] netif_receive_skb_internal+0x2d/0x90
>> [ 58.572031] [<ffffffff81644494>] napi_gro_receive+0x94/0x130
>> [ 58.572031] [<ffffffffa0019405>] virtnet_receive+0x1a5/0x8a0
>> [virtio_net]
>> [ 58.572031] [<ffffffffa0019b1d>] virtnet_poll+0x1d/0x80 [virtio_net]
>> [ 58.572031] [<ffffffff81644c2e>] net_rx_action+0x20e/0x390
>> [ 58.572031] [<ffffffff8178358b>] __do_softirq+0x9b/0x2a2
>> [ 58.572031] [<ffffffff81062d00>] irq_exit+0x60/0x70
>> [ 58.572031] [<ffffffff81783324>] do_IRQ+0x54/0xd0
>> [ 58.572031] [<ffffffff817817ff>] common_interrupt+0x7f/0x7f
>> [ 58.572031] <EOI> [ 58.572031] [<ffffffff817805c0>] ?
>> default_idle+0x20/0xe0
>> [ 58.572031] [<ffffffff8114d439>] ? next_zone+0x29/0x30
>> [ 58.572031] [<ffffffff8102769f>] arch_cpu_idle+0xf/0x20
>> [ 58.572031] [<ffffffff81780a0c>] default_idle_call+0x2c/0x30
>> [ 58.572031] [<ffffffff8109a4d7>] cpu_startup_entry+0x177/0x1e0
>> [ 58.572031] [<ffffffff8177a7f7>] rest_init+0x77/0x80
>> [ 58.572031] [<ffffffff81d5deb5>] start_kernel+0x40e/0x41b
>> [ 58.572031] [<ffffffff81d5d42f>] x86_64_start_reservations+0x2a/0x2c
>> [ 58.572031] [<ffffffff81d5d51b>] x86_64_start_kernel+0xea/0xed
>> [ 58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36
>> 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75
>> 30 48 89 df <ff> d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3
>> e8 bb
>> [ 58.572031] RIP [<ffffffff8162f10d>] skb_release_head_state+0x4d/0xa0
>> [ 58.572031] RSP <ffff880037c03ba0>
>> [ 58.662814] ---[ end trace fa57695d3ce8757f ]---
>> [ 58.663875] Kernel panic - not syncing: Fatal exception in interrupt
>> [ 58.664872] Kernel Offset: disabled
>> [ 58.664872] ---[ end Kernel panic - not syncing: Fatal exception in
>> interrupt
>>
>> regards
>> Partha
>>
>> On 11/29/2016 06:07 PM, Jon Maloy wrote:
>>> Ying, Partha,
>>> It would be very nice I could get "acked" or "reviewed" on this so I
>>> can send it to David before net-next closes.
>>>
>>> ///jon
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>>>> Sent: Tuesday, 29 November, 2016 12:04
>>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
>>>> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
>>>> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com>
>>>> Cc: ma...@donjonn.com; thompa....@gmail.com
>>>> Subject: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
>>>>
>>>> We fix a very real starvation problem that may occur when the link
>>>> level runs into send buffer congestion. At the same time we make the
>>>> interaction between the socket and link layer simpler and more
>>>> consistent.
>>>>
>>>> v2: - Simplified link congestion check to only check against own
>>>>       importance limit. This reduces the risk of higher levels
>>>>       starving out lower levels.
>>>>
>>>> Jon Maloy (3):
>>>>   tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
>>>>     functions
>>>>   tipc: modify struct tipc_plist to be more versatile
>>>>   tipc: reduce risk of user starvation during link congestion
>>>>
>>>>  net/tipc/bcast.c      |   2 +-
>>>>  net/tipc/link.c       |  81 ++++-----
>>>>  net/tipc/msg.h        |   8 +-
>>>>  net/tipc/name_table.c | 100 +++++++----
>>>>  net/tipc/name_table.h |  21 +--
>>>>  net/tipc/node.c       |   2 +-
>>>>  net/tipc/socket.c     | 450
>>>> ++++++++++++++++++++++----------------------------------
>>>>  7 files changed, 327 insertions(+), 337 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>
>
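
For illustration, the skb->cb overflow described at the top of this mail
can be sketched in user space. This is only a sketch under stated
assumptions, not kernel code: struct mock_skb merely mimics the
cb / _skb_refdst / destructor field order shown in the crash output, and
struct mock_big_cb, struct mock_ok_cb, cb_overrun() and
cb_clobbers_destructor() are hypothetical names invented for the example.
The static assertion shows the kind of compile-time guard that the
kernel's BUILD_BUG_ON() macro could provide in the same spirit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the relevant tail of struct sk_buff: a 48-byte control buffer
 * followed directly by _skb_refdst and destructor. NOT the real struct. */
struct mock_skb {
	char cb[48];
	unsigned long _skb_refdst;
	void (*destructor)(struct mock_skb *skb);
};

/* Hypothetical 64-byte control block, standing in for the grown
 * struct tipc_skb_cb. The field layout is invented. */
struct mock_big_cb {
	uint64_t words[8];
};

/* Hypothetical control block that still fits into cb[]. */
struct mock_ok_cb {
	uint64_t words[6];
};

#define MOCK_CB_ROOM sizeof(((struct mock_skb *)0)->cb)

/* Compile-time guard: the build fails if the control block outgrows cb[].
 * Swapping in struct mock_big_cb here would break the build. */
_Static_assert(sizeof(struct mock_ok_cb) <= MOCK_CB_ROOM,
	       "control block does not fit in skb->cb");

/* Number of bytes by which a control block of cb_size runs past cb[]. */
static size_t cb_overrun(size_t cb_size)
{
	return cb_size > MOCK_CB_ROOM ? cb_size - MOCK_CB_ROOM : 0;
}

/* Returns 1 if a control block of cb_size, overlaid on cb[], reaches
 * into the destructor pointer, 0 otherwise. */
static int cb_clobbers_destructor(size_t cb_size)
{
	return offsetof(struct mock_skb, cb) + cb_size >
	       offsetof(struct mock_skb, destructor);
}
```

With these definitions, cb_overrun(sizeof(struct mock_big_cb)) reports a
16-byte overrun, and cb_clobbers_destructor() confirms that a 64-byte
block laid over cb[] reaches the destructor pointer, while a 48-byte one
does not.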