Hi Jon,

Sorry for the delay, could not work due to sick child.

The crash occurs due to the last commit:
"tipc: reduce risk of user starvation during link congestion"

I examined the crash today, the crash due to array out of bounds for 
skb->cb[48].
The max size allowed for the callback area is 48bytes, whereas the new struct 
tipc_skb_cb is 64 bytes. 
This overrides the skb->destructor callback lying below the 'skb->cb'.
The sizeof struct sk_buff_head itself is 48bytes.

crash> p *(struct sk_buff*)0xffff88003f007600
   :
   dev = 0xffff88003f985000,
   cb = "\000\00\000", 
   _skb_refdst = 0,
   destructor = 0x1000000000000,  << insane function pointer >>

I think the simpler way to place these packets 'pkts' into the backlogq and 
allow
temporary over-committing and keep the wakeup mechanism as it is.

This way, we transmit the packet in tipc_link_advance_backlog() instead of 
doing it in
link_prepare_wakeup().  Its misleading that link_prepare_wakeup() transmits 
packets.

/Partha


On 11/30/2016 07:48 PM, Jon Maloy wrote:
> Weird. Looks like a corrupted incoming buffer directly at startup,
> before any of my new code is active. Is this repeatable?
>
> ///jon
>
>
> On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote:
>> Hi Jon,
>>
>> With your patches, I get the following crash when loading the tipc
>> module. Leaving home now, so couldnt investigate further.
>>
>> [   58.201114] tipc: Started in single node mode
>> [   58.212991] Started in network mode
>> [   58.213796] Own node address <1.1.1>, network identity 4711
>> [   58.238416] 8021q: adding VLAN 0 to HW filter on device data0
>> [   58.252217] 8021q: adding VLAN 0 to HW filter on device data1
>> [   58.270822] Enabled bearer <eth:data0>, discovery domain <1.1.0>,
>> priority 10
>> [   58.571114] general protection fault: 0000 [#1] SMP
>> [   58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel
>> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio
>> [   58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15
>> [   58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [   58.572031] task: ffffffff81c0d540 task.stack: ffffffff81c00000
>> [   58.572031] RIP: 0010:[<ffffffff8162f10d>] [<ffffffff8162f10d>]
>> skb_release_head_state+0x4d/0xa0
>> [   58.572031] RSP: 0018:ffff880037c03ba0  EFLAGS: 00010246
>> [   58.572031] RAX: 0001000000000000 RBX: ffff880033fffa00 RCX:
>> 00000000000000ff
>> [   58.572031] RDX: 0000000000000000 RSI: ffff880037c03bca RDI:
>> ffff880033fffa00
>> [   58.572031] RBP: ffff880037c03ba8 R08: ffffffffa005f2c0 R09:
>> 0000000000000000
>> [   58.572031] R10: ffff880035b0f0a0 R11: ffffea0000000000 R12:
>> ffff880033fffa00
>> [   58.572031] R13: ffffffffa0048fd4 R14: ffffffff81cfbec0 R15:
>> ffff880033718000
>> [   58.572031] FS:  0000000000000000(0000) GS:ffff880037c00000(0000)
>> knlGS:0000000000000000
>> [   58.572031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   58.572031] CR2: 0000000000851bf0 CR3: 0000000035b00000 CR4:
>> 00000000000006f0
>> [   58.572031] Stack:
>> [   58.572031]  ffff880033fffa00 ffff880037c03bc0 ffffffff8162f2b2
>> ffff880033fffa00
>> [   58.572031]  ffff880037c03be8 ffffffff8162f327 ffff880033fffa00
>> 0000000000000000
>> [   58.572031]  ffff880035b32540 ffff880037c03c68 ffffffffa0048fd4
>> 0000000000000082
>> [   58.572031] Call Trace:
>> [   58.572031]  <IRQ> [   58.572031] [<ffffffff8162f2b2>]
>> skb_release_all+0x12/0x30
>> [   58.572031]  [<ffffffff8162f327>] kfree_skb+0x37/0xa0
>> [   58.572031]  [<ffffffffa0048fd4>] tipc_disc_rcv+0x84/0x1d0 [tipc]
>> [   58.572031]  [<ffffffffa0053ddc>] tipc_rcv+0x3ac/0x3c0 [tipc]
>> [   58.572031]  [<ffffffff81093457>] ? find_busiest_group+0x117/0x940
>> [   58.572031]  [<ffffffffa0043088>] tipc_l2_rcv_msg+0x48/0x60 [tipc]
>> [   58.572031]  [<ffffffff81641245>] __netif_receive_skb_core+0x2e5/0xa60
>> [   58.572031]  [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>> [   58.572031]  [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>> [   58.572031]  [<ffffffff81643a8b>] __netif_receive_skb+0x1b/0x70
>> [   58.572031]  [<ffffffff81643b0d>] netif_receive_skb_internal+0x2d/0x90
>> [   58.572031]  [<ffffffff81644494>] napi_gro_receive+0x94/0x130
>> [   58.572031]  [<ffffffffa0019405>] virtnet_receive+0x1a5/0x8a0
>> [virtio_net]
>> [   58.572031]  [<ffffffffa0019b1d>] virtnet_poll+0x1d/0x80 [virtio_net]
>> [   58.572031]  [<ffffffff81644c2e>] net_rx_action+0x20e/0x390
>> [   58.572031]  [<ffffffff8178358b>] __do_softirq+0x9b/0x2a2
>> [   58.572031]  [<ffffffff81062d00>] irq_exit+0x60/0x70
>> [   58.572031]  [<ffffffff81783324>] do_IRQ+0x54/0xd0
>> [   58.572031]  [<ffffffff817817ff>] common_interrupt+0x7f/0x7f
>> [   58.572031]  <EOI> [   58.572031] [<ffffffff817805c0>] ?
>> default_idle+0x20/0xe0
>> [   58.572031]  [<ffffffff8114d439>] ? next_zone+0x29/0x30
>> [   58.572031]  [<ffffffff8102769f>] arch_cpu_idle+0xf/0x20
>> [   58.572031]  [<ffffffff81780a0c>] default_idle_call+0x2c/0x30
>> [   58.572031]  [<ffffffff8109a4d7>] cpu_startup_entry+0x177/0x1e0
>> [   58.572031]  [<ffffffff8177a7f7>] rest_init+0x77/0x80
>> [   58.572031]  [<ffffffff81d5deb5>] start_kernel+0x40e/0x41b
>> [   58.572031]  [<ffffffff81d5d42f>] x86_64_start_reservations+0x2a/0x2c
>> [   58.572031]  [<ffffffff81d5d51b>] x86_64_start_kernel+0xea/0xed
>> [   58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36
>> 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75
>> 30 48 89 df <ff> d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3
>> e8 bb
>> [   58.572031] RIP  [<ffffffff8162f10d>] skb_release_head_state+0x4d/0xa0
>> [   58.572031]  RSP <ffff880037c03ba0>
>> [   58.662814] ---[ end trace fa57695d3ce8757f ]---
>> [   58.663875] Kernel panic - not syncing: Fatal exception in interrupt
>> [   58.664872] Kernel Offset: disabled
>> [   58.664872] ---[ end Kernel panic - not syncing: Fatal exception in
>> interrupt
>>
>> regards
>> Partha
>>
>> On 11/29/2016 06:07 PM, Jon Maloy wrote:
>>> Ying, Partha,
>>> It would be very nice I could get "acked" or "reviewed" on this so I
>>> can send it to David before net-next closes.
>>>
>>> ///jon
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>>>> Sent: Tuesday, 29 November, 2016 12:04
>>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
>>>> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
>>>> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com>
>>>> Cc: ma...@donjonn.com; thompa....@gmail.com
>>>> Subject: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
>>>>
>>>> We fix a very real starvation problem that may occur when the link
>>>> level runs into send buffer congestion. At the same time we make the
>>>> interaction between the socket and link layer simpler and more
>>>> consistent.
>>>>
>>>> v2: - Simplified link congestion check to only check against own
>>>>       importance limit. This reduces the risk of higher levels
>>>>       starving out lower levels.
>>>>
>>>> Jon Maloy (3):
>>>>   tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
>>>>     functions
>>>>   tipc: modify struct tipc_plist to be more versatile
>>>>   tipc: reduce risk of user starvation during link congestion
>>>>
>>>>  net/tipc/bcast.c      |   2 +-
>>>>  net/tipc/link.c       |  81 ++++-----
>>>>  net/tipc/msg.h        |   8 +-
>>>>  net/tipc/name_table.c | 100 +++++++----
>>>>  net/tipc/name_table.h |  21 +--
>>>>  net/tipc/node.c       |   2 +-
>>>>  net/tipc/socket.c     | 450
>>>> ++++++++++++++++++++++----------------------------
>>>>  7 files changed, 327 insertions(+), 337 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>
>

------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to