On 12/05/2016 09:59 PM, Jon Maloy wrote: > > >> -----Original Message----- >> From: Parthasarathy Bhuvaragan >> Sent: Monday, 05 December, 2016 15:11 >> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>; >> tipc-discussion@lists.sourceforge.net; Ying Xue <ying....@windriver.com> >> Cc: thompa....@gmail.com >> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link >> >> Hi Jon, >> >> Sorry for the delay, could not work due to sick child. >> >> The crash occurs due to the last commit: >> "tipc: reduce risk of user starvation during link congestion" >> >> I examined the crash today, the crash due to array out of bounds for >> skb->cb[48]. >> The max size allowed for the callback area is 48bytes, whereas the new struct >> tipc_skb_cb is 64 bytes. > > Weird. I did of course test this, and on my system sizeof(tipc_skb_cb) yields > 40, and everything works flawlessly. Jon, you are on 32-bit and iam on 64-bit, thats why the size differs. > >> This overrides the skb->destructor callback lying below the 'skb->cb'. >> The sizeof struct sk_buff_head itself is 48bytes. >> >> crash> p *(struct sk_buff*)0xffff88003f007600 >> : >> dev = 0xffff88003f985000, >> cb = "\000\00\000", >> _skb_refdst = 0, >> destructor = 0x1000000000000, << insane function pointer >> >> >> I think the simpler way to place these packets 'pkts' into the backlogq and >> allow >> temporary over-committing and keep the wakeup mechanism as it is. > > You are right. The end result will be the same. I'll change it and recommit. As i discussed with you earlier, my implementation on 3.12 track to reduce latency followed the following principles: - We schedule a user-socket based on the limit settings in the backlog queue. Thus the importance setting of a user socket influences the scheduling decision of TIPC. - Every socket will be permitted to transmit 1 message (the first message or after a poll/epoll write event) to the link layer after which the above scheduling principles are applied.
I still think that this ratio of 1:2:3:4 has wide span and penalizes the low importance sockets. This is an inverse ratio of user-kernel context switches. I feel that this should be user configurable and the default ratio should be 1:1.25:1.5:2. /partha > > ///jon > >> >> This way, we transmit the packet in tipc_link_advance_backlog() instead of >> doing >> it in >> link_prepare_wakeup(). Its misleading that link_prepare_wakeup() transmits >> packets. >> >> /Partha >> >> >> On 11/30/2016 07:48 PM, Jon Maloy wrote: >>> Weird. Looks like a corrupted incoming buffer directly at startup, >>> before any of my new code is active. Is this repeatable? >>> >>> ///jon >>> >>> >>> On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote: >>>> Hi Jon, >>>> >>>> With your patches, I get the following crash when loading the tipc >>>> module. Leaving home now, so couldnt investigate further. >>>> >>>> [ 58.201114] tipc: Started in single node mode >>>> [ 58.212991] Started in network mode >>>> [ 58.213796] Own node address <1.1.1>, network identity 4711 >>>> [ 58.238416] 8021q: adding VLAN 0 to HW filter on device data0 >>>> [ 58.252217] 8021q: adding VLAN 0 to HW filter on device data1 >>>> [ 58.270822] Enabled bearer <eth:data0>, discovery domain <1.1.0>, >>>> priority 10 >>>> [ 58.571114] general protection fault: 0000 [#1] SMP >>>> [ 58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel >>>> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio >>>> [ 58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15 >>>> [ 58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >>>> [ 58.572031] task: ffffffff81c0d540 task.stack: ffffffff81c00000 >>>> [ 58.572031] RIP: 0010:[<ffffffff8162f10d>] [<ffffffff8162f10d>] >>>> skb_release_head_state+0x4d/0xa0 >>>> [ 58.572031] RSP: 0018:ffff880037c03ba0 EFLAGS: 00010246 >>>> [ 58.572031] RAX: 0001000000000000 RBX: ffff880033fffa00 RCX: >>>> 00000000000000ff >>>> [ 58.572031] RDX: 0000000000000000 RSI: ffff880037c03bca RDI: >>>> ffff880033fffa00 >>>> [ 58.572031] RBP: ffff880037c03ba8 R08: ffffffffa005f2c0 R09: >>>> 0000000000000000 >>>> [ 58.572031] R10: ffff880035b0f0a0 R11: ffffea0000000000 R12: >>>> ffff880033fffa00 >>>> [ 58.572031] R13: ffffffffa0048fd4 R14: ffffffff81cfbec0 R15: >>>> ffff880033718000 >>>> [ 58.572031] FS: 0000000000000000(0000) GS:ffff880037c00000(0000) >>>> knlGS:0000000000000000 >>>> [ 58.572031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 58.572031] CR2: 0000000000851bf0 CR3: 0000000035b00000 CR4: >>>> 00000000000006f0 >>>> [ 58.572031] Stack: >>>> [ 58.572031] ffff880033fffa00 ffff880037c03bc0 ffffffff8162f2b2 >>>> ffff880033fffa00 >>>> [ 58.572031] ffff880037c03be8 ffffffff8162f327 ffff880033fffa00 >>>> 0000000000000000 >>>> [ 58.572031] ffff880035b32540 ffff880037c03c68 ffffffffa0048fd4 >>>> 0000000000000082 >>>> [ 58.572031] Call Trace: >>>> [ 58.572031] <IRQ> [ 58.572031] [<ffffffff8162f2b2>] >>>> skb_release_all+0x12/0x30 >>>> [ 58.572031] [<ffffffff8162f327>] kfree_skb+0x37/0xa0 >>>> [ 58.572031] [<ffffffffa0048fd4>] tipc_disc_rcv+0x84/0x1d0 [tipc] >>>> [ 58.572031] [<ffffffffa0053ddc>] tipc_rcv+0x3ac/0x3c0 [tipc] >>>> [ 58.572031] [<ffffffff81093457>] ? find_busiest_group+0x117/0x940 >>>> [ 58.572031] [<ffffffffa0043088>] tipc_l2_rcv_msg+0x48/0x60 [tipc] >>>> [ 58.572031] [<ffffffff81641245>] __netif_receive_skb_core+0x2e5/0xa60 >>>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0 >>>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0 >>>> [ 58.572031] [<ffffffff81643a8b>] __netif_receive_skb+0x1b/0x70 >>>> [ 58.572031] [<ffffffff81643b0d>] netif_receive_skb_internal+0x2d/0x90 >>>> [ 58.572031] [<ffffffff81644494>] napi_gro_receive+0x94/0x130 >>>> [ 58.572031] [<ffffffffa0019405>] virtnet_receive+0x1a5/0x8a0 >>>> [virtio_net] >>>> [ 58.572031] [<ffffffffa0019b1d>] virtnet_poll+0x1d/0x80 [virtio_net] >>>> [ 58.572031] [<ffffffff81644c2e>] net_rx_action+0x20e/0x390 >>>> [ 58.572031] [<ffffffff8178358b>] __do_softirq+0x9b/0x2a2 >>>> [ 58.572031] [<ffffffff81062d00>] irq_exit+0x60/0x70 >>>> [ 58.572031] [<ffffffff81783324>] do_IRQ+0x54/0xd0 >>>> [ 58.572031] [<ffffffff817817ff>] common_interrupt+0x7f/0x7f >>>> [ 58.572031] <EOI> [ 58.572031] [<ffffffff817805c0>] ? >>>> default_idle+0x20/0xe0 >>>> [ 58.572031] [<ffffffff8114d439>] ? next_zone+0x29/0x30 >>>> [ 58.572031] [<ffffffff8102769f>] arch_cpu_idle+0xf/0x20 >>>> [ 58.572031] [<ffffffff81780a0c>] default_idle_call+0x2c/0x30 >>>> [ 58.572031] [<ffffffff8109a4d7>] cpu_startup_entry+0x177/0x1e0 >>>> [ 58.572031] [<ffffffff8177a7f7>] rest_init+0x77/0x80 >>>> [ 58.572031] [<ffffffff81d5deb5>] start_kernel+0x40e/0x41b >>>> [ 58.572031] [<ffffffff81d5d42f>] x86_64_start_reservations+0x2a/0x2c >>>> [ 58.572031] [<ffffffff81d5d51b>] x86_64_start_kernel+0xea/0xed >>>> [ 58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36 >>>> 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75 >>>> 30 48 89 df <ff> d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3 >>>> e8 bb >>>> [ 58.572031] RIP [<ffffffff8162f10d>] skb_release_head_state+0x4d/0xa0 >>>> [ 58.572031] RSP <ffff880037c03ba0> >>>> [ 58.662814] ---[ end trace fa57695d3ce8757f ]--- >>>> [ 58.663875] Kernel panic - not syncing: Fatal exception in interrupt >>>> [ 58.664872] Kernel Offset: disabled >>>> [ 58.664872] ---[ end Kernel panic - not syncing: Fatal exception in >>>> interrupt >>>> >>>> regards >>>> Partha >>>> >>>> On 11/29/2016 06:07 PM, Jon Maloy wrote: >>>>> Ying, Partha, >>>>> It would be very nice I could get "acked" or "reviewed" on this so I >>>>> can send it to David before net-next closes. >>>>> >>>>> ///jon >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com] >>>>>> Sent: Tuesday, 29 November, 2016 12:04 >>>>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan >>>>>> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue >>>>>> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com> >>>>>> Cc: ma...@donjonn.com; thompa....@gmail.com >>>>>> Subject: [PATCH net-next v2 0/3] tipc: improve interaction socket-link >>>>>> >>>>>> We fix a very real starvation problem that may occur when the link >>>>>> level runs into send buffer congestion. At the same time we make the >>>>>> interaction between the socket and link layer simpler and more >>>>>> consistent. >>>>>> >>>>>> v2: - Simplified link congestion check to only check against own >>>>>> importance limit. This reduces the risk of higher levels >>>>>> starving out lower levels. >>>>>> >>>>>> Jon Maloy (3): >>>>>> tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() >>>>>> functions >>>>>> tipc: modify struct tipc_plist to be more versatile >>>>>> tipc: reduce risk of user starvation during link congestion >>>>>> >>>>>> net/tipc/bcast.c | 2 +- >>>>>> net/tipc/link.c | 81 ++++----- >>>>>> net/tipc/msg.h | 8 +- >>>>>> net/tipc/name_table.c | 100 +++++++---- >>>>>> net/tipc/name_table.h | 21 +-- >>>>>> net/tipc/node.c | 2 +- >>>>>> net/tipc/socket.c | 450 >>>>>> ++++++++++++++++++++++---------------------------- >>>>>> 7 files changed, 327 insertions(+), 337 deletions(-) >>>>>> >>>>>> -- >>>>>> 2.7.4 >>>>> >>> > ------------------------------------------------------------------------------ Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon Phi processor-based developer platforms. With one year of Intel Parallel Studio XE. Training and support from Colfax. Order your platform today.http://sdm.link/xeonphi _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion