On 12/06/2016 12:41 PM, Jon Maloy wrote: > > >> -----Original Message----- >> From: Parthasarathy Bhuvaragan >> Sent: Tuesday, 06 December, 2016 03:46 >> To: Jon Maloy <jon.ma...@ericsson.com>; Jon Maloy <ma...@donjonn.com>; >> tipc-discussion@lists.sourceforge.net; Ying Xue <ying....@windriver.com> >> Cc: thompa....@gmail.com >> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link >> >> On 12/05/2016 09:59 PM, Jon Maloy wrote: >>> >>> >>>> -----Original Message----- >>>> From: Parthasarathy Bhuvaragan >>>> Sent: Monday, 05 December, 2016 15:11 >>>> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy >> <jon.ma...@ericsson.com>; >>>> tipc-discussion@lists.sourceforge.net; Ying Xue <ying....@windriver.com> >>>> Cc: thompa....@gmail.com >>>> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link >>>> >>>> Hi Jon, >>>> >>>> Sorry for the delay, could not work due to sick child. >>>> >>>> The crash occurs due to the last commit: >>>> "tipc: reduce risk of user starvation during link congestion" >>>> >>>> I examined the crash today, the crash due to array out of bounds for skb- >>> cb[48]. >>>> The max size allowed for the callback area is 48bytes, whereas the new >>>> struct >>>> tipc_skb_cb is 64 bytes. >>> >>> Weird. I did of course test this, and on my system sizeof(tipc_skb_cb) >>> yields 40, >> and everything works flawlessly. >> Jon, you are on 32-bit and iam on 64-bit, thats why the size differs. > > No, my machine is x64 as is yours. I think it is more likely you are > compiling with some debug options that causes struct sk_buff_head to be > bigger. Yes, I compile with CONFIG_DEBUG_SPINLOCK which increases the size of spinlock_t by 16 bytes on 64-bit, and is a memeber of sk_buff_head. This will no longer be an issue as the final solution not exceed that size of skb->cb. > >>> >>>> This overrides the skb->destructor callback lying below the 'skb->cb'. >>>> The sizeof struct sk_buff_head itself is 48bytes. >>>> >>>> crash> p *(struct sk_buff*)0xffff88003f007600 >>>> : >>>> dev = 0xffff88003f985000, >>>> cb = "\000\00\000", >>>> _skb_refdst = 0, >>>> destructor = 0x1000000000000, << insane function pointer >> >>>> >>>> I think the simpler way to place these packets 'pkts' into the backlogq and >> allow >>>> temporary over-committing and keep the wakeup mechanism as it is. >>> >>> You are right. The end result will be the same. I'll change it and recommit. >> As i discussed with you earlier, my implementation on 3.12 track to >> reduce latency followed the following principles: >> - We schedule a user-socket based on the limit settings in the backlog >> queue. Thus the importance setting of a user socket influences the >> scheduling decision of TIPC. > > What is different from what I do here? > >> - Every socket will be permitted to transmit 1 message (the first >> message or after a poll/epoll write event) to the link layer after which >> the above scheduling principles are applied. > > Not sure what you mean. Each socket must be permitted to send as long as > there is space in the queue at its level. Thereafter the above principles are > applied. > We don't want the socket to always stop after one message, just in case there > *might* be somebody else wanting to send. But maybe I misunderstand you. No, I was describing that even if the link is congested, a socket will still be permitted to send a message. The implementation after this patch series will ensure that. No more comments from me and thanks for the patience.
/Partha > >> >> I still think that this ratio of 1:2:3:4 has wide span and penalizes the >> low importance sockets. This is an inverse ratio of user-kernel context >> switches. I feel that this should be user configurable and the default >> ratio should be 1:1.25:1.5:2. > > If your measurements show that this is better I am ok with changing default > ratio. But I think we should leave that for a later patch. Ok, you can skip this in your series and we add it to backlog. > > ///jon > >> >> /partha >>> >>> ///jon >>> >>>> >>>> This way, we transmit the packet in tipc_link_advance_backlog() instead of >> doing >>>> it in >>>> link_prepare_wakeup(). Its misleading that link_prepare_wakeup() transmits >>>> packets. >>>> >>>> /Partha >>>> >>>> >>>> On 11/30/2016 07:48 PM, Jon Maloy wrote: >>>>> Weird. Looks like a corrupted incoming buffer directly at startup, >>>>> before any of my new code is active. Is this repeatable? >>>>> >>>>> ///jon >>>>> >>>>> >>>>> On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote: >>>>>> Hi Jon, >>>>>> >>>>>> With your patches, I get the following crash when loading the tipc >>>>>> module. Leaving home now, so couldnt investigate further. >>>>>> >>>>>> [ 58.201114] tipc: Started in single node mode >>>>>> [ 58.212991] Started in network mode >>>>>> [ 58.213796] Own node address <1.1.1>, network identity 4711 >>>>>> [ 58.238416] 8021q: adding VLAN 0 to HW filter on device data0 >>>>>> [ 58.252217] 8021q: adding VLAN 0 to HW filter on device data1 >>>>>> [ 58.270822] Enabled bearer <eth:data0>, discovery domain <1.1.0>, >>>>>> priority 10 >>>>>> [ 58.571114] general protection fault: 0000 [#1] SMP >>>>>> [ 58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel >>>>>> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio >>>>>> [ 58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15 >>>>>> [ 58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >>>>>> [ 58.572031] task: ffffffff81c0d540 task.stack: ffffffff81c00000 >>>>>> [ 58.572031] RIP: 0010:[<ffffffff8162f10d>] [<ffffffff8162f10d>] >>>>>> skb_release_head_state+0x4d/0xa0 >>>>>> [ 58.572031] RSP: 0018:ffff880037c03ba0 EFLAGS: 00010246 >>>>>> [ 58.572031] RAX: 0001000000000000 RBX: ffff880033fffa00 RCX: >>>>>> 00000000000000ff >>>>>> [ 58.572031] RDX: 0000000000000000 RSI: ffff880037c03bca RDI: >>>>>> ffff880033fffa00 >>>>>> [ 58.572031] RBP: ffff880037c03ba8 R08: ffffffffa005f2c0 R09: >>>>>> 0000000000000000 >>>>>> [ 58.572031] R10: ffff880035b0f0a0 R11: ffffea0000000000 R12: >>>>>> ffff880033fffa00 >>>>>> [ 58.572031] R13: ffffffffa0048fd4 R14: ffffffff81cfbec0 R15: >>>>>> ffff880033718000 >>>>>> [ 58.572031] FS: 0000000000000000(0000) GS:ffff880037c00000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 58.572031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 58.572031] CR2: 0000000000851bf0 CR3: 0000000035b00000 CR4: >>>>>> 00000000000006f0 >>>>>> [ 58.572031] Stack: >>>>>> [ 58.572031] ffff880033fffa00 ffff880037c03bc0 ffffffff8162f2b2 >>>>>> ffff880033fffa00 >>>>>> [ 58.572031] ffff880037c03be8 ffffffff8162f327 ffff880033fffa00 >>>>>> 0000000000000000 >>>>>> [ 58.572031] ffff880035b32540 ffff880037c03c68 ffffffffa0048fd4 >>>>>> 0000000000000082 >>>>>> [ 58.572031] Call Trace: >>>>>> [ 58.572031] <IRQ> [ 58.572031] [<ffffffff8162f2b2>] >>>>>> skb_release_all+0x12/0x30 >>>>>> [ 58.572031] [<ffffffff8162f327>] kfree_skb+0x37/0xa0 >>>>>> [ 58.572031] [<ffffffffa0048fd4>] tipc_disc_rcv+0x84/0x1d0 [tipc] >>>>>> [ 58.572031] [<ffffffffa0053ddc>] tipc_rcv+0x3ac/0x3c0 [tipc] >>>>>> [ 58.572031] [<ffffffff81093457>] ? find_busiest_group+0x117/0x940 >>>>>> [ 58.572031] [<ffffffffa0043088>] tipc_l2_rcv_msg+0x48/0x60 [tipc] >>>>>> [ 58.572031] [<ffffffff81641245>] __netif_receive_skb_core+0x2e5/0xa60 >>>>>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0 >>>>>> [ 58.572031] [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0 >>>>>> [ 58.572031] [<ffffffff81643a8b>] __netif_receive_skb+0x1b/0x70 >>>>>> [ 58.572031] [<ffffffff81643b0d>] netif_receive_skb_internal+0x2d/0x90 >>>>>> [ 58.572031] [<ffffffff81644494>] napi_gro_receive+0x94/0x130 >>>>>> [ 58.572031] [<ffffffffa0019405>] virtnet_receive+0x1a5/0x8a0 >>>>>> [virtio_net] >>>>>> [ 58.572031] [<ffffffffa0019b1d>] virtnet_poll+0x1d/0x80 [virtio_net] >>>>>> [ 58.572031] [<ffffffff81644c2e>] net_rx_action+0x20e/0x390 >>>>>> [ 58.572031] [<ffffffff8178358b>] __do_softirq+0x9b/0x2a2 >>>>>> [ 58.572031] [<ffffffff81062d00>] irq_exit+0x60/0x70 >>>>>> [ 58.572031] [<ffffffff81783324>] do_IRQ+0x54/0xd0 >>>>>> [ 58.572031] [<ffffffff817817ff>] common_interrupt+0x7f/0x7f >>>>>> [ 58.572031] <EOI> [ 58.572031] [<ffffffff817805c0>] ? >>>>>> default_idle+0x20/0xe0 >>>>>> [ 58.572031] [<ffffffff8114d439>] ? next_zone+0x29/0x30 >>>>>> [ 58.572031] [<ffffffff8102769f>] arch_cpu_idle+0xf/0x20 >>>>>> [ 58.572031] [<ffffffff81780a0c>] default_idle_call+0x2c/0x30 >>>>>> [ 58.572031] [<ffffffff8109a4d7>] cpu_startup_entry+0x177/0x1e0 >>>>>> [ 58.572031] [<ffffffff8177a7f7>] rest_init+0x77/0x80 >>>>>> [ 58.572031] [<ffffffff81d5deb5>] start_kernel+0x40e/0x41b >>>>>> [ 58.572031] [<ffffffff81d5d42f>] x86_64_start_reservations+0x2a/0x2c >>>>>> [ 58.572031] [<ffffffff81d5d51b>] x86_64_start_kernel+0xea/0xed >>>>>> [ 58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36 >>>>>> 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75 >>>>>> 30 48 89 df <ff> d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3 >>>>>> e8 bb >>>>>> [ 58.572031] RIP [<ffffffff8162f10d>] skb_release_head_state+0x4d/0xa0 >>>>>> [ 58.572031] RSP <ffff880037c03ba0> >>>>>> [ 58.662814] ---[ end trace fa57695d3ce8757f ]--- >>>>>> [ 58.663875] Kernel panic - not syncing: Fatal exception in interrupt >>>>>> [ 58.664872] Kernel Offset: disabled >>>>>> [ 58.664872] ---[ end Kernel panic - not syncing: Fatal exception in >>>>>> interrupt >>>>>> >>>>>> regards >>>>>> Partha >>>>>> >>>>>> On 11/29/2016 06:07 PM, Jon Maloy wrote: >>>>>>> Ying, Partha, >>>>>>> It would be very nice I could get "acked" or "reviewed" on this so I >>>>>>> can send it to David before net-next closes. >>>>>>> >>>>>>> ///jon >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com] >>>>>>>> Sent: Tuesday, 29 November, 2016 12:04 >>>>>>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan >>>>>>>> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue >>>>>>>> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com> >>>>>>>> Cc: ma...@donjonn.com; thompa....@gmail.com >>>>>>>> Subject: [PATCH net-next v2 0/3] tipc: improve interaction socket-link >>>>>>>> >>>>>>>> We fix a very real starvation problem that may occur when the link >>>>>>>> level runs into send buffer congestion. At the same time we make the >>>>>>>> interaction between the socket and link layer simpler and more >>>>>>>> consistent. >>>>>>>> >>>>>>>> v2: - Simplified link congestion check to only check against own >>>>>>>> importance limit. This reduces the risk of higher levels >>>>>>>> starving out lower levels. >>>>>>>> >>>>>>>> Jon Maloy (3): >>>>>>>> tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() >>>>>>>> functions >>>>>>>> tipc: modify struct tipc_plist to be more versatile >>>>>>>> tipc: reduce risk of user starvation during link congestion >>>>>>>> >>>>>>>> net/tipc/bcast.c | 2 +- >>>>>>>> net/tipc/link.c | 81 ++++----- >>>>>>>> net/tipc/msg.h | 8 +- >>>>>>>> net/tipc/name_table.c | 100 +++++++---- >>>>>>>> net/tipc/name_table.h | 21 +-- >>>>>>>> net/tipc/node.c | 2 +- >>>>>>>> net/tipc/socket.c | 450 >>>>>>>> ++++++++++++++++++++++---------------------------- >>>>>>>> 7 files changed, 327 insertions(+), 337 deletions(-) >>>>>>>> >>>>>>>> -- >>>>>>>> 2.7.4 >>>>>>> >>>>> >>> ------------------------------------------------------------------------------ Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon Phi processor-based developer platforms. With one year of Intel Parallel Studio XE. Training and support from Colfax. Order your platform today.http://sdm.link/xeonphi _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion