Re: [tipc-discussion] [PATCH net-next v2 0/3] tipc: improve interaction socket-link

Parthasarathy Bhuvaragan Tue, 06 Dec 2016 04:50:32 -0800


On 12/06/2016 12:41 PM, Jon Maloy wrote:
>
>
>> -----Original Message-----
>> From: Parthasarathy Bhuvaragan
>> Sent: Tuesday, 06 December, 2016 03:46
>> To: Jon Maloy <jon.ma...@ericsson.com>; Jon Maloy <ma...@donjonn.com>;
>> tipc-discussion@lists.sourceforge.net; Ying Xue <ying....@windriver.com>
>> Cc: thompa....@gmail.com
>> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
>>
>> On 12/05/2016 09:59 PM, Jon Maloy wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Parthasarathy Bhuvaragan
>>>> Sent: Monday, 05 December, 2016 15:11
>>>> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy
>> <jon.ma...@ericsson.com>;
>>>> tipc-discussion@lists.sourceforge.net; Ying Xue <ying....@windriver.com>
>>>> Cc: thompa....@gmail.com
>>>> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
>>>>
>>>> Hi Jon,
>>>>
>>>> Sorry for the delay, could not work due to sick child.
>>>>
>>>> The crash occurs due to the last commit:
>>>> "tipc: reduce risk of user starvation during link congestion"
>>>>
>>>> I examined the crash today, the crash due to array out of bounds for skb-
>>> cb[48].
>>>> The max size allowed for the callback area is 48bytes, whereas the new 
>>>> struct
>>>> tipc_skb_cb is 64 bytes.
>>>
>>> Weird. I did of course test this, and on my system sizeof(tipc_skb_cb) 
>>> yields 40,
>> and everything works flawlessly.
>> Jon, you are on 32-bit and iam on 64-bit, thats why the size differs.
>
> No, my machine is x64 as is yours. I think it is more likely you are 
> compiling with some debug options that causes struct sk_buff_head to be 
> bigger.
Yes, I compile with CONFIG_DEBUG_SPINLOCK which increases the size of 
spinlock_t by 16 bytes on 64-bit, and is a memeber of sk_buff_head.
This will no longer be an issue as the final solution not exceed that 
size of skb->cb.
>
>>>
>>>> This overrides the skb->destructor callback lying below the 'skb->cb'.
>>>> The sizeof struct sk_buff_head itself is 48bytes.
>>>>
>>>> crash> p *(struct sk_buff*)0xffff88003f007600
>>>>    :
>>>>    dev = 0xffff88003f985000,
>>>>    cb = "\000\00\000",
>>>>    _skb_refdst = 0,
>>>>    destructor = 0x1000000000000,  << insane function pointer >>
>>>>
>>>> I think the simpler way to place these packets 'pkts' into the backlogq and
>> allow
>>>> temporary over-committing and keep the wakeup mechanism as it is.
>>>
>>> You are right. The end result will be the same. I'll change it and recommit.
>> As i discussed with you earlier, my implementation on 3.12 track to
>> reduce latency followed the following principles:
>> - We schedule a user-socket based on the limit settings in the backlog
>> queue. Thus the importance setting of a user socket influences the
>> scheduling decision of TIPC.
>
> What is different from  what I do here?
>
>> - Every socket will be permitted to transmit 1 message (the first
>> message or after a poll/epoll write event) to the link layer after which
>> the above scheduling principles are applied.
>
> Not sure what you mean. Each socket must be permitted to send as long as 
> there is space in the queue at its level. Thereafter the above principles are 
> applied.
> We don't want the socket to always stop after one message, just in case there 
> *might* be somebody else wanting to send. But maybe I misunderstand you.
No, I was describing that even if the link is congested, a socket will 
still be permitted to send a message.
The implementation after this patch series will ensure that.
No more comments from me and thanks for the patience.


/Partha
>
>>
>> I still think that this ratio of 1:2:3:4 has wide span and penalizes the
>> low importance sockets. This is an inverse ratio of user-kernel context
>> switches. I feel that this should be user configurable and the default
>> ratio should be 1:1.25:1.5:2.
>
> If your measurements show that this is better I am ok with changing default 
> ratio. But I think we should leave that for a later patch.
Ok, you can skip this in your series and we add it to backlog.
>
> ///jon
>
>>
>> /partha
>>>
>>> ///jon
>>>
>>>>
>>>> This way, we transmit the packet in tipc_link_advance_backlog() instead of
>> doing
>>>> it in
>>>> link_prepare_wakeup().  Its misleading that link_prepare_wakeup() transmits
>>>> packets.
>>>>
>>>> /Partha
>>>>
>>>>
>>>> On 11/30/2016 07:48 PM, Jon Maloy wrote:
>>>>> Weird. Looks like a corrupted incoming buffer directly at startup,
>>>>> before any of my new code is active. Is this repeatable?
>>>>>
>>>>> ///jon
>>>>>
>>>>>
>>>>> On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote:
>>>>>> Hi Jon,
>>>>>>
>>>>>> With your patches, I get the following crash when loading the tipc
>>>>>> module. Leaving home now, so couldnt investigate further.
>>>>>>
>>>>>> [   58.201114] tipc: Started in single node mode
>>>>>> [   58.212991] Started in network mode
>>>>>> [   58.213796] Own node address <1.1.1>, network identity 4711
>>>>>> [   58.238416] 8021q: adding VLAN 0 to HW filter on device data0
>>>>>> [   58.252217] 8021q: adding VLAN 0 to HW filter on device data1
>>>>>> [   58.270822] Enabled bearer <eth:data0>, discovery domain <1.1.0>,
>>>>>> priority 10
>>>>>> [   58.571114] general protection fault: 0000 [#1] SMP
>>>>>> [   58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel
>>>>>> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio
>>>>>> [   58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15
>>>>>> [   58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>>>> [   58.572031] task: ffffffff81c0d540 task.stack: ffffffff81c00000
>>>>>> [   58.572031] RIP: 0010:[<ffffffff8162f10d>] [<ffffffff8162f10d>]
>>>>>> skb_release_head_state+0x4d/0xa0
>>>>>> [   58.572031] RSP: 0018:ffff880037c03ba0  EFLAGS: 00010246
>>>>>> [   58.572031] RAX: 0001000000000000 RBX: ffff880033fffa00 RCX:
>>>>>> 00000000000000ff
>>>>>> [   58.572031] RDX: 0000000000000000 RSI: ffff880037c03bca RDI:
>>>>>> ffff880033fffa00
>>>>>> [   58.572031] RBP: ffff880037c03ba8 R08: ffffffffa005f2c0 R09:
>>>>>> 0000000000000000
>>>>>> [   58.572031] R10: ffff880035b0f0a0 R11: ffffea0000000000 R12:
>>>>>> ffff880033fffa00
>>>>>> [   58.572031] R13: ffffffffa0048fd4 R14: ffffffff81cfbec0 R15:
>>>>>> ffff880033718000
>>>>>> [   58.572031] FS:  0000000000000000(0000) GS:ffff880037c00000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [   58.572031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [   58.572031] CR2: 0000000000851bf0 CR3: 0000000035b00000 CR4:
>>>>>> 00000000000006f0
>>>>>> [   58.572031] Stack:
>>>>>> [   58.572031]  ffff880033fffa00 ffff880037c03bc0 ffffffff8162f2b2
>>>>>> ffff880033fffa00
>>>>>> [   58.572031]  ffff880037c03be8 ffffffff8162f327 ffff880033fffa00
>>>>>> 0000000000000000
>>>>>> [   58.572031]  ffff880035b32540 ffff880037c03c68 ffffffffa0048fd4
>>>>>> 0000000000000082
>>>>>> [   58.572031] Call Trace:
>>>>>> [   58.572031]  <IRQ> [   58.572031] [<ffffffff8162f2b2>]
>>>>>> skb_release_all+0x12/0x30
>>>>>> [   58.572031]  [<ffffffff8162f327>] kfree_skb+0x37/0xa0
>>>>>> [   58.572031]  [<ffffffffa0048fd4>] tipc_disc_rcv+0x84/0x1d0 [tipc]
>>>>>> [   58.572031]  [<ffffffffa0053ddc>] tipc_rcv+0x3ac/0x3c0 [tipc]
>>>>>> [   58.572031]  [<ffffffff81093457>] ? find_busiest_group+0x117/0x940
>>>>>> [   58.572031]  [<ffffffffa0043088>] tipc_l2_rcv_msg+0x48/0x60 [tipc]
>>>>>> [   58.572031]  [<ffffffff81641245>] __netif_receive_skb_core+0x2e5/0xa60
>>>>>> [   58.572031]  [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>>>>>> [   58.572031]  [<ffffffff816360ba>] ? __build_skb+0x2a/0xe0
>>>>>> [   58.572031]  [<ffffffff81643a8b>] __netif_receive_skb+0x1b/0x70
>>>>>> [   58.572031]  [<ffffffff81643b0d>] netif_receive_skb_internal+0x2d/0x90
>>>>>> [   58.572031]  [<ffffffff81644494>] napi_gro_receive+0x94/0x130
>>>>>> [   58.572031]  [<ffffffffa0019405>] virtnet_receive+0x1a5/0x8a0
>>>>>> [virtio_net]
>>>>>> [   58.572031]  [<ffffffffa0019b1d>] virtnet_poll+0x1d/0x80 [virtio_net]
>>>>>> [   58.572031]  [<ffffffff81644c2e>] net_rx_action+0x20e/0x390
>>>>>> [   58.572031]  [<ffffffff8178358b>] __do_softirq+0x9b/0x2a2
>>>>>> [   58.572031]  [<ffffffff81062d00>] irq_exit+0x60/0x70
>>>>>> [   58.572031]  [<ffffffff81783324>] do_IRQ+0x54/0xd0
>>>>>> [   58.572031]  [<ffffffff817817ff>] common_interrupt+0x7f/0x7f
>>>>>> [   58.572031]  <EOI> [   58.572031] [<ffffffff817805c0>] ?
>>>>>> default_idle+0x20/0xe0
>>>>>> [   58.572031]  [<ffffffff8114d439>] ? next_zone+0x29/0x30
>>>>>> [   58.572031]  [<ffffffff8102769f>] arch_cpu_idle+0xf/0x20
>>>>>> [   58.572031]  [<ffffffff81780a0c>] default_idle_call+0x2c/0x30
>>>>>> [   58.572031]  [<ffffffff8109a4d7>] cpu_startup_entry+0x177/0x1e0
>>>>>> [   58.572031]  [<ffffffff8177a7f7>] rest_init+0x77/0x80
>>>>>> [   58.572031]  [<ffffffff81d5deb5>] start_kernel+0x40e/0x41b
>>>>>> [   58.572031]  [<ffffffff81d5d42f>] x86_64_start_reservations+0x2a/0x2c
>>>>>> [   58.572031]  [<ffffffff81d5d51b>] x86_64_start_kernel+0xea/0xed
>>>>>> [   58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36
>>>>>> 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75
>>>>>> 30 48 89 df <ff> d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3
>>>>>> e8 bb
>>>>>> [   58.572031] RIP  [<ffffffff8162f10d>] skb_release_head_state+0x4d/0xa0
>>>>>> [   58.572031]  RSP <ffff880037c03ba0>
>>>>>> [   58.662814] ---[ end trace fa57695d3ce8757f ]---
>>>>>> [   58.663875] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>> [   58.664872] Kernel Offset: disabled
>>>>>> [   58.664872] ---[ end Kernel panic - not syncing: Fatal exception in
>>>>>> interrupt
>>>>>>
>>>>>> regards
>>>>>> Partha
>>>>>>
>>>>>> On 11/29/2016 06:07 PM, Jon Maloy wrote:
>>>>>>> Ying, Partha,
>>>>>>> It would be very nice I could get "acked" or "reviewed" on this so I
>>>>>>> can send it to David before net-next closes.
>>>>>>>
>>>>>>> ///jon
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>>>>>>>> Sent: Tuesday, 29 November, 2016 12:04
>>>>>>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
>>>>>>>> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
>>>>>>>> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com>
>>>>>>>> Cc: ma...@donjonn.com; thompa....@gmail.com
>>>>>>>> Subject: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
>>>>>>>>
>>>>>>>> We fix a very real starvation problem that may occur when the link
>>>>>>>> level runs into send buffer congestion. At the same time we make the
>>>>>>>> interaction between the socket and link layer simpler and more
>>>>>>>> consistent.
>>>>>>>>
>>>>>>>> v2: - Simplified link congestion check to only check against own
>>>>>>>>       importance limit. This reduces the risk of higher levels
>>>>>>>>       starving out lower levels.
>>>>>>>>
>>>>>>>> Jon Maloy (3):
>>>>>>>>   tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
>>>>>>>>     functions
>>>>>>>>   tipc: modify struct tipc_plist to be more versatile
>>>>>>>>   tipc: reduce risk of user starvation during link congestion
>>>>>>>>
>>>>>>>>  net/tipc/bcast.c      |   2 +-
>>>>>>>>  net/tipc/link.c       |  81 ++++-----
>>>>>>>>  net/tipc/msg.h        |   8 +-
>>>>>>>>  net/tipc/name_table.c | 100 +++++++----
>>>>>>>>  net/tipc/name_table.h |  21 +--
>>>>>>>>  net/tipc/node.c       |   2 +-
>>>>>>>>  net/tipc/socket.c     | 450
>>>>>>>> ++++++++++++++++++++++----------------------------
>>>>>>>>  7 files changed, 327 insertions(+), 337 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.7.4
>>>>>>>
>>>>>
>>>

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [PATCH net-next v2 0/3] tipc: improve interaction socket-link

Reply via email to