Re: [tipc-discussion] soft lockup for TIPC

2016-11-21 Thread XIANG Haiming
Hi Jon, I am sorry that we cannot upgrade to a more recent kernel because we follow the Red Hat release. Could you give us a patch for this issue for the 3.x kernel? -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: 22 November 2016 0:37 To: XIANG Haiming; tipc-discussion

Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

2016-11-21 Thread John Thompson
Hi Partha, My test has 4 nodes, 2 of which are alternately rebooting. When the rebooted node rejoins, a few minutes pass and then the other node is rebooted. I am not printing out link stats and believe that the other code is not doing so either, when nodes leave or rejoin. JT On Tue, Nov 2

Re: [tipc-discussion] soft lockup for TIPC

2016-11-21 Thread Jon Maloy
Hi Xiang, Although the version you are using has the same number (I am planning to step it to 3.0.0 soon) as the current one in the latest kernels (4.x), it is a very different species indeed. Almost all code has been rewritten, and in some cases more than once. I would strongly suggest you upgr

[tipc-discussion] [PATCH net v2 2/2] tipc: fix nametbl_lock soft lockup at module exit

2016-11-21 Thread Parthasarathy Bhuvaragan
Commit 333f796235a527 ("tipc: fix a race condition leading to subscriber refcnt bug") reveals a soft lockup while acquiring nametbl_lock. Before commit 333f796235a527, we call tipc_conn_shutdown() from tipc_close_conn() in the context of tipc_topsrv_stop(). In that context, we are allowed to grab

[tipc-discussion] [PATCH net v2 1/2] tipc: fix nametbl_lock soft lockup at node/link events

2016-11-21 Thread Parthasarathy Bhuvaragan
We trigger a soft lockup as we grab nametbl_lock twice if the node has a pending node up/down or link up/down event while: - we process an incoming named message in tipc_named_rcv() and perform a tipc_update_nametbl(). - we have pending backlog items in the name distributor queue during a name

Re: [tipc-discussion] [PATCH net-next 3/3] tipc: reduce risk of user starvation during link congestion

2016-11-21 Thread Jon Maloy
I am having some new doubts about our current link congestion criteria. See below. ///jon > -Original Message- > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > Sent: Monday, 21 November, 2016 09:57 > To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan > ; Ying Xue > ;

[tipc-discussion] [PATCH net-next 3/3] tipc: reduce risk of user starvation during link congestion

2016-11-21 Thread Jon Maloy
The socket code currently handles link congestion by either blocking and trying to send again when the congestion has abated, or just returning to the user with -EAGAIN and letting the user retry later. This mechanism is prone to starvation, because the wakeup algorithm is non-atomic. During the time the

[tipc-discussion] [PATCH net-next 1/3] tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

2016-11-21 Thread Jon Maloy
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very similar. The latter function is also called from two locations, and there will be more in the coming commits, which will all need to test on different conditions. Instead of making yet another duplicate of the function, we n

[tipc-discussion] [PATCH net-next 2/3] tipc: modify struct tipc_plist to be more versatile

2016-11-21 Thread Jon Maloy
During multicast reception we currently use a simple linked list with push/pop semantics to store port numbers. We now see a need for a more generic list for storing values of type u32. We therefore make some modifications to this list, while replacing the prefix 'tipc_plist_' with 'u32_'. We also
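
As a rough illustration of what a push/pop list of u32 values can look like, here is a minimal user-space sketch. It is not the patch's code: the struct and the function names (u32_node, sketch_u32_push, sketch_u32_pop) are hypothetical, and malloc()/free() stand in for whatever allocator the kernel code uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type: one u32 value per list element. */
struct u32_node {
	uint32_t value;
	struct u32_node *next;
};

/* Push a value onto the front of the list. Returns false if allocation fails. */
static bool sketch_u32_push(struct u32_node **head, uint32_t value)
{
	struct u32_node *n = malloc(sizeof(*n));

	if (!n)
		return false;
	n->value = value;
	n->next = *head;
	*head = n;
	return true;
}

/* Pop the most recently pushed value. Returns false if the list is empty. */
static bool sketch_u32_pop(struct u32_node **head, uint32_t *value)
{
	struct u32_node *n = *head;

	if (!n)
		return false;
	*value = n->value;
	*head = n->next;
	free(n);
	return true;
}
```

A caller starts with a NULL head pointer, pushes port numbers (or any other u32 values) as they are found, and pops until sketch_u32_pop() returns false; values come back in last-in, first-out order.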

[tipc-discussion] [PATCH net-next 0/3] tipc: improve interaction socket-link

2016-11-21 Thread Jon Maloy
We fix a very real starvation problem that may occur when the link level runs into send buffer congestion. At the same time we make the interaction between the socket and link layer simpler and more consistent. Jon Maloy (3): tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() fu

Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

2016-11-21 Thread Parthasarathy Bhuvaragan
Hi, There is another branch where a soft lockup for nametbl_lock occurs. tipc_named_rcv() Grabs nametbl_lock tipc_update_nametbl() (publish/withdraw) tipc_node_subscribe()/unsubscribe() tipc_node_write_unlock() << lockup occurs if it needs to process NODE UP/DOWN LINK UP/D

Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

2016-11-21 Thread Parthasarathy Bhuvaragan
Hi, tipc_nametbl_withdraw() triggers the soft lockup as it tries to grab nametbl_lock twice if the node triggered a TIPC_NOTIFY_LINK_DOWN event while it is running. The erroneous call chain is: tipc_nametbl_withdraw() Grab nametbl_lock tipc_named_process_backlog() tipc_update_nametb
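
The pattern described in these messages, a non-recursive lock taken a second time further down the same call chain, can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: a C11 atomic_flag stands in for nametbl_lock, all model_* names are hypothetical, and a trylock is used so that the nested attempt is reported rather than spinning forever as a real spinlock would.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for nametbl_lock: a non-recursive test-and-set lock. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&model_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_lock);
}

/*
 * Models the nested step that needs the same lock again while
 * handling a pending node/link event. Returns true if it would spin.
 */
static bool model_nested_event_would_spin(void)
{
	if (!model_trylock())
		return true;	/* already held: a real spin_lock() never returns */
	model_unlock();
	return false;
}

/*
 * Models the outer path: take the lock, then run the nested step
 * with the lock still held. Returns true if a lockup would occur.
 */
static bool model_withdraw_would_lock_up(void)
{
	bool stuck;

	if (!model_trylock())
		return true;
	stuck = model_nested_event_would_spin();
	model_unlock();
	return stuck;
}
```

Calling model_withdraw_would_lock_up() returns true, because the nested step finds the lock already held by its own caller. Calling model_nested_event_would_spin() on its own, with the lock free, returns false.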