Hi Jon, I think the issue now is not minor as I concluded. Support we have a huge cluster 75 VMS node, this issue will caused traffic performance degradation. So, please take time to review the patch.
Thanks, Hoang -----Original Message----- From: Hoang Le <[email protected]> Sent: Monday, November 5, 2018 3:29 PM To: [email protected]; [email protected]; [email protected]; [email protected] Subject: [tipc-discussion] [net-next] tipc: fix node keep alive interval calculation When setting LINK tolerance, node timer interval will be calculated base on the LINK with lowest tolerance. But when calculated, the old node timer interval only updated if current setting value (tolerance/4) less than old ones regardless of number of links as well as links' lowest tolerance value. This caused to two cases missing if tolerance changed as following: Case 1: 1.1/ There is one link (L1) available in the system 1.2/ Set L1's tolerance from 1500ms => lower (i.e 500ms) 1.3/ Then, fallback to default (1500ms) or higher (i.e 2000ms) Expected: node timer interval is 1500/4=375ms after 1.3 Result: node timer interval will not being updated after changing tolerance at 1.3 since its value 1500/4=375ms is not less than 500/4=125ms at 1.2. Case 2: 2.1/ There are two links (L1, L2) available in the system 2.2/ L1 and L2 tolerance value are 2000ms as initial 2.3/ Set L2's tolerance from 2000ms => lower 1500ms 2.4/ Disable link L2 (bring down its bearer) Expected: node timer interval is 2000ms/4=500ms after 2.4 Result: node timer interval will not being updated after disabling L2 since its value 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3 although L2 is already not available in the system. To fix this, we will update node timer interval according to link tolerance value setting if there is only one link available and recalculate the link with lowest tolerance for each iteration checking. Signed-off-by: Hoang Le <[email protected]> --- net/tipc/node.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/tipc/node.c b/net/tipc/node.c index 2afc4f8c37a7..a0e5b2e2720d 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -437,13 +437,14 @@ static struct tipc_node *tipc_node_create(struct net *net, u32 addr, return n; } -static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link *l) +static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link *l, + bool first_link) { unsigned long tol = tipc_link_tolerance(l); unsigned long intv = ((tol / 4) > 500) ? 500 : tol / 4; /* Link with lowest tolerance determines timer interval */ - if (intv < n->keepalive_intv) + if (intv < n->keepalive_intv || first_link || n->link_cnt == 1) n->keepalive_intv = intv; /* Ensure link's abort limit corresponds to current tolerance */ @@ -627,7 +628,9 @@ static void tipc_node_timeout(struct timer_list *t) if (le->link) { spin_lock_bh(&le->lock); /* Link tolerance may change asynchronously: */ - tipc_node_calculate_timer(n, le->link); + tipc_node_calculate_timer(n, le->link, + (bearer_id == 0) ? + true : false); rc = tipc_link_timeout(le->link, &xmitq); spin_unlock_bh(&le->lock); remains--; @@ -1018,7 +1021,8 @@ void tipc_node_check_dest(struct net *net, u32 addr, tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT); le->link = l; n->link_cnt++; - tipc_node_calculate_timer(n, l); + tipc_node_calculate_timer(n, l, n->link_cnt == 1 ? + true : false); if (n->link_cnt == 1) { intv = jiffies + msecs_to_jiffies(n->keepalive_intv); if (!mod_timer(&n->timer, intv)) -- 2.17.1 _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
