Hi Hoang, See below.
> -----Original Message----- > From: Hoang Le <[email protected]> > Sent: 15-Nov-18 04:17 > To: [email protected]; Jon Maloy > <[email protected]>; [email protected]; [email protected] > Subject: RE: [tipc-discussion] [net-next] tipc: fix node keep alive interval > calculation > > Hi Jon, > > I think the issue now is not minor as I concluded. Support we have a huge > cluster 75 VMS node, this issue will caused traffic performance degradation. > So, please take time to review the patch. > > Thanks, > Hoang > -----Original Message----- > From: Hoang Le <[email protected]> > Sent: Monday, November 5, 2018 3:29 PM > To: [email protected]; [email protected]; > [email protected]; [email protected] > Subject: [tipc-discussion] [net-next] tipc: fix node keep alive interval > calculation > > When setting LINK tolerance, node timer interval will be calculated base on > the LINK with lowest tolerance. > > But when calculated, the old node timer interval only updated if current > setting value (tolerance/4) less than old ones regardless of number of links > as > well as links' lowest tolerance value. > > This caused to two cases missing if tolerance changed as following: > Case 1: > 1.1/ There is one link (L1) available in the system 1.2/ Set L1's tolerance > from > 1500ms => lower (i.e 500ms) 1.3/ Then, fallback to default (1500ms) or higher > (i.e 2000ms) > > Expected: > node timer interval is 1500/4=375ms after 1.3 > > Result: > node timer interval will not being updated after changing tolerance at 1.3 > since its value 1500/4=375ms is not less than 500/4=125ms at 1.2. > Yes, this is not good. > Case 2: > 2.1/ There are two links (L1, L2) available in the system 2.2/ L1 and L2 > tolerance value are 2000ms as initial 2.3/ Set L2's tolerance from 2000ms => > lower 1500ms 2.4/ Disable link L2 (bring down its bearer) If we take down the bearer L2 should be deleted, and its value ignored during further calculations. > > Expected: > node timer interval is 2000ms/4=500ms after 2.4 > Similar issue as above. > Result: > node timer interval will not being updated after disabling L2 since its value > 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3 although L2 is > already not available in the system. > > To fix this, we will update node timer interval according to link tolerance > value setting if there is only one link available and recalculate the link > with > lowest tolerance for each iteration checking. I think you can do this much simpler: Just set n->keepalive_intv to a high value, e.g., 10000 ms, before you enter the loop. Then, in any iteration where a lower value is calculated (i.e., inclusive the first one) n->keepalive_intv will be updated accordingly. No more changes needed 😉 Regards ///jon > > Signed-off-by: Hoang Le <[email protected]> > --- > net/tipc/node.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/net/tipc/node.c b/net/tipc/node.c index > 2afc4f8c37a7..a0e5b2e2720d 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -437,13 +437,14 @@ static struct tipc_node *tipc_node_create(struct > net *net, u32 addr, > return n; > } > > -static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link > *l) > +static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link > *l, > + bool first_link) > { > unsigned long tol = tipc_link_tolerance(l); > unsigned long intv = ((tol / 4) > 500) ? 500 : tol / 4; > > /* Link with lowest tolerance determines timer interval */ > - if (intv < n->keepalive_intv) > + if (intv < n->keepalive_intv || first_link || n->link_cnt == 1) > n->keepalive_intv = intv; > > /* Ensure link's abort limit corresponds to current tolerance */ @@ - > 627,7 +628,9 @@ static void tipc_node_timeout(struct timer_list *t) > if (le->link) { > spin_lock_bh(&le->lock); > /* Link tolerance may change asynchronously: */ > - tipc_node_calculate_timer(n, le->link); > + tipc_node_calculate_timer(n, le->link, > + (bearer_id == 0) ? > + true : false); > rc = tipc_link_timeout(le->link, &xmitq); > spin_unlock_bh(&le->lock); > remains--; > @@ -1018,7 +1021,8 @@ void tipc_node_check_dest(struct net *net, u32 > addr, > tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT); > le->link = l; > n->link_cnt++; > - tipc_node_calculate_timer(n, l); > + tipc_node_calculate_timer(n, l, n->link_cnt == 1 ? > + true : false); > if (n->link_cnt == 1) { > intv = jiffies + msecs_to_jiffies(n->keepalive_intv); > if (!mod_timer(&n->timer, intv)) > -- > 2.17.1 > > > > _______________________________________________ > tipc-discussion mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/tipc-discussion _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
