-----Original Message-----
From: Ying Xue [mailto:[email protected]]
Sent: Monday, May 28, 2018 5:54 AM
To: Mohan Krishna Ghanta Krishnamurthy <[email protected]>; [email protected]; Jon Maloy <[email protected]>; [email protected]
Subject: Re: [net-next 1/1] tipc: Auto removal of peer down node instance
> On 05/23/2018 10:38 PM, GhantaKrishnamurthy MohanKrishna wrote:
>> A peer node is considered down if there are no active links to it or
>> contact with it has been lost. In the current implementation, a peer
>> node instance is deleted only if
>> a) the TIPC module is removed, or
>> b) an application uses the netlink/iproute2 interface to delete that
>>    specific down node.
>>
>> Thus, a down node instance lives in the system forever, unless the
>> application explicitly removes it.
>
> Per my understanding, this is not a flaw; instead, the current behavior
> might be a deliberate design. With the current design, once a lost node
> recovers from the dead state within a short period, it is not necessary
> to allocate its corresponding node object again. As far as I remember,
> this behavior has existed for a very long time, since at least 1.7.X.

It is a deliberate design - an optimization - but one made at a time when
the assumptions were different from what they are now. Bare-metal clusters
used to be much more static than today's VM- or container-based clusters,
so if a node disappeared, we could assume that this was either a crash or
a deliberate restart, and that the node would soon be back. In a
cloud-based cluster, a disappeared node is much more likely to be the
result of a scale-in, and it will never come back - at least not with the
same identity.

I think a 5-minute delay is a reasonable compromise to solve this
problem: if a node crashes or is restarted, it is likely to be back
within 5 minutes; otherwise it has most likely been removed permanently.
Also be aware that nothing is lost with this approach - if a crashed node
comes back after more than 5 minutes, it will still reconnect as
expected.
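To put a number on it: NODE_CLEANUP_AFTER in the patch below is 0x493E0,
which is 300000 ms, i.e. exactly 5 minutes. As a minimal user-space
sketch of the decision the cleanup makes - struct node, lost_contact(),
should_delete() and the now_ms parameter are illustrative stand-ins for
the kernel's tipc_node, node_lost_contact(), tipc_node_cleanup() and
jiffies, not the patch itself:

#include <stdbool.h>
#include <stdio.h>

#define NODE_CLEANUP_AFTER 0x493E0      /* 300000 ms == 5 minutes */

struct node {
        bool up;                /* stand-in for node_is_up() */
        unsigned long down_at;  /* earliest permitted deletion time, ms */
};

/* On lost contact, arm the deadline - mirrors node_lost_contact() */
static void lost_contact(struct node *n, unsigned long now_ms)
{
        n->up = false;
        n->down_at = now_ms + NODE_CLEANUP_AFTER;
}

/* Periodic timer check - mirrors the test in tipc_node_cleanup() */
static bool should_delete(const struct node *n, unsigned long now_ms)
{
        return !n->up && now_ms > n->down_at;
}

int main(void)
{
        struct node n = { .up = true };

        lost_contact(&n, 0);    /* node goes down at t = 0 */
        printf("t=4min: %d\n", should_delete(&n, 240000)); /* 0: too early */
        printf("t=6min: %d\n", should_delete(&n, 360000)); /* 1: delete */
        return 0;
}

The real tipc_node_cleanup() additionally takes node_list_lock plus the
node write lock, and returns -EBUSY instead of a boolean when the node
must stay.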
> On the contrary, if we automatically delete the dead node object when a
> node goes down, it is not necessary to manually remove it through
> netlink from user space. In other words, if we think this approach is
> better, we should obsolete the TIPC iproute2 command for deleting a
> dead node.

I think we should keep manual removal, too. If somebody is doing a
scale-in followed by a scale-out, he may not want to be confused by old
nodes lingering around for a while after the scale-out, so he may want to
explicitly clean out those nodes (in his scale-in script) before going on
with the next operation.

This idea was pre-approved by me (although I haven't looked into the
actual patch yet), and it is also based on user requests.

Regards
///jon

> Jon,
> Can you please kindly advise which method is better?
> Thanks,
> Ying
>
>> We fix this by deleting the nodes which have been down for a specified
>> amount of time (5 minutes). The existing node supervision timer is
>> used to achieve this.
>>
>> Signed-off-by: GhantaKrishnamurthy MohanKrishna
>> <[email protected]>
>> ---
>>  net/tipc/node.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 66 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/tipc/node.c b/net/tipc/node.c
>> index b71e4e376bb9..e08f78a72f03 100644
>> --- a/net/tipc/node.c
>> +++ b/net/tipc/node.c
>> @@ -45,6 +45,7 @@
>>  #include "netlink.h"
>>
>>  #define INVALID_NODE_SIG 0x10000
>> +#define NODE_CLEANUP_AFTER 0x493E0
>>
>>  /* Flags used to take different actions according to flag type
>>   * TIPC_NOTIFY_NODE_DOWN: notify node is down
>> @@ -96,6 +97,7 @@ struct tipc_bclink_entry {
>>   * @link_id: local and remote bearer ids of changing link, if any
>>   * @publ_list: list of publications
>>   * @rcu: rcu struct for tipc_node
>> + * @down_at: indicates the time for deleting a down node
>>   */
>>  struct tipc_node {
>>          u32 addr;
>> @@ -121,6 +123,7 @@ struct tipc_node {
>>          unsigned long keepalive_intv;
>>          struct timer_list timer;
>>          struct rcu_head rcu;
>> +        unsigned long down_at;
>>  };
>>
>>  /* Node FSM states and events:
>> @@ -160,6 +163,7 @@ static struct tipc_node *tipc_node_find(struct net *net, u32 addr);
>>  static struct tipc_node *tipc_node_find_by_id(struct net *net, u8 *id);
>>  static void tipc_node_put(struct tipc_node *node);
>>  static bool node_is_up(struct tipc_node *n);
>> +static void tipc_node_delete_from_list(struct tipc_node *node);
>>
>>  struct tipc_sock_conn {
>>          u32 port;
>> @@ -369,6 +373,7 @@ static struct tipc_node *tipc_node_create(struct net *net, u32 addr,
>>          for (i = 0; i < MAX_BEARERS; i++)
>>                  spin_lock_init(&n->links[i].lock);
>>          n->state = SELF_DOWN_PEER_LEAVING;
>> +        n->down_at = jiffies + msecs_to_jiffies(NODE_CLEANUP_AFTER);
>>          n->signature = INVALID_NODE_SIG;
>>          n->active_links[0] = INVALID_BEARER_ID;
>>          n->active_links[1] = INVALID_BEARER_ID;
>> @@ -412,11 +417,16 @@ static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link *l)
>>          tipc_link_set_abort_limit(l, tol / n->keepalive_intv);
>>  }
>>
>> -static void tipc_node_delete(struct tipc_node *node)
>> +static void tipc_node_delete_from_list(struct tipc_node *node)
>>  {
>>          list_del_rcu(&node->list);
>>          hlist_del_rcu(&node->hash);
>>          tipc_node_put(node);
>> +}
>> +
>> +static void tipc_node_delete(struct tipc_node *node)
>> +{
>> +        tipc_node_delete_from_list(node);
>>
>>          del_timer_sync(&node->timer);
>>          tipc_node_put(node);
>> @@ -523,6 +533,53 @@ void tipc_node_remove_conn(struct net *net, u32 dnode, u32 port)
>>          tipc_node_put(node);
>>  }
>>
>> +static void tipc_node_clear_links(struct tipc_node *node)
>> +{
>> +        int i;
>> +
>> +        for (i = 0; i < MAX_BEARERS; i++) {
>> +                struct tipc_link_entry *le = &node->links[i];
>> +
>> +                if (le->link) {
>> +                        kfree(le->link);
>> +                        le->link = NULL;
>> +                        node->link_cnt--;
>> +                }
>> +        }
>> +}
>> +
>> +/* tipc_node_cleanup - delete nodes that do not
>> + * have active links for NODE_CLEANUP_AFTER time
>> + */
>> +static int tipc_node_cleanup(struct tipc_node *peer)
>> +{
>> +        struct tipc_net *tn = tipc_net(peer->net);
>> +        int err = 0;
>> +
>> +        spin_lock_bh(&tn->node_list_lock);
>> +        tipc_node_write_lock(peer);
>> +
>> +        if (node_is_up(peer)) {
>> +                tipc_node_write_unlock(peer);
>> +                err = -EBUSY;
>> +                goto err_out;
>> +        }
>> +
>> +        if (!time_after(jiffies, peer->down_at)) {
>> +                tipc_node_write_unlock(peer);
>> +                err = -EBUSY;
>> +                goto err_out;
>> +        }
>> +
>> +        tipc_node_clear_links(peer);
>> +        tipc_node_write_unlock(peer);
>> +        tipc_node_delete_from_list(peer);
>> +
>> +err_out:
>> +        spin_unlock_bh(&tn->node_list_lock);
>> +        return err;
>> +}
>> +
>>  /* tipc_node_timeout - handle expiration of node timer
>>   */
>>  static void tipc_node_timeout(struct timer_list *t)
>> @@ -533,6 +590,12 @@ static void tipc_node_timeout(struct timer_list *t)
>>          int bearer_id;
>>          int rc = 0;
>>
>> +        if (!tipc_node_cleanup(n)) {
>> +                /* Remove the timer's reference on the node */
>> +                tipc_node_put(n);
>> +                return;
>> +        }
>> +
>>          __skb_queue_head_init(&xmitq);
>>
>>          for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
>> @@ -1150,6 +1213,7 @@ static void node_lost_contact(struct tipc_node *n,
>>          uint i;
>>
>>          pr_debug("Lost contact with %x\n", n->addr);
>> +        n->down_at = jiffies + msecs_to_jiffies(NODE_CLEANUP_AFTER);
>>
>>          /* Clean up broadcast state */
>>          tipc_bcast_remove_peer(n->net, n->bc_entry.link);
>> @@ -1719,7 +1783,6 @@ int tipc_nl_peer_rm(struct sk_buff *skb, struct genl_info *info)
>>          struct tipc_node *peer;
>>          u32 addr;
>>          int err;
>> -        int i;
>>
>>          /* We identify the peer by its net */
>>          if (!info->attrs[TIPC_NLA_NET])
>> @@ -1754,15 +1817,7 @@ int tipc_nl_peer_rm(struct sk_buff *skb, struct genl_info *info)
>>                  goto err_out;
>>          }
>>
>> -        for (i = 0; i < MAX_BEARERS; i++) {
>> -                struct tipc_link_entry *le = &peer->links[i];
>> -
>> -                if (le->link) {
>> -                        kfree(le->link);
>> -                        le->link = NULL;
>> -                        peer->link_cnt--;
>> -                }
>> -        }
>> +        tipc_node_clear_links(peer);
>>          tipc_node_write_unlock(peer);
>>          tipc_node_delete(peer);
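A practical note on the tipc_nl_peer_rm() path above: this is what the
iproute2 tipc tool invokes for manual removal, so a scale-in script can
clean out a peer explicitly before the next scale-out. Assuming a
reasonably recent iproute2 build with TIPC support (command form from
memory, and 1.1.2 is just a made-up peer address):

        tipc peer remove address 1.1.2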
