Thanks, Jon, for your clarification. I fully agree with your points below.

Regards,
Ying

On 05/29/2018 12:01 AM, Jon Maloy wrote:
>
> -----Original Message-----
> From: Ying Xue [mailto:[email protected]]
> Sent: Monday, May 28, 2018 5:54 AM
> To: Mohan Krishna Ghanta Krishnamurthy <[email protected]>;
> [email protected]; Jon Maloy <[email protected]>;
> [email protected]
> Subject: Re: [net-next 1/1] tipc: Auto removal of peer down node instance
>
> On 05/23/2018 10:38 PM, GhantaKrishnamurthy MohanKrishna wrote:
>> A peer node is considered down if there are no active links (or) lost
>> contact to the node. In current implementation, a peer node instance
>> is deleted either if
>> a) TIPC module is removed (or)
>> b) Application can use a netlink/iproute2 interface to delete a
>> specific down node.
>>
>> Thus, a down node instance lives in the system forever, unless the
>> application explicitly removes it.
>
>> Per my understanding, this is not a flaw; instead, the current behavior
>> might be a deliberate design. With the current design, if a lost node
>> recovers from the dead state within a short period, it's not necessary
>> to allocate its corresponding node object again. As far as I remember,
>> this behavior has existed for a very long time, at least since 1.7.X.
>
> It is a deliberate design, an optimization, but made at a time when
> assumptions were different than they are now. Bare metal clusters used to
> be much more static than today's VM or container based clusters, so if a
> node disappeared, we could assume that this was either because of a crash
> or a deliberate restart, and that it would soon be back.
> In a cloud-based cluster a disappeared node is much more likely to be the
> result of a scale-in, and will never be back, at least not with the same
> identity.
> I think a 5-minute delay is a reasonable compromise to solve this problem.
> If a node is crashing or restarted, it is likely to be back within 5
> minutes; otherwise it is most likely to have been permanently removed.
> Also be aware that nothing is lost with this approach. If a crashed node
> comes back after more than 5 minutes, it will still reconnect as expected.
>
>> On the contrary, if we automatically delete the dead node object when a
>> node is down, it's not necessary to manually remove it through netlink
>> from user space. In other words, if we think this approach is better, we
>> should obsolete the TIPC iproute2 command for deleting dead nodes.
>
> I think we should keep manual removal, too. If somebody is doing a
> scale-in, followed by a scale-out, he may not want to be confused by the
> old nodes lingering around a while after the scale-out, so he may want to
> explicitly (in his scale-in script) clean out those nodes before going on
> with the next operation.
>
> This idea was pre-approved by me (although I haven't looked into the
> actual patch yet), and is also based on user requests.

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
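
To make the behavior discussed in this thread concrete, here is a minimal user-space sketch in Python. It is not the TIPC kernel code and not the patch under review: the names (NodeTable, node_up, node_down, remove_down_node, expire, NODE_CLEANUP_AFTER) and the explicit "now" timestamps are assumptions made purely for illustration. It models the three points raised above: a down node that returns within 5 minutes reuses its instance, a node that stays down longer is removed automatically but can still reconnect later with a fresh instance, and manual removal of a down node remains available.

```python
# Illustrative sketch only; NOT the TIPC implementation. All names are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional

NODE_CLEANUP_AFTER = 300.0  # seconds; the 5-minute delay discussed in the thread


@dataclass
class Node:
    addr: int
    up: bool = True
    down_since: Optional[float] = None  # time at which contact was lost


class NodeTable:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def node_up(self, addr: int) -> Node:
        """Peer (re)connects: reuse its instance if present, else allocate one."""
        node = self.nodes.get(addr)
        if node is None:
            node = Node(addr)
            self.nodes[addr] = node
        node.up = True
        node.down_since = None
        return node

    def node_down(self, addr: int, now: float) -> None:
        """Contact lost: keep the instance, but remember when it went down."""
        node = self.nodes.get(addr)
        if node is not None and node.up:
            node.up = False
            node.down_since = now

    def remove_down_node(self, addr: int) -> bool:
        """Manual removal (e.g. from a scale-in script); only down nodes qualify."""
        node = self.nodes.get(addr)
        if node is None or node.up:
            return False
        del self.nodes[addr]
        return True

    def expire(self, now: float) -> List[int]:
        """Automatic removal of nodes that have been down for the full delay."""
        expired = [
            addr
            for addr, node in self.nodes.items()
            if not node.up
            and node.down_since is not None
            and now - node.down_since >= NODE_CLEANUP_AFTER
        ]
        for addr in expired:
            del self.nodes[addr]
        return expired
```

In this sketch expire() would be called periodically. A real implementation would more likely arm a per-node timer when the node goes down, but that detail is outside what the thread states.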
