Thanks, Jon, for your clarification. I fully agree with your points below.

Regards,
Ying

On 05/29/2018 12:01 AM, Jon Maloy wrote:
> 
> -----Original Message-----
> From: Ying Xue [mailto:[email protected]] 
> Sent: Monday, May 28, 2018 5:54 AM
> To: Mohan Krishna Ghanta Krishnamurthy 
> <[email protected]>; 
> [email protected]; Jon Maloy <[email protected]>; 
> [email protected]
> Subject: Re: [net-next 1/1] tipc: Auto removal of peer down node instance
> 
> On 05/23/2018 10:38 PM, GhantaKrishnamurthy MohanKrishna wrote:
>> A peer node is considered down if it has no active links or if contact
>> with the node has been lost. In the current implementation, a peer node
>> instance is deleted only if either
>> a) the TIPC module is removed, or
>> b) an application uses a netlink/iproute2 interface to delete a
>> specific down node.
>>
>> Thus, a down node instance lives in the system forever, unless the 
>> application explicitly removes it.
>>
>> Per my understanding, this is not a flaw; instead, the current behavior
>> might be a deliberate design. With the current design, if a lost node
>> recovers from the dead state within a short period, it is not necessary
>> to allocate its corresponding node object. As far as I remember, this
>> behavior has existed for a very long time, at least since 1.7.X.
> It is a deliberate design, an optimization, but one made at a time when
> assumptions were different from what they are now. Bare-metal clusters used
> to be much more static than today's VM- or container-based clusters, so if a
> node disappeared, we could assume that this was because of either a crash or
> a deliberate restart, and that it would soon be back.
> In a cloud-based cluster, a disappeared node is much more likely to be the
> result of a scale-in, and will never be back, at least not with the same
> identity.
> I think a 5-minute delay is a reasonable compromise to solve this problem.
> If a node is crashing or restarted, it is likely to be back within 5 minutes;
> otherwise it is most likely to have been permanently removed.
> Also be aware that nothing is lost with this approach. If a crashed node
> comes back after more than 5 minutes, it will still reconnect as expected.
> 
>> On the contrary, if we automatically delete the dead node object when a
>> node is down, it is not necessary to remove it manually through netlink
>> from user space. In other words, we think this approach is better; we
>> should obsolete the TIPC iproute2 command for deleting a dead node.
> I think we should keep manual removal, too. If somebody is doing a scale-in, 
> followed by a scale-out, he may not want to be confused by the old nodes 
> lingering around a while after the scale-out, so he may want to explicitly 
> (in his scale-in script) clean out those nodes before going on with the next 
> operation.
> 
> This idea was pre-approved by me (although I haven't looked into the actual 
> patch yet), and is also based on user requests.

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
