[jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802743#comment-17802743
 ] 

Shilun Fan commented on YARN-5465:
--

Bulk update: moved all non-blocker issues out of 3.4.0; please move this issue 
back if it is a blocker. Retargeted to 3.5.0.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> 
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: graceful
> Reporter: Robert Kanter
> Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts, though it does let you easily update the timeout of currently 
> decommissioning nodes.
> An alternative behavior would be:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.
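
The two timelines in the description can be checked with a small simulation. This is only a toy sketch, not the ResourceManager's code: the function `shutdown_times`, its `independent` flag, and the tuple layout of `calls` are hypothetical names made up for illustration, and it assumes a node shuts down exactly when its timeout expires.

```python
def shutdown_times(calls, independent):
    """Return {node: absolute shutdown time in seconds} for a toy model.

    calls: list of (issue_time, timeout, excluded_nodes) tuples in time order,
           where excluded_nodes is the full exclude-file content at that call.
    independent: False models the behavior described first (each call resets
                 the deadline of every decommissioning node); True models the
                 alternative (a call only sets deadlines for nodes that are
                 not already decommissioning).
    """
    deadlines = {}
    for issue_time, timeout, excluded_nodes in calls:
        for node in excluded_nodes:
            if independent and node in deadlines:
                continue  # keep the deadline set by the earlier call
            deadlines[node] = issue_time + timeout
    return deadlines


# Timeline from the description: nodeA at t=0 with 120s, nodeB at t=30 with 30s.
calls = [
    (0, 120, ["nodeA"]),
    (30, 30, ["nodeA", "nodeB"]),
]

current = shutdown_times(calls, independent=False)
proposed = shutdown_times(calls, independent=True)
print(current)   # {'nodeA': 60, 'nodeB': 60}
print(proposed)  # {'nodeA': 120, 'nodeB': 60}
```

Times are measured from the first call, so "after 30 seconds" following the second call corresponds to t=60, and "after 60 more seconds" corresponds to t=120.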



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2016-08-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404354#comment-15404354
 ] 

Robert Kanter commented on YARN-5465:
-

I think the second option is better.  Even though updating the timeout of a 
currently decommissioning node is harder, it's at least possible to have 
different sets of decommissioning nodes with different timeouts, which seems 
like a common scenario to me.  The first option doesn't allow you to do this at 
all.
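
To make the trade-off concrete, here is a toy sketch of the second option, showing why updating a timeout would need a recommission step first. The class `DecommissionTracker` and its methods are hypothetical and exist only for this illustration; they are not part of YARN.

```python
class DecommissionTracker:
    """Toy model of per-node deadlines; all names here are hypothetical."""

    def __init__(self):
        self.deadlines = {}  # node -> absolute deadline in seconds

    def decommission(self, now, timeout, nodes):
        for node in nodes:
            # A node that is already decommissioning keeps its deadline.
            self.deadlines.setdefault(node, now + timeout)

    def recommission(self, nodes):
        for node in nodes:
            # Forget the deadline so a later call can set a new one.
            self.deadlines.pop(node, None)


tracker = DecommissionTracker()
tracker.decommission(now=0, timeout=120, nodes=["nodeA"])
tracker.decommission(now=30, timeout=30, nodes=["nodeA"])  # no effect on nodeA
unchanged = tracker.deadlines["nodeA"]

# Updating the timeout takes two steps: recommission, then decommission again.
tracker.recommission(["nodeA"])
tracker.decommission(now=30, timeout=30, nodes=["nodeA"])
updated = tracker.deadlines["nodeA"]
print(unchanged, updated)  # 120 60
```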




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
