[ 
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663130#comment-15663130
 ] 

Varun Vasudev commented on YARN-5600:
-------------------------------------

Thanks for the patch [~miklos.szeg...@cloudera.com]! My apologies for the late 
comments -

1)
{code}
+        String value = environment
+            .get(YarnConfiguration.DEBUG_NM_CUSTOM_DELETE_DELAY_SEC);
{code}

The environment variable name shouldn’t come from YarnConfiguration, which is 
meant for entries in yarn-site.xml. Please add the variable to the Environment 
class in ApplicationConstants.java

2)
{code}
+  /** Delay before deleting resource to ease debugging of NM issues.
+   * This is a custom key for the environment map in
+   * container launch context*/
+  public static final String DEBUG_NM_CUSTOM_DELETE_DELAY_SEC =
+      YarnConfiguration.NM_PREFIX + "delete.custom-debug-delay-sec";
+  public static final int DEFAULT_DEBUG_NM_CUSTOM_DELETE_DELAY_SEC = 0;
{code}
Not required

3)
{code}
+  /** Maximum delay before deleting the resource.
+   * This setting limits YarnConfiguration.DEFAULT_DEBUG_NM_DELETE_DELAY_SEC
+   * and any container specific delay as well for reliability and security*/
+  public static final String DEBUG_NM_MAX_DELETE_DELAY_SEC =
+      YarnConfiguration.NM_PREFIX + "delete.max-debug-delay-sec";
+  public static final int DEFAULT_DEBUG_NM_MAX_DELETE_DELAY_SEC =
+      -1;
{code}
{code}
+  <property>
+    <description>
+      This is a maximum cap on the following values
+        yarn.nodemanager.delete.debug-delay-sec
+        yarn.nodemanager.delete.custom-debug-delay-sec
+
+      Maximum mumber of seconds after an application finishes before the 
nodemanager's
+      DeletionService will delete the application's localized file directory
+      and log directory. The value -1 means let the service keep the files 
forever,
+      if requested by any of the settings above.
+    </description>
+    <name>yarn.nodemanager.delete.max-debug-delay-sec</name>
+    <value>-1</value>
+  </property>
{code}
The default value for this can’t be -1. This means that users can keep 
resources around forever without admins knowing about it. The default value 
should be 0. Admins should be required to explicitly set the value to a number 
they’re comfortable with.

4)
To a larger point, I don’t think we should provide a keep forever option at 
all. Admins can set the value to a really high number if they desire. The idea 
behind the feature is to help debugging which implies a reasonable time 
window(a few days to a week?) - not sure why we need to keep artifacts around 
forever.

5)
{code}
+    if (debugDelaySec == KEEP_FOREVER &&
+        debugDelayDefaultMax != KEEP_FOREVER) {
+      LOG.warn(String.format("Limiting debug delay %d to %d",
+          debugDelaySec, debugDelayDefaultMax));
+      limitedDelay = debugDelayDefaultMax;
+    } else if (debugDelaySec > debugDelayDefaultMax &&
+        debugDelayDefaultMax != KEEP_FOREVER) {
+      LOG.warn(String.format("Limiting debug delay %d to %d",
+          debugDelaySec, debugDelayDefaultMax));
+      limitedDelay = Math.min(debugDelaySec, debugDelayDefaultMax);
+    }
{code}
Not sure these should be logged as warnings - info might be fine?

6)
{code}
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertNull;
-import static org.junit.Assert.assertTrue;
{code}
Any particular reason for changing the location of the import statements?

7)
{code}
+  private static final FileContext LFS = getLfs();
+  private static FileContext getLfs() {
{code}
{code}
-  private static final Path base =
-    lfs.makeQualified(new Path("target", TestDeletionService.class.getName()));
{code}
Similar to above - any reason for renaming the variables?

> Add a parameter to ContainerLaunchContext to emulate 
> yarn.nodemanager.delete.debug-delay-sec on a per-application basis
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5600
>                 URL: https://issues.apache.org/jira/browse/YARN-5600
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Daniel Templeton
>            Assignee: Miklos Szegedi
>              Labels: oct16-medium
>         Attachments: YARN-5600.000.patch, YARN-5600.001.patch, 
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, 
> YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch, 
> YARN-5600.008.patch, YARN-5600.009.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a 
> parameter to the CLC to allow an application owner to request delayed 
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where 
> ATS is not in use, e.g. branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to