[ 
https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-7426:
--------------------------------
    Summary: Interrupt does not work when LocalizerRunner is reading from InputStream (was: Add a finite shell command timeout to ContainerLocalizer)

> Interrupt does not work when LocalizerRunner is reading from InputStream
> ------------------------------------------------------------------------
>
>                 Key: YARN-7426
>                 URL: https://issues.apache.org/jira/browse/YARN-7426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Critical
>
> When the NodeManager is overloaded and ContainerLocalizer processes hang, 
> the containers time out and are cleaned up. The LocalizerRunner thread is 
> interrupted during cleanup, but the interrupt has no effect while the thread 
> is blocked reading from a FileInputStream. LocalizerRunner threads and 
> ContainerLocalizer processes keep accumulating, which makes the node 
> completely unresponsive. A timeout on the shell command, similar to 
> HADOOP-13817, would avoid this.
> The timeout value can be set by the AM, the same as the container timeout.
> ContainerLocalizer JVM stacktrace:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007fd8ec019000 nid=0xc295 runnable [0x00007fd8f3956000]
>    java.lang.Thread.State: RUNNABLE
>       at java.util.zip.ZipFile.open(Native Method)
>       at java.util.zip.ZipFile.<init>(ZipFile.java:219)
>       at java.util.zip.ZipFile.<init>(ZipFile.java:149)
>       at java.util.jar.JarFile.<init>(JarFile.java:166)
>       at java.util.jar.JarFile.<init>(JarFile.java:103)
>       at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
>       at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
>       at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
>       at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
>       at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
>       at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
>       at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
>       at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
>       - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
>       at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
>       - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
>       at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       - locked <0x000000076ac7f960> (a java.lang.Object)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}
> NodeManager LocalizerRunner thread which is not interrupted:
> {code}
> "LocalizerRunner for container_e746_1508665985104_601806_01_000005" #3932753 prio=5 os_prio=0 tid=0x00007fb258d5f800 nid=0x11091 runnable [0x00007fb153946000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0x0000000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:161)
>         at java.io.BufferedReader.read1(BufferedReader.java:212)
>         at java.io.BufferedReader.read(BufferedReader.java:286)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>         at org.apache.hadoop.util.Shell.run(Shell.java:848)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> {code}
> NM log shows the LocalizerRunner is supposed to 
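
The timeout idea proposed in the description can be illustrated with a minimal sketch. This is not Hadoop's Shell implementation and not the eventual fix for this issue; the class ShellTimeoutSketch, its Result type, and the runWithTimeout method are hypothetical names introduced only for illustration. The sketch assumes a POSIX-like system where `sleep` and `echo` are available as executables. The child's output is drained on a separate daemon thread, so the calling thread never sits in a blocking stream read. Instead it waits in Process.waitFor(long, TimeUnit), which returns after the timeout and also responds to Thread.interrupt() by throwing InterruptedException. In either case the child is destroyed, which ends the reader's blocking read.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Hadoop.
final class ShellTimeoutSketch {

    /** Outcome of one bounded command run (illustrative type). */
    static final class Result {
        final boolean timedOut;
        final String output;

        Result(boolean timedOut, String output) {
            this.timedOut = timedOut;
            this.output = output;
        }
    }

    /**
     * Runs a command, waiting at most timeoutMs for it to exit.
     * The child is force-killed on timeout or if the caller is interrupted.
     */
    static Result runWithTimeout(long timeoutMs, String... command)
            throws IOException, InterruptedException {
        final Process process =
                new ProcessBuilder(command).redirectErrorStream(true).start();
        final StringBuffer captured = new StringBuffer(); // thread-safe buffer

        // Drain the child's output on a helper thread. A blocking read on the
        // process pipe ignores Thread.interrupt(), so the caller must not do it.
        Thread reader = new Thread(() -> {
            try (Reader in = new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8)) {
                char[] buf = new char[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    captured.append(buf, 0, n);
                }
            } catch (IOException e) {
                // Can happen if the stream is closed while the child is destroyed.
            }
        }, "output-reader");
        reader.setDaemon(true);
        reader.start();

        boolean finished = false;
        try {
            // Bounded, interruptible wait instead of an unbounded native read.
            finished = process.waitFor(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            if (!finished) {
                // Timeout or interrupt: kill the child so it cannot accumulate.
                process.destroyForcibly();
            }
        }
        // Give the reader a bounded grace period to see end-of-stream.
        reader.join(1000);
        return new Result(!finished, captured.toString());
    }
}
```

With this sketch, runWithTimeout(300, "sleep", "30") returns with timedOut set to true after roughly the timeout instead of blocking for 30 seconds. One limitation worth noting: if the command spawns grandchildren that inherit the pipe, killing only the direct child may not close it, which is why the reader join is bounded as well.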



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

