[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355646#comment-16355646 ]

Hudson commented on YARN-6078:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13628 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13628/])
Revert "YARN-6078. Containers stuck in Localizing state. Contributed by (billie: rev 266da25c048aef352cfc7306e44e4d62b21a9e8a)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java

> Containers stuck in Localizing state
> ------------------------------------
>
>                 Key: YARN-6078
>                 URL: https://issues.apache.org/jira/browse/YARN-6078
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Billie Rinaldi
>            Priority: Major
>             Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
>         Attachments: YARN-6078-branch-2.001.patch, YARN-6078.001.patch, YARN-6078.002.patch, YARN-6078.003.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the containers are stuck in the localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM, the container is in LOCALIZING state:
> {code}
> 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] container.ContainerImpl.handle(ContainerImpl.java:1135) - Container container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711) - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for container_e03_1481261762048_0541_02_60] localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191) - Writing credentials to the nmPrivate file /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. Credentials list:
> {code}
> According to the RM, the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - container_e03_1481261762048_0541_02_60 Container Transitioned from ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - container_e03_1481261762048_0541_02_60 Container Transitioned from ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container, I get an error that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager's stack trace seems to indicate that the NM's LocalizerRunner is stuck waiting to read from the sub-process's output stream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable [0x7fa5076c3000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0xc6dc9c50> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:161)
>         at java.io.BufferedReader.read1(BufferedReader.java:212)
>         at java.io.BufferedReader.read(BufferedReader.java:286)
>         - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>         at org.apache.hadoop.util.Shell.run(Shell.java:479)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor process running with INITIALIZE_CONTAINER that the localizer starts. It seems that the output stream pipe of the process is still not closed (even though the localizer process is no longer present).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
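The stack trace in the description shows Shell.runCommand blocked indefinitely in parseExecResult, reading stdout of a localizer process that no longer exists. A common defensive pattern for this class of hang is to do the blocking read on a worker thread and bound the wait, destroying the subprocess on timeout. The sketch below illustrates only that general pattern; it is not the actual YARN-6078 patch, the class name and the 2-second timeout are invented for illustration, and it assumes a Unix-like host (it spawns `sleep` as a stand-in for a subprocess that produces no output).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedShellRead {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for the localizer subprocess: produces no output at all.
        Process p = new ProcessBuilder("sleep", "60").start();

        // Read stdout on a worker thread so the caller can bound the wait,
        // instead of blocking forever the way the LocalizerRunner did.
        Future<String> out = pool.submit(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                StringBuilder sb = new StringBuilder();
                for (String line; (line = r.readLine()) != null; ) {
                    sb.append(line).append('\n');
                }
                return sb.toString();
            }
        });

        try {
            System.out.println("output: " + out.get(2, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            p.destroy();  // kills the child and closes the parent-side pipes
            System.out.println("timed out; destroyed subprocess");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Bounding the wait in the caller rather than the reader keeps the read loop simple, at the cost of one extra thread per subprocess.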
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351420#comment-16351420 ]

Billie Rinaldi commented on YARN-6078:
--

Thanks [~bibinchundatt]. I've opened YARN-7873 for the revert. If you have any concerns, please comment there as well.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351396#comment-16351396 ]

Bibin A Chundatt commented on YARN-6078:

[~billie.rinaldi] Since this Jira has already gone in as part of 3.0.0, I think it is better to raise another Jira for the fix.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349585#comment-16349585 ]

Billie Rinaldi commented on YARN-6078:
--

[~bibinchundatt] [~djp] I am planning to revert this commit in YARN-7873.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280984#comment-16280984 ]

Billie Rinaldi commented on YARN-6078:
--

[~bibinchundatt] [~djp] It should be noted that the LocalizerRunner thread in the NM will not actually be able to kill the ContainerLocalizer shell process, because it is running as a different user. However, performing destroy on the process may still have some effect in the LocalizerRunner, since destroy may try to close the stdout/stderr streams in addition to attempting to kill the process.
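The point about destroy() closing streams can be demonstrated in isolation: a thread blocked reading a subprocess's stdout is unblocked once destroy() runs, because destroy() closes the parent-side pipe handles in addition to attempting the kill. The standalone sketch below is illustrative only, not NM code; the class name is invented, and it spawns a plain `sleep` on a Unix-like host rather than a different-user container-executor, so in this sketch the kill attempt also succeeds.

```java
import java.io.IOException;
import java.io.InputStream;

public class DestroyUnblocksReader {
    public static void main(String[] args) throws Exception {
        // A subprocess that writes nothing, like a wedged ContainerLocalizer.
        Process p = new ProcessBuilder("sleep", "60").start();

        Thread reader = new Thread(() -> {
            try (InputStream in = p.getInputStream()) {
                int b = in.read();  // blocks, as Shell.runCommand did in the NM
                System.out.println("reader unblocked, read() = " + b); // typically -1 (EOF)
            } catch (IOException e) {
                System.out.println("reader unblocked by " + e);
            }
        });
        reader.start();

        Thread.sleep(500);   // give the reader time to block in read()
        p.destroy();         // attempts the kill *and* closes the pipe streams
        reader.join(5000);   // the reader exits promptly instead of hanging
        System.out.println("reader finished: " + !reader.isAlive());
    }
}
```

Without the destroy() call, the reader thread in this sketch would stay blocked for the full 60 seconds, which is the miniature version of the stuck LocalizerRunner.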
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252163#comment-16252163 ]

Junping Du commented on YARN-6078:
--

+1 on the branch-2 patch. I have committed the patch to trunk, branch-3.0, branch-2, and branch-2.9. Thanks [~billie.rinaldi] for the patch and [~bibinchundatt] for the review!
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250704#comment-16250704 ]

Hadoop QA commented on YARN-6078:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 25m 1s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 9m 16s | branch-2 passed |
| +1 | compile | 0m 43s | branch-2 passed |
| +1 | checkstyle | 0m 18s | branch-2 passed |
| +1 | mvnsite | 0m 32s | branch-2 passed |
| +1 | findbugs | 0m 55s | branch-2 passed |
| +1 | javadoc | 0m 22s | branch-2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| +1 | checkstyle | 0m 17s | the patch passed |
| +1 | mvnsite | 0m 30s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 3s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 16m 3s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 57m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 |
| JIRA Issue | YARN-6078 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12897419/YARN-6078-branch-2.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 07c4c29dc783 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / f894eef |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_151 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18471/testReport/ |
| Max. process+thread count | 156 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18471/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250505#comment-16250505 ] Hudson commented on YARN-6078: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13232 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13232/]) YARN-6078. Containers stuck in Localizing state. Contributed by Billie (junping_du: rev e14f03dfbf078de63126a1e882261081b9ec6778)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250478#comment-16250478 ] Junping Du commented on YARN-6078: -- Thanks [~bibinchundatt] for the review and comments. +1 on the 03 patch as well. Bumping up the priority to critical, given that we have hit this problem with serious impact. Committing.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249109#comment-16249109 ] Bibin A Chundatt commented on YARN-6078: Thank you [~billie.rinaldi] for the latest patch. +1, LGTM. [~djp] Any comments from your side? I will wait 1-2 more days before committing.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248322#comment-16248322 ] Hadoop QA commented on YARN-6078: - (/) *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 21m 20s | trunk passed |
| +1 | compile | 0m 53s | trunk passed |
| +1 | checkstyle | 0m 22s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 10m 46s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 54s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | checkstyle | 0m 19s | the patch passed |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| Other Tests ||
| +1 | unit | 16m 47s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 67m 13s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-6078 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12897100/YARN-6078.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5c30e908abfd 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 796a0d3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18438/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18438/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247029#comment-16247029 ] Bibin A Chundatt commented on YARN-6078: Thank you [~billie.rinaldi] for the explanation. Some minor comments:
# Since we have only one shell instance per {{LocalizerRunner}}, we can break out of the loop once the condition is met.
{code}
for (Shell shell : Shell.getAllShells()) {
  try {
    if (shell.getProcess() != null &&
        shell.getWaitingThread() != null &&
        shell.getWaitingThread().equals(this)) {
      LOG.info("Destroying shell process for " + localizerId);
      shell.getProcess().destroy();
      destroyedShell = true;
    }
  } catch (Exception e) {
    LOG.warn("Failed to destroy shell for " + localizerId, e);
  }
}
{code}
# Thoughts on also double-checking {{shell.getProcess().isAlive()}} in the condition?
# Since {{localizerId}} is the same as the containerId, it would be better to change the log message to {{Destroying localization shell process for + localizerId}}:
{code}
LOG.info("Destroying shell process for " + localizerId);
{code}
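The review suggestions above (destroy only the shell whose waiting thread is this runner, double-check that the process is still alive, and stop scanning once it is found) can be sketched with a self-contained stand-in. {{FakeShell}}, {{FakeProcess}}, and {{destroyShellFor}} below are hypothetical simplifications for illustration, not Hadoop's actual {{Shell}} API:

```java
import java.util.ArrayList;
import java.util.List;

public class ShellCleanupSketch {
  // Hypothetical stand-in for a subprocess handle.
  static class FakeProcess {
    private boolean alive = true;
    boolean isAlive() { return alive; }
    void destroy() { alive = false; }
  }

  // Hypothetical stand-in for Hadoop's Shell: records which thread waits on it.
  static class FakeShell {
    final FakeProcess process;
    final Thread waitingThread;
    FakeShell(FakeProcess process, Thread waitingThread) {
      this.process = process;
      this.waitingThread = waitingThread;
    }
  }

  static final List<FakeShell> ALL_SHELLS = new ArrayList<>();

  // Destroy only the shell whose waiting thread is `runner`, checking
  // liveness first and stopping once it is found (one shell per runner).
  static boolean destroyShellFor(Thread runner) {
    for (FakeShell shell : ALL_SHELLS) {
      if (shell.process != null && shell.process.isAlive()
          && shell.waitingThread != null && shell.waitingThread.equals(runner)) {
        shell.process.destroy();
        return true;  // equivalent to setting destroyedShell and breaking
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Thread me = Thread.currentThread();
    FakeShell other = new FakeShell(new FakeProcess(), new Thread());
    FakeShell mine = new FakeShell(new FakeProcess(), me);
    ALL_SHELLS.add(other);
    ALL_SHELLS.add(mine);
    boolean destroyed = destroyShellFor(me);
    // Only the shell owned by the current thread is destroyed.
    System.out.println(destroyed && !mine.process.isAlive() && other.process.isAlive());
  }
}
```

The {{isAlive()}} check also makes a second pass over an already-destroyed shell a no-op, which is the effect of the double check the reviewer asks about.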
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246741#comment-16246741 ] Billie Rinaldi commented on YARN-6078: -- Thanks for taking a look, [~djp]! This is a good question. As far as I can tell, the purpose of the interrupt is to end the LocalizerRunner thread. I have not been able to think of a case where this wouldn't be accomplished by destroying the shell. But in the case where there is no shell found, I think it is safer to keep the interrupt. Perhaps there could be a situation where the LocalizerRunner is interrupted before the shell has been executed.
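The interrupt-handling behavior discussed above can be sketched as follows. {{RunnerSketch}} is a hypothetical stand-in for the patch's {{LocalizerRunner.interrupt()}} override, not the actual patch code: the interrupt is propagated only when no shell was destroyed, so a runner whose shell was successfully destroyed finishes its cleanup without being interrupted.

```java
public class InterruptSketch {
  static class RunnerSketch extends Thread {
    final boolean shellDestroyed;          // outcome of the (omitted) shell-destroy attempt
    volatile boolean sawInterrupt = false;

    RunnerSketch(boolean shellDestroyed) { this.shellDestroyed = shellDestroyed; }

    @Override public void run() {
      try {
        Thread.sleep(200);                 // stand-in for blocking localization work
      } catch (InterruptedException e) {
        sawInterrupt = true;               // cleanup path taken via interrupt
      }
    }

    // Propagate the interrupt only when no shell process could be destroyed,
    // e.g. when the runner was interrupted before the shell was executed.
    @Override public void interrupt() {
      if (!shellDestroyed) {
        super.interrupt();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RunnerSketch destroyed = new RunnerSketch(true);
    RunnerSketch noShell = new RunnerSketch(false);
    destroyed.start();
    noShell.start();
    destroyed.interrupt();   // swallowed: the shell was already destroyed
    noShell.interrupt();     // propagated: no shell to destroy
    destroyed.join();
    noShell.join();
    System.out.println(!destroyed.sawInterrupt && noShell.sawInterrupt);
  }
}
```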
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246713#comment-16246713 ] Junping Du commented on YARN-6078: -- Thanks [~billie.rinaldi] for updating the patch! A quick question here:
bq. The new patch only propagates the interrupt when a shell hasn't successfully been destroyed.
What is the impact of {{super.interrupt();}} in the case where the shell process does get destroyed? Like you said, it may prevent the rest of the cleanup from being performed after destroying the process. Any side effect if we skip it entirely? Other than this question, the patch looks good to me.
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245153#comment-16245153 ] Hadoop QA commented on YARN-6078:

(/) *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 17m 7s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 24s | trunk passed |
| +1 | compile | 0m 56s | trunk passed |
| +1 | checkstyle | 0m 22s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 10m 37s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 58s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 53s | the patch passed |
| +1 | javac | 0m 53s | the patch passed |
| -0 | checkstyle | 0m 23s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 214 unchanged - 0 fixed = 215 total (was 214) |
| +1 | mvnsite | 0m 30s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 5s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 17m 3s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 81m 24s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-6078 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896785/YARN-6078.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 6dd257586488 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f2df6b8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18414/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18414/testReport/ |
| Max. process+thread count | 321 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240548#comment-16240548 ] Bibin A Chundatt commented on YARN-6078:

Thank you [~Prabhu Joseph] for the analysis. [~billie.rinaldi], the overall approach looks good to me. The additional options from YARN-5641 should help solve the issue. Is it possible to add a test case in {{TestResourceLocalizationService}} for this?
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238113#comment-16238113 ] Hadoop QA commented on YARN-6078:

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 9m 44s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 38s | trunk passed |
| +1 | compile | 0m 48s | trunk passed |
| +1 | checkstyle | 0m 21s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | shadedclient | 10m 43s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 50s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 31s | the patch passed |
| +1 | compile | 0m 44s | the patch passed |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 51s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 16m 0s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 70m 35s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-6078 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895888/YARN-6078.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d9dc69378b78 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c417284 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18339/testReport/ |
| Max. process+thread count | 335 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18339/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT |
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237879#comment-16237879 ] Billie Rinaldi commented on YARN-6078:

I found an additional option, which is to destroy the child process as enabled by YARN-5641 and HADOOP-13709.
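The destroy-the-child option can be sketched in plain Java (the class name {{TimedExec}} is hypothetical, and the exit codes assume a POSIX platform): killing the child on timeout closes its end of the stdout/stderr pipes, which finally unblocks any thread stuck reading them, the failure mode described in this issue.

```java
import java.util.concurrent.TimeUnit;

public class TimedExec {
    /**
     * Runs a command and waits up to timeoutMs for it to finish. On
     * timeout the child is forcibly destroyed; closing the child's end
     * of the pipe lets a blocked reader in the parent see EOF.
     */
    public static int run(String[] cmd, long timeoutMs) throws Exception {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        if (!p.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
            p.destroyForcibly(); // SIGKILL on POSIX platforms
            p.waitFor();         // reap the child so exitValue() is valid
        }
        return p.exitValue();
    }
}
```

A hung child (here simulated with {{sleep}}) is reaped with a non-zero kill status instead of pinning the calling thread forever.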
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235308#comment-16235308 ] Prabhu Joseph commented on YARN-6078:

We hit this issue recently. Below is our analysis.

When the NodeManager is overloaded and ContainerLocalizer processes hang, the containers time out and are cleaned up. The LocalizerRunner thread is interrupted during cleanup, but the interrupt has no effect while the thread is blocked reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive. The following options would help to avoid this:

1. ShellCommandExecutor#parseExecResult currently uses a blocking read(), which could be changed to a non-blocking available() check plus a short sleep:

{code}
while (running) {
  if (in.available() > 0) {
    n = in.read(buffer);
    // process the buffer
  } else {
    Thread.sleep(500);
  }
}
{code}

2. Add a timeout for the shell command, similar to HADOOP-13817; the timeout value could be set by the AM, the same as the container timeout.
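Option 1 above can be made concrete as a small self-contained sketch (the class name {{PollingDrain}} and the {{childAlive}} callback are hypothetical, not the actual Shell.java change): polling available() and sleeping when no data is ready keeps the reader interruptible, because Thread.sleep() throws InterruptedException while a blocking FileInputStream.read() ignores the interrupt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.BooleanSupplier;

public class PollingDrain {
    /**
     * Reads everything the child writes without ever issuing a blocking
     * read: data is consumed only when available() reports it, and the
     * thread sleeps (interruptibly) otherwise. The loop ends when the
     * child has exited and the pipe has been drained.
     */
    public static String drain(InputStream in, BooleanSupplier childAlive)
            throws IOException, InterruptedException {
        StringBuilder out = new StringBuilder();
        byte[] buf = new byte[4096];
        while (childAlive.getAsBoolean() || in.available() > 0) {
            if (in.available() > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, in.available()));
                if (n > 0) out.append(new String(buf, 0, n));
            } else {
                Thread.sleep(50); // interruptible, unlike the blocking read()
            }
        }
        return out.toString();
    }
}
```

With a real process, {{childAlive}} would be {{p::isAlive}}; an interrupt during cleanup then surfaces promptly as an InterruptedException instead of leaving the LocalizerRunner stuck.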
ContainerLocalizer JVM stack trace:

{code}
"main" #1 prio=5 os_prio=0 tid=0x7fd8ec019000 nid=0xc295 runnable [0x7fd8f3956000]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:219)
        at java.util.zip.ZipFile.<init>(ZipFile.java:149)
        at java.util.jar.JarFile.<init>(JarFile.java:166)
        at java.util.jar.JarFile.<init>(JarFile.java:103)
        at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
        at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
        at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
        at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
        at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
        at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
        at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
        - locked <0x00076ac75058> (a sun.misc.URLClassPath)
        at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
        - locked <0x00076ac75058> (a sun.misc.URLClassPath)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        - locked <0x00076ac7f960> (a java.lang.Object)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
{code}

NodeManager LocalizerRunner thread which is not interrupted:

{code}
"LocalizerRunner for container_e746_1508665985104_601806_01_05" #3932753 prio=5 os_prio=0 tid=0x7fb258d5f800 nid=0x11091 runnable [0x7fb153946000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        - locked <0x000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        - locked <0x000718502bd8> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.read1(BufferedReader.java:212)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        - locked <0x000718502bd8> (a java.io.InputStreamReader)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
        at ...
{code}
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960466#comment-15960466 ] Feng Yuan commented on YARN-6078:

The NM stack trace seems to indicate that the LocalizerRunner thread is stuck reading the pipe from the process's stdout. Can you confirm that the ContainerLocalizer process has started?
[jira] [Commented] (YARN-6078) Containers stuck in Localizing state
[ https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814343#comment-15814343 ] Naganarasimha G R commented on YARN-6078:

Version?