[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Yang updated YARN-8403:
----------------------------
    Issue Type: Bug  (was: Sub-task)
        Parent:     (was: YARN-8472)

> Nodemanager logs failed to download file with INFO level
> --------------------------------------------------------
>
>                 Key: YARN-8403
>                 URL: https://issues.apache.org/jira/browse/YARN-8403
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8403.001.patch, YARN-8403.002.patch, YARN-8403.003.patch, YARN-8403.png
>
>
> Some container-execution-related stack traces are printed at INFO or WARN level.
> {code}
> 2018-06-06 03:10:40,077 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1312)) - Writing credentials to the nmPrivate file /grid/0/hadoop/yarn/local/nmPrivate/container_e02_1528246317583_0048_01_000001.tokens
> 2018-06-06 03:10:40,087 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(975)) - Failed to download resource { { hdfs://mycluster.example.com:8020/user/hrt_qa/Streaming/InputDir, 1528254452720, FILE, null },pending,[(container_e02_1528246317583_0048_01_000001)],6074418082915225,DOWNLOADING}
> org.apache.hadoop.yarn.exceptions.YarnException: Download and unpack failed
>     at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:306)
>     at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:409)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:66)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: /grid/0/hadoop/yarn/local/filecache/28_tmp/InputDir/input1.txt (Permission denied)
>     at java.io.FileOutputStream.open0(Native Method)
>     at java.io.FileOutputStream.open(FileOutputStream.java:270)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:236)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
>     at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:401)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:408)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:399)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:381)
>     at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:298)
>     ... 9 more
> {code}
> {code}
> 2018-06-06 03:10:41,547 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(182)) - IOException executing command:
> java.io.InterruptedIOException: java.lang.InterruptedException
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
>     at org.apache.hadoop.util.Shell.run(Shell.java:902)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: java.lang.InterruptedException
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:502)
>     at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
>     ... 5 more
> 2018-06-06 03:10:41,548 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:startLocalizer(407)) - Exit code from container container_e02_1528246317583_0048_01_000001 startLocalizer is : -1
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.InterruptedIOException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
>     at org.apache.hadoop.util.Shell.run(Shell.java:902)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>     ... 2 more
> Caused by: java.lang.InterruptedException
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:502)
>     at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
>     ... 5 more
> 2018-06-06 03:10:41,548 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1249)) - Localizer failed for container_e02_1528246317583_0048_01_000001
> java.io.IOException: Application application_1528246317583_0048 initialization failed (exitCode=-1) with output: null
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.InterruptedIOException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
>     ... 1 more
> Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
>     at org.apache.hadoop.util.Shell.run(Shell.java:902)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
>     ... 2 more
> Caused by: java.lang.InterruptedException
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:502)
>     at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
>     ... 5 more
> {code}
> These logs appear only in the NM log (they do not show up in the AM log).
> These stack traces are logged at WARN or INFO level.
> Ideally, exceptions should be printed at ERROR log level.
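
To illustrate the request, here is a minimal sketch of the intended level split. It is not the attached patch. It assumes java.util.logging so that it runs without Hadoop on the classpath, which means Level.SEVERE stands in for ERROR. The class name, logger name, and both helper methods are hypothetical. Routine progress messages stay at INFO, while a failure that carries an exception is logged at the error level with the Throwable attached, so the stack trace is kept.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DownloadFailureLogging {
    // Hypothetical logger name. SEVERE is java.util.logging's counterpart of ERROR.
    static final Logger LOG = Logger.getLogger("sketch.ResourceLocalizationService");

    /** Keeps records in memory so the level of each message can be inspected. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Routine progress message: INFO is appropriate. */
    static void logWritingCredentials(String path) {
        LOG.log(Level.INFO, "Writing credentials to the nmPrivate file {0}", path);
    }

    /** Failure with an exception: log at the error level and attach the Throwable. */
    static void logDownloadFailure(String resource, Throwable cause) {
        LOG.log(Level.SEVERE, "Failed to download resource " + resource, cause);
    }

    public static void main(String[] args) {
        CapturingHandler handler = new CapturingHandler();
        LOG.setUseParentHandlers(false); // keep the demo output to the lines printed below
        LOG.addHandler(handler);

        logWritingCredentials("/tmp/example.tokens"); // illustrative path
        logDownloadFailure("hdfs://mycluster.example.com:8020/user/hrt_qa/Streaming/InputDir",
                new FileNotFoundException("input1.txt (Permission denied)"));

        for (LogRecord record : handler.records) {
            System.out.println(record.getLevel() + " " + record.getMessage()
                    + (record.getThrown() != null ? " [" + record.getThrown() + "]" : ""));
        }
    }
}
```

With an SLF4J-style logger, the failure case would presumably become a call along the lines of LOG.error(message, exception). That mapping is an assumption here and is not taken from the patch files.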