[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-6091:
    Fix Version/s:     (was: 3.1.1)
                       (was: 3.2.0)

> the AppMaster register failed when use Docker on LinuxContainer
>
>                 Key: YARN-6091
>                 URL: https://issues.apache.org/jira/browse/YARN-6091
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, yarn
>    Affects Versions: 2.8.1
>         Environment: CentOS
>            Reporter: zhengchenyu
>            Assignee: Eric Badger
>            Priority: Critical
>              Labels: Docker
>         Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, the AppMaster fails to
> register with the ResourceManager. This does not happen on other servers.
> I found that pclose (in container-executor.c) returns different values on
> different servers, even though the process launched by popen is running
> normally. Some servers return 0, and others return 13.
> YARN treats the application as failed when pclose returns nonzero and
> removes the AMRMToken, so the AppMaster's registration fails because the
> ResourceManager has already removed this application's token.
> In container-executor.c, the check is whether the return code is zero. But
> according to the man page for pclose, a return value of -1 is what indicates
> an error. So I changed the check, which solved the problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
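The issue description turns on how the value returned by pclose is interpreted. As a minimal sketch (not the code in container-executor.c; the enum and the function name `classify_pclose_status` are hypothetical), the return value can be decoded with the standard wait-status macros, so that a failure of pclose itself, a nonzero exit code, and death by signal are reported separately:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical categories for illustration; not names from container-executor.c. */
enum child_result {
  CHILD_OK = 0,             /* child exited normally with code 0 */
  CHILD_NONZERO_EXIT = 1,   /* child exited normally with a nonzero code */
  CHILD_SIGNALED = 2,       /* child was terminated by a signal */
  CHILD_PCLOSE_FAILED = 3,  /* pclose itself returned -1 */
  CHILD_UNKNOWN = 4         /* any other wait status */
};

/* Decode the int returned by pclose() and write a short description to msg. */
static enum child_result classify_pclose_status(int status, char *msg, size_t len) {
  assert(msg != NULL && len > 0);
  if (status == -1) {
    /* -1 means pclose could not obtain the child's status. */
    snprintf(msg, len, "pclose itself failed");
    return CHILD_PCLOSE_FAILED;
  }
  if (WIFEXITED(status)) {
    int code = WEXITSTATUS(status);
    snprintf(msg, len, "exited with code %d", code);
    return code == 0 ? CHILD_OK : CHILD_NONZERO_EXIT;
  }
  if (WIFSIGNALED(status)) {
    snprintf(msg, len, "killed by signal %d", WTERMSIG(status));
    return CHILD_SIGNALED;
  }
  snprintf(msg, len, "unrecognized wait status %d", status);
  return CHILD_UNKNOWN;
}
```

On Linux, a raw status of 13 falls into the signaled branch with signal number 13, which is SIGPIPE there. Note that testing only for -1 would also accept a child that exited with an error code, so a check along these lines keeps those cases distinguishable.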
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Target Version/s:   (was: 2.8.5)
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-6091:
    Labels: Docker  (was: )
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Target Version/s: 2.8.5  (was: 2.8.4)
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Target Version/s: 2.8.4  (was: 2.8.3)
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Target Version/s: 2.8.3  (was: 2.8.1)
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-6091:
    Attachment: YARN-6091.002.patch

Rebasing patch to trunk
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-6091:
    Attachment: YARN-6091.001.patch

Uploading a patch that redirects stdout to /dev/null for the popen commands whose stdout we don't read. I tested this change locally using the test-container-executor program. When you don't redirect stdout, you get a return value of 13, which means SIGPIPE. When you do redirect, you get a return value of 0.

However, test-container-executor doesn't run without errors, so I had to work around them and comment out large pieces of the code to test the relevant section of launch_docker_container_as_user that uses popen()/pclose().

We might need a new JIRA to fix test-container-executor. I'm not sure what the plan for it is going forward, since it's not part of the Maven testing. At this point it's clearly out of sync, so we need to either scrap it or maintain it. For context, I tried to run test-container-executor on both macOS and RHEL 7, and it failed in a different way on each.
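The redirect described in the comment above can be illustrated with a small sketch. This is not the code in YARN-6091.001.patch: the helper name `run_discarding_stdout` and the `(...) > /dev/null` wrapping are assumptions made for illustration. The idea is that a parent which opens a command with popen in read mode, never reads, and then calls pclose closes the read end of the pipe, so a child that writes to stdout afterwards can be killed by SIGPIPE. Sending the child's stdout to /dev/null removes that write to the pipe.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: run `command` through popen() with its stdout sent to
 * /dev/null, and return the raw pclose() status (-1 if anything failed
 * before the child's status could be collected).
 */
static int run_discarding_stdout(const char *command) {
  assert(command != NULL);
  const char *prefix = "(";
  const char *suffix = ") > /dev/null";
  size_t len = strlen(prefix) + strlen(command) + strlen(suffix) + 1;
  char *full = malloc(len);
  if (full == NULL) {
    return -1;
  }
  /* The subshell makes the redirect cover the whole command line. */
  snprintf(full, len, "%s%s%s", prefix, command, suffix);

  FILE *pipe_to_child = popen(full, "r");
  free(full);
  if (pipe_to_child == NULL) {
    return -1;
  }
  /* Nothing is read: the child no longer writes to this pipe. */
  return pclose(pipe_to_child);
}
```

Because the outcome without the redirect depends on whether the child writes before or after the parent closes the pipe, it can plausibly differ between machines, which would be consistent with some servers returning 0 and others 13.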
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated YARN-6091:
    Affects Version/s:     (was: 2.8.0)
                       2.8.1
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Target Version/s: 2.8.1  (was: 2.8.0)
[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-6091:
    Fix Version/s:     (was: 2.8.0)