[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-11-05 Thread Sunil Govindan (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-6091:
-
Fix Version/s: (was: 3.1.1)
   (was: 3.2.0)

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the 
> AppMaster fails to register with the ResourceManager. This does not happen 
> on other servers. 
> I found that pclose() (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen() runs 
> normally: some servers return 0, and others return 13. 
> Because YARN treats the application as failed when pclose() returns 
> nonzero, it removes the AMRMToken, and the AppMaster registration then 
> fails because the ResourceManager has already removed this application's token. 
> In container-executor.c, the check is whether the return code is zero. But 
> according to the pclose man page, only a return value of -1 indicates an 
> error. So I changed the check to test for -1, which solved the 
> problem. 
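The report depends on what pclose() actually returns. Per POSIX, pclose() returns the termination status of the command shell, using the same encoding that waitpid() produces, or -1 if that status cannot be obtained. A nonzero value is therefore not necessarily an exit code.

The following is a minimal sketch that assumes a Linux/glibc build. The helper `describe_pclose_status` is hypothetical and is not part of container-executor.c. It decodes the status with the standard <sys/wait.h> macros. Under this encoding, a raw value of 13 means "terminated by signal 13", which is SIGPIPE on Linux. Note that a check against -1 alone would also accept a command that exited with a failure code or was killed.

```c
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper (not from container-executor.c): decode the value
 * returned by pclose() into a readable message in buf.
 * Returns 0 only if the command exited normally with code 0,
 * -1 if pclose() itself failed, and 1 for any other outcome. */
int describe_pclose_status(int status, char *buf, size_t len) {
    if (status == -1) {
        /* pclose() could not obtain the child's status; errno has details. */
        snprintf(buf, len, "pclose failed");
        return -1;
    }
    if (WIFEXITED(status)) {
        /* Normal exit: the exit code lives in bits 8-15 of the status. */
        int code = WEXITSTATUS(status);
        snprintf(buf, len, "exited with code %d", code);
        return code == 0 ? 0 : 1;
    }
    if (WIFSIGNALED(status)) {
        /* Killed by a signal: on Linux a raw status of 13 lands here. */
        int sig = WTERMSIG(status);
        snprintf(buf, len, "killed by signal %d%s", sig,
                 sig == SIGPIPE ? " (SIGPIPE)" : "");
        return 1;
    }
    snprintf(buf, len, "unexpected status %d", status);
    return 1;
}
```

For example, `describe_pclose_status(13, buf, sizeof buf)` fills `buf` with "killed by signal 13 (SIGPIPE)" and returns 1. A status of `1 << 8` is reported as "exited with code 1".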



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-09-08 Thread Junping Du (JIRA)


Junping Du updated YARN-6091:
-
Target Version/s:   (was: 2.8.5)


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-16 Thread Eric Badger (JIRA)

Eric Badger updated YARN-6091:
--
Labels: Docker  (was: )


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-07 Thread Junping Du (JIRA)

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.5  (was: 2.8.4)


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-04-06 Thread Junping Du (JIRA)

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.4  (was: 2.8.3)


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-08-29 Thread Junping Du (JIRA)

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.3  (was: 2.8.1)


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-08-29 Thread Eric Badger (JIRA)

Eric Badger updated YARN-6091:
--
Attachment: YARN-6091.002.patch

Rebasing patch to trunk


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-04-07 Thread Eric Badger (JIRA)

Eric Badger updated YARN-6091:
--
Attachment: YARN-6091.001.patch

Uploading a patch that redirects stdout to /dev/null for the popen() commands 
whose stdout we don't read. I tested this change locally using the 
test-container-executor program. When stdout is not redirected, pclose() returns 
13, which means the child was killed by SIGPIPE. When it is redirected, the 
return value is 0. 

However, test-container-executor doesn't run without errors, so I had to work 
around them and comment out large pieces of the code to test the relevant 
section of launch_docker_container_as_user that uses popen()/pclose(). We 
might need a new JIRA to fix test-container-executor. I'm not sure what the 
plan for it is going forward, since it's not part of the Maven tests. At 
this point it's clearly out of sync, so we need to either scrap it or maintain 
it. 

For context, I tried to run test-container-executor on both macOS and RHEL 7, 
and it failed on each, in different ways. 


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-01-17 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6091:
--
Affects Version/s: (was: 2.8.0)
   2.8.1


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-01-13 Thread Junping Du (JIRA)

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.1  (was: 2.8.0)


[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-01-13 Thread Junping Du (JIRA)

Junping Du updated YARN-6091:
-
Fix Version/s: (was: 2.8.0)
