[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438548#comment-16438548
 ] 

Wangda Tan commented on YARN-8135:
--

I just attached the WIP POC patch (poc.001). I know this is very early and 
incomplete, but I want to post it here to get some feedback.

*What is completed:*

1) Running a training job (single node).

2) Support for user-specified Docker images.

3) DNS support for tasks (like worker0.tfjob001..).

4) Easier access to HDFS.

5) GPU isolation.

*What to do next for the POC:*

1) Model serving (WIP).

2) Distributed training (WIP).

3) Determine development plans.

I will be out for a conference in the upcoming week, so please expect some 
delays in my responses.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Attachments: YARN-8135.poc.001.patch
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage systems.
>  - Launch services to serve TensorFlow/MXNet models.
>  - Support running distributed TensorFlow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that lets humans explore deep 
> places. B-)
> h3. {color:#FF}Please refer to the ongoing design doc and add your thoughts: 
> {color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: YARN-8135.poc.001.patch







[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438543#comment-16438543
 ] 

Wangda Tan commented on YARN-8135:
--

I just removed some content from the description and added a link to the ongoing 
design doc: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]

Please feel free to add your thoughts and feedback.







[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Allow jobs easy access to data/models in HDFS and other storage systems.
 - Launch services to serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPU and other resources.
 - Support launching TensorBoard if the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because a submarine is the only vehicle that lets humans explore deep places. 
B-)

h3. {color:#FF}Please refer to the ongoing design doc and add your thoughts: 
{color:#33}[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]{color}{color}

  was:
Description:

*Goals:*
 - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Allow jobs easy access to data/models in HDFS and other storage systems.
 - Launch services to serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPU and other resources.
 - Support launching TensorBoard if the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because a submarine is the only vehicle that lets humans explore deep places. 
B-)

Please refer to the ongoing design doc and add your thoughts: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]








[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Allow jobs easy access to data/models in HDFS and other storage systems.
 - Launch services to serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPU and other resources.
 - Support launching TensorBoard if the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because a submarine is the only vehicle that lets humans explore deep places. 
B-)

Please refer to the ongoing design doc and add your thoughts: 
[https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#|https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit?usp=sharing]

  was:
Description:

*Goals:*
 - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs on 
YARN.
 - Allow jobs easy access to data/models in HDFS and other storage systems.
 - Launch services to serve TensorFlow/MXNet models.
 - Support running distributed TensorFlow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPU and other resources.
 - Support launching TensorBoard if the user requests it.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because a submarine is the only vehicle that lets humans explore deep places. 
B-)

Comparison with other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU isolation in the XLearning project is achieved through a patched YARN, which 
differs from the community’s GPU isolation solution.

**XLearning needs a few modifications to read ClusterSpec from the environment.

*References:*
 - TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]








[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: (was: image-2018-04-09-14-44-41-101.png)

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* TensorFlow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage systems.
>  - Launch services to serve TensorFlow/MXNet models.
>  - Support running distributed TensorFlow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that lets humans explore deep 
> places. B-)
> Comparison with other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved through a patched YARN, 
> which differs from the community’s GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from the environment.
> *References:*
>  - TensorFlowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]






[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: (was: image-2018-04-09-14-35-16-778.png)







[jira] [Commented] (YARN-8060) Create default readiness check for service components

2018-04-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438534#comment-16438534
 ] 

genericqa commented on YARN-8060:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 38s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 39s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 15s | trunk passed |
| +1 | compile | 8m 41s | trunk passed |
| +1 | checkstyle | 1m 23s | trunk passed |
| +1 | mvnsite | 1m 34s | trunk passed |
| +1 | shadedclient | 13m 26s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 19s | trunk passed |
| +1 | javadoc | 1m 8s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 7m 27s | the patch passed |
| +1 | javac | 7m 27s | the patch passed |
| +1 | checkstyle | 1m 20s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 125 unchanged - 3 fixed = 125 total (was 128) |
| +1 | mvnsite | 1m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 17s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 31s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 52s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 35s | hadoop-yarn-services-api in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 88m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8060 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment

[jira] [Commented] (YARN-8122) Component health threshold monitor

2018-04-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438519#comment-16438519
 ] 

Billie Rinaldi commented on YARN-8122:
--

Thanks for the patch, [~gsaha]! Overall the design of the feature looks good. I 
still have to do some testing of the patch, but I wanted to give you a few 
minor comments.
* Use the YarnServiceConf getInt and getLong helper methods to initialize the 
new configuration properties
* If YARN-8060 gets committed before this, please add the properties to the 
newly created Component-Level configuration section of the Configurations docs
* It would be good to test the validity of the property values in ServiceApiUtil
* Use YarnServiceConf properties instead of hardcoded strings in the test
* Test comments say 2 secs, but actual setting appears to be 3 secs
* In the log line "Health is going below threshold for the first time" I think 
"for the first time" might be confusing, since it might not be the first time 
it has happened. Perhaps use something like "Health is going below threshold, 
starting health threshold timer" instead
* The info log "Resetting first occurence to 0" perhaps should be a debug log, 
since the fact that we are using 0 as an internal indicator that the component 
is currently healthy is an implementation detail that the user doesn't need to 
know about
* I like the log format consistency with the Component logging, using the 
prefix [Component {}]. Component uses uppercase [COMPONENT {}], so that would 
be even better
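
To make the timer behavior discussed in the comments above concrete, here is a minimal Python sketch of a health threshold monitor. It is not the code from the YARN-8122 patch: the class name, the field names, and the use of `None` (rather than 0) as the "currently healthy" marker are all assumptions made for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("health_monitor_sketch")


@dataclass
class HealthThresholdMonitor:
    """Hypothetical monitor, not the YARN-8122 implementation."""
    component: str
    threshold_percent: int  # minimum acceptable percentage of ready instances
    window_secs: float      # how long health may stay below the threshold
    first_below_at: Optional[float] = None  # None means "currently healthy"

    def check(self, ready: int, desired: int, now: float) -> bool:
        """Return True once health has stayed below threshold for the window."""
        percent = 100.0 * ready / desired if desired > 0 else 100.0
        if percent >= self.threshold_percent:
            if self.first_below_at is not None:
                # Internal bookkeeping, so it is logged at debug level.
                log.debug("[COMPONENT %s] Health recovered, resetting timer",
                          self.component)
            self.first_below_at = None
            return False
        if self.first_below_at is None:
            # First unhealthy observation since the last reset: start the timer.
            self.first_below_at = now
            log.info("[COMPONENT %s] Health is going below threshold, "
                     "starting health threshold timer", self.component)
            return False
        return now - self.first_below_at >= self.window_secs
```

A caller would invoke `check` periodically with the current ready and desired instance counts and act (for example, stop the service) when it returns `True`.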

> Component health threshold monitor
> --
>
> Key: YARN-8122
> URL: https://issues.apache.org/jira/browse/YARN-8122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8122.001.patch, YARN-8122.002.patch, 
> YARN-8122.draft.patch
>
>
> Slider supported component health threshold monitoring with SLIDER-1246. It 
> would be good to have this feature for YARN Service too.






[jira] [Commented] (YARN-8060) Create default readiness check for service components

2018-04-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438508#comment-16438508
 ] 

Billie Rinaldi commented on YARN-8060:
--

Thanks for the reviews, [~gsaha] and [~eyang]. I have updated the documentation 
in patch 6.

> Create default readiness check for service components
> -
>
> Key: YARN-8060
> URL: https://issues.apache.org/jira/browse/YARN-8060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8060.1.patch, YARN-8060.2.patch, YARN-8060.3.patch, 
> YARN-8060.4.patch, YARN-8060.5.patch, YARN-8060.6.patch
>
>
> It is currently possible for a component instance to have READY status before 
> the AM retrieves an IP for the container. We should make sure the IP has been 
> retrieved before marking the instance as READY.
> This default probe could also have an option to check for a DNS entry for the 
> instance's hostname if a DNS address is provided.
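
The readiness rule described in the issue can be sketched roughly as follows. This is a hedged Python illustration, not the probe from the YARN-8060 patches: the function name, its parameters, and the injectable `resolve` callable are hypothetical, and for simplicity the sketch resolves through the system resolver instead of querying a specific DNS address.

```python
import socket
from typing import Callable, Optional


def is_instance_ready(container_ip: Optional[str],
                      hostname: str,
                      check_dns: bool = False,
                      resolve: Callable[[str], str] = socket.gethostbyname
                      ) -> bool:
    """Hypothetical default readiness check for a component instance."""
    # The instance is never READY before an IP has been retrieved.
    if not container_ip:
        return False
    # The DNS check is optional.
    if not check_dns:
        return True
    try:
        resolve(hostname)  # raises OSError (socket.gaierror) if no entry exists
    except OSError:
        return False
    return True
```

Passing the resolver as a parameter keeps the sketch testable without network access; a real probe would presumably query the configured DNS address directly.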






[jira] [Updated] (YARN-8060) Create default readiness check for service components

2018-04-14 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8060:
-
Attachment: YARN-8060.6.patch







[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-14 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8142:
-
Fix Version/s: 3.1.1
   3.2.0

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8142.1.patch
>
>
> Steps:
> 1) Launch sleeper job (non-Docker YARN service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> A new AM attempt is started. The pre-existing container keeps running.
>  
> Actual behavior:
> The application finishes with State : FINISHED and Final-State : ENDED.
> A new attempt is never launched.
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down and, in doing so, 
> shuts the entire app down instead of letting it continue to run for another attempt.
>  
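
The difference between the expected and actual behavior can be illustrated with a toy Python model. Everything here is hypothetical (`ToyApp`, `am_exited`, the `unregistered` flag) and it is not YARN code. It only assumes the general rule that an AM which finishes its application ends it, while an AM that exits without doing so is retried until the attempt limit is reached.

```python
from dataclasses import dataclass


@dataclass
class ToyApp:
    """Toy stand-in for an application as tracked by the ResourceManager."""
    max_attempts: int = 2
    attempt: int = 1
    state: str = "RUNNING"
    containers_running: bool = True

    def am_exited(self, unregistered: bool) -> None:
        """Model how the application reacts when the AM process exits."""
        if unregistered:
            # The AM reported the application as done: no further attempts.
            self.state = "FINISHED"
            self.containers_running = False
        elif self.attempt < self.max_attempts:
            # The AM went away without finishing the app: start a new attempt
            # and keep the existing containers running.
            self.attempt += 1
        else:
            self.state = "FAILED"
            self.containers_running = False


# Actual behavior in the report: SIGTERM leads to a graceful full shutdown.
actual = ToyApp()
actual.am_exited(unregistered=True)

# Expected behavior: SIGTERM should leave the app running for another attempt.
expected = ToyApp()
expected.am_exited(unregistered=False)
```

In this model the fix amounts to not treating a SIGTERM-triggered shutdown as a request to finish the whole application.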






[jira] [Commented] (YARN-7142) Support placement policy in yarn native services

2018-04-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438475#comment-16438475
 ] 

Billie Rinaldi commented on YARN-7142:
--

No problem! Thanks for taking care of that, [~leftnoteasy].

> Support placement policy in yarn native services
> 
>
> Key: YARN-7142
> URL: https://issues.apache.org/jira/browse/YARN-7142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch, 
> YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch
>
>
> Placement policy exists in the API but is not implemented yet.
> I have filed YARN-8074 to move the composite constraints implementation out 
> of this phase-1 implementation of placement policy.






[jira] [Updated] (YARN-8138) Add unit test to validate queue priority preemption works under node partition.

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8138:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-8159

> Add unit test to validate queue priority preemption works under node 
> partition.
> ---
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Minor
> Attachments: YARN-8138.001.patch, YARN-8138.002.patch, 
> YARN-8138.003.patch
>
>
> Add unit test to validate queue priority preemption works under node 
> partition.
> Test configuration:
>  queue A (capacity=50, priority=1)
>  queue B (capacity=50, priority=2)
>  both have accessible-node-labels set to x
>  A.accessible-node-labels.x.capacity = 50
>  B.accessible-node-labels.x.capacity = 50
>  Along with this, preemption-related properties have been set.
> Test steps:
>  - Submit an application A1 to B, with am-container = container = 4096, no. 
> of containers = 4
>  - Submit an application A2 to A, with am-container = 1024, container = 2048, 
> no. of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B, with am-container = container = 5210, no. of 
> containers = NUM_NM
>  - Expectation is that containers are preempted from application A2 and 
> allocated to A3
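
For anyone trying to reproduce this outside the unit test, the test configuration above could be written out as Capacity Scheduler properties roughly as follows. This is only a sketch, not the contents of the attached patches. It assumes A and B are direct children of `root`, and the class and method names (`PartitionPreemptionConfigSketch`, `putQueue`, `buildConfig`) are made up for illustration. The issue does not say which preemption-related properties were set, so the two scheduler monitor properties below are an assumption about the minimum needed to enable the preemption monitor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionPreemptionConfigSketch {
    private static final String ROOT = "yarn.scheduler.capacity.root.";

    /** Adds the settings listed in the issue for one leaf queue under root. */
    static void putQueue(Map<String, String> conf, String queue,
                         int capacity, int priority, int labelCapacity) {
        String q = ROOT + queue + ".";
        conf.put(q + "capacity", String.valueOf(capacity));
        conf.put(q + "priority", String.valueOf(priority));
        conf.put(q + "accessible-node-labels", "x");
        conf.put(q + "accessible-node-labels.x.capacity", String.valueOf(labelCapacity));
    }

    /** Builds the property set described in the issue as plain key/value strings. */
    static Map<String, String> buildConfig() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(ROOT + "queues", "A,B");
        putQueue(conf, "A", 50, 1, 50);
        putQueue(conf, "B", 50, 2, 50);
        // Assumption: the issue only says "preemption-related properties have been set".
        conf.put("yarn.resourcemanager.scheduler.monitor.enable", "true");
        conf.put("yarn.resourcemanager.scheduler.monitor.policies",
            "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
                + "ProportionalCapacityPreemptionPolicy");
        return conf;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : buildConfig().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

On a real cluster the node label x would also have to be defined and given capacity at the root queue; that part is left out here.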






[jira] [Updated] (YARN-8159) [Umbrella] Fixes for Multiple Resource Type Preemption in Capacity Scheduler

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8159:
-
Description: 
There are a couple of JIRAs open for multiple resource type preemption in CS. 
It might be better to group them to make sure everybody is on the same page. 

In addition, I don't believe our preemption logic can properly handle 
multiple resource type preemption when YARN-5881 is in use (different 
percentages of shares for different resource types). We may need to overhaul 
the preemption logic for that.

  was:We see a couple of 


> [Umbrella] Fixes for Multiple Resource Type Preemption in Capacity Scheduler
> 
>
> Key: YARN-8159
> URL: https://issues.apache.org/jira/browse/YARN-8159
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Zian Chen
>Priority: Major
>
> There are a couple of JIRAs open for multiple resource type preemption in CS. 
> It might be better to group them to make sure everybody is on the same page. 
> In addition, I don't believe our preemption logic can properly handle 
> multiple resource type preemption when YARN-5881 is in use (different 
> percentages of shares for different resource types). We may need to overhaul 
> the preemption logic for that.
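
To make the concern concrete: once a queue's guarantee differs per resource type, the queue can be over its guarantee in one resource and under it in another, so there is no single over-capacity verdict to drive preemption. The sketch below is a toy illustration with made-up numbers and hypothetical names (`MultiResourceShareSketch`, `excess`, `overInSomeButNotAll`). It is not Capacity Scheduler code.

```java
import java.util.Arrays;

class MultiResourceShareSketch {
    /** Per resource type, how far usage exceeds the guarantee (0 when not over). */
    static long[] excess(long[] used, long[] guaranteed) {
        long[] over = new long[used.length];
        for (int i = 0; i < used.length; i++) {
            over[i] = Math.max(0L, used[i] - guaranteed[i]);
        }
        return over;
    }

    /** True when the queue is over its guarantee for some resource types but not all. */
    static boolean overInSomeButNotAll(long[] used, long[] guaranteed) {
        boolean any = false;
        boolean all = true;
        for (long o : excess(used, guaranteed)) {
            any |= o > 0;
            all &= o > 0;
        }
        return any && !all;
    }

    public static void main(String[] args) {
        // Hypothetical queue: guaranteed 50 GB of memory and 2 GPUs,
        // currently using 30 GB of memory and 4 GPUs.
        long[] guaranteed = {50, 2};
        long[] used = {30, 4};
        System.out.println(Arrays.toString(excess(used, guaranteed)));
        System.out.println(overInSomeButNotAll(used, guaranteed));
    }
}
```

In this example the queue is 2 GPUs over its guarantee while still 20 GB under it for memory, which is the mixed case the current logic would have to handle.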






[jira] [Updated] (YARN-8159) [Umbrella] Fixes for Multiple Resource Type Preemption in Capacity Scheduler

2018-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8159:
-
Description: We see a couple of 

> [Umbrella] Fixes for Multiple Resource Type Preemption in Capacity Scheduler
> 
>
> Key: YARN-8159
> URL: https://issues.apache.org/jira/browse/YARN-8159
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Zian Chen
>Priority: Major
>
> We see a couple of 






[jira] [Commented] (YARN-7142) Support placement policy in yarn native services

2018-04-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438438#comment-16438438
 ] 

Billie Rinaldi commented on YARN-7142:
--

YARN-8018 has been committed to branch-3.1 (8118 was a typo), so this patch 
will cherry-pick cleanly now. I will go ahead and do this today.

> Support placement policy in yarn native services
> 
>
> Key: YARN-7142
> URL: https://issues.apache.org/jira/browse/YARN-7142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch, 
> YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch
>
>
> Placement policy exists in the API but is not implemented yet.
> I have filed YARN-8074 to move the composite constraints implementation out 
> of this phase-1 implementation of placement policy.






[jira] [Commented] (YARN-4931) Preempted resources go back to the same application

2018-04-14 Thread Daniel Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438406#comment-16438406
 ] 

Daniel Li commented on YARN-4931:
-

Hi [~kasha], is there any update on addressing this issue in FairScheduler? Thanks. 

> Preempted resources go back to the same application
> ---
>
> Key: YARN-4931
> URL: https://issues.apache.org/jira/browse/YARN-4931
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
>Priority: Major
> Attachments: resourcemanager.log
>
>
> Sometimes a queue that needs resources causes preemption - but the preempted 
> containers are just allocated right back to the application that just 
> released them!
> Here is a tiny application (0007) that wants resources, and a container is 
> preempted from application 0002 to satisfy it:
> {code}
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> (FairSchedulerUpdateThread): Should preempt  res for 
> queue root.default: resDueToMinShare = , 
> resDueToFairShare = 
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> (FairSchedulerUpdateThread): Preempting container (prio=1res= vCores:1>) from queue root.milesc
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics
>  (FairSchedulerUpdateThread): Non-AM container preempted, current 
> appAttemptId=appattempt_1460047303577_0002_01, 
> containerId=container_1460047303577_0002_01_001038, resource= vCores:1>
> 2016-04-07 21:08:13,463 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container 
> Transitioned from RUNNING to KILLED
> {code}
> But then a moment later, application 2 gets the container right back:
> {code}
> 2016-04-07 21:08:13,844 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode 
> (ResourceManager Event Processor): Assigned container 
> container_1460047303577_0002_01_001039 of capacity  
> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 
> containers,  used and  
> available after allocation
> 2016-04-07 21:08:14,555 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 
> Container Transitioned from ALLOCATED to ACQUIRED
> 2016-04-07 21:08:14,845 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (ResourceManager Event Processor): container_1460047303577_0002_01_001039 
> Container Transitioned from ACQUIRED to RUNNING
> {code}
> This results in new applications being unable to even get an AM, and never 
> starting at all.
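
The loop described above can be shown with a toy model. This is a sketch under simplifying assumptions, not FairScheduler's actual logic: the offer order is simply passed in, and the names (`PreemptionLoopSketch`, `offerFreedContainer`) are hypothetical. It only shows that if the application that was just preempted is still first in line with pending demand, the freed container goes straight back to it unless something excludes that application from the offer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class PreemptionLoopSketch {
    static final class App {
        final String id;
        final int pendingContainers;

        App(String id, int pendingContainers) {
            this.id = id;
            this.pendingContainers = pendingContainers;
        }
    }

    /**
     * Offers one freed container to the first app in offerOrder that still has
     * pending demand, skipping skipAppId when it is non-null.
     */
    static Optional<String> offerFreedContainer(List<App> offerOrder, String skipAppId) {
        for (App app : offerOrder) {
            if (app.id.equals(skipAppId)) {
                continue;
            }
            if (app.pendingContainers > 0) {
                return Optional.of(app.id);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Application 0002 was just preempted but still wants more containers.
        List<App> order = Arrays.asList(new App("0002", 5), new App("0007", 1));
        System.out.println(offerFreedContainer(order, null));   // no exclusion
        System.out.println(offerFreedContainer(order, "0002")); // preempted app excluded
    }
}
```

Without the exclusion the container returns to 0002; with it, the container goes to 0007.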






[jira] [Commented] (YARN-7810) TestDockerContainerRuntime test failures due to UID lookup of a non-existent user

2018-04-14 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438403#comment-16438403
 ] 

Shane Kumpf commented on YARN-7810:
---

I validated the branch-2 patch against branch-2.9 as well. It applies cleanly 
and tests pass.

> TestDockerContainerRuntime test failures due to UID lookup of a non-existent 
> user
> -
>
> Key: YARN-7810
> URL: https://issues.apache.org/jira/browse/YARN-7810
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.2
>
> Attachments: YARN-7810-branch-2.001.patch, 
> YARN-7810-branch-2.002.patch, YARN-7810-branch-3.0.001.patch, 
> YARN-7810.001.patch, YARN-7810.002.patch
>
>
> YARN-7782 enabled the Docker runtime feature to remap the username to uid:gid 
> form for launching Docker containers. The feature does an {{id -u}} and {{id 
> -G}} to get the UID and GIDs. This fails with the test user, as that user 
> doesn't actually exist on the host.
> {code:java}
> [ERROR] testContainerLaunchWithCustomNetworks(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime)  Time elapsed: 0.411 s  <<< ERROR!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: ExitCodeException exitCode=1: id: 'run_as_user': no such user
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.getUserIdInfo(DockerLinuxContainerRuntime.java:711)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:757)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks(TestDockerContainerRuntime.java:599){code}
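
The failure mode is easy to reproduce outside Hadoop. The sketch below is not the code in `DockerLinuxContainerRuntime` and not the fix in the attached patches. It is a standalone illustration, with the hypothetical names `UidLookupSketch` and `lookupUid`, of shelling out to `id -u` and treating a non-zero exit (as happens for a user that does not exist on the host) as "no UID" rather than letting it surface as an exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;

class UidLookupSketch {
    /** Runs "id -u <user>" and returns the UID, or empty if the lookup fails. */
    static OptionalLong lookupUid(String user) {
        try {
            Process p = new ProcessBuilder("id", "-u", user)
                .redirectErrorStream(true) // merge stderr so the pipe cannot fill up unread
                .start();
            String firstLine;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                firstLine = r.readLine();
                while (r.readLine() != null) {
                    // drain any remaining output
                }
            }
            int exitCode = p.waitFor();
            if (exitCode != 0 || firstLine == null) {
                return OptionalLong.empty(); // e.g. "id: 'run_as_user': no such user"
            }
            return OptionalLong.of(Long.parseLong(firstLine.trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty(); // id not available, or unexpected output
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupUid(System.getProperty("user.name")));
        System.out.println(lookupUid("run_as_user"));
    }
}
```

A test that passes a made-up user name through a lookup like this will only succeed if the lookup is mocked or the user really exists on the build host.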






[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-04-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438390#comment-16438390
 ] 

genericqa commented on YARN-7996:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  5s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 24s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 27s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 13s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 89m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7996 |
| JIRA 

[jira] [Commented] (YARN-7810) TestDockerContainerRuntime test failures due to UID lookup of a non-existent user

2018-04-14 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438384#comment-16438384
 ] 

Shane Kumpf commented on YARN-7810:
---

Thanks for the report and follow-ups, [~ebadger], [~eyang], and [~jojochuang]. 
I have attached a patch for branch-2.

> TestDockerContainerRuntime test failures due to UID lookup of a non-existent 
> user
> -
>
> Key: YARN-7810
> URL: https://issues.apache.org/jira/browse/YARN-7810
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.2
>
> Attachments: YARN-7810-branch-2.001.patch, 
> YARN-7810-branch-2.002.patch, YARN-7810-branch-3.0.001.patch, 
> YARN-7810.001.patch, YARN-7810.002.patch
>
>
> YARN-7782 enabled the Docker runtime feature to remap the username to uid:gid 
> form for launching Docker containers. The feature does an {{id -u}} and {{id 
> -G}} to get the UID and GIDs. This fails with the test user, as that user 
> doesn't actually exist on the host.
> {code:java}
> [ERROR] testContainerLaunchWithCustomNetworks(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime)  Time elapsed: 0.411 s  <<< ERROR!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: ExitCodeException exitCode=1: id: 'run_as_user': no such user
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.getUserIdInfo(DockerLinuxContainerRuntime.java:711)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:757)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks(TestDockerContainerRuntime.java:599){code}






[jira] [Updated] (YARN-7810) TestDockerContainerRuntime test failures due to UID lookup of a non-existent user

2018-04-14 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-7810:
--
Attachment: YARN-7810-branch-2.002.patch

> TestDockerContainerRuntime test failures due to UID lookup of a non-existent 
> user
> -
>
> Key: YARN-7810
> URL: https://issues.apache.org/jira/browse/YARN-7810
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.2
>
> Attachments: YARN-7810-branch-2.001.patch, 
> YARN-7810-branch-2.002.patch, YARN-7810-branch-3.0.001.patch, 
> YARN-7810.001.patch, YARN-7810.002.patch
>
>
> YARN-7782 enabled the Docker runtime feature to remap the username to uid:gid 
> form for launching Docker containers. The feature does an {{id -u}} and {{id 
> -G}} to get the UID and GIDs. This fails with the test user, as that user 
> doesn't actually exist on the host.
> {code:java}
> [ERROR] testContainerLaunchWithCustomNetworks(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime)  Time elapsed: 0.411 s  <<< ERROR!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: ExitCodeException exitCode=1: id: 'run_as_user': no such user
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.getUserIdInfo(DockerLinuxContainerRuntime.java:711)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:757)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks(TestDockerContainerRuntime.java:599){code}






[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-04-14 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438352#comment-16438352
 ] 

Shane Kumpf commented on YARN-7996:
---

New patch to fix the failing test

> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch, 
> YARN-7996.003.patch, YARN-7996.004.patch, YARN-7996.005.patch, 
> YARN-7996.006.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.
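
For context, the auth tokens mentioned above live in the Docker client's `config.json` under an `auths` map, where each registry entry carries the Base64 encoding of `user:password`. The sketch below builds such a file body by hand. It says nothing about how YARN Native Services will accept or ship the file. The class and method names (`DockerClientConfigSketch`, `authToken`, `configJson`) and the registry host and credentials are placeholders, and the string concatenation assumes the registry name needs no JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DockerClientConfigSketch {
    /** Base64 of "user:password", the form Docker stores in the "auth" field. */
    static String authToken(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Minimal config.json body with a single registry entry. */
    static String configJson(String registry, String user, String password) {
        return "{\"auths\":{\"" + registry + "\":{\"auth\":\""
            + authToken(user, password) + "\"}}}";
    }

    public static void main(String[] args) {
        // Placeholder registry and credentials, for illustration only.
        System.out.println(configJson("registry.example.com", "user", "secret"));
    }
}
```

Because the token is only encoded, not encrypted, a file like this should be readable only by the submitting user wherever it is staged.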






[jira] [Updated] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-04-14 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-7996:
--
Attachment: YARN-7996.006.patch

> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch, 
> YARN-7996.003.patch, YARN-7996.004.patch, YARN-7996.005.patch, 
> YARN-7996.006.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.


