[jira] [Updated] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7757: -- Attachment: YARN-7757-YARN-3409.006.patch > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang commented on YARN-7757: --- Per offline discussion with [~Naganarasimha], the uploaded v6 patch mainly introduces another abstract class layer, {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential type mismatch when initializing providers by reflection. The other improvements [~Naganarasimha] mentioned we agreed to postpone to individual JIRAs so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the providers, either for labels or for attributes. I think we need to explicitly support both. This will be done in YARN-7871. bq. multi scripts for different types of attributes Our configuration doesn't allow configuring multiple scripts now; it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. bq. Comments over NodeManager and NodeStatusUpdate Addressed in the v6 patch. bq. verifyConfiguredScript seems to be out of place Right now verifyConfiguredScript is only used by script-based providers, so let's keep it for now. If we later see it can be reused elsewhere, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initialized with a value of -1 and gets overridden by the particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not refactoring work, we agreed to open another lower-priority JIRA to track it. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of in YARN-7871 once we decide the final format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
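For readers following the refactor discussion above, here is a minimal sketch of the kind of provider hierarchy being described: a generic base service with typed intermediate classes so reflection-based initialization can verify the provider kind. The class and method names below are illustrative assumptions, not the contents of the attached patches.
{code}
import java.util.Collections;
import java.util.Set;

import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.api.records.NodeLabel;

// Each top-level class would live in its own file; they are shown together
// here only for brevity. All names are hypothetical.
abstract class NodeDescriptorsProvider<T> extends AbstractService {
  private Set<T> descriptors = Collections.emptySet();

  protected NodeDescriptorsProvider(String name) {
    super(name);
  }

  // Concrete providers (config based, script based, ...) publish their result here.
  public synchronized void setDescriptors(Set<T> newDescriptors) {
    this.descriptors = newDescriptors;
  }

  public synchronized Set<T> getDescriptors() {
    return descriptors;
  }
}

// Distinct intermediate types let the NodeManager check, when instantiating a
// configured class by reflection, that a labels provider was not configured
// where an attributes provider is expected (and vice versa).
abstract class NodeLabelsProvider extends NodeDescriptorsProvider<NodeLabel> {
  protected NodeLabelsProvider(String name) {
    super(name);
  }
}

// NodeAttribute is the record type being added on the YARN-3409 branch.
abstract class NodeAttributesProvider
    extends NodeDescriptorsProvider<org.apache.hadoop.yarn.api.records.NodeAttribute> {
  protected NodeAttributesProvider(String name) {
    super(name);
  }
}
{code}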
[jira] [Comment Edited] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang edited comment on YARN-7757 at 2/2/18 8:46 AM: --- Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. was (Author: cheersyang): Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. 
output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --
[jira] [Updated] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7757: -- Attachment: nodeLabelsProvider_refactor_v3.pdf > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang edited comment on YARN-7757 at 2/2/18 8:50 AM: --- Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection, hierarchy see [^nodeLabelsProvider_refactor_v3.pdf] . Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. was (Author: cheersyang): Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... 
We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can r
[jira] [Commented] (YARN-7841) Cleanup AllocationFileLoaderService's reloadAllocations method
[ https://issues.apache.org/jira/browse/YARN-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350170#comment-16350170 ] Gergo Repas commented on YARN-7841: --- +1 (non-binding) Since this is a big piece of refactoring, I think a branch-2 version of the patch would be also good to have. > Cleanup AllocationFileLoaderService's reloadAllocations method > -- > > Key: YARN-7841 > URL: https://issues.apache.org/jira/browse/YARN-7841 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-7841-001.patch, YARN-7841-002.patch > > > AllocationFileLoaderService's reloadAllocations method is too complex. > Please refactor / cleanup this method to be more simple to understand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350189#comment-16350189 ] Gergo Repas commented on YARN-7876: --- Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. > Workaround ZipInputStream limitation for YARN-2185 > -- > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Major > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350189#comment-16350189 ] Gergo Repas edited comment on YARN-7876 at 2/2/18 11:30 AM: Thanks [~miklos.szeg...@cloudera.com] for the patch. Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. was (Author: grepas): Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. > Workaround ZipInputStream limitation for YARN-2185 > -- > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Major > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
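As context for the fix being discussed (YARN-2185 unpacks the jar from a stream, and the zip reader may stop before end-of-file, leaving the tee'd side copy truncated), here is a minimal sketch of the "read out the remainder" idea. The helper and the BUFFER_SIZE constant are assumptions for illustration, not the actual RunJar change.
{code}
import java.io.IOException;
import java.io.InputStream;

public final class StreamDrainSketch {
  // Assumed to mirror the existing buffer-size constant mentioned in the review.
  private static final int BUFFER_SIZE = 8192;

  private StreamDrainSketch() {
  }

  /**
   * Read and discard whatever is left on the stream. When a jar is unpacked
   * from a ZipInputStream whose input is tee'd to a side copy, draining the
   * stream after extraction ensures the tee sees (and copies) every byte up
   * to end-of-file.
   */
  public static void drain(InputStream in) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    while (in.read(buffer) != -1) {
      // Bytes are intentionally discarded; reading them is only needed so the
      // tee'd copy of the input is complete.
    }
  }
}
{code}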
[jira] [Commented] (YARN-7841) Cleanup AllocationFileLoaderService's reloadAllocations method
[ https://issues.apache.org/jira/browse/YARN-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350171#comment-16350171 ] Szilard Nemeth commented on YARN-7841: -- [~grepas] Thanks for the review! > Cleanup AllocationFileLoaderService's reloadAllocations method > -- > > Key: YARN-7841 > URL: https://issues.apache.org/jira/browse/YARN-7841 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-7841-001.patch, YARN-7841-002.patch > > > AllocationFileLoaderService's reloadAllocations method is too complex. > Please refactor / cleanup this method to be more simple to understand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350141#comment-16350141 ] genericqa commented on YARN-7757: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 49s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 42s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 3s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 28s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3409 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 269 unchanged - 20 fixed = 271 total (was 289) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7757 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908943/YARN-7757-YARN-3409.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xm
[jira] [Created] (YARN-7879) NM user is unable to access the application filecache due to permissions
Shane Kumpf created YARN-7879: - Summary: NM user is unable to access the application filecache due to permissions Key: YARN-7879 URL: https://issues.apache.org/jira/browse/YARN-7879 Project: Hadoop YARN Issue Type: Bug Reporter: Shane Kumpf I noticed the following log entries where localization was being retried on several MR AM files. {code} 2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again 2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again {code} The cluster is configured to use LCE and {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has a umask of {{0002}}. The cluster is configured with {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, produces the same results. {code} [hadoopuser@y7001 ~]$ umask 0002 [hadoopuser@y7001 ~]$ id uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) {code} The cause of the log entries was tracked down to a simple !file.exists call in {{LocalResourcesTrackerImpl#isResourcePresent}}. {code} public boolean isResourcePresent(LocalizedResource rsrc) { boolean ret = true; if (rsrc.getState() == ResourceState.LOCALIZED) { File file = new File(rsrc.getLocalPath().toUri().getRawPath().toString()); if (!file.exists()) { ret = false; } else if (dirsHandler != null) { ret = checkLocalResource(rsrc); } } return ret; } {code} The Resources Tracker runs as the NM user, in this case {{yarn}}. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user with the user's primary group and 700 perms, which makes them unreadable by the {{yarn}} user. {code} [root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache total 0 drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 {code} I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
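To make the failure mode above concrete: {{File.exists()}} is a stat call, and stat needs search (execute) permission on every ancestor directory, so a 700 directory under the filecache hides its contents from the {{yarn}} user even though the files are on disk. The small check below illustrates this; the path and user names come from the report, and the class itself is only an illustration.
{code}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FilecacheVisibilityCheck {
  public static void main(String[] args) {
    // Path taken from the log entries above; run this as the NM user (yarn).
    Path jobJar = Paths.get(
        "/hadoop-yarn/usercache/hadoopuser/appcache"
        + "/application_1517539453610_0001/filecache/11/job.jar");

    // Returns false for the yarn user when filecache/11 is 700 and owned by
    // hadoopuser, even though the file exists on disk.
    System.out.println("exists() = " + new File(jobJar.toString()).exists());

    // Walking up the path shows which ancestor blocks the traversal.
    for (Path p = jobJar.getParent(); p != null; p = p.getParent()) {
      System.out.println(p + " searchable=" + Files.isExecutable(p));
    }
  }
}
{code}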
[jira] [Updated] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergo Repas updated YARN-5028: -- Attachment: YARN-5028.001.patch > RMStateStore should trim down app state for completed applications > -- > > Key: YARN-5028 > URL: https://issues.apache.org/jira/browse/YARN-5028 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-5028.000.patch, YARN-5028.001.patch > > > RMStateStore stores enough information to recover applications in case of a > restart. The store also retains this information for completed applications > to serve their status to REST, WebUI, Java and CLI clients. We don't need all > the information we store today to serve application status; for instance, we > don't need the {{ApplicationSubmissionContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
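As a rough illustration of the trimming the description proposes (dropping state that is only needed to launch the application, not to serve its status), here is a sketch. The helper and the particular field cleared are assumptions for illustration, not what the attached patches do.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

// Hypothetical helper: before persisting the final state of a completed
// application, drop bulky launch-time data that status queries never need.
public final class AppStateTrimSketch {
  private AppStateTrimSketch() {
  }

  public static ApplicationSubmissionContext trimForCompletedApp(
      ApplicationSubmissionContext ctx) {
    // The AM container launch spec (local resources, environment, commands,
    // tokens) is only used to launch the AM; once the application has
    // completed it contributes nothing to REST/UI/CLI status responses.
    ctx.setAMContainerSpec(null);
    return ctx;
  }
}
{code}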
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350340#comment-16350340 ] Shane Kumpf commented on YARN-6456: --- [~miklos.szeg...@cloudera.com] - I believe YARN-7815 will address #1 and YARN-7814 removes the automatic mounting for #2. Should we re-purpose this issue to focus on #3 and make it a subtask of YARN-3611? Thanks. > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350349#comment-16350349 ] genericqa commented on YARN-5028: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 5m 11s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-5028 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908982/YARN-5028.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19577/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RMStateStore should trim down app state for completed applications > -- > > Key: YARN-5028 > URL: https://issues.apache.org/jira/browse/YARN-5028 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-5028.000.patch, YARN-5028.001.patch > > > RMStateStore stores enough information to recover applications in case of a > restart. The store also retains this information for completed applications > to serve their status to REST, WebUI, Java and CLI clients. We don't need all > the information we store today to serve application status; for instance, we > don't need the {{ApplicationSubmissionContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350353#comment-16350353 ] Jason Lowe commented on YARN-7677: -- I realize now that the theoretical example cannot work in practice. In order for there to be a "hook" variable for the user to leverage, the variable would need to have escaped variable expansion by the shell when it was originally set. The variable would need to be set in the NM's environment like, JAVA_HOME="/some/node/path/\$JDKVER". While that could be a valid path for the user when it is expanded in the container launch script, it is not a valid setting for JAVA_HOME in the nodemanager itself. NM whitelist variables are going to be variables coming from a shell environment and not from XML property settings, so it's highly unlikely they will retain unexpanded variable references. In short, I'm cool with simply placing the NM whitelist variables first and simplifying YARN-5714 to list the variables in the launch script in the order they appear in their corresponding configuration properties. My apologies for the detour. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
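A minimal sketch of the whitelist behavior the discussion converges on: a whitelisted variable such as HADOOP_CONF_DIR is inherited from the NodeManager only when the user (or the Docker image) has not already supplied a value. The helper below is illustrative, not the actual ContainerLaunch code.
{code}
import java.util.HashMap;
import java.util.Map;

public final class WhitelistEnvSketch {
  private WhitelistEnvSketch() {
  }

  /** Inject the NM's value only if the container environment lacks the variable. */
  public static void addWhitelistedVar(Map<String, String> containerEnv,
      String name, String nmValue) {
    containerEnv.putIfAbsent(name, nmValue);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("HADOOP_CONF_DIR", "/etc/hadoop-in-image/conf"); // set by the image
    addWhitelistedVar(env, "HADOOP_CONF_DIR", "/etc/hadoop/conf");
    System.out.println(env.get("HADOOP_CONF_DIR")); // the image's value is preserved
  }
}
{code}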
[jira] [Commented] (YARN-5714) ContainerExecutor does not order environment map
[ https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350359#comment-16350359 ] Jason Lowe commented on YARN-5714: -- On second thought, it's extremely unlikely that NM whitelist variables could reference user variables. Details are in [this comment|https://issues.apache.org/jira/browse/YARN-7677?focusedCommentId=16350353&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16350353] on YARN-7667. I think we'll be fine if we make sure the NM whitelist inherited variables appear first in the launch script then followed by the user's variables in the order they are specified in the container launch context. YARN-7667 should be taking care of the NM whitelist variables, so this JIRA can tackle ordering the user's variables. > ContainerExecutor does not order environment map > > > Key: YARN-5714 > URL: https://issues.apache.org/jira/browse/YARN-5714 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1 > Environment: all (linux and windows alike) >Reporter: Remi Catherinot >Assignee: Remi Catherinot >Priority: Trivial > Labels: oct16-medium > Attachments: YARN-5714.001.patch, YARN-5714.002.patch, > YARN-5714.003.patch, YARN-5714.004.patch, YARN-5714.005.patch, > YARN-5714.006.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > when dumping the launch container script, environment variables are dumped > based on the order internally used by the map implementation (hash based). It > does not take into consideration that some env varibales may refer each > other, and so that some env variables must be declared before those > referencing them. > In my case, i ended up having LD_LIBRARY_PATH which was depending on > HADOOP_COMMON_HOME being dumped before HADOOP_COMMON_HOME. Thus it had a > wrong value and so native libraries weren't loaded. jobs were running but not > at their best efficiency. This is just a use case falling into that bug, but > i'm sure others may happen as well. > I already have a patch running in my production environment, i just estimate > to 5 days for packaging the patch in the right fashion for JIRA + try my best > to add tests. > Note : the patch is not OS aware with a default empty implementation. I will > only implement the unix version on a 1st release. I'm not used to windows env > variables syntax so it will take me more time/research for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
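A minimal sketch of the ordering agreed on above: NM-provided variables are written first, then the user's variables in the order the client specified them, so a later variable can safely reference an earlier one (the LD_LIBRARY_PATH / HADOOP_COMMON_HOME case from the description). This illustrates the idea only; it is not the patch.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public final class LaunchScriptEnvOrderingSketch {
  private LaunchScriptEnvOrderingSketch() {
  }

  /**
   * Merge the two maps into a deterministic, insertion-ordered view. Both
   * inputs are assumed to be LinkedHashMaps so their own ordering is kept;
   * user-supplied values take precedence over NM defaults.
   */
  public static Map<String, String> orderedEnv(Map<String, String> nmEnv,
      Map<String, String> userEnv) {
    Map<String, String> ordered = new LinkedHashMap<>(nmEnv);
    ordered.putAll(userEnv);
    return ordered;
  }

  /** Writing the launch script then simply iterates in order. */
  public static String toExports(Map<String, String> ordered) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : ordered.entrySet()) {
      sb.append("export ").append(e.getKey())
        .append("=\"").append(e.getValue()).append("\"\n");
    }
    return sb.toString();
  }
}
{code}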
[jira] [Updated] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7838: -- Description: Extending DS placement spec syntax to support AND/OR constraints, something like {code} // simple -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) // nested -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) {code} was: Extending DS placement spec syntax to support AND/OR constraints, something like {code} -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) {code} > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7838: -- Attachment: YARN-7838.prelim.patch > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
Jiandan Yang created YARN-7880: --- Summary: FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls Key: YARN-7880 URL: https://issues.apache.org/jira/browse/YARN-7880 Project: Hadoop YARN Issue Type: Bug Reporter: Jiandan Yang 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-7839: - Description: Currently, the Algorithm assigns a node to a request purely based on if the constraints are met. It is later in the scheduling phase that the Queue capacity and Node capacity are checked. If the request cannot be placed because of unavailable Queue/Node capacity, the request is retried by the Algorithm. For clusters that are running at high utilization, we can reduce the retries if we perform the Node capacity check in the Algorithm as well. The Queue capacity check and the other user limit checks can still be handled by the scheduler (since queues and other limits are tied to the scheduler, and not scheduler agnostic) was: Currently, the Algorithm assigns a node to a requests purely based on if the constraints are met. It is later in the scheduling phase that the Queue capacity and Node capacity are checked. If the request cannot be placed because of unavailable Queue/Node capacity, the request is retried by the Algorithm. For clusters that are running at high utilization, we can reduce the retries if we perform the Node capacity check in the Algorithm as well. The Queue capacity check and the other user limit checks can still be handled by the scheduler (since queues and other limits are tied to the scheduler, and not scheduler agnostic) > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Priority: Major > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
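A minimal sketch of the extra check being proposed: the placement algorithm skips a candidate node when the request does not fit in its unallocated capacity, instead of discovering that later in the scheduler and retrying. The types referenced are existing YARN classes, but this helper and how it would be wired into the algorithm are assumptions.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class NodeCapacityCheckSketch {
  private NodeCapacityCheckSketch() {
  }

  /** True if the requested resource fits in what the node still has unallocated. */
  public static boolean fits(Resource requested, SchedulerNode node) {
    return Resources.fitsIn(requested, node.getUnallocatedResource());
  }
}
{code}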
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350368#comment-16350368 ] Billie Rinaldi commented on YARN-7677: -- That sounds like a good approach, NM vars followed by preserving the order of the user variables. I'd prefer if the NM vars included all the ones defined by the NM (see ContainerLaunch.sanitizeEnv), not just the whitelist vars. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
[ https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-7880: Description: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} was: 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls > --- > > Key: YARN-7880 > URL: https://issues.apache.org/jira/browse/YARN-7880 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jiandan Yang >Priority: Major > > {code} > 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: > container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED > to RUNNING > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350375#comment-16350375 ] Shane Kumpf commented on YARN-7815: --- The localization issue appears to be unrelated. I see the same without the patch. I've opened YARN-7879 to track that issue. Doing the final testing now for this patch and will have it posted shortly. > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
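A minimal sketch of what the change amounts to when the runtime assembles its bind mounts: the filecache directories get a read-only flag, so the resulting docker invocation carries "-v <dir>:<dir>:ro" instead of the default read-write bind. The method below is illustrative; it is not the actual DockerLinuxContainerRuntime code.
{code}
import java.util.ArrayList;
import java.util.List;

public final class ReadOnlyMountSketch {
  private ReadOnlyMountSketch() {
  }

  /** Build hypothetical docker -v arguments with the :ro suffix for each filecache dir. */
  public static List<String> filecacheMountArgs(List<String> filecacheDirs) {
    List<String> args = new ArrayList<>();
    for (String dir : filecacheDirs) {
      args.add("-v");
      args.add(dir + ":" + dir + ":ro");
    }
    return args;
  }
}
{code}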
[jira] [Updated] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
[ https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-7880: Description: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} was: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} > FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls > --- > > Key: YARN-7880 > URL: https://issues.apache.org/jira/browse/YARN-7880 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jiandan Yang >Priority: Major > > {code} > 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: > container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED > to RUNNING > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
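The trace alone does not show which reference is null, but a defensive check of the following shape is the usual remedy; the assumption that the committed allocation refers to a node that has disappeared (or was never fully registered) under SLS is exactly that, an assumption for illustration.
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class AllocationNodeGuardSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(AllocationNodeGuardSketch.class);

  private AllocationNodeGuardSketch() {
  }

  /** Hypothetical guard: reject the allocation proposal instead of throwing an NPE. */
  public static boolean nodeStillPresent(Map<NodeId, SchedulerNode> nodes,
      NodeId nodeId) {
    SchedulerNode node = nodes.get(nodeId);
    if (node == null) {
      LOG.info("Rejecting allocation proposal: node {} is no longer present",
          nodeId);
      return false;
    }
    return true;
  }
}
{code}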
[jira] [Commented] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350372#comment-16350372 ] Weiwei Yang commented on YARN-7838: --- Hello [~asuresh], I spent a few hours on this one today. To support composite and nested constraints in DS, I think the current approach in PlacementSpec is not flexible enough, so I created a parser class, {{PlacementConstraintParser}}. This is a preliminary patch; please take a look and let me know your feedback. My thought is that we can use such a parser class to further support specifying constraint expressions at application submission time, in a format similar to DS, so applications can use this feature more easily without modifying client code. If you agree with this approach, I will continue working on a formal patch. Thank you. > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
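To make the parsing problem concrete, the sketch below shows the one non-trivial step such a parser needs: splitting child constraints on ':' only at the top nesting level, so the nested OR(IN,NODE,moo:IN,NODE,bar) inside an AND(...) stays intact. It illustrates the idea only; it is not the PlacementConstraintParser in the attached patch.
{code}
import java.util.ArrayList;
import java.util.List;

public final class ConstraintSplitSketch {
  private ConstraintSplitSketch() {
  }

  /**
   * Split "NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)" into
   * ["NOTIN,NODE,foo", "OR(IN,NODE,moo:IN,NODE,bar)"]; the ':' inside the
   * nested OR(...) is ignored because it sits at depth > 0.
   */
  public static List<String> splitTopLevel(String expr) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (char c : expr.toCharArray()) {
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      }
      if (c == ':' && depth == 0) {
        parts.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(splitTopLevel("NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)"));
  }
}
{code}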
[jira] [Assigned] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-7879: Assignee: Jason Lowe Affects Version/s: 3.1.0 Priority: Critical (was: Major) Target Version/s: 3.1.0 We hit this before, and it was fixed in YARN-1386 by adding group execute permissions to the directories in the user's filecache. I think it could be YARN-2185 which added more restrictive permissions on some directories during localization. I'll run some quick tests locally to verify. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
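For readers unfamiliar with why the presence check misfires, the following is a minimal, self-contained illustration (not YARN code) of the underlying POSIX behaviour: {{File.exists()}} returns false whenever an ancestor directory cannot be traversed by the calling user, so a 0700 filecache directory owned by the application user makes a physically present file look missing to the NM user and triggers re-localization.
{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

/**
 * Minimal illustration, not YARN code: a file under a 0700 directory owned by
 * another user is reported as absent by File.exists(), which is exactly how the
 * NM-side presence check misreads the localized resource as missing.
 */
public class PresenceCheckDemo {
  public static void main(String[] args) throws IOException {
    // Hypothetical layout mirroring .../appcache/<app_id>/filecache/11/job.jar
    Path resourceDir = Paths.get(args[0], "filecache", "11");
    Files.createDirectories(resourceDir);
    Path jar = Files.createFile(resourceDir.resolve("job.jar"));

    // Lock the per-resource directory down to 0700, as described above.
    Files.setPosixFilePermissions(resourceDir,
        PosixFilePermissions.fromString("rwx------"));

    // Run this probe as a different user (e.g. the yarn user): exists() returns
    // false even though job.jar is physically on disk, so it is localized again.
    System.out.println("job.jar visible? " + new File(jar.toString()).exists());
  }
}
{code}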
[jira] [Updated] (YARN-7876) Localized jars that are expanded during localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-7876: - Affects Version/s: 3.1.0 Target Version/s: 3.1.0 Priority: Blocker (was: Major) Summary: Localized jars that are expanded during localization are not fully copied (was: Workaround ZipInputStream limitation for YARN-2185) > Localized jars that are expanded during localization are not fully copied > - > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not read to the > end of the file, so the trailing bytes are never consumed. Let's read them out > so that the additional TeeInputStream on the input receives the complete file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
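As a rough sketch of the pattern the summary refers to (illustrative only, not the attached patch; class and variable names are made up): when the jar is expanded from a {{ZipInputStream}} layered over a {{TeeInputStream}}, the zip reader stops before end-of-file, so the teed copy on disk is truncated unless the remainder of the stream is drained afterwards.
{code:java}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.commons.io.input.TeeInputStream;

/**
 * Illustrative sketch only (not the YARN-7876 patch). Every byte pulled through
 * the tee is mirrored to the local copy, but ZipInputStream stops pulling before
 * end-of-file, so the copy stays incomplete until the rest is read out.
 */
public class StreamingJarCopySketch {
  public static void main(String[] args) throws IOException {
    try (FileOutputStream localCopy = new FileOutputStream(args[1]);
         InputStream tee = new TeeInputStream(new FileInputStream(args[0]), localCopy);
         ZipInputStream zip = new ZipInputStream(tee)) {

      for (ZipEntry entry = zip.getNextEntry(); entry != null;
           entry = zip.getNextEntry()) {
        // ... expand the entry to the destination directory ...
        zip.closeEntry();
      }

      // Drain the bytes ZipInputStream never consumed (central directory,
      // trailing data) so the teed copy ends up byte-for-byte complete.
      byte[] buf = new byte[8192];
      while (tee.read(buf) != -1) {
        // reading is enough; the tee writes the bytes to localCopy
      }
    }
  }
}
{code}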
[jira] [Updated] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-7815: -- Attachment: YARN-7815.001.patch > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Attachments: YARN-7815.001.patch > > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
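For context, the change amounts to adding the read-only suffix to the bind mount that ends up in the generated {{docker run}} invocation. A toy sketch of that argument shape follows (the paths and helper are hypothetical; this is not the NM Docker runtime code):
{code:java}
/**
 * Toy sketch only: shows the ":ro" suffix that distinguishes a read-only Docker
 * bind mount from the current read-write one. Paths are hypothetical.
 */
public class DockerMountArgSketch {
  static String bindMount(String source, String target, boolean readOnly) {
    return "-v " + source + ":" + target + (readOnly ? ":ro" : ":rw");
  }

  public static void main(String[] args) {
    String filecache =
        "/hadoop-yarn/usercache/hadoopuser/appcache/application_1/filecache";
    // Current behaviour: read-write; proposed behaviour: read-only.
    System.out.println(bindMount(filecache, filecache, false));
    System.out.println(bindMount(filecache, filecache, true));
  }
}
{code}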
[jira] [Assigned] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned YARN-7839: Assignee: Panagiotis Garefalakis > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350424#comment-16350424 ] Shane Kumpf commented on YARN-7815: --- Attached a patch that implements the proposal. Given I had to touch the bulk of the test methods in {{TestDockerContainerRuntime}}, I went ahead and cleaned up some warnings and unused code as well. If you'd prefer that cleanup be moved to a separate patch, I can do so. > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Attachments: YARN-7815.001.patch > > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-7839: - Attachment: YARN-7839-YARN-6592.001.patch > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496 ] Panagiotis Garefalakis commented on YARN-7839: -- Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the **ignoreResourceCheck** flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class SchedulingRequestWithPlacementAttempt was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
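To make the idea concrete, here is a self-contained sketch of the check being described. The names {{attemptPlacementOnNode}} and {{ignoreResourceCheck}} are taken from the comment above, but the types are simplified stand-ins rather than the YARN-6592 classes:
{code:java}
/**
 * Simplified stand-ins, not the actual YARN-6592 classes: constraints are
 * evaluated first, and the resource fit is only enforced when
 * ignoreResourceCheck is false (the placement step); the validation step can
 * pass true and skip it.
 */
class NodeSketch {
  long availableMemMb;
  int availableVcores;
}

class RequestSketch {
  long memMb;
  int vcores;
}

class PlacementAlgorithmSketch {
  boolean constraintsSatisfied(RequestSketch request, NodeSketch node) {
    return true; // placement-constraint evaluation elided for brevity
  }

  boolean attemptPlacementOnNode(RequestSketch request, NodeSketch node,
      boolean ignoreResourceCheck) {
    if (!constraintsSatisfied(request, node)) {
      return false;
    }
    if (!ignoreResourceCheck) {
      if (request.memMb > node.availableMemMb
          || request.vcores > node.availableVcores) {
        return false; // constraint-feasible but the node is currently full
      }
      // Tentatively account for this placement so later requests in the same
      // round see the reduced capacity.
      node.availableMemMb -= request.memMb;
      node.availableVcores -= request.vcores;
    }
    return true;
  }
}
{code}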
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350500#comment-16350500 ] Jim Brennan commented on YARN-7677: --- Thanks everyone! I will work on a new patch using this approach. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
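A rough sketch of the approach being agreed on (whitelisted variables are only filled in when the container environment does not already define them, so a Docker image's own {{HADOOP_CONF_DIR}} wins); the helper and values below are illustrative, not the launcher code:
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch, not the NM launcher code: a whitelisted variable such as
 * HADOOP_CONF_DIR is only supplied from the NM side when the container
 * environment does not already define it, so an image that bakes in its own
 * value keeps it.
 */
public class WhitelistEnvSketch {
  static void addWhitelisted(Map<String, String> containerEnv,
      String name, String nmDefault) {
    containerEnv.putIfAbsent(name, nmDefault); // never clobber user/image value
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("HADOOP_CONF_DIR", "/opt/image-conf"); // set inside the Docker image

    addWhitelisted(env, "HADOOP_CONF_DIR", "/etc/hadoop/conf");
    addWhitelisted(env, "JAVA_HOME", "/usr/lib/jvm/default");

    System.out.println(env); // HADOOP_CONF_DIR stays /opt/image-conf
  }
}
{code}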
[jira] [Comment Edited] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496 ] Panagiotis Garefalakis edited comment on YARN-7839 at 2/2/18 3:13 PM: -- Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the *ignoreResourceCheck* flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class *SchedulingRequestWithPlacementAttempt* was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] was (Author: pgaref): Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the **ignoreResourceCheck** flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class SchedulingRequestWithPlacementAttempt was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7832) Logs page does not work for Running applications
[ https://issues.apache.org/jira/browse/YARN-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-7832. --- Resolution: Not A Problem Thanks [~yeshavora] for confirming, This is working fine with Combine System Metric Publisher mode > Logs page does not work for Running applications > > > Key: YARN-7832 > URL: https://issues.apache.org/jira/browse/YARN-7832 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Sunil G >Priority: Critical > Attachments: Screen Shot 2018-01-26 at 3.28.40 PM.png, > YARN-7832.001.patch > > > Scenario > * Run yarn service application > * When application is Running, go to log page > * Select AttemptId and Container Id > Logs are not showed on UI. It complains "No log data available!" > > Here > [http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358] > API fails with 500 Internal Server Error. > {"exception":"WebApplicationException","message":"java.io.IOException: > ","javaClassName":"javax.ws.rs.WebApplicationException"} > {code:java} > GET > http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358 > 500 (Internal Server Error) > (anonymous) @ VM779:1 > send @ vendor.js:572 > ajax @ vendor.js:548 > (anonymous) @ vendor.js:5119 > initializePromise @ vendor.js:2941 > Promise @ vendor.js:3005 > ajax @ vendor.js:5117 > ajax @ yarn-ui.js:1 > superWrapper @ vendor.js:1591 > query @ vendor.js:5112 > ember$data$lib$system$store$finders$$_query @ vendor.js:5177 > query @ vendor.js:5334 > fetchLogFilesForContainerId @ yarn-ui.js:132 > showLogFilesForContainerId @ yarn-ui.js:126 > run @ vendor.js:648 > join @ vendor.js:648 > run.join @ vendor.js:1510 > closureAction @ vendor.js:1865 > trigger @ vendor.js:302 > (anonymous) @ vendor.js:339 > each @ vendor.js:61 > each @ vendor.js:51 > trigger @ vendor.js:339 > d.select @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > d.invoke @ vendor.js:5598 > d.trigger @ vendor.js:5598 > e.trigger @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > d.invoke @ vendor.js:5598 > d.trigger @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > dispatch @ vendor.js:306 > elemData.handle @ vendor.js:281{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350564#comment-16350564 ] Jason Lowe commented on YARN-7879: -- This was caused by YARN-2185. That change locked down the top-level directory for a non-public localized file to 0700 which prevents the nodemanager user from checking file presence on secure clusters. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
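For reference, the YARN-1386-era remedy mentioned earlier was to keep group execute on those directories so a process in the shared group can traverse, but not list or modify, them. A generic POSIX sketch of that permission shape (not the attached patch):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/**
 * Generic sketch, not the YARN-7879 patch: 0710 (rwx--x---) on a per-resource
 * directory lets a process in the owning group traverse the path and stat a
 * known file name, without being able to list or modify the directory.
 */
public class FilecachePermsSketch {
  public static void main(String[] args) throws IOException {
    Path resourceDir = Paths.get(args[0]); // e.g. .../filecache/11
    Set<PosixFilePermission> perms =
        PosixFilePermissions.fromString("rwx--x---");
    Files.setPosixFilePermissions(resourceDir, perms);
    System.out.println("Set 0710 on " + resourceDir);
  }
}
{code}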
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350571#comment-16350571 ] genericqa commented on YARN-7815: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 90 unchanged - 0 fixed = 93 total (was 90) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 14s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7815 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908989/YARN-7815.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 87aec99b0bcb 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19578/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/19578/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hado
[jira] [Updated] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-7879: - Attachment: YARN-7879.001.patch > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7866) [UI2] Kerberizing the UI doesn't give any warning or content when UI is accessed without kinit
[ https://issues.apache.org/jira/browse/YARN-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350610#comment-16350610 ] genericqa commented on YARN-7866: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 27m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 23s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7866 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908607/YARN-7866.001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux 29183bd0f4f9 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 314 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19581/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [UI2] Kerberizing the UI doesn't give any warning or content when UI is > accessed without kinit > -- > > Key: YARN-7866 > URL: https://issues.apache.org/jira/browse/YARN-7866 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Sunil G >Priority: Major > Attachments: YARN-7866.001.patch > > > Handle 401 error and show in UI > credit to [~ssath...@hortonworks.com] for finding this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-7881: Description: The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. In order to add a similar page to the new YARN UI we need to add an RM WS endpoint first. (was: The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. This information is not yet available by the RM Rest API.) > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
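A minimal JAX-RS sketch of what such an endpoint could look like; the resource path, DAO shape, and field names below are assumptions for illustration, not the API the patch adds:
{code:java}
import java.util.ArrayList;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlRootElement;

/**
 * Minimal JAX-RS sketch; path, DAO and fields are illustrative assumptions.
 */
@Path("/ws/v1/cluster")
public class LogAggregationStatusResourceSketch {

  /** Per-node log aggregation report as it might be serialized to JSON/XML. */
  @XmlRootElement
  public static class NodeLogAggregationInfo {
    public String nodeId;
    public String logAggregationStatus; // e.g. RUNNING, SUCCEEDED, FAILED
    public String diagnostics;
  }

  @GET
  @Path("/apps/{appid}/logaggregationstatus")
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public List<NodeLogAggregationInfo> getLogAggregationStatus(
      @PathParam("appid") String appId) {
    // In the RM this would be built from the application's per-node log
    // aggregation reports; an empty list keeps the sketch self-contained.
    return new ArrayList<>();
  }
}
{code}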
[jira] [Created] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
Gergely Novák created YARN-7881: --- Summary: Add Log Aggregation Status API to the RM Webservice Key: YARN-7881 URL: https://issues.apache.org/jira/browse/YARN-7881 Project: Hadoop YARN Issue Type: New Feature Components: yarn Reporter: Gergely Novák Assignee: Gergely Novák The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. This information is not yet available by the RM Rest API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350628#comment-16350628 ] Jason Lowe commented on YARN-7879: -- I also manually tested the patch on a secure cluster and verified non-private resources are not re-localized with each application. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7868) Provide improved error message when YARN service is disabled
[ https://issues.apache.org/jira/browse/YARN-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350632#comment-16350632 ] Eric Yang commented on YARN-7868: - [~csingh] Thank you for reviewing the patch. [~jianhe] Thanks for the review. The message might be inaccurate for multi-users environment where an end user doesn't have system admin rights to enable the service. This is where the message would shows up the most, if system admin intentionally disabled this feature. Therefore, I prefer to omit this message to prevent noise generation. > Provide improved error message when YARN service is disabled > > > Key: YARN-7868 > URL: https://issues.apache.org/jira/browse/YARN-7868 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7868.001.patch > > > Some YARN CLI command will throw verbose error message when YARN service is > disabled. The error message looks like this: > {code} > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: A message body reader for Java class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and Java type class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and MIME media type > application/octet-stream was not found > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: The registered message body readers compatible with the MIME media > type are: > application/octet-stream -> > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.RenderedImageProvider > */* -> > com.sun.jersey.core.impl.provider.entity.FormProvider > com.sun.jersey.core.impl.provider.entity.StringProvider > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General > com.sun.jersey.core.impl.provider.entity.ReaderProvider > com.sun.jersey.core.impl.provider.entity.DocumentProvider > com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader > com.sun.jersey.json.impl.provider.entity.JSONJAXBElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONArrayProvider$General > com.sun.jersey.json.impl.provider.entity.JSONObjectProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General > com.sun.jersey.core.impl.provider.entity.EntityHolderReader > com.sun.jersey.json.impl.provider.entity.JSONRootElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONListElementProvider$General > com.sun.jersey.json.impl.provider.entity.JacksonProviderProxy > com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider > 2018-01-31 16:24:46,415 ERROR client.ApiServiceClient: > {code} -- This message was sent by Atlassian JIRA 
(v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
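The verbose trace in the description comes from handing an unexpected response body straight to the JSON reader. As a generic illustration of the friendlier-error idea (plain {{HttpURLConnection}}, not the {{ApiServiceClient}} code), the client can check the status and content type first and print a single concise line:
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Generic illustration, not the ApiServiceClient code: inspect the HTTP status
 * and content type before mapping the body to a bean, so a disabled service API
 * surfaces as one concise error line instead of a "no message body reader" trace.
 */
public class ServiceStatusProbeSketch {
  public static void main(String[] args) throws IOException {
    URL url = new URL(args[0]); // hypothetical service REST endpoint
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    int status = conn.getResponseCode();
    String contentType = conn.getContentType();
    if (status != HttpURLConnection.HTTP_OK
        || contentType == null || !contentType.contains("application/json")) {
      System.err.println("YARN service API is not available (HTTP " + status
          + "); the service framework may be disabled on this cluster.");
      return;
    }
    // ... only now hand conn.getInputStream() to the JSON deserializer ...
  }
}
{code}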
[jira] [Updated] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-7881: Attachment: YARN-7881.001.patch > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7881.001.patch > > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350635#comment-16350635 ] Sunil G commented on YARN-7850: --- [~GergelyNovak] The change looks fine to me. One doubt: when log aggregation is yet to start, we'll show the status without any style, correct? > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350642#comment-16350642 ] Gergely Novák commented on YARN-7850: - We show it with the default style (grey). > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350625#comment-16350625 ] Gergely Novák commented on YARN-7850: - Created [YARN-7881|https://issues.apache.org/jira/browse/YARN-7881]. > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350648#comment-16350648 ] genericqa commented on YARN-7879: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7879 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908999/YARN-7879.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b15ac493c66c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19582/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19582/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > NM user is unable to access the application filecache due to permissions >
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350649#comment-16350649 ] genericqa commented on YARN-7839: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-6592 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 15s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} YARN-6592 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7839 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908993/YARN-7839-YARN-6592.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9cb62d03926c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | YARN-6592 / 8df7666 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/19579/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19579/testReport/ | | Max. process+thread count | 866 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemana
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350656#comment-16350656 ] Sunil G commented on YARN-7839: --- bq. despite the naming, as far as I know, the candidateNodeSet is currently always only a single node [~kkaranasos] and [~asuresh], for multi-node placement, CandidateNodeSet was the ideal interface to extend. So multiple nodes could come in that iterator. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350701#comment-16350701 ] genericqa commented on YARN-5028: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 57s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 76 unchanged - 0 fixed = 78 total (was 76) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel | | | hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementProcessor | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-5028 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908982/YARN-5028.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 119bf6b51613 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19580/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/jo
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350731#comment-16350731 ] Arun Suresh commented on YARN-7839: --- [~sunilg], regarding the {{CandidateNodeSet}}, let's move the discussion to when we refactor the {{AppSchedulingInfo}} - since this patch is isolated to the algorithm. [~kkaranasos] comment: bq. However, what about the case that a node seems full but a container is about to finish (and will be finished until the allocate is done)? Should we completely reject such nodes, or simply give higher priority to nodes that already have available resources? We are not rejecting those resources. If a scheduling request cannot be satisfied by any node in the algorithm round, it will be retried in the next AM heartbeat - and hopefully some of those containers will have completed by then. We can set the retry to a higher value for clusters that are running at a higher utilization. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
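For context on the capacity check being discussed, here is a minimal hedged sketch (the Node and Request types below are simplified stand-ins, not the actual YARN classes, and this is not the YARN-7839 patch itself): the placement algorithm skips nodes without enough free capacity instead of handing the request to the scheduler and waiting for a rejection and retry.
{code:java}
// Minimal sketch of a capacity-aware placement pass. Simplified types only.
import java.util.List;
import java.util.Optional;

public class CapacityAwarePlacement {

  static class Node {
    final String host;
    long freeMemMb;
    int freeVcores;
    Node(String host, long freeMemMb, int freeVcores) {
      this.host = host;
      this.freeMemMb = freeMemMb;
      this.freeVcores = freeVcores;
    }
  }

  static class Request {
    final long memMb;
    final int vcores;
    Request(long memMb, int vcores) {
      this.memMb = memMb;
      this.vcores = vcores;
    }
  }

  /**
   * Returns the first node that satisfies both the (already evaluated)
   * placement constraints and the free-capacity check.
   */
  static Optional<Node> place(Request req, List<Node> constraintSatisfyingNodes) {
    for (Node node : constraintSatisfyingNodes) {
      // The extra check discussed here: skip nodes without headroom so the
      // request is not bounced back by the scheduler later.
      if (node.freeMemMb >= req.memMb && node.freeVcores >= req.vcores) {
        node.freeMemMb -= req.memMb;   // tentatively reserve the capacity
        node.freeVcores -= req.vcores;
        return Optional.of(node);
      }
    }
    return Optional.empty();           // left unplaced, retried on the next AM heartbeat
  }

  public static void main(String[] args) {
    List<Node> nodes = List.of(new Node("n1", 512, 1), new Node("n2", 4096, 8));
    System.out.println(
        place(new Request(2048, 2), nodes).map(n -> n.host).orElse("retry"));
  }
}
{code}
A request that cannot be placed in a round simply falls through and is retried later, which matches the retry behaviour described in the comment above.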
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350732#comment-16350732 ] Arun Suresh commented on YARN-7839: --- Thanks for the patch [~pgaref] It looks pretty straight forward to me. +1 will commit this shortly. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7876) Localized jars that are expanded during localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-7876: - Attachment: YARN-7876.001.patch > Localized jars that are expanded during localization are not fully copied > - > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-7876: - Summary: Localized jars that are expanded after localization are not fully copied (was: Localized jars that are expanded during localization are not fully copied) > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350762#comment-16350762 ] Miklos Szegedi commented on YARN-7876: -- Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it might run into an issue. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
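For readers following along, here is a hedged sketch of the failure mode being described (an illustration only, not the YARN-7876 patch; it assumes the commons-io TeeInputStream): a JarInputStream can stop reading before the physical end of the jar because it does not need the trailing central directory, so a TeeInputStream that mirrors the raw bytes into the jar copy kept for compatibility leaves that copy truncated unless the remainder of the stream is drained.
{code:java}
// Sketch: expand a jar from a stream while also keeping a full byte-for-byte copy.
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import org.apache.commons.io.input.TeeInputStream;

public class DrainAfterExpand {

  public static void expandAndKeepCopy(InputStream remoteJar, String localCopyPath)
      throws IOException {
    try (FileOutputStream copy = new FileOutputStream(localCopyPath);
         TeeInputStream tee = new TeeInputStream(remoteJar, copy);
         JarInputStream jar = new JarInputStream(tee)) {
      JarEntry entry;
      while ((entry = jar.getNextJarEntry()) != null) {
        // ... extract the entry to disk (omitted in this sketch) ...
      }
      // Drain whatever JarInputStream left unread (typically the central
      // directory at the end). Without this, the tee'd copy is truncated and a
      // later attempt to unzip it can fail.
      byte[] buf = new byte[64 * 1024];
      while (tee.read(buf) != -1) {
        // reading is enough: every byte read flows through the tee into "copy"
      }
    }
  }
}
{code}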
[jira] [Comment Edited] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350762#comment-16350762 ] Miklos Szegedi edited comment on YARN-7876 at 2/2/18 6:19 PM: -- Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it could run into an issue. was (Author: miklos.szeg...@cloudera.com): Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it might run into an issue. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Modify PlacementAlgorithm to Check node capacity before placing request on node
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7839: -- Summary: Modify PlacementAlgorithm to Check node capacity before placing request on node (was: Check node capacity before placing in the Algorithm) > Modify PlacementAlgorithm to Check node capacity before placing request on > node > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7791) Support submit intra-app placement constraint in Distributed Shell to AppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7791: -- Environment: (was: Set {{yarn.resourcemanager.placement-constraints.enabled}} to {{false}} Submit a job with placement constraint spec, e.g {code} in/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar -shell_command sleep -shell_args 30 -num_containers 2 -master_memory 500 -container_memory 200 -placement_spec foo=4,NOTIN,NODE,foo {code} got following errors in RM log {noformat} Exception message:As of now, the only accepted target key for targetKey of allocation_tag target expression is: [yarn_application_label/%intra_app%]. Please make changes to placement constraints accordingly. {noformat} Looks like DS needs some modification to support submitting proper scheduling requests to app placement allocators.) > Support submit intra-app placement constraint in Distributed Shell to > AppPlacementAllocator > --- > > Key: YARN-7791 > URL: https://issues.apache.org/jira/browse/YARN-7791 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Weiwei Yang >Assignee: Sunil G >Priority: Major > Labels: distributedshell > Attachments: YARN-7791-YARN-6592.001.patch > > > Set {{yarn.resourcemanager.placement-constraints.enabled}} to {{false}} > Submit a job with placement constraint spec, e.g > {code} > in/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar > -shell_command sleep -shell_args 30 -num_containers 2 -master_memory 500 > -container_memory 200 -placement_spec foo=4,NOTIN,NODE,foo > {code} > got following errors in RM log > {noformat} > Exception message:As of now, the only accepted target key for targetKey of > allocation_tag target expression is: [yarn_application_label/%intra_app%]. > Please make changes to placement constraints accordingly. > {noformat} > Looks like DS needs some modification to support submitting proper scheduling > requests to app placement allocators. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350775#comment-16350775 ] Eric Yang commented on YARN-7879: - Are we comfortable in assuming all files in the filecache are world readable? In the health care and financial industries, a user's default umask is set to 027. Can there be private files that are exposed as a result of the umask change? Should we check every file, or assume that the pipe archive always expands properly with a single checksum file test? Would it be possible to make this detection using privileged access and report back to the nodemanager to trigger reinitialization? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluster is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down to a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes them > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
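The root cause described above can be reproduced outside YARN. A small standalone illustration (assuming a POSIX filesystem and that the directory owner differs from the process user): java.io.File#exists() returns false when an ancestor directory is mode 700 and owned by someone else, because the underlying stat() fails with EACCES, so a file that is physically on disk looks missing to the NM user.
{code:java}
// Illustration of File#exists() vs. a non-traversable parent directory.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class ExistsVsPermissions {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("filecache-demo");
    Path file = Files.createFile(dir.resolve("job.jar"));

    // Emulate the 700 directory created for the app user: no group/other access.
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));

    // Run as the owner this prints "true"; run as any other user (for example
    // the NM user "yarn") it prints "false", which is what triggers the
    // "is missing, localizing it again" path shown in the description.
    System.out.println(new File(file.toString()).exists());
  }
}
{code}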
[jira] [Commented] (YARN-7857) -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
[ https://issues.apache.org/jira/browse/YARN-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350781#comment-16350781 ] Miklos Szegedi commented on YARN-7857: -- Thank you, [~Jim_Brennan] for the patch. I am a little bit concerned that we sacrifice security for compatibility. Since RHEL7 code does not run on RHEL6 anyway due to glibc compatibility issues, would it make sense to keep the stack check code for RHEL7 and above? I checked the RHEL74 stack guard code and it seems to be much more precise than the one in the previous version. > -fstack-check compilation flag causes binary incompatibility for > container-executor between RHEL 6 and RHEL 7 > - > > Key: YARN-7857 > URL: https://issues.apache.org/jira/browse/YARN-7857 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7857.001.patch > > > The segmentation fault in container-executor reported in [YARN-7796] appears > to be due to a binary compatibility issue with the {{-fstack-check}} flag > that was added in [YARN-6721] > Based on my testing, a container-executor (without the patch from > [YARN-7796]) compiled on RHEL 6 with the -fstack-check flag always hits this > segmentation fault when run on RHEL 7. But if you compile without this flag, > the container-executor runs on RHEL 7 with no problems. I also verified this > with a simple program that just does the copy_file. > I think we need to either remove this flag, or find a suitable alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-6456: - Issue Type: Sub-task (was: Bug) Parent: YARN-3611 > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350787#comment-16350787 ] Miklos Szegedi commented on YARN-6456: -- Sure, I made it as a subtask. Are you referring to #3 as this? "Maybe the container directories could be outside the application directory." > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350794#comment-16350794 ] Jason Lowe commented on YARN-7879: -- bq. Are we comfortable in assuming all files in filecache are world readable? Non-public localized files are not world readable. The top-level directory of the user's filecache directory is mode 0710 with the NM group, so only the user and those in the NM's group can see that files are there, regardless of what the permissions of the underlying paths are. The localized files themselves are mode 0500. In short, the NM user can see that a file is there but cannot read it. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluster is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down to a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes them > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350798#comment-16350798 ] Arun Suresh commented on YARN-7838: --- Thanks for taking a stab at this [~cheersyang]. Yup, I agree we need a more flexible parser - the placementspec parser I put in was just for some adhoc testing :) Couple of comments: # do we need a tryParse ? Either we are able to parse or an exception is thrown right ? # The {{toInt}} should be static # I am assuming your {{shouldHaveNext}} is more like an assert - Maybe make that static as well, and # In the final implementation, we have to ensure that it accepts a placementspec string WITHOUT any and/or as well. > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
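To make the parser discussion concrete, here is a hedged sketch of a recursive-descent parser for the AND/OR expression part of the spec (the portion after the "tag=count," prefix). In line with comment #1 it simply throws on malformed input rather than exposing a tryParse, and the last example shows a plain spec without AND/OR as raised in comment #4. The class and method names are made up for illustration and this is not the distributed shell parser from the patch.
{code:java}
// Sketch: parse expressions such as "AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar))".
import java.util.ArrayList;
import java.util.List;

public class ConstraintExprParser {
  private final String spec;
  private int pos;

  private ConstraintExprParser(String spec) {
    this.spec = spec;
  }

  /** Parses the whole expression or throws IllegalArgumentException. */
  public static Expr parse(String spec) {
    ConstraintExprParser p = new ConstraintExprParser(spec);
    Expr e = p.parseExpr();
    if (p.pos != spec.length()) {
      throw new IllegalArgumentException("Trailing input at offset " + p.pos);
    }
    return e;
  }

  private Expr parseExpr() {
    if (spec.startsWith("AND(", pos) || spec.startsWith("OR(", pos)) {
      String op = spec.startsWith("AND(", pos) ? "AND" : "OR";
      pos += op.length() + 1;                 // consume "AND(" or "OR("
      List<Expr> children = new ArrayList<>();
      children.add(parseExpr());
      while (peek() == ':') {                 // ':' separates child constraints
        pos++;
        children.add(parseExpr());
      }
      expect(')');
      return new Expr(op, children, null);
    }
    // Simple constraint: everything up to the next ':' or ')' at this level,
    // e.g. "NOTIN,NODE,foo".
    int start = pos;
    while (pos < spec.length() && spec.charAt(pos) != ':' && spec.charAt(pos) != ')') {
      pos++;
    }
    return new Expr("SINGLE", List.of(), spec.substring(start, pos));
  }

  private char peek() {
    return pos < spec.length() ? spec.charAt(pos) : '\0';
  }

  private void expect(char c) {
    if (peek() != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
    }
    pos++;
  }

  public static final class Expr {
    final String op;
    final List<Expr> children;
    final String constraint;
    Expr(String op, List<Expr> children, String constraint) {
      this.op = op;
      this.children = children;
      this.constraint = constraint;
    }
    @Override
    public String toString() {
      return "SINGLE".equals(op) ? constraint : op + children;
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar))"));
    System.out.println(parse("NOTIN,NODE,foo"));  // plain spec without AND/OR
  }
}
{code}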
[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350831#comment-16350831 ] Yufei Gu commented on YARN-7655: The patch looks good to me generally. Only some nits: # Add Javadoc to method {{identifyContainersToPreempt()}} to indicate that preemption will try to meet locality first, whether or not the resource request relaxes locality, and that there is an exception for AM containers. # Fix some style issues in the test class. > avoid AM preemption caused by RRs for specific nodes or racks > - > > Key: YARN-7655 > URL: https://issues.apache.org/jira/browse/YARN-7655 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: YARN-7655-001.patch > > > We frequently see AM preemptions when > {{starvedApp.getStarvedResourceRequests()}} in > {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs > that request containers on a specific node. Since this causes us to only > consider one node to preempt containers on, the really good work that was > done in YARN-5830 doesn't save us from AM preemption. Even though there might > be multiple nodes on which we could preempt enough non-AM containers to > satisfy the app's starvation, we often wind up preempting one or more AM > containers on the single node that we're considering. > A proposed solution is that if we're going to preempt one or more AM > containers for an RR that specifies a node or rack, then we should instead > expand the search space to consider all nodes. That way we take advantage of > YARN-5830, and only preempt AMs if there's no alternative. I've attached a > patch with an initial implementation of this. We've been running it on a few > clusters, and have seen AM preemptions drop from double-digit occurrences on > many days to zero. > Of course, the tradeoff is some loss of locality, since the starved app is > less likely to be allocated resources at the most specific locality level > that it asked for. My opinion is that this tradeoff is worth it, but > interested to hear what others think as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
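A hedged, simplified sketch of the approach proposed in this JIRA (the Container type and methods below are stand-ins, not the real FSPreemptionThread code): when a starved request names a specific node, preempting an AM container there is only considered after the search has been widened to all nodes.
{code:java}
// Sketch: prefer preempting non-AM containers anywhere over an AM on the requested node.
import java.util.ArrayList;
import java.util.List;

public class AmFriendlyPreemption {

  static class Container {
    final boolean isAm;
    final long memMb;
    Container(boolean isAm, long memMb) {
      this.isAm = isAm;
      this.memMb = memMb;
    }
  }

  /**
   * Picks containers worth at least needMb, trying candidate lists in order:
   * first the requested node only, then (if that would hit an AM) all nodes.
   */
  static List<Container> identifyContainersToPreempt(
      long needMb, List<Container> onRequestedNode, List<Container> allNodes) {
    for (List<Container> searchSpace : List.of(onRequestedNode, allNodes)) {
      List<Container> picked = pickNonAmFirst(needMb, searchSpace);
      if (picked != null && picked.stream().noneMatch(c -> c.isAm)) {
        return picked;  // satisfied without touching an AM container
      }
    }
    // No AM-free selection exists even cluster-wide; fall back to the widest search.
    List<Container> fallback = pickNonAmFirst(needMb, allNodes);
    return fallback == null ? List.of() : fallback;
  }

  private static List<Container> pickNonAmFirst(long needMb, List<Container> candidates) {
    List<Container> picked = new ArrayList<>();
    long acc = 0;
    for (boolean allowAm : new boolean[] {false, true}) {  // non-AM containers first
      for (Container c : candidates) {
        if (acc >= needMb) {
          return picked;
        }
        if (c.isAm == allowAm) {
          picked.add(c);
          acc += c.memMb;
        }
      }
    }
    return acc >= needMb ? picked : null;
  }
}
{code}
As the description notes, the tradeoff is that the preempted capacity may not end up on the node the request asked for.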
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350832#comment-16350832 ] Jason Lowe commented on YARN-7876: -- Right, thanks for clarifying the title further. +1 lgtm pending Jenkins. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7857) -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
[ https://issues.apache.org/jira/browse/YARN-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350833#comment-16350833 ] Jim Brennan commented on YARN-7857: --- Thanks [~miklos.szeg...@cloudera.com]! That is a good suggestion. I will need to do some more investigation - I would think the key factor is the GCC version, not the OS version. > -fstack-check compilation flag causes binary incompatibility for > container-executor between RHEL 6 and RHEL 7 > - > > Key: YARN-7857 > URL: https://issues.apache.org/jira/browse/YARN-7857 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7857.001.patch > > > The segmentation fault in container-executor reported in [YARN-7796] appears > to be due to a binary compatibility issue with the {{-fstack-check}} flag > that was added in [YARN-6721] > Based on my testing, a container-executor (without the patch from > [YARN-7796]) compiled on RHEL 6 with the -fstack-check flag always hits this > segmentation fault when run on RHEL 7. But if you compile without this flag, > the container-executor runs on RHEL 7 with no problems. I also verified this > with a simple program that just does the copy_file. > I think we need to either remove this flag, or find a suitable alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-7732: - Attachment: YARN-7732.04.patch > Support Generic AM Simulator from SynthGenerator > > > Key: YARN-7732 > URL: https://issues.apache.org/jira/browse/YARN-7732 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Young Chen >Assignee: Young Chen >Priority: Minor > Attachments: YARN-7732-YARN-7798.01.patch, > YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, > YARN-7732.03.patch, YARN-7732.04.patch > > > Extract the MapReduce specific set-up in the SLSRunner into the > MRAMSimulator, and enable support for pluggable AMSimulators. > Previously, the AM set up in SLSRunner had the MRAMSimulator type hard coded, > for example startAMFromSynthGenerator() calls this: > > {code:java} > runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId, > jobStartTimeMS, jobFinishTimeMS, containerList, reservationId, > job.getDeadline(), getAMContainerResource(null)); > {code} > where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce" > The container set up was also only suitable for mapreduce: > > {code:java} > // From https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > > // map tasks > for (int i = 0; i < job.getNumberMaps(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add(new ContainerSimulator(containerResource, > containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map")); > } > // reduce tasks > for (int i = 0; i < job.getNumberReduces(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add( > new ContainerSimulator(containerResource, containerLifeTime, > hostname, DEFAULT_REDUCER_PRIORITY, "reduce")); > } > {code} > > In addition, the syn.json format supported only mapreduce (the parameters > were very specific: mtime, rtime, mtasks, rtasks, etc..). > This patch aims to introduce a new syn.json format that can describe generic > jobs, and the SLS setup required to support the synth generation of generic > jobs. > See syn_generic.json for an equivalent of the previous syn.json in the new > format. > Using the new generic format, we describe a StreamAMSimulator, which simulates a > long running streaming service that maintains N number of containers for the > lifetime of the AM. See syn_stream.json. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7868) Provide improved error message when YARN service is disabled
[ https://issues.apache.org/jira/browse/YARN-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350851#comment-16350851 ] Eric Yang commented on YARN-7868: - [~jianhe] Thank you for the commit. > Provide improved error message when YARN service is disabled > > > Key: YARN-7868 > URL: https://issues.apache.org/jira/browse/YARN-7868 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-7868.001.patch > > > Some YARN CLI command will throw verbose error message when YARN service is > disabled. The error message looks like this: > {code} > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: A message body reader for Java class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and Java type class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and MIME media type > application/octet-stream was not found > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: The registered message body readers compatible with the MIME media > type are: > application/octet-stream -> > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.RenderedImageProvider > */* -> > com.sun.jersey.core.impl.provider.entity.FormProvider > com.sun.jersey.core.impl.provider.entity.StringProvider > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General > com.sun.jersey.core.impl.provider.entity.ReaderProvider > com.sun.jersey.core.impl.provider.entity.DocumentProvider > com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader > com.sun.jersey.json.impl.provider.entity.JSONJAXBElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONArrayProvider$General > com.sun.jersey.json.impl.provider.entity.JSONObjectProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General > com.sun.jersey.core.impl.provider.entity.EntityHolderReader > com.sun.jersey.json.impl.provider.entity.JSONRootElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONListElementProvider$General > com.sun.jersey.json.impl.provider.entity.JacksonProviderProxy > com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider > 2018-01-31 16:24:46,415 ERROR client.ApiServiceClient: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350853#comment-16350853 ] Young Chen commented on YARN-7732: -- Added back compatibility with JobStory and JobStoryProducer interfaces for gridmix integration in [^YARN-7732.04.patch]. > Support Generic AM Simulator from SynthGenerator > > > Key: YARN-7732 > URL: https://issues.apache.org/jira/browse/YARN-7732 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Young Chen >Assignee: Young Chen >Priority: Minor > Attachments: YARN-7732-YARN-7798.01.patch, > YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, > YARN-7732.03.patch, YARN-7732.04.patch > > > Extract the MapReduce specific set-up in the SLSRunner into the > MRAMSimulator, and enable support for pluggable AMSimulators. > Previously, the AM set up in SLSRunner had the MRAMSimulator type hard coded, > for example startAMFromSynthGenerator() calls this: > > {code:java} > runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId, > jobStartTimeMS, jobFinishTimeMS, containerList, reservationId, > job.getDeadline(), getAMContainerResource(null)); > {code} > where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce" > The container set up was also only suitable for mapreduce: > > {code:java} > // From https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > > // map tasks > for (int i = 0; i < job.getNumberMaps(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add(new ContainerSimulator(containerResource, > containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map")); > } > // reduce tasks > for (int i = 0; i < job.getNumberReduces(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add( > new ContainerSimulator(containerResource, containerLifeTime, > hostname, DEFAULT_REDUCER_PRIORITY, "reduce")); > } > {code} > > In addition, the syn.json format supported only mapreduce (the parameters > were very specific: mtime, rtime, mtasks, rtasks, etc..). > This patch aims to introduce a new syn.json format that can describe generic > jobs, and the SLS setup required to support the synth generation of generic > jobs. > See syn_generic.json for an equivalent of the previous syn.json in the new > format. > Using the new generic format, we describe a StreamAMSimulator, which simulates a > long running streaming service that maintains N number of containers for the > lifetime of the AM. See syn_stream.json.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount
[ https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7626: Attachment: YARN-7626.006.patch > Allow regular expression matching in container-executor.cfg for devices and > named docker volumes mount > -- > > Key: YARN-7626 > URL: https://issues.apache.org/jira/browse/YARN-7626 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7626.001.patch, YARN-7626.002.patch, > YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, > YARN-7626.006.patch > > > Currently, when we configure some of the GPU device related fields (like ) in > container-executor.cfg, these fields are generated based on different driver > versions or GPU device names. We want to enable regular expression matching > so that users don't need to manually set up these fields when configuring > container-executor.cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7882) Server side proxy for UI2 log viewer
Eric Yang created YARN-7882: --- Summary: Server side proxy for UI2 log viewer Key: YARN-7882 URL: https://issues.apache.org/jira/browse/YARN-7882 Project: Hadoop YARN Issue Type: Bug Components: security, timelineserver, yarn-ui-v2 Affects Versions: 3.0.0 Reporter: Eric Yang When viewing container logs in UI2, the log files are directly fetched through timeline server 2. Hadoop in simple security mode does not have an authenticator to make sure the user is authorized to view the log. The general practice is to use Knox or another security proxy to authenticate the user and reverse proxy the request to the Hadoop UI to ensure the information does not leak to anonymous users. The current implementation of the UI2 log viewer uses ajax calls to timeline server 2. This could prevent Knox or reverse proxy software from working properly with the new design. It would be good to perform server side proxying to prevent the browser from side-stepping the authentication check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount
[ https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350873#comment-16350873 ] Zian Chen commented on YARN-7626: - Updated patch 006 per Miklos's suggestions. [~leftnoteasy], [~sunilg], could you please help review the latest patch? Thanks! > Allow regular expression matching in container-executor.cfg for devices and > named docker volumes mount > -- > > Key: YARN-7626 > URL: https://issues.apache.org/jira/browse/YARN-7626 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7626.001.patch, YARN-7626.002.patch, > YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, > YARN-7626.006.patch > > > Currently, when we configure some of the GPU device related fields (like ) in > container-executor.cfg, these fields are generated based on different driver > versions or GPU device names. We want to enable regular expression matching > so that users don't need to manually set up these fields when configuring > container-executor.cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
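container-executor itself is native C code, so the following Java snippet is only a language-neutral illustration of the idea under discussion: configuration entries flagged as regular expressions are matched against the requested device or volume path instead of being compared literally. The "regex:" prefix is an assumption made for this sketch, not necessarily the exact syntax used by the patch.
{code:java}
// Sketch: allow-list lookup that treats "regex:"-prefixed entries as patterns.
import java.util.List;
import java.util.regex.Pattern;

public class AllowedMountCheck {

  static boolean isAllowed(String requested, List<String> configuredEntries) {
    for (String entry : configuredEntries) {
      if (entry.startsWith("regex:")) {
        // Entry is a pattern, e.g. one that covers every GPU device node.
        if (Pattern.matches(entry.substring("regex:".length()), requested)) {
          return true;
        }
      } else if (entry.equals(requested)) {
        // Legacy behaviour: exact string match.
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> allowed = List.of("regex:/dev/nvidia[0-9]+", "/dev/nvidiactl");
    System.out.println(isAllowed("/dev/nvidia3", allowed));  // true
    System.out.println(isAllowed("/dev/sda", allowed));      // false
  }
}
{code}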
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350895#comment-16350895 ] Robert Kanter commented on YARN-7876: - +1 LGTM pending Jenkins Thanks for adding the directory to the unit test like we discussed offline. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350904#comment-16350904 ] Shane Kumpf commented on YARN-6456: --- Thanks [~miklos.szeg...@cloudera.com]. {quote}Are you referring to #3 as this? "Maybe the container directories could be outside the application directory." {quote} I was referring to #3 in the description. {quote}3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. {quote} > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7720) Race condition between second app attempt and UAM timeout when first attempt node is down
[ https://issues.apache.org/jira/browse/YARN-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-7720: --- Summary: Race condition between second app attempt and UAM timeout when first attempt node is down (was: [Federation] Race condition between second app attempt and UAM timeout when first attempt node is down) > Race condition between second app attempt and UAM timeout when first attempt > node is down > - > > Key: YARN-7720 > URL: https://issues.apache.org/jira/browse/YARN-7720 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > > In Federation, multiple attempts of an application share the same UAM in each > secondary sub-cluster. When first attempt fails, we reply on the fact that > secondary RM won't kill the existing UAM before the AM heartbeat timeout > (default at 10 min). When second attempt comes up in the home sub-cluster, it > will pick up the UAM token from Yarn Registry and resume the UAM heartbeat to > secondary RMs. > The default heartbeat timeout for NM and AM are both 10 mins. The problem is > that when the first attempt node goes down or out of connection, only after > 10 mins will the home RM mark the first attempt as failed, and then schedule > the 2nd attempt in some other node. By then the UAMs in secondaries are > already timing out, and they might not survive until the second attempt comes > up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350938#comment-16350938 ] Chandni Singh commented on YARN-7572: - Do we want something like this? Due to space constraints the service row is not aligned {code:java} Service: URI Name ID Artifact ID Launch TimeNum Containers State Lifetime app-1 application_1503358878042_00113600 Components: Name Artifact ID Launch CommandNum ContainersState simple sleep 36002 FLEXING master sleep 36001 FLEXING worker sleep 36005 FLEXING {code} > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350942#comment-16350942 ] Miklos Szegedi commented on YARN-6456: -- Sure, I would keep the description just for context but let this Jira cover only 3. above. > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350954#comment-16350954 ] Giovanni Matteo Fumarola commented on YARN-7881: Thanks [~GergelyNovak] for the patch. Can you please declare the function in {{RMWebServiceProtocol}} and override it in {{RMWebServices}}? > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7881.001.patch > > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
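A rough, hypothetical shape of what that could look like is sketched below. The method name, URL path and the {{LogAggregationStatusInfo}} DAO are placeholders rather than the committed API; the JAX-RS annotations are the standard ones the RM web services already rely on.
{code:java}
// Hypothetical sketch only. In the real change the method would be declared
// in RMWebServiceProtocol and overridden in RMWebServices; it is shown on a
// stand-alone class here so the snippet compiles by itself.
import java.util.ArrayList;
import java.util.List;

import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;

// Placeholder DAO: one entry per node that ran containers for the app.
class LogAggregationStatusInfo {
  List<String> perNodeStatus = new ArrayList<>();
}

@Path("/ws/v1/cluster")
class LogAggregationStatusEndpointSketch {
  @GET
  @Path("/apps/{appid}/logaggregationstatus")
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public LogAggregationStatusInfo getLogAggregationStatus(
      @Context HttpServletRequest hsr, @PathParam("appid") String appId) {
    // The real implementation would look up the RMApp and copy its per-node
    // log aggregation reports into the DAO; an empty placeholder is returned.
    return new LogAggregationStatusInfo();
  }
}
{code}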
[jira] [Comment Edited] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350938#comment-16350938 ] Chandni Singh edited comment on YARN-7572 at 2/2/18 9:37 PM: - Do we want something like this? Due to space constraints the service row is not aligned {code:java}
Service:
URI   Name    ID                                Artifact ID   Launch Time   Num Containers   State   Lifetime
      app-1   application_1503358878042_0011                                                         3600

Components:
Name     Artifact ID   Launch Command   Num Containers   State
simple                 sleep 3600       2                FLEXING
master                 sleep 3600       1                FLEXING
worker                 sleep 3600       5                FLEXING
{code} was (Author: csingh): Do we want something like this? Due to space constraints the service row is not aligned {code:java}
Service:
URI   Name    ID                                Artifact ID   Launch Time   Num Containers   State   Lifetime
      app-1   application_1503358878042_0011                                                         3600

Components:
Name     Artifact ID   Launch Command   Num Containers   State
simple                 sleep 3600       2                FLEXING
master                 sleep 3600       1                FLEXING
worker                 sleep 3600       5                FLEXING
{code} > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350974#comment-16350974 ] Vinod Kumar Vavilapalli commented on YARN-7572: --- This is a generally better presentation than the first one. After running this command, what command do I run to get status per-component inside a specific service? > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350981#comment-16350981 ] Eric Yang commented on YARN-7879: - [~jlowe] Thank you for the reply. We are allowing file cache to be mounted in docker container as read only in YARN-7815. The risk of exposing filename is marginally small, but I like to confirm that is not a problem even the filename contains sensitive information exposed in docker containers. Is it possible to use 750 and group is owned by NM's group? Can cache directory contain subdirectories to prevent this arrangement from working? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. 
I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350994#comment-16350994 ] Eric Yang commented on YARN-7572: - Num Containers, State and Lifetime on the Service line seem to be missing their values. How about formatted JSON? Most modern software, such as the AWS and MongoDB tooling, uses formatted JSON output by default. This reduces the chance of producing misaligned text output that looks great in some terminals but is unreadable in others. If we still want to go with preformatted text, I would suggest the following: # Print "Service: [service-name]" as the first line to avoid app name misalignment when showing the status of multiple apps. # Put the ID column in front because IDs are uniform in length. # Remove URI and Artifact ID; that information exists in the service spec, and we can add a spec-specific command. > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
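For the formatted-JSON option, a minimal sketch of what the client could do is shown below, assuming Jackson (which the service client already uses for the spec); the {{Map}} stands in for the real Service record object.
{code:java}
// Sketch of pretty-printed JSON status output; the Map is a stand-in for
// the actual Service record.
import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class PrettyJsonStatusSketch {
  public static void main(String[] args) throws Exception {
    Map<String, Object> status = new LinkedHashMap<>();
    status.put("name", "app-1");
    status.put("id", "application_1503358878042_0011");
    status.put("lifetime", 3600);

    ObjectMapper mapper = new ObjectMapper();
    System.out.println(
        mapper.writerWithDefaultPrettyPrinter().writeValueAsString(status));
  }
}
{code}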
[jira] [Commented] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351009#comment-16351009 ] Arun Suresh commented on YARN-7819: --- Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - let me know if the latest patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
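For context, the hook in question roughly follows the pattern below. This is a stand-alone analog, not the patch: the real method is an @Override in FairScheduler and takes the scheduler-internal SchedulerApplicationAttempt / SchedulingRequest / SchedulerNode types rather than the placeholders used here.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-alone analog of ResourceScheduler#attemptAllocationOnNode; parameter
// types are placeholders, and the body only illustrates the locking pattern.
public class AttemptAllocationOnNodeSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();

  public boolean attemptAllocationOnNode(Object appAttempt,
      Object schedulingRequest, Object schedulerNode) {
    writeLock.lock();
    try {
      // Re-check that the node chosen by the PlacementProcessor still has
      // headroom for the request and, if so, commit the allocation.
      // Returning false asks the processor to place the request elsewhere.
      boolean nodeStillFits = false; // placeholder decision
      return nodeStillFits;
    } finally {
      writeLock.unlock();
    }
  }
}
{code}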
[jira] [Updated] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7819: -- Attachment: YARN-7819.004.patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351009#comment-16351009 ] Arun Suresh edited comment on YARN-7819 at 2/2/18 10:23 PM: Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - do let me know if you are ok with the latest patch. was (Author: asuresh): Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - let me know if the latest patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6456: -- Summary: Allow administrators to set a single ContainerRuntime for all containers (was: Isolation of Docker containers In LinuxContainerExecutor) > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6456: -- Description: With LCE, there are multiple ContainerRuntimes available for handling different types of containers; default, docker, java sandbox. Admins should have the ability to override the user decision and set a single global ContainerRuntime to be used for all containers. Original Description: {quote}One reason to use Docker containers is to be able to isolate different workloads, even, if they run as the same user. I have noticed some issues in the current design: 1. DockerLinuxContainerRuntime mounts containerLocalDirs {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see and modify the files of another container. I think the application file cache directory should be enough for the container to run in most of the cases. 2. The whole cgroups directory is mounted. Would the container directory be enough? 3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. {quote} was: One reason to use Docker containers is to be able to isolate different workloads, even, if they run as the same user. I have noticed some issues in the current design: 1. DockerLinuxContainerRuntime mounts containerLocalDirs {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see and modify the files of another container. I think the application file cache directory should be enough for the container to run in most of the cases. 2. The whole cgroups directory is mounted. Would the container directory be enough? 3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
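As a purely hypothetical illustration of the proposed admin override: a single NM-level setting that, when present, wins over whatever runtime the user asked for. The property name below is invented for the sketch, not an existing setting.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of an admin-enforced runtime: if the (invented)
// property is set, it wins over whatever runtime the user requested.
public class AdminRuntimeOverrideSketch {
  static String pickRuntime(Configuration conf, String userRequested) {
    String adminForced =
        conf.getTrimmed("yarn.nodemanager.runtime.linux.type", "");
    // Empty means no override, i.e. today's behavior of honoring the user.
    return adminForced.isEmpty() ? userRequested : adminForced;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.nodemanager.runtime.linux.type", "docker");
    System.out.println(pickRuntime(conf, "default")); // prints "docker"
  }
}
{code}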
[jira] [Updated] (YARN-7778) Merging of placement constraints defined at different levels
[ https://issues.apache.org/jira/browse/YARN-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-7778: - Summary: Merging of placement constraints defined at different levels (was: Merging of constraints defined at different levels) > Merging of placement constraints defined at different levels > > > Key: YARN-7778 > URL: https://issues.apache.org/jira/browse/YARN-7778 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Weiwei Yang >Priority: Major > Attachments: Merge Constraints Solution.pdf, > YARN-7778-YARN-7812.001.patch, YARN-7778-YARN-7812.002.patch, > YARN-7778.003.patch, YARN-7778.004.patch > > > When we have multiple constraints defined for a given set of allocation tags > at different levels (i.e., at the cluster, the application or the scheduling > request level), we need to merge those constraints. > Defining constraint levels as cluster > application > scheduling request, > constraints defined at lower levels should only be more restrictive than > those of higher levels. Otherwise the allocation should fail. > For example, if there is an application level constraint that allows no more > than 5 HBase containers per rack, a scheduling request can further restrict > that to 3 containers per rack but not to 7 containers per rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
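To make the HBase example concrete, here is a sketch using the PlacementConstraints DSL from this feature branch; treat the exact factory method names as assumptions rather than the finalized API.
{code:java}
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.RACK;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.maxCardinality;

public class ConstraintMergeExample {
  public static void main(String[] args) {
    // Application-level constraint: at most 5 "hbase" containers per rack.
    PlacementConstraint appLevel = build(maxCardinality(RACK, 5, "hbase"));
    // Request-level constraint: at most 3 per rack. This is more restrictive
    // than the application level, so merging is allowed.
    PlacementConstraint requestLevel = build(maxCardinality(RACK, 3, "hbase"));
    // A request-level limit of 7 would be less restrictive than the
    // application level and, per this proposal, the allocation should fail.
    System.out.println(appLevel + " / " + requestLevel);
  }
}
{code}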
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351028#comment-16351028 ] Vinod Kumar Vavilapalli commented on YARN-7572: --- [~eyang], see my first comment. I'm proposing *both* a human-readable format as well as json (through a --json option) > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351035#comment-16351035 ] Chandni Singh commented on YARN-7572: - [~vinodkv] I don't think we currently have any command that spits out status of a specific component. The only command supported for a component is to flex. A way for users to do it will be to spit out the json and run some sort of json filter tool (like jq) Are you proposing that we support spitting out component status as well? > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
Xuan Gong created YARN-7883: --- Summary: Make HAR tool support IndexedLogAggregtionController Key: YARN-7883 URL: https://issues.apache.org/jira/browse/YARN-7883 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a tool to combine aggregated logs into HAR files which currently only work for TFileLogAggregationFileController. We should make it support IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
[ https://issues.apache.org/jira/browse/YARN-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-7883. - Resolution: Duplicate > Make HAR tool support IndexedLogAggregtionController > > > Key: YARN-7883 > URL: https://issues.apache.org/jira/browse/YARN-7883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > > In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a > tool to combine aggregated logs into HAR files which currently only work for > TFileLogAggregationFileController. We should make it support > IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
[ https://issues.apache.org/jira/browse/YARN-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351050#comment-16351050 ] Xuan Gong commented on YARN-7883: - Created a MapReduce ticket to track the work progress. Close this one as duplicate > Make HAR tool support IndexedLogAggregtionController > > > Key: YARN-7883 > URL: https://issues.apache.org/jira/browse/YARN-7883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > > In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a > tool to combine aggregated logs into HAR files which currently only work for > TFileLogAggregationFileController. We should make it support > IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351059#comment-16351059 ] Jason Lowe commented on YARN-7879: -- bq. We are allowing file cache to be mounted in docker container as read only in YARN-7815. If we are mounting a file cache directory into a container then I assume the user running in the Docker container should have the right to read every file under that file cache directory. I do not see the security concern there if that's the case, but maybe I'm missing a key scenario that would be problematic? bq. The risk of exposing filename is marginally small, but I like to confirm that is not a problem even the filename contains sensitive information exposed in docker containers. The only way I can see it being an issue specific to Docker is if somehow something in the Docker container is not trusted that runs as a different user within the Docker container (but still in the hadoop group or equivalent for the Docker container) pokes around for the filename. That thing would have to probe for filenames since there's no read access on the filecache top-level directory, only group-execute permissions. However I would argue that if the user is running untrusted things within the Docker container it's simply much easier to access the sensitive files _as the user_. Then there would be access to the file's contents in addition to the filename. bq. Can cache directory contain subdirectories to prevent this arrangement from working? Yes, if the cache directory manager is being used there can be subdirectories to limit the total number of entries in a single directory. In those cases the intermediate directories are setup with similar 0755 permissions so the NM user can access them easily, see ContainerLocalizer#createParentDirs. This patch is restoring the usercache permissions behavior from before YARN-2185 went in. YARN-2185 wasn't about addressing directory permissions, but it had a sidecar permission change that broke the ability for the NM to reuse non-public localized resources. Therefore I'd like to see this go in so we aren't regressing functionality, and if there are concerns/improvements for how usercache permissions are handled we should address those in a separate JIRA. Either that or we revert YARN-2185, remove the unrelated permissions change, recommit it, and still end up addressing any usercache permissions concerns in a separate JIRA. ;-) > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. 
> {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being re
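A small sketch of the directory layout described above — not the actual ContainerLocalizer code; the paths are made up, and only the 0755 on the intermediate directories comes from the discussion.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: intermediate filecache directories are created traversable
// (0755) so the NM user can reach the leaf entries created by cache-dir
// management; the paths below are invented for the example.
public class FilecacheDirsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    Path filecache = new Path(
        "/tmp/nm-local-dir/usercache/someuser/appcache/app_0001/filecache");
    Path intermediate = new Path(filecache, "00");

    // The NM user only needs execute/traverse access on the intermediate
    // levels; the leaf resource directories stay private to the app user.
    localFs.mkdirs(intermediate, new FsPermission((short) 0755));
  }
}
{code}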
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351061#comment-16351061 ] Miklos Szegedi commented on YARN-7879: -- Thank you, [~shaneku...@gmail.com] for the report, [~jlowe] for the patch. I checked and the change seems good to me. Since this is a regression, [~eyang], would you mind if I commit it and we continue the discussion here or on another patch? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7884) Race condition in registering YARN service in ZooKeeper
Eric Yang created YARN-7884: --- Summary: Race condition in registering YARN service in ZooKeeper Key: YARN-7884 URL: https://issues.apache.org/jira/browse/YARN-7884 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Eric Yang In Kerberos enabled cluster, there seems to be a race condition for registering YARN service. Yarn-service znode creation seems to happen after AM started and reporting back to update components information. For some reason, Yarnservice znode should have access to create the znode, but reported NoAuth. {code} 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls: [1,s{'world,'anyone} , 31,s{'sasl,'yarn} , 31,s{'sasl,'jhs} , 31,s{'sasl,'hdfs-demo} , 31,s{'sasl,'rm} , 31,s{'sasl,'hive} ] 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs [31,s{'sasl,'hbase} , 31,s{'sasl,'hbase} ] 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS) 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_01, abc into registry 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt. 
2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances. 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event. 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [ 0x01: 'world,'anyone 0x1f: 'sasl,'yarn 0x1f: 'sasl,'jhs 0x1f: 'sasl,'hdfs-demo 0x1f: 'sasl,'rm 0x1f: 'sasl,'hive 0x1f: 'sasl,'hbase 0x1f: 'sasl,'hbase ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412) at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637) at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOper
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351065#comment-16351065 ] Shane Kumpf commented on YARN-7879: --- Thanks for the patch [~jlowe] - I've tested the patch and it fixes the problem I reported. +1 (non-binding) from me. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351083#comment-16351083 ] Eric Yang commented on YARN-7879: - {quote} The only way I can see it being an issue specific to Docker is if somehow something in the Docker container is not trusted that runs as a different user within the Docker container {quote} [~jlowe] Thanks for the reassurance. I think YARN-7516 and YARN-7221 combination will eliminate the risk to make sure only authorized sudoers can impersonate in docker containers to remove this loophole. [~miklos.szeg...@cloudera.com] Yes, I think this change is fine, and there are possible solutions to eliminate the concerns. Thanks > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 
2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org