[jira] [Updated] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7757: -- Attachment: YARN-7757-YARN-3409.006.patch > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang commented on YARN-7757: --- Per offline discussion with [~Naganarasimha], the uploaded v6 patch mainly introduces another abstract class layer, {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential type mismatch when initializing providers by reflection. The other improvements [~Naganarasimha] mentioned we agreed to postpone to individual JIRAs so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the providers, either for labels or for attributes. I think we need to explicitly support both. This will be done in YARN-7871. bq. multi scripts for different types of attributes Our configuration doesn't allow configuring multiple scripts now; it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. bq. Comments over NodeManager and NodeStatusUpdate Addressed in the v6 patch. bq. verifyConfiguredScript seems to be out of place Right now verifyConfiguredScript is only used by script-based providers, so let's keep it for now. If we later see it can be reused elsewhere, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initialized with a value of -1 and gets overridden by the particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not refactoring work, we agreed to open another lower-priority JIRA to track it. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of in YARN-7871 once we decide the final format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
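For readers following the refactor discussion above, here is a minimal sketch of the kind of provider hierarchy being described: a generic base service with typed intermediate classes so reflection-based initialization can verify the provider kind. The class and method names below are illustrative assumptions, not the contents of the attached patches.
{code}
import java.util.Collections;
import java.util.Set;

import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.api.records.NodeLabel;

// Each top-level class would live in its own file; they are shown together
// here only for brevity. All names are hypothetical.
abstract class NodeDescriptorsProvider<T> extends AbstractService {
  private Set<T> descriptors = Collections.emptySet();

  protected NodeDescriptorsProvider(String name) {
    super(name);
  }

  // Concrete providers (config based, script based, ...) publish their result here.
  public synchronized void setDescriptors(Set<T> newDescriptors) {
    this.descriptors = newDescriptors;
  }

  public synchronized Set<T> getDescriptors() {
    return descriptors;
  }
}

// Distinct intermediate types let the NodeManager check, when instantiating a
// configured class by reflection, that a labels provider was not configured
// where an attributes provider is expected (and vice versa).
abstract class NodeLabelsProvider extends NodeDescriptorsProvider<NodeLabel> {
  protected NodeLabelsProvider(String name) {
    super(name);
  }
}

// NodeAttribute is the record type being added on the YARN-3409 branch.
abstract class NodeAttributesProvider
    extends NodeDescriptorsProvider<org.apache.hadoop.yarn.api.records.NodeAttribute> {
  protected NodeAttributesProvider(String name) {
    super(name);
  }
}
{code}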
[jira] [Comment Edited] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang edited comment on YARN-7757 at 2/2/18 8:46 AM: --- Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. was (Author: cheersyang): Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. 
output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --
[jira] [Updated] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7757: -- Attachment: nodeLabelsProvider_refactor_v3.pdf > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can reuse these interface/abstract classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349983#comment-16349983 ] Weiwei Yang edited comment on YARN-7757 at 2/2/18 8:50 AM: --- Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection, hierarchy see [^nodeLabelsProvider_refactor_v3.pdf] . Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. was (Author: cheersyang): Per offline discussion with [~Naganarasimha], uploaded v6 patch majorly introduced another abstract class layer {{NodeLabelsProvider}} and {{NodeAttributesProvider}}, to avoid a potential typing mis-match while initializing by reflection. Other improvements [~Naganarasimha] mentioned we agree to postpone to individual jiras so we can get this blocker done first. Some details for reference: bq. It looks like it can create only one of the provider, either for labels or for attributes. I think we need to explicitly support for both. This will be done in YARN-7871 bq. multi scripts for different types of attributes Our configuration doesn't allow to configure multiple scripts now, it will fail on script verification. Right now we do not see a need to support this, but we can revisit if necessary. We will also make sure this is documented properly in YARN-7865. bq. Comments over NodeManager and NodeStatusUpdate Addressed in v6 patch. bq. verifyConfiguredScript seems to be out of place Right now the verifyConfiguredScript is only used by scripted based providers, lets keep it for now. If further we see it can be reused in some place else, we can pull it out. bq. serviceStart needs to capture that taskInterval needs to be set before the service is started It is initiated with -1 value, and gets override by particular provider. bq. Lets use scheduledexecutorservice instead of timer task ... 
We have agreed on this, but since this is not a work of refactoring, we agreed to open another lower priority JIRA to track. bq. output format of ScriptBasedNodeAttributesProvider This will need to be taken care of by YARN-7871 once we decided the finalized format of the attributes and conventions. This also depends on YARN-7856. Hope this addresses everything so far. Thanks. > Refactor NodeLabelsProvider to be more generic and reusable for node > attributes providers > - > > Key: YARN-7757 > URL: https://issues.apache.org/jira/browse/YARN-7757 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-7757-YARN-3409.001.patch, > YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, > YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, > YARN-7757-YARN-3409.006.patch, > nodeLabelsProvider_refactor_class_hierarchy.pdf, > nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf > > > Propose to do refactor on {{NodeLabelsProvider}}, > {{AbstractNodeLabelsProvider}} to be more generic, so node attributes > providers can r
[jira] [Commented] (YARN-7841) Cleanup AllocationFileLoaderService's reloadAllocations method
[ https://issues.apache.org/jira/browse/YARN-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350170#comment-16350170 ] Gergo Repas commented on YARN-7841: --- +1 (non-binding) Since this is a big piece of refactoring, I think a branch-2 version of the patch would be also good to have. > Cleanup AllocationFileLoaderService's reloadAllocations method > -- > > Key: YARN-7841 > URL: https://issues.apache.org/jira/browse/YARN-7841 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-7841-001.patch, YARN-7841-002.patch > > > AllocationFileLoaderService's reloadAllocations method is too complex. > Please refactor / cleanup this method to be more simple to understand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350189#comment-16350189 ] Gergo Repas commented on YARN-7876: --- Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. > Workaround ZipInputStream limitation for YARN-2185 > -- > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Major > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350189#comment-16350189 ] Gergo Repas edited comment on YARN-7876 at 2/2/18 11:30 AM: Thanks [~miklos.szeg...@cloudera.com] for the patch. Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. was (Author: grepas): Nit: there is already a BUFFER_SIZE constant used for the copyBytes calls, it would be better to use that constant in the new section in RunJar.java. > Workaround ZipInputStream limitation for YARN-2185 > -- > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Major > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
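As context for the fix being discussed (YARN-2185 unpacks the jar from a stream, and the zip reader may stop before end-of-file, leaving the tee'd side copy truncated), here is a minimal sketch of the "read out the remainder" idea. The helper and the BUFFER_SIZE constant are assumptions for illustration, not the actual RunJar change.
{code}
import java.io.IOException;
import java.io.InputStream;

public final class StreamDrainSketch {
  // Assumed to mirror the existing buffer-size constant mentioned in the review.
  private static final int BUFFER_SIZE = 8192;

  private StreamDrainSketch() {
  }

  /**
   * Read and discard whatever is left on the stream. When a jar is unpacked
   * from a ZipInputStream whose input is tee'd to a side copy, draining the
   * stream after extraction ensures the tee sees (and copies) every byte up
   * to end-of-file.
   */
  public static void drain(InputStream in) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    while (in.read(buffer) != -1) {
      // Bytes are intentionally discarded; reading them is only needed so the
      // tee'd copy of the input is complete.
    }
  }
}
{code}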
[jira] [Commented] (YARN-7841) Cleanup AllocationFileLoaderService's reloadAllocations method
[ https://issues.apache.org/jira/browse/YARN-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350171#comment-16350171 ] Szilard Nemeth commented on YARN-7841: -- [~grepas] Thanks for the review! > Cleanup AllocationFileLoaderService's reloadAllocations method > -- > > Key: YARN-7841 > URL: https://issues.apache.org/jira/browse/YARN-7841 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-7841-001.patch, YARN-7841-002.patch > > > AllocationFileLoaderService's reloadAllocations method is too complex. > Please refactor / cleanup this method to be more simple to understand. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers
[ https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350141#comment-16350141 ] genericqa commented on YARN-7757: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 49s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 42s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 3s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 28s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3409 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 269 unchanged - 20 fixed = 271 total (was 289) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7757 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908943/YARN-7757-YARN-3409.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xm
[jira] [Created] (YARN-7879) NM user is unable to access the application filecache due to permissions
Shane Kumpf created YARN-7879: - Summary: NM user is unable to access the application filecache due to permissions Key: YARN-7879 URL: https://issues.apache.org/jira/browse/YARN-7879 Project: Hadoop YARN Issue Type: Bug Reporter: Shane Kumpf I noticed the following log entries where localization was being retried on several MR AM files. {code} 2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again 2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again {code} The cluster is configured to use LCE and {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has a umask of {{0002}}. The cluster is configured with {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, produces the same results. {code} [hadoopuser@y7001 ~]$ umask 0002 [hadoopuser@y7001 ~]$ id uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) {code} The cause of the log entries was tracked down to a simple !file.exists call in {{LocalResourcesTrackerImpl#isResourcePresent}}. {code} public boolean isResourcePresent(LocalizedResource rsrc) { boolean ret = true; if (rsrc.getState() == ResourceState.LOCALIZED) { File file = new File(rsrc.getLocalPath().toUri().getRawPath().toString()); if (!file.exists()) { ret = false; } else if (dirsHandler != null) { ret = checkLocalResource(rsrc); } } return ret; } {code} The Resources Tracker runs as the NM user, in this case {{yarn}}. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user with the user's primary group and 700 perms, which makes them unreadable by the {{yarn}} user. {code} [root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache total 0 drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 {code} I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
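To make the failure mode above concrete: {{File.exists()}} is a stat call, and stat needs search (execute) permission on every ancestor directory, so a 700 directory under the filecache hides its contents from the {{yarn}} user even though the files are on disk. The small check below illustrates this; the path and user names come from the report, and the class itself is only an illustration.
{code}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FilecacheVisibilityCheck {
  public static void main(String[] args) {
    // Path taken from the log entries above; run this as the NM user (yarn).
    Path jobJar = Paths.get(
        "/hadoop-yarn/usercache/hadoopuser/appcache"
        + "/application_1517539453610_0001/filecache/11/job.jar");

    // Returns false for the yarn user when filecache/11 is 700 and owned by
    // hadoopuser, even though the file exists on disk.
    System.out.println("exists() = " + new File(jobJar.toString()).exists());

    // Walking up the path shows which ancestor blocks the traversal.
    for (Path p = jobJar.getParent(); p != null; p = p.getParent()) {
      System.out.println(p + " searchable=" + Files.isExecutable(p));
    }
  }
}
{code}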
[jira] [Updated] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergo Repas updated YARN-5028: -- Attachment: YARN-5028.001.patch > RMStateStore should trim down app state for completed applications > -- > > Key: YARN-5028 > URL: https://issues.apache.org/jira/browse/YARN-5028 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-5028.000.patch, YARN-5028.001.patch > > > RMStateStore stores enough information to recover applications in case of a > restart. The store also retains this information for completed applications > to serve their status to REST, WebUI, Java and CLI clients. We don't need all > the information we store today to serve application status; for instance, we > don't need the {{ApplicationSubmissionContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
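As a rough illustration of the trimming the description proposes (dropping state that is only needed to launch the application, not to serve its status), here is a sketch. The helper and the particular field cleared are assumptions for illustration, not what the attached patches do.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

// Hypothetical helper: before persisting the final state of a completed
// application, drop bulky launch-time data that status queries never need.
public final class AppStateTrimSketch {
  private AppStateTrimSketch() {
  }

  public static ApplicationSubmissionContext trimForCompletedApp(
      ApplicationSubmissionContext ctx) {
    // The AM container launch spec (local resources, environment, commands,
    // tokens) is only used to launch the AM; once the application has
    // completed it contributes nothing to REST/UI/CLI status responses.
    ctx.setAMContainerSpec(null);
    return ctx;
  }
}
{code}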
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350340#comment-16350340 ] Shane Kumpf commented on YARN-6456: --- [~miklos.szeg...@cloudera.com] - I believe YARN-7815 will address #1 and YARN-7814 removes the automatic mounting for #2. Should we re-purpose this issue to focus on #3 and make it a subtask of YARN-3611? Thanks. > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350349#comment-16350349 ] genericqa commented on YARN-5028: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 5m 11s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-5028 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908982/YARN-5028.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19577/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RMStateStore should trim down app state for completed applications > -- > > Key: YARN-5028 > URL: https://issues.apache.org/jira/browse/YARN-5028 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-5028.000.patch, YARN-5028.001.patch > > > RMStateStore stores enough information to recover applications in case of a > restart. The store also retains this information for completed applications > to serve their status to REST, WebUI, Java and CLI clients. We don't need all > the information we store today to serve application status; for instance, we > don't need the {{ApplicationSubmissionContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350353#comment-16350353 ] Jason Lowe commented on YARN-7677: -- I realize now that the theoretical example cannot work in practice. In order for there to be a "hook" variable for the user to leverage, the variable would need to have escaped variable expansion by the shell when it was originally set. The variable would need to be set in the NM's environment like, JAVA_HOME="/some/node/path/\$JDKVER". While that could be a valid path for the user when it is expanded in the container launch script, it is not a valid setting for JAVA_HOME in the nodemanager itself. NM whitelist variables are going to be variables coming from a shell environment and not from XML property settings, so it's highly unlikely they will retain unexpanded variable references. In short, I'm cool with simply placing the NM whitelist variables first and simplifying YARN-5714 to list the variables in the launch script in the order they appear in their corresponding configuration properties. My apologies for the detour. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
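A minimal sketch of the whitelist behavior the discussion converges on: a whitelisted variable such as HADOOP_CONF_DIR is inherited from the NodeManager only when the user (or the Docker image) has not already supplied a value. The helper below is illustrative, not the actual ContainerLaunch code.
{code}
import java.util.HashMap;
import java.util.Map;

public final class WhitelistEnvSketch {
  private WhitelistEnvSketch() {
  }

  /** Inject the NM's value only if the container environment lacks the variable. */
  public static void addWhitelistedVar(Map<String, String> containerEnv,
      String name, String nmValue) {
    containerEnv.putIfAbsent(name, nmValue);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("HADOOP_CONF_DIR", "/etc/hadoop-in-image/conf"); // set by the image
    addWhitelistedVar(env, "HADOOP_CONF_DIR", "/etc/hadoop/conf");
    System.out.println(env.get("HADOOP_CONF_DIR")); // the image's value is preserved
  }
}
{code}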
[jira] [Commented] (YARN-5714) ContainerExecutor does not order environment map
[ https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350359#comment-16350359 ] Jason Lowe commented on YARN-5714: -- On second thought, it's extremely unlikely that NM whitelist variables could reference user variables. Details are in [this comment|https://issues.apache.org/jira/browse/YARN-7677?focusedCommentId=16350353&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16350353] on YARN-7667. I think we'll be fine if we make sure the NM whitelist inherited variables appear first in the launch script then followed by the user's variables in the order they are specified in the container launch context. YARN-7667 should be taking care of the NM whitelist variables, so this JIRA can tackle ordering the user's variables. > ContainerExecutor does not order environment map > > > Key: YARN-5714 > URL: https://issues.apache.org/jira/browse/YARN-5714 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1 > Environment: all (linux and windows alike) >Reporter: Remi Catherinot >Assignee: Remi Catherinot >Priority: Trivial > Labels: oct16-medium > Attachments: YARN-5714.001.patch, YARN-5714.002.patch, > YARN-5714.003.patch, YARN-5714.004.patch, YARN-5714.005.patch, > YARN-5714.006.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > when dumping the launch container script, environment variables are dumped > based on the order internally used by the map implementation (hash based). It > does not take into consideration that some env varibales may refer each > other, and so that some env variables must be declared before those > referencing them. > In my case, i ended up having LD_LIBRARY_PATH which was depending on > HADOOP_COMMON_HOME being dumped before HADOOP_COMMON_HOME. Thus it had a > wrong value and so native libraries weren't loaded. jobs were running but not > at their best efficiency. This is just a use case falling into that bug, but > i'm sure others may happen as well. > I already have a patch running in my production environment, i just estimate > to 5 days for packaging the patch in the right fashion for JIRA + try my best > to add tests. > Note : the patch is not OS aware with a default empty implementation. I will > only implement the unix version on a 1st release. I'm not used to windows env > variables syntax so it will take me more time/research for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
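A minimal sketch of the ordering agreed on above: NM-provided variables are written first, then the user's variables in the order the client specified them, so a later variable can safely reference an earlier one (the LD_LIBRARY_PATH / HADOOP_COMMON_HOME case from the description). This illustrates the idea only; it is not the patch.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public final class LaunchScriptEnvOrderingSketch {
  private LaunchScriptEnvOrderingSketch() {
  }

  /**
   * Merge the two maps into a deterministic, insertion-ordered view. Both
   * inputs are assumed to be LinkedHashMaps so their own ordering is kept;
   * user-supplied values take precedence over NM defaults.
   */
  public static Map<String, String> orderedEnv(Map<String, String> nmEnv,
      Map<String, String> userEnv) {
    Map<String, String> ordered = new LinkedHashMap<>(nmEnv);
    ordered.putAll(userEnv);
    return ordered;
  }

  /** Writing the launch script then simply iterates in order. */
  public static String toExports(Map<String, String> ordered) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : ordered.entrySet()) {
      sb.append("export ").append(e.getKey())
        .append("=\"").append(e.getValue()).append("\"\n");
    }
    return sb.toString();
  }
}
{code}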
[jira] [Updated] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7838: -- Description: Extending DS placement spec syntax to support AND/OR constraints, something like {code} // simple -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) // nested -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) {code} was: Extending DS placement spec syntax to support AND/OR constraints, something like {code} -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) {code} > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7838: -- Attachment: YARN-7838.prelim.patch > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
Jiandan Yang created YARN-7880: --- Summary: FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls Key: YARN-7880 URL: https://issues.apache.org/jira/browse/YARN-7880 Project: Hadoop YARN Issue Type: Bug Reporter: Jiandan Yang 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-7839: - Description: Currently, the Algorithm assigns a node to a request purely based on if the constraints are met. It is later in the scheduling phase that the Queue capacity and Node capacity are checked. If the request cannot be placed because of unavailable Queue/Node capacity, the request is retried by the Algorithm. For clusters that are running at high utilization, we can reduce the retries if we perform the Node capacity check in the Algorithm as well. The Queue capacity check and the other user limit checks can still be handled by the scheduler (since queues and other limits are tied to the scheduler, and not scheduler agnostic) was: Currently, the Algorithm assigns a node to a requests purely based on if the constraints are met. It is later in the scheduling phase that the Queue capacity and Node capacity are checked. If the request cannot be placed because of unavailable Queue/Node capacity, the request is retried by the Algorithm. For clusters that are running at high utilization, we can reduce the retries if we perform the Node capacity check in the Algorithm as well. The Queue capacity check and the other user limit checks can still be handled by the scheduler (since queues and other limits are tied to the scheduler, and not scheduler agnostic) > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Priority: Major > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
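A minimal sketch of the extra check being proposed: the placement algorithm skips a candidate node when the request does not fit in its unallocated capacity, instead of discovering that later in the scheduler and retrying. The types referenced are existing YARN classes, but this helper and how it would be wired into the algorithm are assumptions.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class NodeCapacityCheckSketch {
  private NodeCapacityCheckSketch() {
  }

  /** True if the requested resource fits in what the node still has unallocated. */
  public static boolean fits(Resource requested, SchedulerNode node) {
    return Resources.fitsIn(requested, node.getUnallocatedResource());
  }
}
{code}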
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350368#comment-16350368 ] Billie Rinaldi commented on YARN-7677: -- That sounds like a good approach, NM vars followed by preserving the order of the user variables. I'd prefer if the NM vars included all the ones defined by the NM (see ContainerLaunch.sanitizeEnv), not just the whitelist vars. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
[ https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-7880: Description: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} was: 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls > --- > > Key: YARN-7880 > URL: https://issues.apache.org/jira/browse/YARN-7880 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jiandan Yang >Priority: Major > > {code} > 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: > container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED > to RUNNING > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350375#comment-16350375 ] Shane Kumpf commented on YARN-7815: --- The localization issue appears to be unrelated. I see the same without the patch. I've opened YARN-7879 to track that issue. Doing the final testing now for this patch and will have it posted shortly. > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
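A minimal sketch of what the change amounts to when the runtime assembles its bind mounts: the filecache directories get a read-only flag, so the resulting docker invocation carries "-v <dir>:<dir>:ro" instead of the default read-write bind. The method below is illustrative; it is not the actual DockerLinuxContainerRuntime code.
{code}
import java.util.ArrayList;
import java.util.List;

public final class ReadOnlyMountSketch {
  private ReadOnlyMountSketch() {
  }

  /** Build hypothetical docker -v arguments with the :ro suffix for each filecache dir. */
  public static List<String> filecacheMountArgs(List<String> filecacheDirs) {
    List<String> args = new ArrayList<>();
    for (String dir : filecacheDirs) {
      args.add("-v");
      args.add(dir + ":" + dir + ":ro");
    }
    return args;
  }
}
{code}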
[jira] [Updated] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
[ https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-7880: Description: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} was: {code} 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) {code} > FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls > --- > > Key: YARN-7880 > URL: https://issues.apache.org/jira/browse/YARN-7880 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jiandan Yang >Priority: Major > > {code} > 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: > container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED > to RUNNING > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
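The trace alone does not show which reference is null, but a defensive check of the following shape is the usual remedy; the assumption that the committed allocation refers to a node that has disappeared (or was never fully registered) under SLS is exactly that, an assumption for illustration.
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class AllocationNodeGuardSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(AllocationNodeGuardSketch.class);

  private AllocationNodeGuardSketch() {
  }

  /** Hypothetical guard: reject the allocation proposal instead of throwing an NPE. */
  public static boolean nodeStillPresent(Map<NodeId, SchedulerNode> nodes,
      NodeId nodeId) {
    SchedulerNode node = nodes.get(nodeId);
    if (node == null) {
      LOG.info("Rejecting allocation proposal: node {} is no longer present",
          nodeId);
      return false;
    }
    return true;
  }
}
{code}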
[jira] [Commented] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350372#comment-16350372 ] Weiwei Yang commented on YARN-7838: --- Hello [~asuresh], I spent a few hours on this one today. To support composite and nested constraints in DS, I think the current approach in PlacementSpec is not flexible enough, so I created a parser class, {{PlacementConstraintParser}}. This is a preliminary patch; please take a look and let me know your feedback. My thought is that we can use such a parser class to further support specifying constraint expressions at application submission time, in a format similar to DS, so applications can use this feature more easily without modifying client code. If you agree with this approach, I will continue working on a formal patch. Thank you. > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
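To make the parsing problem concrete, the sketch below shows the one non-trivial step such a parser needs: splitting child constraints on ':' only at the top nesting level, so the nested OR(IN,NODE,moo:IN,NODE,bar) inside an AND(...) stays intact. It illustrates the idea only; it is not the PlacementConstraintParser in the attached patch.
{code}
import java.util.ArrayList;
import java.util.List;

public final class ConstraintSplitSketch {
  private ConstraintSplitSketch() {
  }

  /**
   * Split "NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)" into
   * ["NOTIN,NODE,foo", "OR(IN,NODE,moo:IN,NODE,bar)"]; the ':' inside the
   * nested OR(...) is ignored because it sits at depth > 0.
   */
  public static List<String> splitTopLevel(String expr) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (char c : expr.toCharArray()) {
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      }
      if (c == ':' && depth == 0) {
        parts.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(splitTopLevel("NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)"));
  }
}
{code}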
[jira] [Assigned] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-7879: Assignee: Jason Lowe Affects Version/s: 3.1.0 Priority: Critical (was: Major) Target Version/s: 3.1.0 We hit this before, and it was fixed in YARN-1386 by adding group execute permissions to the directories in the user's filecache. I think it could be YARN-2185 which added more restrictive permissions on some directories during localization. I'll run some quick tests locally to verify. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
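For readers unfamiliar with why the presence check misfires, the following is a minimal, self-contained illustration (not YARN code) of the underlying POSIX behaviour: {{File.exists()}} returns false whenever an ancestor directory cannot be traversed by the calling user, so a 0700 filecache directory owned by the application user makes a physically present file look missing to the NM user and triggers re-localization.
{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

/**
 * Minimal illustration, not YARN code: a file under a 0700 directory owned by
 * another user is reported as absent by File.exists(), which is exactly how the
 * NM-side presence check misreads the localized resource as missing.
 */
public class PresenceCheckDemo {
  public static void main(String[] args) throws IOException {
    // Hypothetical layout mirroring .../appcache/<app_id>/filecache/11/job.jar
    Path resourceDir = Paths.get(args[0], "filecache", "11");
    Files.createDirectories(resourceDir);
    Path jar = Files.createFile(resourceDir.resolve("job.jar"));

    // Lock the per-resource directory down to 0700, as described above.
    Files.setPosixFilePermissions(resourceDir,
        PosixFilePermissions.fromString("rwx------"));

    // Run this probe as a different user (e.g. the yarn user): exists() returns
    // false even though job.jar is physically on disk, so it is localized again.
    System.out.println("job.jar visible? " + new File(jar.toString()).exists());
  }
}
{code}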
[jira] [Updated] (YARN-7876) Localized jars that are expanded during localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-7876: - Affects Version/s: 3.1.0 Target Version/s: 3.1.0 Priority: Blocker (was: Major) Summary: Localized jars that are expanded during localization are not fully copied (was: Workaround ZipInputStream limitation for YARN-2185) > Localized jars that are expanded during localization are not fully copied > - > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not read to the > end of the file, so the trailing bytes are never consumed. Let's read them out > so that the additional TeeInputStream on the input receives the complete file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
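As a rough sketch of the pattern the summary refers to (illustrative only, not the attached patch; class and variable names are made up): when the jar is expanded from a {{ZipInputStream}} layered over a {{TeeInputStream}}, the zip reader stops before end-of-file, so the teed copy on disk is truncated unless the remainder of the stream is drained afterwards.
{code:java}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.commons.io.input.TeeInputStream;

/**
 * Illustrative sketch only (not the YARN-7876 patch). Every byte pulled through
 * the tee is mirrored to the local copy, but ZipInputStream stops pulling before
 * end-of-file, so the copy stays incomplete until the rest is read out.
 */
public class StreamingJarCopySketch {
  public static void main(String[] args) throws IOException {
    try (FileOutputStream localCopy = new FileOutputStream(args[1]);
         InputStream tee = new TeeInputStream(new FileInputStream(args[0]), localCopy);
         ZipInputStream zip = new ZipInputStream(tee)) {

      for (ZipEntry entry = zip.getNextEntry(); entry != null;
           entry = zip.getNextEntry()) {
        // ... expand the entry to the destination directory ...
        zip.closeEntry();
      }

      // Drain the bytes ZipInputStream never consumed (central directory,
      // trailing data) so the teed copy ends up byte-for-byte complete.
      byte[] buf = new byte[8192];
      while (tee.read(buf) != -1) {
        // reading is enough; the tee writes the bytes to localCopy
      }
    }
  }
}
{code}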
[jira] [Updated] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-7815: -- Attachment: YARN-7815.001.patch > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Attachments: YARN-7815.001.patch > > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
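For context, the change amounts to adding the read-only suffix to the bind mount that ends up in the generated {{docker run}} invocation. A toy sketch of that argument shape follows (the paths and helper are hypothetical; this is not the NM Docker runtime code):
{code:java}
/**
 * Toy sketch only: shows the ":ro" suffix that distinguishes a read-only Docker
 * bind mount from the current read-write one. Paths are hypothetical.
 */
public class DockerMountArgSketch {
  static String bindMount(String source, String target, boolean readOnly) {
    return "-v " + source + ":" + target + (readOnly ? ":ro" : ":rw");
  }

  public static void main(String[] args) {
    String filecache =
        "/hadoop-yarn/usercache/hadoopuser/appcache/application_1/filecache";
    // Current behaviour: read-write; proposed behaviour: read-only.
    System.out.println(bindMount(filecache, filecache, false));
    System.out.println(bindMount(filecache, filecache, true));
  }
}
{code}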
[jira] [Assigned] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned YARN-7839: Assignee: Panagiotis Garefalakis > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350424#comment-16350424 ] Shane Kumpf commented on YARN-7815: --- Attached a patch that implements the proposal. Given I had to touch the bulk of the test methods in {{TestDockerContainerRuntime}}, I went ahead and cleaned up some warnings and unused code as well. If you'd prefer that cleanup be moved to a separate patch, I can do so. > Mount the filecache as read-only in Docker containers > - > > Key: YARN-7815 > URL: https://issues.apache.org/jira/browse/YARN-7815 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Attachments: YARN-7815.001.patch > > > Currently, when using the Docker runtime, the filecache directories are > mounted read-write into the Docker containers. Read write access is not > necessary. We should make this more restrictive by changing that mount to > read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-7839: - Attachment: YARN-7839-YARN-6592.001.patch > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496 ] Panagiotis Garefalakis commented on YARN-7839: -- Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the **ignoreResourceCheck** flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class SchedulingRequestWithPlacementAttempt was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
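To make the idea concrete, here is a self-contained sketch of the check being described. The names {{attemptPlacementOnNode}} and {{ignoreResourceCheck}} are taken from the comment above, but the types are simplified stand-ins rather than the YARN-6592 classes:
{code:java}
/**
 * Simplified stand-ins, not the actual YARN-6592 classes: constraints are
 * evaluated first, and the resource fit is only enforced when
 * ignoreResourceCheck is false (the placement step); the validation step can
 * pass true and skip it.
 */
class NodeSketch {
  long availableMemMb;
  int availableVcores;
}

class RequestSketch {
  long memMb;
  int vcores;
}

class PlacementAlgorithmSketch {
  boolean constraintsSatisfied(RequestSketch request, NodeSketch node) {
    return true; // placement-constraint evaluation elided for brevity
  }

  boolean attemptPlacementOnNode(RequestSketch request, NodeSketch node,
      boolean ignoreResourceCheck) {
    if (!constraintsSatisfied(request, node)) {
      return false;
    }
    if (!ignoreResourceCheck) {
      if (request.memMb > node.availableMemMb
          || request.vcores > node.availableVcores) {
        return false; // constraint-feasible but the node is currently full
      }
      // Tentatively account for this placement so later requests in the same
      // round see the reduced capacity.
      node.availableMemMb -= request.memMb;
      node.availableVcores -= request.vcores;
    }
    return true;
  }
}
{code}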
[jira] [Commented] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350500#comment-16350500 ] Jim Brennan commented on YARN-7677: --- Thanks everyone! I will work on a new patch using this approach. > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
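A rough sketch of the approach being agreed on (whitelisted variables are only filled in when the container environment does not already define them, so a Docker image's own {{HADOOP_CONF_DIR}} wins); the helper and values below are illustrative, not the launcher code:
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch, not the NM launcher code: a whitelisted variable such as
 * HADOOP_CONF_DIR is only supplied from the NM side when the container
 * environment does not already define it, so an image that bakes in its own
 * value keeps it.
 */
public class WhitelistEnvSketch {
  static void addWhitelisted(Map<String, String> containerEnv,
      String name, String nmDefault) {
    containerEnv.putIfAbsent(name, nmDefault); // never clobber user/image value
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("HADOOP_CONF_DIR", "/opt/image-conf"); // set inside the Docker image

    addWhitelisted(env, "HADOOP_CONF_DIR", "/etc/hadoop/conf");
    addWhitelisted(env, "JAVA_HOME", "/usr/lib/jvm/default");

    System.out.println(env); // HADOOP_CONF_DIR stays /opt/image-conf
  }
}
{code}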
[jira] [Comment Edited] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496 ] Panagiotis Garefalakis edited comment on YARN-7839 at 2/2/18 3:13 PM: -- Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the *ignoreResourceCheck* flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class *SchedulingRequestWithPlacementAttempt* was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] was (Author: pgaref): Submitting a simple patch tracking available cluster resources in the DefaultPlacement algorithm - to support capacity check before placement. The actual check is part of the attemptPlacementOnNode method which could be configured with the **ignoreResourceCheck** flag. In the current patch the check is enabled on placement step and disabled on the validation step. A wrapper class SchedulingRequestWithPlacementAttempt was also introduced to keep track of the failed attempts on the rejected SchedulingRequests. Thoughts? [~asuresh] [~kkaranasos] [~cheersyang] > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7832) Logs page does not work for Running applications
[ https://issues.apache.org/jira/browse/YARN-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-7832. --- Resolution: Not A Problem Thanks [~yeshavora] for confirming, This is working fine with Combine System Metric Publisher mode > Logs page does not work for Running applications > > > Key: YARN-7832 > URL: https://issues.apache.org/jira/browse/YARN-7832 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Sunil G >Priority: Critical > Attachments: Screen Shot 2018-01-26 at 3.28.40 PM.png, > YARN-7832.001.patch > > > Scenario > * Run yarn service application > * When application is Running, go to log page > * Select AttemptId and Container Id > Logs are not showed on UI. It complains "No log data available!" > > Here > [http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358] > API fails with 500 Internal Server Error. > {"exception":"WebApplicationException","message":"java.io.IOException: > ","javaClassName":"javax.ws.rs.WebApplicationException"} > {code:java} > GET > http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358 > 500 (Internal Server Error) > (anonymous) @ VM779:1 > send @ vendor.js:572 > ajax @ vendor.js:548 > (anonymous) @ vendor.js:5119 > initializePromise @ vendor.js:2941 > Promise @ vendor.js:3005 > ajax @ vendor.js:5117 > ajax @ yarn-ui.js:1 > superWrapper @ vendor.js:1591 > query @ vendor.js:5112 > ember$data$lib$system$store$finders$$_query @ vendor.js:5177 > query @ vendor.js:5334 > fetchLogFilesForContainerId @ yarn-ui.js:132 > showLogFilesForContainerId @ yarn-ui.js:126 > run @ vendor.js:648 > join @ vendor.js:648 > run.join @ vendor.js:1510 > closureAction @ vendor.js:1865 > trigger @ vendor.js:302 > (anonymous) @ vendor.js:339 > each @ vendor.js:61 > each @ vendor.js:51 > trigger @ vendor.js:339 > d.select @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > d.invoke @ vendor.js:5598 > d.trigger @ vendor.js:5598 > e.trigger @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > d.invoke @ vendor.js:5598 > d.trigger @ vendor.js:5598 > (anonymous) @ vendor.js:5598 > dispatch @ vendor.js:306 > elemData.handle @ vendor.js:281{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350564#comment-16350564 ] Jason Lowe commented on YARN-7879: -- This was caused by YARN-2185. That change locked down the top-level directory for a non-public localized file to 0700 which prevents the nodemanager user from checking file presence on secure clusters. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
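For reference, the YARN-1386-era remedy mentioned earlier was to keep group execute on those directories so a process in the shared group can traverse, but not list or modify, them. A generic POSIX sketch of that permission shape (not the attached patch):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/**
 * Generic sketch, not the YARN-7879 patch: 0710 (rwx--x---) on a per-resource
 * directory lets a process in the owning group traverse the path and stat a
 * known file name, without being able to list or modify the directory.
 */
public class FilecachePermsSketch {
  public static void main(String[] args) throws IOException {
    Path resourceDir = Paths.get(args[0]); // e.g. .../filecache/11
    Set<PosixFilePermission> perms =
        PosixFilePermissions.fromString("rwx--x---");
    Files.setPosixFilePermissions(resourceDir, perms);
    System.out.println("Set 0710 on " + resourceDir);
  }
}
{code}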
[jira] [Commented] (YARN-7815) Mount the filecache as read-only in Docker containers
[ https://issues.apache.org/jira/browse/YARN-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350571#comment-16350571 ] genericqa commented on YARN-7815: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 90 unchanged - 0 fixed = 93 total (was 90) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 14s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7815 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908989/YARN-7815.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 87aec99b0bcb 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19578/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/19578/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hado
[jira] [Updated] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-7879: - Attachment: YARN-7879.001.patch > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7866) [UI2] Kerberizing the UI doesn't give any warning or content when UI is accessed without kinit
[ https://issues.apache.org/jira/browse/YARN-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350610#comment-16350610 ] genericqa commented on YARN-7866: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 27m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 23s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7866 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908607/YARN-7866.001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux 29183bd0f4f9 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 314 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19581/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [UI2] Kerberizing the UI doesn't give any warning or content when UI is > accessed without kinit > -- > > Key: YARN-7866 > URL: https://issues.apache.org/jira/browse/YARN-7866 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Sunil G >Priority: Major > Attachments: YARN-7866.001.patch > > > Handle 401 error and show in UI > credit to [~ssath...@hortonworks.com] for finding this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-7881: Description: The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. In order to add a similar page to the new YARN UI we need to add an RM WS endpoint first. (was: The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. This information is not yet available by the RM Rest API.) > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
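A minimal JAX-RS sketch of what such an endpoint could look like; the resource path, DAO shape, and field names below are assumptions for illustration, not the API the patch adds:
{code:java}
import java.util.ArrayList;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlRootElement;

/**
 * Minimal JAX-RS sketch; path, DAO and fields are illustrative assumptions.
 */
@Path("/ws/v1/cluster")
public class LogAggregationStatusResourceSketch {

  /** Per-node log aggregation report as it might be serialized to JSON/XML. */
  @XmlRootElement
  public static class NodeLogAggregationInfo {
    public String nodeId;
    public String logAggregationStatus; // e.g. RUNNING, SUCCEEDED, FAILED
    public String diagnostics;
  }

  @GET
  @Path("/apps/{appid}/logaggregationstatus")
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public List<NodeLogAggregationInfo> getLogAggregationStatus(
      @PathParam("appid") String appId) {
    // In the RM this would be built from the application's per-node log
    // aggregation reports; an empty list keeps the sketch self-contained.
    return new ArrayList<>();
  }
}
{code}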
[jira] [Created] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
Gergely Novák created YARN-7881: --- Summary: Add Log Aggregation Status API to the RM Webservice Key: YARN-7881 URL: https://issues.apache.org/jira/browse/YARN-7881 Project: Hadoop YARN Issue Type: New Feature Components: yarn Reporter: Gergely Novák Assignee: Gergely Novák The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which shows the log aggregation status for all the nodes that run containers for the given application. This information is not yet available by the RM Rest API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350628#comment-16350628 ] Jason Lowe commented on YARN-7879: -- I also manually tested the patch on a secure cluster and verified non-private resources are not re-localized with each application. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7868) Provide improved error message when YARN service is disabled
[ https://issues.apache.org/jira/browse/YARN-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350632#comment-16350632 ] Eric Yang commented on YARN-7868: - [~csingh] Thank you for reviewing the patch. [~jianhe] Thanks for the review. The message might be inaccurate for multi-users environment where an end user doesn't have system admin rights to enable the service. This is where the message would shows up the most, if system admin intentionally disabled this feature. Therefore, I prefer to omit this message to prevent noise generation. > Provide improved error message when YARN service is disabled > > > Key: YARN-7868 > URL: https://issues.apache.org/jira/browse/YARN-7868 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7868.001.patch > > > Some YARN CLI command will throw verbose error message when YARN service is > disabled. The error message looks like this: > {code} > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: A message body reader for Java class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and Java type class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and MIME media type > application/octet-stream was not found > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: The registered message body readers compatible with the MIME media > type are: > application/octet-stream -> > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.RenderedImageProvider > */* -> > com.sun.jersey.core.impl.provider.entity.FormProvider > com.sun.jersey.core.impl.provider.entity.StringProvider > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General > com.sun.jersey.core.impl.provider.entity.ReaderProvider > com.sun.jersey.core.impl.provider.entity.DocumentProvider > com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader > com.sun.jersey.json.impl.provider.entity.JSONJAXBElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONArrayProvider$General > com.sun.jersey.json.impl.provider.entity.JSONObjectProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General > com.sun.jersey.core.impl.provider.entity.EntityHolderReader > com.sun.jersey.json.impl.provider.entity.JSONRootElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONListElementProvider$General > com.sun.jersey.json.impl.provider.entity.JacksonProviderProxy > com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider > 2018-01-31 16:24:46,415 ERROR client.ApiServiceClient: > {code} -- This message was sent by Atlassian JIRA 
(v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
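The verbose trace in the description comes from handing an unexpected response body straight to the JSON reader. As a generic illustration of the friendlier-error idea (plain {{HttpURLConnection}}, not the {{ApiServiceClient}} code), the client can check the status and content type first and print a single concise line:
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Generic illustration, not the ApiServiceClient code: inspect the HTTP status
 * and content type before mapping the body to a bean, so a disabled service API
 * surfaces as one concise error line instead of a "no message body reader" trace.
 */
public class ServiceStatusProbeSketch {
  public static void main(String[] args) throws IOException {
    URL url = new URL(args[0]); // hypothetical service REST endpoint
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    int status = conn.getResponseCode();
    String contentType = conn.getContentType();
    if (status != HttpURLConnection.HTTP_OK
        || contentType == null || !contentType.contains("application/json")) {
      System.err.println("YARN service API is not available (HTTP " + status
          + "); the service framework may be disabled on this cluster.");
      return;
    }
    // ... only now hand conn.getInputStream() to the JSON deserializer ...
  }
}
{code}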
[jira] [Updated] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-7881: Attachment: YARN-7881.001.patch > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7881.001.patch > > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350635#comment-16350635 ] Sunil G commented on YARN-7850: --- [~GergelyNovak] The change looks fine to me. One doubt: when log aggregation is yet to start, we'll show the status without any style, correct? > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350642#comment-16350642 ] Gergely Novák commented on YARN-7850: - We show it with the default style (grey). > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7850) New UI does not show status for Log Aggregation
[ https://issues.apache.org/jira/browse/YARN-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350625#comment-16350625 ] Gergely Novák commented on YARN-7850: - Created [YARN-7881|https://issues.apache.org/jira/browse/YARN-7881]. > New UI does not show status for Log Aggregation > --- > > Key: YARN-7850 > URL: https://issues.apache.org/jira/browse/YARN-7850 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: Screen Shot 2018-02-01 at 11.37.30.png, > YARN-7850.001.patch > > > The status of Log Aggregation is not specified any where. > New UI should show the Log aggregation status for finished application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350648#comment-16350648 ] genericqa commented on YARN-7879: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7879 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908999/YARN-7879.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b15ac493c66c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19582/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19582/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > NM user is unable to access the application filecache due to permissions >
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350649#comment-16350649 ] genericqa commented on YARN-7839: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-6592 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 15s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} YARN-6592 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} YARN-6592 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7839 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908993/YARN-7839-YARN-6592.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9cb62d03926c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | YARN-6592 / 8df7666 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/19579/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19579/testReport/ | | Max. process+thread count | 866 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemana
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350656#comment-16350656 ] Sunil G commented on YARN-7839: --- bq. despite the naming, as far as I know, the candidateNodeSet is currently always only a single node [~kkaranasos] and [~asuresh], for multi-node placement, CandidateNodeSet was the ideal interface to extend. So multiple nodes could come in that iterator. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5028) RMStateStore should trim down app state for completed applications
[ https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350701#comment-16350701 ] genericqa commented on YARN-5028: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 57s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 76 unchanged - 0 fixed = 78 total (was 76) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel | | | hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementProcessor | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-5028 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908982/YARN-5028.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 119bf6b51613 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4aef8bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19580/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/jo
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350731#comment-16350731 ] Arun Suresh commented on YARN-7839: --- [~sunilg], regarding the {{CandidateNodeSet}}, let's move the discussion to when we refactor the {{AppSchedulingInfo}} - since this patch is isolated to the algorithm. [~kkaranasos] comment: bq. However, what about the case that a node seems full but a container is about to finish (and will be finished until the allocate is done)? Should we completely reject such nodes, or simply give higher priority to nodes that already have available resources? We are not rejecting those resources. If a scheduling request cannot be satisfied by any node in the algorithm round, it will be retried in the next AM heartbeat - and hopefully some of those containers will have completed by then. We can set the retry to a higher value for clusters that are running at a higher utilization. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
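For context on the capacity check being discussed, here is a minimal hedged sketch (the Node and Request types below are simplified stand-ins, not the actual YARN classes, and this is not the YARN-7839 patch itself): the placement algorithm skips nodes without enough free capacity instead of handing the request to the scheduler and waiting for a rejection and retry.
{code:java}
// Minimal sketch of a capacity-aware placement pass. Simplified types only.
import java.util.List;
import java.util.Optional;

public class CapacityAwarePlacement {

  static class Node {
    final String host;
    long freeMemMb;
    int freeVcores;
    Node(String host, long freeMemMb, int freeVcores) {
      this.host = host;
      this.freeMemMb = freeMemMb;
      this.freeVcores = freeVcores;
    }
  }

  static class Request {
    final long memMb;
    final int vcores;
    Request(long memMb, int vcores) {
      this.memMb = memMb;
      this.vcores = vcores;
    }
  }

  /**
   * Returns the first node that satisfies both the (already evaluated)
   * placement constraints and the free-capacity check.
   */
  static Optional<Node> place(Request req, List<Node> constraintSatisfyingNodes) {
    for (Node node : constraintSatisfyingNodes) {
      // The extra check discussed here: skip nodes without headroom so the
      // request is not bounced back by the scheduler later.
      if (node.freeMemMb >= req.memMb && node.freeVcores >= req.vcores) {
        node.freeMemMb -= req.memMb;   // tentatively reserve the capacity
        node.freeVcores -= req.vcores;
        return Optional.of(node);
      }
    }
    return Optional.empty();           // left unplaced, retried on the next AM heartbeat
  }

  public static void main(String[] args) {
    List<Node> nodes = List.of(new Node("n1", 512, 1), new Node("n2", 4096, 8));
    System.out.println(
        place(new Request(2048, 2), nodes).map(n -> n.host).orElse("retry"));
  }
}
{code}
A request that cannot be placed in a round simply falls through and is retried later, which matches the retry behaviour described in the comment above.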
[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350732#comment-16350732 ] Arun Suresh commented on YARN-7839: --- Thanks for the patch [~pgaref] It looks pretty straight forward to me. +1 will commit this shortly. > Check node capacity before placing in the Algorithm > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7876) Localized jars that are expanded during localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-7876: - Attachment: YARN-7876.001.patch > Localized jars that are expanded during localization are not fully copied > - > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-7876: - Summary: Localized jars that are expanded after localization are not fully copied (was: Localized jars that are expanded during localization are not fully copied) > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350762#comment-16350762 ] Miklos Szegedi commented on YARN-7876: -- Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it might run into an issue. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
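For readers following along, here is a hedged sketch of the failure mode being described (an illustration only, not the YARN-7876 patch; it assumes the commons-io TeeInputStream): a JarInputStream can stop reading before the physical end of the jar because it does not need the trailing central directory, so a TeeInputStream that mirrors the raw bytes into the jar copy kept for compatibility leaves that copy truncated unless the remainder of the stream is drained.
{code:java}
// Sketch: expand a jar from a stream while also keeping a full byte-for-byte copy.
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import org.apache.commons.io.input.TeeInputStream;

public class DrainAfterExpand {

  public static void expandAndKeepCopy(InputStream remoteJar, String localCopyPath)
      throws IOException {
    try (FileOutputStream copy = new FileOutputStream(localCopyPath);
         TeeInputStream tee = new TeeInputStream(remoteJar, copy);
         JarInputStream jar = new JarInputStream(tee)) {
      JarEntry entry;
      while ((entry = jar.getNextJarEntry()) != null) {
        // ... extract the entry to disk (omitted in this sketch) ...
      }
      // Drain whatever JarInputStream left unread (typically the central
      // directory at the end). Without this, the tee'd copy is truncated and a
      // later attempt to unzip it can fail.
      byte[] buf = new byte[64 * 1024];
      while (tee.read(buf) != -1) {
        // reading is enough: every byte read flows through the tee into "copy"
      }
    }
  }
}
{code}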
[jira] [Comment Edited] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350762#comment-16350762 ] Miklos Szegedi edited comment on YARN-7876 at 2/2/18 6:19 PM: -- Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it could run into an issue. was (Author: miklos.szeg...@cloudera.com): Thank you [~grepas] for the review and [~jlowe] for updating the title. I refined the title a little bit since what happens is that the localized and extracted files should be there. The patch does not change that. What might be truncated is the jar that is left around for compatibility reasons. If the job tries to extract this with zip for example, it might run into an issue. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7839) Modify PlacementAlgorithm to Check node capacity before placing request on node
[ https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7839: -- Summary: Modify PlacementAlgorithm to Check node capacity before placing request on node (was: Check node capacity before placing in the Algorithm) > Modify PlacementAlgorithm to Check node capacity before placing request on > node > --- > > Key: YARN-7839 > URL: https://issues.apache.org/jira/browse/YARN-7839 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Panagiotis Garefalakis >Priority: Major > Attachments: YARN-7839-YARN-6592.001.patch > > > Currently, the Algorithm assigns a node to a request purely based on if the > constraints are met. It is later in the scheduling phase that the Queue > capacity and Node capacity are checked. If the request cannot be placed > because of unavailable Queue/Node capacity, the request is retried by the > Algorithm. > For clusters that are running at high utilization, we can reduce the retries > if we perform the Node capacity check in the Algorithm as well. The Queue > capacity check and the other user limit checks can still be handled by the > scheduler (since queues and other limits are tied to the scheduler, and not > scheduler agnostic) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7791) Support submit intra-app placement constraint in Distributed Shell to AppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7791: -- Environment: (was: Set {{yarn.resourcemanager.placement-constraints.enabled}} to {{false}} Submit a job with placement constraint spec, e.g {code} in/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar -shell_command sleep -shell_args 30 -num_containers 2 -master_memory 500 -container_memory 200 -placement_spec foo=4,NOTIN,NODE,foo {code} got following errors in RM log {noformat} Exception message:As of now, the only accepted target key for targetKey of allocation_tag target expression is: [yarn_application_label/%intra_app%]. Please make changes to placement constraints accordingly. {noformat} Looks like DS needs some modification to support submitting proper scheduling requests to app placement allocators.) > Support submit intra-app placement constraint in Distributed Shell to > AppPlacementAllocator > --- > > Key: YARN-7791 > URL: https://issues.apache.org/jira/browse/YARN-7791 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Weiwei Yang >Assignee: Sunil G >Priority: Major > Labels: distributedshell > Attachments: YARN-7791-YARN-6592.001.patch > > > Set {{yarn.resourcemanager.placement-constraints.enabled}} to {{false}} > Submit a job with placement constraint spec, e.g > {code} > in/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.0-SNAPSHOT.jar > -shell_command sleep -shell_args 30 -num_containers 2 -master_memory 500 > -container_memory 200 -placement_spec foo=4,NOTIN,NODE,foo > {code} > got following errors in RM log > {noformat} > Exception message:As of now, the only accepted target key for targetKey of > allocation_tag target expression is: [yarn_application_label/%intra_app%]. > Please make changes to placement constraints accordingly. > {noformat} > Looks like DS needs some modification to support submitting proper scheduling > requests to app placement allocators. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350775#comment-16350775 ] Eric Yang commented on YARN-7879: - Are we comfortable in assuming all files in the filecache are world readable? In the health care and financial industries, a user's default umask is set to 027. Can there be private files that are exposed as a result of the umask change? Should we check every file, or assume that the pipe archive always expands properly with a single checksum file test? Would it be possible to make this detection using privileged access and report back to the nodemanager to trigger reinitialization? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluster is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down to a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes them > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
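The root cause described above can be reproduced outside YARN. A small standalone illustration (assuming a POSIX filesystem and that the directory owner differs from the process user): java.io.File#exists() returns false when an ancestor directory is mode 700 and owned by someone else, because the underlying stat() fails with EACCES, so a file that is physically on disk looks missing to the NM user.
{code:java}
// Illustration of File#exists() vs. a non-traversable parent directory.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class ExistsVsPermissions {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("filecache-demo");
    Path file = Files.createFile(dir.resolve("job.jar"));

    // Emulate the 700 directory created for the app user: no group/other access.
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));

    // Run as the owner this prints "true"; run as any other user (for example
    // the NM user "yarn") it prints "false", which is what triggers the
    // "is missing, localizing it again" path shown in the description.
    System.out.println(new File(file.toString()).exists());
  }
}
{code}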
[jira] [Commented] (YARN-7857) -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
[ https://issues.apache.org/jira/browse/YARN-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350781#comment-16350781 ] Miklos Szegedi commented on YARN-7857: -- Thank you, [~Jim_Brennan] for the patch. I am a little bit concerned that we sacrifice security for compatibility. Since RHEL7 code does not run on RHEL6 anyway due to glibc compatibility issues, would it make sense to keep the stack check code for RHEL7 and above? I checked the RHEL74 stack guard code and it seems to be much more precise than the one in the previous version. > -fstack-check compilation flag causes binary incompatibility for > container-executor between RHEL 6 and RHEL 7 > - > > Key: YARN-7857 > URL: https://issues.apache.org/jira/browse/YARN-7857 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7857.001.patch > > > The segmentation fault in container-executor reported in [YARN-7796] appears > to be due to a binary compatibility issue with the {{-fstack-check}} flag > that was added in [YARN-6721] > Based on my testing, a container-executor (without the patch from > [YARN-7796]) compiled on RHEL 6 with the -fstack-check flag always hits this > segmentation fault when run on RHEL 7. But if you compile without this flag, > the container-executor runs on RHEL 7 with no problems. I also verified this > with a simple program that just does the copy_file. > I think we need to either remove this flag, or find a suitable alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-6456: - Issue Type: Sub-task (was: Bug) Parent: YARN-3611 > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350787#comment-16350787 ] Miklos Szegedi commented on YARN-6456: -- Sure, I made it as a subtask. Are you referring to #3 as this? "Maybe the container directories could be outside the application directory." > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350794#comment-16350794 ] Jason Lowe commented on YARN-7879: -- bq. Are we comfortable in assuming all files in filecache are world readable? Non-public localized files are not world readable. The top-level directory of the user's filecache directory is mode 0710 with the NM group, so only the user and those in the NM's group can see that files are there, regardless of what the permissions of the underlying paths are. The localized files themselves are mode 0500. In short, the NM user can see that a file is there but cannot read it. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluster is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down to a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes them > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevant. Is the above already known? A configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7838) Support AND/OR constraints in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350798#comment-16350798 ] Arun Suresh commented on YARN-7838: --- Thanks for taking a stab at this [~cheersyang]. Yup, I agree we need a more flexible parser - the placementspec parser I put in was just for some adhoc testing :) Couple of comments: # do we need a tryParse ? Either we are able to parse or an exception is thrown right ? # The {{toInt}} should be static # I am assuming your {{shouldHaveNext}} is more like an assert - Maybe make that static as well, and # In the final implementation, we have to ensure that it accepts a placementspec string WITHOUT any and/or as well. > Support AND/OR constraints in Distributed Shell > --- > > Key: YARN-7838 > URL: https://issues.apache.org/jira/browse/YARN-7838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7838.prelim.patch > > > Extending DS placement spec syntax to support AND/OR constraints, something > like > {code} > // simple > -placement_spec foo=4,AND(NOTIN,NODE,foo:NOTIN,NODE,bar) > // nested > -placement_spec foo=4,AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar)) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
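To make the parser discussion concrete, here is a hedged sketch of a recursive-descent parser for the AND/OR expression part of the spec (the portion after the "tag=count," prefix). In line with comment #1 it simply throws on malformed input rather than exposing a tryParse, and the last example shows a plain spec without AND/OR as raised in comment #4. The class and method names are made up for illustration and this is not the distributed shell parser from the patch.
{code:java}
// Sketch: parse expressions such as "AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar))".
import java.util.ArrayList;
import java.util.List;

public class ConstraintExprParser {
  private final String spec;
  private int pos;

  private ConstraintExprParser(String spec) {
    this.spec = spec;
  }

  /** Parses the whole expression or throws IllegalArgumentException. */
  public static Expr parse(String spec) {
    ConstraintExprParser p = new ConstraintExprParser(spec);
    Expr e = p.parseExpr();
    if (p.pos != spec.length()) {
      throw new IllegalArgumentException("Trailing input at offset " + p.pos);
    }
    return e;
  }

  private Expr parseExpr() {
    if (spec.startsWith("AND(", pos) || spec.startsWith("OR(", pos)) {
      String op = spec.startsWith("AND(", pos) ? "AND" : "OR";
      pos += op.length() + 1;                 // consume "AND(" or "OR("
      List<Expr> children = new ArrayList<>();
      children.add(parseExpr());
      while (peek() == ':') {                 // ':' separates child constraints
        pos++;
        children.add(parseExpr());
      }
      expect(')');
      return new Expr(op, children, null);
    }
    // Simple constraint: everything up to the next ':' or ')' at this level,
    // e.g. "NOTIN,NODE,foo".
    int start = pos;
    while (pos < spec.length() && spec.charAt(pos) != ':' && spec.charAt(pos) != ')') {
      pos++;
    }
    return new Expr("SINGLE", List.of(), spec.substring(start, pos));
  }

  private char peek() {
    return pos < spec.length() ? spec.charAt(pos) : '\0';
  }

  private void expect(char c) {
    if (peek() != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
    }
    pos++;
  }

  public static final class Expr {
    final String op;
    final List<Expr> children;
    final String constraint;
    Expr(String op, List<Expr> children, String constraint) {
      this.op = op;
      this.children = children;
      this.constraint = constraint;
    }
    @Override
    public String toString() {
      return "SINGLE".equals(op) ? constraint : op + children;
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("AND(NOTIN,NODE,foo:OR(IN,NODE,moo:IN,NODE,bar))"));
    System.out.println(parse("NOTIN,NODE,foo"));  // plain spec without AND/OR
  }
}
{code}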
[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350831#comment-16350831 ] Yufei Gu commented on YARN-7655: The patch looks good to me generally. Only some nits: # Add Javadoc to method {{identifyContainersToPreempt()}} to indicate that preemption will try to meet locality first, whether or not the resource request relaxes locality, and that there is an exception for AM containers. # Fix some style issues in the test class. > avoid AM preemption caused by RRs for specific nodes or racks > - > > Key: YARN-7655 > URL: https://issues.apache.org/jira/browse/YARN-7655 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: YARN-7655-001.patch > > > We frequently see AM preemptions when > {{starvedApp.getStarvedResourceRequests()}} in > {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs > that request containers on a specific node. Since this causes us to only > consider one node to preempt containers on, the really good work that was > done in YARN-5830 doesn't save us from AM preemption. Even though there might > be multiple nodes on which we could preempt enough non-AM containers to > satisfy the app's starvation, we often wind up preempting one or more AM > containers on the single node that we're considering. > A proposed solution is that if we're going to preempt one or more AM > containers for an RR that specifies a node or rack, then we should instead > expand the search space to consider all nodes. That way we take advantage of > YARN-5830, and only preempt AMs if there's no alternative. I've attached a > patch with an initial implementation of this. We've been running it on a few > clusters, and have seen AM preemptions drop from double-digit occurrences on > many days to zero. > Of course, the tradeoff is some loss of locality, since the starved app is > less likely to be allocated resources at the most specific locality level > that it asked for. My opinion is that this tradeoff is worth it, but > interested to hear what others think as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
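A hedged, simplified sketch of the approach proposed in this JIRA (the Container type and methods below are stand-ins, not the real FSPreemptionThread code): when a starved request names a specific node, preempting an AM container there is only considered after the search has been widened to all nodes.
{code:java}
// Sketch: prefer preempting non-AM containers anywhere over an AM on the requested node.
import java.util.ArrayList;
import java.util.List;

public class AmFriendlyPreemption {

  static class Container {
    final boolean isAm;
    final long memMb;
    Container(boolean isAm, long memMb) {
      this.isAm = isAm;
      this.memMb = memMb;
    }
  }

  /**
   * Picks containers worth at least needMb, trying candidate lists in order:
   * first the requested node only, then (if that would hit an AM) all nodes.
   */
  static List<Container> identifyContainersToPreempt(
      long needMb, List<Container> onRequestedNode, List<Container> allNodes) {
    for (List<Container> searchSpace : List.of(onRequestedNode, allNodes)) {
      List<Container> picked = pickNonAmFirst(needMb, searchSpace);
      if (picked != null && picked.stream().noneMatch(c -> c.isAm)) {
        return picked;  // satisfied without touching an AM container
      }
    }
    // No AM-free selection exists even cluster-wide; fall back to the widest search.
    List<Container> fallback = pickNonAmFirst(needMb, allNodes);
    return fallback == null ? List.of() : fallback;
  }

  private static List<Container> pickNonAmFirst(long needMb, List<Container> candidates) {
    List<Container> picked = new ArrayList<>();
    long acc = 0;
    for (boolean allowAm : new boolean[] {false, true}) {  // non-AM containers first
      for (Container c : candidates) {
        if (acc >= needMb) {
          return picked;
        }
        if (c.isAm == allowAm) {
          picked.add(c);
          acc += c.memMb;
        }
      }
    }
    return acc >= needMb ? picked : null;
  }
}
{code}
As the description notes, the tradeoff is that the preempted capacity may not end up on the node the request asked for.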
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350832#comment-16350832 ] Jason Lowe commented on YARN-7876: -- Right, thanks for clarifying the title further. +1 lgtm pending Jenkins. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7857) -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
[ https://issues.apache.org/jira/browse/YARN-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350833#comment-16350833 ] Jim Brennan commented on YARN-7857: --- Thanks [~miklos.szeg...@cloudera.com]! That is a good suggestion. I will need to do some more investigation - I would think the key factor is the GCC version, not the OS version. > -fstack-check compilation flag causes binary incompatibility for > container-executor between RHEL 6 and RHEL 7 > - > > Key: YARN-7857 > URL: https://issues.apache.org/jira/browse/YARN-7857 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7857.001.patch > > > The segmentation fault in container-executor reported in [YARN-7796] appears > to be due to a binary compatibility issue with the {{-fstack-check}} flag > that was added in [YARN-6721] > Based on my testing, a container-executor (without the patch from > [YARN-7796]) compiled on RHEL 6 with the -fstack-check flag always hits this > segmentation fault when run on RHEL 7. But if you compile without this flag, > the container-executor runs on RHEL 7 with no problems. I also verified this > with a simple program that just does the copy_file. > I think we need to either remove this flag, or find a suitable alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-7732: - Attachment: YARN-7732.04.patch > Support Generic AM Simulator from SynthGenerator > > > Key: YARN-7732 > URL: https://issues.apache.org/jira/browse/YARN-7732 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Young Chen >Assignee: Young Chen >Priority: Minor > Attachments: YARN-7732-YARN-7798.01.patch, > YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, > YARN-7732.03.patch, YARN-7732.04.patch > > > Extract the MapReduce specific set-up in the SLSRunner into the > MRAMSimulator, and enable support for pluggable AMSimulators. > Previously, the AM set up in SLSRunner had the MRAMSimulator type hard coded, > for example startAMFromSynthGenerator() calls this: > > {code:java} > runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId, > jobStartTimeMS, jobFinishTimeMS, containerList, reservationId, > job.getDeadline(), getAMContainerResource(null)); > {code} > where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce" > The container set up was also only suitable for mapreduce: > > {code:java} > // From https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > > // map tasks > for (int i = 0; i < job.getNumberMaps(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add(new ContainerSimulator(containerResource, > containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map")); > } > // reduce tasks > for (int i = 0; i < job.getNumberReduces(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add( > new ContainerSimulator(containerResource, containerLifeTime, > hostname, DEFAULT_REDUCER_PRIORITY, "reduce")); > } > {code} > > In addition, the syn.json format supported only mapreduce (the parameters > were very specific: mtime, rtime, mtasks, rtasks, etc..). > This patch aims to introduce a new syn.json format that can describe generic > jobs, and the SLS setup required to support the synth generation of generic > jobs. > See syn_generic.json for an equivalent of the previous syn.json in the new > format. > Using the new generic format, we describe a StreamAMSimulator, which simulates a > long running streaming service that maintains N number of containers for the > lifetime of the AM. See syn_stream.json. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7868) Provide improved error message when YARN service is disabled
[ https://issues.apache.org/jira/browse/YARN-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350851#comment-16350851 ] Eric Yang commented on YARN-7868: - [~jianhe] Thank you for the commit. > Provide improved error message when YARN service is disabled > > > Key: YARN-7868 > URL: https://issues.apache.org/jira/browse/YARN-7868 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-7868.001.patch > > > Some YARN CLI command will throw verbose error message when YARN service is > disabled. The error message looks like this: > {code} > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: A message body reader for Java class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and Java type class > org.apache.hadoop.yarn.service.api.records.ServiceStatus, and MIME media type > application/octet-stream was not found > Jan 31, 2018 4:24:46 PM com.sun.jersey.api.client.ClientResponse getEntity > SEVERE: The registered message body readers compatible with the MIME media > type are: > application/octet-stream -> > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.RenderedImageProvider > */* -> > com.sun.jersey.core.impl.provider.entity.FormProvider > com.sun.jersey.core.impl.provider.entity.StringProvider > com.sun.jersey.core.impl.provider.entity.ByteArrayProvider > com.sun.jersey.core.impl.provider.entity.FileProvider > com.sun.jersey.core.impl.provider.entity.InputStreamProvider > com.sun.jersey.core.impl.provider.entity.DataSourceProvider > com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General > com.sun.jersey.core.impl.provider.entity.ReaderProvider > com.sun.jersey.core.impl.provider.entity.DocumentProvider > com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader > com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader > com.sun.jersey.json.impl.provider.entity.JSONJAXBElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONArrayProvider$General > com.sun.jersey.json.impl.provider.entity.JSONObjectProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General > com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General > com.sun.jersey.core.impl.provider.entity.EntityHolderReader > com.sun.jersey.json.impl.provider.entity.JSONRootElementProvider$General > com.sun.jersey.json.impl.provider.entity.JSONListElementProvider$General > com.sun.jersey.json.impl.provider.entity.JacksonProviderProxy > com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider > 2018-01-31 16:24:46,415 ERROR client.ApiServiceClient: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7732) Support Generic AM Simulator from SynthGenerator
[ https://issues.apache.org/jira/browse/YARN-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350853#comment-16350853 ] Young Chen commented on YARN-7732: -- Added back compatibility with JobStory and JobStoryProducer interfaces for gridmix integration in [^YARN-7732.04.patch]. > Support Generic AM Simulator from SynthGenerator > > > Key: YARN-7732 > URL: https://issues.apache.org/jira/browse/YARN-7732 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Young Chen >Assignee: Young Chen >Priority: Minor > Attachments: YARN-7732-YARN-7798.01.patch, > YARN-7732-YARN-7798.02.patch, YARN-7732.01.patch, YARN-7732.02.patch, > YARN-7732.03.patch, YARN-7732.04.patch > > > Extract the MapReduce specific set-up in the SLSRunner into the > MRAMSimulator, and enable support for pluggable AMSimulators. > Previously, the AM set up in SLSRunner had the MRAMSimulator type hard coded, > for example startAMFromSynthGenerator() calls this: > > {code:java} > runNewAM(SLSUtils.DEFAULT_JOB_TYPE, user, jobQueue, oldJobId, > jobStartTimeMS, jobFinishTimeMS, containerList, reservationId, > job.getDeadline(), getAMContainerResource(null)); > {code} > where SLSUtils.DEFAULT_JOB_TYPE = "mapreduce" > The container set up was also only suitable for mapreduce: > > {code:java} > // From https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > > // map tasks > for (int i = 0; i < job.getNumberMaps(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.MAP, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add(new ContainerSimulator(containerResource, > containerLifeTime, hostname, DEFAULT_MAPPER_PRIORITY, "map")); > } > // reduce tasks > for (int i = 0; i < job.getNumberReduces(); i++) { > TaskAttemptInfo tai = job.getTaskAttemptInfo(TaskType.REDUCE, i, 0); > RMNode node = > nmMap.get(keyAsArray.get(rand.nextInt(keyAsArray.size( > .getNode(); > String hostname = "/" + node.getRackName() + "/" + node.getHostName(); > long containerLifeTime = tai.getRuntime(); > Resource containerResource = > Resource.newInstance((int) tai.getTaskInfo().getTaskMemory(), > (int) tai.getTaskInfo().getTaskVCores()); > containerList.add( > new ContainerSimulator(containerResource, containerLifeTime, > hostname, DEFAULT_REDUCER_PRIORITY, "reduce")); > } > {code} > > In addition, the syn.json format supported only mapreduce (the parameters > were very specific: mtime, rtime, mtasks, rtasks, etc..). > This patch aims to introduce a new syn.json format that can describe generic > jobs, and the SLS setup required to support the synth generation of generic > jobs. > See syn_generic.json for an equivalent of the previous syn.json in the new > format. > Using the new generic format, we describe a StreamAMSimulator, which simulates a > long running streaming service that maintains N number of containers for the > lifetime of the AM. See syn_stream.json.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount
[ https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7626: Attachment: YARN-7626.006.patch > Allow regular expression matching in container-executor.cfg for devices and > named docker volumes mount > -- > > Key: YARN-7626 > URL: https://issues.apache.org/jira/browse/YARN-7626 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7626.001.patch, YARN-7626.002.patch, > YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, > YARN-7626.006.patch > > > Currently, when we configure some of the GPU device related fields (like ) in > container-executor.cfg, these fields are generated based on different driver > versions or GPU device names. We want to enable regular expression matching > so that users don't need to manually set up these fields when configuring > container-executor.cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7882) Server side proxy for UI2 log viewer
Eric Yang created YARN-7882: --- Summary: Server side proxy for UI2 log viewer Key: YARN-7882 URL: https://issues.apache.org/jira/browse/YARN-7882 Project: Hadoop YARN Issue Type: Bug Components: security, timelineserver, yarn-ui-v2 Affects Versions: 3.0.0 Reporter: Eric Yang When viewing container logs in UI2, the log files are directly fetched through timeline server 2. Hadoop in simple security mode does not have an authenticator to make sure the user is authorized to view the log. The general practice is to use Knox or another security proxy to authenticate the user and reverse proxy the request to the Hadoop UI to ensure the information does not leak to anonymous users. The current implementation of the UI2 log viewer uses ajax calls to timeline server 2. This could prevent Knox or reverse proxy software from working properly with the new design. It would be good to perform server side proxying to prevent the browser from side-stepping the authentication check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount
[ https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350873#comment-16350873 ] Zian Chen commented on YARN-7626: - Updated patch 006 per Miklos's suggestions. [~leftnoteasy], [~sunilg], could you please help review the latest patch? Thanks! > Allow regular expression matching in container-executor.cfg for devices and > named docker volumes mount > -- > > Key: YARN-7626 > URL: https://issues.apache.org/jira/browse/YARN-7626 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7626.001.patch, YARN-7626.002.patch, > YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, > YARN-7626.006.patch > > > Currently, when we configure some of the GPU device related fields (like ) in > container-executor.cfg, these fields are generated based on different driver > versions or GPU device names. We want to enable regular expression matching > so that users don't need to manually set up these fields when configuring > container-executor.cfg. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
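container-executor itself is native C code, so the following Java snippet is only a language-neutral illustration of the idea under discussion: configuration entries flagged as regular expressions are matched against the requested device or volume path instead of being compared literally. The "regex:" prefix is an assumption made for this sketch, not necessarily the exact syntax used by the patch.
{code:java}
// Sketch: allow-list lookup that treats "regex:"-prefixed entries as patterns.
import java.util.List;
import java.util.regex.Pattern;

public class AllowedMountCheck {

  static boolean isAllowed(String requested, List<String> configuredEntries) {
    for (String entry : configuredEntries) {
      if (entry.startsWith("regex:")) {
        // Entry is a pattern, e.g. one that covers every GPU device node.
        if (Pattern.matches(entry.substring("regex:".length()), requested)) {
          return true;
        }
      } else if (entry.equals(requested)) {
        // Legacy behaviour: exact string match.
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> allowed = List.of("regex:/dev/nvidia[0-9]+", "/dev/nvidiactl");
    System.out.println(isAllowed("/dev/nvidia3", allowed));  // true
    System.out.println(isAllowed("/dev/sda", allowed));      // false
  }
}
{code}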
[jira] [Commented] (YARN-7876) Localized jars that are expanded after localization are not fully copied
[ https://issues.apache.org/jira/browse/YARN-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350895#comment-16350895 ] Robert Kanter commented on YARN-7876: - +1 LGTM pending Jenkins Thanks for adding the directory to the unit test like we discussed offline. > Localized jars that are expanded after localization are not fully copied > > > Key: YARN-7876 > URL: https://issues.apache.org/jira/browse/YARN-7876 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-7876.000.patch, YARN-7876.001.patch > > > YARN-2185 added the ability to localize jar files as a stream instead of > copying to local disk and then extracting. ZipInputStream does not need the > end of the file. Let's read it out. This helps with an additional > TeeInputStream on the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350904#comment-16350904 ] Shane Kumpf commented on YARN-6456: --- Thanks [~miklos.szeg...@cloudera.com]. {quote}Are you referring to #3 as this? "Maybe the container directories could be outside the application directory." {quote} I was referring to #3 in the description. {quote}3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. {quote} > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7720) Race condition between second app attempt and UAM timeout when first attempt node is down
[ https://issues.apache.org/jira/browse/YARN-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-7720: --- Summary: Race condition between second app attempt and UAM timeout when first attempt node is down (was: [Federation] Race condition between second app attempt and UAM timeout when first attempt node is down) > Race condition between second app attempt and UAM timeout when first attempt > node is down > - > > Key: YARN-7720 > URL: https://issues.apache.org/jira/browse/YARN-7720 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > > In Federation, multiple attempts of an application share the same UAM in each > secondary sub-cluster. When first attempt fails, we reply on the fact that > secondary RM won't kill the existing UAM before the AM heartbeat timeout > (default at 10 min). When second attempt comes up in the home sub-cluster, it > will pick up the UAM token from Yarn Registry and resume the UAM heartbeat to > secondary RMs. > The default heartbeat timeout for NM and AM are both 10 mins. The problem is > that when the first attempt node goes down or out of connection, only after > 10 mins will the home RM mark the first attempt as failed, and then schedule > the 2nd attempt in some other node. By then the UAMs in secondaries are > already timing out, and they might not survive until the second attempt comes > up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350938#comment-16350938 ] Chandni Singh commented on YARN-7572: - Do we want something like this? Due to space constraints the service row is not aligned {code:java} Service: URI Name ID Artifact ID Launch TimeNum Containers State Lifetime app-1 application_1503358878042_00113600 Components: Name Artifact ID Launch CommandNum ContainersState simple sleep 36002 FLEXING master sleep 36001 FLEXING worker sleep 36005 FLEXING {code} > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350942#comment-16350942 ] Miklos Szegedi commented on YARN-6456: -- Sure, I would keep the description just for context but let this Jira cover only 3. above. > Isolation of Docker containers In LinuxContainerExecutor > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
[ https://issues.apache.org/jira/browse/YARN-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350954#comment-16350954 ] Giovanni Matteo Fumarola commented on YARN-7881: Thanks [~GergelyNovak] for the patch. Can you please declare the function in {{RMWebServiceProtocol}} and override it in {{RMWebServices}}? > Add Log Aggregation Status API to the RM Webservice > --- > > Key: YARN-7881 > URL: https://issues.apache.org/jira/browse/YARN-7881 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7881.001.patch > > > The old YARN UI has a page: /cluster/logaggregationstatus/\{app_id} which > shows the log aggregation status for all the nodes that run containers for > the given application. In order to add a similar page to the new YARN UI we > need to add an RM WS endpoint first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
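A rough, hypothetical shape of what that could look like is sketched below. The method name, URL path and the {{LogAggregationStatusInfo}} DAO are placeholders rather than the committed API; the JAX-RS annotations are the standard ones the RM web services already rely on.
{code:java}
// Hypothetical sketch only. In the real change the method would be declared
// in RMWebServiceProtocol and overridden in RMWebServices; it is shown on a
// stand-alone class here so the snippet compiles by itself.
import java.util.ArrayList;
import java.util.List;

import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;

// Placeholder DAO: one entry per node that ran containers for the app.
class LogAggregationStatusInfo {
  List<String> perNodeStatus = new ArrayList<>();
}

@Path("/ws/v1/cluster")
class LogAggregationStatusEndpointSketch {
  @GET
  @Path("/apps/{appid}/logaggregationstatus")
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public LogAggregationStatusInfo getLogAggregationStatus(
      @Context HttpServletRequest hsr, @PathParam("appid") String appId) {
    // The real implementation would look up the RMApp and copy its per-node
    // log aggregation reports into the DAO; an empty placeholder is returned.
    return new LogAggregationStatusInfo();
  }
}
{code}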
[jira] [Comment Edited] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350938#comment-16350938 ] Chandni Singh edited comment on YARN-7572 at 2/2/18 9:37 PM: - Do we want something like this? Due to space constraints the service row is not aligned {code:java}
Service:
URI   Name    ID                                Artifact ID   Launch Time   Num Containers   State   Lifetime
      app-1   application_1503358878042_0011                                                         3600

Components:
Name     Artifact ID   Launch Command   Num Containers   State
simple                 sleep 3600       2                FLEXING
master                 sleep 3600       1                FLEXING
worker                 sleep 3600       5                FLEXING
{code} was (Author: csingh): Do we want something like this? Due to space constraints the service row is not aligned {code:java}
Service:
URI   Name    ID                                Artifact ID   Launch Time   Num Containers   State   Lifetime
      app-1   application_1503358878042_0011                                                         3600

Components:
Name     Artifact ID   Launch Command   Num Containers   State
simple                 sleep 3600       2                FLEXING
master                 sleep 3600       1                FLEXING
worker                 sleep 3600       5                FLEXING
{code} > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350974#comment-16350974 ] Vinod Kumar Vavilapalli commented on YARN-7572: --- This is a generally better presentation than the first one. After running this command, what command do I run to get status per-component inside a specific service? > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350981#comment-16350981 ] Eric Yang commented on YARN-7879: - [~jlowe] Thank you for the reply. We are allowing file cache to be mounted in docker container as read only in YARN-7815. The risk of exposing filename is marginally small, but I like to confirm that is not a problem even the filename contains sensitive information exposed in docker containers. Is it possible to use 750 and group is owned by NM's group? Can cache directory contain subdirectories to prevent this arrangement from working? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. 
I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350994#comment-16350994 ] Eric Yang commented on YARN-7572: - Num Containers, State and Lifetime on the Service line seem to be missing their values. How about formatted JSON? Most modern software, such as the AWS and MongoDB tooling, uses formatted JSON output by default. This reduces the chance of producing misaligned text output that looks great in some terminals but is unreadable in others. If we still want to go with preformatted text, I would suggest the following: # Print "Service: [service-name]" as the first line to avoid app name misalignment when showing the status of multiple apps. # Put the ID column in front because IDs are uniform in length. # Remove URI and Artifact ID; that information exists in the service spec, and we can add a spec-specific command. > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
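For the formatted-JSON option, a minimal sketch of what the client could do is shown below, assuming Jackson (which the service client already uses for the spec); the {{Map}} stands in for the real Service record object.
{code:java}
// Sketch of pretty-printed JSON status output; the Map is a stand-in for
// the actual Service record.
import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class PrettyJsonStatusSketch {
  public static void main(String[] args) throws Exception {
    Map<String, Object> status = new LinkedHashMap<>();
    status.put("name", "app-1");
    status.put("id", "application_1503358878042_0011");
    status.put("lifetime", 3600);

    ObjectMapper mapper = new ObjectMapper();
    System.out.println(
        mapper.writerWithDefaultPrettyPrinter().writeValueAsString(status));
  }
}
{code}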
[jira] [Commented] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351009#comment-16351009 ] Arun Suresh commented on YARN-7819: --- Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - let me know if the latest patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
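For context, the hook in question roughly follows the pattern below. This is a stand-alone analog, not the patch: the real method is an @Override in FairScheduler and takes the scheduler-internal SchedulerApplicationAttempt / SchedulingRequest / SchedulerNode types rather than the placeholders used here.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-alone analog of ResourceScheduler#attemptAllocationOnNode; parameter
// types are placeholders, and the body only illustrates the locking pattern.
public class AttemptAllocationOnNodeSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();

  public boolean attemptAllocationOnNode(Object appAttempt,
      Object schedulingRequest, Object schedulerNode) {
    writeLock.lock();
    try {
      // Re-check that the node chosen by the PlacementProcessor still has
      // headroom for the request and, if so, commit the allocation.
      // Returning false asks the processor to place the request elsewhere.
      boolean nodeStillFits = false; // placeholder decision
      return nodeStillFits;
    } finally {
      writeLock.unlock();
    }
  }
}
{code}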
[jira] [Updated] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7819: -- Attachment: YARN-7819.004.patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7819) Allow PlacementProcessor to be used with the FairScheduler
[ https://issues.apache.org/jira/browse/YARN-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351009#comment-16351009 ] Arun Suresh edited comment on YARN-7819 at 2/2/18 10:23 PM: Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - do let me know if you are ok with the latest patch. was (Author: asuresh): Updating patch after rebasing. The earlier testcase errors are spurious / not-related. [~templedf], [~haibochen] - let me know if the latest patch > Allow PlacementProcessor to be used with the FairScheduler > -- > > Key: YARN-7819 > URL: https://issues.apache.org/jira/browse/YARN-7819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-7819-YARN-6592.001.patch, > YARN-7819-YARN-7812.001.patch, YARN-7819.002.patch, YARN-7819.003.patch, > YARN-7819.004.patch > > > The FairScheduler needs to implement the > {{ResourceScheduler#attemptAllocationOnNode}} function for the processor to > support the FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6456: -- Summary: Allow administrators to set a single ContainerRuntime for all containers (was: Isolation of Docker containers In LinuxContainerExecutor) > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6456: -- Description: With LCE, there are multiple ContainerRuntimes available for handling different types of containers; default, docker, java sandbox. Admins should have the ability to override the user decision and set a single global ContainerRuntime to be used for all containers. Original Description: {quote}One reason to use Docker containers is to be able to isolate different workloads, even, if they run as the same user. I have noticed some issues in the current design: 1. DockerLinuxContainerRuntime mounts containerLocalDirs {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see and modify the files of another container. I think the application file cache directory should be enough for the container to run in most of the cases. 2. The whole cgroups directory is mounted. Would the container directory be enough? 3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. {quote} was: One reason to use Docker containers is to be able to isolate different workloads, even, if they run as the same user. I have noticed some issues in the current design: 1. DockerLinuxContainerRuntime mounts containerLocalDirs {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see and modify the files of another container. I think the application file cache directory should be enough for the container to run in most of the cases. 2. The whole cgroups directory is mounted. Would the container directory be enough? 3. There is no way to enforce exclusive use of Docker for all containers. There should be an option that it is not the user but the admin that requires to use Docker. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Priority: Major > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
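As a purely hypothetical illustration of the proposed admin override: a single NM-level setting that, when present, wins over whatever runtime the user asked for. The property name below is invented for the sketch, not an existing setting.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of an admin-enforced runtime: if the (invented)
// property is set, it wins over whatever runtime the user requested.
public class AdminRuntimeOverrideSketch {
  static String pickRuntime(Configuration conf, String userRequested) {
    String adminForced =
        conf.getTrimmed("yarn.nodemanager.runtime.linux.type", "");
    // Empty means no override, i.e. today's behavior of honoring the user.
    return adminForced.isEmpty() ? userRequested : adminForced;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.nodemanager.runtime.linux.type", "docker");
    System.out.println(pickRuntime(conf, "default")); // prints "docker"
  }
}
{code}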
[jira] [Updated] (YARN-7778) Merging of placement constraints defined at different levels
[ https://issues.apache.org/jira/browse/YARN-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-7778: - Summary: Merging of placement constraints defined at different levels (was: Merging of constraints defined at different levels) > Merging of placement constraints defined at different levels > > > Key: YARN-7778 > URL: https://issues.apache.org/jira/browse/YARN-7778 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Weiwei Yang >Priority: Major > Attachments: Merge Constraints Solution.pdf, > YARN-7778-YARN-7812.001.patch, YARN-7778-YARN-7812.002.patch, > YARN-7778.003.patch, YARN-7778.004.patch > > > When we have multiple constraints defined for a given set of allocation tags > at different levels (i.e., at the cluster, the application or the scheduling > request level), we need to merge those constraints. > Defining constraint levels as cluster > application > scheduling request, > constraints defined at lower levels should only be more restrictive than > those of higher levels. Otherwise the allocation should fail. > For example, if there is an application level constraint that allows no more > than 5 HBase containers per rack, a scheduling request can further restrict > that to 3 containers per rack but not to 7 containers per rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
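To make the HBase example concrete, here is a sketch using the PlacementConstraints DSL from this feature branch; treat the exact factory method names as assumptions rather than the finalized API.
{code:java}
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.RACK;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.maxCardinality;

public class ConstraintMergeExample {
  public static void main(String[] args) {
    // Application-level constraint: at most 5 "hbase" containers per rack.
    PlacementConstraint appLevel = build(maxCardinality(RACK, 5, "hbase"));
    // Request-level constraint: at most 3 per rack. This is more restrictive
    // than the application level, so merging is allowed.
    PlacementConstraint requestLevel = build(maxCardinality(RACK, 3, "hbase"));
    // A request-level limit of 7 would be less restrictive than the
    // application level and, per this proposal, the allocation should fail.
    System.out.println(appLevel + " / " + requestLevel);
  }
}
{code}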
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351028#comment-16351028 ] Vinod Kumar Vavilapalli commented on YARN-7572: --- [~eyang], see my first comment. I'm proposing *both* a human-readable format as well as json (through a --json option) > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7572) Make the service status output more readable
[ https://issues.apache.org/jira/browse/YARN-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351035#comment-16351035 ] Chandni Singh commented on YARN-7572: - [~vinodkv] I don't think we currently have any command that spits out status of a specific component. The only command supported for a component is to flex. A way for users to do it will be to spit out the json and run some sort of json filter tool (like jq) Are you proposing that we support spitting out component status as well? > Make the service status output more readable > - > > Key: YARN-7572 > URL: https://issues.apache.org/jira/browse/YARN-7572 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > > Currently the service status output is just a JSON spec, we can make it more > human readable -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
Xuan Gong created YARN-7883: --- Summary: Make HAR tool support IndexedLogAggregtionController Key: YARN-7883 URL: https://issues.apache.org/jira/browse/YARN-7883 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a tool to combine aggregated logs into HAR files which currently only work for TFileLogAggregationFileController. We should make it support IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
[ https://issues.apache.org/jira/browse/YARN-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-7883. - Resolution: Duplicate > Make HAR tool support IndexedLogAggregtionController > > > Key: YARN-7883 > URL: https://issues.apache.org/jira/browse/YARN-7883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > > In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a > tool to combine aggregated logs into HAR files which currently only work for > TFileLogAggregationFileController. We should make it support > IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
[ https://issues.apache.org/jira/browse/YARN-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351050#comment-16351050 ] Xuan Gong commented on YARN-7883: - Created a MapReduce ticket to track the work progress. Close this one as duplicate > Make HAR tool support IndexedLogAggregtionController > > > Key: YARN-7883 > URL: https://issues.apache.org/jira/browse/YARN-7883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > > In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a > tool to combine aggregated logs into HAR files which currently only work for > TFileLogAggregationFileController. We should make it support > IndexedLogAggregtionController as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351059#comment-16351059 ] Jason Lowe commented on YARN-7879: -- bq. We are allowing file cache to be mounted in docker container as read only in YARN-7815. If we are mounting a file cache directory into a container then I assume the user running in the Docker container should have the right to read every file under that file cache directory. I do not see the security concern there if that's the case, but maybe I'm missing a key scenario that would be problematic? bq. The risk of exposing filename is marginally small, but I like to confirm that is not a problem even the filename contains sensitive information exposed in docker containers. The only way I can see it being an issue specific to Docker is if somehow something in the Docker container is not trusted that runs as a different user within the Docker container (but still in the hadoop group or equivalent for the Docker container) pokes around for the filename. That thing would have to probe for filenames since there's no read access on the filecache top-level directory, only group-execute permissions. However I would argue that if the user is running untrusted things within the Docker container it's simply much easier to access the sensitive files _as the user_. Then there would be access to the file's contents in addition to the filename. bq. Can cache directory contain subdirectories to prevent this arrangement from working? Yes, if the cache directory manager is being used there can be subdirectories to limit the total number of entries in a single directory. In those cases the intermediate directories are setup with similar 0755 permissions so the NM user can access them easily, see ContainerLocalizer#createParentDirs. This patch is restoring the usercache permissions behavior from before YARN-2185 went in. YARN-2185 wasn't about addressing directory permissions, but it had a sidecar permission change that broke the ability for the NM to reuse non-public localized resources. Therefore I'd like to see this go in so we aren't regressing functionality, and if there are concerns/improvements for how usercache permissions are handled we should address those in a separate JIRA. Either that or we revert YARN-2185, remove the unrelated permissions change, recommit it, and still end up addressing any usercache permissions concerns in a separate JIRA. ;-) > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. 
> {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being re
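A small sketch of the directory layout described above — not the actual ContainerLocalizer code; the paths are made up, and only the 0755 on the intermediate directories comes from the discussion.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: intermediate filecache directories are created traversable
// (0755) so the NM user can reach the leaf entries created by cache-dir
// management; the paths below are invented for the example.
public class FilecacheDirsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    Path filecache = new Path(
        "/tmp/nm-local-dir/usercache/someuser/appcache/app_0001/filecache");
    Path intermediate = new Path(filecache, "00");

    // The NM user only needs execute/traverse access on the intermediate
    // levels; the leaf resource directories stay private to the app user.
    localFs.mkdirs(intermediate, new FsPermission((short) 0755));
  }
}
{code}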
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351061#comment-16351061 ] Miklos Szegedi commented on YARN-7879: -- Thank you, [~shaneku...@gmail.com] for the report, [~jlowe] for the patch. I checked and the change seems good to me. Since this is a regression, [~eyang], would you mind if I commit it and we continue the discussion here or on another patch? > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7884) Race condition in registering YARN service in ZooKeeper
Eric Yang created YARN-7884: --- Summary: Race condition in registering YARN service in ZooKeeper Key: YARN-7884 URL: https://issues.apache.org/jira/browse/YARN-7884 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Eric Yang In Kerberos enabled cluster, there seems to be a race condition for registering YARN service. Yarn-service znode creation seems to happen after AM started and reporting back to update components information. For some reason, Yarnservice znode should have access to create the znode, but reported NoAuth. {code} 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls: [1,s{'world,'anyone} , 31,s{'sasl,'yarn} , 31,s{'sasl,'jhs} , 31,s{'sasl,'hdfs-demo} , 31,s{'sasl,'rm} , 31,s{'sasl,'hive} ] 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs [31,s{'sasl,'hbase} , 31,s{'sasl,'hbase} ] 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS) 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_01, abc into registry 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt. 
2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances. 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event. 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [ 0x01: 'world,'anyone 0x1f: 'sasl,'yarn 0x1f: 'sasl,'jhs 0x1f: 'sasl,'hdfs-demo 0x1f: 'sasl,'rm 0x1f: 'sasl,'hive 0x1f: 'sasl,'hbase 0x1f: 'sasl,'hbase ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412) at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637) at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679) at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOper
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351065#comment-16351065 ] Shane Kumpf commented on YARN-7879: --- Thanks for the patch [~jlowe] - I've tested the patch and it fixes the problem I reported. +1 (non-binding) from me. > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7879) NM user is unable to access the application filecache due to permissions
[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351083#comment-16351083 ] Eric Yang commented on YARN-7879: - {quote} The only way I can see it being an issue specific to Docker is if somehow something in the Docker container is not trusted that runs as a different user within the Docker container {quote} [~jlowe] Thanks for the reassurance. I think YARN-7516 and YARN-7221 combination will eliminate the risk to make sure only authorized sudoers can impersonate in docker containers to remove this loophole. [~miklos.szeg...@cloudera.com] Yes, I think this change is fine, and there are possible solutions to eliminate the concerns. Thanks > NM user is unable to access the application filecache due to permissions > > > Key: YARN-7879 > URL: https://issues.apache.org/jira/browse/YARN-7879 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Shane Kumpf >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-7879.001.patch > > > I noticed the following log entries where localization was being retried on > several MR AM files. > {code} > 2018-02-02 02:53:02,905 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar > is missing, localizing it again > 2018-02-02 02:53:42,908 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: > Resource > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml > is missing, localizing it again > {code} > The cluster is configured to use LCE and > {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is > set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has > a umask of {{0002}}. The cluser is configured with > {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the > local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, > produces the same results. > {code} > [hadoopuser@y7001 ~]$ umask > 0002 > [hadoopuser@y7001 ~]$ id > uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop) > {code} > The cause of the log entry was tracked down a simple !file.exists call in > {{LocalResourcesTrackerImpl#isResourcePresent}}. > {code} > public boolean isResourcePresent(LocalizedResource rsrc) { > boolean ret = true; > if (rsrc.getState() == ResourceState.LOCALIZED) { > File file = new File(rsrc.getLocalPath().toUri().getRawPath(). > toString()); > if (!file.exists()) { > ret = false; > } else if (dirsHandler != null) { > ret = checkLocalResource(rsrc); > } > } > return ret; > } > {code} > The Resources Tracker runs as the NM user, in this case {{yarn}}. The files > being retried are in the filecache. The directories in the filecache are all > owned by the local-user's primary group and 700 perms, which makes it > unreadable by the {{yarn}} user. > {code} > [root@y7001 ~]# ls -la > /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache > total 0 > drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 . > drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 .. > drwx--. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10 > drwx--. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11 > drwx--. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12 > drwx--. 
2 hadoopuser hadoopuser 41 Feb 2 03:06 13 > {code} > I saw YARN-5287, but that appears to be related to a restrictive umask and > the usercache itself. I was unable to locate any other known issues that > seemed relevent. Is the above already known? a configuration issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org