[jira] [Comment Edited] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817555#comment-15817555 ] Ying Zhang edited comment on YARN-6031 at 1/11/17 7:49 AM: --- Thanks [~sunilg]. Done and uploaded a new patch. was (Author: ying zhang): Thanks [~sunilg]. Modified the code and uploaded a new patch. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817555#comment-15817555 ] Ying Zhang commented on YARN-6031: -- Thanks [~sunilg]. Modified the code and uploaded a new patch. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-6031: - Attachment: YARN-6031.005.patch > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6082) Webservice connection gets cutoff when it has to send back a large response
[ https://issues.apache.org/jira/browse/YARN-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6082: -- Summary: Webservice connection gets cutoff when it has to send back a large response (was: Webservice connection gets cutoff when it has to send back a large response (webservice)) > Webservice connection gets cutoff when it has to send back a large response > --- > > Key: YARN-6082 > URL: https://issues.apache.org/jira/browse/YARN-6082 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Sunil G >Priority: Critical > > {noformat} > 2017-01-11 07:17:11,475 WARN ipc.Server (Server.java:run(2202)) - Large > response size 4476919 for call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from > 172.27.0.101:39950 Call#951474 Retry#0 > {noformat} > In such cases, json output will get cutoff and client will not get clean > response. > For eg: > {noformat} > Unexpected token I in JSON at position 851 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6082) Webservice connection gets cutoff when it has to send back a large response
[ https://issues.apache.org/jira/browse/YARN-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6082: -- Affects Version/s: 2.7.3 > Webservice connection gets cutoff when it has to send back a large response > --- > > Key: YARN-6082 > URL: https://issues.apache.org/jira/browse/YARN-6082 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Sunil G >Priority: Critical > > {noformat} > 2017-01-11 07:17:11,475 WARN ipc.Server (Server.java:run(2202)) - Large > response size 4476919 for call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from > 172.27.0.101:39950 Call#951474 Retry#0 > {noformat} > In such cases, json output will get cutoff and client will not get clean > response. > For eg: > {noformat} > Unexpected token I in JSON at position 851 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6082) Webservice connection gets cutoff when it has to send back a large response (webservice)
Sunil G created YARN-6082: - Summary: Webservice connection gets cutoff when it has to send back a large response (webservice) Key: YARN-6082 URL: https://issues.apache.org/jira/browse/YARN-6082 Project: Hadoop YARN Issue Type: Bug Reporter: Sunil G Priority: Critical {noformat} 2017-01-11 07:17:11,475 WARN ipc.Server (Server.java:run(2202)) - Large response size 4476919 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 172.27.0.101:39950 Call#951474 Retry#0 {noformat} In such cases, the JSON output gets cut off and the client does not receive a clean response. For example: {noformat} Unexpected token I in JSON at position 851 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
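The truncated-JSON symptom appears when a single call tries to return every application at once. Purely as an illustration (the RM address and the limit value below are hypothetical, and this is not the fix being tracked here), a client can keep individual responses small by filtering and paging the query; {{states}} and {{limit}} are documented query parameters of the RM {{/ws/v1/cluster/apps}} REST endpoint.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PagedAppsQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical RM address; "states" and "limit" keep the payload small
    // instead of fetching every application in one response.
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING&limit=100");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // print the (smaller) JSON payload
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}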
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817454#comment-15817454 ] Varun Saxena commented on YARN-6027: bq. That said, what do we think might break or be a problem if we do end up having >1000 runs in one flow in a day? Over time, we would anyways have thousands of runs for that flow. Nothing breaks from our end. From UI side there was a concern about number of flow runs. Anyways that discussion can be taken up on YARN-4489 If we do aggregate data at flow level, wouldn't it be better to have flow ID before user ID in row key. When I was asking about row key, this was the intention. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817393#comment-15817393 ] Bibin A Chundatt commented on YARN-5849: [~templedf] Sorry for the delay. Will try to review the same by today. We are planning to push this change only to trunk rt?? > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. > linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
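The description above refers to the NodeManager properties by their short names. As a minimal sketch (values are illustrative and this is not taken from any of the attached patches), the pre-mounted-cgroups setup it describes corresponds to the following yarn-site.xml keys, shown here programmatically:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class PreMountedCgroupsConfig {
  public static void main(String[] args) {
    // Equivalent entries would normally live in yarn-site.xml; values here
    // are illustrative only.
    Configuration conf = new Configuration(false);
    // Use cgroup mount points set up by the administrator instead of letting
    // YARN mount them (the case described in this issue).
    conf.set("yarn.nodemanager.linux-container-executor.cgroups.mount", "false");
    // Root cgroup for all YARN containers; per the exception quoted above,
    // today this path must already exist and be writable when pre-mounted.
    conf.set("yarn.nodemanager.linux-container-executor.cgroups.hierarchy",
        "/hadoop-yarn");
    // Cap the share of physical CPU handed out to containers.
    conf.set("yarn.nodemanager.resource.percentage-physical-cpu-limit", "80");

    conf.iterator().forEachRemaining(
        e -> System.out.println(e.getKey() + " = " + e.getValue()));
  }
}
{code}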
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817381#comment-15817381 ] Vrushali C commented on YARN-6027: -- bq. Currently, for YARN Web UI, client is aggregating all the duplicated flows before rendering. Hmm, I see. Is that making things slow? Should we consider aggregating at the server side before returning data? That way, the amount of data returned is less. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5304) Ship single node HBase config option with single startup command
[ https://issues.apache.org/jira/browse/YARN-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817376#comment-15817376 ] Vrushali C edited comment on YARN-5304 at 1/11/17 6:30 AM: --- [~jrottinghuis], [~sjlee0] and I had an offline discussion about this. Summarizing the discussion here: - Objective is to make things easier for the simple user who does not know much about HBase to get going with ATSv2 . - Have a sample HBase config with default settings supplied with timeline service that enables settings for timeline service v2. Ensure that documentation has steps clarifying use of this HBase config for the HBase cluster setup. We discussed about having different HBase config files for the server side and for clients. The server side HBase config can be made simple enough if it’s a standalone deployment, say in case we are bringing up a HBase setup on the RM node itself. We discussed about providing a yarn-daemon command that can spin up a ATSv2 HBase backend using the sample HBase config supplied with the code. But this is not feasible since this would mean ensuring the pids for HBase daemons are handled by the yarn command. Also, HBase jars are needed to be made available, which is outside of YARN/ATSv2 in any case. In case of more complex deployment scenarios, like we have, say if there exists an HBase cluster for ATSv2 data separate from other HBase clusters, then we need a way to have different HBase configs such that there is a way to connect from an application on a particular compute node on a hadoop cluster to two different HBase clusters. One connection is for writing timeline service data and another for the application to read/write from/to the other HBase cluster for it’s own purpose, which [~jrottinghuis] addressed in YARN-5265. was (Author: vrushalic): [~jrottinghuis], [~sjlee0] and I had an offline discussion about this. Summarizing the discussion here: - Objective is to make things easier for the simple user who does not know about HBase to get going with ATSv2 . - Have a sample HBase config with default settings supplied with timeline service that enables settings for timeline service v2. Ensure that documentation has steps clarifying use of this HBase config for the HBase cluster setup. We discussed about having different HBase config files for the server side and for clients. The server side HBase config can be made simple enough if it’s a standalone deployment, say in case we are bringing up a HBase setup on the RM node itself. We discussed about providing a yarn-daemon command that can spin up a ATSv2 HBase backend using the sample HBase config supplied with the code. But this is not feasible since this would mean ensuring the pids for HBase daemons are handled by the yarn command. Also, HBase jars are needed to be made available, which is outside of YARN/ATSv2 in any case. In case of more complex deployment scenarios, like we have, say if there exists an HBase cluster for ATSv2 data separate from other HBase clusters, then we need a way to have different HBase configs such that there is a way to connect from an application on a particular compute node on a hadoop cluster to two different HBase clusters. One connection is for writing timeline service data and another for the application to read/write from/to the other HBase cluster for it’s own purpose, which [~jrottinghuis] addressed in YARN-5265. 
> Ship single node HBase config option with single startup command > > > Key: YARN-5304 > URL: https://issues.apache.org/jira/browse/YARN-5304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Joep Rottinghuis >Assignee: Vrushali C > Labels: YARN-5355, yarn-5355-merge-blocker > > For small to medium Hadoop deployments we should make it dead-simple to use > the timeline service v2. We should have a single command to launch and stop > the timelineservice back-end for the default HBase implementation. > A default config with all the values should be packaged that launches all the > needed daemons (on the RM node) with a single command with all the > recommended settings. > Having a timeline admin command, perhaps an init command might be needed, or > perhaps the timeline service can even auto-detect that and create tables, > deploy needed coprocessors etc. > The overall purpose is to ensure nobody needs to be an HBase expert to get > this going. For those cluster operators with HBase experience, they can > choose their own more sophisticated deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To uns
[jira] [Commented] (YARN-5304) Ship single node HBase config option with single startup command
[ https://issues.apache.org/jira/browse/YARN-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817376#comment-15817376 ] Vrushali C commented on YARN-5304: -- [~jrottinghuis], [~sjlee0] and I had an offline discussion about this. Summarizing the discussion here: - Objective is to make things easier for the simple user who does not know about HBase to get going with ATSv2 . - Have a sample HBase config with default settings supplied with timeline service that enables settings for timeline service v2. Ensure that documentation has steps clarifying use of this HBase config for the HBase cluster setup. We discussed about having different HBase config files for the server side and for clients. The server side HBase config can be made simple enough if it’s a standalone deployment, say in case we are bringing up a HBase setup on the RM node itself. We discussed about providing a yarn-daemon command that can spin up a ATSv2 HBase backend using the sample HBase config supplied with the code. But this is not feasible since this would mean ensuring the pids for HBase daemons are handled by the yarn command. Also, HBase jars are needed to be made available, which is outside of YARN/ATSv2 in any case. In case of more complex deployment scenarios, like we have, say if there exists an HBase cluster for ATSv2 data separate from other HBase clusters, then we need a way to have different HBase configs such that there is a way to connect from an application on a particular compute node on a hadoop cluster to two different HBase clusters. One connection is for writing timeline service data and another for the application to read/write from/to the other HBase cluster for it’s own purpose, which [~jrottinghuis] addressed in YARN-5265. > Ship single node HBase config option with single startup command > > > Key: YARN-5304 > URL: https://issues.apache.org/jira/browse/YARN-5304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Joep Rottinghuis >Assignee: Vrushali C > Labels: YARN-5355, yarn-5355-merge-blocker > > For small to medium Hadoop deployments we should make it dead-simple to use > the timeline service v2. We should have a single command to launch and stop > the timelineservice back-end for the default HBase implementation. > A default config with all the values should be packaged that launches all the > needed daemons (on the RM node) with a single command with all the > recommended settings. > Having a timeline admin command, perhaps an init command might be needed, or > perhaps the timeline service can even auto-detect that and create tables, > deploy needed coprocessors etc. > The overall purpose is to ensure nobody needs to be an HBase expert to get > this going. For those cluster operators with HBase experience, they can > choose their own more sophisticated deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817359#comment-15817359 ] Rohith Sharma K S commented on YARN-6027: - bq. In that case, the data needs to be aggregated at the flow level. Currently, for YARN Web UI, client is aggregating all the duplicated flows before rendering. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4489) Limit flow runs returned while querying flows
[ https://issues.apache.org/jira/browse/YARN-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817354#comment-15817354 ] Rohith Sharma K S commented on YARN-4489: - I would expect a few specific pieces of functionality to be provided: # */flows* should have a filter to disable flowRuns in it; by default, I would prefer them disabled. # Given a day, which flowRuns ran for a flow? Note that the number of flowRuns per day is not bounded and can grow large, so pagination support is required. # I suggest providing a new REST API */users/rohithsharmaks/flows/$flowName*, with dateRange support so that a particular day's flowRuns can be retrieved with pagination. Point 1 can then be achieved by default. > Limit flow runs returned while querying flows > - > > Key: YARN-4489 > URL: https://issues.apache.org/jira/browse/YARN-4489 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: YARN-5355 > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
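To make the proposal above concrete, the request shapes below illustrate what points 2 and 3 could look like. They are hypothetical sketches modelled on the existing timeline reader v2 endpoints and their daterange/limit filters, not an API that exists today:

{noformat}
# Flows restricted to a single day (existing daterange-style filter, point 1/2):
GET /ws/v2/timeline/clusters/{clusterid}/flows?daterange=20170111-20170111

# Proposed: flow runs of one flow for a day, paginated (points 2 and 3):
GET /ws/v2/timeline/users/{userid}/flows/{flowname}/runs?daterange=20170111-20170111&limit=50
{noformat}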
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817335#comment-15817335 ] Vrushali C commented on YARN-6027: -- Thinking more about how many/which records to show in the UI. By default, displaying for current day is good. If current day result set is empty, we could also consider showing the latest day that we have. But, I think we should also have the ability in the UI to select a time-range. In that case, the data needs to be aggregated at the flow level. This can be done in the REST response at the server side. That way, we won't have multiple records being shown for the same flow per day. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817332#comment-15817332 ] Karthik Kambatla commented on YARN-6072: The patch looks good. +1, pending Jenkins. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .Embed
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817326#comment-15817326 ] Vrushali C commented on YARN-6027: -- bq. So, I was thinking, by default can't we give current day flows only? Yes, by default current day is good. If current day result set is empty, we could also consider showing the latest day that we have. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817323#comment-15817323 ] Rohith Sharma K S commented on YARN-6027: - I think we can move the discussion of limiting flowRuns in Flows into YARN-4489. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817317#comment-15817317 ] Sunil G commented on YARN-6031: --- Hi [~Ying Zhang] When InvalidResourceRequestException is thrown from {{validateAndCreateResourceRequest}}, its sure that {{amReq}} is not updated. Hence {{amReq}} will be null. I think you can write a debug log and come out. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
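For readers following the thread, here is a minimal, self-contained sketch of the behaviour being discussed: the validation throws before the AM request variable is assigned, so it is still null in the catch block, and during recovery a debug log is enough before moving on. All names here are stand-ins; this is not the RMAppManager code or the YARN-6031 patch.

{code:java}
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of the recovery-time handling discussed above; names are stand-ins. */
public class RecoverySketch {
  private static final Logger LOG =
      Logger.getLogger(RecoverySketch.class.getName());

  /** Stand-in for validation that rejects label expressions when node labels
   *  are disabled, mirroring InvalidLabelResourceRequestException. */
  static String validateAndCreateAmRequest(String labelExpr, boolean labelsEnabled) {
    if (!labelsEnabled && labelExpr != null) {
      throw new IllegalArgumentException("Invalid resource request, node label "
          + "not enabled but request contains label expression");
    }
    return "amRequest[" + labelExpr + "]";
  }

  static String recoverAmRequest(String labelExpr, boolean labelsEnabled,
      boolean isRecovery) {
    String amReq = null;
    try {
      amReq = validateAndCreateAmRequest(labelExpr, labelsEnabled);
    } catch (IllegalArgumentException e) {
      // The exception is thrown before amReq is assigned, so amReq is still
      // null here -- hence a debug log suffices and recovery continues
      // instead of failing RM start-up.
      if (isRecovery) {
        LOG.log(Level.FINE, "Ignoring invalid label expression on recovery", e);
      } else {
        throw e;
      }
    }
    return amReq;
  }

  public static void main(String[] args) {
    // App submitted with a label expression, then recovered after node labels
    // were disabled: recovery proceeds and the AM request stays null.
    System.out.println(recoverAmRequest("labelX", false, true));
  }
}
{code}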
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817303#comment-15817303 ] Rohith Sharma K S commented on YARN-6027: - bq. could not finally decipher in the comments above why there are separate rows for flow runs of the same flow in the flow activity table? cc Rohith Sharma K S? This should not happen. There should be exactly one row for a flow on a given day. Right, given a day, only one entry for flow and multiple flowRuns can be there. If same flow runs daily lets say 7 days, then */flows* gives 7 flows which are duplicated. So, I was thinking, by default can't we give current day flows only? > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4489) Limit flow runs returned while querying flows
[ https://issues.apache.org/jira/browse/YARN-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817291#comment-15817291 ] Vrushali C commented on YARN-4489: -- In general, I think it is a good idea to limit the number of records being returned, regardless of what is being returned, be it flows, entities, applications, anything really. Unless the user explicitly asks for more or disables the limit by setting it to, say, -1, we should consider having a limit on the payload being returned. > Limit flow runs returned while querying flows > - > > Key: YARN-4489 > URL: https://issues.apache.org/jira/browse/YARN-4489 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: YARN-5355 > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817288#comment-15817288 ] Vrushali C edited comment on YARN-6027 at 1/11/17 5:55 AM: --- bq. How many do we expect typically ? Can it run into thousands ? So, let's see. Say, something is running every 5 mins on the cluster and let's say it completes in less than 5 mins, meaning we have at least one hadoop job every 5 mins, then that is 60 * 24 / 5 = 288. So much less than a thousand. I think it's reasonable to think it won't be a thousand runs in a day, unless someone is being malicious (triggering a flow run every min is 1440 runs). That said, what do we think might break or be a problem if we do end up having >1000 runs in one flow in a day? Over time, we would anyways have thousands of runs for that flow. bq. I had raised a JIRA to limit flow runs within a flow. We should probably have that support then. Hmm. Is that YARN-4489? In general, it is good idea to limit the number of records being returned, regardless of what is being returned, be it flows, entities, applications, anything really. Unless the user explicitly asks for more or disables the limit by setting it to say -1, we should consider having a limit on payload being returned. was (Author: vrushalic): bq. How many do we expect typically ? Can it run into thousands ? So, let's see. Say, something is running every 5 mins on the cluster and let's say it completes in less than 5 mins, meaning we have at least one hadoop job every 5 mins, then that is 60 * 24 / 5 = 288. So much less than a thousand. I think it's reasonable to think it won't be a thousand runs in a day, unless someone is being malicious (triggering a hadoop job every min is 1440 runs). That said, what do we think might break or be a problem if we do end up having >1000 runs in one flow in a day? Over time, we would have thousands of runs for that flow. bq. I had raised a JIRA to limit flow runs within a flow. We should probably have that support then. Hmm. Is that YARN-4489? In general, it is good idea to limit the number of records being returned, regardless of what is being returned, be it flows, entities, applications, anything really. Unless the user explicitly asks for more or disables the limit by setting it to say -1, we should consider having a limit on payload being returned. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817288#comment-15817288 ] Vrushali C commented on YARN-6027: -- bq. How many do we expect typically ? Can it run into thousands ? So, let's see. Say, something is running every 5 mins on the cluster and let's say it completes in less than 5 mins, meaning we have at least one hadoop job every 5 mins, then that is 60 * 24 / 5 = 288. So much less than a thousand. I think it's reasonable to think it won't be a thousand runs in a day, unless someone is being malicious (triggering a hadoop job every min is 1440 runs). That said, what do we think might break or be a problem if we do end up having >1000 runs in one flow in a day? Over time, we would have thousands of runs for that flow. bq. I had raised a JIRA to limit flow runs within a flow. We should probably have that support then. Hmm. Is that YARN-4489? In general, it is good idea to limit the number of records being returned, regardless of what is being returned, be it flows, entities, applications, anything really. Unless the user explicitly asks for more or disables the limit by setting it to say -1, we should consider having a limit on payload being returned. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817277#comment-15817277 ] Junping Du commented on YARN-6072: -- Latest patch LGTM too. Thanks [~ajithshetty] for quickly addressing our comments and Naga for review. +1 pending on Jenkins (exclude known UT failures). > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in fol
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817265#comment-15817265 ] Ying Zhang commented on YARN-6031: -- For findbugs error, it might be good to keep the null check in case later code change breaks the assumption. Test failure is known and tracked by YARN-5548. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817238#comment-15817238 ] Naganarasimha G R commented on YARN-6072: - Thanks for the patch [~ajithshetty], Approach seems good enough as we need to ensure only {{verifyAndSetConfiguration}} before login as per YARN-2805 moving initialization of elector after Admin service would avoid additional null check. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionTo
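The comment above suggests adding the elector after the Admin service. The sketch below is a hypothetical, simplified illustration of that ordering idea using Hadoop's CompositeService, whose child services are initialized and started in the order they are added; AdminLikeService and ElectorLikeService are made-up stand-ins, not the actual ResourceManager classes or the YARN-6072 patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

public class ServiceOrderingSketch extends CompositeService {

  // Hypothetical stand-ins for AdminService and the elector-based service.
  static class AdminLikeService extends AbstractService {
    AdminLikeService() { super("AdminLikeService"); }
  }

  static class ElectorLikeService extends AbstractService {
    ElectorLikeService() { super("ElectorLikeService"); }
  }

  public ServiceOrderingSketch() {
    super("ServiceOrderingSketch");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Added first, so it is fully initialized and started before the elector
    // can win an election and trigger a refresh against it.
    addService(new AdminLikeService());
    addService(new ElectorLikeService());
    super.serviceInit(conf);
  }

  public static void main(String[] args) {
    ServiceOrderingSketch rm = new ServiceOrderingSketch();
    rm.init(new Configuration());
    rm.start();   // children start in the order they were added
    rm.stop();
  }
}
{code}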
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817222#comment-15817222 ] Ajith S commented on YARN-6072: --- Thanks for the comments [~djp] [~jianhe] [~naganarasimha...@apache.org] and [~bibinchundatt] I have considered all the comments and reworked on the patch. Please review > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are ad
[jira] [Updated] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-6072: -- Attachment: YARN-6072.03.branch-2.8.patch YARN-6072.03.patch > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817180#comment-15817180 ] Hadoop QA commented on YARN-6031: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 55s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 30s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Redundant nullcheck of amReq which is known to be null in org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(ApplicationSubmissionContext, long, String, boolean, long) Redundant null check at RMAppManager.java:is known to be null in org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(ApplicationSubmissionContext, long, String, boolean, long) Redundant null check at RMAppManager.java:[line 404] | | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6031 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846735/YARN-6031.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4060267ed344 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4db119b | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/14635/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's
[ https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817181#comment-15817181 ] Rohith Sharma K S commented on YARN-6064: - bq. What do we mean by next set of earlier entities ? The Javadoc definition is *Defines the flow run id. If specified, retrieve the next set of earlier entities from specified id. The set also includes specified fromId.* I do not see any issue with the sentence. Given fromId=10, it retrieves the next set of earlier entities, i.e. entities older than fromId, with fromId itself included: 10, 9, 8, and so on. > Support fromId for flowRuns and flow/flowRun apps REST API's > > > Key: YARN-6064 > URL: https://issues.apache.org/jira/browse/YARN-6064 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6064-YARN-5355.0001.patch, > YARN-6064-YARN-5355.0002.patch, YARN-6064-YARN-5355.0003.patch > > > Splitting out JIRA YARN-6027 for pagination support for flowRuns, flow apps > and flow run apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
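A minimal, self-contained illustration of the paging semantics described in the comment above. The ids and page size are made up and this is not TimelineReader code; it only shows that fromId marks the start of the next page and is itself included in the result.
{code}
import java.util.Arrays;
import java.util.List;

public class FromIdPagingSketch {
  public static void main(String[] args) {
    // Flow run ids as the reader would return them: newest first.
    List<Integer> runIds = Arrays.asList(12, 11, 10, 9, 8, 7);
    int fromId = 10;   // where the next page starts (inclusive)
    int limit = 3;     // page size requested by the caller

    // A real reader would also handle a fromId that is not present.
    int start = runIds.indexOf(fromId);
    List<Integer> page =
        runIds.subList(start, Math.min(start + limit, runIds.size()));
    System.out.println(page);   // prints [10, 9, 8]
  }
}
{code}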
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817153#comment-15817153 ] Hadoop QA commented on YARN-6081: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 888 unchanged - 3 fixed = 901 total (was 891) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6081 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846734/YARN-6081.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 25d2db55cd11 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4db119b | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14634/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14634/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14634/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved contai
[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated
[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817134#comment-15817134 ] Bibin A Chundatt commented on YARN-3884: Will update patch shortly > RMContainerImpl transition from RESERVED to KILL apphistory status not updated > -- > > Key: YARN-3884 > URL: https://issues.apache.org/jira/browse/YARN-3884 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Environment: Suse11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-easy > Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, > Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, > YARN-3884.0003.patch, YARN-3884.0004.patch, YARN-3884.0005.patch > > > Setup > === > 1 NM 3072 16 cores each > Steps to reproduce > === > 1.Submit apps to Queue 1 with 512 mb 1 core > 2.Submit apps to Queue 2 with 512 mb and 5 core > lots of containers get reserved and unreserved in this case > {code} > 2015-07-02 20:45:31,169 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to > RESERVED > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > Reserved container application=application_1435849994778_0002 > resource= queue=QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, > usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, > numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 > used= cluster= > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, > usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, > numContainers=6 > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.96875 > absoluteUsedCapacity=0.96875 used= > cluster= > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to > ALLOCATED > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf > OPERATION=AM Allocated ContainerTARGET=SchedulerApp > RESULT=SUCCESS APPID=application_1435849994778_0001 > CONTAINERID=container_e24_1435849994778_0001_01_14 > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: > Assigned container container_e24_1435849994778_0001_01_14 of capacity > on host host-10-19-92-117:64318, which has 6 > containers, used and available > after allocation > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > assignedContainer application attempt=appattempt_1435849994778_0001_01 > container=Container: [ContainerId: > container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, > NodeHttpAddress: host-10-19-92-117:65321, Resource: , > Priority: 20, Token: null, ] queue=default: capacity=0.2, > absoluteCapacity=0.2, usedResources=, > usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, > numContainers=5 clusterResource= > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting 
assigned queue: root.default stats: default: capacity=0.2, > absoluteCapacity=0.2, usedResources=, > usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6 > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 > used= cluster= > 2015-07-02 20:45:32,143 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0001_01_14 Container Transitioned from > ALLOCATED to ACQUIRED > 2015-07-02 20:45:32,174 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Trying to fulfill reservation for application application_1435849994778_0002 > on node: host-10-19-92-143:64318 > 2015-07-02 20:45:32,174 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > Reserved container application=application_1435849994778_0002 > resource= queue=QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, >
[jira] [Commented] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing
[ https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817106#comment-15817106 ] Hadoop QA commented on YARN-6012: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6012 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846736/YARN-6012.002.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 9f86bfbfa946 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4db119b | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14636/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Remove node label (removeFromClusterNodeLabels) document is missing > --- > > Key: YARN-6012 > URL: https://issues.apache.org/jira/browse/YARN-6012 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha1 >Reporter: Weiwei Yang >Assignee: Ying Zhang > Labels: doc, nodelabel > Attachments: YARN-6012.001.patch, YARN-6012.002.patch > > > Add corresponding documentation for > {code} > yarn rmadmin -removeFromClusterNodeLabels "x,y" > {code} > in yarn node labels doc page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing
[ https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817081#comment-15817081 ] Ying Zhang edited comment on YARN-6012 at 1/11/17 3:58 AM: --- Re-attach the patch file to start Jenkins. was (Author: ying zhang): Re-attach the patch file to start Jekins. > Remove node label (removeFromClusterNodeLabels) document is missing > --- > > Key: YARN-6012 > URL: https://issues.apache.org/jira/browse/YARN-6012 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha1 >Reporter: Weiwei Yang >Assignee: Ying Zhang > Labels: doc, nodelabel > Attachments: YARN-6012.001.patch, YARN-6012.002.patch > > > Add corresponding documentation for > {code} > yarn rmadmin -removeFromClusterNodeLabels "x,y" > {code} > in yarn node labels doc page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing
[ https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817081#comment-15817081 ] Ying Zhang commented on YARN-6012: -- Re-attach the patch file to start Jekins. > Remove node label (removeFromClusterNodeLabels) document is missing > --- > > Key: YARN-6012 > URL: https://issues.apache.org/jira/browse/YARN-6012 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha1 >Reporter: Weiwei Yang >Assignee: Ying Zhang > Labels: doc, nodelabel > Attachments: YARN-6012.001.patch, YARN-6012.002.patch > > > Add corresponding documentation for > {code} > yarn rmadmin -removeFromClusterNodeLabels "x,y" > {code} > in yarn node labels doc page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing
[ https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-6012: - Attachment: YARN-6012.002.patch > Remove node label (removeFromClusterNodeLabels) document is missing > --- > > Key: YARN-6012 > URL: https://issues.apache.org/jira/browse/YARN-6012 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha1 >Reporter: Weiwei Yang >Assignee: Ying Zhang > Labels: doc, nodelabel > Attachments: YARN-6012.001.patch, YARN-6012.002.patch > > > Add corresponding documentation for > {code} > yarn rmadmin -removeFromClusterNodeLabels "x,y" > {code} > in yarn node labels doc page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing
[ https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-6012: - Attachment: (was: YARN-6012.002.patch) > Remove node label (removeFromClusterNodeLabels) document is missing > --- > > Key: YARN-6012 > URL: https://issues.apache.org/jira/browse/YARN-6012 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha1 >Reporter: Weiwei Yang >Assignee: Ying Zhang > Labels: doc, nodelabel > Attachments: YARN-6012.001.patch, YARN-6012.002.patch > > > Add corresponding documentation for > {code} > yarn rmadmin -removeFromClusterNodeLabels "x,y" > {code} > in yarn node labels doc page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817078#comment-15817078 ] Ying Zhang commented on YARN-6031: -- Thanks very much [~sunilg] for the quick review. Comments addressed in the new patch. > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label
[ https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-6031: - Attachment: YARN-6031.004.patch > Application recovery failed after disabling node label > -- > > Key: YARN-6031 > URL: https://issues.apache.org/jira/browse/YARN-6031 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-6031.001.patch, YARN-6031.002.patch, > YARN-6031.003.patch, YARN-6031.004.patch > > > Here is the repro steps: > Enable node label, restart RM, configure CS properly, and run some jobs; > Disable node label, restart RM, and the following exception thrown: > {noformat} > Caused by: > org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: > Invalid resource request, node label not enabled but request contains label > expression > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {noformat} > During RM restart, application recovery failed due to that application had > node label expression specified while node label has been disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817073#comment-15817073 ] Varun Saxena commented on YARN-6027: bq. This should not happen. There should be exactly one row for a flow on a given day. Yes. I think they were retrieving data based on last 24 hours instead of specific dates. That's why duplicate records came. bq. We do have a lot of runs of a flow on a given day, for instance hRaven is running constantly on our cluster. So we do expect several runs of a flow in a day. How many do we expect typically ? Can it run into thousands ? I had raised a JIRA to limit flow runs within a flow. We should probably have that support then. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromId for flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817047#comment-15817047 ] Vrushali C commented on YARN-6027: -- Catching up on this thread. bq. Sangjin Lee, you remember why the user was kept before flow name in row key? To achieve user level offline aggregation? Yes, a flow is identified by the user and cluster it ran on. For instance, if I run a sleep job and you run a sleep job, these are different flows. But if I run a sleep job 10 times in a day, they are all runs of the same flow. I could not work out from the comments above why there would be separate rows for runs of the same flow in the flow activity table (cc [~rohithsharma])? This should not happen. There should be exactly one row for a flow on a given day. Yes, pagination is absolutely to be supported. Also, the UI should limit by date as well as by number of records and return based on whichever limit is hit first. We do have a lot of runs of a flow on a given day; for instance, hRaven is running constantly on our cluster, so we do expect several runs of a flow in a day. > Support fromId for flows API > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
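A minimal sketch of the grouping described in the comment above: because the flow activity row key is built only from cluster, day, user and flow name (no run id), every run of the same flow on a given day maps to the same row. The key layout and separator here are simplified assumptions, not the actual HBase encoding used by the timeline service.
{code}
import java.time.LocalDate;

public class FlowActivityKeySketch {
  // Simplified row key: cluster!day!user!flowName (no run id component).
  static String rowKey(String cluster, LocalDate day, String user, String flow) {
    return String.join("!", cluster, day.toString(), user, flow);
  }

  public static void main(String[] args) {
    LocalDate day = LocalDate.of(2017, 1, 11);
    // Ten runs of the same sleep job by the same user on the same day all
    // produce this one key, hence one flow-activity row for that day.
    System.out.println(rowKey("yarn-cluster", day, "vrushali", "Sleep job"));
  }
}
{code}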
[jira] [Commented] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817048#comment-15817048 ] Hudson commented on YARN-6079: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11103 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11103/]) YARN-6079. Fix simple spelling errors in yarn test code. Contributed by (junping_du: rev 4db119b7b55213929e5b86f2abb0ed84a21719b5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing 
requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA
[jira] [Commented] (YARN-6058) Support for listing all applications i.e /apps
[ https://issues.apache.org/jira/browse/YARN-6058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817037#comment-15817037 ] Vrushali C commented on YARN-6058: -- In the context of retrieving information from the flow activity table, here is a suggestion that will help frameworks like Tez: - In the flow activity table (as well as the flow run table), we should also store the framework type of the flow; for instance, for a job that runs as part of a Tez workflow, we should add a column like "type!tez" whose value can be empty. - This will enable queries against the flow activity table such as "which are the most recently run flows with framework type Tez". One flow can have multiple framework types; for instance, an Oozie job can run Pig and then Hadoop MapReduce, in which case the flow will have three columns: "type!oozie", "type!pig", "type!mr". > Support for listing all applications i.e /apps > -- > > Key: YARN-6058 > URL: https://issues.apache.org/jira/browse/YARN-6058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > > Primary use case for /apps is that many execution engines run on top of YARN, > for example Tez and MR. These engines have their own UIs which list the specific > types of entities they publish, e.g. DAG entities. > But these UIs are not aware of the userName, flowName, or applicationId > submitted by these engines. > Currently, given that the user is not aware of the userName, flowName, and > applicationId, they cannot retrieve any entities. > By supporting /apps with filters, a user can list applications with a given > ApplicationType. These applications can then be used for retrieving engine-specific > entities like DAGs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
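A minimal sketch, assuming the standard HBase client API, of the "one column per framework type" idea suggested above (column names like "type!tez" with empty values). The row key layout and the "i" column family are hypothetical, not the actual timeline-service schema.
{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FrameworkTypeColumnSketch {
  public static void main(String[] args) {
    byte[] rowKey = Bytes.toBytes("yarn-cluster!2017-01-11!user1!etl-flow");
    byte[] infoFamily = Bytes.toBytes("i");   // hypothetical column family

    Put put = new Put(rowKey);
    // An Oozie-driven flow that fans out into Pig and MapReduce jobs gets one
    // empty-valued column per framework type, so readers can filter on type.
    for (String fwType : new String[] {"oozie", "pig", "mr"}) {
      put.addColumn(infoFamily, Bytes.toBytes("type!" + fwType), new byte[0]);
    }
    System.out.println(put);
  }
}
{code}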
[jira] [Commented] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817019#comment-15817019 ] Hadoop QA commented on YARN-5864: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 16 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 157 new + 1651 unchanged - 20 fixed = 1808 total (was 1671) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 35s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 28s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 23m 36s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | | | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | | | hadoop.yarn.server.resourcemanager.Tes
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816996#comment-15816996 ] Wangda Tan commented on YARN-6081: -- [~sunilg], [~eepayne]. Could you please review this fix? It will be better to be committed before YARN-5864. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6081: - Priority: Major (was: Critical) > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6081.001.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-6081: - Attachment: YARN-6081.001.patch Attached ver.1 patch. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-6081.001.patch > > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
[ https://issues.apache.org/jira/browse/YARN-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816990#comment-15816990 ] Wangda Tan commented on YARN-6081: -- This is the test case to reproduce the problem:
{code}
  @Test
  public void testPreemptionNotHappenForSingleReservedQueue() {
    Logger rootLogger = LogManager.getRootLogger();
    rootLogger.setLevel(Level.DEBUG);
    int[][] qData = new int[][]{
        //   /    A    B    C
        { 100,  40,  40,  20 },  // abs
        { 100, 100, 100, 100 },  // maxCap
        { 100,  70,   0,   0 },  // used
        {  10,  30,   0,   0 },  // pending
        {   0,  50,   0,   0 },  // reserved
        {   1,   1,   0,   0 },  // apps
        {  -1,   1,   1,   1 },  // req granularity
        {   3,   0,   0,   0 },  // subqueues
    };

    ProportionalCapacityPreemptionPolicy policy = buildPolicy(qData);
    policy.editSchedule();

    // ensure all pending rsrc from A get preempted from other queues
    verify(mDisp, times(0)).handle(argThat(new IsPreemptionRequestFor(appA)));
  }
{code}
Please note that there's only one active queue, but the preemption policy still preempts containers from it. > LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved > from pending to avoid unnecessary preemption of reserved container > > > Key: YARN-6081 > URL: https://issues.apache.org/jira/browse/YARN-6081 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > > While doing YARN-5864 tests, found an issue when a queue's reserved > > pending. PreemptionResourceCalculator will preempt reserved container even if > there's only one active queue in the cluster. > To fix the problem, we need to deduct reserved from pending when getting > total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
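A minimal, self-contained illustration of the deduction described above (plain Java longs rather than the YARN Resource type, and not the actual YARN-6081 patch): the pending demand reported to the preemption policy is floored at zero once the queue's reservation is subtracted, so queue A in the matrix above (pending=30, reserved=50) reports no outstanding demand and nothing needs to be preempted on its behalf.
{code}
// Illustrative sketch only, not the YARN-6081 patch. Pending demand is
// reported net of the queue's existing reservation, floored at zero.
final class PendingDemandSketch {
  static long pendingConsideringReserved(long pendingMB, long reservedMB) {
    return Math.max(0L, pendingMB - reservedMB);
  }

  public static void main(String[] args) {
    // Queue A from the test matrix above: pending=30, reserved=50.
    System.out.println(pendingConsideringReserved(30, 50)); // 0 -> no demand, no preemption
  }
}
{code}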
[jira] [Created] (YARN-6081) LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container
Wangda Tan created YARN-6081: Summary: LeafQueue#getTotalPendingResourcesConsideringUserLimit should deduct reserved from pending to avoid unnecessary preemption of reserved container Key: YARN-6081 URL: https://issues.apache.org/jira/browse/YARN-6081 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical While doing YARN-5864 tests, found an issue when a queue's reserved > pending. PreemptionResourceCalculator will preempt reserved container even if there's only one active queue in the cluster. To fix the problem, we need to deduct reserved from pending when getting total-pending resource for LeafQueue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816939#comment-15816939 ] Karthik Kambatla commented on YARN-6061: [~yufeigu] - thanks for working on this. I must have misunderstood you. I am in favor of creating a RM-wide UncaughtExceptionHandler, that creates and sends an RMFatalEvent so the RM can either shutdown or transition-to-standby based on whether HA is enabled. This allows the StandbyRM to become Active and run so long as that also doesn't run into the same uncaught exception. Thinking more about this, on receiving a fatal event, the RM should also consult {{yarn.resourcemanager.failfast}} to decide whether to shutdown or transition to standby. That is likely another JIRA though. > Add a customized uncaughtexceptionhandler for critical threads > -- > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6061.001.patch > > > There are several threads in fair scheduler. The thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
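A rough, self-contained sketch of the approach discussed above (illustrative names only; FatalEventSink stands in for the RM dispatcher, and this is not the actual YARN-6061 patch): one RM-wide handler is installed on critical threads so an uncaught runtime exception surfaces a single fatal notification instead of silently killing the thread.
{code}
// Sketch: one handler shared by critical RM threads; on an uncaught runtime
// exception it raises a single fatal notification instead of letting the
// thread die silently.
public final class RMCriticalThreadHandlerSketch {
  interface FatalEventSink { void onFatal(Thread t, Throwable e); }

  static Thread.UncaughtExceptionHandler handlerFor(FatalEventSink sink) {
    return (thread, error) -> sink.onFatal(thread, error);
  }

  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> { throw new RuntimeException("boom"); }, "critical-thread");
    // In the RM, this callback is where an RMFatalEvent would be dispatched so the
    // RM can shut down or transition to standby depending on HA configuration.
    worker.setUncaughtExceptionHandler(handlerFor((t, e) ->
        System.err.println("Fatal error in " + t.getName() + ": " + e)));
    worker.start();
    worker.join();
  }
}
{code}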
[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816923#comment-15816923 ] Karthik Kambatla commented on YARN-4212: Thanks for updating the patch here, Yufei. Comments: # SchedulingPolicy ## Can we rename the method to {{alllowedParentPolicies()}}. ## Also, can it be {{Set}} instead? # QueueManager: ## createNewQueues has a couple of spurious new lines. ## Does checkIfParentPolicyAllowed need to be recursive? It seems to be called from createQueue only now? If every queue has a valid parent, wouldn't that suffice? ## Also, shouldn't we check parent policy on updateAllocationConfiguration as well? If that is the case, are we better off doing this check on AllocationConfiguration before we update it? That way, the cluster could continue running while the admin deals with the logged errors and fix the alloc file? ## In updateAllocationConfiguration, the added code seems to set SchedulingPolicy for all queues and not just root and root.default. What am I missing? # In the test, we seem to verify only the policies on initial load. Should we add another test that verifies updating the alloc file? > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
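To make the review suggestion concrete, here is a hedged sketch (the method name allowedParentPolicies() is the reviewer's proposal, and the enum below is a stand-in for the real SchedulingPolicy classes, so none of this is the committed API): each child policy exposes the set of parent policies it can run under, and queue creation validates only the immediate parent.
{code}
// Illustrative sketch of the proposed validation, not the committed API.
import java.util.EnumSet;
import java.util.Set;

enum PolicySketch { FAIR, DRF, FIFO }

final class ParentPolicyRules {
  // A drf child needs a drf parent so its vcore shares are not zeroed out;
  // fair/fifo children can live under either policy (assumed rule for illustration).
  static Set<PolicySketch> allowedParentPolicies(PolicySketch child) {
    return child == PolicySketch.DRF
        ? EnumSet.of(PolicySketch.DRF)
        : EnumSet.of(PolicySketch.FAIR, PolicySketch.DRF);
  }

  static boolean parentPolicyAllowed(PolicySketch parent, PolicySketch child) {
    return allowedParentPolicies(child).contains(parent);
  }
}
{code}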
[jira] [Resolved] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-6079. -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 2.9.0 > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816900#comment-15816900 ] Junping Du commented on YARN-6079: -- I have commit the patch to trunk and branch-2. Thanks Vijay for the patch contribution and Grant for review! > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app
[ https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816887#comment-15816887 ] Karthik Kambatla commented on YARN-5831: Thanks for updating the patch, Yufei. The patch looks mostly good. A couple of minor comments: # QueueManager#createNewQueues: What do you think of calling queue.reinit(false) in FSQueue constructor instead of here. I feel that might be less error-prone. We won't need the kind of changes you had to make in TestFSLeafQueue. # QueueManager#initialize: Should the call to reinit be recursive and called after creating "root.default" queue. In practice, this might not matter. > Propagate allowPreemptionFrom flag all the way down to the app > -- > > Key: YARN-5831 > URL: https://issues.apache.org/jira/browse/YARN-5831 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Attachments: YARN-5831.001.patch, YARN-5831.002.patch, > YARN-5831.003.patch > > > FairScheduler allows disallowing preemption from a queue. When checking if > preemption for an application is allowed, the new preemption code recurses > all the way to the root queue to check this flag. > Propagating this information all the way to the app will be more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
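A small, self-contained sketch of the idea in the description (simplified stand-in types; the inheritance rule shown is an assumption for illustration, not a quote of the FairScheduler code): resolve the flag once when the queue hierarchy is (re)initialized and cache it on the app, instead of walking parent queues on every preemption check.
{code}
// Illustrative sketch only. The flag is resolved once at (re)init time and
// cached on the app, rather than recursing to the root queue per check.
final class PreemptionFlagSketch {
  static final class Queue {
    final Queue parent;
    final boolean allowPreemptionFrom;
    Queue(Queue parent, boolean allowPreemptionFrom) {
      this.parent = parent;
      this.allowPreemptionFrom = allowPreemptionFrom;
    }
    // Assumed rule: preemption from a queue is allowed only if the queue and
    // all of its ancestors allow it.
    boolean effectiveAllowPreemption() {
      return allowPreemptionFrom && (parent == null || parent.effectiveAllowPreemption());
    }
  }

  static final class App {
    final boolean preemptable;  // cached when the queue hierarchy is (re)initialized
    App(Queue queue) { this.preemptable = queue.effectiveAllowPreemption(); }
  }
}
{code}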
[jira] [Commented] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816878#comment-15816878 ] Junping Du commented on YARN-6079: -- +1. Committing it in. > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816872#comment-15816872 ] Sangjin Lee commented on YARN-5980: --- The latest patch LGTM in the sense that this update reflects additional information on a single node hbase setup as well as the hbase upgrade. Vrushali and I talked offline and it is likely more documentation changes will be needed as part of YARN-5304. I'll wait for a day or so to give others a chance to review it also. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch, YARN-5980.004.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4148: - Fix Version/s: 2.8.0 > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-4148-branch-2.8.003.patch, YARN-4148.001.patch, > YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing a app, RM scheduler releases app's resource as soon as possible, > then it might allocate these resource for new requests. But NM have not > released them at that time. > The problem was found when we supported GPU as a resource(YARN-4122). Test > environment: a NM had 6 GPUs, app A used all 6 GPUs, app B was requesting 3 > GPUs. Killed app A, then RM released A's 6 GPUs, and allocated 3 GPUs to B. > But when B tried to start container on NM, NM found it didn't have 3 GPUs to > allocate because it had not released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816837#comment-15816837 ] Junping Du commented on YARN-4148: -- The test failures should be unrelated to the patch. TestAMAuthorization and TestClientRMTokens get tracked in HADOOP-12687 and TestWorkPreservingRMRestart get tracked in YARN-5349. +1 on 2.8 patch. Committing it now. > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-4148-branch-2.8.003.patch, YARN-4148.001.patch, > YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing a app, RM scheduler releases app's resource as soon as possible, > then it might allocate these resource for new requests. But NM have not > released them at that time. > The problem was found when we supported GPU as a resource(YARN-4122). Test > environment: a NM had 6 GPUs, app A used all 6 GPUs, app B was requesting 3 > GPUs. Killed app A, then RM released A's 6 GPUs, and allocated 3 GPUs to B. > But when B tried to start container on NM, NM found it didn't have 3 GPUs to > allocate because it had not released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6016) Bugs in AMRMProxy handling (local)AMRMToken
[ https://issues.apache.org/jira/browse/YARN-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816807#comment-15816807 ] Hadoop QA commented on YARN-6016: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 32m 47s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6016 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846712/YARN-6016.v2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 7da761c4fec3 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e692316 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14632/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14632/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Bugs in AMRMProxy handling (local)AMRMToken > --- > > Key: YARN-6016 > URL: https://issues.apache.org/jira/browse/YARN-6016 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6016.v1.patch, YARN-6016.v2.patch > > > Two AMRMProxy bugs: > First, the AMRMToken from RM
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816753#comment-15816753 ] Hadoop QA commented on YARN-4148: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 21s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 364 unchanged - 2 fixed = 365 total (was 366) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 4s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}172m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_121 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5af2af1 | | JIRA Issue | YARN-4148 | | JIRA Patch URL |
[jira] [Updated] (YARN-5864) YARN Capacity Scheduler - Queue Priorities
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-5864: - Attachment: YARN-5864.003.patch Updated ver.3 patch, updates: - Only preempt for un-satisfied queues - Updated java docs of CapacitySchedulerConfiguration and optimized options. > YARN Capacity Scheduler - Queue Priorities > -- > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.001.patch, YARN-5864.002.patch, > YARN-5864.003.patch, YARN-5864.poc-0.patch, > YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf > > > Currently, Capacity Scheduler at every parent-queue level uses relative > used-capacities of the chil-queues to decide which queue can get next > available resource first. > For example, > - Q1 & Q2 are child queues under queueA > - Q1 has 20% of configured capacity, 5% of used-capacity and > - Q2 has 80% of configured capacity, 8% of used-capacity. > In the situation, the relative used-capacities are calculated as below > - Relative used-capacity of Q1 is 5/20 = 0.25 > - Relative used-capacity of Q2 is 8/80 = 0.10 > In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is > selected by the scheduler first to receive next available resource. > Simply ordering queues according to relative used-capacities sometimes causes > a few troubles because scarce resources could be assigned to less-important > apps first. > # Latency sensitivity: This can be a problem with latency sensitive > applications where waiting till the ‘other’ queue gets full is not going to > cut it. The delay in scheduling directly reflects in the response times of > these applications. > # Resource fragmentation for large-container apps: Today’s algorithm also > causes issues with applications that need very large containers. It is > possible that existing queues are all within their resource guarantees but > their current allocation distribution on each node may be such that an > application which needs large container simply cannot fit on those nodes. > Services: > # The above problem (2) gets worse with long running applications. With short > running apps, previous containers may eventually finish and make enough space > for the apps with large containers. But with long running services in the > cluster, the large containers’ application may never get resources on any > nodes even if its demands are not yet met. > # Long running services are sometimes more picky w.r.t placement than normal > batch apps. For example, for a long running service in a separate queue (say > queue=service), during peak hours it may want to launch instances on 50% of > the cluster nodes. On each node, it may want to launch a large container, say > 200G memory per container. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
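As a small, self-contained illustration of the current ordering rule described in that example (plain doubles, not the scheduler's actual comparator):
{code}
// Today's rule: child queues are ordered by used-capacity relative to their
// configured capacity, so the least relatively-used queue is offered the next
// available resource first.
final class RelativeUsedCapacitySketch {
  static double relativeUsed(double usedPct, double configuredPct) {
    return usedPct / configuredPct;
  }

  public static void main(String[] args) {
    System.out.println(relativeUsed(5, 20));  // Q1 -> 0.25
    System.out.println(relativeUsed(8, 80));  // Q2 -> 0.10, so Q2 is served first
  }
}
{code}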
[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816687#comment-15816687 ] Hadoop QA commented on YARN-5416: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 100 unchanged - 2 fixed = 103 total (was 102) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 41m 47s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5416 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846700/YARN-5416-v2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b9513ef2e4c4 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e692316 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14631/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14631/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14631/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped >
[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue
[ https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816669#comment-15816669 ] Daniel Templeton commented on YARN-5554: Looks like we'll need a branch-2 patch. Can you take care of that, [~wilfreds]? > MoveApplicationAcrossQueues does not check user permission on the target queue > -- > > Key: YARN-5554 > URL: https://issues.apache.org/jira/browse/YARN-5554 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg > Labels: oct16-medium > Attachments: YARN-5554.10.patch, YARN-5554.11.patch, > YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, > YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, > YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch > > > moveApplicationAcrossQueues operation currently does not check user > permission on the target queue. This incorrectly allows one user to move > his/her own applications to a queue that the user has no access to -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
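Conceptually, the rule this JIRA enforces is that moving an application requires submit access on the target queue, not just ownership of the application. A simplified, self-contained sketch (a plain map stands in for the scheduler's queue ACL machinery):
{code}
// Sketch only: the caller must hold submit access on the TARGET queue.
import java.util.Collections;
import java.util.Map;
import java.util.Set;

final class MoveAclSketch {
  static boolean canMoveTo(String user, String targetQueue,
      Map<String, Set<String>> submitAclsByQueue) {
    return submitAclsByQueue
        .getOrDefault(targetQueue, Collections.<String>emptySet())
        .contains(user);
  }
}
{code}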
[jira] [Updated] (YARN-6016) Bugs in AMRMProxy handling (local)AMRMToken
[ https://issues.apache.org/jira/browse/YARN-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6016: --- Attachment: YARN-6016.v2.patch > Bugs in AMRMProxy handling (local)AMRMToken > --- > > Key: YARN-6016 > URL: https://issues.apache.org/jira/browse/YARN-6016 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6016.v1.patch, YARN-6016.v2.patch > > > Two AMRMProxy bugs: > First, the AMRMToken from RM should not be propagated to AM, since AMRMProxy > will create a local AMRMToken for it. > Second, the AMRMProxy Context currently parses the localAMRMTokenKeyId from > amrmToken, but it should be parsed from localAmrmToken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816634#comment-15816634 ] Ray Chiang commented on YARN-4212: -- Minor nit. You have a method named QueueManager#checkIfParentPoliceAllowed(). I assume you didn't mean to use "Police". Is this supposed to be "Policies" or "Policy"? > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816516#comment-15816516 ] Hadoop QA commented on YARN-6079: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 729 unchanged - 1 fixed = 730 total (was 730) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 28s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 16s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 30s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 14s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6079 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846669/YARN-6079.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f2575bcd51c1 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e692316 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14627/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816512#comment-15816512 ] Devaraj K commented on YARN-5764: - Thanks a lot [~leftnoteasy] for review and comments. bq. What is the benefit to manually specify NUMA node? Since this is potentially complex for end user to specify, I think it's better to directly read data from OS. If the users want to share the NUMA resources in Node Manager machine for non-Yarn applications, then users can specify what all numa nodes and each node capabilities can be used by Yarn using this declaration. I understand there are configurations for specifying numa nodes, each node memory and cpu's. But if we don't have provision for separating the NUMA resources for Yarn, we could end up overlapping the resources used by Yarn and Non-Yarn applications. bq. Does the changes work on platform other than Linux? This patch works for Linux, if this approach is agreeable then I will update for windows as well. bq. I'm not quite sure about if this could happen: with this patch, YARN will launch process one by one on each NUMA node to bind memory/cpu. Is it possible that there's another process (outside of YARN) uses memory of NUMA node which causes processes launched by YARN failed to bind or run? I do think it could happen for memory, we can avoid this using the NUMA node topology declaration for specifying the NUMA resources for Yarn applications. And also it would not be an issue with the soft binding option which you mentioned in the below comment. bq. This patch uses hard binding (get allocated resource on specified node or fail), is it better to specify soft binding (prefer to allocate and can also accept other node). I think soft binding should be default behavior to support NUMA. I think it is a good suggestion, I can update the patch with this by changing '\--membind=nodes' to '\--preferred=node'. I will look forward for your further comments. > NUMA awareness support for launching containers > --- > > Key: YARN-5764 > URL: https://issues.apache.org/jira/browse/YARN-5764 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Reporter: Olasoji >Assignee: Devaraj K > Attachments: NUMA Awareness for YARN Containers.pdf, > YARN-5764-v0.patch, YARN-5764-v1.patch > > > The purpose of this feature is to improve Hadoop performance by minimizing > costly remote memory accesses on non SMP systems. Yarn containers, on launch, > will be pinned to a specific NUMA node and all subsequent memory allocations > will be served by the same node, reducing remote memory accesses. The current > default behavior is to spread memory across all NUMA nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
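To make the soft-binding point concrete, here is a hedged sketch (illustrative only; the node id and wiring are not taken from the patch) of prefixing a container launch command with numactl, using --preferred for soft memory binding instead of the hard --membind:
{code}
// Sketch: build a numactl prefix for a container command. --preferred lets the
// kernel fall back to other NUMA nodes when the preferred node's memory is full,
// while --cpunodebind still pins the CPUs to the assigned node.
import java.util.ArrayList;
import java.util.List;

final class NumaPrefixSketch {
  static List<String> withNumaPreference(int numaNode, List<String> containerCmd) {
    List<String> cmd = new ArrayList<>();
    cmd.add("numactl");
    cmd.add("--cpunodebind=" + numaNode);
    cmd.add("--preferred=" + numaNode);   // soft binding; --membind would be the hard variant
    cmd.addAll(containerCmd);
    return cmd;
  }
}
{code}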
[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816484#comment-15816484 ] Junping Du commented on YARN-5416: -- Sorry for missing above comments, [~ebadger] and [~jlowe]. Just update v2 patch. > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped > > > Key: YARN-5416 > URL: https://issues.apache.org/jira/browse/YARN-5416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test, yarn >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Attachments: YARN-5416-v2.patch, YARN-5416.patch > > > The test failure stack is: > Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 43.134 sec <<< FAILURE! > java.lang.AssertionError: AppAttempt state is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530) > This is due to the same issue that partially fixed in YARN-4968 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-5416: - Attachment: YARN-5416-v2.patch > TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently > due to not wait SchedulerApplicationAttempt to be stopped > > > Key: YARN-5416 > URL: https://issues.apache.org/jira/browse/YARN-5416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test, yarn >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Attachments: YARN-5416-v2.patch, YARN-5416.patch > > > The test failure stack is: > Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 385.338 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testRMRestartWaitForPreviousAMToFinish[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 43.134 sec <<< FAILURE! > java.lang.AssertionError: AppAttempt state is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:86) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:594) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:1008) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:530) > This is due to the same issue that partially fixed in YARN-4968 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816455#comment-15816455 ] Hadoop QA commented on YARN-2962: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 35s{color} | {color:red} hadoop-yarn-project_hadoop-yarn generated 15 new + 35 unchanged - 0 fixed = 50 total (was 35) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 45s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 251 unchanged - 0 fixed = 256 total (was 251) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 51s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}101m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | hadoop.yarn.server.resourcemanager.TestRMStoreCommands | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-2962 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846634/YARN-2962.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 314d0dfd68c2 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provid
[jira] [Commented] (YARN-6011) Add a new web service to list the files on a container in AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816442#comment-15816442 ] Hadoop QA commented on YARN-6011: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 33s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 30 unchanged - 1 fixed = 32 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 57s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6011 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846685/YARN-6011.4.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8c65d5f99881 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e692316 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14629/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14629/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server
[jira] [Commented] (YARN-5830) Avoid preempting AM containers
[ https://issues.apache.org/jira/browse/YARN-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816420#comment-15816420 ] Karthik Kambatla commented on YARN-5830: Comments on the latest patch: # The patch does not apply cleanly anymore. Can we rebase it please? # FSPreemptionThread#run: Based on the contract of identifyContainersToPreempt, it either returns the containers to preempt or null. Do we need to check the size? Can we avoid this? # moveNonAMContainerFirst: I see the method swaps AM containers with non-AM ones. However, it is rather long. Aren't we better off having SchedulerNode#getCopiedListOfRunningContainers return a list with AM containers at the end? Based on the current uses of this method, this should not be a problem. Alternatively, I am fine with implementing another method that returns the list with AM containers at the end. # PreemptableContainers: ## The class should be static. ## Rename the field to numAMContainers? ## The constructor has an empty space between the name and parentheses. ## I would rather add a method addContainer that maintains the list and increments numAMContainers if it is an AM container. ## Maybe add another field maxAMContainers. That way, addContainer could return false if it exceeds maxAMContainers, and you could return null without additional checks. # identifyContainersToPreemptOnNode ## The javadoc should provide more details on the logic. ## I would start the args list with FSSchedulerNode and rename the last arg to maxAMContainers. Also, update the javadoc to say the same instead of "smallest number". ## If we don't get rid of moveNonAMContainerFirst, it should come immediately after containersToCheck.removeAll, without any new lines, to group similar code together. ## When {{request <= potential}}, the containers in preemptableContainers are the best so far. This is not the final value the caller method returns. So, we should not call node.addContainersForPreemption here, but instead in the caller. # identifyContainersToPreempt: Can the code be simplified as follows:
{code}
...
Containers bestContainers = null;
int maxAMContainers = Integer.MAX_VALUE;
for ( ...
  identifyPreemptableContainers(blah, blah, maxAMContainers);
  if (preemptableContainers != null) {
    if (preemptableContainers.numAMContainers == 0) {
      return preemptableContainers.containers;
    } else {
      bestContainers = preemptableContainers;
      maxAMContainers = bestContainers.numAMContainers;
    }
  }
return bestContainers;
{code}
> Avoid preempting AM containers > -- > > Key: YARN-5830 > URL: https://issues.apache.org/jira/browse/YARN-5830 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Attachments: YARN-5830.001.patch, YARN-5830.002.patch, > YARN-5830.003.patch > > > While considering containers for preemption, avoid AM containers unless > absolutely necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
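To make review items 4 and 6 above concrete, here is a rough sketch of a static PreemptableContainers helper whose addContainer enforces an AM-container cap, together with the simplified selection loop. This is illustrative only, not taken from any attached patch; field and method names (including identifyContainersToPreemptOnNode's signature) are assumptions:
{code}
// Sketch only; imports and surrounding FSPreemptionThread context omitted.
static class PreemptableContainers {
  final List<RMContainer> containers = new ArrayList<>();
  final int maxAMContainers;
  int numAMContainers = 0;

  PreemptableContainers(int maxAMContainers) {
    this.maxAMContainers = maxAMContainers;
  }

  /** Returns false once adding this container would exceed the AM cap. */
  boolean addContainer(RMContainer container) {
    if (container.isAMContainer()) {
      if (numAMContainers + 1 > maxAMContainers) {
        return false;
      }
      numAMContainers++;
    }
    containers.add(container);
    return true;
  }
}

List<RMContainer> identifyContainersToPreempt(
    List<FSSchedulerNode> potentialNodes, ResourceRequest request) {
  PreemptableContainers best = null;
  int maxAMContainers = Integer.MAX_VALUE;
  for (FSSchedulerNode node : potentialNodes) {
    PreemptableContainers candidate =
        identifyContainersToPreemptOnNode(node, request, maxAMContainers);
    if (candidate != null) {
      if (candidate.numAMContainers == 0) {
        return candidate.containers;               // cannot do better than zero
      }
      best = candidate;                            // best so far
      maxAMContainers = candidate.numAMContainers; // tighten the cap
    }
  }
  return best == null ? null : best.containers;
}
{code}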
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816388#comment-15816388 ] Jian He commented on YARN-5995: --- How about start with below: - Time cost of write op - MutableRate (which contains the total number of ops and avg time) - total failed ops - MutableCounterLong > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
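A minimal sketch of what such a metrics source could look like with the Hadoop metrics2 library, following the two metrics suggested above (the class and method names are illustrative, not a committed design):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "RMStateStore operation metrics", context = "yarn")
public class RMStateStoreOpMetrics {

  // MutableRate tracks both the number of samples and the average time.
  @Metric("Time spent in state-store write operations")
  MutableRate writeOpTime;

  @Metric("Total number of failed state-store operations")
  MutableCounterLong failedOps;

  static RMStateStoreOpMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "RMStateStoreOpMetrics", "RM state store metrics",
        new RMStateStoreOpMetrics());
  }

  void recordWrite(long elapsedMillis) {
    writeOpTime.add(elapsedMillis);
  }

  void recordFailure() {
    failedOps.incr();
  }
}
{code}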
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816360#comment-15816360 ] Junping Du commented on YARN-4148: -- Thanks Jason. I reopen the ticket for kicking off the jenkins. > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-4148-branch-2.8.003.patch, YARN-4148.001.patch, > YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing a app, RM scheduler releases app's resource as soon as possible, > then it might allocate these resource for new requests. But NM have not > released them at that time. > The problem was found when we supported GPU as a resource(YARN-4122). Test > environment: a NM had 6 GPUs, app A used all 6 GPUs, app B was requesting 3 > GPUs. Killed app A, then RM released A's 6 GPUs, and allocated 3 GPUs to B. > But when B tried to start container on NM, NM found it didn't have 3 GPUs to > allocate because it had not released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened YARN-4148: -- > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-4148-branch-2.8.003.patch, YARN-4148.001.patch, > YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing a app, RM scheduler releases app's resource as soon as possible, > then it might allocate these resource for new requests. But NM have not > released them at that time. > The problem was found when we supported GPU as a resource(YARN-4122). Test > environment: a NM had 6 GPUs, app A used all 6 GPUs, app B was requesting 3 > GPUs. Killed app A, then RM released A's 6 GPUs, and allocated 3 GPUs to B. > But when B tried to start container on NM, NM found it didn't have 3 GPUs to > allocate because it had not released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6011) Add a new web service to list the files on a container in AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-6011: Attachment: YARN-6011.4.patch > Add a new web service to list the files on a container in AHSWebService > --- > > Key: YARN-6011 > URL: https://issues.apache.org/jira/browse/YARN-6011 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6011.1.patch, YARN-6011.2.patch, YARN-6011.3.patch, > YARN-6011.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6011) Add a new web service to list the files on a container in AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-6011: Attachment: YARN-6011.3.patch > Add a new web service to list the files on a container in AHSWebService > --- > > Key: YARN-6011 > URL: https://issues.apache.org/jira/browse/YARN-6011 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6011.1.patch, YARN-6011.2.patch, YARN-6011.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6011) Add a new web service to list the files on a container in AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816340#comment-15816340 ] Xuan Gong commented on YARN-6011: - Thanks for the review. [~djp] bq. 1. For generating URI embed in response of getContainerLogsInfo, I saw some very similar one in other places, like: getLogs(). Can we refactor the code a bit to reuse the same logic? Yes, beyond this we have a lot of duplicated code related to the web services. I created a separate jira for the refactoring work: https://issues.apache.org/jira/browse/YARN-6080 and also added a TODO for this. bq. 2. For getContainerLogsInfo(), if an app is not in running or finished state, here we will return bad request. However, I remember in our ATS implementation, RM after restart could send regressioned application state event to ATS, like app creation event to ATS which was running before. Can you double check ATS's app status won't have regression? Otherwise, we shouldn't just simply return a bad request. This is fine. Based on how we generate the ApplicationReport from ATS entities, the CREATE event only provides the created time for the application. The application state is derived from ApplicationMetricsConstants.STATE_EVENT_INFO, which is provided by ApplicationFinishedEvent and ApplicationStateUpdatedEvent. For ApplicationStateUpdatedEvent, we only submit the event when the app transitions from the ACCEPTED to the RUNNING state. So, when the application is in the RUNNING state in the RM, the state will be RUNNING in ATS, and even if an RM restart or AM restart happens later, the state will not change. So I think that, right now, it is fine to only check for the RUNNING state here. bq. 3. For getContainerLogMeta(), I remember I have some previous comments on refactor code (consolidate similar logic, especially log reader) in previous JIRAs. How's going with that effort? If that effort is not a short term priory for you, please add a TODO here - may be someone else read this part of code could help on that. Linked the jira: https://issues.apache.org/jira/browse/YARN-4993 and added a TODO for this. > Add a new web service to list the files on a container in AHSWebService > --- > > Key: YARN-6011 > URL: https://issues.apache.org/jira/browse/YARN-6011 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6011.1.patch, YARN-6011.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
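The state check being discussed, serving container log listings only for applications that are RUNNING or already finished, would look roughly like the guard below. This is a sketch only, not the attached patch; the surrounding variables (appReport, appId) are assumed:
{code}
// Illustrative guard: reject requests for apps that are neither RUNNING
// nor in a terminal state, per the discussion above.
YarnApplicationState state = appReport.getYarnApplicationState();
boolean running = state == YarnApplicationState.RUNNING;
boolean finished = state == YarnApplicationState.FINISHED
    || state == YarnApplicationState.FAILED
    || state == YarnApplicationState.KILLED;
if (!running && !finished) {
  throw new BadRequestException("Application " + appId
      + " is not running or finished; cannot list its container log files.");
}
{code}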
[jira] [Updated] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4148: - Attachment: YARN-4148-branch-2.8.003.patch Attaching the patch for branch-2.8. > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-4148-branch-2.8.003.patch, YARN-4148.001.patch, > YARN-4148.002.patch, YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing a app, RM scheduler releases app's resource as soon as possible, > then it might allocate these resource for new requests. But NM have not > released them at that time. > The problem was found when we supported GPU as a resource(YARN-4122). Test > environment: a NM had 6 GPUs, app A used all 6 GPUs, app B was requesting 3 > GPUs. Killed app A, then RM released A's 6 GPUs, and allocated 3 GPUs to B. > But when B tried to start container on NM, NM found it didn't have 3 GPUs to > allocate because it had not released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816321#comment-15816321 ] Grant Sohn commented on YARN-6079: -- +1 (non-binding). > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6080) Create WebServiceUtils to have common functions used in RMWebService, NMWebService and AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816316#comment-15816316 ] Xuan Gong commented on YARN-6080: - In LogsCLI, we have code that creates web service calls to the RM, NM and AHS. We could move that common code into WebServiceUtils and re-use the logic for creating REST calls to the different services. > Create WebServiceUtils to have common functions used in RMWebService, > NMWebService and AHSWebService > > > Key: YARN-6080 > URL: https://issues.apache.org/jira/browse/YARN-6080 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong > > Create WebServiceUtils to remove the duplicate code. Also, provide the > pattern to create webService call which could be used by client. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
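A rough idea of the kind of shared helper this could become, using the Jersey 1.x client that LogsCLI already relies on. The class and method names are assumptions, not the eventual YARN-6080 design:
{code}
import javax.ws.rs.core.MediaType;

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientHandlerException;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.UniformInterfaceException;
import com.sun.jersey.api.client.WebResource;

public final class WebServiceUtilsSketch {
  private WebServiceUtilsSketch() {
  }

  /**
   * Issues a GET against {baseUri}/ws/v1/{pathSegments...} and returns the
   * raw response. The same call shape is reusable for RM, NM and AHS
   * endpoints; only the base URI differs.
   */
  public static ClientResponse getJson(Client client, String baseUri,
      String... pathSegments) {
    WebResource resource = client.resource(baseUri).path("ws").path("v1");
    for (String segment : pathSegments) {
      resource = resource.path(segment);
    }
    try {
      return resource.accept(MediaType.APPLICATION_JSON_TYPE)
          .get(ClientResponse.class);
    } catch (ClientHandlerException | UniformInterfaceException e) {
      throw new RuntimeException("Failed to reach " + baseUri, e);
    }
  }
}
{code}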
[jira] [Created] (YARN-6080) Create WebServiceUtils to have common functions used in RMWebService, NMWebService and AHSWebService
Xuan Gong created YARN-6080: --- Summary: Create WebServiceUtils to have common functions used in RMWebService, NMWebService and AHSWebService Key: YARN-6080 URL: https://issues.apache.org/jira/browse/YARN-6080 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Create WebServiceUtils to remove the duplicate code. Also, provide the pattern to create webService call which could be used by client. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vijay updated YARN-6079: Attachment: YARN-6079.001.patch > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > Attachments: YARN-6079.001.patch > > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816191#comment-15816191 ] Junping Du commented on YARN-6072: -- bq. // Set HA configuration should be done before login This is added in YARN-2805. We should set HA related configuration before login. I think current fix should work fine for non-HA case. However, I think addIfService() mostly used for judging if services but not checking null. If we don't explicit check null or put any comments. I suspect later comers could replace it to addService() in refactor work (because it is obviously a service here). So, it should be better to add null check or some comments here. Also, we should mention in comments why we are re-order the sequence here as what YARN-2805 did. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 
8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyE
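The explicit null check and comments being asked for in the preceding review exchange could look roughly like this. Purely illustrative: the helper and variable names are made up, and this is not the attached patch:
{code}
// Set HA configuration before login, as required by YARN-2805.
// The elector may legitimately be null when automatic failover is disabled,
// so make that explicit instead of relying on addIfService() silently
// ignoring a non-Service argument.
Service elector = createElectorServiceIfEnabled(conf);  // hypothetical helper
if (elector != null) {
  addService(elector);
}
{code}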
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816152#comment-15816152 ] Hadoop QA commented on YARN-5980: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 14m 17s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5980 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846656/YARN-5980.004.patch | | Optional Tests | asflicense mvnsite | | uname | Linux a9f154deed2f 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e692316 | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14626/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch, YARN-5980.004.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816109#comment-15816109 ] Jian He commented on YARN-6072: --- bq. // Set HA configuration should be done before login I don't know why this comment is added. In my understanding, it should at least be fine to move "add admin service" before "add elector service". bq. Hmm yes but additionally we get the log trace too, Yes, I know. I meant it can be such as: new ServiceFailedException("RefreshAll operation failed ", ex); Anyway, based on your explanation, the current patch is also fine to me. these comments are minor. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > 
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.r
[jira] [Updated] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5980: - Attachment: YARN-5980.004.patch Uploading v004 that addresses [~sjlee0]'s suggestions. I will file a separate jira to update the coprocessor related steps (and code changes if any) for it to be a dynamic coprocessor, since that will be easier to track changes. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch, YARN-5980.004.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6042) Fairscheduler: Dump scheduler state in log
[ https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6042: --- Description: To improve the debugging of scheduler issues it would be a big improvement to be able to dump the scheduler state into a log on request. The Dump the scheduler state at a point in time would allow debugging of a scheduler that is not hung (deadlocked) but also not assigning containers. Currently we do not have a proper overview of what state the scheduler and the queues are in and we have to make assumptions or guess The scheduler and queue state needed would include (not exhaustive): - instantaneous and steady fair share (app / queue) - AM share and resources - weight - app demand - application run state (runnable/non runnable) - last time at fair/min share was: To improve the debugging of scheduler issues it would be a big improvement to be able to dump the scheduler state into a log on request. The Dump the scheduler state at a point in time would allow debugging of a scheduler that is not hung (deadlocked) but also not assigning containers. Currently we do not have a proper overview of what state the scheduler and the queues are in and we have to make assumptions or guess The scheduler and queue state needed would include (not exhaustive): instantaneous and steady fair share (app / queue) AM share and resources weight app demand application run state (runnable/non runnable) last time at fair/min share > Fairscheduler: Dump scheduler state in log > -- > > Key: YARN-6042 > URL: https://issues.apache.org/jira/browse/YARN-6042 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Yufei Gu >Assignee: Yufei Gu > > To improve the debugging of scheduler issues it would be a big improvement to > be able to dump the scheduler state into a log on request. > The Dump the scheduler state at a point in time would allow debugging of a > scheduler that is not hung (deadlocked) but also not assigning containers. > Currently we do not have a proper overview of what state the scheduler and > the queues are in and we have to make assumptions or guess > The scheduler and queue state needed would include (not exhaustive): > - instantaneous and steady fair share (app / queue) > - AM share and resources > - weight > - app demand > - application run state (runnable/non runnable) > - last time at fair/min share -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
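As a rough illustration of what such a dump could log per queue (accessor names approximate FSQueue's API and this is not a committed design for YARN-6042):
{code}
// Sketch only: emit one line per queue with the state listed above.
void dumpQueueState(FSQueue queue, org.apache.commons.logging.Log log) {
  log.debug("Queue " + queue.getName()
      + ": steadyFairShare=" + queue.getSteadyFairShare()
      + ", instantaneousFairShare=" + queue.getFairShare()
      + ", demand=" + queue.getDemand()
      + ", usage=" + queue.getResourceUsage()
      + ", weight=" + queue.getWeights());
  // Per-application state (AM share, runnable/non-runnable) would be logged
  // similarly for leaf queues.
}
{code}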
[jira] [Assigned] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vijay reassigned YARN-6079: --- Assignee: vijay (was: Grant Sohn) > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: vijay >Priority: Trivial > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Sohn updated YARN-6079: - Assignee: (was: Grant Sohn) > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Priority: Trivial > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6079) simple spelling errors in yarn test code
[ https://issues.apache.org/jira/browse/YARN-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Sohn reassigned YARN-6079: Assignee: Grant Sohn > simple spelling errors in yarn test code > > > Key: YARN-6079 > URL: https://issues.apache.org/jira/browse/YARN-6079 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Grant Sohn >Assignee: Grant Sohn >Priority: Trivial > > charactor -> character > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: > Assert.assertTrue("invalid label charactor should not add to repo", > caught); > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > Exepected -> Expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: > "Exepected AbsoluteUsedCapacity > 0.95, got: " > expteced -> expected > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: > Assert.fail("Exception is not expteced."); > macthing -> matching > hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: > assertEquals("Expected no macthing requests.", matches.size(), 0); > propogated -> propagated > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: > Assert.assertTrue("Node script time out message not propogated", > protential -> potential > hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: > LOG.info(String.format("Exclude protential property: %s\n", > gsp.propertyName)); > recevied -> received > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: > throw new Exception("Unexpected resource recevied."); > shouldnt -> shouldn't > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: > fail("resourceInfo object shouldnt be available for finished apps"); > Transistion -> Transition > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: > Assert.fail("Transistion to Active should have failed for > refreshAll()"); > Unhelathy -> Unhealthy > hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: > Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6079) simple spelling errors in yarn test code
Grant Sohn created YARN-6079: Summary: simple spelling errors in yarn test code Key: YARN-6079 URL: https://issues.apache.org/jira/browse/YARN-6079 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Grant Sohn Assignee: Grant Sohn Priority: Trivial charactor -> character hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java: Assert.assertTrue("invalid label charactor should not add to repo", caught); expteced -> expected hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: Assert.fail("Exception is not expteced."); Exepected -> Expected hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java: "Exepected AbsoluteUsedCapacity > 0.95, got: " expteced -> expected hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java: Assert.fail("Exception is not expteced."); macthing -> matching hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java: assertEquals("Expected no macthing requests.", matches.size(), 0); propogated -> propagated hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java: Assert.assertTrue("Node script time out message not propogated", protential -> potential hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/BasePBImplRecordsTest.java: LOG.info(String.format("Exclude protential property: %s\n", gsp.propertyName)); recevied -> received hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java: throw new Exception("Unexpected resource recevied."); shouldnt -> shouldn't hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServiceAppsNodelabel.java: fail("resourceInfo object shouldnt be available for finished apps"); Transistion -> Transition hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java: Assert.fail("Transistion to Active should have failed for refreshAll()"); Unhelathy -> Unhealthy hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java: Assert.assertEquals("Unhelathy Nodes", initialUnHealthy, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
[ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815945#comment-15815945 ] Naganarasimha G R commented on YARN-574: Thanks [~ajithshetty] for the patch; yes, keeping the thread pool size at "1" preserves the current behavior. Overall the approach looks fine, but I would like [~jlowe] to take another look at the patch as well. > PrivateLocalizer does not support parallel resource download via > ContainerLocalizer > --- > > Key: YARN-574 > URL: https://issues.apache.org/jira/browse/YARN-574 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Omkar Vinit Joshi >Assignee: Ajith S > Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, > YARN-574.2.patch > > > At present private resources will be downloaded in parallel only if multiple > containers request the same resource; otherwise they will be downloaded serially. > The protocol between PrivateLocalizer and ContainerLocalizer supports > multiple downloads; however, it is not used and only one resource is sent for > downloading at a time. > I think we can increase / assure parallelism (even for a single container > requesting a resource) for private/application resources by making multiple > downloads per ContainerLocalizer. > Total Parallelism before > = number of threads allotted for PublicLocalizer [public resource] + number > of containers [private and application resource] > Total Parallelism after > = number of threads allotted for PublicLocalizer [public resource] + number > of containers * max downloads per container [private and application resource] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
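For context on the parallelism numbers above, here is a minimal, hypothetical sketch (not the YARN-574 patch itself) of how a ContainerLocalizer-style component could fetch several private resources concurrently through a small, bounded thread pool. The MAX_DOWNLOADS_PER_CONTAINER constant and the Downloader interface are illustrative assumptions, not real YARN APIs.
{code}
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDownloadSketch {
  // Hypothetical knob; a value of 1 keeps today's serial behavior.
  static final int MAX_DOWNLOADS_PER_CONTAINER = 4;

  /** Stand-in for the real download logic. */
  interface Downloader {
    Path fetch(String resourceUri) throws Exception;
  }

  static List<Path> localize(List<String> resourceUris, Downloader downloader)
      throws Exception {
    ExecutorService pool =
        Executors.newFixedThreadPool(MAX_DOWNLOADS_PER_CONTAINER);
    try {
      // Submit every resource as an independent task so one container's
      // private resources no longer download strictly one at a time.
      List<Future<Path>> pending = new ArrayList<>();
      for (String uri : resourceUris) {
        pending.add(pool.submit((Callable<Path>) () -> downloader.fetch(uri)));
      }
      List<Path> localized = new ArrayList<>();
      for (Future<Path> f : pending) {
        localized.add(f.get()); // surfaces the first download failure, if any
      }
      return localized;
    } finally {
      pool.shutdown();
    }
  }
}
{code}
With a pool size of 1 this degenerates to the serial behavior mentioned in the comment above, which is why defaulting the thread count to "1" preserves compatibility.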
[jira] [Commented] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
[ https://issues.apache.org/jira/browse/YARN-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815926#comment-15815926 ] Daniel Templeton commented on YARN-5849: [~bibinchundatt], any additional comments? > Automatically create YARN control group for pre-mounted cgroups > --- > > Key: YARN-5849 > URL: https://issues.apache.org/jira/browse/YARN-5849 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-5849.000.patch, YARN-5849.001.patch, > YARN-5849.002.patch, YARN-5849.003.patch, YARN-5849.004.patch, > YARN-5849.005.patch, YARN-5849.006.patch, YARN-5849.007.patch, > YARN-5849.008.patch > > > Yarn can be launched with linux-container-executor.cgroups.mount set to > false. It will search for the cgroup mount paths set up by the administrator > parsing the /etc/mtab file. You can also specify > resource.percentage-physical-cpu-limit to limit the CPU resources assigned to > containers. > linux-container-executor.cgroups.hierarchy is the root of the settings of all > YARN containers. If this is specified but not created YARN will fail at > startup: > Caused by: java.io.FileNotFoundException: > /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied) > org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263) > This JIRA is about automatically creating YARN control group in the case > above. It reduces the cost of administration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
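As a rough illustration of what this JIRA asks for, the sketch below creates the YARN control group under an already mounted controller when it is missing, instead of failing later with the FileNotFoundException shown above. The mount point and hierarchy name are example values, and this is not the actual CgroupsLCEResourcesHandler change.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupHierarchySketch {

  /**
   * Ensure e.g. /cgroups/cpu/hadoop-yarn exists under a pre-mounted
   * controller. Creating it requires that the NodeManager user has write
   * access to the controller mount; otherwise the administrator still has
   * to pre-create and chown the directory.
   */
  static Path ensureYarnHierarchy(String controllerMount, String hierarchy)
      throws IOException {
    Path yarnRoot = Paths.get(controllerMount, hierarchy);
    if (!Files.isDirectory(yarnRoot)) {
      Files.createDirectories(yarnRoot);
    }
    return yarnRoot;
  }

  public static void main(String[] args) throws IOException {
    // Example values only; the real paths come from /etc/mtab and the
    // linux-container-executor.cgroups.hierarchy setting.
    System.out.println(ensureYarnHierarchy("/cgroups/cpu", "hadoop-yarn"));
  }
}
{code}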
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815917#comment-15815917 ] Wangda Tan commented on YARN-5889: -- Thanks [~sunilg]. Following are my comments on the overall code structure and call flow: 1) LeafQueue: - Several unused members, could you check? - Can we move the users map to UsersManager? Ideally all operations on users should be redirected to UsersManager (UM). - recalculateULCount is an implementation detail of the user-limit calculation and is better moved to UM. - Move all user-limit related configuration parameters (like ULF) to UM? Ideally UM should be more self-contained, to reduce dependencies and the risk of deadlock. 2) UsersManager - Better to move it to the capacity package, since it handles CS-only functionality like user limit. - Add a method like {{userLimitNeedsRecompute}} to handle the original logic of LQ#recalculateULCount. - User#setCachedCount: should we invalidate the UL only for the user who allocates/releases containers, or should we invalidate all user limits? The latter seems safer to me. If you agree, I suggest LQ call UM#userLimitNeedsRecompute to notify UM. 3) UM, logic to compute UL: First, the UL is classified by user-name, active state, scheduling-mode, and partition. However, I think we don't need user-name: the existing UL will be identical for users in the active set and users in the all set. Second, the existing logic automatically computes all schedulingModes, which may not be necessary. The ignore-exclusivity mode is not commonly used; we can compute it only when necessary. If you agree with the above, we can simplify the API a little bit: we only need userName (to check whether it's an activeUser), clusterResource, and partition. The ResourceCalculator can be stored inside UM; we don't need to pass it as a parameter every time. And the call flow may look like: {code} UM#getActiveUserLimit(userName, clusterResource, partition, schedulingMode) { if (needRecompute) { return recompute(userName, clusterResource, partition, schedulingMode) } return getCachedActiveUserLimit(userName, clusterResource, partition, schedulingMode); } {code} 4) ActiveUserManager - I think we don't need to use this class in CS. Add a {{Set}} of UM#User, and add the other fields to UM. It could have some duplicated code, but the code structure will be cleaner. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, > YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
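To make the proposed call flow above concrete, here is a small, self-contained sketch of the invalidate-then-recompute caching pattern the review suggests for UsersManager. The class and method names mirror the pseudocode but are assumptions, and recompute() is a placeholder rather than the real capacity-scheduler user-limit formula.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class UserLimitCacheSketch {
  // Cached limits keyed by partition; a real version would also key by
  // scheduling mode and distinguish active users from all users.
  private final Map<String, Long> cachedUserLimit = new ConcurrentHashMap<>();
  private final AtomicBoolean needRecompute = new AtomicBoolean(true);

  /** Called by the queue whenever containers are allocated or released. */
  public void userLimitNeedsRecompute() {
    needRecompute.set(true);
  }

  /** Called on the allocation path; recomputes only after an invalidation. */
  public long getUserLimit(String partition, long clusterResource) {
    if (needRecompute.compareAndSet(true, false)) {
      cachedUserLimit.put(partition, recompute(partition, clusterResource));
    }
    return cachedUserLimit.getOrDefault(partition, 0L);
  }

  private long recompute(String partition, long clusterResource) {
    // Placeholder: the real calculation divides queue capacity among active
    // users subject to user-limit and user-limit-factor; omitted here.
    return clusterResource;
  }
}
{code}
The benefit is that the heartbeat path normally reads the cached value; the expensive recomputation runs only after an allocate/release invalidates it, which is the point of moving the calculation out of the heartbeat flow.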
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815912#comment-15815912 ] Li Lu commented on YARN-6054: - Thanks [~raviprak]. The committed patch LGTM. Once the old file is backed up we don't need to worry if the repair process would make things worse. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
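The backup-then-repair behavior discussed in this thread can be sketched with the leveldbjni API the timeline store already uses. The backup directory name and the copy helper are illustrative, and this is a simplified outline rather than the committed YARN-6054 patch.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class LeveldbRepairSketch {

  /** Copy the store aside so a failed repair cannot lose the original files. */
  static void backup(Path src, Path dst) throws IOException {
    try (Stream<Path> paths = Files.walk(src)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        Files.copy(p, dst.resolve(src.relativize(p)),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
  }

  static DB openWithRepair(Path dbPath) throws IOException {
    Options options = new Options().createIfMissing(true);
    try {
      return JniDBFactory.factory.open(dbPath.toFile(), options);
    } catch (IOException corruption) {
      // e.g. "Corruption: 9 missing files": keep a copy of the store, then
      // let LevelDB rebuild whatever it can from the surviving files.
      backup(dbPath, dbPath.resolveSibling(dbPath.getFileName() + ".backup"));
      JniDBFactory.factory.repair(dbPath.toFile(), options);
      return JniDBFactory.factory.open(dbPath.toFile(), options);
    }
  }
}
{code}
Repair can silently drop unrecoverable entries, so the backup matters even though the happy path never touches it; that is the graceful-degradation versus data-safety trade-off discussed above.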
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815911#comment-15815911 ] Naganarasimha G R commented on YARN-6072: - Thanks [~jianhe]. bq. If HA is not enabled, this call will be adding 'null' elector We had discussed this offline: inside addIfService there is an {{instanceOf}} check, so passing null fails that check and the elector is simply not added as a service. bq. I think we can either move the entire elector creation code after add admin service, or move add admin service before adding elector. Actually, we were not sure which steps need to be done before login (and why), based on the comment {{"// Set HA configuration should be done before login"}}, so to be on the safer side we only pushed the adding of the elector service below the adminService. If you can give more input on this we can correct it. bq. I think, the ex.getMessage will just be duplicated in the log trace Hmm, yes, but we additionally get the stack trace too; in the current issue, which is a code error, the NPE trace was not showing up, hence we added it. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedEx
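For readers following the addIfService point in the comment above: the behavior described (passing null is silently skipped because only objects that really are a Service get added) can be shown with a small stand-in for CompositeService#addIfService. This is a simplified sketch, not the Hadoop implementation.
{code}
import java.util.ArrayList;
import java.util.List;

public class AddIfServiceSketch {

  /** Minimal stand-in for org.apache.hadoop.service.Service. */
  interface Service {
    void start();
  }

  private final List<Service> services = new ArrayList<>();

  /**
   * Mirrors the instanceof guard discussed above: a null argument (e.g. the
   * elector when HA is disabled) fails the check and nothing is added, so no
   * NullPointerException is thrown here.
   */
  boolean addIfService(Object maybeService) {
    if (maybeService instanceof Service) {
      services.add((Service) maybeService);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    AddIfServiceSketch rm = new AddIfServiceSketch();
    Service elector = null;                      // HA not enabled in this example
    System.out.println(rm.addIfService(elector));              // false
    System.out.println(rm.addIfService((Service) () -> { }));  // true
  }
}
{code}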
[jira] [Assigned] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi reassigned YARN-3427: Assignee: Miklos Szegedi (was: Karthik Kambatla) > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove it in 3.0.0, considering that the methods have been > deprecated for the all 2.x.y releases that it is marked Public in. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4975) Fair Scheduler: exception thrown when a parent queue marked 'parent' has configured child queues
[ https://issues.apache.org/jira/browse/YARN-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815874#comment-15815874 ] Hadoop QA commented on YARN-4975: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 43 unchanged - 1 fixed = 43 total (was 44) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 42m 0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-4975 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846626/YARN-4975.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 3db8c3b80b0e 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c18590f | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14624/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14624/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fair Scheduler: exception thrown when a parent queue marked 'parent' has > configured child queues > > > Key: YARN-4975 > URL: https://issues.apache.org/jira/browse/YARN-4975 > Project: Hadoop YARN > Issue
[jira] [Updated] (YARN-4658) Typo in o.a.h.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler comment
[ https://issues.apache.org/jira/browse/YARN-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4658: --- Assignee: Udai Kiran Potluri (was: Nicole Pazmany) > Typo in o.a.h.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > comment > -- > > Key: YARN-4658 > URL: https://issues.apache.org/jira/browse/YARN-4658 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Daniel Templeton >Assignee: Udai Kiran Potluri > > Comment in {{testContinuousSchedulingInterruptedException()}} is > {code} > // Add one nodes > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815842#comment-15815842 ] Jian He commented on YARN-6072: --- - If HA is not enabled, this call will be adding 'null' elector ? I think we can either move the entire elector creation code after add admin service, or move add admin service before adding elector. {code} // elector to be added post adminservice addIfService(elector); {code} - I think, the ex.getMessage will just be duplicated in the log trace ? In addition to add the ex variable, may be replace ex.getMessage() with a more meaningful message for current call only {code} throw new ServiceFailedException(ex.getMessage(), ex); {code} > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN 
org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused b
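On the last review point, a hedged illustration of the suggested change: keep the original exception as the cause so its stack trace reaches the log, but give the wrapper a message that names the failing step instead of repeating ex.getMessage(). The method below is a simplified stand-in for the real transitionToActive call site, not the actual AdminService code.
{code}
import org.apache.hadoop.ha.ServiceFailedException;

public class RefreshAllWrapSketch {

  /** Stand-in for the refreshAll() step performed during transitionToActive. */
  interface Refreshable {
    void refreshAll() throws Exception;
  }

  static void transitionToActive(Refreshable admin)
      throws ServiceFailedException {
    try {
      admin.refreshAll();
    } catch (Exception ex) {
      // A message specific to this call site; the cause preserves the full
      // stack trace (e.g. the underlying NullPointerException) in the RM log.
      throw new ServiceFailedException(
          "Error on refreshAll during transition to Active", ex);
    }
  }
}
{code}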