[jira] [Commented] (YARN-3575) Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted
[ https://issues.apache.org/jira/browse/YARN-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548118#comment-14548118 ] Jason Lowe commented on YARN-3575: -- The only way to support compatibility would be to remove the epoch number field from the container ID, but I doubt that's going to happen at this point. I filed this mostly to document the fact that an incompatibility exists. Most likely we'll have to recommend that users do _not_ perform a restart of the RM where it tries to recover (and therefore starts using an epoch number in container IDs) as long as applications are running on the grid using YARN client jars version 2.5 or earlier. RM restart with recovery would only be supported as long as all applications are using YARN jars >= 2.6. Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted - Key: YARN-3575 URL: https://issues.apache.org/jira/browse/YARN-3575 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Trying to launch a job that uses the 2.5 jars fails on a 2.6 cluster whose RM has been restarted (i.e.: epoch != 0) because the epoch number starts appearing in the container IDs and the 2.5 jars no longer know how to parse the container IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
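A minimal sketch of why this breaks, assuming the pre- and post-restart container ID formats; the example IDs, class name, and parsing logic below are illustrative, not the actual Hadoop code:
{code}
// Illustrative only: why 2.5-era jars cannot parse epoch-bearing container IDs.
public class ContainerIdFormatSketch {
  public static void main(String[] args) {
    // ID written while the RM has never restarted (epoch == 0):
    String beforeRestart = "container_1415123350094_0017_01_000002";
    // ID written after an RM restart with recovery (epoch > 0), e.g. an assumed epoch of 17:
    String afterRestart = "container_e17_1415123350094_0017_01_000002";

    // A 2.5-style parser assumes a fixed layout:
    // "container" _ clusterTimestamp _ appId _ attemptId _ containerId
    for (String id : new String[] {beforeRestart, afterRestart}) {
      String[] parts = id.split("_");
      try {
        long clusterTimestamp = Long.parseLong(parts[1]); // "e17" fails here
        System.out.println(id + " -> cluster timestamp " + clusterTimestamp);
      } catch (NumberFormatException e) {
        System.out.println(id + " -> old jars fail to parse: " + e.getMessage());
      }
    }
  }
}
{code}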
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548130#comment-14548130 ] Jason Lowe commented on YARN-41: Looking at the table above, I'm wondering about the case where we are doing a graceful shutdown, recovery is enabled, but we are not running under supervision. When we are shutting down without supervision the NM will normally kill active containers, so in that sense I think we should also unregister. I'm not sure there's a point to avoiding the unregister if no containers will survive the NM shutdown. The NM only avoids killing containers on shutdown if the NM supports recovery and it has been told it is being supervised (i.e.: it is likely the NM will be restarted shortly after the shutdown completes). In the other cases it kills containers on shutdown to avoid a situation where containers are running uncontrolled on a node due to the NM being unavailable for a prolonged duration. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Labels: BB2015-05-TBR Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
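A minimal sketch of the shutdown policy described above, assuming the two configuration keys named in this thread; the class name, helper logic, and example values are illustrative, not the actual NodeManager code:
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: containers survive a graceful NM shutdown only when recovery
// is enabled AND the NM is supervised; in every other case containers are killed,
// so unregistering from the RM on shutdown costs nothing.
public class NmShutdownPolicySketch {
  static boolean keepContainersAlive(Configuration conf) {
    boolean recoveryEnabled =
        conf.getBoolean("yarn.nodemanager.recovery.enabled", false);
    boolean supervised =
        conf.getBoolean("yarn.nodemanager.recovery.supervised", false);
    return recoveryEnabled && supervised;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The case discussed above: recovery on, unsupervised -> containers are killed,
    // so the NM might as well unregister from the RM.
    conf.setBoolean("yarn.nodemanager.recovery.enabled", true);
    conf.setBoolean("yarn.nodemanager.recovery.supervised", false);
    boolean keep = keepContainersAlive(conf);
    System.out.println("keep containers alive: " + keep + ", unregister from RM: " + !keep);
  }
}
{code}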
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548102#comment-14548102 ] Tsuyoshi Ozawa commented on YARN-2336: -- [~ajisakaa] thank you for updating. We're almost there. TestRMWebServicesFairScheduler#testClusterSchedulerWithSubQueues: Can we add the following test to verify the non-existence of the field 'childQueues'? Also, could you add the same kind of test to TestRMWebServicesCapacitySched for consistency of APIs between CapacityScheduler and FairScheduler? {code} try { subQueueInfo.getJSONObject(1).getJSONObject("childQueues"); Assert.fail("subQueue should omit field 'childQueues' when childQueue" + " is empty."); } catch (JSONException je) { je.getMessage().contains("JSONObject[\"childQueues\"] not found."); } {code} ResourceManagerRest.md: we should describe that childQueues is omitted if the queue doesn't have child queues: {code} | childQueues | array of queues(JSON)/queue objects(XML) | A collection of sub-queue information | {code} We should fix CapacityScheduler's 'queues' field the same as FairScheduler's one: {code} | queues | array of queues(JSON)/zero or more queue objects(XML) | A collection of queue resources | {code} Minor nits: A following comment can be fixed as "return null to omit childQueues field when its size is zero." Also we should add a reason to do this like "This is for consistency of return value of REST API between FairScheduler and CapacityScheduler" - childQueues and . What do you think? {code} +// return null for FairSchedulerLeafQueueInfo to avoid childQueues being +// displayed in the response of REST API. {code} Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' bracket JSON for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548141#comment-14548141 ] Junping Du commented on YARN-41: Thanks [~devaraj.k] for providing the truth table above. The contents look mostly good to me. One interesting case is when yarn.nodemanager.recovery.enabled=true but yarn.nodemanager.recovery.supervised=false; the shutdown behavior after YARN-2331 is: running containers will get killed, but recovery work will continue after the NM gets restarted. Theoretically, I think we should unregister the NM from the RM because we don't expect apps/containers to get recovered on this NM. bq. As per my understanding I assumed here that NM is under supervision enabled only when the NM recovery is enabled. Agree. In practice, if the NM is under supervision while NM recovery is disabled, the behavior should be exactly the same as when both configs are set to false. [~jlowe], any comments here? The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Labels: BB2015-05-TBR Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548145#comment-14548145 ] Junping Du commented on YARN-41: bq. Jason Lowe, any comments here? Sorry, I didn't see Jason's comments while I was putting in my comments. But it looks like we are talking about the same thing in parallel. :) The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Labels: BB2015-05-TBR Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3668) Long run service shouldn't be killed even if Yarn crashed
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548123#comment-14548123 ] sandflee commented on YARN-3668: thanks [~stevel], we're using our own AM, not Slider, and some online services are running on it; we really don't want applications to be killed because of the AM's failure. Long run service shouldn't be killed even if Yarn crashed - Key: YARN-3668 URL: https://issues.apache.org/jira/browse/YARN-3668 Project: Hadoop YARN Issue Type: Wish Reporter: sandflee A long running service shouldn't be killed even if all YARN components crashed; with RM work-preserving restart and NM restart, YARN could take over the applications again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547966#comment-14547966 ] Hadoop QA commented on YARN-2821: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 36s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733521/YARN-2821.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 363c355 | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7967/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7967/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7967/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 
14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container
[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547997#comment-14547997 ] gu-chi commented on YARN-1922: -- Hi, I see your comment here to check in YARN-1922.5.patch, but why was YARN-1922.6.patch merged? What is the concern? I find this solution may have a defect. Suppose one container finished; then it will do cleanup, the PID file still exists and will trigger signalContainer once, and this will kill the process with the pid in the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it rarely occurs. Below is the error scenario: task cleanup was not finished but the NM was killed, then started 2015-05-14 21:49:03,063 | INFO | DeletionService #1 | Deleting absolute path : /export/data1/yarn/nm/localdir/usercache/omm/appcache/application_1430456703237_8047/container_1430456703237_8047_01_12582917 | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:400) 2015-05-14 21:49:03,063 | INFO | AsyncDispatcher event handler | Container container_1430456703237_8047_01_12582917 transitioned from EXITED_WITH_SUCCESS to DONE | org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:918) 2015-05-14 21:49:03,064 | INFO | AsyncDispatcher event handler | Removing container_1430456703237_8047_01_12582917 from application application_1430456703237_8047 | org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl$ContainerDoneTransition.transition(ApplicationImpl.java:340) 2015-05-14 21:49:03,064 | INFO | AsyncDispatcher event handler | Considering container container_1430456703237_8047_01_12582917 for log-aggregation | org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.startContainerLogAggregation(AppLogAggregatorImpl.java:342) 2015-05-14 21:49:03,064 | INFO | AsyncDispatcher event handler | Got event CONTAINER_STOP for appId application_1430456703237_8047 | org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:196) 2015-05-14 21:49:03,152 | INFO | Node Status Updater | Removed completed containers from NM context: [container_1430456703237_8047_01_12582917] | org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeCompletedContainersFromContext(NodeStatusUpdaterImpl.java:417) 2015-05-14 21:49:03,293 | INFO | Task killer for 26924 | Using linux-container-executor.users as omm | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:349) 2015-05-14 21:49:20,667 | INFO | main | STARTUP_MSG: / STARTUP_MSG: Starting NodeManager STARTUP_MSG: host = SR6S11/192.168.10.21 STARTUP_MSG: args = [] STARTUP_MSG: version = V100R001C00 STARTUP_MSG: classpath = Process group remains alive after container process is killed externally Key: YARN-1922 URL: https://issues.apache.org/jira/browse/YARN-1922 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Environment: CentOS 6.4 Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.6.0 Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch If the main container process is killed externally, ContainerLaunch does not kill the rest of the process group. 
Before sending the event that results in the ContainerLaunch.containerCleanup method being called, ContainerLaunch sets the completed flag to true. Then when cleaning up, it doesn't try to read the pid file if the completed flag is true. If it read the pid file, it would proceed to send the container a kill signal. In the case of the DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
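A minimal sketch of the ordering described above; the class, flag, and helper names are illustrative assumptions, not the actual ContainerLaunch code:
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the ordering problem: the completed flag is set before the
// cleanup event fires, so cleanup never reads the pid file and never signals the
// process group, leaving orphaned children alive after an external kill.
public class ContainerCleanupSketch {
  private final AtomicBoolean completed = new AtomicBoolean(false);

  void onMainProcessExit() {
    completed.set(true);   // flag flips first...
    containerCleanup();    // ...then the cleanup path runs
  }

  void containerCleanup() {
    if (completed.get()) {
      return;              // pid file is never read, so no kill signal is sent
    }
    String pid = readPidFile();          // hypothetical helper
    signalProcessGroup(pid, "SIGKILL");  // would kill the whole group in the DefaultContainerExecutor case
  }

  private String readPidFile() { return "12345"; }
  private void signalProcessGroup(String pid, String signal) {
    System.out.println("kill -" + signal + " -" + pid);
  }
}
{code}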
[jira] [Commented] (YARN-3670) Add JobConf XML link in Yarn RM UI
[ https://issues.apache.org/jira/browse/YARN-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548047#comment-14548047 ] Jason Lowe commented on YARN-3670: -- Sure, we could solve this by adding APIs for applications to convey job-specific settings or metrics. However this is far from a minor improvement and could be quite involved, including questions like where will these settings be stored (probably timelineserver), limits on the key/value sizes, limits on total amount of data that can be stored, etc. Add JobConf XML link in Yarn RM UI -- Key: YARN-3670 URL: https://issues.apache.org/jira/browse/YARN-3670 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Request to add JobConf xml link for each application in the RM UI so I don't have to keep recovering it from HDFS to debug if job settings are taking effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547990#comment-14547990 ] Tsuyoshi Ozawa commented on YARN-2336: -- OK, I'm checking it. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' bracket JSON for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547959#comment-14547959 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 23s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 10m 1s | The applied patch generated 2 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 16s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733531/YARN-3411-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7966/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7966/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-2336: - Affects Version/s: 2.6.0 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' bracket JSON for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3670) Add JobConf XML link in Yarn RM UI
Hari Sekhon created YARN-3670: - Summary: Add JobConf XML link in Yarn RM UI Key: YARN-3670 URL: https://issues.apache.org/jira/browse/YARN-3670 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Request to add JobConf xml link for each application in the RM UI so I don't have to keep recovering it from HDFS to debug if job settings are taking effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3670) Add JobConf XML link in Yarn RM UI
[ https://issues.apache.org/jira/browse/YARN-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548039#comment-14548039 ] Hari Sekhon commented on YARN-3670: --- I understand (you can't imagine how often I've had to refrain from asking for Maps and Reduces to be available in the UI like trusty old MRv1). At the moment it's too opaque, which is a first-generation side effect of YARN being a general, abstracting resource manager. However, perhaps it would be possible to create a generic mechanism that allows jobs to publish key-value pairs, which could be populated with either settings or counters, to be displayed via a new named link also specified per set of key-value pairs, so that a job could publish counters or configuration. In this way the MapReduce client, the Tez client and any other YARN apps would have the ability to publish any arbitrary key-value table of information to YARN to display in the UI. This would help immensely with debugging YARN jobs. Add JobConf XML link in Yarn RM UI -- Key: YARN-3670 URL: https://issues.apache.org/jira/browse/YARN-3670 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Request to add JobConf xml link for each application in the RM UI so I don't have to keep recovering it from HDFS to debug if job settings are taking effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3670) Add JobConf XML link in Yarn RM UI
[ https://issues.apache.org/jira/browse/YARN-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548020#comment-14548020 ] Jason Lowe commented on YARN-3670: -- This is not easily implementable in the general case, since not all YARN applications have a job conf that is an XML file and accessible via a link. While this is true for MapReduce jobs, YARN is not MapReduce-specific. This is similar to asking for the number of maps and reduces to be available on the RM UI. Add JobConf XML link in Yarn RM UI -- Key: YARN-3670 URL: https://issues.apache.org/jira/browse/YARN-3670 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Request to add JobConf xml link for each application in the RM UI so I don't have to keep recovering it from HDFS to debug if job settings are taking effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411-YARN-2928.004.patch Uploading YARN-3411-YARN-2928.004.patch. The earlier patch had a line missing, I think it got deleted by mistake when I was looking through the patch file. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411-YARN-2928.003.patch Attaching new patch YARN-3411-YARN-2928.003.patch with code updated as per review suggestions. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547730#comment-14547730 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 5s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 3m 18s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733480/YARN-3411-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7965/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547601#comment-14547601 ] Mohammad Shahid Khan commented on YARN-3645: Loading with an invalid node configuration is not feasible. But instead of throwing the NullPointerException, we can throw an *AllocationConfigurationException* with a proper message so that the reason for the failure can be identified easily. {code} if ("aclAdministerApps".equals(field.getTagName())) { Text aclText = (Text)field.getFirstChild(); if (aclText == null) { throw new AllocationConfigurationException( "Invalid admin ACL configuration in allocation file"); } String text = ((Text)field.getFirstChild()).getData(); acls.put(QueueACL.ADMINISTER_QUEUE, new AccessControlList(text)); } {code} ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml Key: YARN-3645 URL: https://issues.apache.org/jira/browse/YARN-3645 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin The aclSubmitApps is configured in fair-scheduler.xml like below: <queue name="mr"> <aclSubmitApps></aclSubmitApps> </queue> The resourcemanager log: 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) Caused by: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) ... 
9 more 2015-05-14 12:59:48,623 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state 2015-05-14 12:59:48,623 INFO com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin transitionToStandbyIn 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When stopping the service ResourceManager : java.lang.NullPointerException java.lang.NullPointerException at com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at
[jira] [Commented] (YARN-126) yarn rmadmin help message contains reference to hadoop cli and JT
[ https://issues.apache.org/jira/browse/YARN-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547669#comment-14547669 ] Akira AJISAKA commented on YARN-126: Thanks [~rémy] for updating the patch. Some comments. 1. Would you fix TestGenericOptionsParser? 2. Would you fix the indent size? The indent size is 2 whitespaces. 3. Would you update CommandsManual.md as well? 4. I'm thinking we can remove the deprecated option from the command-line help message. {code} +out.println("-jt <local|resourcemanager:port>    specify a ResourceManager (Deprecated)"); {code} 5. The following code can be removed. {code} + conf.set("yarn.resourcemanager.address", "localhost:8032", + "from -rm command line option"); {code} yarn rmadmin help message contains reference to hadoop cli and JT - Key: YARN-126 URL: https://issues.apache.org/jira/browse/YARN-126 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Rémy SAISSY Labels: usability Attachments: YARN-126.002.patch, YARN-126.patch has option to specify a job tracker and the last line for general command line syntax had bin/hadoop command [genericOptions] [commandOptions] ran yarn rmadmin to get usage: RMAdmin Usage: java RMAdmin [-refreshQueues] [-refreshNodes] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshAdminAcls] [-refreshServiceAcl] [-help [cmd]] Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|jobtracker:port> specify a job tracker -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3655: Attachment: YARN-3655.001.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is that if an application tries to call assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock situation can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() && !getUnmanagedAM()) { List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping allocation because maxAMShare limit would" + " be exceeded"); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the Max AM share limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547687#comment-14547687 ] zhihai xu commented on YARN-3655: - I uploaded a new patch YARN-3655.001.patch, which added a test case to verify this fix. Without the fix, the test will fail. FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is that if an application tries to call assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock situation can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() && !getUnmanagedAM()) { List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping allocation because maxAMShare limit would" + " be exceeded"); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the Max AM share limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
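A minimal sketch of the fix described in the last paragraph above (drop the reservation when the maxAMShare limit blocks the AM container); the class and method names are illustrative, not the actual FSAppAttempt API:
{code}
// Illustrative sketch of the proposed behavior, not the actual FairScheduler code.
public class MaxAmShareUnreserveSketch {
  private boolean nodeReservedByThisApp = true;

  // Returns true if the AM container could be assigned on the reserved node.
  boolean assignAmContainer(boolean amShareLimitExceeded) {
    if (amShareLimitExceeded) {
      if (nodeReservedByThisApp) {
        // Key change: release the reservation instead of holding the node while the AM
        // cannot start; otherwise other applications can never allocate on this node.
        nodeReservedByThisApp = false;
      }
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    MaxAmShareUnreserveSketch app = new MaxAmShareUnreserveSketch();
    System.out.println("assigned: " + app.assignAmContainer(true)
        + ", still reserved: " + app.nodeReservedByThisApp);
  }
}
{code}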
[jira] [Updated] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3583: -- Attachment: 0003-YARN-3583.patch Thank you [~leftnoteasy] for the comments. Updated a patch addressing the comments. bq. I think it's better to add it separately to make sure it will be tested. In TestClientRMServices we have a test case, and I added a new test case in TestYarnClient as well for GetNodeLabels which will check the NodeIdToLabelsInfo. In TestPBImplsRecords, we only have test cases for Request and Response PB objects, and the inner values are validated in the same. Here we need to test an inner object for a ResponsePBImpl, and that is covered by the above tests. Please let me know if this is fine. Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336.009.patch Thanks [~ozawa] for the comments! Updated the patch. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' bracket JSON for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart
[ https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548239#comment-14548239 ] Siqi Li commented on YARN-3468: --- Agreed. Setting yarn.nodemanager.delete.debug-delay-sec to 10 minutes is a bad idea in a production cluster. I will mark this jira as won't fix. NM should not blindly rename usercache/filecache/nmPrivate on restart - Key: YARN-3468 URL: https://issues.apache.org/jira/browse/YARN-3468 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart
[ https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li resolved YARN-3468. --- Resolution: Won't Fix NM should not blindly rename usercache/filecache/nmPrivate on restart - Key: YARN-3468 URL: https://issues.apache.org/jira/browse/YARN-3468 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: YARN-2876.v2.patch In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v6.patch Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: YARN-2876.v2.patch In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548335#comment-14548335 ] Junping Du commented on YARN-3411: -- bq. No, we will never drop the last value. MIN_VERSIONS and TTL are set such that last value is always retained. I am setting the MAX_VERSIONS for now to 200, but we can revisit this when we determine how exactly the timeseries data is going to be handled. And of course it can be made configurable. I meant the earliest (oldest) value, not the latest. Agree that we can revisit the value later for the other cases that I mentioned above, but I just want to double check that we don't have other options, i.e. making time series data different rows or columns rather than different timestamps/versions here. bq. Wondering why we would aggregate data in one timeseries for one metric over time? That's because the interval of interest (presented to the end user) is not always the same as the interval for gathering timeline metrics data. Let's say we receive container metrics data from the NodeManager every second, but the aggregated data the user is interested in is per minute; then we need to aggregate 60 seconds of data for one single metric. Make sense? Thanks for updating the patch. I just quickly checked the latest patch; a few comments so far: 1. Sounds like we don't leverage the single-row transaction feature of HBase, as we are updating different column families (events, configurations, metrics, etc.) separately. Do we need to make sure data in each row gets updated consistently? 2. We shouldn't swallow exceptions when updating data to HBase; just log.error() may not be enough. 3. We need to check for null when writing a TimelineEntity to HBase, as a TimelineEntity could include null events/configurations/metrics, which could make the foreach later throw an NPE. Comments with more details could come later. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
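A minimal sketch of the MIN_VERSIONS/TTL/MAX_VERSIONS interplay discussed above, assuming the classic HBase client API; the table name, column family name, and TTL value are illustrative assumptions, not the schema from the patch:
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

// Illustrative sketch: TTL ages out old cell versions, MIN_VERSIONS guarantees the
// last written value survives, and MAX_VERSIONS caps how much of the time series
// is kept as versions of a single cell.
public class MetricsFamilySketch {
  public static HTableDescriptor entityTableSketch() {
    HColumnDescriptor metrics = new HColumnDescriptor("m"); // assumed family name
    metrics.setMaxVersions(200);            // value discussed in the thread
    metrics.setMinVersions(1);              // never drop the last value
    metrics.setTimeToLive(30 * 24 * 3600);  // assumed retention of 30 days, in seconds

    HTableDescriptor table =
        new HTableDescriptor(TableName.valueOf("timelineservice.entity")); // assumed name
    table.addFamily(metrics);
    return table;
  }
}
{code}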
[jira] [Updated] (YARN-3605) _ as method name may not be supported much longer
[ https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3605: --- Hadoop Flags: Incompatible change _ as method name may not be supported much longer - Key: YARN-3605 URL: https://issues.apache.org/jira/browse/YARN-3605 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Joseph Evans I was trying to run the precommit test on my mac under JDK8, and I got the following error related to javadocs. (use of '_' as an identifier might not be supported in releases after Java SE 8) It looks like we need to at least change the method name to not be '_' any more, or possibly replace the HTML generation with something more standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3579) CommonNodeLabelsManager should support NodeLabel instead of string label name when getting node-to-label/label-to-label mappings
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3579: - Fix Version/s: 2.8.0 CommonNodeLabelsManager should support NodeLabel instead of string label name when getting node-to-label/label-to-label mappings Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Fix For: 2.8.0 Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch, 0003-YARN-3579.patch, 0004-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns label name as string. It is not passing information such as Exclusivity etc back to REST interface apis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3669) Attempt-failures validatiy interval should have a global admin configurable lower limit
[ https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3669: Labels: newbie (was: ) Attempt-failures validatiy interval should have a global admin configurable lower limit --- Key: YARN-3669 URL: https://issues.apache.org/jira/browse/YARN-3669 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: newbie Found this while reviewing YARN-3480. bq. When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore. I think we need to have a lower limit on the failure-validity interval to avoid situations like this. Having this will avoid pardoning too many failures in too short a duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Attachment: YARN-1735.v2.patch For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch In monitoring graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the queue max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bounded by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful; availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
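A minimal sketch of the metric the description argues for (queue maximum minus allocated, clamped at zero); the class name and numbers are illustrative assumptions, not the QueueMetrics implementation:
{code}
// Illustrative sketch of the proposed AvailableMB computation per queue.
public class QueueAvailableMbSketch {
  static long availableMb(long queueMaxMb, long allocatedMb) {
    // Never report more than the queue's allowed maximum, and never go negative.
    return Math.max(0, queueMaxMb - allocatedMb);
  }

  public static void main(String[] args) {
    // e.g. a queue capped at 100 GB with 40 GB allocated should report 60 GB available,
    // regardless of cluster capacity or the queue's instantaneous fair share.
    System.out.println(availableMb(102400, 40960) + " MB available");
  }
}
{code}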
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548350#comment-14548350 ] Hadoop QA commented on YARN-2876: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 3m 26s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733579/YARN-2876.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 060c84e | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7972/console | This message was automatically generated. In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
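As a reading aid for the expected behaviour described in YARN-2876 above, a small self-contained sketch (class, field and queue-naming details are invented, not the actual FairScheduler/AllocationConfiguration code):
{code}
import java.util.HashMap;
import java.util.Map;

// Toy lookup: a queue without its own maxResources inherits the nearest ancestor's
// limit instead of reporting the entire cluster capacity.
public class QueueMaxResourceLookup {
  private final Map<String, Long> configuredMaxMB = new HashMap<String, Long>(); // from fair-scheduler.xml
  private final long clusterCapacityMB;

  public QueueMaxResourceLookup(long clusterCapacityMB) {
    this.clusterCapacityMB = clusterCapacityMB;
  }

  public void setConfiguredMax(String queue, long maxMB) {
    configuredMaxMB.put(queue, maxMB);
  }

  public long getEffectiveMaxMB(String queue) {
    Long own = configuredMaxMB.get(queue);
    if (own != null) {
      return own;                                             // explicitly configured
    }
    int lastDot = queue.lastIndexOf('.');
    if (lastDot < 0) {
      return clusterCapacityMB;                               // root falls back to cluster capacity
    }
    return getEffectiveMaxMB(queue.substring(0, lastDot));    // inherit from the parent queue
  }
}
{code}
With such a fallback, asking for the max of {{root.queueA.sub1}} when only {{root.queueA}} is configured returns the parent's limit rather than the cluster capacity.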
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Labels: (was: BB2015-05-TBR) The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548410#comment-14548410 ] Hadoop QA commented on YARN-2336: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 59s | Site still builds. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 16s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 5s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733572/YARN-2336.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / bcc1786 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7969/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7969/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7969/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7969/console | This message was automatically generated. 
Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' bracket JSON for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548356#comment-14548356 ] Hadoop QA commented on YARN-1945: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 3m 19s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733578/YARN-1945.v6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 060c84e | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7973/console | This message was automatically generated. Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: YARN-3069.009.patch - Move yarn.client.app-submission.poll-interval to DeprecatedProperties.md - Add new property yarn.application.classpath.prepend.distcache to yarn-default.xml - Update properties descriptions and values based on Akira's feedback Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type 
yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: (was: YARN-2876.v2.patch) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3605) _ as method name may not be supported much longer
[ https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548375#comment-14548375 ] Robert Joseph Evans commented on YARN-3605: --- This is not a newbie issue. The code that has the _ method in it is generated code, and the code that generates it is far from simple. This is also technically a backwards incompatible change, because other YARN applications could be using it. _ as method name may not be supported much longer - Key: YARN-3605 URL: https://issues.apache.org/jira/browse/YARN-3605 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Joseph Evans I was trying to run the precommit test on my mac under JDK8, and I got the following error related to javadocs. (use of '_' as an identifier might not be supported in releases after Java SE 8) It looks like we need to at least change the method name to not be '_' any more, or possibly replace the HTML generation with something more standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
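Purely to illustrate the javac complaint quoted above, a stand-alone example (this is a stand-in class, not the generated Hamlet code):
{code}
// Compiles on JDK 8 with the warning "'_' used as an identifier ... might not be
// supported in releases after Java SE 8"; from JDK 9 onwards it is a hard error,
// so the generated builder method would eventually need a different name.
public class UnderscoreMethodExample {
  public UnderscoreMethodExample _() {   // the problematic method name
    return this;
  }

  public UnderscoreMethodExample __() {  // one possible rename, shown only as an example
    return this;
  }
}
{code}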
[jira] [Updated] (YARN-3605) _ as method name may not be supported much longer
[ https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-3605: -- Labels: (was: newbie) _ as method name may not be supported much longer - Key: YARN-3605 URL: https://issues.apache.org/jira/browse/YARN-3605 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Joseph Evans I was trying to run the precommit test on my mac under JDK8, and I got the following error related to javadocs. (use of '_' as an identifier might not be supported in releases after Java SE 8) It looks like we need to at least change the method name to not be '_' any more, or possibly replace the HTML generation with something more standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548379#comment-14548379 ] Vrushali C commented on YARN-3411: -- Hi [~djp] Thanks for the initial quick feedback! Some responses below: bq. Do we need to make sure data in each row get updated consistently? I was thinking it is not necessary, since the entity information would come in a more streaming fashion, one update at a time anyway. If, say, one column is written and another is not, the caller can retry; an HBase put will simply overwrite the existing value. bq. We shouldn't swallow exception in updating data to HBase, just log.error() may not be enough. Okay, let me look through and modify that. bq. We need to check null in writing TimelineEntity to HBase, as TimelineEntity could include null events/configurations/metrics, that could make foreach later throw NPE exception I have added some null checks; I will go over the code again and update it to ensure I have null checks for entity class members like configurations, metrics, etc. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
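On the null-check point above, one possible shape for the guard is a small null-tolerant helper (illustrative only; names are assumptions and this is not the attached patch):
{code}
import java.util.Collection;
import java.util.Collections;

// Iterate null-tolerantly over entity members such as configs/metrics/events so a
// missing section simply contributes nothing instead of causing an NPE in the writer.
public final class NullSafeCollections {
  private NullSafeCollections() {
  }

  public static <T> Collection<T> emptyIfNull(Collection<T> c) {
    return c == null ? Collections.<T>emptyList() : c;
  }
}
{code}
Usage would then look like {{for (TimelineMetric m : NullSafeCollections.emptyIfNull(entity.getMetrics())) ...}} (method names assumed).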
[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548378#comment-14548378 ] Hadoop QA commented on YARN-1735: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 3m 18s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733581/YARN-1735.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 060c84e | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7974/console | This message was automatically generated. For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch in monitoring graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the queue max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bond by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548383#comment-14548383 ] Vrushali C commented on YARN-3411: -- The patch has an overall -1 due to a couple of javadoc warnings. Will fix those in the next patch. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1896) For FairScheduler expose MinimumQueueResource of each queue in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548265#comment-14548265 ] Siqi Li commented on YARN-1896: --- As I said in the description, it would be good to have MinimumQueueResource and MaximumQueueResource exposed through QueueMetrics. By doing this, we can not only see the current usage of a queue but also its resource limits. For FairScheduler expose MinimumQueueResource of each queue in QueueMetrics --- Key: YARN-1896 URL: https://issues.apache.org/jira/browse/YARN-1896 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1896.v1.patch, YARN-1896.v2.patch For FairScheduler, it's very useful to expose MinimumQueueResource and MaximumQueueResource of each queue in QueueMetrics. Therefore, people can use monitoring graphs to see both their current usage and their limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
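As a rough sketch of what exposing such limits could look like with the metrics2 annotations (class, field names, descriptions and registration name are assumptions, not the attached patch):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical metrics source: publish the configured min/max of a queue as gauges,
// next to the usage metrics that already exist.
@Metrics(context = "yarn")
public class QueueLimitMetrics {
  @Metric("Configured minimum queue memory in MB") MutableGaugeLong minQueueMB;
  @Metric("Configured maximum queue memory in MB") MutableGaugeLong maxQueueMB;

  public static QueueLimitMetrics forQueue(String queueName) {
    // registering with the metrics system materializes the annotated gauges
    return DefaultMetricsSystem.instance().register(
        "QueueLimitMetrics-" + queueName, "Limits of queue " + queueName,
        new QueueLimitMetrics());
  }

  public void update(long minMB, long maxMB) {
    minQueueMB.set(minMB);
    maxQueueMB.set(maxMB);
  }
}
{code}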
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548271#comment-14548271 ] Hadoop QA commented on YARN-2876: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733577/YARN-2876.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bcc1786 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7971/console | This message was automatically generated. In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3302: -- Attachment: YARN-3302-trunk.002.patch Updated patch to address Ravi's comments TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
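Not the attached patch, just a sketch of the auto-detection idea under discussion; the binary location checked below is an assumption about the "usual place":
{code}
import java.io.File;
import org.junit.Assume;
import org.junit.Before;
import org.junit.Test;

// Skip rather than fail when no docker binary is present, so the test can run
// automatically on machines that do have docker installed.
public class DockerDetectionExample {
  @Before
  public void requireDocker() {
    File docker = new File("/usr/bin/docker");   // assumed conventional location
    Assume.assumeTrue(docker.canExecute());      // skips the test when docker is absent
  }

  @Test
  public void runsOnlyWhereDockerExists() {
    // the real container-executor assertions would go here
  }
}
{code}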
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548261#comment-14548261 ] Devaraj K commented on YARN-41: --- Thanks a lot [~jlowe] and [~djp] for your comments. I will update the patch so that the NM also unregisters with the RM when NM recovery is enabled and supervision is disabled. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Labels: BB2015-05-TBR Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548274#comment-14548274 ] Hadoop QA commented on YARN-3302: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 18s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 49s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 23m 42s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733573/YARN-3302-trunk.002.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / bcc1786 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7970/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7970/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7970/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7970/console | This message was automatically generated. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548460#comment-14548460 ] Wangda Tan commented on YARN-3489: -- [~varun_saxena], Tx for updating. I tried to run RM tests locally after applying the patch and hit a couple of test failures: Results : Failed tests: TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche:383-testRMWritingMassiveHistory:441 null Tests in error: TestAppManager.testRMAppSubmitDuplicateApplicationId:531 » NullPointer TestAppManager.testRMAppSubmitMaxAppAttempts:506 » NullPointer TestAppManager.testRMAppSubmit:463 » NullPointer TestClientRMService.testAppSubmit:859 » NullPointer TestClientRMService.testGetApplications:959 » NullPointer TestClientRMService.testConcurrentAppSubmit:1115 » test timed out after 4000 ... Could you look at them? RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.02.patch, YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
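For readers following along, a rough sketch of the grab-once idea the JIRA describes (simplified signatures; the real change lives in RMServerUtils/SchedulerUtils and differs in detail):
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

public final class ValidateOnceSketch {
  private ValidateOnceSketch() {
  }

  // Resolve the queue once per validate call and hand the same QueueInfo to every
  // request, instead of rebuilding it inside the per-request loop.
  public static void validateResourceRequests(List<ResourceRequest> asks,
      Resource maximumResource, String queueName, YarnScheduler scheduler)
      throws InvalidResourceRequestException {
    QueueInfo queueInfo = null;
    try {
      queueInfo = scheduler.getQueueInfo(queueName, false, false); // fetched once
    } catch (IOException e) {
      // per-request validation falls back to defaults when queueInfo is null
    }
    for (ResourceRequest ask : asks) {
      validateSingleRequest(ask, maximumResource, queueInfo);      // reused for every ask
    }
  }

  private static void validateSingleRequest(ResourceRequest ask,
      Resource maximumResource, QueueInfo queueInfo)
      throws InvalidResourceRequestException {
    // stand-in for the existing per-request checks (capability vs. maximumResource,
    // node-label expression vs. queueInfo, and so on)
  }
}
{code}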
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: YARN-2876.v2.patch In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: (was: YARN-2876.v2.patch) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548485#comment-14548485 ] Wangda Tan commented on YARN-3565: -- [~Naganarasimha], thanks for updating, mostly looks good, two minor comments: Changes in NodeStatusUpdaterImpl: - convertToNodeLabelSet could be removed - {{+ (nodeLabels));}} this line change is not necessary? NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch Now NM HB/Register uses SetString, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548492#comment-14548492 ] Bikas Saha commented on YARN-1902: -- An alternate approach that we tried in Apache Tez is to wrap a TaskScheduler around the AMRMClient that would take requests from the application and do the matching internally. Since it would know the matching, it could automatically remove the matched requests also. (It still does not remove the race condition, but it is cleaner wrt the user as an API.) The TaskScheduler was written to be independent of Tez code so that we could contribute it to YARN as a library; however, we did not find time to do so. Now that code has evolved quite a bit, but the original, well-tested code could still be extracted from the Tez 0.1 branch and contributed to YARN if someone is interested in doing that work. Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Assignee: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) are requested in both scenarios, but that only in the second scenario, the correct behavior is observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548496#comment-14548496 ] Hadoop QA commented on YARN-3069: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 28s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 59s | Site still builds. | | {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 13s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 70m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733585/YARN-3069.009.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / 060c84e | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/7975/artifact/patchprocess/diffJavacWarnings.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7975/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7975/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7975/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7975/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7975/console | This message was automatically generated. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. 
for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548480#comment-14548480 ] MENG DING commented on YARN-1902: - Thanks [~bikassaha] and [~vinodkv] for the education and background info. Really helpful. I can now appreciate that there is not a straightforward solution to this problem. Originally I was coming from a pure user experience point of view, where I was thinking that if I ever want to use removeContainerRequest, it should only be because that I need to cancel previous add requests. Yes I may still get the number of containers from the previous requests, but that is understandable. However, I would have never thought that I still need to do removeContainerRequest to remove requests of matched containers in order to make the internal bookkeeping of AMRMClient correct. Why should a user worry about these things? After reading the comments, I start to think that even if we were able to figure out which ResourceRequest to deduct from and automatically deduct it at the Client, it still won't solve race condition 1 (i.e., allocated containers are sitting in RM). So rather than changing the client, can we not do something at the RM side? For example, in AppSchedulingInfo: 1. Maintain a table for total request *only*. The updateResourceRequests() call will update this table to reflect the total requests from the client (matching the client side remoteRequestsTable). 2. Maintain a table for requests that have been satisfied. Every time a successful allocation is made for this application, this table is updated. 3. The difference between table 1 and table 2 will be the outstanding resource requests. This table is updated at every updateResourceRequests() and every successful allocation. Of course proper synchronization needs to be taken care of. 4. The scheduling will be made based on the table 3 (i.e., the outstanding request table). Do you think if this is something worth considering? Thanks a lot in advance. Meng Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Assignee: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) are requested in both scenarios, but that only in the second scenario, the correct behavior is observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of MapResource, ResourceRequestInfo is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. 
The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
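To make the outstanding-request bookkeeping sketched in the comment above concrete, a toy illustration (all names are invented; the real AppSchedulingInfo keys requests on priority/resource-name/capability rather than a single string):
{code}
import java.util.HashMap;
import java.util.Map;

// Keep the client's latest total ask and the total satisfied so far per ask key,
// and schedule only against the difference.
public class OutstandingRequestTracker {
  private final Map<String, Integer> totalAsked = new HashMap<String, Integer>();
  private final Map<String, Integer> satisfied = new HashMap<String, Integer>();

  public synchronized void updateAsk(String askKey, int numContainers) {
    totalAsked.put(askKey, numContainers);              // table 1: total requested
  }

  public synchronized void recordAllocation(String askKey) {
    Integer done = satisfied.get(askKey);
    satisfied.put(askKey, done == null ? 1 : done + 1); // table 2: satisfied so far
  }

  public synchronized int outstanding(String askKey) {
    Integer asked = totalAsked.get(askKey);
    Integer done = satisfied.get(askKey);
    int a = asked == null ? 0 : asked;
    int d = done == null ? 0 : done;
    return Math.max(0, a - d);                          // table 3: what is still owed
  }
}
{code}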
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548510#comment-14548510 ] Wangda Tan commented on YARN-3583: -- [~sunilg], Thanks for updating, mostly looks good, 2 nits: yarn_server_resourcemanager_service_proto: - NodeIdToLabelsProto -> NodeIdToLabelsNameProto, otherwise people will confuse this with NodeIdToLabelsInfoProto - A couple of asserts in tests like {{Assert.assertTrue(y.isExclusive() == true)}} could be changed to {{Assert.assertTrue/False(y.isExclusive())}} Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side APIs. getLabelsToNodes/getNodeToLabels APIs can use NodeLabel objects instead of plain label names. This will help to bring other label details such as Exclusivity to the client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3645) ResourceManager can't start successfully if attribute value of aclSubmitApps is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548516#comment-14548516 ] Gabor Liptak commented on YARN-3645: Maybe we should extract the functionality into a helper method and use it for all lookup instances? Thanks ResourceManager can't start successfully if attribute value of aclSubmitApps is null in fair-scheduler.xml Key: YARN-3645 URL: https://issues.apache.org/jira/browse/YARN-3645 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin The aclSubmitApps is configured in fair-scheduler.xml like below: <queue name="mr"> <aclSubmitApps></aclSubmitApps> </queue> The resourcemanager log: 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) Caused by: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) ... 
9 more 2015-05-14 12:59:48,623 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state 2015-05-14 12:59:48,623 INFO com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin transitionToStandbyIn 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When stopping the service ResourceManager : java.lang.NullPointerException java.lang.NullPointerException at com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) 2015-05-14 12:59:48,623 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at
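A sketch of the kind of shared helper hinted at in the comment above (names invented; the actual fix would live in AllocationFileLoaderService): read an element's text null-safely so an empty <aclSubmitApps></aclSubmitApps> no longer leads to the NullPointerException shown in the log.
{code}
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Null-tolerant text extraction for configuration elements; an element with no text
// child yields the supplied default instead of an NPE.
public final class XmlText {
  private XmlText() {
  }

  public static String textOrDefault(Element field, String defaultValue) {
    if (field == null) {
      return defaultValue;
    }
    Node child = field.getFirstChild();              // null for an empty element
    if (child == null || child.getNodeValue() == null) {
      return defaultValue;
    }
    return child.getNodeValue().trim();
  }
}
{code}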
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v6.patch Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2884: - Assignee: Kishore Chaliparambil (was: Subru Krishnan) Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions 2) throttle misbehaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3671) Integrate Federation services with ResourceManager
Subru Krishnan created YARN-3671: Summary: Integrate Federation services with ResourceManager Key: YARN-3671 URL: https://issues.apache.org/jira/browse/YARN-3671 Project: Hadoop YARN Issue Type: Sub-task Reporter: Subru Krishnan Assignee: Subru Krishnan This JIRA proposes adding the ability to turn on Federation services like StateStore, cluster membership heartbeat etc in the RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3673) Create a FailoverProxy for Federation services
Subru Krishnan created YARN-3673: Summary: Create a FailoverProxy for Federation services Key: YARN-3673 URL: https://issues.apache.org/jira/browse/YARN-3673 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan This JIRA proposes creating a facade for Federation State and Policy Store to simplify access and have a common place for cache management etc. that can be used by both the Router and the AMRMProxy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: (was: YARN-1945.v6.patch) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Kumar Naik updated YARN-3302: -- Attachment: YARN-3302-trunk.003.patch fixed whitespaces in patch TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan reassigned YARN-2884: Assignee: Subru Krishnan Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Subru Krishnan We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions 2) throttle misbehaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3672) Create Facade for Federation State and Policy Store
Subru Krishnan created YARN-3672: Summary: Create Facade for Federation State and Policy Store Key: YARN-3672 URL: https://issues.apache.org/jira/browse/YARN-3672 Project: Hadoop YARN Issue Type: Sub-task Reporter: Subru Krishnan Assignee: Subru Krishnan This JIRA proposes creating a facade for Federation State and Policy Store to simplify access and have a common place for cache management etc. that can be used by both the Router and the AMRMProxy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3673) Create a FailoverProxy for Federation services
[ https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3673: - Description: This JIRA proposes creating a failover proxy for Federation based on the cluster membership information in the StateStore that can be used by both the Router and the AMRMProxy (was: This JIRA proposes creating a facade for Federation State and Policy Store to simply access and have a common place for cache management etc that can be used by both Router AMRMProxy) Create a FailoverProxy for Federation services -- Key: YARN-3673 URL: https://issues.apache.org/jira/browse/YARN-3673 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan This JIRA proposes creating a failover proxy for Federation based on the cluster membership information in the StateStore that can be used by both the Router and the AMRMProxy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548623#comment-14548623 ] Hadoop QA commented on YARN-3583: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 29s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 29s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 106m 46s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 1s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 30s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 50m 23s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 212m 33s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.mapred.TestJobSysDirWithDFS | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733569/0003-YARN-3583.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 182d86d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7968/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7968/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7968/console | This message was automatically generated. Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549099#comment-14549099 ] Jian He commented on YARN-2821: --- thanks Varun ! looks good overall. test case - for below, we need to test that if AM receives any unknown completed container, the numCompletedContainers still equals to the numTotalContainers {code} // ignore containers we know nothing about - probably from a previous // attempt if (!launchedContainers.contains(containerStatus.getContainerId())) { LOG.info(Ignoring completed status of + containerStatus.getContainerId() + ; unknown container(probably launched by previous attempt)); continue; } {code} Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new 
container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549116#comment-14549116 ] Xuan Gong commented on YARN-3541: - +1 LGTM. Will commit Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549238#comment-14549238 ] Hadoop QA commented on YARN-2876: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 19s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 2s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 87m 7s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733594/YARN-2876.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / cdfae44 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7976/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7976/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7976/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7976/console | This message was automatically generated. In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3647) RMWebServices APIs should use the updated API from CommonNodeLabelsManager to get NodeLabel objects
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549378#comment-14549378 ] Wangda Tan commented on YARN-3647: -- [~sunilg], thanks for working on this: 1) getLabelsInfoOnNode in RMNodeLabelsManager is not actually needed; you can use CommonNodeLabelsManager.getLabelsByNode instead. Making it public and adding a read lock should be enough, right? 2) Tests of RMWebServices: I suggest adding tests to make sure all getters that return NodeLabel carry the proper exclusivity from NodeLabelsManager, to avoid future regressions. You can set different label properties, e.g. x.exclusive=true and y.exclusive=false, and check them in the test. RMWebServices APIs should use the updated API from CommonNodeLabelsManager to get NodeLabel objects --- Key: YARN-3647 URL: https://issues.apache.org/jira/browse/YARN-3647 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3647.patch After YARN-3579, RMWebServices APIs can use the updated version of the APIs in CommonNodeLabelsManager, which gives the full NodeLabel object instead of creating a NodeLabel object from the plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
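A hedged sketch of the regression test suggested above, with the REST response stubbed inline; the JSON field names ("nodeLabels", "name", "exclusivity") and the fixture are assumptions for illustration, not the actual RMWebServices schema or test helpers.
{code}
import org.codehaus.jettison.json.JSONArray;
import org.codehaus.jettison.json.JSONObject;
import org.junit.Assert;
import org.junit.Test;

public class TestNodeLabelExclusivityExposed {

  // In the real test this body would come from the RM web services endpoint
  // after adding label x as exclusive and label y as non-exclusive.
  private String fetchNodeLabelsJson() {
    return "{\"nodeLabels\":[{\"name\":\"x\",\"exclusivity\":true},"
         + "{\"name\":\"y\",\"exclusivity\":false}]}";
  }

  @Test
  public void testExclusivityIsExposed() throws Exception {
    JSONObject response = new JSONObject(fetchNodeLabelsJson());
    JSONArray labels = response.getJSONArray("nodeLabels");
    for (int i = 0; i < labels.length(); i++) {
      JSONObject label = labels.getJSONObject(i);
      if ("x".equals(label.getString("name"))) {
        Assert.assertTrue(label.getBoolean("exclusivity"));
      } else if ("y".equals(label.getString("name"))) {
        Assert.assertFalse(label.getBoolean("exclusivity"));
      }
    }
  }
}
{code}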
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549152#comment-14549152 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-trunk-Commit #7856 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7856/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: YARN-3069.010.patch - Fix whitespace and deprecation warnings Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class 
yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: (was: YARN-3069.010.patch) Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3668) Long-running service shouldn't be killed even if YARN crashed
[ https://issues.apache.org/jira/browse/YARN-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549218#comment-14549218 ] Steve Loughran commented on YARN-3668: -- [~sandflee] : I know you are using something else, I was just describing what we do to deal with failures. If it is purely AM failure you care about, then setting the restart bit at launch time is enough for YARN to bring things back. If the AM fails too many times in the failure window then the app will fail, for which there is one fix: don't fail as often. I'd actually like a failure code to tell YARN to restart us without counting it as a failure; this would help us do live updates more safely. Long-running service shouldn't be killed even if YARN crashed - Key: YARN-3668 URL: https://issues.apache.org/jira/browse/YARN-3668 Project: Hadoop YARN Issue Type: Wish Reporter: sandflee A long-running service shouldn't be killed even if all YARN components crash; with RM work-preserving restart and NM restart, YARN could take over the applications again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
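For reference, a minimal sketch of the submission-time settings Steve is alluding to: letting YARN relaunch the AM, keeping containers across AM attempts, and using the failure-validity window so old failures age out. The values are illustrative, not recommendations.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class LongRunningSubmission {
  public static ApplicationSubmissionContext newContext() {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // The "restart bit": allow YARN to relaunch the AM on failure.
    ctx.setMaxAppAttempts(10);
    // Work-preserving AM restart: don't kill running containers when the AM dies.
    ctx.setKeepContainersAcrossApplicationAttempts(true);
    // Only count AM failures that happen inside a 10-minute sliding window.
    ctx.setAttemptFailuresValidityInterval(10 * 60 * 1000L);
    return ctx;
  }
}
{code}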
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: -- Attachment: YARN-2876.v3.patch In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, YARN-2876.v3.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
Anubhav Dhoot created YARN-3675: --- Summary: FairScheduler: RM quits when node removal races with continuous scheduling on the same node Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot With continuous scheduling, scheduling can be done on a node that's just been removed, causing errors like the one below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549127#comment-14549127 ] Hadoop QA commented on YARN-3302: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 22m 53s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733601/YARN-3302-trunk.003.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / cdfae44 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7978/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7978/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7978/console | This message was automatically generated. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place --- Key: YARN-3302 URL: https://issues.apache.org/jira/browse/YARN-3302 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Ravi Prakash Assignee: Ravindra Kumar Naik Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, YARN-3302-trunk.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549138#comment-14549138 ] Xuan Gong commented on YARN-3541: - Committed into trunk/branch-2. Thanks, zhijie Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Attachment: YARN-1735.v3.patch For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch, YARN-1735.v3.patch In monitoring graphs, the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct, since AvailableMB should never be more than the queue's max allocation. The spikes are quite confusing, since the availableMB is set as the fair share of each queue and the fair share of each queue is bounded by its allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful; availableMB for each queue should be its allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
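A tiny sketch of the metric as the reporter proposes it, i.e. the queue's allowed maximum minus what it has already allocated (clamped at zero); the class and method names are illustrative only.
{code}
public final class QueueAvailableMB {
  private QueueAvailableMB() { }

  /** AvailableMB as proposed: allowed max resource minus allocated, never negative. */
  public static long availableMB(long queueMaxMB, long queueAllocatedMB) {
    return Math.max(0L, queueMaxMB - queueAllocatedMB);
  }

  public static void main(String[] args) {
    // Example: queue max 8192 MB, 6144 MB already allocated -> 2048 MB available.
    System.out.println(availableMB(8192L, 6144L));
  }
}
{code}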
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549314#comment-14549314 ] Hadoop QA commented on YARN-3069: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 57s | Site still builds. | | {color:green}+1{color} | checkstyle | 1m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 1s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 8s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | | | 70m 50s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733618/YARN-3069.010.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7979/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7979/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7979/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7979/console | This message was automatically generated. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. 
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: YARN-3069.010.patch - Leave out MR bits from previous patch. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class 
yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549417#comment-14549417 ] Hadoop QA commented on YARN-1735: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 45s | The applied patch generated 3 new checkstyle issues (total was 129, now 132). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 19s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 1s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 86m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733625/YARN-1735.v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7980/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7980/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7980/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7980/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7980/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7980/console | This message was automatically generated. 
For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch, YARN-1735.v3.patch in monitoring graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the queue max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bond by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549249#comment-14549249 ] Hadoop QA commented on YARN-1945: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 13s | The applied patch generated 2 new checkstyle issues (total was 214, now 216). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 40s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 49m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 40s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733599/YARN-1945.v6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / cdfae44 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7977/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7977/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7977/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7977/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7977/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7977/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7977/console | This message was automatically generated. 
Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549459#comment-14549459 ] Hadoop QA commented on YARN-2876: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 45s | The applied patch generated 2 new checkstyle issues (total was 17, now 19). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 19s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 2s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 87m 11s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733632/YARN-2876.v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7981/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7981/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7981/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7981/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7981/console | This message was automatically generated. 
In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, YARN-2876.v3.patch, screenshot-1.png If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549460#comment-14549460 ] Jian He commented on YARN-3632: --- - The current reorder implementation in containerReleased and containerAllocated is triggered for every single container completed or allocated. This results in a time complexity of {code} (#containersCompleted + #containersReleased) * #appsOnNode * log(#appsInQueue) {code} on every node heartbeat. We can improve this by reordering the app only after processing all of its containers, which removes the {code} (#containersCompleted + #containersReleased) {code} factor. - This null check is not needed if it can never be null: {code} if (updateDemandForQueue != null) { {code} Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes, if that is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
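A hedged sketch of the batching idea: apply all of an app's container updates first, then reinsert the app into the ordered set once per heartbeat, so the (#containersCompleted + #containersReleased) factor disappears from the per-heartbeat cost. The types below are simplified stand-ins, not the actual OrderingPolicy API.
{code}
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Simplified stand-in for a schedulable app tracked by an ordering policy.
class App {
  final String id;
  long demand;   // the attribute the comparator depends on
  App(String id, long demand) { this.id = id; this.demand = demand; }
}

public class BatchedReorder {
  private final TreeSet<App> ordered = new TreeSet<>(
      Comparator.comparingLong((App a) -> a.demand).thenComparing(a -> a.id));

  public void addApp(App app) { ordered.add(app); }

  /** Process every container event for one app, then reorder that app once. */
  public void onNodeHeartbeat(App app, List<Long> demandDeltas) {
    ordered.remove(app);               // take the app out of the ordered set once
    for (long delta : demandDeltas) {
      app.demand += delta;             // apply all completed/allocated updates
    }
    ordered.add(app);                  // re-insert once: O(log #appsInQueue)
  }
}
{code}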
[jira] [Updated] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1735: -- Attachment: YARN-1735.v4.patch For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch, YARN-1735.v3.patch, YARN-1735.v4.patch in monitoring graphs the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct since AvailableMB should never be more than the queue max allocation. The spikes are quite confusing since the availableMB is set as the fair share of each queue and the fair share of each queue is bond by their allowed max resource. Other than the spiking, the availableMB is always equal to allocatedMB. I think this is not very useful, availableMB for each queue should be their allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1945: -- Attachment: YARN-1945.v7.patch Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml --- Key: YARN-1945 URL: https://issues.apache.org/jira/browse/YARN-1945 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.3.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch, YARN-1945.v7.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549725#comment-14549725 ] Hadoop QA commented on YARN-3411: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 15s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 15s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733693/YARN-3411-YARN-2928.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7990/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7990/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7990/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549692#comment-14549692 ] Sangjin Lee commented on YARN-3411: --- It seems like the line that adds the HBase configuration was removed from HBaseTimelineWriterImpl. Is that intentional? How would it be able to load and use the HBase configuration then? Come to think of it, I think we may need to add both hbase-site.xml and hbase-default.xml? {code} conf.addResource("hbase-default.xml"); conf.addResource("hbase-site.xml"); {code} [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549776#comment-14549776 ] Sangjin Lee commented on YARN-3411: --- Or, better: {code} Configuration hbaseConf = HBaseConfiguration.create(conf); {code} [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
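For illustration, a minimal sketch of how HBaseConfiguration.create(conf) could be wired into the writer's serviceInit; the connection field and how it is created are assumptions here, not the actual HBaseTimelineWriterImpl code:
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  super.serviceInit(conf);
  // Layers hbase-default.xml and hbase-site.xml from the classpath on top of
  // the supplied YARN configuration, instead of calling addResource by hand.
  Configuration hbaseConf = HBaseConfiguration.create(conf);
  // Assumption: the writer keeps an HBase Connection built from that conf.
  this.connection = ConnectionFactory.createConnection(hbaseConf);
}
{code}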
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411-YARN-2928.005.patch [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549601#comment-14549601 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 2s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 46s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 19s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 14s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 37m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733677/YARN-3411-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 463e070 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7986/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7986/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3676) Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests
Arun Suresh created YARN-3676: - Summary: Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests Key: YARN-3676 URL: https://issues.apache.org/jira/browse/YARN-3676 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh AssignMultiple is generally set to false to prevent overloading a node (e.g., new NMs that have just joined). A possible scheduling optimization would be to disregard this directive for apps whose allowed locality is NODE_LOCAL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
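Roughly, the proposed check could look like the sketch below; the method and the locality accessor are illustrative stand-ins, not the actual FairScheduler code paths:
{code}
// Illustrative only: keep assigning on this node when the app is restricted
// to NODE_LOCAL requests, even though assignMultiple is false.
private boolean shouldContinueAssigning(FSAppAttempt app,
    boolean assignMultiple, int containersAssignedThisHeartbeat) {
  if (assignMultiple) {
    return true;
  }
  // Hypothetical accessor; the real allowed-locality state lives in the
  // delay-scheduling logic of FSAppAttempt.
  if (app.getAllowedLocalityLevel() == NodeType.NODE_LOCAL) {
    return true;
  }
  return containersAssignedThisHeartbeat == 0;
}
{code}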
[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549614#comment-14549614 ] Hadoop QA commented on YARN-1735: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 1 new checkstyle issues (total was 130, now 131). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 20s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 8s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 43s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733665/YARN-1735.v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7984/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7984/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7984/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7984/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7984/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7984/console | This message was automatically generated. 
For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB --- Key: YARN-1735 URL: https://issues.apache.org/jira/browse/YARN-1735 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Siqi Li Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch, YARN-1735.v3.patch, YARN-1735.v4.patch In monitoring graphs, the AvailableMB of each queue regularly spikes between the AllocatedMB and the entire cluster capacity. This cannot be correct, since AvailableMB should never be more than the queue's max allocation. The spikes are quite confusing, since availableMB is set to the fair share of each queue and the fair share of each queue is bounded by its allowed max resource. Other than the spiking, availableMB is always equal to allocatedMB. I think this is not very useful; availableMB for each queue should be its allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
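A small sketch of the computation being proposed, assuming QueueMetrics keeps its current setter; the local variable names are illustrative:
{code}
// Proposed: available = the queue's allowed max resource minus what is
// already allocated, floored at zero, instead of reporting the fair share.
int maxMB = queueMaxShare.getMemory();        // queue's allowed max, in MB
int allocatedMB = metrics.getAllocatedMB();   // what the queue has allocated
int availableMB = Math.max(0, maxMB - allocatedMB);
metrics.setAvailableResourcesToQueue(
    Resource.newInstance(availableMB, availableVCores));
{code}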
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549631#comment-14549631 ] Vrushali C commented on YARN-3411: -- Updating the code now to fix the javadoc warning and [~sjlee0]'s review suggestion. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3518) default rm/am expire interval should not be less than default resourcemanager connect wait time
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-3518: --- Attachment: YARN-3518.002.patch Replace RESOURCEMANAGER_CONNECT_MAX_WAIT_MS with RESOURCETRACKER_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, APPLICATIONMASTER_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS and APPLICATIONCLIENT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS. default rm/am expire interval should not be less than default resourcemanager connect wait time Key: YARN-3518 URL: https://issues.apache.org/jira/browse/YARN-3518 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: sandflee Assignee: sandflee Labels: BB2015-05-TBR, configuration, newbie Attachments: YARN-3518.001.patch, YARN-3518.002.patch Take the AM for example: if the AM can't connect to the RM, then after the AM expiry interval (600s) the RM relaunches the AM, and there will be two AMs running at the same time until the ResourceManager connect max wait time (900s) has passed. DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000; DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 60; DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 60; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
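A sketch of what the split constants from the 002 patch might look like in YarnConfiguration; only the constant names come from the comment above, the property-name strings are illustrative assumptions:
{code}
// Illustrative property names; only the constant names are from the patch description.
public static final String RESOURCETRACKER_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS =
    RM_PREFIX + "resourcetracker.connect.max-wait.ms";
public static final String APPLICATIONMASTER_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS =
    RM_PREFIX + "am.connect.max-wait.ms";
public static final String APPLICATIONCLIENT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS =
    RM_PREFIX + "client.connect.max-wait.ms";
{code}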
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549505#comment-14549505 ] Chris Douglas commented on YARN-1039: - The semantics of a boolean flag are opaque. The policies enforced by different RM configurations (and versions) will not be, and cannot be made to be, consistent. Application and container priority are already encoded (or in progress, YARN-1963), so it's not just preemption priority or cost. Affinity and anti-affinity are also covered by different features. Discussion has been wide-ranging because it is unclear what 'long-lived' guarantees across existing features (beyond removing the progress bar from the UI, which I hope we can stop mentioning). An implementation that only recognizes infinite and undefined leases could be mapped into duration. Lease duration could also be used to communicate when security tokens cannot be renewed, short-lived guarantees for YARN-2877 containers, boundaries of YARN-1051 reservations, and planned decommissioning. In contrast, the long-lived flag cannot be used for these cases. We could expose probabilistic guarantees (which are what we give in reality), but that's a later issue. Considering the blockers more concretely: bq. (a) reservations (b) white-listed requests or (c) node-label requests getting stuck on a node used by other services' containers that don't exit. Aren't these handled by adding a timeout to allocations, which would also catch cases where this flag is _not_ set? The timeout value could be set across the scheduler to start, but could even be user-visible in later versions... All said, I don't have time to work on this, agree the API can be evolved from the flag, and am -0 on it. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, 'long-lived'. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
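Purely as an illustration of the duration-oriented alternative argued for above, and not an existing or proposed API, a lease duration on the request could subsume the boolean flag:
{code}
// Hypothetical field: Long.MAX_VALUE approximates "long-lived", a finite
// value bounds the allocation, and leaving it unset keeps today's behaviour.
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1), ResourceRequest.ANY,
    Resource.newInstance(1024, 1), 1);
// req.setLeaseDurationMs(Long.MAX_VALUE);  // hypothetical setter, not in the current API
{code}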