[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219089#comment-14219089 ] Devaraj K commented on YARN-2356: - Thanks Sunil for the updated patch. 1. bq. I could see that this return of exit code from each of the printXXXReport() was causing nested if in the caller side, and was becoming less readable. Also killApplication is already rethrowing exception and handling similar way.. Kindly share your thoughts on this. Either way is fine for me. I thought it would be good if we avoided rehandling the same exception. 2. For the tests, can you remove the try-catch completely, as below? If the test code throws any exception, the test fails anyway, so there is no need to fail it again explicitly. {code:java} @Test public void testGetContainerReportException() throws Exception { ApplicationCLI cli = createAndGetAppCLI(); ApplicationId applicationId = ApplicationId.newInstance(1234, 5); ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance( applicationId, 1); long cntId = 1; ContainerId containerId1 = ContainerId.newContainerId(attemptId, cntId++); when(client.getContainerReport(containerId1)).thenThrow( new ApplicationNotFoundException("History file for application" + applicationId + " is not found")); int exitCode = cli.run(new String[] { "container", "-status", containerId1.toString() }); verify(sysOut).println( "Application for Container with id '" + containerId1 + "' doesn't exist in RM or Timeline Server."); Assert.assertNotSame("should return non-zero exit code.", 0, exitCode); ContainerId containerId2 = ContainerId.newContainerId(attemptId, cntId++); {code} 3. The patch does not apply with the 'patch' command; can you check this for the next patch? > yarn status command for non-existent application/application > attempt/container is too verbose > -- > > Key: YARN-2356 > URL: https://issues.apache.org/jira/browse/YARN-2356 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: 0001-YARN-2356.patch, Yarn-2356.1.patch > > > *yarn application -status* or *applicationattempt -status* or *container > status* commands should suppress exceptions such as ApplicationNotFound, > ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in > RM or History Server. > For example, the exception below could be suppressed better: > sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status > application_1402668848165_0015 > No GC_PROFILE is given. Defaults to medium. > 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at > /10.18.40.77:45022 > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1402668848165_0015' doesn't exist in RM. 
> at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocol
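For illustration, a rough sketch of the concise handling discussed in the YARN-2356 comment above (not the attached patch): catch the not-found exception in the CLI handler, print a one-line message, and return a non-zero exit code instead of letting the full stack trace reach the console. The class name StatusCommandSketch and the wiring around the YarnClient are hypothetical.
{code:java}
import java.io.IOException;
import java.io.PrintStream;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;

class StatusCommandSketch {
  private final YarnClient client;               // assumed already inited and started
  private final PrintStream sysout = System.out;

  StatusCommandSketch(YarnClient client) {
    this.client = client;
  }

  int printApplicationReport(String applicationId)
      throws YarnException, IOException {
    ApplicationId appId = ConverterUtils.toApplicationId(applicationId);
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      sysout.println("Application-Id : " + report.getApplicationId());
      // ... remaining report fields printed as before ...
      return 0;
    } catch (ApplicationNotFoundException e) {
      // concise message instead of the full stack trace shown in the description
      sysout.println("Application with id '" + applicationId
          + "' doesn't exist in RM or Timeline Server.");
      return -1;
    }
  }
}
{code}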
[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219090#comment-14219090 ] zhihai xu commented on YARN-2679: - Hi [~kasha], Good suggestion, I uploaded a new patch YARN-2679.001.patch to address your comment. thanks zhihai > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2679: Attachment: YARN-2679.001.patch > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch, YARN-2679.001.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219019#comment-14219019 ] Sunil G commented on YARN-1963: --- Thank you, [~leftnoteasy]. bq. I'd prefer highest + default priority. This configuration will make it easier for admins to config the same. I am still not convinced about accepting lower priorities by default. But I also do not see any use case where these lower priorities are a problem. Yes, we can have this as highest + default (this one I already have). Instead of labels per queue, it will be changed to highest per queue. I will update the doc and my patch accordingly. bq. extra complexity both in implementation and configuration I agree about the more complicated config and implementation for this part. As you mentioned, if a preemption feature related to YARN-2069 runs in parallel, then the issue which I pointed out can be solved. So if user-limit-factor preemption also considers priority, we can get the headroom that is needed. The user has to enable this preemption, though. If this workaround is fine for resolving the issue mentioned, then I will file a JIRA to relate priority with user-limit preemption. Kindly share your thoughts. bq. I didn't see any related code in YarnClient Yes, this code is now in YARNRunner, which is part of MapReduce. I wanted to see it in YarnClient. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: YARN Application Priorities Design.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219009#comment-14219009 ] Rohith commented on YARN-2865: -- Thanks Karthik JianHe and Tsuyoshi for your reviews. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218986#comment-14218986 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-trunk-Commit #6579 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6579/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > FairScheduler: Set current capacity in addition to capacity > --- > > Key: YARN-2315 > URL: https://issues.apache.org/jira/browse/YARN-2315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2315.001.patch, YARN-2315.002.patch, > YARN-2315.003.patch, YARN-2315.patch > > > Should use setCurrentCapacity instead of setCapacity to configure used > resource capacity for FairScheduler. > In function getQueueInfo of FSQueue.java, we call setCapacity twice with > different parameters so the first call is overrode by the second call. > queueInfo.setCapacity((float) getFairShare().getMemory() / > scheduler.getClusterResource().getMemory()); > queueInfo.setCapacity((float) getResourceUsage().getMemory() / > scheduler.getClusterResource().getMemory()); > We should change the second setCapacity call to setCurrentCapacity to > configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
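For reference, a minimal sketch of the fix described in this issue (the committed patch may differ in details): keep the fair-share value in capacity and report usage through currentCapacity, so the first call is no longer overwritten.
{code:java}
// Sketch of FSQueue#getQueueInfo after the change described above (illustrative only).
queueInfo.setCapacity((float) getFairShare().getMemory()
    / scheduler.getClusterResource().getMemory());
// previously a second setCapacity() call overwrote the value above;
// the used capacity now goes into currentCapacity instead
queueInfo.setCurrentCapacity((float) getResourceUsage().getMemory()
    / scheduler.getClusterResource().getMemory());
{code}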
[jira] [Updated] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2315: --- Summary: FairScheduler: Set current capacity in addition to capacity (was: FairScheduler: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.) > FairScheduler: Set current capacity in addition to capacity > --- > > Key: YARN-2315 > URL: https://issues.apache.org/jira/browse/YARN-2315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2315.001.patch, YARN-2315.002.patch, > YARN-2315.003.patch, YARN-2315.patch > > > Should use setCurrentCapacity instead of setCapacity to configure used > resource capacity for FairScheduler. > In function getQueueInfo of FSQueue.java, we call setCapacity twice with > different parameters so the first call is overrode by the second call. > queueInfo.setCapacity((float) getFairShare().getMemory() / > scheduler.getClusterResource().getMemory()); > queueInfo.setCapacity((float) getResourceUsage().getMemory() / > scheduler.getClusterResource().getMemory()); > We should change the second setCapacity call to setCurrentCapacity to > configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2315) FairScheduler: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2315: --- Summary: FairScheduler: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. (was: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.) > FairScheduler: Should use setCurrentCapacity instead of setCapacity to > configure used resource capacity for FairScheduler. > -- > > Key: YARN-2315 > URL: https://issues.apache.org/jira/browse/YARN-2315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2315.001.patch, YARN-2315.002.patch, > YARN-2315.003.patch, YARN-2315.patch > > > Should use setCurrentCapacity instead of setCapacity to configure used > resource capacity for FairScheduler. > In function getQueueInfo of FSQueue.java, we call setCapacity twice with > different parameters so the first call is overrode by the second call. > queueInfo.setCapacity((float) getFairShare().getMemory() / > scheduler.getClusterResource().getMemory()); > queueInfo.setCapacity((float) getResourceUsage().getMemory() / > scheduler.getClusterResource().getMemory()); > We should change the second setCapacity call to setCurrentCapacity to > configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218977#comment-14218977 ] Karthik Kambatla commented on YARN-2315: +1. > Should use setCurrentCapacity instead of setCapacity to configure used > resource capacity for FairScheduler. > --- > > Key: YARN-2315 > URL: https://issues.apache.org/jira/browse/YARN-2315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2315.001.patch, YARN-2315.002.patch, > YARN-2315.003.patch, YARN-2315.patch > > > Should use setCurrentCapacity instead of setCapacity to configure used > resource capacity for FairScheduler. > In function getQueueInfo of FSQueue.java, we call setCapacity twice with > different parameters so the first call is overrode by the second call. > queueInfo.setCapacity((float) getFairShare().getMemory() / > scheduler.getClusterResource().getMemory()); > queueInfo.setCapacity((float) getResourceUsage().getMemory() / > scheduler.getClusterResource().getMemory()); > We should change the second setCapacity call to setCurrentCapacity to > configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218970#comment-14218970 ] Karthik Kambatla commented on YARN-2679: How about renaming the metric to {{containerLaunchDuration}} and updating the method names accordingly? > add container launch prepare time metrics to NM. > > > Key: YARN-2679 > URL: https://issues.apache.org/jira/browse/YARN-2679 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2679.000.patch > > > add metrics in NodeManagerMetrics to get prepare time to launch container. > The prepare time is the duration between sending > ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving > ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
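To make the rename concrete, a rough sketch of what such a metric could look like in NodeManagerMetrics, using the containerLaunchDuration name proposed above; the class shown is a stand-in, not the attached patch, and the field is instantiated by the metrics system when the source is registered.
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Metrics for node manager", context = "yarn")
class NodeManagerMetricsSketch {
  // duration between queuing LAUNCH_CONTAINER and observing CONTAINER_LAUNCHED
  @Metric("Container launch duration")
  MutableRate containerLaunchDuration;

  public void addContainerLaunchDuration(long value) {
    containerLaunchDuration.add(value);
  }
}
{code}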
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218969#comment-14218969 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-trunk-Commit #6578 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6578/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java > ClusterMetrics to include AM launch and register delays > --- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.7.0 > > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, > YARN-2802.005.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
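A rough sketch of the two delay metrics described in this issue (the field names come from the description, the rest is assumed; the committed patch may differ): MutableRate gauges that RMAppAttemptImpl feeds when the corresponding events are handled.
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Cluster metrics", context = "yarn")
class ClusterMetricsSketch {
  // time from AMLauncherEventType.LAUNCH to RMAppAttemptEventType.LAUNCHED
  @Metric("AM container launch delay") MutableRate aMLaunchDelay;
  // time from RMAppAttemptEventType.LAUNCHED to RMAppAttemptEventType.REGISTERED
  @Metric("AM register delay") MutableRate aMRegisterDelay;

  void addAMLaunchDelay(long delayMillis)   { aMLaunchDelay.add(delayMillis); }
  void addAMRegisterDelay(long delayMillis) { aMRegisterDelay.add(delayMillis); }
}
{code}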
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218965#comment-14218965 ] Karthik Kambatla commented on YARN-2675: Can we add unit tests to exercise all the newly added transitions? Otherwise, the patch looks good. [~vinodkv] - do the changes look okay to you as well? > the containersKilled metrics is not updated when the container is killed > during localization. > - > > Key: YARN-2675 > URL: https://issues.apache.org/jira/browse/YARN-2675 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2675.000.patch, YARN-2675.001.patch, > YARN-2675.002.patch, YARN-2675.003.patch > > > The containersKilled metrics is not updated when the container is killed > during localization. We should add KILLING state in finished of > ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218961#comment-14218961 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-trunk-Commit #6577 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6577/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2802: --- Summary: ClusterMetrics to include AM launch and register delays (was: add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.) > ClusterMetrics to include AM launch and register delays > --- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, > YARN-2802.005.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218956#comment-14218956 ] Karthik Kambatla commented on YARN-2802: This should be very useful. The patch looks good. +1 > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, > YARN-2802.005.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2599: Assignee: Rohith > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Rohith > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2880: Assignee: Rohith > Add a test in TestRMRestart to make sure node labels will be recovered if it > is enabled > --- > > Key: YARN-2880 > URL: https://issues.apache.org/jira/browse/YARN-2880 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Rohith > > As suggested by [~ozawa], > [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. > We should have a such test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project
[ https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2784: - Fix Version/s: (was: 2.6.0) 2.7.0 > Yarn project module names in POM needs to consistent acros hadoop project > - > > Key: YARN-2784 > URL: https://issues.apache.org/jira/browse/YARN-2784 > Project: Hadoop YARN > Issue Type: Improvement > Components: scripts >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-2784.patch > > > All yarn and mapreduce pom.xml has project name has > hadoop-mapreduce/hadoop-yarn. This can be made consistent acros Hadoop > projects build like 'Apache Hadoop Yarn ' and 'Apache Hadoop > MapReduce ". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218883#comment-14218883 ] Subru Krishnan commented on YARN-2738: -- Thanks [~adhoot] for the updated patch. It looks mostly good. I don't see any way to override the defaults (_Agent_, _Admission Policy_, _Replanner_, etc.) at the system level; is this intentional? I understand that you do not want to allow configuring them per queue in the first iteration, but right now there is no option to override them even globally, as the defaults are hard-coded. Are you planning to file a separate JIRA for the _Plan Follower_ work, as that seems to be the last piece for enabling reservations in _FairScheduler_? :) > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch, > YARN-2738.003.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
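A sketch of the kind of global override being asked for above; the property key shown is hypothetical (today the defaults are hard-coded), and only the agent is shown for brevity.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.server.resourcemanager.reservation.GreedyReservationAgent;
import org.apache.hadoop.yarn.server.resourcemanager.reservation.ReservationAgent;

class ReservationAgentConfigSketch {
  static ReservationAgent createAgent(Configuration conf) {
    // hypothetical property key; currently the agent class cannot be overridden
    Class<? extends ReservationAgent> clazz = conf.getClass(
        "yarn.resourcemanager.reservation-system.agent.class",
        GreedyReservationAgent.class, ReservationAgent.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}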
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218855#comment-14218855 ] Karthik Kambatla commented on YARN-2604: Thanks for updating the patch. I should have made these comments on the earlier patch itself: # findbugs-exclude: there is a duplicate entry there. # Looking at the current code, I wonder whether all logic about having two different maximumAllocations should be limited to AbstractYarnScheduler to avoid mistakes in the future. We can make the related fields all private and accessible only through getter/setter methods. # {{updateMaxAllocation}} could take a {{Resource}} and a boolean to denote adding/removing a node, instead of SchedulerNode. That way, we don't have to iterate through all the nodes in the removeNode case. > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, > YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
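A rough sketch of the encapsulation suggested in points 2 and 3 above (class, field, and lock names are assumptions, not the actual patch): keep both maximum-allocation values private to the scheduler and expose only the effective value.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class SchedulerMaxAllocationSketch {
  private final Object maxAllocLock = new Object();
  private Resource configuredMaxAllocation;  // from yarn.scheduler.maximum-allocation-*
  private Resource largestNodeResource = Resource.newInstance(0, 0);

  // callers only ever see the effective maximum: the configured max clamped to the
  // largest node currently registered, so requests between the two cannot hang
  public Resource getMaximumResourceCapability() {
    synchronized (maxAllocLock) {
      return Resources.componentwiseMin(configuredMaxAllocation, largestNodeResource);
    }
  }

  // takes the node's Resource plus an added/removed flag, as suggested above
  protected void updateMaximumAllocation(Resource nodeResource, boolean added) {
    synchronized (maxAllocLock) {
      if (added) {
        largestNodeResource = Resource.newInstance(
            Math.max(largestNodeResource.getMemory(), nodeResource.getMemory()),
            Math.max(largestNodeResource.getVirtualCores(),
                nodeResource.getVirtualCores()));
      }
      // removal may require recomputing from the remaining nodes; omitted here
    }
  }
}
{code}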
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218794#comment-14218794 ] Jian He commented on YARN-2301: --- bq. the port will be node's http The NM can set up SSL, so the port can also be an https port. bq. Can you please provide more precisely where this check is done internally? I meant that {{Times.format}} is internally doing the check. bq. pass the existing config object This will cause a series of method signature changes. We may set the conf object in the rmContext and get it from the context > Improve yarn container command > -- > > Key: YARN-2301 > URL: https://issues.apache.org/jira/browse/YARN-2301 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Naganarasimha G R > Labels: usability > Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch > > > While running yarn container -list command, some > observations: > 1) the scheme (e.g. http/https ) before LOG-URL is missing > 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to > print as time format. > 3) finish-time is 0 if container is not yet finished. May be "N/A" > 4) May have an option to run as yarn container -list OR yarn > application -list-containers also. > As attempt Id is not shown on console, this is easier for user to just copy > the appId and run it, may also be useful for container-preserving AM > restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
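On the time-formatting points in the description, a small illustration of the behavior referred to above: {{Times.format}} prints a formatted date for a valid timestamp and "N/A" for an unset (0) value, so the CLI can delegate both start and finish time to it (illustrative snippet, not part of the patch).
{code:java}
import org.apache.hadoop.yarn.util.Times;

public class ContainerTimeFormatExample {
  public static void main(String[] args) {
    long startTime = 1405540544844L;  // example millisecond value from the description
    long finishTime = 0L;             // container not yet finished
    System.out.println("Start-Time  : " + Times.format(startTime));   // human-readable date
    System.out.println("Finish-Time : " + Times.format(finishTime));  // prints "N/A"
  }
}
{code}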
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218766#comment-14218766 ] Jian He commented on YARN-2765: --- Thanks, Jason, for working on this. It's useful to have leveldb as an option for RMStateStore, as it's more lightweight than the others. Reviewing the patch. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218715#comment-14218715 ] Hadoop QA commented on YARN-2865: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682428/YARN-2865.1.patch against trunk revision 73348a4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5883//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5883//console This message is automatically generated. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218666#comment-14218666 ] Hadoop QA commented on YARN-2738: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682356/YARN-2738.003.patch against trunk revision 5bd048e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5882//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5882//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5882//console This message is automatically generated. > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch, > YARN-2738.003.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218651#comment-14218651 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-trunk-Commit #6575 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6575/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/CHANGES.txt > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.6.0 >Reporter: Abin Shahab >Assignee: Abin Shahab > Fix For: 2.7.0 > > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218629#comment-14218629 ] Zhijie Shen commented on YARN-2301: --- bq. i need to create new YARNConfigurations RMContainerImpl constructor and keep it. We shouldn't construct a yarn config object. Instead, when constructing RMContainerImpl, we need to pass the existing config object in as we did for RMAppImpl and RMAppAttemptImpl > Improve yarn container command > -- > > Key: YARN-2301 > URL: https://issues.apache.org/jira/browse/YARN-2301 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Naganarasimha G R > Labels: usability > Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch > > > While running yarn container -list command, some > observations: > 1) the scheme (e.g. http/https ) before LOG-URL is missing > 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to > print as time format. > 3) finish-time is 0 if container is not yet finished. May be "N/A" > 4) May have an option to run as yarn container -list OR yarn > application -list-containers also. > As attempt Id is not shown on console, this is easier for user to just copy > the appId and run it, may also be useful for container-preserving AM > restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218607#comment-14218607 ] Jian He commented on YARN-2878: --- +1, committing. thanks [~ashahab] for the patch and thanks [~ajisakaa] for the review ! > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.6.0 >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218598#comment-14218598 ] Jian He commented on YARN-2865: --- LGTM, the test failures look unrelated; re-kicking Jenkins. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218595#comment-14218595 ] Zhijie Shen commented on YARN-2375: --- Looks good to me overall, except some minor issues: 1. can we add a test case in TestMRTimelineEventHandling to check the scenario that MAPREDUCE_JOB_EMIT_TIMELINE_DATA = true, TIMELINE_SERVICE_ENABLED = false and MiniMRYarnCluster doesn't start the timeline server? 2. In ApplicationMaster.finish(), let's stop the timeline client? 3. Fix the indent issue in TimelineClientImpl#serviceInit(). 4. {{LOG.info("Timeline server is (not) enabled");}} -> {{LOG.info("Timeline service is (not) enabled");}}? To be consistent with the log sentence in other places. > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
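To make the review points concrete, a rough sketch of the per-framework guard under discussion (constant locations and the job-level default are assumptions; the actual patch may differ): create and start a timeline client only when both the service-level and the job-level flags are on.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class TimelineGuardSketch {
  static TimelineClient maybeCreateTimelineClient(Configuration conf) {
    boolean serviceEnabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    boolean jobEmitsTimelineData = conf.getBoolean(
        MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA, true /* assumed default */);
    if (!serviceEnabled || !jobEmitsTimelineData) {
      return null;  // skip timeline publishing for this framework/job
    }
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    return client;
  }
}
{code}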
[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2877: Comment: was deleted (was: Linking to HADOOP-11317 to cover project-wide use. I don't think yarn-common needs to explicitly declare a dependency on log4j, at least outside the test run. If you comment out that dependency —does everything still build?) > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2877: Comment: was deleted (was: (ignore that comment, was for YARN-2875)) > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218526#comment-14218526 ] Hadoop QA commented on YARN-2800: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682473/YARN-2800-20141119-1.patch against trunk revision 5bd048e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5881//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5881//console This message is automatically generated. > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, > YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, > YARN-2800-20141119-1.patch > > > In the past, we have a MemoryNodeLabelStore, mostly for user to try this > feature without configuring where to store node labels on file system. It > seems convenient for user to try this, but actually it causes some bad use > experience. User may add/remove labels, and edit capacity-scheduler.xml. > After RM restart, labels will gone, (we store it in mem). And RM cannot get > started if we have some queue uses labels, and the labels don't exist in > cluster. > As what we discussed, we should have an explicitly way to let user specify if > he/she wants this feature or not. If node label is disabled, any operations > trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218507#comment-14218507 ] Jonathan Eagles commented on YARN-2375: --- This code looks good to me. [~zjshen], can you give a final review? > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218439#comment-14218439 ] Hadoop QA commented on YARN-2495: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682448/YARN-2495.20141119-1.patch against trunk revision 5bd048e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5880//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5880//console This message is automatically generated. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml or using script > suggested by [~aw]) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
[ https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218405#comment-14218405 ] Zhijie Shen commented on YARN-2879: --- In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with new shuffle on NM; 3. Submitting via old client. We will see the following console exception: {code} Console Log: 14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed successfully java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370) at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} The problem is supposed to be fixed by MAPREDUCE-5831, however, it seems that we haven't cover all the problematic code path. Will another Jira again. > Compatibility validation between YARN 2.2/2.4 and 2.6 > - > > Key: YARN-2879 > URL: https://issues.apache.org/jira/browse/YARN-2879 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Recently, I did some simple backward compatibility experiments. Bascially, > I've taken the following 2 steps: > 1. Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *new* Hadoop (2.6) client. > 2. 
Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *old* Hadoop (2.2/2.4) client that comes with the app. > I've tried these 2 steps on both insecure and secure cluster. Here's a short > summary: > || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle > and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 || > | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible > | OK | OK | > | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | > OK | OK | > | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | > OK | OK | > | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK > | OK | > Note that I've tried to run NM with both old and new version of shuffle > handler plus the runtime libs. > In general, the compatibility looks good overall. There're a f
[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
[ https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2879: -- Description: Recently, I did some simple backward compatibility experiments. Bascially, I've taken the following 2 steps: 1. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *new* Hadoop (2.6) client. 2. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *old* Hadoop (2.2/2.4) client that comes with the app. I've tried these 2 steps on both insecure and secure cluster. Here's a short summary: || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 || | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | Note that I've tried to run NM with both old and new version of shuffle handler plus the runtime libs. In general, the compatibility looks good overall. There're a few issues that are related to MR, but they seem to be not the YARN issue. I'll post the individual problem in the follow-up comments. was: Recently, I did some simple backward compatibility experiments. Bascially, I've taken the following 2 steps: 1. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *new* Hadoop (2.6) client. 2. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *old* Hadoop (2.2/2.4) client that comes with the app. I've tried these 2 steps on both insecure and secure cluster. Here's a short summary: || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | Note that I've tried to run NM with both old and new version of shuffle handler plus the runtime libs. In general, the compatibility looks good overall. There're a few issues that are related to MR, but they seem to be not the YARN issue. I'll post the individual problem in the follow-up comments. > Compatibility validation between YARN 2.2/2.4 and 2.6 > - > > Key: YARN-2879 > URL: https://issues.apache.org/jira/browse/YARN-2879 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Recently, I did some simple backward compatibility experiments. Bascially, > I've taken the following 2 steps: > 1. Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *new* Hadoop (2.6) client. > 2. 
Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *old* Hadoop (2.2/2.4) client that comes with the app. > I've tried these 2 steps on both insecure and secure cluster. Here's a short > summary: > || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle > and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 || > | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible > | OK | OK | > | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | > OK | OK | > | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | > OK | OK | > | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK > | OK | > Note that I've tried to run NM with both old and new version of shuffle > handler plus the runtime libs. > In general, the compatibility looks good overall. There're a few issues that > are related to MR, but they seem to be not the YARN issue. I'll post the > individual problem in the follow-up comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
[ https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2879: -- Description: Recently, I did some simple backward compatibility experiments. Bascially, I've taken the following 2 steps: 1. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *new* Hadoop (2.6) client. 2. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *old* Hadoop (2.2/2.4) client that comes with the app. I've tried these 2 steps on both insecure and secure cluster. Here's a short summary: || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | Note that I've tried to run NM with both old and new version of shuffle handler plus the runtime libs. In general, the compatibility looks good overall. There're a few issues that are related to MR, but they seem to be not the YARN issue. I'll post the individual problem in the follow-up comments. was: Recently, I did some simple backward compatibility experiments. Bascially, I've taken the following 2 steps: 1. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *new* Hadoop (2.6) client. 2. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *old* Hadoop (2.2/2.4) client that comes with the app. I've tried these 2 steps on both insecure and secure cluster. Here's a short summary: || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | | secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | Note that I've tried to run NM with both old and new shuffle handler version. In general, the compatibility looks good overall. There're a few issues that are related to MR, but they seem to be not the YARN issue. I'll post the individual problem in the follow-up comments. > Compatibility validation between YARN 2.2/2.4 and 2.6 > - > > Key: YARN-2879 > URL: https://issues.apache.org/jira/browse/YARN-2879 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Recently, I did some simple backward compatibility experiments. Bascially, > I've taken the following 2 steps: > 1. Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *new* Hadoop (2.6) client. > 2. 
Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *old* Hadoop (2.2/2.4) client that comes with the app. > I've tried these 2 steps on both insecure and secure cluster. Here's a short > summary: > || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || > MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || > | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible > | OK | OK | > | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | > OK | OK | > | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | > OK | OK | > | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK > | OK | > Note that I've tried to run NM with both old and new version of shuffle > handler plus the runtime libs. > In general, the compatibility looks good overall. There're a few issues that > are related to MR, but they seem to be not the YARN issue. I'll post the > individual problem in the follow-up comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218400#comment-14218400 ] Wangda Tan commented on YARN-2800: -- Hi [~ozawa], Thanks for your comments. bq. MemoryRMNodeLabelsManager for tests do nothing in new patch. How about renaming MemoryRMNodeLabelsManager to NullRMNodeLabelsManager for the consistency with RMStateStore? Addressed. bq. Maybe not related to this JIRA, but it's better to add testing RMRestart with NodeLabelManager to avoid regressions. Agree, filed YARN-2880 to track this. Uploaded a new patch. Thanks, Wangda > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, > YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, > YARN-2800-20141119-1.patch > > > In the past, we added a MemoryNodeLabelStore, mostly so that users could try this > feature without configuring where to store node labels on the file system. It > seems convenient for users to try this, but it actually causes a bad user > experience: a user may add/remove labels and edit capacity-scheduler.xml, and > after an RM restart the labels will be gone (we store them in memory). The RM > also cannot start if some queue uses labels and those labels don't exist in the > cluster. > As we discussed, we should have an explicit way to let the user specify whether > he/she wants this feature or not. If node labels are disabled, any operation > trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
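The explicit switch described in this issue boils down to a feature flag plus a fail-fast guard. A minimal sketch of that guard, assuming the property name yarn.node-labels.enabled introduced by this patch series (the class and method names below are illustrative, not the actual patch):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class NodeLabelsGuardSketch {
  // Assumed property name for the feature switch; the real constant would live
  // in YarnConfiguration.
  static final String NODE_LABELS_ENABLED = "yarn.node-labels.enabled";

  // Illustrative guard: every add/remove/replace-labels operation calls this
  // first, so labels can no longer be used while the feature is switched off.
  static void checkNodeLabelsEnabled(Configuration conf) throws IOException {
    if (!conf.getBoolean(NODE_LABELS_ENABLED, false)) {
      throw new IOException("Node-label operation rejected: "
          + NODE_LABELS_ENABLED + " is set to false");
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.setBoolean(NODE_LABELS_ENABLED, false);
    checkNodeLabelsEnabled(conf); // throws IOException: feature disabled
  }
}
{code}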
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Attachment: YARN-2800-20141119-1.patch > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, > YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, > YARN-2800-20141119-1.patch > > > In the past, we have a MemoryNodeLabelStore, mostly for user to try this > feature without configuring where to store node labels on file system. It > seems convenient for user to try this, but actually it causes some bad use > experience. User may add/remove labels, and edit capacity-scheduler.xml. > After RM restart, labels will gone, (we store it in mem). And RM cannot get > started if we have some queue uses labels, and the labels don't exist in > cluster. > As what we discussed, we should have an explicitly way to let user specify if > he/she wants this feature or not. If node label is disabled, any operations > trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
[ https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388 ] Zhijie Shen edited comment on YARN-2879 at 11/19/14 7:50 PM: - a. In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with either old or new shuffle handler on NM; 3. Submitting via new client. We will see the following console exception: {code} 14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014 java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String; at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} b. In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with old shuffle on NM; 3. Submitting via old client. We will see the following exception in the AM Log: {code} 2014-11-17 15:09:06,157 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1416264695865_0007_01 2014-11-17 15:09:06,436 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364) 2014-11-17 15:09:06,439 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler. {code} The two exceptions are actually the same problem, but using the old client prevents it happening during app submission. Will file a separate Jira for it. 
was (Author: zjshen): a. In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with either old or new shuffle handler on NM; 3. Submitting via new client. We will see the following console exception: {code} 14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014 java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String; at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.
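Both stack traces show the standard binary-incompatibility pattern: bytecode compiled against a 2.2 HttpConfig that still had static getSchemePrefix()/setPolicy(...) methods is linked at run time against a 2.6 jar where those methods no longer exist, so the JVM throws NoSuchMethodError at the call site. A tiny sketch of the mechanism (Lib is a made-up stand-in for HttpConfig):
{code}
// Step 1: compile both classes together; the method exists, so this runs fine.
class Lib { // stand-in for the 2.2-era HttpConfig
  static String getSchemePrefix() { return "http://"; }
}

public class NoSuchMethodSketch {
  public static void main(String[] args) {
    // Method calls are resolved lazily at run time. If Lib.class is later
    // replaced by a build where getSchemePrefix() was removed (the 2.6-era
    // shape) while NoSuchMethodSketch.class is NOT recompiled, this line
    // throws java.lang.NoSuchMethodError -- the same failure mode as the
    // HttpConfig exceptions quoted above.
    System.out.println(Lib.getSchemePrefix());
  }
}
{code}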
[jira] [Created] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
Wangda Tan created YARN-2880: Summary: Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have such a test to make sure there will be no regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
[ https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388 ] Zhijie Shen commented on YARN-2879: --- a. In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with either old or new shuffle handler on NM; 3. Submitting via new client. We will see the following console exception: {code} 14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014 java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String; at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} b. In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with old on NM; 3. Submitting via old client. We will see the following exception in the AM Log: {code} 2014-11-17 15:09:06,157 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1416264695865_0007_01 2014-11-17 15:09:06,436 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364) 2014-11-17 15:09:06,439 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler. {code} The two exceptions are actually the same problem, but using the old client prevents it happening during app submission. Will file a separate Jira for it. 
> Compatibility validation between YARN 2.2/2.4 and 2.6 > - > > Key: YARN-2879 > URL: https://issues.apache.org/jira/browse/YARN-2879 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Recently, I did some simple backward compatibility experiments. Bascially, > I've taken the following 2 steps: > 1. Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *new* Hadoop (2.6) client. > 2. Deploy the application (MR and DistributedShell) that is compiled against > *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is > submitted via *old* Hadoop (2.2/2.4) client that comes with the app. > I've tried these 2 steps on both insecure and secure cluster. Here's a short > summary: > || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || > MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || > | Insecure | New Client | OK | OK | Client Incompatible | Client Incomp
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218381#comment-14218381 ] Wangda Tan commented on YARN-1963: -- [~sunilg], Thanks for the reply. bq. All we need a YarnClient implementation for taking this config and setting to ApplicationSubmissionContext. ( Something similar to queue name which this app is submitted to ). Yes, that will be helpful. I want to make sure they're not in YarnClient now (including the queue)? I didn't see any related code in YarnClient. bq. The idea sounds good. The reason for specifying each label needed for a queue is because admin can specify the labels applicable for a queue. With high priority, we may always end up having default acceptance of lower priorities. How do you feel about having this as a range "low-high" Instead of having a low-high range, I'd prefer highest + default priority: admin can specify the highest priority for a queue/user, and a default priority for a queue/user. bq. I have a use case scenario here. There are few applications running in a queue from 4 different users (sub... I understood the use case here, but I think an easier way may be to not change the definition of user limit, e.g. having a preemption mechanism that lets higher priority applications take resources from lower priority applications. Dividing the user limit by priority will add extra complexity in both implementation and configuration. bq. I suggest to add preemption within queue considering priority. ... +1. Already filed a sub-JIRA for this. The preemption I mentioned here is not YARN-2009; it is to support the previous use case you mentioned: we can keep user-limit as-is and ensure higher priority applications can get resources. That should be possible :) Thanks, Wangda > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: YARN Application Priorities Design.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
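On the client-side question above: the ApplicationSubmissionContext record already exposes setQueue() and setPriority(), so the submission plumbing being discussed could look roughly like the sketch below (queue name and priority value are made up; whether and how the scheduler honors the priority is exactly what this JIRA is about):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrioritySubmissionSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();

    ApplicationSubmissionContext ctx =
        client.createApplication().getApplicationSubmissionContext();
    ctx.setApplicationName("priority-demo");   // made-up name
    ctx.setQueue("default");                   // queue already flows through the context
    ctx.setPriority(Priority.newInstance(5));  // per-application priority field

    // A real submission also needs the AM ContainerLaunchContext and resource
    // request; they are omitted because only the queue/priority plumbing is of
    // interest here.
    // client.submitApplication(ctx);

    client.stop();
  }
}
{code}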
[jira] [Created] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
Zhijie Shen created YARN-2879: - Summary: Compatibility validation between YARN 2.2/2.4 and 2.6 Key: YARN-2879 URL: https://issues.apache.org/jira/browse/YARN-2879 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Recently, I did some simple backward compatibility experiments. Bascially, I've taken the following 2 steps: 1. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *new* Hadoop (2.6) client. 2. Deploy the application (MR and DistributedShell) that is compiled against *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is submitted via *old* Hadoop (2.2/2.4) client that comes with the app. I've tried these 2 steps on both insecure and secure cluster. Here's a short summary: || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 || | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | | secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK | | secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK | Note that I've tried to run NM with both old and new shuffle handler version. In general, the compatibility looks good overall. There're a few issues that are related to MR, but they seem to be not the YARN issue. I'll post the individual problem in the follow-up comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2729: Attachment: YARN-2729.20141120-1.patch Updating patch with [~wangda]'s review comments > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20141119-1.patch Updating patch With [~wangda] 's review comments... > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml or using script > suggested by [~aw]) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218254#comment-14218254 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682439/YARN-2375.patch against trunk revision 5bd048e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5879//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5879//console This message is automatically generated. > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2356: -- Attachment: 0001-YARN-2356.patch Thank you [~jianhe] , [~devaraj.k] for the comments. I have updated the patch as per the comments. However I have a point to mention regarding below comment bq. can we return the exitCode directly from printXXXReport() methods I could see that this return of exit code from each of the printXXXReport() was causing nested if in the caller side, and was becoming less readable. Also *killApplication* is already rethrowing exception and handling similar way.. Kindly share your thoughts on this. > yarn status command for non-existent application/application > attempt/container is too verbose > -- > > Key: YARN-2356 > URL: https://issues.apache.org/jira/browse/YARN-2356 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: 0001-YARN-2356.patch, Yarn-2356.1.patch > > > *yarn application -status* or *applicationattempt -status* or *container > status* commands can suppress exception such as ApplicationNotFound, > ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in > RM or History Server. > For example, below exception can be suppressed better > sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status > application_1402668848165_0015 > No GC_PROFILE is given. Defaults to medium. > 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at > /10.18.40.77:45022 > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1402668848165_0015' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at $Proxy12.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) > Cause
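To make the trade-off in the comment above concrete: one shape has each printXXXReport() return an exit code that the caller threads through nested ifs; the other lets the NotFound exception propagate and translates it once in run(), the way killApplication already does. A rough sketch of the second shape, with simplified names and signatures (not the actual patch):
{code}
import java.io.PrintStream;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Simplified shape of the CLI flow; field and method names are approximations.
class StatusCliSketch {
  private final YarnClient client;
  private final PrintStream sysout;

  StatusCliSketch(YarnClient client, PrintStream sysout) {
    this.client = client;
    this.sysout = sysout;
  }

  // Let ApplicationNotFoundException propagate out of the print helper ...
  void printApplicationReport(String appIdStr) throws Exception {
    ApplicationId appId = ConverterUtils.toApplicationId(appIdStr);
    ApplicationReport report = client.getApplicationReport(appId);
    sysout.println("Application-Id : " + report.getApplicationId());
  }

  // ... and translate it exactly once in the command dispatcher, returning a
  // non-zero exit code instead of dumping the whole stack trace.
  int run(String appIdStr) throws Exception {
    try {
      printApplicationReport(appIdStr);
      return 0;
    } catch (ApplicationNotFoundException e) {
      sysout.println("Application with id '" + appIdStr + "' doesn't exist in RM.");
      return -1;
    }
  }
}
{code}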
[jira] [Commented] (YARN-2299) inconsistency at identifying node
[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218207#comment-14218207 ] Bruno Alexandre Rosa commented on YARN-2299: Which* > inconsistency at identifying node > - > > Key: YARN-2299 > URL: https://issues.apache.org/jira/browse/YARN-2299 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > > If port of "yarn.nodemanager.address" is not specified at NM, NM will choose > random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) > and then restarted within "yarn.nm.liveness-monitor.expiry-interval-ms", > "host:port1" and "host:port2" will both be present in "Active Nodes" on WebUI > for a while, and after host:port1 expiration, we get host:port1 in "Lost > Nodes" and host:port2 in "Active Nodes". If the NM is ungracefully dead > again, we get only host:port1 in "Lost Nodes". "host:port2" is neither in > "Active Nodes" nor in "Lost Nodes". > Another case, two NM is running on same host(miniYarnCluster or other test > purpose), if both of them are lost, we get only one "Lost Nodes" in WebUI. > In both case, sum of "Active Nodes" and "Lost Nodes" is not the number of > nodes we expected. > The root cause is due to inconsistency at how we think two Nodes are > identical. > When we manager active nodes(RMContextImpl.nodes), we use NodeId which > contains port. Two nodes with same host but different port are thought to be > different node. > But when we manager inactive nodes(RMContextImpl.inactiveNodes), we use only > use host. Two nodes with same host but different port are thought to > identical. > To fix the inconsistency, we should differentiate below 2 cases and be > consistent for both of them: > - intentionally multiple NMs per host > - NM instances one after another on same host > Two possible solutions: > 1) Introduce a boolean config like "one-node-per-host"(default as "true"), > and use host to differentiate nodes on RM if it's true. > 2) Make it mandatory to have valid port in "yarn.nodemanager.address" config. > In this sutiation, NM instances one after another on same host will have > same NodeId, while intentionally multiple NMs per host will have different > NodeId. > Personally I prefer option 1 because it's easier for users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2299) inconsistency at identifying node
[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218195#comment-14218195 ] Bruno Alexandre Rosa commented on YARN-2299: What are the affected versions? > inconsistency at identifying node > - > > Key: YARN-2299 > URL: https://issues.apache.org/jira/browse/YARN-2299 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > > If port of "yarn.nodemanager.address" is not specified at NM, NM will choose > random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) > and then restarted within "yarn.nm.liveness-monitor.expiry-interval-ms", > "host:port1" and "host:port2" will both be present in "Active Nodes" on WebUI > for a while, and after host:port1 expiration, we get host:port1 in "Lost > Nodes" and host:port2 in "Active Nodes". If the NM is ungracefully dead > again, we get only host:port1 in "Lost Nodes". "host:port2" is neither in > "Active Nodes" nor in "Lost Nodes". > Another case, two NM is running on same host(miniYarnCluster or other test > purpose), if both of them are lost, we get only one "Lost Nodes" in WebUI. > In both case, sum of "Active Nodes" and "Lost Nodes" is not the number of > nodes we expected. > The root cause is due to inconsistency at how we think two Nodes are > identical. > When we manager active nodes(RMContextImpl.nodes), we use NodeId which > contains port. Two nodes with same host but different port are thought to be > different node. > But when we manager inactive nodes(RMContextImpl.inactiveNodes), we use only > use host. Two nodes with same host but different port are thought to > identical. > To fix the inconsistency, we should differentiate below 2 cases and be > consistent for both of them: > - intentionally multiple NMs per host > - NM instances one after another on same host > Two possible solutions: > 1) Introduce a boolean config like "one-node-per-host"(default as "true"), > and use host to differentiate nodes on RM if it's true. > 2) Make it mandatory to have valid port in "yarn.nodemanager.address" config. > In this sutiation, NM instances one after another on same host will have > same NodeId, while intentionally multiple NMs per host will have different > NodeId. > Personally I prefer option 1 because it's easier for users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
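The inconsistency is easy to reproduce with the two keys involved: active nodes are tracked by NodeId (host plus port), while inactive nodes are tracked by the bare host string. A small sketch (host name and ports invented):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeIdentitySketch {
  public static void main(String[] args) {
    // Same NM host, restarted with a different ephemeral port.
    NodeId beforeRestart = NodeId.newInstance("host-a", 38017);
    NodeId afterRestart = NodeId.newInstance("host-a", 41233);

    // Active-node bookkeeping keyed by NodeId sees two distinct nodes ...
    Map<NodeId, String> activeNodes = new HashMap<NodeId, String>();
    activeNodes.put(beforeRestart, "expired instance");
    activeNodes.put(afterRestart, "running instance");
    System.out.println("active entries: " + activeNodes.size()); // 2

    // ... while inactive-node bookkeeping keyed by host collapses them into one,
    // which is exactly the Active/Lost Nodes mismatch described above.
    Map<String, String> inactiveNodes = new HashMap<String, String>();
    inactiveNodes.put(beforeRestart.getHost(), "lost");
    inactiveNodes.put(afterRestart.getHost(), "lost");
    System.out.println("inactive entries: " + inactiveNodes.size()); // 1
  }
}
{code}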
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218171#comment-14218171 ] Hadoop QA commented on YARN-2865: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682428/YARN-2865.1.patch against trunk revision 5bd048e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5878//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5878//console This message is automatically generated. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
[ https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218160#comment-14218160 ] Tim Robertson commented on YARN-2875: - Sadly no. It is used in the [ContainerLogAppender|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java#L37] and [ContainerRollingLogAppender|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java#L34]. I tried to remove it and compile using the [log4j-over-slf4j v1.7.7 bridge|http://search.maven.org/#artifactdetails%7Corg.slf4j%7Clog4j-over-slf4j%7C1.7.7%7Cjar] but that fails because the SLF4J classes are not the same API. For example [the SLF4J RollingFileAppender| https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/RollingFileAppender.java] does not implement methods like setFile(), setAppend() etc. The build will fail with the following: {code} [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[41,6] error: cannot find symbol [ERROR] symbol: method setFile(String) location: class ContainerRollingLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[42,6] error: cannot find symbol [ERROR] symbol: method setAppend(boolean) location: class ContainerRollingLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[43,11] error: cannot find symbol [ERROR] symbol: method activateOptions() /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[38,2] error: method does not override or implement a method from a supertype [ERROR] /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[49,8] error: cannot find symbol [ERROR] symbol: variable qw location: class ContainerRollingLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[50,6] error: cannot find symbol [ERROR] symbol: variable qw location: class ContainerRollingLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[37,7] error: no suitable constructor found for FileAppender() [ERROR] constructor FileAppender.FileAppender(Layout,String,boolean,boolean,int) is not applicable (actual and formal argument lists differ in length) constructor FileAppender.FileAppender(Layout,String,boolean) is not applicable (actual and formal argument lists differ in length) constructor FileAppender.FileAppender(Layout,String) is not applicable (actual and formal argument lists differ in length) /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[52,6] error: cannot find symbol [ERROR] symbol: method setFile(String) location: class 
ContainerLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[53,6] error: cannot find symbol [ERROR] symbol: method setAppend(boolean) location: class ContainerLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[65,13] error: cannot find symbol [ERROR] symbol: method append(LoggingEvent) /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[58,2] error: method does not override or implement a method from a supertype [ERROR] /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[77,8] error: cannot find symbol [ERROR] symbol: variable qw location: class ContainerLogAppender /Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[78,6] error: cannot find symbol [ERROR]
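The incompatibility sits at the log4j 1.x appender API surface: the YARN appenders extend FileAppender/RollingFileAppender and use members such as setFile(), setAppend() and activateOptions(), which real log4j provides but the log4j-over-slf4j stub classes do not. A minimal illustration that compiles against log4j 1.2 and produces exactly the "cannot find symbol" errors above when the bridge is substituted:
{code}
import org.apache.log4j.RollingFileAppender;

public class AppenderApiSketch {
  public static void main(String[] args) {
    RollingFileAppender appender = new RollingFileAppender();
    appender.setFile("/tmp/container.log"); // no such setter in the bridge stub
    appender.setAppend(true);               // missing from the bridge as well
    appender.activateOptions();             // missing from the bridge as well
  }
}
{code}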
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Attachment: YARN-2375.patch Attaching updated patch > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218044#comment-14218044 ] Rohith commented on YARN-2865: -- Attached patch, the changes from previous patch are 1. Karthik comment fixed. Adding comment for RMActiveServiceContext and making @private and @unstable annotations. 2. Jian He comment fixed. I use rmcontext only to set services. Please review the patch. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2865: - Attachment: YARN-2865.1.patch > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218014#comment-14218014 ] Hudson commented on YARN-2870: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/10/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218017#comment-14218017 ] Hudson commented on YARN-2157: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/10/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2878: Target Version/s: 2.6.1 Affects Version/s: 2.6.0 > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.6.0 >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218008#comment-14218008 ] Akira AJISAKA commented on YARN-2878: - Applied the patch and compiled the doc. The doc looks good to me, +1 (non-binding). > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217998#comment-14217998 ] Hudson commented on YARN-2157: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1962 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1962/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217995#comment-14217995 ] Hudson commented on YARN-2870: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1962 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1962/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm * hadoop-yarn-project/CHANGES.txt > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2878: Component/s: documentation > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217946#comment-14217946 ] Hudson commented on YARN-2870: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1938 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1938/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217949#comment-14217949 ] Hudson commented on YARN-2157: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1938 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1938/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm * hadoop-yarn-project/CHANGES.txt > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217931#comment-14217931 ] Hudson commented on YARN-2157: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/10/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm * hadoop-yarn-project/CHANGES.txt > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217928#comment-14217928 ] Hudson commented on YARN-2870: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/10/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm * hadoop-yarn-project/CHANGES.txt > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217720#comment-14217720 ] Hudson commented on YARN-2870: -- FAILURE: Integrated in Hadoop-Yarn-trunk #748 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/748/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm * hadoop-yarn-project/CHANGES.txt > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217723#comment-14217723 ] Hudson commented on YARN-2157: -- FAILURE: Integrated in Hadoop-Yarn-trunk #748 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/748/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217713#comment-14217713 ] Hudson commented on YARN-2157: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/10/]) YARN-2157. Added YARN metrics in the documentaion. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c) * hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm * hadoop-yarn-project/CHANGES.txt > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217710#comment-14217710 ] Hudson commented on YARN-2870: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #10 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/10/]) YARN-2870. Updated the command to run the timeline server it the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm * hadoop-yarn-project/CHANGES.txt > Update examples in document of Timeline Server > -- > > Key: YARN-2870 > URL: https://issues.apache.org/jira/browse/YARN-2870 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, timelineserver >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Trivial > Fix For: 2.7.0 > > Attachments: YARN-2870.1.patch > > > YARN-1982 renamed historyserver to timelineserver but there is still > deprecated name in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217706#comment-14217706 ] Devaraj K commented on YARN-2356: - Sorry for coming late here. Thanks [~sunilg] for the patch and thanks [~jianhe] for review. Overall the patch looks good. In addition to [~jianhe] comment, I see these two observations. 1. Instead of rethrowing and catching the exception for exitCode determination, can we return the exitCode directly from printXXXReport() methods? 2. In all the newly added tests, I think no need to catch the exception and do Assert.fail() explicitly, JUnit will fail those when exception arises. > yarn status command for non-existent application/application > attempt/container is too verbose > -- > > Key: YARN-2356 > URL: https://issues.apache.org/jira/browse/YARN-2356 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: Yarn-2356.1.patch > > > *yarn application -status* or *applicationattempt -status* or *container > status* commands can suppress exception such as ApplicationNotFound, > ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in > RM or History Server. > For example, below exception can be suppressed better > sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status > application_1402668848165_0015 > No GC_PROFILE is given. Defaults to medium. > 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at > /10.18.40.77:45022 > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1402668848165_0015' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > 
at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at $Proxy12.getApplicationReport(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) > Caused by: >
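For the first observation above on YARN-2356 (returning the exit code directly from printXXXReport() instead of rethrowing and catching), a minimal sketch of what printApplicationReport() could look like. This is not from the attached patches; the surrounding ApplicationCLI fields (client, sysout), the exact message text, and the -1 code are assumptions, only the control flow is the point here:
{code}
private int printApplicationReport(String applicationId)
    throws YarnException, IOException {
  ApplicationReport appReport;
  try {
    appReport = client.getApplicationReport(
        ConverterUtils.toApplicationId(applicationId));
  } catch (ApplicationNotFoundException e) {
    // print a short message instead of the full stack trace
    sysout.println("Application with id '" + applicationId
        + "' doesn't exist in RM or Timeline Server.");
    return -1;
  }
  // ... print the report fields as before ...
  return 0;
}
{code}
With this shape the caller in run() can simply propagate the returned value as the CLI exit code, without a second catch block for the same exception.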
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217672#comment-14217672 ] Sunil G commented on YARN-1963: --- Hi Wangda, Thanks for sharing your comments. bq. Does this means, any YARN application doesn't need change a line of their code, yarn.app.priority can be passed from client side. And if client can set the priority value to ApplicationSubmissionContext which is received from this config, then RM can get the same. All we need is a YarnClient implementation for taking this config and setting to ApplicationSubmissionContext. ( Something similar to queue name which this app is submitted to ). bq. Specify only highest priority for queue and user The idea sounds good. The reason for specifying each label needed for a queue is because admin can specify the labels applicable for a queue. With high priority, we may always end up having default acceptance of lower priorities. How do you feel about having this as a range "low-high" {noformat} cluster labels {very_high, high, medium, low} yarn.scheduler.root..priority_label=low-high yarn.scheduler.capacity.root..high.acl=user1,user2 yarn.scheduler.capacity.root..low.acl=user3,user4 {noformat} This was the intention. Please share your thoughts [~vinodkv] [~gp.leftnoteasy] bq. I think we shouldn't consider user limit within priority level I have a use case scenario here. There are a few applications running in a queue from 4 different users (submitted to priority level low) and user-limit factor is 20. A 5th user has ACL for submitting high priority applications. Because of user-limit, he can get only 20% maximum for his high priority apps. The high priority apps submitted by user5 may need more resources, which in turn will be rejected by the user-limit check. How do you feel about this use case? bq. I suggest to add preemption within queue considering priority. +1. Already filed a sub-JIRA for this. > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: YARN Application Priorities Design.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
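On the yarn.app.priority idea above, a rough sketch of the client-side flow being described. The config key "yarn.app.priority" is the proposed one, not an existing YarnConfiguration constant, and the default of 0 is an assumption; the ApplicationSubmissionContext and Priority calls are existing YARN API:
{code}
Configuration conf = new YarnConfiguration();
// proposed key, read on the client side; default priority assumed to be 0
int priority = conf.getInt("yarn.app.priority", 0);

ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
appContext.setPriority(Priority.newInstance(priority));
// the RM would then read the priority from the submission context,
// the same way it already reads the queue name
{code}
With this approach existing applications would not need code changes; the priority rides along in the submission context set by the client.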
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217671#comment-14217671 ] Naganarasimha G R commented on YARN-2301: - Hi [~jianhe], Thanks for reviewing, but I need a few more clarifications: bq. we can just use containerReport.getFinishTime(), as it internally is checking “>0” already. This modification is to support the 3rd issue which you mentioned {{3. finish-time is 0 if container is not yet finished. May be "N/A"}} and I didn't get where exactly ">0" is being checked internally, as there are no checks in PBImpl. Can you please point out more precisely where this check is done internally? bq. the scheme could be https also, we should use WebAppUtils#getHttpSchemePrefix Due to the following reasons I kept the scheme hard-coded to http: 1. We get the container's HTTP address only, and to that we are appending the scheme {{WebAppUtils.getRunningLogURL(container.getNodeHttpAddress()}}. So irrespective of what scheme we set, the port will be the node's http port where this container ran, so it would not be ideal to set the scheme as HTTPS with the node's http port. And if we need to correct this then we need to enforce Container.newInstance to take an https url also, which will impact a lot of places. 2. WebAppUtils#getHttpSchemePrefix requires a configuration object; as the reference is not available in RMContainerImpl, I would need to create a new YarnConfiguration in the RMContainerImpl constructor and keep it. ??may be trivial issue?? so I kept the changes simple. Please provide your opinion on the same. > Improve yarn container command > -- > > Key: YARN-2301 > URL: https://issues.apache.org/jira/browse/YARN-2301 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Naganarasimha G R > Labels: usability > Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch > > > While running yarn container -list command, some > observations: > 1) the scheme (e.g. http/https ) before LOG-URL is missing > 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to > print as time format. > 3) finish-time is 0 if container is not yet finished. May be "N/A" > 4) May have an option to run as yarn container -list OR yarn > application -list-containers also. > As attempt Id is not shown on console, this is easier for user to just copy > the appId and run it, may also be useful for container-preserving AM > restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
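For reference, the two scheme options being weighed above boil down to the following; nodeHttpAddress is a placeholder value, and the need for a Configuration reference inside RMContainerImpl is exactly the concern raised in point 2, while WebAppUtils.getHttpSchemePrefix() itself is an existing utility:
{code}
Configuration conf = new YarnConfiguration();
String nodeHttpAddress = "nm-host:8042";   // hypothetical NM web address

// option kept in the patch: scheme hard-coded to plain HTTP
String logUrlHttp = "http://" + nodeHttpAddress;

// alternative: derive the scheme from the configured HTTP policy
String logUrlPolicy = WebAppUtils.getHttpSchemePrefix(conf) + nodeHttpAddress;
{code}
The second form returns "https://" when HTTPS_ONLY is configured, but as noted above the address itself is still the node's http port, which is why the patch keeps the first form.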
[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
[ https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217669#comment-14217669 ] Steve Loughran commented on YARN-2875: -- Linking to HADOOP-11317 to cover project-wide use. I don't think yarn-common needs to explicitly declare a dependency on log4j, at least outside the test run. If you comment out that dependency —does everything still build? > Bump SLF4J to 1.7.7 from 1.7.5 > --- > > Key: YARN-2875 > URL: https://issues.apache.org/jira/browse/YARN-2875 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tim Robertson >Priority: Minor > > hadoop-yarn-common [uses log4j > directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167] > and when trying to redirect that through an SLF4J bridge version 1.7.5 has > issues, due to use of AppenderSkeleton which is missing in log4j-over-slf4j > version 1.7.5. > This is documented on the [1.7.6 release > notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable. > This is applicable to all the projects using Hadoop motherpom, but Yarn > appears to be bringing Log4J in, rather than coding to the SLF4J API. > The issue shows in the logs as follows in Yarn MR apps, which is painful to > diagnose. > {code} > WARN [2014-11-18 09:58:06,390+0100] [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in > callback postStart > java.lang.reflect.InvocationTargetException: null > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[na:1.7.0_71] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ~[na:1.7.0_71] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[na:1.7.0_71] > at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71] > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > ~[job.jar:0.22-SNAPSHOT] > at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na] > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) > [job.jar:0.22-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > [na:1.7.0_71] > at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474) > [job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) > [job.jar:0.22-SNAPSHOT] > Caused by: java.lang.IncompatibleClassChangeError: Implementing class > at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71] > at 
java.lang.ClassLoader.defineClass(ClassLoader.java:800) > ~[na:1.7.0_71] > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > ~[na:1.7.0_71] > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) > ~[na:1.7.0_71] > at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > ~[na:1.7.0_71] > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71] > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71] > at java.security.AccessController.doPrivileged(Native Method) > [na:1.7.0_71] > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > ~[na:1.7.0_71] > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71] > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > ~[na:1.7.0_71] > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71] > at > org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183) > ~[job.jar:0.22-SNAPSHOT] > at > org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) > ~[job.jar:0.22-S
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217668#comment-14217668 ] Steve Loughran commented on YARN-2877: -- (ignore that comment, was for YARN-2875) > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217667#comment-14217667 ] Steve Loughran commented on YARN-2877: -- Linking to HADOOP-11317 to cover project-wide use. I don't think yarn-common needs to explicitly declare a dependency on log4j, at least outside the test run. If you comment out that dependency —does everything still build? > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
[ https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Robertson updated YARN-2875: Description: hadoop-yarn-common [uses log4j directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167] and when trying to redirect that through an SLF4J bridge version 1.7.5 has issues, due to use of AppenderSkeleton which is missing in log4j-over-slf4j version 1.7.5. This is documented on the [1.7.6 release notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable. This is applicable to all the projects using Hadoop motherpom, but Yarn appears to be bringing Log4J in, rather than coding to the SLF4J API. The issue shows in the logs as follows in Yarn MR apps, which is painful to diagnose. {code} WARN [2014-11-18 09:58:06,390+0100] [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in callback postStart java.lang.reflect.InvocationTargetException: null at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) ~[job.jar:0.22-SNAPSHOT] at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) [job.jar:0.22-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) [job.jar:0.22-SNAPSHOT] Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71] at java.lang.ClassLoader.defineClass(ClassLoader.java:800) ~[na:1.7.0_71] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[na:1.7.0_71] at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) ~[na:1.7.0_71] at java.net.URLClassLoader.access$100(URLClassLoader.java:71) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_71] at 
java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) ~[job.jar:0.22-SNAPSHOT] at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) ~[na:1.7.0_71] at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) ~[na:1
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.2.patch Go ahead and allow cores to be part of the am resource limit... > maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Assignee: Craig Welch >Priority: Critical > Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch > > > Currently, number of AM in leaf queue will be calculated in following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when submit new application to RM, it will check if an app can be > activated in following way: > {code} > for (Iterator i=pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > An example is, > If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum > resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be > launched is 200, and if user uses 5M for each AM (> minimum_allocation). All > apps can still be activated, and it will occupy all resource of a queue > instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
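A rough sketch of the direction described in the .2 patch comment ("allow cores to be part of the am resource limit"): express the AM limit as a full Resource and check both memory and vcores, instead of deriving an application count from memory alone. The names and numbers here are illustrative, not the actual LeafQueue fields or the patch itself:
{code}
// hypothetical queue capacity: 1 TB of memory, 1024 vcores
Resource queueMaxCapacity = Resource.newInstance(1024 * 1024, 1024);
float maxAMResourcePercent = 0.2f;

// limit expressed as a Resource, so vcores are capped as well as memory
Resource amLimit = Resources.multiply(queueMaxCapacity, maxAMResourcePercent);

Resource amUsed = Resource.newInstance(0, 0); // resources of already-activated AMs
Resource nextAm = Resource.newInstance(5, 1); // resource asked by the next pending AM
boolean canActivate =
    Resources.fitsIn(Resources.add(amUsed, nextAm), amLimit);
{code}
Checking the summed AM Resource against the limit avoids the situation in the description, where 200 AMs of 5M each are all activated because the count was derived from the 1M minimum allocation.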