[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693023#comment-14693023 ] Hadoop QA commented on YARN-4026: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 21s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 3 new checkstyle issues (total was 128, now 128). | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 30 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 24s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 91m 53s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750004/YARN-4026.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ae716f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8829/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8829/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8829/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8829/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8829/console | This message was automatically generated. > FiCaSchedulerApp: ContainerAllocator should be able to choose how to order > pending resource requests > > > Key: YARN-4026 > URL: https://issues.apache.org/jira/browse/YARN-4026 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch > > > After YARN-3983, we have an extensible ContainerAllocator which can be used > by FiCaSchedulerApp to decide how to allocate resources. > While working on YARN-1651 (allocate resource to increase container), I found > one thing in existing logic not flexible enough: > - ContainerAllocator decides what to allocate for a given node and priority: > To support different kinds of resource allocation, for example, priority as > weight / skip priority or not, etc. 
It's better to let ContainerAllocator > choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692916#comment-14692916 ] Sangjin Lee commented on YARN-2859: --- [~zjshen], can this be done for 2.6.1, or are you OK with deferring it to 2.6.2? > ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster > -- > > Key: YARN-2859 > URL: https://issues.apache.org/jira/browse/YARN-2859 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Hitesh Shah >Assignee: Zhijie Shen >Priority: Critical > Labels: 2.6.1-candidate > > In mini cluster, a random port should be used. > Also, the config is not updated to the host that the process got bound to. > {code} > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer > address: localhost:10200 > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer > web address: 0.0.0.0:8188 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
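The fix the issue asks for (bind to an ephemeral port, then publish the address actually bound rather than the configured default) can be sketched as follows. This is an illustrative Java snippet, not MiniYARNCluster code; the `EphemeralBind` class and the config key passed to it are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Map;

public class EphemeralBind {
    /** Binds to a random free port and publishes the real address into conf. */
    public static int bindAndPublish(Map<String, String> conf, String key) {
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 asks the OS for a free ephemeral port instead of a fixed default.
            socket.bind(new InetSocketAddress("localhost", 0));
            int actualPort = socket.getLocalPort();
            // Publish the address we actually bound to, not the configured default
            // (the issue's complaint: the config still said 0.0.0.0:8188).
            conf.put(key, "localhost:" + actualPort);
            return actualPort;
        } catch (IOException e) {
            throw new RuntimeException(e); // keep the sketch simple
        }
    }
}
```

A test client would then read the address back out of the config instead of assuming the default port.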
[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2746: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > YARNDelegationTokenID misses serializing version from the common abstract ID > > > Key: YARN-2746 > URL: https://issues.apache.org/jira/browse/YARN-2746 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > > I found this during review of YARN-2743. > bq. AbstractDTId had a version, we dropped that in the protobuf > serialization. We should just write it during the serialization and read it > back? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2657: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > MiniYARNCluster to (optionally) add MicroZookeeper service > -- > > Key: YARN-2657 > URL: https://issues.apache.org/jira/browse/YARN-2657 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2567-001.patch, YARN-2657-002.patch > > > This is needed for testing things like YARN-2646: add an option for the > {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. > This is just another YARN service to create and track the lifecycle. The > {{MicroZookeeperService}} publishes its binding information for direct takeup > by the registry services...this can address in-VM race conditions. > The default setting for this service is "off" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > TimelineClient should NOT be in yarn-common project > --- > > Key: YARN-2506 > URL: https://issues.apache.org/jira/browse/YARN-2506 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen >Priority: Critical > > YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't > belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2037: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Add restart support for Unmanaged AMs > - > > Key: YARN-2037 > URL: https://issues.apache.org/jira/browse/YARN-2037 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > It would be nice to allow Unmanaged AMs also to restart in a work-preserving > way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2457: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > FairScheduler: Handle preemption to help starved parent queues > -- > > Key: YARN-2457 > URL: https://issues.apache.org/jira/browse/YARN-2457 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.5.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't > check for parent queue starvation. > We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2055: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Preemption: Jobs are failing due to AMs are getting launched and killed > multiple times > -- > > Key: YARN-2055 > URL: https://issues.apache.org/jira/browse/YARN-2055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Mayank Bansal > > If Queue A does not have enough capacity to run the AM, the AM will borrow > capacity from queue B. When queue B reclaims its capacity, the AM will be > killed, then launched and killed again, and the job will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1848: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Persist ClusterMetrics across RM HA transitions > --- > > Key: YARN-1848 > URL: https://issues.apache.org/jira/browse/YARN-1848 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Post YARN-1705, ClusterMetrics are reset on transition to standby. This is > acceptable as the metrics show statistics since an RM has become active. > Users might want to see metrics since the cluster was ever started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2014: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9 > > > Key: YARN-2014 > URL: https://issues.apache.org/jira/browse/YARN-2014 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: patrick white >Assignee: Jason Lowe > > Performance comparison benchmarks from 2.x against 0.23 shows AM scalability > benchmark's runtime is approximately 10% slower in 2.4.0. The trend is > consistent across later releases in both lines, latest release numbers are: > 2.4.0.0 runtime 255.6 seconds (avg 5 passes) > 0.23.9.12 runtime 230.4 seconds (avg 5 passes) > Diff: -9.9% > AM Scalability test is essentially a sleep job that measures time to launch > and complete a large number of mappers. > The diff is consistent and has been reproduced in both a larger (350 node, > 100,000 mappers) perf environment, as well as a small (10 node, 2,900 > mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1856: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch > > > Nowadays RM web services getApps() accepts many more filters than > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1767: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Windows: Allow a way for users to augment classpath of YARN daemons > --- > > Key: YARN-1767 > URL: https://issues.apache.org/jira/browse/YARN-1767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla > > YARN-1429 adds a way to augment the classpath for *nix-based systems. Need > something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1681) When "banned.users" is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1681: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > When "banned.users" is not set in LCE's container-executor.cfg, submit job > with user in DEFAULT_BANNED_USERS will receive unclear error message > --- > > Key: YARN-1681 > URL: https://issues.apache.org/jira/browse/YARN-1681 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Zhichun Wu >Assignee: Zhichun Wu > Labels: container, usability > Attachments: YARN-1681.patch > > > When using LCE in a secure setup, if "banned.users" is not set in > container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS > ("mapred", "hdfs", "bin", 0) will produce an unclear error message. > For example, if we use hdfs to submit a MR job, we may see the following on the > yarn app overview page: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: > {code} > while the preferred error message may look like: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: Requested user hdfs is banned > {code} > just a minor bug and I would like to start contributing to hadoop-common with > it :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
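The preferred behavior quoted above (fall back to the default banned list when "banned.users" is unset, and fail with a message that names the banned user) can be modeled in a few lines. This is a Java sketch of the logic only; the real check lives in the native container-executor, and the uid-0 entry of DEFAULT_BANNED_USERS is omitted because the sketch models user names only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BannedUserCheck {
    // Default banned user names from the issue description; the uid-0 entry
    // is omitted since this sketch deals only with names, not uids.
    static final Set<String> DEFAULT_BANNED =
        new HashSet<>(Arrays.asList("mapred", "hdfs", "bin"));

    /** Returns an explicit error message naming the user, or null when allowed. */
    static String check(String user, Set<String> configuredBanned) {
        // Key point of the fix: when banned.users is not configured,
        // the defaults still apply and the rejection should say so clearly.
        Set<String> banned = (configuredBanned != null) ? configuredBanned : DEFAULT_BANNED;
        return banned.contains(user) ? "Requested user " + user + " is banned" : null;
    }
}
```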
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692901#comment-14692901 ] Hudson commented on YARN-3999: -- FAILURE: Integrated in Hadoop-trunk-Commit #8286 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8286/]) YARN-3999. RM hangs on draing events. Contributed by Jian He (xgong: rev 3ae716fa696b87e849dae40225dc59fb5ed114cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > RM hangs on draing events > - > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.2 > > Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, > YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, > YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch > > > If external systems like ATS, or ZK becomes very slow, draining all the > events take a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
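Fix (1) in the description, draining remaining events but giving up after a deadline, amounts to the following loop. This is a hedged sketch, not the patched AsyncDispatcher code; the class and method names are illustrative.

```java
import java.util.Queue;

public class DrainWithTimeout {
    /**
     * Drain pending events, but stop once the deadline passes so a slow
     * downstream system (ATS, ZK) cannot block RM shutdown indefinitely.
     * Returns true only if the queue fully drained within the timeout.
     */
    static boolean drain(Queue<Runnable> events, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!events.isEmpty()) {
            if (System.nanoTime() >= deadline) {
                return false; // give up: stop dispatching even with events left
            }
            Runnable event = events.poll();
            if (event != null) {
                event.run(); // hand the event to its registered handler
            }
        }
        return true;
    }
}
```

Without the deadline check, a handler that blocks on a slow external system keeps the loop (and the RM transition) stuck, which is the hang the issue describes.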
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692843#comment-14692843 ] Hadoop QA commented on YARN-313: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 19s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 56s | The applied patch generated 4 new checkstyle issues (total was 229, now 232). | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 58s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 35s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 111m 5s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.cli.TestRMAdminCLI | | | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749993/YARN-313-v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ae716f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8828/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8828/console | This message was automatically generated. 
> Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
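For illustration, a "-refreshResources" style command would read per-node resource overrides from a config file and apply them. The sketch below parses a hypothetical `host:memoryMb:vcores` line format; the actual file format and CLI wiring are defined by the YARN-313 patches, not here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NodeResourceFile {
    static final class NodeResource {
        final int memoryMb, vcores;
        NodeResource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Parses "host:memoryMb:vcores" lines (a made-up format for this sketch). */
    static Map<String, NodeResource> parse(String contents) {
        Map<String, NodeResource> overrides = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks/comments
            String[] parts = line.split(":");
            overrides.put(parts[0],
                new NodeResource(Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return overrides;
    }
}
```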
[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692827#comment-14692827 ] Hadoop QA commented on YARN-4026: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 18s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 54s | The applied patch generated 3 new checkstyle issues (total was 128, now 128). | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 30 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 47s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 94m 41s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750004/YARN-4026.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7c796fd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8827/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8827/console | This message was automatically generated. > FiCaSchedulerApp: ContainerAllocator should be able to choose how to order > pending resource requests > > > Key: YARN-4026 > URL: https://issues.apache.org/jira/browse/YARN-4026 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch > > > After YARN-3983, we have an extensible ContainerAllocator which can be used > by FiCaSchedulerApp to decide how to allocate resources. 
> While working on YARN-1651 (allocate resource to increase container), I found > one thing in existing logic not flexible enough: > - ContainerAllocator decides what to allocate for a given node and priority: > To support different kinds of resource allocation, for example, priority as > weight / skip priority or not, etc. It's better to let ContainerAllocator > choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
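The change the description argues for, letting the allocator pick the ordering of pending requests, boils down to a pluggable comparator. The sketch below is illustrative Java, not the FiCaSchedulerApp API; `PendingRequest` and the two policies are invented names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RequestOrdering {
    static final class PendingRequest {
        final int priority;   // lower value = more important, as in YARN
        final int memoryMb;
        PendingRequest(int priority, int memoryMb) {
            this.priority = priority;
            this.memoryMb = memoryMb;
        }
    }

    /** The existing behavior: consider requests in strict priority order. */
    static final Comparator<PendingRequest> BY_PRIORITY =
        Comparator.comparingInt(r -> r.priority);

    /** An alternative policy an allocator might choose: largest request first. */
    static final Comparator<PendingRequest> BY_SIZE_DESC =
        Comparator.comparingInt((PendingRequest r) -> r.memoryMb).reversed();

    /** The allocator supplies the ordering; the app just applies it. */
    static List<PendingRequest> ordered(List<PendingRequest> pending,
                                        Comparator<PendingRequest> policy) {
        PendingRequest[] copy = pending.toArray(new PendingRequest[0]);
        Arrays.sort(copy, policy);
        return Arrays.asList(copy);
    }
}
```

Hard-coding `BY_PRIORITY` inside the app is the inflexibility the issue points at; passing the comparator in lets each ContainerAllocator implementation define its own policy.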
[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2038: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Revisit how AMs learn of containers from previous attempts > -- > > Key: YARN-2038 > URL: https://issues.apache.org/jira/browse/YARN-2038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Based on YARN-556, we need to update the way AMs learn about containers > allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-313: - Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692729#comment-14692729 ] Xuan Gong commented on YARN-3999: - Thanks, Jian. Committed into trunk/branch-2/branch-2.7. > RM hangs on draing events > - > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.2 > > Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, > YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, > YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch > > > If external systems like ATS, or ZK becomes very slow, draining all the > events take a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692709#comment-14692709 ] sandflee commented on YARN-2038: I thought it was the same issue as YARN-3519, but it seems not. I'm also confused about what the purpose of this issue is now. > Revisit how AMs learn of containers from previous attempts > -- > > Key: YARN-2038 > URL: https://issues.apache.org/jira/browse/YARN-2038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Based on YARN-556, we need to update the way AMs learn about containers > allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: YARN-3999-branch-2.7.patch uploaded the branch-2.7 patch > RM hangs on draining events > - > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, > YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, > YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch > > > If external systems like ATS or ZK become very slow, draining all the > events takes a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. Add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.3.patch Attached ver.3, added more comments and fixed findbugs warning. > FiCaSchedulerApp: ContainerAllocator should be able to choose how to order > pending resource requests > > > Key: YARN-4026 > URL: https://issues.apache.org/jira/browse/YARN-4026 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch > > > After YARN-3983, we have an extensible ContainerAllocator which can be used > by FiCaSchedulerApp to decide how to allocate resources. > While working on YARN-1651 (allocate resource to increase container), I found > one thing in existing logic not flexible enough: > - ContainerAllocator decides what to allocate for a given node and priority: > To support different kinds of resource allocation, for example, priority as > weight / skip priority or not, etc. It's better to let ContainerAllocator to > choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
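[Editor's note] The flexibility this issue asks for — letting each ContainerAllocator decide the iteration order of pending resource requests (strict priority, priority as weight, skipping priorities, etc.) — amounts to parameterizing the allocator with an ordering. A minimal sketch of the idea; `PendingRequest`, `STRICT`, and `allocationOrder` are hypothetical names, not the patch's actual API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the idea in this JIRA: instead of one hard-coded priority
 * order, each allocator supplies its own Comparator over pending
 * requests. All names here are illustrative, not YARN's real classes.
 */
public class OrderedAllocation {
  static class PendingRequest {
    final int priority;   // in YARN, a lower value means more important
    PendingRequest(int priority) { this.priority = priority; }
  }

  /** Default behaviour: strict priority order, lowest value first. */
  static final Comparator<PendingRequest> STRICT =
      Comparator.comparingInt((PendingRequest r) -> r.priority);

  /** The allocator chooses how pending requests are ordered before allocating. */
  static List<Integer> allocationOrder(List<PendingRequest> pending,
                                       Comparator<PendingRequest> order) {
    List<PendingRequest> copy = new ArrayList<>(pending);
    copy.sort(order);
    List<Integer> result = new ArrayList<>();
    for (PendingRequest r : copy) {
      result.add(r.priority);
    }
    return result;
  }

  public static void main(String[] args) {
    List<PendingRequest> pending = new ArrayList<>();
    pending.add(new PendingRequest(3));
    pending.add(new PendingRequest(1));
    pending.add(new PendingRequest(2));
    System.out.println(allocationOrder(pending, STRICT));   // [1, 2, 3]
  }
}
```

Swapping in a different comparator (for example, weighting instead of strict ordering) changes allocation behaviour without touching the rest of the allocator, which is the extensibility point the description argues for.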
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4046.002.patch Fixed whitespace > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4046.002.patch, YARN-4046.002.patch, > YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error. The attempts are not retried. > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
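[Editor's note] The recovery check failing here is essentially a "signal 0" liveness probe: sending signal 0 succeeds only if the target process (or process group) exists. A sketch of that probe, assuming a Linux host with `/bin/kill`; this is an illustration of the mechanism, not the NodeManager's actual ContainerExecutor code:

```java
import java.io.IOException;

/**
 * Sketch of the liveness probe behind this bug: "kill -0" succeeds only
 * if the process exists. On some distros, the shell-builtin form of the
 * process-group variant ("kill -0 -PID") is rejected, so recovery can
 * wrongly conclude the AM container is gone and mark it LOST (154).
 * Illustrative only; assumes /bin/kill exists on the host.
 */
public class LivenessProbe {
  /** Returns true if `kill -0` on the given pid string exits 0. */
  static boolean isAlive(String pid) throws IOException, InterruptedException {
    // "--" stops option parsing so a negative pid (a process group)
    // is not mistaken for a flag by /bin/kill.
    Process p = new ProcessBuilder("/bin/kill", "-0", "--", pid).start();
    return p.waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    // Our own JVM's pid is certainly alive.
    String self = String.valueOf(ProcessHandle.current().pid());
    System.out.println(isAlive(self));
  }
}
```

The distro-dependent part is exactly the argument syntax: a probe that works against a single pid can still fail against a process group, which matches the symptom described above.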
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4046.002.patch Fixed a new checkstyle that was added, the other two are preexisting and should not be fixed. > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4046.002.patch, YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error. The attempts are not retried. > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.2.patch Thanks for comments [~jianhe], attached ver.2 patch. > FiCaSchedulerApp: ContainerAllocator should be able to choose how to order > pending resource requests > > > Key: YARN-4026 > URL: https://issues.apache.org/jira/browse/YARN-4026 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4026.1.patch, YARN-4026.2.patch > > > After YARN-3983, we have an extensible ContainerAllocator which can be used > by FiCaSchedulerApp to decide how to allocate resources. > While working on YARN-1651 (allocate resource to increase container), I found > one thing in existing logic not flexible enough: > - ContainerAllocator decides what to allocate for a given node and priority: > To support different kinds of resource allocation, for example, priority as > weight / skip priority or not, etc. It's better to let ContainerAllocator to > choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692632#comment-14692632 ] Inigo Goiri commented on YARN-313: -- Not critical, I think it can be deferred. I would appreciate ideas on why this change breaks the refreshNodes with a graceful period. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2457: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > FairScheduler: Handle preemption to help starved parent queues > -- > > Key: YARN-2457 > URL: https://issues.apache.org/jira/browse/YARN-2457 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.5.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't > check for parent queue starvation. > We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-313: - Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2038: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Revisit how AMs learn of containers from previous attempts > -- > > Key: YARN-2038 > URL: https://issues.apache.org/jira/browse/YARN-2038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Based on YARN-556, we need to update the way AMs learn about containers > allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1681) When "banned.users" is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1681: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > When "banned.users" is not set in LCE's container-executor.cfg, submit job > with user in DEFAULT_BANNED_USERS will receive unclear error message > --- > > Key: YARN-1681 > URL: https://issues.apache.org/jira/browse/YARN-1681 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Zhichun Wu >Assignee: Zhichun Wu > Labels: container, usability > Attachments: YARN-1681.patch > > > When using LCE in a secure setup, if "banned.users" is not set in > container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS > ("mapred", "hdfs", "bin", 0) will receive unclear error message. > for example, if we use hdfs to submit a mr job, we may see the following the > yarn app overview page: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: > {code} > while the prefer error message may look like: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: Requested user hdfs is banned > {code} > just a minor bug and I would like to start contributing to hadoop-common with > it:) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2076: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Minor error in TestLeafQueue files > -- > > Key: YARN-2076 > URL: https://issues.apache.org/jira/browse/YARN-2076 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Labels: test > Attachments: YARN-2076.patch > > > "numNodes" should be 2 instead of 3 in testReservationExchange() since only > two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1767: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Windows: Allow a way for users to augment classpath of YARN daemons > --- > > Key: YARN-1767 > URL: https://issues.apache.org/jira/browse/YARN-1767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla > > YARN-1429 adds a way to augment the classpath for *nix-based systems. Need > something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing because AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2055: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Preemption: Jobs are failing because AMs are getting launched and killed > multiple times > -- > > Key: YARN-2055 > URL: https://issues.apache.org/jira/browse/YARN-2055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Mayank Bansal > > If queue A does not have enough capacity to run the AM, the AM will borrow > capacity from queue B. In that case the AM will be killed when queue B > reclaims its capacity, then launched and killed again, and the job will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1848: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Persist ClusterMetrics across RM HA transitions > --- > > Key: YARN-1848 > URL: https://issues.apache.org/jira/browse/YARN-1848 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Post YARN-1705, ClusterMetrics are reset on transition to standby. This is > acceptable as the metrics show statistics since an RM has become active. > Users might want to see metrics since the cluster was ever started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch > > > Nowadays RM web services getApps() accepts many more filters than > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > TimelineClient should NOT be in yarn-common project > --- > > Key: YARN-2506 > URL: https://issues.apache.org/jira/browse/YARN-2506 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen >Priority: Critical > > YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't > belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3478) FairScheduler page not rendered because of differing enums YarnApplicationState and RMAppState
[ https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3478: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > FairScheduler page not performed because different enum of > YarnApplicationState and RMAppState > --- > > Key: YARN-3478 > URL: https://issues.apache.org/jira/browse/YARN-3478 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Xu Chen > Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, > screenshot-1.png > > > Got exception from log > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at > com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at 
org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(Sele
[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2859: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster > -- > > Key: YARN-2859 > URL: https://issues.apache.org/jira/browse/YARN-2859 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Hitesh Shah >Assignee: Zhijie Shen >Priority: Critical > Labels: 2.6.1-candidate > > In mini cluster, a random port should be used. > Also, the config is not updated to the host that the process got bound to. > {code} > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer > address: localhost:10200 > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer > web address: 0.0.0.0:8188 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
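[Editor's note] The fix requested above is the standard "bind to port 0, then read back the assigned port" pattern: the OS picks a free ephemeral port, and the config is updated with the port actually bound rather than a hard-coded 8188. A minimal sketch (not MiniYARNCluster code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/**
 * Sketch of the requested behaviour: in a mini cluster, bind to port 0
 * so the kernel assigns a free ephemeral port, then report that port
 * back so the config can be updated. Illustrative only.
 */
public class EphemeralPort {
  /** Binds to port 0 and returns the real port the kernel picked. */
  static int bindEphemeral() throws IOException {
    try (ServerSocket socket = new ServerSocket()) {
      socket.bind(new InetSocketAddress("localhost", 0));
      // This is the value that should be written back into the config,
      // instead of the default 8188 the log excerpt above shows.
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    int port = bindEphemeral();
    System.out.println(port > 0);   // true: the kernel assigned a real port
  }
}
```

Because the port is chosen by the OS, two mini clusters on the same host can no longer collide on 8188, which is the failure mode this issue describes.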
[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2746: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > YARNDelegationTokenID misses serializing version from the common abstract ID > > > Key: YARN-2746 > URL: https://issues.apache.org/jira/browse/YARN-2746 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > > I found this during review of YARN-2743. > bq. AbstractDTId had a version, we dropped that in the protobuf > serialization. We should just write it during the serialization and read it > back? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2037: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Add restart support for Unmanaged AMs > - > > Key: YARN-2037 > URL: https://issues.apache.org/jira/browse/YARN-2037 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > It would be nice to allow Unmanaged AMs also to restart in a work-preserving > way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1856: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2657: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > MiniYARNCluster to (optionally) add MicroZookeeper service > -- > > Key: YARN-2657 > URL: https://issues.apache.org/jira/browse/YARN-2657 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2567-001.patch, YARN-2657-002.patch > > > This is needed for testing things like YARN-2646: add an option for the > {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. > This is just another YARN service to create and track the lifecycle. The > {{MicroZookeeperService}} publishes its binding information for direct takeup > by the registry services...this can address in-VM race conditions. > The default setting for this service is "off" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2014) Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2014: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! > Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9 > > > Key: YARN-2014 > URL: https://issues.apache.org/jira/browse/YARN-2014 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: patrick white >Assignee: Jason Lowe > > Performance comparison benchmarks from 2.x against 0.23 shows AM scalability > benchmark's runtime is approximately 10% slower in 2.4.0. The trend is > consistent across later releases in both lines, latest release numbers are: > 2.4.0.0 runtime 255.6 seconds (avg 5 passes) > 0.23.9.12 runtime 230.4 seconds (avg 5 passes) > Diff: -9.9% > AM Scalability test is essentially a sleep job that measures time to launch > and complete a large number of mappers. > The diff is consistent and has been reproduced in both a larger (350 node, > 100,000 mappers) perf environment, as well as a small (10 node, 2,900 > mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-313: - Attachment: YARN-313-v7.patch Updated to trunk. It looks like it still breaks the unit test for the graceful refresh but I cannot figure out why. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692605#comment-14692605 ] Chen He commented on YARN-2076: --- I will update patch. > Minor error in TestLeafQueue files > -- > > Key: YARN-2076 > URL: https://issues.apache.org/jira/browse/YARN-2076 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Labels: test > Attachments: YARN-2076.patch > > > "numNodes" should be 2 instead of 3 in testReservationExchange() since only > two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3978: -- Labels: 2.6.1-candidate (was: ) > Configurably turn off the saving of container info in Generic AHS > - > > Key: YARN-3978 > URL: https://issues.apache.org/jira/browse/YARN-3978 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver, yarn >Affects Versions: 2.8.0, 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne > Labels: 2.6.1-candidate > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-3978.001.patch, YARN-3978.002.patch, > YARN-3978.003.patch, YARN-3978.004.patch > > > Depending on how each application's metadata is stored, one week's worth of > data stored in the Generic Application History Server's database can grow to > be almost a terabyte of local disk space. In order to alleviate this, I > suggest that there is a need for a configuration option to turn off saving of > non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692546#comment-14692546 ] Sangjin Lee commented on YARN-313: -- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692538#comment-14692538 ] Sangjin Lee commented on YARN-1856: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692545#comment-14692545 ] Sangjin Lee commented on YARN-1480: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch > > > Nowadays RM web services getApps() accepts many more filters than > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1681) When "banned.users" is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692544#comment-14692544 ] Sangjin Lee commented on YARN-1681: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > When "banned.users" is not set in LCE's container-executor.cfg, submit job > with user in DEFAULT_BANNED_USERS will receive unclear error message > --- > > Key: YARN-1681 > URL: https://issues.apache.org/jira/browse/YARN-1681 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Zhichun Wu >Assignee: Zhichun Wu > Labels: container, usability > Attachments: YARN-1681.patch > > > When using LCE in a secure setup, if "banned.users" is not set in > container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS > ("mapred", "hdfs", "bin", 0) will produce an unclear error message. > for example, if we use hdfs to submit a mr job, we may see the following on the > yarn app overview page: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: > {code} > while the preferred error message may look like: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: Requested user hdfs is banned > {code} > just a minor bug and I would like to start contributing to hadoop-common with > it:) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692543#comment-14692543 ] Sangjin Lee commented on YARN-1767: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Windows: Allow a way for users to augment classpath of YARN daemons > --- > > Key: YARN-1767 > URL: https://issues.apache.org/jira/browse/YARN-1767 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla > > YARN-1429 adds a way to augment the classpath for *nix-based systems. Need > something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2014) Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692537#comment-14692537 ] Sangjin Lee commented on YARN-2014: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9 > > > Key: YARN-2014 > URL: https://issues.apache.org/jira/browse/YARN-2014 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: patrick white >Assignee: Jason Lowe > > Performance comparison benchmarks from 2.x against 0.23 shows AM scalability > benchmark's runtime is approximately 10% slower in 2.4.0. The trend is > consistent across later releases in both lines, latest release numbers are: > 2.4.0.0 runtime 255.6 seconds (avg 5 passes) > 0.23.9.12 runtime 230.4 seconds (avg 5 passes) > Diff: -9.9% > AM Scalability test is essentially a sleep job that measures time to launch > and complete a large number of mappers. > The diff is consistent and has been reproduced in both a larger (350 node, > 100,000 mappers) perf environment, as well as a small (10 node, 2,900 > mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692541#comment-14692541 ] Sangjin Lee commented on YARN-1848: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Persist ClusterMetrics across RM HA transitions > --- > > Key: YARN-1848 > URL: https://issues.apache.org/jira/browse/YARN-1848 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Post YARN-1705, ClusterMetrics are reset on transition to standby. This is > acceptable as the metrics show statistics since an RM has become active. > Users might want to see metrics since the cluster was ever started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692533#comment-14692533 ] Sangjin Lee commented on YARN-2038: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Revisit how AMs learn of containers from previous attempts > -- > > Key: YARN-2038 > URL: https://issues.apache.org/jira/browse/YARN-2038 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > Based on YARN-556, we need to update the way AMs learn about containers > allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692532#comment-14692532 ] Sangjin Lee commented on YARN-2055: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Preemption: Jobs are failing due to AMs are getting launched and killed > multiple times > -- > > Key: YARN-2055 > URL: https://issues.apache.org/jira/browse/YARN-2055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Mayank Bansal > > If queue A does not have enough capacity to run an AM, the AM will borrow > capacity from queue B. When queue B reclaims its capacity, the AM is killed, > then launched and killed again each time this repeats; in that case the job > will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692534#comment-14692534 ] Sangjin Lee commented on YARN-2037: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Add restart support for Unmanaged AMs > - > > Key: YARN-2037 > URL: https://issues.apache.org/jira/browse/YARN-2037 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla > > It would be nice to allow Unmanaged AMs also to restart in a work-preserving > way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692529#comment-14692529 ] Sangjin Lee commented on YARN-2076: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Minor error in TestLeafQueue files > -- > > Key: YARN-2076 > URL: https://issues.apache.org/jira/browse/YARN-2076 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Labels: test > Attachments: YARN-2076.patch > > > "numNodes" should be 2 instead of 3 in testReservationExchange() since only > two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692527#comment-14692527 ] Sangjin Lee commented on YARN-2506: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > TimelineClient should NOT be in yarn-common project > --- > > Key: YARN-2506 > URL: https://issues.apache.org/jira/browse/YARN-2506 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen >Priority: Critical > > YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't > belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692524#comment-14692524 ] Hadoop QA commented on YARN-3045: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 17s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:red}-1{color} | javac | 7m 55s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 46s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 8m 6s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 55m 53s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749943/YARN-3045-YARN-2928.009.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-2928 / 07433c2 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/diffJavacWarnings.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8826/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8826/console | This message was automatically generated. 
> [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, > YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692528#comment-14692528 ] Sangjin Lee commented on YARN-2457: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > FairScheduler: Handle preemption to help starved parent queues > -- > > Key: YARN-2457 > URL: https://issues.apache.org/jira/browse/YARN-2457 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.5.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't > check for parent queue starvation. > We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692525#comment-14692525 ] Sangjin Lee commented on YARN-2657: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > MiniYARNCluster to (optionally) add MicroZookeeper service > -- > > Key: YARN-2657 > URL: https://issues.apache.org/jira/browse/YARN-2657 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-2567-001.patch, YARN-2657-002.patch > > > This is needed for testing things like YARN-2646: add an option for the > {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. > This is just another YARN service to create and track the lifecycle. The > {{MicroZookeeperService}} publishes its binding information for direct takeup > by the registry services...this can address in-VM race conditions. > The default setting for this service is "off" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692526#comment-14692526 ] Sangjin Lee commented on YARN-2599: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692523#comment-14692523 ] Sangjin Lee commented on YARN-2746: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > YARNDelegationTokenID misses serializing version from the common abstract ID > > > Key: YARN-2746 > URL: https://issues.apache.org/jira/browse/YARN-2746 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > > I found this during review of YARN-2743. > bq. AbstractDTId had a version, we dropped that in the protobuf > serialization. We should just write it during the serialization and read it > back? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692521#comment-14692521 ] Sangjin Lee commented on YARN-2859: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster > -- > > Key: YARN-2859 > URL: https://issues.apache.org/jira/browse/YARN-2859 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Hitesh Shah >Assignee: Zhijie Shen >Priority: Critical > Labels: 2.6.1-candidate > > In mini cluster, a random port should be used. > Also, the config is not updated to the host that the process got bound to. > {code} > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer > address: localhost:10200 > 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster > (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer > web address: 0.0.0.0:8188 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
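The fix pattern requested above can be sketched as follows: bind to port 0 so the OS assigns a free ephemeral port, then write the actually-bound address back into the configuration instead of leaving the default 8188 / 0.0.0.0 there. This is an illustrative Java sketch, not the actual MiniYARNCluster code; the configuration key mentioned in the comment is left unspecified on purpose.

```java
import java.net.ServerSocket;

// Illustrative only: shows the "port 0 -> read back the bound port" idiom
// that a mini cluster should use instead of a fixed default port.
public class EphemeralPortSketch {

    // Ask the OS for a free ephemeral port by binding to port 0.
    static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // OS-assigned, non-zero
        } catch (java.io.IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int boundPort = pickFreePort();
        // In the real fix, this address would be written back into the
        // YARN Configuration so clients see the port actually in use,
        // e.g. conf.set(<ahs-webapp-address-key>, "localhost:" + boundPort).
        System.out.println("AHS web address: localhost:" + boundPort);
    }
}
```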
[jira] [Commented] (YARN-3478) FairScheduler page not performed because different enum of YarnApplicationState and RMAppState
[ https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692520#comment-14692520 ] Sangjin Lee commented on YARN-3478: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. > FairScheduler page not performed because different enum of > YarnApplicationState and RMAppState > --- > > Key: YARN-3478 > URL: https://issues.apache.org/jira/browse/YARN-3478 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Xu Chen > Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, > screenshot-1.png > > > Got exception from log > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at > com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at 
org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint
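The enum drift described in YARN-3478 can be sketched with simplified stand-in enums (these are not the real YARN classes): a raw `valueOf()`-style lookup throws for states that exist in one enum but not the other, so an explicit mapping is needed for internal-only states.

```java
// Illustrative sketch only: simplified enums standing in for RMAppState
// and YarnApplicationState, which have diverging sets of constants.
public class StateMappingSketch {
    enum RmState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINAL_SAVING, FINISHED }
    enum PublicState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED }

    // Explicit mapping instead of PublicState.valueOf(rm.name()), so an
    // internal-only state folds into a sensible public state rather than
    // throwing IllegalArgumentException while rendering the page.
    static PublicState toPublic(RmState rm) {
        switch (rm) {
            case FINAL_SAVING: // internal-only; no public counterpart
                return PublicState.RUNNING;
            default:
                return PublicState.valueOf(rm.name());
        }
    }

    public static void main(String[] args) {
        // PublicState.valueOf("FINAL_SAVING") would throw here:
        System.out.println(toPublic(RmState.FINAL_SAVING)); // RUNNING
    }
}
```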
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692518#comment-14692518 ] Hadoop QA commented on YARN-4046: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 9s | The applied patch generated 3 new checkstyle issues (total was 97, now 99). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 57s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 22m 42s | Tests failed in hadoop-common. 
| | | | 63m 5s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.ha.TestZKFailoverController | | | hadoop.net.TestNetUtils | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749949/YARN-4096.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7c796fd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8825/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8825/console | This message was automatically generated. > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error. The attempts are not retried. 
> {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
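The liveness probe at issue above follows the classic "signal 0" idiom: sending signal 0 checks that a process exists without affecting it. The NM recovery path signals the process group (a negative pid), and the accepted syntax for that varies across distros, which is what breaks recovery here. The sketch below shows only the single-pid form and is illustrative, not the actual NodeManager code.

```java
// Illustrative sketch of a "kill -0" liveness check (single pid form).
// The process-group variant ("kill -0 -- -<pgid>") is the one whose
// syntax varies across linux distros per YARN-4046.
public class ProcessAliveSketch {

    static boolean isAlive(long pid) {
        try {
            // Exit code 0 means the process exists and is signalable.
            Process p = new ProcessBuilder("kill", "-0", Long.toString(pid))
                    .start();
            return p.waitFor() == 0;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long self = ProcessHandle.current().pid(); // this JVM is alive
        System.out.println(isAlive(self));
    }
}
```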
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692511#comment-14692511 ] Vrushali C commented on YARN-4025: -- Yes, +1 > Deal with byte representations of Longs in writer code > -- > > Key: YARN-4025 > URL: https://issues.apache.org/jira/browse/YARN-4025 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-4025-YARN-2928.001.patch > > > Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl > code. There seem to be some places in the code where there are conversions > between Long to byte[] to String for easier argument passing between function > calls. Then these values end up being converted back to byte[] while storing > in hbase. > It would be better to pass around byte[] or the Longs themselves as > applicable. > This may result in some api changes (store function) as well in adding a few > more function calls like getColumnQualifier which accepts a pre-encoded byte > array. It will be in addition to the existing api which accepts a String and > the ColumnHelper to return a byte[] column name instead of a String one. > Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
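The Long-vs-String round trip described above can be made concrete with a small self-contained sketch. The `toBytes`/`toLong` helpers below mimic the fixed-width big-endian encoding of HBase's `Bytes.toBytes(long)` but are local stand-ins, not the actual writer code:

```java
// Illustrative sketch for YARN-4025: encoding a timestamp long directly
// into 8 big-endian bytes versus round-tripping it through a String.
public class LongBytesSketch {

    // Fixed-width big-endian encoding (equivalent in spirit to
    // HBase Bytes.toBytes(long)).
    static byte[] toBytes(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) v;
            v >>>= 8;
        }
        return b;
    }

    static long toLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        long ts = 1439244348718L;
        // Direct: long -> byte[] is always 8 bytes, sortable, no re-parse.
        byte[] direct = toBytes(ts);
        // Indirect: long -> String -> byte[] is wider and must be parsed
        // back on every read, which is the waste the JIRA points at.
        byte[] viaString = Long.toString(ts).getBytes();
        System.out.println(direct.length + " vs " + viaString.length);
        if (toLong(direct) != ts) throw new AssertionError();
    }
}
```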
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692476#comment-14692476 ] Junping Du commented on YARN-3906: -- Ok. Committing this patch now. > split the application table from the entity table > - > > Key: YARN-3906 > URL: https://issues.apache.org/jira/browse/YARN-3906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3906-YARN-2928.001.patch, > YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, > YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, > YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch > > > Per discussions on YARN-3815, we need to split the application entities from > the main entity table into its own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692468#comment-14692468 ] Sangjin Lee commented on YARN-4025: --- For the record, we will go ahead with YARN-3906 first. We'll need to update this patch to reflect the changes in YARN-3906. I'll work with [~vrushalic] on that. > Deal with byte representations of Longs in writer code > -- > > Key: YARN-4025 > URL: https://issues.apache.org/jira/browse/YARN-4025 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-4025-YARN-2928.001.patch > > > Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl > code. There seem to be some places in the code where there are conversions > between Long to byte[] to String for easier argument passing between function > calls. Then these values end up being converted back to byte[] while storing > in hbase. > It would be better to pass around byte[] or the Longs themselves as > applicable. > This may result in some api changes (store function) as well in adding a few > more function calls like getColumnQualifier which accepts a pre-encoded byte > array. It will be in addition to the existing api which accepts a String and > the ColumnHelper to return a byte[] column name instead of a String one. > Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
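The conversion chain described in YARN-4025 can be made concrete. Below is a minimal, self-contained sketch (not the actual HBaseTimelineWriterImpl code; HBase's own Bytes utility would normally do this) contrasting the wasteful Long -> String -> byte[] path with a direct fixed-width binary encoding:

```java
import java.nio.ByteBuffer;

public class TimestampEncoding {
    // Direct fixed-width encoding: the byte[] the issue suggests passing around.
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        long ts = 1439244348718L;
        // The path the issue complains about: Long -> String -> byte[].
        byte[] viaString = String.valueOf(ts).getBytes();
        // Direct path: Long -> byte[], no intermediate String.
        byte[] direct = longToBytes(ts);
        System.out.println(viaString.length + " vs " + direct.length); // 13 vs 8
        System.out.println(bytesToLong(direct) == ts);                 // true
    }
}
```

Besides skipping round trips, the fixed-width form is what a hypothetical byte[]-accepting getColumnQualifier overload (as proposed in the description) would take directly.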
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692465#comment-14692465 ] Sangjin Lee commented on YARN-3906: --- I checked with [~vrushalic], and we decided to put the patch for this JIRA (YARN-3906) first. > split the application table from the entity table > - > > Key: YARN-3906 > URL: https://issues.apache.org/jira/browse/YARN-3906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3906-YARN-2928.001.patch, > YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, > YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, > YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch > > > Per discussions on YARN-3815, we need to split the application entities from > the main entity table into its own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692412#comment-14692412 ] Hadoop QA commented on YARN-4047: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 27s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 53s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749935/YARN-4047.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7c796fd | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8824/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8824/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8824/console | This message was automatically generated. > ClientRMService getApplications has high scheduler lock contention > -- > > Key: YARN-4047 > URL: https://issues.apache.org/jira/browse/YARN-4047 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: 2.6.1-candidate > Attachments: YARN-4047.001.patch > > > The getApplications call can be particuarly expensive because the code can > call checkAccess on every application being tracked by the RM. checkAccess > will often call scheduler.checkAccess which will grab the big scheduler lock. > This can cause a lot of contention with the scheduler thread which is busy > trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14687395#comment-14687395 ] Anubhav Dhoot commented on YARN-4046: - [~cnauroth] appreciate your review > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14687398#comment-14687398 ] Jian He commented on YARN-4026: --- - why "assignment.setFulfilledReservation(true);" is called in Reserved state ? {code} if (result.getAllocationState() == AllocationState.RESERVED) { // This is a reserved container LOG.info("Reserved container " + " application=" + application.getApplicationId() + " resource=" + allocatedResource + " queue=" + this.toString() + " cluster=" + clusterResource); assignment.getAssignmentInformation().addReservationDetails( updatedContainer.getId(), application.getCSLeafQueue().getQueuePath()); assignment.getAssignmentInformation().incrReservations(); Resources.addTo(assignment.getAssignmentInformation().getReserved(), allocatedResource); assignment.setFulfilledReservation(true); } else { {code} - I think here can always return ContainerAllocation.LOCALITY_SKIPPED as the semantics of this method is to try to allocate a container for certain locality. {code} return type == NodeType.OFF_SWITCH ? ContainerAllocation.APP_SKIPPED : ContainerAllocation.LOCALITY_SKIPPED; {code} The caller here can choose to return APP_SKIPPED if it sees the LOCALITY_SKIPPED {code} assigned = assignOffSwitchContainers(clusterResource, offSwitchResourceRequest, node, priority, reservedContainer, schedulingMode, currentResoureLimits); assigned.requestNodeType = requestType; return assigned; } {code} > FiCaSchedulerApp: ContainerAllocator should be able to choose how to order > pending resource requests > > > Key: YARN-4026 > URL: https://issues.apache.org/jira/browse/YARN-4026 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4026.1.patch > > > After YARN-3983, we have an extensible ContainerAllocator which can be used > by FiCaSchedulerApp to decide how to allocate resources. 
> While working on YARN-1651 (allocate resource to increase container), I found > one thing in existing logic not flexible enough: > - ContainerAllocator decides what to allocate for a given node and priority: > To support different kinds of resource allocation, for example, priority as > weight / skip priority or not, etc. It's better to let ContainerAllocator to > choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Description: On a debian machine we have seen node manager recovery of containers fail because the signal syntax for process group may not work. We see errors in checking if process is alive during container recovery which causes the container to be declared as LOST (154) on a NodeManager restart. The application will fail with error. The attempts are not retried. {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} was: On a debian machine we have seen node manager recovery of containers fail because the signal syntax for process group may not work. We see errors in checking if process is alive during container recovery which causes the container to be declared as LOST (154) on a NodeManager restart. The application will fail with error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error. The attempts are not retried. 
> {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4096.001.patch Attaching patch that prefixes "--" when using negative pid for kill > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4096.001.patch > > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative availableMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14687389#comment-14687389 ] Wangda Tan commented on YARN-4045: -- [~tgraves]/[~shahrs87], I think the case could happen when container reservation interacts with node disconnect, one example is: {code} A cluster has 6 nodes, each node has 20G resource. N1-N4 are fully used; N5-N6 each use 10G. An app asks for a 15G container; assume it is reserved at N5, so total used resource = 20G * 4 + 10G * 2 + 15G (just reserved) = 115G. Then N6 disconnects: cluster resource becomes 100G, and used resource = 105G. {code} I've just checked: YARN-3361 doesn't have a related fix, and currently we don't have a fix for the above corner case. Another problem is caused by DRC. From 2.7.1, we have set availableResource = max(availableResource, Resources.none()). {code} childQueue.getMetrics().setAvailableResourcesToQueue( Resources.max( calculator, clusterResource, available, Resources.none() ) ); {code} But with DRC, if a resource has availableMB < 0 and availableVCores > 0, it could still compare as greater than Resources.none(). We may need to fix this case as well. Thoughts? > Negative availableMB is being reported for root queue. > -- > > Key: YARN-4045 > URL: https://issues.apache.org/jira/browse/YARN-4045 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Rushabh S Shah > > We recently deployed 2.7 in one of our clusters. > We are seeing negative availableMB being reported for queue=root. > This is from the jmx output: > {noformat} > > ... > -163328 > ... 
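Wangda Tan's DRC corner case can be reproduced with a toy dominant-share comparison (a simplified sketch, not the actual DominantResourceCalculator code): a resource with negative memory but positive vcores has a positive dominant share, so a dominant-share max() against Resources.none() keeps the negative-memory resource instead of clamping it.

```java
public class DrcCornerCase {
    // Simplified dominant share: the larger of the two per-resource fractions
    // of the cluster total (mirrors the idea behind dominant resource fairness).
    static double dominantShare(long mem, long vcores,
                                long clusterMem, long clusterVcores) {
        return Math.max((double) mem / clusterMem,
                        (double) vcores / clusterVcores);
    }

    public static void main(String[] args) {
        long clusterMem = 1_000_000, clusterVcores = 1_000;
        // "available" with negative memory but positive vcores
        long availMem = -163_328, availVcores = 50;

        double availShare = dominantShare(availMem, availVcores,
                                          clusterMem, clusterVcores);
        double noneShare = dominantShare(0, 0, clusterMem, clusterVcores);

        // The vcores component dominates (0.05 > 0), so under a dominant-share
        // max() the negative-memory resource is considered "larger" than none,
        // and the negative availableMB survives the clamping.
        System.out.println(availShare > noneShare); // true
    }
}
```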
> > {noformat} > The following is the RM log: > {noformat} > 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,338 
[ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO > capacity.ParentQueue
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682496#comment-14682496 ] Xuan Gong commented on YARN-3999: - +1 lgtm. Will commit later if there are no other comments > RM hangs on draining events > - > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, > YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, > YARN-3999.patch > > > If external systems like ATS or ZK become very slow, draining all the > events takes a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
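Fix #1 in the description (a timeout on draining) can be sketched as follows; names and structure here are illustrative, not the actual AsyncDispatcher code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedDrain {
    // Drain queued events until the queue is empty or the deadline passes,
    // whichever comes first, so a slow downstream (ATS, ZK) cannot hang the
    // RM forever while it is stopping or transitioning to standby.
    static int drainWithTimeout(BlockingQueue<Runnable> queue, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int handled = 0;
        Runnable event;
        while ((event = queue.poll()) != null) {
            event.run();           // dispatch; may be slow if ATS/ZK is slow
            handled++;
            if (System.nanoTime() >= deadline) {
                break;             // give up draining; stop the dispatcher anyway
            }
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 3; i++) {
            q.add(() -> { });      // fast no-op events
        }
        System.out.println(drainWithTimeout(q, 100)); // 3: all drained in time
    }
}
```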
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: YARN-3045-YARN-2928.009.patch Hi [~djp], Attaching a new patch resolving your comments. I have also modified one approach: for cases where we need to publish the timeline entities directly (not through wrapped application or container events), like ContainerMetrics, I have added a new NMTimelineEvent which accepts the TimelineEntity and ApplicationId. This approach avoids creating new event classes; it suffices to expose a method in NMTimelinePublisher. I have also fixed the test case failures, but the javac warnings seem unrelated to my modifications, and findbugs did not report any issue in the report. Will check for it in the next Jenkins run. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, > YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4047: -- Labels: 2.6.1-candidate (was: ) > ClientRMService getApplications has high scheduler lock contention > -- > > Key: YARN-4047 > URL: https://issues.apache.org/jira/browse/YARN-4047 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: 2.6.1-candidate > Attachments: YARN-4047.001.patch > > > The getApplications call can be particularly expensive because the code can > call checkAccess on every application being tracked by the RM. checkAccess > will often call scheduler.checkAccess which will grab the big scheduler lock. > This can cause a lot of contention with the scheduler thread which is busy > trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4047: - Attachment: YARN-4047.001.patch Patch that performs the checkAccess filter last rather than first. > ClientRMService getApplications has high scheduler lock contention > -- > > Key: YARN-4047 > URL: https://issues.apache.org/jira/browse/YARN-4047 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-4047.001.patch > > > The getApplications call can be particularly expensive because the code can > call checkAccess on every application being tracked by the RM. checkAccess > will often call scheduler.checkAccess which will grab the big scheduler lock. > This can cause a lot of contention with the scheduler thread which is busy > trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682379#comment-14682379 ] Dustin Cote commented on YARN-2369: --- [~jlowe] thanks for all the input. I'll clean this latest patch up based on these comments this week. Happy to throw this in the MAPREDUCE project instead as well, since basically all the changes are in the MR client. I don't think sub JIRAs would be necessary since it's a pretty small change on the YARN side, but I leave that to the project management experts. I don't see any organizational problem keeping it all in one JIRA here. > Environment variable handling assumes values should be appended > --- > > Key: YARN-2369 > URL: https://issues.apache.org/jira/browse/YARN-2369 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jason Lowe >Assignee: Dustin Cote > Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, > YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch > > > When processing environment variables for a container context the code > assumes that the value should be appended to any pre-existing value in the > environment. This may be desired behavior for handling path-like environment > variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a > non-intuitive and harmful way to handle any variable that does not have > path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
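The append-vs-replace distinction at the heart of YARN-2369 can be sketched like so. This is illustrative only (the real behavior lives in YARN's environment-handling code and the MR client; the method and variable names here are invented): path-like variables benefit from appending, while ordinary variables should be replaced.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EnvHandling {
    // Variables with path-like semantics, where appending makes sense.
    static final Set<String> PATH_LIKE =
        new HashSet<>(Arrays.asList("PATH", "CLASSPATH", "LD_LIBRARY_PATH"));

    static void setEnv(Map<String, String> env, String name, String value) {
        if (PATH_LIKE.contains(name) && env.containsKey(name)) {
            // Append with the platform path separator for path-like variables.
            env.put(name, env.get(name) + File.pathSeparator + value);
        } else {
            // Plain variables are replaced; unconditional appending (the bug
            // described in this issue) would corrupt values like JAVA_OPTS.
            env.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        setEnv(env, "CLASSPATH", "/a.jar");
        setEnv(env, "CLASSPATH", "/b.jar");
        setEnv(env, "JAVA_OPTS", "-Xmx1g");
        setEnv(env, "JAVA_OPTS", "-Xmx2g");
        System.out.println(env.get("CLASSPATH")); // /a.jar:/b.jar on Unix
        System.out.println(env.get("JAVA_OPTS")); // -Xmx2g
    }
}
```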
[jira] [Assigned] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4047: Assignee: Jason Lowe In OOZIE-1729 Oozie started calling getApplications to look for applications with specific tags. This significantly increases the utilization of this method on a cluster that makes heavy use of Oozie. One quick fix for the Oozie use-case may be to swap the filter order. Rather than doing the expensive checkAccess call first, we can do all the other filtering first and finally verify the user has access before adding the app to the response. In the Oozie scenario most apps will be filtered by the tag check before we ever get to the checkAccess call. > ClientRMService getApplications has high scheduler lock contention > -- > > Key: YARN-4047 > URL: https://issues.apache.org/jira/browse/YARN-4047 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > > The getApplications call can be particularly expensive because the code can > call checkAccess on every application being tracked by the RM. checkAccess > will often call scheduler.checkAccess which will grab the big scheduler lock. > This can cause a lot of contention with the scheduler thread which is busy > trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
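The reordering described in the comment above can be sketched as below (types and names are invented for illustration, not the real ClientRMService code): cheap filters such as the tag check run first, and the lock-heavy checkAccess runs only on the survivors.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class AppFilterOrder {
    static class App {
        final String id;
        final Set<String> tags;
        App(String id, Set<String> tags) { this.id = id; this.tags = tags; }
    }

    // Apply the cheap tag filter first; only apps that survive it pay for the
    // expensive checkAccess call (which may take the big scheduler lock).
    static List<App> getApplications(Collection<App> apps,
                                     Set<String> requestedTags,
                                     Predicate<App> checkAccess) {
        List<App> result = new ArrayList<>();
        for (App app : apps) {
            if (!requestedTags.isEmpty()
                    && Collections.disjoint(app.tags, requestedTags)) {
                continue;   // filtered out cheaply; scheduler lock never touched
            }
            if (checkAccess.test(app)) {
                result.add(app);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("app_1", Collections.singleton("oozie-123")));
        apps.add(new App("app_2", Collections.singleton("other")));
        int[] aclCalls = {0};
        List<App> out = getApplications(apps, Collections.singleton("oozie-123"),
                a -> { aclCalls[0]++; return true; });
        System.out.println(out.size() + " result, " + aclCalls[0] + " checkAccess call");
    }
}
```

In the Oozie scenario from the comment, most apps fail the tag check, so the number of checkAccess calls (and thus scheduler-lock acquisitions) drops dramatically.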
[jira] [Created] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
Jason Lowe created YARN-4047: Summary: ClientRMService getApplications has high scheduler lock contention Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4023) Publish Application Priority to TimelineServer
[ https://issues.apache.org/jira/browse/YARN-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682360#comment-14682360 ] Hadoop QA commented on YARN-4023: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 12s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 57s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 38s | The applied patch generated 1 new checkstyle issues (total was 16, now 16). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 1m 53s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 13s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. 
| | {color:red}-1{color} | yarn tests | 53m 22s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 123m 59s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749303/0001-YARN-4023.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 1fc3c77 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8823/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency 
#63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8823/console | This message was automatically generated. > Publish Application Priority to TimelineServer > -- > > Key: YARN-4023 > URL: https://issues.apache.org/jira/browse/YARN-4023 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4023.patch, 0001-YARN-4023.patch, > ApplicationPage.png, TimelineserverMainpage.png > > > Publish Application priority details to Timeline Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682293#comment-14682293 ] Anubhav Dhoot commented on YARN-4046: - Per the GNU coreutils [documentation|http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html#kill-invocation], "--" may not be required, but it appears that not all distros (Debian, for one) support omitting "--". {noformat} If a negative pid argument is desired as the first one, it should be preceded by --. However, as a common extension to POSIX, -- is not required with ‘kill -signal -pid’. {noformat} So the fix is to always prefix "--", matching the recommendation. > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
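The recommended syntax can be sanity-checked from a plain shell. A minimal sketch, assuming a Linux box with ps(1) available (the process group probed here is the shell's own, just to have a group that certainly exists):

```shell
# Determine this shell's own process-group id via ps (the script's pgid is
# not necessarily $$, so ask the kernel rather than guess).
pgid=$(ps -o pgid= -p $$ | tr -d ' ')

# Always prefixing "--" makes the negative (process-group) argument
# unambiguous on every distro, matching the POSIX recommendation:
# "kill -0 -- -<pgid>" checks the group without sending a signal.
alive=no
if kill -0 -- "-$pgid" 2>/dev/null; then
  alive=yes
fi
echo "process group $pgid alive: $alive"
```

On distros whose kill rejects a bare negative PID, the same probe written without "--" fails with a parse error instead of performing the check.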
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682284#comment-14682284 ] Anubhav Dhoot commented on YARN-4046: - The error in NodeManager shows {noformat} 2015-08-10 15:14:05,567 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e45_1439244348718_0001_01_01 java.io.IOException: Timeout while waiting for exit code from container_e45_1439244348718_0001_01_01 at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Looking under the debugger, the actual shell command used to check whether the container is alive fails because of the kill syntax: "kill -0 -20773".
{noformat} this = {org.apache.hadoop.util.Shell$ShellCommandExecutor@6740} "kill -0 -20773 " builder = {java.lang.ProcessBuilder@6789} command = {java.util.ArrayList@6813} size = 3 directory = null environment = null redirectErrorStream = false redirects = null timeOutTimer = null timeoutTimerTask = null errReader = {java.io.BufferedReader@6830} inReader = {java.io.BufferedReader@6833} errMsg = {java.lang.StringBuffer@6836} "kill: invalid option -- '2'\n\nUsage:\n kill [options] <pid> [...]\n\nOptions:\n <pid> [...] send signal to every <pid> listed\n -<signal>, -s, --signal <signal>\n specify the <signal> to be sent\n -l, --list=[<signal>] list all signal names, or convert one to a name\n -L, --table list all signal names in a nice table\n\n -h, --help display this help and exit\n -V, --version output version information and exit\n\nFor more details see kill(1).\n" errThread = {org.apache.hadoop.util.Shell$1@6839} "Thread[Thread-102,5,]" line = null exitCode = 1 completed = {java.util.concurrent.atomic.AtomicBoolean@6806} "true" {noformat} This causes DefaultContainerExecutor#containerIsAlive to catch the ExitCodeException thrown by ShellCommandExecutor.execute, making it assume the container is lost. > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. 
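The failure mode can be reproduced outside the NodeManager with the standalone kill binary. A sketch, with /bin/kill standing in for whatever binary the container executor's shell resolves, and 99999 an illustrative process-group id assumed not to exist:

```shell
# containerIsAlive effectively runs "kill -0 <pid>" and treats any nonzero
# exit status as "container process is gone". For a process group the pid is
# negative, and without "--" some kill implementations reject it as an
# unknown option, producing the same nonzero status as a dead process would.
/bin/kill -0 -99999 2>/dev/null
rc_without=$?
/bin/kill -0 -- -99999 2>/dev/null
rc_with=$?
echo "exit without --: $rc_without, with --: $rc_with"
# Both are nonzero here, but for different reasons: the first may be an
# option-parse error, the second genuinely means the group does not exist.
```

The caller cannot distinguish the two cases from the exit status alone, which is exactly why the parse error gets misread as a lost container.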
> The application will fail with error > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Summary: Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST (was: NM container recovery is broken on some linux distro because of syntax of signal) > Applications fail on NM restart on some linux distro because NM container > recovery declares AM container as LOST > > > Key: YARN-4046 > URL: https://issues.apache.org/jira/browse/YARN-4046 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > > On a debian machine we have seen node manager recovery of containers fail > because the signal syntax for process group may not work. We see errors in > checking if process is alive during container recovery which causes the > container to be declared as LOST (154) on a NodeManager restart. > The application will fail with error > {noformat} > Application application_1439244348718_0001 failed 1 times due to Attempt > recovered after RM restartAM Container for > appattempt_1439244348718_0001_01 exited with exitCode: 154 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4046) NM container recovery is broken on some linux distro because of syntax of signal
Anubhav Dhoot created YARN-4046: --- Summary: NM container recovery is broken on some linux distro because of syntax of signal Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a debian machine we have seen node manager recovery of containers fail because the signal syntax for process group may not work. We see errors in checking if process is alive during container recovery which causes the container to be declared as LOST (154) on a NodeManager restart. The application will fail with error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682240#comment-14682240 ] Rohith Sharma K S commented on YARN-3979: - I had a look at the shared RM logs, and I strongly suspect it is for the same reason as YARN-3990. From the shared log, I see the entries below, which indicate that the AsyncDispatcher is overloaded with unnecessary events. Maybe you can apply the patch from YARN-3990 and test it. {noformat} 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: BJHC-HERA-18352.hadoop.jd.local:50086 Node Transitioned from RUNNING to LOST 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HADOOP-HERA-17280.jd.local to /rack/rack4065 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HADOOP-HERA-17280.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: , assigned nodeId BJHC-HADOOP-HERA-17280.jd.local:50086 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HERA-164102.hadoop.jd.local to /rack/rack41007 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HERA-164102.hadoop.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: , assigned nodeId BJHC-HERA-164102.hadoop.jd.local:50086 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found 
resyncing BJHC-HERA-18043.hadoop.jd.local:50086 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000 2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000 2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000 2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000 2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2519000 {noformat} > Am in ResourceLocalizationService hang 10 min cause RM kill AM > --- > > Key: YARN-3979 > URL: https://issues.apache.org/jira/browse/YARN-3979 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 > Environment: CentOS 6.5 Hadoop-2.2.0 >Reporter: zhangyubiao > Attachments: ERROR103.log > > > 2015-07-27 02:46:17,348 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1437735375558 > _104282_01_01 > 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) > 2015-07-27 02:56:18,510 INFO > SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: > Authorization successful for appattempt_1437735375558_104282_0 > 1 (auth:TOKEN) for protocol=interface > org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682118#comment-14682118 ] Rushabh S Shah commented on YARN-4045: -- bq. Thanks Rushabh S Shah for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. yes. > Negative avaialbleMB is being reported for root queue. > -- > > Key: YARN-4045 > URL: https://issues.apache.org/jira/browse/YARN-4045 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Rushabh S Shah > > We recently deployed 2.7 in one of our cluster. > We are seeing negative availableMB being reported for queue=root. > This is from the jmx output: > {noformat} > > ... > -163328 > ... > > {noformat} > The following is the RM log: > {noformat} > 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:35,548 [ResourceManager Event 
Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root 
usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:47,401 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root used
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682115#comment-14682115 ] Thomas Graves commented on YARN-4045: - I remember seeing that this was fixed in branch-2 by some of the capacity scheduler work for labels. I thought this might be fixed by https://issues.apache.org/jira/browse/YARN-3243 but that is already included. This might be fixed as part of https://issues.apache.org/jira/browse/YARN-3361, which is probably too big to backport in its entirety. [~leftnoteasy] Do you remember this issue? Note that it also shows up in the capacity scheduler UI as the root queue going over 100%. I remember that when I was testing YARN-3434 it wasn't occurring for me on branch-2 (2.8), and I thought it was one of the above jiras that fixed it. > Negative avaialbleMB is being reported for root queue. > -- > > Key: YARN-4045 > URL: https://issues.apache.org/jira/browse/YARN-4045 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Rushabh S Shah > > We recently deployed 2.7 in one of our cluster. > We are seeing negative availableMB being reported for queue=root. > This is from the jmx output: > {noformat} > > ... > -163328 > ... 
> > {noformat} > The following is the RM log: > {noformat} > 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > {noformat}
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682112#comment-14682112 ] Rohith Sharma K S commented on YARN-3999: - Thanks [~jianhe] for the explanation. Overall the patch looks good to me. > RM hangs on draining events > - > > Key: YARN-3999 > URL: https://issues.apache.org/jira/browse/YARN-3999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, > YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, > YARN-3999.patch > > > If external systems like ATS or ZK become very slow, draining all the > events takes a lot of time. If this time becomes larger than 10 mins, all > applications will expire. Fixes include: > 1. add a timeout and stop the dispatcher even if not all events are drained. > 2. Move ATS service out from RM active service so that RM doesn't need to > wait for ATS to flush the events when transitioning to standby. > 3. Stop client-facing services (ClientRMService etc.) first so that clients > get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682110#comment-14682110 ] Sunil G commented on YARN-4045: --- Thanks [~shahrs87] for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. > Negative avaialbleMB is being reported for root queue. > -- > > Key: YARN-4045 > URL: https://issues.apache.org/jira/browse/YARN-4045 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Rushabh S Shah > > We recently deployed 2.7 in one of our cluster. > We are seeing negative availableMB being reported for queue=root. > This is from the jmx output: > {noformat} > > ... > -163328 > ... > > {noformat} > The following is the RM log: > {noformat} > 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO > 
capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > {noformat}
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682102#comment-14682102 ] Junping Du commented on YARN-3906: -- Thanks [~sjlee0] for the patch work and [~gtCarrera9] for review! Latest patch LGTM. However, I will wait for our decision on sequence of YARN-4025. > split the application table from the entity table > - > > Key: YARN-3906 > URL: https://issues.apache.org/jira/browse/YARN-3906 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3906-YARN-2928.001.patch, > YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, > YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, > YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch > > > Per discussions on YARN-3815, we need to split the application entities from > the main entity table into its own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682099#comment-14682099 ] Rohith Sharma K S commented on YARN-3924: - I agree with the concern that the user should be able to receive a standby exception. I am not sure whether this point was discussed when RM HA was initially designed. cc: [~ka...@cloudera.com] [~jianhe] [~xgong] [~vinodkv] for more discussion on this. > Submitting an application to standby ResourceManager should respond better > than Connection Refused > -- > > Key: YARN-3924 > URL: https://issues.apache.org/jira/browse/YARN-3924 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Dustin Cote >Assignee: Ajith S >Priority: Minor > > When submitting an application directly to a standby resource manager, the > resource manager responds with 'Connection Refused' rather than indicating > that it is a standby resource manager. Because the resource manager is aware > of its own state, I feel like we can have the 8032 port open for standby > resource managers and reject the request with something like 'Cannot process > application submission from this standby resource manager'. > This would be especially helpful for debugging oozie problems when users put > in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM > address but rather point to a specific resource manager). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4045) Negative avaialbleMB is being reported for root queue.
Rushabh S Shah created YARN-4045:
------------------------------------

             Summary: Negative availableMB is being reported for root queue.
                 Key: YARN-4045
                 URL: https://issues.apache.org/jira/browse/YARN-4045
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.7.1
            Reporter: Rushabh S Shah

We recently deployed 2.7 in one of our clusters. We are seeing a negative availableMB being reported for queue=root. This is from the jmx output:
{noformat}
...
-163328
...
{noformat}
The following is the RM log (the resource values after {{used=}} and {{cluster=}} were stripped by the mail/markup rendering):
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used= cluster=
2015-08-10 14:42:47,401 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used= cluster=
{noformat}
bq. used= cluster=
For the root queue, usedCapacity is more than totalCapacity.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
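To illustrate the arithmetic behind this report: the log shows a root-queue usedCapacity ratio above 1.0, and once used memory exceeds cluster memory, the derived availableMB is necessarily negative. The following is a minimal, hypothetical Java sketch of that relationship; the method and numbers are illustrative and are not the actual CapacityScheduler code.

```java
// Hypothetical sketch: how a usedCapacity ratio > 1.0 produces a
// negative availableMB. Names and values are illustrative only.
public class AvailableMbSketch {
    static long availableMB(long clusterMB, float usedCapacity) {
        // usedMB derived from the capacity ratio, as in the log lines above
        long usedMB = (long) (clusterMB * usedCapacity);
        return clusterMB - usedMB; // negative once usedCapacity > 1.0
    }

    public static void main(String[] args) {
        // With the ratio from the assignedContainer log lines, availableMB < 0
        System.out.println(availableMB(50_000_000L, 1.0032743f));
        // With a ratio below 1.0, availableMB stays positive
        System.out.println(availableMB(50_000_000L, 0.95f));
    }
}
```

The bug, then, is not in this subtraction itself but in whatever allowed the root queue's used resources to be accounted above the cluster total in the first place.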
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682082#comment-14682082 ]

Sunil G commented on YARN-3212:
-------------------------------

Hi [~djp], I have one doubt here. In {{StatusUpdateWhenHealthyTransition}}, if the node's state is DECOMMISSIONING initially, we now move it directly to DECOMMISSIONED. Could we instead give it a chance to move to UNHEALTHY here, so that after a few more rounds we mark it DECOMMISSIONED only if it cannot be revived? Your thoughts?

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch
>
> As proposed in YARN-914, a new state, DECOMMISSIONING, will be added, reachable from the RUNNING state via a new event, "decommissioning".
> This new state can transition to DECOMMISSIONED on Resource_Update if there are no running apps on the NM, when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after the timeout set from the CLI).
> In addition, it can go back to RUNNING if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state.
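The transitions described in the quoted issue can be sketched as a small state machine. This is an illustrative Java sketch only, with hypothetical enum and method names; the real logic lives in RMNodeImpl's StateMachineFactory and differs in detail.

```java
// Hypothetical sketch of the RMNode transitions discussed in YARN-3212.
// Not the actual RMNodeImpl code; names and events are illustrative.
public class NodeStateSketch {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED, UNHEALTHY }
    enum NodeEvent { GRACEFUL_DECOMMISSION, RECOMMISSION, DECOMMISSION, STATUS_UPDATE }

    static NodeState transition(NodeState state, NodeEvent event, boolean hasRunningApps) {
        switch (state) {
            case RUNNING:
                // New event proposed in YARN-914 starts the graceful drain.
                if (event == NodeEvent.GRACEFUL_DECOMMISSION) return NodeState.DECOMMISSIONING;
                break;
            case DECOMMISSIONING:
                // User cancels the decommission: node goes back to RUNNING.
                if (event == NodeEvent.RECOMMISSION) return NodeState.RUNNING;
                // Finalize on explicit DECOMMISSION (e.g. CLI timeout), or
                // once the node has drained all running applications.
                if (event == NodeEvent.DECOMMISSION || !hasRunningApps) {
                    return NodeState.DECOMMISSIONED;
                }
                break;
            default:
                break;
        }
        return state; // other events behave as in the RUNNING state
    }

    public static void main(String[] args) {
        NodeState s = transition(NodeState.RUNNING, NodeEvent.GRACEFUL_DECOMMISSION, true);
        s = transition(s, NodeEvent.RECOMMISSION, true);
        System.out.println(s); // recommission returns the node to RUNNING
    }
}
```

Sunil G's question above amounts to adding one more arrow to this diagram: from DECOMMISSIONING to UNHEALTHY on an unhealthy status update, deferring the final move to DECOMMISSIONED.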