[jira] [Updated] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3264: - Attachment: YARN-3264.003.patch Updated patch as per review suggestions > [Storage implementation] Create a POC only file based storage implementation > for ATS writes > --- > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel reassigned YARN-3271: --- Assignee: nijel > FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler > to TestAppRunnability > --- > > Key: YARN-3271 > URL: https://issues.apache.org/jira/browse/YARN-3271 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: nijel > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
[ https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346588#comment-14346588 ] nijel commented on YARN-3271: - I would like to work on this task. As per initial analysis, the following test cases use the concept of runnable apps: testUserAsDefaultQueue testNotUserAsDefaultQueue testAppAdditionAndRemoval testPreemptionVariablesForQueueCreatedRuntime testDontAllowUndeclaredPools testMoveRunnableApp testMoveNonRunnableApp testMoveMakesAppRunnable. Can I move these tests to the new class? Correct me if I misunderstood the task. > FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler > to TestAppRunnability > --- > > Key: YARN-3271 > URL: https://issues.apache.org/jira/browse/YARN-3271 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: nijel > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346602#comment-14346602 ] Jian He commented on YARN-3021: --- bq. RM should check if the renewer is null Actually, YARN can also provide a constant string, say "SKIP_RENEW_TOKEN"; MR uses this string as the renewer for tokens it doesn't want to renew. The RM detects whether the renewer equals the constant string and skips renewal if it does. > YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
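The scheme suggested in the comment above (a well-known renewer string that the RM checks before scheduling renewal) could be sketched roughly as follows; all names here are hypothetical illustrations, not existing YARN or MapReduce API:

```java
// Hypothetical sketch of the "skip renewal" marker discussed above.
// Class name, constant, and method are illustrative, not actual YARN/MR code.
public class TokenRenewalPolicy {
    // Well-known renewer a submitter (e.g. MR) sets on tokens it does not
    // want the RM to renew on its behalf.
    public static final String SKIP_RENEW_TOKEN = "SKIP_RENEW_TOKEN";

    // RM-side check before adding a token to the renewal scheduler:
    // skip when the renewer is null or equals the marker string.
    public static boolean shouldRenew(String renewer) {
        return renewer != null && !SKIP_RENEW_TOKEN.equals(renewer);
    }
}
```

A token carrying a normal RM principal as its renewer would still be renewed; only tokens explicitly marked (or with no renewer) would be left alone, which preserves the "go easy on failure" behaviour the issue asks for.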
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346638#comment-14346638 ] Ryu Kobayashi commented on YARN-3249: - [~jianhe] I see. Sure, that looks good. I'll try it. > Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.patch, killapp-failed.log, > killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill applications from the Web UI, similar to the JobTracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346652#comment-14346652 ] Hadoop QA commented on YARN-3264: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702408/YARN-3264.003.patch against trunk revision 3560180. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6835//console This message is automatically generated. > [Storage implementation] Create a POC only file based storage implementation > for ATS writes > --- > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346710#comment-14346710 ] Hudson commented on YARN-3222: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed, or node_resource_update. These > events should be delivered in a sequential order, i.e. the node_added event and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. 
> Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346708#comment-14346708 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, 
YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346720#comment-14346720 ] Hudson commented on YARN-3222: -- FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/856/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed, or node_resource_update. These > events should be delivered in a sequential order, i.e. the node_added event and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. 
> Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346718#comment-14346718 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/856/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, 
YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3187: - Attachment: YARN-3187.3.patch > Documentation of Capacity Scheduler Queue mapping based on user or group > > > Key: YARN-3187 > URL: https://issues.apache.org/jira/browse/YARN-3187 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, documentation >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Gururaj Shetty > Labels: documentation > Fix For: 2.6.0 > > Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch > > > YARN-2411 exposes a very useful feature {{support simple user and group > mappings to queues}} but it's not captured in the documentation. So in this > jira we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346776#comment-14346776 ] Gururaj Shetty commented on YARN-3187: -- Hi [~jianhe]/[~Naganarasimha Garla] Attached the patch for the Markdown (.md). Please review. > Documentation of Capacity Scheduler Queue mapping based on user or group > > > Key: YARN-3187 > URL: https://issues.apache.org/jira/browse/YARN-3187 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, documentation >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Gururaj Shetty > Labels: documentation > Fix For: 2.6.0 > > Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch > > > YARN-2411 exposes a very useful feature {{support simple user and group > mappings to queues}} but it's not captured in the documentation. So in this > jira we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3249: Attachment: YARN-3249.6.patch > Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, > killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill applications from the Web UI, similar to the JobTracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346792#comment-14346792 ] Ryu Kobayashi commented on YARN-3249: - I changed it to call RMWebService directly. I also changed it to be enabled by default. > Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, > killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill applications from the Web UI, similar to the JobTracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346835#comment-14346835 ] Brahma Reddy Battula commented on YARN-3170: [~aw] Can I go ahead as above? Please give your inputs. > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI
[ https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346849#comment-14346849 ] Hadoop QA commented on YARN-3249: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702452/YARN-3249.6.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6836//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6836//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6836//console This message is automatically generated. 
> Add the kill application to the Resource Manager Web UI > --- > > Key: YARN-3249 > URL: https://issues.apache.org/jira/browse/YARN-3249 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0, 2.7.0 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, > YARN-3249.4.patch, YARN-3249.5.patch, YARN-3249.6.patch, YARN-3249.patch, > killapp-failed.log, killapp-failed2.log, screenshot.png, screenshot2.png > > > We want to be able to kill applications from the Web UI, similar to the JobTracker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346876#comment-14346876 ] Varun Vasudev commented on YARN-2190: - [~chuanliu] here's the error I got on my mac -
{noformat}
HWx:hadoop vvasudev$ cat ~/Downloads/YARN-2190.10.patch | patch -p0 --dry-run
patching file hadoop-common-project/hadoop-common/src/main/winutils/task.c
patching file hadoop-common-project/hadoop-common/src/main/winutils/win8sdk.props
patching file hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
Hunk #1 FAILED at 67.
1 out of 1 hunk FAILED -- saving rejects to file hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj.rej
patching file hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
patching file hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
patching file hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
patching file hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsContainerExecutor.java
patching file hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestWindowsContainerExecutor.java
{noformat}
> Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, >
YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346889#comment-14346889 ] Varun Vasudev commented on YARN-2190: - Also, the java portion of the patch looks ok to me. Some comments - 1. What about WindowsSecureContainerExecutor? Does this feature not apply to secure environments? 2. Can you please add documentation on the new config variables to yarn-default.xml? > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346911#comment-14346911 ] Varun Vasudev commented on YARN-2190: - [~chuanliu] one more issue with the patch -
{noformat}
if (conf.getBoolean(YarnConfiguration.NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED,
+        YarnConfiguration.DEFAULT_NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED)) {
+      int vcores = resource.getVirtualCores();
+      // cap overall usage to the number of cores allocated to YARN
+      float yarnProcessors = NodeManagerHardwareUtils.getContainersCores(
+          ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf),
+          conf);
+      // CPU should be set to a percentage * 100, e.g. 20% cpu rate limit
+      // should be set as 20 * 100. The following setting is equal to:
+      // 100 * (100 * (vcores / Total # of cores allocated to YARN))
+      cpuRate = Math.min(10000, (int) ((vcores * 10000) / yarnProcessors));
+    }
{noformat}
This may not behave as users expect. The 'yarnProcessors' value you receive from NodeManagerHardwareUtils is the number of physical cores allocated to YARN containers. However, resource.getVirtualCores() returns a number that the user submits (and it can potentially be greater than 'yarnProcessors'). For example, an admin sets 'yarn.nodemanager.resource.cpu-vcores' to 24 on a node with 4 cores (this can be done by admins who wish to oversubscribe nodes). He also sets 'yarn.nodemanager.resource.percentage-physical-cpu-limit' to 50, indicating that only 2 physical cores are to be used for YARN containers. The RM allocates two containers with 12 vcores each on the node. According to your math, both containers would get 100% cpu, when each container should only get 25% cpu. What you need to do is scale the container vcores to the number of physical cores and not use the value as provided. 
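The scaling described in the comment above can be sketched as a small helper; the class, method name, and signature below are hypothetical illustrations (not YARN code), using the convention from the quoted patch where the CPU rate is a percentage * 100, so 100% == 10000:

```java
// Illustrative sketch of scaling container vcores to physical cores.
// All names here are hypothetical, not the actual YARN/NodeManager API.
public class CpuRateSketch {
    // containerVcores: vcores requested for the container
    // nodeVcores: yarn.nodemanager.resource.cpu-vcores for the node
    // coresForYarn: physical cores given to YARN containers (after the
    //               percentage-physical-cpu-limit is applied)
    // physicalCores: total physical cores on the node
    public static int cpuRate(int containerVcores, int nodeVcores,
                              float coresForYarn, int physicalCores) {
        // Container's share of YARN's CPU allocation, e.g. 12 of 24 vcores -> 0.5
        float shareOfYarn = (float) containerVcores / nodeVcores;
        // Scale by YARN's share of the node, e.g. 2 of 4 physical cores -> 0.5
        float shareOfNode = shareOfYarn * (coresForYarn / physicalCores);
        // Rate is percent * 100, capped at 100% (10000)
        return Math.min(10000, (int) (shareOfNode * 10000));
    }
}
```

With the numbers from the example above, cpuRate(12, 24, 2f, 4) comes out to 2500, i.e. 25% per container rather than 100%.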
> Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chuan Liu >Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > Yarn default container executor on Windows does not set the resource limit on > the containers currently. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses Job Object > right now. The latest Windows (8 or later) API allows CPU and memory limits > on the job objects. We want to create a Windows container executor that sets > the limits on job objects thus provides resource enforcement at OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346925#comment-14346925 ] Hudson commented on YARN-3222: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed, or node_resource_update. These > events should be delivered in a sequential order, i.e. the node_added event and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. 
> The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
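The ordering bug described above can be sketched in a few lines. This is a hedged illustration only, not the actual RMNodeImpl code; the class, enum, and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrates the scheduler-event ordering this issue asks for. When a node
// reconnects with a different port, the resource update must not arrive
// before the scheduler has seen the node_added event.
class ReconnectEventOrdering {
    enum EventType { NODE_ADDED, NODE_REMOVED, NODE_RESOURCE_UPDATE }

    // Order reported as buggy: the update arrives while the scheduler has no
    // record of the node, leading to the NPE described in the issue.
    static List<EventType> buggyOrder() {
        return Arrays.asList(EventType.NODE_REMOVED,
                EventType.NODE_RESOURCE_UPDATE, EventType.NODE_ADDED);
    }

    // Sequential order the issue calls for: remove the stale node, add the
    // reconnected node, and only then update its resources.
    static List<EventType> fixedOrder() {
        return Arrays.asList(EventType.NODE_REMOVED,
                EventType.NODE_ADDED, EventType.NODE_RESOURCE_UPDATE);
    }
}
```

The invariant is simply that NODE_ADDED precedes NODE_RESOURCE_UPDATE in the fixed sequence.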
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346932#comment-14346932 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, 
YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346934#comment-14346934 ] Hudson commented on YARN-3222: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed or node_resource_update. These > events should be delivered in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which means the scheduler does not find the node, throws an NPE, and the RM exits.
> The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346923#comment-14346923 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, YARN-3272.5.patch, 
YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347000#comment-14347000 ] Hudson commented on YARN-3222: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed or node_resource_update. These > events should be delivered in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which means the scheduler does not find the node, throws an NPE, and the RM exits.
> The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346998#comment-14346998 ] Hudson commented on YARN-3272: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/CHANGES.txt > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, 
YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347026#comment-14347026 ] Yongjun Zhang commented on YARN-3021: - Many thanks Jian. {quote} Change MR client to set null renewer for the token coming from a different cluster {quote} In the special case that we are dealing with in this jira, cluster A and cluster B don't trust each other. However, in other scenarios, two clusters may trust each other. So we can't always set a null renewer based on which cluster the token is from. Maybe we can combine our approaches and set a null renewer for the external cluster only when {{-Dmapreduce.job.delegation.tokenrenewer.for.external.cluster=null}} is specified for a job? {quote} Actually, YARN can also provide a constant string say "SKIP_RENEW_TOKEN", MR uses this string as the renewer for tokens it doesn't want to renew. RM detects if the renewer equals the constant string and skip renew if it is. {quote} Maybe we can use the string "null" for SKIP_RENEW_TOKEN? We need to document whatever string is chosen here as a special string so applications don't use it for tokens that need to be renewed. There is still a chance of changing the behavior of existing applications that happen to set the renewer to our special string. So what about still introducing {{yarn.resourcemanager.validate.tokenrenewer}} as described in my last comment (enable renewer validation only when the config is true)? Thanks.
> YARN's delegation-token handling disallows certain trust setups to operate > properly over DistCp > --- > > Key: YARN-3021 > URL: https://issues.apache.org/jira/browse/YARN-3021 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.3.0 >Reporter: Harsh J > Attachments: YARN-3021.001.patch, YARN-3021.002.patch, > YARN-3021.003.patch, YARN-3021.patch > > > Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, > and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN > clusters. > Now if one logs in with a COMMON credential, and runs a job on A's YARN that > needs to access B's HDFS (such as a DistCp), the operation fails in the RM, > as it attempts a renewDelegationToken(…) synchronously during application > submission (to validate the managed token before it adds it to a scheduler > for automatic renewal). The call obviously fails cause B realm will not trust > A's credentials (here, the RM's principal is the renewer). > In the 1.x JobTracker the same call is present, but it is done asynchronously > and once the renewal attempt failed we simply ceased to schedule any further > attempts of renewals, rather than fail the job immediately. > We should change the logic such that we attempt the renewal but go easy on > the failure and skip the scheduling alone, rather than bubble back an error > to the client, failing the app submission. This way the old behaviour is > retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
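The "sentinel renewer" idea debated in the comment above can be sketched as follows. This is a hedged illustration only: the constant value and the `shouldScheduleRenewal` method are hypothetical and are not part of the actual YARN API.

```java
// Sketch of the proposed RM-side policy: a token whose renewer equals an
// agreed sentinel string (e.g. one obtained from an untrusted remote realm)
// is accepted at submission but never scheduled for renewal, so submission
// does not fail when the remote realm rejects the RM's credentials.
class DelegationTokenRenewPolicy {
    // Hypothetical sentinel; the comment above also floats using "null".
    static final String SKIP_RENEW_TOKEN = "SKIP_RENEW_TOKEN";

    // Only schedule renewal for tokens whose renewer is a real principal.
    static boolean shouldScheduleRenewal(String renewer) {
        return renewer != null && !renewer.isEmpty()
                && !SKIP_RENEW_TOKEN.equals(renewer);
    }
}
```

The trade-off raised in the thread applies directly: any application that already uses the sentinel value as a legitimate renewer would silently change behavior, which is why the comment also proposes gating validation behind a config flag.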
[jira] [Commented] (YARN-3272) Surface container locality info in RM web UI
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347035#comment-14347035 ] Hudson commented on YARN-3272: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/]) YARN-3272. Surface container locality info in RM web UI (Jian He via wangda) (wangda: rev e17e5ba9d7e2bd45ba6884f59f8045817594b284) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java > Surface container locality info in RM web UI > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.0 > > Attachments: YARN-3272.1.patch, YARN-3272.2.patch, YARN-3272.3.patch, > YARN-3272.4.patch, YARN-3272.5.patch, 
YARN-3272.5.patch, YARN-3272.6.patch, > YARN-3272.6.patch, container locality table.png > > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347037#comment-14347037 ] Hudson commented on YARN-3222: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/]) YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Fix For: 2.7.0 > > Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, > 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler via the events node_added, node_removed or node_resource_update. These > events should be delivered in sequential order, i.e. the node_added event first and > the node_resource_update event next. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which means the scheduler does not find the node, throws an NPE, and the RM exits.
> The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Rebasing against trunk. The errors look unrelated. > Priority Label Manager in RM to manage application priority based on > configuration > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, > 0006-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Supported operations include: > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * Expose interface to RM to validate priority label > To have a simplified interface, the Priority Manager will support only a > configuration file, in contrast with admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3039: --- Assignee: Naganarasimha G R (was: Junping Du) > [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: (was: 0006-YARN-2693.patch) > Priority Label Manager in RM to manage application priority based on > configuration > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Supported operations include: > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * Expose interface to RM to validate priority label > To have a simplified interface, the Priority Manager will support only a > configuration file, in contrast with admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347053#comment-14347053 ] Naganarasimha G R commented on YARN-3039: - Hi [~djp], bq. for security reason it make NM to take AMRMTokens for using TimelineClient in future which make less sense. To get rid of rack condition you mentioned above, we propose to use observer pattern to make TimelineClient can listen aggregator address update in AM or NM (wrap with retry logic to tolerant connection failure). Even if we are not able to have "AMRMClient can be wrapped into TimelineClient", I feel the other suggestion from Vinod was right: {{to add a blocking call in AMRMClient to get aggregator address directly from RM.}} instead of the observer pattern at the AM side. Thoughts? bq. There are other ways (check from diagram in YARN-3033) that app aggregators could be deployed in a separate process or an independent container which make less sense to have a protocol between AUX service and RM. I think now we should plan to add a protocol between aggregator and NM, and then notify RM through NM-RM heartbeat on registering/rebind for aggregator. Yes, I have gone through YARN-3033, but earlier I was trying to point out that our current approach was based on the NM aux service. But anyway, what I wanted was some kind of protocol between app aggregators and either the NM or the RM. A protocol between the NM and the app aggregator should cover all the other ways to launch app aggregators. bq. app aggregator should have logic to consolidate all messages (events and metrics) for one application into more complex and flexible new data model. If each NM do aggregation separately, then it still a writer (like old timeline service), but not an aggregator Well, if there is no logic/requirement to aggregate/consolidate all messages (events and metrics) for an app, then in my opinion it is better not to have additional instances of aggregators, and we can keep it similar to the old Timeline service. bq. 
Will update proposal to reflect all these discussions (JIRA's and offline). Thanks, it will be clearer to implement if we have the proposals documented. > [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
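The observer idea debated in the comment above (a client listening for aggregator address updates) can be sketched as follows. This is a hedged illustration under assumptions: none of these class or method names are actual YARN classes, and the real design was still under discussion.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of an observer for aggregator address rebinding: a TimelineClient-
// like consumer registers a listener and is notified whenever the AM/NM
// learns a new aggregator binding (e.g. via heartbeat).
class AggregatorAddressTracker {
    interface AddressListener {
        void onAddressUpdate(String newAddress);
    }

    // Copy-on-write so notification never races with registration.
    private final List<AddressListener> listeners = new CopyOnWriteArrayList<>();
    private volatile String address;

    void register(AddressListener l) {
        listeners.add(l);
        if (address != null) {
            l.onAddressUpdate(address); // replay the last known binding
        }
    }

    // Called on registering/rebind of the app aggregator.
    void rebind(String newAddress) {
        address = newAddress;
        for (AddressListener l : listeners) {
            l.onAddressUpdate(newAddress);
        }
    }

    String current() {
        return address;
    }
}
```

The alternative raised in the thread, a blocking call in AMRMClient that fetches the address directly from the RM, would replace the listener with a simple synchronous getter.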
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347045#comment-14347045 ] Sunil G commented on YARN-3136: --- Hi [~jlowe] and [~jianhe], could you please take a look at the patch? > getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
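One general way to eliminate a global lock on a read path like getTransferredContainers is to index the data in a concurrent map. This is only a hedged sketch of that idea, not the approach taken in the attached patches; all names below are illustrative, not the actual YARN scheduler classes.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Keeps transferred containers keyed by application attempt id, so AM
// registration can read them without contending on the scheduler's lock.
class TransferredContainerIndex {
    private final ConcurrentHashMap<String, List<String>> byAttempt =
            new ConcurrentHashMap<>();

    // Write path: record a container carried over to a new app attempt.
    // computeIfAbsent is atomic, so concurrent writers share one list.
    void record(String attemptId, String containerId) {
        byAttempt.computeIfAbsent(attemptId, k -> new CopyOnWriteArrayList<>())
                .add(containerId);
    }

    // Read path used during AM registration: lock-free.
    List<String> get(String attemptId) {
        return byAttempt.getOrDefault(attemptId, Collections.emptyList());
    }
}
```

The same pattern (per-key concurrent structures instead of one coarse lock) is what the issue description alludes to for the AM allocate path.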
[jira] [Assigned] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-3039: Assignee: Junping Du (was: Naganarasimha G R) > [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3136: -- Attachment: 0004-YARN-3136.patch > getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch, 0004-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347172#comment-14347172 ] Hadoop QA commented on YARN-2693: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702487/0006-YARN-2693.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationpriority.TestApplicationPriorityManager Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6837//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6837//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6837//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6837//console This message is automatically generated. 
> Priority Label Manager in RM to manage application priority based on > configuration > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, > 0006-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Supported operations include: > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * Expose interface to RM to validate priority label > To have a simplified interface, the Priority Manager will support only a > configuration file, in contrast with admin CLI and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347199#comment-14347199 ] Hadoop QA commented on YARN-3136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699495/0003-YARN-3136.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6838//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6838//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6838//console This message is automatically generated. 
> getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch, 0004-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration
[ https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347232#comment-14347232 ] Hadoop QA commented on YARN-3136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702491/0004-YARN-3136.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6839//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6839//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6839//console This message is automatically generated. 
> getTransferredContainers can be a bottleneck during AM registration > --- > > Key: YARN-3136 > URL: https://issues.apache.org/jira/browse/YARN-3136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-3136.patch, 0002-YARN-3136.patch, > 0003-YARN-3136.patch, 0004-YARN-3136.patch > > > While examining RM stack traces on a busy cluster I noticed a pattern of AMs > stuck waiting for the scheduler lock trying to call getTransferredContainers. > The scheduler lock is highly contended, especially on a large cluster with > many nodes heartbeating, and it would be nice if we could find a way to > eliminate the need to grab this lock during this call. We've already done > similar work during AM allocate calls to make sure they don't needlessly grab > the scheduler lock, and it would be good to do so here as well, if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
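The contention described in YARN-3136 above (AM registration blocking on the scheduler lock just to read the transferred containers) can be relieved with a lock-free read path. The sketch below is only an illustration of that direction, not the actual patch; the class and method names are hypothetical and plain Strings stand in for the real ApplicationAttemptId/Container types.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch: transferred containers kept in a concurrent map so
 * AM registration can read them without ever taking the scheduler lock.
 */
public class TransferredContainersCache {
  // Keyed by application attempt id; concurrent structures make both the
  // write path (scheduler) and the read path (AM registration) lock-free.
  private final Map<String, List<String>> transferred = new ConcurrentHashMap<>();

  public void addContainer(String attemptId, String containerId) {
    transferred.computeIfAbsent(attemptId, k -> new CopyOnWriteArrayList<>())
               .add(containerId);
  }

  /** Lock-free read used during AM registration. */
  public List<String> getTransferredContainers(String attemptId) {
    return transferred.getOrDefault(attemptId, Collections.emptyList());
  }
}
```

The same idea underlies the earlier allocate-path work mentioned in the description: keep the data the AM needs in a structure that is safe to read concurrently, so the busy scheduler lock is never touched.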
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20150304.1.patch Thanks [~gururaj] for helping me convert the doc to markdown. [~jianhe], can you please review the patch? > The document about timeline service and generic service needs to be updated > --- > > Key: YARN-2854 > URL: https://issues.apache.org/jira/browse/YARN-2854 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Naganarasimha G R >Priority: Critical > Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, > YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, timeline_structure.jpg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3039: - Attachment: YARN-3039-v2-incomplete.patch
Updated the patch (not finished yet) to reflect some of the discussions above, including:
+ maintain the app aggregator info in RMApp, with event model (done)
+ aggregator update in NM-RM heartbeat (done)
+ aggregator update in AM-RM allocation request/response (done)
+ persistent aggregator update in RMStateStore (fixes previous patch)
+ a new API in ResourceTrackerService to register an app aggregator with RM (done)
+ adding a new protocol between aggregator and NM
 - new proto file (and proto structures for request and response) -- done
 - interfaces (protocol, request, response):
   - AggregatorNodemanagerProtocol (done)
   - AggregatorNodemanagerProtocolPBClientImpl (TODO)
   - NMAggregatorService (TODO, server impl)
   - AggregatorNodemanagerProtocolPB (done)
   - AggregatorNodemanagerProtocolPBServiceImpl (done)
   - ReportNewAggregatorsInfoRequest/Response (and PBs) (done)
   - ReportNewAggregatorsInfoRequestPBImpl (done)
   - ReportNewAggregatorsInfoResponse (done)
   - ReportNewAggregatorsInfoResponsePBImpl (done)
   - AppAggregatorsMap (done)
   - AppAggregatorsMapPBImpl (done)
Not included yet:
+ NM hosting the new protocol
+ aggregator calling the new protocol client
+ aggregator info recovered during NM restart
+ making TimelineClient an observer of aggregator address changes
Will update the proposal afterwards.
> [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch > > > Per design in YARN-2928, implement ATS writer service discovery. 
This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
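The last "not included yet" item in the YARN-3039 update above (making TimelineClient observe aggregator address changes) is a standard listener pattern. A minimal sketch, with hypothetical names and no claim about the eventual YARN API:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch: clients register a listener and are notified when
 * the app aggregator's address changes (e.g. after an AM failover).
 */
public class AggregatorAddressTracker {
  public interface Listener {
    void onAddressChanged(String newAddress);
  }

  private final List<Listener> listeners = new CopyOnWriteArrayList<>();
  private volatile String address;

  public void addListener(Listener l) {
    listeners.add(l);
  }

  public String getAddress() {
    return address;
  }

  public void updateAddress(String newAddress) {
    this.address = newAddress;
    // e.g. a TimelineClient listener would re-target its writes here
    for (Listener l : listeners) {
      l.onAddressChanged(newAddress);
    }
  }
}
```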
[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Attachment: YARN-2868.009.patch Update with latest feedback. > Add metric for initial container launch time to FairScheduler > - > > Key: YARN-2868 > URL: https://issues.apache.org/jira/browse/YARN-2868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: metrics, supportability > Attachments: YARN-2868-01.patch, YARN-2868.002.patch, > YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, > YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, > YARN-2868.009.patch > > > Add a metric to measure the latency between "starting container allocation" > and "first container actually allocated". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: (was: YARN-3242.004.patch) > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
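The fix direction implied by the YARN-3242 description above is to make ZKRMStateStore#processWatchEvent check which session an event came from and drop events from already-closed sessions. A minimal sketch of that check (hypothetical names; a plain Object stands in for the ZooKeeper handle; not the actual patch):

```java
/**
 * Hypothetical sketch: a watcher event is only processed when it comes
 * from the currently active ZK client; events queued by an old, closed
 * session are dropped instead of corrupting the store's session state.
 */
public class SessionAwareStore {
  private Object activeClient; // stands in for the live ZooKeeper handle

  public synchronized void setActiveClient(Object client) {
    this.activeClient = client;
  }

  /** Returns true if the event was processed, false if dropped as stale. */
  public synchronized boolean processWatchEvent(Object sourceClient, String event) {
    if (sourceClient != activeClient) {
      // Event delivered by an old session's EventThread: ignore it, so a
      // late Disconnected cannot null out the new session's client.
      return false;
    }
    // ... handle Disconnected / SyncConnected against the live session ...
    return true;
  }
}
```

This matches the failure mode quoted above: ZooKeeper's EventThread keeps draining its queue after eventOfDeath, so stale events can arrive after close() and must be filtered by the receiver.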
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347295#comment-14347295 ] Hudson commented on YARN-3131: -- FAILURE: Integrated in Hadoop-trunk-Commit #7255 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7255/]) YARN-3131. YarnClientImpl should check FAILED and KILLED state in submitApplication. Contributed by Chang Li (jlowe: rev 03cc22945e5d4e953c06a313b8158389554a6aa7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java > YarnClientImpl should check FAILED and KILLED state in submitApplication > > > Key: YARN-3131 > URL: https://issues.apache.org/jira/browse/YARN-3131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Fix For: 2.7.0 > > Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, > yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, > yarn_3131_v6.patch, yarn_3131_v7.patch > > > Just run into a issue when submit a job into a non-existent queue and > YarnClient raise no exception. Though that job indeed get submitted > successfully and just failed immediately after, it will be better if > YarnClient can handle the immediate fail situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: YARN-3242.004.patch > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
Varun Vasudev created YARN-3294: --- Summary: Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time (1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3293) Track and display capacity scheduler health metrics in web UI
Varun Vasudev created YARN-3293: --- Summary: Track and display capacity scheduler health metrics in web UI Key: YARN-3293 URL: https://issues.apache.org/jira/browse/YARN-3293 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev It would be good to display metrics that let users know about the health of the capacity scheduler in the web UI. Today it is hard to get an idea if the capacity scheduler is functioning correctly. Metrics such as the time for the last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN_3267_V1.patch > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, > YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347320#comment-14347320 ] Chang Li commented on YARN-3267: I have implemented a unit test for this patch. > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, > YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
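The YARN-3267 bug report above boils down to ordering: the record limit is applied before the ACL filter, so a caller can receive zero results even when enough readable entities exist. A minimal sketch of the filter-before-limit ordering (generic types and hypothetical names; not the actual TimelineDataManager code):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Sketch of the fix direction: run the ACL check before truncating to the
 * record limit, so entities the caller may read are never crowded out by
 * entities that would be filtered away anyway.
 */
public class FilterThenLimit {
  public static <T> List<T> fetch(List<T> scanned, Predicate<T> aclAllows, int limit) {
    return scanned.stream()
        .filter(aclAllows)   // ACL check first...
        .limit(limit)        // ...then the record limit
        .collect(Collectors.toList());
  }
}
```

With the original limit-then-filter order, a scan that returns `limit` unreadable entities yields an empty result; with this order, the scan keeps going until `limit` readable entities are found or the input is exhausted.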
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347348#comment-14347348 ] Jason Lowe commented on YARN-3294: -- Do we really need a dedicated button for a specific system/scheduler when there's already the logLevel applet that lets us control log levels of arbitrary loggers in the process? > Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time > period > - > > Key: YARN-3294 > URL: https://issues.apache.org/jira/browse/YARN-3294 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > > It would be nice to have a button on the web UI that would allow dumping of > debug logs for just the capacity scheduler for a fixed period of time(1 min, > 5 min or so) in a separate log file. It would be useful when debugging > scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3031. --- Resolution: Duplicate Since the patch there covers the code of the writer interface, let's resolve this one as a duplicate of YARN-3264. > [Storage abstraction] Create backing storage write interface for ATS writers > > > Key: YARN-3031 > URL: https://issues.apache.org/jira/browse/YARN-3031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C > Attachments: Sequence_diagram_write_interaction.2.png, > Sequence_diagram_write_interaction.png, YARN-3031.01.patch, > YARN-3031.02.patch, YARN-3031.03.patch > > > Per design in YARN-2928, come up with the interface for the ATS writer to > write to various backing storages. The interface should be created to capture > the right level of abstractions so that it will enable all backing storage > implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3264: -- Summary: [Storage implementation] Create backing storage write interface and a POC only file based storage implementation (was: [Storage implementation] Create a POC only file based storage implementation for ATS writes) > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347387#comment-14347387 ] Li Lu commented on YARN-3264: - Hi [~vrushalic], thanks for the patch! In general it looks good to me. I have a few quick questions about it: # In the following lines: {code} +String tmpRoot = FileSystemTimelineServiceWriterImpl.TIMELINE_SERVICE_STORAGE_DIR_ROOT; +if (tmpRoot == null || tmpRoot.isEmpty()) { + tmpRoot = "/tmp/"; +} {code} TIMELINE_SERVICE_STORAGE_DIR_ROOT is defined as final in FileSystemTimelineServiceWriterImpl (with a not-null initial value), so why are we still checking whether it's null here? (Am I missing anything?) # Why are we removing the abstract keyword from the TimelineAggregator class? I thought this class was supposed to be abstract? {code} -public abstract class TimelineAggregator extends CompositeService { +public class TimelineAggregator extends CompositeService { {code} > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347384#comment-14347384 ] Junping Du commented on YARN-3039: -- Thanks [~Naganarasimha] for comments! bq. Even if we are not able to have "AMRMClient can be wrapped into TimelineClient" i feel other suggestion from vinod was right to add a blocking call in AMRMClient to get aggregator address directly from RM. instead of observer pattern @ the AM side. thoughts? I am open to this approach. However, I would treat it more as an optimization (so we don't have to wait for the heartbeat interval). Within this JIRA's scope, I think we should keep the heartbeat in ApplicationMasterService as the basic mechanism, because some applications (like MR) don't use AMRMClient for now. We can file a separate JIRA to address this optimization if necessary. BTW, what's your concern with the observer (listener) pattern in the AM? bq. Yes i have gone through 3033, but earlier was trying to mention as our current approach was with NM AUX service. But anyway what i wanted was some kind of protocol between appAggregators with either NM or RM. Protocol between NM and appAgregator should suffice all other ways to launch AppAgregators. Yes, I agree there is not much difference between the aggregator talking to NM or RM. As the demo patch shows, I would slightly prefer NM because RM already hosts too many RPC services today. bq. Well if there is no logic/requirement to aggregate/consolidate all messages (events and metrics) for an App, then in my opinion it better not to have additional instances of aggregators and we can keep it similar to old Timeline service. I am not sure about this, but I assume this is part of the motivation for the new TimelineService (not only performance reasons)? bq. Thanks it will be more clear to implement if we have the proposals documented. No problem. I will upload a new one once I have worked through the demo patch, which forces me to address more details. 
> [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347420#comment-14347420 ] Hadoop QA commented on YARN-2854: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702513/YARN-2854.20150304.1.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6843//console This message is automatically generated. > The document about timeline service and generic service needs to be updated > --- > > Key: YARN-2854 > URL: https://issues.apache.org/jira/browse/YARN-2854 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Naganarasimha G R >Priority: Critical > Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, > YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, timeline_structure.jpg > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: YARN-3242.004.patch > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.6.0 > Reporter: zhihai xu > Assignee: zhihai xu > Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > A watcher event from an old ZK client session can mess up the new ZK client session, because ZooKeeper closes client sessions asynchronously. > The watcher event from the old ZK client session can still be delivered to > ZKRMStateStore after the old session is closed. > This causes a serious problem: ZKRMStateStore gets out of sync with the ZooKeeper > session. > We only have one ZKRMStateStore, but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher > event is from the current session, so a watcher event from an old ZK client > session that was just closed will still be processed. > For example, if a Disconnected event is received from the old session after the new > session is connected, zkClient will be set to null: > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive a SyncConnected event from the new session, > because the new session is already in the SyncConnected state and won't send a > SyncConnected event until it is disconnected and connected again. > From then on, all ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. 
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after > receiving eventOfDeath, EventThread will still process all the events until the > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { > if (waitingEvents.isEmpty()) { > isRunning = false; > break; > } > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
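The fix the description points toward is to make processWatchEvent session-aware. The following is a hypothetical, simplified sketch (the class, the Event stand-in, and the method names are illustrative, not the actual ZKRMStateStore or ZooKeeper API): each event carries the id of the session that produced it, and events whose session id no longer matches the active client session are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of session-guarded watcher processing, not the real
// ZKRMStateStore code: stale-session events are ignored instead of mutating
// state that belongs to the new session.
public class SessionGuardedWatcher {
    // Simplified stand-in for a ZooKeeper watcher event.
    static class Event {
        final long sessionId;
        final String type;
        Event(long sessionId, String type) {
            this.sessionId = sessionId;
            this.type = type;
        }
    }

    private long activeSessionId;
    private final List<String> processed = new ArrayList<>();

    SessionGuardedWatcher(long sessionId) {
        this.activeSessionId = sessionId;
    }

    // Called when a new ZK client session replaces the old one.
    void reconnect(long newSessionId) {
        this.activeSessionId = newSessionId;
    }

    // Core of the idea: compare the event's originating session against the
    // currently active session and drop stale events.
    void processWatchEvent(Event e) {
        if (e.sessionId != activeSessionId) {
            return; // event from an old, already-closed session; ignore it
        }
        processed.add(e.type);
    }

    List<String> getProcessed() {
        return processed;
    }

    public static void main(String[] args) {
        SessionGuardedWatcher w = new SessionGuardedWatcher(1L);
        w.processWatchEvent(new Event(1L, "SyncConnected"));
        w.reconnect(2L);
        // A Disconnected event arriving late from the old session is dropped,
        // so zkClient is never nulled out under the new session.
        w.processWatchEvent(new Event(1L, "Disconnected"));
        w.processWatchEvent(new Event(2L, "SyncConnected"));
        System.out.println(w.getProcessed()); // prints [SyncConnected, SyncConnected]
    }
}
```

Without the guard, the late Disconnected event from session 1 would be processed after session 2 is already connected, which is exactly the out-of-sync scenario the description walks through.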
[jira] [Updated] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3242: Attachment: (was: YARN-3242.004.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347431#comment-14347431 ] Remus Rusanu commented on YARN-2190: From my experience, patches containing .sln or .vcxproj changes need to have Windows-style CRLF terminators *for the lines in the .sln/.vcxproj hunks*. The rest of the patch should use normal Unix-style terminators. If this is not the case, the patch will apply fine on Windows but fail on Linux/Mac. > Provide a Windows container executor that can limit memory and CPU > -- > > Key: YARN-2190 > URL: https://issues.apache.org/jira/browse/YARN-2190 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager > Reporter: Chuan Liu > Assignee: Chuan Liu > Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, > YARN-2190.10.patch, YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, > YARN-2190.5.patch, YARN-2190.6.patch, YARN-2190.7.patch, YARN-2190.8.patch, > YARN-2190.9.patch > > > The YARN default container executor on Windows does not currently set resource limits on > the containers. The memory limit is enforced by a separate > monitoring thread. The container implementation on Windows uses a Job Object > right now. The latest Windows API (8 or later) allows CPU and memory limits > on job objects. We want to create a Windows container executor that sets > the limits on job objects, thus providing resource enforcement at the OS level. > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
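A quick way to verify which lines of a patch carry CRLF terminators is to grep for carriage returns. This is just an illustrative check using bash's `$'...'` quoting and a made-up file name, not project tooling:

```shell
# Create a stand-in patch file: one Unix-terminated line, one CRLF line.
printf 'unix line\nwin line\r\n' > sample.patch

# List the lines that contain a carriage return; in a real patch, only the
# .sln/.vcxproj hunks should show up here.
grep -n $'\r' sample.patch
```

Running this prints `2:win line`, confirming only the second line is CRLF-terminated; the same check on a real patch makes it easy to spot stray CRLF lines outside the .sln/.vcxproj hunks before submitting.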
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347457#comment-14347457 ] Varun Saxena commented on YARN-2928: What is meant by manual reader ? > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2928: -- Attachment: Timeline Service Next Gen - Planning - ppt.pptx I put together some notes (attached) on how we can collectively work on this, to help bring some clarity of project execution for everyone involved. I divided the effort into phases. Feedback welcome. I'll keep this updated as things progress. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver > Reporter: Sangjin Lee > Assignee: Sangjin Lee > Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx > > > We have the application timeline server implemented in YARN per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347464#comment-14347464 ] Vinod Kumar Vavilapalli commented on YARN-2786: --- Looks good. Checking this in. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347460#comment-14347460 ] Varun Saxena commented on YARN-3264: We can probably use the try-with-resources construct as well. > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Reporter: Vrushali C > Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch > > > For the PoC, we need to create a backend impl for file-based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
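The review suggestion above can be illustrated with a small sketch. This is not code from the patch; the class name, file name, and line format are hypothetical. The point is that try-with-resources closes the writer automatically, even if an exception is thrown mid-write, removing the need for an explicit finally block:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch of a file-based entity writer using try-with-resources.
public class FileEntityWriter {
    // Write one line per timeline entity; the BufferedWriter is an
    // AutoCloseable, so it is closed on both normal and exceptional exit.
    static void writeEntities(Path file, List<String> entityLines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (String line : entityLines) {
                out.write(line);
                out.newLine();
            }
        } // no explicit finally/close needed
    }

    public static void main(String[] args) throws IOException {
        Path f = Paths.get("entities.txt");
        writeEntities(f, List.of("entity1", "entity2"));
        System.out.println(Files.readAllLines(f)); // prints [entity1, entity2]
    }
}
```

Compared with a manual try/finally, this form cannot leak the file handle and keeps the happy path and cleanup in one place, which is presumably why it came up in review.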
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.006.patch Fixed using constant CpuTimeTracker.UNAVAILABLE instead of hard coded -1 > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
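The pattern behind this fix is replacing a scattered magic number with a single named sentinel. A minimal hypothetical sketch follows (the real CpuTimeTracker in Hadoop carries much more state; this only mirrors the UNAVAILABLE idea, and the class and method names here are stand-ins):

```java
// Hypothetical sketch: a named UNAVAILABLE sentinel instead of a hard-coded -1.
public class CpuTimeTrackerSketch {
    // Single definition of "no sample yet"; callers compare against this
    // constant rather than the literal -1, so the meaning is explicit and
    // the value can be changed in one place.
    public static final long UNAVAILABLE = -1L;

    private long cumulativeCpuMs = UNAVAILABLE;

    public long getCumulativeCpuMs() {
        return cumulativeCpuMs;
    }

    public void update(long ms) {
        cumulativeCpuMs = ms;
    }

    public static void main(String[] args) {
        CpuTimeTrackerSketch tracker = new CpuTimeTrackerSketch();
        // Before any update, the reading is explicitly "unavailable".
        System.out.println(tracker.getCumulativeCpuMs() == CpuTimeTrackerSketch.UNAVAILABLE); // prints true
        tracker.update(1500L);
        System.out.println(tracker.getCumulativeCpuMs()); // prints 1500
    }
}
```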
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347481#comment-14347481 ] Vinod Kumar Vavilapalli commented on YARN-2786: --- Actually, this won't apply to branch-2. Can you upload the branch-2 patch too? Tx. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347437#comment-14347437 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- bq. Are there any plans to include intermediate routing/forwarding systems for ATS v2? We have a storage/forwarder interface that can definitely be plugged into to do something like this. > Application Timeline Server (ATS) next gen: phase 1 > --- > > Key: YARN-2928 > URL: https://issues.apache.org/jira/browse/YARN-2928 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal > v1.pdf > > > We have the application timeline server implemented in yarn per YARN-1530 and > YARN-321. Although it is a great feature, we have recognized several critical > issues and features that need to be addressed. > This JIRA proposes the design and implementation changes to address those. > This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3264: - Attachment: YARN-3264.004.patch updating as per [~gtCarrera9] 's suggestions > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch, YARN-3264.004.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347490#comment-14347490 ] Hadoop QA commented on YARN-3264: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702599/YARN-3264.004.patch against trunk revision ed70fa1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6846//console This message is automatically generated. > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch, YARN-3264.004.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347491#comment-14347491 ] Chen He commented on YARN-3289: --- Thanks [~jlowe] for the comments. IMHO, if we are using DCE to run applications, we can move the docker image localization into a preparation task. For example, if we have 10 tasks in a job, we create one extra dummy "task" per real task that can heartbeat and do the image-downloading work; once it is done, the real task can start to run. The benefit is that we can control the placement of those dummy tasks and achieve "data locality" for docker image localization. For example, if node1 has already downloaded the docker image and the AM starts to run on it, the RM scheduler should, if possible, place the other dummy and real tasks on this node, since node1 already has the image. Compared with job input data (a block, maybe), docker image "locality" may be even more important, since an image can exceed 2GB and take more than 10 minutes to download. > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Ravi Prakash > > We currently call docker run on images while launching containers. If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347524#comment-14347524 ] Jason Lowe commented on YARN-3289: -- Regarding a separate prepping task, localization already is a separate preparation task for non-public resources. See ContainerLocalizer. I don't think docker image download and localization as is done today is fundamentally different at a high level -- in both cases we are prepping the node to be able to run the container. No need to complicate the process with a specialized extra step just for docker. What we're missing here is progress reporting during localization so AMs can properly monitor progress of container launch requests before their code starts running, and that's useful for non-docker localization scenarios as well. Adjusting locality based on the cost of localization is an interesting idea, and applies to the non-docker case as well. However the docker case can be a bit tricky. One node may take tens of minutes to localize a docker image, but another node might only take a few seconds. Docker images are often derived from other images, and docker only downloads the deltas. So it will be difficult for YARN that is not aware of the docker contents of a node or image deltas to predict how long any node will take to localize a given docker image. > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash > > We currently call docker run on images while launching containers. If the > image size if sufficiently big, the task will timeout. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347550#comment-14347550 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702598/YARN-3122.006.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6845//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6845//console This message is automatically generated. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347548#comment-14347548 ] Hadoop QA commented on YARN-3242: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702592/YARN-3242.004.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.http.TestHttpServerLifecycle The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6844//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6844//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6844//console This message is automatically generated. > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. 
[jira] [Commented] (YARN-3242) Old ZK client session watcher event causes ZKRMStateStore out of sync with current ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347561#comment-14347561 ] zhihai xu commented on YARN-3242: - Hi Rohith, thanks for the review and verifying the patch. I restarted the test, TestHttpServerLifecycle failure is not related to my change. it passed in my local latest test. {code} --- T E S T S --- Running org.apache.hadoop.http.TestHttpServerLifecycle Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.24 sec - in org.apache.hadoop.http.TestHttpServerLifecycle Results : Tests run: 7, Failures: 0, Errors: 0, Skipped: 0 {code} Also findbugs warning is not related to my change. many thanks zhihai > Old ZK client session watcher event causes ZKRMStateStore out of sync with > current ZK client session due to ZooKeeper asynchronously closing client > session. > > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. 
> For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. > The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h,
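The failure mode described above suggests a guard in the watch-event path: drop any event whose originating session is not the currently active one, so a stale Disconnected event cannot null out the live zkClient. A minimal self-contained sketch of that idea follows; the class and method names here are illustrative stand-ins, not the code from the attached YARN-3242 patches.

```java
// Illustrative model of the stale-session guard: events are tagged with the
// session they came from, and the store ignores any event whose source is not
// the active session. Names are hypothetical, not the actual YARN-3242 fix.
import java.util.ArrayList;
import java.util.List;

public class SessionGuardSketch {
    /** Stand-in for a ZK client session. */
    static final class Session {
        final long id;
        Session(long id) { this.id = id; }
    }

    static final class StateStore {
        private Session activeSession;
        final List<String> processed = new ArrayList<>();

        void setActiveSession(Session s) { activeSession = s; }

        /** Process a watcher event only if it came from the current session. */
        void processWatchEvent(Session source, String event) {
            if (source != activeSession) {
                // Event from an old, already-closed session: drop it so it
                // cannot clear the current zkClient reference.
                return;
            }
            processed.add(event);
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        Session oldSession = new Session(1);
        Session newSession = new Session(2);

        store.setActiveSession(newSession);
        store.processWatchEvent(oldSession, "Disconnected"); // stale: ignored
        store.processWatchEvent(newSession, "SyncConnected");

        if (!store.processed.equals(List.of("SyncConnected")))
            throw new AssertionError(store.processed);
        System.out.println("only current-session events processed");
    }
}
```

Without the guard, the stale Disconnected event would have nulled the client and the store would then wait forever for a SyncConnected that the new session never re-sends.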
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347576#comment-14347576 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702573/YARN_3267_V1.patch against trunk revision 03cc229. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.TestLeveldbTimelineStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6842//console This message is automatically generated. 
> Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_V1.patch, YARN_3267_WIP.patch, > YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch > > > While fetching entities from the timelineserver, the limit is applied to the > entities fetched from leveldb, and the ACL filters are applied after that > (TimelineDataManager.java::getEntities). > This could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
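The bug described in the issue is an ordering problem: truncating to the limit before ACL filtering can return fewer visible entities than actually exist. A small sketch of the fix direction, applying the ACL predicate during the scan and counting only visible entities toward the limit; the method and names are illustrative, not the code in the attached patches.

```java
// Sketch: apply the ACL check while scanning and stop once `limit` visible
// entities are collected, instead of truncating to `limit` first and
// filtering afterwards. Illustrative only; the real logic lives in
// TimelineDataManager#getEntities.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class AclBeforeLimit {
    static <T> List<T> getEntities(List<T> scanned, Predicate<T> aclAllows, int limit) {
        List<T> out = new ArrayList<>();
        for (T e : scanned) {
            if (out.size() >= limit) break;    // limit counts visible entities
            if (aclAllows.test(e)) out.add(e); // ACL applied during the scan
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> db = List.of("a-mine", "b-other", "c-mine", "d-mine");
        // Limit-first would keep {"a-mine", "b-other"} and return only one
        // visible entity; ACL-first returns the two the caller may see.
        List<String> r = getEntities(db, s -> s.endsWith("-mine"), 2);
        if (!r.equals(List.of("a-mine", "c-mine"))) throw new AssertionError(r);
    }
}
```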
[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347583#comment-14347583 ] Chen He commented on YARN-3289: --- Thank you for the quick feedback, [~jlowe]. {quote} What we're missing here is progress reporting during localization so AMs can properly monitor progress of container launch requests before their code starts running, and that's useful for non-docker localization scenarios as well.{quote} I agree. That would be great. The idea I proposed is based on the condition that we do not change the localization part. {quote} One node may take tens of minutes to localize a docker image, but another node might only take a few seconds. Docker images are often derived from other images, and docker only downloads the deltas. So it will be difficult for YARN that is not aware of the docker contents of a node or image deltas to predict how long any node will take to localize a given docker image.{quote} That is true. Docker image localization is a little different from other app localization (from HDFS to local FS): all NMs pull from the docker registry. The network bandwidth from the docker registry to each NM could be a bottleneck whether the docker image deltas are large or small (we may need higher bandwidth, say 30G InfiniBand, but for a large Hadoop cluster with more than 10 thousand tasks running, it may still be a problem). This is another reason that we need to consider docker image locality. > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash > > We currently call docker run on images while launching containers. 
If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150304-1-branch2.patch YARN-2786-20150304-1-trunk.patch Uploaded latest patch for trunk/branch-2, verified all work properly in local standalone cluster. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347598#comment-14347598 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702610/YARN-2786-20150304-1-branch2.patch against trunk revision ed70fa1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6847//console This message is automatically generated. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch Jenkins failed because it tried to apply the branch-2 patch against trunk; uploading the trunk patch again just to re-kick Jenkins. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Attachment: YARN_3267_V2.patch > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, > YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, > YARN_3267_WIP3.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347619#comment-14347619 ] Chen He commented on YARN-3289: --- Or maybe we could add a module on the NM that automatically pulls deltas from the registry; users could configure the frequency and schedule. > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash > > We currently call docker run on images while launching containers. If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347626#comment-14347626 ] Li Lu commented on YARN-3264: - Hi [~vrushalic], thanks for the update. Unfortunately the TestTimelineAggregator part failed to compile on my machine, due to the abstract TimelineAggregator class. Here, to test the basic features of TimelineAggregators, maybe we'd like to set up a SimpleTimelineAggregator class that only inherits TimelineAggregator but performs nothing else, and use it in TestTimelineAggregator? Also, I briefly skimmed through the patch and there are some unused imports. Maybe we would like to do a final cleanup before it's committed? (It's quite simple with an IDE, so let's put that to the final round. ) Thanks! > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch, YARN-3264.004.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
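The review suggestion above is a standard pattern for testing an abstract class: give the test a trivial concrete subclass so the shared behaviour can be instantiated and exercised. A minimal self-contained illustration follows; the TimelineAggregator shown here is a stand-in with invented members, not the real YARN class.

```java
// Pattern sketch: a do-nothing concrete subclass ("SimpleTimelineAggregator")
// lets the test instantiate an abstract base and exercise its shared logic.
// The base class here is a hypothetical stand-in, not YARN's actual API.
public class SimpleAggregatorSketch {
    static abstract class TimelineAggregator {
        private boolean started;
        void start() { started = true; }        // shared behaviour under test
        boolean isStarted() { return started; }
        abstract void aggregate(String entity); // subclass-specific hook
    }

    /** Inherits everything, performs nothing else — just enough to instantiate. */
    static final class SimpleTimelineAggregator extends TimelineAggregator {
        @Override void aggregate(String entity) { /* no-op for tests */ }
    }

    public static void main(String[] args) {
        TimelineAggregator agg = new SimpleTimelineAggregator();
        agg.start();
        if (!agg.isStarted()) throw new AssertionError("start() not applied");
    }
}
```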
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347633#comment-14347633 ] Karthik Kambatla commented on YARN-3122: One last nit: the javadoc for ResourceCalculatorProcessTree#getCpuUsagePercent still says "return 0 if it cannot be calculated". > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
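The nit above touches a real API-design point: returning 0 when usage cannot be calculated conflates "idle" with "unknown". A common alternative is a distinguishable sentinel, sketched below; the constant name and method signature are illustrative, not the actual YARN API.

```java
// Sketch: return a sentinel instead of 0 when CPU usage cannot be computed,
// so callers can tell "no data yet" apart from a genuinely idle process.
// UNAVAILABLE and the signature are hypothetical, not YARN's real API.
public class CpuUsageSketch {
    static final float UNAVAILABLE = -1.0f;

    static float getCpuUsagePercent(long cumulativeCpuMs, long elapsedMs) {
        if (elapsedMs <= 0) return UNAVAILABLE; // cannot be calculated yet
        return 100f * cumulativeCpuMs / elapsedMs;
    }

    public static void main(String[] args) {
        // First sample: no elapsed interval yet, so usage is unknown, not 0.
        if (getCpuUsagePercent(500, 0) != UNAVAILABLE) throw new AssertionError();
        if (getCpuUsagePercent(500, 1000) != 50.0f) throw new AssertionError();
    }
}
```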
[jira] [Commented] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347641#comment-14347641 ] Vrushali C commented on YARN-3264: -- [~gtCarrera] thanks! Will fix that test and remove the unused imports in the next update. > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch, YARN-3264.004.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues
[ https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3275: - Attachment: YARN-3275.v2.txt [~jlowe] and [~leftnoteasy], thank you for the reviews. Attached is an updated patch (v2) with your suggested changes. > CapacityScheduler: Preemption happening on non-preemptable queues > - > > Key: YARN-3275 > URL: https://issues.apache.org/jira/browse/YARN-3275 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Labels: capacity-scheduler > Attachments: YARN-3275.v1.txt, YARN-3275.v2.txt > > > YARN-2056 introduced the ability to turn preemption on and off at the queue > level. In cases where a queue goes over its absolute max capacity (YARN-3243, > for example), containers can be preempted from that queue, even though the > queue is marked as non-preemptable. > We are using this feature in large, busy clusters and seeing this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347672#comment-14347672 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702617/YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6848//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6848//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6848//console This message is automatically generated. 
> Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1310) Get rid of MR settings in YARN configuration
[ https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347684#comment-14347684 ] Brahma Reddy Battula commented on YARN-1310: [~kasha], [~djp] and [~hitesh], can I go ahead as I mentioned? Can you please give your inputs? Thanks! > Get rid of MR settings in YARN configuration > > > Key: YARN-1310 > URL: https://issues.apache.org/jira/browse/YARN-1310 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Junping Du >Assignee: Brahma Reddy Battula > > Per discussion in YARN-1289, we should get rid of MR settings (like below) > and default values in YARN configuration which put unnecessary dependency for > YARN on MR. > {code} > > > yarn.nodemanager.aux-services.mapreduce_shuffle.class > org.apache.hadoop.mapred.ShuffleHandler > > > mapreduce.job.jar > > > > mapreduce.job.hdfs-servers > ${fs.defaultFS} > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150304-2-trunk.patch > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3292) [Umbrella] Tests/documentation and/or tools for YARN rolling upgrades backwards/forward compatibility verification
[ https://issues.apache.org/jira/browse/YARN-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347710#comment-14347710 ] Li Lu commented on YARN-3292: - So far as we can see, YARN requires the following components to be compatible in a rolling upgrade (please feel free to add more in the discussion): - Protocols: both public protocols and private wiring protocols - RM/NM/ATS state stores: RM/NM/ATS data version numbers and the data store/read schema for each state store. - APIs - Security tokens - Configurations We may want to provide a suite of tools and/or unit tests that can verify if an incoming YARN patch will break the compatibility to the previous version. In the very first stage, we may want to finish the following tasks: # Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols # Extend the protobuf compatibility checker in step 1 to check RM state store # Look into the possibility to further extend the protobuf checker to NM/ATS(v1) state store (I’m not very sure now, we can merge this with step 2 if a simple extension is possible). # Implement a diff-based java API compatibility checker # Wire up the implemented tools to Jenkins test runs # Finish formal write ups for the YARN rolling upgrade standard Please feel free to discuss more about our first step goal. Thanks! > [Umbrella] Tests/documentation and/or tools for YARN rolling upgrades > backwards/forward compatibility verification > -- > > Key: YARN-3292 > URL: https://issues.apache.org/jira/browse/YARN-3292 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Li Lu >Assignee: Li Lu > Labels: compatibility, rolling_upgrade, test, tools > > YARN-666 added the support to YARN rolling upgrade. In order to support this > feature, we made changes from many perspectives. There were many assumptions > made together with these existing changes. 
Future code changes may break > these assumptions by accident, and hence break the YARN rolling upgrades > feature. > To simplify YARN RU regression tests, maybe we would like to create a set of > tools/tests that can verify YARN RU backward compatibility. > On the very first step, we may want to have a compatibility checker for > important protocols and APIs. We may also want to incorporate these tools > into our test Jenkins runs, if necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
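Step 1 in the plan above, a protobuf compatibility checker, reduces to a simple invariant for wire compatibility: an existing field number must never be reused with a different wire type (renames are fine; additions and removals are tolerated by protobuf's unknown-field handling). A toy version of that check, assuming a simplified field-number-to-type model rather than real protobuf descriptors:

```java
// Toy compatibility check for one protobuf message: compare field numbers
// between two schema versions. A real checker would walk
// Descriptors.Descriptor from generated protobuf classes; this models a
// message as a field-number -> type-name map for illustration only.
import java.util.Map;

public class ProtoCompatSketch {
    static boolean wireCompatible(Map<Integer, String> before, Map<Integer, String> after) {
        for (Map.Entry<Integer, String> f : before.entrySet()) {
            String newType = after.get(f.getKey());
            // Removed fields are tolerable (readers skip unknown fields);
            // reusing a number with a different type breaks decoding.
            if (newType != null && !newType.equals(f.getValue())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = Map.of(1, "string", 2, "int32");
        // Adding field 3 is compatible; changing field 2's type is not.
        if (!wireCompatible(v1, Map.of(1, "string", 2, "int32", 3, "bool")))
            throw new AssertionError("added field should be compatible");
        if (wireCompatible(v1, Map.of(1, "string", 2, "string")))
            throw new AssertionError("type change should be incompatible");
    }
}
```

Hooked into a Jenkins run as proposed, a checker of this shape would flag an incompatible patch before commit rather than during a rolling-upgrade test.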
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150304-2-branch2.patch Attached both trunk/branch-2 patch, fixed findbugs warning. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, > YARN-2786-20150304-2-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347728#comment-14347728 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702629/YARN-2786-20150304-2-branch2.patch against trunk revision c66c3ac. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6851//console This message is automatically generated. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, > YARN-2786-20150304-2-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.007.patch Fixed javadoc > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, > YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347735#comment-14347735 ] Karthik Kambatla commented on YARN-3111: When cluster capacity is 0, do we want to show the ratio as 1? Also, instead of showing the shares as a single percentage, would it make sense to show it as % mem, %cpu? [~ashwinshankar77], [~peng.zhang] - thoughts? > Fix ratio problem on FairScheduler page > --- > > Key: YARN-3111 > URL: https://issues.apache.org/jira/browse/YARN-3111 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-3111.1.patch, YARN-3111.png > > > Found 3 problems on the FairScheduler page: > 1. Only memory is computed for the ratio, even when the queue's schedulingPolicy is DRF. > 2. When min resources are configured larger than the real resources, the steady > fair share ratio is so large that it overflows the page. > 3. When cluster resources are 0 (no NodeManager has started), the ratio is displayed as > "NaN% used" > The attached image shows a snapshot of the above problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
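The "NaN% used" symptom discussed above comes straight from float arithmetic: 0/0 is NaN in IEEE floating point. A minimal, hypothetical sketch (not the actual FairScheduler code; the method name is made up) of the bug and one possible guard:

```java
// Hypothetical sketch of the YARN-3111 symptom: dividing used resources by a
// zero cluster capacity yields NaN, which the web UI then renders as "NaN% used".
public class RatioSketch {
    // One possible guard: clamp the ratio to 0 when the cluster has no capacity.
    static float usedRatio(long used, long capacity) {
        if (capacity <= 0) {
            return 0f; // avoid 0/0 -> NaN when no NodeManager has registered
        }
        return (float) used / capacity;
    }

    public static void main(String[] args) {
        // The naive division reproduces the bug: 0/0 in float arithmetic is NaN.
        float naive = (float) 0 / 0;
        System.out.println(Float.isNaN(naive));   // true
        System.out.println(usedRatio(0, 0));      // 0.0
        System.out.println(usedRatio(512, 1024)); // 0.5
    }
}
```

Whether the guarded value should be 0 or 1 is exactly the question raised in the comment above; the guard only illustrates where the NaN originates.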
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347739#comment-14347739 ] Karthik Kambatla commented on YARN-3122: +1, pending Jenkins. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, > YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash reassigned YARN-1964: -- Assignee: Ravi Prakash (was: Abin Shahab) > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Ravi Prakash > Fix For: 2.6.0 > > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-1964: --- Assignee: Abin Shahab (was: Ravi Prakash) > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Fix For: 2.6.0 > > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347799#comment-14347799 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702629/YARN-2786-20150304-2-branch2.patch against trunk revision 722b479. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6852//console This message is automatically generated. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, > YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, > YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, > YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, > YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch, > YARN-2786-20150303-1-trunk.patch, YARN-2786-20150304-1-branch2.patch, > YARN-2786-20150304-1-trunk-to-rekick-Jenkins.patch, > YARN-2786-20150304-1-trunk.patch, YARN-2786-20150304-2-branch2.patch, > YARN-2786-20150304-2-trunk.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3275) CapacityScheduler: Preemption happening on non-preemptable queues
[ https://issues.apache.org/jira/browse/YARN-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347803#comment-14347803 ] Hadoop QA commented on YARN-3275: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702623/YARN-3275.v2.txt against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6850//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6850//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6850//console This message is automatically generated. 
> CapacityScheduler: Preemption happening on non-preemptable queues > - > > Key: YARN-3275 > URL: https://issues.apache.org/jira/browse/YARN-3275 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Eric Payne >Assignee: Eric Payne > Labels: capacity-scheduler > Attachments: YARN-3275.v1.txt, YARN-3275.v2.txt > > > YARN-2056 introduced the ability to turn preemption on and off at the queue > level. In cases where a queue goes over its absolute max capacity (YARN-3243, > for example), containers can be preempted from that queue, even though the > queue is marked as non-preemptable. > We are using this feature in large, busy clusters and seeing this behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347824#comment-14347824 ] Hadoop QA commented on YARN-3267: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702618/YARN_3267_V2.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6849//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6849//console This message is automatically generated. 
> Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, > YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, > YARN_3267_WIP3.patch > > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
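The ordering problem the YARN-3267 description identifies can be shown with a toy example. This is an illustrative sketch only, not the TimelineDataManager code: if the record limit is applied before the ACL filter, a query can return nothing even though enough readable entities exist further down the scan.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LimitOrderSketch {
    // Buggy order (what the description reports): truncate first, then apply ACLs.
    static List<String> limitThenFilter(List<String> all, Predicate<String> acl, int limit) {
        return all.stream().limit(limit).filter(acl).collect(Collectors.toList());
    }

    // Fixed order: apply ACLs first, then truncate to the requested limit.
    static List<String> filterThenLimit(List<String> all, Predicate<String> acl, int limit) {
        return all.stream().filter(acl).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> entities = List.of("other1", "other2", "mine1", "mine2");
        Predicate<String> readable = e -> e.startsWith("mine");
        // With limit=2, the buggy order drops both readable entities.
        System.out.println(limitThenFilter(entities, readable, 2)); // []
        System.out.println(filterThenLimit(entities, readable, 2)); // [mine1, mine2]
    }
}
```

The trade-off, of course, is that filtering first may scan far more leveldb records per request, which is presumably why the limit was applied early in the first place.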
[jira] [Updated] (YARN-2436) yarn application help doesn't work
[ https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2436: --- Release Note: (was: test) > yarn application help doesn't work > -- > > Key: YARN-2436 > URL: https://issues.apache.org/jira/browse/YARN-2436 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: newbie > Fix For: 3.0.0 > > Attachments: YARN-2436.patch > > > The previous version of the yarn command plays games with the command stack > for some commands. The new code needs to duplicate this wackiness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347858#comment-14347858 ] Vrushali C commented on YARN-3134: -- There is a draft of some flow- (and user- and queue-) based queries to be supported, put up on JIRA YARN-3050, that could help us with the schema design. https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx Sharing the schema of some of the HBase tables in hRaven: (detailed schema at https://github.com/twitter/hraven/blob/master/bin/create_schema.rb) {code} create 'job_history', {NAME => 'i', COMPRESSION => 'LZO'} create 'job_history_task', {NAME => 'i', COMPRESSION => 'LZO'} # job_history (indexed) by jobId table contains 1 column family: # i: job-level information specifically the rowkey into the create 'job_history-by_jobId', {NAME => 'i', COMPRESSION => 'LZO'} # job_history_app_version - stores all version numbers seen for a single app ID # i: "info" -- version information create 'job_history_app_version', {NAME => 'i', COMPRESSION => 'LZO'} # job_history_agg_daily - stores daily aggregated job info # the s column family has a TTL of 30 days, it's used as a scratch col family # it stores the run ids that are seen for that day # we assume that a flow will not run for more than 30 days, hence it's fine to "expire" that data create 'job_history_agg_daily', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'}, {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'} # job_history_agg_weekly - stores weekly aggregated job info # the s column family has a TTL of 30 days # it stores the run ids that are seen for that week # we assume that a flow will not run for more than 30 days, hence it's fine to "expire" that data create 'job_history_agg_weekly', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'}, {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'} 
{code} job_history is the main table. Its row key: cluster!user!application!timestamp!jobID cluster, user, application are stored as Strings. timestamp and jobID are stored as longs. cluster - unique cluster name (i.e. “cluster1@dc1”) user - user running the application (“edgar”) application - application ID (aka flow name) derived from the job configuration: uses the “batch.desc” property if set; otherwise parses a consistent ID from “mapreduce.job.name” timestamp - inverted (Long.MAX_VALUE - value) value of the submission time. Storing the value as an inverted timestamp ensures the latest jobs are stored first for that cluster!user!app. This enables faster retrieval of the more recent jobs for this flow. jobID - stored as the Job Tracker/Resource Manager start time (long), concatenated with the job sequence number job_201306271100_0001 -> [1372352073732L][1L] How the columns are named in hRaven: - each key in the job history file becomes the column name. For example, finishedMaps would be stored as {code} column=i:finished_maps, timestamp= 1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x05 {code} In the output above, timestamp is the HBase cell timestamp. - we store the configuration information with a column name prefix of "c!" {code} column=i:c!yarn.sharedcache.manager.client.thread-count, timestamp= 1425515902000, value=50 {code} - each counter is stored with a prefix of "g!" or "gr!" or "gm!" {code} For reducer counters, there is a prefix of gr! column=i:gr!org.apache.hadoop.mapreduce.TaskCounter!SPILLED_RECORDS, timestamp= 1425515902000 value=\x00\x00\x00\x00\x00\x00\x00\x02 For mapper counters, there is a prefix of gm! 
column=i:gm!org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter!BYTES_READ, timestamp= 1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x02 {code} > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Quote the introduction on Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the ord
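The inverted-timestamp row-key trick described in the hRaven schema above can be sketched in a few lines. This is only an illustration of the idea (real hRaven encodes the key components as byte arrays, not a "!"-joined String; the method names here are invented):

```java
// Illustrative sketch of hRaven's row-key design: submission time is stored
// inverted so that newer jobs sort first in HBase's ascending lexicographic
// row-key order, making "most recent runs of this flow" a cheap prefix scan.
public class RowKeySketch {
    static long invert(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    // Toy row key: cluster!user!application!invertedTimestamp!jobId
    static String rowKey(String cluster, String user, String app,
                         long submitTs, String jobId) {
        return String.join("!", cluster, user, app,
                Long.toString(invert(submitTs)), jobId);
    }

    public static void main(String[] args) {
        long older = 1425515902000L;
        long newer = older + 60_000L;
        // The newer job's inverted timestamp is smaller, so it sorts first.
        System.out.println(invert(newer) < invert(older)); // true
        System.out.println(rowKey("cluster1@dc1", "edgar", "flowA",
                older, "job_201306271100_0001"));
    }
}
```

Note the String form would actually break the ordering guarantee across numbers of different digit counts; hRaven's fixed-width binary encoding is what makes the inversion sort correctly.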
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347866#comment-14347866 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702634/YARN-3122.007.patch against trunk revision 722b479. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6853//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6853//console This message is automatically generated. 
> Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.006.patch, YARN-3122.007.patch, YARN-3122.prelim.patch, > YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3264) [Storage implementation] Create backing storage write interface and a POC only file based storage implementation
[ https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3264: - Attachment: YARN-3264.005.patch Uploading new patch with review suggestions. - updated unit test for FileSystemTimelineServiceWriterImpl. - updated FileSystemTimelineServiceWriterImpl # serviceInit to initialize the local file system output directory - ensured the directory is read from config - fixed unused imports > [Storage implementation] Create backing storage write interface and a POC > only file based storage implementation > - > > Key: YARN-3264 > URL: https://issues.apache.org/jira/browse/YARN-3264 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3264.001.patch, YARN-3264.002.patch, > YARN-3264.003.patch, YARN-3264.004.patch, YARN-3264.005.patch > > > For the PoC, need to create a backend impl for file based storage of entities -- This message was sent by Atlassian JIRA (v6.3.4#6332)