[jira] [Commented] (YARN-4654) Yarn node label CLI should parse "=" correctly when trying to remove all labels on a node
[ https://issues.apache.org/jira/browse/YARN-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148101#comment-15148101 ] Rohith Sharma K S commented on YARN-4654: - +1 for the latest patch. I will wait a couple of days before committing it. > Yarn node label CLI should parse "=" correctly when trying to remove all > labels on a node > - > > Key: YARN-4654 > URL: https://issues.apache.org/jira/browse/YARN-4654 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-4654.v1.001.patch, YARN-4654.v1.002.patch, > YARN-4654.v1.003.patch > > > Currently, when adding labels to nodes, a user can run: > {{yarn rmadmin -replaceLabelsOnNode "host1=x host2=y"}} > However, when removing labels from a node, the user has to run: > {{yarn rmadmin -replaceLabelsOnNode "host1 host2"}} > instead of: > {{yarn rmadmin -replaceLabelsOnNode "host1= host2="}} > We should handle both the cases where "=" is present and where it is absent > when removing labels on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
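The tolerant parsing described above can be sketched as follows. This is an illustration only, not the actual patch: the class name {{NodeLabelArgParser}} is hypothetical, and the real CLI lives in Hadoop's RMAdminCLI. The point is simply that "host1=x", "host1=", and a bare "host1" should all parse, with the latter two meaning "remove all labels".

```java
import java.util.*;

// Hypothetical sketch (not the YARN patch itself) of tolerant parsing of
// the -replaceLabelsOnNode argument: "host1=x", "host1=", and "host1"
// forms are all accepted; an empty label set means "remove all labels".
public class NodeLabelArgParser {
    public static Map<String, Set<String>> parse(String args) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String token : args.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            int eq = token.indexOf('=');
            // Host is everything before '='; the whole token if no '='.
            String host = (eq < 0) ? token : token.substring(0, eq);
            Set<String> labels = new HashSet<>();
            if (eq >= 0 && eq < token.length() - 1) {
                // Labels may be comma-separated after '='.
                labels.addAll(Arrays.asList(token.substring(eq + 1).split(",")));
            }
            result.put(host, labels);
        }
        return result;
    }
}
```

With this shape, "host1= host2=" and "host1 host2" produce identical results, which is exactly the behaviour the JIRA asks for.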
[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148062#comment-15148062 ] Hadoop QA commented on YARN-4412: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4412 does not apply to yarn-2877. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782131/YARN-4412-yarn-2877.v2.patch | | JIRA Issue | YARN-4412 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10568/console | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Create ClusterMonitor to compute ordered list of preferred NMs for > OPPORTUNISTIC containers > -- > > Key: YARN-4412 > URL: https://issues.apache.org/jira/browse/YARN-4412 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-4412-yarn-2877.v1.patch, > YARN-4412-yarn-2877.v2.patch > > > Introduce a Cluster Monitor that aggregates load information from individual > Node Managers and computes an ordered list of preferred Node managers to be > used as target Nodes for OPPORTUNISTIC container allocations. > This list can be pushed out to the Node Manager (specifically the AMRMProxy > running on the Node) via the Allocate Response. This will be used to make > local Scheduling decisions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
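The ordering the description talks about can be illustrated with a minimal sketch. The names here ({{NodeLoad}}, {{queueLength}}, {{preferredNodes}}) are invented for illustration and are not YARN APIs; the real ClusterMonitor in the yarn-2877 branch may use a different load metric entirely.

```java
import java.util.*;

// Illustrative sketch only: order node managers least-loaded first by a
// reported load metric (here, a queued-container count), the way a
// ClusterMonitor might. NodeLoad/queueLength are hypothetical names.
public class TopKNodeSelector {
    public static class NodeLoad {
        final String nodeId;
        final int queueLength;  // e.g. queued OPPORTUNISTIC containers
        public NodeLoad(String nodeId, int queueLength) {
            this.nodeId = nodeId;
            this.queueLength = queueLength;
        }
    }

    // Returns node ids sorted least-loaded first; this ordered list would
    // then be pushed to the AMRMProxy via the Allocate Response.
    public static List<String> preferredNodes(Collection<NodeLoad> loads) {
        return loads.stream()
            .sorted(Comparator.comparingInt((NodeLoad n) -> n.queueLength))
            .map(n -> n.nodeId)
            .collect(java.util.stream.Collectors.toList());
    }
}
```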
[jira] [Resolved] (YARN-2887) AM policies for choosing type of containers
[ https://issues.apache.org/jira/browse/YARN-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh resolved YARN-2887. --- Resolution: Not A Problem This is more an AM / application specific change and should not be part of core YARN > AM policies for choosing type of containers > --- > > Key: YARN-2887 > URL: https://issues.apache.org/jira/browse/YARN-2887 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos > > Each AM can employ policies that determine what type of container > (guaranteed-start or queueable) should be requested for each task. > An example policy may be to use only guaranteed-start or only queueable > containers, or to randomly pick a percentage of the requests to be queueable, > or to choose the container type based on the characteristics of the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
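The example policies in the description above (always guaranteed-start, or a random percentage queueable) could be expressed at the application level roughly as follows. Everything here is hypothetical: the enum, interface, and method names are invented for illustration and are not part of YARN, consistent with the resolution that this belongs in AM/application code.

```java
import java.util.Random;

// Hypothetical illustration of the AM-side policies described above; the
// enum and interface names are invented, not YARN APIs.
public class ContainerTypePolicies {
    public enum ContainerType { GUARANTEED_START, QUEUEABLE }

    public interface ContainerTypePolicy {
        ContainerType chooseType(long taskIndex);
    }

    // Policy 1: always request guaranteed-start containers.
    public static ContainerTypePolicy allGuaranteed() {
        return task -> ContainerType.GUARANTEED_START;
    }

    // Policy 2: request QUEUEABLE for roughly the given fraction of tasks.
    public static ContainerTypePolicy randomQueueable(double fraction, long seed) {
        Random rnd = new Random(seed);
        return task -> rnd.nextDouble() < fraction
            ? ContainerType.QUEUEABLE : ContainerType.GUARANTEED_START;
    }
}
```

A task-characteristic-based policy would be a third implementation of the same interface, switching on task size or locality instead of randomness.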
[jira] [Commented] (YARN-4693) FSNamesystem class GetCorruptFiles function in the catch log description is not accurate.
[ https://issues.apache.org/jira/browse/YARN-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147985#comment-15147985 ] zhutengyu commented on YARN-4693: - I have re-created this as https://issues.apache.org/jira/browse/HDFS-9811 > FSNamesystem class GetCorruptFiles function in the catch log description is > not accurate. > - > > Key: YARN-4693 > URL: https://issues.apache.org/jira/browse/YARN-4693 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.7.0 >Reporter: zhutengyu > > LOG.warn("Get corrupt file blocks returned error: " + e.getMessage()); > The word "error" in this log message can mislead whoever reads the log into > thinking a failure occurred; the intent is to replace it with the "Result" keyword. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147930#comment-15147930 ] Tsuyoshi Ozawa commented on YARN-4648: -- [~kaisasak] Instead of changing the sequence of initialization, how about changing the name of {{startResourceManagerWithoutThreshold}}? I think the name is confusing, since the behaviour of {{startResourceManagerWithoutThreshold()}} appears to be equal to {{startResourceManager(1.1f)}}. What do you think? > Move preemption related tests from TestFairScheduler to > TestFairSchedulerPreemption > --- > > Key: YARN-4648 > URL: https://issues.apache.org/jira/browse/YARN-4648 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Kai Sasaki > Labels: newbie++ > Attachments: YARN-4648.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
[ https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147706#comment-15147706 ] Steve Loughran commented on YARN-4545: -- The new {{hadoop-yarn-server-timeline-pluginstorage}} JAR needs to be a dependency of yarn-client if it is required client-side. Why? So that projects that declare yarn-client as a dependency will automatically pick it up. Once this is done, there's no need to add the new plugin dependency to dshell's POM. If it's only needed at test time, then miniyarncluster will probably need it instead. We know a new ATS version is coming in 2.x. Are there already plans for the API to change? What about the probe {{YarnConfiguration.timelineServiceV1_5Enabled}}? As it stands, that probe returns true for any version >= 1.5. Is that an implicit guarantee that ATS v3 will still support the 1.5 client APIs? The predicate is telling you something the timeline client knows, currently via the config, possibly in future after authenticating with the server. I'd place the method into {{TimelineClient}} itself. Then you could go {code} if (timelineClient.supportsApiV1_5()) { ... } {code} * The code in {{TestDistributedShell.setupInternal()}} to set up ATS v1.5 looks like some boilerplate that every app testing against ATS will need. This should be made re-usable, either in MiniYarnCluster or nearby. * The {{TestDistributedShell.isTestWithTimelineV1_5()}} is a bit of a hack, as it adds some magic behaviour to test classes depending on their name...this is precisely the thing that JUnit 4 moved away from. I can't suggest an alternative that isn't more complex (e.g. 
a private annotation). Some general {{TestDistributedShell}} comments, which it is probably time to address: * drop the per-test timeout and add a single Rule for the timeout; you may as well consolidate things * the distributed shell should run in a thread with a name, for ease of log4j analysis * if that thread fails to set things up, it will throw an exception, but that exception won't propagate into the test thread/test report. It needs to be written to a field of the test suite and then, after the t.join(), raised if non-null * the {{Assert.assertTrue}} call above {{t.join()}} should then go after that check & rethrow, so if there is some dshell failure, the test runner waits for the thread to complete before failing (this change should guarantee that the thread of one test case completes before the next test case is started). * the checks for text in exceptions should move to {{GenericTestUtils.assertExceptionContains}}, as that will not lose the original exception message or data > Allow YARN distributed shell to use ATS v1.5 APIs > - > > Key: YARN-4545 > URL: https://issues.apache.org/jira/browse/YARN-4545 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4545-YARN-4265.001.patch, > YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, > YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, > YARN-4545-trunk.005.patch > > > We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to > allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the > system. We also need to provide a sample plugin to read those data out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
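The capture-and-rethrow pattern Steve describes (run in a named thread, write any failure to a field, rethrow after join) can be sketched like this. The class and method names are illustrative, not from the patch; the real code would live in {{TestDistributedShell}}.

```java
// Sketch of the review suggestion: run the shell in a named thread,
// capture any setup failure into a field, and rethrow after join() so the
// failure reaches the test report. Names here are illustrative only.
public class ThreadedTestRunner {
    private volatile Throwable failure;  // written by the worker thread

    public void runAndRethrow(Runnable body, String threadName) throws Throwable {
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable e) {
                failure = e;  // don't let the exception die silently
            }
        }, threadName);      // named thread eases log4j analysis
        t.start();
        t.join();            // worker finishes before any assertion runs
        if (failure != null) {
            throw failure;   // surface the worker's exception to JUnit
        }
    }
}
```

Any {{Assert.assertTrue}} that previously sat above the join would now go after the rethrow, so a dshell failure is reported before the next test case starts.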
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147605#comment-15147605 ] Sunil G commented on YARN-4108: --- Hi [~leftnoteasy], thanks for sharing the full version of the updated patch, really appreciate the efforts. I have a few doubts/comments on this patch. Please help to check the same. :-) 1. {{updateResToObtainAndKillableContainers}} tries to raise a cancel-preemption back to the scheduler if the container resource is empty. So could we also remove it from {{resToObtainByPartition}} here? {code} +if (!partitionsHasResToObtain.contains(partition)) { + // When we don't need to preempt any resources from the queue/partition + // , cancel all killable containers from the queue + rmContext.getDispatcher().getEventHandler().handle( + new ContainerPreemptEvent( + killableContainer.getApplicationAttemptId(), + killableContainer, + SchedulerEventType.MARK_CONTAINER_FOR_NONKILLABLE)) {code} 2. If {{conf.getLazyPreemptionEnabled()}} is disabled, do we need to handle events like MARK_CONTAINER_FOR_NONKILLABLE in the scheduler? 3. {{cleanupCompletedKillableContainers()}} in ProportionalCapacityPreemptionPolicy operates on all live containers in the cluster. This can be a very big number, so we may hold the concurrent map's access on each iteration of the preemption policy. Will this be a problem? Can we optimize this further? For example, could we set a boolean state within RMContainer to indicate it is not a live container? 4. {{killToPreemptContainers}} in LeafQueue invokes completedContainer on the application's leafQueue. I am thinking of a corner case where one {{LeafQueue#killToPreemptContainers}} invocation results in a call to completedContainer of another leafQueue. Do you see any chance of the same, maybe after a moveQueue operation? 
> CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, > YARN-4108.2.patch, YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, > YARN-4108.poc.3-WIP.patch, YARN-4108.poc.4-WIP.patch > > > This is sibling JIRA for YARN-2154. We should make sure container preemption > is more effective. > *Requirements:*: > 1) Can handle case of user-limit preemption > 2) Can handle case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross user preemption (YARN-2113), > cross applicaiton preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4694) Document ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4694: Affects Version/s: 2.8.0 > Document ATS v1.5 > - > > Key: YARN-4694 > URL: https://issues.apache.org/jira/browse/YARN-4694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147594#comment-15147594 ] Xuan Gong commented on YARN-4501: - Will document the new write APIs in https://issues.apache.org/jira/browse/YARN-4694. Closing this as a duplicate. > Document new put APIs in TimelineClient for ATS 1.5 > --- > > Key: YARN-4501 > URL: https://issues.apache.org/jira/browse/YARN-4501 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Junping Du >Assignee: Xuan Gong > > In YARN-4234, we are adding new put APIs in TimelineClient, we should > document it properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5
[ https://issues.apache.org/jira/browse/YARN-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-4501. - Resolution: Duplicate > Document new put APIs in TimelineClient for ATS 1.5 > --- > > Key: YARN-4501 > URL: https://issues.apache.org/jira/browse/YARN-4501 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Junping Du >Assignee: Xuan Gong > > In YARN-4234, we are adding new put APIs in TimelineClient, we should > document it properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4694) Document ATS v1.5
Xuan Gong created YARN-4694: --- Summary: Document ATS v1.5 Key: YARN-4694 URL: https://issues.apache.org/jira/browse/YARN-4694 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4654) Yarn node label CLI should parse "=" correctly when trying to remove all labels on a node
[ https://issues.apache.org/jira/browse/YARN-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147455#comment-15147455 ] Naganarasimha G R commented on YARN-4654: - YARN-4351 captures the test case failures for {{hadoop.yarn.client.TestGetGroups}}, and the other timed-out test cases are also related to the build machine, so I think the Jenkins report is fine. [~rohithsharma]/[~wangda], can one of you take a look at the latest patch? > Yarn node label CLI should parse "=" correctly when trying to remove all > labels on a node > - > > Key: YARN-4654 > URL: https://issues.apache.org/jira/browse/YARN-4654 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-4654.v1.001.patch, YARN-4654.v1.002.patch, > YARN-4654.v1.003.patch > > > Currently, when adding labels to nodes, a user can run: > {{yarn rmadmin -replaceLabelsOnNode "host1=x host2=y"}} > However, when removing labels from a node, the user has to run: > {{yarn rmadmin -replaceLabelsOnNode "host1 host2"}} > instead of: > {{yarn rmadmin -replaceLabelsOnNode "host1= host2="}} > We should handle both the cases where "=" is present and where it is absent > when removing labels on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4680) TimerTasks leak in ATS V1.5 Writer
[ https://issues.apache.org/jira/browse/YARN-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147385#comment-15147385 ] Steve Loughran commented on YARN-4680: -- LGTM, though it needs an entry in yarn-defaults.xml for the documentation > TimerTasks leak in ATS V1.5 Writer > -- > > Key: YARN-4680 > URL: https://issues.apache.org/jira/browse/YARN-4680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4680.1.patch, YARN-4680.20160108.patch, > YARN-4680.20160109.patch > > > We have seen TimerTask leaks which could cause the application server to go > down (such as the Oozie server going down due to too many active threads). > Although we have fixed some potential leak situations at the upper > application level, such as > https://issues.apache.org/jira/browse/MAPREDUCE-6618 and > https://issues.apache.org/jira/browse/MAPREDUCE-6621, we still cannot > guarantee that we have fixed the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
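The leak pattern discussed above can be illustrated in miniature. This sketch is not the YARN patch: the class name {{FlushingWriter}} is invented, and it only shows the general shape of the bug, namely that a {{java.util.Timer}} whose tasks are never cancelled keeps a thread alive for the lifetime of the hosting server (e.g. Oozie).

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustration of the leak class of bug (hypothetical names, not the
// patch): a periodic flush Timer that is never cancelled keeps its thread
// alive. Making the writer AutoCloseable guarantees cleanup.
public class FlushingWriter implements AutoCloseable {
    private final Timer flushTimer;

    public FlushingWriter(long flushIntervalMs) {
        // daemon=true so even a forgotten close() cannot pin the JVM forever
        this.flushTimer = new Timer("ats-flush-timer", true);
        this.flushTimer.schedule(new TimerTask() {
            @Override public void run() { flush(); }
        }, flushIntervalMs, flushIntervalMs);
    }

    void flush() { /* write buffered timeline entities out */ }

    @Override public void close() {
        flushTimer.cancel();  // cancels pending tasks; the thread exits
    }
}
```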
[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
[ https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147374#comment-15147374 ] Steve Loughran commented on YARN-4545: -- Or to put it differently: why do you need to use UGI here at all? In an insecure cluster it is unimportant, and in a secure cluster the login user will have the relevant credentials. > Allow YARN distributed shell to use ATS v1.5 APIs > - > > Key: YARN-4545 > URL: https://issues.apache.org/jira/browse/YARN-4545 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4545-YARN-4265.001.patch, > YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, > YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, > YARN-4545-trunk.005.patch > > > We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to > allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the > system. We also need to provide a sample plugin to read those data out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
[ https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147356#comment-15147356 ] Steve Loughran commented on YARN-4545: -- {{publishContainerEndEvent}} doesn't try to put the event in a {{ugi.doAs}} clause, even though the container start event is put this way. Why the difference? > Allow YARN distributed shell to use ATS v1.5 APIs > - > > Key: YARN-4545 > URL: https://issues.apache.org/jira/browse/YARN-4545 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4545-YARN-4265.001.patch, > YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, > YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, > YARN-4545-trunk.005.patch > > > We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to > allow distributed shell post data with ATS v1.5 API if 1.5 is enabled in the > system. We also need to provide a sample plugin to read those data out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147290#comment-15147290 ] Akira AJISAKA commented on YARN-3344: - Hi [~ravindra.naik], how is this issue going? If you don't have time to update the patch, I'd like to take over your work. > procfs stat file is not in the expected format warning > -- > > Key: YARN-3344 > URL: https://issues.apache.org/jira/browse/YARN-3344 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jon Bringhurst >Assignee: Ravindra Kumar Naik > Attachments: YARN-3344-trunk.005.patch > > > Although this doesn't appear to be causing any functional issues, it is > spamming our log files quite a bit. :) > It appears that the regex in ProcfsBasedProcessTree doesn't work for all > /proc//stat files. > Here's the error I'm seeing: > {noformat} > "source_host": "asdf", > "method": "constructProcessInfo", > "level": "WARN", > "message": "Unexpected: procfs stat file is not in the expected format > for process with pid 6953" > "file": "ProcfsBasedProcessTree.java", > "line_number": "514", > "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree", > {noformat} > And here's the basic info on process with pid 6953: > {noformat} > [asdf ~]$ cat /proc/6953/stat > 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 > 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 > 2 18446744073709551615 0 0 17 13 0 0 0 0 0 > [asdf ~]$ ps aux|grep 6953 > root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 > /export/apps/salt/minion-scripts/module-sync.py > jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 > [asdf ~]$ > {noformat} > This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
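The stat line in the report shows why a naive whitespace/regex split fails: the second field is the command name in parentheses, and here it contains a space ("python2.6 /expo"). A robust parse anchors on the last ')' rather than whitespace. The sketch below is an illustration of that technique, not the pending patch; the class and field names are hypothetical (the real code is in {{ProcfsBasedProcessTree.constructProcessInfo}}).

```java
// Sketch of robust /proc/<pid>/stat parsing: the comm field (2nd) is in
// parentheses and may itself contain spaces or ')', so locate the LAST
// ')' instead of relying on a whitespace-only regex. Names are illustrative.
public class ProcStatParser {
    public static class ProcInfo {
        public final int pid;
        public final String comm;
        public final char state;
        ProcInfo(int pid, String comm, char state) {
            this.pid = pid; this.comm = comm; this.state = state;
        }
    }

    public static ProcInfo parse(String statLine) {
        int open = statLine.indexOf('(');
        int close = statLine.lastIndexOf(')');  // key trick: last ')'
        if (open < 0 || close < open) {
            return null;  // genuinely malformed line
        }
        int pid = Integer.parseInt(statLine.substring(0, open).trim());
        String comm = statLine.substring(open + 1, close);  // may hold spaces
        char state = statLine.substring(close + 1).trim().charAt(0);
        return new ProcInfo(pid, comm, state);
    }
}
```

Fields after the state would then be split on whitespace safely, since only the comm field can contain embedded spaces.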