[jira] [Updated] (YARN-5101) YARN_APPLICATION_UPDATED event is parsed in ApplicationHistoryManagerOnTimelineStore#convertToApplicationReport with reversed order
[ https://issues.apache.org/jira/browse/YARN-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5101:
--
Attachment: YARN-5101.0002.patch

Thanks [~rohithsharma]. That makes sense. I have updated the patch with this approach; however, {{RMAppImpl}} uses {{SystemClock}}, which internally relies on {{System.currentTimeMillis}} and is available only within {{RMAppImpl}}, so CS#updateApplicationPriority had to keep the old approach. I will raise a separate jira to use MonotonicClock in RMAppImpl.

> YARN_APPLICATION_UPDATED event is parsed in
> ApplicationHistoryManagerOnTimelineStore#convertToApplicationReport with
> reversed order
> ---
>
> Key: YARN-5101
> URL: https://issues.apache.org/jira/browse/YARN-5101
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Xuan Gong
> Assignee: Sunil G
> Attachments: YARN-5101.0001.patch, YARN-5101.0002.patch
>
> Right now, the application events are parsed in
> ApplicationHistoryManagerOnTimelineStore#convertToApplicationReport in
> timestamp-descending order, which means the later events are parsed
> first and earlier events of the same type then override their information. In
> https://issues.apache.org/jira/browse/YARN-4044 we introduced
> YARN_APPLICATION_UPDATED events, which may be submitted by the RM multiple times
> in one application life cycle. This could cause problems.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
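For context on the clock swap being proposed, here is a minimal, self-contained sketch of the difference between a wall-clock source like {{SystemClock}} and a monotonic source like MonotonicClock. The class and method names below are illustrative assumptions for this sketch, not Hadoop's actual API:

```java
// A minimal sketch (not Hadoop's real classes) of why a monotonic clock is
// preferable to SystemClock for computing durations: System.currentTimeMillis()
// can jump backwards when the wall clock is adjusted (e.g. by NTP), while
// System.nanoTime() -- which a MonotonicClock-style source builds on -- cannot.
public class ClockSketch {

    // What a SystemClock-style source effectively returns: wall-clock millis.
    static long wallClockMillis() {
        return System.currentTimeMillis();
    }

    // What a MonotonicClock-style source returns: millis from a monotonic
    // counter, safe for elapsed-time arithmetic but meaningless as a date.
    static long monotonicMillis() {
        return System.nanoTime() / 1_000_000L;
    }

    // Elapsed time measured against the monotonic source is never negative.
    static long elapsedSince(long monotonicStart) {
        return monotonicMillis() - monotonicStart;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = monotonicMillis();
        Thread.sleep(10);
        System.out.println(elapsedSince(start) >= 0); // prints true
    }
}
```

The point of the proposed follow-up jira is exactly this property: durations computed from a monotonic source cannot go negative when the system clock is adjusted.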
[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376442#comment-15376442 ] Hadoop QA commented on YARN-5156:
-
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 40s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 31s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 4s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817887/YARN-5156-YARN-5355.02.patch |
| JIRA Issue | YARN-5156 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux aea954f2bd8f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-5355 / 0fd3980 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/12321/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12321/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12321/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https:
[jira] [Commented] (YARN-5299) Log Docker run command when container fails
[ https://issues.apache.org/jira/browse/YARN-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376423#comment-15376423 ] Hudson commented on YARN-5299: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10096 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10096/]) YARN-5299. Log Docker run command when container fails. Contributed by (rohithsharmaks: rev dbe97aa768e2987209811c407969fea47641418c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java > Log Docker run command when container fails > --- > > Key: YARN-5299 > URL: https://issues.apache.org/jira/browse/YARN-5299 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.9.0 > > Attachments: YARN-5299.001.patch > > > It's useful to have the docker run command logged when containers fail to > help debugging.
[jira] [Commented] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default
[ https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376409#comment-15376409 ] Hadoop QA commented on YARN-5363:
-
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s {color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 7 new + 80 unchanged - 8 fixed = 87 total (was 88) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 23s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 0s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817883/YARN-5363-2016-07-13.1.txt |
| JIRA Issue | YARN-5363 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 85ba631ffd0b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2bbc3ea |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12320/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/12320/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12320/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-Y
[jira] [Commented] (YARN-5299) Log Docker run command when container fails
[ https://issues.apache.org/jira/browse/YARN-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376401#comment-15376401 ] Rohith Sharma K S commented on YARN-5299: - Sorry, I did not see your comment; our reviews were happening in parallel :-( > Log Docker run command when container fails > --- > > Key: YARN-5299 > URL: https://issues.apache.org/jira/browse/YARN-5299 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.9.0 > > Attachments: YARN-5299.001.patch > > > It's useful to have the docker run command logged when containers fail to > help debugging.
[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission
[ https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376395#comment-15376395 ] Ying Zhang commented on YARN-5287:
--
Hi Naganarasimha, I've uploaded a new patch with a new test case included. I've run "test-container-executor" as a regular user and as root, and it passes. As I said earlier, there is an issue with the current test-container-executor when running as root. I have made a minor change to make it pass (to be specific, in test-container-executor.c, in main(), when running as root, test_recursive_unlink_children() needs to run before set_user(username); not sure if this is correct, it is just a work-around). I don't think I should include it in this patch. Let me know what you think. Thank you.

> LinuxContainerExecutor fails to set proper permission
> -
>
> Key: YARN-5287
> URL: https://issues.apache.org/jira/browse/YARN-5287
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.2
> Reporter: Ying Zhang
> Assignee: Ying Zhang
> Priority: Minor
> Attachments: YARN-5287-naga.patch, YARN-5287.001.patch,
> YARN-5287.002.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> LinuxContainerExecutor fails to set the proper permissions on the local
> directories (i.e., /hadoop/yarn/local/usercache/... by default) if the cluster
> has been configured with a restrictive umask, e.g. umask 077. The job failed due
> to the following reason:
> Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has
> permission 700 but needs permission 750
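As a side note on the failure mode described above, the permission mismatch can be sketched with plain Java POSIX permission sets. This is an illustration of the check that fails under umask 077, not the actual LinuxContainerExecutor code:

```java
// Hedged sketch: under umask 077 a fresh directory comes up as 700
// (rwx------), but the NM expects at least 750 (rwxr-x---) on the
// usercache path, so the group read/execute bits are missing.
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class UsercachePermSketch {

    // Returns true when 'actual' grants every bit that 'needed' requires.
    static boolean satisfies(String actual, String needed) {
        Set<PosixFilePermission> a = PosixFilePermissions.fromString(actual);
        Set<PosixFilePermission> n = PosixFilePermissions.fromString(needed);
        return a.containsAll(n);
    }

    public static void main(String[] args) {
        // umask 077 -> 700, which lacks the group bits required by 750.
        System.out.println(satisfies("rwx------", "rwxr-x---")); // prints false
        // Under the common default umask 022 the check would pass.
        System.out.println(satisfies("rwxr-xr-x", "rwxr-x---")); // prints true
    }
}
```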
[jira] [Created] (YARN-5377) TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in trunk
Rohith Sharma K S created YARN-5377:
---
Summary: TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in trunk
Key: YARN-5377
URL: https://issues.apache.org/jira/browse/YARN-5377
Project: Hadoop YARN
Issue Type: Bug
Reporter: Rohith Sharma K S

The test case fails in the Jenkins build: [link|https://builds.apache.org/job/PreCommit-YARN-Build/12228/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt]
{noformat}
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.586 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager
testKillMultipleOpportunisticContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager) Time elapsed: 32.134 sec <<< FAILURE!
java.lang.AssertionError: ContainerState is not correct (timedout) expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForNMContainerState(BaseContainerManagerTest.java:363)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager.testKillMultipleOpportunisticContainers(TestQueuingContainerManager.java:470)
{noformat}
[jira] [Updated] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5156: - Attachment: YARN-5156-YARN-5355.02.patch Thanks [~varun_saxena] Uploading another patch which now removes storing the ContainerStatus in the ContainerFinishedEvent in the NMTimelinePublisher. > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch, YARN-5156-YARN-5355.02.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? > {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: "appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 64694 > }, > 
configs: { }, > isrelatedto: { }, > relatesto: { } > } {code}
[jira] [Commented] (YARN-5299) Log Docker run command when container fails
[ https://issues.apache.org/jira/browse/YARN-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376387#comment-15376387 ] Vinod Kumar Vavilapalli commented on YARN-5299: --- Can we print similar logging for other commands like signalContainer? For launch failure, does it make sense to expose this full-command further up into the container-diagnostics? Not related to this patch, but another thing that is a little overwhelming is the following in DelegatingLinuxContainerRuntime {code} if (LOG.isInfoEnabled()) { LOG.info("Using container runtime: " + runtime.getClass() .getSimpleName()); } {code} Make it debug only? > Log Docker run command when container fails > --- > > Key: YARN-5299 > URL: https://issues.apache.org/jira/browse/YARN-5299 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-5299.001.patch > > > It's useful to have the docker run command logged when containers fail to > help debugging.
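A hedged sketch of the "make it debug only" suggestion above, using java.util.logging as a stand-in for Hadoop's logging API; the nested runtime class here is a hypothetical placeholder so getSimpleName() yields a realistic message:

```java
// Sketch: build the same style of message as DelegatingLinuxContainerRuntime,
// but emit it only when DEBUG-level (FINE) logging is enabled, as suggested.
import java.util.logging.Level;
import java.util.logging.Logger;

public class RuntimeSelectionLogSketch {

    // Hypothetical stand-in for the real runtime class (illustration only).
    static class DockerLinuxContainerRuntime { }

    private static final Logger LOG =
        Logger.getLogger(RuntimeSelectionLogSketch.class.getName());

    // Returns the message so callers (and tests) can inspect it; logs at FINE.
    static String logRuntimeSelection(Object runtime) {
        String msg = "Using container runtime: " + runtime.getClass().getSimpleName();
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(msg); // suppressed at default (INFO) log level
        }
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(logRuntimeSelection(new DockerLinuxContainerRuntime()));
        // prints: Using container runtime: DockerLinuxContainerRuntime
    }
}
```

The isLoggable guard keeps the string concatenation off the hot path when debug logging is disabled, which is the same reason the original code guarded with isInfoEnabled().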
[jira] [Comment Edited] (YARN-4888) Changes in RM AppSchedulingInfo for identifying resource-requests explicitly
[ https://issues.apache.org/jira/browse/YARN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376380#comment-15376380 ] Arun Suresh edited comment on YARN-4888 at 7/14/16 6:12 AM:

Thanks for the patch [~subru]. A couple of initial comments:
# In {{AppSchedulingInfo::checkForDeactivation()}}, given that you are introducing an inner for loop, deactivate = false should be set only if ALL requests in {{mappedRequest.values()}} have {{request.getNumContainers() > 0}}, right?
# In {{AppSchedulingInfo::allocateNodeLocal()}}, the {{rackLocalRequest}} and {{offRackRequest}} selected from the {{resourceRequestMap}} should correspond to the same allocationRequestId as the {{nodeLocalRequest}}. Only when you have a single allocationRequestId can you guarantee that {{.firstEntry().getValue()}} will correspond to the same nodeLocalRequest in both cases.

With regard to the getMergeResource() method, I have a feeling we should not be merging all requests of a {{Priority}}. Consider this alternate approach, which I feel might make it easier to verify that we are not breaking any invariants in the Scheduler. My intuition is based on the following: if we agree that, in the absence of an *allocateRequestId*, we can simulate the same functionality by using a unique Priority to tie together all requests that need the same allocateRequestId, then why not replace the Priority in the {{AppSchedulingInfo::resourceRequestMap}} with a new type (called *SchedulerPriority*) which is essentially a composite of *Priority + allocateRequestId*? If no allocateRequestId is provided, the requestId part of it will default to 0. If *SchedulerPriority* is a subclass of *Priority*, then we won't even need to change any of the APIs. Thoughts? I can provide a quick prototype patch to verify that this works.
was (Author: asuresh): Thanks for the patch [~subru] Couple of initial comments: # In {{AppSchedulingInfo::checkForDeactivation()}}, given that you are introducing an inner for loop, the deactivate = false should be set only if ALL requests in the {{mappedRequest.values()}} has {{request.getNumContainers() > 0}} right ? # In {{AppSchedulingInfo::allocateNodeLocal()}}, That {{rackLocalRequest}} and {{offRackRequest}} selected from the {{resoureRequestMap}} should correspond to the same allocationRequestId of the {{nodeLocalRequest}}. Only when you have a single allocationRequestId, can you guarantee that {{.firstEntry().getValue()}} will correspond to the same nodeLocalRequest in both cases. With regard to the getMergeResource() method. I have a feeling we should not be merging all requests of a {{Priority}}. Consider this alternate approach, which I feel might allow us to more easily verify if we are breaking any invariants in the Scheduler: My intuition is based on the fact that If we agree that, in the absence of an *allocateRequestId*, we can simulate the same functionality by using a unique Priority to tie all requests we need to be of the same allocateRequestId Then, why not replace the 'Priority' in the {{AppSchedulingInfo::resourceRequestMap}} with a new type (called *SchedulerPriority*) which is essentially a composite of *Priority + allocateRequestId*. If not allocateRequestId is provided, it will default to 0. If the *SchedulerPriority* is a subclass of *Priority*, then we wont even need to change any of the APIs. Thoughts ? I can help provide a quick prototype patch to verify if this works.. 
> Changes in RM AppSchedulingInfo for identifying resource-requests explicitly > > > Key: YARN-4888 > URL: https://issues.apache.org/jira/browse/YARN-4888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4888-v0.patch > > > YARN-4879 puts forward the notion of identifying allocate requests > explicitly. This JIRA is to track the changes in RM app scheduling data > structures to accomplish it. Please refer to the design doc in the parent > JIRA for details.
[jira] [Commented] (YARN-4888) Changes in RM AppSchedulingInfo for identifying resource-requests explicitly
[ https://issues.apache.org/jira/browse/YARN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376380#comment-15376380 ] Arun Suresh commented on YARN-4888:
---
Thanks for the patch [~subru]. A couple of initial comments:
# In {{AppSchedulingInfo::checkForDeactivation()}}, given that you are introducing an inner for loop, deactivate = false should be set only if ALL requests in {{mappedRequest.values()}} have {{request.getNumContainers() > 0}}, right?
# In {{AppSchedulingInfo::allocateNodeLocal()}}, the {{rackLocalRequest}} and {{offRackRequest}} selected from the {{resourceRequestMap}} should correspond to the same allocationRequestId as the {{nodeLocalRequest}}. Only when you have a single allocationRequestId can you guarantee that {{.firstEntry().getValue()}} will correspond to the same nodeLocalRequest in both cases.

With regard to the getMergeResource() method, I have a feeling we should not be merging all requests of a {{Priority}}. Consider this alternate approach, which I feel might make it easier to verify that we are not breaking any invariants in the Scheduler. My intuition is based on the following: if we agree that, in the absence of an *allocateRequestId*, we can simulate the same functionality by using a unique Priority to tie together all requests that need the same allocateRequestId, then why not replace the Priority in the {{AppSchedulingInfo::resourceRequestMap}} with a new type (called *SchedulerPriority*) which is essentially a composite of *Priority + allocateRequestId*? If no allocateRequestId is provided, it will default to 0. If *SchedulerPriority* is a subclass of *Priority*, then we won't even need to change any of the APIs. Thoughts? I can provide a quick prototype patch to verify that this works.
> Changes in RM AppSchedulingInfo for identifying resource-requests explicitly > > > Key: YARN-4888 > URL: https://issues.apache.org/jira/browse/YARN-4888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-4888-v0.patch > > > YARN-4879 puts forward the notion of identifying allocate requests > explicitly. This JIRA is to track the changes in RM app scheduling data > structures to accomplish it. Please refer to the design doc in the parent > JIRA for details.
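To make the *SchedulerPriority* idea discussed above concrete, here is a minimal prototype sketch. The {{Priority}} class below is a plain-Java stand-in for YARN's record class, and all names are illustrative assumptions, not committed API:

```java
// Sketch of the proposed composite key: Priority + allocateRequestId.
// Requests sharing a Priority but carrying different allocateRequestIds
// stay separate in a map keyed by SchedulerPriority.
import java.util.Objects;
import java.util.TreeMap;

public class SchedulerPrioritySketch {

    // Minimal stand-in for org.apache.hadoop.yarn.api.records.Priority
    // (the real class is abstract with a factory; illustration only).
    static class Priority implements Comparable<Priority> {
        final int priority;
        Priority(int priority) { this.priority = priority; }
        public int compareTo(Priority o) { return Integer.compare(priority, o.priority); }
    }

    static class SchedulerPriority extends Priority {
        final long allocateRequestId;
        SchedulerPriority(int priority, long allocateRequestId) {
            super(priority);
            this.allocateRequestId = allocateRequestId; // 0 when the AM sets none
        }
        @Override public int compareTo(Priority o) {
            int c = super.compareTo(o);
            if (c != 0 || !(o instanceof SchedulerPriority)) return c;
            return Long.compare(allocateRequestId, ((SchedulerPriority) o).allocateRequestId);
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof SchedulerPriority)) return false;
            SchedulerPriority s = (SchedulerPriority) o;
            return priority == s.priority && allocateRequestId == s.allocateRequestId;
        }
        @Override public int hashCode() { return Objects.hash(priority, allocateRequestId); }
    }

    public static void main(String[] args) {
        // A resourceRequestMap keyed by the composite keeps both requests distinct.
        TreeMap<SchedulerPriority, String> requestMap = new TreeMap<>();
        requestMap.put(new SchedulerPriority(20, 1), "request-A");
        requestMap.put(new SchedulerPriority(20, 2), "request-B");
        System.out.println(requestMap.size()); // prints 2
    }
}
```

Because the subclass still satisfies the {{Priority}} contract, existing map lookups keep working, which is the "no API change" property the comment is after.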
[jira] [Updated] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default
[ https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-5363: -- Attachment: YARN-5363-2016-07-13.1.txt Tx for the review [~xgong]! bq. think that we could add the logic here. So, we do not need to do it separately inside several different functions. This makes perfect sense - code reuse, yay! Updating patch with the comments addressed. > For AM containers, or for containers of running-apps, "yarn logs" incorrectly > only (tries to) shows syslog file-type by default > --- > > Key: YARN-5363 > URL: https://issues.apache.org/jira/browse/YARN-5363 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-5363-2016-07-12.txt, YARN-5363-2016-07-13.1.txt, > YARN-5363-2016-07-13.txt > > > For e.g, for a running application, the following happens: > {code} > # yarn logs -applicationId application_1467838922593_0001 > 16/07/06 22:07:05 INFO impl.TimelineClientImpl: Timeline service address: > http://:8188/ws/v1/timeline/ > 16/07/06 22:07:06 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > 16/07/06 22:07:07 INFO impl.TimelineClientImpl: Timeline service address: > http://l:8188/ws/v1/timeline/ > 16/07/06 22:07:07 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_01 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_02 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_03 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for 
the container: > container_e03_1467838922593_0001_01_04 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_05 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_06 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_07 within the application: > application_1467838922593_0001 > Can not find the logs for the application: application_1467838922593_0001 > with the appOwner: > {code}
[jira] [Commented] (YARN-5299) Log Docker run command when container fails
[ https://issues.apache.org/jira/browse/YARN-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376370#comment-15376370 ] Rohith Sharma K S commented on YARN-5299: - +1 lgtm > Log Docker run command when container fails > --- > > Key: YARN-5299 > URL: https://issues.apache.org/jira/browse/YARN-5299 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-5299.001.patch > > > It's useful to have the docker run command logged when containers fail to > help debugging.
[jira] [Updated] (YARN-5355) YARN Timeline Service v.2: alpha 2
[ https://issues.apache.org/jira/browse/YARN-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5355: - Attachment: YARN-5355-branch-2.01.patch Uploading the patch for back-porting to branch-2. (YARN-5355-branch-2.01.patch) > YARN Timeline Service v.2: alpha 2 > -- > > Key: YARN-5355 > URL: https://issues.apache.org/jira/browse/YARN-5355 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: Timeline Service v2_ Ideas for Next Steps.pdf, > YARN-5355-branch-2.01.patch > > > This is an umbrella JIRA for the alpha 2 milestone for YARN Timeline Service > v.2. > This is developed on feature branches: {{YARN-5355}} for the trunk-based > development and {{YARN-5355-branch-2}} to maintain backports to branch-2. Any > subtask work on this JIRA will be committed to those 2 branches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5355) YARN Timeline Service v.2: alpha 2
[ https://issues.apache.org/jira/browse/YARN-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5355: - Assignee: Sangjin Lee (was: Vrushali C) > YARN Timeline Service v.2: alpha 2 > -- > > Key: YARN-5355 > URL: https://issues.apache.org/jira/browse/YARN-5355 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Attachments: Timeline Service v2_ Ideas for Next Steps.pdf, > YARN-5355-branch-2.01.patch > > > This is an umbrella JIRA for the alpha 2 milestone for YARN Timeline Service > v.2. > This is developed on feature branches: {{YARN-5355}} for the trunk-based > development and {{YARN-5355-branch-2}} to maintain backports to branch-2. Any > subtask work on this JIRA will be committed to those 2 branches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5355) YARN Timeline Service v.2: alpha 2
[ https://issues.apache.org/jira/browse/YARN-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-5355: Assignee: Vrushali C (was: Sangjin Lee) > YARN Timeline Service v.2: alpha 2 > -- > > Key: YARN-5355 > URL: https://issues.apache.org/jira/browse/YARN-5355 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: Timeline Service v2_ Ideas for Next Steps.pdf > > > This is an umbrella JIRA for the alpha 2 milestone for YARN Timeline Service > v.2. > This is developed on feature branches: {{YARN-5355}} for the trunk-based > development and {{YARN-5355-branch-2}} to maintain backports to branch-2. Any > subtask work on this JIRA will be committed to those 2 branches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5287) LinuxContainerExecutor fails to set proper permission
[ https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Zhang updated YARN-5287: - Attachment: YARN-5287.002.patch > LinuxContainerExecutor fails to set proper permission > - > > Key: YARN-5287 > URL: https://issues.apache.org/jira/browse/YARN-5287 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.2 >Reporter: Ying Zhang >Assignee: Ying Zhang >Priority: Minor > Attachments: YARN-5287-naga.patch, YARN-5287.001.patch, > YARN-5287.002.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > LinuxContainerExecutor fails to set the proper permissions on the local > directories(i.e., /hadoop/yarn/local/usercache/... by default) if the cluster > has been configured with a restrictive umask, e.g.: umask 077. Job failed due > to the following reason: > Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has > permission 700 but needs permission 750 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
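The failure mode in YARN-5287 is that directories created under a restrictive umask (077) end up mode 700 even though the NodeManager expects 750. A general way to make permissions umask-independent is to chmod explicitly after creation; the sketch below shows the pattern with plain NIO (it is an illustration of the technique, not LinuxContainerExecutor's actual code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch: create a directory, then set its mode explicitly so the result
// does not depend on the process umask. Under umask 077 a plain mkdir
// yields 700; an explicit permission set afterwards restores 750.
public class ExplicitPerms {
  public static Set<PosixFilePermission> createDir750(Path dir) throws IOException {
    Files.createDirectories(dir);
    // setPosixFilePermissions is applied after creation, so it is not
    // filtered by the umask the way the mode passed to mkdir(2) is.
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxr-x---"));
    return Files.getPosixFilePermissions(dir);
  }
}
```

(POSIX filesystems only; on non-POSIX stores `setPosixFilePermissions` throws `UnsupportedOperationException`.)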
[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376314#comment-15376314 ] Yufei Gu commented on YARN-4212: Thanks [~kasha]. > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
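The effect described in YARN-4212 can be shown with a toy model: a parent using the 'fair' policy distributes only the memory dimension to its children and leaves their vcore fair share at 0, so any child app using at least one vcore is permanently over its (zero) vcore fair share. The names below are illustrative, not FairScheduler's real API:

```java
// Toy model of the bug: under the 'fair' policy only memory weights are
// computed during recomputeShares(), so children get a vcore fair share
// of 0 regardless of cluster capacity.
public class FairShareSketch {
  /** Returns {memoryFairShare, vcoreFairShare} for each of n equal children. */
  public static int[] fairShareUnderFairPolicy(int clusterMem, int clusterVcores, int children) {
    int memShare = clusterMem / children;
    int vcoreShare = 0; // 'fair' policy ignores the vcore dimension
    return new int[] { memShare, vcoreShare };
  }

  /** An app is "above fair share" when its usage exceeds the computed share. */
  public static boolean overFairShare(int usedVcores, int vcoreFairShare) {
    return usedVcores > vcoreFairShare;
  }
}
```

Any drf leaf queue under such a parent therefore reports every vcore-consuming app as above fair share, which matters to preemption and scheduling order.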
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376295#comment-15376295 ] Naganarasimha G R commented on YARN-4464: - Thanks [~vinodkv]. A default value of zero looks ideal, but I am not sure all production clusters will adopt ATS immediately; with that in mind I thought of keeping roughly the last 500 ~ 1000 completed apps in the RM. If everyone is OK with keeping no completed apps in RM memory by default, then I am fine with it too; it's a -0 from my side. And I am OK with no change in Hadoop 2.x. > default value of yarn.resourcemanager.state-store.max-completed-applications > should lower. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > My cluster has 120 nodes. > I configured the RM Restart feature: > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > Unfortunately I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}, > so the property took its default value of 10,000. > I restarted the RM after changing another configuration. > I expected the RM to restart immediately, > but the recovery process was very slow; I waited about 20 minutes > before realizing that > {{yarn.resourcemanager.state-store.max-completed-applications}} was missing. > Its default value is very large. > We need to change it to a lower value or document a notice on the [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5272) Handle queue names consistently in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376276#comment-15376276 ] Ray Chiang commented on YARN-5272: -- [~wilfreds], let me know if you'd prefer to abstract out the whitespace trimming in a follow up JIRA or if you plan to do it for this patch. > Handle queue names consistently in FairScheduler > > > Key: YARN-5272 > URL: https://issues.apache.org/jira/browse/YARN-5272 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Attachments: YARN-5272.1.patch, YARN-5272.3.patch, YARN-5272.4.patch > > > The fix used in YARN-3214 uses a the JDK trim() method to remove leading and > trailing spaces. The QueueMetrics uses a guava based trim when it splits the > queues. > The guava based trim uses the unicode definition of a white space which is > different than the java trim as can be seen > [here|https://docs.google.com/a/cloudera.com/spreadsheets/d/1kq4ECwPjHX9B8QUCTPclgsDCXYaj7T-FlT4tB5q3ahk/pub] > A queue name with a non-breaking white space will thus still cause the same > "Metrics source XXX already exists!" MetricsException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
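The trim mismatch above is easy to demonstrate: JDK {{String.trim()}} strips only characters at or below U+0020, while a Unicode-aware trim (as Guava's whitespace matcher does) also strips characters such as the non-breaking space U+00A0. The helper below is a minimal stand-in for the Guava-style trim, written without Guava so the sketch is self-contained:

```java
// Demonstrates the two trim semantics that YARN-5272 reconciles: a queue
// name padded with U+00A0 survives String.trim() but not a Unicode-aware
// trim, yielding two different names for the "same" queue and the
// "Metrics source XXX already exists!" collision.
public class TrimMismatch {
  public static String jdkTrim(String s) {
    return s.trim(); // strips only chars <= U+0020
  }

  // Stand-in for a Unicode-aware trim (approximates Guava's
  // CharMatcher.whitespace().trimFrom without the Guava dependency).
  public static String unicodeTrim(String s) {
    int i = 0, j = s.length();
    while (i < j && isUnicodeSpace(s.charAt(i))) i++;
    while (j > i && isUnicodeSpace(s.charAt(j - 1))) j--;
    return s.substring(i, j);
  }

  private static boolean isUnicodeSpace(char c) {
    return Character.isWhitespace(c) || Character.isSpaceChar(c);
  }
}
```

Abstracting one shared trim helper (whichever semantics is chosen) is what keeps queue naming and QueueMetrics consistent.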
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376265#comment-15376265 ] Robert Kanter commented on YARN-4676: - Thanks for pointing that out [~mingma]. Using the same file format makes sense to me. Would it make sense to move some of that code (i.e. parsing, etc.) to Common so that we can use the same implementation in HDFS and YARN? [~danzhi], [~djp], what do you think? > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, > GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, > YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, > YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, > YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, > YARN-4676.015.patch, YARN-4676.016.patch > > > YARN-4676 implements an automatic, asynchronous and flexible mechanism to > gracefully decommission > YARN nodes. After the user issues the refreshNodes request, the ResourceManager > automatically evaluates the > status of all affected nodes to kick off decommission or recommission > actions. The RM asynchronously > tracks container and application status related to DECOMMISSIONING nodes so it can > decommission the > nodes immediately once they are ready to be decommissioned. A decommissioning > timeout at individual-node > granularity is supported and can be dynamically updated. The > mechanism naturally supports multiple > independent graceful decommissioning “sessions” where each one involves > different sets of nodes with > different timeout settings. 
Such support is ideal and necessary for graceful > decommission requests issued > by external cluster management software instead of a human. > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes' status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks > DECOMMISSIONING nodes' status to decide when a node, after all running containers on > it have completed, will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
[ https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-5376: --- Issue Type: Bug (was: Improvement) > capacity scheduler crashed while processing APP_ATTEMPT_REMOVED > --- > > Key: YARN-5376 > URL: https://issues.apache.org/jira/browse/YARN-5376 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee > Attachments: capacity-crash.log > > > we are testing the capacity scheduler with an SLS-like client; see the following error. It > seems the schedulerNode was removed. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376260#comment-15376260 ] Varun Saxena commented on YARN-5156: I am fine with removing it. We can anyways interpret what the container state will be from the event. It can either be RUNNING or COMPLETE. And its COMPLETE only on container finished event. > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? > {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: "appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 
64694 > }, > configs: { }, > isrelatedto: { }, > relatesto: { } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
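The interpretation Varun suggests, deriving the container state from the event rather than publishing a possibly stale state field, can be sketched as a one-line mapping. The event names follow the JSON shown above; the mapping itself is an assumption about the proposed fix, not Timeline Service v.2's actual code:

```java
// Derive container state from the event id: COMPLETE only for the
// finished event, RUNNING for every other lifecycle event. This removes
// the need to publish YARN_CONTAINER_STATE alongside each event.
public class ContainerStateFromEvent {
  public static String stateFor(String eventId) {
    return "YARN_CONTAINER_FINISHED".equals(eventId) ? "COMPLETE" : "RUNNING";
  }
}
```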
[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376240#comment-15376240 ] Vrushali C commented on YARN-5156: -- Thanks [~varun_saxena]! bq. We are not. Container state is only published only in Finished event. Maybe we can either include it everywhere or not have it anywhere. I see, then I think we should just remove it (as part of this jira fix). What do you think ? > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? > {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: 
"appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 64694 > }, > configs: { }, > isrelatedto: { }, > relatesto: { } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375771#comment-15375771 ] Naganarasimha G R edited comment on YARN-5342 at 7/14/16 3:11 AM: -- Thanks for the patch [~wangda], Given that discussed approach in YARN-4425 (Fallback Policy based ) is going to take some time as it would require significant modifications, i would agree to go for intermittent modification to optimize the non exclusive mode scheduling. Only concern i have is if the size of default partition is greater than the non exclusive partition then on one allocation in default we are resetting the counter, would it be productive ? was (Author: naganarasimha): Thanks for the patch [~wangda], Given that discussed approach in YARN-4225 (Fallback Policy based ) is going to take some time as it would require significant modifications, i would agree to go for intermittent modification to optimize the non exclusive mode scheduling. Only concern i have is if the size of default partition is greater than the non exclusive partition then on one allocation in default we are resetting the counter, would it be productive ? > Improve non-exclusive node partition resource allocation in Capacity Scheduler > -- > > Key: YARN-5342 > URL: https://issues.apache.org/jira/browse/YARN-5342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5342.1.patch > > > In the previous implementation, one non-exclusive container allocation is > possible when the missed-opportunity >= #cluster-nodes. And > missed-opportunity will be reset when container allocated to any node. 
> This will slow down the frequency of container allocation on non-exclusive > node partition: *When a non-exclusive partition=x has idle resource, we can > only allocate one container for this app in every > X=nodemanagers.heartbeat-interval secs for the whole cluster.* > In this JIRA, I propose a fix to reset missed-opporunity only if we have >0 > pending resource for the non-exclusive partition OR we get allocation from > the default partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
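The proposed reset rule reduces to a small predicate; the sketch below states it directly (field and method names are illustrative, not CapacityScheduler's actual code):

```java
// Toy version of the YARN-5342 proposal: reset an app's missed-opportunity
// counter only when there is pending demand on the non-exclusive partition,
// or when the allocation came from the default partition. Under the old
// rule (reset on any allocation) the counter rarely reached #cluster-nodes.
public class MissedOpportunity {
  public static boolean shouldReset(int pendingOnPartition, boolean allocatedFromDefaultPartition) {
    return pendingOnPartition > 0 || allocatedFromDefaultPartition;
  }
}
```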
[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376239#comment-15376239 ] Varun Saxena commented on YARN-5156: [~vrushalic], I think the warning log is not required because it will be printed every time. That's because in ContainerImpl the state will not be COMPLETE when the event to NMTimelinePublisher is posted. bq. I think we should include the container state in the finished event, if we are including other container states at other times in other events. We are not. Container state is published only in the Finished event. Maybe we can either include it everywhere or not have it anywhere. > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? 
> {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: "appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 64694 > }, > configs: { }, > isrelatedto: { }, > relatesto: { } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
[ https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376233#comment-15376233 ] sandflee commented on YARN-5376: 2.7.2; we did not change the capacity scheduler code. > capacity scheduler crashed while processing APP_ATTEMPT_REMOVED > --- > > Key: YARN-5376 > URL: https://issues.apache.org/jira/browse/YARN-5376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > Attachments: capacity-crash.log > > > we are testing the capacity scheduler with an SLS-like client; see the following error. It > seems the schedulerNode was removed. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
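The NPE in the stack trace fits a common race: {{completedContainer}} looks up the SchedulerNode for a released container after the node has already been removed, and dereferences the null result. The sketch below shows the pattern and a hypothetical null guard; it is a minimal illustration, not LeafQueue's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of the crash pattern and a defensive fix: when the
// node was already removed, the lookup returns null and must be guarded
// rather than dereferenced.
public class NodeLookupGuard {
  private final Map<String, Object> nodes = new HashMap<>();

  public void addNode(String id) { nodes.put(id, new Object()); }
  public void removeNode(String id) { nodes.remove(id); }

  /** Returns true if resources were released, false if the node was gone. */
  public boolean completedContainer(String nodeId) {
    Object node = nodes.get(nodeId);
    if (node == null) {
      // Node already removed; skip the completed container instead of NPE.
      return false;
    }
    return true; // release the container's resources on the node
  }
}
```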
[jira] [Updated] (YARN-5309) SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-5309: -- Priority: Blocker (was: Major) > SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch > > > We found a similar issue as HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. 
After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > 
Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created. But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually use all the resources available. > It seems like a similar fix as HADOOP-11368 is needed in TimelineClientImpl > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
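The general shape of both the leak and its fix can be shown without Hadoop: a client that starts a background reloader thread must also stop and join it on shutdown, or every short-lived client instance leaves a daemon thread behind. This is a stand-alone sketch of that ownership pattern, not Hadoop's SSLFactory or TimelineClientImpl API:

```java
// Illustrates the leak class described above: ReloaderOwner starts a
// "Truststore reloader thread" on construction; close() interrupts and
// joins it. Without such a destroy path, each client leaks one thread.
public class ReloaderOwner implements AutoCloseable {
  private final Thread reloader = new Thread(() -> {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Thread.sleep(50); // periodically re-check the trust store
      }
    } catch (InterruptedException e) {
      // interrupted by close(): fall through and let the thread exit
    }
  }, "Truststore reloader thread");

  public ReloaderOwner() {
    reloader.setDaemon(true);
    reloader.start();
  }

  @Override
  public void close() throws InterruptedException {
    reloader.interrupt();
    reloader.join(); // without this, the thread outlives its client
  }

  public boolean reloaderAlive() {
    return reloader.isAlive();
  }
}
```

The HADOOP-11368 fix followed the same idea for SSLFactory; YARN-5309 adds the missing destroy path on the TimelineClientImpl side.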
[jira] [Updated] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
[ https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-5376: --- Attachment: capacity-crash.log > capacity scheduler crashed while processing APP_ATTEMPT_REMOVED > --- > > Key: YARN-5376 > URL: https://issues.apache.org/jira/browse/YARN-5376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > Attachments: capacity-crash.log > > > we are testing the capacity scheduler with an SLS-like client; see the following error. It > seems the schedulerNode was removed. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
[ https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376222#comment-15376222 ] Sunil G commented on YARN-5376: --- Hi [~sandflee], which version of Hadoop are you using? > capacity scheduler crashed while processing APP_ATTEMPT_REMOVED > --- > > Key: YARN-5376 > URL: https://issues.apache.org/jira/browse/YARN-5376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > > > we are testing the capacity scheduler with an SLS-like client and see the > following error; it seems the schedulerNode has been removed. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA
[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376219#comment-15376219 ] Jun Gong commented on YARN-5333: The reason for the test case errors in TestRMWebServicesAppsModification (e.g. testAppMove) is that they reinitialize CapacityScheduler with a new CapacitySchedulerConfiguration before {{rm.start()}}, and reinitializing it twice causes problems. However, from another point of view, I think CapacityScheduler also needs this patch. [~vinodkv], [~vvasudev] could you please help confirm it? Thanks! > Some recovered apps are put into default queue when RM HA > - > > Key: YARN-5333 > URL: https://issues.apache.org/jira/browse/YARN-5333 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-5333.01.patch, YARN-5333.02.patch > > > Enable RM HA and use FairScheduler, > {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, > {{yarn.scheduler.fair.user-as-default-queue}} is set to false. > Reproduce steps: > 1. Start two RMs. > 2. After RMs are running, change both RM's file > {{etc/hadoop/fair-scheduler.xml}}, then add some queues. > 3. Submit some apps to the new added queues. > 4. Stop the active RM, then the standby RM will transition to active and recover > apps. > However the new active RM will put recovered apps into the default queue because > it might not have loaded the new {{fair-scheduler.xml}}. We need to call > {{initScheduler}} before starting active services, or move {{refreshAll()}} in > front of {{rm.transitionToActive()}}. *It seems this is also important for > other schedulers*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
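The ordering fix proposed in the description can be sketched as follows. The classes here are stand-ins, not the real ResourceManager API; only the method names (refreshAll, transitionToActive) mirror the discussion. The point is that the latest scheduler configuration must be loaded before app recovery begins, so recovered apps find the queues added while the RM was standby.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the failover ordering: refresh the scheduler
// configuration first, then transition to active (which triggers app
// recovery). Reversing the order makes recovery run against a stale
// config in which the new queues do not exist, so recovered apps fall
// back to the default queue.
public class FailoverSketch {
  private final List<String> order = new ArrayList<>();

  // Reloads fair-scheduler.xml (or the CS equivalent).
  public void refreshAll() { order.add("refreshAll"); }

  // Starts active services, including recovery of stored apps.
  public void transitionToActive() { order.add("active"); }

  public List<String> failover() {
    refreshAll();          // must happen first
    transitionToActive();  // recovery now sees the new queues
    return order;
  }
}
```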
[jira] [Created] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
sandflee created YARN-5376: -- Summary: capacity scheduler crashed while processing APP_ATTEMPT_REMOVED Key: YARN-5376 URL: https://issues.apache.org/jira/browse/YARN-5376 Project: Hadoop YARN Issue Type: Improvement Reporter: sandflee we are testing capacity schedule with a sls like client, see following error, seems shedulerNode is removed. {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5211) Supporting "priorities" in the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-5211: -- Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-2572) > Supporting "priorities" in the ReservationSystem > > > Key: YARN-5211 > URL: https://issues.apache.org/jira/browse/YARN-5211 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > > The ReservationSystem currently has an implicit FIFO priority. This JIRA > tracks effort to generalize this to arbitrary priorities. This is non-trivial > as the greedy nature of our ReservationAgents might need to be revisited if > not enough space is found for late-arriving but higher priority reservations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5362) TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail
[ https://issues.apache.org/jira/browse/YARN-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376161#comment-15376161 ] sandflee commented on YARN-5362: thanks [~rohithsharma] for review and commit, open YARN-5375 to track implicitly invokes drainEvents in mockRM. > TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail > --- > > Key: YARN-5362 > URL: https://issues.apache.org/jira/browse/YARN-5362 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jason Lowe >Assignee: sandflee > Fix For: 2.9.0 > > Attachments: YARN-5362.01.patch > > > Saw the following in a precommit build that only changed an unrelated unit > test: > {noformat} > Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 101.265 sec > <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart > testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) > Time elapsed: 0.411 sec <<< FAILURE! > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1653) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures
sandflee created YARN-5375: -- Summary: invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures Key: YARN-5375 URL: https://issues.apache.org/jira/browse/YARN-5375 Project: Hadoop YARN Issue Type: Improvement Reporter: sandflee Assignee: sandflee We have seen many test failures where an RMApp/RMAppAttempt reaches some state but some events are still unprocessed in the RM event queue or scheduler event queue, causing the test to fail. It seems we could implicitly invoke drainEvents (which should also drain scheduler events) in MockRM methods like waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
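The proposal above can be sketched with a toy dispatcher. This is a hypothetical model, not the real MockRM (which dispatches RM and scheduler events on separate async queues; this sketch collapses them into one): waitForState drains pending events before checking state, so the test never observes a state that queued events are about to change.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of MockRM#waitForState with an implicit drain:
// run every queued event to completion before inspecting state.
public class MiniDispatcher {
  private final Queue<Runnable> events = new ArrayDeque<>();
  private String state = "NEW";

  public void send(Runnable event) { events.add(event); }
  public void setState(String s) { state = s; }

  public void drainEvents() {
    Runnable e;
    while ((e = events.poll()) != null) {
      e.run();
    }
  }

  public boolean waitForState(String expected) {
    drainEvents();  // implicit drain, as proposed in YARN-5375
    return state.equals(expected);
  }
}
```

Without the drain, a test that checks state right after enqueueing an event races against the dispatcher thread, which matches the flaky-test pattern described above.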
[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end
[ https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376109#comment-15376109 ] Hadoop QA commented on YARN-5361: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 50s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 
9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 14s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 43s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI | | | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817838/YARN-5361.2.patch | | JIRA Issue | YARN-5361 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2fbc753c99e0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 728bf7f | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12318/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12318/artifact/patc
[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376107#comment-15376107 ] Hadoop QA commented on YARN-5156: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 26s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile 
{color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch 5 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 54s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 30m 44s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817847/YARN-5156-YARN-5355.01.patch | | JIRA Issue | YARN-5156 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 63e565a26a7b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-5355 / 0fd3980 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12319/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12319/artifact/patchprocess/whitespace-tabs.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12319/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12319/console | | Powered by | Apache Y
[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers
[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376098#comment-15376098 ] Hadoop QA commented on YARN-4759: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 18 unchanged - 0 fixed = 21 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 55s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 35s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817837/YARN-4759.003.patch | | JIRA Issue | YARN-4759 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 105334caf068 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 728bf7f | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12317/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12317/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12317/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Revisit signalContainer() for docker containers > ---
[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376095#comment-15376095 ] Wangda Tan commented on YARN-5342: -- [~Naganarasimha], that's a good point; actually I thought about this while working on the patch. The only purpose of doing it this way is simplicity. We could have better logic, such as gradually decreasing the counter depending on the ratio of #nodes in the default partition to #nodes in specific partitions, but that could be complex and potentially a regression. Please share your thoughts. Thanks, > Improve non-exclusive node partition resource allocation in Capacity Scheduler > -- > > Key: YARN-5342 > URL: https://issues.apache.org/jira/browse/YARN-5342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5342.1.patch > > > In the previous implementation, one non-exclusive container allocation is > possible when the missed-opportunity >= #cluster-nodes. And > missed-opportunity will be reset when a container is allocated to any node. > This slows down the frequency of container allocation on a non-exclusive > node partition: *When a non-exclusive partition=x has idle resource, we can > only allocate one container for this app in every > X=nodemanagers.heartbeat-interval secs for the whole cluster.* > In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 > pending resource for the non-exclusive partition OR we get an allocation from > the default partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
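The reset rule proposed in the JIRA description can be sketched as follows. The class and method names here are illustrative stand-ins, not the actual CapacityScheduler fields: the counter is reset only when the app still has pending resource on the non-exclusive partition, or when the allocation came from the default partition.

```java
// Hypothetical sketch of the YARN-5342 proposal. The old behavior reset
// the missed-opportunity counter on *every* allocation, which throttled
// non-exclusive allocations to roughly one per heartbeat interval
// cluster-wide; the new rule makes the reset conditional.
public class MissedOpportunity {
  private int missed = 0;

  public void missedAllocation() { missed++; }
  public int count() { return missed; }

  public void onContainerAllocated(boolean fromDefaultPartition,
                                   boolean pendingOnNonExclusivePartition) {
    // Reset only if we still have pending resource for the non-exclusive
    // partition OR the allocation came from the default partition.
    if (fromDefaultPartition || pendingOnNonExclusivePartition) {
      missed = 0;
    }
  }
}
```

Under this rule an allocation that neither comes from the default partition nor leaves pending non-exclusive demand keeps the counter intact, so the app does not have to re-accumulate #cluster-nodes missed opportunities.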
[jira] [Commented] (YARN-5159) Wrong Javadoc tag in MiniYarnCluster
[ https://issues.apache.org/jira/browse/YARN-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376089#comment-15376089 ] Akira Ajisaka commented on YARN-5159: - bq. I tried that locally before but if I remove the package name the javadoc engine will skip it. Really? o.a.h.yarn.conf.YarnConfiguration is imported in MiniYarnCluster.java, so I'm thinking that works. I tried that and the following commands succeeded. {noformat} $ cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests $ mvn javadoc:test-javadoc {noformat} > Wrong Javadoc tag in MiniYarnCluster > > > Key: YARN-5159 > URL: https://issues.apache.org/jira/browse/YARN-5159 > Project: Hadoop YARN > Issue Type: Test > Components: documentation >Affects Versions: 2.6.0 >Reporter: Andras Bokor >Assignee: Andras Bokor > Fix For: 2.8.0 > > Attachments: YARN-5159.01.patch, YARN-5159.02.patch, > YARN-5159.03.patch > > > {@YarnConfiguration.RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME} is wrong. Should > be changed to > {@value YarnConfiguration#RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME} > Edit: > I noted that due to Java 8 javadoc restrictions the javadoc:test-javadoc goal > fails on the hadoop-yarn-server-tests project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
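A minimal example of the tag fix discussed above: `{@FieldName}` is not a valid Javadoc inline tag, while `{@value Class#FIELD}` inlines the constant's value and passes the JDK 8 doclint checks. The class and constant below are stand-ins for YarnConfiguration, used so the snippet is self-contained.

```java
// Stand-in for YarnConfiguration to illustrate the {@value} fix.
public class JavadocTagExample {
  public static final String RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME =
      "yarn.scheduler.include-port-in-node-name";

  /**
   * Wrong:   {@YarnConfiguration.RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME}
   * (not a Javadoc tag; JDK 8 javadoc rejects it).
   *
   * Right: controlled by
   * {@value JavadocTagExample#RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME}.
   */
  public static String configuredKey() {
    return RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME;
  }
}
```

As the comment thread notes, when the referenced class is imported, the package prefix can usually be dropped from the `{@value}` reference.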
[jira] [Updated] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5156: - Attachment: YARN-5156-YARN-5355.01.patch Uploading patch rebased to new branch YARN-5355 and modifying the code as per Varun's points above. Like I mentioned in an earlier comment, I think we should include the container state in the finished event, if we are including other container states at other times in other events. This has two purposes: - ensuring consistency in information within an event - allowing for easier scanning/filtering in the data when state information is present. I am still wondering what unit test to write. The patch is simple enough. > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? 
> {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: "appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 64694 > }, > configs: { }, > isrelatedto: { }, > relatesto: { } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376057#comment-15376057 ] Vrushali C edited comment on YARN-5156 at 7/14/16 12:25 AM: Thanks [~varun_saxena] for the review discussion. Uploading patch rebased to new branch YARN-5355 and modifying the code as per Varun's points above. Like I mentioned in an earlier comment, I think we should include the container state in the finished event, if we are including other container states at other times in other events. This has two purposes: - ensuring consistency in information within an event - allowing for easier scanning/filtering in the data when state information is present. I am still wondering what unit test to write. The patch is simple enough. was (Author: vrushalic): Uploading patch rebased to new branch YARN-5355 and modifying the code as per Varun's points above. Like I mentioned in an earlier comment, I think we should include the container state in the finished event, if we are including other container states at other times in other events. This has two purposes: - ensuring consistency in information within an event - allowing for easier scanning/filtering in the data when state information is present. I am still wondering what unit test to write. The patch is simple enough. > YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state > - > > Key: YARN-5156 > URL: https://issues.apache.org/jira/browse/YARN-5156 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-5156-YARN-2928.01.patch, > YARN-5156-YARN-5355.01.patch > > > On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do > we design this deliberately or it's a bug? 
> {code} > { > metrics: [ ], > events: [ > { > id: "YARN_CONTAINER_FINISHED", > timestamp: 1464213765890, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: 1464213761133, > info: { } > }, > { > id: "YARN_CONTAINER_CREATED", > timestamp: 1464213761132, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: 1464213761132, > info: { } > } > ], > id: "container_e15_1464213707405_0001_01_18", > type: "YARN_CONTAINER", > createdtime: 1464213761132, > info: { > YARN_CONTAINER_ALLOCATED_PRIORITY: "20", > YARN_CONTAINER_ALLOCATED_VCORE: 1, > YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0", > UID: > "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18", > YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164", > YARN_CONTAINER_ALLOCATED_MEMORY: 1024, > SYSTEM_INFO_PARENT_ENTITY: { > type: "YARN_APPLICATION_ATTEMPT", > id: "appattempt_1464213707405_0001_01" > }, > YARN_CONTAINER_ALLOCATED_PORT: 64694 > }, > configs: { }, > isrelatedto: { }, > relatesto: { } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end
[ https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-5361: Attachment: YARN-5361.2.patch > Obtaining logs for completed container says 'file belongs to a running > container ' at the end > - > > Key: YARN-5361 > URL: https://issues.apache.org/jira/browse/YARN-5361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sumana Sathish >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-5361.1.patch, YARN-5361.2.patch > > > Obtaining logs via yarn CLI for completed container but running application > says "This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete" > which is not correct. > {code} > LogType:stdout > Log Upload Time:Tue Jul 12 10:38:14 + 2016 > Log Contents: > End of LogType:stdout. This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
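The fix implied by the report above is to append the "running container" caveat only when the container is actually still running. A minimal hedged sketch (illustrative names, not the actual YARN-5361 patch):

```java
// Sketch: emit the end-of-log trailer, adding the "may not be complete"
// note only for containers that are still running. The boolean flag is an
// assumed input; the real code would derive it from the container report.
public class LogTrailerSketch {

    static String endOfLogMessage(String logType, String containerId,
                                  boolean containerIsRunning) {
        StringBuilder sb = new StringBuilder("End of LogType:" + logType + ".");
        if (containerIsRunning) {
            sb.append(" This log file belongs to a running container (")
              .append(containerId)
              .append(") and so may not be complete.");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Completed container: no caveat should be printed.
        System.out.println(endOfLogMessage("stdout",
            "container_e32_1468319707096_0001_01_04", false));
    }
}
```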
[jira] [Updated] (YARN-4759) Revisit signalContainer() for docker containers
[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-4759: -- Attachment: YARN-4759.003.patch > Revisit signalContainer() for docker containers > --- > > Key: YARN-4759 > URL: https://issues.apache.org/jira/browse/YARN-4759 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Shane Kumpf > Attachments: YARN-4759.001.patch, YARN-4759.002.patch, > YARN-4759.003.patch > > > The current signal handling (in the DockerContainerRuntime) needs to be > revisited for docker containers. For example, container reacquisition on NM > restart might not work, depending on which user the process in the container > runs as. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers
[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376031#comment-15376031 ] Shane Kumpf commented on YARN-4759: --- Thanks for the review [~vvasudev]! I will upload a new patch shortly. > Revisit signalContainer() for docker containers > --- > > Key: YARN-4759 > URL: https://issues.apache.org/jira/browse/YARN-4759 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Shane Kumpf > Attachments: YARN-4759.001.patch, YARN-4759.002.patch > > > The current signal handling (in the DockerContainerRuntime) needs to be > revisited for docker containers. For example, container reacquisition on NM > restart might not work, depending on which user the process in the container > runs as. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end
[ https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376027#comment-15376027 ] Hadoop QA commented on YARN-5361: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 45s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s {color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 34s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 13s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI | | | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817826/YARN-5361.1.patch | | JIRA Issue | YARN-5361 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8e0895a44569 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d180505 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12316/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12316/artifact/patchp
[jira] [Comment Edited] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375989#comment-15375989 ] sandflee edited comment on YARN-4743 at 7/13/16 11:37 PM: -- I don't think snapshot could resolve this, as in YARN-5371, node is only sorted with unused resource. this seems caused by a > b, and b > c, but while sorting a and c, a < c. we should snapshot all sorting element and then sort to avoid this, or could add -Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPS to use mergeSort not TimSort for Collection#sort. was (Author: sandflee): I don't think snapshot could resolve this, as in YARN-5371, node is only sorted with unused resource. this seems caused by a > b, and b > c, but while sorting a and c, a < c. we should snapshot all sorting element and then sort to avoid this, or could add -Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPS to use mergeSort not TimSort for Collection#sort, I think capacity scheduler have similar problem. > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.4 >Reporter: Zephyr Guo >Assignee: Yufei Gu > Attachments: YARN-4743-cdh5.4.7.patch > > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this issue found in 2.6.0-cdh5.4.7. > I think the cause is that we modify {{Resouce}} while we are sorting > {{runnableApps}}. > {code:title=FSLeafQueue.java} > Comparator comparator = policy.getComparator(); > writeLock.lock(); > try { > Collections.sort(runnableApps, comparator); > } finally { > writeLock.unlock(); > } > readLock.lock(); > {code} > {code:title=FairShareComparator} > public int compare(Schedulable s1, Schedulable s2) { > .. 
> s1.getResourceUsage(), minShare1); > boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null, > s2.getResourceUsage(), minShare2); > minShareRatio1 = (double) s1.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare1, > ONE).getMemory(); > minShareRatio2 = (double) s2.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare2, > ONE).getMemory(); > .. > {code} > {{getResourceUsage}} will return current Resource. The current Resource is > unstable. > {code:title=FSAppAttempt.java} > @Override > public Resource getResourceUsage() { > // Here the getPreemptedResources() always return zero, except in > // a preemption round > return Resources.subtract(getCurrentConsumption(), > getPreemptedResources()); > } > {code} > {code:title=SchedulerApplicationAttempt} > public Resource getCurrentConsumption() { > return currentConsumption; > } > // This method may modify current Resource. > public synchronized void recoverContainer(RMContainer rmContainer) { > .. > Resources.addTo(currentConsumption, rmContainer.getContainer() > .getResourc
[jira] [Commented] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default
[ https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375999#comment-15375999 ] Xuan Gong commented on YARN-5363: - [~vinodkv] Thanks for the patch. Overall looks good. I have one comment: we did a check for the input log files: {code} List logs = new ArrayList(); if (fetchAllLogFiles(logFiles)) { logs.add(".*"); } else if (logFiles != null && logFiles.length > 0) { logs = Arrays.asList(logFiles); } {code} before we actually ran any commands. I think that we could add the logic here. So, we do not need to do it separately inside several different functions. > For AM containers, or for containers of running-apps, "yarn logs" incorrectly > only (tries to) shows syslog file-type by default > --- > > Key: YARN-5363 > URL: https://issues.apache.org/jira/browse/YARN-5363 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-5363-2016-07-12.txt, YARN-5363-2016-07-13.txt > > > For e.g, for a running application, the following happens: > {code} > # yarn logs -applicationId application_1467838922593_0001 > 16/07/06 22:07:05 INFO impl.TimelineClientImpl: Timeline service address: > http://:8188/ws/v1/timeline/ > 16/07/06 22:07:06 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > 16/07/06 22:07:07 INFO impl.TimelineClientImpl: Timeline service address: > http://l:8188/ws/v1/timeline/ > 16/07/06 22:07:07 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_01 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_02 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] 
for the container: > container_e03_1467838922593_0001_01_03 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_04 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_05 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_06 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_07 within the application: > application_1467838922593_0001 > Can not find the logs for the application: application_1467838922593_0001 > with the appOwner: > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
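Xuan Gong's suggestion above — normalize the requested log-file patterns once, up front, so the individual fetch paths need no special-casing — could look roughly like this. The `ALL` token and the `.*` default are assumptions inferred from the quoted snippet, not the actual patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: collapse the requested log-file names into a single pattern list
// before any fetch command runs, so downstream code handles one shape.
public class LogPatternSketch {

    static List<String> normalizeLogFilePatterns(String[] logFiles) {
        List<String> logs = new ArrayList<>();
        if (logFiles == null || logFiles.length == 0) {
            return logs; // empty: caller applies its own default
        }
        for (String f : logFiles) {
            // Any "fetch everything" token collapses the whole request.
            if ("ALL".equals(f) || ".*".equals(f)) {
                return Arrays.asList(".*");
            }
        }
        logs.addAll(Arrays.asList(logFiles));
        return logs;
    }
}
```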
[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5373: - Summary: NPE listing wildcard directory in containerLaunch (was: NPE introduced by YARN-4958 (The file localization process should allow...)) > NPE listing wildcard directory in containerLaunch > - > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null (only happens in a secure cluster), NPE > will cause the container fail to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5373: - Priority: Critical (was: Major) > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null (only happens in a secure cluster), NPE > will cause the container fail to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375989#comment-15375989 ] sandflee commented on YARN-4743: I don't think a snapshot could resolve this; as in YARN-5371, the node is only sorted by unused resource. This seems to be caused by a > b and b > c, but while sorting a and c, a < c. We should snapshot all elements being sorted and then sort to avoid this, or add -Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPTS to use merge sort instead of TimSort in Collections#sort. I think the capacity scheduler has a similar problem. > ResourceManager crash because TimSort > - > > Key: YARN-4743 > URL: https://issues.apache.org/jira/browse/YARN-4743 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.4 >Reporter: Zephyr Guo >Assignee: Yufei Gu > Attachments: YARN-4743-cdh5.4.7.patch > > > {code} > 2016-02-26 14:08:50,821 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeCollapse(TimSort.java:410) > at java.util.TimSort.sort(TimSort.java:214) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2016-02-26 14:08:50,822 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} > Actually, this issue found in 2.6.0-cdh5.4.7. > I think the cause is that we modify {{Resouce}} while we are sorting > {{runnableApps}}. > {code:title=FSLeafQueue.java} > Comparator comparator = policy.getComparator(); > writeLock.lock(); > try { > Collections.sort(runnableApps, comparator); > } finally { > writeLock.unlock(); > } > readLock.lock(); > {code} > {code:title=FairShareComparator} > public int compare(Schedulable s1, Schedulable s2) { > .. 
> s1.getResourceUsage(), minShare1); > boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null, > s2.getResourceUsage(), minShare2); > minShareRatio1 = (double) s1.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare1, > ONE).getMemory(); > minShareRatio2 = (double) s2.getResourceUsage().getMemory() > / Resources.max(RESOURCE_CALCULATOR, null, minShare2, > ONE).getMemory(); > .. > {code} > {{getResourceUsage}} will return current Resource. The current Resource is > unstable. > {code:title=FSAppAttempt.java} > @Override > public Resource getResourceUsage() { > // Here the getPreemptedResources() always return zero, except in > // a preemption round > return Resources.subtract(getCurrentConsumption(), > getPreemptedResources()); > } > {code} > {code:title=SchedulerApplicationAttempt} > public Resource getCurrentConsumption() { > return currentConsumption; > } > // This method may modify current Resource. > public synchronized void recoverContainer(RMContainer rmContainer) { > .. > Resources.addTo(currentConsumption, rmContainer.getContainer() > .getResource()); > .. > } > {code} > I suggest that use stable Resource in comparator. > Is there something i think wrong? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
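The "stable snapshot" idea suggested above can be sketched as follows: capture each schedulable's mutable usage once before sorting, and compare only the frozen values, so concurrent updates cannot make the comparator contradict itself mid-sort (the TimSort contract violation in the trace). This is an illustrative sketch with hypothetical names, not the actual FairScheduler fix:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sort apps by a frozen copy of their usage instead of the live,
// concurrently-mutated value, so the comparator stays self-consistent.
public class SnapshotSortSketch {

    static class App {
        volatile long usedMemory; // mutated by scheduler threads during sort
        App(long m) { usedMemory = m; }
    }

    static class Snapshot {
        final App app;
        final long usedMemory; // frozen value used for all comparisons
        Snapshot(App a) { app = a; usedMemory = a.usedMemory; }
    }

    static List<App> sortByUsage(List<App> apps) {
        List<Snapshot> snaps = new ArrayList<>();
        for (App a : apps) {
            snaps.add(new Snapshot(a)); // one read per element, up front
        }
        snaps.sort(Comparator.comparingLong(s -> s.usedMemory));
        List<App> out = new ArrayList<>();
        for (Snapshot s : snaps) {
            out.add(s.app);
        }
        return out;
    }
}
```

The legacy-merge-sort flag mentioned above merely hides the symptom (it tolerates inconsistent comparators); snapshotting removes the inconsistency itself.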
[jira] [Commented] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375988#comment-15375988 ] Haibo Chen commented on YARN-5373: -- As per offline discussion with Daniel, the cause is that in a secure cluster, the node manager that executes container launch code runs as a user that has no permission to read/execute the local wildcard directory that is downloaded as a resource by the remote user. Thus, directory.listFiles() returns null. > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null (only happens in a secure cluster), NPE > will cause the container fail to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
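Since `File.listFiles()` is documented to return null when the directory cannot be read (the secure-cluster permission case described above) or is not a directory, the defensive shape of a fix would guard the result before iterating. A hedged sketch, not the actual patch:

```java
import java.io.File;

// Sketch: guard the null return of File.listFiles() so an unreadable
// wildcard directory fails with a clear message instead of an NPE that
// obscures the real (permission) cause.
public class WildcardListSketch {

    static String[] listWildcardEntries(File directory) {
        File[] entries = directory.listFiles();
        if (entries == null) {
            // Unreadable, nonexistent, or not a directory.
            throw new IllegalStateException(
                "Could not list contents of " + directory
                + " (check directory permissions in secure mode)");
        }
        String[] names = new String[entries.length];
        for (int i = 0; i < entries.length; i++) {
            names[i] = entries[i].getName();
        }
        return names;
    }
}
```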
[jira] [Commented] (YARN-3649) Allow configurable prefix for hbase table names (like prod, exp, test etc)
[ https://issues.apache.org/jira/browse/YARN-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375987#comment-15375987 ] Vrushali C commented on YARN-3649: -- Thanks Joep, yes I need to rebase. Good point about the documentation, will include updates to doc as well. > Allow configurable prefix for hbase table names (like prod, exp, test etc) > -- > > Key: YARN-3649 > URL: https://issues.apache.org/jira/browse/YARN-3649 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: YARN-5355 > Attachments: YARN-3649-YARN-2928.01.patch > > > As per [~jrottinghuis]'s suggestion in YARN-3411, it will be a good idea to > have a configurable prefix for hbase table names. > This way we can easily run a staging, a test, a production and whatever setup > in the same HBase instance / without having to override every single table in > the config. > One could simply overwrite the default prefix and you're off and running. > For prefix, potential candidates are "tst" "prod" "exp" etc. Once can then > still override one tablename if needed, but managing one whole setup will be > easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5164) CapacityOvertimePolicy does not take advantaged of plan RLE
[ https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375971#comment-15375971 ] Chris Douglas commented on YARN-5164: - Only minor nits, otherwise +1: {{CapacityOverTimePolicy}} - Avoid importing java.util.\* - Where the intermediate points are added, the code would be more readable if the key were assigned to a named variable (instead of multiple calls to {{e.getKey()}}). Same with the point-wise integral computation - checkstyle (spacing): {{+ if(e.getValue()!=null) {}} - A comment briefly sketching the algorithm would help future maintainers {{NoOverCommitPolicy}} - The exception message should be reformatted (some redundant string concats) and omit references to the time it no longer reports - Should the {{PlanningException}} be added as a cause, rather than concatenated with the ReservationID? > CapacityOvertimePolicy does not take advantaged of plan RLE > --- > > Key: YARN-5164 > URL: https://issues.apache.org/jira/browse/YARN-5164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, > YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, > YARN-5164.5.patch, YARN-5164.6.patch > > > As a consequence small time granularities (e.g., 1 sec) and long time horizon > for a reservation (e.g., months) run rather slow (10 sec). > Proposed resolution is to switch to interval math in checking, similar to how > YARN-4359 does for agents. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
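The interval-math idea behind this change — walking the plan's run-length encoding segment by segment instead of checking every time step — can be sketched with a `TreeMap` of change-points. This is an assumed structure for illustration, not the actual `CapacityOverTimePolicy` code:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a run-length-encoded plan maps each change-point to the
// allocation from that instant onward. Adding a reservation or checking
// capacity then costs O(#change-points), not O(#time-steps), which is why
// 1-second granularity over a months-long horizon stays fast.
public class RlePlanSketch {

    private final TreeMap<Long, Long> rle = new TreeMap<>();

    // Allocation in effect at time t (0 before any breakpoint).
    long valueAt(long t) {
        Map.Entry<Long, Long> e = rle.floorEntry(t);
        return e == null ? 0L : e.getValue();
    }

    // Add 'amount' over [start, end): create breakpoints at the interval
    // ends, then bump every covered segment.
    void addInterval(long start, long end, long amount) {
        rle.putIfAbsent(start, valueAt(start));
        rle.putIfAbsent(end, valueAt(end));
        for (Map.Entry<Long, Long> e
                : rle.subMap(start, true, end, false).entrySet()) {
            e.setValue(e.getValue() + amount);
        }
    }

    // Peak allocation over [start, end): only change-points can be maxima.
    long maxOver(long start, long end) {
        long max = valueAt(start);
        for (long v : rle.subMap(start, false, end, false).values()) {
            max = Math.max(max, v);
        }
        return max;
    }
}
```

A capacity check against an over-time limit then compares `maxOver` (or a segment-wise integral, for the averaging constraint mentioned in the review) against the bound, segment by segment.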
[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end
[ https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375956#comment-15375956 ] Xuan Gong commented on YARN-5361: - It's not straightforward to add a unit test. I have tested locally. > Obtaining logs for completed container says 'file belongs to a running > container ' at the end > - > > Key: YARN-5361 > URL: https://issues.apache.org/jira/browse/YARN-5361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sumana Sathish >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-5361.1.patch > > > Obtaining logs via yarn CLI for completed container but running application > says "This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete" > which is not correct. > {code} > LogType:stdout > Log Upload Time:Tue Jul 12 10:38:14 + 2016 > Log Contents: > End of LogType:stdout. This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end
[ https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-5361: Attachment: YARN-5361.1.patch > Obtaining logs for completed container says 'file belongs to a running > container ' at the end > - > > Key: YARN-5361 > URL: https://issues.apache.org/jira/browse/YARN-5361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sumana Sathish >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-5361.1.patch > > > Obtaining logs via yarn CLI for completed container but running application > says "This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete" > which is not correct. > {code} > LogType:stdout > Log Upload Time:Tue Jul 12 10:38:14 + 2016 > Log Contents: > End of LogType:stdout. This log file belongs to a running container > (container_e32_1468319707096_0001_01_04) and so may not be complete. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3662) Federation Membership State APIs
[ https://issues.apache.org/jira/browse/YARN-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375947#comment-15375947 ] Wangda Tan commented on YARN-3662: -- Hi [~subru], I took a very quick look at this patch and also at YARN-3664/YARN-5367; I put all questions and comments here: Questions: - I am not quite sure what FederationPolicy is and how to use the class. Is it a state or a configuration? And why compress parameters into a byte array instead of more meaningful fields? - It could be better to add RPC service interface definitions of the FederationPolicy storage API for easier review; right now I cannot understand how these protocol definitions will be used. (High-level) Comments: - FederationMembershipState looks like a "state manager" since it supports operations to modify existing members. At first glance, it's a sub-cluster-resource-tracker, which is similar to the existing RM resource tracker. - Similarly, FederationApplicationState looks like a "federation-application-manager" instead of a "state". - FederationMembershipState has the same parameter, FederationSubClusterInfo, for register/heartbeat -- is it possible that we require different parameters for registration and heartbeat? (Just like the NM registration request and NM update request). - FederationSubClusterInfo: fields like amRMAddress are actually service endpoints; the names of these fields are a little confusing to me. Styles: - redundant "public" in all interface definitions (considering switching to IntelliJ instead of Eclipse? 
:-p) Thanks, > Federation Membership State APIs > > > Key: YARN-3662 > URL: https://issues.apache.org/jira/browse/YARN-3662 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3662-YARN-2915-v1.1.patch, > YARN-3662-YARN-2915-v1.patch, YARN-3662-YARN-2915-v2.patch > > > The Federation Application State encapsulates the information about the > active RM of each sub-cluster that is participating in Federation. The > information includes addresses for ClientRM, ApplicationMaster and Admin > services along with the sub_cluster _capability_ which is currently defined > by *ClusterMetricsInfo*. Please refer to the design doc in parent JIRA for > further details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container
[ https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375910#comment-15375910 ] Sidharta Seethana commented on YARN-5298: - Thanks, [~vvasudev] and [~templedf]! [~vvasudev], about the container-specific directories: the Docker container runtime itself makes no assumptions about the locations of container-specific or non-container-specific directories. It does not know of or assume a parent/sub-directory structure and explicitly mounts all required directories. I hope that answers your question. > Mount usercache and NM filecache directories into Docker container > -- > > Key: YARN-5298 > URL: https://issues.apache.org/jira/browse/YARN-5298 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Sidharta Seethana > Attachments: YARN-5298.001.patch, YARN-5298.002.patch > > > Currently, we don't mount the usercache and the NM filecache directories into > the Docker container. This can lead to issues with containers that rely on > public and application scope resources.
[jira] [Commented] (YARN-5181) ClusterNodeTracker: add method to get list of nodes matching a specific resourceName
[ https://issues.apache.org/jira/browse/YARN-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375904#comment-15375904 ] Hadoop QA commented on YARN-5181: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 2 new + 989 unchanged - 0 fixed = 991 total (was 989) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 33m 9s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 41s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807049/yarn-5181-1.patch | | JIRA Issue | YARN-5181 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 46276eaa1b34 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / af8f480 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code
[ https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375896#comment-15375896 ] Hudson commented on YARN-5339: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10093 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10093/]) YARN-5339. Fixed "yarn logs" to fail when a file is passed to -out (vinodkv: rev d18050522c5c6bd9e32eb9a1be4ffe2288624c40) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java > passing file to -out for YARN log CLI doesnt give warning or error code > --- > > Key: YARN-5339 > URL: https://issues.apache.org/jira/browse/YARN-5339 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sumana Sathish >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-5339.1.patch, YARN-5339.2.patch > > > passing file to -out for YARN log CLI doesnt give warning or error code > {code} > yarn logs -applicationId application_1467117709224_0003 -out > /grid/0/hadoopqe/artifacts/file.txt > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375867#comment-15375867 ] Vinod Kumar Vavilapalli commented on YARN-4464: --- We need ATS in production - aka ATS V2. With that in the picture, I agree that we don't need to keep any completed applications in RM memory at all. > default value of yarn.resourcemanager.state-store.max-completed-applications > should lower. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > my cluster has 120 nodes. > I configured RM Restart feature. > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > unfortunately I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}. > so that property configured default value 10,000. > I have restarted RM due to changing another configuartion. > I expected that RM restart immediately. > recovery process was very slow. I have waited about 20min. > realize missing > {{yarn.resourcemanager.state-store.max-completed-applications}}. > its default value is very huge. > need to change lower value or document notice on [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5181) ClusterNodeTracker: add method to get list of nodes matching a specific resourceName
[ https://issues.apache.org/jira/browse/YARN-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375866#comment-15375866 ] Arun Suresh commented on YARN-5181: --- Thanks for the patch, [~kasha]. Some minor comments: # Remove the unused import. # Maybe rename getNodes(String) to getNodesWithName(String) so that we don't need to cast null to (NodeFilter) in getAllNodes()? > ClusterNodeTracker: add method to get list of nodes matching a specific > resourceName > > > Key: YARN-5181 > URL: https://issues.apache.org/jira/browse/YARN-5181 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5181-1.patch > > > ClusterNodeTracker should have a method to return the list of nodes matching > a particular resourceName. This is so we could identify all the nodes a > particular ResourceRequest is interested in, which in turn is useful in > YARN-5139 (global scheduler) and YARN-4752 (FairScheduler preemption > overhaul).
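To illustrate the renaming suggestion in the comment above: when two overloads take unrelated parameter types, a bare null argument is ambiguous and the caller must cast, while a distinct method name removes the ambiguity entirely. A minimal sketch with hypothetical types, not the actual ClusterNodeTracker API:

```java
import java.util.Collections;
import java.util.List;

public class OverloadDemo {
    interface NodeFilter { boolean accept(String node); }

    // Overloads on unrelated types: getNodes(null) would not compile
    // (ambiguous reference), so callers must write getNodes((NodeFilter) null).
    static List<String> getNodes(NodeFilter filter) { return Collections.emptyList(); }
    static List<String> getNodes(String resourceName) { return Collections.emptyList(); }

    // A distinct name avoids the ambiguity altogether.
    static List<String> getNodesWithName(String resourceName) { return Collections.emptyList(); }

    public static void main(String[] args) {
        List<String> all = getNodes((NodeFilter) null); // cast required to disambiguate
        List<String> named = getNodesWithName(null);    // no cast needed
        System.out.println(all.size() + named.size());  // prints 0
    }
}
```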
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375862#comment-15375862 ] Daniel Templeton commented on YARN-4464: With ATS, I don't see a lot of need to keep 10k completed apps lying about. Not only is it a startup burden, but it also is a ZK burden. We regularly tell customers to set it lower because of ZK cache load. Improving the recovery logic is something we should also do, but the best doesn't need to be the enemy of the good. [~vinodkv], [~Naganarasimha], [~kasha], can we come to a conclusion? > default value of yarn.resourcemanager.state-store.max-completed-applications > should lower. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > my cluster has 120 nodes. > I configured RM Restart feature. > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > unfortunately I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}. > so that property configured default value 10,000. > I have restarted RM due to changing another configuartion. > I expected that RM restart immediately. > recovery process was very slow. I have waited about 20min. > realize missing > {{yarn.resourcemanager.state-store.max-completed-applications}}. > its default value is very huge. > need to change lower value or document notice on [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. 
[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code
[ https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375833#comment-15375833 ] Vinod Kumar Vavilapalli commented on YARN-5339: --- Looks good, +1. Checking this in. > passing file to -out for YARN log CLI doesnt give warning or error code > --- > > Key: YARN-5339 > URL: https://issues.apache.org/jira/browse/YARN-5339 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sumana Sathish >Assignee: Xuan Gong > Attachments: YARN-5339.1.patch, YARN-5339.2.patch > > > passing file to -out for YARN log CLI doesnt give warning or error code > {code} > yarn logs -applicationId application_1467117709224_0003 -out > /grid/0/hadoopqe/artifacts/file.txt > {code}
[jira] [Resolved] (YARN-5371) FairScheduer ContinuousScheduling thread throws Exception
[ https://issues.apache.org/jira/browse/YARN-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-5371. Resolution: Duplicate > FairScheduer ContinuousScheduling thread throws Exception > - > > Key: YARN-5371 > URL: https://issues.apache.org/jira/browse/YARN-5371 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: sandflee >Assignee: sandflee >Priority: Critical > > {noformat} > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1002) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
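For context on the stack trace above: TimSort throws "Comparison method violates its general contract!" when compare() returns inconsistent answers within a single sort, which is what happens when the sort key is mutated concurrently (e.g. node resources updated by heartbeats while the continuous-scheduling thread is sorting). A minimal sketch of the inconsistency, using a hypothetical Node class rather than the FairScheduler's:

```java
import java.util.Comparator;

public class InconsistentCompare {
    static class Node {
        volatile int available; // mutated by another thread in the real scheduler
        Node(int a) { available = a; }
    }

    // The comparator reads a mutable field; if the field changes while a
    // sort is in progress, earlier and later compare() calls disagree and
    // TimSort detects the broken total-order contract.
    static final Comparator<Node> BY_AVAILABLE =
        Comparator.comparingInt((Node n) -> n.available);

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2);
        int before = BY_AVAILABLE.compare(a, b); // negative: a < b
        b.available = 0;                         // simulated concurrent update
        int after = BY_AVAILABLE.compare(a, b);  // positive: a > b
        System.out.println(before < 0 && after > 0); // prints true
    }
}
```

The usual fix is to snapshot the sort key before sorting, or to sort under the same lock that guards the updates.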
[jira] [Updated] (YARN-5371) FairScheduer ContinuousScheduling thread throws Exception
[ https://issues.apache.org/jira/browse/YARN-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5371: --- Priority: Critical (was: Major) > FairScheduer ContinuousScheduling thread throws Exception > - > > Key: YARN-5371 > URL: https://issues.apache.org/jira/browse/YARN-5371 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: sandflee >Assignee: sandflee >Priority: Critical > > {noformat} > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1002) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
[ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375812#comment-15375812 ] Daniel Templeton commented on YARN-4767: Ping [~xgong], [~vinodkv]. Would love feedback on the approach in this patch. Thanks! > Network issues can cause persistent RM UI outage > > > Key: YARN-4767 > URL: https://issues.apache.org/jira/browse/YARN-4767 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.7.2 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-4767.001.patch, YARN-4767.002.patch, > YARN-4767.003.patch, YARN-4767.004.patch, YARN-4767.005.patch, > YARN-4767.006.patch, YARN-4767.007.patch > > > If a network issue causes an AM web app to resolve the RM proxy's address to > something other than what's listed in the allowed proxies list, the > AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy. > The RM proxy will then consume all available handler threads connecting to > itself over and over, resulting in an outage of the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375809#comment-15375809 ] Karthik Kambatla commented on YARN-4212: I am perfectly open to working on YARN-5264 first. Happy to review. > FairScheduler: Parent queues is not allowed to be 'Fair' policy if its > children have the "drf" policy > - > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Yufei Gu > Labels: fairscheduler > Attachments: YARN-4212.002.patch, YARN-4212.003.patch, > YARN-4212.004.patch, YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queues policy to distribute shares to its > children. > If the parent queues policy is 'fair', it only computes weight for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', Any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5373: - Description: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null (only happens in a secure cluster), NPE will cause the container fail to launch. was: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null, NPE will cause the container fail to launch. > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null (only happens in a secure cluster), NPE > will cause the container fail to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
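A defensive rewrite of the loop quoted above can be sketched as follows, assuming only that File.listFiles() may return null (as the report notes happens in a secure cluster); the symlink call is replaced by collecting link names so the sketch stays self-contained:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WildcardLocalization {
    // File.listFiles() returns null when the path is not a readable
    // directory, so guard before iterating to avoid the NPE.
    static List<String> wildcardLinkNames(File directory) {
        List<String> names = new ArrayList<>();
        File[] entries = directory.listFiles();
        if (entries == null) {
            return names; // unreadable or missing directory: nothing to link
        }
        for (File wildLink : entries) {
            names.add(wildLink.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // A nonexistent path exercises the null branch.
        System.out.println(wildcardLinkNames(new File("/no/such/dir")).size()); // prints 0
    }
}
```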
[jira] [Commented] (YARN-5304) Ship single node HBase config option with single startup command
[ https://issues.apache.org/jira/browse/YARN-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375807#comment-15375807 ] Karthik Kambatla commented on YARN-5304: I spoke to [~esteban] about this. In his opinion, the minicluster approach (master, RS etc. in a single process) is discouraged. I am assuming the goal is to do a pseudo-distributed setup of HBase - Master and RegionServer in different processes. > Ship single node HBase config option with single startup command > > > Key: YARN-5304 > URL: https://issues.apache.org/jira/browse/YARN-5304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Joep Rottinghuis >Assignee: Joep Rottinghuis > Labels: YARN-5355 > > For small to medium Hadoop deployments we should make it dead-simple to use > the timeline service v2. We should have a single command to launch and stop > the timelineservice back-end for the default HBase implementation. > A default config with all the values should be packaged that launches all the > needed daemons (on the RM node) with a single command with all the > recommended settings. > Having a timeline admin command, perhaps an init command might be needed, or > perhaps the timeline service can even auto-detect that and create tables, > deploy needed coprocessors etc. > The overall purpose is to ensure nobody needs to be an HBase expert to get > this going. For those cluster operators with HBase experience, they can > choose their own more sophisticated deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5343) TestContinuousScheduling#testSortedNodes fail intermittently
[ https://issues.apache.org/jira/browse/YARN-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375800#comment-15375800 ] Karthik Kambatla commented on YARN-5343: I remember [~yufeigu] was looking into this. [~yufeigu] - does [~sandflee]'s analysis help? > TestContinuousScheduling#testSortedNodes fail intermittently > > > Key: YARN-5343 > URL: https://issues.apache.org/jira/browse/YARN-5343 > Project: Hadoop YARN > Issue Type: Test >Reporter: sandflee >Priority: Minor > > {noformat} > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testSortedNodes(TestContinuousScheduling.java:167) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/12250/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair/TestContinuousScheduling/testSortedNodes/
[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375771#comment-15375771 ] Naganarasimha G R commented on YARN-5342: - Thanks for the patch, [~wangda]. Given that the approach discussed in YARN-4225 (fallback-policy based) is going to take some time, as it would require significant modifications, I would agree to an interim change to optimize non-exclusive mode scheduling. The only concern I have: if the default partition is larger than the non-exclusive partition, then a single allocation in the default partition resets the counter; would that be productive? > Improve non-exclusive node partition resource allocation in Capacity Scheduler > -- > > Key: YARN-5342 > URL: https://issues.apache.org/jira/browse/YARN-5342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5342.1.patch > > > In the previous implementation, one non-exclusive container allocation is > possible when the missed-opportunity >= #cluster-nodes. And > missed-opportunity will be reset when container allocated to any node. > This will slow down the frequency of container allocation on non-exclusive > node partition: *When a non-exclusive partition=x has idle resource, we can > only allocate one container for this app in every > X=nodemanagers.heartbeat-interval secs for the whole cluster.* > In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 > pending resource for the non-exclusive partition OR we get allocation from > the default partition.
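The reset rule proposed in the JIRA description above can be sketched as a small state transition (hypothetical method and flag names, not the actual CapacityScheduler code): reset the missed-opportunity counter only when the app still has pending resource on the non-exclusive partition or the allocation came from the default partition; an allocation elsewhere no longer resets it.

```java
public class MissedOpportunity {
    // missed: current missed-opportunity count for the app
    // allocated: whether a container was just allocated anywhere
    // pendingOnPartition: app still has >0 pending resource on the
    //                     non-exclusive partition
    // fromDefaultPartition: the allocation came from the default partition
    static int next(int missed, boolean allocated,
                    boolean pendingOnPartition, boolean fromDefaultPartition) {
        if (!allocated) {
            return missed + 1; // another scheduling opportunity was skipped
        }
        if (pendingOnPartition || fromDefaultPartition) {
            return 0; // proposed: reset only in these two cases
        }
        return missed; // other allocations no longer reset the counter
    }
}
```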
[jira] [Resolved] (YARN-5374) Preemption causing communication loop
[ https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-5374. -- Resolution: Invalid Closing as invalid. > Preemption causing communication loop > - > > Key: YARN-5374 > URL: https://issues.apache.org/jira/browse/YARN-5374 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, nodemanager, resourcemanager, yarn >Affects Versions: 2.7.1 > Environment: Yarn version: Hadoop 2.7.1-amzn-0 > AWS EMR Cluster running: > 1 x r3.8xlarge (Master) > 52 x r3.8xlarge (Core) > Spark version : 1.6.0 > Scala version: 2.10.5 > Java version: 1.8.0_51 > Input size: ~10 tb > Input coming from S3 > Queue Configuration: > Dynamic allocation: enabled > Preemption: enabled > Q1: 70% capacity with max of 100% > Q2: 30% capacity with max of 100% > Job Configuration: > Driver memory = 10g > Executor cores = 6 > Executor memory = 10g > Deploy mode = cluster > Master = yarn > maxResultSize = 4g > Shuffle manager = hash >Reporter: Lucas Winkelmann >Priority: Blocker > > Here is the scenario: > I launch job 1 into Q1 and allow it to grow to 100% cluster utilization. > I wait between 15-30 mins ( for this job to complete with 100% of the cluster > available takes about 1hr so job 1 is between 25-50% complete). Note that if > I wait less time then the issue sometimes does not occur, it appears to be > only after the job 1 is at least 25% complete. > I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to > allow 70% of cluster utilization. > At this point job 1 basically halts progress while job 2 continues to execute > as normal and finishes. Job 2 either: > - Fails its attempt and restarts. By the time this attempt fails the other > job is already complete meaning the second attempt has full cluster > availability and finishes. > - The job remains at its current progress and simply does not finish ( I have > waited ~6 hrs until finally killing the application ). 
> > Looking into the error log there is this constant error message: > WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: > ip-NUMBERS.ec2.internal was preempted.)] in X attempts > > My observations have led me to believe that the application master does not > know about this container being killed and continuously asks the container to > remove the executor until eventually failing the attempt or continue trying > to remove the executor. > > I have done much digging online for anyone else experiencing this issue but > have come up with nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5374) Preemption causing communication loop
[ https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375761#comment-15375761 ] Lucas Winkelmann commented on YARN-5374: I will go ahead and file a Spark JIRA ticket now. > Preemption causing communication loop > - > > Key: YARN-5374 > URL: https://issues.apache.org/jira/browse/YARN-5374 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, nodemanager, resourcemanager, yarn >Affects Versions: 2.7.1 > Environment: Yarn version: Hadoop 2.7.1-amzn-0 > AWS EMR Cluster running: > 1 x r3.8xlarge (Master) > 52 x r3.8xlarge (Core) > Spark version : 1.6.0 > Scala version: 2.10.5 > Java version: 1.8.0_51 > Input size: ~10 tb > Input coming from S3 > Queue Configuration: > Dynamic allocation: enabled > Preemption: enabled > Q1: 70% capacity with max of 100% > Q2: 30% capacity with max of 100% > Job Configuration: > Driver memory = 10g > Executor cores = 6 > Executor memory = 10g > Deploy mode = cluster > Master = yarn > maxResultSize = 4g > Shuffle manager = hash >Reporter: Lucas Winkelmann >Priority: Blocker > > Here is the scenario: > I launch job 1 into Q1 and allow it to grow to 100% cluster utilization. > I wait between 15-30 mins ( for this job to complete with 100% of the cluster > available takes about 1hr so job 1 is between 25-50% complete). Note that if > I wait less time then the issue sometimes does not occur, it appears to be > only after the job 1 is at least 25% complete. > I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to > allow 70% of cluster utilization. > At this point job 1 basically halts progress while job 2 continues to execute > as normal and finishes. Job 2 either: > - Fails its attempt and restarts. By the time this attempt fails the other > job is already complete meaning the second attempt has full cluster > availability and finishes. 
> - The job remains at its current progress and simply does not finish ( I have > waited ~6 hrs until finally killing the application ). > > Looking into the error log there is this constant error message: > WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: > ip-NUMBERS.ec2.internal was preempted.)] in X attempts > > My observations have led me to believe that the application master does not > know about this container being killed and continuously asks the container to > remove the executor until eventually failing the attempt or continue trying > to remove the executor. > > I have done much digging online for anyone else experiencing this issue but > have come up with nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5374) Preemption causing communication loop
[ https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375759#comment-15375759 ] Wangda Tan commented on YARN-5374: -- [~LucasW], it seems to me that the issue is caused by Spark application doesn't well handle container preemption message. If so, I suggest you can drop a mail to Spark maillist or file a Spark JIRA instead. > Preemption causing communication loop > - > > Key: YARN-5374 > URL: https://issues.apache.org/jira/browse/YARN-5374 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, nodemanager, resourcemanager, yarn >Affects Versions: 2.7.1 > Environment: Yarn version: Hadoop 2.7.1-amzn-0 > AWS EMR Cluster running: > 1 x r3.8xlarge (Master) > 52 x r3.8xlarge (Core) > Spark version : 1.6.0 > Scala version: 2.10.5 > Java version: 1.8.0_51 > Input size: ~10 tb > Input coming from S3 > Queue Configuration: > Dynamic allocation: enabled > Preemption: enabled > Q1: 70% capacity with max of 100% > Q2: 30% capacity with max of 100% > Job Configuration: > Driver memory = 10g > Executor cores = 6 > Executor memory = 10g > Deploy mode = cluster > Master = yarn > maxResultSize = 4g > Shuffle manager = hash >Reporter: Lucas Winkelmann >Priority: Blocker > > Here is the scenario: > I launch job 1 into Q1 and allow it to grow to 100% cluster utilization. > I wait between 15-30 mins ( for this job to complete with 100% of the cluster > available takes about 1hr so job 1 is between 25-50% complete). Note that if > I wait less time then the issue sometimes does not occur, it appears to be > only after the job 1 is at least 25% complete. > I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to > allow 70% of cluster utilization. > At this point job 1 basically halts progress while job 2 continues to execute > as normal and finishes. Job 2 either: > - Fails its attempt and restarts. 
By the time this attempt fails the other > job is already complete meaning the second attempt has full cluster > availability and finishes. > - The job remains at its current progress and simply does not finish ( I have > waited ~6 hrs until finally killing the application ). > > Looking into the error log there is this constant error message: > WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: > ip-NUMBERS.ec2.internal was preempted.)] in X attempts > > My observations have led me to believe that the application master does not > know about this container being killed and continuously asks the container to > remove the executor until eventually failing the attempt or continue trying > to remove the executor. > > I have done much digging online for anyone else experiencing this issue but > have come up with nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375711#comment-15375711 ] Hudson commented on YARN-5364: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10092 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10092/]) YARN-5364. timelineservice modules have indirect dependencies on (naganarasimha_gr: rev af8f480c2482b40e9f5a2d29fb5bc7069979fa2e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/pom.xml > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Fix For: 3.0.0-alpha1 > > Attachments: YARN-5364.01.patch > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies to mapreduce artifacts through HBase and phoenix. Although it's > not causing builds to fail, it's not good hygiene. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
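[Editor's note] The commit above only touched the two listed pom.xml files. The usual Maven mechanism for cutting such transitive dependencies is an exclusion on the dependency that drags them in; the artifact ids below are illustrative, not the actual patch content:

```xml
<!-- Illustrative only: cutting a mapreduce artifact pulled in transitively
     via HBase from a timelineservice module's pom.xml. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

`mvn dependency:tree` on the module before and after the change is the quickest way to confirm the artifact no longer appears.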
[jira] [Created] (YARN-5374) Preemption causing communication loop
Lucas Winkelmann created YARN-5374: -- Summary: Preemption causing communication loop Key: YARN-5374 URL: https://issues.apache.org/jira/browse/YARN-5374 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, nodemanager, resourcemanager, yarn Affects Versions: 2.7.1 Environment: Yarn version: Hadoop 2.7.1-amzn-0 AWS EMR Cluster running: 1 x r3.8xlarge (Master) 52 x r3.8xlarge (Core) Spark version: 1.6.0 Scala version: 2.10.5 Java version: 1.8.0_51 Input size: ~10 TB Input coming from S3 Queue Configuration: Dynamic allocation: enabled Preemption: enabled Q1: 70% capacity with max of 100% Q2: 30% capacity with max of 100% Job Configuration: Driver memory = 10g Executor cores = 6 Executor memory = 10g Deploy mode = cluster Master = yarn maxResultSize = 4g Shuffle manager = hash Reporter: Lucas Winkelmann Priority: Blocker Here is the scenario: I launch job 1 into Q1 and allow it to grow to 100% cluster utilization. I wait between 15-30 mins (for this job to complete with 100% of the cluster available takes about 1 hr, so job 1 is between 25-50% complete). Note that if I wait less time the issue sometimes does not occur; it appears only after job 1 is at least 25% complete. I launch job 2 into Q2, and preemption occurs on Q1, shrinking job 1 to 70% of cluster utilization. At this point job 1 basically halts progress while job 2 continues to execute as normal and finishes. Job 1 either: - Fails its attempt and restarts. By the time this attempt fails the other job is already complete, meaning the second attempt has full cluster availability and finishes. - Remains at its current progress and simply does not finish (I have waited ~6 hrs before finally killing the application).
Looking into the error log there is this constant error message: WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: ip-NUMBERS.ec2.internal was preempted.)] in X attempts My observations have led me to believe that the application master does not know about this container being killed and continuously asks the container to remove the executor until eventually failing the attempt or continue trying to remove the executor. I have done much digging online for anyone else experiencing this issue but have come up with nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
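[Editor's note] YARN reports a preempted container to the application master as a completed container with exit status {{ContainerExitStatus.PREEMPTED}} (-102), and an AM that treats that status like an ordinary failure can end up in exactly the retry loop described above. The sketch below is illustrative only (plain Java, constant redeclared locally so it compiles without YARN on the classpath; it is not Spark's actual handler):

```java
import java.util.ArrayList;
import java.util.List;

public class PreemptionTriage {
    // Mirrors org.apache.hadoop.yarn.api.records.ContainerExitStatus.PREEMPTED;
    // redeclared here so this sketch compiles without YARN on the classpath.
    static final int PREEMPTED = -102;

    /**
     * Filter completed-container exit statuses down to genuine failures.
     * Preempted containers are expected under queue rebalancing and should
     * not feed retry/blacklist logic the way real failures do.
     */
    static List<Integer> realFailures(List<Integer> exitStatuses) {
        List<Integer> failures = new ArrayList<>();
        for (int status : exitStatuses) {
            if (status != 0 && status != PREEMPTED) {
                failures.add(status);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 0 = success, -102 = preempted, 1 = genuine failure
        System.out.println(realFailures(List.of(0, -102, 1))); // prints "[1]"
    }
}
```

In a real AM this filtering would live in the {{AMRMClientAsync.CallbackHandler#onContainersCompleted}} callback, checking each {{ContainerStatus}} before deciding to retry.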
[jira] [Commented] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375683#comment-15375683 ] Daniel Templeton commented on YARN-5373: It looks like the issue only appears when running with a secure cluster. > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen > > YARN-4958 added support for wildcards in file localization. It introduces an > NPE at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null, the NPE will cause the container to fail to > launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
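[Editor's note] The root cause is that {{java.io.File#listFiles()}} returns null, not an empty array, when the path does not exist, is not a directory, or cannot be read. A null guard along these lines avoids the NPE (names are illustrative; the real code would call {{sb.symlink(...)}} per entry rather than count them):

```java
import java.io.File;

public class WildcardLocalization {
    /**
     * Count the entries in a wildcard-localized directory. The null check is
     * the essence of the fix: listFiles() returns null for a missing or
     * unreadable path, which is what produced the container-launch NPE.
     */
    static int linkableEntries(File directory) {
        File[] entries = directory.listFiles();
        if (entries == null) {
            return 0; // nothing to symlink; do not dereference a null array
        }
        return entries.length;
    }

    public static void main(String[] args) {
        System.out.println(linkableEntries(new File("/definitely/not/a/dir"))); // prints 0
    }
}
```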
[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375648#comment-15375648 ] Naganarasimha G R commented on YARN-5364: - Strangely, the dependency tree was also not showing it as a required jar earlier. > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-5364.01.patch > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies on mapreduce artifacts through HBase and Phoenix. Although it's > not causing builds to fail, it's not good hygiene. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375647#comment-15375647 ] Naganarasimha G R commented on YARN-5364: - Not sure why it was failing earlier (with or without the patch); once I changed the repo location I was able to start running the test cases. Will go ahead and commit the patch. > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-5364.01.patch > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies on mapreduce artifacts through HBase and Phoenix. Although it's > not causing builds to fail, it's not good hygiene. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-5364: Attachment: (was: screenshot-1.png) > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-5364.01.patch > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies to mapreduce artifacts through HBase and phoenix. Although it's > not causing builds to fail, it's not good hygiene. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5373: - Description: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null, NPE will cause the container fail to launch. was: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null, NPE will cause the container fail to launch. > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null, NPE will cause the container fail to > launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
[ https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-5373: - Description: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null, NPE will cause the container fail to launch. was: YARN-4958 added support for wildcards in file localization. It introduces a NPE at {{code}} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {{code}} When directory.listFiles returns null, NPE will cause the container fail to launch. > NPE introduced by YARN-4958 (The file localization process should allow...) > --- > > Key: YARN-5373 > URL: https://issues.apache.org/jira/browse/YARN-5373 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Haibo Chen >Assignee: Haibo Chen > > YARN-4958 added support for wildcards in file localization. It introduces a > NPE > at > {code:java} > for (File wildLink : directory.listFiles()) { > sb.symlink(new Path(wildLink.toString()), > new Path(wildLink.getName())); > } > {code} > When directory.listFiles returns null, NPE will cause the container fail to > launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)
Haibo Chen created YARN-5373: Summary: NPE introduced by YARN-4958 (The file localization process should allow...) Key: YARN-5373 URL: https://issues.apache.org/jira/browse/YARN-5373 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.9.0 Reporter: Haibo Chen Assignee: Haibo Chen YARN-4958 added support for wildcards in file localization. It introduces an NPE at {code:java} for (File wildLink : directory.listFiles()) { sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName())); } {code} When directory.listFiles returns null, the NPE will cause the container to fail to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5303) Clean up ContainerExecutor JavaDoc
[ https://issues.apache.org/jira/browse/YARN-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375544#comment-15375544 ] Varun Vasudev commented on YARN-5303: - Thanks for the patch [~templedf]! +1. I'll commit this tomorrow if no one objects. > Clean up ContainerExecutor JavaDoc > -- > > Key: YARN-5303 > URL: https://issues.apache.org/jira/browse/YARN-5303 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Attachments: YARN-5303.001.patch > > > The {{ContainerExecutor}} class needs a lot of JavaDoc cleanup and could use > some other TLC as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5007) MiniYarnCluster contains deprecated constructor which is called by the other constructors
[ https://issues.apache.org/jira/browse/YARN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375528#comment-15375528 ] Hadoop QA commented on YARN-5007: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall 
{color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s {color} | {color:green} root generated 0 new + 706 unchanged - 4 fixed = 706 total (was 710) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 26s {color} | {color:red} root: The patch generated 1 new + 61 unchanged - 2 fixed = 62 total (was 63) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 27s {color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 22s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 113m 53s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 162m 26s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.TestContainerManagerSecurity | | | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization | | | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.client.cli.TestLogsCLI | | | hadoop.mapred.TestMRCJCFileOutputCommitter | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817705/YARN-5007.02.patch | | JIRA Issue | YARN-5007 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e5fa8444e734 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5614217 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0
[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code
[ https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375522#comment-15375522 ] Xuan Gong commented on YARN-5339: - The test-case failures and the checkstyle issue are not related to this patch. > passing file to -out for YARN log CLI doesnt give warning or error code > --- > > Key: YARN-5339 > URL: https://issues.apache.org/jira/browse/YARN-5339 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sumana Sathish >Assignee: Xuan Gong > Attachments: YARN-5339.1.patch, YARN-5339.2.patch > > > Passing a file to -out for the YARN log CLI doesn't give a warning or an error code: > {code} > yarn logs -applicationId application_1467117709224_0003 -out > /grid/0/hadoopqe/artifacts/file.txt > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
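[Editor's note] The improvement boils down to failing fast when the value passed to -out is an existing regular file rather than a directory. A minimal validation sketch (method and message are illustrative, not the actual patch):

```java
import java.io.File;

public class OutDirCheck {
    /**
     * Validate a "-out" destination: it must not be an existing regular file.
     * Returns an error message to print, or null when the destination is usable
     * (an existing directory, or a path that does not exist yet).
     */
    static String validateOutDir(File out) {
        if (out.exists() && !out.isDirectory()) {
            return "Invalid value for -out option: " + out.getPath()
                + " is an existing file, but a directory is expected.";
        }
        return null;
    }

    public static void main(String[] args) {
        // The JVM temp dir is a directory, so this passes validation.
        System.out.println(validateOutDir(new File(System.getProperty("java.io.tmpdir"))));
    }
}
```

The CLI would print the returned message and exit with a non-zero code instead of silently writing logs under a file path.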
[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code
[ https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375503#comment-15375503 ] Hadoop QA commented on YARN-5339: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 1 new + 87 unchanged - 1 fixed = 88 total (was 88) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 2s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 23m 45s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI | | | hadoop.yarn.client.api.impl.TestAMRMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817574/YARN-5339.2.patch | | JIRA Issue | YARN-5339 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ab084b82c3e8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / eb47163 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12314/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/1
[jira] [Commented] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default
[ https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375496#comment-15375496 ] Hadoop QA commented on YARN-5363: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 0s {color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} 
| {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 7 new + 80 unchanged - 8 fixed = 87 total (was 88) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 37s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.client.cli.TestLogsCLI | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817767/YARN-5363-2016-07-13.txt | | JIRA Issue | YARN-5363 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2c97dfcde450 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / eb47163 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | Test Results |
[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container
[ https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375488#comment-15375488 ] Daniel Templeton commented on YARN-5298: Looks good to me as well. > Mount usercache and NM filecache directories into Docker container > -- > > Key: YARN-5298 > URL: https://issues.apache.org/jira/browse/YARN-5298 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Sidharta Seethana > Attachments: YARN-5298.001.patch, YARN-5298.002.patch > > > Currently, we don't mount the usercache and the NM filecache directories into > the Docker container. This can lead to issues with containers that rely on > public and application scope resources.
[jira] [Commented] (YARN-5200) Improve yarn logs to get Container List
[ https://issues.apache.org/jira/browse/YARN-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375487#comment-15375487 ] Hudson commented on YARN-5200: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10091 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10091/]) YARN-5200. Enhanced "yarn logs" to be able to get a list of containers (vinodkv: rev eb471632349deac4b62f8dec853c8ceb64c9617a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java > Improve yarn logs to get Container List > --- > > Key: YARN-5200 > URL: https://issues.apache.org/jira/browse/YARN-5200 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-5200.1.patch, YARN-5200.10.patch, > YARN-5200.11.patch, YARN-5200.12.patch, YARN-5200.2.patch, YARN-5200.3.patch, > YARN-5200.4.patch, YARN-5200.5.patch, YARN-5200.6.patch, YARN-5200.7.patch, > YARN-5200.8.patch, YARN-5200.9.patch, YARN-5200.9.rebase.patch > >
[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container
[ https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375481#comment-15375481 ] Varun Vasudev commented on YARN-5298: - Forgot to mention - the question should not hold up the patch. +1 for the patch. I'll commit it tomorrow if no one objects. > Mount usercache and NM filecache directories into Docker container > -- > > Key: YARN-5298 > URL: https://issues.apache.org/jira/browse/YARN-5298 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Varun Vasudev >Assignee: Sidharta Seethana > Attachments: YARN-5298.001.patch, YARN-5298.002.patch > > > Currently, we don't mount the usercache and the NM filecache directories into > the Docker container. This can lead to issues with containers that rely on > public and application scope resources.
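The mount behavior discussed in YARN-5298 above can be pictured as extra `-v` bind mounts on the `docker run` invocation that launches the container. The sketch below is illustrative only: the builder method and the directory paths are hypothetical, not the actual DockerLinuxContainerRuntime API; it simply shows NM local directories (which hold the usercache and filecache) being mounted read-write at the same path inside the container so localized resources resolve identically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of bind-mounting NM local dirs into a Docker container.
// The builder and paths are hypothetical, not YARN's actual runtime code.
public class DockerMountSketch {
    static List<String> buildRunArgs(String image, List<String> localDirs) {
        List<String> args = new ArrayList<>(Arrays.asList("docker", "run"));
        for (String dir : localDirs) {
            // Mount each directory at the same path inside the container,
            // so paths written into launch scripts remain valid.
            args.add("-v");
            args.add(dir + ":" + dir);
        }
        args.add(image);
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", buildRunArgs("hadoop-docker:latest",
                Arrays.asList("/data/yarn/local/usercache/alice",
                              "/data/yarn/local/filecache"))));
    }
}
```

Without such mounts, a containerized process that expects public or application-scope localized resources under the NM local dirs simply cannot see them, which is the failure mode the issue describes.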
[jira] [Updated] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default
[ https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-5363: -- Attachment: YARN-5363-2016-07-13.txt Updated patch against the latest trunk. > For AM containers, or for containers of running-apps, "yarn logs" incorrectly > only (tries to) shows syslog file-type by default > --- > > Key: YARN-5363 > URL: https://issues.apache.org/jira/browse/YARN-5363 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-5363-2016-07-12.txt, YARN-5363-2016-07-13.txt > > > For e.g, for a running application, the following happens: > {code} > # yarn logs -applicationId application_1467838922593_0001 > 16/07/06 22:07:05 INFO impl.TimelineClientImpl: Timeline service address: > http://:8188/ws/v1/timeline/ > 16/07/06 22:07:06 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > 16/07/06 22:07:07 INFO impl.TimelineClientImpl: Timeline service address: > http://l:8188/ws/v1/timeline/ > 16/07/06 22:07:07 INFO client.RMProxy: Connecting to ResourceManager at > /:8050 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_01 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_02 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_03 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_04 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > 
container_e03_1467838922593_0001_01_05 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_06 within the application: > application_1467838922593_0001 > Can not find any log file matching the pattern: [syslog] for the container: > container_e03_1467838922593_0001_01_07 within the application: > application_1467838922593_0001 > Can not find the logs for the application: application_1467838922593_0001 > with the appOwner: > {code}
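The log output above boils down to a simple matching problem: "yarn logs" falls back to the single default file pattern "syslog", so containers whose logs have not yet been aggregated under that name (e.g. a still-running AM with only stdout/stderr) match nothing. The helper below is a hypothetical sketch of that behavior, not the actual LogsCLI code; the file names mirror what a running container typically has.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Minimal sketch: filter container log file names against a pattern list,
// the way a hard-coded ["syslog"] default would. Not the real LogsCLI logic.
public class LogPatternSketch {
    static List<String> matching(List<String> files, List<String> patterns) {
        return files.stream()
                .filter(f -> patterns.stream().anyMatch(p -> Pattern.matches(p, f)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Typical files for a running container: no "syslog" yet.
        List<String> files = Arrays.asList("stdout", "stderr", "launch_container.sh");
        // The default pattern list contains only "syslog".
        System.out.println(matching(files, Arrays.asList("syslog"))); // prints "[]"
    }
}
```

An empty match for every container is exactly the "Can not find any log file matching the pattern: [syslog]" cascade shown in the report.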
[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers
[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375435#comment-15375435 ] Varun Vasudev commented on YARN-4759: - Thanks for the patch [~shaneku...@gmail.com]. Patch looks mostly good. One minor change -
{code}
+  // always change back
+  if (change_effective_user(user, group) != 0) {
+    return -1;
+  }
{code}
Can you please log an error message? > Revisit signalContainer() for docker containers > --- > > Key: YARN-4759 > URL: https://issues.apache.org/jira/browse/YARN-4759 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Shane Kumpf > Attachments: YARN-4759.001.patch, YARN-4759.002.patch > > > The current signal handling (in the DockerContainerRuntime) needs to be > revisited for docker containers. For example, container reacquisition on NM > restart might not work, depending on which user the process in the container > runs as.
[jira] [Commented] (YARN-5200) Improve yarn logs to get Container List
[ https://issues.apache.org/jira/browse/YARN-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375411#comment-15375411 ] Vinod Kumar Vavilapalli commented on YARN-5200: --- I'll dig up the test-case tickets. The latest patch looks good to me. +1, checking this in. > Improve yarn logs to get Container List > --- > > Key: YARN-5200 > URL: https://issues.apache.org/jira/browse/YARN-5200 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-5200.1.patch, YARN-5200.10.patch, > YARN-5200.11.patch, YARN-5200.12.patch, YARN-5200.2.patch, YARN-5200.3.patch, > YARN-5200.4.patch, YARN-5200.5.patch, YARN-5200.6.patch, YARN-5200.7.patch, > YARN-5200.8.patch, YARN-5200.9.patch, YARN-5200.9.rebase.patch > >
[jira] [Commented] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM
[ https://issues.apache.org/jira/browse/YARN-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375339#comment-15375339 ] Manikandan R commented on YARN-5370: To solve this issue, we first tried setting yarn.nodemanager.delete.debug-delay-sec to a very low value (zero seconds), assuming it might clear off the existing scheduled deletion tasks. It didn't: the new value is not applied to tasks that have already been scheduled. We then found that the canRecover() method is called during service start; it pulls the info from the NM recovery directory (on the local filesystem) and rebuilds all of it in memory, which in turn blocks the services from starting and consumes a large amount of memory. We then moved the contents of the NM recovery directory elsewhere, after which the NM started smoothly and worked as expected. I think logging a warning about such a high value (for example, 100+ days), indicating that it can crash the NM, would save significant time in troubleshooting this issue. > Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM > because of OOM > > > Key: YARN-5370 > URL: https://issues.apache.org/jira/browse/YARN-5370 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R > > I set yarn.nodemanager.delete.debug-delay-sec to 100+ days in my dev > cluster for some reasons. This was done 3-4 weeks ago. Since then, the NM > occasionally crashes because of OOM, so as a temporary fix I kept gradually > increasing its heap from 512MB to 6GB over the past few weeks whenever a > crash occurred. Sometimes it won't start smoothly, and only after multiple > tries does it start functioning. While analyzing the heap dump of the > corresponding JVM, I found that DeletionService.java occupies almost 99% of > the total allocated memory (-Xmx), something like this: > org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor > @ 0x6c1d09068 | 80 | 3,544,094,696 | 99.13% > Basically, there is a huge number of the above-mentioned tasks scheduled for > deletion. Usually, I see NM memory requirements of 2-4GB for large clusters. > In my case, the cluster is very small and OOM still occurs. > Is this expected behaviour? Or is there a limit we can enforce on > yarn.nodemanager.delete.debug-delay-sec to avoid these kinds of issues?
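The warning suggested in the comment above could take the form of a simple startup check on the configured delay. The property name is real (yarn.nodemanager.delete.debug-delay-sec), but the threshold, the helper name, and the message wording below are all hypothetical choices for illustration, not actual NodeManager code.

```java
import java.util.concurrent.TimeUnit;

// Sketch of a startup sanity check for an extreme deletion debug delay.
// Threshold and message are assumptions, not YARN's behavior.
public class DeletionDelayCheck {
    // Warn once the delay exceeds a week; the exact cutoff is arbitrary.
    static final long WARN_THRESHOLD_SEC = TimeUnit.DAYS.toSeconds(7);

    static String checkDebugDelay(long delaySec) {
        if (delaySec > WARN_THRESHOLD_SEC) {
            return "WARN: yarn.nodemanager.delete.debug-delay-sec=" + delaySec
                    + " keeps every deletion task alive across restarts and can"
                    + " OOM the NodeManager";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // 100 days, as in the report above.
        System.out.println(checkDebugDelay(TimeUnit.DAYS.toSeconds(100)));
    }
}
```

Such a check would surface the misconfiguration in the NM log long before the recovery state grows large enough to exhaust the heap.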
[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375332#comment-15375332 ] Varun Saxena commented on YARN-5364: Passes for me. I tried changing the repository path (so that all jars are downloaded) and it still works. [~naganarasimha...@apache.org], probably the repository from which the jar was to be downloaded was down at that time. > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-5364.01.patch, screenshot-1.png > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies on mapreduce artifacts through HBase and phoenix. Although it's > not causing builds to fail, it's not good hygiene.
[jira] [Comment Edited] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts
[ https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375332#comment-15375332 ] Varun Saxena edited comment on YARN-5364 at 7/13/16 4:47 PM: - Passes for me. I tried changing the repository path (so that all jars are downloaded again) and it still works. [~naganarasimha...@apache.org], probably the repository from which the jar was to be downloaded was down at that time. was (Author: varun_saxena): Passes for me. I tried changing the repository path (so that all jars are downloaded) and even then it works. Probably [~naganarasimha...@apache.org] at that time the repository from where the jar was to be downloaded from, may have been down. > timelineservice modules have indirect dependencies on mapreduce artifacts > - > > Key: YARN-5364 > URL: https://issues.apache.org/jira/browse/YARN-5364 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.0.0-alpha1 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Minor > Attachments: YARN-5364.01.patch, screenshot-1.png > > > The new timelineservice and timelineservice-hbase-tests modules have indirect > dependencies on mapreduce artifacts through HBase and phoenix. Although it's > not causing builds to fail, it's not good hygiene.