[jira] [Commented] (YARN-3464) Race condition in LocalizerRunner kills localizer before localizing all resources
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513122#comment-14513122 ]

Karthik Kambatla commented on YARN-3464:

Just committed to trunk and branch-2. Thanks [~zxu] for the patch, and [~jlowe] for your inputs.

Race condition in LocalizerRunner kills localizer before localizing all resources

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Attachments: YARN-3464.000.patch, YARN-3464.001.patch

Race condition in LocalizerRunner causes container localization timeout.
Currently LocalizerRunner will kill the ContainerLocalizer when the pending list for LocalizerResourceRequestEvent is empty.
{code}
} else if (pending.isEmpty()) {
  action = LocalizerAction.DIE;
}
{code}
If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the ContainerLocalizer due to the empty pending list, this LocalizerResourceRequestEvent will never be handled.
Without ContainerLocalizer, LocalizerRunner#update will never be called.
The container will stay in the LOCALIZING state until it is killed by the AM due to TASK_TIMEOUT.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
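The race in the YARN-3464 description can be illustrated with a deliberately simplified, single-threaded model. This is only a sketch under stated assumptions: the class `LocalizerRaceSketch`, its methods, and the `Action` enum are hypothetical stand-ins, not the NodeManager classes, and the sketch shows the failure mode only, not the committed fix.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the race; NOT the actual NodeManager code.
// All names here are hypothetical and exist only for illustration.
class LocalizerRaceSketch {
    enum Action { LIVE, DIE }

    private final Queue<String> pending = new ArrayDeque<>();
    private boolean localizerAlive = true;

    // Models a LocalizerResourceRequestEvent arriving for this container.
    void addResource(String resource) {
        pending.add(resource);
    }

    // Models the update call, which only runs when the localizer heartbeats.
    Action update() {
        if (!localizerAlive) {
            throw new IllegalStateException("localizer is dead; no heartbeat can arrive");
        }
        if (pending.isEmpty()) {
            localizerAlive = false;   // same decision as the quoted snippet
            return Action.DIE;
        }
        pending.remove();             // hand one resource to the localizer
        return Action.LIVE;
    }

    // Requests that can no longer be served because the localizer was told to die.
    int strandedRequests() {
        return localizerAlive ? 0 : pending.size();
    }

    public static void main(String[] args) {
        LocalizerRaceSketch runner = new LocalizerRaceSketch();
        runner.addResource("job.jar");
        System.out.println(runner.update());           // LIVE
        System.out.println(runner.update());           // DIE
        runner.addResource("late.xml");                // arrives after DIE
        System.out.println(runner.strandedRequests()); // 1
    }
}
```

In this model the late request stays in the queue forever, which corresponds to the container remaining in LOCALIZING until the timeout.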
[jira] [Updated] (YARN-3464) Race condition in LocalizerRunner kills localizer before localizing all resources
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-3464:

Summary: Race condition in LocalizerRunner kills localizer before localizing all resources (was: Race condition in LocalizerRunner causes container localization timeout.)

Race condition in LocalizerRunner kills localizer before localizing all resources

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Attachments: YARN-3464.000.patch, YARN-3464.001.patch
[jira] [Commented] (YARN-3464) Race condition in LocalizerRunner causes container localization timeout.
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513106#comment-14513106 ]

Karthik Kambatla commented on YARN-3464:

+1

Race condition in LocalizerRunner causes container localization timeout.

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Attachments: YARN-3464.000.patch, YARN-3464.001.patch
[jira] [Commented] (YARN-3464) Race condition in LocalizerRunner kills localizer before localizing all resources
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513116#comment-14513116 ]

Hudson commented on YARN-3464:

FAILURE: Integrated in Hadoop-trunk-Commit #7679 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7679/])
YARN-3464. Race condition in LocalizerRunner kills localizer before localizing all resources. (Zhihai Xu via kasha) (kasha: rev 47279c3228185548ed09c36579b420225e4894f5)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/LocalizationEventType.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt

Race condition in LocalizerRunner kills localizer before localizing all resources

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Attachments: YARN-3464.000.patch, YARN-3464.001.patch
[jira] [Commented] (YARN-3464) Race condition in LocalizerRunner kills localizer before localizing all resources
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513247#comment-14513247 ]

Gera Shegalov commented on YARN-3464:

We might need to tweak checkstyle rules. There are a bunch of 80-column-limit violations that seem to come from the import statements.

Race condition in LocalizerRunner kills localizer before localizing all resources

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3464.000.patch, YARN-3464.001.patch
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Inigo Goiri updated YARN-3458:

Attachment: YARN-3458-5.patch

Adding unit tests.

CPU resource monitoring in Windows

Key: YARN-3458
URL: https://issues.apache.org/jira/browse/YARN-3458
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Affects Versions: 2.7.0
Environment: Windows
Reporter: Inigo Goiri
Assignee: Inigo Goiri
Priority: Minor
Labels: containers, metrics, windows
Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch
Original Estimate: 168h
Remaining Estimate: 168h

The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
This was left open by YARN-3122.
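To make the "1 jiffy=1ms" idea concrete, here is a minimal sketch of how a CPU usage percentage could be derived from cumulative CPU time in milliseconds. It does not use the real CpuTimeTracker API; the class `CpuPercentSketch`, its `sample` method, and the `UNAVAILABLE` value of -1 are assumptions made for illustration only.

```java
// Hypothetical sketch; it does not reproduce Hadoop's CpuTimeTracker.
class CpuPercentSketch {
    // Assumed sentinel for "no measurement yet".
    static final float UNAVAILABLE = -1f;

    private long lastCpuMs = -1;
    private long lastWallMs = -1;

    // cumulativeCpuMs: total CPU time consumed by the process tree, in ms
    //                  (the "1 jiffy = 1 ms" convention).
    // wallClockMs:     the wall-clock time at which the sample was taken.
    float sample(long cumulativeCpuMs, long wallClockMs) {
        float percent = UNAVAILABLE;
        if (lastWallMs >= 0 && wallClockMs > lastWallMs) {
            // CPU time used in the interval divided by the interval length.
            percent = 100f * (cumulativeCpuMs - lastCpuMs) / (wallClockMs - lastWallMs);
        }
        lastCpuMs = cumulativeCpuMs;
        lastWallMs = wallClockMs;
        return percent;
    }

    public static void main(String[] args) {
        CpuPercentSketch tracker = new CpuPercentSketch();
        System.out.println(tracker.sample(1000, 10000)); // -1.0 (first sample)
        System.out.println(tracker.sample(1500, 11000)); // 50.0
        System.out.println(tracker.sample(3500, 12000)); // 200.0 (two busy cores)
    }
}
```

With this definition the value can exceed 100 on multi-core machines, since the CPU time of all threads is summed.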
[jira] [Commented] (YARN-3464) Race condition in LocalizerRunner kills localizer before localizing all resources
[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513153#comment-14513153 ]

zhihai xu commented on YARN-3464:

thanks [~kasha] for the review and committing the patch, thanks [~jlowe] for the valuable feedback.

Race condition in LocalizerRunner kills localizer before localizing all resources

Key: YARN-3464
URL: https://issues.apache.org/jira/browse/YARN-3464
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3464.000.patch, YARN-3464.001.patch
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513199#comment-14513199 ]

Hadoop QA commented on YARN-3458:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 32s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 5m 27s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. |
| | | 43m 21s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728247/YARN-3458-5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8b69c82 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7504/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7504/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7504/testReport/ |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7504/console |

This message was automatically generated.

CPU resource monitoring in Windows

Key: YARN-3458
URL: https://issues.apache.org/jira/browse/YARN-3458
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Affects Versions: 2.7.0
Environment: Windows
Reporter: Inigo Goiri
Assignee: Inigo Goiri
Priority: Minor
Labels: containers, metrics, windows
Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch
Original Estimate: 168h
Remaining Estimate: 168h
[jira] [Commented] (YARN-3484) Fix up yarn top shell code
[ https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513135#comment-14513135 ]

Allen Wittenauer commented on YARN-3484:

* Variables that are local to a function should be declared local.
* Avoid using mixed case, as per the shell programming guidelines.
* yarnTopArgs is effectively a global. It should either be renamed to YARN_foo (or similar) so that it does not pollute the shell name space, or, as another approach, set_yarn_top_args could be processed as a subshell, reading its input directly, to avoid the global entirely.
* set_yarn_top_args should be named hadoop_something so as not to pollute the shell name space.
* nit: technically, TERM isn't guaranteed to be set on all OSes under all workable modes, since it is the login process' responsibility to set it. However, almost all modern systems do set it and it's fairly reliable. I think it's OK to leave the check, but I wanted to make this comment here for future readers in case they hit the situation where TERM wasn't set for their particular system. Yes, that situation was thought about, but honestly, upgrade.

Fix up yarn top shell code

Key: YARN-3484
URL: https://issues.apache.org/jira/browse/YARN-3484
Project: Hadoop YARN
Issue Type: Bug
Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Varun Vasudev
Attachments: YARN-3484.001.patch

We need to do some work on yarn top's shell code.
a) Just checking for TERM isn't good enough. We really need to check the return on tput, especially since the output will not be a number but an error string which will likely blow up the java code in horrible ways.
b) All the single bracket tests should be double brackets to force the bash built-in.
c) I think I'd rather see the shell portion in a function since it's rather large. This will allow for args, etc, to get local'ized and clean up the case statement.
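Point a) above worries about tput emitting an error string instead of a number. Independently of the shell-side check, the Java side could parse such a value defensively. The following is only a sketch: `TerminalSizeSketch`, `parseDimension`, and the fallback policy are hypothetical and not taken from the yarn top code or the attached patch.

```java
// Hypothetical helper; name and fallback policy are assumptions,
// not part of the yarn top implementation.
class TerminalSizeSketch {
    // Returns the parsed positive dimension, or the fallback when the
    // input is missing, non-numeric (e.g. a tput error string), or <= 0.
    static int parseDimension(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDimension("80", 132));                     // 80
        System.out.println(parseDimension("tput: unknown terminal", 132)); // 132
    }
}
```

This guards only the parsing step; checking the return code of tput in the shell, as the issue asks, is still needed to know why the value was unusable.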
[jira] [Comment Edited] (YARN-3172) MR-279: Write a simple Java application
[ https://issues.apache.org/jira/browse/YARN-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513367#comment-14513367 ]

Allen Wittenauer edited comment on YARN-3172 at 4/27/15 12:48 AM:

Welp, I'm committing this to trunk if test-patch says it is still good to go.

was (Author: aw):
Welp, I'm committing this to trunk.

MR-279: Write a simple Java application

Key: YARN-3172
URL: https://issues.apache.org/jira/browse/YARN-3172
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sharad Agarwal
Assignee: Devaraj K
Attachments: MAPREDUCE-2720.patch

Currently, for isolation purposes, many simple Java applications run in the cluster as a 1-map-only job (e.g. Oozie). This is not really required with nextgen Hadoop (MRv2), where *non-MR* apps are first class and easy to write.
A simple Hadoop Java app can be written which runs in the cluster in the user space.
[jira] [Commented] (YARN-3172) MR-279: Write a simple Java application
[ https://issues.apache.org/jira/browse/YARN-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513391#comment-14513391 ]

Hadoop QA commented on YARN-3172:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. |
| {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 5m 27s | There were no new checkstyle issues. |
| {color:blue}0{color} | shellcheck | 5m 27s | Shellcheck was not available. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| | | 39m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12523089/MAPREDUCE-2720.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle shellcheck |
| git revision | trunk / 884 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7505/artifact/patchprocess/whitespace.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7505/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7505/console |

This message was automatically generated.

MR-279: Write a simple Java application

Key: YARN-3172
URL: https://issues.apache.org/jira/browse/YARN-3172
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sharad Agarwal
Assignee: Devaraj K
Attachments: MAPREDUCE-2720.patch
[jira] [Updated] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3363:

Attachment: YARN-3363.001.patch

add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.

Key: YARN-3363
URL: https://issues.apache.org/jira/browse/YARN-3363
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Labels: metrics, supportability
Attachments: YARN-3363.000.patch, YARN-3363.001.patch

Add localization and container launch time to ContainerMetrics at the NM to show this timing information for each active container.
Currently ContainerMetrics has the container's actual memory usage (YARN-2984), actual CPU usage (YARN-3122), resource and pid (YARN-3022). It would be better to also have localization and container launch time in ContainerMetrics for each active container.
[jira] [Updated] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3363:

Attachment: (was: YARN-3363.001.patch)

add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.

Key: YARN-3363
URL: https://issues.apache.org/jira/browse/YARN-3363
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Labels: metrics, supportability
Attachments: YARN-3363.000.patch
[jira] [Commented] (YARN-3172) MR-279: Write a simple Java application
[ https://issues.apache.org/jira/browse/YARN-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513367#comment-14513367 ]

Allen Wittenauer commented on YARN-3172:

Welp, I'm committing this to trunk.

MR-279: Write a simple Java application

Key: YARN-3172
URL: https://issues.apache.org/jira/browse/YARN-3172
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sharad Agarwal
Assignee: Devaraj K
Attachments: MAPREDUCE-2720.patch
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Inigo Goiri updated YARN-3458:

Attachment: YARN-3458-6.patch

Fixing checkstyle errors. Personally, I think this is very strict but I just followed it.

CPU resource monitoring in Windows

Key: YARN-3458
URL: https://issues.apache.org/jira/browse/YARN-3458
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Affects Versions: 2.7.0
Environment: Windows
Reporter: Inigo Goiri
Assignee: Inigo Goiri
Priority: Minor
Labels: containers, metrics, windows
Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch
Original Estimate: 168h
Remaining Estimate: 168h
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-3448:

Attachment: YARN-3448.13.patch

Add Rolling Time To Lives Level DB Plugin Capabilities

Key: YARN-3448
URL: https://issues.apache.org/jira/browse/YARN-3448
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.13.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch

For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize the throughput and minimize locking time.

Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize the read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. It can also help with I/O to have the entity and index databases on separate disks.

Rolling DBs for entity and index DBs. 99.9% of the data are in these two sections, at a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, and to do artificial paging.

Relax the synchronous writes constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes that can be much faster.

Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in such a way that will trend towards sequential write performance over random write performance.
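The rolling-DB idea in the YARN-3448 description (place each entity's events in a bucket chosen by start time, then drop whole buckets once they age out) can be sketched as follows. This is an illustration under assumptions, not the attached patch: `RollingDbSketch`, its fixed rolling period, and the in-memory map standing in for separate leveldb instances are all hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: each map entry stands in for one rolling database.
class RollingDbSketch {
    private final long periodMs;
    private final TreeMap<Long, List<String>> buckets = new TreeMap<>();

    RollingDbSketch(long periodMs) {
        this.periodMs = periodMs;
    }

    // Start of the rolling period that contains the given start time.
    long bucketStart(long startTimeMs) {
        return startTimeMs - (startTimeMs % periodMs);
    }

    // All events of an entity go to the bucket chosen by the entity's start time.
    void put(long entityStartTimeMs, String event) {
        buckets.computeIfAbsent(bucketStart(entityStartTimeMs), k -> new ArrayList<>())
               .add(event);
    }

    // Drops every bucket whose whole period ended at or before (now - ttl).
    // Removing a bucket models deleting a database directory instead of
    // deleting records one at a time. Returns the number of buckets dropped.
    int evictExpired(long nowMs, long ttlMs) {
        long cutoff = nowMs - ttlMs;
        NavigableMap<Long, List<String>> expired = buckets.headMap(cutoff - periodMs, true);
        int dropped = expired.size();
        expired.clear(); // clearing the view removes the entries from buckets
        return dropped;
    }

    int liveBuckets() {
        return buckets.size();
    }

    public static void main(String[] args) {
        RollingDbSketch db = new RollingDbSketch(3_600_000L); // one-hour buckets
        db.put(1L, "e1");
        db.put(3_600_500L, "e2");
        db.put(7_200_001L, "e3");
        System.out.println(db.evictExpired(10_800_000L, 3_600_000L)); // 2
        System.out.println(db.liveBuckets());                         // 1
    }
}
```

Eviction here costs one operation per bucket rather than one per record, which is the property the description relies on to shorten the time the write lock is held.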
[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513509#comment-14513509 ]

Gera Shegalov commented on YARN-3491:

We should switch to {{io.nativeio.NativeIO.POSIX#getFstat}} as the implementation in {{RawLocalFileSystem}} to get rid of the shell-based implementation for FileStatus.

PublicLocalizer#addResource is too slow.

Key: YARN-3491
URL: https://issues.apache.org/jira/browse/YARN-3491
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
Attachments: YARN-3491.000.patch, YARN-3491.001.patch, YARN-3491.002.patch

Based on the profiling, the bottleneck in PublicLocalizer#addResource is getInitializedLocalDirs. getInitializedLocalDirs calls checkLocalDir.
checkLocalDir is very slow, taking about 10+ ms.
The total delay will be approximately the number of local dirs * 10+ ms.
This delay will be added for each public resource localization.
Because PublicLocalizer#addResource is slow, the thread pool can't be fully utilized. Instead of doing public resource localization in parallel (multithreading), public resource localization is serialized most of the time.
Also, PublicLocalizer#addResource runs in the Dispatcher thread, so the Dispatcher thread will be blocked by PublicLocalizer#addResource for a long time.
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513455#comment-14513455 ]

Hadoop QA commented on YARN-3363:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 5m 22s | The applied patch generated 3 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 5m 49s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 46m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728288/YARN-3363.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1a2459b |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7506/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7506/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7506/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7506/console |

This message was automatically generated.

add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.

Key: YARN-3363
URL: https://issues.apache.org/jira/browse/YARN-3363
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: zhihai xu
Assignee: zhihai xu
Labels: metrics, supportability
Attachments: YARN-3363.000.patch, YARN-3363.001.patch
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513463#comment-14513463 ]

Hadoop QA commented on YARN-3458:

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 5m 31s | There were no new checkstyle issues. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 2m 6s | Tests passed in hadoop-yarn-common. |
| | | 43m 17s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12728291/YARN-3458-6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1a2459b |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7507/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7507/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7507/console |

This message was automatically generated.

CPU resource monitoring in Windows

Key: YARN-3458
URL: https://issues.apache.org/jira/browse/YARN-3458
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Affects Versions: 2.7.0
Environment: Windows
Reporter: Inigo Goiri
Assignee: Inigo Goiri
Priority: Minor
Labels: containers, metrics, windows
Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch
Original Estimate: 168h
Remaining Estimate: 168h

The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122.
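Editorial sketch for the YARN-3458 description above: one way a CPU usage percentage could be derived from cumulative CPU time when 1 jiffy is treated as 1 ms. This is a standalone illustration, not the attached patch and not Hadoop's CpuTimeTracker; the class `CpuUsageSketch`, its methods, and the `UNAVAILABLE` sentinel value are hypothetical names chosen for the example. It assumes the caller can sample a process tree's cumulative CPU time together with a wall-clock timestamp.

```java
import java.math.BigInteger;

/** Hypothetical stand-in for illustration only; not Hadoop's CpuTimeTracker. */
class CpuUsageSketch {
    /** Assumed sentinel returned until two samples have been taken. */
    static final float UNAVAILABLE = -1f;

    private final long jiffyLengthMs;      // 1 on the assumption "1 jiffy = 1 ms"
    private BigInteger lastCpuMs = null;   // cumulative CPU time at previous sample
    private long lastSampleMs = -1;        // wall-clock time of previous sample
    private float usagePercent = UNAVAILABLE;

    CpuUsageSketch(long jiffyLengthMs) {
        this.jiffyLengthMs = jiffyLengthMs;
    }

    /** Records a new sample of cumulative CPU jiffies taken at wall-clock time nowMs. */
    void update(BigInteger elapsedJiffies, long nowMs) {
        BigInteger cpuMs = elapsedJiffies.multiply(BigInteger.valueOf(jiffyLengthMs));
        if (lastCpuMs != null && nowMs > lastSampleMs) {
            long cpuDeltaMs = cpuMs.subtract(lastCpuMs).longValue();
            // CPU time consumed divided by wall time elapsed, as a percentage.
            // The value can exceed 100 when several cores are busy.
            usagePercent = cpuDeltaMs * 100f / (nowMs - lastSampleMs);
        }
        lastCpuMs = cpuMs;
        lastSampleMs = nowMs;
    }

    float getUsagePercent() {
        return usagePercent;
    }

    public static void main(String[] args) {
        CpuUsageSketch tracker = new CpuUsageSketch(1);
        tracker.update(BigInteger.valueOf(1000), 10_000L);
        tracker.update(BigInteger.valueOf(1500), 11_000L);
        // 500 ms of CPU over 1000 ms of wall time
        System.out.println(tracker.getUsagePercent()); // prints 50.0
    }
}
```

With a 1 ms jiffy the multiplication is a no-op, which is presumably why the description can feed millisecond CPU times to a jiffy-based tracker unchanged.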
[jira] [Commented] (YARN-3172) MR-279: Write a simple Java application
[ https://issues.apache.org/jira/browse/YARN-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513480#comment-14513480 ]

Allen Wittenauer commented on YARN-3172:

OK, looks like it needs to get rebased because this was before the 900th re-arrangement of the dir structure. :(

MR-279: Write a simple Java application

Key: YARN-3172
URL: https://issues.apache.org/jira/browse/YARN-3172
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sharad Agarwal
Assignee: Devaraj K
Attachments: MAPREDUCE-2720.patch

Currently, for isolation purposes, many simple Java applications run in the cluster as a 1-map-only job (e.g. Oozie). This is not really required with nextgen Hadoop (MRv2), where *non-MR* apps are first class and easy to write. A simple Hadoop Java app can be written that runs in the cluster in user space.
[jira] [Updated] (YARN-3172) MR-279: Write a simple Java application
[ https://issues.apache.org/jira/browse/YARN-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-3172:

Labels: newbie (was: )

MR-279: Write a simple Java application

Key: YARN-3172
URL: https://issues.apache.org/jira/browse/YARN-3172
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Sharad Agarwal
Assignee: Devaraj K
Labels: newbie
Attachments: MAPREDUCE-2720.patch

Currently, for isolation purposes, many simple Java applications run in the cluster as a 1-map-only job (e.g. Oozie). This is not really required with nextgen Hadoop (MRv2), where *non-MR* apps are first class and easy to write. A simple Hadoop Java app can be written that runs in the cluster in user space.