[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil
[ https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Paduano updated YARN-4435: -- Attachment: YARN-4435.00.patch.txt Attaching code for the current version of these from my branch. They won't compile until HADOOP-12563 is committed. > Add RM Delegation Token DtFetcher Implementation for DtUtil > --- > > Key: YARN-4435 > URL: https://issues.apache.org/jira/browse/YARN-4435 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Attachments: YARN-4435.00.patch.txt, proposed_solution > > > Add a class to yarn project that implements the DtFetcher interface to return > a RM delegation token object. > I attached a proposed class implementation that does this, but it cannot be > added as a patch until the interface is merged in HADOOP-12563 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231657#comment-15231657 ] Akira AJISAKA commented on YARN-4630: - Hi [~sarutak], thank you for updating the patch.
{code}
public int compareTo(ContainerId other) {
  int result = this.getApplicationAttemptId().compareTo(
      other.getApplicationAttemptId()) == 0;
{code}
Would you remove {{== 0}} to fix the compilation error? > Remove useless boxing/unboxing code (Hadoop YARN) > - > > Key: YARN-4630 > URL: https://issues.apache.org/jira/browse/YARN-4630 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Attachments: YARN-4630.0.patch, YARN-4630.1.patch > > > There are lots of places where useless boxing/unboxing occur. > To avoid performance issue, let's remove them.
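The stray {{== 0}} makes the right-hand side a boolean, which cannot be assigned to the int {{result}}. A minimal sketch of the corrected comparison; the class and its fields are illustrative stand-ins, not the real YARN ContainerId:

```java
// Illustrative stand-in for ContainerId's comparison: compare the attempt
// first, then tie-break on the container id. No "== 0", and Long.compare
// avoids the boxing this JIRA is about.
public class ContainerIdSketch implements Comparable<ContainerIdSketch> {
  private final long attemptId;   // stand-in for getApplicationAttemptId()
  private final long containerId; // stand-in for getContainerId()

  public ContainerIdSketch(long attemptId, long containerId) {
    this.attemptId = attemptId;
    this.containerId = containerId;
  }

  @Override
  public int compareTo(ContainerIdSketch other) {
    int result = Long.compare(this.attemptId, other.attemptId);
    if (result == 0) {
      // Same attempt: order by container id.
      result = Long.compare(this.containerId, other.containerId);
    }
    return result;
  }
}
```

Using {{Long.compare}} rather than {{Long.valueOf(...).compareTo(...)}} is also in the spirit of the boxing/unboxing cleanup.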
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231648#comment-15231648 ] Li Lu commented on YARN-3816: - Thanks [~sjlee0]! Yes, I did use the words "accumulation" and "aggregation" interchangeably, and I can certainly correct this in the follow-up patch. However, I think you may have overlooked one key change in the latest (v5) patch (due to the word "accumulation"). In this patch, my main focus is to implement aggregation (aggregating container metrics to application level), even though the API for TimelineMetric is called "accumulate". Aggregating metrics from all containers to one application is performed in the timeline collector, using the internal Map called aggregationGroups. In this map, we maintain the aggregation status for each "group" (right now I used entity_type since all CONTAINER type entities will be mapped together). Within one aggregation group, we maintain metric status for each entity_id (each container id). On aggregation, for each aggregation group (like the CONTAINER entity type), for each existing metric (like HDFS_BYTES_WRITE), we iterate through all known entity ids (containers) and perform the aggregation operation defined in the metric's realtimeAggregationOp field. Contrary to your comment, accumulation is actually the part missing in this draft patch. When we update the state for one container on one metric, we simply replace the previous one (in AggregationStatus#update, {{aggrRow.getPerEntityMetrics().put(entityId, m);}}). We can add methods to perform time-based accumulation later (reusing the "accumulate" method's name). BTW, by default a metric's aggregation op field is set to NOP so that we're not keeping them in the aggregation status table. Given the tight timeframe, we can certainly sync up offline if needed. Thanks! 
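A rough sketch of the rollup described above (per aggregation group, per metric, keep the latest value per entity id and fold them all on aggregation). The class and method names below are illustrative stand-ins, not the actual patch code, and summation stands in for the metric's realtimeAggregationOp:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative container-to-application rollup: for each aggregation group
// (entity type) keep the latest value of each metric per entity id, then
// fold across all entity ids.
public class AggregationSketch {
  // group -> metric name -> entity id -> latest value
  private final Map<String, Map<String, Map<String, Long>>> groups =
      new HashMap<>();

  // Replace the previous value for this entity (no time-based accumulation,
  // matching the behavior described in the comment above).
  public void update(String group, String metric, String entityId, long value) {
    groups.computeIfAbsent(group, g -> new HashMap<>())
          .computeIfAbsent(metric, m -> new HashMap<>())
          .put(entityId, value);
  }

  // Aggregate one metric across all entities in a group by summing
  // (a stand-in for applying the metric's realtimeAggregationOp).
  public long aggregateSum(String group, String metric) {
    long total = 0;
    for (long v : groups.getOrDefault(group, Collections.emptyMap())
                        .getOrDefault(metric, Collections.emptyMap())
                        .values()) {
      total += v;
    }
    return total;
  }
}
```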
> [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231626#comment-15231626 ] Sangjin Lee commented on YARN-3816: --- Onto code-level comments... First, there seem to be checkstyle violations and javadoc errors. Could you please fix them? (RealTimeAggregationOperation.java) - As mentioned in the above comment, this really appears to be about "accumulation". We should rename things here to "accumulation". - l.36: We don’t need to update {{state}} for MAX? Could you explain how {{state}} is supposed to be used? - I don’t think I understand {{SUM.exec()}}. Maybe some comment in the code (or a JIRA comment) could be helpful. - l.116: There is no need for a separate interface ({{Operation}}). The {{exec()}} method can simply belong in {{RealTimeAggregationOperation}} itself. (TimelineMetric.java) - l.105: This is an unrelated issue with this patch, but I’m not sure what’s going on with the else clause in l.104-106 in the {{setValues()}} method. Could you look at it and fix it if it is not right? - l.183: we should use {{StringBuilder}} (unsynchronized) over {{StringBuffer}} (synchronized) - l.191: I would say use “get” instead of “retrieve” for these method names... - l.192: nit: since this is an enum, {{==}} is sufficient (no need for {{equals()}}); the same for l.206 and 220 - l.196: It should be {{firstKey()}} because it’s reverse sorted, right? We’re looking for the latest timestamp. - l.205: the name “key” is a bit obscure. What we mean is the timestamp for the value. Should we rename this to {{getSingleDataTimestamp()}}? (TimelineMetricCalculator.java) - l.38: typo: “Number be compared” -> “Number to be compared”. The same with l.71 - l.41: nit: need a space before the opening brace - l.76: same as above - l.68: We stated that we will support only longs as the metric value type for now (and maybe double later). In any case, I think it’s safe to say we need not support ints. 
Should we simplify this by casting ints to longs if we see them? - l.109: do we need to check for both being null? - l.145: I think we should check to ensure time > 0. Also, it might be easier if we specify time as {{long}} instead of {{Long}}. - l.151: wouldn’t it be easier if we called {{sum()}} to handle the summation part instead of implementing the summing logic here again? - l.194: nit: space before the brace (TimelineCollector.java) - l.59-69: nit: let’s group all statics at the beginning and place instance members after them - the executor should be shut down properly in {{serviceStop()}}, or it will leave those threads hanging around - l.129: nit: we don’t need to specify {{TimelineCollector}} in calling the static methods (in several places here) - l.218: nit: let’s surround it with {{LOG.isDebugEnabled()}} - l.237-241: This is a bit of an anti-pattern for using a {{ConcurrentHashMap}}. The issue is if multiple threads find that {{aggrRow}} is null and try to put their copies to the {{aggregateTable}} map, there is a race. As a result, you may start operating on an instance that will not be stored in the map eventually. You should use the {{putIfAbsent()}} method to make sure multiple threads always agree on the stored instance after the operation. - l.247: nit: let’s use == - l.258: nit: let’s use == (TimelineReaderWebServices.java) - Are the imports needed? There are no other code changes in this file? 
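The race flagged at l.237-241 and the {{putIfAbsent()}} remedy can be sketched as follows; {{AggrRow}} is a hypothetical stand-in for the row type in the patch, not the real class:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentSketch {
  // Hypothetical stand-in for the per-group aggregation row.
  static class AggrRow {
    final String group;
    AggrRow(String group) { this.group = group; }
  }

  private final ConcurrentMap<String, AggrRow> aggregateTable =
      new ConcurrentHashMap<>();

  // Anti-pattern (racy): two threads can both observe null, both put, and
  // one thread then keeps operating on a row that was overwritten:
  //
  //   AggrRow row = aggregateTable.get(key);
  //   if (row == null) {
  //     row = new AggrRow(key);
  //     aggregateTable.put(key, row);  // may clobber another thread's row
  //   }

  // Correct: putIfAbsent guarantees all threads converge on the single
  // instance that actually ended up in the map.
  public AggrRow getOrCreate(String key) {
    AggrRow row = aggregateTable.get(key);
    if (row == null) {
      AggrRow candidate = new AggrRow(key);
      row = aggregateTable.putIfAbsent(key, candidate);
      if (row == null) {
        row = candidate; // our candidate won the race
      }
    }
    return row;
  }
}
```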
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231612#comment-15231612 ] Sangjin Lee commented on YARN-3816: --- [~gtCarrera9], thanks much for posting an updated patch for this! I just had an opportunity to go over it fairly completely once, and have some high-level comments as well as more detailed code feedback. Starting with high-level comments: 1. "aggregation" v. "accumulation" This came up several times on this JIRA, and I think the distinction is crucial in getting this completed. I believe what we agreed on is as follows: "aggregation" is about rolling up metrics from a child type to a parent type (e.g. rolling up metrics from containers to applications), and "accumulation" is about computing/deriving secondary values based on the *time dimension* (e.g. area under the curve or the running maximum). Those two are rather independent, and we should not mix them. Unfortunately in the latest patch, these two terms are used very much interchangeably. Can we make that distinction clear and rename all the classes/methods/variables that pertain to accumulation from "aggregation" to "accumulation"? It would be good if we reserve "aggregation" to child-to-parent rollups. 2. container-to-application aggregation Related to above, this JIRA was meant to implement 2 features: (1) "aggregating" metrics from containers to applications, and (2) "accumulating" metrics for (certain) entity types. Both should be done. However, in the latest patch, *I do not see (1) being done*. In other words, I didn't find code that rolls up metrics from the container entities and sets them to the parent application entities. Am I missing something? The previous patches did implement that. Without this, we will *NOT* see things like container CPU or memory being rolled up to applications, and as a consequence to flow runs, and so on. This is a MUST. 
IMO that is a separate functionality from the accumulation. I think we should do it clearly and explicitly. And the rolled-up metrics should be set onto the application entities. 3. time-based accumulation We also said that the time-based accumulation should be conditional on a configuration (see [the previous patch|https://issues.apache.org/jira/secure/attachment/12761120/YARN-3816-YARN-2928-v4.patch]). I see that condition is not there in the latest patch. Can we please make the accumulation conditional on that configuration? Also, this was an issue with the previous patches and I think it exists with the latest patch. It appears that we are doing the time-based accumulation for *all metrics for all entity types*. We might want to think about whether that would be OK. There are some performance and storage implications in doing so. Also, I raised some semantic issues with that idea. See the previous comment [here|https://issues.apache.org/jira/browse/YARN-3816?focusedCommentId=15067321&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15067321]. I'm not 100% certain if the latest patch has the same issue or not although I suspect it might. 4. new YARN_APPLICATION_AGGREGATION entity type I also raised a concern whether we should use a separate entity type for this. First of all, the "aggregation" (from containers to applications) *should* go to the actual application type. Second, even for "accumulation" you might want to think about what you want to do. I assume that the accumulated metrics (YARN_APPLICATION_AGGREGATION) are being written to the entities table. Note that they are not really considered as part of the application, and are not available for application queries. So there is an implication for queries. And they are not going to be aggregated up to the flow runs. I know this is a lot to parse, and obviously there is much history in this discussion. 
However, it would help to replay the main discussions up to this point so that we don't lose these important points. Thanks much!
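To make the distinction above concrete: aggregation folds values across entities at one point in time, while accumulation derives secondary values along the time dimension of a single metric (e.g. the running maximum, or the area under the curve). A hypothetical sketch of the latter, not taken from any of the attached patches:

```java
// Illustrative time-based accumulation for one metric: running maximum and
// "area under the curve" (each value multiplied by how long it was held).
// All names are hypothetical.
public class AccumulationSketch {
  private long runningMax = Long.MIN_VALUE;
  private long area = 0;   // sum of value * duration held
  private long lastTs = -1;
  private long lastValue = 0;

  public void record(long timestamp, long value) {
    if (lastTs >= 0) {
      // The previous value was in effect from lastTs until now.
      area += lastValue * (timestamp - lastTs);
    }
    runningMax = Math.max(runningMax, value);
    lastTs = timestamp;
    lastValue = value;
  }

  public long getRunningMax() { return runningMax; }
  public long getArea() { return area; }
}
```

Aggregation, by contrast, would fold such per-entity metrics upward (container -> application), independent of the time axis.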
[jira] [Updated] (YARN-4865) Track Reserved resources in ResourceUsage and QueueCapacities
[ https://issues.apache.org/jira/browse/YARN-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4865: -- Attachment: 0003-YARN-4865-addendum.patch Thank you [~leftnoteasy] and [~karams]. Attached a new patch. The test case covers the below scenario: - One container is reserved for app2 on node1. - Killed a running container of app1, thus making enough space on node1 for the app2 container. - The reserved container became allocated. Verified the new metrics against the same. Please suggest if this is fine or not. I will raise another ticket to handle cases like node removal etc. > Track Reserved resources in ResourceUsage and QueueCapacities > -- > > Key: YARN-4865 > URL: https://issues.apache.org/jira/browse/YARN-4865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0 > > Attachments: 0001-YARN-4865.patch, 0002-YARN-4865.patch, > 0003-YARN-4865-addendum.patch, 0003-YARN-4865.patch > > > As discussed in YARN-4678, capture reserved capacity separately in > QueueCapcities for better tracking.
[jira] [Commented] (YARN-4425) Pluggable sharing policy for Partition Node Label resources
[ https://issues.apache.org/jira/browse/YARN-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231438#comment-15231438 ] Hadoop QA commented on YARN-4425: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4425 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12780557/YARN-4425.20160105-1.patch | | JIRA Issue | YARN-4425 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10989/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Pluggable sharing policy for Partition Node Label resources > --- > > Key: YARN-4425 > URL: https://issues.apache.org/jira/browse/YARN-4425 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: ResourceSharingPolicyForNodeLabelsPartitions-V1.pdf, > ResourceSharingPolicyForNodeLabelsPartitions-V2.pdf, > YARN-4425.20160105-1.patch > > > As part of support for sharing NonExclusive Node Label partitions in > YARN-3214, NonExclusive partitions are shared only to Default Partitions and > also have fixed rule when apps in default partitions makes use of resources > of any NonExclusive partitions. > There are many scenarios where in we require pluggable policy like > MutliTenant, Hierarchical etc.. 
where in each partition can determine when > they want to share the resources to other paritions and when other partitions > wants to use resources from others > More details in the attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4425) Pluggable sharing policy for Partition Node Label resources
[ https://issues.apache.org/jira/browse/YARN-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231431#comment-15231431 ] Wangda Tan commented on YARN-4425: -- [~Naganarasimha], [~xinxianyin], Read the doc and took a very high level look at the patch, sorry for the huge delays. Some thoughts: The newly added policy seems like a backdoor to the scheduler's capacity management: under "NON_EXCLUSIVE" mode, the scheduler completely depends on the configured policy to decide who gets the next resources. And we need to give the policy API clearer semantics: in the existing scheduler, resources are allocated on the requested partition, and only request.partition = "" gets a chance to allocate on other partitions when the non-exclusive criteria are met. In the new API, resources could be allocated on any partition regardless of the requested partition (depending on the policy implementation). This conflicts with our existing APIs for node partitions. To me, sharing resources between partitions is itself not a clear API: you can say partition A has total resource = 100G and partition B has total resource = 200G. But you cannot say "under some conditions, partition A can use idle resources from partition B" -- because a partition is not the entity that consumes resources. 
Instead, it would be clearer to me to say: 1) a queue's shares of partitions could be dynamically adjusted, OR 2) a node's partition could be dynamically updated.
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231430#comment-15231430 ] Li Lu commented on YARN-4928: - My only concern is the import line raised by [~djp]. The findbugs warning is unrelated to the fix here. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch, > YARN-4928.003.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
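The fix the description suggests (mirroring HDFS-6189) amounts to building the test root from a colon-free literal instead of the build directory. A simplified sketch, with the validity check reduced to just the colon test that trips up Windows drive-letter paths; {{isValidName}} here is a stand-in, not the real DFSUtil method:

```java
public class TestRootSketch {
  // Roots derived from test.build.data look like "/C:/hdp/.../test-dir/..."
  // on Windows; the drive-letter colon makes them invalid HDFS names.
  // A fixed "/tmp/..." root sidesteps the problem.
  public static String testRoot(String testName) {
    return "/tmp/" + testName;
  }

  // Simplified stand-in for DFSUtil.isValidName: HDFS path components may
  // not contain ":".
  public static boolean isValidName(String path) {
    return !path.contains(":");
  }
}
```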
[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot
[ https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231422#comment-15231422 ] Hudson commented on YARN-4756: -- FAILURE: Integrated in Hadoop-trunk-Commit #9576 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9576/]) YARN-4756. Unnecessary wait in Node Status Updater during reboot. (Eric (kasha: rev e82f961a3925aadf9e53a009820a48ba9e4f78b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java > Unnecessary wait in Node Status Updater during reboot > - > > Key: YARN-4756 > URL: https://issues.apache.org/jira/browse/YARN-4756 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-4756.001.patch, YARN-4756.002.patch, > YARN-4756.003.patch, YARN-4756.004.patch, YARN-4756.005.patch > > > The startStatusUpdater thread waits for the isStopped variable to be set to > true, but it is waiting for the next heartbeat. During a reboot, the next > heartbeat will not come and so the thread waits for a timeout. Instead, we > should notify the thread to continue so that it can check the isStopped > variable and exit without having to wait for a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
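The pattern described in the issue (notify the waiting heartbeat loop so it re-checks {{isStopped}} instead of waiting out the heartbeat interval) is the standard guarded wait/notify idiom; the names below are illustrative, not the actual NodeStatusUpdaterImpl code:

```java
// Sketch of the shutdown pattern: the status-updater loop waits on a
// monitor between heartbeats; stop() flips the flag and notifies so the
// thread exits immediately rather than timing out.
public class StatusUpdaterSketch {
  private final Object monitor = new Object();
  private volatile boolean stopped = false;

  // Returns the number of heartbeats "sent" before stopping.
  public int runLoop(long heartbeatIntervalMs) {
    int heartbeats = 0;
    synchronized (monitor) {
      while (!stopped) {
        heartbeats++; // stand-in for sending a heartbeat
        try {
          // Wait for the next interval OR an early wake-up from stop().
          monitor.wait(heartbeatIntervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return heartbeats;
  }

  public void stop() {
    synchronized (monitor) {
      stopped = true;
      monitor.notifyAll(); // wake the loop so it re-checks stopped
    }
  }
}
```

Without the {{notifyAll()}}, a reboot would leave the loop parked for up to a full heartbeat interval, which is exactly the unnecessary wait the JIRA removes.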
[jira] [Commented] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231401#comment-15231401 ] Wangda Tan commented on YARN-4902: -- Thanks for the reviews, [~asuresh]/[~subru]: Feedback on some of your points: bq. rename the allocation-id proposed here to maybe resource-request-id? Since the id will be a part of the allocated resource (no matter whether it's an allocation or a container), is it better to rename it to "allocation-request-id"? (cc: [~vinodkv]) bq. Now that we are reworking the API from scratch, can we add a cost function for the ResourceRequest? I feel Priority is being overloaded to express scheduling cost, preemption cost, container types etc. Could you elaborate "scheduling cost"? "Preemption cost" should depend on the running process in the context of a reusable allocation, correct? Since a user can use the same slot to run important and less-important workloads. "Container type" should be a part of tag per my understanding. bq. grok why we need both maximum number of allocations & maximum concurrency Maximum concurrency is to avoid one app taking over the whole cluster. It only limits the total concurrent resources used by one app. bq. The current Schedulers will be extremely hard pressed to efficiently handle GUTS API requests. I guess this should act as a good motivation to consider an application centric approach as opposed to the current node centric one. Agree, global scheduling becomes important if we want to support such an API. > [Umbrella] Generalized and unified scheduling-strategies in YARN > > > Key: YARN-4902 > URL: https://issues.apache.org/jira/browse/YARN-4902 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Wangda Tan > Attachments: Generalized and unified scheduling-strategies in YARN > -v0.pdf > > > Apache Hadoop YARN's ResourceRequest mechanism is the core part of the YARN's > scheduling API for applications to use. 
The ResourceRequest mechanism is a > powerful API for applications (specifically ApplicationMasters) to indicate > to YARN what size of containers are needed, and where in the cluster etc. > However a host of new feature requirements are making the API increasingly > more and more complex and difficult to understand by users and making it very > complicated to implement within the code-base. > This JIRA aims to generalize and unify all such scheduling-strategies in YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4771) Some containers can be skipped during log aggregation after NM restart
[ https://issues.apache.org/jira/browse/YARN-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231386#comment-15231386 ] Junping Du commented on YARN-4771: -- 002 patch LGTM. An additional fix: we'd better use MonotonicTime in place of System.currentTimeMillis() for tracking the timeout - just an optional comment; we can address it here or in a separate JIRA. > Some containers can be skipped during log aggregation after NM restart > -- > > Key: YARN-4771 > URL: https://issues.apache.org/jira/browse/YARN-4771 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Priority: Critical > Attachments: YARN-4771.001.patch, YARN-4771.002.patch > > > A container can be skipped during log aggregation after a work-preserving > nodemanager restart if the following events occur: > # Container completes more than > yarn.nodemanager.duration-to-track-stopped-containers milliseconds before the > restart > # At least one other container completes after the above container and before > the restart
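The monotonic-time suggestion can be sketched with {{System.nanoTime()}}, which is unaffected by wall-clock adjustments, unlike {{System.currentTimeMillis()}}; the class below is an illustration, not Hadoop code:

```java
// Timeout tracking on a monotonic clock: wall-clock jumps (NTP sync,
// manual adjustment) cannot shorten or stretch the timeout.
public class TimeoutTracker {
  private final long deadlineNanos;

  public TimeoutTracker(long timeoutMs) {
    this.deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
  }

  public boolean expired() {
    // Compare via subtraction so the check stays correct even if
    // nanoTime() wraps around.
    return System.nanoTime() - deadlineNanos >= 0;
  }
}
```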
[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot
[ https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231376#comment-15231376 ] Karthik Kambatla commented on YARN-4756: The patch seems reasonable to me. +1. Also, quite excited to see a +1 from Hadoop QA. Checking this in. > Unnecessary wait in Node Status Updater during reboot > - > > Key: YARN-4756 > URL: https://issues.apache.org/jira/browse/YARN-4756 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-4756.001.patch, YARN-4756.002.patch, > YARN-4756.003.patch, YARN-4756.004.patch, YARN-4756.005.patch > > > The startStatusUpdater thread waits for the isStopped variable to be set to > true, but it is waiting for the next heartbeat. During a reboot, the next > heartbeat will not come and so the thread waits for a timeout. Instead, we > should notify the thread to continue so that it can check the isStopped > variable and exit without having to wait for a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231365#comment-15231365 ] Subru Krishnan commented on YARN-4902: -- Thanks [~vinodkv], [~leftnoteasy], [~jianhe], [~vvasudev] and others for putting up this proposal. I went through it & it seems quite relevant with the increasing range of workloads we have to support in the near future in YARN. I have a few high level comments below. Obviously this needs a lot more thought/discussion. *GUTS API feedback*: - I want to echo [~asuresh]'s comment on consolidating _Allocation-ID_ with _Request-ID_ proposed in YARN-4879, and [~vinodkv] seems to agree based on his [comments|https://issues.apache.org/jira/browse/YARN-4879?focusedCommentId=15220475]. - Now that we are reworking the API from scratch, can we add a *cost function* for the _ResourceRequest_? I feel _Priority_ is being overloaded to express scheduling cost, preemption cost, container types etc. - I am not able to grok why we need both _maximum number of allocations & maximum concurrency_, especially considering that this is on top of the existing _numContainers_. Won't they conflict? - Can we have a section at the end to explicitly list the mandatory and optional attributes at the _Application_ and _ResourceRequests_ level? The document is rather long and so a snapshot summary will be good. - Overall the proposed API seems quite powerful but we should make sure that we don't end up trading simplicity for functionality IMHO (this is based on the feedback we received for YARN-1051). For instance, the typical MapReduce scenario feels more dense when compared to the current APIs but should be more easily expressible if we sacrifice on additional flexibility that the GUTS API provides. So it'll also be good to have examples of how current constrained asks will look when made through the GUTS API. 
*Time aspects*: - I agree that we should consolidate the time-related placement conditions with the work done in YARN-1051. - + capital 1 on your observation that _The reservations feature proposed at YARN-1051 can pave a great way for implementing minimumconcurrency_ :). *Scheduler enhancements*: - The current _Schedulers_ will be extremely hard pressed to efficiently handle GUTS API requests. I guess this should act as a good motivation to consider an _application centric_ approach as opposed to the current _node centric_ one, as we have occasionally discussed with [~asuresh], [~kasha], [~curino] et al.
[jira] [Comment Edited] (YARN-4865) Track Reserved resources in ResourceUsage and QueueCapacities
[ https://issues.apache.org/jira/browse/YARN-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231126#comment-15231126 ] Wangda Tan edited comment on YARN-4865 at 4/7/16 11:56 PM: --- [~sunilg], It seems this patch needs one more fix: {code} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java index 9a74c22..df57787 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java @@ -1322,14 +1322,6 @@ public void completedContainer(Resource clusterResource, // Book-keeping if (removed) { - - // track reserved resource for metrics, for normal container - // getReservedResource will be null. 
- Resource reservedRes = rmContainer.getReservedResource(); - if (reservedRes != null && !reservedRes.equals(Resources.none())) { -decReservedResource(node.getPartition(), reservedRes); - } - // Inform the ordering policy orderingPolicy.containerReleased(application, rmContainer); diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java index cf1b3e0..558fc53 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java @@ -247,6 +247,8 @@ public synchronized boolean unreserve(Priority priority, // Update reserved metrics queue.getMetrics().unreserveResource(getUser(), rmContainer.getReservedResource()); + + queue.decReservedResource(node.getPartition(), rmContainer.getReservedResource()); return true; } return false; {code} We need the above change to make sure that an allocation from a reserved container correctly deducts the reserved resource. [~sunilg], could you also add a few tests? Some other cases come to mind that we need to consider: - Nodes lost/disconnected: we need to deduct reserved resources on such nodes. (I think this should be covered by the completedContainer code path.) The above can be addressed in a separate JIRA. 
(Thanks [~karams] for reporting this issue.)
[jira] [Commented] (YARN-4865) Track Reserved resources in ResourceUsage and QueueCapacities
[ https://issues.apache.org/jira/browse/YARN-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231361#comment-15231361 ] Sunil G commented on YARN-4865: --- Thanks [~leftnoteasy]. I will add some more tests with this suggested change. > Track Reserved resources in ResourceUsage and QueueCapacities > -- > > Key: YARN-4865 > URL: https://issues.apache.org/jira/browse/YARN-4865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0 > > Attachments: 0001-YARN-4865.patch, 0002-YARN-4865.patch, > 0003-YARN-4865.patch > > > As discussed in YARN-4678, capture reserved capacity separately in > QueueCapcities for better tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
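The bookkeeping idea behind the patch above — deduct the reserved resource at unreserve time so every path (reservation fulfilled or cancelled) goes through one place — can be sketched with a toy tracker. This is not the actual LeafQueue/FiCaSchedulerApp code; the class and its long-based accounting are illustrative stand-ins for YARN's Resource objects.

```java
// Toy stand-in for queue-level reserved-resource accounting.
// Real YARN tracks full Resource objects per partition; this sketch uses
// a single memory counter to show the invariant: every reserve() must be
// balanced by exactly one unreserve(), regardless of why the reservation ends.
public class ReservedResourceTracker {
    private long reservedMB;

    public void reserve(long mb) {
        reservedMB += mb;
    }

    // Deducting here (rather than in the container-completed path) keeps
    // the "reservation turned into an allocation" and "reservation
    // cancelled" cases from double-counting or missing the deduction.
    public boolean unreserve(long mb) {
        if (mb > reservedMB) {
            return false; // nothing (or not enough) reserved; reject
        }
        reservedMB -= mb;
        return true;
    }

    public long getReservedMB() { return reservedMB; }
}
```

The node-lost case mentioned above would map to calling unreserve for any reservation sitting on the departed node.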
[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231351#comment-15231351 ] Karthik Kambatla commented on YARN-4927: Thanks for picking this up, [~bibinchundatt]. A few comments: # AdminService: since the test is in the same class, {{refreshAll}} could be package-private instead of public. Also, you might want to mark it @VisibleForTesting along with a comment that it could otherwise be private. # TestRMHA ## The new variable counter could be private to the anonymous AdminService class we are creating in the test. ## The assertion when the RM fails to transition to active seems backwards. Shouldn't we be checking {{e.getMessage().contains("")}}? ## I wonder if we are even running into that exception. If the test is expecting the exception, we should add an {{Assert.fail}} right after the call to transition to active. ## Also, I am not a fan of checking just the message verbatim. Can we check if the exception is {{ServiceFailedException}} and preferably the expected RM state (Active/Standby)? ## Not introduced in this patch, but the asserts in the test should have a corresponding error message to explain what exactly is going on. > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4927.patch > > > YARN-3893 adds this test, which relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
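The expected-exception pattern the review asks for (fail if no exception, then check the exception's type rather than just its message) can be sketched as below. The ServiceFailedException here is a local stand-in class, not the real Hadoop one, and transitionToActive() is a dummy that always throws; in the actual test the Assert.fail/catch structure would wrap the RM's transition call.

```java
// Sketch of the test structure discussed above. All names are local
// stand-ins; in JUnit the boolean returns would be Assert.fail(...) and
// assertTrue(...) calls instead.
public class ExpectedExceptionPattern {
    static class ServiceFailedException extends RuntimeException {
        ServiceFailedException(String msg) { super(msg); }
    }

    // Dummy transition that always fails, standing in for the RM call
    // that the test expects to throw.
    static void transitionToActive() {
        throw new ServiceFailedException("refreshAll failed");
    }

    public static boolean runCheck() {
        try {
            transitionToActive();
            return false;  // equivalent of Assert.fail("expected exception")
        } catch (ServiceFailedException e) {
            // The catch clause itself verifies the exception type;
            // the message check is secondary, per the review comment.
            return e.getMessage().contains("refreshAll");
        }
    }
}
```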
[jira] [Updated] (YARN-4931) Preempted resources go back to the same application
[ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Crawford updated YARN-4931: - Description: Sometimes a queue that needs resources causes preemption - but the preempted containers are just allocated right back to the application that just released them! Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it: {code} 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt res for queue root.default: resDueToMinShare = , resDueToFairShare = 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=) from queue root.milesc 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_01, containerId=container_1460047303577_0002_01_001038, resource= 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED {code} But then a moment later, application 2 gets the container right back: {code} 2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, used and available after allocation 2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED 
2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING {code} This results in new applications being unable to even get an AM, and never starting > Preempted resources go back to the same application > --- > > Key: YARN-4931 > URL: https://issues.apache.org/jira/browse/YARN-4931 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: Miles Crawford > Attach
[jira] [Updated] (YARN-4931) Preempted resources go back to the same application
[ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Crawford updated YARN-4931: - Description: Sometimes a queue that needs resources causes preemption - but the preempted containers are just allocated right back to the application that just released them! Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it: {code} 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt res for queue root.default: resDueToMinShare = , resDueToFairShare = 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=) from queue root.milesc 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_01, containerId=container_1460047303577_0002_01_001038, resource= 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED {code} But then a moment later, application 2 gets the container right back: {code} 2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, used and available after allocation 2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED 
2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING {code} This results in new applications being unable to even get an AM, and never starting at all. > Preempted resources go back to the same application > --- > > Key: YARN-4931 > URL: https://issues.apache.org/jira/browse/YARN-4931 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: Miles Crawford >
[jira] [Updated] (YARN-4931) Preempted resources go back to the same application
[ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Crawford updated YARN-4931: - Attachment: resourcemanager.log Log snippet showing the behavior in detail. > Preempted resources go back to the same application > --- > > Key: YARN-4931 > URL: https://issues.apache.org/jira/browse/YARN-4931 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: Miles Crawford > Attachments: resourcemanager.log > > > Sometimes a queue that needs resources causes preemption - but the preempted > containers are just allocated right back to the application that just > released them! > Here is a tiny application (0007) that wants resources, and a container is > preempted from application 0002 to satisfy it: > {code} > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler > (FairSchedulerUpdateThread): Should preempt res for > queue root.default: resDueToMinShare = , > resDueToFairShare = > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler > (FairSchedulerUpdateThread): Preempting container (prio=1res= vCores:1>) from queue root.milesc > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics > (FairSchedulerUpdateThread): Non-AM container preempted, current > appAttemptId=appattempt_1460047303577_0002_01, > containerId=container_1460047303577_0002_01_001038, resource= vCores:1> > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container > Transitioned from RUNNING to KILLED > {/code} > But then a moment later, application 2 gets the container right back: > {code} > 2016-04-07 21:08:13,844 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode > (ResourceManager 
Event Processor): Assigned container > container_1460047303577_0002_01_001039 of capacity > on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 > containers, used and > available after allocation > 2016-04-07 21:08:14,555 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 > Container Transitioned from ALLOCATED to ACQUIRED > 2016-04-07 21:08:14,845 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (ResourceManager Event Processor): container_1460047303577_0002_01_001039 > Container Transitioned from ACQUIRED to RUNNING > {/code} > This results in new applications being unable to even get an AM, and never > starting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4931) Preempted resources go back to the same application
[ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miles Crawford updated YARN-4931: - Summary: Preempted resources go back to the same application (was: Preempted resources go back to ) > Preempted resources go back to the same application > --- > > Key: YARN-4931 > URL: https://issues.apache.org/jira/browse/YARN-4931 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: Miles Crawford > > Sometimes a queue that needs resources causes preemption - but the preempted > containers are just allocated right back to the application that just > released them! > Here is a tiny application (0007) that wants resources, and a container is > preempted from application 0002 to satisfy it: > {code} > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler > (FairSchedulerUpdateThread): Should preempt res for > queue root.default: resDueToMinShare = , > resDueToFairShare = > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler > (FairSchedulerUpdateThread): Preempting container (prio=1res= vCores:1>) from queue root.milesc > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics > (FairSchedulerUpdateThread): Non-AM container preempted, current > appAttemptId=appattempt_1460047303577_0002_01, > containerId=container_1460047303577_0002_01_001038, resource= vCores:1> > 2016-04-07 21:08:13,463 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container > Transitioned from RUNNING to KILLED > {/code} > But then a moment later, application 2 gets the container right back: > {code} > 2016-04-07 21:08:13,844 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode > (ResourceManager Event 
Processor): Assigned container > container_1460047303577_0002_01_001039 of capacity > on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 > containers, used and > available after allocation > 2016-04-07 21:08:14,555 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 > Container Transitioned from ALLOCATED to ACQUIRED > 2016-04-07 21:08:14,845 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl > (ResourceManager Event Processor): container_1460047303577_0002_01_001039 > Container Transitioned from ACQUIRED to RUNNING > {/code} > This results in new applications being unable to even get an AM, and never > starting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4931) Preempted resources go back to
Miles Crawford created YARN-4931: Summary: Preempted resources go back to Key: YARN-4931 URL: https://issues.apache.org/jira/browse/YARN-4931 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.2 Reporter: Miles Crawford Sometimes a queue that needs resources causes preemption - but the preempted containers are just allocated right back to the application that just released them! Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it: {code} 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt res for queue root.default: resDueToMinShare = , resDueToFairShare = 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=) from queue root.milesc 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_01, containerId=container_1460047303577_0002_01_001038, resource= 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED {code} But then a moment later, application 2 gets the container right back: {code} 2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, used and available after allocation 2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): 
container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED 2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING {code} This results in new applications being unable to even get an AM, and never starting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
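The behavior in the logs above can be modeled with a toy allocation loop. This is an illustrative sketch, not FairScheduler code: a scheduler with no memory of which application was just preempted simply offers the freed capacity to whoever is next in scheduling order — which can be the preempted application itself. The second method shows one possible mitigation (skipping the just-preempted app for that assignment round); whether that is the right fix for YARN is an open question on this issue.

```java
import java.util.List;

// Toy model of the reallocation loop; names are illustrative only.
public class PreemptionLoopSketch {

    // A naive scheduler ignores who was just preempted and hands the
    // freed container to the first app in scheduling order.
    public static String assignFreedContainer(List<String> schedulingOrder,
                                              String preemptedApp) {
        return schedulingOrder.get(0);
    }

    // One possible mitigation: skip the just-preempted app this round so
    // the starved app (the one preemption was triggered for) gets served.
    public static String assignWithBackoff(List<String> schedulingOrder,
                                           String preemptedApp) {
        for (String app : schedulingOrder) {
            if (!app.equals(preemptedApp)) {
                return app;
            }
        }
        return preemptedApp; // only the preempted app is asking
    }
}
```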
[jira] [Updated] (YARN-4851) Metric improvements for ATS v1.5 storage
[ https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4851: Attachment: YARN-4851-trunk.001.patch First draft of the patch. In this patch I've added some metrics for the ATS v1.5 storage. Specifically: For overall system usage: - Number of read requests to the summary storage - Number of read requests to the detail storage - Number of entities scanned by the EntityGroupFS storage into the summary storage - Accumulated time spent scanning for new apps in the active directory - Accumulated time spent reading summary data into the summary storage Caching performance: - Number of cache storage refreshes (cache reloads). This can be compared to the number of read requests to the detail storage to understand how useful the caching layer is for a specific cluster workload. - Accumulated time spent refreshing cache storages. Log cleaner/purging: - Number of dirs purged by the storage - Accumulated time spent on log cleaning. > Metric improvements for ATS v1.5 storage > > > Key: YARN-4851 > URL: https://issues.apache.org/jira/browse/YARN-4851 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4851-trunk.001.patch > > > We can add more metrics to the ATS v1.5 storage systems, including purging, > cache hit/misses, read latency, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
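The relationship between the proposed counters can be shown with a minimal sketch. This uses plain fields instead of Hadoop's metrics2 classes, and all names (EntityGroupFSMetricsSketch, onDetailRead, etc.) are illustrative, not the ones in the attached patch.

```java
// Plain-Java stand-in for a few of the metrics listed above; a real
// implementation would use Hadoop metrics2 counters and timers instead.
public class EntityGroupFSMetricsSketch {
    private long summaryReads;
    private long detailReads;
    private long cacheRefreshes;
    private long cacheRefreshTimeMs;

    public void onSummaryRead()  { summaryReads++; }
    public void onDetailRead()   { detailReads++; }

    public void onCacheRefresh(long elapsedMs) {
        cacheRefreshes++;
        cacheRefreshTimeMs += elapsedMs;  // accumulated refresh time
    }

    // Comparing refreshes to detail-storage reads is the signal described
    // above: a low ratio means the cache absorbed most detail reads.
    public double refreshToDetailReadRatio() {
        return detailReads == 0 ? 0.0 : (double) cacheRefreshes / detailReads;
    }

    public long getSummaryReads() { return summaryReads; }
    public long getCacheRefreshTimeMs() { return cacheRefreshTimeMs; }
}
```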
[jira] [Updated] (YARN-4733) [YARN-3368] Initial commit of new YARN web UI
[ https://issues.apache.org/jira/browse/YARN-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4733: - Summary: [YARN-3368] Initial commit of new YARN web UI (was: [YARN-3368] Commit initial web UI patch to branch: YARN-3368) > [YARN-3368] Initial commit of new YARN web UI > - > > Key: YARN-4733 > URL: https://issues.apache.org/jira/browse/YARN-4733 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: YARN-3368 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231152#comment-15231152 ] Hadoop QA commented on YARN-4514: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4514 does not apply to YARN-3368. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797310/YARN-4514-YARN-3368.4.patch | | JIRA Issue | YARN-4514 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10987/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch > > > We have several configurations are hard-coded, for example, RM/ATS addresses, > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reopened YARN-4514: -- > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch > > > We have several configurations are hard-coded, for example, RM/ATS addresses, > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-4514. -- Resolution: Fixed Have to resolve and reopen to set status to be patch available. > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch > > > We have several configurations are hard-coded, for example, RM/ATS addresses, > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and fix licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4849: - Summary: [YARN-3368] cleanup code base, integrate web UI related build to mvn, and fix licenses. (was: [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.) > [YARN-3368] cleanup code base, integrate web UI related build to mvn, and fix > licenses. > --- > > Key: YARN-4849 > URL: https://issues.apache.org/jira/browse/YARN-4849 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4849-YARN-3368.1.patch, > YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, > YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, > YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231135#comment-15231135 ] Wangda Tan commented on YARN-4849: -- Thanks for review, [~sunilg]. ASF license issue is not caused by the patch. Committing to branch-3368. > [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add > licenses. > --- > > Key: YARN-4849 > URL: https://issues.apache.org/jira/browse/YARN-4849 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4849-YARN-3368.1.patch, > YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, > YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, > YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231137#comment-15231137 ] Hadoop QA commented on YARN-4927: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 59s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 49s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 155m 19s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.webapp.TestRMWithCSRFFilter | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.y
[jira] [Commented] (YARN-4865) Track Reserved resources in ResourceUsage and QueueCapacities
[ https://issues.apache.org/jira/browse/YARN-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231126#comment-15231126 ] Wangda Tan commented on YARN-4865: -- [~sunilg], It seems this patch needs one more fix: {code} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java index 9a74c22..df57787 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java @@ -1322,14 +1322,6 @@ public void completedContainer(Resource clusterResource, // Book-keeping if (removed) { - - // track reserved resource for metrics, for normal container - // getReservedResource will be null. 
- Resource reservedRes = rmContainer.getReservedResource(); - if (reservedRes != null && !reservedRes.equals(Resources.none())) { -decReservedResource(node.getPartition(), reservedRes); - } - // Inform the ordering policy orderingPolicy.containerReleased(application, rmContainer); diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java index cf1b3e0..558fc53 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java @@ -247,6 +247,8 @@ public synchronized boolean unreserve(Priority priority, // Update reserved metrics queue.getMetrics().unreserveResource(getUser(), rmContainer.getReservedResource()); + + queue.decReservedResource(node.getPartition(), rmContainer.getReservedResource()); return true; } return false; {code} We need the above change to make sure an allocation from a reserved container correctly deducts the reserved resource. [~sunilg], could you add a few tests as well? Some other cases come to mind that we need to consider: - Nodes lost / disconnected: we need to deduct reserved resources on such nodes. (I think it should be covered by the completedContainer code path.) The above can be addressed in a separate JIRA. 
> Track Reserved resources in ResourceUsage and QueueCapacities > -- > > Key: YARN-4865 > URL: https://issues.apache.org/jira/browse/YARN-4865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0 > > Attachments: 0001-YARN-4865.patch, 0002-YARN-4865.patch, > 0003-YARN-4865.patch > > > As discussed in YARN-4678, capture reserved capacity separately in > QueueCapcities for better tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
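The change discussed in the comment above restores symmetric bookkeeping: whatever the reservation path adds must be deducted both when a reservation is cancelled and when the reserved container is finally allocated. A toy sketch of that invariant (class and method names here are illustrative; the real accounting lives in ResourceUsage/QueueCapacities):

```java
import java.util.HashMap;
import java.util.Map;

public class ReservedResourceTracker {
    // Reserved memory (MB) per node partition; a stand-in for the per-queue
    // reserved-capacity tracking in QueueCapacities.
    private final Map<String, Long> reservedMb = new HashMap<>();

    void incReservedResource(String partition, long mb) {
        reservedMb.merge(partition, mb, Long::sum);
    }

    // Must be called both when a reservation is cancelled and when a
    // reserved container is converted into an allocation, or the counter
    // leaks (the bug the patch above fixes).
    void decReservedResource(String partition, long mb) {
        reservedMb.merge(partition, -mb, Long::sum);
    }

    long getReservedMb(String partition) {
        return reservedMb.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        ReservedResourceTracker t = new ReservedResourceTracker();
        t.incReservedResource("", 4096);   // container reserved on default partition
        t.decReservedResource("", 4096);   // reservation converted to allocation
        System.out.println(t.getReservedMb(""));   // 0
    }
}
```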
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231088#comment-15231088 ] Wangda Tan commented on YARN-4781: -- [~sunilg], We should make sure this JIRA contains infrastructure that can be used by other intra-queue preemption policies, such as priority-based. [~milesc], Thanks for sharing your use cases; hopefully you won't need to create one queue for each job any more after this feature :) > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > We introduced the fairness queue policy in YARN-3319, which lets large > applications make progress without starving small applications. However, if a > large application takes the queue’s resources and its containers have long > lifespans, small applications could still wait a long time for resources and > SLAs cannot be guaranteed. > Instead of waiting for applications to release resources on their own, we need > to preempt resources in queues with the fairness policy enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
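For intuition on what a fairness-based intra-queue policy computes, here is a grossly simplified sketch that splits a queue's usage into equal fair shares and marks anything above the share as preemptable. Real policies also weigh demand, priorities, and container lifespans; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class FairnessPreemptionSketch {
    // Given per-app usage (MB) in one queue, compute how much to preempt
    // from each app so every app could reach an equal fair share.
    static Map<String, Long> toPreempt(Map<String, Long> usageMb) {
        long total = usageMb.values().stream().mapToLong(Long::longValue).sum();
        long fairShare = total / usageMb.size();
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : usageMb.entrySet()) {
            // Only apps above their fair share contribute preemption candidates.
            result.put(e.getKey(), Math.max(0, e.getValue() - fairShare));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> usage = new LinkedHashMap<>();
        usage.put("bigApp", 9000L);    // long-running app holding the queue
        usage.put("smallApp", 0L);     // starving
        usage.put("otherApp", 0L);     // starving
        // bigApp exceeds its 3000 MB fair share by 6000 MB.
        System.out.println(toPreempt(usage));
    }
}
```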
[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.
[ https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230992#comment-15230992 ] Junping Du commented on YARN-4552: -- Hi [~vinodkv], I cannot reproduce this issue on the current 2.7 branch. I will double-check whether I missed something in the reproduction process. Let's remove the target version but keep it open for more investigation later. > NM ResourceLocalizationService should check and initialize local filecache > dir (and log dir) even if NM recover is enabled. > --- > > Key: YARN-4552 > URL: https://issues.apache.org/jira/browse/YARN-4552 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4552-v2.patch, YARN-4552.patch > > > In some cases, users clean up the localized file cache for debugging/trouble > shooting purposes during NM downtime. However, after bringing the NM back (with > recovery enabled), job submission can fail with an exception like the one > below: > {noformat} > Diagnostics: java.io.FileNotFoundException: File > /disk/12/yarn/local/filecache does not exist. > {noformat} > This is because we only create the filecache dir when recovery is not enabled > while ResourceLocalizationService gets initialized/started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
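The gist of the proposed fix is to make local-dir initialization unconditional on the recovery flag, since recovered state can claim a directory exists that an admin wiped while the NM was down. A hedged sketch of that check (not the actual ResourceLocalizationService code):

```java
import java.io.File;
import java.io.IOException;

public class LocalDirInit {
    // Ensure a local dir (e.g. the filecache dir) exists, regardless of
    // whether NM recovery is enabled. The bug described above skipped this
    // step when recovery was on, so a manually wiped dir was never recreated.
    static void ensureLocalDir(File dir) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("Cannot create local dir " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File filecache = new File(System.getProperty("java.io.tmpdir"),
                "filecache-demo");
        ensureLocalDir(filecache);   // run on every startup, recovery or not
        System.out.println(filecache.exists());   // true
    }
}
```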
[jira] [Updated] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.
[ https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4552: - Target Version/s: (was: 2.8.0, 2.7.3, 2.6.5) > NM ResourceLocalizationService should check and initialize local filecache > dir (and log dir) even if NM recover is enabled. > --- > > Key: YARN-4552 > URL: https://issues.apache.org/jira/browse/YARN-4552 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4552-v2.patch, YARN-4552.patch > > > In some cases, users clean up the localized file cache for debugging/trouble > shooting purposes during NM downtime. However, after bringing the NM back (with > recovery enabled), job submission can fail with an exception like the one > below: > {noformat} > Diagnostics: java.io.FileNotFoundException: File > /disk/12/yarn/local/filecache does not exist. > {noformat} > This is because we only create the filecache dir when recovery is not enabled > while ResourceLocalizationService gets initialized/started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4927: --- Assignee: Bibin A Chundatt (was: Karthik Kambatla) > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4927.patch > > > YARN-3893 adds this test, that relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4821) Have a separate NM timeline publishing-interval
[ https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230903#comment-15230903 ] Naganarasimha G R commented on YARN-4821: - Thanks for the comments [~vinodkv], bq. We should completely decouple these two. If the publishing-interval is configured to be not a multiple of the monitoring-interval, the publisher could only look at the last N values from the monitor before the last cycle. As we discussed in the meeting, IMHO it is much simpler for the user to configure just a multiple of the monitoring interval after which the ATS event for the resource usage will be published. Otherwise, the user needs to be made aware of the relation between the publishing interval and the monitoring interval. So it would be something like *monitoring interval = 3 seconds, publish frequency = 5*; then every 3*5 = 15 seconds, the average of 5 values will be published. Maybe I can come up with a WIP patch based on this and discuss whether it is fine. Will go through YARN-3332 before working on the patch. > Have a separate NM timeline publishing-interval > --- > > Key: YARN-4821 > URL: https://issues.apache.org/jira/browse/YARN-4821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > Currently the interval with which NM publishes container CPU and memory > metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose > default is 3 seconds. This is too aggressive. > There should be a separate configuration that controls how often > {{NMTimelinePublisher}} publishes container metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
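The proposal above (publish every N monitoring ticks, averaging the N buffered samples) can be sketched as follows; this is illustrative, not the actual NMTimelinePublisher code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AveragingPublisher {
    private final int publishFrequency;          // publish every N monitor ticks
    private final Deque<Long> window = new ArrayDeque<>();

    AveragingPublisher(int publishFrequency) {
        this.publishFrequency = publishFrequency;
    }

    // Called once per monitoring interval with the latest sample. Returns the
    // averaged value to publish every publishFrequency ticks, or null on the
    // ticks in between (nothing is published then).
    Long onSample(long memoryMb) {
        window.addLast(memoryMb);
        if (window.size() < publishFrequency) {
            return null;
        }
        long sum = 0;
        for (long v : window) {
            sum += v;
        }
        window.clear();
        return sum / publishFrequency;
    }

    public static void main(String[] args) {
        // monitoring interval = 3 s, publish frequency = 5 -> publish every 15 s
        AveragingPublisher p = new AveragingPublisher(5);
        Long out = null;
        for (long sample : new long[] {100, 200, 300, 400, 500}) {
            out = p.onSample(sample);
        }
        System.out.println(out);   // 300, the average of the last 5 samples
    }
}
```

With this shape the user configures only the multiple; the alternative Vinod suggests (a fully independent publishing interval) would instead average however many samples arrived since the last publish.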
[jira] [Commented] (YARN-4821) Have a separate NM timeline publishing-interval
[ https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230813#comment-15230813 ] Vinod Kumar Vavilapalli commented on YARN-4821: --- bq. This proposal is simply to use a different publishing interval just for the timeline publishing +1. We should completely decouple these two. If the publishing-interval is configured to be not a multiple of the monitoring-interval, the publisher could only look at the last N values from the monitor before the last cycle. Can you also please have a read at YARN-3332 and see if you can organize the code in a somewhat independent way? A related data point for deciding the interval itself - the Hadoop Metrics plugin pulls metrics from all of our daemons and pushes them out periodically - with a default value of 10 sec IIRC. This is the periodicity for most production clusters. Assuming adding container-metrics data to this still keeps the total outgoing data at the same or an immediately adjacent order of magnitude (say 250 metrics per NM + (50 containers * 50 metrics)), we should be okay with the same frequency. Anything more frequent will need careful benchmarking. > Have a separate NM timeline publishing-interval > --- > > Key: YARN-4821 > URL: https://issues.apache.org/jira/browse/YARN-4821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > Currently the interval with which NM publishes container CPU and memory > metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose > default is 3 seconds. This is too aggressive. > There should be a separate configuration that controls how often > {{NMTimelinePublisher}} publishes container metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4821) Have a separate NM timeline publishing-interval
[ https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4821: -- Summary: Have a separate NM timeline publishing-interval (was: have a separate NM timeline publishing interval) > Have a separate NM timeline publishing-interval > --- > > Key: YARN-4821 > URL: https://issues.apache.org/jira/browse/YARN-4821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > Currently the interval with which NM publishes container CPU and memory > metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose > default is 3 seconds. This is too aggressive. > There should be a separate configuration that controls how often > {{NMTimelinePublisher}} publishes container metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230620#comment-15230620 ] Sangjin Lee commented on YARN-3461: --- Thanks folks! > Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Fix For: YARN-2928 > > Attachments: YARN-3461-YARN-2928.01.patch, > YARN-3461-YARN-2928.02.patch, YARN-3461-YARN-2928.03.patch > > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230556#comment-15230556 ] Varun Saxena commented on YARN-3461: I have committed this to YARN-2928 branch. Thanks [~sjlee0] for your contribution and thanks [~Naganarasimha] and [~gtCarrera9] for additional reviews. > Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Fix For: YARN-2928 > > Attachments: YARN-3461-YARN-2928.01.patch, > YARN-3461-YARN-2928.02.patch, YARN-3461-YARN-2928.03.patch > > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230544#comment-15230544 ] Bibin A Chundatt commented on YARN-4927: [~kasha] Apologies for not considering the case where the default scheduler could be the FairScheduler. Attaching a patch to handle the same. > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: 0001-YARN-4927.patch > > > YARN-3893 adds this test, which relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4927: --- Attachment: 0001-YARN-4927.patch > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: 0001-YARN-4927.patch > > > YARN-3893 adds this test, that relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4929) Fix unit test case failures because of removing the minimum wait time for attempt.
[ https://issues.apache.org/jira/browse/YARN-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4929: --- Description: The following unit test cases fail because we removed the minimum wait time for attempts in YARN-4807 - TestAMRestart.testRMAppAttemptFailuresValidityInterval - TestApplicationMasterService.testResourceTypes - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche was: The following unit test cases failed because of we remove the minimum wait time for attempt. - TestAMRestart.testRMAppAttemptFailuresValidityInterval - TestApplicationMasterService.testResourceTypes - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche > Fix unit test case failures because of removing the minimum wait time for > attempt. > -- > > Key: YARN-4929 > URL: https://issues.apache.org/jira/browse/YARN-4929 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yufei Gu >Assignee: Yufei Gu > > The following unit test cases fail because we removed the minimum wait > time for attempts in YARN-4807 > - TestAMRestart.testRMAppAttemptFailuresValidityInterval > - TestApplicationMasterService.testResourceTypes > - TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers > - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForFairSche > - TestRMApplicationHistoryWriter.testRMWritingMassiveHistoryForCapacitySche -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230523#comment-15230523 ] Jonathan Maron commented on YARN-4757: -- Sounds like a good approach to me. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > Attachments: YARN-4757- Simplified discovery of services via DNS > mechanisms.pdf > > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230522#comment-15230522 ] Varun Saxena commented on YARN-3461: Latest patch LGTM. Will commit it shortly. > Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: YARN-3461-YARN-2928.01.patch, > YARN-3461-YARN-2928.02.patch, YARN-3461-YARN-2928.03.patch > > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230491#comment-15230491 ] Varun Vasudev commented on YARN-4757: - [~jmaron] - given the feedback and the scope of the changes involved here, I think we should just develop this in a branch and file sub tasks to ensure we address concerns like the ones Allen has raised. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > Attachments: YARN-4757- Simplified discovery of services via DNS > mechanisms.pdf > > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4736: -- Labels: (was: yarn-2928-1st-milestone) > Issues with HBaseTimelineWriterImpl > --- > > Key: YARN-4736 > URL: https://issues.apache.org/jira/browse/YARN-4736 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Vrushali C >Priority: Critical > Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, > threaddump.log > > > Faced some issues while running ATSv2 in single node Hadoop cluster and in > the same node had launched Hbase with embedded zookeeper. > # Due to some NPE issues i was able to see NM was trying to shutdown, but the > NM daemon process was not completed due to the locks. > # Got some exception related to Hbase after application finished execution > successfully. > will attach logs and the trace for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230433#comment-15230433 ] Hadoop QA commented on YARN-4928: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s {color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s {color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 56s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797535/YARN-4928.003.patch | | JIRA Issue | YARN-4928 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 16095ff576a1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/ha
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230422#comment-15230422 ] Junping Du commented on YARN-4928: -- The 003 patch looks good in general. A nit: we generally don't use the wildcard import pattern below. {noformat} +import org.apache.hadoop.fs.*; {noformat} CC [~gtCarrera9], who is the author of the related test case. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch, > YARN-4928.003.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230408#comment-15230408 ] Gergely Novák commented on YARN-4928: - Sorry, I used an older (incompatible) branch for the patch; v003 now works for branch-2.8 and trunk. > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch, > YARN-4928.003.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-4928: Attachment: YARN-4928.003.patch > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch, > YARN-4928.003.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
[ https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230349#comment-15230349 ] Varun Vasudev commented on YARN-4876: - Thanks for the document [~asuresh]! Here are my initial thoughts - {code} Add int field 'destroyDelay' to each 'StartContainerRequest':{code} I think we should avoid this for now - we should require that AMs that use initialize() must call destroy and AMs that call start with the ContainerLaunchContext can't call destroy. We can achieve that by adding the destroyDelay field you mentioned in your document but don't allow AMs to set it. If initialize is called, set destroyDelay internally to \-1, else to 0. I'm not saying we should drop the feature, just that we should come back to it once we've sorted out the lifecycle from an initialize->destroy perspective. {code} Modify 'StopContainerRequest' Record: Add boolean 'destroyContainer': {code} Similar to above - let's avoid mixing initialize/destroy with start/stop for now. {code} • Introduce a new 'ContainerEventType.START_CONTAINER' event type. • Introduce a new 'ContainerEventType.DESTROY_CONTAINER' event type. • The Container remains in the LOCALIZED state until it receives the 'START_CONTAINER' event. {code} Can you add a state machine transition diagram to explain how new states and events affect each other? {code} If 'initializeContainer' with a new ContainerLaunchContext is called by the AM while the Container is RUNNING, It is treated as a KILL_CONTAINER event followed by a CONTAINER_RESOURCE_CLEANUP and an INIT_CONTAINER event to kick of re-localization after which the Container will return to LOCALIZED state. {code} I'd really like to avoid this specific behavior. I think we should add an explicit re-initialize API. For a running process, ideally, we want to localize the upgraded bits while the container is running and then kill the existing process to minimize the downtime. 
For containers where localization can take a long time, forcing a kill and then a re-initialize adds quite a serious amount of downtime. Re-initialize and initialize will probably end up having differing behaviors. On a similar note, I think we might have to introduce a new "re-initializing/re-localizing/running-localizing state" which implies that a container is running but we are carrying out some background work. In addition, I don't think we can do a cleanup of resources during an upgrade. For services that have local state in the container work dir, we're essentially wiping away all the local state and forcing them to start from scratch. Just a clarification: when you mentioned CONTAINER_RESOURCE_CLEANUP, I'm assuming you meant CLEANUP_CONTAINER_RESOURCES. {code} • If 'intializeContainer' is called WITHOUT a new ContainerLaunchContext by the AM, it is considered a restart, and will follow the same code path as 'initializeContainer' with new ContainerLaunchContext, but will not perform a CONTAINER_RESOURCE_CLEANUP and INIT_CONTAINER. The Container process will be killed and the container will be returned to LOCALIZED state. • If 'startContainer' is called WITHOUT a new ContainerLaunchContext by the AM, it treated exactly as the above case, but it will also trigger a START_CONTAINER event. {code} Instead of forcing AMs to make two calls, why don't we just add a restart API that does everything you've outlined above? It's cleaner and we don't have to do as many condition checks. In addition, with a restart API we can do things like allowing AMs to specify a delay, or conditions for when the restart should happen. 
> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop > -- > > Key: YARN-4876 > URL: https://issues.apache.org/jira/browse/YARN-4876 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-4876-design-doc.pdf > > > Introduce *initialize* and *destroy* container API into the > *ContainerManagementProtocol* and decouple the actual start of a container > from the initialization. This will allow AMs to re-start a container without > having to lose the allocation. > Additionally, if the localization of the container is associated to the > initialize (and the cleanup with the destroy), This can also be used by > applications to upgrade a Container by *re-initializing* with a new > *ContainerLaunchContext* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
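The decoupled lifecycle discussed in this thread can be sketched as a toy state machine. State and event names are taken from the design-doc excerpts above, but this is illustrative only and is not NodeManager code:

```java
// Toy state machine: init/localize, start, kill, and destroy as separate
// transitions, so an AM can restart a container without losing the allocation.
enum ContainerState { NEW, LOCALIZED, RUNNING, DESTROYED }

enum ContainerEvent { INIT_CONTAINER, START_CONTAINER, KILL_CONTAINER, DESTROY_CONTAINER }

class ContainerLifecycle {
    ContainerState state = ContainerState.NEW;

    ContainerState handle(ContainerEvent e) {
        switch (e) {
            case INIT_CONTAINER:
                // Localization happens here; the container then waits in
                // LOCALIZED until it receives START_CONTAINER.
                if (state == ContainerState.NEW) state = ContainerState.LOCALIZED;
                break;
            case START_CONTAINER:
                // Start is decoupled from init.
                if (state == ContainerState.LOCALIZED) state = ContainerState.RUNNING;
                break;
            case KILL_CONTAINER:
                // Killing without destroy keeps localized resources, so the
                // AM can start the container again without re-localizing.
                if (state == ContainerState.RUNNING) state = ContainerState.LOCALIZED;
                break;
            case DESTROY_CONTAINER:
                // Destroy cleans up localized resources and ends the lifecycle.
                state = ContainerState.DESTROYED;
                break;
        }
        return state;
    }
}
```

Under this sketch, the restart API suggested in the comment would amount to KILL_CONTAINER followed by START_CONTAINER in a single call.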
[jira] [Comment Edited] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
[ https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230349#comment-15230349 ] Varun Vasudev edited comment on YARN-4876 at 4/7/16 2:52 PM: - Thanks for the document [~asuresh]! Here are my initial thoughts - {code} Add int field 'destroyDelay' to each 'StartContainerRequest':{code} I think we should avoid this for now - we should require that AMs that use initialize() must call destroy and AMs that call start with the ContainerLaunchContext can't call destroy. We can achieve that by adding the destroyDelay field you mentioned in your document but don't allow AMs to set it. If initialize is called, set destroyDelay internally to \-1, else to 0. I'm not saying we should drop the feature, just that we should come back to it once we've sorted out the lifecycle from an initialize->destroy perspective. {code} Modify 'StopContainerRequest' Record: Add boolean 'destroyContainer': {code} Similar to above - let's avoid mixing initialize/destroy with start/stop for now. {code} • Introduce a new 'ContainerEventType.START_CONTAINER' event type. • Introduce a new 'ContainerEventType.DESTROY_CONTAINER' event type. • The Container remains in the LOCALIZED state until it receives the 'START_CONTAINER' event. {code} Can you add a state machine transition diagram to explain how new states and events affect each other? {code} If 'initializeContainer' with a new ContainerLaunchContext is called by the AM while the Container is RUNNING, It is treated as a KILL_CONTAINER event followed by a CONTAINER_RESOURCE_CLEANUP and an INIT_CONTAINER event to kick of re-localization after which the Container will return to LOCALIZED state. {code} I'd really like to avoid this specific behavior. I think we should add an explicit re-initialize/re-localize API. 
For a running process, ideally, we want to localize the upgraded bits while the container is running and then kill the existing process to minimize the downtime. For containers where localization can take a long time, forcing a kill and then a re-initialize adds quite a serious amount of downtime. Re-initialize and initialize will probably end up having differing behaviors. On a similar note, I think we might have to introduce a new "re-initializing/re-localizing/running-localizing state" which implies that a container is running but we are carrying out some background work. In addition, I don't think we can do a cleanup of resources during an upgrade. For services that have local state in the container work dir, we're essentially wiping away all the local state and forcing them to start from scratch. Just a clarification: when you mentioned CONTAINER_RESOURCE_CLEANUP, I'm assuming you meant CLEANUP_CONTAINER_RESOURCES. {code} • If 'intializeContainer' is called WITHOUT a new ContainerLaunchContext by the AM, it is considered a restart, and will follow the same code path as 'initializeContainer' with new ContainerLaunchContext, but will not perform a CONTAINER_RESOURCE_CLEANUP and INIT_CONTAINER. The Container process will be killed and the container will be returned to LOCALIZED state. • If 'startContainer' is called WITHOUT a new ContainerLaunchContext by the AM, it treated exactly as the above case, but it will also trigger a START_CONTAINER event. {code} Instead of forcing AMs to make two calls, why don't we just add a restart API that does everything you've outlined above? It's cleaner and we don't have to do as many condition checks. In addition, with a restart API we can do things like allowing AMs to specify a delay, or conditions for when the restart should happen. was (Author: vvasudev): Thanks for the document [~asuresh]! 
Here are my initial thoughts - {code} Add int field 'destroyDelay' to each 'StartContainerRequest':{code} I think we should avoid this for now - we should require that AMs that use initialize() must call destroy and AMs that call start with the ContainerLaunchContext can't call destroy. We can achieve that by adding the destroyDelay field you mentioned in your document but don't allow AMs to set it. If initialize is called, set destroyDelay internally to \-1, else to 0. I'm not saying we should drop the feature, just that we should come back to it once we've sorted out the lifecycle from an initialize->destroy perspective. {code} Modify 'StopContainerRequest' Record: Add boolean 'destroyContainer': {code} Similar to above - let's avoid mixing initialize/destroy with start/stop for now. {code} • Introduce a new 'ContainerEventType.START_CONTAINER' event type. • Introduce a new 'ContainerEventType.DESTROY_CONTAINER' event type. • The Container remains in the LOCALIZED state until it receives the 'START_CONTAINER' event. {code} Can you add a state machine transition diagram to explain how new states and events affect each other? {code} If 'initi
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230307#comment-15230307 ] Gergely Novák commented on YARN-4928: - Uploaded a new patch (v.002) that seems to work as originally intended: like in HDFS-6189 it uses {{System.getProperty("test.build.data", System.getProperty("java.io.tmpdir"))}} as the base directory for MiniDFSCluster, and {{/tmp/...}} as (and only as) HDFS path (in accordance with [~arpitagarwal]'s comment). So it does not use {{C:\tmp}} on the local file system, but still works on Windows too (because {{DFSUtil.isValidName()}} checks the HDFS path, not the local path). [~djp] or [~ste...@apache.org] could you please review? > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they are > attempting to use a test root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contains a ":" (after the Windows drive letter) and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest to use the same > approach: using "/tmp/..." as test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
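The split between the local and HDFS-side paths described in the comment above can be sketched as follows. This is a minimal illustration, not the actual patch, and the colon check is a simplified stand-in for DFSUtil.isValidName():

```java
// Sketch of the v002 approach: the local MiniDFSCluster storage dir may
// contain a Windows drive-letter colon, but only the HDFS-side path has to
// satisfy the name check.
class TestPathSplit {
    // Local storage dir for MiniDFSCluster; on Windows this can resolve to
    // something like C:\tmp and is never passed to the HDFS name check.
    static String localBaseDir() {
        return System.getProperty("test.build.data",
                System.getProperty("java.io.tmpdir"));
    }

    // HDFS-side test root: absolute and colon-free, so it is accepted on
    // every platform.
    static final String HDFS_TEST_ROOT = "/tmp/TestLogInfo";

    // Simplified stand-in for DFSUtil.isValidName(): the real method does
    // more, but the relevant part here is rejecting ":" in the path.
    static boolean isValidHdfsName(String path) {
        return path.startsWith("/") && !path.contains(":");
    }
}
```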
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230296#comment-15230296 ] Bibin A Chundatt commented on YARN-3971: The test failure does not look related to the attached patch; it failed due to a bind exception. {noformat} com.sun.jersey.test.framework.spi.container.TestContainerException: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:463) at sun.nio.ch.Net.bind(Net.java:455) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) at com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) at com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) {noformat} > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.addendum.patch, > 0005-YARN-3971.patch > > > Steps to reproduce > # Create label x,y > # Delete label x,y > # Create label x,y add capacity scheduler xml for labels x and y too > # Restart RM > > Both RM will 
become Standby. > Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label. Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > org.apache
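As an aside on the BindException above (not part of the YARN-3971 patch itself): the usual remedy for "Address already in use" flakiness in tests is to bind to port 0, letting the OS pick a free ephemeral port instead of racing other test runs for a hard-coded one.

```java
// Sketch of ephemeral-port selection for tests.
class FreePortPicker {
    static int pickFreePort() throws Exception {
        // Binding to 0 makes the kernel choose an unused port; closing the
        // socket releases it for the server under test to claim.
        try (java.net.ServerSocket s = new java.net.ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}
```

There is still a small window between closing the probe socket and the server binding the port, so test frameworks that support binding to 0 directly are preferable when available.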
[jira] [Commented] (YARN-3959) Store application related configurations in Timeline Service v2
[ https://issues.apache.org/jira/browse/YARN-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230253#comment-15230253 ] Varun Saxena commented on YARN-3959: I mean it should be doable for the 1st milestone. > Store application related configurations in Timeline Service v2 > --- > > Key: YARN-3959 > URL: https://issues.apache.org/jira/browse/YARN-3959 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > > We already have configuration field in HBase schema for application entity. > We need to make sure AM write it out when it get launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up
[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230248#comment-15230248 ] Jason Lowe commented on YARN-4924: -- Yeah, now that the NM registers with the list of apps it thinks are active and the RM tells it to finish any apps that shouldn't be active we should be covered. We'll need to leave in some recovery code for finished apps so we can clean up any lingering finished app events from the state store, but we can remove the code to store the events. > NM recovery race can lead to container not cleaned up > - > > Key: YARN-4924 > URL: https://issues.apache.org/jira/browse/YARN-4924 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts > > It's probably a small window but we observed a case where the NM crashed and > then a container was not properly cleaned up during recovery. > I will add details in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230172#comment-15230172 ] Hadoop QA commented on YARN-3971: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 38s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: patch generated 0 new + 38 unchanged - 1 fixed = 38 total (was 39) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:blac
[jira] [Commented] (YARN-4927) TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the default
[ https://issues.apache.org/jira/browse/YARN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230159#comment-15230159 ] Karthik Kambatla commented on YARN-4927: IIUC, the test is using CS to force a failure of refreshAll. If that is indeed the case, we could just override the MockRM to use an AdminService that fails the refreshAll? By the way, I haven't started on it yet, so please feel free to take it up. > TestRMHA#testTransitionedToActiveRefreshFail fails when FairScheduler is the > default > > > Key: YARN-4927 > URL: https://issues.apache.org/jira/browse/YARN-4927 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-3893 adds this test, that relies on some CapacityScheduler-specific > stuff for refreshAll to fail, which doesn't apply when using FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
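The override suggested above, failing refreshAll() unconditionally in the test instead of relying on CapacityScheduler-specific behavior, can be sketched like this. Class and method names are stand-ins, not the real MockRM/AdminService signatures:

```java
// Stand-in for AdminService: the real class re-reads queues, ACLs,
// node labels, etc. when the RM transitions to active.
class AdminServiceStub {
    void refreshAll() throws Exception {
        // no-op in the stub
    }
}

// Test double that fails regardless of which scheduler is configured, so
// the test exercises the transition-to-active failure path on its own.
class FailingAdminService extends AdminServiceStub {
    @Override
    void refreshAll() throws Exception {
        throw new IllegalStateException("simulated refreshAll failure");
    }
}
```

A MockRM subclass wired to return the failing service would then make the test scheduler-independent.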
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230003 ] Bibin A Chundatt commented on YARN-3971: [~Naganarasimha] Instead of the state-based approach, I have added a flag that tracks the initStore state and is set once initNodeLabelStore completes. Please help review it.
> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.addendum.patch, 0005-YARN-3971.patch
>
> Steps to reproduce
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y and add them to the capacity-scheduler xml as well
> # Restart RM
>
> Both RMs become Standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
>     at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
>     at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
>     at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
>     at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
>     at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
>     at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
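The flag-based approach described in the comment above can be sketched like this. The class and method names below are hypothetical stand-ins (the real logic lives in CommonNodeLabelsManager/RMNodeLabelsManager); the idea is that the queue-usage check is skipped until store recovery has completed.

```java
// Hypothetical sketch of the flag-based approach: skip the queue-usage
// check while the node label store is still being recovered.
class NodeLabelsManagerSketch {
    private volatile boolean initStoreCompleted = false;

    void initNodeLabelStore() {
        // recover() replays stored add/remove operations; the flag is still
        // false here, so removals during recovery bypass the queue check.
        recoverFromStore();
        initStoreCompleted = true;
    }

    void removeFromClusterNodeLabels(String label) {
        if (initStoreCompleted) {
            // only enforced after recovery has finished
            checkRemoveFromClusterNodeLabelsOfQueue(label);
        }
        // ... actual removal from the cluster label set ...
    }

    private void recoverFromStore() {
        // replaying a stored "remove label x" must not fail during recovery
        removeFromClusterNodeLabels("x");
    }

    void checkRemoveFromClusterNodeLabelsOfQueue(String label) {
        // stand-in for the real check that throws when a queue uses the label
        throw new IllegalStateException("Cannot remove label=" + label
            + ", because a queue is using this label");
    }

    boolean isInitStoreCompleted() {
        return initStoreCompleted;
    }
}
```

This makes the RM restart scenario from the issue description survivable: the replayed remove no longer trips the queue check, while removals requested after startup are still validated.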
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3971: --- Attachment: 0005-YARN-3971.addendum.patch
> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch, 0005-YARN-3971.addendum.patch, 0005-YARN-3971.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4562) YARN WebApp ignores the configuration passed to it for keystore settings
[ https://issues.apache.org/jira/browse/YARN-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229988#comment-15229988 ] Varun Vasudev commented on YARN-4562: - +1. I'll commit this tomorrow if no one objects. > YARN WebApp ignores the configuration passed to it for keystore settings > > > Key: YARN-4562 > URL: https://issues.apache.org/jira/browse/YARN-4562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: YARN-4562.patch > > > The conf can be passed to WebApps builder, however the following code in > WebApps.java that builds the HttpServer2 object: > {noformat} > if (httpScheme.equals(WebAppUtils.HTTPS_PREFIX)) { > WebAppUtils.loadSslConfiguration(builder); > } > {noformat} > ...results in loadSslConfiguration creating a new Configuration object; the > one that is passed in is ignored, as far as the keystore/etc. settings are > concerned. loadSslConfiguration has another overload with Configuration > parameter that should be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
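A minimal sketch of the bug pattern and fix described in the issue above. `Configuration`, `Builder`, and `WebAppUtils` here are simplified stand-ins for the Hadoop classes (the real `loadSslConfiguration` reads several keystore/truststore properties): the no-arg variant builds a fresh `Configuration`, so the caller's keystore settings are lost unless the overload that accepts a `Configuration` is used.

```java
// Simplified stand-ins (hypothetical) for the Hadoop classes involved:
class Configuration {
    String keystoreLocation; // stands in for the ssl keystore settings
}

class Builder {              // stands in for HttpServer2.Builder
    String keystore;
}

class WebAppUtils {
    // Buggy path: builds a *new* Configuration, so the caller's settings
    // are ignored as far as keystore/etc. settings are concerned.
    static void loadSslConfiguration(Builder b) {
        loadSslConfiguration(b, new Configuration());
    }

    // Fixed call sites should pass the caller's Configuration to this overload.
    static void loadSslConfiguration(Builder b, Configuration conf) {
        b.keystore = conf.keystoreLocation;
    }
}
```

The fix in WebApps.java is then simply to call the two-argument overload with the `Configuration` that was passed to the builder.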
[jira] [Updated] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-4928: Attachment: YARN-4928.002.patch > Some yarn.server.timeline.* tests fail on Windows attempting to use a test > root path containing a colon > --- > > Key: YARN-4928 > URL: https://issues.apache.org/jira/browse/YARN-4928 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: OS: Windows Server 2012 > JDK: 1.7.0_79 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4928.001.patch, YARN-4928.002.patch > > > yarn.server.timeline.TestEntityGroupFSTimelineStore.* and > yarn.server.timeline.TestLogInfo.* fail on Windows, because they attempt to use test > root paths like > "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", > which contain a ":" (after the Windows drive letter), and > DFSUtil.isValidName() does not accept paths containing ":". > This problem is identical to HDFS-6189, so I suggest using the same > approach: use "/tmp/..." as the test root dir instead of > System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
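The suggested approach can be sketched with a simplified stand-in for DFSUtil.isValidName() (the real method performs more checks): a Windows-style "/C:/..." root is rejected because of the colon, while a "/tmp/..." root passes.

```java
class TestRootPath {
    // Simplified stand-in for DFSUtil.isValidName(): rejects any path
    // component containing ':', as the real check does.
    static boolean isValidName(String src) {
        for (String component : src.split("/")) {
            if (component.contains(":")) {
                return false;
            }
        }
        return true;
    }

    // Portable test root, following the HDFS-6189 approach: no drive
    // letter, hence no colon, on any platform.
    static String testRoot(String testName) {
        return "/tmp/" + testName;
    }
}
```

On Windows, `test.build.data`/`java.io.tmpdir` resolve under the drive letter, producing the rejected "/C:/..." form; the fixed helper sidesteps that entirely.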
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229910 ] Hadoop QA commented on YARN-3461:
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 42s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 47s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 54s | YARN-2928 passed |
| +1 | compile | 6m 25s | YARN-2928 passed with JDK v1.8.0_77 |
| +1 | compile | 7m 18s | YARN-2928 passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 7s | YARN-2928 passed |
| +1 | mvnsite | 2m 35s | YARN-2928 passed |
| +1 | mvneclipse | 1m 20s | YARN-2928 passed |
| +1 | findbugs | 4m 8s | YARN-2928 passed |
| +1 | javadoc | 1m 31s | YARN-2928 passed with JDK v1.8.0_77 |
| +1 | javadoc | 1m 52s | YARN-2928 passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 1s | the patch passed |
| +1 | compile | 6m 18s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 6m 18s | the patch passed |
| +1 | compile | 7m 19s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 7m 19s | the patch passed |
| +1 | checkstyle | 1m 6s | the patch passed |
| +1 | mvnsite | 2m 32s | the patch passed |
| +1 | mvneclipse | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 5m 4s | the patch passed |
| +1 | javadoc | 1m 30s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 1m 52s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 2m 6s | hadoop-yarn-common in the patch passed with JDK v1.8.0_77. |
| +1 | unit | 4m 9s | hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_77. |
| -1 | unit | 62m 28s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. |
| +1 | unit | 7m 52s | hadoop-yarn-applications-distributedshell in the patch passed with JDK v1.8.0_77. |
| -1 | unit | 101m 13s | hadoop-mapreduce-client-jobclient in the patch failed with JDK v1.8.0_77. |
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229904 ] Hadoop QA commented on YARN-4002:
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 47s | trunk passed |
| +1 | compile | 0m 28s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 5s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 28s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 31s | the patch passed |
| +1 | compile | 0m 24s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 0m 24s | the patch passed |
| +1 | compile | 0m 27s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 27s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 34s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 0m 24s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 77m 19s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. |
| -1 | unit | 55m 45s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 18s | Patch does not generate ASF License warnings. |
| | | 149m 57s | |
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.webapp.TestRMWithCSRFFilter |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |
| JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRM
[jira] [Commented] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and add licenses.
[ https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229834 ] Hadoop QA commented on YARN-4849:
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 15s | Docker mode activated. |
| 0 | shelldocs | 0m 5s | Shelldocs was not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 1s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 2m 39s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 27s | YARN-3368 passed |
| +1 | compile | 5m 41s | YARN-3368 passed with JDK v1.8.0_77 |
| +1 | compile | 6m 30s | YARN-3368 passed with JDK v1.7.0_95 |
| +1 | mvnsite | 9m 22s | YARN-3368 passed |
| +1 | mvneclipse | 1m 46s | YARN-3368 passed |
| +1 | javadoc | 5m 29s | YARN-3368 passed with JDK v1.8.0_77 |
| +1 | javadoc | 9m 16s | YARN-3368 passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 53s | the patch passed |
| +1 | compile | 5m 38s | the patch passed with JDK v1.8.0_77 |
| +1 | javac | 5m 38s | the patch passed |
| +1 | compile | 6m 39s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 6m 39s | the patch passed |
| +1 | mvnsite | 8m 46s | the patch passed |
| +1 | mvneclipse | 0m 38s | the patch passed |
| +1 | shellcheck | 0m 8s | There were no new shellcheck issues. |
| -1 | whitespace | 0m 0s | The patch has 53 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | javadoc | 5m 13s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 9m 13s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 10m 33s | root in the patch failed with JDK v1.8.0_77. |
| -1 | unit | 11m 13s | root in the patch failed with JDK v1.7.0_95. |
| -1 | asflicense | 0m 23s | Patch generated 2 ASF License warnings. |
| | | 118m 12s | |
|| Reason || Tests ||
| JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.util.TestNativeLibraryChecker |
| JDK v1.7.0_95 Timed out junit tests | org.apache.hadoop.util.TestNativeLibraryChecker |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https:/