[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624872#comment-16624872 ] Hadoop QA commented on YARN-8789:

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 11 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 2m 21s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 6s | trunk passed |
| +1 | compile | 18m 9s | trunk passed |
| +1 | checkstyle | 3m 48s | trunk passed |
| +1 | mvnsite | 3m 16s | trunk passed |
| +1 | shadedclient | 19m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 41s | trunk passed |
| +1 | javadoc | 2m 33s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 33s | the patch passed |
| +1 | compile | 21m 57s | the patch passed |
| +1 | javac | 21m 57s | the patch passed |
| -0 | checkstyle | 4m 10s | root: The patch generated 9 new + 830 unchanged - 11 fixed = 839 total (was 841) |
| +1 | mvnsite | 4m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 2s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 36s | the patch passed |
| +1 | javadoc | 3m 26s | the patch passed |
|| Other Tests ||
| +1 | unit | 4m 4s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 121m 42s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 4m 24s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 10m 48s | hadoop-mapreduce-client-app in the patch passed. |
| +1 | asflicense | 0m 45s | The patch does not generate ASF License warnings. |
| | | 271m 58s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8789 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940923/YARN-8789.6.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f6cf4bac5b04 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624831#comment-16624831 ] Arun Suresh commented on YARN-8808:

Also, it looks like you need to null-check getNodeUtilization() / getAggregatedContainersUtilization(); there seems to be a case where you can get an NPE if you don't.

> Use aggregate container utilization instead of node utilization to determine
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
> Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch,
> YARN-8808-YARN-1011.00.patch
>
> Resource oversubscription should be bound to the amount of the resources that
> can be allocated to containers, hence the allocation threshold should be with
> respect to aggregate container utilization.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
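The null-guard Arun suggests could be sketched in plain Java along the following lines. Note that the {{Utilization}} class and the preference order shown here are illustrative stand-ins, not the real YARN {{ResourceUtilization}} API; the sketch only demonstrates guarding both reports before use.

```java
import java.util.Optional;

// Illustrative sketch of the null-guard suggested in the comment above.
// The class and field names are hypothetical, NOT the real YARN API.
public class UtilizationGuard {

    // Stand-in for a ResourceUtilization report.
    static class Utilization {
        final int physicalMemoryMB;
        Utilization(int physicalMemoryMB) { this.physicalMemoryMB = physicalMemoryMB; }
    }

    // Prefer the aggregated containers report when present; fall back to the
    // node report; return empty (instead of risking an NPE) when both are null.
    static Optional<Utilization> effectiveUtilization(Utilization aggregatedContainersUtil,
                                                      Utilization nodeUtil) {
        if (aggregatedContainersUtil != null) {
            return Optional.of(aggregatedContainersUtil);
        }
        return Optional.ofNullable(nodeUtil);
    }

    public static void main(String[] args) {
        // Both reports missing: no NPE, just an empty result.
        System.out.println(effectiveUtilization(null, null).isPresent()); // false
        // Aggregated report present: it wins.
        System.out.println(effectiveUtilization(new Utilization(1024), null)
            .map(u -> u.physicalMemoryMB).orElse(-1)); // 1024
    }
}
```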
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789:

Attachment: YARN-8789.6.patch

> Add BoundedQueue to AsyncDispatcher
> -
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch,
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch
>
> I recently came across a scenario where an MR ApplicationMaster was failing
> with an OOM exception. It had many thousands of Mappers and thousands of
> Reducers. It was noted in the logging that the event queue of
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly
> never decreasing.
> I started looking at the code and thought it could use some clean-up,
> simplification, and the ability to specify a bounded queue so that any
> incoming events are throttled until they can be processed. This will protect
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx
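The bounded-queue idea described in the issue above (producers block when the queue is full, so a flood of events applies back-pressure instead of growing the heap without bound) can be sketched with a stock java.util.concurrent queue. This is a minimal illustration, not the actual AsyncDispatcher patch; all names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of a bounded dispatcher queue: ArrayBlockingQueue.put()
// blocks the producer once the queue is at capacity, throttling event
// sources instead of letting the queue (and the heap) grow unboundedly.
public class BoundedDispatcherSketch {
    private final BlockingQueue<String> eventQueue;

    BoundedDispatcherSketch(int capacity) {
        this.eventQueue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: blocks when the queue is full, applying back-pressure.
    void dispatch(String event) {
        try {
            eventQueue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    // Consumer side: the dispatcher thread drains one event at a time.
    String handleNext() {
        try {
            return eventQueue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dequeueing", e);
        }
    }

    int queued() { return eventQueue.size(); }

    public static void main(String[] args) {
        BoundedDispatcherSketch d = new BoundedDispatcherSketch(2);
        d.dispatch("TASK_ATTEMPT_LAUNCHED");
        d.dispatch("TASK_ATTEMPT_FINISHED");
        // A third dispatch() here would block until handleNext() frees a slot.
        System.out.println(d.queued()); // 2
        d.handleNext();
        System.out.println(d.queued()); // 1
    }
}
```

The trade-off, as discussed in the issue, is that blocking producers protects the ApplicationMaster from OOM at the cost of slowing event sources down while the queue drains.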
[jira] [Commented] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624780#comment-16624780 ] Billie Rinaldi commented on YARN-8734:

I am fine with calling the field "dependencies" instead of "remote_service_dependencies."
bq. all properties including dependencies should be under the properties section
This isn't the case. Dependencies is a field of Component, so it makes sense as a field of Service as well.

Overall this patch looks good. I have one question regarding the RM AM liveness monitor. It seems like the AM might get expired by the RM after yarn.am.liveness-monitor.expiry-interval-ms (default 10 minutes) while the AM is waiting for service dependencies, since the dependency check is performed before the AM has registered with the RM. It looks like AMRMClientAsyncImpl starts up a heartbeat thread when registerApplicationMaster is called. Let's discuss whether it makes sense to perform the dependency check before or after the AM registers with the RM.

> Readiness check for remote service
> -
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch,
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch,
> YARN-8734.005.patch
>
> When a service is deploying, there can be a remote service dependency. It
> would be nice to describe ZooKeeper as a dependent service and, once that
> service has reached a stable state, then deploy HBase.
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624762#comment-16624762 ] Hadoop QA commented on YARN-8696:

(/) *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 44s | Docker mode activated. |
|| Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| branch-2 Compile Tests ||
| 0 | mvndep | 0m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 11m 9s | branch-2 passed |
| +1 | compile | 9m 35s | branch-2 passed |
| +1 | checkstyle | 1m 4s | branch-2 passed |
| +1 | mvnsite | 3m 41s | branch-2 passed |
| +1 | javadoc | 3m 6s | branch-2 passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 58s | the patch passed |
| +1 | compile | 8m 42s | the patch passed |
| +1 | javac | 8m 42s | the patch passed |
| +1 | checkstyle | 0m 55s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 232 unchanged - 2 fixed = 232 total (was 234) |
| +1 | mvnsite | 3m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 2m 48s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 42s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 37s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 2m 46s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 15m 41s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 68m 52s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 143m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940913/YARN-8696-branch-2.v6.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1e6cd07fa9bf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 6056597 |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181
[jira] [Commented] (YARN-8811) Support Container Storage Interface (CSI) in YARN
[ https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624736#comment-16624736 ] Weiwei Yang commented on YARN-8811:

Hi [~eyang], [~leftnoteasy], thanks for the feedback.

{quote}Mount propagation flags, file system type, source and destination mount points.{quote}
[~eyang], thanks for pointing these out. Mount propagation is out of the scope of this feature; in other words, we will only support {{private}} mounts here. The file system type can be specified in the CSI spec (see VolumeCapability - MountVolume - fs_type), so this depends on the storage system's capability. CSI has an API to validate VolumeCapability and avoid invalid calls. Source/destination mount points are roughly introduced in section 5.2; I can add more info on these. For the comment about object store user API key information, I am not sure about this point; could you please elaborate?

{quote}The name "IGNORABLE" is ambiguous{quote}
[~leftnoteasy], I am totally fine with other names; maybe *UNMANAGED* is a better one?

{quote}user will be configured about is it ignorable by main scheduler / nm or app, etc. And from the name it looks like "no one cares about the resource type{quote}
For built-in resources, we'll disallow the user from modifying the resource type. The user will only be allowed to configure the type for new user-defined resources, and they take the responsibility to get it right.

{quote}I prefer to have a String[] tags added to each ResourceInformation{quote}
I thought about this before, but gave up after a PoC. The pro of tags is that they provide the flexibility to support resource tagging/filtering, i.e. filtering resources by user-given tags. But if we want to support this, we run into a scenario like the following. For a resource {{yarn.io/gpu}}, suppose a cluster has 2 GPUs from 2 vendors, and the user wants to mark one device with tag {{vendor_1}} and the other with {{vendor_2}}. When the NMs report resources:
{code}
// NM1
yarn.io/gpu {
  tags : ["default", "vendor_1"]
  ...
}
// NM2
yarn.io/gpu {
  tags : ["default", "vendor_2"]
  ...
}
{code}
Note that these have to be two different {{ResourceInformation}} instances, so that when the user asks for {{vendor_1}} we can correctly allocate the {{vendor_1}} GPU on NM1 to the request. If the user defines more tags, we have to deal with more instances, which gets complex and hurts performance. (This is because our resource model is flat; multi-dimensional resources are not supported yet.) If you are suggesting using only static tags, meaning a resource can only be associated with a fixed set of tags, then I don't see how that differs from a resource URI. Am I misunderstanding your point? Please let me know. Thanks.

> Support Container Storage Interface (CSI) in YARN
> -
>
> Key: YARN-8811
> URL: https://issues.apache.org/jira/browse/YARN-8811
> Project: Hadoop YARN
> Issue Type: New Feature
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: Support Container Storage Interface(CSI) in YARN_design
> doc_20180921.pdf
>
> The Container Storage Interface (CSI) is a vendor-neutral interface to bridge
> Container Orchestrators and Storage Providers. With the adoption of CSI in
> YARN, it will be easier to integrate 3rd-party storage systems, and provide
> the ability to attach persistent volumes for stateful applications.
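Weiwei's instance-explosion argument can be made concrete with a small Java sketch: in a flat resource model, each distinct tag set on the same resource name must become its own schedulable entry, so the number of entries grows with the number of distinct tag combinations. The class below is a hypothetical stand-in, not the real {{ResourceInformation}}.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Demonstrates why per-device tags multiply entries in a flat resource
// model: each distinct tag set needs its own ResourceInformation-like
// record, even though all devices share the name "yarn.io/gpu".
public class TagExplosionSketch {

    // The entry count for one resource name equals the number of
    // distinct tag sets reported across NMs.
    static Set<Set<String>> distinctEntries(List<Set<String>> perDeviceTags) {
        return new HashSet<>(perDeviceTags);
    }

    public static void main(String[] args) {
        List<Set<String>> gpus = List.of(
            Set.of("default", "vendor_1"),  // the GPU reported by NM1
            Set.of("default", "vendor_2")); // the GPU reported by NM2
        // Two tag sets -> two separate entries for the same resource name.
        System.out.println(distinctEntries(gpus).size()); // 2
    }
}
```

With more tags per device the number of distinct sets, and hence of entries the scheduler must track, keeps growing, which is the performance concern raised above.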
[jira] [Comment Edited] (YARN-8811) Support Container Storage Interface (CSI) in YARN
[ https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624736#comment-16624736 ] Weiwei Yang edited comment on YARN-8811 at 9/22/18 4:29 PM:

Hi [~eyang], [~leftnoteasy], thanks for the feedback.

{quote}Mount propagation flags, file system type, source and destination mount points.{quote}
[~eyang], thanks for pointing these out. Mount propagation is out of the scope of this feature; in other words, we will only support {{private}} mounts here. The file system type can be specified in the CSI spec (see VolumeCapability - MountVolume - fs_type), so this depends on the storage system's capability. CSI has an API to validate VolumeCapability and avoid invalid calls. Source/destination mount points are roughly introduced in section 5.2; I can add more info on these. For the comment about object store user API key information, I am not sure about this point; could you please elaborate?

{quote}The name "IGNORABLE" is ambiguous{quote}
[~leftnoteasy], I am totally fine with other names; maybe *UNMANAGED* is a better one?

{quote}user will be configured about is it ignorable by main scheduler / nm or app, etc. And from the name it looks like "no one cares about the resource type{quote}
For built-in resources, we'll disallow the user from modifying the resource type. The user will only be allowed to configure the type for new user-defined resources, and they take the responsibility to get it right.

{quote}I prefer to have a String[] tags added to each ResourceInformation{quote}
I thought about this before, but gave up after a PoC. The pro of tags is that they provide the flexibility to support resource tagging/filtering, i.e. filtering resources by user-given tags. But if we want to support this, we run into a scenario like the following. For a resource {{yarn.io/gpu}}, suppose a cluster has 2 GPUs from 2 vendors, and the user wants to mark one device with tag {{vendor_1}} and the other with {{vendor_2}}. When the NMs report resources:
{code:java}
// NM1
yarn.io/gpu {
  tags : ["default", "vendor_1"]
  ...
}
// NM2
yarn.io/gpu {
  tags : ["default", "vendor_2"]
  ...
}
{code}
Note that these have to be two different {{ResourceInformation}} instances, so that when the user asks for {{vendor_1}} we can correctly allocate the {{vendor_1}} GPU on NM1 to the request. If the user defines more tags, we have to deal with more instances, which gets complex and hurts performance. (This is because our resource model is flat; multi-dimensional resources are not supported yet.) If you are suggesting using only static tags, meaning a resource can only be associated with a fixed set of tags, then I don't see how that differs from a resource URI. Am I misunderstanding your point? Please let me know. Thanks.
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8696:

Attachment: YARN-8696-branch-2.v6.patch

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> -
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
> Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch,
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch,
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is
> synchronous. After the heartbeat is sent out to the home sub-cluster, the
> interceptor waits for the home response to come back before merging and
> returning the (merged) heartbeat result back to the AM. If the home
> sub-cluster is suffering from connection issues, or is down during a YarnRM
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will be
> blocked waiting for the home response. As a result, the successful UAM
> heartbeats from secondary sub-clusters will not be returned to the AM at all.
> Additionally, because we keep the same heartbeat responseId between the AM
> and the home RM, a lot of tricky handling is needed for responseId resync
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) work
> preserving restart (YARN-6127, YARN-1336), home RM master-slave switch, etc.
> In this patch, we change the heartbeat to the home sub-cluster to be
> asynchronous, the same way we handle UAM heartbeats in secondaries, so that
> a down sub-cluster or connection issues won't stop the AM from getting
> responses from other sub-clusters. The responseId is also managed separately
> for the home sub-cluster and the AM, and the two increment independently.
> The resync logic becomes much cleaner.
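The design described in the issue above (a dedicated thread heartbeating to the home sub-cluster, with the AM-facing and home-facing responseId counters advancing independently) can be sketched with stdlib concurrency primitives. All names below are illustrative; the real FederationInterceptor is considerably richer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of an asynchronous home heartbeat: the home RM is contacted on its
// own thread with its own responseId, so a slow or down home RM no longer
// blocks the path that returns merged responses to the AM.
public class AsyncHomeHeartbeatSketch {
    private final ExecutorService homeThread = Executors.newSingleThreadExecutor();
    // Independent counters: one tracked with the AM, one with the home RM.
    private final AtomicInteger amResponseId = new AtomicInteger();
    private final AtomicInteger homeResponseId = new AtomicInteger();

    // Fire-and-forget heartbeat to the home sub-cluster.
    Future<Integer> heartbeatHomeAsync() {
        return homeThread.submit(homeResponseId::incrementAndGet);
    }

    // The AM-facing path increments its own id and can return immediately
    // with whatever secondary responses are already available.
    int allocateForAm() {
        return amResponseId.incrementAndGet();
    }

    static int await(Future<Integer> f) {
        try {
            return f.get();
        } catch (Exception e) {
            throw new IllegalStateException("home heartbeat failed", e);
        }
    }

    int amId() { return amResponseId.get(); }
    int homeId() { return homeResponseId.get(); }
    void shutdown() { homeThread.shutdown(); }

    public static void main(String[] args) {
        AsyncHomeHeartbeatSketch s = new AsyncHomeHeartbeatSketch();
        Future<Integer> home = s.heartbeatHomeAsync(); // does not block the AM path
        s.allocateForAm();
        s.allocateForAm();
        await(home);
        // The two ids advance independently: AM advanced twice, home once.
        System.out.println(s.amId() + " " + s.homeId()); // 2 1
        s.shutdown();
    }
}
```

Decoupling the counters is what makes the resync logic cleaner: neither side needs to reconcile the other's sequence after a restart or master-slave switch.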
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8696:

Attachment: (was: YARN-8696-branch-2.v6.patch)

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> -
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch,
> YARN-8696.v3.patch, YARN-8696.v4.patch, YARN-8696.v5.patch, YARN-8696.v6.patch