[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624872#comment-16624872
 ] 

Hadoop QA commented on YARN-8789:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
4m 10s{color} | {color:orange} root: The patch generated 9 new + 830 unchanged 
- 11 fixed = 839 total (was 841) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
4s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}121m 42s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
24s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
48s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}271m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8789 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940923/YARN-8789.6.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f6cf4bac5b04 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |

[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-22 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624831#comment-16624831
 ] 

Arun Suresh commented on YARN-8808:
---

Also, it looks like you need to null-check getNodeUtilization() / 
getAggregatedContainersUtilization() - there seems to be a case where you 
can get an NPE if you don't.
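
A minimal sketch of such a guard (illustrative only; the zero-utilization 
fallback and the {{node}} variable are assumptions, not the actual patch):

{code:java}
// Both utilization getters can return null (e.g. before the first NM
// heartbeat has reported anything), so fall back to zero utilization
// instead of dereferencing a null value.
ResourceUtilization utilization = node.getAggregatedContainersUtilization();
if (utilization == null) {
  utilization = ResourceUtilization.newInstance(0, 0, 0.0f);
}
{code}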


> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound by the amount of resources that 
> can be allocated to containers; hence the allocation threshold should be 
> with respect to aggregate container utilization.
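
As a rough illustration of the proposed bound (assumed names and a 
simplified formula, not the actual YARN-1011 computation):

{code:java}
// Oversubscription headroom derived from what containers actually use,
// rather than whole-node utilization, which also counts NM/DataNode
// daemons and any other processes on the host.
ResourceUtilization used = node.getAggregatedContainersUtilization();
long headroomMB = (long) (nodeCapacityMB * overcommitThreshold)
    - used.getPhysicalMemory();
{code}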



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-22 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.6.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, 
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event-queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was 
> seemingly never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will 
> protect the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx
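
A minimal sketch of the bounded-queue idea (plain Java; the capacity and 
names are illustrative assumptions, not the attached patch):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Minimal sketch: a capacity-bounded event queue that applies
 * back-pressure to event producers instead of growing without limit
 * until the AM runs out of heap.
 */
public class BoundedEventQueue<E> {
  // Illustrative capacity; the patch presumably makes this configurable.
  private final BlockingQueue<E> queue = new LinkedBlockingQueue<>(100_000);

  /** Blocks while the queue is full, throttling incoming events. */
  public void putEvent(E event) throws InterruptedException {
    queue.put(event);
  }

  /** Called by the dispatcher thread; blocks while the queue is empty. */
  public E takeEvent() throws InterruptedException {
    return queue.take();
  }
}
{code}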



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8734) Readiness check for remote service

2018-09-22 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624780#comment-16624780
 ] 

Billie Rinaldi commented on YARN-8734:
--

I am fine with calling the field "dependencies" instead of 
"remote_service_dependencies."

bq. all properties including dependencies should be under the properties section
This isn't the case. Dependencies is a field of Component, so it makes sense as 
a field of Service as well.

Overall this patch looks good. I have one question regarding the RM AM liveness 
monitor. It seems like the AM might get expired by the RM after the 
yarn.am.liveness-monitor.expiry-interval-ms (default 10 minutes) while the AM 
is waiting for service dependencies, since the dependency check is performed 
before the AM has registered with the RM. It looks like AMRMClientAsyncImpl 
starts up a heartbeat thread when registerApplicationMaster is called. Let's 
discuss whether it makes sense to perform the dependency check before or after 
the AM registers with the RM.
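
A minimal sketch of the alternative ordering (assumed names, not the actual 
service AM code):

{code:java}
// Register first, so AMRMClientAsync's internal heartbeat thread keeps the
// AM alive with the RM, then poll the remote dependencies for readiness.
amRMClient.registerApplicationMaster(hostname, port, trackingUrl);
while (!remoteDependenciesReady()) {  // hypothetical readiness check
  Thread.sleep(checkIntervalMs);      // RM heartbeats continue meanwhile
}
// ... proceed with launching component containers ...
{code}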

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependencies.  It 
> would be nice to describe ZooKeeper as a dependent service and, once that 
> service has reached a stable state, then deploy HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624762#comment-16624762
 ] 

Hadoop QA commented on YARN-8696:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
 9s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
41s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
6s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 232 unchanged - 2 fixed = 232 total (was 234) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
46s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 
41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940913/YARN-8696-branch-2.v6.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1e6cd07fa9bf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 6056597 |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181 |

[jira] [Commented] (YARN-8811) Support Container Storage Interface (CSI) in YARN

2018-09-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624736#comment-16624736
 ] 

Weiwei Yang commented on YARN-8811:
---

Hi [~eyang], [~leftnoteasy]

Thanks for the feedback.
{quote}Mount propagation flags, file system type, source and destination mount 
points.
{quote}
[~eyang], thanks for pointing these out. Mount propagation is not in the 
scope of this feature; in other words, we will only support {{private}} 
mounts here. The file system type can be specified in the CSI spec (see 
VolumeCapability - MountVolume - fs_type), so this will depend on the 
storage system's capability. CSI has an API to validate VolumeCapability to 
avoid invalid calls. Source/destination mount points are roughly introduced 
in section 5.2; I can add more info on these. For the comment about object 
store user API key information, I am not sure about this point; could you 
please elaborate?
{quote}The name "IGNORABLE" is ambiguous, 
{quote}
[~leftnoteasy], I am totally fine with other names; maybe *UNMANAGED* is a 
better one?
{quote}user will be configured about is it ignorable by main scheduler / nm or 
app, etc. And from the name it looks like "no one cares about the resource type
{quote}
For built-in resources, we'll disallow users from modifying the resource 
type. Users will only be allowed to configure the type for new user-defined 
resources, and they take responsibility for making it right. 
{quote}I prefer to have a String[] tags added to each ResourceInformation
{quote}
I thought about this before, but gave up after a PoC. The advantage of tags 
is that they provide the flexibility to support resource tagging/filtering, 
i.e. filtering resources by user-given tags. But if we want to support this, 
then we run into a scenario like the following.

For example, for a resource {{yarn.io/gpu}}, a cluster has 2 GPUs from 2 
vendors, and the user wants to mark one device with tag {{vendor_1}} and the 
other with {{vendor_2}}. 

So when NMs report resources:

{code:java}
// NM1
yarn.io/gpu {
  tags : ["default", "vendor_1"]
  ...
}

// NM2
yarn.io/gpu {
  tags : ["default", "vendor_2"]
  ...
}
{code}

Note, they have to be two different {{ResourceInformation}} instances, so 
that when a user asks for {{vendor_1}}, we can correctly allocate the 
{{vendor_1}} GPU on NM1 to the request. If the user defines more tags, we 
will have to deal with more instances; this is going to be complex and hurt 
performance. (This is because our resource model is flat; multi-dimensional 
resources are not supported yet.)
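
To make the blow-up concrete, here is a hypothetical illustration (plain 
Java, not the Hadoop resource API; the {{fp16}} tag is invented for the 
example):

{code:java}
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration: with a flat resource model, every distinct
 * tag set on yarn.io/gpu needs its own resource entry for the scheduler
 * to match against, so k independent tags can force up to 2^k entries.
 */
public class TagExplosion {
  public static void main(String[] args) {
    List<Set<String>> gpuEntries = List.of(
        Set.of("default", "vendor_1"),
        Set.of("default", "vendor_2"),
        Set.of("default", "vendor_1", "fp16"),  // adding one tag...
        Set.of("default", "vendor_2", "fp16")); // ...doubles the entries
    System.out.println("distinct GPU resource entries: " + gpuEntries.size());
  }
}
{code}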

If you are suggesting just using static tags, meaning a resource can only be 
associated with a certain set of tags, then I don't see how that is 
different from the resource URI. Am I misunderstanding your point? Please 
let me know.

Thanks

> Support Container Storage Interface (CSI) in YARN
> -
>
> Key: YARN-8811
> URL: https://issues.apache.org/jira/browse/YARN-8811
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: Support Container Storage Interface(CSI) in YARN_design 
> doc_20180921.pdf
>
>
> The Container Storage Interface (CSI) is a vendor neutral interface to bridge 
> Container Orchestrators and Storage Providers. With the adoption of CSI in 
> YARN, it will be easier to integrate 3rd party storage systems, and provide 
> the ability to attach persistent volumes for stateful applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8811) Support Container Storage Interface (CSI) in YARN

2018-09-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624736#comment-16624736
 ] 

Weiwei Yang edited comment on YARN-8811 at 9/22/18 4:29 PM:



[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-22 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8696:
---
Attachment: YARN-8696-branch-2.v6.patch

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will 
> be blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at 
> all. Additionally, because we kept the same heartbeat responseId between 
> the AM and the home RM, lots of tricky handling is needed for the 
> responseId resync when it comes to _FederationInterceptor_ (part of 
> AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM 
> master-slave switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in secondaries, so 
> that a sub-cluster being down or having connection issues won't prevent 
> the AM from getting responses from other sub-clusters. The responseId is 
> also managed separately for the home sub-cluster and the AM, and they 
> increment independently. The resync logic becomes much cleaner. 
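
A minimal sketch of the two changes described (illustrative names only, not 
the actual _FederationInterceptor_ code):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch, not the real FederationInterceptor: (1) the home
 * heartbeat is handed to a dedicated thread, so a slow or down home RM no
 * longer blocks merging UAM responses from secondary sub-clusters; (2) the
 * AM-facing responseId and the home-RM responseId are tracked separately
 * and increment independently, simplifying resync.
 */
public class AsyncHomeHeartbeatSketch {
  private final ExecutorService homeThread =
      Executors.newSingleThreadExecutor();
  private final AtomicInteger amResponseId = new AtomicInteger();
  private final AtomicInteger homeResponseId = new AtomicInteger();

  public int allocate(Runnable sendHeartbeatToHomeRM) {
    // Hand the home-RM call off; the caller returns immediately and merges
    // whatever secondary (UAM) responses have already arrived.
    homeThread.execute(() -> {
      sendHeartbeatToHomeRM.run();
      homeResponseId.incrementAndGet(); // home counter advances on its own
    });
    return amResponseId.incrementAndGet(); // AM counter is independent
  }
}
{code}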



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-22 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8696:
---
Attachment: (was: YARN-8696-branch-2.v6.patch)

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, 
> YARN-8696.v3.patch, YARN-8696.v4.patch, YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will 
> be blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at 
> all. Additionally, because we kept the same heartbeat responseId between 
> the AM and the home RM, lots of tricky handling is needed for the 
> responseId resync when it comes to _FederationInterceptor_ (part of 
> AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM 
> master-slave switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in secondaries, so 
> that a sub-cluster being down or having connection issues won't prevent 
> the AM from getting responses from other sub-clusters. The responseId is 
> also managed separately for the home sub-cluster and the AM, and they 
> increment independently. The resync logic becomes much cleaner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org