[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802764#comment-17802764
 ] 

Shilun Fan commented on YARN-4495:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> add a way to tell AM container increase/decrease request is invalid
> ---
>
> Key: YARN-4495
> URL: https://issues.apache.org/jira/browse/YARN-4495
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, client
>Reporter: sandflee
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-4495.01.patch
>
>
> Currently the RM may either pass an InvalidResourceRequestException to the AM 
> or silently ignore the change request. The former brings AMRMClientAsync down, 
> and the latter leaves the AM waiting for a reply.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2016-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612914#comment-15612914
 ] 

Xuan Gong commented on YARN-4495:
-

We need more discussion on this. Cancelling the patch.




[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075380#comment-15075380
 ] 

Wangda Tan commented on YARN-4495:
--

Thanks for the investigation, [~sandflee].

I feel we should make this JIRA more generically helpful:
- Currently any problem with a regular resource request becomes an 
InvalidResourceRequestException, but the AM doesn't know which resource request 
failed. The same applies to container resource change requests.
- Some resource requests / container resource change requests can be rejected 
after they have been added to the scheduler. (For example, a change to a queue's 
accessible-node-label could turn a previously valid resource request into an 
invalid one.)
- Maybe we can add a "RejectedResourceRequest" proto to AllocateResponse, 
containing a list of rejected regular resource requests and a list of rejected 
increase/decrease container requests.

Since this is an API change, please expect that more time and discussion will be 
required to settle it. Setting the target version to 2.9.
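
To make the last bullet concrete, here is a rough sketch of the information such a response field might carry, written as plain Java rather than protobuf. Every name in it (RejectedRequestsSketch, RejectedChange, RejectedRequests, the reason strings) is hypothetical; nothing like this exists in the YARN API today, and the real design would need to be settled in this JIRA.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types exist in YARN.
final class RejectedRequestsSketch {

  /** One rejected container resize request, identified by its container id. */
  static final class RejectedChange {
    final String containerId; // container the AM tried to resize
    final String reason;      // human-readable cause reported by the RM

    RejectedChange(String containerId, String reason) {
      this.containerId = containerId;
      this.reason = reason;
    }
  }

  /** Stand-in for the extra field an AllocateResponse could carry. */
  static final class RejectedRequests {
    private final List<String> rejectedAsks = new ArrayList<>();
    private final List<RejectedChange> rejectedChanges = new ArrayList<>();

    void addRejectedAsk(String description) {
      rejectedAsks.add(description);
    }

    void addRejectedChange(String containerId, String reason) {
      rejectedChanges.add(new RejectedChange(containerId, reason));
    }

    List<String> getRejectedAsks() {
      return Collections.unmodifiableList(rejectedAsks);
    }

    List<RejectedChange> getRejectedChanges() {
      return Collections.unmodifiableList(rejectedChanges);
    }
  }
}
```

With something of this shape, the AM could look up each rejected containerId after a heartbeat instead of parsing an exception message.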



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-30 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075570#comment-15075570
 ] 

sandflee commented on YARN-4495:


Thanks [~wangda], hoping for more suggestions.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-30 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075043#comment-15075043
 ] 

sandflee commented on YARN-4495:


Hi [~leftnoteasy], I did a simple test that throws an exception carrying an 
applicationId in ApplicationMasterService#allocate, and I couldn't get the 
applicationId in the RPC client. I read the code and found that the exception is 
put into the RPC header with only the exception class name and the stack trace.

{code:title=RpcHeader.proto}

message RpcResponseHeaderProto {
  required uint32 callId = 1; // callId used in Request
  required RpcStatusProto status = 2;
  optional uint32 serverIpcVersionNum = 3; // Sent if success or fail
  optional string exceptionClassName = 4;  // if request fails
  optional string errorMsg = 5;  // if request fails, often contains stack trace
  optional RpcErrorCodeProto errorDetail = 6; // in case of error
  optional bytes clientId = 7; // Globally unique client ID
  optional sint32 retryCount = 8 [default = -1];
}
{code}




[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074052#comment-15074052
 ] 

sandflee commented on YARN-4495:


It would be better to also pass the reason why the resource change request failed.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074082#comment-15074082
 ] 

MENG DING commented on YARN-4495:
-

I guess my point is that if you are not going to take any automated action upon 
the exception, it might be sufficient to just look at the AM log to see, via the 
exception message, why a resource request has failed. The key point we are 
addressing here is to not let AMRMClientAsync stop when an invalid resource 
request exception occurs, which is fatal to user logic.

If you have a use case for automated actions based on the various failure 
reasons, then it may be necessary to enhance the protocol, which I believe the 
community tends not to do unless absolutely necessary.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074028#comment-15074028
 ] 

MENG DING commented on YARN-4495:
-

Hi, [~sandflee]

What is your specific use case? Do you plan to catch detailed information 
regarding which container change request is causing the exception, and do 
something about it? If so, what will the action be? 



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074049#comment-15074049
 ] 

sandflee commented on YARN-4495:


The main problem is that we can't pass a containerId to 
InvalidResourceRequestException. We could convert the containerId to a String 
and pass that, but it seems a little hacky.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074473#comment-15074473
 ] 

sandflee commented on YARN-4495:


To [~mding]:
1. We have a StateMachine in the AM to track every instance (each corresponding 
to a container). When we update a container's resource, the instance enters the 
UPDATING state and waits for a reply before going back to the RUNNING state.
2. The containerId is essential: we must remove the corresponding containerId 
from pendingChange to avoid resending the request for that container.

To [~wangda]:
OK, I'll run a test.

Thanks
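
For readers following along, the bookkeeping described in points 1 and 2 might look roughly like the sketch below. It is not taken from any real application master: InstanceTracker, its methods, and the use of plain String container ids are all assumptions made for illustration. It shows why a reply that names the container is needed, since without one the instance never leaves UPDATING.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side bookkeeping; names and structure are illustrative only.
final class InstanceTracker {
  enum State { RUNNING, UPDATING }

  private final Map<String, State> states = new HashMap<>();
  private final Set<String> pendingChange = new HashSet<>();

  /** Starts tracking a container that is already running. */
  void register(String containerId) {
    states.put(containerId, State.RUNNING);
  }

  /** The AM sends a resize request, so the instance waits in UPDATING. */
  void requestChange(String containerId) {
    states.put(containerId, State.UPDATING);
    pendingChange.add(containerId);
  }

  /** Any reply (success or rejection) for this container ends the wait. */
  void onReply(String containerId) {
    if (pendingChange.remove(containerId)) {
      states.put(containerId, State.RUNNING);
    }
  }

  State stateOf(String containerId) {
    return states.get(containerId);
  }

  boolean isPending(String containerId) {
    return pendingChange.contains(containerId);
  }
}
```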



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074308#comment-15074308
 ] 

Wangda Tan commented on YARN-4495:
--

Hi [~sandflee],

bq. .. one problem: it seems Hadoop RPC can only transfer simple exceptions (with 
only a string message); that is to say, we can't add containerInfo to 
InvalidResourceRequestException.
I'm not sure that statement is true; I found that the javadoc of 
ApplicationMasterProtocol states it can throw IRRE. I would suggest writing 
a test program to see whether IRRE can be caught by the client.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073347#comment-15073347
 ] 

sandflee commented on YARN-4495:


Hi [~jianhe] [~wangda] [~mding], do you think the change is necessary? If 
so, I'll continue working on this.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073355#comment-15073355
 ] 

MENG DING commented on YARN-4495:
-

Hi, [~sandflee]

I was just wondering whether it would be simpler to modify AMRMClientAsync to 
catch the InvalidResourceRequestException and then NOT stop the 
heartbeat/callback handler threads. I feel that it is unnecessary to stop 
AMRMClientAsync just because an invalid resource request was submitted. The AM 
can still be notified through onError() and handle the exception accordingly. 
This should cover your main use case, right? My 2 cents.
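
As a rough illustration of the idea, the sketch below shows a heartbeat loop that reports an invalid request through onError() and keeps running, while still stopping on any other failure. It deliberately does not use the real AMRMClientAsync classes: HeartbeatSketch, Allocator, CallbackHandler and InvalidRequestException are simplified, hypothetical stand-ins, and the actual heartbeat thread is more involved than this.

```java
// Simplified stand-in for a heartbeat loop; it does not use the real
// AMRMClientAsync classes, and every name below is hypothetical.
final class HeartbeatSketch {

  /** Stand-in for InvalidResourceRequestException. */
  static final class InvalidRequestException extends Exception {
    InvalidRequestException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for the allocate() call made on every heartbeat. */
  interface Allocator {
    void allocate(int beat) throws Exception;
  }

  /** Stand-in for the AM's callback handler. */
  interface CallbackHandler {
    void onError(Throwable t);
  }

  /** Runs up to 'beats' heartbeats and returns how many were attempted. */
  static int run(Allocator rm, CallbackHandler handler, int beats) {
    int attempted = 0;
    for (int i = 0; i < beats; i++) {
      attempted++;
      try {
        rm.allocate(i);
      } catch (InvalidRequestException e) {
        handler.onError(e); // report the bad request, keep heartbeating
      } catch (Exception e) {
        handler.onError(e); // anything else is still treated as fatal
        break;
      }
    }
    return attempted;
  }
}
```

The only behavioural difference from stopping on every error is the first catch branch: the AM is still told about the failure, but later heartbeats continue.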



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073375#comment-15073375
 ] 

Wangda Tan commented on YARN-4495:
--

+1 to what [~mding] suggested,

AMRMClientAsync should handle InvalidResourceRequestException properly. 



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072734#comment-15072734
 ] 

Hadoop QA commented on YARN-4495:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 13s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 40s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 36s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 36s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 59s {color} | {color:red} Patch generated 22 new checkstyle issues in root (total was 140, now 159). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 0s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 16s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 100, now 100). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 42s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 20s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073400#comment-15073400
 ] 

sandflee commented on YARN-4495:


Thanks [~mding], [~wangda]. Yes, this could simplify the code and cover the main 
use case.
1. We should add containerInfo to InvalidResourceRequestException. Or should we 
use a new exception, like InvalidResourceChangeException? Then we could catch 
just this exception.
2. I suggest adding a callback like onResourceChangeFailed(List) 
to make the API more user-friendly.
3. The scheduler may drop a request; any suggestions? Check the normalized 
resource before calling scheduler.allocate()?
{quote}
and the scheduler may drop a container resize request if the target resource 
equals the RMContainer's allocatedResource. The problem is that the AM knows 
nothing about container resource normalization. So if the AM requests a resource 
decrease of 8G -> 7.5G, and 7.5G is normalized to 8G, the RM will drop this 
request and leave the AM waiting for the reply.
{quote}



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-28 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073472#comment-15073472
 ] 

sandflee commented on YARN-4495:


[~mding] [~wangda] One problem: it seems Hadoop RPC can only transfer simple 
exceptions (with only a string message); that is to say, we can't add 
containerInfo to InvalidResourceRequestException.



[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-27 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072387#comment-15072387
 ] 

sandflee commented on YARN-4495:


The RM will pass an InvalidResourceRequestException to the AM under the 
following conditions:

* duplicate ContainerChangeRequest
* invalid ContainerChangeRequest: requestContainerSize < 0 or > max
* rmContainer == null
* rmContainer.state != RUNNING
* increaseRequest targetResource < allocatedResource, or decreaseRequest 
targetResource > allocatedResource
* nodeResource < increaseRequest targetResource

This brings AMRMClientAsync down, which in turn brings the AM down. That is not 
user-friendly, especially since some of the conditions are outside the AM's 
control:
* rmContainer == null: maybe the RM is recovering and the corresponding 
RMContainer has not been recovered yet.
* rmContainer.state != RUNNING: maybe the container has completed and the 
completion message has not been pulled by the AM yet.
* increaseRequest targetResource < allocatedResource, or decreaseRequest 
targetResource > allocatedResource. For example:
1. The AM requests a resource increase of 1G -> 10G; the request can't be 
satisfied yet and is pending.
2. After a while, the AM sends a new resourceIncreaseRequest for 1G -> 5G.
3. The 10G request is satisfied, and the RMContainer's allocatedResource becomes 
10G by the time the new resourceIncreaseRequest reaches the RM.
4. The RM checks the ResourceIncreaseRequest and finds that the target resource 
is less than the RMContainer's allocated resource.
* nodeResource < increaseRequest targetResource: the AM knows nothing about node 
resources; this should be covered by maximumAllocation.
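
The checks and the race in steps 1 to 4 can be restated as a small sketch that returns a rejection reason instead of throwing. This is only a paraphrase of the conditions listed above, not the RM's actual validation code: ChangeRequestCheck, validate, its parameters and the reason strings are all hypothetical, sizes are assumed to be in MB, and the duplicate-request and node-resource checks are left out for brevity.

```java
// Hypothetical restatement of the checks listed above; the real RM code differs.
final class ChangeRequestCheck {

  /** Returns null if the request looks valid, otherwise a rejection reason. */
  static String validate(boolean increase, long targetMb, long allocatedMb,
                         long maxMb, boolean containerKnown, boolean running) {
    if (targetMb < 0 || targetMb > maxMb) {
      return "target size out of range";
    }
    if (!containerKnown) {
      return "container unknown to RM"; // e.g. RM still recovering
    }
    if (!running) {
      return "container not RUNNING"; // e.g. completion not yet pulled by AM
    }
    if (increase && targetMb < allocatedMb) {
      return "increase target below allocated";
    }
    if (!increase && targetMb > allocatedMb) {
      return "decrease target above allocated";
    }
    return null;
  }
}
```

In the race above, the second request is an increase to 5G that arrives after the container has already grown to 10G, so `validate(true, 5120, 10240, 16384, true, true)` hits the "increase target below allocated" branch even though the AM did nothing wrong.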

Also, the scheduler may drop a container resize request if the target resource 
equals the RMContainer's allocatedResource. The problem is that the AM knows 
nothing about container resource normalization. So if the AM requests a resource 
decrease of 8G -> 7.5G, and 7.5G is normalized to 8G, the RM will drop this 
request and leave the AM waiting for the reply.
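
A small worked example of the 8G -> 7.5G case, assuming that normalization rounds a memory request up to a multiple of some step size. The 1024 MB step, the class name NormalizeSketch and both method names are assumptions for illustration; the actual step depends on the scheduler configuration.

```java
// Assumes normalization rounds memory up to a multiple of a step size;
// the names and the 1024 MB step used in the example are illustrative.
final class NormalizeSketch {

  /** Rounds requestedMb up to the next multiple of stepMb. */
  static long normalize(long requestedMb, long stepMb) {
    return ((requestedMb + stepMb - 1) / stepMb) * stepMb;
  }

  /** True when a resize would be a no-op after normalization. */
  static boolean isNoOp(long allocatedMb, long targetMb, long stepMb) {
    return normalize(targetMb, stepMb) == allocatedMb;
  }
}
```

Under that assumption, a 7680 MB (7.5G) target with a 1024 MB step normalizes to 8192 MB, so `isNoOp(8192, 7680, 1024)` is true and the request is exactly the kind the scheduler would drop without telling the AM.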

So, all things considered, I suggest adding a message to AllocateResponse 
instead of throwing InvalidResourceRequestException or dropping the change 
request.

Hoping for your comments and suggestions!



