[jira] [Commented] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629308#comment-14629308
 ] 

Hudson commented on YARN-3174:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8171 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8171/])
YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation 
into one. Contributed by Masatake Iwasaki. (ozawa: rev 
f02dd146f58bcfa0595eec7f2433bafdd857630f)
* hadoop-project/src/site/site.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md


> Consolidate the NodeManager and NodeManagerRestart documentation into one
> -
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0
>
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3805) Update the documentation of Disk Checker based on YARN-90

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629302#comment-14629302
 ] 

Tsuyoshi Ozawa commented on YARN-3805:
--

[~iwasakims] could you rebase it?

> Update the documentation of Disk Checker based on YARN-90
> -
>
> Key: YARN-3805
> URL: https://issues.apache.org/jira/browse/YARN-3805
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-3805.001.patch
>
>
> Since YARN-90, the NodeManager is able to recover the status of a disk that 
> was broken and then fixed, without restarting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629300#comment-14629300
 ] 

Hadoop QA commented on YARN-3535:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 14s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 46s | The applied patch generated 5 new checkstyle issues (total was 338, now 343). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 30s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | |  89m 45s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12745572/0005-YARN-3535.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ec0a04 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8554/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8554/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8554/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8554/console |


This message was automatically generated.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629296#comment-14629296
 ] 

zhihai xu commented on YARN-3535:
-

Sorry for coming to this issue late.
The latest patch looks good to me except for one nit:
can we make {{ContainerRescheduledTransition}} a child class of 
{{FinishedTransition}}, similar to {{KillTransition}}?
Then we could call {{super.transition(container, event);}} instead of {{new 
FinishedTransition().transition(container, event);}}.
I think this would make the code more readable and match the other transition 
class implementations (see the sketch below).
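
To illustrate the suggestion, a minimal sketch of what the refactoring could look 
like (the event fired and the field name are placeholders, not the actual patch):

{code}
// Hypothetical sketch of the nit above: inherit from FinishedTransition so the
// common "container finished" handling is reused via super.transition(...).
private static final class ContainerRescheduledTransition
    extends FinishedTransition {

  @Override
  public void transition(RMContainerImpl container, RMContainerEvent event) {
    // Tell the scheduler to recover the ResourceRequest for this container
    // (placeholder event name; the real patch may differ).
    container.eventHandler.handle(new ContainerRescheduledEvent(container));
    // Reuse the parent's finished handling instead of
    // new FinishedTransition().transition(container, event).
    super.transition(container, event);
  }
}
{code}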

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2015-07-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629288#comment-14629288
 ] 

Karthik Kambatla commented on YARN-3926:


Thanks a bunch for putting this proposal together, Varun. We are in dire need 
of improvements to our resource-model, and the proposal goes a long way in 
addressing some of these issues. Huge +1 to this effort. 

Comments on the proposal itself:
# There is a significant overlap between resource-types.xml and 
node-resources.xml. It would be nice to consolidate at least these parts. 
# Can we avoid the mismatch between the resource types on RM and NM altogether?
# Can we avoid different restart paths for adding and removing resources? 
# Really like the concise configs proposed at the end of the document. 

What do you think of the following modifications to the proposal to address 
the above wishes? I have clearly not thought about this as much as you have, so 
please feel free to shoot these suggestions down. 
# How about calling them yarn.resource-types, yarn.resource-types.memory.*, 
yarn.resource-types.cpu.*? Further, the memory/cpu-specific configs could be made 
simpler per the suggestions later in the document.
# yarn.scheduler.resource-types is a subset of yarn.resource-types, and 
captures the resource-types the scheduler supports. This could be in yarn-site 
on RM.
# yarn.nodemanager.resource-types.monitored and 
yarn.nodemanager.resource-types.enforced also are subsets of 
yarn.resource-types and could define the resources the NM monitors and enforces 
respectively. These could be in yarn-site on the NM. I understand isolation is 
out of scope here, but would be nice to have configs that lend themselves to 
future work.
# yarn.nodemanager.[resources|resource-types].available could be a map where 
each key should be an entry in yarn.resource-types. 

You mention capturing node-labels etc. similarly. Could you elaborate on your 
thoughts, at least informally? It would be super nice to have a path in mind even 
if we were to do it as follow-up work.
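
As a rough illustration of the layering suggested above (the property names are 
the ones proposed in this comment, not existing YARN configs; only 
{{Configuration#getTrimmedStrings}} is an existing API):

{code}
// Illustrative sketch only: reads the proposed (not yet existing) properties and
// checks that the subsets are drawn from the cluster-wide resource-type set.
// Assumed imports: org.apache.hadoop.conf.Configuration,
//                  org.apache.hadoop.yarn.conf.YarnConfiguration, java.util.*
Configuration conf = new YarnConfiguration();

Set<String> clusterTypes = new HashSet<>(
    Arrays.asList(conf.getTrimmedStrings("yarn.resource-types")));

Set<String> schedulerTypes = new HashSet<>(
    Arrays.asList(conf.getTrimmedStrings("yarn.scheduler.resource-types")));
Set<String> monitored = new HashSet<>(
    Arrays.asList(conf.getTrimmedStrings("yarn.nodemanager.resource-types.monitored")));
Set<String> enforced = new HashSet<>(
    Arrays.asList(conf.getTrimmedStrings("yarn.nodemanager.resource-types.enforced")));

if (!clusterTypes.containsAll(schedulerTypes)
    || !clusterTypes.containsAll(monitored)
    || !clusterTypes.containsAll(enforced)) {
  throw new IllegalArgumentException(
      "resource-type subsets must be drawn from yarn.resource-types");
}
{code}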

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one

2015-07-15 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629283#comment-14629283
 ] 

Masatake Iwasaki commented on YARN-3174:


Thanks, [~ozawa]!

> Consolidate the NodeManager and NodeManagerRestart documentation into one
> -
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0
>
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3852) Add docker container support to container-executor

2015-07-15 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629276#comment-14629276
 ] 

Varun Vasudev commented on YARN-3852:
-

Thanks for the patch [~ashahab]. The patch isn't working for me. There are two 
issues -
# There is no default value for "docker.binary". I think we should assume this to be 
"docker" and allow it to be overridden.
# The docker launch fails due to 
{code}
if (change_effective_user(user_uid, user_gid) != 0)
{code}
in launch_docker_container_as_user. For docker run to work, the effective user 
needs to be root (something like change_effective_user(0, user_gid) is probably 
the right way).

Some other issues -
# 
{code}
-static const char* DEFAULT_BANNED_USERS[] = {"yarn", "mapred", "hdfs", "bin", 
0};
+static const char* DEFAULT_BANNED_USERS[] = {"mapred", "hdfs", "bin", 0};
{code}
Why are you removing the yarn user from the banned users? I'm guessing this is 
due to a branch-2/trunk difference: the yarn user is banned in trunk but not in 
branch-2.
# A couple of formatting fixes
{code}
+   fprintf(LOGFILE, "done opening pid\n");
+fflush(LOGFILE);
{code}
and
{code}
+fprintf(LOGFILE, "done writing pid to tmp\n");
+ fflush(LOGFILE);
{code}
# Can we change the error message in the message below to a more descriptive 
one?
{code}
+ fprintf(ERRORFILE, "Error reading\n");
+ fflush(ERRORFILE);
{code}
# In parse_docker_command_file
{code}
+  int read;
{code}
should we use ssize_t instead of int?
# In parse_docker_command_file, we have some exit(1) calls - can we change this 
to use the error codes in container-executor.h?
# In run_docker
{code}
+  free(docker_binary);
+  free(args);
+  free(docker_command_with_binary);
+  free(docker_command);
+  exit_code = DOCKER_RUN_FAILED;
+  }
+  exit_code = 0;
+  return exit_code;
{code}
The exit code from the function will always be 0.
# Formatting
{code}
+int create_script_paths(const char *work_dir,
+  const char *script_name, const char *cred_file,
+ char** script_file_dest, char** cred_file_dest,
+ int* container_file_source, int* cred_file_source ) {
{code}
# In create_script_paths, we use a bunch of gotos but the goto target doesn't 
have any special logic or handling. Can we avoid using the goto?
# 
{code}
+//kill me now.
{code}
No need for the commentary :)
# In main.c
{code}
+char * resources = argv[optind++];// key,value pair describing resources
+char * resources_key = malloc(strlen(resources));
+char * resources_value = malloc(strlen(resources));
{code}

Can we move the declarations of resources, resources_key and resources_value 
out of the case block (since the same variables are used in two case blocks)?



> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
> Attachments: YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers is restricted to privileged users (i.e. 
> users running applications should not have direct access to docker). In order 
> to ensure the node manager can run docker commands, we need to add docker 
> support to the container-executor binary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3174:
-
Affects Version/s: 2.7.1

> Consolidate the NodeManager and NodeManagerRestart documentation into one
> -
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0
>
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3174:
-
Component/s: documentation

> Consolidate the NodeManager and NodeManagerRestart documentation into one
> -
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.1
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0
>
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-07-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629268#comment-14629268
 ] 

Sunil G commented on YARN-2005:
---

Thanks [~adhoot]. Sorry for the delayed response.

bq.The nodes are removed from blacklist once the launch of the AM happens to 
limit this issue.
Yes. I feel this will be fine. 

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3174:
-
Summary: Consolidate the NodeManager and NodeManagerRestart documentation 
into one  (was: Consolidate the NodeManager documentation into one)

> Consolidate the NodeManager and NodeManagerRestart documentation into one
> -
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3174) Consolidate the NodeManager documentation into one

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629263#comment-14629263
 ] 

Tsuyoshi Ozawa commented on YARN-3174:
--

+1

> Consolidate the NodeManager documentation into one
> --
>
> Key: YARN-3174
> URL: https://issues.apache.org/jira/browse/YARN-3174
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>Assignee: Masatake Iwasaki
> Attachments: YARN-3174.001.patch
>
>
> We really don't need a different document for every individual nodemanager 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629253#comment-14629253
 ] 

Arun Suresh commented on YARN-3535:
---

The patch looks good!
Thanks for working on this, [~peng.zhang] and [~rohithsharma].

+1, pending a successful Jenkins run with the latest patch.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails

2015-07-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629250#comment-14629250
 ] 

Akira AJISAKA commented on YARN-2578:
-

Thanks [~iwasakims] for creating the patch. One comment and one question from 
me.
bq. The default value is 0 in order to keep current behaviour.
1. We would like to fix this bug, so defaulting to 1 minute is fine with me.
2. Would you tell me why {{Client.getRpcTimeout}} returns 0 if 
{{ipc.client.ping}} is false?

> NM does not failover timely if RM node network connection fails
> ---
>
> Key: YARN-2578
> URL: https://issues.apache.org/jira/browse/YARN-2578
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-2578.002.patch, YARN-2578.patch
>
>
> The NM does not fail over correctly when the network cable of the RM is 
> unplugged or the failure is simulated by a "service network stop" or a 
> firewall that drops all traffic on the node. The RM fails over to the standby 
> node when the failure is detected, as expected. The NM should then re-register 
> with the new active RM. This re-registration takes a long time (15 minutes or 
> more). Until then the cluster has no nodes for processing and applications 
> are stuck.
> Reproduction test case which can be used in any environment:
> - create a cluster with 3 nodes
> node 1: ZK, NN, JN, ZKFC, DN, RM, NM
> node 2: ZK, NN, JN, ZKFC, DN, RM, NM
> node 3: ZK, JN, DN, NM
> - start all services make sure they are in good health
> - kill the network connection of the RM that is active using one of the 
> network kills from above
> - observe the NN and RM failover
> - the DNs fail over to the new active NN
> - the NM does not recover for a long time
> - the logs show a long delay and traces show no change at all
> The stack traces of the NM all show the same set of threads. The main thread 
> that should be used in the re-register is the "Node Status Updater". This 
> thread is stuck in:
> {code}
> "Node Status Updater" prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
> Object.wait() [0x7f5a51fc1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
>   at java.lang.Object.wait(Object.java:503)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   - locked <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> {code}
> The client connection which goes through the proxy can be traced back to the 
> ResourceTrackerPBClientImpl. The generated proxy does not time out and we 
> should be using a version which takes the RPC timeout (from the 
> configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629249#comment-14629249
 ] 

Arun Suresh commented on YARN-3535:
---

I meant for the FairScheduler... but looks like your new patch has it... thanks

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629208#comment-14629208
 ] 

Peng Zhang commented on YARN-3535:
--

Thanks [~rohithsharma] for updating the patch. 
Patch LGTM.

bq. One point to be clear: the assumption made here is that the ResourceRequest is 
recovered only if the RMContainer is in ALLOCATED. If the RMContainer is in RUNNING, 
then the completed container will go to the AM in the allocate response and the AM 
will ask for a new ResourceRequest.

While running our large-scale cluster with FS and preemption enabled, MapReduce 
apps work well under this assumption.
Basically, I think this assumption makes sense for other application types as well.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629168#comment-14629168
 ] 

Rohith Sharma K S commented on YARN-3535:
-

One point to be clear: the assumption made here is that the ResourceRequest is 
recovered only if the RMContainer is in ALLOCATED. If the RMContainer is in RUNNING, 
then the completed container will go to the AM in the allocate response and the AM 
will ask for a new ResourceRequest. 

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3535:

Attachment: 0005-YARN-3535.patch

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, 
> yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629157#comment-14629157
 ] 

Rohith Sharma K S commented on YARN-3535:
-

ahh, right.. it can be removed.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629144#comment-14629144
 ] 

Rohith Sharma K S commented on YARN-3535:
-

Yes, {{TestCapacityScheduler#testRecoverRequestAfterPreemption}} simulates this.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, starting the AM container on an NM failed, 
> and the job then hung there.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-07-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629141#comment-14629141
 ] 

Sandy Ryza commented on YARN-3635:
--

BTW I got all this from QueuePlacementPolicy and QueuePlacementRule, which are 
pretty quick reads if you want to take a look.

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which lets 
> the scheduler change the queue of an application after it is submitted.
> One issue with doing this inside a specific scheduler is: if the queue after mapping 
> has a different maximum_allocation/default-node-label-expression than the 
> original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose making queue mapping a common interface of the scheduler, so that 
> RMAppManager sets the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-07-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629139#comment-14629139
 ] 

Sandy Ryza commented on YARN-3635:
--

[~leftnoteasy], apologies for this quick drive-by review - I am currently 
traveling.

The JIRA appears to be lacking a design-doc and I wasn't able to find 
documentation in the patch.  The patch should ultimately include some detailed 
documentation, but I don't want to ask this of you before OKing the approach.  
In light of this, a few questions:
* What steps are required for the Fair Scheduler to integrate with this?
* Is a common way of configuration proposed?
* How does this differ from the current Fair Scheduler model?  To summarize:
** The FS model consists of a sequence of placement rules that the app is 
passed through.
** Each placement rule gets the chance to assign the app to a queue, reject the 
app, or pass.  If it passes, the next rule gets a chance.
** A placement rule can base its decision on:
*** The submitting user.
*** The set of groups the submitting user belongs to.
*** The queue requested in the app submission.
*** A set of configuration options that are specific to the rule.
*** The set of queues given in the Fair Scheduler configuration.
** Rules are marked as "terminal" if they will never pass.  This helps to avoid 
misconfigurations where users place rules after terminal rules.
** Rules have a "create" attribute which determines whether they can create a 
new queue or whether they must assign to existing queues.
** Currently the set of placement rules is limited to what's implemented in 
YARN.  I.e. there's no public pluggable rule support.
I noticed from Vinod's comment that this patch follows a similar structure.  
Are there places where my summary (sketched below) would not describe what's 
going on in this patch?
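
For concreteness, the model above boils down to roughly the following shape (a 
loose paraphrase of QueuePlacementPolicy/QueuePlacementRule; names, signatures, 
and the pass/reject convention here are illustrative, not the exact FS API):

{code}
// Loose paraphrase of the FS placement model summarized above; illustrative only.
// Assumed imports: java.io.IOException, java.util.*
abstract class PlacementRule {
  // true if the rule may create a queue that does not already exist
  boolean create;

  // Returns a queue name to assign the app, "" to pass to the next rule,
  // or null to reject the app (convention chosen for this sketch).
  abstract String assignAppToQueue(String requestedQueue, String user,
      Set<String> userGroups, Collection<String> configuredQueues)
      throws IOException;

  // A terminal rule never passes; rules placed after it can never run.
  abstract boolean isTerminal();
}

class PlacementPolicy {
  private final List<PlacementRule> rules;

  PlacementPolicy(List<PlacementRule> rules) {
    this.rules = rules;
  }

  String placeApp(String requestedQueue, String user, Set<String> userGroups,
      Collection<String> configuredQueues) throws IOException {
    for (PlacementRule rule : rules) {
      String queue =
          rule.assignAppToQueue(requestedQueue, user, userGroups, configuredQueues);
      if (queue == null) {
        throw new IOException("Application rejected by placement rule");
      }
      if (!queue.isEmpty()) {
        return queue;              // the rule assigned the app to a queue
      }
      // empty string: the rule passed, try the next one
    }
    throw new IOException("No placement rule assigned the application to a queue");
  }
}
{code}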



> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which lets 
> the scheduler change the queue of an application after it is submitted.
> One issue with doing this inside a specific scheduler is: if the queue after mapping 
> has a different maximum_allocation/default-node-label-expression than the 
> original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose making queue mapping a common interface of the scheduler, so that 
> RMAppManager sets the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-07-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reassigned YARN-3635:


Assignee: Sandy Ryza  (was: Wangda Tan)

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Sandy Ryza
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which lets 
> the scheduler change the queue of an application after it is submitted.
> One issue with doing this inside a specific scheduler is: if the queue after mapping 
> has a different maximum_allocation/default-node-label-expression than the 
> original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose making queue mapping a common interface of the scheduler, so that 
> RMAppManager sets the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-07-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-3635:
-
Assignee: Tan, Wangda  (was: Sandy Ryza)

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
>
>
> Currently, both the fair and capacity schedulers support queue mapping, which lets 
> the scheduler change the queue of an application after it is submitted.
> One issue with doing this inside a specific scheduler is: if the queue after mapping 
> has a different maximum_allocation/default-node-label-expression than the 
> original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose making queue mapping a common interface of the scheduler, so that 
> RMAppManager sets the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3928) launch application master on specific host

2015-07-15 Thread Wenrui (JIRA)
Wenrui created YARN-3928:


 Summary: launch application master on specific host
 Key: YARN-3928
 URL: https://issues.apache.org/jira/browse/YARN-3928
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 2.6.0
 Environment: Ubuntu 12.04
Reporter: Wenrui


Hi, 
Is there a way to launch the application master on a specific host?
If we cannot do this with a managed AM launcher, 
is it possible to achieve it with an unmanaged AM launcher?

I find it quite necessary to place the application master on a specific host 
in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3575) Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3575:
--
Labels: 2.6.1-candidate  (was: )

> Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted
> -
>
> Key: YARN-3575
> URL: https://issues.apache.org/jira/browse/YARN-3575
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>  Labels: 2.6.1-candidate
>
> Trying to launch a job that uses the 2.5 jars fails on a 2.6 cluster whose RM 
> has been restarted (i.e.: epoch != 0) because the epoch number starts 
> appearing in the container IDs and the 2.5 jars no longer know how to parse 
> the container IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3369:
--
Labels: 2.6.1-candidate  (was: )

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3369-003.patch, YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
>     Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}
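
A minimal sketch of the null guard the description calls for in 
{{checkForDeactivation()}} (illustrative; the attached patches may structure the 
fix differently):

{code}
// Sketch: check the request for null before dereferencing it.
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request != null && request.getNumContainers() > 0) {
  // ... existing deactivation logic ...
}
{code}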



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3103) AMRMClientImpl does not update AMRM token properly

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3103:
--
Labels: 2.6.1-candidate  (was: )

> AMRMClientImpl does not update AMRM token properly
> --
>
> Key: YARN-3103
> URL: https://issues.apache.org/jira/browse/YARN-3103
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3103.001.patch
>
>
> AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it 
> to the credentials, so the token is mapped using the newly updated service 
> rather than the empty service that was used when the RM created the original 
> AMRM token.  This leads to two AMRM tokens in the credentials and can still 
> fail if the AMRMTokenSelector picks the wrong one.
> In addition the AMRMClientImpl grabs the login user rather than the current 
> user when security is enabled, so it's likely the UGI being updated is not 
> the UGI that will be used when reconnecting to the RM.
> The end result is that AMs can fail with invalid token errors when trying to 
> reconnect to an RM after a new AMRM secret has been activated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2992:
--
Labels: 2.6.1-candidate  (was: )

> ZKRMStateStore crashes due to session expiry
> 
>
> Key: YARN-2992
> URL: https://issues.apache.org/jira/browse/YARN-2992
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: yarn-2992-1.patch
>
>
> We recently saw the RM crash with the following stacktrace. On session 
> expiry, we should gracefully transition to standby. 
> {noformat}
> 2014-12-18 06:28:42,689 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired 
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
>  
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2874:
--
Labels: 2.6.1-candidate  (was: )

> Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further 
> apps
> -
>
> Key: YARN-2874
> URL: https://issues.apache.org/jira/browse/YARN-2874
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch
>
>
> When token renewal fails and the application finishes, this deadlock can occur.
> Jstack dump :
> {quote}
> Found one Java-level deadlock:
> =
> "DelegationTokenRenewer #181865":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a 
> java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
> "DelayedTokenCanceller":
>   waiting to lock monitor 0x04141718 (object 0xc7eae720, a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
>   which is held by "Timer-4"
> "Timer-4":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a 
> java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
>  
> Java stack information for the threads listed above:
> ===
> "DelegationTokenRenewer #181865":
> at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
> - waiting to lock <0xc18a9998> (a 
> java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "DelayedTokenCanceller":
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
> - waiting to lock <0xc7eae720> (a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
> - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
> at java.lang.Thread.run(Thread.java:745)
> "Timer-4":
> at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
> - waiting to lock <0xc18a9998> (a 
> java.util.Collections$SynchronizedSet)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
> - locked <0xc7eae720> (a 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
>  
> Found 1 deadlock.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3242:
--
Labels: 2.6.1-candidate  (was: )

> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can mess up the new ZK client session, 
> because ZooKeeper closes client sessions asynchronously.
> The watcher event from the old ZK client session can still be delivered to 
> ZKRMStateStore after the old ZK client session is closed.
> This can cause a serious problem: ZKRMStateStore goes out of sync with the ZooKeeper 
> session.
> We only have one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher 
> event is from the current session, so a watcher event from an old ZK client 
> session that has just been closed will still be processed.
> For example, if a Disconnected event is received from the old session after the new 
> session is connected, the zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> Then ZKRMStateStore won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send a 
> SyncConnected event until it is disconnected and connected again.
> We will then see all ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until the RM is shut down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after 
> receiving eventOfDeath, the EventThread will still process all the events until 
> the waitingEvents queue is empty.
> {code}
>   while (true) {
>  Object event = waitingEvents.take();
>  if (event == eventOfDeath) {
> wasKilled = true;
>  } else {
> processEvent(event);
>  }
>  if (wasKilled)
> synchronized (waitingEvents) {
>if (waitingEvents.isEmpty()) {
>   isRunning = false;
>   break;
>}
> }
>   }
>   private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
>   } else {
> public void disconnect() {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Disconnecting client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> sendThread.close();
> eventThread.queueEventOfDeath();
> }
> public void close() throws IOException {
> if (LOG.isDebugEnabled()) {
> LOG.debug("Closing client for session: 0x"
>   + Long.toHexString(getSessionId()));
> }
> try {
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.closeSession);
> submitRequest(h, null, null, null);
> } catch (InterruptedException e) {
> // ignore, close the send/event threads
> } finally {
> disconnect();
> }
> }
> {code}
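
A minimal sketch of the kind of session check the description argues for in 
{{ZKRMStateStore#processWatchEvent}} (hypothetical; the signature change and field 
name are assumptions, not necessarily what the attached patches do):

{code}
// Sketch only: drop watcher events that were not generated by the currently
// active ZooKeeper client ('activeZkClient' is an assumed field name).
public synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event)
    throws Exception {
  if (zk != activeZkClient) {
    LOG.info("Ignoring watch event from an old ZooKeeper session: " + event);
    return;
  }
  // ... existing handling of events from the current session ...
}
{code}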



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629010#comment-14629010
 ] 

Hadoop QA commented on YARN-2005:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  6s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 43s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  7s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 53s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |  51m 54s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | |  98m 20s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745548/YARN-2005.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ec0a04 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8553/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8553/console |


This message was automatically generated.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2005) Blacklisting support for scheduling AMs

2015-07-15 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2005:

Attachment: YARN-2005.004.patch

Addressed feedback by adding a configuration for the threshold.
Also changed the default to true and updated the tests.

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628867#comment-14628867
 ] 

Hadoop QA commented on YARN-3925:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 50s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m  8s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  43m 27s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745527/YARN-3925.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ec0a04 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8552/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8552/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8552/console |


This message was automatically generated.

> ContainerLogsUtils#getContainerLogFile fails to read container log files from 
> full disks.
> -
>
> Key: YARN-3925
> URL: https://issues.apache.org/jira/browse/YARN-3925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3925.000.patch
>
>
> ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
> {{getContainerLogFile}} depends on 
> {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
> {{LocalDirsHandlerService#getLogPathToRead}} calls 
> {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
> configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
> include full disks in {{LocalDirsHandlerService#checkDirs}}:
> {code}
> Configuration conf = getConfig();
> List localDirs = getLocalDirs();
> conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
> localDirs.toArray(new String[localDirs.size()]));
> List logDirs = getLogDirs();
> conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
>   logDirs.toArray(new String[logDirs.size()]));
> {code}
> ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
> ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628840#comment-14628840
 ] 

Hadoop QA commented on YARN-2410:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 21s | The applied patch generated  
32 new checkstyle issues (total was 60, now 92). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 48s | The patch appears to introduce 4 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 26s | Tests passed in 
hadoop-mapreduce-client-shuffle. |
| | |  36m 40s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-mapreduce-client-shuffle |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745525/YARN-2410-v3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ec0a04 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-shuffle.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-shuffle.html
 |
| hadoop-mapreduce-client-shuffle test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8551/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8551/console |


This message was automatically generated.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Fix For: 2.7.2
>
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch
>
>
> The async nature of the ShuffleHandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will theoretically
> happen 6000*40=240,000 times, which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 
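To illustrate that suggested direction (a sketch only, not the attached patch; the real handler transfers asynchronously through Netty, so its descriptor lifecycle is more involved), the idea is to carry only the file path around and open the descriptor immediately before the transfer:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.WritableByteChannel;

// Sketch of lazily opening a map output: the fd is opened just before the bytes
// are sent and released right after, instead of being opened when the shuffle
// request is parsed. Names here are illustrative, not ShuffleHandler's.
class LazyMapOutput {
  private final File mapOutputFile;  // hypothetical path to one map output
  private final long offset;
  private final long length;

  LazyMapOutput(File mapOutputFile, long offset, long length) {
    this.mapOutputFile = mapOutputFile;
    this.offset = offset;
    this.length = length;
  }

  long sendTo(WritableByteChannel target) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(mapOutputFile, "r");
    try {
      // Zero-copy transfer of just this reducer's slice of the file.
      return raf.getChannel().transferTo(offset, length, target);
    } finally {
      raf.close();  // fd held only for the duration of this one transfer
    }
  }
}
{code}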



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3925:

Attachment: YARN-3925.000.patch

> ContainerLogsUtils#getContainerLogFile fails to read container log files from 
> full disks.
> -
>
> Key: YARN-3925
> URL: https://issues.apache.org/jira/browse/YARN-3925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3925.000.patch
>
>
> ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
> {{getContainerLogFile}} depends on 
> {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
> {{LocalDirsHandlerService#getLogPathToRead}} calls 
> {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
> configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
> include full disks in {{LocalDirsHandlerService#checkDirs}}:
> {code}
> Configuration conf = getConfig();
> List localDirs = getLocalDirs();
> conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
> localDirs.toArray(new String[localDirs.size()]));
> List logDirs = getLogDirs();
> conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
>   logDirs.toArray(new String[logDirs.size()]));
> {code}
> ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
> ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-15 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3925:

Attachment: (was: YARN-3925.000.patch)

> ContainerLogsUtils#getContainerLogFile fails to read container log files from 
> full disks.
> -
>
> Key: YARN-3925
> URL: https://issues.apache.org/jira/browse/YARN-3925
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3925.000.patch
>
>
> ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
> {{getContainerLogFile}} depends on 
> {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
> {{LocalDirsHandlerService#getLogPathToRead}} calls 
> {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
> configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
> include full disks in {{LocalDirsHandlerService#checkDirs}}:
> {code}
> Configuration conf = getConfig();
> List localDirs = getLocalDirs();
> conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
> localDirs.toArray(new String[localDirs.size()]));
> List logDirs = getLogDirs();
> conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
>   logDirs.toArray(new String[logDirs.size()]));
> {code}
> ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
> ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-07-15 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-2410:
--
Attachment: YARN-2410-v3.patch

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Fix For: 2.7.2
>
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch
>
>
> The async nature of the ShuffleHandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will theoretically
> happen 6000*40=240,000 times, which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628722#comment-14628722
 ] 

Arun Suresh commented on YARN-3535:
---

Also... is it possible to simulate the 2 cases in the test case?

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM's start of a container on the NM 
> failed, and the job then hung there.
> AM logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628713#comment-14628713
 ] 

Li Lu commented on YARN-3814:
-

Oops, sorry, I was replying to the wrong message... 

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628711#comment-14628711
 ] 

Li Lu commented on YARN-3814:
-

OK, thanks [~zjshen]! 

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628708#comment-14628708
 ] 

Zhijie Shen commented on YARN-3814:
---

I didn't go beyond the current reader interface. You're safe:-)

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3927) Make the NodeManager's ContainerManager pluggable

2015-07-15 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-3927:


 Summary: Make the NodeManager's ContainerManager pluggable
 Key: YARN-3927
 URL: https://issues.apache.org/jira/browse/YARN-3927
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan


YARN-2884 proposes proxying all AM-RM communication in order to:
  * perform distributed scheduling decisions (YARN-2877)
  * throttle mis-behaving AMs
  * mask the access to a federation of RMs (YARN-2915)

To enable all of the above, we are implementing the AMRMProxy as an extension 
to the NM's ContainerManagerImpl. This JIRA is for making the ContainerManager 
pluggable so that the AMRMProxy can be swapped in dynamically.
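As a rough sketch of what "pluggable" could look like (the configuration key and factory below are hypothetical, not part of this JIRA's actual design), the usual Hadoop pattern is to resolve the implementation class from configuration and instantiate it reflectively:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch only; property name and types are illustrative.
class ContainerManagerFactory {
  // Hypothetical key; a real patch would define it in YarnConfiguration.
  static final String CM_CLASS_KEY = "yarn.nodemanager.container-manager.class";

  static <T> T create(Configuration conf, Class<T> xface,
      Class<? extends T> defaultImpl) {
    // Resolve the configured implementation, falling back to the default.
    Class<? extends T> clazz = conf.getClass(CM_CLASS_KEY, defaultImpl, xface);
    // ReflectionUtils also calls setConf() for Configurable implementations.
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}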




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628688#comment-14628688
 ] 

Arun Suresh commented on YARN-3535:
---

bq. This jira fix 2. Kill Container event in CS. So removing 
recoverResourceRequestForContainer(cont); is make sense to me..
Any reason why we don't remove {{recoverResourceRequestForContainer}} from the 
{{warnOrKillContainer}} method in the FairScheduler? Won't the above situation 
happen in the FS as well?

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM's start of a container on the NM 
> failed, and the job then hung there.
> AM logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628681#comment-14628681
 ] 

Li Lu commented on YARN-3914:
-

Hi [~zjshen], do you think this will affect the data schema design of the 
aggregation storage as well, or is it an "entity table only" change? I think 
this is independent of the aggregation implementations but would like to 
double-check. Thanks! 

> Entity created time should be part of the row key of entity table
> -
>
> Key: YARN-3914
> URL: https://issues.apache.org/jira/browse/YARN-3914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Entity created time should be part of the row key of the entity table, between 
> entity type and entity ID. The reason to have it is to index the entities. 
> Though we cannot index the entities by every kind of information, indexing 
> them according to created time is very necessary. Without it, every query 
> for the latest entities that belong to an application and a type will scan 
> through all the entities that belong to them; for example, if we want to list 
> the 100 most recently started containers in a YARN app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628680#comment-14628680
 ] 

Varun Saxena commented on YARN-3814:


[~zjshen], I did not know about that; I thought YARN-3049 was about the HBase 
backend implementation. We can surely consolidate both.
Let's have the REST implementation in this JIRA and the HBase implementation in 
YARN-3049. Thoughts?

Moreover, in parallel I am working on YARN-3862 and YARN-3863. Not sure if there 
is any overlap there.



> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628675#comment-14628675
 ] 

Hadoop QA commented on YARN-3814:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 31s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m 18s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 21s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 18s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  1s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 58s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 10s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  48m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745505/YARN-3814-YARN-2928.01.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / eb1932d |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8550/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8550/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8550/console |


This message was automatically generated.

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628647#comment-14628647
 ] 

Zhijie Shen commented on YARN-3814:
---

[~varun_saxena], thanks for putting up the patch. It seems that we have 
duplicated some work (I'm working on a POC for the reader (YARN-3049) which 
contains some REST API hooks too). I'll upload a POC patch a bit later. Let's 
consolidate them.

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3867) ContainerImpl changes to support container resizing

2015-07-15 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-3867:

Attachment: YARN-3867-YARN-1197.3.patch

Updated this patch, as the dependent issues have been updated.

> ContainerImpl changes to support container resizing
> ---
>
> Key: YARN-3867
> URL: https://issues.apache.org/jira/browse/YARN-3867
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-3867-YARN-1197.3.patch, YARN-3867.1.patch, 
> YARN-3867.2.patch
>
>
> 1) ContainerImpl logic changes in NM to handle events related to container 
> resizing
> 2) Relevant test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628633#comment-14628633
 ] 

Hadoop QA commented on YARN-3814:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 15s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 16s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 50s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 34s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  41m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745499/YARN-3814-YARN-2928.01.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / eb1932d |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8549/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8549/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8549/console |


This message was automatically generated.

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628625#comment-14628625
 ] 

Varun Saxena commented on YARN-3814:


Deleted the previous patch and uploaded a new one, as some debug statements were 
left over in the previous patch.

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628623#comment-14628623
 ] 

Varun Saxena commented on YARN-3814:


+*For Get Entities API,*+

appId and entityType are mandatory parameters for querying multiple entities, 
hence they are kept as part of the path in the REST URL.
Optional query parameters for the get entities API are *clusterId, 
userId, flowId, flowRunId, limit, createdTimeStart, createdTimeEnd, 
modifiedTimeStart, modifiedTimeEnd, relatesTo, isRelatedTo, infofilters, 
conffilters, metricfilters, eventfilters and fields*.

The behavior of userId, flowId, flowRunId, fields and clusterId is the same as 
for the get entity API.

*createdTimeStart and createdTimeEnd* specify the time window within which an 
entity's created time should fall for it to be returned. They are given as 
seconds since the epoch and are represented as longs internally. Either start 
or end can be specified on its own. 

*modifiedTimeStart and modifiedTimeEnd* specify the time window within which an 
entity's modified time should fall for it to be returned. They are given as 
seconds since the epoch and are represented as longs internally.

*limit* specifies how many entities are to be returned in the response. If the 
number of matching entities exceeds the limit, the most recent entities are 
returned, i.e. the result is sorted in descending order of created time.

*relatesTo and isRelatedTo* directly map to these fields in TimelineEntity. 
They are specified as a comma-separated list, with the type-to-IDs mapping 
given in the form:
{noformat}
[type]:[id1];[id2];[id3]

For instance, 
relatesTo=type1:id1;id2,type2:id1
This means the entity returned should match with relatesTo field having id1 and 
id2 of type1 & id1 of type2.
{noformat}

*infofilters and conffilters* specify the info or configs (with their values) 
that entities must match in order to be returned. They are given as key-value 
pairs. The value part of a config filter is interpreted as a String; for info 
filters, no such restriction exists. Key-value pairs are specified as follows:
{noformat}
[key]:[value]
So, infofilters=info1:val1,info2:val2 means entities which have info1 with val1 
and info2 with val2 should be returned.
{noformat}

*metricfilters and eventfilters* are both comma-separated lists of metric IDs 
and event IDs, respectively, that the entities should match.
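For illustration only (the application ID, entity type and filter values below 
are hypothetical), several of the parameters above could be combined into a 
single entities query along these lines:
{noformat}
http://<rm-host>:<port>/ws/v2/timeline/entity/application_1436886473402_0001/YARN_CONTAINER?limit=100&createdTimeStart=1436880000&relatesTo=type1:id1;id2&infofilters=info1:val1&metricfilters=metric1&fields=METRICS,EVENTS
{noformat}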

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3814:
---
Attachment: (was: YARN-3814-YARN-2928.01.patch)

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3814:
---
Attachment: YARN-3814-YARN-2928.01.patch

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3814:
---
Attachment: YARN-3814-YARN-2928.01.patch

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3814:
---
Attachment: (was: YARN-3814-YARN-2928.01.patch)

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628582#comment-14628582
 ] 

Varun Saxena commented on YARN-3814:


+*For Get Entity API,*+

appId, entityType and entityId are mandatory parameters for querying a single 
entity, hence they are kept as part of the path in the REST URL.
Optional query parameters for the get entity API are *clusterId, 
userId, flowId, flowRunId and fields*.
*clusterId* and appId are both needed to identify an entity, but clusterId can 
be kept as an optional query parameter because, if it is not specified, we can 
take it from the configuration.
*userId, flowId and flowRunId* are accepted as params so that we need not query 
the app-to-flow mapping table.

*fields* is a comma-separated list of the field groups to be returned in 
addition to the default view of an entity. The default view of an entity 
contains the entity ID, entity type, created time and modified time. Possible 
values are *EVENTS, INFO, METRICS, CONFIGS, RELATESTO and ISRELATEDTO* (case 
insensitive).
If fields is *ALL*, all of the above fields are returned. 
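As a purely illustrative example (the application and container IDs are 
hypothetical), a single-entity request asking for metrics and events on top of 
the default view might look like:
{noformat}
http://<rm-host>:<port>/ws/v2/timeline/entity/application_1436886473402_0001/YARN_CONTAINER/container_1436886473402_0001_01_000002?fields=METRICS,EVENTS
{noformat}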





> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628567#comment-14628567
 ] 

Varun Saxena commented on YARN-3814:


The REST URL for fetching a single entity by entity ID has been kept as follows:
{noformat}
http://{ip}:{port}/ws/v2/timeline/entity/{appId}/{entityType}/{entityId}?[Query 
Parameters]
{noformat}

The REST URL for fetching multiple entities has been kept as follows:
{noformat}
http://{ip}:{port}/ws/v2/timeline/entity/{appId}/{entityType}?[Query Parameters]
{noformat}

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3814:
---
Attachment: YARN-3814-YARN-2928.01.patch

[~sjlee0], [~zjshen], kindly review

> REST API implementation for getting raw entities in TimelineReader
> --
>
> Key: YARN-3814
> URL: https://issues.apache.org/jira/browse/YARN-3814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3814-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-15 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628530#comment-14628530
 ] 

Anubhav Dhoot commented on YARN-3878:
-

LGTM. Unrelated to this patch, there is an existing tiny race between 
GenericEventHandler checking blockNewEvents and setting drained to false. 
If serviceStop happens in between, it can set blockNewEvents and, with 
drained still true, cause the eventHandlingThread to finish before the one 
last event gets added to the queue. Not sure it's worth changing the product 
code for this, since shutdown cannot guarantee all events will be processed.

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
> YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
> YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch
>
>
> The sequence of events is as follows:
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and the AsyncDispatcher keeps waiting 
> incessantly for the dispatcher event queue to drain until the JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurre

[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628521#comment-14628521
 ] 

Hadoop QA commented on YARN-3873:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  7s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 9 new or modified test files. |
| {color:red}-1{color} | javac |   7m 42s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated  5 
new checkstyle issues (total was 314, now 314). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 22s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 27s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745482/0003-YARN-3873.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8548/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8548/console |


This message was automatically generated.

> pendingApplications in LeafQueue should also use OrderingPolicy
> ---
>
> Key: YARN-3873
> URL: https://issues.apache.org/jira/browse/YARN-3873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 
> 0003-YARN-3873.patch
>
>
> Currently *pendingApplications* in LeafQueue is using 
> {{applicationComparator}} from CapacityScheduler. This can be changed so that 
> pendingApplications uses the OrderingPolicy configured at the queue level 
> (FIFO/Fair, as configured). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing

2015-07-15 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628461#comment-14628461
 ] 

MENG DING commented on YARN-1645:
-

Forgot to say that the ContainerManager recovery logic is also removed from the 
previous patch, and will be moved to YARN-3868.

> ContainerManager implementation to support container resizing
> -
>
> Key: YARN-1645
> URL: https://issues.apache.org/jira/browse/YARN-1645
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645.1.patch, 
> YARN-1645.2.patch, yarn-1645.1.patch
>
>
> Implementation of ContainerManager for container resize, including:
> 1) ContainerManager resize logic 
> 2) Relevant test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy

2015-07-15 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3873:
--
Attachment: 0003-YARN-3873.patch

Thank you [~leftnoteasy] for the comments.

Uploading an updated version with a new API, getActivateIterator(). This will 
be used for pendingApplications. In the case of FairOrderingPolicy, 
getActivateIterator will provide an iterator without doing any reordering based 
on weight.
Please share your thoughts on the same.

> pendingApplications in LeafQueue should also use OrderingPolicy
> ---
>
> Key: YARN-3873
> URL: https://issues.apache.org/jira/browse/YARN-3873
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 
> 0003-YARN-3873.patch
>
>
> Currently *pendingApplications* in LeafQueue is using 
> {{applicationComparator}} from CapacityScheduler. This can be changed so that 
> pendingApplications uses the OrderingPolicy configured at the queue level 
> (FIFO/Fair, as configured). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3922) Introduce adaptive heartbeat between RM and NM

2015-07-15 Thread Xiaodi Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628353#comment-14628353
 ] 

Xiaodi Ke commented on YARN-3922:
-

[~mding] - yes, that's the path we are going down now. We will extend it and 
use it for our scenarios.  

>  Introduce adaptive heartbeat between RM and NM
> ---
>
> Key: YARN-3922
> URL: https://issues.apache.org/jira/browse/YARN-3922
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Xiaodi Ke
>
> Currently, the communication between the RM and NM is based on a pull-based 
> heartbeat protocol. Along with the NM heartbeat, it updates the status of 
> containers (e.g. FINISHED containers). This also updates the RM's view of 
> available resources and triggers scheduling. How frequently the NM sends the 
> heartbeat therefore impacts the task throughput and latency of the YARN 
> scheduler. Although the heartbeat interval can be configured in yarn-site.xml, 
> it will increase the load on the RM and bring unnecessary overhead if the 
> interval is configured to be too short. 
> We propose an adaptive heartbeat between the RM and NM to achieve a balance 
> between updating the NM's info promptly and minimizing the overhead of extra 
> heartbeats. With the adaptive heartbeat, the NM still honors the current 
> heartbeat interval and sends the heartbeat regularly. However, a heartbeat 
> will also be triggered as soon as any container status changes, and a minimum 
> interval can be configured to prevent the NM from sending heartbeats too 
> frequently.
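A rough sketch of this kind of triggering logic (names and structure are 
illustrative only, not the NodeStatusUpdater implementation): the heartbeat 
thread sleeps for the regular interval but can be woken early by a container 
state change, with a configurable minimum spacing between heartbeats.
{code}
// Illustrative sketch only; not actual NodeManager code.
class AdaptiveHeartbeatLoop implements Runnable {
  private final long regularIntervalMs;   // normal heartbeat cadence
  private final long minIntervalMs;       // floor between consecutive heartbeats
  private final Object trigger = new Object();
  private boolean statusChanged = false;
  private long lastHeartbeatMs = 0;

  AdaptiveHeartbeatLoop(long regularIntervalMs, long minIntervalMs) {
    this.regularIntervalMs = regularIntervalMs;
    this.minIntervalMs = minIntervalMs;
  }

  /** Called whenever any container changes state (e.g. reaches FINISHED). */
  void onContainerStatusChange() {
    synchronized (trigger) {
      statusChanged = true;
      trigger.notifyAll();   // wake the heartbeat thread early
    }
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        synchronized (trigger) {
          if (!statusChanged) {
            trigger.wait(regularIntervalMs);   // regular pull-based heartbeat
          }
          statusChanged = false;
        }
        long sinceLast = System.currentTimeMillis() - lastHeartbeatMs;
        if (sinceLast < minIntervalMs) {
          Thread.sleep(minIntervalMs - sinceLast);   // enforce minimum spacing
        }
        sendHeartbeat();
        lastHeartbeatMs = System.currentTimeMillis();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private void sendHeartbeat() {
    // The RPC to the ResourceManager would go here.
  }
}
{code}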



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628299#comment-14628299
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 30s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  52m  6s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745452/0004-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/console |


This message was automatically generated.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1869) Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()

2015-07-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-1869:
-
Description: 
Here is related code:
{code}
  } else {
opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
CreateMode.PERSISTENT));
  }
{code}

The other methods accessing zkAcl are synchronized.

  was:
Here is related code:
{code}
  } else {
opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
CreateMode.PERSISTENT));
  }
{code}
The other methods accessing zkAcl are synchronized.
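For illustration only (this is not the actual ZKRMStateStore code; the class below is a 
stripped-down stand-in), the fix the description asks for amounts to reading {{zkAcl}} 
under the same lock the other accessors already hold:
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.data.ACL;

class ZkAclSketch {
  private List<ACL> zkAcl = new ArrayList<ACL>();

  // Other accessors are synchronized on "this"...
  synchronized void setZkAcl(List<ACL> acl) {
    this.zkAcl = acl;
  }

  // ...so the op-building path should take the same lock before reading zkAcl.
  synchronized void addCreateOp(List<Op> opList, String path, byte[] data) {
    opList.add(Op.create(path, data, zkAcl, CreateMode.PERSISTENT));
  }
}
{code}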


> Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()
> --
>
> Key: YARN-1869
> URL: https://issues.apache.org/jira/browse/YARN-1869
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: yarn-1869.patch
>
>
> Here is related code:
> {code}
>   } else {
> opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl,
> CreateMode.PERSISTENT));
>   }
> {code}
> The other methods accessing zkAcl are synchronized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628263#comment-14628263
 ] 

Hudson commented on YARN-3170:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #255 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/255/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628270#comment-14628270
 ] 

Hudson commented on YARN-3170:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2203 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2203/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md
* hadoop-yarn-project/CHANGES.txt


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1645) ContainerManager implementation to support container resizing

2015-07-15 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-1645:

Attachment: YARN-1645-YARN-1197.3.patch

Thanks for the confirmation [~jianhe].

Attaching the updated patch:
* Merged {{authorizeStartRequest}} and {{authorizeResourceIncreaseRequest}} into 
{{authorizeStartAndResourceIncreaseRequest}} to share most of the code, similar to 
what is being done in {{authorizeGetAndStopContainerRequest}} (a rough sketch 
follows below).
* Some test cases in the previous patch don't belong to this issue. They have been 
taken out and will be moved to YARN-3867.
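To illustrate the shape of that refactoring (names, parameters, and checks here are 
placeholders, not the real ContainerManagerImpl code), a single helper shared by the 
start-container and increase-resource paths might look like:
{code}
public class AuthorizationSketch {

  // Shared by both the start-container and the increase-resource paths.
  void authorizeStartAndResourceIncreaseRequest(String remoteUser,
      String containerId, long tokenExpiryMs, long nowMs) {
    if (remoteUser == null || !remoteUser.equals(expectedUser(containerId))) {
      throw new SecurityException("Unauthorized request for container " + containerId);
    }
    if (nowMs > tokenExpiryMs) {
      throw new SecurityException("Container token expired for " + containerId);
    }
    // ...further checks common to the start and the increase path would live here...
  }

  private String expectedUser(String containerId) {
    return "appUser";   // placeholder lookup
  }
}
{code}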


> ContainerManager implementation to support container resizing
> -
>
> Key: YARN-1645
> URL: https://issues.apache.org/jira/browse/YARN-1645
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645.1.patch, 
> YARN-1645.2.patch, yarn-1645.1.patch
>
>
> Implementation of ContainerManager for container resize, including:
> 1) ContainerManager resize logic 
> 2) Relevant test cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0004-YARN-3893.patch

Attaching a patch after updating the comments and adding a test case.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler

2015-07-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628147#comment-14628147
 ] 

Sunil G commented on YARN-2003:
---

Test cases are passing locally.
The test report shows 0 failures: 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/testReport/
Similarly, the findbugs report page shows 0 warnings.

> Support for Application priority : Changes in RM and Capacity Scheduler
> ---
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 
> 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 
> 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 
> 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 
> 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 
> 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 
> 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 
> 0021-YARN-2003.patch, 0022-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628143#comment-14628143
 ] 

Hudson commented on YARN-3170:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #245 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/245/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md
* hadoop-yarn-project/CHANGES.txt


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628129#comment-14628129
 ] 

Hadoop QA commented on YARN-2003:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 13s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 41s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m 34s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 47s | The patch appears to introduce 2 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 55s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  61m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 107m  9s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745428/0022-YARN-2003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8546/console |


This message was automatically generated.

> Support for Application priority : Changes in RM and Capacity Scheduler
> ---
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 
> 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 
> 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 
> 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 
> 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 
> 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 
> 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 
> 0021-YARN-2003.patch, 0022-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628115#comment-14628115
 ] 

Hudson commented on YARN-3170:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2184 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2184/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628077#comment-14628077
 ] 

Varun Saxena commented on YARN-3045:


A cosmetic comment. Some of the lines are too long (> 80 chars).

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3045-YARN-2928.002.patch, 
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
> YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch
>
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628072#comment-14628072
 ] 

Varun Saxena commented on YARN-3045:


[~Naganarasimha], a couple of comments.

# In NMTimelinePublisher, we can make the Container***Event classes private. I do 
not see them being referenced anywhere else.
# Will a single event queue with a single event-handling thread in the async 
dispatcher be enough to handle container events? There may be too many of them 
(see the sketch below).
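On point 2, a purely illustrative sketch of the alternative being asked about (not the 
NMTimelinePublisher code): if container events prove too frequent for one dispatcher 
thread, publishing could be spread over a small pool instead.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContainerEventPublisherSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  void onContainerEvent(final String containerId, final String eventType) {
    pool.execute(new Runnable() {
      public void run() {
        // serialize and write the lifecycle event to the timeline service
        System.out.println("publish " + eventType + " for " + containerId);
      }
    });
  }

  void stop() {
    pool.shutdown();
  }
}
{code}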

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3045-YARN-2928.002.patch, 
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
> YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch
>
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627950#comment-14627950
 ] 

Varun Saxena commented on YARN-3893:


Thanks for the patch [~bibinchundatt]. A few comments.

# Nit : Should be "Exception in state transition"
{code}
  throw new ServiceFailedException(
  "Exception in state transistion", re);
{code}
# IMO, there is no need to throw ServiceFailedException when catching an exception 
while calling reinitialize. The throw below should suffice; just set the flag. In 
my opinion, we should retain the original exception.
# Add a comment indicating what the flag does.
# Maybe rename the flag to reinitActiveServices instead of reinitialize.
# Semantically speaking, the flag doesn't quite belong in AdminService. It could 
live in ResourceManager or RMContext. Thoughts?
# Can you add a test to verify the fix?
# Instead of relying on transitionToStandby to change the state to standby, I think 
we can explicitly change the state in AdminService. That's because even 
stopActiveServices can throw an Exception, and if it does, the state won't change 
to STANDBY. This call to stop should not throw an exception, but as services keep 
getting added you never know how a particular service may behave, and we should be 
immune to it. Try something like below.
{code}
 
((RMContextImpl)rmContext).setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY);
{code}
# Just a suggestion: if we do the above, maybe call stopActiveServices and 
reinitialize directly instead of calling transitionToStandby. As I said in a 
comment above, transitionToStandby would print an audit log saying the transition 
was successful, yet reinitialize may subsequently fail. Not printing this audit 
log would be consistent with transitionToActive failing while starting active 
services. Thoughts?
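Putting comments 7 and 8 together, a hedged sketch of the suggested control flow 
(all method names below are placeholders, not the actual AdminService/RMContext API):
{code}
class TransitionSketch {

  void transitionToActive() throws Exception {
    startActiveServices();
    try {
      refreshAll();               // reload scheduler config, ACLs, user groups
    } catch (Exception e) {
      // Instead of relying on a full transitionToStandby(), stop what was
      // started and record the standby state explicitly, so a failure while
      // stopping services cannot leave both RMs believing they are active.
      stopActiveServices();
      markStandby();
      throw new Exception("Exception in state transition", e);
    }
  }

  void startActiveServices() { /* start scheduler and other active services */ }
  void refreshAll() throws Exception { /* may throw on bad configuration */ }
  void stopActiveServices() { /* best-effort stop of active services */ }
  void markStandby() { /* e.g. record the STANDBY HA state on the RM context */ }
}
{code}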

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627949#comment-14627949
 ] 

Tsuyoshi Ozawa commented on YARN-2801:
--

[~Naganarasimha] [~leftnoteasy] Could you fix the following points?

1. Could you edit hadoop-project/src/site/site.xml to add an entry for NodeLabel 
to the left-side menu?
2. There are some typos:

{quote}
+Yarn Node Labels
{quote}
"Yarn" should be upper case, *YARN*?

{quote}
and application can specify where to run.
{quote}
*applications* instead of application looks better.

{quote}
User need configure how much resources of each partition can be used by 
different queues, see next sections.
{quote}
"User need *to* configure" looks better. Also, "~ by different queues, see next 
sections" is a bit difficult to read. My suggestion is "~ by different queues. 
For more detail, please see the next section." 

{quote}
There’re two kinds of node partitions
{quote}

"’" should be "'"(using half-width character instead of full-width character). 
Adding *:* looks better after the sentence.

{quote}
Exclusive: Containers will be allocated to nodes with exactly match node 
partition. 
{quote}

*Containers* should be *containers*.

{quote}
User can specify ~
{quote}
*A user can specify* ~

{quote}
user can set percentage like: queue-A can access 30% ~
{quote}
a user can set ~, adding space between *queue* and *-*, *-* and *A*.

{quote}
 So each of they can use 1/3 ~
{quote}

each of *them* ~

{quote}
After finish setting configuration of CapacityScheduler, executing ~
{quote}
After *finishing* configuration of CapacityScheduler, *execute* ~

{quote}
Application can use ~ 
{quote}

*Applications* can use ~

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler

2015-07-15 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2003:
--
Attachment: 0022-YARN-2003.patch

Uploading a new version of the patch after fixing test issues and warnings.

One more validation is added in CS. There are two cases:
- a user submits a new application without specifying a priority to a non-existent 
queue.
- a user submits a new application without specifying a priority and with no queue 
(queue as null), assuming that a queue mapping is configured for this user in the 
CS config file.

Validation for such cases happens in addApplication in CS, but that comes only 
later in the call flow. To fill in the default priority from RMAppManager, we need 
the {{queue}} object; hence I added these checks in 
{{checkAndGetApplicationPriority}}. Please share your thoughts.
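A rough sketch of the check being described, with simplified types (this is not the 
CapacityScheduler code; the user-to-queue mapping and per-queue default maps are 
stand-ins for resolving the real queue object):
{code}
import java.util.Map;

public class PriorityCheckSketch {

  static final int CLUSTER_MAX_PRIORITY = 10;   // illustrative cluster-level cap

  static int checkAndGetApplicationPriority(Integer requested, String queueName,
      String user, Map<String, String> userToQueueMapping,
      Map<String, Integer> queueDefaultPriority) {
    // Case 2: no queue given -- fall back to the user's queue mapping, if any.
    if (queueName == null) {
      queueName = userToQueueMapping.get(user);
    }
    Integer queueDefault =
        (queueName == null) ? null : queueDefaultPriority.get(queueName);
    // Case 1 (and unmapped users): the queue cannot be resolved, so fail here
    // instead of waiting for addApplication later in the call flow.
    if (queueDefault == null) {
      throw new IllegalArgumentException("Cannot resolve queue for user " + user);
    }
    int priority = (requested != null) ? requested : queueDefault;
    return Math.min(priority, CLUSTER_MAX_PRIORITY);   // clamp to the cluster max
  }
}
{code}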

> Support for Application priority : Changes in RM and Capacity Scheduler
> ---
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 
> 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 
> 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 
> 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 
> 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 
> 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 
> 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, 
> 0021-YARN-2003.patch, 0022-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3909) TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena resolved YARN-3909.

Resolution: Duplicate

> TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk
> ---
>
> Key: YARN-3909
> URL: https://issues.apache.org/jira/browse/YARN-3909
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.413 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
> testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
>   Time elapsed: 5.327 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:156)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627925#comment-14627925
 ] 

Varun Saxena commented on YARN-3910:


Closing this as a duplicate.

> TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
> 
>
> Key: YARN-3910
> URL: https://issues.apache.org/jira/browse/YARN-3910
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3910.001.patch, YARN-3910.02.patch
>
>
> Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/
> {noformat}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
> testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.049 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions)
>   Time elapsed: 0.031 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk

2015-07-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena resolved YARN-3913.

Resolution: Duplicate

> TestResourceTrackerService#testReconnectNode fails on trunk
> ---
>
> Key: YARN-3913
> URL: https://issues.apache.org/jira/browse/YARN-3913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk

2015-07-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627924#comment-14627924
 ] 

Varun Saxena commented on YARN-3913:


Yes, all the test-failure-related JIRAs I had raised are duplicates of YARN-3916.
I will close them all.

> TestResourceTrackerService#testReconnectNode fails on trunk
> ---
>
> Key: YARN-3913
> URL: https://issues.apache.org/jira/browse/YARN-3913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627919#comment-14627919
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

The test result:
{quote}
-1 overall.  

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated 48 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version ) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.
{quote}

I checked the javadoc warnings but found no diff. It looks like a false positive 
from the script. 

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, 
> YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.

[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627916#comment-14627916
 ] 

Hadoop QA commented on YARN-3535:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  8s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 46s | The applied patch generated  3 
new checkstyle issues (total was 337, now 340). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 31s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745422/0004-YARN-3535.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8545/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8545/console |


This message was automatically generated.

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During rolling update of NM, AM start of container on NM failed. 
> And then job hang there.
> Attach AM logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk

2015-07-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627910#comment-14627910
 ] 

Rohith Sharma K S commented on YARN-3913:
-

Looked at the test failures from the 
[link|https://builds.apache.org/job/PreCommit-YARN-Build/8518/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt].
 
{code}
testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
  Time elapsed: 0.157 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:1051)
{code}
This is caused by the dispatcher not waiting for the heartbeat event to be 
processed. This should be a duplicate of YARN-3916.
[~varun_saxena], can you confirm this and close this JIRA?
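A sketch of the kind of fix implied here (and presumably what YARN-3916 does): have the 
test wait until the dispatcher queue is drained before asserting. The {{Drainable}} 
interface and helper below are placeholders, not the actual test utilities.
{code}
class HeartbeatTestSketch {
  interface Drainable { boolean isQueueEmpty(); }

  static void waitForEventsToDrain(Drainable dispatcher, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!dispatcher.isQueueEmpty()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("dispatcher did not drain in " + timeoutMs + "ms");
      }
      Thread.sleep(10);          // poll until the heartbeat event is processed
    }
  }
}
{code}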

> TestResourceTrackerService#testReconnectNode fails on trunk
> ---
>
> Key: YARN-3913
> URL: https://issues.apache.org/jira/browse/YARN-3913
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-463) Show explicitly excluded nodes on the UI

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627907#comment-14627907
 ] 

Hadoop QA commented on YARN-463:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 24s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 58s | The applied patch generated  2 
new checkstyle issues (total was 98, now 100). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 56s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m  6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745420/YARN-463.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8544/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8544/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8544/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8544/console |


This message was automatically generated.

> Show explicitly excluded nodes on the UI
> 
>
> Key: YARN-463
> URL: https://issues.apache.org/jira/browse/YARN-463
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: zhaoyunjiong
>  Labels: usability
> Attachments: Screen Shot 2015-07-14 at 1.25.46 PM.png, 
> YARN-463.1.patch, YARN-463.patch
>
>
> Nodes can be explicitly excluded via the config 
> yarn.resourcemanager.nodes.exclude-path. We should have a way of displaying 
> this list via web and command line UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627886#comment-14627886
 ] 

Hudson commented on YARN-3170:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/987/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627878#comment-14627878
 ] 

Hudson commented on YARN-3170:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/])
YARN-3170. YARN architecture document needs updating. Contributed by Brahma 
Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md


> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Fix For: 2.7.2
>
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, 
> YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627838#comment-14627838
 ] 

Hadoop QA commented on YARN-3885:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated  1 
new checkstyle issues (total was 63, now 64). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  45m  7s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  84m 54s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745407/YARN-3885.07.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8542/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8542/console |


This message was automatically generated.

> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 
> level
> --
>
> Key: YARN-3885
> URL: https://issues.apache.org/jira/browse/YARN-3885
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-3885.02.patch, YARN-3885.03.patch, 
> YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, 
> YARN-3885.07.patch, YARN-3885.patch
>
>
> In the preemption policy, {{ProportionalCapacityPreemptionPolicy.cloneQueues}} 
> calculates {{untoucable}} without considering all the children; it considers only 
> the immediate children.
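To spell out the point with a simplified stand-in for the queue hierarchy (this is not 
the actual policy code): the amount that must not be preempted has to be accumulated 
over all descendants, not just the first level.
{code}
import java.util.ArrayList;
import java.util.List;

class QueueNode {
  double untouchable;                 // resources that must not be preempted
  List<QueueNode> children = new ArrayList<QueueNode>();

  // Summing only the immediate children misses grand-children; recurse instead.
  static double totalUntouchable(QueueNode q) {
    double sum = q.untouchable;
    for (QueueNode child : q.children) {
      sum += totalUntouchable(child);
    }
    return sum;
  }
}
{code}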



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627836#comment-14627836
 ] 

Tsuyoshi Ozawa commented on YARN-2801:
--

I'll check this soon.

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

2015-07-15 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3535:

Attachment: 0004-YARN-3535.patch

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> -
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During rolling update of NM, AM start of container on NM failed. 
> And then job hang there.
> Attach AM logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-07-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627806#comment-14627806
 ] 

Naganarasimha G R commented on YARN-2801:
-

Hi [~leftnoteasy], can we get this patch committed? Is anything else pending for 
this?

> Documentation development for Node labels requirment
> 
>
> Key: YARN-2801
> URL: https://issues.apache.org/jira/browse/YARN-2801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Gururaj Shetty
>Assignee: Wangda Tan
> Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch
>
>
> Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA

2015-07-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627788#comment-14627788
 ] 

Naganarasimha G R commented on YARN-3152:
-

Thanks [~neillfontes] for responding. I was waiting for some feedback on this 
from you and the committers/PMC so that I could go ahead. The services will not 
be available, that's for sure, but another issue visible from the logs is that 
even though one RM's startup has failed, the other RM tries to come up, and when 
it fails for the same reason the two just keep oscillating. As you mentioned, I 
feel a WARN should be sufficient for this issue, or we can adopt the approach 
specified by [~kasha] in YARN-3607: log a WARN by default and, only if 
{{yarn.fail-fast}} is set to true, keep the current fail-fast behavior. Thoughts?
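
A minimal sketch of that suggestion, assuming a hypothetical helper (the method name, the fail-fast flag plumbing and the logging are placeholders, not the real AdminService/HostsFileReader code): a missing exclude file is treated as an empty list with a WARN, unless fail-fast is requested:

{code}
// Hypothetical helper; the real refresh logic lives in AdminService#refreshAll,
// and the yarn.fail-fast semantics are only assumed here.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Collections;
import java.util.List;

class ExcludeFileLoader {
  static List<String> loadExcludeHosts(String path, boolean failFast) throws IOException {
    File f = new File(path);
    if (!f.exists()) {
      if (failFast) {
        // Current behavior: refuse to transition to active.
        throw new IOException("Exclude file not found: " + path);
      }
      // Suggested behavior: warn and continue with an empty exclude list.
      System.err.println("WARN: exclude file " + path
          + " not found; assuming no excluded nodes");
      return Collections.emptyList();
    }
    return Files.readAllLines(f.toPath());
  }
}
{code}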

> Missing hadoop exclude file fails RMs in HA
> ---
>
> Key: YARN-3152
> URL: https://issues.apache.org/jira/browse/YARN-3152
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: Debian 7
>Reporter: Neill Lima
>Assignee: Naganarasimha G R
>
> I have two NNs in HA; they do not fail when the exclude file is not present 
> (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in 
> HA. I didn't create the exclude file at this point either. I applied the HA 
> RM settings properly, and when I started both RMs I started getting this 
> exception:
> 2015-02-06 12:25:25,326 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root   
> OPERATION=transitionToActiveTARGET=RMHAProtocolService  
> RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   
> PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>   ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: 
> java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file 
> or directory)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
>   ... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x44af32566180094 closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating 
> client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 
> sessionTimeout=1 
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to x.x.x.x/x.x.x.x:2181, initiating session
> The exception is descriptive enough to resolve the problem, and it has been 
> fixed by creating the exclude file. 
> I just suggest this as an improvement: 
> - Should the RMs ignore the missing file, as the NNs do?
> - Should a single RM fail even when the file is not present?
> Just suggesting this improvement to keep the behavior consistent when working 
> in HA (for both NNs and RMs). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-463) Show explicitly excluded nodes on the UI

2015-07-15 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated YARN-463:
--
Attachment: YARN-463.1.patch

Updated the patch to fix the test failures.

> Show explicitly excluded nodes on the UI
> 
>
> Key: YARN-463
> URL: https://issues.apache.org/jira/browse/YARN-463
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: zhaoyunjiong
>  Labels: usability
> Attachments: Screen Shot 2015-07-14 at 1.25.46 PM.png, 
> YARN-463.1.patch, YARN-463.patch
>
>
> Nodes can be explicitly excluded via the config 
> yarn.resourcemanager.nodes.exclude-path. We should have a way of displaying 
> this list via web and command line UIs.
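
As a rough, hedged illustration (the class below is invented and is not part of the attached patch, which surfaces this through the RM web UI), listing the explicitly excluded nodes from the command line can be as simple as reading the file configured via yarn.resourcemanager.nodes.exclude-path:

{code}
// Hypothetical standalone reader, for illustration only.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

public class ListExcludedNodes {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String excludePath = conf.get("yarn.resourcemanager.nodes.exclude-path", "");
    if (excludePath.isEmpty() || !Files.exists(Paths.get(excludePath))) {
      System.out.println("No exclude file configured or file not found.");
      return;
    }
    // Each non-empty line in the exclude file names one excluded host.
    List<String> hosts = Files.readAllLines(Paths.get(excludePath));
    System.out.println("Explicitly excluded nodes:");
    for (String host : hosts) {
      if (!host.trim().isEmpty()) {
        System.out.println("  " + host.trim());
      }
    }
  }
}
{code}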



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627780#comment-14627780
 ] 

Hadoop QA commented on YARN-3798:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745414/YARN-3798-branch-2.7.005.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8543/console |


This message was automatically generated.

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, 
> YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, 
> YARN-3798-branch-2.7.patch
>
>
> The RM goes down with a NoNode exception while creating the znode for an app attempt.
> *Please find the exception logs below:*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015
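
To make the intent of the title concrete, here is a minimal, hypothetical retry sketch (not the actual ZKRMStateStore$ZKAction code): a transient connection loss is retried on the existing session, and a brand-new session is only established when a SessionExpiredException is actually observed:

{code}
// Illustrative only; the real retry logic is in ZKRMStateStore$ZKAction.
import org.apache.zookeeper.KeeperException;

abstract class ZkOperation<T> {
  abstract T run() throws KeeperException, InterruptedException;

  T runWithRetries(int maxRetries) throws KeeperException, InterruptedException {
    for (int retry = 0; ; retry++) {
      try {
        return run();
      } catch (KeeperException.ConnectionLossException e) {
        // Connection loss: retry on the same session, do not re-create it.
        if (retry >= maxRetries) {
          throw e;
        }
      } catch (KeeperException.SessionExpiredException e) {
        // Only a genuine session expiry should trigger a brand-new session.
        createNewSession();
        if (retry >= maxRetries) {
          throw e;
        }
      }
    }
  }

  void createNewSession() {
    // Placeholder: re-establish the ZooKeeper client session here.
  }
}
{code}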

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627767#comment-14627767
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 57s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 58s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745400/0003-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/console |


This message was automatically generated.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # The capacity scheduler XML is wrongly configured during the switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> In these cases both RMs will continuously try to become active:
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs are active
> # The status is shown as active for both RMs
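
As a hedged illustration of one way to avoid the split-brain described above (all method names below are placeholders, not the actual AdminService code), a failure in the refresh step during transition-to-active could be followed by an explicit step back to standby before rethrowing, so that only one RM can win the next election:

{code}
// Hypothetical sketch of the transition flow; the real code is in AdminService.
class TransitionSketch {
  void transitionToActive() throws Exception {
    startActiveServices();
    try {
      refreshAll();   // e.g. queues, ACLs, user groups
    } catch (Exception e) {
      // If the refresh fails, step back to standby instead of staying
      // half-active, so only one RM reports itself as active.
      transitionToStandby();
      throw e;
    }
  }

  void startActiveServices() { /* placeholder */ }
  void refreshAll() throws Exception { /* placeholder */ }
  void transitionToStandby() { /* placeholder */ }
}
{code}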



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-15 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627735#comment-14627735
 ] 

Ajith S commented on YARN-3885:
---

I feel these test failures aren't caused by the patch :)

> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 
> level
> --
>
> Key: YARN-3885
> URL: https://issues.apache.org/jira/browse/YARN-3885
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-3885.02.patch, YARN-3885.03.patch, 
> YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, 
> YARN-3885.07.patch, YARN-3885.patch
>
>
> In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code 
> that calculates {{untoucable}} doesn't consider all the children; it 
> considers only the immediate children.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level

2015-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627719#comment-14627719
 ] 

Hadoop QA commented on YARN-3885:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  8s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 39s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   9m 35s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  49m 49s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
|   | 
hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
 |
|   | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAppManager |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder 
|
|   | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityReservationSystem |
|   | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower
 |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForNodePartitions
 |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics |
|   | hadoop.yarn.server.resourcemanager.TestRMNodeTransitions |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservationQueue |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog |
|   | hadoop.yarn.server.resourcemanager.webapp.TestAppPage |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue |
|   | hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
|   | 
hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower |
|   | hadoop.yarn.server.resourcemanager.

[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3798:
-
Attachment: YARN-3798-branch-2.7.005.patch

Attaching a patch to address [~zxu]'s comment.

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, 
> YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, 
> YARN-3798-branch-2.7.patch
>
>
> The RM goes down with a NoNode exception while creating the znode for an app attempt.
> *Please find the exception logs below:*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resource
