[jira] [Commented] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629308#comment-14629308 ] Hudson commented on YARN-3174: -- FAILURE: Integrated in Hadoop-trunk-Commit #8171 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8171/]) YARN-3174. Consolidate the NodeManager and NodeManagerRestart documentation into one. Contributed by Masatake Iwasaki. (ozawa: rev f02dd146f58bcfa0595eec7f2433bafdd857630f) * hadoop-project/src/site/site.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRestart.md > Consolidate the NodeManager and NodeManagerRestart documentation into one > - > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.1 >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Fix For: 2.8.0 > > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3805) Update the documentation of Disk Checker based on YARN-90
[ https://issues.apache.org/jira/browse/YARN-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629302#comment-14629302 ] Tsuyoshi Ozawa commented on YARN-3805: -- [~iwasakims] could you rebase it? > Update the documentation of Disk Checker based on YARN-90 > - > > Key: YARN-3805 > URL: https://issues.apache.org/jira/browse/YARN-3805 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: YARN-3805.001.patch > > > NodeManager is able to recover status of the disk once broken and fixed > without restarting by YARN-90. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629300#comment-14629300 ] Hadoop QA commented on YARN-3535: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 5 new checkstyle issues (total was 338, now 343). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 30s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745572/0005-YARN-3535.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8554/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8554/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8554/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8554/console | This message was automatically generated. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629296#comment-14629296 ] zhihai xu commented on YARN-3535: - Sorry for coming to this issue late. The latest patch looks good to me except for one nit: Can we make {{ContainerRescheduledTransition}} a child class of {{FinishedTransition}}, similar to {{KillTransition}}? Then we can call {{super.transition(container, event);}} instead of {{new FinishedTransition().transition(container, event);}}. I think this will make the code more readable and match the other transition class implementations. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
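To make the suggested refactoring concrete, here is a minimal, self-contained sketch. The class and method names mirror the RMContainerImpl transitions under discussion, but the types below are simplified stand-ins, not the actual YARN classes.

{code}
// Simplified stand-ins for the RMContainerImpl transition classes; the real
// transitions operate on RMContainerImpl and RMContainerEvent.
class Container {}
class ContainerEvent {}

class FinishedTransition {
  public void transition(Container container, ContainerEvent event) {
    // shared handling performed whenever a container reaches a terminal state
    System.out.println("handling finished container");
  }
}

// Instead of calling new FinishedTransition().transition(container, event)
// inline, extend FinishedTransition and delegate via super.transition, the
// same structure KillTransition uses.
class ContainerRescheduledTransition extends FinishedTransition {
  @Override
  public void transition(Container container, ContainerEvent event) {
    // restore the ResourceRequest to the scheduler (YARN-3535), then reuse
    // the common finished-container handling from the parent class
    System.out.println("recovering ResourceRequest back to the scheduler");
    super.transition(container, event);
  }
}

public class TransitionSketch {
  public static void main(String[] args) {
    new ContainerRescheduledTransition().transition(new Container(), new ContainerEvent());
  }
}
{code}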
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629288#comment-14629288 ] Karthik Kambatla commented on YARN-3926: Thanks a bunch for putting this proposal together, Varun. We are in dire need of improvements to our resource-model, and the proposal goes a long way in addressing some of these issues. Huge +1 to this effort. Comments on the proposal itself: # There is a significant overlap between resource-types.xml and node-resources.xml. It would be nice to consolidate at least these parts. # Can we avoid the mismatch between the resource types on RM and NM altogether? # Can we avoid different restart paths for adding and removing resources? # Really like the concise configs proposed at the end of the document. What do you think of the following modifications to the proposal to address above wishes? I have clearly not thought as much before making these suggestions, so please feel free to shoot them down. # How about calling them yarn.resource-types, yarn.resource-types.memory.*, yarn.resource-types.cpu.*. Further, memory/cpu specific configs could be made simpler per the suggestions later in the document? # yarn.scheduler.resource-types is a subset of yarn.resource-types, and captures the resource-types the scheduler supports. This could be in yarn-site on RM. # yarn.nodemanager.resource-types.monitored and yarn.nodemanager.resource-types.enforced also are subsets of yarn.resource-types and could define the resources the NM monitors and enforces respectively. These could be in yarn-site on the NM. I understand isolation is out of scope here, but would be nice to have configs that lend themselves to future work. # yarn.nodemanager.[resources|resource-types].available could be a map where each key should be an entry in yarn.resource-types. You mention capturing node-labels etc. similarly. Could you elaborate on your thoughts, at least informally? Would be super nice to have a path in mind even if we were to do as follow-up work. > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
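To make the proposed key layout concrete, here is a rough sketch of how the consolidated configuration names suggested in the comment above could be set programmatically. All of the property names and values below come from the proposal being discussed (or are purely illustrative); none of them are existing YARN configuration keys.

{code}
import org.apache.hadoop.conf.Configuration;

public class ResourceTypeConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Cluster-wide list of resource types (proposed key, not an existing one).
    conf.set("yarn.resource-types", "memory,cpu,disk");
    conf.set("yarn.resource-types.memory.units", "Mi");
    conf.set("yarn.resource-types.cpu.units", "vcores");

    // RM side: the subset of types the scheduler actually schedules on.
    conf.set("yarn.scheduler.resource-types", "memory,cpu");

    // NM side: subsets the NodeManager monitors / enforces, plus what it offers.
    conf.set("yarn.nodemanager.resource-types.monitored", "memory,cpu,disk");
    conf.set("yarn.nodemanager.resource-types.enforced", "memory,cpu");
    conf.set("yarn.nodemanager.resource-types.available", "memory=8192Mi,cpu=8,disk=4");

    System.out.println("resource types: " + conf.get("yarn.resource-types"));
  }
}
{code}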
[jira] [Commented] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629283#comment-14629283 ] Masatake Iwasaki commented on YARN-3174: Thanks, [~ozawa]! > Consolidate the NodeManager and NodeManagerRestart documentation into one > - > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.1 >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Fix For: 2.8.0 > > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629276#comment-14629276 ] Varun Vasudev commented on YARN-3852: - Thanks for the patch [~ashahab]. The patch isn't working for me. There are two issues - # No default value for "docker.binary". I think we should assume this to be "docker" and allow it to be overridden. # The docker launch fails due to {code} if (change_effective_user(user_uid, user_gid) != 0) {code} in launch_docker_container_as_user. For docker run to work, the effective user needs to be root (something like change_effective_user(0, user_gid) is probably the right way). Some other issues - # {code} -static const char* DEFAULT_BANNED_USERS[] = {"yarn", "mapred", "hdfs", "bin", 0}; +static const char* DEFAULT_BANNED_USERS[] = {"mapred", "hdfs", "bin", 0}; {code} Why are you removing the yarn user from the banned users? I'm guessing this is due to a branch-2/trunk issue. The yarn user is banned in trunk but not in branch-2. # A couple of formatting fixes {code} + fprintf(LOGFILE, "done opening pid\n"); +fflush(LOGFILE); {code} and {code} +fprintf(LOGFILE, "done writing pid to tmp\n"); + fflush(LOGFILE); {code} # Can we change the error message in the message below to a more descriptive one? {code} + fprintf(ERRORFILE, "Error reading\n"); + fflush(ERRORFILE); {code} # In parse_docker_command_file {code} + int read; {code} should we use ssize_t instead of int? # In parse_docker_command_file, we have some exit(1) calls - can we change this to use the error codes in container-executor.h? # In run_docker {code} + free(docker_binary); + free(args); + free(docker_command_with_binary); + free(docker_command); + exit_code = DOCKER_RUN_FAILED; + } + exit_code = 0; + return exit_code; {code} The exit code from the function will always be 0. # Formatting {code} +int create_script_paths(const char *work_dir, + const char *script_name, const char *cred_file, + char** script_file_dest, char** cred_file_dest, + int* container_file_source, int* cred_file_source ) { {code} # In create_script_paths, we use a bunch of goto's but the goto target doesn't have any special logic or handling. Can we avoid using the goto? # {code} +//kill me now. {code} No need for the commentary :) # In main.c {code} +char * resources = argv[optind++];// key,value pair describing resources +char * resources_key = malloc(strlen(resources)); +char * resources_value = malloc(strlen(resources)); {code} Can we move the declarations of resources, resources_key and resources_value out of the case block (since the same variables are used in two case blocks)? > Add docker container support to container-executor > --- > > Key: YARN-3852 > URL: https://issues.apache.org/jira/browse/YARN-3852 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Abin Shahab > Attachments: YARN-3852.patch > > > For security reasons, we need to ensure that access to the docker daemon and > the ability to run docker containers is restricted to privileged users ( i.e > users running applications should not have direct access to docker). In order > to ensure the node manager can run docker commands, we need to add docker > support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3174: - Affects Version/s: 2.7.1 > Consolidate the NodeManager and NodeManagerRestart documentation into one > - > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.1 >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Fix For: 2.8.0 > > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3174: - Component/s: documentation > Consolidate the NodeManager and NodeManagerRestart documentation into one > - > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.1 >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Fix For: 2.8.0 > > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629268#comment-14629268 ] Sunil G commented on YARN-2005: --- Thanks [~adhoot]. Sorry for delayed response. bq.The nodes are removed from blacklist once the launch of the AM happens to limit this issue. Yes. I feel this will be fine. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3174) Consolidate the NodeManager and NodeManagerRestart documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3174: - Summary: Consolidate the NodeManager and NodeManagerRestart documentation into one (was: Consolidate the NodeManager documentation into one) > Consolidate the NodeManager and NodeManagerRestart documentation into one > - > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3174) Consolidate the NodeManager documentation into one
[ https://issues.apache.org/jira/browse/YARN-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629263#comment-14629263 ] Tsuyoshi Ozawa commented on YARN-3174: -- +1 > Consolidate the NodeManager documentation into one > -- > > Key: YARN-3174 > URL: https://issues.apache.org/jira/browse/YARN-3174 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Allen Wittenauer >Assignee: Masatake Iwasaki > Attachments: YARN-3174.001.patch > > > We really don't need a different document for every individual nodemanager > feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629253#comment-14629253 ] Arun Suresh commented on YARN-3535: --- The patch looks good !! Thanks for working on this [~peng.zhang] and [~rohithsharma] +1, Pending successful jenkins run with latest patch > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails
[ https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629250#comment-14629250 ] Akira AJISAKA commented on YARN-2578: - Thanks [~iwasakims] for creating the patch. One comment and one question from me. bq. The default value is 0 in order to keep current behaviour. 1. We would like to fix this bug, so default to 1min is good for me. 2. Would you tell me why {{Client.getRpcTimeout}} returns 0 if {{ipc.client.ping}} is false? > NM does not failover timely if RM node network connection fails > --- > > Key: YARN-2578 > URL: https://issues.apache.org/jira/browse/YARN-2578 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Attachments: YARN-2578.002.patch, YARN-2578.patch > > > The NM does not fail over correctly when the network cable of the RM is > unplugged or the failure is simulated by a "service network stop" or a > firewall that drops all traffic on the node. The RM fails over to the standby > node when the failure is detected as expected. The NM should than re-register > with the new active RM. This re-register takes a long time (15 minutes or > more). Until then the cluster has no nodes for processing and applications > are stuck. > Reproduction test case which can be used in any environment: > - create a cluster with 3 nodes > node 1: ZK, NN, JN, ZKFC, DN, RM, NM > node 2: ZK, NN, JN, ZKFC, DN, RM, NM > node 3: ZK, JN, DN, NM > - start all services make sure they are in good health > - kill the network connection of the RM that is active using one of the > network kills from above > - observe the NN and RM failover > - the DN's fail over to the new active NN > - the NM does not recover for a long time > - the logs show a long delay and traces show no change at all > The stack traces of the NM all show the same set of threads. The main thread > which should be used in the re-register is the "Node Status Updater" This > thread is stuck in: > {code} > "Node Status Updater" prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in > Object.wait() [0x7f5a51fc1000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0xed62f488> (a org.apache.hadoop.ipc.Client$Call) > at java.lang.Object.wait(Object.java:503) > at org.apache.hadoop.ipc.Client.call(Client.java:1395) > - locked <0xed62f488> (a org.apache.hadoop.ipc.Client$Call) > at org.apache.hadoop.ipc.Client.call(Client.java:1362) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80) > {code} > The client connection which goes through the proxy can be traced back to the > ResourceTrackerPBClientImpl. The generated proxy does not time out and we > should be using a version which takes the RPC timeout (from the > configuration) as a parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629249#comment-14629249 ] Arun Suresh commented on YARN-3535: --- I meant for the FairScheduler... but looks like your new patch has it... thanks > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629208#comment-14629208 ] Peng Zhang commented on YARN-3535: -- Thanks [~rohithsharma] for updating the patch. The patch LGTM. bq. One point to be clear: the assumption made here is that the ResourceRequest is recovered only if the RMContainer is in ALLOCATED. If the RMContainer is in RUNNING, then the completed container will go to the AM in the allocate response and the AM will ask for a new ResourceRequest. Running in our large-scale cluster with FS and preemption enabled, MapReduce apps work well under this assumption. Basically, I think this assumption makes sense for other types of apps as well. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629168#comment-14629168 ] Rohith Sharma K S commented on YARN-3535: - One point to be clear: the assumption made here is that the ResourceRequest is recovered only if the RMContainer is in ALLOCATED. If the RMContainer is in RUNNING, then the completed container will go to the AM in the allocate response and the AM will ask for a new ResourceRequest. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3535: Attachment: 0005-YARN-3535.patch > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, > yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629157#comment-14629157 ] Rohith Sharma K S commented on YARN-3535: - ahh, right.. it can be removed. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629144#comment-14629144 ] Rohith Sharma K S commented on YARN-3535: - Yes, {{TestCapacityScheduler#testRecoverRequestAfterPreemption}} simulates this. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629141#comment-14629141 ] Sandy Ryza commented on YARN-3635: -- BTW I got all this from QueuePlacementPolicy and QueuePlacementRule, which are pretty quick reads if you want to take a look. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629139#comment-14629139 ] Sandy Ryza commented on YARN-3635: -- [~leftnoteasy], apologies for this quick drive-by review - I am currently traveling. The JIRA appears to be lacking a design-doc and I wasn't able to find documentation in the patch. The patch should ultimately include some detailed documentation, but I don't want to ask this of you before OKing the approach. In light of this, a few questions: * What steps are required for the Fair Scheduler to integrate with this? * Is a common way of configuration proposed? * How does this differ from the current Fair Scheduler model? To summarize: ** The FS model consists of a sequence of placement rules that the app is passed through. ** Each placement rule gets the chance to assign the app to a queue, reject the app, or pass. If it passes, the next rule gets a chance. ** A placement rule can base its decision on: *** The submitting user. *** The set of groups the submitting user belongs to. *** The queue requested in the app submission. *** A set of configuration options that are specific to the rule. *** The set of queues given in the Fair Scheduler configuration. ** Rules are marked as "terminal" if they will never pass. This helps to avoid misconfigurations where users place rules after terminal rules. ** Rules have a "create" attribute which determines whether they can create a new queue or whether they must assign to existing queues. ** Currently the set of placement rules is limited to what's implemented in YARN. I.e. there's no public pluggable rule support. I noticed from Vinod's comment that this patch follows a similar structure. Are there places where my summary would not describe what's going on in this patch? > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
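For readers unfamiliar with the Fair Scheduler model summarized above, here is a heavily simplified, hypothetical sketch of a rule chain with assign/pass/reject outcomes and "terminal"/"create" semantics. The interface and classes below are illustrative only; the real implementation lives in QueuePlacementPolicy and QueuePlacementRule and is configured in fair-scheduler.xml.

{code}
import java.util.*;

// Hypothetical, simplified model of the rule chain; not the actual FairScheduler API.
interface PlacementRule {
  // Returns the assigned queue, "" to pass to the next rule, or null to reject.
  String assign(String requestedQueue, String user, Set<String> groups,
                Set<String> configuredQueues);
  default boolean isTerminal() { return false; }
}

class SpecifiedRule implements PlacementRule {
  public String assign(String requestedQueue, String user, Set<String> groups,
                       Set<String> configuredQueues) {
    // Use the queue named at submission time, or pass if none was specified.
    return "default".equals(requestedQueue) ? "" : requestedQueue;
  }
}

class UserRule implements PlacementRule {
  public String assign(String requestedQueue, String user, Set<String> groups,
                       Set<String> configuredQueues) {
    // "create" semantics: may name a queue that does not exist yet.
    return "root." + user;
  }
  // Always assigns, so any rule placed after it would be unreachable.
  public boolean isTerminal() { return true; }
}

public class PlacementPolicySketch {
  public static void main(String[] args) {
    List<PlacementRule> rules = Arrays.asList(new SpecifiedRule(), new UserRule());
    Set<String> queues = new HashSet<>(Arrays.asList("root.prod", "root.dev"));
    String queue = "";
    for (PlacementRule rule : rules) {
      queue = rule.assign("default", "alice", Collections.singleton("analytics"), queues);
      if (queue == null) { System.out.println("rejected"); return; }
      if (!queue.isEmpty()) break;  // rule made an assignment
    }
    System.out.println("placed in " + queue);  // -> placed in root.alice
  }
}
{code}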
[jira] [Assigned] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-3635: Assignee: Sandy Ryza (was: Wangda Tan) > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Sandy Ryza > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3635: - Assignee: Tan, Wangda (was: Sandy Ryza) > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3928) launch application master on specific host
Wenrui created YARN-3928: Summary: launch application master on specific host Key: YARN-3928 URL: https://issues.apache.org/jira/browse/YARN-3928 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 2.6.0 Environment: Ubuntu 12.04 Reporter: Wenrui Hi, Is there a way to launch the application master on a specific host? If we cannot do this with a managed AM launcher, is it possible to achieve it with an unmanaged AM launcher? I find it quite necessary to place the application master on a specific host in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
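One possible direction, as an untested sketch, would be to attach a node-specific ResourceRequest with relaxLocality disabled to the application submission context. Whether the scheduler actually honors strict locality for AM containers depends on the scheduler and the YARN version, so treat this as a starting point rather than a confirmed answer; the host name and sizes below are made up, and the rest of the submission setup is omitted.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class AmPlacementSketch {
  public static void main(String[] args) {
    // Ask for the AM container on a specific node; relaxLocality=false requests
    // strict locality. Whether the scheduler honors this for AM containers is
    // version- and scheduler-dependent.
    ResourceRequest amRequest = ResourceRequest.newInstance(
        Priority.newInstance(0),        // AM priority
        "node-1.example.com",           // hypothetical target host
        Resource.newInstance(1024, 1),  // 1 GB, 1 vcore for the AM
        1,                              // one AM container
        false);                         // do not relax to rack/ANY

    ApplicationSubmissionContext context =
        Records.newRecord(ApplicationSubmissionContext.class);
    context.setAMContainerResourceRequest(amRequest);

    System.out.println("AM resource request: " + amRequest);
    // Queue, ContainerLaunchContext, and YarnClient#submitApplication are omitted.
  }
}
{code}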
[jira] [Updated] (YARN-3575) Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted
[ https://issues.apache.org/jira/browse/YARN-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3575: -- Labels: 2.6.1-candidate (was: ) > Job using 2.5 jars fails on a 2.6 cluster whose RM has been restarted > - > > Key: YARN-3575 > URL: https://issues.apache.org/jira/browse/YARN-3575 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.6.0 >Reporter: Jason Lowe > Labels: 2.6.1-candidate > > Trying to launch a job that uses the 2.5 jars fails on a 2.6 cluster whose RM > has been restarted (i.e.: epoch != 0) becaue the epoch number starts > appearing in the container IDs and the 2.5 jars no longer know how to parse > the container IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3369: -- Labels: 2.6.1-candidate (was: ) > Missing NullPointer check in AppSchedulingInfo causes RM to die > > > Key: YARN-3369 > URL: https://issues.apache.org/jira/browse/YARN-3369 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3369-003.patch, YARN-3369.2.patch, YARN-3369.patch > > > In AppSchedulingInfo.java the method checkForDeactivation() has these 2 > consecutive lines: > {code} > ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY); > if (request.getNumContainers() > 0) { > {code} > the first line calls getResourceRequest and it can return null. > {code} > synchronized public ResourceRequest getResourceRequest( > Priority priority, String resourceName) { > Map nodeRequests = requests.get(priority); > return (nodeRequests == null) ? {color:red} null : > nodeRequests.get(resourceName); > } > {code} > The second line dereferences the pointer directly without a check. > If the pointer is null, the RM dies. > {quote}2015-03-17 14:14:04,757 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739) > at java.lang.Thread.run(Thread.java:722) > {color:red} *2015-03-17 14:14:04,758 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, > bbye..*{color} {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
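A minimal standalone sketch of the missing null check described above (simplified stand-in types; the committed fix in AppSchedulingInfo may differ in detail):

{code}
import java.util.HashMap;
import java.util.Map;

public class DeactivationCheckSketch {
  static final String ANY = "*";

  static class ResourceRequest {
    private final int numContainers;
    ResourceRequest(int n) { numContainers = n; }
    int getNumContainers() { return numContainers; }
  }

  private final Map<Integer, Map<String, ResourceRequest>> requests = new HashMap<>();

  // Mirrors the behavior quoted above: may return null when no request exists.
  ResourceRequest getResourceRequest(int priority, String resourceName) {
    Map<String, ResourceRequest> nodeRequests = requests.get(priority);
    return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
  }

  boolean hasOutstandingRequest(int priority) {
    ResourceRequest request = getResourceRequest(priority, ANY);
    // Guard against a vanished request instead of dereferencing it blindly,
    // which is what crashed the RM with a NullPointerException.
    return request != null && request.getNumContainers() > 0;
  }

  public static void main(String[] args) {
    DeactivationCheckSketch info = new DeactivationCheckSketch();
    System.out.println(info.hasOutstandingRequest(1));  // false, no NPE
  }
}
{code}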
[jira] [Updated] (YARN-3103) AMRMClientImpl does not update AMRM token properly
[ https://issues.apache.org/jira/browse/YARN-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3103: -- Labels: 2.6.1-candidate (was: ) > AMRMClientImpl does not update AMRM token properly > -- > > Key: YARN-3103 > URL: https://issues.apache.org/jira/browse/YARN-3103 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3103.001.patch > > > AMRMClientImpl.updateAMRMToken updates the token service _before_ storing it > to the credentials, so the token is mapped using the newly updated service > rather than the empty service that was used when the RM created the original > AMRM token. This leads to two AMRM tokens in the credentials and can still > fail if the AMRMTokenSelector picks the wrong one. > In addition the AMRMClientImpl grabs the login user rather than the current > user when security is enabled, so it's likely the UGI being updated is not > the UGI that will be used when reconnecting to the RM. > The end result is that AMs can fail with invalid token errors when trying to > reconnect to an RM after a new AMRM secret has been activated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
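To see why the ordering matters, here is a toy model with a plain HashMap standing in for Hadoop's Credentials, which are keyed by the token's service: setting the service before storing the token files it under a new key and leaves the stale empty-service token in place. This is only an illustration of the described bug, not the actual AMRMClientImpl code.

{code}
import java.util.HashMap;
import java.util.Map;

public class AmrmTokenUpdateSketch {
  static class Token { String service = ""; }

  public static void main(String[] args) {
    Map<String, Token> credentials = new HashMap<>();
    credentials.put("", new Token());              // original AMRM token, empty service

    // Buggy order: set the service first, then store -> two entries survive.
    Token updated = new Token();
    updated.service = "rm-host:8030";
    credentials.put(updated.service, updated);
    System.out.println("tokens after buggy update: " + credentials.size());   // 2

    // Fixed order (conceptually): store under the empty service the RM used,
    // replacing the stale token, and only then update the service field.
    credentials.clear();
    credentials.put("", new Token());
    Token fixed = new Token();
    credentials.put("", fixed);                    // replaces the old token
    fixed.service = "rm-host:8030";
    System.out.println("tokens after fixed update: " + credentials.size());   // 1
  }
}
{code}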
[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2992: -- Labels: 2.6.1-candidate (was: ) > ZKRMStateStore crashes due to session expiry > > > Key: YARN-2992 > URL: https://issues.apache.org/jira/browse/YARN-2992 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: yarn-2992-1.patch > > > We recently saw the RM crash with the following stacktrace. On session > expiry, we should gracefully transition to standby. > {noformat} > 2014-12-18 06:28:42,689 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode > = Session expired > at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687) > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2874: -- Labels: 2.6.1-candidate (was: ) > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch > > > When token renewal fails and the application finishes this dead lock can occur > Jstack dump : > {quote} > Found one Java-level deadlock: > = > "DelegationTokenRenewer #181865": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > "DelayedTokenCanceller": > waiting to lock monitor 0x04141718 (object 0xc7eae720, a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), > which is held by "Timer-4" > "Timer-4": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > > Java stack information for the threads listed above: > === > "DelegationTokenRenewer #181865": > at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "DelayedTokenCanceller": > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) > - waiting to lock <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) > - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) > at java.lang.Thread.run(Thread.java:745) > "Timer-4": > at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) > - waiting to lock <0xc18a9998> (a > 
java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) > - locked <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > > Found 1 deadlock. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3242: -- Labels: 2.6.1-candidate (was: ) > Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events > for old client > - > > Key: YARN-3242 > URL: https://issues.apache.org/jira/browse/YARN-3242 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3242.000.patch, YARN-3242.001.patch, > YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch > > > Old ZK client session watcher event messed up new ZK client session due to > ZooKeeper asynchronously closing client session. > The watcher event from old ZK client session can still be sent to > ZKRMStateStore after the old ZK client session is closed. > This will cause seriously problem:ZKRMStateStore out of sync with ZooKeeper > session. > We only have one ZKRMStateStore but we can have multiple ZK client sessions. > Currently ZKRMStateStore#processWatchEvent doesn't check whether this watcher > event is from current session. So the watcher event from old ZK client > session which just is closed will still be processed. > For example, If a Disconnected event received from old session after new > session is connected, the zkClient will be set to null > {code} > case Disconnected: > LOG.info("ZKRMStateStore Session disconnected"); > oldZkClient = zkClient; > zkClient = null; > break; > {code} > Then ZKRMStateStore won't receive SyncConnected event from new session > because new session is already in SyncConnected state and it won't send > SyncConnected event until it is disconnected and connected again. > Then we will see all the ZKRMStateStore operations fail with IOException > "Wait for ZKClient creation timed out" until RM shutdown. > The following code from zookeeper(ClientCnxn#EventThread) show even after > receive eventOfDeath, EventThread will still process all the events until > waitingEvents queue is empty. > {code} > while (true) { > Object event = waitingEvents.take(); > if (event == eventOfDeath) { > wasKilled = true; > } else { > processEvent(event); > } > if (wasKilled) > synchronized (waitingEvents) { >if (waitingEvents.isEmpty()) { > isRunning = false; > break; >} > } > } > private void processEvent(Object event) { > try { > if (event instanceof WatcherSetEventPair) { > // each watcher will process the event > WatcherSetEventPair pair = (WatcherSetEventPair) event; > for (Watcher watcher : pair.watchers) { > try { > watcher.process(pair.event); > } catch (Throwable t) { > LOG.error("Error while calling watcher ", t); > } > } > } else { > public void disconnect() { > if (LOG.isDebugEnabled()) { > LOG.debug("Disconnecting client for session: 0x" > + Long.toHexString(getSessionId())); > } > sendThread.close(); > eventThread.queueEventOfDeath(); > } > public void close() throws IOException { > if (LOG.isDebugEnabled()) { > LOG.debug("Closing client for session: 0x" > + Long.toHexString(getSessionId())); > } > try { > RequestHeader h = new RequestHeader(); > h.setType(ZooDefs.OpCode.closeSession); > submitRequest(h, null, null, null); > } catch (InterruptedException e) { > // ignore, close the send/event threads > } finally { > disconnect(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
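A simplified sketch of the guard being described, where the watcher callback checks that the event came from the currently active client before acting on it. The types and method names are stand-ins; the actual YARN-3242 patch may differ in detail.

{code}
public class SessionAwareWatcherSketch {
  static class ZkClient {
    final long sessionId;
    ZkClient(long id) { sessionId = id; }
  }

  private volatile ZkClient activeClient;

  void processWatchEvent(ZkClient source, String eventType) {
    // Ignore events raised by a client that is no longer the active session,
    // so a late Disconnected from the old session cannot null out the new one.
    if (activeClient == null || source != activeClient) {
      System.out.println("dropping " + eventType + " from stale session "
          + source.sessionId);
      return;
    }
    System.out.println("processing " + eventType + " from current session "
        + source.sessionId);
  }

  public static void main(String[] args) {
    SessionAwareWatcherSketch store = new SessionAwareWatcherSketch();
    ZkClient oldSession = new ZkClient(1L);
    ZkClient newSession = new ZkClient(2L);
    store.activeClient = newSession;
    store.processWatchEvent(oldSession, "Disconnected");   // dropped
    store.processWatchEvent(newSession, "SyncConnected");  // processed
  }
}
{code}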
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629010#comment-14629010 ] Hadoop QA commented on YARN-2005: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 43s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 7s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 53s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 51m 54s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 98m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745548/YARN-2005.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8553/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8553/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8553/console | This message was automatically generated. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. 
This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2005: Attachment: YARN-2005.004.patch Addressed feedback by adding configuration for the threshold. Also made the default true and updated tests. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
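As an aside, a minimal sketch of the blacklisting idea described in this issue (plain Java; the class name is made up and the threshold would come from the new configuration property mentioned above, so this is not the actual patch, which may count or scope failures differently):
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Count AM launch failures per node and blacklist a node once it crosses a configurable threshold.
public class SimpleAmBlacklistTracker {
  private final int failureThreshold;
  private final Map<String, Integer> failuresPerNode = new HashMap<String, Integer>();
  private final Set<String> blacklisted = new HashSet<String>();

  public SimpleAmBlacklistTracker(int failureThreshold) {
    this.failureThreshold = failureThreshold;
  }

  public synchronized void addAmFailureOn(String nodeId) {
    Integer previous = failuresPerNode.get(nodeId);
    int failures = (previous == null ? 0 : previous) + 1;
    failuresPerNode.put(nodeId, failures);
    if (failures >= failureThreshold) {
      blacklisted.add(nodeId);   // skip this node for future AM attempts
    }
  }

  public synchronized boolean isBlacklistedForAm(String nodeId) {
    return blacklisted.contains(nodeId);
  }
}
{code}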
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628867#comment-14628867 ] Hadoop QA commented on YARN-3925: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 50s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 8s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 43m 27s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745527/YARN-3925.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8552/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8552/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8552/console | This message was automatically generated. > ContainerLogsUtils#getContainerLogFile fails to read container log files from > full disks. > - > > Key: YARN-3925 > URL: https://issues.apache.org/jira/browse/YARN-3925 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3925.000.patch > > > ContainerLogsUtils#getContainerLogFile fails to read files from full disks. > {{getContainerLogFile}} depends on > {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but > {{LocalDirsHandlerService#getLogPathToRead}} calls > {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses > configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not > include full disks in {{LocalDirsHandlerService#checkDirs}}: > {code} > Configuration conf = getConfig(); > List localDirs = getLocalDirs(); > conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, > localDirs.toArray(new String[localDirs.size()])); > List logDirs = getLogDirs(); > conf.setStrings(YarnConfiguration.NM_LOG_DIRS, > logDirs.toArray(new String[logDirs.size()])); > {code} > ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and > ContainerLogsPage.ContainersLogsBlock#render to read the log. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
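For illustration, a minimal sketch of the behaviour the report argues for when reading (as opposed to writing) logs: search every configured log directory, including ones currently marked full. getAllLogDirs() is a hypothetical helper standing in for "log dirs before full-disk filtering"; this is not the actual patch.
{code}
// Sketch only: resolve a container log file against all configured log dirs, full or not.
public File getLogFileForRead(String containerLogRelativePath) throws IOException {
  for (String logDir : getAllLogDirs()) {          // hypothetical: unfiltered NM_LOG_DIRS
    File candidate = new File(logDir, containerLogRelativePath);
    if (candidate.exists()) {
      return candidate;
    }
  }
  throw new IOException("Log file " + containerLogRelativePath
      + " not found in any configured log directory");
}
{code}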
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628840#comment-14628840 ] Hadoop QA commented on YARN-2410: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 21s | The applied patch generated 32 new checkstyle issues (total was 60, now 92). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 48s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 0m 26s | Tests passed in hadoop-mapreduce-client-shuffle. | | | | 36m 40s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-client-shuffle | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745525/YARN-2410-v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-shuffle.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-shuffle.html | | hadoop-mapreduce-client-shuffle test log | https://builds.apache.org/job/PreCommit-YARN-Build/8551/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8551/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8551/console | This message was automatically generated. > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Fix For: 2.7.2 > > Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, > YARN-2410-v3.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors, when it runs out it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about same time asking for their > outputs. 
Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=240,000 times which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
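To make the proposed refactoring concrete, here is a sketch of the "open the fd only when it is actually needed" idea, written against the Netty 3 style API the ShuffleHandler of this era uses; the method name and surrounding wiring are assumptions, not the actual patch:
{code}
// Sketch only: open the map output just before writing it and close it when the transfer
// completes, so open descriptors track in-flight transfers rather than queued requests.
ChannelFuture sendMapOutputLazily(Channel ch, File mapOutputFile,
    long offset, long length) throws IOException {
  final RandomAccessFile raf = new RandomAccessFile(mapOutputFile, "r");   // opened just-in-time
  ChannelFuture writeFuture =
      ch.write(new DefaultFileRegion(raf.getChannel(), offset, length));
  writeFuture.addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
      raf.close();                                                         // release the fd promptly
    }
  });
  return writeFuture;
}
{code}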
[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3925: Attachment: YARN-3925.000.patch > ContainerLogsUtils#getContainerLogFile fails to read container log files from > full disks. > - > > Key: YARN-3925 > URL: https://issues.apache.org/jira/browse/YARN-3925 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3925.000.patch > > > ContainerLogsUtils#getContainerLogFile fails to read files from full disks. > {{getContainerLogFile}} depends on > {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but > {{LocalDirsHandlerService#getLogPathToRead}} calls > {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses > configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not > include full disks in {{LocalDirsHandlerService#checkDirs}}: > {code} > Configuration conf = getConfig(); > List localDirs = getLocalDirs(); > conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, > localDirs.toArray(new String[localDirs.size()])); > List logDirs = getLogDirs(); > conf.setStrings(YarnConfiguration.NM_LOG_DIRS, > logDirs.toArray(new String[logDirs.size()])); > {code} > ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and > ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3925: Attachment: (was: YARN-3925.000.patch) > ContainerLogsUtils#getContainerLogFile fails to read container log files from > full disks. > - > > Key: YARN-3925 > URL: https://issues.apache.org/jira/browse/YARN-3925 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3925.000.patch > > > ContainerLogsUtils#getContainerLogFile fails to read files from full disks. > {{getContainerLogFile}} depends on > {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but > {{LocalDirsHandlerService#getLogPathToRead}} calls > {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses > configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not > include full disks in {{LocalDirsHandlerService#checkDirs}}: > {code} > Configuration conf = getConfig(); > List localDirs = getLocalDirs(); > conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, > localDirs.toArray(new String[localDirs.size()])); > List logDirs = getLogDirs(); > conf.setStrings(YarnConfiguration.NM_LOG_DIRS, > logDirs.toArray(new String[logDirs.size()])); > {code} > ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and > ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-2410: -- Attachment: YARN-2410-v3.patch > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Fix For: 2.7.2 > > Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, > YARN-2410-v3.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors, when it runs out it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=240,000 times which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628722#comment-14628722 ] Arun Suresh commented on YARN-3535: --- Also... Is it possible to simulate the 2 cases in the testcase ? > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628713#comment-14628713 ] Li Lu commented on YARN-3814: - Oops sorry I was replying a wrong message... > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628711#comment-14628711 ] Li Lu commented on YARN-3814: - OK, thanks [~zjshen]! > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628708#comment-14628708 ] Zhijie Shen commented on YARN-3814: --- I didn't go beyond the current reader interface. You're safe:-) > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3927) Make the NodeManager's ContainerManager pluggable
Subru Krishnan created YARN-3927: Summary: Make the NodeManager's ContainerManager pluggable Key: YARN-3927 URL: https://issues.apache.org/jira/browse/YARN-3927 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Subru Krishnan Assignee: Subru Krishnan YARN-2884 proposes proxying all AM-RM communication for: * performing distributed scheduling decisions (YARN-2877) * throttling mis-behaving AMs * masking the access to a federation of RMs (YARN-2915) To enable all of the above, we are implementing the AMRMProxy as an extension to NM's ContainerManagerImpl. This JIRA is for making the ContainerManager pluggable so that the AMRMProxy can be swapped in dynamically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
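A minimal sketch of what "pluggable" could look like, using Hadoop's Configuration/ReflectionUtils pattern. All names below (the interface, the default class, and the config key) are illustrative placeholders rather than real Hadoop types; the point is only the configuration-driven swap-in.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ContainerManagerFactory {

  // Placeholder plug point; in the real NM this role is played by the ContainerManager implementation.
  public interface PluggableContainerManager {
    void start();
  }

  public static class DefaultContainerManager implements PluggableContainerManager {
    @Override
    public void start() { /* existing ContainerManagerImpl behaviour would live here */ }
  }

  // Hypothetical key; an AMRMProxy-enabled implementation would be configured here.
  public static final String CM_CLASS_KEY = "yarn.nodemanager.container-manager.class";

  public static PluggableContainerManager create(Configuration conf) {
    Class<? extends PluggableContainerManager> clazz = conf.getClass(
        CM_CLASS_KEY, DefaultContainerManager.class, PluggableContainerManager.class);
    return ReflectionUtils.newInstance(clazz, conf);   // needs a no-arg constructor
  }
}
{code}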
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628688#comment-14628688 ] Arun Suresh commented on YARN-3535: --- bq. This jira fix 2. Kill Container event in CS. So removing recoverResourceRequestForContainer(cont); is make sense to me.. Any reason why we don't remove {{recoverResourceRequestForContainer}} from the {{warnOrKillContainer}} method in the FairScheduler? Won't the above situation happen in the FS as well.. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628681#comment-14628681 ] Li Lu commented on YARN-3914: - Hi [~zjshen], do you think this will affect the data schema design of aggregation storages as well, or it's an "entity table only" change? I think this is independent to the aggregation implementations but would like to double check it. Thanks! > Entity created time should be part of the row key of entity table > - > > Key: YARN-3914 > URL: https://issues.apache.org/jira/browse/YARN-3914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Entity created time should be part of the row key of entity table, between > entity type and entity Id. The reason to have it is to index the entities. > Though we cannot index the entities for all kinds of information, indexing > them according to the created time is very necessary. Without it, every query > for the latest entities that belong to an application and a type will scan > through all the entities that belong to them. For example, if we want to list > the 100 latest started containers in an YARN app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
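For illustration, a sketch of the proposed key layout (not the project's actual schema code): created time sits between entity type and entity id, and storing an inverted, zero-padded timestamp, which is an assumption here rather than something stated in the issue, makes an ascending scan return the newest entities first.
{code}
// Sketch only: cluster!user!flow!flowRunId!appId!entityType!invertedCreatedTime!entityId
static byte[] entityRowKey(String cluster, String user, String flow, long flowRunId,
    String appId, String entityType, long createdTime, String entityId) {
  long invertedCreatedTime = Long.MAX_VALUE - createdTime;          // newest sorts first
  String key = cluster + "!" + user + "!" + flow + "!" + flowRunId + "!" + appId
      + "!" + entityType + "!"
      + String.format("%019d", invertedCreatedTime)                 // fixed width keeps byte order correct
      + "!" + entityId;
  return key.getBytes(java.nio.charset.StandardCharsets.UTF_8);
}
{code}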
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628680#comment-14628680 ] Varun Saxena commented on YARN-3814: [~zjshen], did not know about that. Thought YARN-3049 was about HBase backend implementation. We can surely consolidate both. Let's have REST implementation in this JIRA. And HBase implementation in YARN-3049. Thoughts ? Moreover, parallely I am working on YARN-3862 and YARN-3863. Not sure if there is any overlap there. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628675#comment-14628675 ] Hadoop QA commented on YARN-3814: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 31s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 18s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 18s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 1s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 58s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 10s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 48m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745505/YARN-3814-YARN-2928.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / eb1932d | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8550/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8550/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8550/console | This message was automatically generated. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628647#comment-14628647 ] Zhijie Shen commented on YARN-3814: --- [~varun_saxena], thanks for putting the patch. It seems that we have duplicate some work (I'm working on a POC for reader (YARN-3049) which contains some REST API hook too). I'll upload a POC patch a bit latter. Let's consolidate them. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3867) ContainerImpl changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3867: Attachment: YARN-3867-YARN-1197.3.patch Update this patch as dependent issues have been updated. > ContainerImpl changes to support container resizing > --- > > Key: YARN-3867 > URL: https://issues.apache.org/jira/browse/YARN-3867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-3867-YARN-1197.3.patch, YARN-3867.1.patch, > YARN-3867.2.patch > > > 1) ContainerImpl logic changes in NM to handle events related to container > resizing > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628633#comment-14628633 ] Hadoop QA commented on YARN-3814: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 15s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 34s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 41m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745499/YARN-3814-YARN-2928.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / eb1932d | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8549/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8549/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8549/console | This message was automatically generated. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628625#comment-14628625 ] Varun Saxena commented on YARN-3814: Deleted the old patch and uploaded a new one, as some debug statements were left over in the previous patch > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628623#comment-14628623 ] Varun Saxena commented on YARN-3814: +*For Get Entities API,*+ appId and entityType are mandatory parameters for querying multiple entities, hence they are kept as part of the path in the REST URL. Optional query parameters for the get entities API have been kept as *clusterId, userId, flowId, flowRunId, limit, createdTimeStart, createdTimeEnd, modifiedTimeStart, modifiedTimeEnd, relatesTo, isRelatedTo, infofilters, conffilters, metricfilters, eventfilters and fields.* Behavior of userId, flowId, flowRunId, fields and clusterId is the same as it is for the get entity API. *createdTimeStart and createdTimeEnd* specify the time window in which the created time of an entity should fall for it to be returned. This specifies seconds since epoch and is represented as a long internally. Either start or end can be specified. *modifiedTimeStart and modifiedTimeEnd* specify the time window in which the modified time of an entity should fall for it to be returned. This specifies seconds since epoch and is represented as a long internally. *limit* specifies how many entities have to be returned in the response. If the number of entities is more than the limit, the most recent entities are returned, i.e. sorted in descending order of created time. *relatesTo and isRelatedTo* directly map to these fields in TimelineEntity. These are specified as a comma separated list, with type to IDs given in the form: {noformat} [type]:[id1];[id2];[id3] For instance, relatesTo=type1:id1;id2,type2:id1 This means the entity returned should match with relatesTo field having id1 and id2 of type1 & id1 of type2. {noformat} *infoFilters and configFilters* specify the configs or info (with their values) which should match for an entity to be returned. They are in the form of key value pairs. The value part of config filters is deciphered as a String. For info filters, no such restriction exists. Key value pairs are specified as under: {noformat} [key]:[value] So, infofilters=info1:val1,info2:val2 means entities which have info1 with val1 and info2 with val2 should be returned. {noformat} *metricFilters and eventFilters* are both comma separated lists of metric IDs and event IDs which should match. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
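To make the parameter handling above concrete, here is a rough JAX-RS sketch of how the get entities endpoint might expose these query parameters. The class and method names are illustrative, not the actual patch, and a real implementation would parse the values and delegate to the TimelineReader backend.
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/ws/v2/timeline")
public class TimelineReaderWebServicesSketch {

  @GET
  @Path("/entity/{appId}/{entityType}")
  @Produces(MediaType.APPLICATION_JSON)
  public Response getEntities(
      @PathParam("appId") String appId,
      @PathParam("entityType") String entityType,
      @QueryParam("clusterId") String clusterId,
      @QueryParam("userId") String userId,
      @QueryParam("flowId") String flowId,
      @QueryParam("flowRunId") String flowRunId,
      @QueryParam("limit") String limit,
      @QueryParam("createdTimeStart") String createdTimeStart,
      @QueryParam("createdTimeEnd") String createdTimeEnd,
      @QueryParam("modifiedTimeStart") String modifiedTimeStart,
      @QueryParam("modifiedTimeEnd") String modifiedTimeEnd,
      @QueryParam("fields") String fields) {
    // A real implementation would parse/validate these, query the reader backend,
    // and serialize the matching entities; this sketch just returns an empty 200.
    return Response.ok().build();
  }
}
{code}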
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: (was: YARN-3814-YARN-2928.01.patch) > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: YARN-3814-YARN-2928.01.patch > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: YARN-3814-YARN-2928.01.patch > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: (was: YARN-3814-YARN-2928.01.patch) > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628582#comment-14628582 ] Varun Saxena commented on YARN-3814: +*For Get Entity API,*+ appId, entityType and entityId are mandatory parameters for querying a single entity hence they are kept as part of path in REST URL. Optional query Parameters for get entity API have been kept as *clusterId, userId, flowId, flowRunId and fields.* *clusterId* and appId are mandatory for getting an entity. But clusterId can be kept as an optional query parameter because if clusterId is not specified, we can take it from configuration. *userId, flowId and flowRunId* as params are to ensure that we need not query app to flow mapping table. *fields* are a comma separated list of possible fields which have to be returned in addition to default view of entity. Default view of an entity contains entity ID, entity type, created time and modified time. Possible fields are *EVENTS, INFO, METRICS, CONFIGS, RELATESTO and ISRELATEDTO* (case insensitive). If fields is *ALL*, all the above fields are returned. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628567#comment-14628567 ] Varun Saxena commented on YARN-3814: The REST URL for fetching a single entity by entity ID has been kept as under: {noformat} http://{ip}:{port}/ws/v2/timeline/entity/{appId}/{entityType}/{entityId}?[Query Parameters] {noformat} The REST URL for fetching multiple entities has been kept as under: {noformat} http://{ip}:{port}/ws/v2/timeline/entity/{appId}/{entityType}?[Query Parameters] {noformat} > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
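For illustration only (host, port, application id and entity type below are made up), a full query combining the path with some of the optional parameters described in this JIRA might look like:
{noformat}
http://timeline-reader-host:8188/ws/v2/timeline/entity/application_1437000000000_0001/YARN_CONTAINER?limit=50&createdTimeStart=1437000000&fields=METRICS,EVENTS
{noformat}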
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: YARN-3814-YARN-2928.01.patch [~sjlee0], [~zjshen], kindly review > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628530#comment-14628530 ] Anubhav Dhoot commented on YARN-3878: - LGTM. Unrelated to this patch there is an existing tiny race between GenericEventHandler checking for blockNewEvents and setting drained to false. If serviceStop happens in between this, it can set blockNewEvents and with drained still true, it can cause eventHandlingThread to finish before the one last event gets added in the queue. Not sure its worth changing the product code for this since shutdown cannot guarantee all events will be processed. > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurre
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628521#comment-14628521 ] Hadoop QA commented on YARN-3873: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:red}-1{color} | javac | 7m 42s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 5 new checkstyle issues (total was 314, now 314). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 22s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 27s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745482/0003-YARN-3873.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8548/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8548/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8548/console | This message was automatically generated. > pendingApplications in LeafQueue should also use OrderingPolicy > --- > > Key: YARN-3873 > URL: https://issues.apache.org/jira/browse/YARN-3873 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, > 0003-YARN-3873.patch > > > Currently *pendingApplications* in LeafQueue is using > {{applicationComparator}} from CapacityScheduler. This can be changed and > pendingApplications can use the OrderingPolicy configured in Queue level > (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628461#comment-14628461 ] MENG DING commented on YARN-1645: - Forgot to say that the ContainerManager recovery logic is also removed from the previous patch, and will be moved to YARN-3868. > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645.1.patch, > YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3873: -- Attachment: 0003-YARN-3873.patch Thank you [~leftnoteasy] for the comments. Uploading an updated version with new api for getActivateIterator(). This will be used in case of pendingApplications. In case of FairOrderingPolicy, getActivateIterator will provide iterator w/o doing any reordering based on weight. Please share your thoughts on same. > pendingApplications in LeafQueue should also use OrderingPolicy > --- > > Key: YARN-3873 > URL: https://issues.apache.org/jira/browse/YARN-3873 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, > 0003-YARN-3873.patch > > > Currently *pendingApplications* in LeafQueue is using > {{applicationComparator}} from CapacityScheduler. This can be changed and > pendingApplications can use the OrderingPolicy configured in Queue level > (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
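For illustration, a rough sketch (made-up types, not the patch) of the distinction being proposed: the activation iterator for pendingApplications simply walks the applications in their natural submission order, without the weight-based reordering a FairOrderingPolicy applies to its assignment iterator.
{code}
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class FifoActivationOrdering<E extends Comparable<E>> {
  private final Set<E> entities = new TreeSet<E>();   // natural order stands in for submission order

  public void add(E e) { entities.add(e); }
  public void remove(E e) { entities.remove(e); }

  // Analogous to the proposed getActivateIterator(): no weight-based reordering, just FIFO.
  public Iterator<E> getActivateIterator() {
    return entities.iterator();
  }
}
{code}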
[jira] [Commented] (YARN-3922) Introduce adaptive heartbeat between RM and NM
[ https://issues.apache.org/jira/browse/YARN-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628353#comment-14628353 ] Xiaodi Ke commented on YARN-3922: - [~mding] - yes, that's the path we are tracing down now. We extend it and use it for our scenarios. > Introduce adaptive heartbeat between RM and NM > --- > > Key: YARN-3922 > URL: https://issues.apache.org/jira/browse/YARN-3922 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Xiaodi Ke > > Currently, the communication between RM and NM is based on a pull-based > heartbeat protocol. Along with the NM heartbeat, it updates the status of > containers (i.e. FINISHED container). This also updates the RM’s view of > available resources and triggers scheduling. How frequently the NM sends the > heartbeat will impact the task throughput and latency of the YARN scheduler. > Although the heartbeat interval can be configured in yarn-site.xml, it will > increase the load on the RM and bring unnecessary overhead if the interval is > configured too short. > We propose an adaptive heartbeat between RM and NM to achieve a balance > between updating the NM’s info promptly and minimizing the overhead of extra > heartbeats. With adaptive heartbeat, the NM still honors the current heartbeat > interval and sends the heartbeat regularly. However, a heartbeat will be > triggered as soon as any container status is changed. Also a minimum > interval can be configured to prevent the NM from sending heartbeats too > frequently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
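A minimal sketch of the proposed behaviour (illustrative names only, not an actual patch): the status-updater thread keeps its regular interval, a container status change wakes it early, and a configurable minimum gap prevents flooding the RM.
{code}
public class AdaptiveHeartbeatMonitor {
  private final long heartbeatIntervalMs;   // regular heartbeat interval
  private final long minIntervalMs;         // lower bound between two heartbeats
  private final Object trigger = new Object();
  private boolean statusChanged = false;
  private long lastHeartbeatTime = 0;

  public AdaptiveHeartbeatMonitor(long heartbeatIntervalMs, long minIntervalMs) {
    this.heartbeatIntervalMs = heartbeatIntervalMs;
    this.minIntervalMs = minIntervalMs;
  }

  // Called whenever any container changes state (e.g. a container FINISHED).
  public void onContainerStatusChange() {
    synchronized (trigger) {
      statusChanged = true;
      trigger.notifyAll();
    }
  }

  // Called by the heartbeat thread between heartbeats; returns when the next one should be sent.
  public void waitForNextHeartbeat() throws InterruptedException {
    long sinceLast = System.currentTimeMillis() - lastHeartbeatTime;
    if (sinceLast < minIntervalMs) {
      Thread.sleep(minIntervalMs - sinceLast);   // honor the configured minimum gap
    }
    synchronized (trigger) {
      if (!statusChanged) {
        trigger.wait(heartbeatIntervalMs);       // regular interval, or early wake-up
      }
      statusChanged = false;
    }
    lastHeartbeatTime = System.currentTimeMillis();
  }
}
{code}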
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628299#comment-14628299 ] Hadoop QA commented on YARN-3893: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 30s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 6s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 95m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745452/0004-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8547/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8547/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8547/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
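To make the expected failure handling concrete, here is a sketch of the direction the report points at (method names simplified, not the committed fix): if the post-activation refresh fails, fall back to standby instead of staying active with a half-applied configuration, so both RMs never report active at the same time.
{code}
// Sketch only: becomeActiveInternal(), refreshAll() and transitionToStandby() stand in for the
// corresponding AdminService/RM steps.
public synchronized void transitionToActive() throws Exception {
  becomeActiveInternal();            // existing activation steps
  try {
    refreshAll();                    // reload scheduler config, ACLs, user-group mappings, ...
  } catch (Exception e) {
    LOG.error("Refresh after transitioning to active failed, returning to standby", e);
    transitionToStandby();           // undo the activation
    throw e;                         // let the elector / admin caller see the failure
  }
}
{code}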
[jira] [Updated] (YARN-1869) Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps()
[ https://issues.apache.org/jira/browse/YARN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-1869: - Description: Here is related code: {code} } else { opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl, CreateMode.PERSISTENT)); } {code} The other methods accessing zkAcl are synchronized. was: Here is related code: {code} } else { opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl, CreateMode.PERSISTENT)); } {code} The other methods accessing zkAcl are synchronized. > Access to zkAcl should be synchronized in ZKRMStateStore#addStoreOrUpdateOps() > -- > > Key: YARN-1869 > URL: https://issues.apache.org/jira/browse/YARN-1869 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > Attachments: yarn-1869.patch > > > Here is related code: > {code} > } else { > opList.add(Op.create(nodeCreatePath, tokenOs.toByteArray(), zkAcl, > CreateMode.PERSISTENT)); > } > {code} > The other methods accessing zkAcl are synchronized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
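A minimal sketch of the fix direction described above (signatures simplified): take a snapshot of zkAcl under the store's lock, the same monitor the other accessors hold, instead of reading the field unsynchronized while building the operation list.
{code}
// Sketch only: Op, CreateMode and ACL are the standard org.apache.zookeeper types.
private void addCreateOp(List<Op> opList, String nodeCreatePath, byte[] data) {
  List<ACL> aclSnapshot;
  synchronized (this) {              // same monitor the other zkAcl accessors use
    aclSnapshot = zkAcl;
  }
  opList.add(Op.create(nodeCreatePath, data, aclSnapshot, CreateMode.PERSISTENT));
}
{code}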
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628263#comment-14628263 ] Hudson commented on YARN-3170: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/255/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628270#comment-14628270 ] Hudson commented on YARN-3170: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2203 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2203/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md * hadoop-yarn-project/CHANGES.txt > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1645: Attachment: YARN-1645-YARN-1197.3.patch Thanks for the confirmation [~jianhe]. Attaching updated patch: * Merge {{authorizeStartRequest}} and {{authorizeResourceIncreaseRequest}} into {{authorizeStartAndResourceIncreaseRequest}} to share most of the code, similar to what is being done in {{authorizeGetAndStopContainerRequest}}. * Some test cases in the previous patch don't belong to this issue; they have been taken out and will be moved to YARN-3867. > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1645-YARN-1197.3.patch, YARN-1645.1.patch, > YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
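A rough sketch of the merge pattern described in the comment above, i.e. one shared authorization path for start and resource-increase requests. The method name authorizeStartAndResourceIncreaseRequest is taken from the comment; everything else (parameter types, the specific checks, the RequestKind enum) is an illustrative stand-in and not the actual ContainerManager code.
{code}
// Hedged sketch of merging two authorization methods into one shared path;
// parameter types and checks are illustrative stand-ins.
class ContainerAuthSketch {
  enum RequestKind { START, RESOURCE_INCREASE }

  void authorizeStartAndResourceIncreaseRequest(String remoteUser,
      String containerId, long tokenExpiryMs, RequestKind kind) {
    // validation shared by both request types
    if (remoteUser == null || containerId == null) {
      throw new SecurityException("Unauthorized request for container " + containerId);
    }
    // a check that only applies to the start path in this sketch
    if (kind == RequestKind.START && tokenExpiryMs < System.currentTimeMillis()) {
      throw new SecurityException("Container token expired for " + containerId);
    }
  }
}
{code}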
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0004-YARN-3893.patch Attaching patch after comment update and adding testcase > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628147#comment-14628147 ] Sunil G commented on YARN-2003: --- Test cases are passing locally. If I look at the test report, it says 0 failures. https://builds.apache.org/job/PreCommit-YARN-Build/8546/testReport/ Similarly, the findbugs report page also shows 0 warnings when I go to it. > Support for Application priority : Changes in RM and Capacity Scheduler > --- > > Key: YARN-2003 > URL: https://issues.apache.org/jira/browse/YARN-2003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, > 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, > 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, > 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, > 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, > 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, > 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, > 0021-YARN-2003.patch, 0022-YARN-2003.patch > > > AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from > Submission Context and store. > Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628143#comment-14628143 ] Hudson commented on YARN-3170: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #245 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/245/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md * hadoop-yarn-project/CHANGES.txt > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628129#comment-14628129 ] Hadoop QA commented on YARN-2003: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 41s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 34s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 47s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 55s | Tests passed in hadoop-sls. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 61m 47s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 107m 9s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745428/0022-YARN-2003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8546/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8546/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8546/console | This message was automatically generated. 
> Support for Application priority : Changes in RM and Capacity Scheduler > --- > > Key: YARN-2003 > URL: https://issues.apache.org/jira/browse/YARN-2003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, > 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, > 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, > 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, > 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, > 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, > 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, > 0021-YARN-2003.patch, 0022-YARN-2003.patch > > > AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from > Submission Context and store. > Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628115#comment-14628115 ] Hudson commented on YARN-3170: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2184 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2184/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628077#comment-14628077 ] Varun Saxena commented on YARN-3045: A cosmetic comment. Some of the lines are too long (> 80 chars). > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628072#comment-14628072 ] Varun Saxena commented on YARN-3045: [~Naganarasimha], a couple of comments. # In NMTimelinePublisher, we can make the Container***Event classes private. I do not see them being referenced anywhere else. # Will a single event queue with a single event handling thread in the async dispatcher be enough to handle container events? I think there may be too many. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627950#comment-14627950 ] Varun Saxena commented on YARN-3893: Thanks for the patch [~bibinchundatt]. Few comments. # Nit : Should be "Exception in state transition" {code} throw new ServiceFailedException( "Exception in state transistion", re); {code} # IMO, no need to throw ServiceFailedException when catching exception while calling reinitialize. The throw below should suffice. Just set the flag. According to me, we should retain the original exception. # Add a comment indicating what the flag does. # Maybe rename the flag to reinitActiveServices instead of reinitialize. # The flag according to me, semantically speaking, doesn't quite belong to AdminService. Can be in ResourceManager or RMContext. Thoughts ? # Can you add a test to verify the fix ? # I think instead of relying on transitionToStandby to change state to standby, we can explicitly change the state in AdminService. Thats because even stopActiveServices can throw an Exception and if it does, state won't change to STANDBY. This call to stop should not throw an exception, but as services keep on getting added you never know how a particular service may behave. We should be immune to it. Try something like below. {code} ((RMContextImpl)rmContext).setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY); {code} # Just a suggestion. If we do above, maybe call stopActiveServices and reinitialize directly instead of calling transitonToStandby. This is because as I said in a comment above, transitionToStandby would print an audit log saying transition is successful. But reinitialize subsequently may fail. And not printing this audit log will be consistent with transitionToActive failing during starting active services. Thoughts ? > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
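To make the suggested control flow concrete, here is a simplified sketch using stand-in types rather than the real AdminService/RMContext classes: if refreshAll() fails, active services are stopped and the HA state is set back to STANDBY explicitly, so both RMs never report active at the same time. This only illustrates the comments above and is not the eventual patch.
{code}
// Hedged sketch of the suggested transition flow; Runnable stand-ins take
// the place of the real startActiveServices/refreshAll/stopActiveServices.
class TransitionToActiveSketch {
  enum HAState { ACTIVE, STANDBY }

  private volatile HAState state = HAState.STANDBY;

  void transitionToActive(Runnable startActiveServices, Runnable refreshAll,
      Runnable stopActiveServices) {
    startActiveServices.run();
    state = HAState.ACTIVE;
    try {
      refreshAll.run();                 // refresh queues, ACLs, user groups
    } catch (RuntimeException e) {
      stopActiveServices.run();         // best-effort rollback of active services
      state = HAState.STANDBY;          // set the state explicitly, as suggested
      throw new IllegalStateException("Exception in state transition", e);
    }
  }

  HAState getState() {
    return state;
  }
}
{code}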
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627949#comment-14627949 ] Tsuyoshi Ozawa commented on YARN-2801: -- [~Naganarasimha] [~leftnoteasy] Could you fix following points? 1. Could you edit hadoop-project/src/site/site.xml to add an entry of NodeLabel to the left-side menu? 2. there are some typos: {quote} +Yarn Node Labels {quote} "Yarn" should be upper case, *YARN*? {quote} and application can specify where to run. {quote} *applications* instead of application looks better. {quote} User need configure how much resources of each partition can be used by different queues, see next sections. {quote} "User need *to* configure" looks better. Also, "~ by different queues, see next sections" is a bit difficult to read. My suggestion is "~ by different queues. For more detail, please see the next section." {quote} There’re two kinds of node partitions {quote} "’" should be "'"(using half-width character instead of full-width character). Adding *:* looks better after the sentence. {quote} Exclusive: Containers will be allocated to nodes with exactly match node partition. {quote} *Containers* should be *containers*. {quote} User can specify ~ {quote} *A user can specify* ~ {quote} user can set percentage like: queue-A can access 30% ~ {quote} a user can set ~, adding space between *queue* and *-*, *-* and *A*. {quote} So each of they can use 1/3 ~ {quote} each of *them* ~ {quote} After finish setting configuration of CapacityScheduler, executing ~ {quote} After *finishing* configuration of CapacityScheduler, *execute* ~ {quote} Application can use ~ {quote} *Applications* can use ~ > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2003: -- Attachment: 0022-YARN-2003.patch Uploading a new version of the patch after fixing test issues and warnings. One more validation is added in CS. There are two cases: - a user submits a new application without specifying a priority and submits it to a non-existent queue. - a user submits a new application without specifying a priority and submits it with no queue (queue as null), assuming that a queue mapping is done for this user in the CS config file. Validation for such cases happens in addApplication in CS, but that comes only later in the call flow. To fill in the default priority from RMAppManager, we need the {{queue}} object, hence I added these checks in {{checkAndGetApplicationPriority}}. Please share your thoughts. > Support for Application priority : Changes in RM and Capacity Scheduler > --- > > Key: YARN-2003 > URL: https://issues.apache.org/jira/browse/YARN-2003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, > 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, > 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, > 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, > 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, > 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, > 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, > 0021-YARN-2003.patch, 0022-YARN-2003.patch > > > AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from > Submission Context and store. > Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
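As a rough illustration of the two cases above, a simplified stand-in for {{checkAndGetApplicationPriority}}: a null or unknown queue is rejected up front, and a missing priority falls back to the queue's default. The real CapacityScheduler types and the user-to-queue mapping are deliberately not modeled here.
{code}
import java.util.Map;

// Hedged sketch of the validation described above; not the CapacityScheduler
// implementation, just the shape of the two checks and the default fallback.
class AppPrioritySketch {
  private final Map<String, Integer> queueDefaultPriority;

  AppPrioritySketch(Map<String, Integer> queueDefaultPriority) {
    this.queueDefaultPriority = queueDefaultPriority;
  }

  int checkAndGetApplicationPriority(Integer requestedPriority, String queueName) {
    // reject submissions to a null or non-existent queue here, rather than
    // failing later in addApplication()
    if (queueName == null || !queueDefaultPriority.containsKey(queueName)) {
      throw new IllegalArgumentException("Unknown or missing queue: " + queueName);
    }
    // fall back to the queue's default priority when none was specified
    return requestedPriority != null
        ? requestedPriority
        : queueDefaultPriority.get(queueName);
  }
}
{code}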
[jira] [Resolved] (YARN-3909) TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena resolved YARN-3909. Resolution: Duplicate > TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk > --- > > Key: YARN-3909 > URL: https://issues.apache.org/jira/browse/YARN-3909 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena > > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.413 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates > testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) > Time elapsed: 5.327 sec <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:156) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627925#comment-14627925 ] Varun Saxena commented on YARN-3910: Closing this as duplicate > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.001.patch, YARN-3910.02.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena resolved YARN-3913. Resolution: Duplicate > TestResourceTrackerService#testReconnectNode fails on trunk > --- > > Key: YARN-3913 > URL: https://issues.apache.org/jira/browse/YARN-3913 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627924#comment-14627924 ] Varun Saxena commented on YARN-3913: Yes, all the test-failure-related JIRAs I had raised are duplicates of YARN-3916. Will close them all. > TestResourceTrackerService#testReconnectNode fails on trunk > --- > > Key: YARN-3913 > URL: https://issues.apache.org/jira/browse/YARN-3913 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627919#comment-14627919 ] Tsuyoshi Ozawa commented on YARN-3798: -- The test result: {quote} -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 48 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. {quote} I checked javadoc warning, but I found no diff. I think it looks like false positive warning by a script. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.
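As context for the issue title, a minimal sketch of the retry policy it argues for: re-create the ZooKeeper session only on a genuine SESSIONEXPIRED, and simply retry the operation on the existing session for transient connection loss. The hooks here (ZkAction, createNewSession) are hypothetical stand-ins, not the real ZKRMStateStore retry methods.
{code}
import org.apache.zookeeper.KeeperException;

// Hedged sketch of "new session only on SESSIONEXPIRED"; all names except
// the ZooKeeper exception types are illustrative stand-ins.
class ZkRetrySketch {
  interface ZkAction<T> {
    T run() throws KeeperException, InterruptedException;
  }

  <T> T runWithRetries(ZkAction<T> action, int maxRetries, Runnable createNewSession)
      throws KeeperException, InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        return action.run();
      } catch (KeeperException.SessionExpiredException e) {
        createNewSession.run();          // only an expired session is re-created
        if (attempt >= maxRetries) {
          throw e;
        }
      } catch (KeeperException.ConnectionLossException e) {
        // transient loss: retry on the same session instead of re-creating it
        if (attempt >= maxRetries) {
          throw e;
        }
      }
    }
  }
}
{code}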
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627916#comment-14627916 ] Hadoop QA commented on YARN-3535: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 3 new checkstyle issues (total was 337, now 340). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 31s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745422/0004-YARN-3535.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8545/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8545/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8545/console | This message was automatically generated. > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627910#comment-14627910 ] Rohith Sharma K S commented on YARN-3913: - Looked at the test failures from the [link|https://builds.apache.org/job/PreCommit-YARN-Build/8518/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt]. {code} testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.157 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:1051) {code} This is caused by the dispatcher not waiting for the heartbeat events to be processed. This should be a duplicate of YARN-3916. [~varun_saxena], can you confirm this and close this JIRA? > TestResourceTrackerService#testReconnectNode fails on trunk > --- > > Key: YARN-3913 > URL: https://issues.apache.org/jira/browse/YARN-3913 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
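For illustration, the kind of wait the test appears to be missing, using a hypothetical hasPendingEvents() hook rather than any real DrainDispatcher or MockRM API: the assertion should only run once the heartbeat events have drained from the async dispatcher.
{code}
// Hedged sketch of waiting for an event queue to drain before asserting;
// EventQueueView and hasPendingEvents() are hypothetical stand-ins.
class DispatcherWaitSketch {
  interface EventQueueView {
    boolean hasPendingEvents();
  }

  static void waitForEventsToDrain(EventQueueView dispatcher, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (dispatcher.hasPendingEvents()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("dispatcher did not drain within " + timeoutMs + " ms");
      }
      Thread.sleep(10);                  // poll until the queue is empty
    }
  }
}
{code}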
[jira] [Commented] (YARN-463) Show explicitly excluded nodes on the UI
[ https://issues.apache.org/jira/browse/YARN-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627907#comment-14627907 ] Hadoop QA commented on YARN-463: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 24s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 58s | The applied patch generated 2 new checkstyle issues (total was 98, now 100). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 56s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 95m 6s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745420/YARN-463.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8544/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8544/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8544/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8544/console | This message was automatically generated. > Show explicitly excluded nodes on the UI > > > Key: YARN-463 > URL: https://issues.apache.org/jira/browse/YARN-463 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Vinod Kumar Vavilapalli >Assignee: zhaoyunjiong > Labels: usability > Attachments: Screen Shot 2015-07-14 at 1.25.46 PM.png, > YARN-463.1.patch, YARN-463.patch > > > Nodes can be explicitly excluded via the config > yarn.resourcemanager.nodes.exclude-path. We should have a way of displaying > this list via web and command line UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627886#comment-14627886 ] Hudson commented on YARN-3170: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627878#comment-14627878 ] Hudson commented on YARN-3170: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) YARN-3170. YARN architecture document needs updating. Contirubted by Brahma Reddy Battula. (ozawa: rev edcaae44c10b7e88e68fa97afd32e4da4a9d8df7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md > YARN architecture document needs updating > - > > Key: YARN-3170 > URL: https://issues.apache.org/jira/browse/YARN-3170 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Brahma Reddy Battula > Fix For: 2.7.2 > > Attachments: YARN-3170-002.patch, YARN-3170-003.patch, > YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, > YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170-009.patch, > YARN-3170-010.patch, YARN-3170-011.patch, YARN-3170.patch > > > The marketing paragraph at the top, "NextGen MapReduce", etc are all > marketing rather than actual descriptions. It also needs some general > updates, esp given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627838#comment-14627838 ] Hadoop QA commented on YARN-3885: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 1 new checkstyle issues (total was 63, now 64). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 45m 7s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 84m 54s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.TestRM | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745407/YARN-3885.07.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8542/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8542/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8542/console | This message was automatically generated. 
> ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 > level > -- > > Key: YARN-3885 > URL: https://issues.apache.org/jira/browse/YARN-3885 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-3885.02.patch, YARN-3885.03.patch, > YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, > YARN-3885.07.patch, YARN-3885.patch > > > when preemption policy is {{ProportionalCapacityPreemptionPolicy.cloneQueues}} > this piece of code, to calculate {{untoucable}} doesnt consider al the > children, it considers only immediate childern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627836#comment-14627836 ] Tsuyoshi Ozawa commented on YARN-2801: -- I'll check this soon. > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3535: Attachment: 0004-YARN-3535.patch > ResourceRequest should be restored back to scheduler when RMContainer is > killed at ALLOCATED > - > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627806#comment-14627806 ] Naganarasimha G R commented on YARN-2801: - Hi [~leftnoteasy], can we get this patch committed? Is anything else pending for this? > Documentation development for Node labels requirment > > > Key: YARN-2801 > URL: https://issues.apache.org/jira/browse/YARN-2801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Gururaj Shetty >Assignee: Wangda Tan > Attachments: YARN-2801.1.patch, YARN-2801.2.patch, YARN-2801.3.patch > > > Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627788#comment-14627788 ] Naganarasimha G R commented on YARN-3152: - Thanks [~neillfontes] for responding, Was waiting for some feedback on this from you and committers/PMC so that i can go ahead. Services will be not available thats for sure but another issue is from the logs its visible that even though RM start up has failed, another RM tries to come up and if it fails for the same reason it just keeps oscillating between the two... As you mentioned i feel WARN should be sufficent for this issue or we can adopt the approach specified by [~kasha] in YARN-3607 and change the behavior to log WARN by default and if {{yarn.fail-fast}} is specified to true then have the current behavior. thoughts? > Missing hadoop exclude file fails RMs in HA > --- > > Key: YARN-3152 > URL: https://issues.apache.org/jira/browse/YARN-3152 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: Debian 7 >Reporter: Neill Lima >Assignee: Naganarasimha G R > > NI have two NNs in HA, they do not fail when the exclude file is not present > (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in > HA. I didn't create the exclude file at this point as well. I applied the HA > RM settings properly and when I started both RMs I started getting this > exception: > 2015-02-06 12:25:25,326 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root > OPERATION=transitionToActiveTARGET=RMHAProtocolService > RESULT=FAILURE DESCRIPTION=Exception transitioning to active > PERMISSIONS=All users are allowed > 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException: > java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file > or directory) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) > ... 
5 more > 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x44af32566180094 closed > 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 > sessionTimeout=1 > watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c > 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket > connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate > using SASL (unknown error) > 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to x.x.x.x/x.x.x.x:2181, initiating session > The issue is descriptive enough to resolve the problem - and it has been > fixed by creating the exclude file. > I just think as of a improvement: > - Should RMs ignore the missing file as the NNs did? > - Should single RM fail even when the file is not present? > Just suggesting this improvement to keep the behavior consistent when working > with in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
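A small sketch of the behavior proposed in the comment above: treat a missing exclude file as a WARN by default and only fail the transition when a fail-fast flag (modeled on {{yarn.fail-fast}}) is enabled. The class, method, and logger choices here are stand-ins, not the actual AdminService code.
{code}
import java.io.File;
import java.util.logging.Logger;

// Hedged sketch of "WARN by default, fail only when fail-fast is enabled".
class ExcludeFileCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(ExcludeFileCheckSketch.class.getName());

  void refreshNodes(String excludePath, boolean failFast) {
    File exclude = new File(excludePath);
    if (!exclude.exists()) {
      if (failFast) {
        // current behavior: abort the transition to active
        throw new IllegalStateException("Exclude file not found: " + excludePath);
      }
      // proposed default: keep going, consistent with how the NameNodes behave
      LOG.warning("Exclude file " + excludePath
          + " not found; treating the exclude list as empty");
      return;
    }
    // ... parse the exclude file and refresh the node list ...
  }
}
{code}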
[jira] [Updated] (YARN-463) Show explicitly excluded nodes on the UI
[ https://issues.apache.org/jira/browse/YARN-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated YARN-463: -- Attachment: YARN-463.1.patch Update patch fix test failures. > Show explicitly excluded nodes on the UI > > > Key: YARN-463 > URL: https://issues.apache.org/jira/browse/YARN-463 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Vinod Kumar Vavilapalli >Assignee: zhaoyunjiong > Labels: usability > Attachments: Screen Shot 2015-07-14 at 1.25.46 PM.png, > YARN-463.1.patch, YARN-463.patch > > > Nodes can be explicitly excluded via the config > yarn.resourcemanager.nodes.exclude-path. We should have a way of displaying > this list via web and command line UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627780#comment-14627780 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745414/YARN-3798-branch-2.7.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8543/console | This message was automatically generated. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015
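To illustrate the point in the issue summary (a new ZooKeeper session should only be created after a genuine session expiry), here is a small, self-contained watcher sketch. It is not the ZKRMStateStore code; {{recreateSession()}} is a hypothetical hook that only marks where a fresh session would be opened.
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class SessionAwareWatcher implements Watcher {

  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
        // Connection (re-)established within the same session: nothing to rebuild.
        break;
      case Disconnected:
        // Transient network problem: the client will reconnect on its own,
        // so the existing session must not be thrown away here.
        break;
      case Expired:
        // Only now is the old session gone for good; start a fresh one.
        recreateSession();
        break;
      default:
        break;
    }
  }

  private void recreateSession() {
    // Hypothetical hook: re-open the ZooKeeper handle and re-register watches.
  }
}
{code}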
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627767#comment-14627767 ] Hadoop QA commented on YARN-3893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 57s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 58s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 95m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745400/0003-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8540/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8540/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both web UIs show the RM as active > # Status is shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
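A rough sketch, not the committed fix, of the behaviour the report argues for: if the refresh step fails while becoming active, the RM should fall back to standby instead of leaving two RMs reporting active. All names here ({{AdminServiceSketch}}, {{ResourceManagerLike}}) are invented for illustration.
{code}
public class AdminServiceSketch {

  private final ResourceManagerLike rm; // hypothetical handle to the RM

  AdminServiceSketch(ResourceManagerLike rm) {
    this.rm = rm;
  }

  public void transitionToActive() throws Exception {
    rm.becomeActive();
    try {
      // Reload queues, ACLs, user-group mappings, etc. from configuration.
      rm.refreshAll();
    } catch (Exception e) {
      // Bad configuration: revert to standby rather than staying half-active.
      rm.becomeStandby();
      throw e;
    }
  }

  interface ResourceManagerLike {
    void becomeActive() throws Exception;
    void refreshAll() throws Exception;
    void becomeStandby();
  }
}
{code}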
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627735#comment-14627735 ] Ajith S commented on YARN-3885: --- These failures aren't because of the patch, I feel :) > ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 > level > -- > > Key: YARN-3885 > URL: https://issues.apache.org/jira/browse/YARN-3885 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: Ajith S >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-3885.02.patch, YARN-3885.03.patch, > YARN-3885.04.patch, YARN-3885.05.patch, YARN-3885.06.patch, > YARN-3885.07.patch, YARN-3885.patch > > > In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code > that calculates {{untouchable}} doesn't consider all the children; it > considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
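A toy illustration of the traversal-depth problem described above (it does not reproduce the real {{cloneQueues}} arithmetic): aggregating a per-queue quantity over only the immediate children misses anything held by deeper descendants, so a hierarchy more than two levels deep is handled incorrectly.
{code}
import java.util.ArrayList;
import java.util.List;

class QueueNode {
  final String name;
  final long quantity; // some per-queue amount, e.g. resources used by this queue itself
  final List<QueueNode> children = new ArrayList<>();

  QueueNode(String name, long quantity) {
    this.name = name;
    this.quantity = quantity;
  }

  /** Buggy pattern: looks only one level down, so grandchildren are ignored. */
  long aggregateShallow() {
    long total = 0;
    for (QueueNode child : children) {
      total += child.quantity;
    }
    return total;
  }

  /** Fixed pattern: recurses over the whole subtree. */
  long aggregateDeep() {
    long total = quantity;
    for (QueueNode child : children) {
      total += child.aggregateDeep();
    }
    return total;
  }
}
{code}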
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627719#comment-14627719 ] Hadoop QA commented on YARN-3885: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 13s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 39s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 9m 35s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 49m 49s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections | | | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.TestAppManager | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacityReservationSystem | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower | | | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForNodePartitions | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics | | | hadoop.yarn.server.resourcemanager.TestRMNodeTransitions | | | hadoop.yarn.server.resourcemanager.TestResourceManager | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservationQueue | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
| | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog | | | hadoop.yarn.server.resourcemanager.webapp.TestAppPage | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue | | | hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | | | hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower | | | hadoop.yarn.server.resourcemanager.
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.005.patch Attaching a patch to address [~zxu]'s comment. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resource